
IBM Cambridge Research Center

  Technical Report: Exploiting E-mail Structure to Improve Summarization


Technical Report #: 02-02
Author(s): Derek Lam, Steven L. Rohall, Chris Schmandt, Mia K. Stern
Category(s): e-mail, e-mail thread, feature extraction, knowledge management, named entity extraction, text summarization

Abstract

A Collaborative User Experience (CUE) Technical Report.

This paper presents the design and implementation of a system that summarizes e-mail messages. To generate summaries, the system exploits two aspects of e-mail: thread reply chains and commonly found features. The system builds on existing software designed to summarize single-text documents. Such software typically performs best on well-authored, formal documents; e-mail messages, however, are typically neither well authored nor formal. As a result, existing summarization software produces poor summaries of e-mail messages. To remedy this, our system preprocesses e-mail messages using heuristics that remove e-mail signatures, header fields, and quoted text from parent messages. We also present a heuristics-based approach to identifying and reporting names, dates, and companies found in e-mail messages. Lastly, we discuss conclusions drawn from a pilot user study of the summarization system and end with areas for further investigation.
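The abstract does not spell out the heuristics themselves, so the following is only a minimal Python sketch of the kind of preprocessing it describes, under stated assumptions. It assumes that lines beginning with `>` or an "On ... wrote:" attribution are quoted parent text, that everything after a conventional `-- ` delimiter is a signature, and that lines starting with common header names (`From:`, `Subject:`, and so on) are copied header fields. It also uses simple regular expressions to report dates and company names; person names are not attempted. The function names `strip_email_noise` and `extract_entities` and every pattern below are hypothetical illustrations, not the report's actual rules.

```python
import re

# Hypothetical patterns for illustration; the report's own heuristics are not reproduced here.
HEADER_FIELD = re.compile(r"^(From|To|Cc|Bcc|Subject|Date|Sent|Reply-To):", re.IGNORECASE)
QUOTE_INTRO = re.compile(r"^On .+ wrote:\s*$")  # attribution line, e.g. "On Tue, Bob wrote:"
SIG_DELIMITER = re.compile(r"^--\s*$")          # conventional "-- " signature separator

MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?"
# Dates such as "Mar 14, 2002", "March 14", or "3/14/02".
DATE = re.compile(rf"\b(?:{MONTH} \d{{1,2}}(?:, \d{{4}})?|\d{{1,2}}/\d{{1,2}}(?:/\d{{2,4}})?)\b")
# One or more capitalized words followed by a corporate suffix, e.g. "Acme Widgets Inc.".
COMPANY = re.compile(r"\b(?:[A-Z][A-Za-z&]+ )+(?:Inc|Corp|Corporation|Ltd|LLC|Co)\b\.?")


def strip_email_noise(body: str) -> str:
    """Drop copied header fields, quoted parent text, and a trailing signature."""
    kept = []
    for line in body.splitlines():
        stripped = line.strip()
        if SIG_DELIMITER.match(stripped):
            break  # treat everything after the delimiter as the signature
        if stripped.startswith(">") or QUOTE_INTRO.match(stripped):
            continue  # quoted text from the parent message, or its attribution line
        if HEADER_FIELD.match(stripped):
            continue  # header field copied into the body, e.g. in a forward
        kept.append(line)
    # Collapse runs of blank lines left behind by the removals.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(kept)).strip()


def extract_entities(text: str) -> dict:
    """Report dates and company names found by simple pattern heuristics."""
    return {
        "dates": DATE.findall(text),
        "companies": COMPANY.findall(text),
    }


if __name__ == "__main__":
    sample = "\n".join([
        "Hi team,",
        "",
        "The review moved to Mar 14, 2002 with Acme Widgets Inc. present.",
        "",
        "On Tue, Mar 5, 2002, Bob wrote:",
        "> Can we meet next week?",
        "",
        "-- ",
        "Alice Smith",
        "IBM Corporation",
    ])
    clean = strip_email_noise(sample)
    print(extract_entities(clean))
    # prints {'dates': ['Mar 14, 2002'], 'companies': ['Acme Widgets Inc.']}
```

Cleaning runs before extraction so that the quoted date and the company in the signature are not reported as if they belonged to the new message. Real mail would need sturdier rules (signatures without a delimiter, top-posted replies, localized attribution lines); the sketch only shows how cheap line-level rules can clean the input before it reaches an off-the-shelf single-document summarizer.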


