Abstract
Mail extraction is a critical task whose objective is to extract valuable data from the content of mail messages. This task is key for many types of applications, including re-targeting, mail search, and mail summarization, which rely on the important pieces of personal data in mail messages to achieve their objectives. We focus on machine-generated traffic, which comprises most Web mail traffic today, and use its structured, large-scale, repetitive nature to devise a fully automated extraction method. Our solution builds on an advanced structural clustering technique previously presented by some of the authors of this work. The heart of our solution is an offline process that leverages the structural, mail-specific characteristics of the clustering and automatically creates extraction rules that are later applied online to each newly arriving message. We provide a full description of our process, which has been productized in the Yahoo Mail backend. We complete our work with large-scale experiments carried out over real Yahoo Mail traffic, and evaluate the performance of our automatic extraction method.
Cite
Di Castro, D., Gamzu, I., Grabovitch-Zuyev, I., Lewin-Eytan, L., Pundir, A., Sahoo, N. R., & Viderman, M. (2018). Automated Extractions for Machine Generated Mail. In The Web Conference 2018 - Companion of the World Wide Web Conference, WWW 2018 (pp. 655–662). Association for Computing Machinery, Inc. https://doi.org/10.1145/3184558.3186582