Automated Extractions for Machine Generated Mail


Abstract

Mail extraction is a critical task whose objective is to extract valuable data from the content of mail messages. This task is key for many types of applications, including re-targeting, mail search, and mail summarization, which use the important pieces of personal data in mail messages to achieve their objectives. We focus on machine-generated traffic, which comprises most of the Web mail traffic today, and use its structured and large-scale repetitive nature to devise a fully automated extraction method. Our solution builds on an advanced structural clustering technique previously presented by some of the authors of this work. The heart of our solution is an offline process that leverages the structural, mail-specific characteristics of the clustering and automatically creates extraction rules that are later applied online to each newly arriving message. We provide a full description of our process, which has been productized in the Yahoo Mail backend. We complete our work with large-scale experiments carried out on real Yahoo Mail traffic, and evaluate the performance of our automatic extraction method.
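
The offline/online split described in the abstract can be illustrated with a toy sketch. This is not the authors' method: the abstract does not specify the clustering technique or the rule format, so everything below is an assumption made for illustration. Messages are simplified to a sender plus a list of tokens, the "structural cluster" is approximated by a hypothetical signature (sender and token count), and a "rule" is simply the list of token positions whose value varies across the messages of a cluster. The names `signature`, `learn_rules`, and `extract` are invented for this sketch.

```python
from collections import defaultdict


def signature(sender, tokens):
    # Hypothetical structural signature: sender plus token count.
    # A stand-in for the paper's structural clustering, which is not described here.
    return (sender, len(tokens))


def learn_rules(messages):
    """Offline step: group messages by signature, then record varying positions."""
    clusters = defaultdict(list)
    for sender, tokens in messages:
        clusters[signature(sender, tokens)].append(tokens)

    rules = {}
    for sig, members in clusters.items():
        if len(members) < 2:
            continue  # a single message cannot reveal which positions vary
        # zip(*members) walks the cluster position by position.
        rules[sig] = [
            i for i, column in enumerate(zip(*members)) if len(set(column)) > 1
        ]
    return rules


def extract(rules, sender, tokens):
    """Online step: look up the rule for a new message and apply it."""
    positions = rules.get(signature(sender, tokens))
    if positions is None:
        return None  # no rule was learned for this kind of message
    return [tokens[i] for i in positions]


# Made-up sample traffic from a single machine sender.
history = [
    ("shop@example.com", ["Order", "A17", "ships", "Monday"]),
    ("shop@example.com", ["Order", "B42", "ships", "Friday"]),
]
rules = learn_rules(history)
values = extract(rules, "shop@example.com", ["Order", "C03", "ships", "Tuesday"])
```

In this toy setting the constant tokens ("Order", "ships") play the role of the template and the varying positions are treated as the data to extract. A real system would work on the message's markup structure rather than on whitespace tokens.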

Citation (APA)

Di Castro, D., Gamzu, I., Grabovitch-Zuyev, I., Lewin-Eytan, L., Pundir, A., Sahoo, N. R., & Viderman, M. (2018). Automated Extractions for Machine Generated Mail. In The Web Conference 2018 - Companion of the World Wide Web Conference, WWW 2018 (pp. 655–662). Association for Computing Machinery, Inc. https://doi.org/10.1145/3184558.3186582
