Categorizing Emails Using Machine Learning with Textual Features

Abstract

We developed an application that automatically assigns emails received in a generic request inbox to one of fourteen predefined topic categories. To build it, we compared the performance of several classifiers in predicting the topic category, using a dataset extracted from this inbox comprising 8,841 emails received over three years. The algorithms ranged from linear classifiers operating on n-gram features to deep learning techniques such as CNNs and LSTMs. The best-performing individual model was a logistic regression classifier using n-grams with TF-IDF weights, which achieved 90.9% accuracy. The traditional models outperformed the deep learning models on this dataset, likely in part because of the small dataset size, and also because this particular classification task may not require the ordered sequence representation of tokens that deep learning models provide. Ultimately, we selected a bagged voting model that combines the predictions of the top eight models; its accuracy of 92.7% surpasses that of any individual model.
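
To make two of the ingredients named in the abstract concrete, the following is a minimal illustrative sketch of TF-IDF-weighted n-gram features and of combining several models' predictions by voting. It is not the authors' implementation. The abstract does not state the tokenization, the n-gram range, the IDF formula, or the voting rule, so the sketch assumes whitespace tokens, unigrams plus bigrams, a smoothed IDF of ln((1 + N) / (1 + df)) + 1, and plurality voting. The function names and the example emails and labels are hypothetical, and the bagging (resampling) step is not shown.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return word n-grams of length 1..n_max from a lowercased text.

    Assumption: simple whitespace tokenization.
    """
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf_vectors(docs, n_max=2):
    """Map each document to a sparse dict {ngram: term count * idf}.

    Assumption: smoothed IDF, ln((1 + N) / (1 + df)) + 1.
    """
    counts_per_doc = [Counter(ngrams(doc, n_max)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each n-gram.
    doc_freq = Counter(g for counts in counts_per_doc for g in counts)
    idf = {
        g: math.log((1 + n_docs) / (1 + df)) + 1
        for g, df in doc_freq.items()
    }
    return [
        {g: tf * idf[g] for g, tf in counts.items()}
        for counts in counts_per_doc
    ]


def majority_vote(predictions):
    """Combine per-model label lists into one label per email.

    `predictions` holds one list of labels per model, all the same length.
    Each email gets the label predicted by the most models.
    """
    return [
        Counter(labels).most_common(1)[0][0]
        for labels in zip(*predictions)
    ]


if __name__ == "__main__":
    # Hypothetical emails and model outputs, for illustration only.
    emails = ["reset my password", "password reset request", "invoice for march"]
    for vector in tfidf_vectors(emails):
        print(sorted(vector.items(), key=lambda kv: -kv[1])[:3])
    print(majority_vote([
        ["access", "billing"],
        ["access", "access"],
        ["billing", "billing"],
    ]))
```

In this sketch, an n-gram that appears in few emails receives a larger weight than one that appears in many, which is what lets a linear classifier such as logistic regression pick up on topic-specific phrases. The voting step can only help when the combined models make different mistakes, which is consistent with the ensemble in the abstract outperforming each of its members.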

Citation (APA)

Zhang, H., Rangrej, J., Rais, S., Hillmer, M., Rudzicz, F., & Malikov, K. (2019). Categorizing Emails Using Machine Learning with Textual Features. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11489 LNAI, pp. 3–15). Springer Verlag. https://doi.org/10.1007/978-3-030-18305-9_1
