Categorizing Emails Using Machine Learning with Textual Features

Haoran Zhang; Jagadish Rangrej; Saad Rais; Michael Hillmer; Frank Rudzicz; Kamil Malikov

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019) 11489 LNAI 3-15

DOI: 10.1007/978-3-030-18305-9_1

Abstract

We developed an application that automatically assigns emails received in a generic request inbox to one of fourteen predefined topic categories. To build this application, we compared the performance of several classifiers in predicting the topic category, using a dataset of 8,841 emails extracted from this inbox over three years. The algorithms ranged from linear classifiers operating on n-gram features to deep learning techniques such as CNNs and LSTMs. For our objective, the best-performing individual model was a logistic regression classifier using n-grams with TF-IDF weights, which achieved 90.9% accuracy. The traditional models performed better than the deep learning models on this dataset, likely in part because the dataset is small, and also because this particular classification task may not require the ordered sequence representation of tokens that deep learning models provide. Ultimately, we selected a bagged voting model that combines the predictive power of the top eight models; at 92.7% accuracy, it surpasses every individual model.
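The two ideas named in the abstract, n-gram features with TF-IDF weights and voting across several models, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names (`ngrams`, `tfidf_vectors`, `majority_vote`), the toy emails, the category labels, and the smoothed IDF formula are all assumptions made for illustration. The classifier itself (logistic regression in the paper) and the bagging step are omitted; only feature weighting and the final hard vote are shown.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from lowercased text."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf_vectors(docs, n_max=2):
    """Map each document to a dict of n-gram -> TF-IDF weight.

    Uses raw term frequency and the smoothed IDF ln((1 + N) / (1 + df)) + 1.
    This weighting variant is an assumption, not taken from the paper.
    """
    counts_per_doc = [Counter(ngrams(d, n_max)) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter(g for counts in counts_per_doc for g in counts)
    idf = {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}
    return [
        {g: tf * idf[g] for g, tf in counts.items()}
        for counts in counts_per_doc
    ]


def majority_vote(predictions):
    """Hard voting: return the label predicted by the most models.

    Ties go to the label that appears first in `predictions`.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical toy inbox and per-model predictions for a single email.
emails = [
    "please reset my password",
    "invoice for march attached",
    "cannot reset password again",
]
vectors = tfidf_vectors(emails)
label = majority_vote(["billing", "account", "account"])
```

In this sketch, an n-gram that occurs in fewer emails (such as "invoice") receives a larger weight than one shared across emails (such as "reset"), which is the property that makes TF-IDF features useful to a linear classifier.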

Cite (APA)

Zhang, H., Rangrej, J., Rais, S., Hillmer, M., Rudzicz, F., & Malikov, K. (2019). Categorizing Emails Using Machine Learning with Textual Features. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11489 LNAI, pp. 3–15). Springer Verlag. https://doi.org/10.1007/978-3-030-18305-9_1
