Improving classification of tweets using linguistic information from a large external corpus

Abstract

The bag-of-words representation of documents is often unsatisfactory because it ignores relationships between important terms that do not literally co-occur. Improvements may be achieved by expanding the vocabulary with other relevant words, such as synonyms. In this paper we use word-word co-occurrence information from a large corpus to expand the vocabulary of another corpus consisting of tweets. Several methods for including the co-occurrence information are constructed and tested on the classification of real Twitter data. Our results show that using co-occurrence information reduces the number of erroneous classifications by 14%.
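The abstract does not specify how the co-occurrence information is incorporated, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes sliding-window co-occurrence counts over a tokenized external corpus and a simple expansion rule that adds each tweet word's top-k co-occurring words at a reduced weight. The function names `cooccurrence_counts` and `expand_bag_of_words` and the parameters `window`, `k` and `weight` are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count how often each word appears within `window` tokens of another word.

    Hypothetical helper: a plain sliding-window count over a tokenized corpus.
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def expand_bag_of_words(tokens, counts, k=2, weight=0.5):
    """Build a bag of words for `tokens`, then add the top-k co-occurring
    words of each token, each contributing `weight` instead of a full count.

    Hypothetical expansion rule, assumed for illustration only.
    """
    bag = Counter(tokens)
    expanded = Counter()
    for word in tokens:
        # .get() avoids creating empty entries for words unseen in the corpus
        for neighbour, _ in counts.get(word, Counter()).most_common(k):
            expanded[neighbour] += weight
    for word, value in expanded.items():
        bag[word] += value
    return bag


# Toy "external corpus" (invented for illustration, not data from the paper)
corpus = [
    ["happy", "glad", "day"],
    ["glad", "joyful", "news"],
    ["sad", "unhappy", "day"],
]
counts = cooccurrence_counts(corpus, window=1)
bag = expand_bag_of_words(["happy", "tweet"], counts, k=2, weight=0.5)
print(bag)
```

In this toy run the tweet containing "happy" gains a fractional count for "glad", a term it never contains literally, which is the kind of relationship a plain bag of words misses. Down-weighting the added terms is one possible design choice that keeps them from dominating the words actually present in the tweet.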

Citation (APA):

Hammer, H. L., Yazidi, A., Bai, A., & Engelstad, P. (2017). Improving classification of tweets using linguistic information from a large external corpus. In Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST (Vol. 188, pp. 122–134). Springer Verlag. https://doi.org/10.1007/978-3-319-52569-3_11