Improving classification of tweets using linguistic information from a large external corpus

Hugo Lewi Hammer; Anis Yazidi; Aleksander Bai; Paal Engelstad

Conference Proceedings

Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST (2017) 188 122-134

DOI: 10.1007/978-3-319-52569-3_11

Abstract

The bag-of-words representation of documents is often unsatisfactory because it ignores relationships between important terms that do not co-occur literally. Improvements might be achieved by expanding the vocabulary with other relevant words, such as synonyms. In this paper we use word-word co-occurrence information from a large corpus to expand the vocabulary of another corpus consisting of tweets. Several methods for incorporating the co-occurrence information are constructed and tested on the classification of real Twitter data. Our results show that we are able to reduce the number of erroneous classifications by 14% using co-occurrence information.
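
The general idea of vocabulary expansion can be illustrated with a minimal sketch. It is not one of the paper's methods, which the abstract does not specify. It assumes a simple window-based co-occurrence count and a fixed down-weighting of added terms; the function names and the parameters `window`, `top_k`, and `weight` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(corpus, window=2):
    """Count how often each word pair appears within `window` tokens of each other."""
    counts = defaultdict(Counter)
    for doc in corpus:
        tokens = doc.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def expand_bag_of_words(tweet, counts, top_k=2, weight=0.5):
    """Return a bag of words for `tweet`, extended with co-occurring terms.

    Each original term keeps its count. Its `top_k` most frequent
    co-occurring terms from the external corpus are added with a
    reduced weight (an assumed scheme, not taken from the paper).
    """
    bag = Counter(tweet.lower().split())
    expanded = defaultdict(float)
    for word, count in bag.items():
        expanded[word] += count
    for word, count in bag.items():
        for neighbour, _ in counts.get(word, Counter()).most_common(top_k):
            expanded[neighbour] += weight * count
    return dict(expanded)


# Toy external corpus and a one-word "tweet".
external_counts = cooccurrence_counts(["cat dog"], window=1)
features = expand_bag_of_words("cat", external_counts, top_k=1, weight=0.5)
print(features)
```

In this toy case the tweet contains only "cat", yet the expanded representation also carries a down-weighted entry for "dog", so a classifier could relate the tweet to documents that mention only "dog".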

Cite

Hammer, H. L., Yazidi, A., Bai, A., & Engelstad, P. (2017). Improving classification of tweets using linguistic information from a large external corpus. In Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST (Vol. 188, pp. 122–134). Springer Verlag. https://doi.org/10.1007/978-3-319-52569-3_11
