Identifying domains and concepts in short texts via partial taxonomy and unlabeled data

Yihong Zhang; Claudia Szabo; Quan Z. Sheng; Wei Emma Zhang; Yongrui Qin

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017) 10253 LNCS 127-143

DOI: 10.1007/978-3-319-59536-8_9

3 Citations

13 Readers

Abstract

Accurate, real-time identification of the domains and concepts discussed in microblogging texts is crucial for many important applications, such as earthquake monitoring, influenza surveillance, and disaster management. Existing techniques, such as machine learning and keyword generation, are application-specific and require a significant amount of training to achieve high accuracy. In this paper, we propose to use a multiple domain taxonomy (MDT) to capture general user knowledge. We formally define the problems of domain classification and concept tagging. Using the MDT, we devise domain-independent, pure frequency count methods that require neither training data nor annotations and that are not sensitive to misspellings or shortened word forms. Our extensive experimental analysis on real Twitter data shows that, on large datasets, both methods achieve significantly better identification accuracy than existing methods while keeping runtime low.
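
The abstract does not spell out the algorithms, so the following is only a rough illustration of the general idea of frequency counting against a taxonomy, not the paper's method. The toy taxonomy, its terms, and the function names (`count_matches`, `classify_domain`, `tag_concepts`) are all hypothetical. The sketch uses exact token matching, so it does not reproduce the robustness to misspellings or shortened word forms that the abstract reports.

```python
import re
from collections import Counter

# Hypothetical toy taxonomy (domain -> concept -> terms), invented for
# illustration; it is not the MDT used in the paper.
TAXONOMY = {
    "earthquake": {
        "event": {"earthquake", "quake", "tremor", "aftershock"},
        "damage": {"collapsed", "rubble", "cracks"},
    },
    "influenza": {
        "symptom": {"fever", "cough", "chills"},
        "illness": {"flu", "influenza"},
    },
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def count_matches(text, taxonomy=TAXONOMY):
    """Count how many tokens of the text hit each (domain, concept) pair."""
    tokens = tokenize(text)
    counts = Counter()
    for domain, concepts in taxonomy.items():
        for concept, terms in concepts.items():
            hits = sum(1 for token in tokens if token in terms)
            if hits:
                counts[(domain, concept)] = hits
    return counts


def classify_domain(text, taxonomy=TAXONOMY):
    """Return the domain with the most term hits, or None if nothing matches."""
    domain_totals = Counter()
    for (domain, _concept), hits in count_matches(text, taxonomy).items():
        domain_totals[domain] += hits
    if not domain_totals:
        return None
    return domain_totals.most_common(1)[0][0]


def tag_concepts(text, taxonomy=TAXONOMY):
    """Return the sorted (domain, concept) pairs with at least one hit."""
    return sorted(count_matches(text, taxonomy))
```

For example, `classify_domain("Huge quake just now, aftershock expected")` counts two hits under the hypothetical `earthquake` domain and none elsewhere, so it selects that domain. No training data is involved: the only input besides the text is the taxonomy itself.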

Cite

Zhang, Y., Szabo, C., Sheng, Q. Z., Zhang, W. E., & Qin, Y. (2017). Identifying domains and concepts in short texts via partial taxonomy and unlabeled data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10253 LNCS, pp. 127–143). Springer Verlag. https://doi.org/10.1007/978-3-319-59536-8_9
