Toponym Extraction in Thai Tweets Using a Hybrid Approach

Abstract

Crowdsourcing has become an important tool in areas such as business and marketing, and it can help organizations solve large-scale problems in areas including traffic management and political campaigning. Toponym extraction is a necessary step when analyzing crowdsourced data for traffic tracking or event reporting. Dictionaries and rule-based analysis are commonly used to match and extract entities from text. However, building an effective dictionary is difficult, especially when it must cover a large number of location names. Named Entity Recognition (NER) can help address this, but it has its own limitations. In this paper, we describe an improved approach to toponym extraction from Twitter messages that combines a dictionary with NER. Because tweets are limited to 280 characters, locations are often referred to by abbreviations. The variety of forms that location names take, together with the unstructured language of tweets, poses challenges for both the dictionary and NER methods. We divided tweets into four categories to investigate the effect of analyzing messages from different domains. The average accuracy was 49.18% using only the dictionary, 59.30% using only NER, and 75.43% using the hybrid method.
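The abstract does not say exactly how the dictionary and NER outputs are combined. The sketch below shows one plausible way to do it, using these assumptions:

- A small hypothetical gazetteer maps surface forms, including abbreviations, to canonical names.
- NER spans come from some external model, which is not implemented here.
- The two span sets are merged by taking their union. Where spans overlap, the longer one is kept.

The names `GAZETTEER`, `dictionary_match`, `merge_spans` and `hybrid_extract` are illustrative. This is not the authors' implementation.

```python
from typing import Dict, List, Tuple

# (start offset, end offset, canonical toponym)
Span = Tuple[int, int, str]

# Hypothetical gazetteer: surface forms, including abbreviations, map to a canonical name.
GAZETTEER: Dict[str, str] = {
    "Bangkok": "Bangkok",
    "BKK": "Bangkok",
    "Victory Monument": "Victory Monument",
    "Chiang Mai": "Chiang Mai",
    "CNX": "Chiang Mai",
}


def dictionary_match(text: str, gazetteer: Dict[str, str]) -> List[Span]:
    """Greedy longest-match scan character by character (no word-boundary check)."""
    forms = sorted(gazetteer, key=len, reverse=True)  # try longer surface forms first
    spans: List[Span] = []
    i = 0
    while i < len(text):
        for form in forms:
            if text.startswith(form, i):
                spans.append((i, i + len(form), gazetteer[form]))
                i += len(form)  # skip past the matched form
                break
        else:
            i += 1  # no form starts here; advance one character
    return spans


def merge_spans(dict_spans: List[Span], ner_spans: List[Span]) -> List[Span]:
    """Union of both sources; on overlap keep the longer span (dictionary wins ties)."""
    # sorted() is stable, so equal-length dictionary spans stay ahead of NER spans.
    candidates = sorted(dict_spans + ner_spans, key=lambda s: -(s[1] - s[0]))
    accepted: List[Span] = []
    for start, end, name in candidates:
        if all(end <= a_start or start >= a_end for a_start, a_end, _ in accepted):
            accepted.append((start, end, name))
    return sorted(accepted)  # restore left-to-right order


def hybrid_extract(text: str, ner_spans: List[Span]) -> List[Span]:
    """Combine gazetteer matches with spans from an external (hypothetical) NER model."""
    return merge_spans(dictionary_match(text, GAZETTEER), ner_spans)


if __name__ == "__main__":
    tweet = "Heavy traffic near Victory Monument, BKK, heading to Rangsit"
    v, r = tweet.index("Victory"), tweet.index("Rangsit")
    # Pretend an NER model tagged "Victory" (partial) and "Rangsit" (not in the gazetteer).
    ner = [(v, v + 7, "Victory"), (r, r + 7, "Rangsit")]
    for start, end, name in hybrid_extract(tweet, ner):
        print(f"{tweet[start:end]!r} -> {name}")
```

In this example, the two sources complement each other:

- The dictionary resolves the abbreviation "BKK" to "Bangkok".
- The dictionary's longer "Victory Monument" span overrides NER's partial "Victory".
- NER contributes "Rangsit", which the gazetteer lacks.

Scanning character by character without word boundaries suits Thai script, which does not separate words with spaces. However, it can also fire inside longer words. The longest-span overlap rule is only one possible policy.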

Boonnavasin, M., & Rattanatamrong, P. (2020). Toponym Extraction in Thai Tweets Using a Hybrid Approach. In ACM International Conference Proceeding Series (pp. 71–76). Association for Computing Machinery. https://doi.org/10.1145/3385209.3385225