Toponym Extraction in Thai Tweets Using a Hybrid Approach

Manassanan Boonnavasin; Prapaporn Rattanatamrong

Conference Proceedings, Open Access

ACM International Conference Proceeding Series (2020) 71-76

DOI: 10.1145/3385209.3385225

Abstract

Crowdsourcing has become an important tool in areas such as business and marketing. It can help organizations solve large-scale problems in areas including traffic management and political campaigning. Toponym extraction is necessary when analyzing crowdsourced data for traffic tracking or event reporting. Dictionaries and rule-based analysis are commonly used for matching and extracting entities from text. However, creating an effective dictionary is not easy, especially when it must cover a large number of location names. Named Entity Recognition (NER) can help address this, but it has limitations of its own. In this paper, we describe an improved approach to toponym extraction from Twitter messages that combines a dictionary and NER. As tweets are limited to 280 characters, any locations mentioned are usually abbreviated. The variety of forms that location names take, and the unstructured language of tweets, pose challenges for both the dictionary and NER methods. We divided tweets into four categories to investigate the effect of analyzing messages from different domains. The average accuracy was 49.18% when using only the dictionary, 59.30% when using only NER, and 75.43% when using the hybrid method.
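The abstract does not say how the dictionary and NER outputs are combined, so the following is only a minimal sketch of one plausible combination rule: take the union of both methods' candidate spans and, where spans overlap, keep the longest. The function names, the tiny gazetteer, the English example tweet, and the `toy_ner` stand-in (a regex, not a real NER model) are all assumptions made for illustration and are not taken from the paper.

```python
import re
from typing import Callable, List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def dictionary_spans(text: str, gazetteer: Set[str]) -> List[Span]:
    """Find every occurrence of every gazetteer entry by plain substring search."""
    spans: List[Span] = []
    for name in gazetteer:
        start = text.find(name)
        while start != -1:
            spans.append((start, start + len(name)))
            start = text.find(name, start + 1)
    return spans


def toy_ner(text: str) -> List[Span]:
    """Hypothetical stand-in for an NER model: tags the word after 'near '."""
    return [m.span(1) for m in re.finditer(r"near (\w+)", text)]


def merge_spans(spans: List[Span]) -> List[Span]:
    """Keep longer spans first and drop any span overlapping one already kept."""
    kept: List[Span] = []
    for s in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        if all(s[1] <= k[0] or s[0] >= k[1] for k in kept):
            kept.append(s)
    return sorted(kept)


def hybrid_toponyms(
    text: str, gazetteer: Set[str], ner: Callable[[str], List[Span]]
) -> List[str]:
    """Union of dictionary and NER candidates, overlaps resolved, in text order."""
    spans = merge_spans(dictionary_spans(text, gazetteer) + ner(text))
    return [text[a:b] for a, b in spans]


if __name__ == "__main__":
    tweet = "Accident near Asok junction, traffic jam at BTS Siam"
    gazetteer = {"Siam", "BTS Siam"}  # illustrative entries only
    print(hybrid_toponyms(tweet, gazetteer, toy_ner))
```

In this sketch the dictionary contributes "BTS Siam" (the shorter overlapping match "Siam" is dropped), while the stand-in NER contributes "Asok", which the gazetteer does not contain. A real system for Thai tweets would replace `toy_ner` with a trained Thai NER model and would need to handle abbreviations and the absence of word boundaries in Thai text.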

Cite

Boonnavasin, M., & Rattanatamrong, P. (2020). Toponym Extraction in Thai Tweets Using a Hybrid Approach. In ACM International Conference Proceeding Series (pp. 71–76). Association for Computing Machinery. https://doi.org/10.1145/3385209.3385225
