Robust toponym resolution based on surface statistics

Tomohisa Sano; Shiho Hoshi Nobesawa; Hiroyuki Okamoto; Hiroya Susuki; Masaki Matsubara; Hiroaki Saito

Journal Article, Open Access

IEICE Transactions on Information and Systems (2009) E92-D(12) 2313-2320

DOI: 10.1587/transinf.E92.D.2313

Citations: 1

Readers: 5

Abstract

Toponyms and other named entities are a major issue in unknown word processing. Our purpose is to salvage unknown toponyms, not only to avoid noise but also to provide them with information on the area candidates to which they may belong. Most previous toponym resolution methods targeted disambiguation among area candidates, an ambiguity that arises when the same toponym exists in more than one place. These approaches were mostly based on gazetteers and contexts. For documents that may contain toponyms from anywhere in the world, such as newspaper articles, toponym resolution is not just ambiguity resolution but area candidate selection from all the areas on Earth. We therefore propose an automatic toponym resolution method that identifies a toponym's area candidates based only on surface statistics, in place of dictionary-lookup approaches. Our method combines two modules, area candidate reduction and area candidate examination that uses block-unit data, to obtain high accuracy without reducing the recall rate. Our empirical results showed 85.54% precision, 91.92% recall, and an F-measure of .89 on average. The method is a flexible and robust approach to toponym resolution targeting an unrestricted number of areas. Copyright © 2009 The Institute of Electronics.
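As a quick consistency check on the figures reported in the abstract, the sketch below computes the balanced F-measure from the average precision and recall. It assumes the reported value is the standard F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative only. Because the abstract reports averages, the authors' value may have been averaged per experiment rather than derived from the averaged precision and recall, so this is a plausibility check and not a reproduction of their evaluation.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is the balanced F1 score.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Average precision and recall as reported in the abstract.
precision = 0.8554
recall = 0.9192

f1 = f_measure(precision, recall)
print(f"{f1:.2f}")  # prints 0.89
```

Under that assumption, the two rates give roughly 0.886, which rounds to the reported .89.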

Cite

Sano, T., Nobesawa, S. H., Okamoto, H., Susuki, H., Matsubara, M., & Saito, H. (2009). Robust toponym resolution based on surface statistics. IEICE Transactions on Information and Systems, E92-D(12), 2313–2320. https://doi.org/10.1587/transinf.E92.D.2313
