Named entity recognition with homophones-noisy data

Zhicheng Liu; Gang Wu

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019) 11670 LNAI 325-337

DOI: 10.1007/978-3-030-29908-8_26

Abstract

General named entity recognition systems focus exclusively on achieving higher accuracy and pay little attention to dirty data. Raw source data, however, poses serious challenges, especially when it originates from the output of automated speech recognition systems. In this paper, we propose a Pinyin Hierarchical Attention Encoder-Decoder network and a Character Alternate Network to overcome the Chinese homophone problems that frequently frustrate researchers in subsequent Natural Language Understanding (NLU). (Pinyin is the official romanization system for Standard Chinese; each Chinese character has its own pinyin sequence, written in the Latin alphabet.) Our models use a structure without word segmentation, which avoids secondary data corruption while still extracting the internal features of words. In addition, corrupted sequences can be revised by the character-level network. Evaluation shows that the proposed method achieves an F1 score of 93.73% on the homophone-noisy dataset, compared with 90.97% for the baseline models. Additional experiments show equivalent results on the universal dataset.
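
To make the homophone problem concrete, the following is a minimal illustrative sketch, not the authors' model. It assumes a tiny hand-written, toneless pinyin lexicon (`TOY_PINYIN`, hypothetical) and shows how distinct characters collapse to the same pinyin sequence, which is why speech recognition output can substitute one character for another.

```python
from collections import defaultdict

# Hypothetical toy lexicon (an assumption for illustration only):
# toneless pinyin for a handful of common characters.
TOY_PINYIN = {
    "他": "ta", "她": "ta", "它": "ta",
    "是": "shi", "事": "shi", "市": "shi",
    "北": "bei", "京": "jing",
}


def to_pinyin(text):
    """Map each character to its toy pinyin; unknown characters pass through."""
    return [TOY_PINYIN.get(ch, ch) for ch in text]


def homophone_groups(lexicon):
    """Group characters that share a pinyin sequence (size > 1 only)."""
    groups = defaultdict(list)
    for ch, py in lexicon.items():
        groups[py].append(ch)
    return {py: chars for py, chars in groups.items() if len(chars) > 1}


if __name__ == "__main__":
    # Two different character sequences with identical pinyin.
    print(to_pinyin("他是"), to_pinyin("她事"))
    print(homophone_groups(TOY_PINYIN))
```

In this toy setting, "他是" and "她事" are indistinguishable at the pinyin level, so a recognizer working from sound alone needs additional context to pick the right characters.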

Cite (APA)

Liu, Z., & Wu, G. (2019). Named entity recognition with homophones-noisy data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11670 LNAI, pp. 325–337). Springer Verlag. https://doi.org/10.1007/978-3-030-29908-8_26
