Named entity recognition with homophones-noisy data

0Citations
Citations of this article
5Readers
Mendeley users who have this article in their library.
Get full text

Abstract

General named entity recognition systems exclusively focus on higher accuracy regardless of dirty data. However, raw source data face serious challenges specially that are originated from automated speech recognition systems’ results. In this paper, we propose Pinyin (Pinyin is the official romanization system for Standard Chinese, each Chinese character has its own pinyin sequence which is composed of Latin alphabet) Hierarchical Attention Encoder-Decoder network and Character Alternate Network to overcome Chinese homophones’ problems which frequently frustrate researchers in consecutive Natural Language Understanding (NLU). Our models present a none word segmentation structure to effectively avoid secondary data corruption and adequately extract words’ internal features. Besides, corrupted sequences can be revised by character-level network. Evaluation demonstrates that our proposed method achieves 93.73% F1 scores which are higher than 90.97% F1 scores using baseline models in homophone-noisy dataset. Additional experiments are conducted to show equivalent results in the universal dataset.

Cite

CITATION STYLE

APA

Liu, Z., & Wu, G. (2019). Named entity recognition with homophones-noisy data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11670 LNAI, pp. 325–337). Springer Verlag. https://doi.org/10.1007/978-3-030-29908-8_26

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free