Towards effective entity extraction of scientific documents using discriminative linguistic features

Sangwon Hwang; Jang Eui Hong; Young Kwang Nam

Journal Article, Open Access

KSII Transactions on Internet and Information Systems (2019) 13(3) 1639-1658

DOI: 10.3837/TIIS.2019.03.030

3 Citations

14 Readers

Abstract

Named entity recognition (NER) is an important technique for improving the performance of data mining and big data analytics. In previous studies, NER systems have identified named entities using statistical methods based on prior information or linguistic features; however, such methods cannot recognize unregistered or unlearned objects. This paper proposes a method to extract objects, such as technologies, theories, or person names, by analyzing the collocation relationships among words that appear together around specific words in the abstracts of academic journals. The method proceeds as follows. First, the data is preprocessed with data cleaning and sentence detection to separate the text into single sentences. Then, part-of-speech (POS) tagging is applied to each sentence. Next, the appearance and collocation information of the other POS tags, excluding entity candidates such as nouns, is analyzed. Finally, an entity recognition model is created by analyzing and classifying the information in the sentences.
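The steps in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the tagger, the context window, or the classifier, so everything here is an assumption made for illustration. The sketch assumes sentences arrive already POS-tagged with Penn Treebank style tags (produced by any external tagger), treats tokens whose tag starts with `NN` as entity candidates, and uses a hypothetical window of two tokens on each side. The function names `split_sentences`, `context_features`, and `collocation_counts` are likewise invented for this example.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive cleaning and sentence detection (assumption: a regex split
    stands in for whatever sentence detector the paper uses)."""
    cleaned = re.sub(r"\s+", " ", text).strip()
    return [s for s in re.split(r"(?<=[.!?])\s+", cleaned) if s]


def context_features(tagged_sentence, window=2):
    """For each noun candidate, collect the non-noun neighbours inside the
    window as (relative offset, lowercased word, POS tag) triples."""
    features = []
    for i, (word, tag) in enumerate(tagged_sentence):
        if not tag.startswith("NN"):
            continue  # only nouns are treated as entity candidates here
        start = max(0, i - window)
        stop = min(len(tagged_sentence), i + window + 1)
        context = []
        for j in range(start, stop):
            neighbour_word, neighbour_tag = tagged_sentence[j]
            if j == i or neighbour_tag.startswith("NN"):
                continue  # skip the candidate itself and other candidates
            context.append((j - i, neighbour_word.lower(), neighbour_tag))
        features.append((word, context))
    return features


def collocation_counts(tagged_sentences, window=2):
    """Count how often each (offset, POS tag) pair appears around candidates."""
    counts = Counter()
    for sentence in tagged_sentences:
        for _, context in context_features(sentence, window):
            for offset, _, tag in context:
                counts[(offset, tag)] += 1
    return counts


# Hand-tagged toy sentence (hypothetical data, not taken from the paper).
tagged = [
    ("We", "PRP"), ("propose", "VBP"), ("a", "DT"), ("method", "NN"),
    ("based", "VBN"), ("on", "IN"), ("LSTM", "NNP"),
]
features = context_features(tagged)
counts = collocation_counts([tagged])
```

The resulting counts of (offset, tag) pairs are the kind of context statistics that could be fed to a classifier in the final step; which features and which model the paper actually uses is not stated in the abstract.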

Cite

Hwang, S., Hong, J. E., & Nam, Y. K. (2019). Towards effective entity extraction of scientific documents using discriminative linguistic features. KSII Transactions on Internet and Information Systems, 13(3), 1639–1658. https://doi.org/10.3837/TIIS.2019.03.030
