Towards effective entity extraction of scientific documents using discriminative linguistic features

Abstract

Named entity recognition (NER) is an important technique for improving the performance of data mining and big data analytics. In previous studies, NER systems have identified named entities using statistical methods based on prior information or linguistic features; however, such methods cannot recognize objects that are unregistered or were not learned during training. In this paper, a method is proposed that extracts objects such as technologies, theories, and person names by analyzing the collocation relationships among the words that co-occur around specific target words in the abstracts of academic journal articles. The method proceeds as follows. First, the data are preprocessed by data cleaning and sentence detection, which separates the text into individual sentences. Next, part-of-speech (POS) tagging is applied to each sentence. Then, the occurrence and collocation information of the POS tags other than those of the entity candidates (such as nouns) is analyzed. Finally, an entity recognition model is built by analyzing and classifying this information across the sentences.
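The four steps in the abstract (cleaning and sentence detection, POS tagging, analysis of the POS tags surrounding each candidate, and classification) can be sketched as a small pipeline. This is not the authors' implementation. Everything specific here is an assumption for illustration: the toy lexicon tagger `TOY_LEXICON` stands in for a real POS tagger, candidates are taken to be runs of noun tags, the ±2-token context window and feature names are invented, and a naive Bayes classifier (`ContextNB`) replaces whatever model the paper actually uses, which the abstract does not specify. The training corpus and gold labels in the usage example are likewise hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy lexicon standing in for a trained POS tagger (assumption).
# Words not listed are tagged NNP (proper noun), a deliberate simplification.
TOY_LEXICON = {
    "we": "PRP", "propose": "VBP", "use": "VBP", "compare": "VBP",
    "a": "DT", "the": "DT", "based": "VBN", "applied": "VBN",
    "on": "IN", "with": "IN", "of": "IN", "to": "TO", "is": "VBZ",
    "method": "NN", "model": "NN", "baseline": "NN", "data": "NNS",
}

def clean(text):
    """Data cleaning step: here, just collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text).strip()

def split_sentences(text):
    """Naive sentence detection: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

def pos_tag(sentence):
    """Tag each token via the toy lexicon; unknown tokens default to NNP."""
    tokens = re.findall(r"[A-Za-z0-9-]+", sentence)
    return [(tok, TOY_LEXICON.get(tok.lower(), "NNP")) for tok in tokens]

def candidates(tagged):
    """Return (start, end) spans of consecutive noun tags (NN*) as entity candidates."""
    spans, start = [], None
    for i, (_, tag) in enumerate(tagged + [("", "END")]):  # sentinel closes a trailing span
        if tag.startswith("NN"):
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i))
            start = None
    return spans

def context_features(tagged, start, end, window=2):
    """POS tags within `window` tokens of a candidate, excluding the candidate itself."""
    feats = []
    for k in range(1, window + 1):
        left = tagged[start - k][1] if start - k >= 0 else "<S>"
        right = tagged[end + k - 1][1] if end + k - 1 < len(tagged) else "</S>"
        feats += [f"L{k}={left}", f"R{k}={right}"]
    # Collocation feature: the immediate left and right tags observed together.
    feats.append(f"{feats[0]}|{feats[1]}")
    return feats

class ContextNB:
    """Multinomial naive Bayes with add-one smoothing (a stand-in classifier)."""

    def fit(self, examples):
        self.class_counts = Counter(label for _, label in examples)
        self.feat_counts = defaultdict(Counter)
        for feats, label in examples:
            self.feat_counts[label].update(feats)
        self.vocab = {f for counts in self.feat_counts.values() for f in counts}
        return self

    def predict(self, feats):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            # Features never seen in training are ignored.
            score += sum(math.log((counts[f] + 1) / denom) for f in feats if f in self.vocab)
            if score > best_score:
                best, best_score = label, score
        return best

def iter_candidates(text, window=2):
    """Steps 1-3: clean, split, tag, then featurize each candidate span."""
    for sent in split_sentences(clean(text)):
        tagged = pos_tag(sent)
        for start, end in candidates(tagged):
            span = " ".join(tok for tok, _ in tagged[start:end])
            yield span, context_features(tagged, start, end, window)

def train(text, gold):
    """Label a candidate ENTITY if its text is in the (hypothetical) gold set."""
    examples = [(feats, "ENTITY" if span in gold else "OTHER")
                for span, feats in iter_candidates(text)]
    return ContextNB().fit(examples)

def extract(text, model):
    """Step 4: keep the candidates whose context the model classifies as ENTITY."""
    return [span for span, feats in iter_candidates(text)
            if model.predict(feats) == "ENTITY"]

if __name__ == "__main__":
    corpus = ("We propose a method based on BERT. The method is applied to the data. "
              "We compare CRF with the baseline. We use LSTM with the model.")
    model = train(corpus, gold={"BERT", "CRF", "LSTM"})
    print(extract("We use GPT with the data.", model))  # ['GPT']
```

In this toy run, "GPT" never appears in training, yet it is recognized because its surrounding tags (a verb on the left, a preposition on the right) match the contexts of the labeled entities. That illustrates the abstract's point that context-based features, rather than the candidate word itself, can handle unlearned objects.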

Citation

Hwang, S., Hong, J. E., & Nam, Y. K. (2019). Towards effective entity extraction of scientific documents using discriminative linguistic features. KSII Transactions on Internet and Information Systems, 13(3), 1639–1658. https://doi.org/10.3837/TIIS.2019.03.030