CORDIF: A machine learning-based approach to identify complex words using intra-word feature set

Abstract

Identification of complex words is an interesting research problem with various application scenarios such as text simplification. Existing approaches identify complex words either by incorporating the complete sentence in which the word appears or by focusing only on the word itself. This paper falls under the latter category, employing intra-word features to classify a word as either simple or complex. We propose a model termed CORDIF (COmplex woRD identification with Intra-word Features). The proposed methodology incorporates 19 intra-word features, which are used to train a machine learning model. A dataset termed CWIdataset is built with the proposed set of intra-word features. With the proposed feature set, an accuracy of 84.75% was achieved. We then used this model to identify complex words for non-native speakers. From the results, we conclude that identifying complex words requires personalized systems.
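
To make the idea of intra-word features concrete, here is a minimal Python sketch of how features might be computed from a word alone, with no sentence context. The abstract does not list the 19 features used by CORDIF, so the five features below (and the function name `intra_word_features`) are illustrative assumptions, not the authors' feature set.

```python
import re

VOWELS = set("aeiou")


def intra_word_features(word: str) -> dict:
    """Compute a few illustrative intra-word features.

    These are hypothetical examples chosen for this sketch;
    they are NOT the 19 features used in the CORDIF paper.
    """
    w = word.lower()
    letters = [c for c in w if c.isalpha()]
    n_vowels = sum(c in VOWELS for c in letters)
    # Rough syllable proxy: number of maximal vowel runs ('y' counted as a vowel here).
    vowel_groups = len(re.findall(r"[aeiouy]+", w))
    return {
        "length": len(w),
        "vowels": n_vowels,
        "consonants": len(letters) - n_vowels,
        "vowel_groups": vowel_groups,
        "distinct_letters": len(set(letters)),
    }


if __name__ == "__main__":
    for w in ("cat", "simplification"):
        print(w, intra_word_features(w))
```

Feature dictionaries of this kind could be converted to numeric vectors and passed to any standard classifier together with simple/complex labels; which classifier the authors used is not stated in the abstract.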

Cite

Pantula, M., & Kuppusamy, K. S. (2019). CORDIF: A machine learning-based approach to identify complex words using intra-word feature set. In Lecture Notes in Electrical Engineering (Vol. 478, pp. 285–296). Springer Verlag. https://doi.org/10.1007/978-981-13-1642-5_26
