CORDIF: A machine learning-based approach to identify complex words using intra-word feature set

Muralidhar Pantula; K. S. Kuppusamy

Book Chapter

Springer Verlag, (2019), 285-296

DOI: 10.1007/978-981-13-1642-5_26

1 Citation

3 Readers

Abstract

Identification of complex words is an interesting research problem with various application scenarios such as text simplification. Approaches to identifying complex words either incorporate the complete sentence in which the word appears or focus only on the word itself. This paper falls under the latter category: it employs intra-word features to classify a word as either simple or complex. We propose a model termed CORDIF (COmplex woRD identification with Intra-word Features). The proposed methodology incorporates 19 intra-word features, which are used to train a machine learning model. A dataset termed CWIdataset is built with the proposed set of intra-word features. With the proposed feature set, an accuracy of 84.75% was achieved. We then used this model to test the identification of complex words for non-native speakers. From the results, we conclude that personalized systems are needed for identifying complex words.
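As a rough illustration of what word-only ("intra-word") features can look like, the sketch below computes a handful of them from a single word without any sentence context. The abstract does not list the 19 features CORDIF uses, so the six shown here, the function name `intra_word_features`, and the vowel-run syllable estimate are all assumptions chosen for illustration, not the authors' feature set.

```python
import re

VOWELS = set("aeiou")

def intra_word_features(word: str) -> dict:
    """Compute a few illustrative word-only features (hypothetical, not CORDIF's)."""
    w = word.lower()
    letters = [c for c in w if c.isalpha()]
    vowels = sum(c in VOWELS for c in letters)
    # Crude syllable estimate: count runs of consecutive vowels (y included).
    syllables = max(1, len(re.findall(r"[aeiouy]+", w)))
    return {
        "length": len(w),
        "vowels": vowels,
        "consonants": len(letters) - vowels,
        "syllables_est": syllables,
        "distinct_letters": len(set(letters)),
        "has_hyphen": int("-" in w),
    }

if __name__ == "__main__":
    for word in ("cat", "simplification"):
        print(word, intra_word_features(word))
```

A feature dictionary of this kind could be turned into a numeric vector and passed to any standard classifier. The abstract does not say which learning algorithm was used.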

Cite

Pantula, M., & Kuppusamy, K. S. (2019). CORDIF: A machine learning-based approach to identify complex words using intra-word feature set. In Lecture Notes in Electrical Engineering (Vol. 478, pp. 285–296). Springer Verlag. https://doi.org/10.1007/978-981-13-1642-5_26
