This paper describes two automatic systems for European Portuguese texts: a linguistic feature extractor and a text readability classifier. Its main goal is to assist in selecting adequate reading materials to support the teaching of Portuguese, especially as a second language. For feature extraction, the system uses several Natural Language Processing (NLP) tools. Currently, 52 features are extracted, including parts of speech (POS), syllables, words, chunks and phrases, and averages and frequencies, among others. A classifier was built from these features and a corpus previously annotated with readability levels, following the official five-level language classification standard for Portuguese as a Second Language. In the five-level scenario (A1 to C1), the best-performing learning algorithm (LogitBoost) achieved an accuracy of 75.11%, with a root mean square error (RMSE) of 0.269. In the three-level scenario (A, B and C), the best-performing learning algorithm (C4.5 grafted) achieved an accuracy of 81.44%, with an RMSE of 0.346.
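The abstract reports both accuracy and RMSE for what is a classification task. The minimal sketch below shows one way the two figures can coexist. It assumes the classifier outputs a probability distribution over readability levels and that RMSE is computed between that distribution and a one-hot vector for the gold level, averaged over texts and levels (a convention used by toolkits such as Weka for nominal classes). The abstract does not state which convention was used; the function names and toy data are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence

# Hypothetical label sets for the two scenarios described in the abstract.
FIVE_LEVELS = ["A1", "A2", "B1", "B2", "C1"]
THREE_LEVELS = ["A", "B", "C"]


def accuracy(gold: Sequence[str], probs: Sequence[Sequence[float]],
             labels: Sequence[str]) -> float:
    """Fraction of texts whose most probable level equals the gold level."""
    correct = 0
    for level, dist in zip(gold, probs):
        best = max(range(len(labels)), key=lambda k: dist[k])
        if labels[best] == level:
            correct += 1
    return correct / len(gold)


def rmse_over_distributions(gold: Sequence[str],
                            probs: Sequence[Sequence[float]],
                            labels: Sequence[str]) -> float:
    """RMSE between predicted distributions and one-hot gold vectors,
    averaged over both texts and levels (assumed convention)."""
    total = 0.0
    for level, dist in zip(gold, probs):
        for k, label in enumerate(labels):
            target = 1.0 if label == level else 0.0
            total += (dist[k] - target) ** 2
    return math.sqrt(total / (len(gold) * len(labels)))


# Made-up predictions for four texts in the three-level scenario.
toy_gold = ["A", "B", "C", "B"]
toy_probs = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],  # misclassified as A
]

print(accuracy(toy_gold, toy_probs, THREE_LEVELS))  # 0.75
print(round(rmse_over_distributions(toy_gold, toy_probs, THREE_LEVELS), 3))  # 0.3
```

Under this convention, RMSE also penalizes confident wrong predictions and hesitant correct ones, so two classifiers with the same accuracy can report different RMSE values.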
Curto, P., Mamede, N., & Baptista, J. (2016). Assisting European Portuguese teaching: Linguistic features extraction and automatic readability classifier. In Communications in Computer and Information Science (Vol. 583, pp. 81–96). Springer Verlag. https://doi.org/10.1007/978-3-319-29585-5_5