Melbourne at SemEval 2016 task 11: Classifying type-level word complexity using random forests with corpus and word list features

Abstract

SemEval 2016 task 11 involved determining whether words in a sentence were complex or simple for a cohort of people with English as a second language. Training data consisted of 200 annotated sentences, representing the combined judgements of 20 human annotators, such that if any annotator of the group labelled a word as complex, then it was considered to be complex. Testing was based on single annotator judgements. Our system used a random forest classifier with a variety of features, the most important of which were term frequency statistics garnered from four large corpora, and style lexicons built on two large corpora. Minor features in the final system include the presence or absence of words in various readability word lists; many other features we tried were not successful. Our ranking amongst submitted systems did not reflect the strength of our system, due to submitting a far-from-optimal weighting between complex and simple, but we show that when a more appropriate weighting is used, our system ranks amongst the best submitted systems.
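
The labelling rule described in the abstract (a word counts as complex if any annotator in the group marked it so) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `aggregate_any` and `complex_rate`, the toy words, and the three-annotator votes are all hypothetical and do not come from the task data or the authors' system.

```python
from typing import Dict, List


def aggregate_any(judgements: Dict[str, List[bool]]) -> Dict[str, bool]:
    """Label a word complex if at least one annotator marked it complex."""
    return {word: any(votes) for word, votes in judgements.items()}


def complex_rate(labels: List[bool]) -> float:
    """Fraction of words labelled complex (0.0 for an empty list)."""
    return sum(labels) / len(labels) if labels else 0.0


# Hypothetical toy data: three annotators rather than the 20 used in the task.
judgements = {
    "cat": [False, False, False],
    "ubiquitous": [True, False, True],
    "ledger": [False, True, False],
    "house": [False, False, False],
}

# Training-style labels: combined judgement of the whole group.
train_labels = aggregate_any(judgements)

# Test-style labels: the judgement of a single annotator (here, the first).
single_annotator = {word: votes[0] for word, votes in judgements.items()}

train_rate = complex_rate(list(train_labels.values()))
test_rate = complex_rate(list(single_annotator.values()))
print(train_rate, test_rate)
```

In this toy case the combined labels mark a larger share of words as complex than the single annotator does. That kind of gap between training and test class balance is one plausible reason the weighting between complex and simple could matter, though the sketch makes no claim about the actual proportions in the task data.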

Cite

Brooke, J., Baldwin, T., & Uitdenbogerd, A. L. (2016). Melbourne at SemEval 2016 task 11: Classifying type-level word complexity using random forests with corpus and word list features. In SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings (pp. 975–981). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/s16-1150
