Formalizing word sampling for vocabulary prediction as graph-based active learning

Abstract

Predicting the vocabulary of second-language learners is essential to supporting their language learning; however, because language vocabularies are large, we cannot collect information on every word. For practical measurements, we need to sample a small portion of words from the entire vocabulary and predict the rest. In this study, we propose a novel framework for this sampling. Current methods rely on simple heuristic techniques that require inflexible manual tuning by educational experts. We formalize these heuristic techniques as a graph-based, non-interactive active learning method applied to a special graph. We show that by extending the graph, we can support additional functionality, such as incorporating domain specificity and sampling from multiple corpora. In our experiments, our extended methods outperform other methods in vocabulary prediction accuracy when the number of samples is small.
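The abstract does not specify the graph, the sampling rule, or the predictor, so the following is only a minimal sketch of the general idea under loudly stated assumptions; it is not the authors' method. It assumes (1) a hypothetical "special graph" that is a simple path over words ranked by corpus frequency, with each word linked to its neighbours in the ranking; (2) one heuristic of the kind the abstract mentions, namely testing one word from each of k equal-width frequency-rank bands; and (3) harmonic-function label propagation (Zhu, Ghahramani and Lafferty, 2003) as a swapped-in predictor for the unsampled words. The names `path_graph`, `evenly_spaced_sample`, and `propagate` are hypothetical.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]


def path_graph(n: int) -> Graph:
    """Hypothetical 'special graph' (an assumption): word i is the i-th most
    frequent word, linked only to its neighbours in the frequency ranking."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def evenly_spaced_sample(n: int, k: int) -> List[int]:
    """Assumed heuristic: test one word from the middle of each of k
    equal-width frequency-rank bands (requires 1 <= k <= n)."""
    step = n // k
    return [i * step + step // 2 for i in range(k)]


def propagate(graph: Graph, labels: Dict[int, float], iters: int = 1000) -> Dict[int, float]:
    """Harmonic-function label propagation (Zhu et al., 2003), solved by
    Jacobi iteration. Sampled words stay clamped to the learner's answers
    (1.0 = known, 0.0 = unknown); every other word repeatedly takes the
    mean score of its neighbours."""
    scores = {v: labels.get(v, 0.5) for v in graph}
    for _ in range(iters):
        new_scores = {}
        for v, nbrs in graph.items():
            if v in labels or not nbrs:
                new_scores[v] = scores[v]  # clamped (or isolated) node
            else:
                new_scores[v] = sum(scores[u] for u in nbrs) / len(nbrs)
        scores = new_scores
    return scores


if __name__ == "__main__":
    n_words = 10
    sample = evenly_spaced_sample(n_words, k=2)  # [2, 7]
    answers = [1.0, 0.0]  # hypothetical learner: knows word 2, not word 7
    scores = propagate(path_graph(n_words), dict(zip(sample, answers)))
    print([round(scores[v], 3) for v in range(n_words)])
    # prints [1.0, 1.0, 1.0, 0.8, 0.6, 0.4, 0.2, 0.0, 0.0, 0.0]
    print([v for v in range(n_words) if scores[v] >= 0.5])  # prints [0, 1, 2, 3, 4]
```

On a path graph, the harmonic solution is simply linear interpolation between the tested ranks (held constant beyond the first and last tested word), which is why fixed frequency-band heuristics can plausibly be read as a special case of graph-based prediction. Under these assumptions, choosing which nodes to label before seeing any answers is the non-interactive part, and richer graphs would be needed to express ideas such as domain specificity or multiple corpora.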

Citation (APA)

Ehara, Y., Miyao, Y., Oiwa, H., Sato, I., & Nakagawa, H. (2014). Formalizing word sampling for vocabulary prediction as graph-based active learning. In EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 1374–1384). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/d14-1143