Formalizing word sampling for vocabulary prediction as graph-based active learning

Yo Ehara; Yusuke Miyao; Hidekazu Oiwa; Issei Sato; Hiroshi Nakagawa

Conference Proceedings

EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (2014) 1374-1384

DOI: 10.3115/v1/d14-1143

28 Citations

83 Readers

Abstract

Predicting the vocabulary of second-language learners is essential for supporting their language learning. However, because a language's vocabulary is so large, we cannot collect information on every word. For practical measurement, we need to sample a small portion of the vocabulary and predict the rest from that sample. In this study, we propose a novel framework for this sampling. Current methods rely on simple heuristics that require inflexible manual tuning by educational experts. We formalize these heuristics as a graph-based, non-interactive active learning method applied to a special graph. We show that extending the graph supports additional functionality, such as incorporating domain specificity and sampling from multiple corpora. In our experiments, our extended methods outperform other methods in vocabulary prediction accuracy when the number of samples is small.

Citation (APA)

Ehara, Y., Miyao, Y., Oiwa, H., Sato, I., & Nakagawa, H. (2014). Formalizing word sampling for vocabulary prediction as graph-based active learning. In EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 1374–1384). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/d14-1143
