Language geometry using random indexing

Aditya Joshi; Johan T. Halseth; Pentti Kanerva

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017), Vol. 10106 LNCS, pp. 265–274

DOI: 10.1007/978-3-319-52289-0_21

55 Citations

27 Readers

Abstract

Random Indexing is a simple implementation of Random Projections with a wide range of applications. It can solve a variety of problems with good accuracy without introducing much complexity. Here we demonstrate its use for identifying the language of text samples, based on a novel method of encoding letter N-grams into high-dimensional Language Vectors. Further, we show that the method is easily implemented and requires little computational power and space. As proof of the method’s statistical validity, we show its success in a language-recognition task. On a difficult data set of 21,000 short sentences from 21 different languages, we achieve 97.4% accuracy, comparable to state-of-the-art methods.
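
The abstract does not spell out the encoding, so the following is only a minimal sketch of one way letter N-grams could be folded into a single language vector in the random-indexing style. Every concrete choice here is an assumption made for illustration and is not taken from the text above: the dimensionality `DIM`, the N-gram size `N`, random ±1 letter vectors, position marking by rotation, combination by elementwise multiplication, cosine similarity for comparison, and the two toy training sentences. All function names are hypothetical.

```python
import math
import random

DIM = 2000  # assumed dimensionality; chosen small so pure Python stays fast
N = 3       # assumed N-gram size (letter trigrams)
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def make_letter_vectors(alphabet=ALPHABET, dim=DIM, seed=0):
    """Assign each letter a fixed random vector of +1/-1 entries."""
    rng = random.Random(seed)
    return {ch: [rng.choice((-1, 1)) for _ in range(dim)] for ch in alphabet}


def rotate(vec, k):
    """Cyclically shift a vector by k positions (used to mark letter position)."""
    k %= len(vec)
    return vec[-k:] + vec[:-k] if k else list(vec)


def encode_ngram(gram, letters):
    """Combine the letters of one N-gram: rotate by position, then multiply elementwise."""
    dim = len(next(iter(letters.values())))
    out = [1] * dim
    for pos, ch in enumerate(gram):
        shifted = rotate(letters[ch], len(gram) - 1 - pos)
        out = [a * b for a, b in zip(out, shifted)]
    return out


def encode_text(text, letters, n=N):
    """Sum the vectors of all N-grams in the text into one text vector."""
    dim = len(next(iter(letters.values())))
    chars = [c for c in text.lower() if c in letters]  # drop unknown symbols
    total = [0] * dim
    for i in range(len(chars) - n + 1):
        gram_vec = encode_ngram(chars[i:i + n], letters)
        total = [t + g for t, g in zip(total, gram_vec)]
    return total


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def classify(text, profiles, letters):
    """Return the language whose profile vector is closest to the text vector."""
    query = encode_text(text, letters)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))


# Toy training data (hypothetical, far smaller than any real corpus).
letters = make_letter_vectors()
training = {
    "english": "the quick brown fox jumps over the lazy dog and then the dog sleeps in the sun",
    "spanish": "el rapido zorro marron salta sobre el perro perezoso y luego el perro duerme al sol",
}
profiles = {lang: encode_text(sample, letters) for lang, sample in training.items()}

print(classify("the dog jumps over the fox", profiles, letters))
print(classify("el perro duerme al sol", profiles, letters))
```

In this sketch each language is represented by one fixed-size vector regardless of how much text was seen, which is the kind of low space and compute cost the abstract points to. A realistic setup would use far more training text and likely a larger dimensionality.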

Citation (APA)

Joshi, A., Halseth, J. T., & Kanerva, P. (2017). Language geometry using random indexing. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10106 LNCS, pp. 265–274). Springer Verlag. https://doi.org/10.1007/978-3-319-52289-0_21
