Random Indexing is a simple implementation of Random Projections with a wide range of applications. It can solve a variety of problems with good accuracy without introducing much complexity. Here we demonstrate its use for identifying the language of text samples, based on a novel method of encoding letter N-grams into high-dimensional Language Vectors. Further, we show that the method is easily implemented and requires little computational power and space. As proof of the method’s statistical validity, we show its success in a language-recognition task. On a difficult data set of 21,000 short sentences from 21 different languages, we achieve 97.4% accuracy, comparable to state-of-the-art methods.
CITATION STYLE
Joshi, A., Halseth, J. T., & Kanerva, P. (2017). Language geometry using random indexing. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10106 LNCS, pp. 265–274). Springer Verlag. https://doi.org/10.1007/978-3-319-52289-0_21
Mendeley helps you to discover research relevant for your work.