A linear algebra approach to language identification

Laura A. Mather

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (1998) 1481 92-103

DOI: 10.1007/3-540-49654-8_8

3 Citations

3 Readers

Abstract

Identification of the language of documents has traditionally been accomplished using dictionaries or other such language sources. This paper presents a novel algorithm for identifying the language of documents using much less information about the language than traditional methods. In addition, if no information about the language of incoming documents is known, the algorithm groups the documents into language groups, despite the deficit of language knowledge. The algorithm is based on the vector space model of information retrieval and uses a matrix projection operator and the singular value decomposition to identify terms that distinguish between languages. Experimental results show that the algorithm works reasonably well.
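
The abstract names the ingredients (a vector space model and the singular value decomposition) but not the details, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a toy corpus of six sentences, raw term counts as weights, and a simple grouping rule (assign each document to the leading singular vector on which it has the largest weight). The helper names `term_document_matrix` and `group_by_singular_vectors` are hypothetical, and the matrix projection operator mentioned in the abstract is not reproduced because its definition is not given here.

```python
import numpy as np

# Toy corpus (an assumption for illustration): three English and three
# French sentences that share no vocabulary.
docs = [
    "the cat sat on the mat",
    "the dog ate the bone",
    "the bird is in the tree",
    "le chat est sur le tapis",
    "le chien mange le pain",
    "le livre est dans le sac",
]


def term_document_matrix(texts):
    """Return (vocab, A) where A[i, j] counts term i in document j."""
    vocab = sorted({w for t in texts for w in t.split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(texts)))
    for j, t in enumerate(texts):
        for w in t.split():
            A[index[w], j] += 1.0
    return vocab, A


def group_by_singular_vectors(A, k):
    """Group documents using the k leading singular vectors of A.

    Each document is assigned to the right singular vector on which it
    has the largest absolute weight. The term with the largest absolute
    weight in each left singular vector is returned as a candidate
    language-distinguishing term.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    labels = np.argmax(np.abs(Vt[:k, :]), axis=0)
    top_term_idx = np.argmax(np.abs(U[:, :k]), axis=0)
    return labels, top_term_idx


vocab, A = term_document_matrix(docs)
labels, top_idx = group_by_singular_vectors(A, k=2)
top_terms = [vocab[i] for i in top_idx]
print(labels, top_terms)
```

Because documents in different languages share few terms, the term-document matrix is close to block diagonal, and each leading singular vector concentrates on one block. On this toy corpus the English and French sentences land in separate groups, and the highest-weight terms are the function words "the" and "le". Real text with shared vocabulary (names, numbers, loanwords) would need a more careful rule than this one.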

Cite

CITATION STYLE

APA

Mather, L. A. (1998). A linear algebra approach to language identification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1481, pp. 92–103). Springer Verlag. https://doi.org/10.1007/3-540-49654-8_8
