A linear algebra approach to language identification

Abstract

Identification of the language of documents has traditionally been accomplished using dictionaries or other such language sources. This paper presents a novel algorithm for identifying the language of documents using much less information about the language than traditional methods. In addition, if no information about the language of incoming documents is known, the algorithm groups the documents into language groups, despite the deficit of language knowledge. The algorithm is based on the vector space model of information retrieval and uses a matrix projection operator and the singular value decomposition to identify terms that distinguish between languages. Experimental results show that the algorithm works reasonably well.
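The abstract gives only the ingredients (a vector space model, a projection, and the singular value decomposition), so the following is a minimal sketch of the general idea and not a reproduction of the paper's algorithm. It assumes a toy six-document corpus in two languages that share no terms, assumes the number of language groups `k = 2` is known, and uses a plain orthogonal projection onto the leading singular directions as a stand-in for the paper's projection operator. The names `term_document_matrix`, `identify`, `labels`, and `marker_terms` are hypothetical.

```python
import numpy as np

# Toy corpus (an assumption for illustration): three English and three
# French documents that share no terms across languages.
docs = [
    "the cat and the dog",
    "the house and the garden",
    "the dog is in the garden",
    "le chat et le chien",
    "la maison et le jardin",
    "le chien est dans le jardin",
]


def term_document_matrix(texts):
    """Build a raw term-frequency matrix (rows: terms, columns: documents)."""
    vocab = sorted({w for t in texts for w in t.split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(texts)))
    for j, t in enumerate(texts):
        for w in t.split():
            A[index[w], j] += 1.0
    return A, vocab, index


A, vocab, index = term_document_matrix(docs)

# Thin SVD: columns of U live in term space, rows of Vt in document space.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # assumed number of language groups

# Group documents without language knowledge: assign each document to the
# leading singular direction on which it has the largest weight.
labels = np.argmax(np.abs(Vt[:k, :]), axis=0)

# Terms with the largest weight in each direction act as language markers.
marker_terms = [vocab[int(np.argmax(np.abs(U[:, i])))] for i in range(k)]


def identify(text):
    """Project a new document onto each direction and pick the strongest."""
    q = np.zeros(len(vocab))
    for w in text.split():
        if w in index:  # terms unseen in the corpus are ignored
            q[index[w]] += 1.0
    return int(np.argmax(np.abs(U[:, :k].T @ q)))


print("group labels:", labels.tolist())
print("marker terms:", marker_terms)
print("group of 'the garden':", identify("the garden"))
```

The grouping works here because documents in different languages share no vocabulary, so the term-document matrix is block-structured and each leading singular vector is concentrated on one language. Real corpora with shared tokens (names, numbers, loanwords) would need weighting or filtering that this sketch does not attempt.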

Cite

Mather, L. A. (1998). A linear algebra approach to language identification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1481, pp. 92–103). Springer Verlag. https://doi.org/10.1007/3-540-49654-8_8