Multi-Lingual LSA with Serbian and Croatian: An Investigative Case Study

Abstract

One of the challenges in information retrieval is searching a corpus of documents that may contain multiple languages. This exploratory study builds on earlier research that applied Latent Semantic Analysis to this problem (so-called Multi-Lingual Latent Semantic Indexing, or ML-LSI/LSA). We experiment with this approach, and with a new one, in a multi-lingual context using two similar languages, Serbian and Croatian. Traditionally, an LSA approach needs a parallel corpus to train the system: each pair of identical documents in the two languages is combined into a single document. We repeat that approach and also create a semantic space from the parallel corpus on its own, without merging the documents, to test the hypothesis that, with very similar languages, merging may not be required for good results.
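The difference between the two training setups described in the abstract can be illustrated with a minimal sketch. It only builds the term-document count matrices for the merged and unmerged configurations; the dimensionality reduction step of LSA (a truncated SVD of such a matrix) is omitted. The toy sentences, the whitespace tokenizer, and all function and variable names are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy parallel corpus: each pair holds the same sentence in
# Serbian (Latin script) and Croatian. Illustrative only, not the paper's data.
parallel_corpus = [
    ("hleb i mleko", "kruh i mlijeko"),
    ("voz stiže na stanicu", "vlak stiže na kolodvor"),
]


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption, not the paper's)."""
    return text.lower().split()


def term_document_matrix(documents):
    """Return (vocabulary, matrix): one row per term, one column per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix


# Configuration 1 (traditional): each language pair is merged into one document,
# so translation-equivalent terms co-occur in the same column.
merged_docs = [serbian + " " + croatian for serbian, croatian in parallel_corpus]

# Configuration 2 (the alternative tested): every document stays separate,
# so the only cross-language link is the vocabulary the languages share.
separate_docs = [doc for pair in parallel_corpus for doc in pair]

merged_vocab, merged_matrix = term_document_matrix(merged_docs)
separate_vocab, separate_matrix = term_document_matrix(separate_docs)

for name, vocab, matrix in [
    ("merged", merged_vocab, merged_matrix),
    ("separate", separate_vocab, separate_matrix),
]:
    print(name, "terms:", len(vocab), "documents:", len(matrix[0]))
```

In the merged matrix, "hleb" and "kruh" share a column, which gives LSA direct co-occurrence evidence that they are related. In the separate matrix they never co-occur, and any link between them has to come through terms both languages share, such as "i" in this toy example.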

Cite

Layfield, C., Ivanović, D., & Azzopardi, J. (2018). Multi-Lingual LSA with Serbian and Croatian: An Investigative Case Study. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10546 LNCS, pp. 155–164). Springer Verlag. https://doi.org/10.1007/978-3-319-74497-1_15
