Text Language Identification Using Letters (Frequency, Self-information, and Entropy) Analysis for English, French, and German Languages

  • Abbas R
  • Kareem F

Abstract

People illustrate the world, convey stories, share ideas, and interconnect in over 6900 languages. Information on the Internet may appear unlimited. Throughout history, electrical and computer experts have built tools such as the telephone, the telegraph, and the Internet router, which have helped people communicate. Computer software that can translate between languages is one such tool. The first step in translating a text is to identify its language. In this research, a program that automatically identifies the language of a text was designed and tested. It relies on letter statistics (frequency, self-information, and entropy of certain chosen letters) for the English, French, and German languages. The program successfully detected these languages when applied to randomly selected text files. The detection program was written in the C++ programming language.
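The three letter statistics named in the abstract can be illustrated with a minimal C++ sketch. This is not the authors' program: the function names (letterFrequencies, selfInformation, entropy) are hypothetical, only the 26 unaccented letters a to z are counted, and the abstract does not say which letters were chosen or how the values are compared across languages, so that decision step is left out. The sketch assumes the standard definitions, self-information I(x) = -log2 p(x) and entropy H = sum of p(x) I(x), both in bits.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <string>

// Relative frequency of each letter a-z in the text (case-insensitive).
// Assumption: non-letters and non-ASCII bytes (e.g. accented letters) are ignored.
std::array<double, 26> letterFrequencies(const std::string& text) {
    std::array<double, 26> freq{};  // zero-initialized counts
    std::size_t total = 0;
    for (unsigned char c : text) {
        if (c < 128 && std::isalpha(c)) {
            ++freq[std::tolower(c) - 'a'];
            ++total;
        }
    }
    if (total > 0) {
        for (double& f : freq) {
            f /= static_cast<double>(total);  // convert counts to probabilities
        }
    }
    return freq;
}

// Self-information of an event with probability p, in bits: I = -log2(p).
double selfInformation(double p) {
    assert(p > 0.0 && p <= 1.0);
    return -std::log2(p);
}

// Shannon entropy of the letter distribution, in bits.
// Letters that never occur (p == 0) contribute nothing to the sum.
double entropy(const std::array<double, 26>& freq) {
    double h = 0.0;
    for (double p : freq) {
        if (p > 0.0) {
            h += p * selfInformation(p);
        }
    }
    return h;
}
```

For example, calling letterFrequencies on a text file's contents and then passing the result to entropy yields one number per text. A classifier in the spirit of the abstract would compare such values, or the frequencies of a few selected letters, against reference values for each language.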

Cite

Abbas, R. H., & Kareem, F. A. E. A. (2019). Text Language Identification Using Letters (Frequency, Self-information, and Entropy) Analysis for English, French, and German Languages. Journal of Southwest Jiaotong University, 54(4). https://doi.org/10.35741/issn.0258-2724.54.4.21
