Markov models for written language identification

  • Tran D
  • Sharma D


The paper presents a Markov chain-based method for automatic written language identification. Given a training document in a specific language, each word is represented as a Markov chain of letters. The entire training document is treated as a set of such chains, from which the initial and transition probabilities are estimated; together these form the Markov model for that language. Given a string in an unknown language, the maximum likelihood decision rule is used to identify its language. Experimental results showed that the proposed method achieved a lower error rate and faster identification than the current n-gram method.
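The approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`train`, `log_likelihood`, `identify`), the whitespace tokenization, the add-one smoothing, and the `alphabet_size` value are all assumptions introduced here so that unseen letters do not produce zero probabilities. The abstract does not say how the paper handles those details.

```python
import math
from collections import Counter, defaultdict


def train(text):
    """Count word-initial letters and letter-to-letter transitions in a text."""
    initial = Counter()
    transitions = defaultdict(Counter)
    for word in text.lower().split():
        initial[word[0]] += 1
        for a, b in zip(word, word[1:]):
            transitions[a][b] += 1
    return initial, transitions


def log_likelihood(string, model, alphabet_size=256, alpha=1.0):
    """Log-probability of a string under one language's Markov model.

    alpha and alphabet_size implement add-alpha smoothing, which is an
    assumption of this sketch and not something stated in the abstract.
    """
    initial, transitions = model
    total_initial = sum(initial.values())
    score = 0.0
    for word in string.lower().split():
        # Probability of the first letter of the word (initial probability).
        score += math.log(
            (initial[word[0]] + alpha) / (total_initial + alpha * alphabet_size)
        )
        # Probability of each letter given the previous one (transition probability).
        for a, b in zip(word, word[1:]):
            row = transitions.get(a, Counter())
            score += math.log(
                (row[b] + alpha) / (sum(row.values()) + alpha * alphabet_size)
            )
    return score


def identify(string, models):
    """Maximum likelihood decision: pick the language whose model scores highest."""
    return max(models, key=lambda lang: log_likelihood(string, models[lang]))
```

To use the sketch, one would build `models` as a dictionary mapping a language label to `train(sample_text)` for each language, then call `identify(unknown_string, models)`. Working in log-probabilities avoids numerical underflow on long strings.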




Tran, D., & Sharma, D. (2005). Markov models for written language identification. Proceedings of the 12th International Conference on Neural Information Processing. Retrieved from
