RuTUT: Roman Urdu to Urdu translator based on character substitution rules and unicode mapping

8Citations
Citations of this article
27Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Urdu language written in English alphabets for communication is known as Roman Urdu. In pronunciation, both are the same but different in spelling and have different shapes of the alphabet. A survey acknowledges that 300 million people are speaking Urdu and about 11 million speakers in Pakistan from which maximum users prefer Roman Urdu for the textual communication. Today most of the modern technologies like computers and mobile phones using English script, due to this local Urdu user has to use English letters to type Urdu script that is Roman Urdu. In this research, Roman Urdu to Urdu Translator (RUTUT) is proposed that consists of preprocessing methods, rule-based character substitution and Unicode based character mapping techniques. It can transliterate the messages or descriptions from the Roman Urdu script to Urdu script which may help the Urdu speaker to elaborate their message in efficient manners. The focus of this research is to analyze the issues related to the Roman Urdu script to Urdu script transliteration and develop a translator based on the concepts of transliteration. This research analyzed Roman Urdu data and identified different rules-based character substitution techniques that transform the Roman Urdu into Urdu script at fundamental levels. This research is carried out using a python programming language in programming tool Anaconda in Jupiter notebook and user-friendly Graphical User Interface (GUI) created by using Tkinter library. To evaluate the RUTUT, different translational tests are performed and compare those results with famous Google online translator and ijunoon online transliteration. The analyses of results show that the proposed RUTUT approach translates accurately than Google online translator and ijunoon online transliteration.

Cite

CITATION STYLE

APA

Shahroz, M., Mushtaq, M. F., Mehmood, A., Ullah, S., & Choi, G. S. (2020). RuTUT: Roman Urdu to Urdu translator based on character substitution rules and unicode mapping. IEEE Access, 8, 189823–189841. https://doi.org/10.1109/ACCESS.2020.3031393

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free