A Romanization system converts text from a source script into the Roman script through word-by-word mapping. Only the writing script changes; the spoken language, and with it the phonological characteristics of the source word, is preserved. This paper presents a rule-based approach to the Romanization of proper nouns written in Gurmukhi script. The aim is a lightweight Romanization system that may produce multiple possible results for the same input word. The algorithm uses a list of Gurmukhi characters paired with their equivalent character combinations in Roman script. Direct mapping of Gurmukhi characters to these combinations does not by itself produce correct results, so rules are applied to correct the mappings. The rules mainly insert or remove the letter ‘a’ between the mapped consonants. Three different sets of rules are applied to obtain three different Romanized outputs, all of which are acceptable for information extraction using pattern matching. In Gurmukhi, some words are written differently from the way they are pronounced. To handle them, these words, or parts of them, are stored in a database table, with the Romanized form in a second column; the Romanization of such words is taken directly from this table. The result of the system is a set of possible Roman-script words generated from the source-script word, which an application can pattern match against text or a database to retrieve the required information.
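The pipeline described above (character mapping, rules for the letter ‘a’, an exception table, and several candidate outputs) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the tiny mapping tables, the three rule names (`all`, `alternate`, `none`), the single exception entry, and the function names `romanize` and `candidates`. The paper's actual rule sets are not given in this abstract.

```python
# Illustrative subset only; a real table would cover the whole Gurmukhi script.
CONSONANTS = {"ਕ": "k", "ਮ": "m", "ਲ": "l", "ਨ": "n", "ਜ": "j", "ਤ": "t"}
VOWEL_SIGNS = {"ਾ": "aa", "ਿ": "i", "ੀ": "ee", "ੁ": "u", "ੇ": "e", "ੋ": "o"}

# Hypothetical exception table: source word -> stored Romanization.
EXCEPTIONS = {"ਸਿੰਘ": "singh"}

RULES = ("all", "alternate", "none")  # assumed names, not the paper's rule sets


def romanize(word, rule):
    """Map each character, optionally inserting 'a' between consonants."""
    out = []
    inserted_prev = False  # did the previous consonant receive an 'a'?
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else None
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            before_consonant = nxt in CONSONANTS
            if rule == "all":
                add_a = before_consonant
            elif rule == "alternate":
                add_a = before_consonant and not inserted_prev
            else:  # "none": plain direct mapping
                add_a = False
            if add_a:
                out.append("a")
            inserted_prev = add_a
        else:
            # Vowel signs are mapped; unknown characters pass through unchanged.
            out.append(VOWEL_SIGNS.get(ch, ch))
            inserted_prev = False
    return "".join(out)


def candidates(word):
    """Return the possible Romanizations of one word, without duplicates."""
    if word in EXCEPTIONS:
        return [EXCEPTIONS[word]]
    results = []
    for rule in RULES:
        roman = romanize(word, rule)
        if roman not in results:
            results.append(roman)
    return results


def matches(word, text):
    """Pattern match: does any candidate occur in the Roman-script text?"""
    lowered = text.lower()
    return any(c in lowered for c in candidates(word))
```

With these assumed rules, `candidates("ਮਨਜੀਤ")` yields `manajeet`, `manjeet` and `mnjeet`, while `candidates("ਕਮਲ")` yields `kamal`, `kaml` and `kml`. A different rule gives the conventional spelling in each case, which is why keeping all candidates is useful when the goal is pattern matching rather than a single canonical form.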
CITATION STYLE
Singh, H., & Oberoi, A. (2019). An efficient romanization of gurmukhi punjabi proper nouns for pattern matching. International Journal of Recent Technology and Engineering, 8(3), 634–640. https://doi.org/10.35940/ijrte.B2467.098319