We propose a method for detecting orthographic variants caused by transliteration in a large corpus. The method combines two similarity measures: a string similarity based on edit distance, and a contextual similarity computed with a vector space model. In experiments, the method achieved an F-measure of 0.889 in an open test.
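The sketch below shows how the two similarities might be computed for a candidate word pair. The abstract does not say how the edit distance is normalized, how contexts are defined, or how the two scores are combined. This sketch therefore makes its own assumptions, all of them hypothetical. String similarity is taken as 1 minus the Levenshtein distance divided by the longer word's length. Context vectors count the words within a fixed window (`window=2`) of each occurrence and are compared by cosine similarity. A pair is flagged when both scores clear hypothetical thresholds (`str_th`, `ctx_th`). The helper names are illustrative and are not the authors' implementation.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, using a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (or match)
        prev = curr
    return prev[-1]


def string_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def context_vector(word: str, sentences: list[list[str]], window: int = 2) -> Counter:
    """Count the tokens within `window` positions of each occurrence of `word`."""
    vec: Counter = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == word:
                vec.update(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
    return vec


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def is_variant_pair(a: str, b: str, sentences: list[list[str]],
                    str_th: float = 0.6, ctx_th: float = 0.3) -> bool:
    """Hypothetical decision rule: both similarities must reach their thresholds."""
    ctx = cosine(context_vector(a, sentences), context_vector(b, sentences))
    return string_similarity(a, b) >= str_th and ctx >= ctx_th


# Toy pre-tokenized corpus containing two katakana spellings of "violin".
sentences = [
    ["彼", "は", "ヴァイオリン", "を", "弾く"],
    ["彼女", "は", "バイオリン", "を", "弾く"],
]
print(is_variant_pair("ヴァイオリン", "バイオリン", sentences))
```

In this toy corpus, ヴァイオリン and バイオリン differ by one substitution and one deletion (distance 2, string similarity 2/3). They share three of four context words (cosine 0.75), so the pair is flagged. Requiring both scores keeps the string measure from pairing unrelated words that happen to be spelled alike.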
Ohtake, K., Sekiguchi, Y., & Yamamoto, K. (2004). Detecting transliterated orthographic variants via two similarity metrics. In COLING 2004 - Proceedings of the 20th International Conference on Computational Linguistics. Association for Computational Linguistics (ACL). https://doi.org/10.3115/1220355.1220457