PESTO: Switching Point Based Dynamic and Relative Positional Encoding for Code-Mixed Languages (Student Abstract)

Abstract

NLP applications for code-mixed (CM) text have gained significant momentum recently, mainly due to the prevalence of language mixing in social media communication in multilingual societies such as India, Europe, and parts of the USA. Word embeddings are basic building blocks of any NLP system today, yet word embeddings for CM languages remain unexplored territory. The major bottleneck for CM word embeddings is switching points, the locations where the language switches. These locations lack contextual information, and statistical systems fail to model this phenomenon due to high variance in the seen examples. In this paper we present our initial observations on applying switching-point-based positional encoding techniques to CM language, specifically Hinglish (Hindi-English). Results are only marginally better than SOTA, but it is evident that positional encoding could be an effective way to train position-sensitive language models for CM text.
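To make the notion of a switching point concrete, here is a minimal sketch in Python. It assumes each token already carries a language tag, and it illustrates one simple position scheme in which the index restarts at every switching point. The function names, the tag values, and the reset scheme itself are assumptions made for illustration; the abstract does not specify the paper's actual encoding.

```python
from typing import List


def switching_points(lang_tags: List[str]) -> List[int]:
    """Return each index i where token i's language differs from token i-1's."""
    return [i for i in range(1, len(lang_tags)) if lang_tags[i] != lang_tags[i - 1]]


def reset_positions(lang_tags: List[str]) -> List[int]:
    """Assign position indices that restart at 0 at every switching point.

    This is an illustrative scheme only, not the paper's formulation.
    """
    positions: List[int] = []
    pos = 0
    for i, tag in enumerate(lang_tags):
        if i > 0 and tag != lang_tags[i - 1]:
            pos = 0  # language switched: restart the position counter
        positions.append(pos)
        pos += 1
    return positions


# Hypothetical Hinglish example with assumed per-token language tags.
tokens = ["yeh", "movie", "was", "bahut", "acchi"]
tags = ["hi", "en", "en", "hi", "hi"]

print(switching_points(tags))  # [1, 3]
print(reset_positions(tags))   # [0, 0, 1, 0, 1]
```

The resulting indices could then be fed to any standard positional encoding (for example, sinusoidal) in place of absolute token positions.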

Cite

Ali, M., Kandukuri, S. T., Manduru, S., Patwa, P., & Das, A. (2022). PESTO: Switching Point Based Dynamic and Relative Positional Encoding for Code-Mixed Languages (Student Abstract). In Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022 (Vol. 36, pp. 12901–12902). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v36i11.21587
