A deep generative model for code-switched text

Abstract

Code-switching, the interleaving of two or more languages within a sentence or discourse, is pervasive in multilingual societies. Accurate language models for code-switched text are critical for NLP tasks. State-of-the-art data-intensive neural language models are difficult to train well from scarce language-labeled code-switched text. A potential solution is to use deep generative models to synthesize large volumes of realistic code-switched text. Although generative adversarial networks and variational autoencoders can synthesize plausible monolingual text from a continuous latent space, they cannot adequately address code-switched text, owing to its informal style and the complex interplay between the constituent languages. We introduce VACS, a novel variational autoencoder architecture specifically tailored to code-switching phenomena. VACS encodes to and decodes from a two-level hierarchical representation, which models syntactic contextual signals at the lower level and language-switching signals at the upper level. Decoding representations sampled from the prior produces well-formed, diverse code-switched sentences. Extensive experiments show that augmenting natural monolingual data with synthetic code-switched text results in a significant (33.06%) drop in perplexity.
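The abstract describes the architecture only at a high level. The PyTorch sketch below illustrates what such a two-level hierarchical VAE could look like; it is not the authors' released implementation. The class name HierarchicalCSVAE, all layer sizes, and the exact way the upper (language-switching) latent is conditioned on the lower (syntactic) latent are illustrative assumptions.

```python
# Minimal sketch of a two-level hierarchical VAE in the spirit of VACS.
# All names, dimensions, and conditioning choices are assumptions,
# not the paper's exact specification.
import torch
import torch.nn as nn

class HierarchicalCSVAE(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256,
                 z_syn_dim=32, z_lang_dim=16, n_langs=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

        # Lower level: syntactic/contextual latent inferred from the sentence.
        self.mu_syn = nn.Linear(hid_dim, z_syn_dim)
        self.logvar_syn = nn.Linear(hid_dim, z_syn_dim)

        # Upper level: language-switching latent, conditioned on the
        # lower-level latent (a hierarchical posterior; assumed structure).
        self.mu_lang = nn.Linear(hid_dim + z_syn_dim, z_lang_dim)
        self.logvar_lang = nn.Linear(hid_dim + z_syn_dim, z_lang_dim)

        # Decoder reconstructs tokens and per-token language tags
        # from both latents.
        self.decoder_rnn = nn.GRU(emb_dim + z_syn_dim + z_lang_dim,
                                  hid_dim, batch_first=True)
        self.token_head = nn.Linear(hid_dim, vocab_size)
        self.lang_head = nn.Linear(hid_dim, n_langs)

    @staticmethod
    def reparameterize(mu, logvar):
        # Standard VAE reparameterization trick.
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, tokens):
        emb = self.embed(tokens)            # (B, T, E)
        _, h = self.encoder_rnn(emb)        # h: (1, B, H)
        h = h.squeeze(0)

        mu_s, lv_s = self.mu_syn(h), self.logvar_syn(h)
        z_syn = self.reparameterize(mu_s, lv_s)

        hs = torch.cat([h, z_syn], dim=-1)
        mu_l, lv_l = self.mu_lang(hs), self.logvar_lang(hs)
        z_lang = self.reparameterize(mu_l, lv_l)

        # Broadcast both latents to every timestep and decode.
        T = tokens.size(1)
        z = torch.cat([z_syn, z_lang], dim=-1).unsqueeze(1).expand(-1, T, -1)
        dec_out, _ = self.decoder_rnn(torch.cat([emb, z], dim=-1))
        return (self.token_head(dec_out), self.lang_head(dec_out),
                (mu_s, lv_s), (mu_l, lv_l))
```

Under these assumptions, training would minimize the usual ELBO: cross-entropy reconstruction losses over tokens and per-token language tags, plus KL terms for both latent levels. Sampling both latents from the prior and running the decoder generatively would then yield synthetic code-switched sentences, matching the data-augmentation use described in the abstract.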

Citation (APA)

Samanta, B., Reddy, S., Jagirdar, H., Ganguly, N., & Chakrabarti, S. (2019). A deep generative model for code-switched text. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2019-August, pp. 5175–5181). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2019/719
