Abstract
Code-switching, the interleaving of two or more languages within a sentence or discourse, is pervasive in multilingual societies. Accurate language models for code-switched text are critical for NLP tasks. State-of-the-art data-intensive neural language models are difficult to train well from scarce language-labeled code-switched text. A potential solution is to use deep generative models to synthesize large volumes of realistic code-switched text. Although generative adversarial networks and variational autoencoders can synthesize plausible monolingual text from a continuous latent space, they cannot adequately handle code-switched text, owing to its informal style and the complex interplay between the constituent languages. We introduce VACS, a novel variational autoencoder architecture specifically tailored to code-switching phenomena. VACS encodes to and decodes from a two-level hierarchical representation, which models syntactic contextual signals in the lower level and language-switching signals in the upper level. Decoding representations sampled from the prior produces well-formed, diverse code-switched sentences. Extensive experiments show that augmenting natural monolingual data with synthetic code-switched text yields a significant (33.06%) drop in perplexity.
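The abstract only sketches the two-level hierarchical generative process. As a toy illustration (not the paper's actual model: the dimensions, the linear conditioning map, the vocabulary, and the scoring decoder here are all hypothetical stand-ins), sampling could proceed top-down, drawing an upper-level language-switching latent first and conditioning the lower-level syntactic latent on it before decoding:

```python
import random

rng = random.Random(0)

def sample_two_level_prior(d_lang=4, d_syn=8):
    # Upper level: language-switching signal, drawn from a standard normal prior.
    z_lang = [rng.gauss(0.0, 1.0) for _ in range(d_lang)]
    # Lower level: syntactic/contextual signal conditioned on z_lang via a
    # hypothetical linear map plus Gaussian noise (illustrative only).
    w = [[rng.gauss(0.0, 0.1) for _ in range(d_lang)] for _ in range(d_syn)]
    z_syn = [sum(wi * zi for wi, zi in zip(row, z_lang)) + rng.gauss(0.0, 1.0)
             for row in w]
    return z_lang, z_syn

def decode(z_lang, z_syn, vocab=("this", "movie", "ekdum", "mast", "hai")):
    # Toy stand-in for the decoder: score each token from both latents and
    # emit the top three; the real decoder is a neural language model that
    # generates a full code-switched sentence autoregressively.
    z = z_lang + z_syn
    scores = {tok: sum(rng.gauss(0.0, 1.0) * zi for zi in z) for tok in vocab}
    return sorted(vocab, key=lambda t: -scores[t])[:3]

z_lang, z_syn = sample_two_level_prior()
print(decode(z_lang, z_syn))
```

The point of the hierarchy is that switching behavior (which language appears where) varies on a coarser scale than word-level syntax, so it gets its own upper-level latent that the lower level is conditioned on.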
Citation
Samanta, B., Reddy, S., Jagirdar, H., Ganguly, N., & Chakrabarti, S. (2019). A deep generative model for code-switched text. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2019-August, pp. 5175–5181). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2019/719