Zigzag conversational patterns of contents in social media are often perceived as noisy or informal text. Unrestricted usage of vocabulary in social media communications complicates the processing of code-mixed text. This paper accentuates two major aspects of code mixed text: Offensive Language Identification and Sentiment Analysis for Malayalam–English code-mixed data set. The proffered framework addresses 3 key points apropos these tasks—dependencies among features created by embedding methods (Word2Vec and FastText), comparative analysis of deep learning algorithms (uni-/bi-directional models, hybrid models, and transformer approaches), relevance of selective translation and transliteration and hyper-parameter optimization—which ensued in F1-Scores (model’s accuracy) of 0.76 for Forum for Information Retrieval Evaluation (FIRE) 2020 and 0.99 for European Chapter of the Association for Computational Linguistics (EACL) 2021 data sets. A detailed error analysis was also done to give meaningful insights. The submitted strategy turned in the best results among the benchmarked models dealing with Malayalam–English code-mixed messages and it serves as an important step towards societal good.
CITATION STYLE
Thara, S., & Poornachandran, P. (2022). Social media text analytics of Malayalam–English code-mixed using deep learning. Journal of Big Data, 9(1). https://doi.org/10.1186/s40537-022-00594-3
Mendeley helps you to discover research relevant for your work.