Detection of Possible Illicit Messages Using Natural Language Processing and Computer Vision on Twitter and Linked Websites

Sergio L. Granizo; Angel Leonardo Valdivieso Caraguay; Lorena Isabel Barona Lopez; Myriam Hernandez-Alvarez

Journal Article (Open Access)

IEEE Access (2020) 8 44534-44546

DOI: 10.1109/ACCESS.2020.2976530

22 Citations

81 Readers

Abstract

Human trafficking is a global problem that strips millions of victims of their dignity. Social networks are currently used to spread this crime online through covert messages that promote these illegal services. Because law enforcement resources are limited, it is vital to automatically detect messages that may be related to this crime and could serve as clues. In this paper, we use natural language processing to identify Twitter messages that could promote these illegal services and exploit minors. The images and URLs found in suspicious messages are processed and classified by gender and age group, making it possible to detect photographs of people under 14 years of age. The method is as follows. First, tweets with hashtags related to minors are mined in real time. These tweets are preprocessed to eliminate noise and misspelled words, and then classified as suspicious or not. Next, geometric features of the face and torso are selected using Haar models. By applying a Support Vector Machine (SVM) and a Convolutional Neural Network (CNN), we recognize gender and age group, taking into account torso information and its proportional relationship with the head, even when face details are blurred. In our results, the SVM model using only torso features outperforms the CNN.
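The tweet preprocessing step mentioned in the abstract could look roughly like the following. This is a minimal sketch, not the authors' implementation: the function name `preprocess_tweet`, the regular expressions, and the choice of what counts as "noise" (URLs, user mentions, punctuation, letter case) are all assumptions made for illustration. Spelling correction, which the abstract also mentions, is not attempted here.

```python
import re

# Hypothetical patterns; the paper does not specify its noise-removal rules.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
PUNCT_RE = re.compile(r"[^\w\s]")


def preprocess_tweet(text):
    """Return (clean_text, hashtags, urls) for one raw tweet.

    URLs are collected before removal so that linked pages and images
    can be handed to a later processing stage.
    """
    urls = URL_RE.findall(text)
    hashtags = [tag.lower() for tag in HASHTAG_RE.findall(text)]

    text = URL_RE.sub(" ", text)        # drop links from the text itself
    text = MENTION_RE.sub(" ", text)    # drop user mentions
    text = text.replace("#", " ")       # keep hashtag words as plain tokens
    text = text.lower()
    text = PUNCT_RE.sub(" ", text)      # drop punctuation and emoticons
    clean_text = " ".join(text.split()) # collapse repeated whitespace

    return clean_text, hashtags, urls


if __name__ == "__main__":
    sample = "Check THIS out!! @user #Example https://t.co/abc123 :)"
    print(preprocess_tweet(sample))
```

The cleaned text would then feed a suspicious/not-suspicious classifier, while the extracted URLs would go to the image analysis stage; both of those stages are outside the scope of this sketch.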

Cite (APA)

Granizo, S. L., Caraguay, A. L. V., Lopez, L. I. B., & Hernandez-Alvarez, M. (2020). Detection of Possible Illicit Messages Using Natural Language Processing and Computer Vision on Twitter and Linked Websites. IEEE Access, 8, 44534–44546. https://doi.org/10.1109/ACCESS.2020.2976530
