Network-theoretic information extraction quality assessment in the human trafficking domain

Abstract

Information extraction (IE) is an important problem in the Natural Language Processing (NLP) and Web Mining communities. Recently, IE has been applied to online sex advertisements with the goal of powering search and analytics systems that can help law enforcement investigate human trafficking (HT). Extracting key attributes such as names, phone numbers, and addresses from online sex ads is extremely challenging, since such webpages contain boilerplate, obfuscation, and extraneous text that departs from conventional language models. Assessing the quality of an IE system is important, and it is particularly difficult in this domain because gold-standard datasets are lacking. Furthermore, building a robust ground truth from scratch is an expensive and time-consuming task for social scientists and law enforcement to undertake. In this article, we take on the empirical challenge of analyzing the quality of IE outputs in the HT domain without relying on laboriously annotated ground truths. Specifically, we use concepts from network science to construct and study an extraction graph from IE outputs collected over a corpus of online sex ads. Our studies show that network metrics, which require no labeled ground truths, correlate consistently with IE accuracy metrics (e.g., precision and recall) that do require ground truths. Our methods can potentially be applied to compare the quality of different IE systems in the HT domain without access to ground truths.
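
To make the idea of an extraction graph concrete, here is a minimal sketch in Python. The abstract does not say how the graph is constructed or which network metrics are studied, so everything here is an assumption made for illustration: the graph is taken to be bipartite (ads on one side, extracted attribute values on the other), the input is a hypothetical list of (ad_id, attribute, value) triples, and the metrics (component count, largest-component share, mean degree) are generic examples rather than the authors' choices.

```python
from collections import defaultdict, deque


def build_extraction_graph(extractions):
    """Build an undirected bipartite graph from IE outputs.

    `extractions` is a hypothetical list of (ad_id, attribute, value)
    triples; this input format is an assumption, not the paper's.
    """
    graph = defaultdict(set)
    for ad_id, attribute, value in extractions:
        ad_node = ("ad", ad_id)
        value_node = (attribute, value)
        graph[ad_node].add(value_node)
        graph[value_node].add(ad_node)
    return graph


def connected_components(graph):
    """Return the connected components as lists of nodes (BFS)."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def network_metrics(graph):
    """Compute a few label-free metrics (illustrative choices only)."""
    components = connected_components(graph)
    degrees = [len(neighbors) for neighbors in graph.values()]
    n = len(graph)
    return {
        "nodes": n,
        "edges": sum(degrees) // 2,  # each undirected edge is counted twice
        "components": len(components),
        "largest_component_share": max(map(len, components)) / n if n else 0.0,
        "mean_degree": sum(degrees) / n if n else 0.0,
    }


# Toy, made-up extractions: two ads share a phone number, one stands alone.
toy = [
    ("ad1", "phone", "555-0100"),
    ("ad2", "phone", "555-0100"),
    ("ad2", "name", "Alice"),
    ("ad3", "name", "Bob"),
]
metrics = network_metrics(build_extraction_graph(toy))
print(metrics)
```

None of these metrics needs labeled data. Under the abstract's claim, one would compute them for the outputs of several IE systems and check how they move with precision and recall on whatever small labeled sample is available.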

Kejriwal, M., & Kapoor, R. (2019). Network-theoretic information extraction quality assessment in the human trafficking domain. Applied Network Science, 4(1). https://doi.org/10.1007/s41109-019-0154-z
