Classifying illegal activities on Tor network based on web textual contents

Mhd Wesam Al Nabki; Eduardo Fidalgo; Enrique Alegre; Ivan De Paz

Conference Proceedings, Open Access

15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017 - Proceedings of Conference (2017), Vol. 1, pp. 35-43

DOI: 10.18653/v1/e17-1004

87 Citations

150 Readers

Abstract

The freedom of the Deep Web offers a safe place where people can express themselves anonymously, but where they can also conduct illegal activities. In this paper, we present and make publicly available a new dataset of active Darknet domains, which we call "Darknet Usage Text Addresses" (DUTA). We built DUTA by sampling the Tor network over two months and manually labeling each address into 26 classes. Using DUTA, we compared two well-known text representation techniques, each combined with three different supervised classifiers, to categorize Tor hidden services. We also fixed the pipeline elements and identified the aspects that have a critical influence on the classification results. We found that combining a TF-IDF word representation with a Logistic Regression classifier achieves 96.6% accuracy under 10-fold cross-validation and a macro F1 score of 93.7% when classifying a subset of illegal activities from DUTA. This performance might support tools that help the authorities detect these activities.
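The abstract reports both accuracy and macro F1, and the two can diverge when classes are imbalanced. The following is a minimal sketch of how each metric is computed, not the authors' evaluation code. The class names and the toy labels are hypothetical and are not taken from DUTA.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, imbalanced toy data: 8 samples of one class, 2 of another.
y_true = ["drugs"] * 8 + ["counterfeit"] * 2
y_pred = ["drugs"] * 8 + ["drugs", "counterfeit"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy case a single error on the rare class lowers macro F1 much more than accuracy, which is one possible reason a macro F1 score can sit below the accuracy figure.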

Cite

Nabki, M. W. A., Fidalgo, E., Alegre, E., & De Paz, I. (2017). Classifying illegal activities on Tor network based on web textual contents. In 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017 - Proceedings of Conference (Vol. 1, pp. 35–43). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/e17-1004