Metadata-Based Detection of Child Sexual Abuse Material


Abstract

Child Sexual Abuse Material (CSAM) is any visual record of sexually explicit activity involving minors. Machine learning-based solutions can help law enforcement identify CSAM and block its distribution. However, collecting CSAM imagery to train machine learning models faces ethical and legal constraints. CSAM detection systems based on file metadata offer several advantages: metadata is not itself a record of a crime and is therefore free of these legal restrictions. This article proposes a CSAM detection framework consisting of machine learning models trained on file paths extracted from a real-world data set of over 1 million file paths obtained in criminal investigations. The framework includes guidelines for model evaluation that account for data changes caused by adversarial modification and for variations in data distribution caused by limited access to training data, as well as an assessment of false positive rates against file paths from Common Crawl data. The models achieve accuracies as high as 0.97 and remain stable under adversarial attacks previously studied in natural language tasks. When evaluated on publicly available file paths from Common Crawl, the model yields a false positive rate of 0.002, showing that it maintains low false positive rates even when operating on a distinct data distribution.
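The paper's models and training data are not public, so as a purely illustrative sketch of the general technique, the snippet below shows how a file path string can be turned into character n-gram features for a text classifier. The sample path, the n-gram length, and the `char_ngrams` helper are all hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: converts a file path into character n-gram
# counts, a common feature representation for short-text classifiers.
# The sample path and n=3 are arbitrary choices, not from the paper.
from collections import Counter


def char_ngrams(path: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of a lowercased file path."""
    s = path.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


# A benign example path; a downstream classifier would score such
# feature vectors rather than the image content itself.
features = char_ngrams("C:/Users/demo/Pictures/holiday/img_001.jpg")
print(features.most_common(3))
```

In practice these sparse n-gram counts would feed a standard text classifier (e.g. logistic regression or a neural model), which is what makes detection possible without ever handling the illegal imagery itself.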

Citation (APA)

Pereira, M., Dodhia, R., Anderson, H., & Brown, R. (2024). Metadata-Based Detection of Child Sexual Abuse Material. IEEE Transactions on Dependable and Secure Computing, 21(4), 3153–3164. https://doi.org/10.1109/TDSC.2023.3324275
