Child sexual abuse material (CSAM) is any visual record of sexually explicit activity involving a minor. Machine learning-based solutions can help law enforcement identify CSAM and block its distribution. Yet collecting CSAM imagery to train machine learning models is subject to ethical and legal constraints. Detection systems based on file metadata sidestep this problem: metadata is not itself a record of a crime and is therefore free of such legal restrictions. This article proposes a CSAM detection framework consisting of machine learning models trained on file paths, using a real-world data set of over one million file paths obtained in criminal investigations. The framework includes guidelines for model evaluation that account for adversarial data modification and for variations in data distribution caused by limited access to training data, as well as an assessment of false-positive rates against file paths from Common Crawl data. We achieve accuracies as high as 0.97 while remaining stable under adversarial attacks previously used in natural language tasks. When evaluated on publicly available file paths from Common Crawl, the model yields a false-positive rate of 0.002, showing that it maintains low false-positive rates even on a distinct data distribution.
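To make the approach concrete, the sketch below shows file-path classification in its simplest form: treating each path as a short text string and fitting a binary classifier. The abstract does not specify the paper's model, so the character n-gram features and logistic regression here are illustrative stand-ins, and all paths and labels are synthetic placeholders rather than real investigation data.

```python
# Minimal sketch of metadata-based detection: a text classifier over
# file-path strings. TfidfVectorizer + LogisticRegression are assumed
# stand-ins for whatever model the paper actually uses.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Synthetic placeholder data: file paths with binary labels
# (1 = flagged in an investigation, 0 = benign).
paths = [
    "C:/Users/alice/Pictures/holiday_2019/beach.jpg",
    "D:/backups/projects/report_final.docx",
    "E:/media/flagged_example_path_1.jpg",  # placeholder positive
    "E:/media/flagged_example_path_2.avi",  # placeholder positive
]
labels = [0, 0, 1, 1]

model = make_pipeline(
    # Character n-grams tolerate token-level obfuscation in paths
    # better than whole-word features do.
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(paths, labels)

# Score an unseen path; thresholding this probability trades
# false positives against recall.
prob = model.predict_proba(["F:/new_drive/misc/file001.jpg"])[0, 1]
print(f"P(flagged) = {prob:.3f}")
```

The same scoring step is where the abstract's false-positive evaluation would plug in: running the fitted model over an out-of-distribution corpus such as Common Crawl file paths and measuring how often benign paths cross the decision threshold.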
Pereira, M., Dodhia, R., Anderson, H., & Brown, R. (2024). Metadata-Based Detection of Child Sexual Abuse Material. IEEE Transactions on Dependable and Secure Computing, 21(4), 3153–3164. https://doi.org/10.1109/TDSC.2023.3324275