Unsupervised spam detection based on string alienness measures

Kazuyuki Narisawa; Hideo Bannai; Kohei Hatano; Masayuki Takeda

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007) 4755 LNAI 161-172

DOI: 10.1007/978-3-540-75488-6_16

17 Citations

11 Readers

Abstract

We propose an unsupervised method for detecting spam documents in a given set of documents, based on equivalence relations on strings. We give three measures for quantifying the alienness of substrings within the documents, that is, how different they are from other substrings. A document is then classified as spam if it contains a substring that belongs to an equivalence class with a high degree of alienness. The proposed method is unsupervised, language independent, and scalable. Computational experiments on data collected from Japanese web forums show that the method successfully discovers spam. © Springer-Verlag Berlin Heidelberg 2007.
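
The classification rule in the abstract can be illustrated with a toy sketch. The abstract does not define the equivalence relation or the three alienness measures, so everything here is an assumption for illustration: substrings are grouped by the exact set of documents that contain them (a simplification), and the score is a stand-in (length of the longest substring in a class shared by at least two documents). The names `substring_classes`, `alienness`, `detect_spam`, `min_len`, and `threshold` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def substring_classes(docs, min_len=4):
    """Group substrings by the exact set of documents containing them.

    Assumption: this grouping is a simplified stand-in for the
    equivalence relation on strings mentioned in the abstract.
    """
    occurrences = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for start in range(len(doc)):
            for end in range(start + min_len, len(doc) + 1):
                occurrences[doc[start:end]].add(doc_id)
    classes = defaultdict(list)
    for substring, doc_ids in occurrences.items():
        classes[frozenset(doc_ids)].append(substring)
    return classes


def alienness(doc_ids, members):
    """Stand-in score, NOT one of the paper's three measures.

    A class scores high when a long substring is repeated verbatim
    across at least two documents.
    """
    if len(doc_ids) < 2:
        return 0
    return max(len(s) for s in members)


def detect_spam(docs, threshold=12, min_len=4):
    """Flag every document containing a substring from a high-scoring class."""
    flagged = set()
    for doc_ids, members in substring_classes(docs, min_len).items():
        if alienness(doc_ids, members) >= threshold:
            flagged.update(doc_ids)
    return sorted(flagged)


if __name__ == "__main__":
    posts = [
        "hello, how is everyone today",
        "BUY CHEAP WATCHES NOW at example dot com",
        "anyone seen the new movie?",
        "great deal: BUY CHEAP WATCHES NOW!!!",
    ]
    print(detect_spam(posts))
```

This sketch enumerates every substring and therefore takes quadratic time per document; it is only meant to show the shape of the rule (score classes of substrings, then flag the documents that contain a high-scoring class), not the scalable procedure the abstract claims.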

Cite

Narisawa, K., Bannai, H., Hatano, K., & Takeda, M. (2007). Unsupervised spam detection based on string alienness measures. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4755 LNAI, pp. 161–172). Springer Verlag. https://doi.org/10.1007/978-3-540-75488-6_16
