Unsupervised spam detection based on string alienness measures

17Citations
Citations of this article
11Readers
Mendeley users who have this article in their library.
Get full text

Abstract

We propose an unsupervised method for detecting spam documents from a given set of documents, based on equivalence relations on strings. We give three measures for quantifying the alienness (i.e. how different they are from others) of substrings within the documents. A document is then classified as spam if it contains a substring that is in an equivalence class with a high degree of alienness. The proposed method is unsupervised, language independent, and scalable. Computational experiments conducted on data collected from Japanese web forums show that the method successfully discovers spams. © Springer-Verlag Berlin Heidelberg 2007.

Cite

CITATION STYLE

APA

Narisawa, K., Bannai, H., Hatano, K., & Takeda, M. (2007). Unsupervised spam detection based on string alienness measures. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4755 LNAI, pp. 161–172). Springer Verlag. https://doi.org/10.1007/978-3-540-75488-6_16

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free