Document representation is an essential step in web page clustering. Web pages are usually written in HTML, whose markup offers useful cues for selecting the most important features to represent them. In this paper we investigate the use of nonlinear combinations of criteria, by means of a fuzzy system, to find those important features. We start from a term weighting function called Fuzzy Combination of Criteria (fcc), which relies on term frequency, document title, emphasis, and term positions in the text. Next, we analyze its drawbacks and explore the possibility of adding contextual information extracted from the anchor texts of inlinks, and we propose an alternative way of combining criteria based on our experimental results. Finally, we apply a statistical significance test to compare the original representation with our proposal. © 2012 Springer-Verlag.
CITATION
Pérez García-Plaza, A., Fresno, V., & Martínez, R. (2012). Fuzzy combinations of criteria: An application to web page representation for clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7182 LNCS, pp. 157–168). https://doi.org/10.1007/978-3-642-28601-8_14