Clustering of wikipedia texts based on keywords

Jalalaldin Gharibi Karyak; Fardin Yazdanpanah Sisakht; Sadrollah Abbasi

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016) 9790 513-529

DOI: 10.1007/978-3-319-42092-9_39

Abstract

This paper applies spectral clustering algorithms to group Wikipedia search results. Its main contribution is a representation of Wikipedia articles that combines words and links, which is used to categorize search results in this repository. We evaluate the proposed approach with Principal Component Analysis and show, on test data, how using a cosine transformation to create the combined representations influences data variability. On sample test datasets we also show that the combined representation improves data separation, which in turn improves the overall categorization results. We review the main spectral clustering methods and compare them using external validation criteria with standard clustering quality measures. A discussion of the descriptiveness of the evaluation measures, together with experiments on the test datasets, allows us to select the one spectral clustering algorithm that is implemented in our system. We briefly describe the architecture of the system, which groups online Wikipedia articles retrieved for specified keywords. Using the system, we show how clustering increases information retrieval effectiveness for the Wikipedia data repository.
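
The abstract does not spell out how words and links are combined, so the following is only a minimal sketch of one plausible reading: each article gets a unit-length word vector and a unit-length link vector, the two are concatenated with a mixing weight, and articles are compared by cosine similarity. The function names, the `alpha` weight, and the toy articles are all assumptions made for illustration and are not taken from the paper. A similarity matrix built this way is the kind of input a spectral clustering algorithm would take.

```python
import math
from collections import Counter


def l2_normalize(counts):
    """Scale a sparse count vector (a dict) to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {k: v / norm for k, v in counts.items()} if norm else {}


def combined_vector(words, links, alpha=0.5):
    """Concatenate normalized word and link vectors.

    alpha is a hypothetical mixing weight (not from the paper). Using
    sqrt(alpha) and sqrt(1 - alpha) makes the cosine of two combined
    vectors equal alpha * cos(words) + (1 - alpha) * cos(links) when
    both articles have words and links.
    """
    word_part = l2_normalize(Counter(words))
    link_part = l2_normalize(Counter(links))
    vec = {("word", k): math.sqrt(alpha) * v for k, v in word_part.items()}
    vec.update(
        {("link", k): math.sqrt(1.0 - alpha) * v for k, v in link_part.items()}
    )
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy articles (invented for illustration): (words, outgoing links).
articles = {
    "Python (language)": (
        ["code", "language", "programming"],
        ["Guido van Rossum", "Programming language"],
    ),
    "Java (language)": (
        ["code", "language", "object"],
        ["Programming language", "Oracle"],
    ),
    "Python (snake)": (
        ["snake", "reptile", "species"],
        ["Reptile", "Snake"],
    ),
}

vectors = {title: combined_vector(w, l) for title, (w, l) in articles.items()}
titles = list(vectors)

# Pairwise similarity matrix over the toy articles.
similarity = [[cosine(vectors[a], vectors[b]) for b in titles] for a in titles]
```

In this toy data the two programming-language articles share both words and links, so their combined similarity is higher than either has with the snake article, which is the separation effect the abstract attributes to the combined representation.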

Citation (APA)

Karyak, J. G., Sisakht, F. Y., & Abbasi, S. (2016). Clustering of wikipedia texts based on keywords. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9790, pp. 513–529). Springer Verlag. https://doi.org/10.1007/978-3-319-42092-9_39
