Clustering of Wikipedia texts based on keywords

Abstract

The paper presents an application of spectral clustering algorithms to grouping Wikipedia search results. The main contribution is a representation method for Wikipedia articles that combines words and links and is used to categorize search results in this repository. We evaluate the proposed approach with Principal Component Analysis and show, on test data, how using the cosine transformation to create combined representations influences data variability. On sample test datasets we also show how the combined representation improves data separation, which in turn improves the overall categorization results. We review the main spectral clustering methods and compare them using external validation criteria with standard clustering quality measures. A discussion of the descriptiveness of the evaluation measures, together with experiments on the test datasets, allows us to select the one spectral clustering algorithm that has been implemented in our system. We give a brief description of the system architecture, which groups online Wikipedia articles retrieved with specified keywords. Using the system, we show how clustering increases information retrieval effectiveness for the Wikipedia data repository.
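The abstract does not specify how the word-based and link-based representations are combined, so the following is only a minimal sketch under stated assumptions: each article is described by a word-count vector and a link-indicator vector, cosine similarity is computed separately for each view, and the two similarity matrices are averaged with a hypothetical weight `alpha`. The function names and the toy data are illustrative and do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarities between all rows of `vectors`."""
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


def combined_similarity(word_vectors, link_vectors, alpha=0.5):
    """ASSUMPTION: combine the two views as a weighted average of their
    cosine similarity matrices; the paper may combine them differently."""
    sim_words = similarity_matrix(word_vectors)
    sim_links = similarity_matrix(link_vectors)
    n = len(sim_words)
    return [
        [alpha * sim_words[i][j] + (1 - alpha) * sim_links[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy data (made up): three articles, four vocabulary terms, three link targets.
word_vectors = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 1, 2]]
link_vectors = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]

affinity = combined_similarity(word_vectors, link_vectors)
```

A symmetric matrix of this kind could then serve as the affinity input to a spectral clustering algorithm; here the first two toy articles end up more similar to each other than either is to the third.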

Cite

Karyak, J. G., Sisakht, F. Y., & Abbasi, S. (2016). Clustering of Wikipedia texts based on keywords. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9790, pp. 513–529). Springer Verlag. https://doi.org/10.1007/978-3-319-42092-9_39
