This paper presents an application of spectral clustering algorithms to grouping Wikipedia search results. Its main contribution is a representation method for Wikipedia articles that combines words and links and is used to categorize search results in this repository. We evaluate the proposed approach with Principal Component Analysis and show, on test data, how using the cosine transformation to create combined representations influences data variability. On sample test datasets we also show how the combined representation improves data separation, which in turn improves the overall categorization results. We review the main spectral clustering methods and compare them using external validation criteria with standard clustering quality measures. A discussion of the descriptiveness of the evaluation measures, together with experiments on the test datasets, allows us to select the one spectral clustering algorithm that is implemented in our system. We briefly describe the architecture of the system, which groups on-line Wikipedia articles retrieved with specified keywords. Using the system, we show how clustering increases information retrieval effectiveness for the Wikipedia data repository.
CITATION STYLE
Karyak, J. G., Sisakht, F. Y., & Abbasi, S. (2016). Clustering of Wikipedia texts based on keywords. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9790, pp. 513–529). Springer Verlag. https://doi.org/10.1007/978-3-319-42092-9_39