Automatic content-based categorization of Wikipedia articles

Abstract

Wikipedia's article contents and category hierarchy are widely used to produce semantic resources that improve performance on tasks such as text classification and keyword extraction. The reverse direction, using text classification methods to predict the categories of Wikipedia articles, has attracted less attention so far. We propose to “return the favor” and use text classifiers to improve Wikipedia. This could support the emergence of a virtuous circle between the wisdom of the crowds and machine learning/NLP methods. We define the categorization of Wikipedia articles as a multi-label classification task, describe two solutions to the task, and perform experiments showing that our approach is feasible despite the high number of labels.
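
To illustrate what framing categorization as a multi-label task can look like, the sketch below uses binary relevance: one independent yes/no model per category, so an article may receive several categories at once. This is a generic decomposition chosen for illustration and is not necessarily either of the two solutions the paper describes. The toy corpus, the category names, the word-frequency scoring rule, and the function names are all assumptions.

```python
from collections import Counter

# Hypothetical toy corpus: (article text, set of category labels).
ARTICLES = [
    ("python is a programming language", {"Programming languages", "Software"}),
    ("haskell is a functional programming language", {"Programming languages"}),
    ("firefox is a web browser", {"Software"}),
    ("the danube is a river in europe", {"Rivers"}),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def train_binary_relevance(articles):
    """Build one pair of word-count models (positive, negative) per category."""
    labels = sorted(set().union(*(cats for _, cats in articles)))
    models = {}
    for label in labels:
        pos, neg = Counter(), Counter()
        for text, cats in articles:
            # Each article is a positive or negative example for this label.
            (pos if label in cats else neg).update(tokenize(text))
        models[label] = (pos, neg)
    return models


def predict(models, text):
    """Return every category whose positive model outscores its negative one."""
    tokens = tokenize(text)
    predicted = set()
    for label, (pos, neg) in models.items():
        pos_total = sum(pos.values()) or 1
        neg_total = sum(neg.values()) or 1
        # Score = summed relative frequency of the article's words.
        pos_score = sum(pos[t] / pos_total for t in tokens)
        neg_score = sum(neg[t] / neg_total for t in tokens)
        if pos_score > neg_score:
            predicted.add(label)
    return predicted


models = train_binary_relevance(ARTICLES)
print(predict(models, "the rhine is a river"))
print(predict(models, "rust is a programming language"))
```

Because every category is decided independently, the number of models grows linearly with the number of labels, which is one reason a large category set makes the task demanding.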

Cite

Gantner, Z., & Schmidt-Thieme, L. (2009). Automatic content-based categorization of Wikipedia articles. In People’s Web 2009 - 2009 Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources at the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL-IJCNLP 2009 - Proceedings (pp. 32–37). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1699765.1699770
