Wikipedia's article contents and its category hierarchy are widely used to produce semantic resources that improve performance on tasks like text classification and keyword extraction. The reverse, using text classification methods to predict the categories of Wikipedia articles, has attracted less attention so far. We propose to “return the favor” and use text classifiers to improve Wikipedia. This could support the emergence of a virtuous circle between the wisdom of the crowds and machine learning/NLP methods. We define the categorization of Wikipedia articles as a multi-label classification task, describe two solutions to the task, and perform experiments showing that our approach is feasible despite the large number of labels.
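To make the multi-label framing concrete, the following is a minimal sketch in Python of one common way to handle it: train one binary decision per category and return every category whose decision is positive. This is not the method from the paper, since the abstract does not name its two solutions. The nearest-centroid scoring, the function names, and the toy articles are all assumptions chosen for illustration.

```python
from collections import Counter
import math


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real articles need more care).
    return text.lower().split()


def centroid(docs):
    # Average term-frequency vector over a list of documents.
    total = Counter()
    for doc in docs:
        total.update(tokenize(doc))
    n = max(len(docs), 1)
    return {term: count / n for term, count in total.items()}


def cosine(a, b):
    # Cosine similarity between two sparse term-weight dictionaries.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(articles):
    # articles: list of (text, set_of_categories) pairs.
    # For each category, store a centroid of the articles that carry it
    # and a centroid of the articles that do not.
    labels = set().union(*(cats for _, cats in articles))
    model = {}
    for label in labels:
        pos = [text for text, cats in articles if label in cats]
        neg = [text for text, cats in articles if label not in cats]
        model[label] = (centroid(pos), centroid(neg))
    return model


def predict(model, text):
    # Multi-label output: every category whose positive centroid is
    # closer to the article than its negative centroid.
    vec = Counter(tokenize(text))
    return {
        label
        for label, (pos, neg) in model.items()
        if cosine(vec, pos) > cosine(vec, neg)
    }


# Hypothetical toy data, not taken from Wikipedia or from the paper.
articles = [
    ("guitar chords melody band", {"Music"}),
    ("football goal league match", {"Sports"}),
    ("stadium anthem band match", {"Music", "Sports"}),
    ("piano melody concert", {"Music"}),
    ("tennis league tournament", {"Sports"}),
]
model = train(articles)
print(predict(model, "guitar melody concert"))
print(predict(model, "band match"))
```

Because each category is decided independently, an article can receive zero, one, or several categories. The cost of this scheme grows linearly with the number of categories, which is the practical difficulty the abstract alludes to when it mentions the number of labels.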
Gantner, Z., & Schmidt-Thieme, L. (2009). Automatic content-based categorization of Wikipedia articles. In People’s Web 2009 - 2009 Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources at the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL-IJCNLP 2009 - Proceedings (pp. 32–37). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1699765.1699770