Automatic content-based categorization of Wikipedia articles

Zeno Gantner; Lars Schmidt-Thieme

Conference Proceedings, Open Access

People's Web 2009 - 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources at the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL-IJCNLP 2009 - Proceedings (2009) 32-37

DOI: 10.3115/1699765.1699770

Abstract

Wikipedia's article contents and its category hierarchy are widely used to produce semantic resources which improve performance on tasks like text classification and keyword extraction. The reverse - using text classification methods for predicting the categories of Wikipedia articles - has attracted less attention so far. We propose to “return the favor” and use text classifiers to improve Wikipedia. This could support the emergence of a virtuous circle between the wisdom of the crowds and machine learning/NLP methods. We define the categorization of Wikipedia articles as a multi-label classification task, describe two solutions to the task, and perform experiments that show that our approach is feasible despite the high number of labels.

Cite

Gantner, Z., & Schmidt-Thieme, L. (2009). Automatic content-based categorization of Wikipedia articles. In People’s Web 2009 - 2009 Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources at the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL-IJCNLP 2009 - Proceedings (pp. 32–37). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1699765.1699770
