GraWiTas: A grammar-based wikipedia talk page parser

Abstract

Wikipedia offers researchers unique insights into the collaboration and communication patterns of a large self-regulating community of editors. The main medium of direct communication between editors of an article is the article's talk page. However, a talk page file is unstructured and therefore difficult to analyse automatically. A few parsers exist that enable its transformation into a structured data format. However, they are rarely open source, support only a limited subset of the talk page syntax (resulting in the loss of content), and usually support only one export format. Together with this article we offer a very fast, lightweight, open source parser with support for various output formats. In a preliminary evaluation it achieved high accuracy. The parser uses a grammar-based approach, offering a transparent implementation and easy extensibility.

Cabrera, B., Steinert, L., & Ross, B. (2017). GraWiTas: A grammar-based wikipedia talk page parser. In 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017 - Proceedings of the Software Demonstrations (pp. 21–24). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/e17-3006
