Abstract
Wikipedia offers researchers unique insights into the collaboration and communication patterns of a large, self-regulating community of editors. The main medium of direct communication between the editors of an article is the article's talk page. However, a talk page is unstructured text and is therefore difficult to analyse automatically. A few parsers exist that transform it into a structured data format, but they are rarely open source, support only a limited subset of the talk page syntax (resulting in loss of content), and usually support only one export format. Together with this article, we offer a very fast, lightweight, open source parser with support for various output formats. In a preliminary evaluation, it achieved high accuracy. The parser uses a grammar-based approach, which offers a transparent implementation and easy extensibility.
Cite
Cabrera, B., Steinert, L., & Ross, B. (2017). GraWiTas: A grammar-based Wikipedia talk page parser. In 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017 - Proceedings of the Software Demonstrations (pp. 21–24). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/e17-3006