Less destructive cleaning of web documents by using standoff annotation

Citations: 1
Readers (Mendeley): 62

Abstract

Standoff annotation, that is, the separation of primary data and markup, can be an interesting option for annotating web pages, since it does not require removing the annotations already present in them. We present a standoff serialization that allows well-formed web pages to be annotated with multiple annotation layers in a single instance, easing the processing and analysis of the data.
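The idea of keeping primary data and markup apart can be illustrated with a minimal Python sketch. It is not the serialization the paper presents; the `Annotation` class, the layer names (`html`, `token`), the helper functions, and the sample sentence are all hypothetical and chosen only for illustration. Each annotation points into the untouched text by character offsets, so several layers can coexist without modifying the text or each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One standoff annotation (hypothetical structure, for illustration only)."""
    layer: str   # name of the annotation layer, e.g. "html" or "token"
    start: int   # inclusive character offset into the primary data
    end: int     # exclusive character offset into the primary data
    label: str   # the markup label attached to the span


def text_of(primary: str, ann: Annotation) -> str:
    """Return the stretch of primary data that an annotation refers to."""
    return primary[ann.start:ann.end]


def spans_in_layer(annotations: list[Annotation], layer: str) -> list[Annotation]:
    """Select the annotations that belong to a single layer."""
    return [a for a in annotations if a.layer == layer]


# The primary data is stored once and never modified.
primary = "Standoff markup keeps text intact."

# Two independent layers referring to the same primary data.
annotations = [
    Annotation("html", 0, 15, "b"),    # markup already present in the page
    Annotation("token", 0, 8, "w"),    # a later, linguistic layer
    Annotation("token", 9, 15, "w"),
]

for ann in annotations:
    print(ann.layer, ann.label, repr(text_of(primary, ann)))
```

Because the existing `html` layer is kept as just another set of offsets, adding the `token` layer does not require stripping the original markup, which is the sense in which such cleaning is less destructive.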

Cite

Stührenberg, M. (2014). Less destructive cleaning of web documents by using standoff annotation. In Proceedings of the 9th Web as Corpus Workshop, WaC 2014 - Held at the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014 (pp. 16–21). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-0403
