Standoff annotation, that is, the separation of primary data and markup, can be an interesting option for annotating web pages, since it does not require removing the annotations (markup) already present in those pages. We present a standoff serialization that allows well-formed web pages to be annotated with multiple annotation layers in a single instance, easing the processing and analysis of the data.
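The serialization described above is not reproduced here. As a rough illustration of the general standoff idea only, the following Python sketch keeps an HTML string completely unmodified and stores annotations as character-offset spans in separate, named layers. Boilerplate is marked as a span rather than deleted, and a second, independent layer annotates the same primary data. The class `Annotation`, the helpers `span_of` and `extract`, and the layer names are hypothetical, and real standoff formats typically use richer segment definitions than plain character offsets.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One standoff annotation: a labelled character span in a named layer (hypothetical model)."""
    layer: str   # illustrative layer name, e.g. "cleaning" or "tokens"
    label: str
    start: int   # inclusive character offset into the primary data
    end: int     # exclusive character offset into the primary data

    def text(self, primary: str) -> str:
        # Resolve the span against the untouched primary data.
        return primary[self.start:self.end]


def span_of(primary: str, fragment: str) -> tuple[int, int]:
    """Return (start, end) offsets of the first occurrence of fragment."""
    start = primary.index(fragment)
    return start, start + len(fragment)


def extract(primary: str, annotations: list[Annotation], layer: str, label: str) -> str:
    """Join the text of all spans with this label in one layer, in document order."""
    spans = sorted(
        (a for a in annotations if a.layer == layer and a.label == label),
        key=lambda a: a.start,
    )
    return " ".join(a.text(primary) for a in spans)


# Primary data: a well-formed page whose existing markup is never removed.
primary = "<html><body><div>Menu | Login</div><p>Standoff keeps markup.</p></body></html>"

annotations = [
    # Layer 1: cleaning marks boilerplate instead of deleting it.
    Annotation("cleaning", "boilerplate", *span_of(primary, "Menu | Login")),
    Annotation("cleaning", "content", *span_of(primary, "Standoff keeps markup.")),
    # Layer 2: tokens, independent of layer 1 and free to overlap it.
    Annotation("tokens", "token", *span_of(primary, "Standoff")),
    Annotation("tokens", "token", *span_of(primary, "keeps")),
    Annotation("tokens", "token", *span_of(primary, "markup")),
]

print(extract(primary, annotations, "cleaning", "content"))  # Standoff keeps markup.
print(extract(primary, annotations, "tokens", "token"))      # Standoff keeps markup
```

Because every layer only points into the primary data, the "cleaned" view is derived on demand and the original page, including its HTML markup and boilerplate, remains available for other analyses.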
Stührenberg, M. (2014). Less destructive cleaning of web documents by using standoff annotation. In Proceedings of the 9th Web as Corpus Workshop, WaC 2014 - Held at the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014 (pp. 16–21). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-0403