Less destructive cleaning of web documents by using standoff annotation


Abstract

Standoff annotation, that is, the separation of primary data and markup, is an attractive option for annotating web pages because it does not require removing the annotations already present in them. We present a standoff serialization that allows well-formed web pages to be annotated with multiple annotation layers in a single instance, easing the processing and analysis of the data.
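
The general idea of keeping primary data and markup apart can be illustrated with a minimal sketch. It assumes character offsets as the anchoring mechanism and is not the serialization presented in the paper; the names `Span`, `layers`, and `covered_text` and all labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One annotation that points into the primary data by offset (illustrative)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


# Primary data: annotation never modifies this string.
primary = "Standoff keeps text intact."

# Independent annotation layers over the same primary data.
# Spans in different layers may overlap freely, since none of them
# is embedded in the text itself.
layers = {
    "tokens": [
        Span(0, 8, "tok"),
        Span(9, 14, "tok"),
        Span(15, 19, "tok"),
        Span(20, 26, "tok"),
    ],
    "chunks_a": [Span(0, 14, "a")],
    "chunks_b": [Span(9, 19, "b")],  # overlaps with the span in chunks_a
}


def covered_text(text: str, span: Span) -> str:
    """Return the stretch of primary data that a span refers to."""
    return text[span.start:span.end]


for name, spans in layers.items():
    for span in spans:
        print(name, span.label, repr(covered_text(primary, span)))
```

Because every layer only stores references into `primary`, adding or discarding a layer leaves the original text, and any other layer, untouched.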

Stührenberg, M. (2014). Less destructive cleaning of web documents by using standoff annotation. In Proceedings of the 9th Web as Corpus Workshop, WaC 2014 - Held at the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014 (pp. 16–21). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-0403
