High-performance high-volume layered corpora annotation

Abstract

NLP systems that deal with large collections of text require significant computational resources, both in terms of space and processing time. Moreover, these systems typically add new layers of linguistic information, each of which refers to another layer. Spreading these layered annotations across different files makes the data harder to process and access. As the amount of input increases, so does the difficulty of processing it. One approach is to use distributed parallel computing to solve these larger problems and save time. We propose a framework that simplifies the integration of independently existing NLP tools to build language-independent NLP systems capable of creating layered annotations. Moreover, it allows the development of scalable NLP systems that execute NLP tools in parallel, while offering an easy-to-use programming environment and transparent handling of distributed computing problems. With this framework, execution time was reduced by a factor of 40 on a cluster with 80 cores. © 2009 The Association for Computational Linguistics.
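As a rough illustration of the idea in the abstract, the sketch below builds two annotation layers, where the second layer refers back to the first, and annotates documents concurrently. It is not the authors' framework: the names `Annotation`, `tokenize`, `tag_shape`, `annotate`, and `annotate_corpus` are hypothetical, the "tools" are toy stand-ins for real NLP components, and a local thread pool stands in for a cluster.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Annotation:
    """One stand-off annotation over a character span of the text."""
    layer: str                 # name of the layer, e.g. "token"
    start: int                 # start offset in the document text
    end: int                   # end offset (exclusive)
    label: str                 # annotation value
    ref: Optional[int] = None  # index of the annotation it refers to in the lower layer


def tokenize(text: str) -> List[Annotation]:
    """Toy first layer: whitespace-delimited tokens with offsets."""
    return [
        Annotation("token", m.start(), m.end(), m.group())
        for m in re.finditer(r"\S+", text)
    ]


def tag_shape(tokens: List[Annotation]) -> List[Annotation]:
    """Toy second layer: each tag points at a token by index instead of copying it."""
    return [
        Annotation("shape", tok.start, tok.end,
                   "CAP" if tok.label[0].isupper() else "LOW", ref=i)
        for i, tok in enumerate(tokens)
    ]


def annotate(text: str) -> Dict[str, List[Annotation]]:
    """Run the tools in sequence for one document and keep all layers together."""
    tokens = tokenize(text)
    return {"token": tokens, "shape": tag_shape(tokens)}


def annotate_corpus(docs: List[str], workers: int = 4) -> List[Dict[str, List[Annotation]]]:
    """Annotate documents independently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(annotate, docs))


if __name__ == "__main__":
    for layers in annotate_corpus(["Lisbon is sunny", "we like Porto"]):
        for tag in layers["shape"]:
            print(layers["token"][tag.ref].label, tag.label)
```

Because each document is annotated independently, the same per-document function could in principle be handed to a distributed map step; how the actual framework partitions input and stores layers is not described in the abstract.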

Cite

Luís, T., & De Matos, D. M. (2009). High-performance high-volume layered corpora annotation. In ACL-IJCNLP 2009 - LAW 2009: 3rd Linguistic Annotation Workshop, Proceedings (pp. 99–107). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1698381.1698397
