NLP data cleansing based on linguistic ontology constraints

10Citations
Citations of this article
30Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Linked Data comprises of an unprecedented volume of structured data on the Web and is adopted from an increasing number of domains. However, the varying quality of published data forms a barrier for further adoption, especially for Linked Data consumers. In this paper, we extend a previously developed methodology of Linked Data quality assessment, which is inspired by test-driven software development. Specifically, we enrich it with ontological support and different levels of result reporting and describe how the method is applied in the Natural Language Processing (NLP) area. NLP is - compared to other domains, such as biology - a late Linked Data adopter. However, it has seen a steep rise of activity in the creation of data and ontologies. NLP data quality assessment has become an important need for NLP datasets. In our study, we analysed 11 datasets using the lemon and NIF vocabularies in 277 test cases and point out common quality issues. © 2014 Springer International Publishing.

Author supplied keywords

Cite

CITATION STYLE

APA

Kontokostas, D., Brümmer, M., Hellmann, S., Lehmann, J., & Ioannidis, L. (2014). NLP data cleansing based on linguistic ontology constraints. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8465 LNCS, pp. 224–239). Springer Verlag. https://doi.org/10.1007/978-3-319-07443-6_16

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free