NLP data cleansing based on linguistic ontology constraints

Dimitris Kontokostas; Martin Brümmer; Sebastian Hellmann; Jens Lehmann; Lazaros Ioannidis

Conference Proceedings (Open Access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014) 8465 LNCS 224-239

DOI: 10.1007/978-3-319-07443-6_16

10 Citations

30 Readers

Abstract

Linked Data comprises an unprecedented volume of structured data on the Web and is being adopted by an increasing number of domains. However, the varying quality of published data is a barrier to further adoption, especially for Linked Data consumers. In this paper, we extend a previously developed methodology for Linked Data quality assessment, which is inspired by test-driven software development. Specifically, we enrich it with ontological support and different levels of result reporting, and we describe how the method is applied in the Natural Language Processing (NLP) area. Compared to other domains, such as biology, NLP is a late Linked Data adopter. However, it has seen a steep rise in activity in the creation of data and ontologies. Data quality assessment has become an important need for NLP datasets. In our study, we analysed 11 datasets using the lemon and NIF vocabularies in 277 test cases and point out common quality issues. © 2014 Springer International Publishing.

Cite

Kontokostas, D., Brümmer, M., Hellmann, S., Lehmann, J., & Ioannidis, L. (2014). NLP data cleansing based on linguistic ontology constraints. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8465 LNCS, pp. 224–239). Springer Verlag. https://doi.org/10.1007/978-3-319-07443-6_16