Linked Data comprises an unprecedented volume of structured data on the Web and is being adopted by an increasing number of domains. However, the varying quality of published data is a barrier to further adoption, especially for Linked Data consumers. In this paper, we extend a previously developed methodology for Linked Data quality assessment that is inspired by test-driven software development. Specifically, we enrich it with ontological support and different levels of result reporting, and describe how the method is applied in the Natural Language Processing (NLP) area. Compared to other domains, such as biology, NLP is a late Linked Data adopter. However, it has seen a steep rise in activity in the creation of data and ontologies, and quality assessment has become an important need for NLP datasets. In our study, we analysed 11 datasets using the lemon and NIF vocabularies in 277 test cases and point out common quality issues. © 2014 Springer International Publishing.
CITATION STYLE
Kontokostas, D., Brümmer, M., Hellmann, S., Lehmann, J., & Ioannidis, L. (2014). NLP data cleansing based on linguistic ontology constraints. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8465 LNCS, pp. 224–239). Springer Verlag. https://doi.org/10.1007/978-3-319-07443-6_16