Fuzzy semantic labeling of semi-structured numerical datasets

Ahmad Alobaid; Oscar Corcho

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 11313 19-33

DOI: 10.1007/978-3-030-03667-6_2

Abstract

SPARQL endpoints provide access to rich sources of data (e.g. knowledge graphs), which can be used to classify other less structured datasets (e.g. CSV files or HTML tables on the Web). We propose an approach to suggest types for the numerical columns of a collection of input files available as CSVs. Our approach is based on the application of the fuzzy c-means clustering technique to numerical data in the input files, using existing SPARQL endpoints to generate training datasets. Our approach has three major advantages: it works directly with live knowledge graphs, it does not require knowledge-graph profiling beforehand, and it avoids tedious and costly manual training to match values with types. We evaluate our approach against manually annotated datasets. The results show that the proposed approach classifies most of the types correctly for our test sets.
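The core idea of scoring a numerical column against candidate types with fuzzy memberships can be illustrated with a minimal sketch. It uses the standard fuzzy c-means membership formula, but everything else is an assumption made for illustration and is not taken from the paper: the `training` dictionary stands in for values that would be retrieved from a SPARQL endpoint, the labels `height_cm` and `weight_kg` are hypothetical, and summarising each column as a (mean, standard deviation) point is only one plausible feature choice.

```python
import math
from statistics import mean, pstdev


def features(values):
    """Summarise a numeric column as a (mean, population std) point.

    Assumption: this two-feature summary is illustrative only.
    """
    return (mean(values), pstdev(values))


def fuzzy_memberships(point, centroids, m=2.0):
    """Standard fuzzy c-means membership of `point` in each cluster.

    `centroids` maps a label to its cluster centre; `m` > 1 is the fuzzifier.
    """
    dists = {label: math.dist(point, c) for label, c in centroids.items()}
    # A point lying exactly on one or more centroids belongs only to those.
    zero = [label for label, d in dists.items() if d == 0.0]
    if zero:
        return {label: (1.0 / len(zero) if label in zero else 0.0)
                for label in dists}
    exponent = 2.0 / (m - 1.0)
    return {
        label: 1.0 / sum((d / other) ** exponent for other in dists.values())
        for label, d in dists.items()
    }


# Hypothetical training values, standing in for results of SPARQL queries
# that fetch the numeric values of one property per candidate type.
training = {
    "height_cm": [160.0, 170.0, 180.0, 175.0],
    "weight_kg": [55.0, 70.0, 85.0, 90.0],
}
centroids = {label: features(vals) for label, vals in training.items()}

# An unlabeled numeric column from an input CSV (made-up values).
column = [165.0, 172.0, 181.0]
scores = fuzzy_memberships(features(column), centroids)

# Candidate types ordered from most to least likely.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

Because the memberships sum to 1 across the candidate types, the output reads as a ranked list of suggestions rather than a single hard label, which is the sense in which the labeling is fuzzy.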

Cite

Alobaid, A., & Corcho, O. (2018). Fuzzy semantic labeling of semi-structured numerical datasets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11313, pp. 19–33). Springer Verlag. https://doi.org/10.1007/978-3-030-03667-6_2
