There is an increasing trend of using Linked Datasets for creating embeddings from URI sequences, since such embeddings can be exploited for several tasks, e.g., machine learning problems and tasks related to content-based similarity. Existing techniques exploit either a single dataset or a few datasets (or RDF graphs) for creating URI sequences for one or more entities. However, no available approach combines data from multiple datasets to enrich the URI sequences for a given entity. For this reason, we introduce a prototype, called LODVec, that exploits the LODsyndesis knowledge graph, which is the largest knowledge graph that includes all inferred equivalence relationships. LODVec exploits this graph for creating URI sequences for millions of entities by combining data from 400 datasets, and it offers several configurable options, based on metadata (e.g., provenance), for creating such URI sequences. Moreover, it uses the produced URI sequences as input for creating URI embeddings through the word2vec model. We evaluate the gain of exploiting several datasets (instead of a single one or a few) and the impact of cross-dataset reasoning on machine-learning-based tasks (i.e., classification and regression), and we compare the effectiveness of several configurations and machine learning models.
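The idea of combining several datasets into URI sequences for one entity can be illustrated with a minimal sketch. It is not the LODVec implementation: the toy triples, the dataset labels, the equivalence pairs, and the helper names `equivalence_classes` and `uri_sequences` are all assumptions made for illustration. The sketch merges equivalent URIs with a union-find structure, optionally filters triples by provenance, and emits one short URI sequence per triple.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, predicate, object, source_dataset).
TRIPLES = [
    ("dbp:Aristotle", "dbo:birthPlace", "dbp:Stagira", "DBpedia"),
    ("wd:Q868", "wdt:P106", "wd:Q4964182", "Wikidata"),
    ("yago:Aristotle", "rdf:type", "yago:Philosopher", "YAGO"),
]
# Hypothetical equivalence statements (e.g. owl:sameAs links).
SAME_AS = [("dbp:Aristotle", "wd:Q868"), ("wd:Q868", "yago:Aristotle")]


def equivalence_classes(pairs):
    """Union-find over equivalence pairs; returns a function uri -> representative."""
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # The lexicographically smaller URI becomes the representative.
            parent[max(ra, rb)] = min(ra, rb)
    return find


def uri_sequences(triples, same_as, allowed_datasets=None):
    """Group [subject, predicate, object] sequences by canonical subject.

    allowed_datasets is an assumed provenance filter: when given, only
    triples coming from those datasets contribute sequences.
    """
    find = equivalence_classes(same_as)
    sequences = defaultdict(list)
    for s, p, o, source in triples:
        if allowed_datasets is not None and source not in allowed_datasets:
            continue
        sequences[find(s)].append([find(s), p, find(o)])
    return dict(sequences)
```

Calling `uri_sequences(TRIPLES, SAME_AS)` groups the sequences from all three toy datasets under a single representative URI, while passing `allowed_datasets={"DBpedia"}` keeps only the sequences from that source. Sequences of this shape are the kind of token lists a word2vec implementation accepts as training sentences.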
Mountantonakis, M., & Tzitzikas, Y. (2019). Knowledge Graph Embeddings over Hundreds of Linked Datasets. In Communications in Computer and Information Science (Vol. 1057 CCIS, pp. 150–162). Springer. https://doi.org/10.1007/978-3-030-36599-8_13