To qualify as Linked Data, datasets on the web must be linked to other datasets. Existing studies on predicting dataset interlinking do not distinguish between link types, which limits their usefulness in real application scenarios: dataset publishers still do not know which kinds of RDF links can be established or how to configure their data linking algorithms. In this paper, we focus on predicting possible links between datasets for the most important RDF link type, owl:sameAs. Since the goal is to discriminate linked dataset pairs from non-linked ones, we formulate link prediction as a classification problem. We adopt Random Forest as the base classifier, using the scores output by unsupervised predictors as features, and apply bagging to combine multiple forests, which reduces variance and improves accuracy. Experiments show that we can improve prediction performance by about 10% in AUROC.
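The classification formulation above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. It assumes a toy owl:sameAs network with hypothetical dataset names, four classic unsupervised predictors (common neighbours, Jaccard, Adamic-Adar, preferential attachment) as features, and a bagged ensemble of one-split decision stumps standing in for the bagged Random Forests, which the abstract does not specify in detail.

```python
import math
import random
from itertools import combinations

# Toy undirected owl:sameAs network; dataset names are hypothetical.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F")]

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def pair_features(adj, u, v):
    """Scores of four unsupervised link predictors for the pair (u, v)."""
    nu, nv = adj[u] - {v}, adj[v] - {u}   # ignore the pair's own link, if any
    common, union = nu & nv, nu | nv
    jaccard = len(common) / len(union) if union else 0.0
    # A common neighbour always has degree >= 2, so the log is positive.
    adamic_adar = sum(1.0 / math.log(len(adj[z])) for z in common)
    return [len(common), jaccard, adamic_adar, len(nu) * len(nv)]

def train_stump(X, y, rng):
    """One-split tree on a random feature; assumes higher score means linked."""
    f = rng.randrange(len(X[0]))
    best_acc, best_t = -1, None
    for t in sorted({row[f] for row in X}):
        acc = sum((row[f] >= t) == bool(label) for row, label in zip(X, y))
        if acc > best_acc:
            best_acc, best_t = acc, t
    return f, best_t

def train_bagged(X, y, n_models=25, seed=0):
    """Bagging: fit each base model on a bootstrap resample of the pairs."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in X]
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx], rng))
    return models

def predict_proba(models, x):
    """Fraction of base models that vote 'linked'."""
    return sum(x[f] >= t for f, t in models) / len(models)

def auroc(scores, labels):
    """Probability that a linked pair outranks a non-linked one (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

adj = build_adjacency(EDGES)
linked = {frozenset(e) for e in EDGES}
pairs = list(combinations(sorted(adj), 2))
X = [pair_features(adj, u, v) for u, v in pairs]
y = [int(frozenset(p) in linked) for p in pairs]   # 1 = linked, 0 = not linked

models = train_bagged(X, y)
scores = [predict_proba(models, x) for x in X]
print("training-set AUROC:", auroc(scores, y))
```

The sketch trains and scores on the same toy pairs only to stay short; a real evaluation would hold out a set of dataset pairs and report AUROC on those.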
Liu, H., Wang, T., Tang, J., Ning, H., & Wei, D. (2017). Link prediction of datasets sameAS interlinking network on web of data. In 2017 3rd International Conference on Information Management, ICIM 2017 (pp. 346–352). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/INFOMAN.2017.7950406