Extracting facets from lost fine-grained categorizations in dataspaces

2Citations
Citations of this article
5Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Categorization of instances in dataspaces is a difficult and time consuming task, usually performed by domain experts. In this paper we propose a semi-automatic approach to the extraction of facets for the fine-grained categorization of instances in dataspaces. We focus on the case where instances are categorized under heterogeneous taxonomies in several sources. Our approach leverages Taxonomy Layer Distance, a new metric based on structural analysis of source taxonomies, to support the identification of meaningful candidate facets. Once validated and refined by domain experts, the extracted facets provide a fine-grained classification of dataspace instances. We implemented and evaluated our approach in a real world dataspace in the eCommerce domain. Experimental results show that our approach is capable of extracting meaningful facets and that the new metric we propose for the structural analysis of source taxonomies outperforms other state-of-the-art metrics. © 2014 Springer International Publishing.

Cite

CITATION STYLE

APA

Porrini, R., Palmonari, M., & Batini, C. (2014). Extracting facets from lost fine-grained categorizations in dataspaces. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8484 LNCS, pp. 580–594). Springer Verlag. https://doi.org/10.1007/978-3-319-07881-6_39

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free