SSUP – A url-based method to entity-page discovery

Abstract

Entity-pages are Web pages that publish data representing exactly one instance of a certain conceptual entity. In this paper we propose SSUP, a new method for entity-page discovery. Specifically, given a sample entity-page from a Web site (e.g., the Jolyon Palmer entity-page from the GP2 Web site), we aim to find all entity-pages of the same type (driver entity-pages) on that Web site. We propose two structural URL similarity metrics and a set of algorithms that combine URL features with HTML features in order to improve the quality of the results and minimize the number of downloaded pages and the processing time. We evaluate our method on real-world Web sites and compare it with two baselines to demonstrate its effectiveness.
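
The abstract does not define the two structural URL similarity metrics, so the following is only an illustrative sketch of what a structure-based URL comparison could look like, not the SSUP metrics themselves. It assumes a simple heuristic: two URLs are comparable only if they share the host, the number of path segments, and the set of query keys, and the score is the fraction of path segments that match exactly. The function names and the example.org URLs are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl


def url_structure(url):
    """Split a URL into host, non-empty path segments, and sorted query keys."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    query_keys = sorted(k for k, _ in parse_qsl(parts.query, keep_blank_values=True))
    return parts.netloc.lower(), segments, query_keys


def structural_similarity(sample_url, candidate_url):
    """Illustrative score in [0, 1]; NOT one of the metrics defined in the paper.

    Returns 0.0 when host, path depth, or query keys differ; otherwise
    the fraction of path segments that are identical in both URLs.
    """
    host_a, path_a, keys_a = url_structure(sample_url)
    host_b, path_b, keys_b = url_structure(candidate_url)
    if host_a != host_b or len(path_a) != len(path_b) or keys_a != keys_b:
        return 0.0
    if not path_a:
        return 1.0
    matches = sum(1 for a, b in zip(path_a, path_b) if a == b)
    return matches / len(path_a)


def rank_candidates(sample_url, candidate_urls):
    """Order candidate URLs by decreasing similarity to the sample entity-page URL."""
    scored = [(structural_similarity(sample_url, u), u) for u in candidate_urls]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical URLs used only for illustration.
    sample = "http://www.example.org/drivers/jolyon-palmer"
    candidates = [
        "http://www.example.org/drivers/felipe-nasr",
        "http://www.example.org/teams/dams",
        "http://www.example.org/news/2014/05/race-report",
    ]
    for score, url in rank_candidates(sample, candidates):
        print(f"{score:.2f}  {url}")
```

Under this heuristic, a URL that shares the sample's directory but names a different entity scores higher than a URL at the same depth in another directory, and URLs at a different depth are excluded. Ranking candidates this way before fetching them is one way a URL-only score could reduce the number of downloaded pages, with HTML features applied afterwards to the top-ranked candidates.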

Manica, E., Galante, R., & Dorneles, C. F. (2014). SSUP – A url-based method to entity-page discovery. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8541, 254–271. https://doi.org/10.1007/978-3-319-08245-5_15