Entity-pages are Web pages that publish data representing exactly one instance of a certain conceptual entity. In this paper we propose SSUP, a new method for entity-page discovery. Specifically, given a sample entity-page from a Web site (e.g., the Jolyon Palmer entity-page from the GP2 Web site), we aim to find all entity-pages of the same type (driver entity-pages) on that Web site. We propose two structural URL similarity metrics and a set of algorithms that combine URL features with HTML features in order to improve the quality of the results and minimize the number of downloaded pages and the processing time. We evaluate our method on real-world Web sites and compare it with two baselines to demonstrate its effectiveness.
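To make the idea of a structural URL similarity concrete, the following is a minimal illustrative sketch. It is not one of the two metrics proposed in the paper, whose definitions are not given in this abstract. The tokenization, the "shape" abstraction, the 0.5 partial-match weight, and the example.com URLs are all assumptions chosen for illustration: two URLs are compared token by token, and a token pair counts fully when identical and partially when only its character-class pattern matches.

```python
import re
from urllib.parse import urlsplit


def url_tokens(url):
    """Split a URL into host, path-segment, and query-key tokens."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    query_keys = sorted(k.split("=")[0] for k in parts.query.split("&") if k)
    return [parts.netloc.lower()] + segments + query_keys


def shape(token):
    """Abstract a token to its character classes, e.g. 'driver12' -> 'a9'."""
    token = re.sub(r"[A-Za-z]+", "a", token)
    return re.sub(r"[0-9]+", "9", token)


def structural_similarity(url_a, url_b):
    """Hypothetical metric (an assumption, not the paper's definition).

    Returns 0.0 when the URLs have a different number of tokens; otherwise
    averages per-position scores: 1.0 for identical tokens, 0.5 for tokens
    that differ but share the same shape, 0.0 otherwise.
    """
    tokens_a, tokens_b = url_tokens(url_a), url_tokens(url_b)
    if len(tokens_a) != len(tokens_b):
        return 0.0
    score = 0.0
    for a, b in zip(tokens_a, tokens_b):
        if a == b:
            score += 1.0
        elif shape(a) == shape(b):
            score += 0.5
    return score / len(tokens_a)


# Hypothetical URLs; they do not reflect the real GP2 Web site layout.
sample = "http://www.example.com/drivers/jolyon-palmer"
candidate = "http://www.example.com/drivers/another-driver"
other = "http://www.example.com/news/2014/05/race-report"
```

Under these assumptions, `structural_similarity(sample, candidate)` scores high because only the last path segment differs and it keeps the same shape, while `structural_similarity(sample, other)` is 0.0 because the paths have different depths. A score like this could be used to rank candidate links before downloading them, which is the kind of saving in downloaded pages that the abstract describes.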
CITATION STYLE
Manica, E., Galante, R., & Dorneles, C. F. (2014). SSUP – A url-based method to entity-page discovery. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8541, 254–271. https://doi.org/10.1007/978-3-319-08245-5_15