This paper describes new machine learning approaches for predicting the correct homepage in response to a user’s homepage-finding query. The method has two phases. In the first phase, a decision tree is built to predict whether a URL is a homepage URL. The decision tree is then used to filter non-homepages out of the web pages returned by a standard vector space information retrieval system. In the second phase, logistic regression combines multiple sources of evidence on the homepages that remain after the first phase to predict which homepage is most relevant to the user’s query. The logistic regression model is trained on 100 queries and evaluated on a separate set of 145 test queries. Our results show that for about 84% of the test queries, the correct homepage was returned within the top 10 pages. Without any machine learning, only 59% of the test queries had their correct answers returned within the top 10 hits, which indicates that the machine learning approaches are effective.
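The two-phase pipeline and the top-10 evaluation can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `looks_like_homepage` is a hand-written rule standing in for the learned decision tree, the feature vectors, weights, and bias are hypothetical placeholders for the "multiple sources of evidence" and the fitted logistic regression coefficients, and all function names are invented for this example.

```python
import math
from urllib.parse import urlparse


def looks_like_homepage(url: str) -> bool:
    """Phase 1 stand-in (assumption): a hand-written rule in place of the
    learned decision tree. Keeps root URLs, top-level directories, and
    default index files that sit at most one directory deep."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    if not segments:
        return True  # bare host, e.g. http://example.edu/
    last = segments[-1].lower()
    is_index = last.startswith(("index.", "default.", "home."))
    depth = len(segments) - (1 if is_index else 0)
    return depth <= 1 and (is_index or path.endswith("/"))


def logistic_score(features, weights, bias):
    """Phase 2: logistic function applied to a weighted sum of evidence."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def rank_homepages(candidates, weights, bias, k=10):
    """candidates: (url, feature_vector) pairs from the retrieval system.
    Filters out non-homepages, then ranks the rest by logistic score."""
    kept = [(url, f) for url, f in candidates if looks_like_homepage(url)]
    kept.sort(key=lambda c: logistic_score(c[1], weights, bias), reverse=True)
    return [url for url, _ in kept[:k]]


def success_at_k(ranked_by_query, correct_by_query, k=10):
    """Fraction of queries whose correct homepage appears in the top k."""
    hits = sum(
        1
        for query, urls in ranked_by_query.items()
        if correct_by_query[query] in urls[:k]
    )
    return hits / len(ranked_by_query)
```

In a real system the filter would be a decision tree trained on labeled URLs and the weights would be fitted on the 100 training queries; `success_at_k` with `k=10` corresponds to the measure behind the 84% and 59% figures reported above.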
CITATION STYLE
Xi, W., Fox, E. A., Tan, R. P., & Shu, J. (2002). Machine learning approach for homepage finding task. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2476, pp. 145–159). Springer Verlag. https://doi.org/10.1007/3-540-45735-6_14