Abstract
Web pages contain pieces of information of unequal importance. Elements such as navigation bars, copyright notices, links, and advertisements are considered noise, or insignificant items, for the purposes of web mining. Only the informative content of a web page is useful for effective web mining, and the presence of noise can degrade the results of that task. A web page has several features, including the location of its information, the area that information occupies, and its content. Content in different portions of a web page carries different significance weights according to these features. The position and importance of content therefore play a vital role in identifying noise for removal. In this paper, a web page feature based method is proposed for identifying noise in web pages. K-means clustering is applied to separate main content and noise content into two clusters based on these features. To evaluate the clustering method, accuracy, precision, recall, and F-measure are calculated.
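The approach summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: the feature names (`edge_distance`, `area_fraction`, `link_density`), the toy block values, the farthest-point initialization, and the rule that the cluster with the larger mean area is the main content are all assumptions made for illustration. It clusters hypothetical page blocks into two groups with plain k-means and then computes the four evaluation measures named in the abstract.

```python
from math import dist

def kmeans(points, k=2, iters=100):
    """Plain Lloyd's k-means with a deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

def evaluate(truth, pred):
    """Accuracy, precision, recall and F-measure with 1 as the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(truth, pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(truth, pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(truth, pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(truth, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(truth), "precision": precision,
            "recall": recall, "f_measure": f_measure}

# Hypothetical blocks: (edge_distance, area_fraction, link_density), all scaled to [0, 1].
# edge_distance is 0 for a block at the page centre and 1 for a block at the page edge.
blocks = [
    (0.10, 0.45, 0.05),  # article body
    (0.15, 0.30, 0.10),  # article body
    (0.90, 0.05, 0.85),  # navigation bar
    (0.95, 0.03, 0.90),  # copyright footer
    (0.85, 0.08, 0.70),  # advertisement
    (0.20, 0.25, 0.15),  # article body
]
truth = [1, 1, 0, 0, 0, 1]  # hand-assigned labels: 1 = main content, 0 = noise

labels, centroids = kmeans(blocks, k=2)
# Assumption: the cluster whose centroid has the larger area fraction is main content.
main_cluster = max(range(2), key=lambda j: centroids[j][1])
predicted = [1 if l == main_cluster else 0 for l in labels]
metrics = evaluate(truth, predicted)
print(predicted, metrics)
```

K-means itself only produces unlabeled clusters, so some rule is needed to decide which of the two clusters is the main content; the area-based rule here is one simple possibility, and the paper may use a different one.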
Cite
Bhamare, S. S., & Pawar, B. V. (2020). Feature Based Identification of Web Page Noise through K-Means Clustering. International Journal of Innovative Technology and Exploring Engineering, 9(3), 1966–1970. https://doi.org/10.35940/ijitee.c9023.019320