Feature Based Identification of Web Page Noise through K-Means Clustering

S. S. Bhamare*; B. V. Pawar

Journal Article

International Journal of Innovative Technology and Exploring Engineering (2020) 9(3) 1966-1970

DOI: 10.35940/ijitee.c9023.019320

Abstract

Web pages contain pieces of information of unequal importance. Items such as navigation bars, copyright notices, links, and advertisements are considered noise, or insignificant items of the web page, for web mining. Only the informative content of a web page is useful for performing web mining tasks effectively, and the presence of noise can hamper the results of these tasks. A web page has several features, including the location of its information, the area that information occupies, and its content. Content in different portions of a web page carries different significance weights according to these features. The position and importance of content play a vital role in identifying noise in web pages for removal. This paper proposes a web page feature based method for identifying noise in web pages. The K-means clustering technique is applied to separate main content and noise content into two clusters based on these features. To evaluate the clustering method, accuracy, precision, recall, and F-measure are calculated.
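The approach outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three features (vertical position, fraction of page area, link density), the toy block values, the rule that the cluster with the larger mean area is the main content, and the function names `kmeans` and `evaluate` are all assumptions made for illustration. The scores it produces describe only this toy data and say nothing about the results reported in the paper.

```python
from math import dist

def kmeans(points, k=2, iters=100):
    """Plain Lloyd's k-means with a deterministic farthest-point start."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    assign = None
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_assign == assign:  # converged: no block changed cluster
            break
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign, centroids

def evaluate(predicted, actual, positive="noise"):
    """Accuracy, precision, recall and F-measure for the `positive` class."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    tn = sum(p != positive and a != positive for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f_measure": f_measure}

# Hypothetical page blocks: (vertical centre 0..1, fraction of page area, link density).
blocks = [
    ((0.05, 0.05, 0.90), "noise"),  # navigation bar
    ((0.50, 0.10, 0.80), "noise"),  # sidebar links
    ((0.45, 0.55, 0.05), "main"),   # article body
    ((0.60, 0.20, 0.10), "main"),   # secondary content
    ((0.30, 0.04, 1.00), "noise"),  # advertisement
    ((0.97, 0.03, 0.70), "noise"),  # copyright footer
]
features = [f for f, _ in blocks]
truth = [label for _, label in blocks]

assign, centroids = kmeans(features, k=2)
# Assumption: the cluster whose centroid has the larger area fraction is main content.
main_cluster = max(range(2), key=lambda j: centroids[j][1])
labels = ["main" if a == main_cluster else "noise" for a in assign]
scores = evaluate(labels, truth)
print(labels, scores)
```

K-means itself only returns unlabeled clusters, so some rule is needed to decide which of the two clusters is noise. The area-based rule used here is one simple choice; a real system would need to justify its own.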

Cite

Bhamare, S. S., & Pawar, B. V. (2020). Feature Based Identification of Web Page Noise through K-Means Clustering. International Journal of Innovative Technology and Exploring Engineering, 9(3), 1966–1970. https://doi.org/10.35940/ijitee.c9023.019320
