An Empirical Study to Classify Website Using Thresholds from Data Characteristics

Ruchika Malhotra; Anjali Sharma

Conference Proceedings

Advances in Intelligent Systems and Computing (2019) 904 433-446

DOI: 10.1007/978-981-13-5934-7_39

Abstract

The advent of the web has resulted in a plethora of information and data. However, its volume, heterogeneity, and unstructured organization make information retrieval difficult. Existing website categorization is largely based on style rather than text; adding genre as an extra dimension is expected to significantly improve search outcomes. With this in view, we attempt to build a novel classification model that categorizes websites into genres using thresholds of the web metrics. Statistical measures of central tendency are assumed to render a value that distinguishes websites within a sample space containing News, Travel and Tourism, Entertainment, and Social media. Through statistical analysis of the data, we find that the distributions of all metrics that constitute the website properties are highly skewed. Hence, conventional analysis based on normal-distribution statistics does not apply. Adopting a systematic empirical approach, we find that the classification performance, measured by the Area Under the Curve, is maximized around a threshold value that is twice the "median-absolute-deviation" of the web metrics.
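
The thresholding idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the metric values and genre labels below are invented for illustration, the helper names (`mad`, `classify_by_threshold`, `auc`) are hypothetical, and the threshold is taken literally as 2 × MAD, which is one possible reading of the abstract.

```python
from statistics import median

def mad(values):
    """Median absolute deviation: median of |x - median(x)|."""
    m = median(values)
    return median([abs(x - m) for x in values])

def classify_by_threshold(values, k=2.0):
    """Flag values above k * MAD (k = 2 follows the abstract's reported optimum)."""
    threshold = k * mad(values)
    return threshold, [1 if v > threshold else 0 for v in values]

def auc(scores, labels):
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical, right-skewed values of a single web metric (not data from the paper).
metric = [1, 2, 2, 3, 4, 5, 40, 60]
# Hypothetical labels: 1 = site belongs to the target genre, 0 = it does not.
labels = [0, 0, 0, 0, 0, 1, 1, 1]

threshold, predictions = classify_by_threshold(metric)
score = auc(predictions, labels)
print(threshold, predictions, score)
```

The MAD is used here instead of the standard deviation because it is robust to the heavy skew that the abstract reports for the web metrics. Sweeping `k` over a range and recording `auc` for each value would be one way to check where performance peaks on a given data set.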

Cite

Malhotra, R., & Sharma, A. (2019). An Empirical Study to Classify Website Using Thresholds from Data Characteristics. In Advances in Intelligent Systems and Computing (Vol. 904, pp. 433–446). Springer Verlag. https://doi.org/10.1007/978-981-13-5934-7_39
