An Empirical Study to Classify Website Using Thresholds from Data Characteristics

Abstract

The advent of the web has resulted in a plethora of information and data. However, its volume, heterogeneity, and unstructured organization make information retrieval difficult. Existing website categorization is largely based on style rather than text; adding genre as an extra dimension is expected to significantly improve search outcomes. With this in view, we attempt to build a novel classification model that categorizes websites into genres using thresholds of web metrics. Statistical measures of central tendency are assumed to yield a value that distinguishes websites within a sample space containing News, Travel and Tourism, Entertainment, and Social Media. Statistical analysis of the data shows that the distributions of all metrics that constitute the website properties are highly skewed. Hence, conventional analysis based on normal-distribution statistics does not apply. Adopting a systematic empirical approach, we find that classification performance, measured by the Area Under the Curve, is maximized around a threshold equal to twice the median absolute deviation of the web metrics.
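
The thresholding idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the threshold is applied, so the sketch assumes a metric value is flagged when its distance from the median exceeds k times the median absolute deviation (k = 2 matching the reported optimum). The function names, the sample metric values, and the labels are all hypothetical.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: median of |x - median(x)|."""
    m = median(values)
    return median([abs(x - m) for x in values])


def classify_by_mad(values, k=2.0):
    """Flag values whose distance from the median exceeds k * MAD.

    ASSUMPTION: applying the threshold to the deviation from the median
    is an illustrative choice; the abstract does not specify this.
    """
    m = median(values)
    threshold = k * mad(values)
    return [1 if abs(x - m) > threshold else 0 for x in values]


def auc_single_threshold(predictions, labels):
    """AUC of a 0/1 classifier at one operating point: (TPR + TNR) / 2.

    Requires at least one positive and one negative label.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    tp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 0)
    return 0.5 * (tp / pos + tn / neg)


# Hypothetical values of one web metric and hypothetical genre labels
# (1 = target genre, 0 = other); not data from the paper.
metric_values = [12, 15, 14, 90, 13, 85, 16, 95]
genre_labels = [0, 0, 0, 1, 0, 1, 0, 1]

predictions = classify_by_mad(metric_values, k=2.0)
auc = auc_single_threshold(predictions, genre_labels)
```

Because the median and the MAD are rank-based, a few extreme values barely move them, which is why they suit the highly skewed metric distributions the abstract describes better than the mean and standard deviation.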

Cite

Malhotra, R., & Sharma, A. (2019). An Empirical Study to Classify Website Using Thresholds from Data Characteristics. In Advances in Intelligent Systems and Computing (Vol. 904, pp. 433–446). Springer Verlag. https://doi.org/10.1007/978-981-13-5934-7_39
