Importance of HTML structural elements and metadata in automated subject classification

Abstract

The aim of the study was to determine how significance indicators assigned to different Web page elements (internal metadata, title, headings, and main text) influence automated classification. The data collection that was used comprised 1000 Web pages in engineering, to which Engineering Information classes had been manually assigned. The significance indicators were derived using several different methods: (total and partial) precision and recall, semantic distance and multiple regression. It was shown that for best results all the elements have to be included in the classification process. The exact way of combining the significance indicators turned out not to be overly important: using the F1 measure, the best combination of significance indicators yielded no more than 3% higher performance results than the baseline. © Springer-Verlag Berlin Heidelberg 2005.
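To make the idea of weighting page elements concrete, here is a minimal sketch. It assumes that a candidate class is scored by a weighted sum of term matches found in each element, and that results are evaluated with the F1 measure against manually assigned classes. The weight values, the names `WEIGHTS`, `class_score` and `f1`, and the weighted-sum form itself are illustrative assumptions and are not taken from the study.

```python
from typing import Dict, Set

# Hypothetical significance indicators (weights) per Web page element.
# These values are illustrative only; they are not the ones derived in the study.
WEIGHTS: Dict[str, float] = {
    "metadata": 2.0,
    "title": 3.0,
    "headings": 2.0,
    "text": 1.0,
}


def class_score(term_matches: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of term matches for one candidate class, grouped by page element."""
    return sum(weights.get(element, 0.0) * count for element, count in term_matches.items())


def f1(assigned: Set[str], correct: Set[str]) -> float:
    """Harmonic mean of precision and recall for automatically vs. manually assigned classes."""
    true_positives = len(assigned & correct)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(assigned)
    recall = true_positives / len(correct)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # One match in the title and four in the main text for a single candidate class.
    print(class_score({"title": 1, "text": 4}, WEIGHTS))
    # Two classes assigned automatically, three assigned manually, one in common.
    print(f1({"a", "b"}, {"b", "c", "d"}))
```

Under these assumptions, comparing weighting schemes amounts to swapping the `WEIGHTS` dictionary (for example, setting every weight to 1.0 as a baseline) and comparing the resulting F1 values over the collection.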

Golub, K., & Ardö, A. (2005). Importance of HTML structural elements and metadata in automated subject classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3652 LNCS, pp. 368–378). https://doi.org/10.1007/11551362_33
