Efficient Cross-Domain Classification of Weblogs

Elisabeth Lex; Christin Seifert; Michael Granitzer; Andreas Juffinger

Journal Article, Open Access

International Journal of Intelligent Computing Research (2010) 1(3) 55-62

DOI: 10.20533/ijicr.2042.4655.2010.0007

Abstract

Text classification is one of the core applications in data mining due to the huge amount of uncategorized textual data available. Training a text classifier results in a classification model that reflects the characteristics of the domain it was learned on. However, if no training data is available, labeled data from a related but different domain might be exploited to perform cross-domain classification. In our work, we aim to accurately classify unlabeled weblogs into commonly agreed upon newspaper categories using labeled data from the news domain. The labeled news corpus and the unlabeled blog corpus are highly dynamic, growing hourly and exhibiting topic drift, so the classification needs to be efficient. Our approach is to apply a fast, novel centroid-based text classification algorithm, the Class-Feature-Centroid Classifier (CFC), to perform efficient cross-domain classification. Experiments showed that this algorithm achieves accuracy comparable to that of k-Nearest Neighbour (k-NN) and Support Vector Machines (SVM), yet at linear time cost for training and classification. We investigate the classifier performance and generalization ability using a special visualization of classifiers. The benefit of our approach is that the linear time complexity enables us to efficiently generate an accurate classifier, reflecting the topic drift, several times per day on a huge dataset.
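To make the general idea of centroid-based cross-domain classification concrete, the following is a minimal sketch, not the CFC algorithm itself. It assumes a plain nearest-centroid classifier with raw term counts and a dot-product score against unit-length class centroids; the CFC term weighting is not specified in the abstract and is not reproduced here. The function names (`tokenize`, `train_centroids`, `classify`) and the toy news and blog texts are hypothetical and serve only as illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def train_centroids(labeled_docs):
    """Build one unit-length centroid per class from (text, label) pairs.

    A single pass over all tokens, so training cost is linear in corpus size.
    """
    sums = defaultdict(Counter)
    for text, label in labeled_docs:
        sums[label].update(tokenize(text))

    centroids = {}
    for label, counts in sums.items():
        norm = math.sqrt(sum(v * v for v in counts.values()))
        centroids[label] = {term: v / norm for term, v in counts.items()}
    return centroids


def classify(text, centroids):
    """Return the label whose centroid has the highest dot product with the document."""
    counts = Counter(tokenize(text))
    scores = {
        label: sum(c * centroid.get(term, 0.0) for term, c in counts.items())
        for label, centroid in centroids.items()
    }
    return max(scores, key=scores.get)


# Hypothetical labeled source domain (news) ...
news = [
    ("the team won the match in the final", "sports"),
    ("the striker scored a late goal", "sports"),
    ("parliament passed the budget vote", "politics"),
    ("the minister announced an election", "politics"),
]
centroids = train_centroids(news)

# ... applied to an unlabeled target domain (blog posts).
for post in [
    "what a goal my team played a great match",
    "i doubt the election vote changes the budget",
]:
    print(post, "->", classify(post, centroids))
```

Because training is one pass over the labeled news and classification compares each blog post against a small fixed set of centroids, a model of this kind can be rebuilt frequently as new labeled news arrives, which is the property the abstract relies on to track topic drift.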

Cite

Lex, E., Seifert, C., Granitzer, M., & Juffinger, A. (2010). Efficient Cross-Domain Classification of Weblogs. International Journal of Intelligent Computing Research, 1(3), 55–62. https://doi.org/10.20533/ijicr.2042.4655.2010.0007
