Efficient Cross-Domain Classification of Weblogs

  • Lex E
  • Seifert C
  • Granitzer M
  • et al.

Abstract

Text classification is one of the core applications in data mining due to the huge amount of uncategorized textual data available. Training a text classifier results in a classification model that reflects the characteristics of the domain it was learned on. However, if no training data is available, labeled data from a related but different domain might be exploited to perform cross-domain classification. In our work, we aim to accurately classify unlabeled weblogs into commonly agreed upon newspaper categories using labeled data from the news domain. The labeled news corpus and the unlabeled blog corpus are highly dynamic, grow hourly, and exhibit topic drift, so the classification needs to be efficient. Our approach is to apply a fast, novel centroid-based text classification algorithm, the Class-Feature-Centroid Classifier (CFC), to perform efficient cross-domain classification. Experiments showed that this algorithm achieves accuracy comparable to that of k-Nearest Neighbour (k-NN) and Support Vector Machines (SVM), yet at linear time cost for training and classification. We investigate the classifier performance and generalization ability using a special visualization of classifiers. The benefit of our approach is that the linear time complexity enables us to efficiently generate an accurate classifier, reflecting the topic drift, several times per day on a huge dataset.
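
As a rough illustration of the centroid-based idea, the sketch below builds one centroid per news category in a single pass over the labeled documents and assigns a blog post to the most similar centroid. It is a plain term-frequency centroid classifier, not the CFC weighting scheme used in the paper; the tokenizer, the function names, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for real preprocessing)."""
    return text.lower().split()


def train_centroids(labeled_docs):
    """Build one L2-normalized term-frequency centroid per class.

    Each document is visited once, so training time is linear in corpus size.
    """
    sums = defaultdict(Counter)
    for text, label in labeled_docs:
        sums[label].update(tokenize(text))
    centroids = {}
    for label, counts in sums.items():
        norm = math.sqrt(sum(v * v for v in counts.values()))
        centroids[label] = {t: v / norm for t, v in counts.items()}
    return centroids


def classify(text, centroids):
    """Return the class whose centroid has the largest dot product with the document."""
    doc = Counter(tokenize(text))

    def score(label):
        centroid = centroids[label]
        return sum(tf * centroid.get(t, 0.0) for t, tf in doc.items())

    return max(centroids, key=score)


# Hypothetical labeled news snippets (source domain).
news = [
    ("parliament votes on new tax law", "politics"),
    ("minister announces election date", "politics"),
    ("striker scores twice in cup final", "sports"),
    ("coach praises team after league win", "sports"),
]
centroids = train_centroids(news)

# Hypothetical unlabeled blog post (target domain).
blog_post = "watched the cup final last night what a win for the team"
predicted = classify(blog_post, centroids)
print(predicted)
```

Because training is a single pass over the labeled corpus, the centroids can be rebuilt cheaply whenever new news articles arrive, which is the property the abstract relies on to follow topic drift.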

Cite

Lex, E., Seifert, C., Granitzer, M., & Juffinger, A. (2010). Efficient Cross-Domain Classification of Weblogs. International Journal of Intelligent Computing Research, 1(3), 55–62. https://doi.org/10.20533/ijicr.2042.4655.2010.0007
