Domain adaptation is usually discussed from the point of view of new algorithms that minimise performance loss when applying a classifier trained on one domain to another. However, finding pertinent data similar to the test domain is equally important for achieving high accuracy in a cross-domain task. This study proposes an algorithm for automatic estimation of performance loss in the context of cross-domain sentiment classification. We present and validate several measures of domain similarity specifically designed for the sentiment classification task. We also introduce a new characteristic, called domain complexity, as another independent factor influencing performance loss, and propose various functions for its approximation. Finally, a linear regression model of accuracy loss is built and tested in different evaluation settings. As a result, we are able to predict the accuracy loss with an average error of 1.5% and a maximum error of 3.4%. © 2012 Springer-Verlag.
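The regression step described in the abstract can be illustrated with a minimal sketch. It assumes that each source/target domain pair is summarised by two numbers, a similarity score and a complexity score, and that accuracy loss is modelled as a linear function of both. The function names, the feature names, and the data are hypothetical: the numbers are synthetic, generated from a known linear rule so the fit can be checked, and are not results from the paper. The similarity and complexity measures themselves are not reproduced here.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without mutating the inputs.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_loss_model(similarity, complexity, loss):
    """Ordinary least squares for loss ~ w0 + w1*similarity + w2*complexity."""
    rows = [[1.0, s, c] for s, c in zip(similarity, complexity)]
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, loss)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def predict_loss(weights, similarity, complexity):
    """Predicted accuracy loss (percentage points) for one domain pair."""
    return weights[0] + weights[1] * similarity + weights[2] * complexity


# Synthetic, illustrative data only (NOT taken from the paper).
# Each position is one hypothetical source/target domain pair.
similarity = [0.2, 0.5, 0.8, 0.4, 0.9, 0.3]
complexity = [0.1, 0.3, 0.2, 0.6, 0.5, 0.4]
# Losses generated from an assumed rule: 10 - 8*similarity + 5*complexity.
loss = [10.0 - 8.0 * s + 5.0 * c for s, c in zip(similarity, complexity)]

weights = fit_loss_model(similarity, complexity, loss)
estimate = predict_loss(weights, 0.6, 0.2)
```

In this toy setting the fit recovers the generating rule because the data contain no noise. With real measurements, the quality of the model would be judged by the prediction error on held-out domain pairs, as the abstract describes.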
CITATION STYLE
Ponomareva, N., & Thelwall, M. (2012). Biographies or blenders: Which resource is best for cross-domain sentiment analysis? In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7181 LNCS, pp. 488–499). https://doi.org/10.1007/978-3-642-28604-9_40