Exploratory class-imbalanced and non-identical data distribution in automatic keyphrase extraction

Weijian Ni; Tong Liu; Qingtian Zeng

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2012), Vol. 7368 LNCS (Part 2), pp. 336–345

DOI: 10.1007/978-3-642-31362-2_38

Abstract

While supervised learning algorithms hold much promise for automatic keyphrase extraction, most of them presume that samples are evenly distributed among classes and drawn from an identical distribution, assumptions that may not hold in the real-world task of extracting keyphrases from documents. In this paper, we propose a novel supervised keyphrase extraction approach that addresses class-imbalanced and non-identical data distributions in automatic keyphrase extraction. Our approach is essentially a stacking approach: meta-models are trained on balanced partitions of a given training set and then combined by introducing meta-features that describe particular keyphrase patterns embedded in each document. Experimental results verify the effectiveness of our approach. © 2012 Springer-Verlag.
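
The abstract does not give implementation details. As a rough, non-authoritative illustration of the two ideas it names (models trained on balanced partitions of an imbalanced training set, then combined by a stacking learner that also sees document-level meta-features), the following minimal sketch may help. It is not the authors' method: the partitioning rule (all positives paired with a disjoint, equal-sized chunk of negatives), the use of plain logistic regression for both the partition models and the combiner, the held-out split for fitting the combiner, and the opaque `doc_meta` vector standing in for "keyphrase pattern" meta-features are all assumptions, and every function name is hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Weights = List[float]  # weights[0] is the bias term
Sample = Tuple[Vector, Vector, int]  # (candidate features, doc meta-features, label)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Weights, x: Vector) -> float:
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logreg(X: Sequence[Vector], y: Sequence[int],
                 epochs: int = 200, lr: float = 0.1) -> Weights:
    """Plain SGD logistic regression (assumed stand-in for any learner)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = predict(w, xi) - yi  # gradient of log-loss w.r.t. the logit
            w[0] -= lr * g
            for j, xj in enumerate(xi):
                w[j + 1] -= lr * g * xj
    return w


def balanced_partitions(pos: List[Vector], neg: List[Vector],
                        seed: int = 0) -> List[Tuple[List[Vector], List[Vector]]]:
    """Hypothetical rule: pair all positives with disjoint chunks of negatives."""
    shuffled = neg[:]
    random.Random(seed).shuffle(shuffled)
    k = max(1, len(shuffled) // max(1, len(pos)))
    return [(pos, shuffled[i::k]) for i in range(k)]


def stack_features(base: List[Weights], x: Vector, doc_meta: Vector) -> Vector:
    """Partition-model scores followed by hypothetical document meta-features."""
    return [predict(w, x) for w in base] + list(doc_meta)


def fit_stack(train: List[Sample], holdout: List[Sample], seed: int = 0):
    pos = [x for x, _, y in train if y == 1]
    neg = [x for x, _, y in train if y == 0]
    base = []
    for p, n in balanced_partitions(pos, neg, seed):
        base.append(train_logreg(p + n, [1] * len(p) + [0] * len(n)))
    # The combiner is fit on held-out data so it sees scores the
    # partition models were not trained on (usual stacking practice).
    Z = [stack_features(base, x, m) for x, m, _ in holdout]
    meta = train_logreg(Z, [y for _, _, y in holdout])
    return base, meta


def score(model, x: Vector, doc_meta: Vector) -> float:
    base, meta = model
    return predict(meta, stack_features(base, x, doc_meta))
```

Splitting the negatives into disjoint chunks, rather than undersampling them once, means every negative candidate is used by some partition model while each model still trains on a balanced set.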

Cite (APA)

Ni, W., Liu, T., & Zeng, Q. (2012). Exploratory class-imbalanced and non-identical data distribution in automatic keyphrase extraction. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7368 LNCS, pp. 336–345). https://doi.org/10.1007/978-3-642-31362-2_38
