To Be or not to Be, Tail Labels in Extreme Multi-label Learning


Abstract

eXtreme Multi-label Learning (XML) aims to predict, for each instance, its most relevant subset of labels from an extremely large label space, often exceeding one million labels in many real applications. In XML scenarios, the labels exhibit a long-tailed distribution, in which a significant number of labels appear in very few instances; these are referred to as tail labels. Unfortunately, due to the lack of positive instances, tail labels are difficult both to learn and to predict. Several previous studies have even suggested that tail labels can be removed outright based on their label frequencies. We argue that such a crude principle may discard many significant tail labels, because predictive accuracy is not strictly consistent with label frequency, especially for tail labels. In this paper, we seek a reasonable principle for deciding whether a tail label should be removed, one that does not depend on label frequency alone. To this end, we investigate a method named Nearest Neighbor Positive Proportion Score (N2P2S), which scores tail labels using the annotations of each instance's neighbors. Extensive empirical results indicate that the proposed N2P2S can effectively screen tail labels: many of the preserved tail labels can be learned and accurately predicted even with very few positive instances.
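The abstract's idea, scoring a tail label by how often the neighbors of its positive instances also carry that label, can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact formulation: the function name, the Euclidean k-NN, and the averaging scheme are all assumptions made for the example.

```python
import numpy as np

def n2p2s_scores(X, Y, tail_labels, k=5):
    """Illustrative nearest-neighbor positive-proportion scoring.

    X: (n, d) feature matrix; Y: (n, L) binary label matrix.
    For each tail label, average, over its positive instances,
    the fraction of each instance's k nearest neighbors that are
    also positive for that label.
    """
    # Pairwise Euclidean distances (brute-force; fine for a sketch).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude each instance itself
    nn = np.argsort(d, axis=1)[:, :k]    # k nearest neighbors per instance
    scores = {}
    for l in tail_labels:
        pos = np.where(Y[:, l] == 1)[0]
        if pos.size == 0:
            scores[l] = 0.0
            continue
        # Proportion of neighbors also positive for label l,
        # averaged over the label's positive instances.
        scores[l] = float(Y[nn[pos], l].mean(axis=1).mean())
    return scores
```

Under this sketch, a tail label whose few positive instances cluster together in feature space receives a high score (its neighbors share the label), while one whose positives are scattered scores near zero, which matches the intuition that only the former kind is worth keeping.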

Citation (APA)

Ge, Z., & Li, X. (2021). To Be or not to Be, Tail Labels in Extreme Multi-label Learning. In International Conference on Information and Knowledge Management, Proceedings (pp. 555–564). Association for Computing Machinery. https://doi.org/10.1145/3459637.3482303
