On dataless hierarchical text classification

Yangqiu Song; Dan Roth

Conference Proceedings, Open Access

Proceedings of the National Conference on Artificial Intelligence (2014), Vol. 2, pp. 1579–1585

DOI: 10.1609/aaai.v28i1.8938

86 Citations

109 Readers

Abstract

In this paper, we systematically study the problem of dataless hierarchical text classification. Unlike standard text classification schemes that rely on supervised training, dataless classification depends on understanding the labels of the sought-after categories and requires no labeled data. Given a collection of text documents and a set of labels, we show that understanding the labels can be used to accurately categorize the documents. This is done by embedding both labels and documents in a semantic space in which one can compute a meaningful semantic similarity between a document and a potential label. We show that this scheme can support accurate multiclass classification without any supervision. We study several semantic representations and show how to improve the classification using bootstrapping. Our results show that bootstrapped dataless classification is competitive with supervised classification trained on thousands of labeled examples.
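The core idea in the abstract, scoring a document against each label in a shared vector space and picking the closest label, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `embed` function is a toy bag-of-words stand-in for the semantic representations the paper studies, and the label descriptions, the `classify` helper, and the sample document are hypothetical. Bootstrapping and the label hierarchy are not shown.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a semantic embedding: a lowercase bag-of-words vector.

    This is an assumption for illustration, not the paper's representation.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(document, label_descriptions):
    """Return the label whose description is most similar to the document.

    No labeled training documents are used, only the label descriptions.
    """
    doc_vec = embed(document)
    scores = {
        label: cosine(doc_vec, embed(description))
        for label, description in label_descriptions.items()
    }
    return max(scores, key=scores.get)


# Hypothetical label descriptions standing in for "understanding the labels".
labels = {
    "sports": "sports game team player score match",
    "politics": "politics government election vote policy",
}

doc = "The team won the match after a late score by a young player."
predicted = classify(doc, labels)
print(predicted)
```

In this sketch the only knowledge about each category comes from its description, which is what makes the setup dataless; a richer embedding would replace `embed` without changing the rest.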

Cite

Song, Y., & Roth, D. (2014). On dataless hierarchical text classification. In Proceedings of the National Conference on Artificial Intelligence (Vol. 2, pp. 1579–1585). AI Access Foundation. https://doi.org/10.1609/aaai.v28i1.8938
