Unsupervised methods for developing taxonomies by combining syntactic and statistical information

Dominic Widdows

Conference Proceedings (Open Access)

Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, HLT-NAACL 2003 (2003)

DOI: 10.3115/1073445.1073481

77 citations

131 readers

Abstract

This paper describes an unsupervised algorithm for placing unknown words into a taxonomy and evaluates its accuracy on a large and varied sample of words. The algorithm works by first using a large corpus to find semantic neighbors of the unknown word, which we accomplish by combining latent semantic analysis with part-of-speech information. We then place the unknown word in the part of the taxonomy where these neighbors are most concentrated, using a class-labelling algorithm developed especially for this task. This method is used to reconstruct parts of the existing WordNet database, obtaining results for common nouns, proper nouns and verbs. We evaluate the contribution made by part-of-speech tagging and show that automatic filtering using the class-labelling algorithm gives a fourfold improvement in accuracy.
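The placement step described in the abstract (putting the unknown word where its neighbors are most concentrated) can be illustrated with a small sketch. This is not the paper's class-labelling algorithm: the toy taxonomy `TOY_PARENT`, the function names, the inverse-square reward for nearby ancestors, and the `penalty` value are all assumptions chosen for illustration, and the semantic neighbors are taken as given rather than computed from a corpus.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy taxonomy: child -> parent (None marks the root).
TOY_PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "horse": "animal",
    "car": "vehicle",
}


def ancestors(node: str, parent: Dict[str, Optional[str]]) -> List[Tuple[str, int]]:
    """Return (ancestor, distance) pairs for `node`, nearest ancestor first."""
    out: List[Tuple[str, int]] = []
    dist = 1
    cur = parent.get(node)
    while cur is not None:
        out.append((cur, dist))
        cur = parent.get(cur)
        dist += 1
    return out


def best_class(
    neighbors: List[str],
    parent: Dict[str, Optional[str]],
    penalty: float = 0.25,  # assumed cost for each neighbor a class fails to cover
) -> Tuple[str, Dict[str, float]]:
    """Pick the class under which the neighbors are most concentrated.

    Each candidate class gains 1/d^2 for every neighbor it subsumes at
    distance d, and loses `penalty` for every neighbor it does not subsume.
    """
    anc = {n: dict(ancestors(n, parent)) for n in neighbors}
    candidates = {a for dists in anc.values() for a in dists}
    if not candidates:
        raise ValueError("no neighbor has an ancestor in the taxonomy")
    scores = {
        c: sum(1.0 / anc[n][c] ** 2 if c in anc[n] else -penalty for n in neighbors)
        for c in candidates
    }
    # Iterate in sorted order so ties are broken deterministically.
    winner = max(sorted(scores), key=lambda c: scores[c])
    return winner, scores


if __name__ == "__main__":
    # Suppose corpus statistics returned these neighbors for an unknown word.
    label, table = best_class(["dog", "cat", "car"], TOY_PARENT)
    print(label, table)
```

In this sketch a specific class that covers most neighbors beats both the overly general root, which covers every neighbor but only at a distance, and a narrow class that covers a single neighbor.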

Citation (APA)

Widdows, D. (2003). Unsupervised methods for developing taxonomies by combining syntactic and statistical information. In Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, HLT-NAACL 2003. Association for Computational Linguistics (ACL). https://doi.org/10.3115/1073445.1073481
