We propose AutoName, an unsupervised framework that extracts a name for a set of query entities from a large-scale text corpus. Entity-set naming is useful in many tasks related to natural language processing and information retrieval such as session-based and conversational information seeking. Previous studies mainly extract set names from knowledge bases which provide highly reliable entity relations, but suffer from limited coverage of entities and set names that represent broad semantic classes. To address these problems, AutoName generates hypernym-anchored candidate phrases via probing a pre-trained language model and the entities' context in documents. Phrases are then clustered to identify ones that describe common concepts among query entities. Finally, AutoName ranks refined phrases based on the co-occurrences of their words with query entities and the conceptual integrity of their respective clusters. We built a new benchmark dataset for this task, consisting of 130 entity sets with name labels. Experimental results show that AutoName generates coherent and meaningful set names and significantly outperforms all baselines.
CITATION STYLE
Huang, Z., Rahimi, R., Yu, P., Shang, J., & Allan, J. (2021). AutoName: A Corpus-Based Set Naming Framework. In SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 2102–2105). Association for Computing Machinery, Inc. https://doi.org/10.1145/3404835.3463100
Mendeley helps you to discover research relevant for your work.