Building topic models in a federated digital library through selective document exclusion

Abstract

Building topic models in federated digital collections presents numerous challenges due to metadata inconsistencies. The quality of topical metadata is difficult to ascertain, and it is interspersed with often irrelevant administrative metadata. In this study, we propose a way to improve topic modeling in large collections by identifying documents that convey only weak topical information. These documents are ignored when training topic models; their topical associations are instead inferred after model training. A method is outlined for identifying weakly topical documents by defining runs of similar documents in a collection. In a preliminary evaluation using a corpus from the Institute of Museum and Library Services Digital Collections and Content aggregation, results show an increase in coherence among words in topics. In showing this, we demonstrate that it may be beneficial to induce topic models using less, but higher-quality, data.
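The exclude-then-infer workflow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy corpus, the similarity threshold, and the use of consecutive-record cosine similarity to flag "runs" are all assumptions, and scikit-learn's LDA stands in for whatever topic model the study used.

```python
# Hedged sketch: documents occurring in runs of highly similar records
# (often repeated administrative metadata) are held out of topic-model
# training; their topic mixtures are inferred afterward.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "civil war letters from soldiers at the front",
    "photograph collection of city street scenes",
    "item record scanned by library staff",   # run of near-identical
    "item record scanned by library staff",   # administrative records
    "item record scanned by library staff",
    "maps of railroad routes and stations",
]

X = CountVectorizer().fit_transform(docs)
sim = cosine_similarity(X)

# Flag documents inside a "run": consecutive records whose pairwise
# similarity exceeds a threshold (0.9 here, an assumed value).
in_run = np.zeros(len(docs), dtype=bool)
for i in range(len(docs) - 1):
    if sim[i, i + 1] > 0.9:
        in_run[i] = in_run[i + 1] = True

# Train the topic model only on documents outside runs...
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X[~in_run])

# ...then infer topic mixtures for the excluded documents post hoc.
theta_excluded = lda.transform(X[in_run])
print(in_run.sum(), "documents excluded from training")
```

In this sketch the three identical administrative records are excluded from training, yet still receive topic assignments via `transform`, mirroring the paper's idea of inducing the model from less, higher-quality data.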

Citation (APA)

Efron, M., Organisciak, P., & Fenlon, K. (2011). Building topic models in a federated digital library through selective document exclusion. In Proceedings of the ASIST Annual Meeting (Vol. 48). John Wiley and Sons Inc. https://doi.org/10.1002/meet.2011.14504801048
