Dirichlet-multinomial (D-M) mixtures like latent Dirichlet allocation (LDA) are widely used for both topic modeling and clustering. Prior work on constructing Levin-style semantic verb clusters achieves state-of-the-art results using D-M mixtures for verb sense induction and clustering. We add a bias toward known clusters by explicitly labeling a small number of observations with their correct VerbNet class. We demonstrate that this partial supervision guides the resulting clusters effectively, improving the recovery of both labeled and unlabeled classes by 16%, for a joint 12% absolute improvement in F1 score compared to clustering without supervision. The resulting clusters are also more semantically coherent. Although the technical change is minor, it produces a large effect, with important practical consequences for supervised topic modeling in general.
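The core idea, a D-M mixture in which a few observations are pinned to known classes, can be illustrated with a small collapsed Gibbs sampler. This is a minimal sketch, not the authors' model. It assumes a finite mixture with a fixed number of clusters `K`, observations represented as bags of feature tokens, and symmetric Dirichlet priors `alpha` and `beta`. It also assumes that "labeling" an observation means clamping its cluster assignment so that it is never resampled. The function name `gibbs_dmm` and all of its parameters are hypothetical. The conditional used for unlabeled observations is the standard collapsed update for a Dirichlet-multinomial mixture.

```python
import math
import random
from collections import Counter


def gibbs_dmm(docs, num_clusters, labels=None, alpha=0.1, beta=0.1,
              iterations=50, seed=0):
    """Hypothetical sketch: collapsed Gibbs sampling for a finite
    Dirichlet-multinomial mixture with partially clamped assignments.

    docs:   list of observations, each a list of feature tokens.
    labels: optional {observation_index: cluster_index}; these assignments
            are fixed for the whole run (the assumed form of supervision).
    """
    rng = random.Random(seed)
    labels = labels or {}
    vocab_size = len({w for d in docs for w in d})
    counts = [Counter(d) for d in docs]
    lengths = [len(d) for d in docs]

    # Per-cluster sufficient statistics.
    m = [0] * num_clusters                          # observations per cluster
    n = [0] * num_clusters                          # tokens per cluster
    nw = [Counter() for _ in range(num_clusters)]   # token counts per cluster

    def update(i, k, sign):
        # Add (sign=+1) or remove (sign=-1) observation i from cluster k.
        m[k] += sign
        n[k] += sign * lengths[i]
        for w, c in counts[i].items():
            nw[k][w] += sign * c

    # Labeled observations start (and stay) in their known cluster;
    # unlabeled ones start in a random cluster.
    z = []
    for i in range(len(docs)):
        k = labels.get(i, rng.randrange(num_clusters))
        z.append(k)
        update(i, k, +1)

    for _ in range(iterations):
        for i in range(len(docs)):
            if i in labels:
                continue  # clamped: never resample a labeled observation
            update(i, z[i], -1)
            log_p = []
            for k in range(num_clusters):
                # Prior term: cluster popularity.
                lp = math.log(m[k] + alpha)
                # Likelihood of the observation's tokens under cluster k.
                for w, c in counts[i].items():
                    for j in range(c):
                        lp += math.log(nw[k][w] + beta + j)
                for j in range(lengths[i]):
                    lp -= math.log(n[k] + vocab_size * beta + j)
                log_p.append(lp)
            top = max(log_p)  # subtract max for numerical stability
            weights = [math.exp(lp - top) for lp in log_p]
            z[i] = rng.choices(range(num_clusters), weights=weights)[0]
            update(i, z[i], +1)
    return z
```

For example, `gibbs_dmm(docs, 2, labels={0: 0, 3: 1})` keeps observations 0 and 3 fixed in clusters 0 and 1 while sampling the rest. Because labeled observations contribute to their cluster's counts throughout, they bias which unlabeled observations that cluster attracts. This is one simple way to realize "a bias toward known clusters."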
Peterson, D., Brown, S. W., & Palmer, M. (2020). Verb class induction with partial supervision. In AAAI 2020 - 34th AAAI Conference on Artificial Intelligence (pp. 8616–8623). AAAI Press. https://doi.org/10.1609/aaai.v34i05.6385