Low-rank approximations of second-order document representations

Jarkko Lagus; Janne Sinkkonen; Arto Klami

Conference Proceedings, Open Access

CoNLL 2019 - 23rd Conference on Computational Natural Language Learning, Proceedings of the Conference (2019) 634-644

DOI: 10.18653/v1/k19-1059

Abstract

Document embeddings, created with methods ranging from simple heuristics to statistical and deep models, are widely applicable. Bag-of-vectors models for documents include the mean and quadratic approaches (Torki, 2018). We present evidence that quadratic statistics alone, without the mean information, can offer superior accuracy, fast document comparison, and compact document representations. In matching news articles to their comment threads, low-rank representations only 3-4 times the size of the mean vector give the most accurate matching, and in standard sentence comparison tasks the results are state of the art despite faster computation. Similarity measures are discussed, and the Frobenius product implicit in the proposed method is contrasted with the Wasserstein or Bures metric from transportation theory. We also briefly demonstrate matching of unordered word lists to documents, to measure the topicality or sentiment of documents.
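To make the Frobenius product mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a document is represented by the uncentered second-moment matrix of its word vectors; the abstract does not state the exact normalization or the low-rank factorization, so both are left out. The function names and the toy two-dimensional "word vectors" are hypothetical. Under this assumption, the Frobenius product of two document matrices equals the mean squared dot product over all cross-document word pairs, which the sketch checks numerically.

```python
from math import sqrt

def second_moment(vectors):
    """Uncentered second-moment matrix (1/n) * sum_i x_i x_i^T of a bag of vectors.
    Assumption: this stands in for the 'quadratic statistics' of a document."""
    n = len(vectors)
    d = len(vectors[0])
    return [[sum(v[a] * v[b] for v in vectors) / n for b in range(d)]
            for a in range(d)]

def frobenius_product(A, B):
    """Frobenius inner product: sum of elementwise products of two matrices."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

def frobenius_cosine(A, B):
    """Frobenius product normalized by the Frobenius norms of both matrices."""
    return frobenius_product(A, B) / sqrt(
        frobenius_product(A, A) * frobenius_product(B, B))

def mean_squared_dot(xs, ys):
    """Mean of squared dot products over all pairs (x in xs, y in ys)."""
    total = sum(sum(p * q for p, q in zip(x, y)) ** 2 for x in xs for y in ys)
    return total / (len(xs) * len(ys))

# Hypothetical toy documents, each a bag of 2-dimensional word vectors.
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
doc_b = [[1.0, 0.5], [0.5, 1.0]]
doc_c = [[1.0, -1.0], [2.0, -2.0]]

A, B, C = second_moment(doc_a), second_moment(doc_b), second_moment(doc_c)

# The matrix-level product agrees with the pairwise word-level quantity.
print(frobenius_product(A, B), mean_squared_dot(doc_a, doc_b))
# Normalized similarities of doc_a to the other two documents.
print(frobenius_cosine(A, B), frobenius_cosine(A, C))
```

In this sketch a document costs d×d numbers to store rather than d for the mean vector. A low-rank factorization of each matrix, as named in the title, would presumably reduce that cost, but how the paper constructs it is not described in the abstract.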

Cite

Lagus, J., Sinkkonen, J., & Klami, A. (2019). Low-rank approximations of second-order document representations. In CoNLL 2019 - 23rd Conference on Computational Natural Language Learning, Proceedings of the Conference (pp. 634–644). Association for Computational Linguistics. https://doi.org/10.18653/v1/k19-1059
