Abstract
Continuous representations of linguistic structures are an important part of modern natural language processing systems. Despite their diversity, most existing log-multilinear embedding models are built on vector operations. However, these operations cannot precisely represent the compositionality of natural language because they lack order-preserving properties. In this work, we focus on one of the promising alternatives, based on embedding documents and words in the rotation group through a generalization of the coupled tensor chain decomposition to the exponential family of probability distributions. In this model, documents and words are represented as matrices, and n-gram representations are composed from word representations by matrix multiplication. The proposed model is optimized via noise-contrastive estimation. We show empirically that capturing word order and higher-order word interactions allows our model to achieve the best results on several document classification benchmarks.
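The core idea summarized above — words represented as rotation matrices whose product represents an n-gram — can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's coupled tensor chain model: the word names and the dimension are hypothetical, and only two properties are demonstrated, that composition is order-sensitive and that products of rotations stay in the rotation group.

```python
import numpy as np

def random_rotation(d, rng):
    """Sample a rotation matrix in SO(d) (illustrative stand-in for a learned word matrix)."""
    # QR decomposition of a Gaussian matrix yields an orthogonal matrix;
    # adjust signs so the result is a proper rotation (determinant +1).
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    q *= np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

rng = np.random.default_rng(0)
word_a = random_rotation(3, rng)  # hypothetical word representation
word_b = random_rotation(3, rng)  # hypothetical word representation

# n-gram composition is matrix multiplication, so word order matters:
bigram_ab = word_a @ word_b
bigram_ba = word_b @ word_a
print(np.allclose(bigram_ab, bigram_ba))                 # generally False: order-sensitive
print(np.allclose(bigram_ab @ bigram_ab.T, np.eye(3)))   # True: the product is still a rotation
```

Unlike vector addition, matrix multiplication is non-commutative, which is what lets the model distinguish "dog bites man" from "man bites dog"; the rotation-group constraint keeps compositions well-behaved (orthogonal, determinant 1) at any n-gram length.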
Citation
Vorona, I., Phan, A. H., Panchenko, A., & Cichocki, A. (2021). Documents Representation via Generalized Coupled Tensor Chain with the Rotation Group Constraint. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 1674–1684). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-acl.146