Attention operators have been applied to both 1-D data such as text and higher-order data such as images and videos. Applying attention operators to higher-order data requires flattening the spatial or spatial-temporal dimensions into a vector, which is assumed to follow a multivariate normal distribution. This not only incurs excessive computational costs, but also fails to preserve structure in the data. In this work, we propose to avoid flattening by assuming the data follow matrix-variate normal distributions. Based on this view, we develop Kronecker attention operators (KAOs) that operate directly on higher-order tensor data. Importantly, the proposed KAOs yield dramatic reductions in computational resources. Experimental results show that our methods reduce the required computational resources by a factor of hundreds, with larger factors for higher-dimensional and higher-order data. Results also show that networks with KAOs outperform models without attention, while achieving performance competitive with that of the original attention operators.
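The abstract does not spell out the operator itself, so the NumPy sketch below only illustrates the cost argument it makes: standard self-attention over a flattened H x W feature map builds an (HW) x (HW) attention matrix, whereas a Kronecker-structured alternative attends along the height and width axes separately, needing only an H x H and a W x W map. The function names (`flattened_attention`, `kronecker_style_attention`) and the row/column-averaging scheme are assumptions for illustration, not necessarily the paper's exact KAO.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def flattened_attention(x):
    """Standard self-attention after flattening: x has shape (H, W, C).
    The attention map has (H*W) x (H*W) entries."""
    h, w, c = x.shape
    tokens = x.reshape(h * w, c)                 # flatten spatial dims
    scores = tokens @ tokens.T / np.sqrt(c)      # (HW, HW) attention map
    return (softmax(scores) @ tokens).reshape(h, w, c)

def kronecker_style_attention(x):
    """Hypothetical Kronecker-structured variant: attend along rows and
    columns separately, so the maps are only H x H and W x W."""
    h, w, c = x.shape
    row_ctx = x.mean(axis=1)                     # (H, C) width-averaged
    col_ctx = x.mean(axis=0)                     # (W, C) height-averaged
    a_h = softmax(row_ctx @ row_ctx.T / np.sqrt(c))  # (H, H) map
    a_w = softmax(col_ctx @ col_ctx.T / np.sqrt(c))  # (W, W) map
    # Applying a_h and a_w along the two spatial axes is equivalent to
    # applying their Kronecker product (a_h kron a_w) to the flattened data.
    return np.einsum('ij,kl,jlc->ikc', a_h, a_w, x)

x = np.random.randn(32, 32, 8)
print(flattened_attention(x).shape)        # (32, 32, 8), via a 1024x1024 map
print(kronecker_style_attention(x).shape)  # (32, 32, 8), via two 32x32 maps
```

Under these assumptions, for H = W = 32 the flattened attention map holds (HW)^2 = 1024^2, roughly a million entries, versus 2 * 32^2 = 2048 for the two small maps, a reduction by a factor of about 500, which is consistent in scale with the abstract's "factor of hundreds" claim.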
Citation:
Gao, H., Wang, Z., & Ji, S. (2020). Kronecker Attention Networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 229–237). Association for Computing Machinery. https://doi.org/10.1145/3394486.3403065