LiteGT: Efficient and Lightweight Graph Transformers

Abstract

Transformers have shown great potential for modeling long-term dependencies in natural language processing and computer vision. However, few studies have applied transformers to graphs, which is challenging due to the poor scalability of the attention mechanism and the under-exploration of graph inductive bias. To bridge this gap, we propose the Lite Graph Transformer (LiteGT), which learns on arbitrary graphs efficiently. First, a node sampling strategy is proposed to sparsify the nodes considered in self-attention, requiring only O(N log N) time. Second, we devise two kernelization approaches to form two-branch attention blocks, which not only leverage graph-specific topology information but also further reduce computation to O(½ N log N). Third, nodes are updated with different attention schemes during training, largely mitigating the over-smoothing problem as model layers deepen. Extensive experiments demonstrate that LiteGT achieves competitive performance on both node classification and link prediction on datasets with millions of nodes. Specifically, the Jaccard + Sampling + Dim. reducing setting reduces computation by more than 100x and halves the model size without performance degradation.
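
To illustrate the general idea of sparsifying self-attention via node sampling, below is a minimal NumPy sketch. It assumes a simple uniform sample of roughly log N key nodes per layer, which reduces the attention cost from O(N²) to roughly O(N log N); LiteGT's actual sampling strategy, its kernelized two-branch attention blocks, and its per-node attention schemes are described in the paper and differ from this toy example. The function name and parameters here are hypothetical.

```python
# Hypothetical sketch of sampling-sparsified self-attention (not the authors' exact
# algorithm): queries come from all N nodes, but keys/values come only from a
# uniformly sampled set of ~log2(N) nodes, so the score matrix is (N, log N)
# instead of (N, N).
import numpy as np

def sampled_self_attention(X, W_q, W_k, W_v, rng=None):
    """X: (N, d) node features; W_q/W_k/W_v: (d, d_h) projection matrices."""
    rng = rng if rng is not None else np.random.default_rng(0)
    N = X.shape[0]
    k = max(1, int(np.ceil(np.log2(N))))          # sample ~log N key nodes (assumption)
    idx = rng.choice(N, size=k, replace=False)    # uniform sampling, for illustration only

    Q = X @ W_q                                   # (N, d_h) queries for every node
    K = X[idx] @ W_k                              # (k, d_h) keys, sampled nodes only
    V = X[idx] @ W_v                              # (k, d_h) values, sampled nodes only

    scores = Q @ K.T / np.sqrt(Q.shape[1])        # (N, k) score matrix, not (N, N)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability for softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ V                               # (N, d_h) updated node features

# Toy usage: 1,000 nodes with 64-d features -> attention matrix is 1000 x 10.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 64))
W_q, W_k, W_v = (rng.standard_normal((64, 64)) / 8 for _ in range(3))
out = sampled_self_attention(X, W_q, W_k, W_v, rng=rng)
print(out.shape)  # (1000, 64)
```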

Cite

APA

Chen, C., Tao, C., & Wong, N. (2021). LiteGT: Efficient and Lightweight Graph Transformers. In International Conference on Information and Knowledge Management, Proceedings (pp. 161–170). Association for Computing Machinery. https://doi.org/10.1145/3459637.3482272
