With the rapid evolution of transformer architectures, researchers are exploring their application in sequential recommender systems (SRSs) and reporting promising performance on SRS tasks compared with earlier SRS models. However, most existing transformer-based SRS frameworks retain the vanilla attention mechanism, which calculates attention scores between all item-item pairs. Under this setting, redundant item interactions can harm model performance and consume considerable computation time and memory. In this paper, we identify the sparse attention phenomenon in transformer-based SRS models and propose Sparse Transformer for sequential Recommendation tasks (STRec) to achieve efficient computation and improved performance. Specifically, we replace self-attention with cross-attention, making the model concentrate on the most relevant item interactions. To determine these necessary interactions, we design a novel sampling strategy that detects relevant items based on temporal information. Extensive experimental results validate the effectiveness of STRec, which achieves state-of-the-art accuracy while reducing inference time by 54% and memory cost by 70%. We also provide extensive additional experiments to further investigate the properties of our framework.
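The idea of replacing self-attention with cross-attention over a sampled subset of items can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the STRec implementation: the function names (`sample_query_positions`, `cross_attention`) are hypothetical, the "keep the k most recent items" rule is a simple stand-in for the paper's temporal sampling strategy, and learned projections and causal masking are omitted. The point is only that queries come from k sampled positions while keys and values come from all n items, so the score matrix is k x n rather than n x n.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sample_query_positions(timestamps, k):
    """Hypothetical stand-in sampler: keep the k most recent positions.

    This is NOT the paper's sampling strategy; it only shows how a subset
    of positions could be chosen from temporal information.
    """
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i], reverse=True)
    return sorted(order[:k])


def cross_attention(item_embs, query_positions):
    """Attend from sampled positions (queries) to all items (keys/values).

    Computes k rows of n scores each, instead of the n x n scores of
    full self-attention. Learned Q/K/V projections are omitted.
    """
    d = len(item_embs[0])
    outputs = []
    for q_idx in query_positions:
        q = item_embs[q_idx]
        # Scaled dot-product score of this query against every item.
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in item_embs]
        weights = softmax(scores)
        # Weighted sum of the value vectors (here, the raw embeddings).
        out = [sum(w * v[j] for w, v in zip(weights, item_embs)) for j in range(d)]
        outputs.append(out)
    return outputs


random.seed(0)
n, d, k = 8, 4, 3  # sequence length, embedding size, number of sampled queries
item_embs = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
timestamps = [10, 12, 15, 40, 41, 90, 95, 100]  # made-up interaction times

positions = sample_query_positions(timestamps, k)
outputs = cross_attention(item_embs, positions)
print(positions)                      # [5, 6, 7]
print(len(outputs), len(outputs[0]))  # 3 4
```

With n = 8 and k = 3, the sketch computes 24 attention scores instead of 64; the savings reported in the abstract depend on the actual sampling strategy and model, which this toy example does not reproduce.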
CITATION STYLE
Li, C., Wang, Y., Liu, Q., Zhao, X., Wang, W., Wang, Y., … Li, Q. (2023). STRec: Sparse Transformer for Sequential Recommendations. In Proceedings of the 17th ACM Conference on Recommender Systems, RecSys 2023 (pp. 101–111). Association for Computing Machinery, Inc. https://doi.org/10.1145/3604915.3608779