Sequential recommendation aims to recommend the next item a user is likely to interact with by capturing useful sequential patterns from the user's historical behaviors. It has recently become an important and popular component of various e-commerce platforms. The Transformer has been widely and successfully used to adaptively capture the dynamics of users' historical behaviors for sequential recommendation. In recommender systems, however, the embedding size is usually set to be small, and with small embeddings the dot product in the Transformer may be limited in modeling the complex relevance between queries and keys. To address this common but neglected issue, in this paper we present a new model, Deep Self-Attention for Sequential Recommendation (DSASrec), which introduces a chunking deep attention to compute attention weights. The chunking deep attention consists of two modules: a deep module, which improves the nonlinearity of the attention function, and a chunking module, which computes attention weights several times, analogous to the multi-head attention in the Transformer. Extensive experiments on three benchmark datasets show that our model achieves state-of-the-art results. Our implementation is available in PyTorch.
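The abstract only outlines the idea, so the following is a minimal, hypothetical PyTorch sketch of what a "chunking deep attention" layer could look like: a small MLP (the deep module) replaces the dot product for scoring query-key pairs, and the embedding is split into chunks (the chunking module) so scores are computed several times, similar to multi-head attention. The class name, the MLP architecture, and all hyperparameters are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChunkingDeepAttention(nn.Module):
    """Illustrative sketch (not the paper's code) of chunking deep attention."""

    def __init__(self, d_model: int, n_chunks: int, hidden: int = 32):
        super().__init__()
        assert d_model % n_chunks == 0
        self.n_chunks = n_chunks
        self.d_chunk = d_model // n_chunks
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        # Deep module (assumed form): scores a concatenated (query, key) chunk pair.
        self.score = nn.Sequential(
            nn.Linear(2 * self.d_chunk, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, mask=None):
        # x: (batch, seq_len, d_model); mask: (seq_len, seq_len) additive mask
        # with -inf on future positions, as in causal self-attention.
        b, n, _ = x.shape
        q = self.w_q(x).view(b, n, self.n_chunks, self.d_chunk)
        k = self.w_k(x).view(b, n, self.n_chunks, self.d_chunk)
        v = self.w_v(x).view(b, n, self.n_chunks, self.d_chunk)

        # Pair every query position with every key position, per chunk.
        q_exp = q.unsqueeze(2).expand(b, n, n, self.n_chunks, self.d_chunk)
        k_exp = k.unsqueeze(1).expand(b, n, n, self.n_chunks, self.d_chunk)
        pair = torch.cat([q_exp, k_exp], dim=-1)        # (b, n, n, chunks, 2*d_chunk)
        scores = self.score(pair).squeeze(-1)           # (b, n, n, chunks)
        if mask is not None:
            scores = scores + mask[None, :, :, None]
        attn = F.softmax(scores, dim=2)                 # normalize over key positions
        out = torch.einsum("bqkc,bkcd->bqcd", attn, v)  # weighted sum of values
        return self.out(out.reshape(b, n, -1))


# Example usage with assumed sizes: 8 sequences of length 20, embedding size 64.
layer = ChunkingDeepAttention(d_model=64, n_chunks=4)
x = torch.randn(8, 20, 64)
causal = torch.triu(torch.full((20, 20), float("-inf")), diagonal=1)
y = layer(x, mask=causal)                               # (8, 20, 64)
```

The chunking here plays the role the abstract attributes to multi-head attention: each chunk produces its own attention weights, while the MLP scorer adds the nonlinearity that a plain dot product lacks under small embedding sizes.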
CITATION STYLE
Zhang, B., Xiao, Z., & Zhong, S. (2021). Deep self-attention for sequential recommendation. In Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE (Vol. 2021-July, pp. 321–326). Knowledge Systems Institute Graduate School. https://doi.org/10.18293/SEKE2021-035