Mining sequential patterns in a sequence database (SDB) is an important and useful data mining task. Most existing algorithms for this task directly mine the set FS of all frequent sequences in an SDB. However, these algorithms often perform poorly on large SDBs because the search space is enormous and FS has a very large cardinality. In addition, constraint-based mining algorithms that rely on this approach must rescan the SDB whenever the user changes a constraint. To address this issue, this paper proposes a novel approach that generates FS from two sets, the frequent closed sequences (FCS) and the frequent generator sequences (FGS), which are concise representations of FS. The approach is based on a novel explicit relationship between FS and these two sets. This relationship is the theoretical basis for a novel, efficient algorithm named GFS-CR, which enumerates FS directly from FCS and FGS rather than mining it from an SDB. Experimental results show that GFS-CR outperforms state-of-the-art algorithms in terms of runtime and scalability.
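The abstract does not spell out GFS-CR or the explicit FS/FCS/FGS relationship, so the following is not that algorithm. It is a minimal sketch of the standard property that makes FCS a lossless concise representation of FS: every frequent sequence is a subsequence of some frequent closed sequence, and its support equals the largest support among its closed supersequences. FS can therefore be regenerated from FCS without rereading the SDB. Assumptions: sequences are simplified to tuples of single items (not itemsets), only FCS is used (FGS is not modeled), enumeration is brute force and exponential, and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Simplifying assumption: each sequence is a tuple of single items,
# not a tuple of itemsets as in general sequential pattern mining.
Seq = Tuple[str, ...]


def is_subsequence(s: Seq, t: Seq) -> bool:
    """True if s is obtained from t by deleting elements (order kept)."""
    it = iter(t)
    # `x in it` advances the iterator, so matches must occur in order.
    return all(x in it for x in s)


def subsequences(t: Seq) -> Set[Seq]:
    """All non-empty subsequences of t (exponential; toy inputs only)."""
    return {tuple(t[i] for i in idx)
            for k in range(1, len(t) + 1)
            for idx in combinations(range(len(t)), k)}


def brute_force_fs_fcs(sdb: List[Seq], minsup: int):
    """Mine FS and FCS directly from the SDB by exhaustive enumeration.

    Assumes minsup >= 1, so any supersequence with equal support is itself
    frequent and only frequent sequences need checking for closedness.
    """
    candidates: Set[Seq] = set()
    for t in sdb:
        candidates |= subsequences(t)
    sup = {s: sum(is_subsequence(s, t) for t in sdb) for s in candidates}
    fs = {s: v for s, v in sup.items() if v >= minsup}
    # Closed: no proper supersequence has the same support.
    fcs = {s: v for s, v in fs.items()
           if not any(u != s and fs[u] == v and is_subsequence(s, u)
                      for u in fs)}
    return fs, fcs


def fs_from_fcs(fcs: Dict[Seq, int]) -> Dict[Seq, int]:
    """Regenerate FS (with supports) from FCS alone, without the SDB.

    sup(s) is the largest support among closed supersequences of s.
    """
    fs: Dict[Seq, int] = {}
    for c, v in fcs.items():
        for s in subsequences(c):
            fs[s] = max(fs.get(s, 0), v)
    return fs


if __name__ == "__main__":
    sdb = [("a", "b", "c"), ("a", "b"), ("a", "b", "d"), ("b", "c")]
    fs, fcs = brute_force_fs_fcs(sdb, minsup=2)
    print(sorted(fcs.items()))  # [(('a', 'b'), 3), (('b',), 4), (('b', 'c'), 2)]
    assert fs_from_fcs(fcs) == fs
```

In this toy SDB, three closed sequences stand in for five frequent ones, and ⟨b⟩ shows why the maximum matters: it is a subsequence of both ⟨a, b⟩ (support 3) and ⟨b, c⟩ (support 2), but its own support of 4 comes from the closed sequence ⟨b⟩ itself.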
Duong, H., Truong, T., Le, B., & Fournier-Viger, P. (2019). An explicit relationship between sequential patterns and their concise representations. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11932 LNCS, pp. 341–361). Springer. https://doi.org/10.1007/978-3-030-37188-3_20