Noisy graph data and pattern variations are two thorny problems in frequent subgraph mining. Traditional methods based on exact matching only generate patterns that have enough perfect matches in the graph database. As a result, a pattern whose instances differ slightly from graph to graph may either remain undetected or be reported as multiple, almost identical patterns. In this paper, we investigate the problem of approximate frequent pattern mining, with a focus on finding non-redundant representative frequent patterns that summarize the frequent patterns allowing approximate matches in a graph database. To achieve this goal, we propose the REAFUM framework, which (1) first extracts a list of diverse representative graphs from the database, which may contain most of the approximate frequent patterns exhibited in the entire graph database; (2) then uses distinct patterns in the representative graphs as seed patterns to retrieve approximate matches in the entire graph database; and (3) finally employs a consensus refinement model to derive representative approximate frequent patterns. Through a comprehensive evaluation on both synthetic and real datasets, we show that REAFUM finds representative approximate frequent patterns effectively and efficiently, and that, in the presence of noise and errors, the patterns it finds resemble the ground truth much more closely and are less redundant than those produced by any exact-matching based method.
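The gap between exact and approximate support described above can be illustrated with a toy sketch. This is not the REAFUM algorithm: it assumes, purely for illustration, that node labels are unique within each graph, so that matching a pattern reduces to comparing edge sets and no subgraph isomorphism search is needed. The names `make_graph`, `missing_edges`, `support`, and the `max_missing` tolerance are hypothetical and do not come from the paper.

```python
from typing import FrozenSet, Iterable, List, Tuple

Edge = Tuple[str, str]
Graph = FrozenSet[Edge]


def make_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected graph as a set of edges with sorted endpoints."""
    # Sorting the endpoints makes ("B", "A") and ("A", "B") the same edge.
    return frozenset(tuple(sorted(e)) for e in edges)


def missing_edges(pattern: Graph, graph: Graph) -> int:
    """Count pattern edges absent from the graph (assumes unique node labels)."""
    return len(pattern - graph)


def support(pattern: Graph, database: List[Graph], max_missing: int = 0) -> int:
    """Number of graphs that match the pattern with at most max_missing edges absent.

    max_missing=0 corresponds to exact matching; larger values are a crude
    stand-in for approximate matching (an assumption of this sketch).
    """
    return sum(1 for g in database if missing_edges(pattern, g) <= max_missing)


# A triangle pattern and a small database in which noise has removed one edge
# from two of the graphs.
pattern = make_graph([("A", "B"), ("B", "C"), ("C", "A")])
database = [
    make_graph([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]),  # perfect match
    make_graph([("A", "B"), ("B", "C"), ("C", "D")]),              # edge C-A missing
    make_graph([("B", "A"), ("C", "A"), ("A", "D")]),              # edge B-C missing
    make_graph([("C", "D"), ("D", "E")]),                          # unrelated graph
]

exact_support = support(pattern, database)                  # exact matching only
approx_support = support(pattern, database, max_missing=1)  # tolerate one missing edge

print(exact_support, approx_support)
```

With a support threshold of, say, 3 graphs, the triangle would go undetected under exact matching but would be reported once a single missing edge is tolerated, which is the situation the abstract describes.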
CITATION STYLE
Li, R., & Wang, W. (2015). REAFUM: Representative approximate frequent subgraph mining. In SIAM International Conference on Data Mining 2015, SDM 2015 (pp. 757–765). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611974010.85