Directed acyclic graph (DAG) is an essentially important model to represent terminologies and their hierarchical relationships, such as Disease Ontology. Due to massive terminologies and complex structures in a large DAG, it is challenging to summarize the whole hierarchical DAG. In this paper, we study a new problem of finding k representative vertices to summarize a hierarchical DAG. To depict diverse summarization and important vertices, we design a summary score function for capturing vertices' diversity coverage and structure correlation. The studied problem is theoretically proven to be NP-hard. To efficiently tackle it, we propose a greedy algorithm with an approximation guarantee, which iteratively adds vertices with the large summary contributions into answers. To further improve answer quality, we propose a subtree extraction based method, which is proven to guarantee achieving higher-quality answers. In addition, we develop a scalable algorithm k-PCGS based on candidate pruning and DAG compression for large-scale hierarchical DAGs. Extensive experiments on large real-world datasets demonstrate both the effectiveness and efficiency of proposed algorithms.
CITATION STYLE
Zhu, X., Huang, X., Choi, B., & Xu, J. (2020). Top-k Graph Summarization on Hierarchical DAGs. In International Conference on Information and Knowledge Management, Proceedings (pp. 1903–1912). Association for Computing Machinery. https://doi.org/10.1145/3340531.3411899
Mendeley helps you to discover research relevant for your work.