Web Page Template and Data Separation for Better Maintainability

Chenxu Zhao; Rui Zhang; Jianzhong Qi

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11233 LNCS, pp. 439-449 (2018)

DOI: 10.1007/978-3-030-02922-7_30

2 Citations

2 Readers

Abstract

Separating a web page into template code and the data records populated into that template is an important problem with a wide range of applications in web page compression and information extraction. We study this problem with the aim of separating a web page into easily maintainable template code and data records. We show that the problem is NP-hard and propose a heuristic algorithm to solve it. The main idea of our algorithm is to parse a web page into a tree and then process the tree recursively in a bottom-up manner in three steps: splitting, folding, and alignment. We perform experiments on real datasets to evaluate how well our proposed algorithms maximize the maintainability of the template code produced. The experimental results show that our proposed algorithms outperform the baseline algorithms by 25% in the maintainability measure.
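The abstract does not define the splitting, folding, or alignment steps, so the following Python sketch is only a toy illustration of the general idea of template and data separation, not the authors' algorithm. It parses a page into a tree and collapses runs of adjacent siblings with identical tag structure into one template fragment plus a list of data rows, which loosely resembles a folding step. All names (`Node`, `TreeBuilder`, `shape`, `fold`, `separate`) and the `[repeat N: ...]` template notation are hypothetical. The sketch assumes well-formed HTML in which every tag is closed, ignores attributes, and assumes repeated siblings carry text in the same positions.

```python
from html.parser import HTMLParser
from itertools import groupby


class Node:
    """Minimal element node: tag name, own text, child elements."""

    def __init__(self, tag):
        self.tag = tag
        self.text = ""
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a Node tree; assumes every start tag has a matching end tag."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)  # attributes are dropped in this toy version
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].text += data.strip()


def shape(node):
    """Structural signature of a subtree: tags only, text ignored."""
    return (node.tag, tuple(shape(c) for c in node.children))


def texts(node):
    """Non-empty text values of a subtree in document order."""
    out = [node.text] if node.text else []
    for child in node.children:
        out.extend(texts(child))
    return out


def slot_template(node, counter):
    """Render a subtree with each text value replaced by a numbered slot."""
    inner = ""
    if node.text:
        inner += "{%d}" % counter[0]
        counter[0] += 1
    inner += "".join(slot_template(c, counter) for c in node.children)
    return "<%s>%s</%s>" % (node.tag, inner, node.tag)


def fold(node, records):
    """Return the template for a subtree, appending folded rows to records."""
    parts = [node.text]
    for _, run in groupby(node.children, key=shape):
        run = list(run)
        if len(run) > 1:
            # Adjacent siblings with the same structure: keep one copy in the
            # template and move their text into a table of data rows.
            records.append([texts(n) for n in run])
            fragment = slot_template(run[0], [0])
            parts.append("[repeat %d: %s]" % (len(records) - 1, fragment))
        else:
            parts.append(fold(run[0], records))
    return "<%s>%s</%s>" % (node.tag, "".join(parts), node.tag)


def separate(html):
    """Split an HTML string into (template, records)."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    records = []
    return fold(builder.root, records), records
```

For example, `separate("<ul><li><b>Ann</b><i>30</i></li><li><b>Bob</b><i>25</i></li></ul>")` yields a template containing a single `<li>` fragment with slots `{0}` and `{1}`, and one record table with the rows `["Ann", "30"]` and `["Bob", "25"]`. The sketch does not fold repeats nested inside an already folded run, and it makes no attempt to optimize any maintainability measure.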

Citation (APA)

Zhao, C., Zhang, R., & Qi, J. (2018). Web Page Template and Data Separation for Better Maintainability. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11233 LNCS, pp. 439–449). Springer Verlag. https://doi.org/10.1007/978-3-030-02922-7_30
