Learning to detect table clones in spreadsheets

Yakun Zhang; Wensheng Dou; Jiaxin Zhu; Liang Xu; Zhiyong Zhou; Jun Wei; Dan Ye; Bo Yang



ISSTA 2020 - Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis (2020) 528-540

DOI: 10.1145/3395363.3397384


Abstract

To improve spreadsheet development productivity, end users can create a spreadsheet table by copying and modifying an existing one. The two tables share similar computational semantics and form a table clone. End users may later modify the tables in a table clone, e.g., by adding new rows or deleting columns, thus introducing structure changes into the table clone. Our empirical study on real-world spreadsheets shows that about 58.5% of table clones involve structure changes. However, existing table clone detection approaches for spreadsheets can only detect table clones whose tables have the same structure, so many table clones with structure changes go undetected. We observe that, although the tables in a table clone may be modified, they usually share similar structures and formats, e.g., headers, formulas, and background colors. Based on this observation, we propose LTC (Learning to detect Table Clones), which automatically detects table clones with or without structure changes. LTC uses the structure and format information from labeled table clones and non-clones to train a binary classifier. It first identifies tables in spreadsheets and then uses the trained binary classifier to judge whether each pair of tables forms a table clone. Our experiments on real-world spreadsheets from the EUSES and Enron corpora show that LTC achieves a precision of 97.8% and a recall of 92.1% in table clone detection, significantly outperforming the state-of-the-art technique (a precision of 37.5% and a recall of 11.1%).
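The pairwise classification idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not name LTC's features or classifier, so the three Jaccard-similarity features (headers, formulas, background colors), the nearest-centroid classifier, the dictionary layout of a table, and all function and table names below are assumptions made for illustration. Table identification is also assumed to have happened already.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two collections; two empty collections count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def pair_features(t1, t2):
    """Hypothetical feature vector for a pair of tables (assumed, not from the paper)."""
    return [
        jaccard(t1["headers"], t2["headers"]),
        jaccard(t1["formulas"], t2["formulas"]),
        jaccard(t1["colors"], t2["colors"]),
    ]

def train_centroids(labeled_pairs):
    """Nearest-centroid stand-in for the binary classifier.

    labeled_pairs: iterable of (table, table, is_clone) triples.
    Needs at least one clone pair and one non-clone pair.
    """
    sums = {True: [0.0, 0.0, 0.0], False: [0.0, 0.0, 0.0]}
    counts = {True: 0, False: 0}
    for t1, t2, label in labeled_pairs:
        feats = pair_features(t1, t2)
        sums[label] = [s + x for s, x in zip(sums[label], feats)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def is_clone(model, t1, t2):
    """Classify a pair by the closer centroid (squared Euclidean distance)."""
    feats = pair_features(t1, t2)
    dist = {
        label: sum((x - c) ** 2 for x, c in zip(feats, centroid))
        for label, centroid in model.items()
    }
    return dist[True] < dist[False]

def detect_clones(model, tables):
    """Judge every pair of identified tables, as the abstract describes."""
    return [
        (a["name"], b["name"])
        for a, b in combinations(tables, 2)
        if is_clone(model, a, b)
    ]

# Toy labeled data (invented for this sketch).
sales_a = {"name": "sales_a", "headers": ["Region", "Jan", "Feb", "Sum"],
           "formulas": ["SUM(RC[-2]:RC[-1])"], "colors": ["gray", "white"]}
sales_b = {"name": "sales_b", "headers": ["Region", "Jan", "Feb", "Sum"],
           "formulas": ["SUM(RC[-2]:RC[-1])"], "colors": ["gray", "white"]}
staff = {"name": "staff", "headers": ["Name", "Role"],
         "formulas": [], "colors": ["white"]}
model = train_centroids([(sales_a, sales_b, True), (sales_a, staff, False)])

# Toy unlabeled tables; budget_2020 has an added "Notes" column (a structure change).
tables = [
    {"name": "budget_2019", "headers": ["Item", "Q1", "Q2", "Total"],
     "formulas": ["SUM(RC[-2]:RC[-1])"], "colors": ["yellow", "white"]},
    {"name": "budget_2020", "headers": ["Item", "Q1", "Q2", "Total", "Notes"],
     "formulas": ["SUM(RC[-2]:RC[-1])"], "colors": ["yellow", "white"]},
    {"name": "contacts", "headers": ["Name", "Phone"],
     "formulas": [], "colors": ["white"]},
]
print(detect_clones(model, tables))
```

Because the features measure similarity rather than exact equality, the pair with the extra column can still be classified as a clone, which is the property the abstract highlights. A real system would need far more labeled pairs and richer features than this toy example.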



Zhang, Y., Dou, W., Zhu, J., Xu, L., Zhou, Z., Wei, J., … Yang, B. (2020). Learning to detect table clones in spreadsheets. In ISSTA 2020 - Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis (pp. 528–540). Association for Computing Machinery, Inc. https://doi.org/10.1145/3395363.3397384
