Experimental Evidence Extraction System in Data Science with Hybrid Table Features and Ensemble Learning

Wenhao Yu; Wei Peng; Yu Shu; Qingkai Zeng; Meng Jiang

Conference Proceedings (Open Access)

The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020 (2020) 951-961

DOI: 10.1145/3366423.3380174

8 Citations

27 Readers

Abstract

Data science has become one of the most popular fields in higher education and research. Reading the experimental sections of thousands of papers to determine how data science techniques perform takes a great deal of time. In this work, we build an experimental evidence extraction system that automates the integration of tables (in the paper PDFs) into a database of experimental results. First, it crops the tables and recognizes their templates. Second, it classifies the column names and row names as "method", "dataset", or "evaluation metric", and then unifies all the table cells into (method, dataset, metric, score) quadruples. We propose hybrid features, including structural and semantic table features, as well as an ensemble learning approach for column/row name classification and table unification. SQL statements can then be used to answer questions such as whether a method is the state of the art or whether the reported numbers conflict.
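The kind of SQL question the abstract mentions can be illustrated with a minimal sketch. It assumes the quadruples are stored in a single table; the table name `results`, the extra `source` column, and every method, dataset, and score value are hypothetical placeholders rather than details from the paper. It also assumes that a higher score is better for every metric, which does not hold for error-style metrics.

```python
import sqlite3

# Hypothetical (method, dataset, metric, score) quadruples. The "source" column
# is an added assumption so that the same result can be reported by two papers.
rows = [
    ("MethodA", "DatasetX", "accuracy", 0.81, "paper1"),
    ("MethodB", "DatasetX", "accuracy", 0.85, "paper1"),
    ("MethodA", "DatasetX", "accuracy", 0.78, "paper2"),
    ("MethodB", "DatasetY", "f1", 0.66, "paper2"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results "
    "(method TEXT, dataset TEXT, metric TEXT, score REAL, source TEXT)"
)
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?)", rows)

# Which method has the best reported score per (dataset, metric)?
# Assumes higher is better for every metric.
best = conn.execute(
    """
    SELECT r.dataset, r.metric, r.method, r.score
    FROM results AS r
    JOIN (
        SELECT dataset, metric, MAX(score) AS top
        FROM results
        GROUP BY dataset, metric
    ) AS m
      ON r.dataset = m.dataset AND r.metric = m.metric AND r.score = m.top
    ORDER BY r.dataset, r.metric
    """
).fetchall()

# Which (method, dataset, metric) combinations are reported with different scores?
conflicts = conn.execute(
    """
    SELECT method, dataset, metric, MIN(score), MAX(score)
    FROM results
    GROUP BY method, dataset, metric
    HAVING COUNT(DISTINCT score) > 1
    """
).fetchall()

print(best)
print(conflicts)
```

With this toy data, the first query lists the top-scoring method for each dataset and metric, and the second flags the one combination that appears with two different scores.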

Cite

Yu, W., Peng, W., Shu, Y., Zeng, Q., & Jiang, M. (2020). Experimental Evidence Extraction System in Data Science with Hybrid Table Features and Ensemble Learning. In The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020 (pp. 951–961). Association for Computing Machinery, Inc. https://doi.org/10.1145/3366423.3380174
