Abstract
Data science has become one of the most popular fields in higher education and research. Reading the experimental sections of thousands of papers to determine how well data science techniques perform takes an enormous amount of time. In this work, we build an experimental evidence extraction system that automates the integration of tables (in the paper PDFs) into a database of experimental results. First, it crops the tables and recognizes their templates. Second, it classifies the column names and row names as "method", "dataset", or "evaluation metric", and then unifies all the table cells into (method, dataset, metric, score) quadruples. We propose hybrid features, including structural and semantic table features, as well as an ensemble learning approach for column/row name classification and table unification. Once the results are in the database, SQL statements can answer questions such as whether a method is the state of the art or whether the reported numbers conflict.
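To make the last point concrete, the following is a minimal sketch of how such quadruples might be stored and queried, using Python's built-in sqlite3 module. The abstract does not give the schema, so the table name `results`, the extra `paper` column (added here so that two papers can report the same cell), and the helper names `build_db`, `best_methods`, and `conflicting_reports` are all assumptions made for illustration. The rows are made-up placeholders, not numbers from any paper, and the ranking assumes a metric where higher is better.

```python
import sqlite3

def build_db(rows):
    """Load (paper, method, dataset, metric, score) rows into an in-memory database.

    The schema is hypothetical: the quadruple from the abstract plus a
    `paper` column identifying where each score was reported.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results "
        "(paper TEXT, method TEXT, dataset TEXT, metric TEXT, score REAL)"
    )
    conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?)", rows)
    return conn

def best_methods(conn, dataset, metric):
    """Rank methods on one dataset/metric by their best reported score.

    Assumes higher scores are better (e.g. accuracy); an error-style
    metric would need MIN and ascending order instead.
    """
    return conn.execute(
        """
        SELECT method, MAX(score) AS best
        FROM results
        WHERE dataset = ? AND metric = ?
        GROUP BY method
        ORDER BY best DESC, method
        """,
        (dataset, metric),
    ).fetchall()

def conflicting_reports(conn):
    """Find (method, dataset, metric) cells reported with more than one distinct score."""
    return conn.execute(
        """
        SELECT method, dataset, metric, MIN(score), MAX(score)
        FROM results
        GROUP BY method, dataset, metric
        HAVING COUNT(DISTINCT score) > 1
        ORDER BY method, dataset, metric
        """
    ).fetchall()

# Placeholder rows for illustration only; not real experimental results.
TOY_ROWS = [
    ("paper_1", "method_A", "dataset_X", "accuracy", 0.80),
    ("paper_1", "method_B", "dataset_X", "accuracy", 0.85),
    ("paper_2", "method_A", "dataset_X", "accuracy", 0.78),
    ("paper_2", "method_C", "dataset_X", "accuracy", 0.83),
]

if __name__ == "__main__":
    conn = build_db(TOY_ROWS)
    print(best_methods(conn, "dataset_X", "accuracy"))
    print(conflicting_reports(conn))
```

In this sketch, the first query puts the candidate state-of-the-art method at the top of the ranking for a given dataset and metric, and the second flags cells where different papers report different scores for the same method. A real system would also need to normalize method and dataset names before grouping, which this example does not attempt.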
Cite
Yu, W., Peng, W., Shu, Y., Zeng, Q., & Jiang, M. (2020). Experimental Evidence Extraction System in Data Science with Hybrid Table Features and Ensemble Learning. In The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020 (pp. 951–961). Association for Computing Machinery, Inc. https://doi.org/10.1145/3366423.3380174