Experimental Evidence Extraction System in Data Science with Hybrid Table Features and Ensemble Learning

Abstract

Data Science has been one of the most popular fields in higher education and research activities. Reading the experimental sections of thousands of papers to work out how well data science techniques perform takes a great deal of time. In this work, we build an experimental evidence extraction system to automate the integration of tables (in the paper PDFs) into a database of experimental results. First, it crops the tables and recognizes the templates. Second, it classifies the column names and row names into "method", "dataset", or "evaluation metric", and then unifies all the table cells into (method, dataset, metric, score)-quadruples. We propose hybrid features including structural and semantic table features as well as an ensemble learning approach for column/row name classification and table unification. SQL statements can be used to answer questions such as whether a method is the state-of-the-art or whether the reported numbers are conflicting.
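
The last point of the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the quadruples are stored in a single SQLite table named `results`, with an extra `paper` column (an assumption added here so that conflicting reports can be traced to a source), and that a higher score is better for the metric being queried. The table name, function names, and any rows passed in are hypothetical.

```python
import sqlite3


def build_db(rows):
    """Load (paper, method, dataset, metric, score) rows into an in-memory table.

    The schema is an assumption for illustration; the abstract only states
    that table cells are unified into (method, dataset, metric, score).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results ("
        "paper TEXT, method TEXT, dataset TEXT, metric TEXT, score REAL)"
    )
    conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def best_methods(conn, dataset, metric):
    """Return the method(s) with the highest reported score.

    Assumes higher is better; an error-style metric would need MIN instead.
    """
    cur = conn.execute(
        """
        SELECT DISTINCT method FROM results
        WHERE dataset = ? AND metric = ?
          AND score = (SELECT MAX(score) FROM results
                       WHERE dataset = ? AND metric = ?)
        ORDER BY method
        """,
        (dataset, metric, dataset, metric),
    )
    return [row[0] for row in cur.fetchall()]


def conflicting_reports(conn):
    """Return (method, dataset, metric, n) where n > 1 distinct scores exist."""
    cur = conn.execute(
        """
        SELECT method, dataset, metric, COUNT(DISTINCT score) AS n
        FROM results
        GROUP BY method, dataset, metric
        HAVING COUNT(DISTINCT score) > 1
        ORDER BY method, dataset, metric
        """
    )
    return cur.fetchall()
```

Under these assumptions, `best_methods(conn, dataset, metric)` answers the state-of-the-art question for one dataset and metric, and `conflicting_reports(conn)` lists every (method, dataset, metric) combination for which different scores were reported.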

Cite

Yu, W., Peng, W., Shu, Y., Zeng, Q., & Jiang, M. (2020). Experimental Evidence Extraction System in Data Science with Hybrid Table Features and Ensemble Learning. In The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020 (pp. 951–961). Association for Computing Machinery, Inc. https://doi.org/10.1145/3366423.3380174
