Tables store rich numerical data, but numerical reasoning over tables remains challenging. In this paper, we find that the spreadsheet formula, a language commonly used to perform computations on numerical values in spreadsheets, provides valuable supervision for numerical reasoning over tables. Given the large number of spreadsheets available on the web, we propose FORTAP, the first exploration of using spreadsheet formulas for table pretraining. We derive two novel self-supervised pretraining objectives from formulas: numerical reference prediction (NRP) and numerical calculation prediction (NCP). Although the proposed objectives are generic for encoders, we build FORTAP upon TUTA, the first transformer-based method for spreadsheet and web table pretraining with tree attention, to better capture spreadsheet table layouts and structures. FORTAP outperforms state-of-the-art methods by large margins on three representative datasets covering formula prediction, question answering, and cell type classification, showing the great potential of leveraging formulas for table pretraining. The code will be released at https://github.com/microsoft/TUTA_table_understanding.
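To make the idea of formulas as supervision concrete, the following is a minimal sketch of how a single formula could be turned into training labels. It assumes, beyond what the abstract states, that NRP targets are the cells a formula references and that NCP targets are the operator applied to them. The helper `formula_labels` and its regular expression are hypothetical illustrations, not part of the FORTAP or TUTA code, and they handle only simple arithmetic formulas over plain cell addresses.

```python
import re

# Hypothetical helper, not taken from the FORTAP codebase: derive two kinds of
# self-supervised labels from one simple spreadsheet formula.
CELL_REF = re.compile(r"\$?([A-Z]{1,3})\$?([0-9]+)")  # e.g. B2, $A$1
OPERATORS = ["+", "-", "*", "/"]


def formula_labels(formula: str) -> dict:
    """Return the referenced cells and the first arithmetic operator."""
    body = formula.lstrip("=")
    # Assumed NRP-style label: every cell address that appears in the formula.
    references = [col + row for col, row in CELL_REF.findall(body)]
    # Assumed NCP-style label: the first binary operator found, if any.
    operator = next((ch for ch in body if ch in OPERATORS), None)
    return {"references": references, "operator": operator}


labels = formula_labels("=B2-C2")
print(labels)
```

Under these assumptions, a model would be asked to recover `references` from the table context of the formula cell and to recover `operator` from the referenced numerical cells. Real spreadsheets also contain functions, ranges, and nested expressions, which this sketch does not attempt to parse.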
Cheng, Z., Dong, H., Jia, R., Wu, P., Han, S., Cheng, F., & Zhang, D. (2022). FORTAP: Using Formulas for Numerical-Reasoning-Aware Table Pretraining. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 1150–1166). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.82