pandera: Statistical Data Validation of Pandas Dataframes

  • Bantilan N

Abstract

pandas is an essential tool in the data scientist's toolkit for modern data engineering, analysis, and modeling in the Python ecosystem. However, dataframes can often be difficult to reason about in terms of their data types and statistical properties as data is reshaped from its raw form to one that's ready for analysis. Here, I introduce pandera, an open source package that provides a flexible and expressive data validation API designed to make it easy for data wranglers to define dataframe schemas. These schemas execute logical and statistical assertions at runtime so that analysts can spend less time worrying about the correctness of their dataframes and more time obtaining insights and training models.

Cite

Bantilan, N. (2020). pandera: Statistical Data Validation of Pandas Dataframes. In Proceedings of the 19th Python in Science Conference (pp. 116–124). SciPy. https://doi.org/10.25080/majora-342d178e-010
