BioPandas: Working with molecular structures in pandas DataFrames

Sebastian Raschka

The Journal of Open Source Software (2017) 2(14) 279

DOI: 10.21105/joss.00279

Abstract

BioPandas is a Python library that reads molecular structures from 3D-coordinate files, such as PDB (H. M. Berman 2000; H. Berman, Henrick, and Nakamura 2003) and MOL2, into pandas DataFrames (McKinney et al. 2010) for convenient data analysis and data mining. In addition to parsing protein and small-molecule data into a data frame format, BioPandas provides utility functions for structure analysis. These include common computations such as the root-mean-square deviation (RMSD) between structures and the conversion of protein structures into primary amino acid sequence formats. Furthermore, small-molecule functions are provided for reading and parsing millions of small-molecule structures from multi-MOL2 files (Tripos 2007) quickly and efficiently in virtual screening applications. Built-in functions for filtering molecules by the presence of functional groups and by the pairwise distances between those groups make BioPandas a particularly attractive utility library for virtual screening and protein-ligand docking applications.
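To make the data-frame idea concrete, the following is a minimal sketch written with plain pandas and NumPy. It does not call the BioPandas API. The toy coordinates, the column names (atom_name, residue_name, residue_number, x_coord, y_coord, z_coord) and the helper functions rmsd and sequence are assumptions made for illustration only. The RMSD here is computed without superposition, so both tables must list the same atoms in the same order.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row per atom, as a structure parser might produce.
COLUMNS = ["atom_name", "residue_name", "residue_number",
           "x_coord", "y_coord", "z_coord"]
structure_a = pd.DataFrame(
    [
        ("N", "MET", 1, 0.0, 0.0, 0.0),
        ("CA", "MET", 1, 1.0, 0.0, 0.0),
        ("CA", "GLY", 2, 2.0, 0.0, 0.0),
        ("CA", "LYS", 3, 3.0, 0.0, 0.0),
    ],
    columns=COLUMNS,
)
# Second conformation: every atom shifted by 1.0 along the z axis.
structure_b = structure_a.assign(z_coord=structure_a["z_coord"] + 1.0)

COORDS = ["x_coord", "y_coord", "z_coord"]
# Illustrative subset of the three-letter to one-letter residue codes.
THREE_TO_ONE = {"MET": "M", "GLY": "G", "LYS": "K"}


def rmsd(df1, df2):
    """RMSD between two equally ordered atom tables, without superposition."""
    if len(df1) != len(df2):
        raise ValueError("both tables must contain the same number of atoms")
    diff = df1[COORDS].to_numpy() - df2[COORDS].to_numpy()
    # Squared distance per atom, averaged over atoms, then square-rooted.
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def sequence(df):
    """One-letter sequence built from one row per residue number."""
    per_residue = df.drop_duplicates("residue_number").sort_values("residue_number")
    return "".join(per_residue["residue_name"].map(THREE_TO_ONE))


# Ordinary boolean indexing selects the C-alpha atoms of each structure.
ca_a = structure_a[structure_a["atom_name"] == "CA"]
ca_b = structure_b[structure_b["atom_name"] == "CA"]

print(rmsd(ca_a, ca_b))
print(sequence(structure_a))
```

Because each atom is a row, selections such as "all C-alpha atoms" reduce to standard pandas filtering, which is the convenience the abstract describes.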

Cite

Raschka, S. (2017). BioPandas: Working with molecular structures in pandas DataFrames. The Journal of Open Source Software, 2(14), 279. https://doi.org/10.21105/joss.00279
