Pandas DataFrames for a FAST binned analysis at CMS

  • Krikler B
  • Davignon O
  • Kreczko L
  • et al.

Abstract

Binned data frames are a generalisation of multi-dimensional histograms, represented in a tabular format with one category per row containing the labels, bin contents, uncertainties and so on. Pandas is an industry-standard tool that provides a data frame implementation complete with routines for data frame manipulation, persistence, visualisation, and easy access to “big data” scientific libraries and machine learning tools. FAST (the Faster Analysis Software Taskforce) has developed a generic approach for typical binned HEP analyses, in which a YAML-based analysis description drives the summarising of ROOT Trees into multiple binned DataFrames. Using Continuous Integration to run subsets of the analysis, we can monitor and test changes to the analysis itself, and deploy documentation automatically. This report describes the approach using examples from a public CMS tutorial and details its benefits over traditional methods.
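As an illustration of the binned data frame idea described in the abstract, the following is a minimal sketch using plain pandas, not the FAST tools themselves. The column names (`dataset`, `pt`, `weight`), the bin edges, and the toy event values are all assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Toy per-event table; every name and value here is an illustrative assumption.
events = pd.DataFrame({
    "dataset": ["data", "data", "data", "mc", "mc", "mc", "mc"],
    "pt": [12.0, 35.0, 38.0, 8.0, 22.0, 27.0, 55.0],
    "weight": [1.0, 1.0, 1.0, 0.5, 0.5, 2.0, 1.5],
})

# Assign each event to a half-open pt bin [low, high).
pt_edges = [0, 20, 40, 60]
events["pt_bin"] = pd.cut(events["pt"], bins=pt_edges, right=False)
events["weight_sq"] = events["weight"] ** 2

# One row per (dataset, pt_bin) category: event count, sum of weights,
# and sum of squared weights. observed=False keeps empty bins in the table.
binned = events.groupby(["dataset", "pt_bin"], observed=False).agg(
    n=("weight", "size"),
    sumw=("weight", "sum"),
    sumw2=("weight_sq", "sum"),
)

# Statistical uncertainty on the weighted bin content.
binned["err"] = np.sqrt(binned["sumw2"])

print(binned)
```

Each row of `binned` carries its labels in the index and its contents and uncertainty in the columns, so the usual pandas operations (selection, further grouping, writing to disk) apply to the histogram directly.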

Cite

Krikler, B. E., Davignon, O., Kreczko, L., Linacre, J., Olaiya, E. O., & Sakuma, T. (2019). Pandas DataFrames for a FAST binned analysis at CMS. EPJ Web of Conferences, 214, 06035. https://doi.org/10.1051/epjconf/201921406035