ProIO is a new event-oriented streaming data format that uses Google's Protocol Buffers (protobuf) to be flexible and highly language-neutral. The ProIO concept is described here along with its software implementations. The performance of ProIO on a dataset of Monte Carlo event records used in high-energy physics was benchmarked and compared with that of ROOT I/O. Various combinations of general-purpose compression and the variable-length integer encoding available in protobuf were used to investigate the relationship between I/O performance and size on disk in a few key scenarios.

Program summary
Program Title: ProIO
Program Files doi: http://dx.doi.org/10.17632/mfxsg2d5x5.1
Licensing provisions: BSD 3-clause
Programming language: Python, Go, C++, Java
Nature of problem: In high-energy and nuclear physics (HEP and NP), Google's Protocol Buffers (protobufs) can be a useful tool for the persistence of data. However, protobufs are not well suited to describing large, rich datasets. Additionally, direct event access, lazy event decoding, general-purpose compression, and self-description are important to HEP and NP but are missing from protobuf.
Solution method: The solution adopted here is to describe and implement a streaming format that wraps protobufs in an event structure. This solution requires small implementations of the format (typically fewer than 1000 lines of code) in the desired programming languages. With this approach, most of the I/O heavy lifting is done by the protobufs, and ProIO adds the necessary physics-oriented features.
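The variable-length integer encoding mentioned above can be illustrated with a minimal Python sketch of protobuf's base-128 varint and zigzag schemes. This is not ProIO code, and the function names (`encode_varint`, `decode_varint`, `zigzag64`, `unzigzag64`) are hypothetical helpers chosen for illustration. It shows why the choice of integer encoding can affect size on disk: small magnitudes fit in one byte, while a negative 64-bit value costs ten bytes unless it is zigzag-mapped first.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint.

    Each output byte carries 7 payload bits, least-significant group first;
    the high bit is set on every byte except the last.
    """
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # final byte
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def zigzag64(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return (n << 1) ^ (n >> 63)


def unzigzag64(z: int) -> int:
    """Invert zigzag64."""
    return (z >> 1) ^ -(z & 1)


if __name__ == "__main__":
    # A plain signed field stores -1 as its 64-bit two's complement value.
    plain = encode_varint(-1 & 0xFFFFFFFFFFFFFFFF)
    zigzagged = encode_varint(zigzag64(-1))
    print(len(plain), len(zigzagged))
```

In a protobuf schema this trade-off corresponds to the choice between types such as `int64`, `sint64`, and fixed-width fields, which is one knob for balancing encoded size against decoding cost.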
Blyth, D., Alcaraz, J., Binet, S., & Chekanov, S. V. (2019). ProIO: An event-based I/O stream format for protobuf messages. Computer Physics Communications, 241, 98–112. https://doi.org/10.1016/j.cpc.2019.03.018