Hashing is a fundamental tool in digital forensic analysis used both to ensure data integrity and to efficiently identify known data objects. However, despite many years of practice, its basic use has advanced little. Our objective is to leverage advanced hashing techniques in order to improve the efficiency and scalability of digital forensic analysis. Specifically, we explore the use of Bloom filters as a means to efficiently aggregate and search hashing information. In this paper, we present md5bloom-an actual Bloom filter manipulation tool that can be incorporated into forensic practice, along with example uses and experimental results. We also provide a basic theoretical foundation, which quantifies the error rates associated with the various Bloom filter uses along with a simulation-based verification. We provide a probabilistic framework that allows the interpretation of direct, bitwise comparison of Bloom filters to infer similarity and abnormality. Using the similarity interpretation, it is possible to efficiently identify versions of a known object, whereas the notion of abnormality could aid in identifying tampered hash sets. © 2006 DFRWS.
Roussev, V., Chen, Y., Bourg, T., & Richard, G. G. (2006). md5bloom: Forensic filesystem hashing revisited. Digital Investigation, 3(SUPPL.), 82–90. https://doi.org/10.1016/j.diin.2006.06.012