Fast content-based file type identification

Abstract

Digital forensic examiners often need to identify the type of a file or file fragment based on the content of the file. Content-based file type identification schemes typically use a byte frequency distribution with statistical machine learning to classify file types. Most algorithms analyze the entire file content to obtain the byte frequency distribution, a technique that is inefficient and time-consuming. This paper proposes two techniques for reducing the classification time. The first technique selects a subset of features based on the frequency of occurrence. The second speeds up classification by randomly sampling file blocks. Experimental results demonstrate that up to a fifteen-fold reduction in computational time can be achieved with limited impact on accuracy.
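
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the block size, the number of sampled blocks, or the exact selection criterion. The function names (`sampled_byte_frequencies`, `top_k_features`, `project`), the 512-byte default block size, and the "highest mean frequency" selection rule are all assumptions made for illustration. The classifier that would consume the resulting feature vector is left out.

```python
import random
from typing import List, Optional, Sequence


def sampled_byte_frequencies(
    data: bytes,
    block_size: int = 512,      # assumed block size, not taken from the paper
    num_blocks: int = 8,        # assumed sample size, not taken from the paper
    seed: Optional[int] = None,
) -> List[float]:
    """Estimate a 256-bin byte frequency distribution from randomly chosen blocks."""
    rng = random.Random(seed)
    # Number of blocks in the input (ceiling division), at least one.
    total_blocks = max(1, -(-len(data) // block_size))
    # Sample block indices without replacement; read everything if the input is small.
    chosen = rng.sample(range(total_blocks), min(num_blocks, total_blocks))

    counts = [0] * 256
    sampled = 0
    for index in chosen:
        block = data[index * block_size:(index + 1) * block_size]
        for byte in block:
            counts[byte] += 1
        sampled += len(block)

    if sampled == 0:
        return [0.0] * 256
    return [count / sampled for count in counts]


def top_k_features(distributions: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the k byte values with the highest summed frequency (assumed criterion)."""
    totals = [0.0] * 256
    for dist in distributions:
        for byte_value, freq in enumerate(dist):
            totals[byte_value] += freq
    ranked = sorted(range(256), key=lambda b: totals[b], reverse=True)
    return sorted(ranked[:k])


def project(distribution: Sequence[float], features: Sequence[int]) -> List[float]:
    """Keep only the selected byte values as the reduced feature vector."""
    return [distribution[b] for b in features]
```

In this sketch, the cost of building the distribution depends on `num_blocks * block_size` rather than on the file size, and the classifier would see `k` features instead of 256. Both reductions trade some accuracy for speed, which is the trade-off the abstract reports.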

Cite

Ahmed, I., Lhee, K. S., Shin, H. J., & Hong, M. P. (2011). Fast content-based file type identification. In IFIP Advances in Information and Communication Technology (Vol. 361, pp. 65–75). Springer Science and Business Media, LLC. https://doi.org/10.1007/978-3-642-24212-0_5
