Abstract
Digital forensic examiners often need to identify the type of a file or file fragment based on the content of the file. Content-based file type identification schemes typically use a byte frequency distribution with statistical machine learning to classify file types. Most algorithms analyze the entire file content to obtain the byte frequency distribution, a technique that is inefficient and time consuming. This paper proposes two techniques for reducing the classification time. The first technique selects a subset of features based on the frequency of occurrence. The second speeds up classification by randomly sampling file blocks. Experimental results demonstrate that up to a fifteen-fold reduction in computational time can be achieved with limited impact on accuracy.
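The two speed-ups described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 512-byte block size, the number of sampled blocks, and the value of k are all assumptions chosen for illustration, and the classifier that would consume the reduced feature vector is left out. The first function builds a byte frequency distribution from a random sample of blocks rather than the whole file; the second keeps only the byte values that occur most frequently across a set of training histograms.

```python
import random
from collections import Counter


def sampled_byte_histogram(data, block_size=512, num_blocks=8, seed=0):
    """Normalized byte frequency distribution from randomly sampled blocks.

    block_size, num_blocks and seed are illustrative defaults, not values
    taken from the paper.
    """
    total_blocks = (len(data) + block_size - 1) // block_size
    rng = random.Random(seed)
    # Pick block indices without replacement; use every block if the
    # input has fewer blocks than requested.
    chosen = rng.sample(range(total_blocks), min(num_blocks, total_blocks))
    counts = Counter()
    for index in chosen:
        start = index * block_size
        counts.update(data[start:start + block_size])  # iterates byte values
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 256
    return [counts[b] / total for b in range(256)]


def top_k_features(histograms, k=32):
    """Indices of the k byte values with the highest summed frequency."""
    totals = [sum(h[b] for h in histograms) for b in range(256)]
    return sorted(range(256), key=lambda b: totals[b], reverse=True)[:k]


def project(histogram, features):
    """Reduce a 256-bin histogram to the selected byte values only."""
    return [histogram[b] for b in features]
```

A reduced vector such as `project(sampled_byte_histogram(data), features)` would then be passed to whichever classifier is in use. Both the number of bytes read per file and the number of features per vector shrink, which is where the reduction in classification time would be expected to come from.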
Ahmed, I., Lhee, K. S., Shin, H. J., & Hong, M. P. (2011). Fast content-based file type identification. In IFIP Advances in Information and Communication Technology (Vol. 361, pp. 65–75). Springer Science and Business Media, LLC. https://doi.org/10.1007/978-3-642-24212-0_5