In this paper, we present a parallel architecture for fast and robust face detection implemented on FPGA hardware. We propose the first implementation that meets both real-time requirements in an embedded context and face detection robustness within complex backgrounds. The chosen face detection method is the Convolutional Face Finder (CFF) algorithm, which consists of a pipeline of convolution and subsampling operations, followed by a multilayer perceptron. We present the design methodology of our face detection processor element (PE), which we followed in order to optimize our implementation in terms of memory usage and parallelization efficiency. We then built a parallel architecture composed of a PE ring and a FIFO memory, resulting in a scalable system capable of processing images of different sizes. A ring of 25 PEs running at 80 MHz is able to process 127 QVGA images per second and to perform real-time face detection on VGA images (35 images per second).
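To put the reported frame rates on a common scale, the short sketch below converts them into input-pixel throughput and clock cycles per input pixel. It assumes the standard resolutions of 320×240 for QVGA and 640×480 for VGA, which the abstract does not spell out, and it counts only pixels of the input image at native resolution. Any additional work the detector performs per image is not modeled, so the figures are a rough indication only. The names `pixel_rate` and `cycles_per_pixel` are illustrative.

```python
CLOCK_HZ = 80_000_000  # ring clock frequency stated in the abstract

# (width, height, frames per second); resolutions are assumed standard values
FORMATS = {
    "QVGA": (320, 240, 127),
    "VGA": (640, 480, 35),
}


def pixel_rate(width, height, fps):
    """Input pixels consumed per second at the given frame rate."""
    return width * height * fps


def cycles_per_pixel(width, height, fps, clock_hz=CLOCK_HZ):
    """Clock cycles available per input pixel for the whole ring."""
    return clock_hz / pixel_rate(width, height, fps)


for name, (w, h, fps) in FORMATS.items():
    print(f"{name}: {pixel_rate(w, h, fps):,} pixels/s, "
          f"{cycles_per_pixel(w, h, fps):.2f} clock cycles per input pixel")
```

Under these assumptions the two operating points are close to each other, at roughly 9.8 and 10.8 million input pixels per second, which is consistent with the claim that the architecture scales across image sizes.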
Farrugia, N., Mamalet, F., Roux, S., Yang, F., & Paindavoine, M. (2009). Fast and robust face detection on a parallel optimized architecture implemented on FPGA. IEEE Transactions on Circuits and Systems for Video Technology, 19(4), 597–602. https://doi.org/10.1109/TCSVT.2009.2014013