This paper presents the person identification system developed at Athens Information Technology. It comprises an audio-only (speech) subsystem, a video-only (face) subsystem, and an audiovisual fusion subsystem. Audio recognition is based on Gaussian mixture modeling of the principal components of the Mel-frequency cepstral coefficients of speech. Video recognition is based on linear subspace projection methods, with temporal fusion of the results by weighted voting. Audiovisual fusion combines the unimodal identities into a multimodal one, using a suitable confidence metric for the results of the unimodal classifiers. © Springer-Verlag Berlin Heidelberg 2006.
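To make the two fusion steps concrete, the following is a minimal Python sketch of weighted voting across time followed by confidence-based fusion of two unimodal decisions. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the weights or the confidence metric, so the function names `weighted_vote` and `fuse`, the per-result weights, and the margin-based confidence (winner minus runner-up, normalized by the total vote mass) are all hypothetical choices.

```python
from collections import defaultdict


def weighted_vote(decisions):
    """Fuse a sequence of (identity, weight) decisions by weighted voting.

    Returns (winning_identity, confidence). The confidence used here is an
    ASSUMED metric: the margin between the top two vote totals divided by
    the total vote mass. The paper's actual metric is not given in the abstract.
    """
    if not decisions:
        raise ValueError("at least one decision is required")

    # Accumulate the weight of every vote cast for each identity.
    totals = defaultdict(float)
    for identity, weight in decisions:
        totals[identity] += weight

    # Rank identities by accumulated weight, highest first.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    best_id, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0

    total = sum(totals.values())
    confidence = (best_score - runner_up) / total if total > 0 else 0.0
    return best_id, confidence


def fuse(audio_result, video_result):
    """Pick the unimodal (identity, confidence) pair with higher confidence.

    This selection rule is an ASSUMPTION made for illustration only.
    """
    return audio_result if audio_result[1] >= video_result[1] else video_result


# Hypothetical per-result decisions and weights for each modality.
video = weighted_vote([("alice", 0.9), ("bob", 0.4), ("alice", 0.7), ("bob", 0.5)])
audio = weighted_vote([("bob", 0.6), ("alice", 0.5)])
fused_identity, fused_confidence = fuse(audio, video)
```

In this sketch the more confident modality simply determines the final identity; a real system could instead combine the two confidences, which the abstract leaves open.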
CITATION STYLE
Stergiou, A., Pnevmatikakis, A., & Polymenakos, L. (2007). A decision fusion system across time and classifiers for audio-visual person identification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4122 LNCS, pp. 223–232). Springer Verlag. https://doi.org/10.1007/978-3-540-69568-4_19