The problem of automatic Visual Language IDentification (VLID), i.e. identifying the language being spoken from visual information only, with no audio, is studied. The proposed method employs facial landmarks automatically detected in a video. A convex optimisation problem is formulated to jointly find both the discriminative representation (a soft histogram over a set of lip shapes) and the classifier. A 10-fold cross-validation on a dataset of 644 videos collected from youtube.com yields 73% accuracy in pairwise discrimination between English and French (50% chance level). A study using 10 videos suggests that the proposed method outperforms the average human in discriminating between the languages.
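The paper learns the lip-shape representation and the classifier jointly via one convex problem; as a simplified, non-authoritative illustration of the representation alone, the sketch below computes a video-level soft histogram over a fixed set of lip-shape prototypes. The prototypes, the feature dimensionality, and the softness parameter `beta` are assumptions for illustration, not the paper's learned quantities:

```python
import numpy as np

def soft_histogram(frames, prototypes, beta=1.0):
    """Video-level soft histogram over prototype lip shapes.

    frames:     (T, D) array, one landmark feature vector per frame
    prototypes: (K, D) array of prototype lip shapes (an assumption here;
                e.g. k-means centres, not the paper's jointly learned set)
    beta:       softness of the assignment (larger -> harder assignment)
    """
    # squared Euclidean distance from every frame to every prototype: (T, K)
    d2 = ((frames[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    # soft assignment of each frame to the prototypes (softmax of -beta * d2)
    logits = -beta * d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    # average per-frame assignments over time -> a K-bin histogram summing to 1
    return w.mean(axis=0)
```

The resulting K-dimensional histogram can then be fed to any linear classifier (e.g. logistic regression) for the pairwise English-vs-French decision.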
Špetlík, R., Čech, J., Franc, V., & Matas, J. (2017). Visual language identification from facial landmarks. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10270 LNCS, pp. 389–400). Springer Verlag. https://doi.org/10.1007/978-3-319-59129-2_33