Abstract
Two quite different strategies for characterising mouth shapes for visual speech recognition (lipreading) are compared. The first strategy extracts the parameters required to fit an active shape model (ASM) to the outline of the lips. The second uses a feature derived from a one-dimensional multiscale spatial analysis (MSA) of the mouth region, using a new processor derived from mathematical morphology and median filtering. In multispeaker trials using image data only, the accuracy on a letters database is 45% using MSA and 19% using ASM. The digits database is simpler, with accuracies of 77% for both methods. These scores are significant because separate work has demonstrated that even quite low recognition accuracies in the vision channel can be combined with the audio system to give improved composite performance.
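The multiscale spatial analysis mentioned above decomposes a signal by repeatedly smoothing it at increasing scales and recording the detail removed at each scale. As a rough illustration only (the paper's actual processor combines mathematical morphology and median filtering in a specific way not reproduced here), the following sketch shows one plausible form: a 1-D decomposition built from median filters of growing window size, where the per-scale difference signals ("granules") plus the final residual sum back to the input. All function names are illustrative, not taken from the paper.

```python
from statistics import median

def median_filter(signal, radius):
    """Median-filter a 1-D sequence with window 2*radius + 1,
    clamping the window at the signal boundaries."""
    n = len(signal)
    out = []
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        out.append(median(signal[lo:hi]))
    return out

def sieve_decompose(signal, max_scale):
    """Decompose a 1-D signal into per-scale difference signals
    ("granules") plus a smoothed residual.  By construction the
    granules and residual sum elementwise back to the input."""
    current = list(signal)
    granules = []
    for radius in range(1, max_scale + 1):
        smoothed = median_filter(current, radius)
        # Detail removed at this scale.
        granules.append([a - b for a, b in zip(current, smoothed)])
        current = smoothed
    return granules, current

# Toy 1-D intensity profile across a mouth-region scanline (made up).
row = [0, 0, 5, 5, 1, 9, 1, 5, 5, 0, 0]
granules, residual = sieve_decompose(row, 3)

# The decomposition telescopes: summing all granules and the
# residual reconstructs the original signal exactly.
recon = [sum(vals) + r for *vals, r in zip(*granules, residual)]
assert recon == row
```

Features for recognition could then be derived from the granule signals at each scale rather than from the raw pixels, which is the general idea behind the MSA feature, although the exact feature construction in the paper differs.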
Citation
Matthews, I., Bangham, A. A., Harvey, R., & Cox, S. (1998). A comparison of active shape model and scale decomposition based features for visual speech recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1407, pp. 514–528). Springer Verlag. https://doi.org/10.1007/BFb0054762