Abstract
Two quite different strategies for characterising mouth shapes for visual speech recognition (lipreading) are compared. The first strategy extracts the parameters required to fit an active shape model (ASM) to the outline of the lips. The second uses a feature derived from a one-dimensional multiscale spatial analysis (MSA) of the mouth region, using a new processor derived from mathematical morphology and median filtering. In multispeaker trials using image data only, the accuracy on a letters database is 45% using MSA and 19% using ASM. The digits database is simpler, with accuracies of 77% for both methods. These scores are significant because separate work has demonstrated that even quite low recognition accuracies in the vision channel can be combined with the audio system to give improved composite performance.
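The multiscale spatial analysis mentioned above decomposes a signal by repeatedly smoothing it at increasing scales and recording the detail removed at each scale. As a rough illustration only (the paper's actual processor combines mathematical morphology and median filtering in a specific way not reproduced here), the following sketch shows one plausible form: a 1-D decomposition built from median filters of growing window size, where the per-scale difference signals ("granules") plus the final residual sum back to the input. All function names are illustrative, not taken from the paper.

```python
from statistics import median

def median_filter(signal, radius):
    """Median-filter a 1-D sequence with window 2*radius + 1,
    clamping the window at the signal boundaries."""
    n = len(signal)
    out = []
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        out.append(median(signal[lo:hi]))
    return out

def sieve_decompose(signal, max_scale):
    """Decompose a 1-D signal into per-scale difference signals
    ("granules") plus a smoothed residual.  By construction the
    granules and residual sum elementwise back to the input."""
    current = list(signal)
    granules = []
    for radius in range(1, max_scale + 1):
        smoothed = median_filter(current, radius)
        # Detail removed at this scale.
        granules.append([a - b for a, b in zip(current, smoothed)])
        current = smoothed
    return granules, current

# Toy 1-D intensity profile across a mouth-region scanline (made up).
row = [0, 0, 5, 5, 1, 9, 1, 5, 5, 0, 0]
granules, residual = sieve_decompose(row, 3)

# The decomposition telescopes: summing all granules and the
# residual reconstructs the original signal exactly.
recon = [sum(vals) + r for *vals, r in zip(*granules, residual)]
assert recon == row
```

Features for recognition could then be derived from the granule signals at each scale rather than from the raw pixels, which is the general idea behind the MSA feature, although the exact feature construction in the paper differs.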
Citation
Matthews, I., Bangham, A. A., Harvey, R., & Cox, S. (1998). A comparison of active shape model and scale decomposition based features for visual speech recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1407, pp. 514–528). Springer Verlag. https://doi.org/10.1007/BFb0054762