In this paper, we propose a visual speech recognition method based on symbol or real-value assignment. Our method is inspired by the Bag of Words (BoW) model [1], which is usually applied to object matching problems. In the BoW model, a codebook is produced by K-means clustering, and each feature vector extracted from an image is converted to the corresponding symbol. Similarly, we generate a codebook by running the K-means algorithm on a pool of PHOG (Pyramid Histogram of Oriented Gradients) feature vectors extracted from a subset of a lip database. Each remaining lip image is then assigned a value by comparing its chi-square distance to each cluster. Depending on the type of this value, we suggest two methods for assigning it to a lip image frame. The first finds the cluster containing the element image with the minimum chi-square distance to the frame being processed and assigns that cluster's label to the frame. The second computes the distances between the frame and all cluster centroids, yielding a multi-dimensional vector that directly becomes the value assigned to the frame. With these methods, each time sequence is converted into either a symbol sequence or a multi-dimensional real-valued sequence. To measure the similarity between two time sequences, we use Dynamic Time Warping for real-valued sequences and edit distance for symbol sequences. © 2013 Springer-Verlag.
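The two assignment schemes and the two sequence distances described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the codebook `centroids` is assumed to have been computed beforehand by K-means, the symbol assignment compares against centroids rather than individual cluster members for simplicity, the chi-square distance uses one common form (with a 0.5 factor), and the DTW local cost is assumed to be Euclidean.

```python
import numpy as np

def chi_square(a, b, eps=1e-10):
    """One common form of the chi-square distance between two histograms."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))

def assign_symbol(frame, centroids):
    """Symbol assignment: index of the nearest centroid (simplifying assumption)."""
    return int(np.argmin([chi_square(frame, c) for c in centroids]))

def assign_vector(frame, centroids):
    """Real-value assignment: vector of distances to every centroid."""
    return np.array([chi_square(frame, c) for c in centroids])

def edit_distance(s, t):
    """Levenshtein distance between two symbol sequences (unit costs)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (a != b)))  # substitution or match
        prev = cur
    return prev[-1]

def dtw(x, y):
    """Dynamic Time Warping cost between two sequences of real-valued vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])  # assumed Euclidean local cost
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```

In such a sketch, each utterance would be mapped frame by frame with `assign_symbol` or `assign_vector`, and a query sequence would then be compared with stored sequences using `edit_distance` or `dtw`, respectively.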
Ju, J., Jung, H., & Kim, J. (2013). Speaker dependent visual speech recognition by symbol and real value assignment. In Advances in Intelligent Systems and Computing (Vol. 208 AISC, pp. 1015–1022). Springer Verlag. https://doi.org/10.1007/978-3-642-37374-9_98