This paper describes an approach to automated segmentation of the tongue in camera images for computer-aided speech diagnosis and therapy. Speech disorders are often related to non-normative positions of the articulators. One common pathology in Polish pronunciation is interdentality, in which the tongue protrudes between the front teeth. Segmentation and possible parametrization of the tongue in camera images could support speech diagnosis. The presented system is based on images captured by two cameras directed at the speaker’s mouth from different angles, one on the left side and one on the right. A convolutional neural network was designed and trained for semantic segmentation of the tongue. Three datasets were examined: one from each camera separately and one combining both cameras. On the combined dataset, the mean Jaccard index reached 74.01%, with a corresponding accuracy of 96.09%.
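The two reported metrics can differ widely on the same prediction, because the tongue typically covers only a small part of the frame. The sketch below illustrates this with the textbook definitions of the Jaccard index (intersection over union of foreground pixels) and pixel accuracy for binary masks. It is an illustration only: the function names, the toy masks, and the empty-mask convention are assumptions, and the abstract does not state exactly how the authors computed or averaged their scores.

```python
from typing import Sequence

# A binary mask as rows of 0/1 values (1 = tongue pixel). Hypothetical type alias.
Mask = Sequence[Sequence[int]]


def jaccard_index(pred: Mask, truth: Mask) -> float:
    """Intersection over union of the foreground pixels of two binary masks."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


def pixel_accuracy(pred: Mask, truth: Mask) -> float:
    """Fraction of pixels (foreground and background) labelled identically."""
    correct = 0
    total = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if bool(p) == bool(t):
                correct += 1
            total += 1
    # Assumed convention: empty inputs count as a perfect match.
    return correct / total if total else 1.0


# Toy 3x4 masks (made up for illustration, not data from the paper).
pred = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
truth = [
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]

print(f"Jaccard index:  {jaccard_index(pred, truth):.4f}")   # 3 / 5
print(f"Pixel accuracy: {pixel_accuracy(pred, truth):.4f}")  # 10 / 12
```

In this toy case the masks share 3 of the 5 foreground pixels in their union, while 10 of 12 pixels overall agree. Correctly labelled background raises accuracy but does not enter the Jaccard index, which is one plausible reason an accuracy figure can sit well above the Jaccard figure for a small target region.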
CITATION STYLE
Sage, A., Miodońska, Z., Kręcichwost, M., Trzaskalik, J., Kwaśniok, E., & Badura, P. (2021). Deep learning approach to automated segmentation of tongue in camera images for computer-aided speech diagnosis. In Advances in Intelligent Systems and Computing (Vol. 1186, pp. 41–51). Springer. https://doi.org/10.1007/978-3-030-49666-1_4