Abstract
This paper presents a new interactive learning method for spoken-word acquisition through a human-machine audio-visual interface. During learning, the machine decides, using both speech and visual cues, whether an orally input word belongs to the lexicon it has learned so far. Learning is carried out on-line and incrementally, based on a combination of active and unsupervised learning principles. If the machine judges with high confidence that its decision is correct, it learns the statistical models of the word and a corresponding image category as its meaning in an unsupervised way; otherwise, it actively asks the user a question. The function used to estimate the degree of confidence is itself learned adaptively on-line. Experimental results show that combining active and unsupervised learning principles enables the machine and the user to adapt to each other, making the learning process more efficient. Copyright © 2008 The Institute of Electronics, Information and Communication Engineers.
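The decision rule sketched in the abstract can be illustrated in outline: combine speech and visual evidence into a confidence value, then either learn unsupervised or query the user. The following Python sketch is purely illustrative; the function names, the linear cue combination, and the fixed threshold are assumptions for exposition, not the paper's actual formulation (the paper learns its confidence function adaptively on-line).

```python
# Hypothetical sketch of the confidence-gated learning loop described in the
# abstract. All names and the threshold value are illustrative assumptions.

def confidence(speech_score: float, visual_score: float) -> float:
    """Combine speech and visual cues into one confidence value.

    The paper learns this function adaptively on-line; a fixed linear
    combination is used here only as a stand-in.
    """
    return 0.5 * speech_score + 0.5 * visual_score

def learning_step(speech_score: float, visual_score: float,
                  threshold: float = 0.8) -> str:
    """Decide between unsupervised learning and asking the user."""
    if confidence(speech_score, visual_score) >= threshold:
        # High confidence: learn word and image-category models
        # from the input alone (unsupervised).
        return "learn_unsupervised"
    # Low confidence: fall back to active learning and query the user.
    return "ask_user"

print(learning_step(0.9, 0.9))  # high-confidence input
print(learning_step(0.3, 0.4))  # ambiguous input
```

The threshold here is static for simplicity; adapting the confidence estimator itself, as the paper does, is what lets the machine and user co-adapt over time.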
Iwahashi, N. (2008). Interactive learning of spoken words and their meanings through an audio-visual interface. IEICE Transactions on Information and Systems, E91-D(2), 312–321. https://doi.org/10.1093/ietisy/e91-d.2.312