Interactive learning of spoken words and their meanings through an audio-visual interface


Abstract

This paper presents a new interactive learning method for spoken word acquisition through human-machine audio-visual interfaces. During the course of learning, the machine makes a decision about whether an orally input word is a word in the lexicon the machine has learned, using both speech and visual cues. Learning is carried out on-line, incrementally, based on a combination of active and unsupervised learning principles. If the machine judges with a high degree of confidence that its decision is correct, it learns the statistical models of the word and a corresponding image category as its meaning in an unsupervised way. Otherwise, it asks the user a question in an active way. The function used to estimate the degree of confidence is also learned adaptively on-line. Experimental results show that the combination of active and unsupervised learning principles enables the machine and the user to adapt to each other, which makes the learning process more efficient. Copyright © 2008 The Institute of Electronics, Information and Communication Engineers.
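The abstract's decision loop can be sketched as follows. This is a toy illustration of the confidence-gated combination of unsupervised and active learning it describes, not the paper's actual statistical models; the names `confidence`, `THRESHOLD`, and the linear cue-combination rule are illustrative assumptions.

```python
THRESHOLD = 0.8  # hypothetical confidence cutoff (not from the paper)

def confidence(speech_score: float, visual_score: float) -> float:
    """Combine speech and visual cues into one confidence value (toy rule)."""
    return 0.5 * speech_score + 0.5 * visual_score

def learn_interactively(observations, lexicon):
    """For each (word, speech cue, visual cue) observation:
    high confidence -> update the word/image models unsupervised;
    low confidence  -> fall back to active learning and query the user."""
    questions_asked = 0
    for word, speech_score, visual_score in observations:
        if confidence(speech_score, visual_score) >= THRESHOLD:
            # Accept the machine's own decision and learn without supervision.
            lexicon.setdefault(word, []).append((speech_score, visual_score))
        else:
            # Uncertain: ask the user a question instead of guessing.
            questions_asked += 1
    return questions_asked
```

In the paper, the confidence function itself is also adapted on-line, which this fixed toy rule does not capture.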

Citation (APA)

Iwahashi, N. (2008). Interactive learning of spoken words and their meanings through an audio-visual interface. IEICE Transactions on Information and Systems, E91-D(2), 312–321. https://doi.org/10.1093/ietisy/e91-d.2.312
