A multimodal system for acquiring new objects, updating already known ones, and searching for them is presented. The system learns objects and associates them with speech received from a speech recogniser in a natural and convenient fashion. The learning and retrieval processes take into account information from multiple attributes calculated from an image recorded by a standard video camera, from deictic gestures, and from a dialog-based conversation. Histogram intersection and subgraph matching on segmented color regions are used as attributes. © Springer-Verlag Berlin Heidelberg 2002.
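As a rough illustration of the histogram intersection attribute named above, the following minimal sketch compares a query color histogram against stored object histograms. The abstract does not state the binning, color space, or normalization used, so the model-normalized form here and the helper names `histogram_intersection` and `best_match` are assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, Sequence


def histogram_intersection(image_hist: Sequence[float],
                           model_hist: Sequence[float]) -> float:
    """Sum of bin-wise minima, divided by the model histogram's total count.

    Assumption: normalizing by the model histogram is one common convention;
    the abstract does not say which normalization the system uses.
    """
    if len(image_hist) != len(model_hist):
        raise ValueError("histograms must have the same number of bins")
    total = sum(model_hist)
    if total == 0:
        return 0.0  # an empty model histogram cannot match anything
    overlap = sum(min(a, b) for a, b in zip(image_hist, model_hist))
    return overlap / total


def best_match(query_hist: Sequence[float],
               models: Dict[str, Sequence[float]]) -> str:
    """Return the label of the stored histogram with the highest score.

    Hypothetical helper: labels stand in for the words a speech
    recogniser might associate with learned objects.
    """
    return max(models,
               key=lambda name: histogram_intersection(query_hist, models[name]))


if __name__ == "__main__":
    # Toy 3-bin histograms; real color histograms would have many more bins.
    known_objects = {"cup": [4, 0, 0], "ball": [0, 0, 4]}
    print(best_match([3, 1, 0], known_objects))
```

A score of 1.0 means the query covers every count in the stored histogram, and 0.0 means the two histograms share no populated bins. A retrieval step could combine such a score with the other attributes the abstract mentions, though how the system weights them is not described here.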
CITATION STYLE
Lömker, F., & Sagerer, G. (2002). A multimodal system for object learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2449 LNCS, pp. 490–497). Springer Verlag. https://doi.org/10.1007/3-540-45783-6_59