An oral exam for measuring a dialog system's capabilities

Citations: 3
Readers (Mendeley): 42

Abstract

This paper suggests a model and methodology for measuring the breadth and flexibility of a dialog system's capabilities. The approach relies on having human evaluators administer a targeted oral exam to a system and provide their subjective views of that system's performance on each test problem. We present results from one instantiation of this test performed on two publicly accessible dialog systems and a human, and show that the suggested metrics provide useful insights into the relative strengths and weaknesses of these systems. Results suggest that this approach can be performed with reasonable reliability and with reasonable amounts of effort. We hope that authors will augment their reporting with this approach to improve clarity and make more direct progress toward broadly capable dialog systems.

Citation (APA)

Cohen, D., & Lane, I. (2016). An oral exam for measuring a dialog system’s capabilities. In 30th AAAI Conference on Artificial Intelligence, AAAI 2016 (pp. 835–841). AAAI press. https://doi.org/10.1609/aaai.v30i1.10060
