Journal of Management Information Systems / Summer 2011, Vol. 28, No. 1, pp. 17–48. © 2011 M.E. Sharpe, Inc. 0742–1222 / 2011 $9.50 + 0.00. DOI 10.2753/MIS0742-1222280102

Embodied Conversational Agent–Based Kiosk for Automated Interviewing

Jay F. Nunamaker Jr., Douglas C. Derrick, Aaron C. Elkins, Judee K. Burgoon, and Mark W. Patton

Jay F. Nunamaker Jr. is Regents and Soldwedel Professor of MIS, Computer Science, and Communication and Director of the Center for the Management of Information at the University of Arizona, Tucson. He received his Ph.D. in operations research and systems engineering from Case Institute of Technology, an M.S. and B.S. in engineering from the University of Pittsburgh, and a B.S. from Carnegie Mellon University. He received his professional engineer's license in 1965. In a 2005 journal article in Communications of the AIS, he was recognized as the fourth- to sixth-most-productive researcher for the period 1991–2003. Dr. Nunamaker received the LEO Award from the Association for Information Systems (AIS) at the International Conference on Information Systems in Barcelona, Spain, December 2002, and was elected as a fellow of the AIS in 2000.

Douglas C. Derrick is an assistant professor of IT innovation at the University of Nebraska at Omaha. He received his Ph.D. in management information systems from the University of Arizona, an M.S. in computer science from Texas A&M University, an MBA from San Jose State University, and a B.S. from the U.S. Air Force Academy. His research interests include human–computer interaction, human–agent interaction, agent-based computing, and persuasive technology.

Aaron C. Elkins is a Ph.D. candidate at the University of Arizona. His research interests include credibility assessment, physiological and behavioral measurement, human–computer interaction, cognitive dissonance, and the role of identity and culture in IS and organizations.

Judee K. Burgoon is a professor of communications, a professor of family studies and human development, the director of human communication research for the Center for the Management of Information, and the site director of the Center for Identification Technology Research at the University of Arizona. Professor Burgoon holds a Ph.D. in communication and educational psychology from West Virginia University. Her research interests are in deception, trust, interpersonal interaction, and new technologies.

Mark W. Patton is the associate director of the Hoffman E-Commerce Laboratory at the University of Arizona. He holds a Ph.D. from the University of Arizona in management information systems. His research interests include decision support systems for automated deception identification, human–computer interaction with embodied conversational agents, organizational modeling and simulation, and agent-based systems.
Abstract: We have created an automated kiosk that uses embodied intelligent agents to interview individuals and detect changes in arousal, behavior, and cognitive effort by using psychophysiological information systems. In this paper, we describe the system and propose a unique class of intelligent agents, which are described as Special Purpose Embodied Conversational Intelligence with Environmental Sensors (SPECIES). SPECIES agents use heterogeneous sensors to detect human physiology and behavior during interactions, and they affect their environment by influencing human behavior using various embodied states (i.e., gender and demeanor), messages, and recommendations. Based on the SPECIES paradigm, we present three studies that evaluate different portions of the model, and these studies are used as foundational research for the development of the automated kiosk. The first study evaluates human–computer interaction and how SPECIES agents can change perceptions of information systems by varying appearance and demeanor. Instantiations that had the agents embodied as males were perceived as more powerful, while female embodied agents were perceived as more likable. Similarly, smiling agents were perceived as more likable than neutral demeanor agents. The second study demonstrated that a single sensor measuring vocal pitch provides SPECIES with environmental awareness of human stress and deception. The final study ties the first two studies together and demonstrates an avatar-based kiosk that asks questions and measures the responses using vocalic measurements.

Key words and phrases: avatars, deception detection, embodied conversational agents, NeuroIS.

The motivation for this paper comes from three simple axioms: people deceive, humans are poor lie detectors, and there are many circumstances when credibility must be rapidly and accurately determined.
In this paper, we describe a new information technology (IT) artifact called the Embodied Conversational Agent (ECA)–Based Kiosk for Automated Interviewing. This system was developed using a design science approach and was initially funded by the Department of Homeland Security to evaluate how embodied agents can be used to automate port-of-entry screening processes. The kiosk is built on a new class of intelligent agents, which we call Special Purpose Embodied Conversational Intelligence with Environmental Sensors (SPECIES) agents. The ability of automated agents to adapt and learn from new information makes them ideal for dealing with complex and diverse phenomena. However, human behavior in a real-world environment is stochastic, continuous, dynamic, and difficult to represent succinctly to a computer. These agents were created and put into the kiosk for the express purpose of conducting an automated interview and determining veracity during the interaction. Intelligent agents are often used to aid humans in making complex decisions and rely on artificial intelligence to evaluate context, situation, and input from multiple sensors in order to provide a distinctive recommendation. These agent-based systems make knowledge-based recommendations and exhibit human characteristics such as rationality, intelligence, autonomy, and environmental perception. In this case, the environmental perception is based on human behavior and physiological responses.
The kiosks with the embedded SPECIES agents have the potential to help improve the effectiveness of screening environments for three primary reasons. First, they can be replicated and deployed to alleviate the traffic load placed on human agents and can be built to speak a variety of languages. Second, they do not suffer from the fatigue, cognitive limitations, or biases that interfere with the quality of screening at checkpoints. Third, automated embodied agents can detect cues of deception and malicious intent that would normally be very difficult for a trained human to detect.

Figure 1 shows the automated kiosk that was created based on this line of research. The kiosk contains a high-definition video camera, a near-infrared camera, a microphone, two computer monitors (the lower one is an integrated touch screen), a proximity card reader, a fingerprint reader, and a magnetic strip reader. Each sensor was selected based on prior research to detect deception-based cues.

Figure 1. Kiosk for Automated Interviewing

In the next section, we describe the underlying SPECIES model that comprises the heart of this particular instantiation. Next, we share the results of three studies that examine different components of the model, and one that demonstrates this kiosk implementation. In the first study, we demonstrate how manipulating the embodied agent's gender and demeanor affects users' perceptions of the system's power, likability, expertise, and trustworthiness. In the second study, we share results that demonstrate how the vocal sensors can be used to detect emotion, arousal, and cognitive effort. In the final study, we use the "most powerful" avatar agent to conduct an interview of people passing through a checkpoint with a simulated bomb and use the vocal sensor to find relevant correlates of those who are "guilty" (carrying the bomb) and those who are "innocent" (carrying only clothes).

Embodied Adaptive Intelligent Agent Model

The SPECIES system model encapsulated in the kiosk encompasses five broad research components: user interfaces, intelligent agents, sensors, data management, and organizational impacts. This paradigm closely relates to well-known intelligent agent architectures [37, 38], with some key distinctions. Like most intelligent agent systems, the paradigm for embodied-avatar interactions with humans involves an agent that perceives its environment through sensors, influences its environment via effectors, and has discrete goals. In computer science terms, an effector is a device used or action taken by an artificially intelligent agent in order to produce a desired change in an object or environment in response to input. However, the SPECIES operating environment consists primarily of human actors in the real world, which makes it difficult to access, difficult to represent, and difficult to influence. This operating paradigm is unique because the SPECIES agents are sensing human behaviors and human states such as arousal, cognitive effort, and emotion rather than discrete, easily measured and computed phenomena. Similarly, the SPECIES agents must utilize novel effectors to affect both humans and the environment. These effectors may include human influence tactics, impression management techniques, communication messages, agent appearance, agent demeanor, and potentially many other interpersonal communication and persuasion strategies. The SPECIES agent's most significant effectors may be the recommendations that it makes to the user. Figure 2 illustrates the conceptual components in the embodied adaptive intelligent agent paradigm.
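The sense, decide, and act cycle described above can be illustrated with a minimal Python sketch. It is not the kiosk's implementation: the class name, the action names, the 0-to-1 scale for an inferred state, and the 0.7 threshold are all hypothetical choices made only to show how sensors, effectors, and a discrete goal fit together.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Percept:
    """One inferred human state, e.g. arousal, on an assumed 0-to-1 scale."""
    state: str
    level: float


@dataclass
class SpeciesAgentSketch:
    """Hypothetical agent: reads sensors, then fires one effector per step."""
    sensors: List[Callable[[], Percept]]
    effectors: Dict[str, Callable[[], str]]
    threshold: float = 0.7  # assumed cutoff, not taken from the paper
    log: List[str] = field(default_factory=list)

    def step(self) -> str:
        # Perceive: collect one reading from every sensor.
        percepts = [read() for read in self.sensors]
        peak = max((p.level for p in percepts), default=0.0)
        # Decide: a single illustrative rule stands in for the agent's goal.
        if peak >= self.threshold:
            action = "recommend_follow_up"
        else:
            action = "ask_next_question"
        # Act: effectors here are messages and recommendations.
        outcome = self.effectors[action]()
        self.log.append(outcome)
        return outcome


agent = SpeciesAgentSketch(
    sensors=[lambda: Percept("arousal", 0.82)],
    effectors={
        "ask_next_question": lambda: "message: next question",
        "recommend_follow_up": lambda: "recommendation: refer to operator",
    },
)
print(agent.step())
```

In this sketch the effectors are plain callables, so appearance or demeanor changes could be added as further entries in the same dictionary without changing the loop.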
Components of the SPECIES Agent System

Humans and Psychophysiological Signals

The SPECIES agent paradigm is unique because human interaction is the main control component. A current stream of research based on both behavioral observation and technological assessments presumes that emotions, arousal, and cognitive effort create physiological, psychological, and behavioral responses that are distinguishable by machines. Similarly, the use of physiological sensors and the study of NeuroIS are beginning to gain momentum. Humans manifest a state of arousal through several physiological responses including pupil dilation, change in heart rate and blood pressure, increase in body temperature, especially around the face and eyes, and changes in blink patterns. In our current instantiation, the sensors capture both physiological and behavioral cues from the human counterparts. Physiological cues that may be diagnostic of emotional state, arousal, and cognitive effort include heart rate, blood pressure, respiration, pupil dilation, facial temperature, and blink patterns. Behavioral indicators include kinesics, proxemics, chronemics, vocalics, linguistics, eye movements, and message content.
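As a rough illustration of how the two cue families listed above might be organized in software, the following Python sketch groups the named cues and expresses a new reading as a deviation from a person's own baseline. The identifiers and the use of a standard score are assumptions introduced here for illustration; the paper does not specify this computation.

```python
from statistics import mean, pstdev
from typing import Sequence

# Cue names follow the lists in the text; the identifiers are hypothetical.
PHYSIOLOGICAL_CUES = {
    "heart_rate", "blood_pressure", "respiration",
    "pupil_dilation", "facial_temperature", "blink_pattern",
}
BEHAVIORAL_CUES = {
    "kinesics", "proxemics", "chronemics", "vocalics",
    "linguistics", "eye_movements", "message_content",
}


def cue_family(cue: str) -> str:
    """Return which of the two cue families a named cue belongs to."""
    if cue in PHYSIOLOGICAL_CUES:
        return "physiological"
    if cue in BEHAVIORAL_CUES:
        return "behavioral"
    raise ValueError(f"unknown cue: {cue}")


def deviation_from_baseline(baseline: Sequence[float], reading: float) -> float:
    """Standard score of a reading against baseline samples (assumed method)."""
    spread = pstdev(baseline)
    if spread == 0:
        return 0.0  # a flat baseline gives no scale to compare against
    return (reading - mean(baseline)) / spread


print(cue_family("vocalics"))
print(round(deviation_from_baseline([70.0, 72.0, 74.0], 80.0), 2))
```

A per-person baseline is used in this sketch because the text speaks of detecting changes in arousal; any threshold applied to the resulting score would be a further assumption.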
Intelligent Embodied Conversational Agent User Interface

In our context, embodied agents refer to virtual, three-dimensional human likenesses that are displayed on computer screens. While they are often used interchangeably, it is important to note that the terms avatar and embodied agent are not synonymous. Table 1 shows the distinctions between avatars, embodied agents, and embodied conversational agents.

If an embodied agent is intended to interact with people through natural speech, it is often referred to as an embodied conversational agent, or ECA. ECAs are becoming more effective at engaging human subjects, as though the ECAs were intelligent individuals. Humans engage with virtual agents, and respond to their gestures and statements. When the embodied agents react to human subjects appropriately and make appropriate responses, participants report finding the interaction satisfying. At the same time, when the agents fail to recognize what humans are saying, and respond with requests for clarification or inappropriate responses, humans can find the interaction very frustrating. It has been proposed that ECAs could be used as an interface between users and computers.

The SPECIES models can include full physical representations, or just a part of the body such as a head and face. The face, especially the lower face, is critical for conveying emotions visually. If the face is animated poorly, animation artifacts can create negative responses in people observing them. Facial expressions can be based on Ekman's facial action units (AUs) to simplify control and representation or can be based on the MPEG-4 face definition parameter (FDP).

Figure 2. Components of the SPECIES Agent System
Notes:
A = embodied agent signals and messages to the human
B = human behavior and psychophysiological signals
C = agent effectors that change embodied appearance and messages
D = data storage and segmentation
E = system recommendations to the operator
F = privacy, ethical, and policy considerations