Good afternoon. Boy, I can't see anything out there. I assume you all can see me - that's why these lights are here. My name is Chris Schmandt from the Media Lab at MIT. I'm co-chairing this panel with Barry Arons, who is sitting over here. It's actually quite a pleasure to co-chair this panel with Barry. We've been working together off and on for more years than I care to remember.
This panel has a long ridiculous name. Basically it's about audio and window systems and workstations. I'm wearing two hats here. I'm going to spend a minute or two introducing the panel and then I'm going to spend some time talking about my own segment of the panel.
We're going to try to be a panel as opposed to a series of five mini-papers that never get published. In other words, we're going to try to keep our presentations relatively short, then segue into a series of prepared questions that the panelists are going to answer amongst themselves. Then we'll open the floor up for questions.
In some ways this is a very incestuous crew. We've all known each other for quite a while. We have different slants and we're actually going to try to focus on those slants a little bit. So if we disagree with each other, that doesn't necessarily mean we really hate each other. We're all friends.
Where this panel is coming from is a surge of interest in audio, and multimedia, in general, in computer workstations. The Macintosh has had audio for quite a while - you may or may not choose to call that a workstation. The NeXT computer sort of surprised people by having fairly powerful DSP and audio in and out. You'll get a demo of that later if you haven't seen it. The Sun SPARCstation has come out with some primitive digital record and playback capabilities.
On the other hand, there's been interest in voice in computer workstations for years and years, and what we've seen so far is that voice really hasn't had very much success. There have been a number of products that have come and gone.
What has become popular has been centralized service - specifically voice mail. Voice mail is tied in more to a PBX - and the interface is more like a telephone than it is a mouse and window system, in the computer workstation interface.
Obviously, window systems are here to stay. We're not suggesting that audio is going to replace the graphical paradigm, but rather that it has to interact with it.
On the other hand, everybody has a telephone. People had telephones on their desks before they had workstations, and we talk all the time at work. Voice really is a fundamental component of the way we talk, the way we interact with each other.
What we're seeing in terms of the technologies showing up in these workstations is higher bit rate coding. Gone are the days of unintelligible low bit rate linear predictive coding or something like that - except for specialized applications. Speech recognition is here, but it's in its infancy. Text-to-speech - it's around, it's difficult to understand. You can learn to understand it.
Telephony is obviously part of this set-up if we're dealing with audio. We don't know whether it's going to be analogue or digital. Is it going to be plain old telephone or is it going to be ISDN?
Those are some of the issues that we're going to be talking about in this session. As I say, we're going to try to keep each of the speakers to a relatively short period - and now I can put on my other hat. (puts toy plastic headset on - laughter)
Some people ask me whether speech recognition is a toy or not. Yes, it is. It's sort of a fun toy. Speech technologies are in general fun. I was originally hoping to be able to play this out to the audience. But I don't think it's going to work well enough. This is actually a kid's toy -$ at Toys R Us. Speaker Independent Isolated Word Speech Recognizer - "yes", "no", "true", and "false". It will take you on tours about dinosaurs and things like that.
From my point of view, the key for what we can do with voice has to do with understanding its advantages and disadvantages and the concomitant user interface requirements leading us to design reasonable applications for it.
Voice has some advantages. It's very useful when your hands and eyes are busy; you're looking at a screen, you have your fingers on the mouse. Sometimes it's intuitive; we learn to talk at a very early age. People talk to their computers even if the computers don't have speech recognition. (laughter) Usually it's expletives - especially with UNIX. (laughter) Voice really dominates human-to-human communication. No matter what we're doing with E-Mail and FAX, the bottom line is we just still have to spend a certain amount of time physically speaking to each other.
Telephones are everywhere. If I can turn an ordinary pay phone into a computer terminal, suddenly I have access from all over the place. From my own work, this suggests a heavy focus on telecommunications. The kinds of systems that I'm building are really designed to use voice in a communications kind of environment.
On the other hand, there's many, many disadvantages of voice. It's very slow. 200 words per minute, 150-250 words per minute. That's less than a 300 baud modem and who uses those any more?
Speech is serial. You have to listen to things in sequence. It's a time varying signal by definition. And it requires attention. You have to listen to what's going on, as opposed to simply scrolling it by and stopping it occasionally.
My way of characterizing this is to say that speech is "bulky". Yes, it takes up space on the file system, but most importantly you can't "grep" it, you can't do keyword searches on it. It's hard to file, it's just hard to get any kind of handle on it. It takes time.
Finally, speech broadcasts. If my workstation is talking to me and you're sitting in my office, you're going to hear what it says, which is very different from if it appears as text.
In fact, if it appears as text, and I'm sitting in front of the screen with these kinds of tiny bit map fonts that we tend to use, I'm probably not even going to be able to read it - much less you.
This has some user interface implications. One is that it suggests that we would like, where possible, to have graphical access to sounds. I'm going to show a video in just a second, showing you an interface to audio built under the X Window System, designed to give you some kind of a graphical context, so you can mouse around and perhaps use some visual cues to keep track of where you are in the sound. If you could roll the first piece of one-inch, please. This is a sound widget.
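Editorial aside on the 300 baud comparison made earlier in the talk: the arithmetic can be checked with a rough sketch. The conversion factors are assumptions, not figures from the speaker: about six characters per word (five letters plus a space) and ten bits on the wire per character (eight data bits plus start and stop bits), with a 300 baud modem carrying one bit per symbol, so roughly 300 bits per second. The function name is hypothetical.

```python
# Rough check of "speech is slower than a 300 baud modem".
# Assumptions (not from the talk): 6 characters per word and
# 10 transmitted bits per character (8 data + start + stop).
CHARS_PER_WORD = 6
BITS_PER_CHAR = 10
MODEM_BITS_PER_SECOND = 300  # 300 baud at one bit per symbol


def speech_bits_per_second(words_per_minute: float) -> float:
    """Text-equivalent data rate of speech at the given speaking rate."""
    chars_per_minute = words_per_minute * CHARS_PER_WORD
    return chars_per_minute * BITS_PER_CHAR / 60


if __name__ == "__main__":
    for wpm in (150, 200, 250):
        rate = speech_bits_per_second(wpm)
        slower = rate < MODEM_BITS_PER_SECOND
        print(f"{wpm} wpm ~ {rate:.0f} bit/s, slower than modem: {slower}")
```

Under these assumptions the whole 150-250 words per minute range stays below the modem's rate, which is consistent with the speaker's remark. Different assumptions about word length or framing would shift the numbers somewhat.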
CITATION STYLE
Arons, B., Schmandt, C., Hawley, M., Ludwig, L., & Zellweger, P. (1989). Speech and audio in window systems: When will they happen? In ACM SIGGRAPH 89 Panel Proceedings, SIGGRAPH 1989 (pp. 159–176). Association for Computing Machinery, Inc. https://doi.org/10.1145/77276.77285