Assessing auditory distance perception using virtual acoustics

Pavel Zahorik a)
Waisman Center, University of Wisconsin–Madison, Madison, Wisconsin 53705

(Received 27 March 2001; revised 14 January 2002; accepted 14 January 2002)

In most naturally occurring situations, multiple acoustic properties of the sound reaching a listener's ears change as sound source distance changes. Because many of these acoustic properties, or cues, can be confounded with variation in the acoustic properties of the source and the environment, the perceptual processes subserving distance localization likely combine and weight multiple cues in order to produce stable estimates of sound source distance. Here, this cue-weighting process is examined psychophysically, using a method of virtual acoustics that allows precise measurement and control of the acoustic cues thought to be salient for distance perception in a representative large-room environment. Though listeners' judgments of sound source distance are found to consistently and exponentially underestimate true distance, the perceptual weight assigned to two primary distance cues (intensity and direct-to-reverberant energy ratio) varies substantially as a function of both sound source type (noise and speech) and angular position (0° and 90° relative to the median plane). These results suggest that the cue-weighting process is flexible, and able to adapt to individual distance cues that vary as a result of source properties and environmental conditions. © 2002 Acoustical Society of America. [DOI: 10.1121/1.1458027]

PACS numbers: 43.66.Qp, 43.66.Pn, 43.66.Yw [LRB]

I. INTRODUCTION

Compared to directional localization, relatively little is known about human perception of sound source distance. It is clear that for a complete description of perceived three-dimensional auditory space, distance is no less important than direction.
Rigorous study of auditory distance perception is often difficult, however, because of inherent complexities of the stimulus. Changes in sound source distance often result in changes to multiple acoustic properties of the sound reaching a listener's ears, properties that themselves may be affected by factors other than source distance. In many cases, properties of the sound source itself and of the acoustic environment may be confounded with the acoustic changes resulting from distance changes. The perceptual processes subserving distance localization therefore likely analyze and sensibly combine multiple sources of acoustic information, or cues, in order to produce stable estimates of sound source distance. Perhaps as a result of the potentially unreliable nature of distance cues, the accuracy with which sound source distance is judged has often been found to be quite poor (Békésy, 1949; Bronkhorst and Houtgast, 1999; Coleman, 1968; Holt and Thurlow, 1969; Loomis et al., 1998; Mershon et al., 1989; Nielsen, 1993).

At least four possible acoustic distance cues have been proposed for conditions where both listener and sound source are stationary (Mershon and King, 1975). These may be briefly described as follows.

Intensity. In general, sound intensity at the listener's position decreases when the distance between listener and fixed-power sound source is increased. The precise nature of this intensity change, however, depends on both environmental characteristics and various properties of the sound source, including acoustic power and radiation patterns. Under ideal conditions (a point source in an acoustic free field), intensity loss as a function of distance obeys an inverse-square law, which implies a 6-dB intensity loss for each doubling of distance. Notable departures from this function occur when the acoustic environment contains surfaces that reflect sound.
In these conditions, the rate of loss of intensity as a function of distance is decreased. It is also important to note that source distance and source power are completely confounded in measures of intensity at the ear. As a result, the auditory system is presumably forced to rely on assumptions about source power in order to use intensity reliably as a distance cue. This fact has led, in part, to the suggestion that the accuracy of distance judgments may be improved if the sound source is familiar to the listener (Coleman, 1962; McGregor, Horn, and Todd, 1985), because this familiarity may aid in assumptions relating to the acoustic power of the source.

Direct-to-reverberant energy ratio. In environments with sound-reflecting surfaces, the ratio of energy reaching a listener directly (without contact with reflecting surfaces) to energy reaching the listener after reflecting surface contact (reverberant energy) decreases systematically with increases in source distance. In rooms, change in direct-to-reverberant energy ratio results primarily from the effect of the inverse-square law on the direct portion of the sound field, because the energy in the later-arriving reflected portion of the sound field may be well approximated by a diffuse sound field, which is defined to have uniform energy over varying source positions. For a given room, reverberant energy as a function of time is determined principally by the size of the room and the acoustic properties of the reflecting surfaces of the room. Many outdoor environments also produce reverberation, and hence a direct-to-reverberant energy ratio cue that varies with distance (Richards and Wiley, 1980).

Spectrum. At least two circumstances cause systematic changes in the at-the-ear spectrum as a function of distance. For distances greater than approximately 15 m (Blauert, 1983), the sound-absorbing properties of air significantly modify the sound source spectrum. In general, these absorbing properties of air attenuate high frequencies the most, although the effect is relatively small: on the order of 3- to 4-dB loss per 100 meters at 4 kHz (Ingard, 1953). A second type of spectral change occurs in sound-reflective environments where the spectrum that reaches the ear may be affected by the acoustic properties of the reflective surfaces. As distance increases, the proportion of reflected energy increases, thereby potentially changing the at-the-ear spectrum systematically. Like the intensity cue, spectral cue changes with distance are confounded with changes in the sound source spectrum. As a result, sound source familiarity may also enhance the utility of this cue.

Binaural differences. When sound sources are in the acoustic near-field, binaural differences in both intensity and time are no longer independent of radial distance, as they are for far-field planar waves. These differences, often referred to as differences resulting from acoustic parallax, are maximal along the interaural axis, and decrease to zero on the median plane. For example, Hartley and Fry (1921) have shown that interaural intensity differences (IID) for a sound source (1860-Hz sinusoid) on the interaural axis can differ for distances between 87.5 and 17.5 cm by as much as 20 dB (values derived using a spherical head model). Changes in interaural time difference (ITD) with distance have been shown to be less salient than those for IID (Brungart and Rabinowitz, 1999; Duda and Martens, 1998).

a) Electronic mail: email@example.com
1832 J. Acoust. Soc. Am. 111 (4), April 2002 0001-4966/2002/111(4)/1832/15/$19.00 © 2002 Acoustical Society of America

One of the most important unanswered questions regarding the perception of sound source distance concerns the way in which information from these multiple acoustic cues is sensibly combined and processed.
Because of the complexity of the stimulus, past experiments have typically chosen to manipulate only one of these acoustic cues at a time, while either removing or holding constant all other cues. Although this approach retains strict experimental control, the resulting stimulus may bear little resemblance to that encountered in natural situations, in which multiple distance cues are available to the listener. The recent advances in virtual acoustic technology (Møller et al., 1995; Wightman and Kistler, 1989) make it possible to avoid this trade-off, as precise stimulus control is available under stimulus conditions that are ostensibly identical to those occurring in natural environments. As such, the use of virtual acoustic technology is ideally suited for examining listeners' use of multiple distance cues in a natural environment.

This article describes a series of experiments that examines listeners' abilities to judge sound source distance in a room environment, and specifically the ways in which listeners combine and weight multiple distance cues. Virtual acoustic technology is extensively used, both as a method for precisely quantifying the acoustic stimulus reaching the listeners' ears, and as a means of realistic and accurate stimulus presentation. The rest of the article is divided into three sections. The first (Sec. II) describes the measurement procedure that quantifies the at-the-ear stimulus for a variety of distances in a representative room environment. These measurements are then evaluated in terms of potential acoustic cues to distance. Section III describes an experiment in which psychophysical distance functions are measured for virtual sound source stimuli constructed from the individualized measurements described in Sec. II. These stimuli allow all of the acoustic distance cues present in the real room environment to vary as they would naturally. The last section (Sec. IV)
describes an experiment in which certain distance cues present in the virtual sound source stimulus are manipulated simultaneously. This experiment addresses the relative salience of various acoustic distance cues in a room environment.

II. ACOUSTICAL ANALYSIS OF POTENTIAL SOURCE DISTANCE CUES

One of the major limitations of past work on auditory distance perception has been the quantification and control of the stimulus reaching the listener's ears. To overcome this limitation, the current work employs modern virtual acoustic techniques to measure impulse responses of an acoustic system that contains: a listener, a room environment thought (a priori) to be rich in acoustic distance cues, and a sound source that can be varied in distance from the listener. The methods used to measure these impulse responses, which will be referred to as binaural room impulse responses (BRIRs), are fundamentally similar to those used for anechoic measures (e.g., Møller et al., 1995; Wightman and Kistler, 1989), but require longer excitation periods to fully capture the response of the room environment. Because these impulse responses are essentially complete representations of the proximal stimulus reaching the listener's two ears, all acoustic distance cues are represented within them and may therefore be precisely quantified.

A. Methods

1. Environment

A small auditorium (264-person seating capacity) located within the Waisman Center at the University of Wisconsin–Madison served as the test environment. This room was chosen based on the following characteristics: (a) its size, (b) its reverberation characteristics, and (c) its availability for the purposes of this experiment. The auditorium had a total volume of approximately 830 m³ and total surface area of approximately 653 m². Its shape was complex, with sloping floor (26°) and ceiling (11°), as well as nonparallel sections of the side walls. The main floor (not including stage area)
was approximately rectangular: 14 m long by 12.2 m wide. The majority of floor surfaces were carpeted, excluding seating areas and the stage, which were covered in tile and wood, respectively. The ceiling was covered with acoustical tile and the walls were composed primarily of painted drywall material.

The sound source consisted of a small, high-quality loudspeaker (Realistic Minimus 3.5) with a 90-mm full-range driver. This loudspeaker was mounted on a tripod that could be easily moved up and down one aisle of the auditorium. Twelve distances extending from a fixed origin point near the end of one aisle were examined: 0.30, 0.43, 0.61, 0.86, 1.22, 1.72, 2.44, 3.45, 4.88, 6.90, 9.75, and 13.79 m.
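As a rough check on the spacing of these twelve distances, the following Python sketch (not part of the original measurements) computes the ratio between successive distances and the level change that an ideal point source in a free field would produce under the inverse-square law described in the Introduction. The function name and the free-field assumption are illustrative only; in the actual auditorium, reflections would make the rate of loss shallower.

```python
import math

# Source distances in meters, as listed in the text.
distances_m = [0.30, 0.43, 0.61, 0.86, 1.22, 1.72,
               2.44, 3.45, 4.88, 6.90, 9.75, 13.79]


def free_field_level_change_db(r_ref, r):
    """Level change in dB when a point source in a free field moves
    from distance r_ref to distance r (inverse-square law).
    Hypothetical helper; ignores room reflections entirely."""
    return -20.0 * math.log10(r / r_ref)


# Ratio between each distance and the one before it.
step_ratios = [b / a for a, b in zip(distances_m, distances_m[1:])]

# Idealized direct-path level at each distance, relative to 0.30 m.
levels_db = [free_field_level_change_db(distances_m[0], r)
             for r in distances_m]

if __name__ == "__main__":
    for r, level in zip(distances_m, levels_db):
        print(f"{r:6.2f} m  {level:7.2f} dB re 0.30 m")
```

Each step multiplies distance by close to the square root of 2, so under this idealization the direct-path level would fall by about 3 dB per step, or 6 dB for every second step.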
The loudspeaker was always oriented with its driver facing the origin. In order to accommodate the farthest distance, the loudspeaker had to be placed on the auditorium's stage (approximately 1.5 m from the front edge). At this position, the loudspeaker height was decreased in order to account for the 0.38-m height of the stage. For all other positions, no attempt was made to adjust the loudspeaker height to compensate for the floor slope.

Figure 1 displays measured reverberation time (T60) as a function of frequency for this environment. These measurements were made with a reference microphone (Brüel & Kjær 1/2-in. microphone, cartridge type 4133) using methods similar to those described by Schroeder (1965).

2. Participants

Nine paid volunteers (8 female and 1 male, ages 20–28) participated in the binaural impulse-response measurements. None of the participants had any prior exposure to the auditorium where the measurements took place. These participants also served as listeners in all subsequent experiments.

3. Measurement technique and apparatus

Binaural room impulse responses (BRIRs) were measured using methods fundamentally similar to those described in detail by Møller et al. (1995), although much longer excitation periods were used to capture the response of the room. Miniature electret microphones (Sennheiser KE4-211-2) were inserted into each participant's ear canals such that the microphone diaphragm was at the position of the canal entrance, and an acoustic seal was formed between the microphone and canal wall using Etymotic Research ER-13R-2 ring seals (a blocked-meatus configuration). Two impulse-response measurements were made with the loudspeaker positioned (ear height for the 0.30-m source, approximately 1.25 m from the floor surface) at each of the 12 distances.
For one measurement the seated participant faced the loudspeaker (0° azimuth); for the other, the listener was rotated 90° such that the loudspeaker was opposite the right ear (90° azimuth). All measurements were made with the loudspeaker driven with a fixed power level that produced approximately 80 dB SPL at 0.30 m with the participant removed.

A technique using a maximum-length sequence (MLS) excitation signal, a type of pseudorandom noise, was used to measure all impulse responses (Rife and Vanderkooy, 1989). Generally, this technique repeatedly excites the system under evaluation with a periodic MLS signal, records the response to each period, and averages the responses across periods. The system's impulse response is computed by cross-correlating the averaged response with the raw MLS. This technique offers a number of advantages over past impulse-response measurement techniques in terms of excitation signal generation, stability of required computations, and signal-to-noise ratio (Rife and Vanderkooy, 1989). In situations where nonstationary noise disturbances exist, a weighted average of responses to the excitation signal periods may be used to further improve signal-to-noise ratio. Nielsen (1998) has shown that weighting response periods by the reciprocal of the mean-squared level in each period can substantially improve signal-to-noise ratio in conditions of high-level nonstationary disturbances. This weighted averaging procedure was used for all impulse responses in this application.

Specially developed software was used to implement the MLS technique on Tucker–Davis Technologies (TDT) hardware (DD1 2-channel D/A and A/D, AP2 array processor card with optical link to the DD1) with two simultaneous recording channels and the ability to average recording periods in real time. For all measurements, a 15th-order MLS (32 767 points) was used. D/A and A/D conversion was effected with 16-bit precision at a sampling frequency of 50.0 kHz.
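The MLS procedure described above can be illustrated with a minimal Python sketch. It is not the software used in the study: the function names, the shift-register feedback taps, the toy impulse response, and the simulated noise burst are all assumptions made for illustration. To stay fast in plain Python it uses a 7th-order sequence (127 points) and direct circular correlation; the same generator with order 15 and taps (15, 14) would give the 32 767-point sequence mentioned in the text, which at 50.0 kHz lasts about 0.655 s per period, and a practical implementation would use FFT-based correlation.

```python
import random


def mls(order, taps):
    """One period (2**order - 1 samples) of a +/-1 maximum-length
    sequence from a Fibonacci shift register. `taps` are 1-based
    register positions XORed to form the feedback bit (assumed values)."""
    state = [1] * order
    seq = []
    for _ in range(2 ** order - 1):
        seq.append(1.0 if state[-1] else -1.0)
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return seq


def circular_convolve(x, h):
    """Steady-state response of system h to the periodic signal x."""
    n = len(x)
    return [sum(h[m] * x[(i - m) % n] for m in range(len(h)))
            for i in range(n)]


def plain_average(periods):
    """Unweighted mean of the response periods."""
    n = len(periods[0])
    return [sum(p[i] for p in periods) / len(periods) for i in range(n)]


def weighted_average(periods):
    """Mean of the response periods, each weighted by the reciprocal
    of its mean-squared level, so noisy periods count for less."""
    weights = [len(p) / sum(v * v for v in p) for p in periods]
    total = sum(weights)
    n = len(periods[0])
    return [sum(w * p[i] for w, p in zip(weights, periods)) / total
            for i in range(n)]


def impulse_response(avg_response, seq):
    """Circularly cross-correlate the averaged response with the raw MLS.
    The MLS autocorrelation is n at lag 0 and -1 elsewhere, so a small
    DC term is added back before scaling by (n + 1)."""
    n = len(seq)
    dc = sum(avg_response) / sum(seq)
    out = []
    for k in range(n):
        c = sum(avg_response[i] * seq[(i - k) % n] for i in range(n))
        out.append((c + dc) / (n + 1))
    return out


def rms_error(estimate, reference):
    """RMS difference between an estimate and a zero-padded reference."""
    padded = list(reference) + [0.0] * (len(estimate) - len(reference))
    return (sum((a - b) ** 2 for a, b in zip(estimate, padded))
            / len(estimate)) ** 0.5


random.seed(0)
seq = mls(7, (7, 6))                      # short demo sequence
true_h = [1.0, 0.0, 0.5, -0.25, 0.125]    # toy "room" response
clean = circular_convolve(seq, true_h)

# Five response periods; one is hit by a high-level noise burst.
periods = [list(clean) for _ in range(5)]
periods[2] = [v + random.gauss(0.0, 5.0) for v in clean]

h_clean = impulse_response(clean, seq)
h_plain = impulse_response(plain_average(periods), seq)
h_weighted = impulse_response(weighted_average(periods), seq)

if __name__ == "__main__":
    print("plain average error:   ", rms_error(h_plain, true_h))
    print("weighted average error:", rms_error(h_weighted, true_h))
```

In this toy setting the weighted average suppresses the single noisy period, which is the behavior the reciprocal mean-squared weighting is intended to provide when disturbances are nonstationary.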
Forty-four periods of the MLS signal were presented, with the results of the final 40 periods averaged (weighted) in real time. The responses to the first four periods were not recorded in order to ensure steady-state system excitation for the final 40 periods. Microphone outputs were fed first to custom-built amplifiers, then into TDT MA2 microphone amplifiers to provide an additional 10 dB of gain prior to TDT DD1 A/D input. The output of the DD1 D/A was fed to a Crown D-75 amplifier with fixed gain driving the movable loudspeaker. The computed impulse responses were stored in floating-point format on a Pentium-class PC used to control the TDT hardware.

It should be noted that a different technique for impulse-response measurement was initially utilized in the present study. This method, using a Golay code stimulus (Foster, 1986; Zhou and Green, 1992), was found to have serious shortcomings when used to measure systems that are to a small degree time-variant, such as binaural recordings from humans. These disturbances, caused by such factors as small head movements, resulted in impulse-response artifacts when using the Golay technique (Zahorik, 2000). Given these problems, the Golay technique was abandoned in favor of the MLS technique.

FIG. 1. Reverberation time, T60, for the test environment (a medium-sized auditorium) as a function of frequency. Results have been averaged across the 12 distances measured with a reference microphone.

Headphone (Beyerdynamic DT 990 Pro) impulse responses when coupled to the ears of a given participant were also measured. These measurements were collected for the generation of virtual sound sources to be used in subsequent experiments. Since it is important that the microphone position in the ear canal is the same for both BRIR measurements and these ear-coupled headphone IR measurements, both were obtained during the same measurement session. Param-