Plack, 2005: Chapter 9.1

Lecture notes


Topics
Auditory localization - Introduction
Auditory localization cues: Interaural time (phase) and interaural level differences
Binaural cues and release from masking
Sound source localization neural mechanisms




Auditory localization - Introduction


Auditory localization is a term used to describe judgments on the location, distance, movement, and size of a sound source, based solely on auditory cues.

Some terminology:
       Monaural: Sound entering a single ear (usually through headphones - earplug in one ear and stimulus in the other)
       Binaural: Sound entering both ears via the air (i.e. no headphones - most common type of listening in real-world environments)
           Diotic (special case of Binaural)
               Exactly the same sound signal entering both ears (artificial type of binaural listening - e.g. listening to a mono
               recording through stereo headphones)
           Dichotic (special case of Binaural)
               Completely different sound signals entering each ear (artificial type of binaural listening - through headphones).
       Ipsilateral ear: Ear closest to the sound source
       Contralateral ear: Ear furthest from the sound source

Auditory localization judgments are mainly described in terms of the position of a sound source relative to the listener's head (see the figure below; in Plack, 2005: 174), with azimuth describing the angle of the sound source on the horizontal plane (relative to straight ahead) and elevation describing its angle on the median sagittal plane (relative to the horizontal).

Lateralization describes judgments on the apparent position of a sound source within a listener's head when listening through headphones.

The JND of auditory localization can be defined in terms of minimum audible angles (MAAs); that is, the minimum angle of rotation in azimuth, elevation, or both that can be reliably perceived as a corresponding change in sound-source positioning relative to the head (see the figure below; in Plack, 2005: 175). MAAs depend on both frequency and starting angle, with the smallest possible MAA being ~1° (in the azimuth and with a 0° starting angle).



Auditory localization cues
Interaural time (phase) and interaural level differences


We localize sound based on phase (period-related time), intensity level, and spectral differences between the portions of the sound arriving at each ear.  Interaural differences in arrival time (phase) and intensity constitute the most important sound localization cues. The theory outlining their contribution to sound localization judgments is referred to as the duplex theory of sound localization and was introduced by Lord Rayleigh (1877-1907).

Interaural time/phase differences

For low frequency sounds (<500Hz) with wavelengths >~ 28 inches or >~0.68m (>~ 4/3 of the average head's circumference) the auditory system relies mainly on period-related interaural time differences (ITDs) which are equivalent to single-cycle interaural phase differences (IPDs).
Low frequency sounds arrive at the two ears with interpretable phase differences. However, by being efficiently diffracted, they don't result in interaural level differences that are large enough to be perceptible.
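The wavelength figures quoted above follow from wavelength = speed of sound / frequency. The short sketch below assumes a speed of sound of 345m/s (an assumed value here; small changes in it shift the results slightly) and uses an illustrative function name. It is only meant as a quick way to compare wavelengths against head size.

```python
SPEED_OF_SOUND = 345.0  # m/s (assumed value; varies with air temperature)

def wavelength_m(frequency_hz, speed=SPEED_OF_SOUND):
    """Wavelength in metres of a tone at the given frequency."""
    return speed / frequency_hz

# Compare a few frequencies: low ones are much longer than the head is wide.
for f in (100, 500, 1500, 8000):
    print(f"{f:>5} Hz -> {wavelength_m(f):.3f} m")
```

At 500Hz this gives roughly 0.69m, in line with the approximate figure quoted above.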

IPDs provide useful cues for complex signals whose spectrum includes components with frequencies <500Hz. 
For complex signals with no low-frequency content (or for amplitude-modulated high-frequency-content signals), IPDs remain useful as long as several of the frequency components are separated by (or as long as the modulation rate is) <500Hz.  In such cases the complex signal's envelope will display amplitude fluctuations at rates <500Hz and the IPDs between the signal envelopes arriving in each ear will provide interpretable localization cues.
(Figures to the right from Plack, 2005: 176. See also Figure 9.5, page 178)

The minimum detectable interaural time difference is 10 microseconds (0.000010s), corresponding to a shift in sound source location by 1° in azimuth relative to straight ahead (consistent with the minimum audible angle).
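As a rough consistency check on the 10-microsecond figure, the sketch below uses a simple path-difference model, ITD = (head diameter / speed of sound) x sin(azimuth). This model is an assumption made here for illustration (it ignores diffraction around the head), as are the 0.22m head diameter, the 345m/s speed of sound, and the function name.

```python
import math

HEAD_DIAMETER_M = 0.22   # assumed average head diameter
SPEED_OF_SOUND = 345.0   # m/s, assumed

def itd_seconds(azimuth_deg, diameter=HEAD_DIAMETER_M, speed=SPEED_OF_SOUND):
    """Approximate ITD for a distant source, simple path-difference model."""
    return (diameter / speed) * math.sin(math.radians(azimuth_deg))

# ITD for a source 1 degree off straight ahead, in microseconds.
print(f"{itd_seconds(1.0) * 1e6:.1f} microseconds")
```

Under these assumptions a 1° shift from straight ahead gives an ITD of roughly 11 microseconds, the same order as the minimum detectable ITD.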

The maximum possible interaural time difference that can occur due to a sound source's move around the head (assuming an average head diameter of ~0.22m and speed of sound = 345m/s) is ~0.65ms or ~0.00065s.
For a sine signal to take advantage of IPD cues without resulting in ambiguity, its period  must be at least twice this value (i.e. 2 x 0.00065s = 0.0013s), corresponding to a wavelength of at least twice the head's diameter (i.e. 2 x 0.22m = 0.44m/cycle).
Therefore, the absolute highest frequency for which IPDs provide useful cues is 1/0.0013 = ~770Hz.
For higher frequencies, single-cycle IPDs are not useful cues because they cannot be interpreted reliably: they depend not only on sound source location on the horizontal and median planes but also on frequency and, most importantly, distance. Widely different sound source locations can result in the same IPDs, and the same angular location in the localization coordinate system may result in different IPDs.
Strongest separation in lateralization (i.e. virtual location inside the head) occurs for IPDs = 1/4 of a cycle, corresponding to signals with period = 4 x 0.00065 = 0.0026s and frequency = 1/0.0026 = ~385Hz.
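The arithmetic in the last few paragraphs can be reproduced with the short sketch below. The function names are illustrative; the input values (0.22m head diameter, 345m/s, and the rounded 0.00065s maximum ITD) are the ones used in the notes.

```python
HEAD_DIAMETER_M = 0.22
SPEED_OF_SOUND = 345.0  # m/s

def max_itd_seconds(diameter=HEAD_DIAMETER_M, speed=SPEED_OF_SOUND):
    """Largest ITD: the time sound needs to travel one head diameter."""
    return diameter / speed

def upper_ipd_frequency_hz(max_itd_s):
    """Highest frequency whose period is at least twice the maximum ITD."""
    return 1.0 / (2.0 * max_itd_s)

def quarter_cycle_frequency_hz(max_itd_s):
    """Frequency at which the maximum ITD equals 1/4 of a cycle."""
    return 1.0 / (4.0 * max_itd_s)

print(max_itd_seconds())                    # about 0.00064 s, rounded to 0.00065 s in the notes
print(upper_ipd_frequency_hz(0.00065))      # about 770 Hz
print(quarter_cycle_frequency_hz(0.00065))  # about 385 Hz
```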

This example includes three 300Hz tones presented in stereo (you must use headphones). In the first, the left channel's phase leads by 1/4 cycle (90°); in the second, the phase is the same across channels; and in the third, the right channel's phase leads by 1/4 cycle (note that there is no interaural onset difference).  This results in a lateralization of the signals that appears to move from the left to the middle and then to the right.
Listen to three stereo signals with the same phase relationships (as above) at 100Hz and 8000Hz. What types of lateralization do they result in? Do you still get this clear sense of motion from left, to middle, to right?
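Stereo test tones of this kind can be synthesized with a few lines of code. The sketch below is one possible way to do it with the Python standard library, not necessarily how the examples in these notes were made; the function names, the 44100Hz sample rate, and the 0.5 amplitude are all assumptions. Both channels start at the same sample, so there is no onset difference, only a phase difference.

```python
import math
import struct
import wave

def stereo_tone(freq_hz, left_lead_cycles, duration_s=1.0, rate=44100, amp=0.5):
    """Return (left, right) sample pairs; a positive lead means the left channel leads."""
    lead = 2 * math.pi * left_lead_cycles
    frames = []
    for n in range(int(duration_s * rate)):
        phase = 2 * math.pi * freq_hz * n / rate
        frames.append((amp * math.sin(phase + lead), amp * math.sin(phase)))
    return frames

def write_wav(target, frames, rate=44100):
    """Write sample pairs as 16-bit stereo PCM; target is a file path or a binary file object."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        data = b"".join(
            struct.pack("<hh", int(left * 32767), int(right * 32767))
            for left, right in frames
        )
        wav.writeframes(data)
```

For instance, write_wav("left_leads.wav", stereo_tone(300, 0.25)) would produce a 300Hz tone whose left channel leads by 1/4 cycle; using 0 and -0.25 gives the other two cases, and changing the frequency to 100 or 8000 gives the comparison tones.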

Interaural phase differences of 1/2 cycle (180°) result in a wider stereo image rather than in lateralization changes. In this example (you must listen through headphones), the first stereo tone has 0 phase difference between the two channels and the second tone has 1/2 cycle difference between channels (again, no interaural onset difference).

The phenomenon of dichotic beats (often, and confusingly, referred to as binaural beats) describes a beating-like sensation arising when two signals with slightly different frequencies are presented dichotically (one in each ear through headphones).  Contrary to the beating sensations accompanying signals with amplitude fluctuation rates <~15-20 fluct./sec., dichotic presentation does not permit physical interaction, so dichotic beats cannot be the result of periodic alternations between constructive and destructive interference. Rather, they are the result of periodic changes in IPDs and a direct manifestation of our ability to detect IPD changes.
For very small frequency differences (<~4-5Hz) between ears, the resulting IPD modulations give rise to a "rotating" sensation inside the head that can be easily identified as such (example: 250Hz in one channel and 250.5Hz in the other). 
For larger frequency differences (>~4-5Hz), the sensation does resemble loudness fluctuations (beating) that would result if the tones in each ear were allowed to interfere. However, these fluctuations are much less pronounced than they would be if actual interference had taken place (example: 250Hz in one ear and 257Hz in the other).
Dichotic and "regular" beating sensations are therefore manifestations of different physical, physiological, and perceptual phenomena.
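The "periodic changes in IPDs" behind dichotic beats can be made concrete: when the two ears receive tones of slightly different frequency, the IPD sweeps through a full cycle at a rate equal to the frequency difference. The sketch below assumes both tones start in phase at time 0; the function names are illustrative.

```python
def ipd_cycles(f_left_hz, f_right_hz, t_s):
    """IPD at time t_s as a fraction of a cycle (0 to 1), tones in phase at t = 0."""
    return ((f_right_hz - f_left_hz) * t_s) % 1.0

def ipd_sweep_period_s(f_left_hz, f_right_hz):
    """Time taken for the IPD to sweep through one full cycle."""
    return 1.0 / abs(f_right_hz - f_left_hz)

print(ipd_sweep_period_s(250.0, 250.5))  # slow sweep: the "rotating" example
print(ipd_sweep_period_s(250.0, 257.0))  # fast sweep: the beating-like example
```

For the 250Hz / 250.5Hz example the IPD takes 2 seconds to go around once, slow enough to follow as motion; for 250Hz / 257Hz it goes around 7 times per second.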

Interaural level (intensity) differences

For high frequency sounds (>1500Hz) with wavelengths <~9 inches or <~0.24m (<~1/2 of the average head's circumference) the auditory system relies mainly on interaural level / intensity / amplitude differences (ILDs or IIDs) when making auditory localization judgments. High frequency sounds cannot diffract efficiently around a listener's head, which blocks acoustic energy and produces interpretable intensity level differences. The relationship among IIDs, frequency, and sound source location is displayed to the right.  IIDs are negligible for frequencies below 500Hz and increase gradually with increasing frequency.  For high frequencies, IIDs change systematically with azimuth changes and provide reliable localization cues on the horizontal plane (except for front-to-back confusion).

Artificial IIDs (i.e. imposed through headphones and not due to sound-source positioning relative to the head) can provide localization cues at all frequencies.  For instance, this example includes three 300Hz tones presented in stereo (must use headphones). In the first, the intensity level is higher at the left channel (by 7dB), in the second, the level is the same across channels, and in the third, the level is higher at the right channel (by 7dB).  This results in a lateralization of the signals that appears to move from the left to the middle and then to the right.
Audio engineers often use IIDs to simulate different sound source positions.  This practice introduces several problems when listening through speakers (rather than through headphones). One such problem is related to a perceptual strategy developed to address auditory localization ambiguities when listening in reflective environments. The manifestation of this strategy (referred to as the "precedence effect" or "Haas effect") will be discussed next week.
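To impose a level difference such as the 7dB used in the headphone example, the dB value has to be converted to an amplitude ratio (ratio = 10^(dB/20)). The sketch below shows that conversion and one simple way to derive channel gains from it; the function names and the choice to leave the louder channel at full gain are assumptions, not a description of how the example files were made.

```python
def db_to_amplitude_ratio(level_db):
    """Amplitude ratio corresponding to a level difference in dB."""
    return 10 ** (level_db / 20.0)

def channel_gains(iid_db):
    """(left_gain, right_gain) for an IID in dB; positive values favour the left channel."""
    attenuation = 1.0 / db_to_amplitude_ratio(abs(iid_db))
    return (1.0, attenuation) if iid_db >= 0 else (attenuation, 1.0)

print(db_to_amplitude_ratio(7.0))  # about 2.24
print(channel_gains(7.0))          # left at full level, right attenuated
print(channel_gains(-7.0))         # right at full level, left attenuated
```

A 7dB difference therefore means one channel's amplitude is a little more than twice the other's.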

For sounds of intermediate frequencies (~500Hz<f<~1500Hz), IPD or IID cues alone are ambiguous/unreliable but can provide some useful localization information when combined (remember that IPD cues stop working above 770Hz while IID cues are imperceptible below 500Hz). In the case of intermediate frequencies as well as in the case of sound source elevation changes (which are not accompanied by IPDs or IIDs), the auditory system relies mainly on interaural spectral differences, to be discussed later.




Binaural cues and release from masking

Interaural phase and intensity differences (IPDs and IIDs) provide binaural cues that help us perceive tones in contexts where such tones would have otherwise been masked.  The figure to the right (from Plack, 2005: 180) illustrates this point, while the following three listening examples correspond to the three scenarios described in the figure (you must listen through headphones).
A) Example 1: a 300Hz sine signal with no IPDs or IIDs is presented along with a 600Hz-wide noise band, centered at 300Hz. Due to the level difference between signal and noise (the signal is 28dB below the noise) the sine tone is masked.
B) Example 2: same as in (A) but with a 180° IPD for the sine signal. In spite of the level difference between noise and signal, the sine tone is now perceivable.
C) Example 3: same as in (A) but with the left channel of the sine signal removed (extreme case of IID for the sine signal). Again, in spite of the level difference between noise and signal (which has now been increased even further), the sine tone is now perceivable.
The interesting and counter-intuitive observation in (C) is that a masked tone becomes audible by reducing its overall level.
At low frequencies, IPDs can reduce a signal's detection threshold by up to ~15dB, while, as expected (why?), they have no effect at high frequencies.
For your reference, here are the 4 signals used to construct the above examples (must use headphones): Noise  -  300Hz  -  300Hz with 180° IPD  -  300Hz with no left channel.

The described release from masking is not due to our ability to localize the sine signal in one ear thanks to the imposed IPDs or IIDs.  Rather, it is due to signal de-correlation between ears, facilitated by the change in interaural phase or intensity, and to the associated attention focusing (i.e. cognitive strategy).
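The de-correlation between ears can be illustrated numerically. The sketch below builds rough stand-ins for the three listening examples and computes the correlation between the left and right channels in each. It makes several simplifying assumptions: the noise is white Gaussian noise rather than a 600Hz-wide band, it is identical in both ears, and the sample rate, duration, seed, and function names are arbitrary choices. It shows only that the signals at the two ears become less alike; it does not model perception.

```python
import math
import random

def correlation(x, y):
    """Pearson correlation coefficient between two equal-length sample lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def interaural_correlations(tone_db_re_noise=-28.0, freq_hz=300.0, rate=8000, n=8000, seed=1):
    """Left/right correlation for the three masking scenarios (A, B, C)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]         # same noise in both ears, nominal RMS of 1
    amp = math.sqrt(2.0) * 10 ** (tone_db_re_noise / 20.0)  # tone RMS 28 dB below the nominal noise RMS
    tone = [amp * math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]
    noise_plus_tone = [a + b for a, b in zip(noise, tone)]
    noise_minus_tone = [a - b for a, b in zip(noise, tone)]  # tone inverted, i.e. 180-degree IPD
    return {
        "A_no_ipd": correlation(noise_plus_tone, noise_plus_tone),
        "B_180_ipd": correlation(noise_plus_tone, noise_minus_tone),
        "C_tone_right_only": correlation(noise, noise_plus_tone),
    }

for name, r in interaural_correlations().items():
    print(name, round(r, 5))
```

In scenario (A) the two channels are identical (correlation of 1), while in (B) and (C) the correlation drops slightly below 1, with the larger drop for the 180° IPD case.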

Differences between binaural masking (just described) and monaural masking are related to issues of signal detectability in complex environments. Binaural cues may also override masking of complex signals by other sounds (e.g. noise), if some additional cues are also present (e.g. pitch, timbre, etc. differences between the signals arriving in each ear).  In the absence of such additional cues, IPD- and IID-related attention focusing is not sufficient to override cognitive strategies that link together signals with similar timbre, pitch level and contour, etc.



Sound source localization neural mechanisms

To explain ITD (IPD) detection, Jeffress (1948, 1977) hypothesized the presence of a coincidence detector, at the neural level, that uses delay lines to compare arrival times at each ear.  His theory is illustrated in the figure below (from Plack, 2005: 182).  Assuming the presence of neuron arrays, which are tuned to different delays and encode signal arrival time differences between ears, the Jeffress model is equivalent to cross-correlation, comparing the inputs from each ear at different time delays. 
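The cross-correlation view of the Jeffress model can be sketched in a few lines: each candidate delay plays the role of one coincidence-detector neuron, and the delay with the strongest "coincidence" (largest summed product of the two ear signals) is taken as the ITD estimate. This is an illustrative signal-processing analogy only, with assumed names, sample rate, and test signal; it is not a physiological model.

```python
import math

def best_lag_samples(left, right, max_lag):
    """Lag (in samples) at which right best matches left; positive = right ear lags."""
    n = len(left)
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        # One "coincidence detector" per lag: sum of products at that relative delay.
        scores[lag] = sum(left[i] * right[i + lag] for i in range(max_lag, n - max_lag))
    return max(scores, key=scores.get)

RATE = 48000           # samples per second (assumed)
TRUE_DELAY = 14        # right ear receives the tone 14 samples (about 0.29 ms) late
left = [math.sin(2 * math.pi * 300 * i / RATE) for i in range(4800)]
right = [math.sin(2 * math.pi * 300 * (i - TRUE_DELAY) / RATE) for i in range(4800)]

max_lag = round(0.00065 * RATE)  # search only within the maximum possible ITD
lag = best_lag_samples(left, right, max_lag)
print(lag, "samples =", lag / RATE, "s")
```

With a 300Hz tone the search range (about +/-0.65ms) is much shorter than the period, so there is a single best lag. At frequencies whose period is shorter than the search range, several lags would score equally well, which mirrors the IPD ambiguity discussed earlier.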

Although Jeffress's model has been partially confirmed by physiological evidence from birds (e.g. barn owl), evidence from mammals (e.g. gerbil) suggests broad sensitivity to just two ranges of interaural time differences at each characteristic frequency (i.e. per neuron - see the figure to the right; from Plack, 2005: 183), corresponding to the period of that frequency.  This observation fits well with the already discussed observations that a) ITDs do not provide sound source localization at high frequencies (the Jeffress model cannot explain this) and b) it is a very special type of ITDs that is of importance: IPDs.
(Up to 1/2-cycle IPDs can be well represented by the gerbil data, to the right.) 

IID detection can be explained in terms of the associated difference between excitatory and inhibitory activity in the two ears when stimulated by signals that display IIDs.



Columbia College, Chicago - Audio Arts & Acoustics