Plack, 2005: Chapter 9.2


Lecture notes


Topics
Auditory localization cues: Interaural spectral differences & HRTFs
Cone of confusion - Localization cues from head movements
Judging sound source distance
The precedence effect




Auditory localization cues:
Interaural spectral differences & HRTFs


As already noted, for sounds of intermediate frequencies (~500 Hz < f < ~1500 Hz), IPD or IID cues alone are ambiguous or unreliable, but they can provide some useful localization information when combined (IPD cues stop working above 770 Hz, while IID cues are not perceivable below 500 Hz, allowing for some overlap). At intermediate frequencies, as well as for changes in sound-source elevation (which are not accompanied by IPDs or IIDs), the auditory system relies on interaural spectral differences.
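The 770 Hz limit for IPD cues can be motivated with a short calculation. The sketch below assumes a maximum interaural time difference of about 0.65 ms (a commonly quoted figure for an average adult head; it is not stated in this section) and treats the IPD as ambiguous once that delay reaches half a period of the tone, since the auditory system can then no longer tell which ear is leading. The function names are illustrative only.

```python
# Assumed value, not taken from these notes: largest ITD for an average head.
MAX_ITD_S = 0.00065  # seconds


def ipd_cycles(itd_s, freq_hz):
    """Interaural phase difference of a pure tone, expressed in cycles."""
    return itd_s * freq_hz


def ipd_ambiguity_limit_hz(max_itd_s):
    """Frequency at which the largest ITD equals half a period (0.5 cycle)."""
    return 1.0 / (2.0 * max_itd_s)


limit_hz = ipd_ambiguity_limit_hz(MAX_ITD_S)
print(f"IPD becomes ambiguous near {limit_hz:.0f} Hz")
print(f"IPD at that frequency: {ipd_cycles(MAX_ITD_S, limit_hz):.2f} cycles")
```

Under this assumption the limit comes out close to the 770 Hz quoted above; a larger or smaller head shifts it accordingly.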

Interaural spectral differences are the result of the torso, head, and outer ear performing azimuth- and, most importantly, elevation-dependent spectral filtering on signals, filtering that affects the signal arriving in each ear differently. This filtering is commonly referred to as a Head Related Transfer Function (HRTF) or, more correctly (to address the contribution of parts of the body other than the head), Anatomic Transfer Function (ATF).  HRTF (or ATF) aids in resolving the ambiguity arising from ITD (IPD) and IID information alone and helps us localize sounds in free space by 'coloring' the spectral composition of the signal arriving in each ear differently based on sound-source location.  As is the case with most such cues, spectral difference cues are not foolproof.  For example, in the median plane, signals rich in high frequencies tend to be localized higher than signals rich in low frequencies.

Interaural spectral differences due specifically to structural differences between the pinnae of the two ears contribute to better lateralization (i.e. virtual location of the sound within the head), especially in the median plane (i.e. changes in elevation).  However, pinna-related spectral filtering alone, caused by pinna resonances, does not help us 'construct' a complete aural 'image' of the outside world.  Experiments in which ATFs obtained from dummy heads/torsos are imposed on the equalization and reproduction of signals indicate that interaural spectral differences contributed by the entire Pinnae/Head/Torso system contain information that better helps us perceive and reconstruct sound source localization (i.e. actual location identification outside the head). 

The Head-Related Transfer Function (HRTF) is defined as the ratio of the sound pressure spectrum measured at the eardrum to the sound pressure spectrum that would exist at the center of the head if the head were removed. The figure to the right displays HRTFs from a single ear as a function of sound-source azimuth angle. From this and other similar graphs it has been inferred that:
• The 8kHz region seems to correlate with overhead perception (i.e. spectral changes in this region correlate with changes in the perceived location of sound sources above our heads) 
• Regions in the frequency bands 300-600Hz & 3000-6000Hz seem to correlate with frontal perception
• Regions centered at around 1200Hz & 12000Hz seem to correlate with rear perception.

HRTFs are individual: they depend on pinna, head, and torso anatomy, which varies greatly among individuals (see, for example, the figure below), so the spectral sound-localization cues of HRTFs are most likely learned through experience.
Consequently, individuals generally localize better with their own cues than with those of others. At the same time, it has been shown that physiological differences may result in some listeners performing much better than others on auditory localization tasks.
Imposing, therefore, good HRTFs (through appropriately equalized headphone listening) on individuals who have difficulty localizing sound sources can improve their localization performance by providing more salient interaural spectral differences (assuming comparable head size between 'donor' and 'recipient').



Cone of confusion - Localization cues from head movements


Cone of confusion

Sound sources moving in the median sagittal plane (i.e. changing in elevation) produce no IPDs or IIDs and therefore provide ambiguous localization cues. More generally, for any given IPD or IID there is a conical surface, extending outward from the ear, over which every source position produces the same IPD and IID, making precise sound-source localization over this surface difficult (see the figure below; from Plack, 2005: 184).
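The geometry behind the cone of confusion can be checked numerically. The sketch below is a deliberately simplified model, not the one used in Plack: the ears are two points on the interaural axis with no head between them (so diffraction and IIDs are ignored), and the ear spacing, speed of sound, and source distance are assumed values. It compares the interaural time difference for a source in front, its mirror image behind, and an elevated source on the same cone.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed
EAR_OFFSET = 0.0875     # m, half of an assumed 17.5 cm ear spacing


def itd_seconds(azimuth_deg, elevation_deg, distance_m=2.0):
    """ITD for two point 'ears' at (-EAR_OFFSET, 0, 0) and (+EAR_OFFSET, 0, 0).

    Azimuth is measured from straight ahead toward the right ear;
    a positive result means the sound reaches the right ear first.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    source = (
        distance_m * math.cos(el) * math.sin(az),  # x: along the interaural axis
        distance_m * math.cos(el) * math.cos(az),  # y: straight ahead
        distance_m * math.sin(el),                 # z: up
    )
    to_left = math.dist(source, (-EAR_OFFSET, 0.0, 0.0))
    to_right = math.dist(source, (EAR_OFFSET, 0.0, 0.0))
    return (to_left - to_right) / SPEED_OF_SOUND


front = itd_seconds(30, 0)    # front right, ear level
back = itd_seconds(150, 0)    # mirror position behind the listener
above = itd_seconds(90, 60)   # elevated, same angle from the interaural axis
median = itd_seconds(0, 45)   # anywhere in the median plane

for name, value in [("front", front), ("back", back), ("above", above), ("median", median)]:
    print(f"{name:>6}: {value * 1e6:7.1f} microseconds")
```

In this model the first three positions give the same ITD because they share the same angle from the interaural axis, and the median-plane source gives none at all.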


Localization cues from head movements

We have already discussed how interaural spectral differences and the associated HRTFs can help resolve localization ambiguities, whether in the median sagittal plane or on the cone of confusion.

However, the most effective strategy in resolving sound-source localization ambiguities is head movement, assuming the sound signal lasts long enough, unchanged, to allow for such a movement to be of use.  Moving the head in the horizontal plane can help resolve front-to-back ambiguities (see the figure to the right; from Plack, 2005: 185), while head tilting can help resolve top-to-bottom ambiguities.
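Why a horizontal head turn separates front from back can be sketched with a far-field approximation in which the ITD is proportional to the sine of the source azimuth relative to the nose. This is a simplified model with assumed ear spacing and speed of sound, and the function names are illustrative only. A source in front and its mirror image behind start with the same ITD, but a small turn to the right changes the two ITDs in opposite directions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed
EAR_SPACING = 0.175     # m, assumed


def itd_after_turn(source_az_deg, head_turn_deg):
    """Far-field ITD (seconds) after turning the head to the right by head_turn_deg.

    Azimuths are measured clockwise from straight ahead; positive ITD
    means the right ear leads.
    """
    relative_az = math.radians(source_az_deg - head_turn_deg)
    return EAR_SPACING / SPEED_OF_SOUND * math.sin(relative_az)


def itd_change(source_az_deg, head_turn_deg=10.0):
    """Change in ITD produced by the head turn."""
    return itd_after_turn(source_az_deg, head_turn_deg) - itd_after_turn(source_az_deg, 0.0)


front_change = itd_change(30.0)   # source in front, to the right
back_change = itd_change(150.0)   # mirror-image source behind

print(f"front source: ITD changes by {front_change * 1e6:+.1f} microseconds")
print(f"back source:  ITD changes by {back_change * 1e6:+.1f} microseconds")
```

The sign of the change tells the two positions apart: turning toward a frontal source reduces its ITD, while the same turn increases the ITD of the source behind.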

As is the case with most sound localization cues, head-movement salience depends largely upon long-term learning and experience.  Auditory localization  experiments using headphones and conflicting source and head movements exploit this reliance on previous experience, producing interesting and revealing illusions.  For examples, explore the three figures below and read carefully the explanations in the captions.



Judging sound source distance


Distance and loudness

Judging sound-source distance can be partially aided by loudness cues.  In general, soft sounds are more likely to be associated with sources far away and loud sounds are more likely to be associated with sources nearby. 

However, such cues are only reliable when comparing the loudness of a sound to a known reference that precedes it and/or when judging familiar sounds.  In all other cases, loudness cues are unreliable when it comes to distance judgments. 
Even in reliable contexts, distance changes are underestimated when judged based solely on loudness changes.  More specifically, although distance doubling in the free field (i.e. where there are no reflections) corresponds to an ~6dB SPL reduction, perception of distance doubling corresponds to an ~20dB SPL reduction. 
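The two figures above can be put side by side with a short calculation. The first function applies the inverse-square law to get the free-field level change between two distances. The second turns a level drop into a judged distance ratio by assuming, purely for illustration, that the ~20 dB per perceived doubling quoted above extrapolates smoothly to other level changes; that extrapolation is an assumption of this sketch, not a result from the notes.

```python
import math


def free_field_level_change_db(d_from_m, d_to_m):
    """Level change (dB) when moving from d_from_m to d_to_m in the free field."""
    return -20.0 * math.log10(d_to_m / d_from_m)


def perceived_distance_ratio(level_drop_db, db_per_perceived_doubling=20.0):
    """Judged distance ratio for a given level drop (assumed log-linear rule)."""
    return 2.0 ** (level_drop_db / db_per_perceived_doubling)


drop_db = -free_field_level_change_db(1.0, 2.0)  # physical doubling of distance
judged_ratio = perceived_distance_ratio(drop_db)

print(f"Doubling the distance lowers the level by {drop_db:.2f} dB")
print(f"Judged distance ratio for that drop: {judged_ratio:.2f} (actual: 2.00)")
```

Under these assumptions a true doubling of distance is judged as an increase of only about a quarter, which is the underestimation described above.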

Distance and reverberation

In reflective environments, distance judgments are aided by reverberation cues.  In general, the greater the distance of the source the greater the proportion of the reverberant (relative to the direct) sound.  Direct-to-reverberant-sound cues provide only coarse source-distance information, as we are only able to detect distance changes larger than by a factor of two.  In addition, distance-change judgments based on this cue alone tend, again, to be underestimated.
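How the direct-to-reverberant proportion shrinks with distance can be illustrated with the usual idealized room model: the direct sound follows the inverse-square law while the reverberant level is taken to be the same everywhere in the room. Under that assumption the ratio depends only on the source distance relative to the critical distance (the distance at which direct and reverberant levels are equal). The critical distance used below is a hypothetical value, not one taken from these notes.

```python
import math

CRITICAL_DISTANCE_M = 2.0  # hypothetical room value, for illustration only


def direct_to_reverberant_db(distance_m, critical_distance_m=CRITICAL_DISTANCE_M):
    """Direct-to-reverberant ratio (dB) in an idealized diffuse-field room."""
    if distance_m <= 0 or critical_distance_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(critical_distance_m / distance_m)


for d in (1.0, 2.0, 4.0, 8.0):
    print(f"{d:4.1f} m: D/R = {direct_to_reverberant_db(d):+5.1f} dB")
```

In this model each doubling of distance lowers the ratio by about 6 dB, which is consistent with the cue signalling only fairly large distance changes.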

Distance and spectral composition (timbre)

Increasing the distance from a sound source tends to reduce high frequency energy due to air absorption.  However, this cue is salient only for large changes in distance. 
In addition, for sound sources that are in front of us, this absorption-based high-frequency energy reduction is counterbalanced by a low-frequency energy reduction, because low-frequency energy drops at a much higher rate with distance than high-frequency energy.  This is due to the fact that low frequencies propagate spherically, obeying the inverse-square law (i.e. their intensity is inversely proportional to the square of the distance from the source), while high frequencies follow a more directional path in front of the source (i.e. their intensity falls off more slowly with distance).

In conclusion, sound source distance judgments based solely on aural cues are unreliable and must rely on a combination of experience/familiarity with the given sound source, context, loudness, reverberation, timbral, and visual changes that may accompany changes in distance.



The precedence effect


Precedence / Haas effect: The Haas effect, named after Helmut Haas (1949) and his work on speech perception, is a special case of the precedence effect. The precedence effect describes a learned strategy employed implicitly by listeners in order to address conflicting or ambiguous localization cues occurring in environments where sound wave reflections play an important role (e.g. all rooms other than anechoic environments). 

According to this effect, listeners make their localization judgments based on the earliest arriving sound onset. The term "precedence" is used because the direct sound, with presumably accurate localization information, is given precedence over the subsequent reflections and reverberation, which usually convey inaccurate localization information.  In reflective/reverberant environments, both IPD and IID cues are diffused to such an extent that they become unusable, making the precedence effect a necessary strategy.
The Haas effect occurs when signal arrival times differ by up to 30–40 milliseconds and persists even if the second (delayed) signal is up to 10dB stronger than the first. This strategy is developed through extensive exposure to reverberant listening contexts. After it has been developed, it is usually automatically applied to all listening contexts, leading to possible auditory localization "illusions."

The figure to the left illustrates a precedence effect demonstration with two loudspeakers reproducing the same pulsed wave. The pulse from the left speaker leads in the left ear by a few hundred microseconds, suggesting that the source is on the left. The pulse from the right speaker leads in the right ear by a similar amount, which provides a contradictory localization cue. Because the listener is closer to the left speaker, the left pulse arrives sooner and wins the competition: the listener perceives just a single pulse coming from the left.

(From "How We Localize Sound" by W. M. Hartmann)

The figure below illustrates the interaction between interaural time and intensity differences observed in experiments exploring the precedence effect.  For small delays (<~1 ms) between ears (i.e. small distance differences between sound sources and the two ears), localization is biased towards the side producing the louder sound.  For delays between 1 and 40 ms, localization is biased towards the earlier sound.  For larger delays we hear two sounds, with the second perceived as an echo of the first, even if the 'echo' is actually louder than the original signal.
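The three delay regimes described above can be summarized as a small lookup. This is only a sketch of the description in these notes: the 1 ms and 40 ms boundaries are approximate and in practice depend on the type of signal, and the function name and labels are illustrative.

```python
def precedence_regime(delay_ms):
    """Return the localization outcome described in the notes for a given delay.

    delay_ms is the arrival-time difference between the two signals.
    Boundary values (1 ms, 40 ms) are approximate.
    """
    if delay_ms < 0:
        raise ValueError("delay must be non-negative")
    if delay_ms < 1.0:
        return "single sound, biased toward the louder side"
    if delay_ms <= 40.0:
        return "single sound, biased toward the earlier arrival"
    return "two sounds, the second heard as an echo"


for delay in (0.3, 5.0, 30.0, 80.0):
    print(f"{delay:5.1f} ms: {precedence_regime(delay)}")
```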



Columbia College, Chicago - Audio Arts & Acoustics