Spatial Audio Positioning Techniques (hands off the cross-fader)

I prefer to use my nice speakers, but when listening to music with headphones, little puts me more on edge than sound going to only one channel. In theory it's just an interesting stereo effect, but it still makes me grind my teeth every time. The problem is that while it sounds fine on speakers which are separated from the listener by enough space that sound reaches both ears naturally, headphones move those sources to inside each of my ears. The only time that happens in real life is when there is something in my ear, which I typically do not want to be there (think mosquitos and strong winds)!

Our ears and the listening machinery in our brains are incredibly complex. If all you’re going for is spatially positioned sound, there are many ways to go about it, because we rely on several cues to locate real sounds in real space (from Music, Cognition, and Computerized Sound: An Introduction to Psychoacoustics):

  1. The intensity of the sound in each ear
  2. The frequency spectra of the sound in each ear
  3. The time of arrival of the sound to each ear

As I said above, the first cue used on its own is the most annoying to listen to, for me at least.

The second cue works because our bodies reflect and absorb different frequency ranges. For example, our torsos reflect high-frequency sounds and our heads absorb them. The ear that is farther from a sound therefore hears it through something of a low-pass filter (pg. 98).
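Here is a minimal ChucK sketch of that cue by itself. It is not taken from the book: white noise goes to the left channel untouched and to the right channel through a low-pass filter, as if the source sat to the listener's left. The file name and the 1500 Hz cutoff are assumptions on my part, so tune the cutoff by ear.

```chuck
// head_shadow.ck (hypothetical file name)
// Spectral cue only: both ears get the same level and timing,
// but the far ear hears a low-passed copy.
Noise src => Gain nearEar => dac.chan(0);   // left ear: unfiltered
src => LPF farEar => dac.chan(1);           // right ear: low-passed

0.1 => src.gain;        // keep the noise quiet
1500 => farEar.freq;    // assumed cutoff in Hz, not a measured value

3::second => now;       // play for three seconds, then exit
```

Swapping the two dac.chan() targets should move the apparent source to the other side.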

Finally, there is the precedence effect: sound reaches the closer ear first. Sound events separated by up to around 60 milliseconds will be perceived as a single sound rather than as echoes (pg. 95), but a difference of as little as 10 microseconds can still have an effect (pg. 98).

In my headphone experiments, 1 millisecond was about as much delay as I could stand, but it was more than enough to accurately locate sounds, and the effect worked just as well with desktop speakers placed in front of me.
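To put those numbers on the sample grid, here is a small sketch that prints how a 1 ms delay translates into samples at whatever rate ChucK happens to be running. The file and variable names are made up for illustration. At 44.1 kHz one sample lasts about 22.7 microseconds, so whole-sample delays cannot express a 10-microsecond difference directly.

```chuck
// delay_math.ck (hypothetical file name)
1::ms => dur maxDelay;                  // the delay ceiling used in this post

// dividing one duration by another yields a plain float
second / samp => float sampleRate;      // samples per second
maxDelay / samp => float delaySamples;  // 1 ms expressed in samples
(samp / ms) * 1000.0 => float samplePeriodUs;

<<< "sample rate (Hz):", sampleRate >>>;
<<< "1 ms in samples:", delaySamples >>>;
<<< "one sample in microseconds:", samplePeriodUs >>>;
```

ChucK's documentation describes Delay as a non-interpolating delay line and DelayL as a linearly interpolating one, so DelayL may be worth trying if you want steps finer than one sample.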

Try the experiment for yourself. The following ChucK script loads three WAV files (1.wav, 2.wav, 3.wav) and plays them at random intervals with random pan values (from full “left” to full “right”). If you leave bad_pan at the top as true, it will position using crossfading; set it to false and it will adjust the sounds’ delays in the left and right channels instead.

If you don’t want to bother with ChucK scripts, you can listen to this demonstration instead: pan-example.mp3.

// surround.ck
true => int bad_pan;
// true: clips positioned by adjusting gain per channel
// false: clips positioned by adding per-channel delays

fun void player(string filename) {
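  SndBuf snd;
  // one envelope (gain) and one delay line per output channel
  snd => Envelope e1 => Delay d1 => dac.chan(0);  // left
  snd => Envelope e2 => Delay d2 => dac.chan(1);  // right
  filename => snd.read;
  0 => snd.rate;  // pause playback until the first trigger

  // start centered: no delay, full gain on both channels
  0::samp => d1.delay => d2.delay;
  1 => e1.value => e2.value;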

  while( Std.rand2f(0.5, 5)::second => now) {
    0 => snd.pos;
    1 => snd.rate;

    Std.rand2f(0, 1) => float pan;
    if(bad_pan) {

      // tweak channel envelopes (pan 0 = full left, 1 = full right,
      // matching the delay branch below)
      (1-pan) => e1.value;
      pan => e2.value;

    } else {

      // tweak channel delays
      (pan * 1)::ms => d1.delay;
      ((1-pan) * 1)::ms => d2.delay;

      // also tweak gain to taste
      //(1-pan)*0.2 + 0.8 => e1.value;
      //pan*0.2 + 0.8 => e2.value;

    }
  }
}

spork ~ player("1.wav");
spork ~ player("2.wav");
spork ~ player("3.wav");

// keep the parent shred alive so the sporked players keep running
1::day => now;

There are definite payoffs to taking the more psychoacoustically informed route and applying one or more of these techniques.

First, (entirely subjective!) it sounds better. Straight cross-fades are very unnatural and disconcerting for me when listening with headphones.

Second, it's louder. Completely cutting a channel seems wasteful when you consider that you could simply add a tiny delay to achieve the same effect, assuming you weren't going for the crossfade effect.
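A rough way to see the loudness difference is to compare the summed power of the two channels under each scheme. This sketch assumes a linear crossfade (channel gains of pan and 1 - pan, as in the script above) and treats power as gain squared, which ignores how our ears actually combine the two sides. The file name is hypothetical.

```chuck
// loudness_compare.ck (hypothetical file name)
for (0 => int i; i <= 4; i++) {
    i / 4.0 => float pan;   // 0, 0.25, 0.5, 0.75, 1

    // linear crossfade: channel gains are pan and (1 - pan)
    pan * pan + (1 - pan) * (1 - pan) => float crossfadePower;

    // delay panning: both channels stay at full gain
    1.0 * 1.0 + 1.0 * 1.0 => float delayPower;

    <<< "pan:", pan, "crossfade:", crossfadePower, "delay:", delayPower >>>;
}
```

Under these assumptions the crossfade total stays between 0.5 and 1.0 while the delay version is always 2.0, which works out to roughly 3 to 6 dB more signal.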

