B&O Tech: A day in the life

#17 in a series of articles about the technology behind Bang & Olufsen loudspeakers

 

This week, instead of talking about what is inside the loudspeakers, let’s talk about what I listen for when sound is coming out of them. Specifically, let’s talk about one spatial aspect of the mix – where instruments and voices are located in two-dimensional space. (This will be a short posting this week, because it includes homework…)

Step 1: Go out and buy a copy of Jennifer Warnes’s album called “Famous Blue Raincoat: The Songs of Leonard Cohen” and play track 2 – “Bird on a Wire”.

Step 2: Close your eyes and really concentrate on where the various voices and instruments are located in space relative to your loudspeakers. If you hear what I hear, you’ll hear something like what I’ve tried to represent on the map shown in the figure below.

A map of the locations of many of the instruments in Jennifer Warnes's recording of "Bird on a Wire".

I’ve used some colour coding, just to help keep things straight:

  • Voices are in red
  • Drums are in blue
  • Metallic instruments (including cymbals) are in green
  • Bass is in gray
  • Synth and saxophone are in purple

Note that Jennifer sings her own backup vocals, so the “voice” and the two “bk” (for backup – not Burger King) positions are all her. It also sounds like she’s singing in the “choir” on the left – but it’s hard for me to hear exactly where she is.

Whenever I’m listening to a pair of loudspeakers (or a car audio system, or the behaviour of an upmix algorithm) to determine the spatial properties, I use this map (which I normally keep in my head – not on paper…) to determine how things are behaving. The two big questions I’m trying to answer when considering a map like this revolve around the loudspeakers’ ability to deliver (1) the accuracy and (2) the precision I’m looking for. (Although many marketing claims use these words interchangeably, they do not mean the same thing.)

The question of accuracy is one of whether the instruments are located in the correct places, not only in terms of left and right, but also in terms of distance. For example, the tune starts with a hit on the centre tom-tom, followed immediately by the bigger tom-tom on the left of the mix. If I have to point at that second, deeper-pitched tom-tom – which direction am I pointing in? Is it far enough left-of-centre, but not hard over in the left loudspeaker? (This will be determined by how well the loudspeakers’ signals are matched at the listening position, as well as the location of the listening position.) Secondly, how far away does it sound, relative to other sound sources in the mix? (This will be influenced primarily by the mix itself.) Finally, how far away does it sound from the listening position in the room? (This will be influenced not only by the mix, but by the directivity of the loudspeakers and the strength of sidewall reflections in the listening room. I talked about that in another blog posting once-upon-a-time.)
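As a side note, the left/right half of that accuracy question can be estimated on paper. For a symmetrical setup, the classic “tangent law” for amplitude panning predicts where a phantom image should appear from the two channel gains. Here’s a rough sketch in Python (the function name and the ±30° default are my own choices for illustration, not anything from the recording or a B&O tool):

```python
import math

def phantom_angle(gain_left, gain_right, base_half_angle_deg=30.0):
    """Predict the phantom image angle (degrees, positive = left of centre)
    using the stereophonic tangent law, for loudspeakers placed at
    +/- base_half_angle_deg from the centre line."""
    ratio = (gain_left - gain_right) / (gain_left + gain_right)
    return math.degrees(math.atan(ratio * math.tan(math.radians(base_half_angle_deg))))

print(phantom_angle(1.0, 1.0))  # equal gains: 0.0 (dead centre)
print(phantom_angle(1.0, 0.5))  # left 6 dB louder: roughly a third of the way left
```

So a 6 dB level difference favouring the left channel should pull the image about 11° to the left – noticeably left-of-centre, but nowhere near hard over in the left loudspeaker.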

The question of precision can be thought of as a question of the size of the image. Is it a pin-point in space (both left/right and in distance)? Or is it a cloud – a fuzzy location with indistinct edges? Typically, this characteristic is determined by the mix (for example, whether the panning was done using amplitude or delay differences between the two audio channels), but also by the loudspeaker matching across the frequency range and their directivity. For example, one of the experiments that we did here at B&O some years ago showed that a difference as small as 3 degrees in the phase response matching of a pair of loudspeakers could cause a centrally-located phantom image to lose precision and start to become fuzzy.

Some things I’ve left out of this map:

  • The locations of the individual voices in the “choir”
  • Extra cowbells at around 2:20
  • L/R panned cabasa (or shaker?) at about 2:59
  • Reverberation

Some additional notes:

  • The triangles on the right side happen around 2:12 in the tune. The ones on the left come in much earlier in the track.
  • The “synth-y fx around 2:20” might be a guitar with a weird modulation on it. I don’t want to get into an argument about exactly what instrument this is.
  • I’ve only identified the location of the bass in the choir. There are other singers, of course…

You might note that I used the term “two-dimensional space” at the beginning of this posting. In my head, the two dimensions are (1) the angle to the source and (2) the distance to the source. I don’t think in X-Y Cartesian terms, but in polar terms.

An important thing to mention before I wrap up is that this aspect of a loudspeaker’s performance (accuracy and precision of phantom imaging) is only one quality of many. Of course, if you’re not sitting in the sweet spot, none of this can be heard, so it doesn’t matter. Also, if your loudspeakers are not positioned “correctly” (±30 degrees of centre and equidistant from the listening position) then none of this can be heard, so it doesn’t matter. And so on and so on. The point I’m trying to make here is that phantom image representation is only one of the many things to listen for, not only in a recording but also when evaluating loudspeakers.

 

First Impressions

I’m sitting and listening to Mary Chapin Carpenter’s new album “Songs From the Movie” for the first time on Spotify on a pair of headphones.

I listen to a lot of recordings – usually to find problems in loudspeakers, so it’s not very often that an album makes the hair on the back of my neck stand up.

This one does.

I haven’t yet been able to find out who the recording engineer(s) was(were) for this album, but whoever it was – Thanks!

 

achieving distance and depth in stereo recordings – one man’s opinion

I had an interesting email from an old recording-engineer friend of mine this week regarding a debate he had with a student concerning the issue of “depth” in recordings (in his specific case, 2-channel stereo recordings done with an ORTF mic configuration). This got me thinking back to a bunch of thoughts I had once-upon-a-time about distance perception, and a newer bunch of thoughts about loudspeaker directivity. Now, those two bunches of thoughts are congealing into a single idea regarding how to achieve (and experience) a reasonable perceived sensation of distance and depth in 2-channel stereo.

To start, some definitions:

  1. When I say “stereo” I mean “2-channel sound recording”
  2. “Distance” to a source in a stereo recording is the perceived distance between the listener and the (probably phantom) image.
  3. “Depth” in a stereo recording is the difference in the perceived distances from the listener to the closest and farthest (probably phantom) images (i.e. the distance to the concert master vs. the distance to the xylophone in a symphony orchestra).

Step 1: Distance perception in real life

Go to an anechoic chamber with a loudspeaker and a friend. Sit there and close your eyes and get your friend to place the loudspeaker some distance from you. Keep your eyes closed, play some sounds out of the loudspeaker and try to estimate how far away it is. You will be wrong (unless you’re VERY lucky). Why? It’s because, in real life with real sources in real spaces, distance information (in other words, the information that tells you how far away a sound source is) comes mainly from the relationship between the direct sound and the early reflections. If you get the direct sound only, then you get no distance information. Add the early reflections and you can very easily tell how far away it is. This has been proven in lots of “official” listening tests. (For example, go check out this report as a basic starting point).
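One way to put a rough number on this: the direct sound falls by about 6 dB per doubling of distance, while the diffuse reverberant level in a room stays roughly constant, so the direct-to-reverberant ratio drops as the source moves away, passing through 0 dB at the so-called critical distance. A back-of-the-envelope sketch (the function name and the 2 m critical distance are my own hypothetical choices):

```python
import math

def direct_to_reverberant_db(distance_m, critical_distance_m):
    """Direct-to-reverberant ratio in dB for a source in a diffuse room.
    The direct sound falls 6 dB per doubling of distance; the diffuse
    reverberant level is roughly constant, so the ratio is 0 dB at the
    critical distance and negative beyond it."""
    return 20.0 * math.log10(critical_distance_m / distance_m)

# For a hypothetical room with a 2 m critical distance:
for d in (1, 2, 4, 8):
    print(d, "m:", round(direct_to_reverberant_db(d, 2.0), 1), "dB")
```

This is only one of the cues, of course – the point above is that it’s the pattern of discrete early reflections, not just the overall ratio, that carries the precise distance information.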

Anecdote #1: Back in the old days when I was working on my Ph.D. we had an 8-loudspeaker system in the lab – one speaker every 45° in a circle around the listening position. We were trying to build a multichannel room simulator, building a sound field piece by piece – the direct sound and (up to 3rd-order) early reflections had the “correct” panning, delay and gain, and we added a diffuse field to tail in behind it. One of the interesting things that I found with that system was that the simulated distance to the source was easy to achieve with just the 1st-order reflections, but that the precision of that perceived distance increased as we added 2nd- and 3rd-order reflections. (We didn’t have enough computing power to simulate higher-order reflections at the time. It would be interesting to go back and try again to see what would happen with higher-order stuff now that my Mac has gotten a little faster…) Another interesting thing (although, in retrospect, it shouldn’t surprise anyone) was that the location of and the distance to the simulated sound source were also easy to determine without the direct sound being part of the sound field at all. Just the 1st- to 3rd-order reflections by themselves were enough to tell you where things were.

 

Step 2: Distance perception in a recording

It’s been well-known for many years that the apparent distance to a sound source in a stereo recording is controllable by the so-called “dry-wet” ratio – in other words, the relative levels of the direct sound and the reverb. I first learned this in the booklet that came with my first piece of recording gear – an Alesis Microverb. To be honest, this is a bit of an over-simplification, but done in good faith for people at the knowledge level one would typically have if one were an Alesis Microverb customer. The people at another reverb unit manufacturer know that the truth requires a little more detail. For example, their flagship reverb unit uses correctly-positioned and correctly-delayed early reflections (calculated using ray tracing, apparently) to deliver a believable room size and sound source location in that room.
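To make the “correctly-delayed early reflections” idea concrete, here’s a toy sketch that mixes a dry mono signal with a handful of discrete reflections. The delay and gain values are invented for illustration – a real ray-traced room model would derive them from the room geometry:

```python
import numpy as np

FS = 48000  # sample rate in Hz (an assumption for this sketch)

def add_early_reflections(dry, reflections):
    """Mix a dry mono signal with a list of (delay_seconds, linear_gain)
    early reflections -- a toy version of what a room (or a good reverb
    unit) adds to give a sense of source distance."""
    delays = [int(round(t * FS)) for t, _ in reflections]
    out = np.zeros(len(dry) + max(delays), dtype=float)
    out[:len(dry)] += dry
    for (t, g), d in zip(reflections, delays):
        out[d:d + len(dry)] += g * dry
    return out

# A click, plus three made-up reflections at plausible times and gains
click = np.zeros(FS // 10)
click[0] = 1.0
wet = add_early_reflections(click, [(0.008, 0.5), (0.015, 0.4), (0.023, 0.3)])
```

Play the dry click and the wet one back to back and the wet one sounds like it comes from a source some distance away in a room, even though the direct sound is identical.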
If you’re thinking in terms of a stereo microphone pair, then consider it this way: you want your microphone configuration to be reasonably good at acting like a decent panning algorithm. At the very least, you should ensure that you don’t have conflicting information between the interchannel time and the interchannel amplitude differences for your direct sound and the early reflections. For example, if you have a pair of near-coincident cardioids, but they’re “toed-in” instead of “toed-out” (i.e. the left mic is pointing to the right and the right mic is pointing to the left), you have a problem: the earlier channel will not be the louder channel for sound sources and reflections that are not on-axis to the pair. This would make for conflicting and therefore confusing information for your brain.
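To see why the toe direction matters, you can compute the interchannel level and time differences of a near-coincident cardioid pair for a distant source. This is a simplified sketch (ideal cardioid patterns, far-field source; the 17 cm spacing and ±55° angles are the usual ORTF values, but the function names are mine):

```python
import math

C = 343.0       # speed of sound in m/s
SPACING = 0.17  # ORTF capsule spacing in m
TOE = 55.0      # each mic's axis angle from centre, degrees (positive = toed OUT)

def cardioid(angle_deg):
    """Ideal cardioid sensitivity: 1 on-axis, 0 at the rear."""
    return 0.5 + 0.5 * math.cos(math.radians(angle_deg))

def interchannel_differences(source_deg):
    """Level (dB, left minus right) and time (ms, positive = left earlier)
    differences for a far source at source_deg (positive = left of centre)."""
    level_l = cardioid(source_deg - TOE)  # left mic points left
    level_r = cardioid(source_deg + TOE)  # right mic points right
    icld_db = 20.0 * math.log10(level_l / level_r)
    ictd_ms = 1000.0 * SPACING * math.sin(math.radians(source_deg)) / C
    return icld_db, ictd_ms
```

For a source 30° to the left, both differences come out positive: the left channel is both earlier and louder, so the cues agree. Flip the sign of TOE (toeing the pair in) and the louder channel becomes the later one – exactly the conflict described above.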

Anecdote #2: I did a recording for Atma once-upon-a-time in a large church in Montreal with a very long reverb time. During the sessions, I sat in the church (no control room), about 20 m from the mic pair. So, when the organist and I discussed what take to do next, we were talking live in the same room – no talkback speakers. During the editing for this disc, I happened to be shuttling around, looking for the beginning of a take – so I’d drop the cursor somewhere on the screen and hit “play” quickly to see where I was. One of the takes ended with the organist asking “did we get it?” and I responded “yup” quickly and loudly. It just so happened that, when I was shuttling around, looking for the right take, I hit “play” at the beginning of the “yup” and then quickly hit “stop”. The interesting thing is that it sounded, for that split second, like I was right next to the microphones – not 20 m away like I knew I was. So, I hit “play” again, and this time didn’t hit “stop”. This time, I sounded far away. What’s going on? Well, because the church was so big, it was possible to hit the stop button before any of the first reflections came in (save maybe the one off the floor), so it was possible (with a fast enough thumb on the transport buttons of the editing machine) to make the recording of my voice anechoic. The result was that I sounded 0 m away instead of 20 m.

The moral of the stories thus far? In order to deliver a perception of precise distance and depth (even if it’s not accurate…) you need early reflections in the recording, and they have to be panned and delayed appropriately.

Step 3: The delivery

Think back to Step 1. We agreed (or at least I said…) that early reflections tell your brain how far away the sound source is. Now think of a loudspeaker in a listening room.

Case #1: If you have an anechoic room, there are no early reflections, and, regardless of how far away the loudspeakers are, a sound source in the recording without early reflections (i.e. a close-mic’ed vocal) will sound much closer to you than the loudspeakers.

Case #2: If you have a listening room with early reflections, but the loudspeakers are directional such that there is no energy being delivered to the side walls (for example, a dipole with the angles carefully chosen to point the null of the loudspeaker at the point of specular reflection from the side wall), then the result is the same as in Case 1. This time there are no early reflections because of loudspeaker directivity instead of wall absorption, but the effect at the listening position is the same.
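For what it’s worth, finding that specular reflection point is simple image-source geometry: reflect the loudspeaker across the wall and see where the line from that image to the listener crosses the wall. A sketch (2-D floor plan with the side wall parallel to the y-axis; the function name and coordinates are mine):

```python
def sidewall_specular_point(speaker, listener, wall_x):
    """First-order specular reflection point on a side wall at x = wall_x,
    found with the image-source method: reflect the speaker across the wall
    and intersect the image-to-listener line with the wall."""
    sx, sy = speaker
    lx, ly = listener
    ix = 2.0 * wall_x - sx           # image source (y is unchanged)
    t = (wall_x - ix) / (lx - ix)    # fraction along image -> listener line
    return (wall_x, sy + t * (ly - sy))

# Speaker 1 m from the wall, listener 4 m further into the room:
print(sidewall_specular_point((1.0, 0.0), (1.0, 4.0), 0.0))  # (0.0, 2.0)
```

Aiming the dipole’s null from the speaker toward that point is what kills the sidewall reflection in Case #2.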

Case #3: If you have a listening room with early reflections, and the loudspeakers are omni-directional, then the early reflections from the side walls tell you how far away the loudspeakers are. Therefore, the close-mic’ed vocal track from Case #1 cannot sound any closer than the loudspeakers – your brain is too smart to be told otherwise.

 

The punchline

So, if you want to achieve precision in the distance and depth of your stereo recordings (whether you’re on the recording end or the playback end) you’re going to need to make sure that you have a reasonable mix of the following:

  1. Early reflections in the recording itself – they have to be there, coming in at the right times, with the right gains and the right panning.
  2. Not much energy in the early reflections in your listening room – either by putting some absorption on the walls in the right places, or by having reasonably directional loudspeakers (or both).