Listening Tips

So, you want to evaluate a pair of loudspeakers, or a new turntable, or you’re trying to decide whether to subscribe to a music streaming service, or its more expensive “hi-fi” version – or just to stick with your CD collection… How should you listen to such a thing and form an opinion about its quality or performance – or your preference?

One good way to do this is to compartmentalise what you hear into different categories or attributes. This is similar to breaking down taste and sensation into different categories – sweetness, bitterness, temperature, etc… This allows you to focus or concentrate on one thing at a time so that you’re not overwhelmed by everything all at once… Of course, the problem with this is that you become analytical, and you stop listening to the music, which is why you’re there in the first place…

Normally, when I listen to a recording over a playback system, I break things down into 5 basic categories:

  • timbre (or spectral balance)
  • spatial aspects
  • temporal behaviour
  • dynamics
  • noise and distortion

Each of these can be further broken down into sub-categories – and there’s (of course) some interaction and overlap – however it’s a good start…

Timbre

One of the first things people notice when they listen to something like a recording played over a system (for example, a pair of headphones or some loudspeakers in a room) is the overall timbre – the balance between the different frequency bands.

Looking at this from the widest point of view, we first consider the frequency range of what we’re hearing. How low and how high in frequency does the signal extend?

On the next scale, we listen for the relative balance between the bass (low frequencies), the midrange, and the treble (high frequencies). Assuming that all three bands are present in the original signal, do all three get “equal representation” in the playback system? And, possibly more importantly: should they? For example, if you are evaluating a television, one of the most important things to consider is speech intelligibility, which means that the midrange frequency bands are probably a little more important than the extreme low- and high-frequency bands. If you are evaluating a subwoofer, then its behaviour at very high frequencies is irrelevant…

Zooming in on the details, we can ask whether there are any individual notes sticking out. This often happens in smaller rooms (or cars), resulting in a feeling of “uneven bass” – some notes in the bass region are louder than others. If there are narrow peaks in the upper midrange, then you can get the impression of “harshness” or “sharpness” in the system (although impressions like “harsh” and “sharp” might also be symptoms of distortion, which has an effect on timbre…).

Spatial

The next things to focus on are the spatial aspects of the recording in the playback system. First we’ll listen for imaging – the placement of instruments and voices across the sound stage, thinking left-to-right. This imaging has two parameters to listen for: accuracy (are things where they should be?) and precision (are they easy to point to?). Note that, depending on the recording technique used by the recording engineer, it’s possible that images are neither accurate nor precise, so you can’t expect your loudspeakers to make things more accurate or more precise.

Secondly, we listen for distance and therefore depth. Distance is the perceived distance from you to the instrument. Is the voice near or far? Depth is the distance between the closest instrument and the farthest instrument (e.g. the lead vocal and the synth pad and reverberation in the background – or the principal violin at the front and the xylophone at the back).

Next we listen for the sense of space in the recording – the spaciousness and envelopment. The room around the instruments can range from non-existent (e.g. Suzanne Vega singing “Tom’s Diner”) to huge (a trombone choir in a water reservoir) and anything in between.

It is not uncommon for a recording engineer to separate instruments in different rooms and/or to use different reverb algorithms on them. In this case, it will sound like each instrument or voice has its own amount of spaciousness that is different from the others.

Also note that just because an instrument has reverb doesn’t necessarily mean that it will sound enveloping or spacious. Listen to “Chain of Fools” by Aretha Franklin on a pair of headphones. You’ll hear the snare drum in your right ear – but the reverb from the same snare is in the centre of your head. (It was a reverb unit with a single channel output, and the mixing console could only place images in one of three locations: Left, Centre, or Right.)

Temporal

The temporal aspects of the sound are those that are associated with time. Does the attack of the harpsichord or the pluck of a guitar string sound like it starts instantaneously, or does it sound “rounded” or as if the plectrum or pick is soft and padded?

The attack is not the only aspect of the temporal behaviour of a system or recording. The release – the stop of a sound – is just as (or maybe even more) important. Listen to a short, dry kick drum (say, the kick in “I Bid You Goodnight” by Aaron Neville that starts at around 0:20). Does it just “thump” or does it “sing” afterwards at a single note – more like a “boommmmmm”… sound?

It’s important to say here that, if the release of a sound is not fast, it might be a result of resonance in your listening room – better known as “room modes”. These will cause a couple of frequencies to ring longer than others, which can, in turn, make things sound “boomy” or “muddy”. (In fact, when someone tells me that things sound boomy or muddy, my first suspect is temporal problems – not timbral ones.) Note as well that those room modes might have been in the original recording space… And there’s not much you’re going to be able to do about that without a parametric equaliser and a lot of experience…

Dynamics

Dynamics are partly related to temporal behaviour (a recording played on loudspeakers can’t sound “punchy” if the attack and release aren’t quick enough) but also a question of capability. Does the recording have quiet and loud moments (not only in the long term, but also in the very short term)? And, can the playback system accurately produce those differences? (A small loudspeaker simply cannot play low frequencies loudly – so if you’re listening to a track at a relatively high volume, and then the kick drum comes in, the change in level at the output will be less than the change in level on the recording.)

Noise and Distortion

So far, the 4 attributes I’ve talked about above are descriptions of how the stuff-you-want “translates” through a system. Noise and Distortion is the heading for the stuff-you-don’t-want – extra sounds that don’t belong. (I’m not talking about the result of a distortion pedal on an AC/DC track – without that, it would be a Tuck and Patti track…)

However, Noise and Distortion are very different things. Noise is what is known as “program independent” – meaning that it does not vary as a result of the audio signal itself. Tape hiss on a cassette is a good example of this… It might be that the audio signal (say, the song) is loud enough to “cover up” or “mask” the noise – but that’s your perception changing – not the noise itself.

Distortion is different – it’s garbage that results from the audio signal being screwed up – so it’s “program dependent”. If there was no signal, there would be nothing to distort. Note, however, that distortion takes many forms. One example is clipping – the loud signals are “chopped off”, resulting in extra high frequencies on the attacks of notes. Another example is quantisation error on old digital recordings, where the lower the level, the more distortion you get (this makes reverberation tails sound “scratchy” or “granular”).
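To make the clipping example concrete, here is a minimal sketch (not part of the original post) that hard-clips a sine wave and measures how much energy shows up at its harmonics. The tone length, the clip limit of 0.5, and the function names are all arbitrary choices for illustration.

```python
import cmath
import math

N = 1024      # samples in the analysis window (illustrative choice)
CYCLES = 8    # whole cycles of the test tone in the window, so no spectral leakage

def sine(amplitude):
    """One window of a sine wave with the given peak amplitude."""
    return [amplitude * math.sin(2 * math.pi * CYCLES * n / N) for n in range(N)]

def hard_clip(signal, limit):
    """Chop off everything above +limit and below -limit."""
    return [max(-limit, min(limit, x)) for x in signal]

def harmonic_amplitude(signal, harmonic):
    """Amplitude of the given harmonic of the test tone, from a single DFT bin."""
    k = harmonic * CYCLES
    acc = sum(x * cmath.exp(-2j * math.pi * k * n / N) for n, x in enumerate(signal))
    return 2 * abs(acc) / N

clean = sine(1.0)
clipped = hard_clip(clean, 0.5)
silence_clipped = hard_clip(sine(0.0), 0.5)  # no signal in, nothing to distort

for h in (1, 3, 5):
    print(f"harmonic {h}: clean {harmonic_amplitude(clean, h):.4f}, "
          f"clipped {harmonic_amplitude(clipped, h):.4f}")
```

The clean tone has energy only at its fundamental, while the clipped version picks up extra components at higher frequencies. Clipping silence gives silence, which is the “program dependent” part.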

A completely different, and possibly more annoying, kind of distortion is that which is created by “lossy” psychoacoustic codecs such as MP3. However, if you’re not trained to hear those types of artefacts, they may be difficult to notice, particularly with some kinds of audio signals. In addition, saying something as broad as “MP3” means very little. You would need to know the bitrate, and a bunch of parameters in the MP3 encoder, in addition to knowing something about the signal that’s being encoded, to be able to make any kind of reasonable prediction about the audibility of the “distortion” that it creates.

Wrapping up…

It’s important here to emphasise that, although the loudspeakers, their placement, the listening room, and the listening position all have a significant impact on how things sound – the details of the attributes – the recording is (hopefully) the main determining factor… If you’re listening to a recording of solo violin, then you will not notice if your subwoofer is missing… Loudspeakers should not make recordings sound spacious if the recordings are originally monophonic. This would be like a television applying colours to a black and white movie…

In-situ experimentation with surprising conclusions

Before I start this posting, I have to explain a term or two… Whenever you’re doing the type of experiment that I used to do back when I did experiments, you take a thing – for example:

  • a pair of loudspeakers
  • in a place
  • in a room
  • playing a 2-second loop of music
  • at a known playback level
  • to a listening subject
  • located at a known location

All of that stuff stays the same. You then make one change to one thing. For example, you increase the bass level while still making sure that the overall listening level has not changed, so you have to turn down the overall volume because you turned up the bass (if the frequency bands affected by the “bass” controller are present in the 2-second loop of music…).

You then ask the listening subject questions about the difference in the two things they heard – knowing (as the experimenter) that they only heard two things. For example, you ask them to tell you which of the two stimuli they preferred.

In this example, the thing that you (the experimenter) changed is called the “independent variable”. The information that you get from the listening subjects is called the “dependent variable”, since it’s (in theory) dependent on the variable that you’re changing.

As an ex-experimenter, I’m a poor test subject in someone else’s experiment, because I spend most of the time during the test NOT actually doing the test – but trying to reverse-engineer what is being tested. In other words, while I give the experimenter the dependent variables, I’m trying to figure out what the independent variables are…

Cut to the chase…

For the past two weeks, I’ve done more than my usual amount of highway driving here in Denmark. A couple of trips to Copenhagen (a 4-hour drive) and a trip to Aarhus (a 1.5 hour drive). In those many hours of driving, I began to notice a pattern that seemed to involve an independent variable and a dependent variable…

Typically, I drive within 5% of the speed limit. Or, more accurately, the speed displayed on my speedometer when I’m driving is within 5% of the speed limit of the road that I’m on. This is because my car (a 7-year old Honda Civic) has an approximately 5% error in its calculation of my actual speed (with my current wheels and tires at their current air pressure). And, over the past two weeks on the highway, I’ve noticed that almost everyone else is driving almost exactly the same speed as me.

However, I started to notice that there were two exceptions to this. There is a small number of vehicles that are being driven significantly slower than I drive, and another small number of vehicles that go significantly faster. So, it appears that we have a dependent variable – the speed at which the vehicles are going… This must be the result of something – the independent variable… So, I was trying to figure out what the independent variables are – trying to reverse-engineer the problem as if I were a subject in an experiment.

It didn’t take long to come up with a hypothesis – my best guess as to why I could break down the speeds of the vehicles into three different groups… And, once I came up with a hypothesis, I spent the rest of my time watching, to see if the hypothesis could be used to predict a behaviour…

Hypothesis 1: Slower vehicles

So, the first group of vehicles – the ones that go slower than me – almost all fit into a category of being significantly longer than my car. This includes large trucks, busses, and cars towing trailers. So, there appears to be a negative correlation between the vehicle length (over a yet-to-be-determined threshold – see Appendix 1, below) and the difference in speed relative to the speed limit (which we will assume that I am obeying). Perhaps this is because longer vehicles cannot go as fast because they are heavier. Unfortunately, I have noticed that this conclusion only holds true on the highway. It does not hold true when the speed limit is either 80 km/h or 50 km/h… So, further investigation is required.

Therefore, I can conclude that the independent variable in this case is the vehicle length – with a significant degree of uncertainty.

Hypothesis 2: Faster vehicles

It became quickly apparent that almost all of the vehicles that were travelling faster than me were either a Mercedes, Audi, or BMW. So, assuming that all persons are equal, I can only conclude that the speedometers on German-made cars are poorly calibrated – and all with an error that deviates with the same polarity. The speedometers on German-made cars must be displaying a speed that is lower than the actual value.

Appendix 1: Outlier and Interaction

So, it appears that there are two different independent variables in this experiment. Interestingly, there is also an interaction. This can be seen in the case of German vehicles that are longer than my Honda driving much faster than me. So, it appears that the second independent variable has a heavier weighting than the first. In other words, if a vehicle is longer than mine, AND it’s a German-made car, then it will go faster, not slower. This is true to a yet-to-be-determined-length-difference-threshold, above which it is not true (in other words, German busses go slower than me).

Appendix 2: Outlier without interaction

There was one potential additional independent variable in this experiment that appeared after a little more time on the highway. This was the question of whether the vehicle had a licence plate originating in Germany instead of Denmark. In this case, it appears that the second hypothesis is nulled. In other words, a German-made car with German plates will drive the speed limit.

Appendix 3: Generalisations

It should be said that, after a little research on the Internet, I found that my second hypothesis may be too general. It may, in fact, be concluded that cars made in Southern Germany are the vehicles with the inaccurate speedometers. This nuance is the result of my noticing that most (but not all) Volkswagens are moving at a velocity that is approximately equal to the speed limit.

Counter-intuitively, the larger (and therefore longer) Volkswagens behave more like the Southern German vehicles, which makes me wonder if there is an additional classification that I cannot derive yet…

Conclusion

Speaking as a consumer, and as someone who works in the engineering / development department of a company that makes consumer products, I find it odd that German car manufacturers would intentionally put consistently inaccurate speedometers on the cars destined for their export market. However, based on the observations made in this short “experiment”, it’s the only conclusion I can come up with…

Admittedly, my conclusions rely on a number of assumptions – some of them quite naive, of course. So, further observations and experimentation are required before submitting a paper to a peer-reviewed journal. This also raises the question of which journal(s) should be chosen for submitting my manuscript. Perhaps the JESC – The European Journal of Spurious Correlations or the GCFC – The George Carlin Fan Club. We’ll see…

Fc ≠ Fc

I was working on the sound design of a loudspeaker last week with some new people and software – so we had to get some definitions straight before we messed things up by thinking that we were using the same words to mean the same thing. I’ve made a similar mistake to this before, as I’ve written about here – and I don’t like being reminded of my own stupidity repeatedly… (Or, as Steven Wright once said “I’m having amnesia and deja vu at the same time – I think I’ve forgotten this before…”)

So, in this case on that day, we were talking about the lowly 2nd-order Low Pass Filter, based on a single biquad.

If you read about how to find the cutoff frequency of a low-pass filter, you’ll probably find that it is defined as the frequency where the output power is one half of that in the passband of the filter’s response. Since 10*log10(0.5) = -3.01 dB, this is also called the “3 dB down point” of the filter.

In my case, when I’m implementing a filter, I use the math provided by Robert Bristow-Johnson to calculate my biquad coefficients. You input a cutoff frequency (Fc), and a Q value, and (for a given sampling rate) you get your biquad coefficients.
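As a rough illustration of that calculation, here is a minimal sketch of the low-pass case from Bristow-Johnson’s Audio EQ Cookbook. The function name, the normalisation by a0, and the 48 kHz sampling rate are assumptions made for this example, not details taken from the post.

```python
import math

def lowpass_biquad(fc, q, fs):
    """Return (b0, b1, b2, a1, a2) for a 2nd-order low-pass biquad,
    normalised so that a0 = 1 (Audio EQ Cookbook low-pass formulas)."""
    w0 = 2.0 * math.pi * fc / fs          # desired cutoff in radians per sample
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)

    a0 = 1.0 + alpha
    b0 = (1.0 - cos_w0) / 2.0 / a0
    b1 = (1.0 - cos_w0) / a0
    b2 = b0
    a1 = -2.0 * cos_w0 / a0
    a2 = (1.0 - alpha) / a0
    return b0, b1, b2, a1, a2

# Example: desired Fc of 1 kHz, Q of 1/sqrt(2), at an assumed 48 kHz sampling rate
b0, b1, b2, a1, a2 = lowpass_biquad(fc=1000.0, q=1.0 / math.sqrt(2.0), fs=48000.0)
print(b0, b1, b2, a1, a2)

# Sanity check: the gain at 0 Hz should be 1 (0 dB) for any Q
print((b0 + b1 + b2) / (1.0 + a1 + a2))
```

Note that Fc and Q go in as two independent numbers; nothing in the calculation forces the response to be 3 dB down at Fc.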

The question then, is: is the desired cutoff frequency the actual measurable cutoff frequency of the system? (Let’s assume for the purposes of this discussion that there are no other components in the system that affect the magnitude response – just to keep it simple.)

The simple answer is: No.

For example, if I make a 2nd-order low pass filter with a desired cutoff frequency of 1 kHz (using a high enough sampling rate to not introduce any errors due to the bilinear transform) and I vary the Q from something very small (in this example, 0.1) to something pretty big (in this example, 20) I get magnitude response curves that look like the figure below.

Magnitude responses of 2nd order low pass filters with Q’s ranging from 0.1 to 20.

It is probably already evident that the 25 filter responses plotted above do not all cross each other at the 1 kHz line. In addition, you may notice that only one of those curves is at -3.01 dB at 1 kHz – the one where Q = 1/sqrt(2), or 0.707.

This raises the question: what is the gain of each of those filters at the desired value of Fc (in this case, 1 kHz)? This is plotted as the red line in the figure below.

The actual gain value of the filters at the desired Fc, and the maximum gain at any frequency.

This plot also shows the maximum gain of the filters for different values of Q. Notice that, in the low end, the maximum value is 0 dB, since the low pass filters only roll off. However, for Q values higher than 1/sqrt(2), there is an overshoot in the response, resulting in a boost at some frequency. As the Q increases, the frequency at which the gain of the filter is highest approaches the desired cutoff frequency. (As can be seen in the plot above, by the time you get to a Q of 20, the gain at Fc and the maximum gain of the filter are the same.)
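The two curves described above can be approximated with a short sketch. It assumes the analog prototype of the 2nd-order low-pass, H(s) = 1 / (s² + s/Q + 1) with the frequency normalised to Fc, which should be a fair stand-in given the assumption of a sampling rate high enough to ignore the bilinear transform. The function names are illustrative.

```python
import math

def gain_db(f, fc, q):
    """Magnitude in dB of the analog 2nd-order low-pass prototype at frequency f."""
    r = f / fc
    mag = 1.0 / math.sqrt((1.0 - r * r) ** 2 + (r / q) ** 2)
    return 20.0 * math.log10(mag)

def peak(fc, q):
    """Return (frequency, gain in dB) of the highest point of the response."""
    if q <= 1.0 / math.sqrt(2.0):
        return 0.0, 0.0                      # no overshoot: the maximum is 0 dB at 0 Hz
    f_peak = fc * math.sqrt(max(0.0, 1.0 - 1.0 / (2.0 * q * q)))
    return f_peak, gain_db(f_peak, fc, q)

fc = 1000.0
for q in (0.1, 0.5, 1.0 / math.sqrt(2.0), 2.0, 20.0):
    f_peak, peak_db = peak(fc, q)
    print(f"Q = {q:6.3f}: gain at Fc = {gain_db(fc, fc, q):7.2f} dB, "
          f"max gain = {peak_db:6.2f} dB at {f_peak:7.1f} Hz")
```

In this model the linear gain at Fc is simply Q, so the gain in dB at Fc is 20*log10(Q). That is why only Q = 1/sqrt(2) lands on -3.01 dB.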

It may be intuitively interesting (or interestingly intuitive) to note that, when Q goes to infinity, the gain at Fc also goes to infinity, and (relatively speaking) all other frequencies are infinitely attenuated – so you have a sine wave generator.

So, we know that the gain value at the stated Fc is not -3 dB for all but one value of Q. So, what is the -3 dB point, if we state a desired Fc of 1 kHz and we vary the Q? This is shown in the figure below.

The -3 dB point of a 2nd order 1 kHz low pass filter as a function of Q.

So, varying the Q from 0.1 to 20 varies the actual Fc (or, at least, the -3 dB point) from about 104 Hz to about 1554 Hz.
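For the analog prototype of the filter, the -3 dB point can also be solved directly rather than read off a plot. The sketch below sets the squared magnitude of H(s) = 1 / (s² + s/Q + 1) equal to one half and solves the resulting quadratic in (f/Fc)². This is an assumption-laden shortcut (analog model, half-power definition of -3 dB, made-up function names), so its results should land within a few percent of the figures quoted above rather than match them exactly.

```python
import math

def gain_db(f, fc, q):
    """Magnitude in dB of the analog 2nd-order low-pass prototype at frequency f."""
    r = f / fc
    return -10.0 * math.log10((1.0 - r * r) ** 2 + (r / q) ** 2)

def minus_3db_frequency(fc, q):
    """Frequency where the response is at half power relative to 0 Hz.

    With x = (f/fc)^2, half power means (1 - x)^2 + x/Q^2 = 2,
    i.e. x^2 + (1/Q^2 - 2)*x - 1 = 0; we take the positive root.
    """
    b = 1.0 / (q * q) - 2.0
    x = (-b + math.sqrt(b * b + 4.0)) / 2.0
    return fc * math.sqrt(x)

fc = 1000.0
for q in (0.1, 0.5, 1.0 / math.sqrt(2.0), 2.0, 20.0):
    f3 = minus_3db_frequency(fc, q)
    print(f"Q = {q:6.3f}: -3 dB point = {f3:7.1f} Hz "
          f"({f3 / fc:.3f} x Fc, gain there = {gain_db(f3, fc, q):.2f} dB)")
```

Dividing the result by Fc gives the same information as a multiple of the desired cutoff frequency.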

Or, if we plot the same information as a function (or just a multiple) of the desired Fc, you get the plot below.

So, if you’re sitting in a meeting, and the person in front of you is looking at a measurement of a loudspeaker magnitude response, and they say “could you please put in a low pass filter with a cutoff frequency of 1 kHz and a Q of 0.5” you should start asking questions about what, exactly, they mean by “cutoff frequency”… If not, you might just wind up with nice-looking numbers but strange-sounding loudspeakers.