“High-Res” Audio: Part 5 – Mirrors are bad

Part 1
Part 2
Part 3
Part 4

Let’s go back to something I said in the last post:

Mistake #1

I just jumped to at least three conclusions (probably more) that are going to haunt me.

The first was that my “digital audio system” was something like the following:

Figure 1

As you can see there, I took an analogue audio signal, converted it to digital, and then converted it back to analogue. Maybe I transmitted it or stored it in the part that says “digital audio”.

However, the important, and very probably incorrect assumption here is that I did nothing to the signal. No volume control, no bass and treble adjustments… nothing.


If you consider that signal flow from the position of an end-consumer playing a digital recording, this was pretty easy to accomplish in the “old days” when we were all playing CDs. That’s because (in a theoretical, oversimplified world…)

  • the output of the mixing/mastering console was analogue
  • that analogue signal was converted to digital in the mastering studio
  • the resulting bits were put on a disc
  • you put that disc in your player which contained a DAC that converted the signal directly to analogue
  • you then sent the signal to your “processing” (a.k.a. “volume control”, and maybe some bass and treble adjustment).

So, that flowchart in Figure 1 was quite often true in 1985.

These days, things are probably VERY different… the signal path now looks something more like this (note that I’ve highlighted “alterations” or changes to the bits in the audio signal in red):

  • The signal was converted from analogue to digital in the studio
    (yes, I know… studios often work with digital mixers these days, but at least some of the signals within the mix were analogue to start – unless you are listening to music made exclusively with digital synthesizers)
  • The resulting bits were saved on a file
  • Depending on the record label, the audio signal was modified to include a “watermark” that can identify it later – in court, when you’ve been accused of theft.
  • The file was transferred to a storage device (let’s say “hard drive”) in a large server farm renting out space to your streaming service
  • The streaming service encodes the file
    • If the streaming service does not offer a lossless option, then the file is converted to a lossy format like MP3, Ogg Vorbis, AAC, or something else.
    • If the streaming service offers a lossless option, then the file is compressed using a format like FLAC or ALAC (This is not an alteration, since, with a lossless compression system, you don’t lose anything)
  • You download the file to your computer
    (it might look like an audio player – but that means it’s just a computer that you can’t use to check your social media profile)
  • You press play, and the signal is decoded (either from the lossy CODEC or the compression format) back to LPCM. (Still not an alteration. If it’s a lossy CODEC, then the alteration has already happened.)
  • That LPCM signal might be sample-rate converted
  • The streaming service’s player might do some processing like dynamic range compression or gain changes if you’ve asked it to make all the songs have the same level.
  • All of the user-controlled “processing” like volume controls, bass, and treble, are done to the digital signal.
  • The signal is sent to the loudspeaker or headphones
    • If you’re sending the signal wirelessly to a loudspeaker or headphones, then the signal is probably re-encoded as a lossy CODEC like AAC, aptX, or SBC.
      (Yes, there are exceptions with wireless loudspeakers, but they are exceptions.)
    • If you’re sending the signal as a digital signal over a wire (like S/PDIF or USB), then you get a bit-for-bit copy at the input of the loudspeaker or headphones.
  • The loudspeakers or headphones might sample-rate convert the signal
  • The sound is (finally) converted to analogue – either one stream per channel (e.g. “left”) or one stream per loudspeaker driver (e.g. “tweeter”) depending on the product.

So, as you can see in that rather long and complicated list (it looks complicated, but I’ve actually simplified it a little, believe it or not), there’s not much relation to the system you had in 1985.

Let’s take just one of those blocks and see what happens if things go horribly wrong. I’ll take the “volume control” block and add some distortion to see the result with two LPCM systems that have two different sampling rates, one running at 48 kHz and the other at 192 kHz – four times the rate. Both systems are running at 24 bits, with TPDF dither (I won’t explain what that means here). I’ll start by making a 10 kHz tone, and sending it through the system without any intentional distortion. If we look at those two signals in the time domain, they’ll look like this:

Figure 1: Two 10 kHz tones. The black one is in a 48 kHz, 24 bit LPCM system. The red one is in a 192 kHz, 24 bit LPCM system.

The sine tone in the 48 kHz system may look less like a sine tone than the one in the 192 kHz system, however, in this case, appearances are deceiving. The reconstruction filter in the DAC will filter out all the high frequencies that are necessary to reproduce those corners that you see here, so the resulting output will be a sine wave. Trust me.

If we look at the magnitude responses of these two signals, they look like Figure 2, below.

Figure 2: The magnitude responses of the two signals shown in Figure 1.

You may be wondering about the “skirts” on either side of the 10 kHz spikes. These are not really in the signal, they’re a side-effect (ha ha) of the windowing process used in the DFT (aka FFT). I will not explain this here – but I did a long series of articles on windowing effects with DFTs, so you can search for it if you’re interested in learning more about this.

If you’re attentive, you’ll notice that both plots extend up to 96 kHz. That’s because the 192 kHz system on the bottom has a Nyquist frequency of 96 kHz, and I want both plots to be on the same scale for reasons that will be obvious soon.

Now I want to make some distortion. In order to make things obvious, I’m going to make a LOT of distortion. I’ve made the sine wave try to have an amplitude that is 10 times higher than my two systems will allow. In other words, my amplitude should be +/- 10, but the signal clips at +/- 1, resulting in something looking very much like a square wave, as shown in Figure 3.

Figure 3: Distorted 10 kHz sine waves. The black one is in a 48 kHz, 24 bit LPCM system. The red one is in a 192 kHz, 24 bit LPCM system.
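If you’d like to reproduce something like this experiment yourself, here’s a minimal Python sketch of the idea. Note that, to keep it short, it skips the 24-bit quantisation and TPDF dither mentioned above (they don’t change the aliasing behaviour we’re looking at here), and the window and signal length are arbitrary choices rather than the ones I used for these plots.

```python
import numpy as np

def clipped_tone_spectrum(fs, f0=10_000, try_amplitude=10.0, duration=0.1):
    """Make a sine tone that tries to be 10x bigger than full scale,
    clip it at +/- 1, and return its magnitude spectrum in dB re: the peak bin."""
    t = np.arange(int(fs * duration)) / fs
    x = try_amplitude * np.sin(2 * np.pi * f0 * t)
    x = np.clip(x, -1.0, 1.0)                       # the intentional distortion
    spectrum = np.fft.rfft(x * np.hanning(len(x)))  # windowed DFT
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    mag_db = 20 * np.log10(np.abs(spectrum) / np.max(np.abs(spectrum)) + 1e-12)
    return freqs, mag_db

f48, m48 = clipped_tone_spectrum(48_000)     # Nyquist frequency at 24 kHz
f192, m192 = clipped_tone_spectrum(192_000)  # Nyquist frequency at 96 kHz
```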

You may already know that if you want to make a square wave by building it up from its constituent harmonics, you start with the fundamental (which we’ll call Fc; in our case, Fc = 10 kHz) at an amplitude that we’ll call “A”, and then you add the

  • 3rd harmonic (3 times Fc, so 30 kHz in our case) with an amplitude of A/3.
  • 5th harmonic (5 Fc = 50 kHz) with an amplitude of A/5
  • 7 Fc at A/7
  • and so on up to infinity (the whole series is written out as an equation below)
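Written out compactly, that series is (ignoring the overall scaling constant of 4/π that makes an ideal square wave peak at ±1):

x(t) = A \left( \sin(2 \pi F_c t) + \frac{\sin(2 \pi (3 F_c) t)}{3} + \frac{\sin(2 \pi (5 F_c) t)}{5} + \frac{\sin(2 \pi (7 F_c) t)}{7} + \dots \right)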

Let’s look at the magnitude responses of the two signals above to see if that’s true.

Figure 4: The magnitude responses of the two signals shown in Figure 3.

If we look at the bottom plot first (running at 192 kHz and with a Nyquist limit of 96 kHz) the 10 kHz tone is still there. We can also see the harmonics at 30 kHz, 50 kHz, 70 kHz, and 90 kHz amongst the mess of other spikes (we’ll get to those soon…).

Figure 5. Some labels applied to Figure 4 for clarity, showing the harmonics of the square waves that are captured by the two systems

Looking at the top plot (running at 48 kHz and with a Nyquist limit of 24 kHz), we see the 10 kHz tone, but the 30 kHz harmonic is not there – because it can’t be. Signals can’t exist in our system above the Nyquist limit. So, what happens? Think back to the images of the rotating wheel in Part 3. When the wheel was turning more than 1/2 a turn per frame of the movie, it appeared to be going backwards at a different speed that can be calculated by subtracting the actual rotation from 360º (a full turn).

The same is true when, inside a digital audio signal flow, we try to make a signal that’s higher than Nyquist. The energy exists in there – it just “folds” to another frequency – its “alias”.

We can look at this generally using Figure 6.

Figure 6: A general plot of aliasing, showing the intended frequency in black and the actual output frequency in red.

Looking at Figure 6: If we make a sine tone that sweeps upward from 0 Hz to the Nyquist frequency at Fs/2 (half the sampling rate or sampling frequency) then the output is the same as the input. However, when the intended frequency goes above Fs/2, the actual frequency that comes out is the sampling rate minus the intended frequency – in other words, the output comes back down below Fs/2 by the same amount that the input goes above it. This creates a “mirror” effect.

If the intended frequency keeps going up above Fs, then the mirroring happens again, and again, and again… This is illustrated in Figure 7.

Figure 7: An extension of Figure 6 to a higher intended frequency.

This plot is shown with linear scales for both the X- and Y-axes to make it easy to understand. If the axes in Figure 7 were scaled to a logarithmic scaling instead (which is how “Frequency Response” are normally shown, since this corresponds to how we hear frequency differences), then it would look like Figure 8.

Figure 8: The same information shown in Figure 7, plotted on a logarithmic scale instead. Note that this example is for a system running at 48 kHz (therefore with a Nyquist frequency of 24 kHz), and an intended input frequency (in black) going up to 3 times 48 kHz = 144 kHz.

Coming back to our missing 30 kHz harmonic in the 48 kHz LPCM system: Since 30 kHz is above the Nyquist limit of 24 kHz in that system, it mirrors down to 24 kHz – (30 kHz – 24 kHz) = 18 kHz. The 50 kHz harmonic shows up as an alias at 2 kHz. (follow the red line in Figure 7: A harmonic on the black line at 48 kHz would actually be at 0 Hz on the red line. Then, going 2000 Hz up to 50 kHz would bring the red line up to 2 kHz.)

Similarly, the 110 kHz harmonic in the 192 kHz system will produce an alias at 96 kHz – (110 kHz – 96 kHz) = 82 kHz.
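If you prefer to see that folding written as a little algorithm instead of reading it off the plot, here’s a minimal sketch (my own illustration, not anything from a measurement system):

```python
def alias(f_intended, fs):
    """Return the frequency that actually comes out of a system sampled at fs
    when you try to put in f_intended (repeated mirroring around Nyquist)."""
    f = f_intended % fs            # the pattern repeats every fs
    return fs - f if f > fs / 2 else f

alias(30_000, 48_000)    # 18000 -> the 18 kHz alias described above
alias(50_000, 48_000)    # 2000  -> the 2 kHz alias
alias(110_000, 192_000)  # 82000 -> the 82 kHz alias
```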

If I then label the first set of aliases in the two systems, we get Figure 9.

Figure 9: The first set of aliased frequency content in the two systems.

Now we have to stop for a while and think about what’s happened.

We had a digital signal that was originally “valid” – meaning that it did not contain any information above the Nyquist frequency, so nothing was aliasing. We then did something to the signal that distorted it inside the digital audio path. This produced harmonics in both cases, however, some of the harmonics that were produced are harmonically related to the original signal (just as they ought to be) and others are not (because they’re aliases of frequency content that cannot be reproduced by the system).

What we have to remember is that, once this happens, that frequency content is all there, in the signal, below the Nyquist frequency. This means that, when we finally send the signal out of the DAC, the low-pass filtering performed by the reconstruction filter will not take care of this. It’s all part of the signal.

So, the question is: which of these two systems will “sound better” (whatever that means)? (I know, I know, I’m asking “which of these two distortions would you prefer?” which is a bit of a weird question…)

This can be answered in two ways that are inter-related.

The first is to ask “how much of the artefact that we’ve generated is harmonically related to the signal (the original sine tone)?” As we can see in Figure 5, the higher the sampling rate, the more artefacts (harmonics) will be preserved at their original intended frequencies. There’s no question that harmonics that are harmonically related to the fundamental will sound “better” than tones that appear to have no frequency relationship to the fundamental. (If I were using a siren instead of a constant sine tone, then aliased harmonics are equally likely to be going down or up when the fundamental frequency goes up… This sounds weird.)

The second is to look at the levels of the enharmonic artefacts (the ones that are not harmonically related to the fundamental). For example, both the 48 kHz and the 192 kHz system have an aliased artefact at 2 kHz, however, its level in the 48 kHz system is 15 dB below the fundamental whereas, in the 192 kHz system, it’s more than 26 dB below. This is because the 2 kHz artefact in the 48 kHz system is an alias of the 50 kHz harmonic, whereas, in the 192 kHz system, it’s an alias of the 190 kHz harmonic, which is much lower in level.
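To put some numbers on that: in an ideal square wave, the 50 kHz component (the 5th harmonic) has an amplitude of A/5, and the 190 kHz component (the 19th harmonic) has an amplitude of A/19. Relative to the fundamental, 20 * Log10(1/5) ≈ -14 dB and 20 * Log10(1/19) ≈ -25.6 dB, which lines up (within a dB or so) with the levels of the 2 kHz aliases seen in the two plots.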

As I said, these two points are inter-related (you might even consider them to be the same point) however, they can be generalised as follows:

The higher the sampling rate, the more the artefacts caused by distortion generated within the system are harmonically related to the signal.

In other words, it gives a manufacturer more “space” to screw things up before they sound bad. The title of this posting is “Mirrors are bad” but maybe it should be “Mirrors are better when they’re further away” instead.

Of course, the distortion that’s actually generated by processing inside a digital audio system (hopefully) won’t be anything like the clipping that I did to the signal. On the other hand, I’ve measured some systems that exhibit exactly this kind of behaviour. I talked about this in another series about Typical Problems in Digital Audio: Aliasing where I showed this measurement of a real device:

Figure 10: A measurement of a real device showing some kind of distortion and aliased artefacts of a swept sine tone. Half of the aliasing is immediately recognizable as going downwards when the tone is going upwards.

However, I’m not here to talk about what you can or can’t hear – that is dependent on too many variables to make it worth even starting to talk about. The point of this series is not to prove that something is better or worse than something else. It’s only to show the advantages and disadvantages of the options so that you can make an informed choice that best suits your requirements.

On to Part 6

Turn it down half-way…

#81 in a series of articles about the technology behind Bang & Olufsen loudspeakers

Bertrand Russell once said, “In all affairs it’s a healthy thing now and then to hang a question mark on the things you have long taken for granted.”

This article is a discussion, both philosophical and technical about what a volume control is, and what can be expected of it. This seems like a rather banal topic, but I find it surprising how often I’m required to explain it.

Why am I writing this?

I often get questions from colleagues and customers that sound something like the following:

  • Why does my Beovision television’s volume control only go to 90%? Why can’t I go to 100%?
  • I set the volume on my TV to 40, so why is it so quiet (or loud)?

The first question comes from people who think that the number on the screen is in percent – but it’s not. The speedometer in your car displays your speed in kilometres per hour (km/h), the tachometer is in revolutions of the engine per minute (RPM), the temperature on your thermostat is in degrees Celsius (ºC), and the display on your Beovision television is on a scale based on decibels (dB). None of these things are in percent (imagine if the speed limit on the highway was 80% of your car’s maximum speed… we’d have more accidents…)

The short reason we use decibels instead of percent is that it lets us use subtraction instead of division – which is harder to do. The shortcut rule-of-thumb to remember is that, every time you drop by 6 dB on the volume control, you drop by 50% of the output. So, for example, going from Volume step 90 to Volume step 84 is going from 100% to 50%. If I keep going down, then the table of equivalents looks like this:

I’ve used two colours there to illustrate two things:

  • Every time you drop by 6 volume steps, you cut the percentage in half. For example, Volume Step 60 is five drops of 6 steps below Volume Step 90, which is 1/2 of 1/2 of 1/2 of 1/2 of 1/2 of 100%, or about 3.2% (notice the five halves there…)
  • Every time you drop by 20, you cut the percentage to 1/10. So, Volume Step 50 is 1% of Volume Step 90 because it’s two drops of 20 on the volume control. (Both of these rules are checked in the short sketch below.)
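Here’s a small sketch of that conversion in code. It assumes that each volume step corresponds to 1 dB, with step 90 as the 100% reference – which, as I’ll get to below, is how these devices behave:

```python
def step_to_percent(step, reference_step=90):
    """Output as a percentage of the reference step, assuming 1 dB per volume step."""
    db_re_reference = step - reference_step     # e.g. step 84 -> -6 dB
    return 100 * 10 ** (db_re_reference / 20)

step_to_percent(84)   # ~50 %   (one drop of 6 dB)
step_to_percent(60)   # ~3.2 %  (five drops of 6 dB)
step_to_percent(50)   # 1 %     (two drops of 20 dB)
```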

If I graph this, showing the percentage equivalent of all 91 volume steps (from 0 to 90) then it looks like this:

Of course, the problem with this plot is that everything from about Volume Step 40 and lower looks like 0% because the plot doesn’t have enough detail. But I can fix that by changing the way the vertical axis is displayed, as shown below.

That plot shows exactly the same information. The only difference is that the vertical scale is no longer linearly counting from 0% to 100% in equal steps.

Why do we (and every other audio company) do it this way? The simple reason is that we want to make a volume slider (or knob) where an equal distance (or rotation) corresponds to an equal change in output level. We humans don’t perceive things like change in level in percent – so it doesn’t make sense to use a percent scale.

For the longer explanation, read on…

Basic concepts

We need to start at the very beginning, so here goes:

Volume control and gain

  1. An audio signal is (at least in a digital audio world) just a long list of numbers for each audio channel.
  2. The level of the audio signal can be changed by multiplying it by a number (called the gain).
    1. If you multiply by a value larger than 1, the audio signal gets louder.
    2. If you multiply by a number between 0 and 1, the audio signal gets quieter.
    3. If you multiply by zero, you mute the audio signal.
  3. Therefore, at its simplest core, a volume control implemented in a digital audio system is a multiplication by a gain. You turn up the volume, the gain value increases, and the audio is multiplied by a bigger number producing a bigger result. (This is sketched in code below.)
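As a sketch in code (with a very short made-up “signal” standing in for one channel of audio), that entire description collapses to a single multiplication:

```python
import numpy as np

def apply_volume(samples, gain):
    """A digital volume control at its simplest: multiply every sample by the gain."""
    return samples * gain

audio = np.array([0.2, -0.5, 0.9, -0.1])   # a (very) short audio signal
apply_volume(audio, 0.5)   # quieter
apply_volume(audio, 2.0)   # louder
apply_volume(audio, 0.0)   # muted
```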

That’s the first thing. Now we move on to how we perceive things…

Perception of Level

Speaking very generally, our senses (that we use to perceive the world around us) scale logarithmically instead of linearly. What does this mean? Let’s take an example:

Let’s say that you have $100 in your bank account. If I then told you that you’d won $100, you’d probably be pretty excited about it.

However, if you have $1,000,000 in your bank account, and I told you that you’re won $100, you probably wouldn’t even bother to collect your prize.

This can be seen as strange; the second $100 prize is not less money than the first $100 prize. However, it’s perceived to be very different.

If, instead of being $100, the cash prize were “equal to whatever you have in your bank account” – so the person with $100 gets $100 and the person with $1,000,000 gets $1,000,000, then they would both be equally excited.

The way we perceive audio signals is similar. Let’s say that you are listening to a song by Metallica at some level, and I ask you to turn it down, and you do. Then I ask you to turn it down by the same amount again, and you do. Then I ask you to turn it down by the same amount again, and you do… If I were to measure what just happened to the gain value, what would I find?

Well, let’s say that, the first time, you dropped the gain to 70% of the original level, so (for example) you went from multiplying the audio signal by 1 to multiplying the audio signal by 0.7 (a reduction of 0.3, if we were subtracting, which we’re not). The second time, you would drop by the same amount – which is 70% of that – so from 0.7 to 0.49 (notice that you did not subtract 0.3 to get to 0.4). The third time, you would drop from 0.49 to 0.343. (not subtracting 0.3 from 0.4 to get to 0.1).

In other words, each time you change the volume level by the “same amount”, you’re doing a multiplication in your head (although you don’t know it) – in this example, by 0.7. The important thing to note here is that you are NOT subtracting 0.3 from the gain in each of the above steps – you’re multiplying by 0.7 each time.

What happens if I were to express the above as percentages? Then our volume steps (and some additional ones) would look like this:

100%
70%
49%
34%
24%
17%
12%
8%

Notice that there is a different “distance” between each of those steps if we’re looking at it linearly (if we’re just subtracting adjacent values to find the difference between them). However, each of those steps is a reduction to 70% of the previous value.

This is a case where the numbers (as I’ve expressed them there) don’t match our experience. We hear each reduction in level as the same as the other steps, but they don’t look like they’re the same step size when we write them all down the way I’ve done above. (In other words, the numerical “distance” between 100 and 70 is not the same as the numerical “distance” between 49 and 34, but these steps would sound like the same difference in audio level.)

SIDEBAR: This is very similar / identical to the way we hear and express frequency changes. For example, the figure below shows a musical staff. The red brackets on the left show 3 spacings of one octave each; the distance between each of the marked frequencies sounds the same to us. However, as you can see by the frequency indications, each of those octaves has a very different “width” in terms of frequency. Seen another way, the distance in Hertz in the octave from 440 Hz to 880 Hz is equal to the distance from 440 Hz all the way down to 0 Hz (both have a width of 440 Hz). However, to us, these sound like very different intervals.

SIDEBAR to the SIDEBAR: This also means that the distance in Hertz covered by the top octave on a piano is larger than the distance covered by all of the other keys.

SIDEBAR to the SIDEBAR to the SIDEBAR: This also means that changing your sampling rate from 48 kHz to 96 kHz doubles your bandwidth, but only gives you an extra octave. However, this is not an argument against high-resolution audio, since the frequency range of the output is only a small part of the list of pros and cons.

This is why people who deal with audio don’t use percent – ever. Instead, we use an extra bit of math that uses an evil concept called a logarithm to make things make more sense.

What is a logarithm?

If I say the following, you should not raise your eyebrows:

2*3 = 6, therefore 6/2 = 3 and 6/3 = 2

In other words, division is just multiplication done backwards. This can be generalised to the following:

if a*b=c, then c/a=b and c/b=a

Logarithms are similar; they’re just exponents done backwards. For example:

10^2 = 100, therefore Log10(100) = 2

and generally:

A^B = C, therefore LogA(C) = B

Why use a logarithm?

The nice thing about logarithms is that they are a convenient way for a mathematician to do addition instead of multiplication.

For example, if I have the following sequence of numbers:

2, 4, 8, 16, 32, 64, and so on…

It’s easy to see that I’m just multiplying by 2 to get the next number.

What if I were to express the number sequence above as a series of exponents? Then it would look like this:

2^1, 2^2, 2^3, 2^4, 2^5, 2^6

Not useful yet…

What if I asked you to multiply two numbers in that sequence? Say, for example, 1024 * 8192. This would take some work (or at least some scrambling, looking for the calculator app on your phone…). However, it helps to know that this is the same as asking you to multiply 2^10 * 2^13 – to which the answer is 2^23. Notice that 23 is merely 10+13. So, I’ve used exponents to convert the problem from multiplication (1024*8192) to addition (2^10 * 2^13 = 2^(10+13)).

How did I find out that 8192 = 2^13? By using a logarithm: Log2(8192) = 13.
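If you’d rather let a computer do the checking, the whole example looks like this:

```python
import math

1024 * 8192 == 2 ** (10 + 13)    # True: 2^10 * 2^13 = 2^23 = 8388608
math.log2(8192)                  # 13.0
math.log10(8192)                 # ~3.9134 - the base-10 version mentioned below
```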

In the old days, you would have been given a book of logarithmic tables in school, which was a way of looking up the logarithm of 8192. (Actually, those books were in base 10 and not base 2, so you would have found out that Log10(8192) = 3.9134, which would have made this discussion more confusing…) Nowadays, you can use an antique device called a “calculator” – a simulacrum of which is probably on a device you call a “phone” but is rarely used as such.

I will leave it to the reader to figure out why this quality of logarithms (that they convert multiplication into addition) is why slide rules work.

So what?

Let’s go back to the problem: We want to make a volume slider (or knob) where an equal distance (or rotation) corresponds to an equal change in level. Let’s do a simple one that has 10 steps. Coming down from “maximum” (which we’ll say is a gain of 1 or 100%), it could look like these:

The gain values for four different versions of a 10-step volume control.

The plot above shows four different options for our volume controller. Starting at the maximum (volume step 10) and working downwards to the left, each one drops by the same perceived amount per step. The black plot shows a drop to 90% of the previous value per step, the red plot a drop to 70% per step (which matches the list of values I put above), blue a drop to 50% per step, and green a drop to 30% per step.

As you can see, these lines are curved. As you can also see, as you get lower and lower, they get to the point where it gets harder to see the value (for example, the green curve looks like it has the same gain value for Volume steps 1 through 4).

However, we can view this a different way. If we change the scale of our graph’s Y-Axis to a logarithmic one instead of a linear one, the exact same information will look like this:

The same data plotted using a different scale for the Y-Axis.

Notice now that the Y-axis has an equal distance upwards every time the gain multiplies by 10 (the same way the music staff had the same distance every time we multiplied the frequency by 2). By doing this, we now see our gain curves as straight lines instead of curved ones. This makes it easy to read the values both when they’re really small and when they’re (comparatively) big (those first 4 steps on the green curve don’t look the same on that plot).

So, one way to view the values for our Volume controller is to calculate the gains, and then plot them on a logarithmic graph. The other way is to build the logarithm into the gain itself, which is what we do. Instead of reading out gain values in percent, we use Bels (named after Alexander Graham Bell). However, since a Bel is a big step, we use tenths of a Bel or “decibels” instead. (… In the same way that I tell people that my house is 4,000 km, and not 4,000,000 m from my Mom’s house because a metre is too small a division for a big distance. I also buy 0.5 mm pencil leads – not 0.0005 m pencil leads. There are many times when the basic unit of measurement is not the right scale for the thing you’re talking about.)

In order to convert our gain value (say, of 0.7) to decibels, we do the following equation:

20 * Log10(gain) = Gain in dB

So, we would say

20 * Log10(0.7) ≈ -3.1 dB

I won’t explain why we say 20 * the logarithm, since this is (only a little) complicated.
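If you want to play with this conversion yourself, here’s a minimal sketch of it in both directions (the function names are just my own):

```python
import math

def gain_to_db(gain):
    """Convert a linear gain (e.g. 0.7) to decibels."""
    return 20 * math.log10(gain)

def db_to_gain(db):
    """Convert decibels back to a linear gain."""
    return 10 ** (db / 20)

gain_to_db(0.7)    # ~ -3.1 dB
gain_to_db(0.5)    # ~ -6.0 dB (the "6 dB is half" rule of thumb)
db_to_gain(-20.0)  # 0.1 (one tenth)
```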

I will explain why it’s small-d and capital-B when you write “dB”. The small-d is the convention for “deci-“, so 1 decimetre is 1 dm. The capital-B is there because the Bel is named after Alexander Graham Bell. This is similar to the reason we capitalize Hz, V, A, and so on…

So, if you know the linear gain value, you can calculate the equivalent in decibels. If I do this for all of the values in the plots above, it will look like this:

Notice that, on first glance, this looks exactly like the plot in the previous figure (with the logarithmic Y-Axis), however, the Y-Axis on this plot is linear (counting from -100 to 0 in equal distances per step) because the logarithmic scaling is already “built into” the values that we’re plotting.

For example, if we re-do the list of gains above (with a little rounding), it becomes

100% = 0 dB
70% = -3 dB
49% = -6 dB
34% = -9 dB
24% = -12 dB
17% = -15 dB
12% = -18 dB
8% = -21 dB

Notice coming down that list that each time we multiplied the linear gain by 0.7, we just subtracted 3 from the decibel value, because, as we see in the equation above, these mean the same thing.

This means that we can make a volume control – whether it’s a slider or a rotating knob – where the amount that you move or turn it corresponds to the change in level. In other words, if you move the slider by 1 cm or rotate the knob by 10º – NO MATTER WHERE YOU ARE WITHIN THE RANGE – the change in level will be the same as if you made the same movement somewhere else.

This is why Bang & Olufsen devices made since about 1990 (give or take) have a volume control in decibels. In older models, there were 79 steps (0 to 78) or 73 steps (0 to 72), which was expanded to 91 steps (0 to 90) around the year 2000, and expanded again recently to 101 steps (0 to 100). Each step on the volume control corresponds to a 1 dB change in the gain. So, if you change the volume from step 30 to step 40, the change in level will appear to be the same as changing from step 50 to step 60.

Volume Step ≠ Output Level

Up to now, all I’ve said can be condensed into two bullet points:

  • Volume control is a change in the gain value that is multiplied by the incoming signal
  • We express that gain value in decibels to better match the way we hear changes in level

Notice that I didn’t say anything in those two statements about how loud things actually are… This is because the volume setting has almost nothing to do with the level of the output, which, admittedly, is a very strange thing to say…

For example, get a DVD or Blu-ray player, connect it to a television, set the volume of the TV to something and don’t touch it for the rest of this experiment. Now, put in a DVD copy of any movie that has ONLY dialogue, and listen to how loud it sounds. Then, take out the DVD and put in a CD of Metallica’s Death Magnetic, press play. This will be much, much louder. In fact, if you own a B&O TV, the difference in level between those two things is the same as turning up the volume by 31 steps, which corresponds to 31 dB. Why?

When re-recording engineers mix a movie, they aim to make the dialogue sit around a level of 31 dB below maximum (better known as -31 dB FS or “31 decibels below Full Scale”). This gives them enough “headroom” for explosions and gunshots to get much louder and be exciting.

When a mixing engineer and a mastering engineer work on a pop or rock album, it’s not uncommon for them to make it as loud as possible, aiming for maximum (better known as 0 dB FS).

This means that a movie’s dialogue is much quieter than Metallica or Billie Eilish or whomever is popular when you’re reading this, because Metallica is as loud as the explosions in the movie.

The volume setting is just a value that changes that input level… So, if I listen to music at volume step 42 on a Beovision television, and you watch a movie at volume step 73 on the same Beovision television, it’s possible that we’re both hearing the same sound pressure level in our living rooms, because the music is 31 dB louder than the movie, which is the same amount that I’ve turned down my TV relative to yours (73-42 = 31).

In other words, the Volume Setting is not a predictor of how loud it is. A Volume Setting is a little like the accelerator pedal in your car. You can use the pedal to go faster or slower, but there’s no way of knowing how fast you’re driving if you only know how hard you’re pushing on the pedal.

What about other brands and devices?

This is where things get really random:

  • take any device (or computer or audio software)
  • play a sine wave (because that’s easy to measure)
  • measure the change in output level as you change the volume setting
  • graph the result
  • Repeat everything above for different devices

You’ll see something like this:

The gain vs. Volume step behaviours of 8 different devices / software players

There are two important things to note in the above plot.

  1. These are the measurements of 8 different devices (or software players or “apps”) and you get 8 different results (some of them overlap, but that’s because they’re just different version numbers of the same apps).
    • Notice as well that there’s a big difference here. At a volume setting of “50%” there’s a 20 dB difference between the blue dashed line and the black one with the asterisk markings. 20 dB is a LOT.
  2. None of them look like the straight lines seen in the previous plot, despite the fact that the Y-axis is in decibels. In ALL of these cases, the biggest jumps in level happen at the beginning of the volume control (some are worse than others). This is not only because they’re coming up from a MUTE state – but because they’re designed that way to fool you. How?

Think about using any of these controllers: you turn it 25% of the way up, and it’s already THIS loud! Cool! This speaker has LOTS of power! I’m only at 25%! I’ll definitely buy it! But the truth is, when the slider / knob is at 25% of the way up, you’re already pushing close to the maximum it can deliver.

These are all the equivalent of a car that has high acceleration when starting from 0 km/h, but if you’re doing 100 km/h on the highway, and you push on the accelerator, nothing happens.

First impressions are important…

On the other hand (in support of the engineers who designed these curves), all of these devices are “one-offs” (meaning that they’re all stand-alone devices) made by companies who make (or expect to be connected to) small loudspeakers. This is part of the reason why the curves look the way they do.

If B&O used that style of gain curve for a Beovision television connected to a pair of Beolab 90s, you’d either

  • be listening at very loud levels, even at low volume settings;
  • or you wouldn’t be able to turn it up enough for music with high dynamic range.

Some quick conclusions

Hopefully, if you’ve read this far and you’re still awake:

  • you will never again use “percent” to describe your volume level
  • you will never again expect that the output level can be predicted by the volume setting
  • you will never expect two devices of two different brands to output the same level when set to the same volume setting
  • you understand why B&O devices have so many volume steps over such a large range.

Turntables and Vinyl: Part 9

Back to Part 8

Magnitude response

The magnitude response* of any audio device is a measure of how much its output level deviates from the expected level at different frequencies. In a turntable, this can be measured in different ways.

Usually, the magnitude response is measured from a standard test disc with a sine wave sweep ranging from at least 20 Hz to at least 20 kHz. The output level of this signal is recorded at the output of the device, and the level is analysed to determine how much it differs from the expected output. Consequently, the measurement includes all components in the audio path from the stylus tip, through the RIAA preamplifier (if one is built into the turntable), to the line-level outputs.

Because all of these components are in the signal path, there is no way of knowing immediately whether deviations from the expected response are caused by the stylus, the preamplifier, or something else in the chain.

It’s also worth noting that a typical standard test disc (JVC TRS-1007 is a good example) will not have a constant output level, which you might expect if you’re used to measuring other audio devices. Usually, the swept sine signal has a constant amplitude in the low frequency bands (typically, below 1 kHz) and a constant modulation velocity in the high frequencies. This is to avoid over-modulation in the low end, and burning out the cutter head during mastering in the high end.

* This is the correct term for what is typically called the “frequency response”. The difference is that a magnitude response only shows output level vs. frequency, whereas the frequency response would include both level and phase information.

Rumble

In theory, an audio playback device only outputs the audio signal that is on the recording without any extra contributions. In practice, however, every audio device adds signals to the output for various reasons. As was discussed above, in the specific case of a turntable, the audio signal is initially generated by very small movements of the stylus in the record groove. Therefore, in order for it to work at all, the system must be sensitive to very small movements in general. This means that any additional movement can (and probably will) be converted to an audio signal that is added to the recording.

This unwanted extraneous movement, and therefore signal, is usually the result of very low-frequency vibrations that come from various sources. These can include things like mechanical vibrations of the entire turntable transmitted through the table from the floor, vibrations in the system caused by the motor or imbalances in the moving parts, warped discs which cause a vertical movement of the stylus, and so on. These low-frequency signals are grouped together under the heading of rumble.

A rumble measurement is performed by playing a disc that has no signal on it, and measuring the output signal’s level. However, that output signal is first filtered to ensure that the level detection is not influenced by higher-frequency problems that may exist.

The characteristics of the filters are defined in internal standards such as DIN 45 539 (or IEC98-1964), shown below. Note that I’ve only plotted the target response. The specifications allow for some deviation of ±1 dB (except at 315 Hz). Notice that the low-pass filter is the same for both the Weighted and the Unweighted filters. Only the high-pass filter specifications are different for the two cases.

The magnitude responses for the “Unweighted” (black) and “Weighted” filters for rumble measurements, specified in DIN 45 539

If the standard being used for the rumble measurement is the DIN 45 539 specification, then the resulting value is stated as the level difference between the measured filtered noise and the standard output level, equivalent to the output when playing a 1 kHz tone with a lateral modulation velocity of 70.7 mm/sec. This detail is also worth noting, since it shows that the rumble value is a relative and not an absolute output level.

Rotational speed

Every recording / playback system, whether for audio or for video signals, is based on the fundamental principle that the recording and the playback happen at the same rate. For example, a film that was recorded at 24 frames (or photos) per second (FPS) must also be played at 24 FPS to avoid objects and persons moving too slowly or too quickly. It’s also necessary that neither the recording nor the playback speed changes over time.

A phonographic LP is mastered with the intention that it will be played back at a rotational speed of 33 1/3 RPM (Revolutions Per Minute) or 45 RPM, depending on the disc. (These correspond to 1 revolution either every 1.8 seconds or every 1 1/3 seconds respectively.) We assume that the rotational speed of the lathe that was used to cut the master was both very accurate and very stable. Although it is the job of the turntable to duplicate this accuracy and stability as closely as possible, measurable errors occur for a number of reasons, both mechanical and electrical. When these errors are measured using especially-created audio signals like pure sine tones, the results are filtered and analyzed to give an impression of how audible they are when listening to music. However, a problem arises in that a simple specification (such as a single number for “Wow and Flutter”, for example) can only be correctly interpreted with the knowledge of how the value is produced.

Accuracy

The first issue is the simple one of accuracy: is the turntable rotating the disc at the correct average speed? Most turntables have some kind of user control of this (both for the 33 and 45 RPM settings), since it will likely be necessary to adjust these occasionally over time, as the adjustment will drift with influences such as temperature and age.

Stability

Like any audio system, regardless of whether it’s analogue or digital, the playback speed of the turntable will vary over time. As it increases and decreases, the pitch of the music at the output will increase and decrease proportionally. This is unavoidable. Therefore, there are two questions that result:

  • How much does the speed change?
  • What is the rate and pattern of the change?

In a turntable, the amount of the change in the rotational speed is directly proportional to the frequency shift in the audio output. Therefore, for example, if the rotational speed decreases by 1% (for example, from 33 1/3 RPM to exactly 33 RPM), the audio output will drop in frequency by 1% (so a 440 Hz tone will be played as a 440 * 0.99 = 435.6 Hz tone). Whether this is audible is dependent on different factors including

  • the rate of change to the new speed
    (a 1% change 4 times a second is much easier to hear than a 1% change lasting 1 hour)
  • the listener’s abilities
    (for example, a person with “absolute pitch” may be able to recognise the change)
  • the audio signal
    (It is easier to detect a frequency shift of a single, long tone such as a note on a piano or pipe organ than it is of a short sound like a strike of claves or a sound with many enharmonic frequencies such as a snare drum.)

In an effort to simplify the specification of stability in analogue playback equipment such as turntables, four different classifications are used, each corresponding to a different rate of change. These are drift, wow, flutter, and scrape. The two most commonly quoted, wow and flutter, are typically grouped into a single value that represents them both.

Drift

Frequency drift is the tendency of a playback device’s speed to change over time very slowly. Any variation that happens slower than once every 2 seconds (in other words, with a modulation frequency of less than 0.5 Hz) is considered to be drift. This is typically caused by changes such as temperature (as the playback device heats up) or variations in the power supply (due to changes in the mains supply, which can vary with changing loads throughout the day).

Wow

Wow is a modulation in the speed ranging from once every 2 seconds to 6 times a second (0.5 Hz to 6 Hz). Note that, for a turntable, the rotational speed of the disc is within this range. (At 33 1/3 RPM: 1 revolution every 1.8 seconds is equal to approximately 0.556 Hz.)

Flutter

Flutter describes a modulation in the speed ranging from 6 to 100 times a second (6 Hz to 100 Hz).

Scrape

Scrape or scrape flutter describes changes in the speed that are higher than 100 Hz. This is typically only a problem with analogue tape decks (caused by the magnetic tape sticking and slipping on components in its path) and is not often used when classifying turntable performance.

Measurement and Weighting

The easiest accurate method to measure the stability of the turntable’s speed within the range of Wow and Flutter is to follow one of the standard methods, of which there are many, but they are all similar. Examples of these standards are AES6-2008, CCIR 409-3, DIN 45507, and IEC-386. A special measurement disc containing a sine tone, usually with a frequency of 3150 Hz, is played to a measurement device which then does a frequency analysis of the signal. In a perfect system, the result would be a 3150 Hz sine tone. In practice, however, the frequency of the tone varies over time, and it is this variation that is measured and analysed.

There is general agreement that we are particularly sensitive to a modulation in frequency of about 4 Hz (4 cycles per second) applied to many audio signals. As the modulation gets slower or faster, we are less sensitive to it, as was illustrated in the example above (a 1% change 4 times a second is much easier to hear than a 1% change lasting 1 hour).

So, for example, if the analysis of the 3150 Hz tone shows that it varies by ±1% at a frequency of 4 Hz, then this will have a bigger impact on the result than if it varies by ±1% at a frequency of 0.1 Hz or 40 Hz. The amount of impact the measurement at any given modulation frequency has on the total result is shown as a “weighting curve” in the figure below.

Weighting applied to the Wow and Flutter measurement in most standard methods. See the text for an explanation.

As can be seen in this curve, a modulation at 4 Hz has a much bigger weight (or impact) on the final result than a modulation at 0.315 Hz or at 140 Hz, where a 20 dB attenuation is applied to their contribution to the total result. Since attenuating a value by 20 dB is the same as dividing it by 10, a ±1% modulation of the 3150 Hz tone at 4 Hz will produce the same result as a ±10% modulation of the 3150 Hz tone at 140 Hz, for example.

This is just one example of why a single Wow and Flutter value should be interpreted very cautiously when comparing devices.

Expressing the result

When looking at a Wow and Flutter specification, one will see something like <0.1%, <0.05% (DIN), or <0.1% (AES6). Like any audio specification, if the details of the measurement type are not included, then the value is useless. For example, “W&F: <0.1%” means nothing, since there is no way to know which method was used to arrive at this value. (Similarly, a specification like “Frequency Range: 20 Hz to 20 kHz” means nothing, since there is no information about the levels used to define the range.)

If the standard is included in the specification (DIN or AES6, for example), then it is still difficult to compare wow and flutter values. This is because, even when performing identical measurements and applying the same weighting curve shown in the figure above, there are different methods for arriving at the final value. The value that you see may be a peak value (the maximum deviation from the average speed), the peak-to-peak value (the difference between the minimum and the maximum speeds), the RMS (a version of the average deviation from the average speed), or something else.

The AES6-2008 standard, which is the currently accepted method of measuring and expressing the wow and flutter specification, uses a “2-sigma” method, which is a way of looking at the peak deviation to give a kind of “worst-case” scenario. In this method, the 3150 Hz tone is played from a disc and captured for as long a time as is possible or feasible. Firstly, the average value of the actual frequency of the output is found (in theory, it’s fixed at 3150 Hz, but this is never true). Next, the short-term variation of the actual frequency over time is compared to the average, and weighted using the filter shown above. The result shows the instantaneous frequency variations over the length of the captured signal, relative to the average frequency (however, the effect of very slow and very fast changes have been reduced by the filter). Finally, the standard deviation of the variation from the average is calculated, and multiplied by 2 (“2-Sigma”, or “two times the standard deviation”), resulting in the value that is shown as the specification. The reason two standard deviations is chosen is that (in the typical case where the deviation has a Gaussian distribution) the actual Wow & Flutter value should exceed this value no more than 5% of the time.
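As a very rough illustration of that procedure (and only that – the standard specifies the measurement chain and the weighting filter in detail, and I’ve simplified both away here), the maths can be sketched like this, assuming you’ve already captured the playback of the 3150 Hz test tone as a mono signal x sampled at fs:

```python
import numpy as np
from scipy.signal import hilbert

def wow_flutter_2sigma_unweighted(x, fs):
    """Rough sketch of the '2-sigma' idea: estimate the instantaneous frequency
    of the captured 3150 Hz tone, find its deviation from the average in percent,
    and report two standard deviations of that deviation.
    NOTE: the 4 Hz weighting filter required by the standard is NOT applied here,
    so this is an illustration, not a compliant measurement."""
    phase = np.unwrap(np.angle(hilbert(x)))
    inst_freq = np.diff(phase) * fs / (2 * np.pi)   # instantaneous frequency in Hz
    average = np.mean(inst_freq)
    deviation_percent = 100 * (inst_freq - average) / average
    return 2 * np.std(deviation_percent)
```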

The reason this method is preferred today is that it uses a single number to express not only the wow and flutter, but the probability of the device reaching that value. For example, if a device is stated to have a “Wow and Flutter of <1% (AES6)”, then the actual deviation from the average speed will be less than 1% for 95% of the time you are listening to music. The principal reason this method was not used in the “old days” is that it requires statistical calculations applied to a signal that was captured from the output of the turntable, an option that was not available decades ago. The older DIN method that was used showed a long-term average level that was being measured in real-time using analogue equipment such as the device shown below.

Bang & Olufsen WM1, analogue wow and flutter meter.

Unfortunately, however, it is still impossible to know whether a specification that reads “Wow and Flutter: 1% (AES6)” means 1% deviation with a modulation frequency of 4 Hz or 10% deviation with a modulation frequency of 140 Hz – or something else. It is also impossible to compare this value to a measurement done with one of the older standards such as the DIN method, for example.

Turntables and Vinyl: Part 8

Back to Part 7

As was discussed in Part 3, when a record master is cut on a lathe, the cutter head follows a straight-line path as it moves from the outer rim to the inside of the disk. This means that it is always modulating in a direction that is perpendicular to the groove’s relative direction of travel, regardless of its distance from the centre.

The direction of travel of the cutting head when the master disk is created on a lathe.

A turntable should be designed to ensure that the stylus tracks the groove made by the cutter head in all aspects. This means that this perpendicular angle should be maintained across the entire surface of the disk. However, in the case of a tonearm that pivots, this is not possible, since the stylus follows a circular path, resulting in an angular tracking error.

Any tonearm has some angular tracking error that varies with position on the disk.

The location of the pivot point, the tonearm’s shape, and the mounting of the cartridge can all contribute to reducing this error. Typically, tonearms are designed so that the cartridge is angled to not be in-line with the pivot point. This is done to ensure that there can be two locations on the record’s surface where the stylus is angled correctly relative to the groove.

A correctly-designed and aligned pivoting tonearm has a tracking error of 0º at only two locations on the disk.

However, the only real solution is to move the tonearm in a straight line across the disc, maintaining a position that is tangential to the groove, and therefore keeping the stylus located so that its movement is perpendicular to the groove’s relative direction of travel, just as it was with the cutter head on the lathe.

A tonearm that travels sideways, maintaining an angle that is tangent to the groove at the stylus.

In a perfect system, the movement of the tonearm would be completely synchronous with the sideways “movement” of the groove underneath it, however, this is almost impossible to achieve. In the Beogram 4000c, a detection system is built into the tonearm that responds to the angular deviation from the resting position. The result is that the tonearm “wiggles” across the disk: the groove pulls the stylus towards the centre of the disk for a small distance before the detector reacts and moves the back of the tonearm to correct the angle.

Typically, the distance moved by the stylus before the detector engages the tracking motor is approximately 0.1 mm, which corresponds to a tracking error of approximately 0.044º.

An exaggerated representation of the maximum tracking error of the tonearm before the detector engages and corrects.

One of the primary artefacts caused by an angular tracking error is distortion of the audio signal: mainly second-order harmonic distortion of sinusoidal tones, and intermodulation distortion on more complex signals. (see “Have Tone Arm Designers Forgotten Their High-School Geometry?” in The Audio Critic, 1:31, Jan./Feb., 1977.) It can be intuitively understood that the distortion is caused by the fact that the stylus is being moved at a different angle than that for which it was designed.

It is possible to calculate an approximate value for this distortion level using the following equation:

Hd \approx 100 * \frac{ \omega  A  \alpha }{\omega_r R }

Where Hd is the harmonic distortion in percent, \omega is the angular frequency of the modulation caused by the audio signal (calculated using \omega = 2 \pi F), A is the peak amplitude in mm, \alpha is the tracking error in degrees, \omega_r is the angular frequency of rotation (the speed of the record in radians per second. For example, at 33 1/3 RPM, \omega_r = 2 \pi * 0.556 rev/sec ≈ 3.49 rad/sec) and R is the radius (the distance of the groove from the centre of the disk). (see “Tracking Angle in Phonograph Pickups” by B.B. Bauer, Electronics (March 1945))

This equation can be re-written, separating the audio signal from the tonearm behaviour, as shown below.

Hd \approx 100 * \frac{ \omega A }{\omega_r} * \frac{\alpha}{R}

which shows that, for a given audio frequency and disk rotation speed, the audio signal distortion is proportional to the horizontal tracking error over the distance of the stylus to the centre of the disk. (This is the reason one philosophy in the alignment of a pivoting tonearm is to ensure that the tracking error is reduced when approaching the centre of the disk, since the smaller the radius, the greater the distortion.)

It may be confusing as to why the position of the groove on the disk (the radius) has an influence on this value. The reason is that the distortion is dependent on the wavelength of the signal encoded in the groove. The longer the wavelength, the lower the distortion. As was shown in Figure 1 in Part 6 of this series, the wavelength of a constant frequency is longer on the outer groove of the disk than on the inner groove.

Using the Beogram 4000c as an example at its worst-case tracking error of 0.044º: if we have a 1 kHz sine wave with a modulation velocity of 34.1 mm/sec on a 33 1/3 RPM LP on the inner-most groove then the resulting 2nd-harmonic distortion will be 0.7% or about -43 dB relative to the signal. At the outer-most groove (assuming all other variables remain constant), the value will be roughly half of that, at 0.3% or -50 dB.
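For anyone who wants to check those numbers, here’s the arithmetic written out in code. Note that the two groove radii are my own assumptions: roughly 60 mm for the inner-most modulated groove (from the 120 mm diameter mentioned in Part 6) and roughly 146 mm near the outer edge of a 12″ LP.

```python
import math

def tracking_distortion_percent(peak_velocity_mm_s, alpha_deg, radius_mm, rpm=100/3):
    """Hd ~ 100 * (omega * A * alpha) / (omega_r * R),
    using the fact that omega * A is the peak modulation velocity."""
    omega_r = 2 * math.pi * (rpm / 60)   # rotation speed in radians per second
    return 100 * peak_velocity_mm_s * alpha_deg / (omega_r * radius_mm)

tracking_distortion_percent(34.1, 0.044, 60)    # ~0.72 %, or about -43 dB
tracking_distortion_percent(34.1, 0.044, 146)   # ~0.29 %, or about -50 dB
```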

Turntables and Vinyl: Part 7

Back to Part 6

Tracking force

In order to keep the stylus tip in the groove of the record, it must have some force pushing down on it. This force must be enough to keep the stylus in the groove. However, if it is too large, then both the vinyl and the stylus will wear more quickly. Thus a balance must be found between “too much” and “not enough”.

Figure 1: Typical tracking force over time. The red portion of the curve shows the recommendation for Beogram 4002 and Beogram 4000c.

As can be seen in Figure 1, the typical tracking force of phonograph players has changed considerably since the days of gramophones playing shellac discs, with values under 10 g being standard since the introduction of vinyl microgroove records in 1948. The original recommended tracking force of the Beogram 4002 was 1 g, however, this has been increased to 1.3 g for the Beogram 4000c in order to help track more recent recordings with higher modulation velocities and displacements.

Effective Tip Mass

The stylus’s job is to track all of the vibrations encoded in the groove. It stays in that groove as a result of the adjustable tracking force holding it down, so the moving parts should be as light as possible in order to ensure that they can move quickly. The total apparent mass of the parts that are being moved as a result of the groove modulation is called the effective tip mass. Intuitively, this can be thought of as giving an impression of the amount of inertia in the stylus.

It is important to not confuse the tracking force and the effective tip mass, since these are very different things. Imagine a heavy object like a 1500 kg car, for example, lifted off the ground using a crane, and then slowly lowered onto a scale until it reads 1 kg. The “weight” of the car resting on the scale is equivalent to 1 kg. However, if you try to push the car sideways, you will obviously find that it is more difficult to move than a 1 kg mass, since you are trying to overcome the inertia of all 1500 kg, not the 1 kg that the scale “sees”. In this analogy, the reading on the scale is equivalent to the Tracking Force, and the mass that you’re trying to move is the Effective Tip Mass. Of course, in the case of a phonograph stylus, the opposite relationship is desirable; you want a tracking force high enough to keep the stylus in the groove, and an effective tip mass as close to 0 as possible, so that it is easy for the groove to move it.

Compliance

Imagine an audio signal that is on the left channel only. In this case, the variation is only on one of the two groove walls, causing the stylus tip to ride up and down on those bumps. If the modulation velocity is high, and the effective tip mass is too large, then the stylus can lift off the wall of the groove just like a car leaving the surface of a road on the trailing side of a bump. In order to keep the car’s wheels on the road, springs are used to push them back down before the rest of the car starts to fall. The same is true for the stylus tip. It’s being pushed back down into the groove by the cantilever that provides the spring. The amount of “springiness” is called the compliance of the stylus suspension. (Compliance is the opposite of spring stiffness: the more compliant a spring is, the easier it is to compress, and the less it pushes back.)

Like many other stylus parameters, the compliance is balanced with other aspects of the system. In this case it is balanced with the effective mass of the tonearm (which includes the tracking force)(1), resulting in a resonant frequency. If that frequency is too high, then it can be audible as a tone that is “singing along” with the music. If it’s too low, then in a worst-case situation, the stylus can jump out of the record groove.

If a turntable is very poorly adjusted, then a high tracking force combined with a high stylus compliance (in other words, a “soft” spring) will result in the entire assembly sinking down onto the record surface. However, a high compliance is necessary for low-frequency reproduction; therefore the maximum tracking force is, in part, set by the compliance of the stylus.

If you are comparing the specifications of different cartridges, it may be of interest to note that compliance is often expressed in one of five different units, depending on the source of the information:

  • “Compliance Unit” or “cu”
  • mm/N
    millimetres of deflection per Newton of force
  • µm/mN
    micrometres of deflection per thousandth of a Newton of force
  • x 10^-6 cm/dyn
    hundredths of a micrometre of deflection per dyne of force
  • x 10^-6 cm / 10^-5 N
    hundredths of a micrometre of deflection per hundred-thousandth of a Newton of force

Since

mm/N = 1000 µm / 1000 mN

and

1 dyne = 0.00001 Newton

This means that all five of these expressions are identical, so they can be interchanged freely. In other words:

20 CU

= 20 mm / N

= 20 µm / mN

= 20 x 10^-6 cm / dyn

= 20 x 10^-6 cm / 10^-5 N
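As an aside, the tonearm/cartridge resonance mentioned above is easy to estimate once the units are sorted out. The following is a minimal sketch in Python (mine, not from the original text, and the example values are hypothetical): it uses the standard mass-on-a-spring relationship f = 1 / (2π √(M C)), with the effective mass in kilograms and the compliance converted from cu to m/N.

import math

def tonearm_resonance_hz(effective_mass_g, compliance_cu):
    """Resonant frequency of the moving mass on the stylus suspension.

    effective_mass_g: effective mass of tonearm + cartridge + hardware, in grams
    compliance_cu:    dynamic compliance in cu (numerically the same as mm/N)
    """
    mass_kg = effective_mass_g * 1e-3
    compliance_m_per_N = compliance_cu * 1e-3   # 1 cu = 1 mm/N = 1e-3 m/N
    return 1.0 / (2.0 * math.pi * math.sqrt(mass_kg * compliance_m_per_N))

# Hypothetical example: a 12 g effective mass with a 20 cu cartridge
print(round(tonearm_resonance_hz(12, 20), 1), "Hz")   # about 10 Hz

With typical combinations of mass and compliance this lands somewhere around 8 to 12 Hz, which is the region usually aimed for: above the frequencies caused by record warps, but below the audio band.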

Footnotes

  1. On the Mechanics of Tonearms, Dick Pierce

Turntables and Vinyl: Part 6

Back to Part 5

Tip shape

The earliest styli were the needles that were used on 78 RPM gramophone players. These were typically made from steel wire that was tapered to a conical shape, and then the tip was rounded to a radius of about 150 µm, by tumbling them in an abrasive powder.(1) This rounded curve at the tip of the needle had a hemispherical form, and so styli with this shape are known as either conical or spherical.

The first styli made for “microgroove” LPs had the same basic shape as their steel predecessors, but were tipped with sapphire or diamond. The conical/spherical shape was a good choice due to the relative ease of manufacture, and a typical size of that spherical tip was about 36 µm in diameter. However, as recording techniques and equipment improved, it was realised that there are possible disadvantages to this design.

Remember that the side-to-side shape of the groove is a physical representation of the audio signal: the higher the frequency, the smaller the wave on the disc. However, since the disc has a constant speed of rotation, the speed of the stylus relative to the groove is dependent on how far away it is from the centre of the disc. The closer the stylus gets to the centre, the smaller the circumference, so the slower the groove speed.

If we look at a 12″ LP, the smallest allowable diameter for the modulated groove is about 120 mm, which gives us a circumference of about 377 mm (or 120 * π). The disc is rotating 33 1/3 times every minute which means that it is making 0.56 of a rotation per second. This, in turn, means that the stylus has a groove speed of 209 mm per second. If the audio signal is a 20,000 Hz tone at the end of the recording, then there must be 20,000 waves carved into every 209 mm on the disc, which means that each wave in the groove is about 0.011 mm or 11 µm long.

Figure 1: The relative speed of the stylus to the surface of the vinyl as it tracks from the outside to the inside radius of the record.
Figure 2: The wavelengths measured in the groove, as a function of the stylus’s distance to the centre of a disc. The shorter lines are for 45 RPM 7″ discs, the longer lines are for 33 1/3 RPM 12″ LPs.
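If you’d like to check these numbers yourself, here is a small Python sketch (mine, not from the original article) that computes the recorded wavelength from the radius, rotational speed, and frequency, using the same reasoning as the paragraph above.

import math

def groove_wavelength_um(radius_mm, rpm, freq_hz):
    """Wavelength of a recorded tone in the groove, in micrometres."""
    groove_speed_mm_per_s = 2.0 * math.pi * radius_mm * (rpm / 60.0)
    return 1000.0 * groove_speed_mm_per_s / freq_hz

# Inner (60 mm) and outer (146 mm) modulation radii of a 33 1/3 RPM 12" LP
for radius_mm in (60, 146):
    for freq_hz in (1000, 20000):
        print(f"r = {radius_mm} mm, {freq_hz} Hz: "
              f"{groove_wavelength_um(radius_mm, 100.0 / 3.0, freq_hz):.1f} um")

At the inner radius this gives roughly 10 to 11 µm for a 20 kHz tone, which is the value used above.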

However, now we have a problem. If the “wiggles” in the groove have a total wavelength of 11 µm, but the tip of the stylus has a diameter of about 36 µm, then the stylus will not be able to track the groove because it’s simply too big (just like the tires of your car do not sink into every small crack in the road). Figure 3 shows to-scale representations of a conical stylus with a diameter of 36 µm in a 70 µm-wide groove on the inside radius of a 33 1/3 RPM LP (60 mm from the centre of the disc), viewed from above. The red lines show the bottom of the groove and the black lines show the edge where the groove meets the surface of the disc. The blue lines show the point where the stylus meets the groove walls. The top plot is a 1 kHz sine wave and the bottom plot is a 20 kHz sine wave, both with a lateral modulation velocity of 70 mm/sec. Notice that the stylus is simply too big to accurately track the 20 kHz tone.

Figure 3: Scale representations of a conical stylus with a diameter of 36 µm in a 70 µm-wide groove on the inside radius of a 33 1/3 RPM LP, looking directly downwards into the groove. See the text for more information.

One simple solution was to “sharpen” the stylus; to make the diameter of the spherical tip smaller. However, this can cause two possible side effects. The first is that the tip will sink deeper into the groove, making it more difficult for it to move independently on the two audio channels. The second is that the point of contact between the stylus and the vinyl becomes smaller, which can result in more wear on the groove itself because the “footprint” of the tip is smaller. However, since the problem is in tracking the small wavelength of high-frequency signals, it is only necessary to reduce the diameter of the stylus in one dimension, thus making the stylus tip elliptical instead of conical. In this design, the tip of the stylus is wide, to sit across the groove, but narrow along the groove’s length, making it small enough to accurately track high frequencies. An example showing a 0.2 mil x 0.7 mil (10 x 36 µm) stylus is shown in Figure 4. Notice that this shape can track the 20 kHz tone more easily, while sitting at the same height in the groove as the conical stylus in Figure 3.

Figure 4: Scale representations of an elliptical stylus with diameters of 10 x 36 µm in a 70 µm-wide groove on the inside radius of a 33 1/3 RPM LP, looking directly downwards into the groove. See the text for more information.

Both the conical and the elliptical stylus designs have a common drawback in that the point of contact between the tip and the groove wall is extremely small. This can be seen in Figure 5, which shows various stylus shapes from the front. Notice the length of the contact between the red and black lines (the stylus and the groove wall). As a result, both the groove of the record and the stylus tip will wear over time, generally resulting in an increasing loss of high frequency output. This was particularly a problem when the CD-4 Quadradisc format was introduced, since it relies on signals as high as 45 kHz being played from the disc.(2) In order to solve this problem, a new stylus shape was invented by Norio Shibata at JVC in 1973. The idea behind this new design is that the sides of the stylus are shaped to follow a much larger-radius circle than is possible to fit into the groove, however, the tip has a small radius like a conical stylus. An example showing this general concept can be seen on the right side of Figure 5.

Figure 5: Dimensions of example styli, drawn to scale. The figure on the left is typical for a 78 RPM steel needle. The four examples on the right show different examples of tip shapes. These are explained in more details in the text. (For comparison, a typical diameter of a human hair is about 0.06 mm.)

There have been a number of different designs following Shibata’s general concept, with names such as MicroRidge (which has an interesting, almost blade-like shape “across” the groove), Fritz-Geiger, Van-den-Hul, and Optimized Contour Contact Line. Generally, these designs have come to be known as line contact (or contact line) styli, because the area of contact between the stylus and the groove wall is a vertical line rather than a single point.

In 1973, Bang and Olufsen started working on its own turntable that could play the new CD-4 Quadradisc format. This not only meant developing a new decoder with a 4-channel output, but also a stylus with a bandwidth reliably extending to approximately 45 kHz. This task was given to Villy Hansen, who was project manager for pickup development, despite still being relatively new to the company. Hansen proposed an improvement upon the Shibata grind (which was already commercially available by then) by making 4 facets instead of 2, resulting in a better shape for tracking the very high-frequency modulation. Although developed by Hansen, the new stylus became known as the “Pramanik diamond”, named after Subir K. Pramanik, who had started working as an engineer in Struer in 1971, but who had temporarily returned to India. The end result was a new pickup family that was initially launched with the top model, the MMC 6000.

Figure 6: An example of an elliptical stylus on the left vs. a line contact Pramanik grind on the right. Notice the difference in the area of contact between the styli and the groove walls.

Bonded vs. Nude

There is one small, but important point regarding a stylus’s construction. Although the tip of the stylus is almost always made of diamond today, in lower-cost units, that diamond tip is mounted or bonded to a titanium or steel pin which is, in turn, connected to the cantilever (the long “arm” that connects back to the cartridge housing). This bonded design is cheaper to manufacture, but it results in a high mass at the stylus tip, which means that it will not move easily at high frequencies.

Figure 7: Scale models (on two different scales) of different styli. The example on the left is bonded, the other four are nude.

In order to reduce mass, the steel pin is eliminated, and the entire stylus is made of diamond instead. This makes things more costly, but reduces the mass dramatically, so it is preferred if the goal is higher sound performance. This design is known as a nude stylus.

Footnotes

  1. See “The High-fidelity Phonograph Transducer”, B.B. Bauer, JAES, Vol. 25, No. 10/11, Oct/Nov 1977
  2. The CD-4 format used a 30 kHz carrier tone that was frequency-modulated ±15 kHz. This means that the highest frequency that should be tracked by the stylus is 30 kHz + 15 kHz = 45 kHz.

On to Part 7

Turntables and Vinyl: Part 5

Back to Part 4

Before we go any further, we need to just collect a bunch of information about vinyl records.

General Information

Min groove depth: 0.001″ = 0.0254 mm = 25.4 µm
Max groove depth: 0.005″ = 0.127 mm = 127 µm

12″ LP’s

Outside modulation groove radius: 146 mm
Inside modulation groove radius: 60 mm
Total maximum modulation radius (the radial width available for the modulated groove): 86 mm (3.4″)
Typical modulation radius: 76 mm (3″)

7″ 45 RPM

Outside modulation groove radius: 84 mm
Inside modulation groove radius: 54 mm
Total maximum modulation radius (radial width): 30 mm (1.2″)

Basic math

Pitch (in lines per mm) = (Running time in minutes x RPM) / Modulation Radius (in mm)

Groove Width (in mil) = (1000/LPI + 1) / 2

Peak Amplitude of Displacement (in mm) = Peak Lateral Velocity (in mm/sec) / (2 π freq in Hz)

Examples

Pitch = (Running time x RPM) / Modulation Radius
Pitch = (20 minutes x 33.333) / 76 mm
Pitch = 8.8 lines per mm (LPm) = 223 LPI

Groove Width = (1000/LPI + 1) / 2
Groove Width = (1000/223 + 1) / 2
Groove Width = 2.74 mil = 2.74 x 10^-3 inches =  0.0696 mm

Peak Amplitude of Displacement = Peak Lateral Velocity / (2 π freq)
Peak Amplitude of Displacement = 70 mm/sec / (2 π 1000)
Peak Amplitude of Displacement = 0.011 mm
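For convenience, here are the same three formulas as a small Python sketch (my own, not taken from Boden’s book), reproducing the worked examples above:

import math

def pitch_lines_per_mm(running_time_min, rpm, modulation_radius_mm):
    """Average groove pitch: total revolutions divided by the available radial band."""
    return (running_time_min * rpm) / modulation_radius_mm

def groove_width_mil(lpi):
    """Groove width in mil, from lines per inch."""
    return (1000.0 / lpi + 1.0) / 2.0

def peak_displacement_mm(peak_velocity_mm_per_s, freq_hz):
    """Peak amplitude of displacement from peak lateral velocity."""
    return peak_velocity_mm_per_s / (2.0 * math.pi * freq_hz)

pitch = pitch_lines_per_mm(20, 100.0 / 3.0, 76)   # ~8.8 lines per mm
lpi = pitch * 25.4                                # ~223 LPI
width_mil = groove_width_mil(lpi)                 # ~2.74 mil
print(f"{pitch:.1f} lines/mm = {lpi:.0f} LPI")
print(f"groove width = {width_mil:.2f} mil = {width_mil * 0.0254:.4f} mm")
print(f"peak displacement = {peak_displacement_mm(70, 1000):.3f} mm")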

Reference:
“Basic Disc Mastering” by Larry Boden (1981)

On to Part 6

Turntables and Vinyl: Part 3

Back to Part 2

Signal Levels

Every audio device relies on a rather simple balancing act. The “signal”, whether it’s speech, music, or sound effects, should be loud enough to mask the noise that is inherent in the recording or transmission itself. The measurement of this “distance” in level is known as the Signal-to-Noise Ratio or SNR. However, the signal should not be so loud as to overload the system and cause distortion effects such as clipping, whose artefacts are typically quantified as Total Harmonic Distortion or THD.(1) One basic method to evaluate the quality of an audio signal or device is to group these two measurements into one value: the Total Harmonic Distortion plus Noise or THD+N value. The somewhat challenging issue with this value is that a portion of it (the noise floor) is typically independent of the signal level, since a device or signal will have some noise regardless of whether a signal is present or not. However, the distortion is typically directly related to the level of the signal.

In a modern digital PCM audio signal (assuming that it is correctly implemented, and ignoring any additional signal processing), the noise floor is the result of the dither that is used to randomise the inherent quantisation error in the encoding system. This noise is independent of the signal level, and entirely dependent on the resolution of the system (measured in the number of bits used to encode each sample). The maximum level that can be encoded without incurring additional distortion is reached when the peak of the audio signal just hits the highest (or lowest) value that the system can represent. Any increase in the signal’s level beyond this will be clipped, and harmonic distortion artefacts will result.

Figure 1 shows two examples of the relationship between the levels of the signal and the THD+N in a digital audio system. The red line shows a 24-bit encoding, the blue line is for 16-bit. The “flat line” on the left of the plot is the result of the noise floor of the system. In this region, the signal level is so low that it is below the noise floor of the system itself, so the only measurable output is the noise, and not the signal. As we move towards the right, the input signal gets louder and rises above the noise floor, so the output level naturally increases as well. However, in a digital audio system, we reach a maximum possible input level of 0 dB FS. If we try to increase the signal’s level above this, the signal itself will not get any louder; instead, it becomes more and more distorted. As a result, the distortion artefacts quickly become almost as loud as the signal itself, and so the plots drop dramatically.

This is why good recording engineers typically attempt to align the levels of the microphones to ensure that the maximum peak of the entire recording will just barely reach the maximum possible level of the digital recording system. This ensures that they are keeping above the noise floor as much as possible without distorting the signals.

Figure 1: Two examples of the relationship between the levels of the signal and the THD+N in a digital audio system. These are idealised calculations, assuming TPDF dither in a “perfect” LPCM system. The red line shows a 24-bit encoding, the blue line is for 16-bit.
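Curves like the ones in Figure 1 can be approximated numerically. The sketch below (Python with NumPy; my own simplified model, not the calculation used for the figure) quantises a 997 Hz sine with TPDF dither at a given bit depth, clips anything above full scale, and estimates THD+N (in dB relative to the fundamental) by removing the fundamental with a least-squares fit. The exact values depend on the assumptions (dither amplitude, measurement bandwidth, and so on), so treat it as illustrative only.

import numpy as np

FS = 48000          # sampling rate in Hz
F0 = 997            # test frequency in Hz (avoids lining up with the quantiser)
N = 1 << 16         # number of samples per measurement
rng = np.random.default_rng(0)

def thd_plus_n_db(level_dbfs, n_bits):
    """Estimate THD+N (dB re: fundamental) of a TPDF-dithered quantiser."""
    t = np.arange(N) / FS
    x = 10.0 ** (level_dbfs / 20.0) * np.sin(2.0 * np.pi * F0 * t)
    q = 2.0 / (2 ** n_bits)                       # LSB size for a full scale of ±1
    dither = (rng.uniform(-0.5, 0.5, N) + rng.uniform(-0.5, 0.5, N)) * q   # TPDF
    y = np.round((x + dither) / q) * q            # quantise
    y = np.clip(y, -1.0, 1.0 - q)                 # clip anything beyond full scale
    # Fit and remove the fundamental; whatever is left is noise + distortion
    basis = np.column_stack([np.sin(2.0 * np.pi * F0 * t), np.cos(2.0 * np.pi * F0 * t)])
    coeffs, *_ = np.linalg.lstsq(basis, y, rcond=None)
    fundamental = basis @ coeffs
    residual = y - fundamental
    return 10.0 * np.log10(np.sum(residual ** 2) / np.sum(fundamental ** 2))

for level in (-80, -60, -40, -20, -10, -3, 0, 3, 6):
    print(f"{level:4d} dBFS: 16-bit THD+N = {thd_plus_n_db(level, 16):6.1f} dB, "
          f"24-bit THD+N = {thd_plus_n_db(level, 24):6.1f} dB")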

Audio signals recorded on analogue-only devices generally have the same behaviour; there is a noise floor that should be avoided and a maximum level above which distortion will start to increase. However, many analogue systems have a slightly different characteristic, as can be seen in the idealised model shown in Figure 2. Notice that, just like in the digital audio system, the noise floor is constant, and as the level of the input signal is increased, it rises above this. However, in an analogue system, the transition to a distorted signal is more gradual, seen as the more gentle slopes of the curves on the right side of the graph.

Figure 2: Two examples of the relationship between the levels of the signal and the THD+N in a simplified analogue audio system, showing two different maximum SNRs.

As a result, in a typical analogue audio system, there is an “optimal” level that is seen to be the best compromise between the signal being loud enough above the noise floor, but not distorting too much. The question of how much distortion is “too much” can then be debated — or even used as an artistic effect (as in the case of so-called “tape compression”).

If we limit our discussion to the stylus tracking a groove on a vinyl disc, converting that movement to an electrical signal that is amplified and filtered in a RIAA-spec preamplifier, then a phonograph recording is an analogue format. This means, generally speaking, that there is an optimal level for the audio signal, which, in the case of vinyl, means a modulation velocity of the stylus, converted to an electrical voltage.

Although there are some minor differences of opinion, a commonly-accepted optimum level for the groove on a stereo recording is 35.4 mm/sec for a single audio channel at 1,000 Hz. In a case where both audio channels have the same 1 kHz signal recorded in phase (as a dual-monophonic signal), then this means that the lateral velocity of the stylus will be 50 mm/sec.(2)
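As a quick check of the arithmetic in footnote 2: the two in-phase channels are each modulated at 45 degrees to the disc surface, so their lateral components simply add. A throwaway Python calculation (mine, not from the article):

import math

# Two in-phase channels, each at 35.4 mm/s, each modulated at 45 degrees to the disc surface
per_channel_mm_per_s = 35.4
lateral_mm_per_s = 2.0 * per_channel_mm_per_s * math.cos(math.radians(45))
print(f"{lateral_mm_per_s:.1f} mm/s")   # about 50 mm/s, i.e. (35.4 * 2) / sqrt(2)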

Of course, the higher the modulation velocity of the stylus, the higher the output of the turntable. However, this would also mean that the groove on the vinyl disc would require more space, since it is being modulated more. This means that there is a relationship between the total playing time of a vinyl disc and the modulation velocity. In order to have 20 minutes of music on a 12” LP spinning at 33 1/3 RPM, the standard method was to cut 225 “lines per inch” or “LPI” (about 89 lines per centimetre) on the disc. If a mastering engineer wishes to have a signal with a higher output, then the price is a lower playing time (because the grooves must be spaced further apart to accommodate the higher modulation velocity). However, in well-mastered recordings, this spacing is varied according to the dynamic range of the audio signal. In fact, in some classical recordings, it is easy to see the louder passages in the music because the grooves are intentionally spaced further apart, as is illustrated in Figure 3.

Figure 3: An extreme example of a disc in which the groove spacing has been varied to accommodate louder passages in the music. One consequence of this is that this side of the disc contains a single piece of music only 15 and a half minutes long.

A large part of the performance of a turntable is dependent on the physical contact between the surface of the vinyl and the tip of the stylus. In general terms, as we’ve already seen, there is a groove with two walls that vary in height almost independently, and the tip of the stylus traces that movement accordingly. However, it is necessary to get down to the microscopic level to consider this behaviour in more detail.

When a record is mastered (meaning, when the master disc is created on a lathe) the groove is cut by a heated stylus that has a specific shape, shown in Figure 4. The depth of the groove can range from a minimum of 25 µm to a maximum of 127 µm, which, in turn, varies the width of the groove.(3)

Figure 4: The cutting stylus used to create the groove in the master disc.
Figure 5: A Neumann-Teldec cutting head creating the groove in the master disc. The cutting stylus can be seen just under the circular support under the head. (Wikimedia Commons)
Figure 6: Dimensions of record grooves, drawn to scale. The figure on the left is typical for a 78 RPM shellac disc. The three grooves on the right show the possible variation in a 33 1/3 RPM “microgroove” LP.

The result is a groove with a varying width and depth that are dependent on the decisions made by the mastering engineer, and a modulation displacement (the left/right size of the “wiggle”) that is dependent on the level of the audio signal that is being reproduced.

In a perfect situation, the stylus that is used to play that signal back on a turntable would have exactly the same shape as the cutting stylus, since this would mean that the groove is traced in exactly the same way that it was cut. This, however, is not practical for a number of reasons. As a result, there are a number of options when choosing the shape of the playback stylus.

Footnotes

  1. The assumption here is that the distortion produces harmonics of the signal, which is a simplified view of the truth, but an effect that is easy to measure.
  2. (35.4*2) / sqrt(2) because the two channels are modulated at an angle of 45 degrees to the surface of the disc.
  3. See “The High-fidelity Phonograph Transducer”, B.B. Bauer, JAES, Vol. 25, No. 10/11, Oct/Nov 1977

On to Part 4

Turntables and Vinyl: Part 2

Back to Part 1: History

The physics

Amplitude vs. Velocity

At the end of Part 1, I mentioned that there is an interaction between magnetic fields, movement, and electrical current. This is at the heart of almost every modern turntable. As the stylus or “needle” (1) is pulled through the groove in the vinyl surface, it moves from side-to-side at a varying speed called the modulation velocity or just the velocity. An example of this wavy groove can be seen in the photo below. Inside the housing of most cartridges are small magnets and coils of wire, one of which (depending on the design) is moved by the stylus as it vibrates. That movement generates an electrical current that is analogous to the shape of the groove: the higher the velocity of the stylus, the higher the electrical signal from the cartridge.

Figure 1: The groove in a late-1980’s pop tune on a 33 1/3 RPM stereo LP. The white dots in the groove are dirt that should be removed before playing the disc.

However, this introduces a problem: if the amplitude remains the same at all frequencies, then the modulation velocity of the stylus decreases as the frequency decreases; in other words, the lower the note, the lower the output level, and therefore the less bass. This is illustrated in the graph in Figure 2 below, in which three sine waves are shown with different frequencies. The blue line shows the lowest frequency and the orange line is the highest. Notice that all three have the same amplitude (the same maximum “height”). However, if you look at the slopes of the three curves when they pass Time = 0 ms, you’ll see that the higher the frequency, the higher the slope of the line, and therefore the higher the velocity of the stylus.

Figure 2: Three sine waves of different frequencies (from low to high: blue, red and orange curves), but with the same amplitude.

In order to achieve a naturally flat frequency response from the cartridge, where all frequencies have the same electrical output level, it is necessary to ensure that they have the same modulation velocity, as shown in Figure 3 below. In that plot, it can be seen that the slopes of the three waves are the same at Time = 0 ms. However, it is also evident that, when this is true, they have very different amplitudes: in fact, the amplitude would have to double for every halving of frequency (a drop of 1 octave). This is not feasible, since it would mean that the stylus would have to move left and right by (relatively) huge distances in order to deliver the desired output. For example, if the stylus were moving sideways by ±0.1 mm at 1,000 Hz to deliver a signal, then it would have to move ±1 mm at 100 Hz, and ±10 mm at 10 Hz to deliver the same output level. This is not possible (or at least it’s very impractical).

Figure 3: Three sine waves of different frequencies (from low to high: blue, red and orange curves), but with the same modulation velocity.
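To put numbers on the example above, here is a tiny Python sketch (mine; the ±0.1 mm starting point is just the illustrative figure from the paragraph, not a realistic cutting level):

import math

def peak_amplitude_mm(peak_velocity_mm_per_s, freq_hz):
    """Displacement required for a given peak velocity: A = v / (2 * pi * f)."""
    return peak_velocity_mm_per_s / (2.0 * math.pi * freq_hz)

# The velocity implied by +/- 0.1 mm at 1 kHz...
v_mm_per_s = 0.1 * 2.0 * math.pi * 1000.0
# ...and the amplitudes needed to keep that velocity constant at lower frequencies
for freq_hz in (1000, 100, 10):
    print(f"{freq_hz:5d} Hz: +/- {peak_amplitude_mm(v_mm_per_s, freq_hz):.1f} mm")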

The solution for this limitation was to use low-frequency audio compensation filters, both at the recording and the playback stages. When a recording is mastered to be cut on a disc, the low frequency level is decreased; the lower the frequency, the lower the level. This results in a signal recorded on disc with a constant amplitude for signals below approximately 1 kHz.

Of course, if this signal were played back directly, there would be an increasing loss of level at lower and lower frequencies. So, to counteract this, a filter is applied to the output signal of the turntable that boosts the low frequencies signals to their original levels.

Surface noise

A second problem that exists with vinyl records is that of dust and dirt. If you look again at the photo in the first Figure at the top of this posting, you can see white specks lodged in the groove. These look very small to us, however, to the stylus, they are very large bumps that cause the tip to move abruptly, and therefore quickly. Since the output signal is still proportional to the modulation velocity, then this makes the resulting cracks and pops quite loud in relation to the audio signal.

In order to overcome this problem, a second filter is used, this time for higher frequencies. Upon playback, the level of the treble is reduced; the higher the frequency the lower the output. This reduces the problem of noise caused by surface dirt on the disc, however it would also reduce the high frequency content of the audio signal itself. This is counteracted by increasing the level of the high-frequency portion of the audio signal when it is mastered for the disc.

This general idea of lowering the level of low-frequencies and/or boosting highs when recording and doing the opposite upon playback is a very old idea in the audio industry and has been used on many formats ranging from film “talkies” to early compact discs. Unfortunately, however, different recording companies and studios used different filters on phonographs for many years. (2) Finally, in the mid-1950s, the Recording Industry Association of America (the RIAA) suggested a standard filter description with the intention that it would be used world-wide for all PVC “vinyl” records.

The two figures below show the responses of the RIAA filters used in both the mastering and the playback of long playing vinyl records. Although there are other standards with slightly different responses, the RIAA filter is by far the most commonly-used.

Figure 4: The “pre-emphasis” filter to be used in the mastering to disc, as described by the RIAA standard. The black line shows the simplified description, and the red curve shows the real-world implementation.
Figure 5: The “de-emphasis” filter to be used for playback as described by the RIAA standard. This standard filter response is integral in what is now commonly called a “RIAA preamp”.

It may be of interest to note that typical descriptions of the RIAA equalisation filter define the transition points as time constants instead of frequencies. So, instead of 50 Hz, 500 Hz, and 2122 Hz (as shown in the response plots), the points are listed as 3180 µs (microseconds), 318 µs, and 75 µs instead. If you wish to convert a time constant (Tc) to the equivalent frequency (F), you can use the equation below.

F = 1 / (2 π Tc)
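Here’s a small Python sketch (my own, not part of the standard’s text) that does that conversion for the three time constants, and also evaluates the de-emphasis magnitude response they imply (a single zero at 318 µs with poles at 3180 µs and 75 µs), normalised to 0 dB at 1 kHz:

import math

# RIAA time constants, in seconds
T1, T2, T3 = 3180e-6, 318e-6, 75e-6

for tc in (T1, T2, T3):
    print(f"{tc * 1e6:6.0f} us  ->  {1.0 / (2.0 * math.pi * tc):7.1f} Hz")

def riaa_deemphasis_db(freq_hz):
    """RIAA playback (de-emphasis) magnitude in dB, normalised to 0 dB at 1 kHz."""
    def mag(f):
        s = 2j * math.pi * f
        return abs((1 + s * T2) / ((1 + s * T1) * (1 + s * T3)))
    return 20.0 * math.log10(mag(freq_hz) / mag(1000.0))

for freq_hz in (20, 50, 500, 1000, 2122, 20000):
    print(f"{freq_hz:6d} Hz: {riaa_deemphasis_db(freq_hz):+6.1f} dB")

The pre-emphasis curve used in mastering (Figure 4) is simply the inverse of this response.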

Mono to Stereo

In Edison’s first cylinder recordings, the needle vibrated up and down instead of left and right to record the audio signal. This meant that the groove cut into the surface of the tin foil was varying in depth, and therefore in width, as shown in the figure below.

Figure 6: Example of an audio signal encoded using a vertical cutting system.

There are some disadvantages to this system, such as the risk of the needle slipping out of the groove when it is too shallow, or suffering from excessive wear if the groove is too deep. In addition, any vertical variation in the recording surface (such as a cylinder that is not quite round, or mechanical vibrations in the player caused by footsteps in the room) becomes translated into unwanted noises upon playback. (3)

Figure 7: An Edison cylinder player, on display in the Struer Museum.
Figure 8: A closeup of the Edison player. Notice that the needle is mounted to move vertically, modulating a membrane located at the end of the tonearm (the bent pipe).

Berliner’s Gramophone used a different system, where the needle vibrated sideways instead. This lateral cut system produced a groove on the disc with a constant depth, thus avoiding some of the problems incurred by the vertical cut recording system.

Figure 9: Example of an audio signal encoded using a lateral cutting system.

However, both of these systems were only capable of recording a single channel of audio information. In order to capture 2-channel stereo audio (invented by Alan Blumlein in 1931) the system had to be adapted somehow. The initial challenge was to find a way of making a disc player that could reproduce two channels of stereo audio, while still maintaining compatibility with lateral-cut discs.

The solution was to rotate the modulation direction by 45 degrees, so that the two walls of the groove are used to record the two separate audio channels. This means that the stylus moves in two (theoretically independent) axes, as shown in the figure below. When the same signal is applied to both channels (better known as a “dual-mono” or “in-phase” signal), the two 45-degree motions combine so that their vertical components cancel and the stylus moves laterally, exactly as in earlier monophonic discs. (For a more correct explanation of this movement, see this webpage)

Figure 10: An over-simplified depiction of how the two audio channels are encoded in the groove. From left to right: No modulation (silence); Left channel signal modulates the groove’s left wall; Right channel signal modulates the right wall.
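To make the “rotate by 45 degrees” idea concrete, here is a small Python/NumPy sketch (mine; the 1/√2 scaling is a convention I’ve assumed for illustration) that converts a pair of channel signals into the lateral and vertical components of stylus motion. In-phase signals come out as purely lateral motion, and out-of-phase signals as purely vertical motion, which is the point made above.

import numpy as np

def channels_to_groove_motion(left, right):
    """Decompose 45/45 stereo channel signals into lateral and vertical stylus motion."""
    lateral = (left + right) / np.sqrt(2.0)
    vertical = (left - right) / np.sqrt(2.0)
    return lateral, vertical

t = np.linspace(0.0, 1e-3, 48, endpoint=False)
mono = np.sin(2.0 * np.pi * 1000.0 * t)

lat, vert = channels_to_groove_motion(mono, mono)     # dual mono: in phase
print(np.max(np.abs(vert)) < 1e-12)                   # True: no vertical motion
lat, vert = channels_to_groove_motion(mono, -mono)    # out of phase
print(np.max(np.abs(lat)) < 1e-12)                    # True: no lateral motion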

As a result, if you look at the groove in a modern two-channel stereo LP, it appears at first glance that it simply wiggles left-to-right. However, if you inspect the same groove with extreme magnification, you can see that the modulations in the two sidewalls of the groove are slightly different, since the audio signals on the left and right channels are not identical.

Figure 11: Example of two different signals encoded on the two channels of a stereo groove.

Footnotes

  1. Some authors reserve the term “stylus” for the device that is used to cut the groove during mastering, and the term “needle” for the device used to play a phonographic record. However, I’ll use the two terms interchangeably in this series.
  2. See the Manual of Analogue Sound Restoration Techniques (2008), by Peter Copeland
  3. Some 78 RPM discs use a vertical cutting system as well, including those made by Edison Disc Records and Pathé.

On to Part 3

Turntables and Vinyl: Part 1

Lately, a large part of my day job has been involved with the Beogram 4000c project at Bang & Olufsen. This turned out to be pretty fun, because, as I’ve been telling people, I’m old enough that many of my textbooks have chapters about vinyl and phonographs, but I’m young enough that I didn’t have to read them, since vinyl was a dying technology in the 1990’s.

So, one of the things I’ve had to do lately is to go back and learn all the stuff I didn’t have to learn 25 years ago. In the process, I’ve wound up gathering lots of information that might be of interest to someone else, so I figured I’d collect it here in a multi-part series on phonographs.

A warning: this will not be a tome on why vinyl is better than digital or why digital is better than vinyl. I’m not here to start any arguments or rail against anyone’s religious beliefs. If you don’t like some of the stuff I say here, put your complaints on your own website.

Also, if you’ve downloaded the Technical Sound Guide for the Beogram 4000c, then you’ll recognise large portions of these postings as auto-plagiarism. Consider the TSG a condensed version of this series.

A very short history

In 1856, Édouard-Léon Scott de Martinville invented a device based on the basic anatomy of the human ear. It consisted of a wooden funnel ending at a flexible membrane to emulate the ear canal and eardrum. Connected to the membrane was a pig bristle that moved with it, scratching a thin line into soot on a piece of paper wrapped around a rotating cylinder. He called this new invention a “phonautograph” or “self-writer of sound”.

The phonautograph (from www.firstsounds.org)

This device was conceived to record sounds in the air without any intention of playing them back, so it can be considered to be the precursor to the modern oscilloscope. (It should be said that some “recordings” made on a phonautograph were finally played back in 2008. See www.firstsounds.org for more information.) However, in the late 1870’s, Charles Cros realised that if the lines drawn by the phonautograph were photo-engraved onto the surface of a metal cylinder, then it could be used to vibrate a needle placed in the resulting groove. Unfortunately, rather than actually build such a device, he only wrote about the idea in a document that was filed at the Académie des Sciences and sealed. Within 6 months of this, in 1877, Thomas Edison asked his assistant, John Kruesi, to build a device that could not only record sound (as an indentation in tin foil on a cylinder) but also reproduce it, if only a few times before the groove became smoothed. (see “Reproduction of Sound in High-fidelity and Stereo Phonographs” (1962) by Edgar Villchur)

It was ten years later, in 1887, that the German-American inventor Emil Berliner was awarded a patent for a sound recording and reproducing system that was based on a groove in a rotating disc (rather than Edison’s cylinder); the original version of the system that we know of today as the “Long Playing” or “LP” Record.

An Edison “Blue Amberol” record with a Danish 78 RPM “His Master’s Voice” disc recording X8071 of Den Blaa Anemone.

Early phonographs or “gramophones” were purely mechanical devices. The disc (or cylinder) was rotated by a spring-driven clockwork mechanism and the needle or stylus rested in the passing groove. The vibrations of the needle were transmitted to a flexible membrane that was situated at the narrow end of a horn that amplified the resulting sound to audible levels.

Magnets and Coils

In 1820, more than 30 years before de Martinville’s invention, the Danish physicist and chemist Hans Christian Ørsted announced the first link made between electricity and magnetism: he had discovered that a compass needle would change direction when placed near a wire that was carrying an electrical current. Nowadays, it is well-known that this link is bi-directional. When current is sent through a wire, a magnetic field is generated around it. However, it is also true that moving a wire through a magnetic field will generate an electrical signal that is proportional to the wire’s velocity.

Forward to Part 2: Physics