Bit depth conversion: Part 2

Binary concatenation and bit splitting

In Part 1, I talked about different options for converting a quantised LPCM audio signal encoded with some number of bits into an encoding with more bits. In this posting, we’ll look at a trick that can be used when you combine these options.

To start, I made two signals:

“Signal 1” is a sinusoidal tone with a frequency of 100 Hz.
It has an amplitude of ±1, but I then encoded it as a quantised 8-bit signal, so in Figure 1 it looks like it has an amplitude of ±127 (which is 2^(nBits-1)-1).
“Signal 2” is a sinusoidal tone with a frequency of 1 kHz and the same amplitude as Signal 1.

Both of these signals are plotted on the left side of Figure 1, below. On the right, you can see the frequency content of the two signals as well. Notice that there is plenty of “garbage” at the bottom of those two plots. This is because I just quantised the signals without dither, so what you’re seeing there is the frequency-domain artefacts of quantisation error.

Figure 1. Two sinusoidal waveforms with different frequencies. Both are 8-bit quantised without dither.
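
For anyone who wants to play along, here is a minimal Python sketch of how two signals like these could be generated. It is not the Matlab code used for the plots: the function name `quantised_sine` and the use of Python's built-in `round` are assumptions made for illustration. One detail worth knowing: sample 5 of Signal 2 lands on 127 × sin(30°) = 63.5, which is a rounding tie, so whether it comes out as 63 (as in the table below) or 64 depends on floating-point and rounding details.

```python
import math

FS = 48000                      # sampling rate in Hz, as stated in the text
N_BITS = 8
PEAK = 2 ** (N_BITS - 1) - 1    # 127, the largest positive 8-bit value


def quantised_sine(freq_hz, n_samples):
    """Sine with an amplitude of +/-1, scaled to +/-PEAK and rounded (no dither)."""
    return [
        round(PEAK * math.sin(2 * math.pi * freq_hz * n / FS))
        for n in range(n_samples)
    ]


signal_1 = quantised_sine(100, 10)    # first 10 samples of the 100 Hz tone
signal_2 = quantised_sine(1000, 10)   # first 10 samples of the 1 kHz tone

print(signal_1)
print(signal_2)
```

Rounding without dither is what produces the quantisation-error artefacts visible in the frequency-domain plots.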

If I look at the actual sample values of “Signal 1” for the first 10 samples, they look like the table below. I’ve listed them in both decimal values and their binary representations. The reason for this will be obvious later.

Sample number	Sample value (decimal)	Sample Value (binary)
1	0	00000000
2	2	00000010
3	3	00000011
4	5	00000101
5	7	00000111
6	8	00001000
7	10	00001010
8	12	00001100
9	13	00001101
10	15	00001111

Let’s also look at the first 10 sample values for “Signal 2”:

Sample number	Sample value (decimal)	Sample Value (binary)
1	0	00000000
2	17	00010001
3	33	00100001
4	49	00110001
5	63	00111111
6	77	01001101
7	90	01011010
8	101	01100101
9	110	01101110
10	117	01110101

The signals I plotted above have a sampling rate of 48 kHz, so there are a LOT more samples after the 10th one… however, for the purposes of this posting, the ten values listed in the tables above are plenty.
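
The binary columns in the tables above can be reproduced with a few lines of Python. This is only a sketch: the helper name `to_bits` is made up, and the masking step is an assumption that negative samples should be shown in 8-bit two's complement (none of the ten samples listed above is negative, so it only matters later in the waveform).

```python
def to_bits(value, n_bits=8):
    """Return the n_bits-wide two's complement bit pattern of an integer."""
    mask = (1 << n_bits) - 1          # 0xFF for 8 bits
    return format(value & mask, f"0{n_bits}b")


for sample in (15, 117, -1):
    print(sample, to_bits(sample))
```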

At the end of Part 1, I talked about the Most and the Least Significant Bits (MSBs and LSBs) in a binary number. In the context of that posting, we were talking about whether the bit values in the original signal became the MSBs (for Option 1) or the LSBs (for Option 3) in the new representation.

In this posting, we’re doing something different.

Both of the signals above are encoded as 8-bit signals. What happens if we combine them by just slamming their two values together to make 16-bit numbers?

For example, if we look at sample #10 from both of the tables above:

Signal 1, Sample #10 = 00001111
Signal 2, Sample #10 = 01110101

If I put those two binary numbers together, making Signal 1 the 8 MSBs and Signal 2 the 8 LSBs, then I get

0000111101110101

Note that the first 8 bits (00001111) come from Signal 1 and the last 8 bits (01110101) come from Signal 2. I could have just written 0000111101110101 and let you figure it out.

Just to keep things adequately geeky, you should know that “slamming their values together” is not the correct term for what I’ve done here. It’s called binary concatenation.
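
In code, binary concatenation of two integers is usually a shift followed by a bitwise OR. The sketch below uses a hypothetical helper called `concatenate`. It assumes the sample that becomes the LSBs is masked to its 8-bit pattern first, so that any sign bits above bit 7 do not spill into the upper half.

```python
def concatenate(msb_sample, lsb_sample, lsb_bits=8):
    """Place msb_sample above the lsb_bits-wide bit pattern of lsb_sample."""
    mask = (1 << lsb_bits) - 1
    return (msb_sample << lsb_bits) | (lsb_sample & mask)


combined = concatenate(0b00001111, 0b01110101)   # sample #10 of each signal
print(combined, format(combined, "016b"))        # 3957 0000111101110101
```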

Another way to think about what I’ve done is to say that I converted Signal 1 from an 8-bit to a 16-bit number by zero-padding, and then I added Signal 2 to the result.

Yet another way to think of it is to say that I added about 48 dB of gain to Signal 1 (20*log10(2^8) = about 48.164799306236993 dB of gain to be more precise…) and then added Signal 2 to the result. (NB. This is not really correct, as is explained below.)

However, when you’re working with the numbers inside the computer’s code, it’s easier to just concatenate the two binary numbers to get the same result.
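
As a quick sanity check of those two descriptions, the sketch below computes the gain of an 8-bit shift and compares "zero-pad, then add" against concatenation for the ten sample values listed in the tables above. All of those values are non-negative, which is the case where the two approaches agree. The name `shift_gain_db` is made up for this example.

```python
import math


def shift_gain_db(n_bits):
    """Gain in dB from shifting left by n_bits (multiplying by 2**n_bits)."""
    return 20 * math.log10(2 ** n_bits)


# First 10 samples of each signal, copied from the tables above
signal_1 = [0, 2, 3, 5, 7, 8, 10, 12, 13, 15]
signal_2 = [0, 17, 33, 49, 63, 77, 90, 101, 110, 117]

padded_and_added = [(s1 * 2 ** 8) + s2 for s1, s2 in zip(signal_1, signal_2)]
concatenated = [(s1 << 8) | s2 for s1, s2 in zip(signal_1, signal_2)]

print(round(shift_gain_db(8), 3))          # 48.165
print(padded_and_added == concatenated)    # True for these non-negative samples
```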

If you do this, what do you get? The result is shown in Figure 2, below.

Figure 2. The binary concatenated result of Signal 1 and Signal 2

As you can see there, the numbers on the y-axis are MUCH bigger. This is because of the bit-shifting done to Signal 1. The MSBs of a 16-bit number are 256 times bigger in decimal world than those of an 8-bit number (because 2^8 = 256).

In other words, the maximum value in either Signal 1 or Signal 2 is 127 (or 2^(8-1)-1) whereas the maximum value in the combined signal is 32767 (or 2^(16-1)-1).

The table below shows the resulting first 10 values of the combined signal.

Sample number	Sample value (decimal)	Sample Value (binary)
1	0	0000000000000000
2	529	0000001000010001
3	801	0000001100100001
4	1329	0000010100110001
5	1855	0000011100111111
6	2125	0000100001001101
7	2650	0000101001011010
8	3173	0000110001100101
9	3438	0000110101101110
10	3957	0000111101110101

Why is this useful? Well, up to now, it’s not. But, we have one trick left up our sleeve… We can split them apart again: take that column of numbers on the right side of the table above, cut each one into two 8-bit values, and ta-da! We get out the two signals that we started with!

Just to make sure that I’m not lying, I actually did all of that and plotted the output in Figure 3. If you look carefully at the quantisation error artefacts in the frequency-domain plots, you’ll see that they’re identical to those in Figure 1. (Although, if they weren’t, then this would mean that I made a mistake in my Matlab code…)

Figure 3. The two signals after they’ve been separated once again.
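
The round trip can be sketched in Python as follows. The helper names `concatenate` and `split` are hypothetical, and the sketch assumes both signals are signed 8-bit values, so the lower byte has to be sign-extended after it is cut out of the 16-bit word.

```python
def concatenate(msb_sample, lsb_sample, lsb_bits=8):
    """Place msb_sample above the lsb_bits-wide bit pattern of lsb_sample."""
    mask = (1 << lsb_bits) - 1
    return (msb_sample << lsb_bits) | (lsb_sample & mask)


def split(word, lsb_bits=8):
    """Cut a combined word back into its two signed samples."""
    mask = (1 << lsb_bits) - 1
    sign_bit = 1 << (lsb_bits - 1)
    msb_sample = word >> lsb_bits      # arithmetic shift keeps the sign
    low = word & mask                  # raw bit pattern of the lower sample
    lsb_sample = (low & (sign_bit - 1)) - (low & sign_bit)   # sign-extend
    return msb_sample, lsb_sample


# A few sample pairs, including negative values from later in the waveforms
pairs = [(15, 117), (-15, 117), (15, -117), (-128, -128), (127, 127)]
round_trip_ok = all(split(concatenate(s1, s2)) == (s1, s2) for s1, s2 in pairs)
print(round_trip_ok)
```

Because nothing is rounded or discarded along the way, the samples that come out are the same integers that went in, quantisation error and all.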

So what?

Okay, this might seem like a dumb trick. But it’s not. This is a really useful trick in some specific cases: transmitting audio signals is one of the first ones to come to mind.

Let’s say, for example, that you wanted to send audio over an S/PDIF digital audio connection. The S/PDIF protocol is designed to transmit two channels of audio with up to 24-bit LPCM resolution. Yes, you can do different things by sending non-LPCM data (like DSD over PCM (DoP) or Dolby Digital-encoded signals, for example) but we won’t talk about those.

If you use this binary concatenation and splitting technique, you could send two completely different audio signals in each of the audio channels on the S/PDIF. For example, you could send one 16-bit signal (as the 16 MSBs) and a different 8-bit signal (as the 8 LSBs), resulting in a total of 24 bits.

On the receiving end, you split the 24-bit values into the 16-bit and 8-bit constituents, and you get back what you put in.

(Or, if you wanted to get really funky, you could put the two 8-bit leftovers together to make a 16-bit signal, thus transmitting three lossless LPCM 16-bit channels over a stream designed for two 24-bit signals.)
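
Here is one way the three-channel idea could look in Python. This is only a sketch of the bit packing, with made-up function names (`pack_frame`, `unpack_frame`, `sign_extend`). It says nothing about actual S/PDIF framing, and the choice of which 24-bit word carries the upper byte of the third channel is an arbitrary assumption.

```python
def sign_extend(value, n_bits):
    """Interpret the lowest n_bits of value as a signed two's complement number."""
    sign_bit = 1 << (n_bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def pack_frame(left16, right16, third16):
    """Hide a third 16-bit sample in the 8 LSBs of two 24-bit words."""
    third_hi = (third16 >> 8) & 0xFF       # upper byte rides under the left channel
    third_lo = third16 & 0xFF              # lower byte rides under the right channel
    return (left16 << 8) | third_hi, (right16 << 8) | third_lo


def unpack_frame(left24, right24):
    """Recover the three 16-bit samples from the two 24-bit words."""
    third_raw = ((left24 & 0xFF) << 8) | (right24 & 0xFF)
    return left24 >> 8, right24 >> 8, sign_extend(third_raw, 16)


frame = pack_frame(12345, -23456, -1)
print(unpack_frame(*frame))
```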

However, if you DON’T split them, and you just play the 24-bit signal into a system, then that 8-bit signal is so low in level that it’s probably inaudible (since it’s at least 93 dB below the peak of the “main” signal). So, no noticeable harm done!

Hopefully, now you can see that there are lots of potential uses for this. For example, it could be a sneaky way for a record label to put watermarking into an audio signal. Or you could use it to send secret messages across enemy lines, buried under a recording of Alvin and the Chipmunks’ cover of “Achy Breaky Heart”. Or you could use it for squeezing more than two channels out of an S/PDIF cable for multichannel audio playback.

One small issue…

Just to be clear, I actually used Matlab and did all the stuff I said above to make those plots. I didn’t fake it. I promise!

But if you’re looking carefully, you might notice two things that I also noticed when I was writing this.

I said above that, by bit-shifting Signal 1 over by 8 bits in the combined signal, this makes it 48 dB louder than Signal 2. However, if you look at the frequency domain plot in Figure 2, you’ll notice that the 1 kHz tone is about 60 dB lower than the 100 Hz tone. You’ll also notice that there are distortion artefacts on the 1 kHz signal at 3 kHz, 5 kHz and so on – but they’re not there in the extracted signal in Figure 3. So, what’s going on?

To be honest, when I saw this, I had no idea, but I’m lucky enough to work with some smart people who figured it out.

If you go back to the figures in Part 1, you can see that the MSB of a sample value in binary representation is used as the “sign” of the value. In other words, if that first bit is 0, then it’s a positive value. If it’s a 1 then it’s a negative value. This is known as a “two’s complement” representation of the signal.

When we do the concatenation of the two sample values as I showed in the example above, the “sign” bit of the signal that becomes the LSBs of the combined signal no longer behaves as a +/- sign. So, the truth is that, although I said above that it’s like adding the two signals – it’s really not exactly the same.
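
A small Python sketch makes the difference concrete. It assumes signed 8-bit samples and uses a hypothetical `concatenate` helper. For a positive sample of Signal 2 the two operations agree, but for a negative one the concatenated value ends up exactly 256 higher than the true sum, because the lower byte is read as an unsigned number between 0 and 255.

```python
def concatenate(msb_sample, lsb_sample, lsb_bits=8):
    """Place msb_sample above the lsb_bits-wide bit pattern of lsb_sample."""
    mask = (1 << lsb_bits) - 1
    return (msb_sample << lsb_bits) | (lsb_sample & mask)


s1 = 15
for s2 in (17, -17):
    concatenated = concatenate(s1, s2)
    added = (s1 << 8) + s2                  # true "shift and add"
    print(s2, concatenated, added, concatenated - added)
```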

If we take the signal combined through concatenation and subtract ONLY the bit-shifted version of Signal 1, the result looks like this:

Figure 4. The difference between the combined signal shown in Figure 2 and Signal 1, after it’s been bit-shifted (or zero-padded) by 8 LSBs.

Notice that the difference signal has a period of 1 ms, therefore its fundamental is 1 kHz, which makes sense because it’s a weirdly distorted version of Signal 2, which is a 1 kHz sine tone.

However, that fundamental frequency has a lower level than the original sine tone (notice that it shows up at about -60 dB instead of -48 dB in Figure 2). In addition, it has a DC offset (no negative values) and it’s got to have some serious THD to be that weird looking. Since it’s a symmetrical waveform, its distortion artefacts consist of only odd multiples of the fundamental.
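
Here is a rough numerical check of that level drop. It assumes one period of the 1 kHz tone at 48 kHz (48 samples), a plain DFT sum evaluated at the fundamental, and that the difference signal is simply each 8-bit sample of Signal 2 read as an unsigned byte (`value & 0xFF`). The helper `fundamental_amplitude` is made up for this sketch, and none of this is the Matlab code behind the figures.

```python
import cmath
import math

FS = 48000
F2 = 1000
N = FS // F2    # 48 samples in one period of the 1 kHz tone


def fundamental_amplitude(x):
    """Amplitude of the component with exactly one cycle over len(x) samples."""
    n = len(x)
    acc = sum(v * cmath.exp(-2j * math.pi * k / n) for k, v in enumerate(x))
    return 2 * abs(acc) / n


signal_2 = [round(127 * math.sin(2 * math.pi * F2 * k / FS)) for k in range(N)]
residual = [v & 0xFF for v in signal_2]     # lower byte, read as unsigned

drop_db = 20 * math.log10(fundamental_amplitude(residual)
                          / fundamental_amplitude(signal_2))
shift_db = 20 * math.log10(2 ** 8)

print(min(residual), max(residual))         # never negative, hence the DC offset
print(round(drop_db, 1), round(shift_db - drop_db, 1))
```

Under these assumptions, the fundamental of the reinterpreted signal comes out roughly 11 dB below that of the original Signal 2. Added to the 48 dB from the bit shift, that lands close to the 60 dB difference visible in Figure 2.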

Therefore, when I stated above that you’re “just” adding the two signals together, so there’s no harm done if you don’t separate them at the receiving end, this was a lie. But, if your signal with the MSBs has enough bits, then you’ll get away with it, since this pushes the second signal further down in level.

earfluff and eyecandy

mostly audio, but with some other stuff occasionally
