Someone asked the question in the title of this thread: “If different sources are sending the same bits to the same receiver via optical, and the track is exactly the same, how could they sound so different?”
So here’s what I wrote. I figured maybe people might like to see some of these issues all tied together in one post.
In comparing the sound quality of two sources that are “identical” we need to control a lot of variables. The differences might be caused by the amount of crap that the sources generate (EMI or conducted noise) when they are playing vs when they aren’t. It might be where the sources are plugged in. It might be that Wi-Fi or something is being used by one of the sources and the system is sensitive to that RF energy…
Anyway, assuming well-controlled variables in the environment of the source device, there are still a few other uncontrolled variables. Do both units put out the same bitstream over the optical cable? We are presuming that after decoding the signal from the optical cable the receiver gets the same audio bits, but more than one bitstream can carry a specific set of bits. The streams could differ in preemphasis (which should be very apparent) or in channel status codes. There are other metadata things that could conceivably affect the interpretation of the bitstream, tho the 3 different forms of preemphasis would be the most audible in my opinion. Or perhaps the digital signal simply has a different polarity. In TOSLink, S/PDIF, and AES3 the polarity of the signal doesn’t contribute any information, because everything is encoded using transitions rather than absolute levels.
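To make the polarity point concrete, here is a minimal Python sketch of biphase-mark coding, the transition-based line code that S/PDIF and AES3 use (TOSLink carries the same stream optically). The function names are made up for illustration, and the sketch ignores the preambles and subframe structure of the real format. It only shows that flipping every level on the wire still decodes to the same bits.

```python
def bmc_encode(bits, start_level=0):
    """Biphase-mark encode: two half-cells per bit.

    The level always toggles at the start of a bit cell; a 1 bit
    adds a second toggle in the middle of the cell.
    """
    level = start_level
    halves = []
    for bit in bits:
        level ^= 1              # transition at every cell boundary
        halves.append(level)
        if bit:
            level ^= 1          # extra mid-cell transition encodes a 1
        halves.append(level)
    return halves


def bmc_decode(halves):
    """A bit is 1 if its two half-cells differ, 0 if they match."""
    return [int(halves[i] != halves[i + 1]) for i in range(0, len(halves), 2)]


def invert(halves):
    """Flip the polarity of every level on the wire."""
    return [h ^ 1 for h in halves]


bits = [1, 0, 1, 1, 0, 0, 1, 0]
line = bmc_encode(bits)
flipped = invert(line)

# The two waveforms differ at every point, yet carry identical data.
print(bmc_decode(line) == bmc_decode(flipped) == bits)
```

The decoder only ever compares one half-cell with the next, so absolute level never enters into it.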
Assuming that the bitstreams from the two devices are identical, about the only thing left is the exact timing of the transitions of the signal lines (jitter). Any digital audio receiver needs a clock that’s accurately tracking the input clock, or it needs to “fix things” with ASRC (asynchronous sample rate conversion). Let’s take the second first. ASRC changes the data subtly to account for the current difference between the edges of the incoming clock vs the local clock. For example, if the local clock is faster than the source clock by 1 sample every second, then somehow an extra sample needs to be generated. It may seem like the silliest thing to do is replicate one sample, but that may be less intrusive than what most DACs do. ASRC is the process of digitally calculating with high precision what the incoming signal is and then resampling it with the local clock. In our example, at a fourth of a second it would be sampling a fourth of the way between each pair of incoming samples. At half of a second it’s moved to halfway between each pair of samples. At two thirds of a second it’s sampling two thirds of the way between each pair of samples. The advantage of ASRC is that since the signal is sampled with a local clock, technically there’s very little clock jitter for the DAC proper. On the other hand, the data’s been changed, and it ends up encoding the incoming jitter into the output signal in a way that can never be undone downstream. Some people like the sound of ASRC better than the sound that the DAC would produce with the exact incoming bitstream and its incoming jitter.
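Here is a toy Python sketch of the resampling idea, just to show the sliding sample position. Everything in it is an assumption made for illustration: the function name is invented, the rates are tiny (8 samples per second in, 9 out, so the local clock is one sample per second fast), and plain linear interpolation stands in for the far more precise interpolation a real ASRC would use.

```python
def asrc_linear(samples, in_rate, out_rate):
    """Resample onto the local clock by linear interpolation.

    A crude stand-in for a real ASRC filter, for illustration only.
    """
    out = []
    k = 0
    while True:
        pos = k * in_rate / out_rate   # output instant k, in input-sample units
        i = int(pos)                   # incoming sample just before that instant
        if i + 1 >= len(samples):
            break
        frac = pos - i                 # how far between samples i and i + 1
        out.append((1.0 - frac) * samples[i] + frac * samples[i + 1])
        k += 1
    return out


# One second of a ramp at the source rate (plus the closing endpoint).
source = [float(i) for i in range(9)]
resampled = asrc_linear(source, in_rate=8, out_rate=9)

# Nine output samples cover the same second that held eight input intervals,
# and apart from the first one none of them equals an original sample value.
print(len(resampled))
print(resampled)
```

With a ramp as input, each output value is simply the position at which it was taken, so you can watch the sampling point slide between the incoming samples as the second goes by.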
Now we are probably getting to the spirit of the question:
All other things being equal, how does jitter affect the sound quality of a DAC?
Since we’re presuming that there’s no ASRC, the receiving DAC must generate a clock that closely matches the incoming clock but presumably with less jitter. This is most often done with a PLL (phase locked loop). A PLL is an oscillator whose frequency (and exact phase) can be controlled (within certain bounds), plus the process doing that control. Let’s assume the frequency is already very close. Then all a phase locked loop needs to do is check whether the local clock is a little ahead (in phase) of the transitions in the incoming bitstream and, if so, slow the local clock down just a very little. If the local clock is a little behind the transitions in the incoming bitstream, the PLL needs to speed up the local clock (by a very little amount). FWIW building hardware that can both find the rough frequency of an incoming signal and closely track its phase takes some control theory and isn’t trivial.
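The nudge-a-little-each-edge behavior can be sketched in a few lines of Python. This is a software toy, not how any particular DAC does it: the function name, the gains `kp` and `ki`, and the clock periods are all assumptions chosen so the loop settles quickly. The proportional term corrects phase, and the integral term soaks up the small frequency difference between the two clocks.

```python
def run_pll(incoming_edges, nominal_period, start_offset=0.0, kp=0.1, ki=0.01):
    """Track incoming clock edges with a simple proportional-integral loop.

    Returns the phase error seen at each incoming edge.
    """
    local_edge = incoming_edges[0] + start_offset
    integral = 0.0
    errors = []
    for edge in incoming_edges:
        # error > 0: the local edge came first, so the local clock is ahead.
        error = edge - local_edge
        errors.append(error)
        integral += error
        # Ahead -> lengthen the next period (slow down);
        # behind -> shorten it (speed up).
        period = nominal_period + kp * error + ki * integral
        local_edge += period
    return errors


# Incoming clock runs 0.1% slower than the local oscillator's nominal rate,
# and the local clock starts 0.3 of a period late.
incoming = [n * 1.001 for n in range(1000)]
errors = run_pll(incoming, nominal_period=1.0, start_offset=0.3)

print(errors[0], errors[-1])
```

The first error is large and the last is essentially zero: the loop has pulled both the phase and the frequency of the local clock onto the incoming one.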
It may sound like we’re now home free: the receiver should be doing just about the same thing no matter which source. But speeding up a clock and slowing it down is adding jitter into the output signal! The best that a PLL can do is to low pass filter the incoming jitter. It smooths out the very tiny, fast errors in the incoming clock, but any jitter slower than the control loop’s cutoff frequency gets tracked faithfully and just slips thru as if the PLL weren’t there. Obviously one could try to have the control loop follow only very slow changes in the incoming clock phase, but the slower the PLL reacts, the longer it takes to lock onto the incoming signal when first applied. No one wants to wait minutes for a DAC to start playing.
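The low pass behavior is easy to see in a small simulation. The Python sketch below is self-contained and entirely illustrative: the function name, the loop gains, and the jitter amplitude are assumptions, and the loop is a simple proportional-integral software model rather than real hardware. It feeds the loop an incoming clock with sinusoidal jitter and reports how much of that jitter shows up on the recovered clock.

```python
import math


def local_jitter_ratio(jitter_period, n_edges=4000, amp=0.01, kp=0.1, ki=0.01):
    """Peak jitter on the recovered clock divided by peak jitter on the input.

    jitter_period is the period of the jitter, measured in clock edges.
    """
    period = 1.0                # nominal clock period, arbitrary units
    local_edge = 0.0
    integral = 0.0
    deviations = []
    for n in range(n_edges):
        incoming = n * period + amp * math.sin(2 * math.pi * n / jitter_period)
        # How far the recovered edge sits from where an ideal clock would be.
        deviations.append(local_edge - n * period)
        error = incoming - local_edge
        integral += error
        local_edge += period + kp * error + ki * integral
    settled = deviations[n_edges // 2:]   # ignore the lock-in transient
    return max(abs(d) for d in settled) / amp


slow = local_jitter_ratio(jitter_period=1000)   # jitter far below the loop cutoff
fast = local_jitter_ratio(jitter_period=4)      # jitter far above the loop cutoff

print(slow, fast)
```

The slow jitter comes out with a ratio close to 1 (it passes straight thru), while the fast jitter is cut to a small fraction. Lowering `kp` and `ki` moves the cutoff down, at the price of a longer lock-in time.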
There are other compromises, e.g. buffering the data and letting things get a little more than a sample out of phase, while keeping the buffer from getting full or empty. If the buffer is one sample big (easy to build reliably) then we call the clock receiver an FLL (frequency locked loop) since it doesn’t try to keep the phase exact…
Now we’re down to the question: how does jitter affect the DAC at the very point that the digital signal becomes an analog signal? The simplest way of thinking about it is to consider what happens with a quickly rising signal and some jitter. If the jitter causes the next sample to change the output level just a little too early, then the rising signal hasn’t had time to rise to the correct value. If the jitter causes the next sample to be processed just a little late, then the output signal goes a little higher than it should before the next sample is handled. This all applies in reverse if the outgoing signal is falling. For the more mathematically inclined, the result is that the spectrum of the final outgoing analog signal is, to a good approximation, the convolution of the ideal output’s spectrum with the spectrum of the jitter. To make things more concrete: if we are playing a perfect sine wave then the spectrum ideally would be a single spike at exactly the frequency of the sine wave. But jitter widens that spike by the shape of the spectrum of the jitter. So we have a little uncertainty about the frequency of the output coming from the uncertainty of the timing of the input. Note that this affects all frequencies of the material.
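The sine wave case can be checked numerically. The Python sketch below is a simplified model with made-up names and numbers: it evaluates a sine at slightly wrong instants (a stand-in for conversion happening at slightly wrong times) and uses a single sinusoidal jitter tone instead of random jitter, so the “widening” shows up as a pair of discrete sidebands on either side of the tone.

```python
import cmath
import math


def sampled_sine(n_total, tone_bin, jitter_bin, jitter_amp):
    """A sine with tone_bin cycles per block, sampled at jittered instants.

    jitter_amp is the peak timing error in sample periods.
    """
    out = []
    for n in range(n_total):
        t = n + jitter_amp * math.sin(2 * math.pi * jitter_bin * n / n_total)
        out.append(math.sin(2 * math.pi * tone_bin * t / n_total))
    return out


def bin_magnitude(x, k):
    """Magnitude of a single DFT bin, computed directly."""
    n_total = len(x)
    acc = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_total)
              for n in range(n_total))
    return abs(acc)


N, TONE, JIT = 1024, 100, 7
clean = sampled_sine(N, TONE, JIT, 0.0)
jittered = sampled_sine(N, TONE, JIT, 0.05)

# Sideband level relative to the tone itself, one jitter frequency away.
clean_side = bin_magnitude(clean, TONE + JIT) / bin_magnitude(clean, TONE)
jit_side = bin_magnitude(jittered, TONE + JIT) / bin_magnitude(jittered, TONE)

print(clean_side, jit_side)
```

Without jitter the energy sits in one bin. With jitter, new components appear at the tone frequency plus and minus the jitter frequency, which is the spectrum of the jitter stamped onto the tone. Because the size of the sidebands scales with the tone frequency times the timing error, higher tones get hit harder by the same jitter.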
I’ll stop now.