I got a response in the Roon Community (thanks for the tip, @Philipp_Schaefer). I would love to hear what @tedsmith has to say about it, as much of it goes beyond my knowledge:
From Roon:
We do indeed process DSD without performing a DSD->PCM conversion first. The signal path reflects that accurately.
I'm going to explain how it works. Keep in mind that there are some subtle technical details here, and some background knowledge is required to understand them fully. Processing DSD isn't nearly as straightforward as processing PCM. With the exception of a few simple operations, you can't process it directly in the 1-bit representation. There are more steps involved, but it's possible to perform those steps in a way that keeps all of the important properties of DSD intact.
First, I'll explain DSD->PCM conversion, because it makes the other technique easier to understand by comparison.
DSD->PCM conversion starts with a DSD signal and produces a signal with two characteristics:
- PCM representation (lower sample rate, wider samples)
- A low noise floor, as flat as possible, across the entire frequency range of the PCM format.
The first one is obvious: we need a PCM-like representation at the end. The second goal is more subtle: it says that the content of the signal must look like a PCM signal. It must be accepted and played properly by PCM equipment. It must be processable by downstream DSP processes that expect to work with PCM data, and so on. It must not cause damage to equipment that's expecting PCM.
This is accomplished in three steps:
1. Start with a DSD stream, and widen it from 1 bit per sample to 64 bits per sample.
2. Downsample it by 8x (so DSD64 → 352.8 kHz, DSD128 → 705.6 kHz, etc.).
3. Apply a low pass "reconstruction filter". This filter also exists in a DSD DAC; because PCM DACs do not have it and we are effectively simulating the DSD DAC, we must simulate that aspect here too.
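To make the three steps concrete, here is a minimal Python sketch. It is not Roon's code: the input format (a list of 0/1 values), the function names, and the filter are assumptions for illustration. Python floats are IEEE 754 doubles, so `widen` does produce 64-bit samples. DSD64 runs at 64 × 44.1 kHz = 2.8224 MHz, so dividing by 8 gives 352.8 kHz. The post lists downsampling and the reconstruction filter as separate steps; the sketch merges them into one block-averaging stage, because decimators are normally built with the low-pass filtering applied before or during decimation so that the DSD noise does not alias into the audio band. A boxcar average is a very weak filter, nowhere near a real reconstruction filter.

```python
# Minimal, hypothetical sketch of DSD->PCM conversion (not Roon's code).
# Assumes the DSD stream is a list of 0/1 ints; a boxcar average stands in
# for the real (much longer and steeper) reconstruction filter.

DSD64_RATE = 64 * 44_100  # 2,822,400 Hz
DECIMATION = 8


def widen(bits):
    """Step 1: map each 1-bit sample to a 64-bit float (1 -> +1.0, 0 -> -1.0)."""
    return [1.0 if b else -1.0 for b in bits]


def lowpass_and_decimate(samples, factor=DECIMATION):
    """Steps 2 and 3 combined: average each block of `factor` samples,
    then keep one output value per block."""
    out = []
    for start in range(0, len(samples) - factor + 1, factor):
        block = samples[start:start + factor]
        out.append(sum(block) / factor)
    return out


def dsd_to_pcm(bits, dsd_rate=DSD64_RATE):
    """Return (pcm_samples, pcm_rate)."""
    pcm = lowpass_and_decimate(widen(bits))
    return pcm, dsd_rate // DECIMATION


pcm, rate = dsd_to_pcm([1, 1, 1, 0, 1, 1, 0, 1] * 4)
print(rate)  # 352800
print(pcm)   # [0.5, 0.5, 0.5, 0.5]
```

Each block of eight bits contains six 1s and two 0s, so its widened average is (6 − 2) / 8 = 0.5: the density of 1s in a DSD stream is what encodes the signal level.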
The reconstruction filter removes the noise inherent to the DSD signal before it can reach equipment that might not be prepared to handle it. Most of the energy in a DSD signal lives in this noise (well over 95%), so even though the noise is all at inaudible high frequencies, it's important to filter it out so that your gear is not asked to turn that energy into loud, high-frequency sound.
If you look at a spectrogram of DSD->PCM converted data, it looks like a PCM signal. Depending on the source material, and the sensitivity of your spectrogram, you might see a bit of a very quiet noise floor in the area where the transition band of the noise shaping filter used during mastering crosses over with the transition band of the DSD->PCM low pass filter (30-60kHz for DSD64).
OK, so now that DSD->PCM is explained, let's talk about the case you're actually interested in: the one where we process and output DSD without converting it to PCM.
It works like this:
1. Start with a DSD stream, and widen it from 1 bit per sample to 64 bits per sample.
2. Apply a low pass filter to remove the bulk of the inherent noise energy from the widened signal.
3. Apply processing steps to the wide intermediate format.
4. Send the signal through a sigma-delta modulator to re-render the "wide" 64-bit stream into a 1-bit DSD stream.
The low pass filter (2) in this process might sound like the reconstruction filter we discussed above, but it is very different. It is much more lenient and less steep, and it only attenuates frequencies over 100 kHz. Those frequencies already have a very poor SNR because of the inherent noise shaping in DSD, so we can be sure that no meaningful information existed there in the first place.
Without the filter, sound quality suffers significantly, or the sigma-delta modulator risks becoming unstable (i.e. it starts outputting horrible sounds that can hurt your ears and, if you're unlucky, damage your gear too).
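Roon doesn't describe the filter beyond "lenient" and "only attenuates frequencies over 100 kHz", so the sketch below only illustrates what a gentle filter at DSD64 rates can look like. It assumes a second-order Butterworth low-pass (12 dB/octave) with a 100 kHz cutoff, using the coefficient formulas from the widely used RBJ Audio EQ Cookbook. The function names are hypothetical; `magnitude` evaluates the response on the unit circle so you can check how little the filter touches the audio band, and `biquad` applies it to a sample stream.

```python
import cmath
import math

DSD64_RATE = 64 * 44_100  # 2,822,400 Hz


def butterworth_lowpass(cutoff_hz, rate, q=1 / math.sqrt(2)):
    """Second-order low-pass coefficients (b, a) using the RBJ cookbook formulas."""
    w0 = 2.0 * math.pi * cutoff_hz / rate
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    b = [(1.0 - cos_w0) / 2.0, 1.0 - cos_w0, (1.0 - cos_w0) / 2.0]
    a = [1.0 + alpha, -2.0 * cos_w0, 1.0 - alpha]
    return b, a


def magnitude(b, a, freq_hz, rate):
    """Magnitude of the filter's frequency response at freq_hz."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / rate)  # z^-1 on the unit circle
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return abs(num / den)


def biquad(samples, b, a):
    """Filter samples with the biquad (Direct Form I, normalized by a[0])."""
    b0, b1, b2 = (c / a[0] for c in b)
    a1, a2 = a[1] / a[0], a[2] / a[0]
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


b, a = butterworth_lowpass(100_000, DSD64_RATE)
for f in (20_000, 100_000, 1_000_000):
    gain_db = 20 * math.log10(magnitude(b, a, f, DSD64_RATE))
    print(f"{f / 1000:7.0f} kHz: {gain_db:7.2f} dB")
```

Under these assumptions the response is down by well under 0.1 dB at 20 kHz and by 3 dB at the 100 kHz cutoff, then keeps falling toward the top of the DSD band, which is the "stay far away from musical content" property the post describes.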
At step (3) the signal is structurally similar to a PCM signal, in that it consists of a series of multi-bit samples. However, it does not have content typical of PCM signals, and it keeps the DSD sample rate. If you looked at a spectrogram of the intermediate format in (3), it would look just like DSD, except with the bulk of the noise above 100 kHz severely attenuated by the low pass filter.
By maintaining the original sample rate through processing, the time-domain characteristics of DSD are maintained. By designing the filter to stay far away from musical content, the frequency-domain characteristics are maintained too.
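The four steps can be strung together in a toy Python pipeline. Everything here is an assumption chosen for brevity, not Roon's design: a one-pole low-pass at 100 kHz stands in for the gentle filter, a plain gain change stands in for "processing", and a textbook second-order error-feedback modulator with noise transfer function (1 - z^-1)^2 does the re-rendering. All function names are hypothetical. Note that the signal stays at the DSD sample rate from input to output; nothing is decimated.

```python
import math

DSD64_RATE = 64 * 44_100  # 2,822,400 Hz


def widen(bits):
    """Step 1: 1-bit samples -> 64-bit floats (1 -> +1.0, 0 -> -1.0)."""
    return [1.0 if b else -1.0 for b in bits]


def gentle_lowpass(samples, cutoff_hz=100_000.0, rate=DSD64_RATE):
    """Step 2: one-pole low-pass (6 dB/octave), a stand-in for the real filter."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / rate)
    out, state = [], 0.0
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out


def apply_gain(samples, gain):
    """Step 3: example processing on the wide format (a volume change)."""
    return [gain * x for x in samples]


def sigma_delta_2nd_order(samples):
    """Step 4: error-feedback modulator with NTF (1 - z^-1)^2, output as 0/1 bits."""
    e1 = e2 = 0.0  # quantization errors of the previous two samples
    bits = []
    for x in samples:
        w = x - 2.0 * e1 + e2          # input plus shaped error feedback
        y = 1.0 if w >= 0.0 else -1.0  # 1-bit quantizer
        e2, e1 = e1, y - w             # shift the error history
        bits.append(1 if y > 0 else 0)
    return bits


def process_dsd(bits, gain):
    """Widen, filter, process and re-modulate, all at the DSD sample rate."""
    return sigma_delta_2nd_order(apply_gain(gentle_lowpass(widen(bits)), gain))


# A DSD-like source stream: the modulator itself encoding a constant level of 0.5.
source = sigma_delta_2nd_order([0.5] * 8000)
out = process_dsd(source, gain=0.5)
level = sum(1.0 if b else -1.0 for b in out) / len(out)
print(len(out) == len(source))  # True: no change in sample rate
print(round(level, 2))          # 0.25
```

A second-order modulator is used only because its behavior is easy to check by hand; practical DSD modulators typically use higher orders with carefully designed coefficients and explicit stability safeguards, which is where the filtering in step 2 matters.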
Sometimes this form of processing, or this intermediate format, is referred to as "DSD-Wide". We didn't use that term because some people have defined DSD-Wide as an 8-bit intermediate format (whereas we use 64 bits… a luxury of precision afforded to us by running on modern desktop-class CPUs), and I didn't want to create confusion.
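A little arithmetic shows why the extra width matters. The post doesn't say whether Roon's 64-bit samples are floating point or fixed point, so this sketch treats both the 8-bit and the 64-bit formats as signed fixed point spanning [-1, 1) purely for comparison. It also ignores the noise shaping that a real wide-DSD processor would apply to its own rounding error. `quantize` is a hypothetical helper that rounds to the nearest step without clipping.

```python
import math


def fixed_point_step_db(bits):
    """Size of one quantization step, in dB relative to full scale, for a
    signed fixed-point format spanning [-1, 1)."""
    step = 2.0 / (2 ** bits)
    return 20 * math.log10(step)


def quantize(x, bits):
    """Hypothetical helper: round x to the nearest step of a `bits`-wide
    signed fixed-point format (no clipping, no noise shaping)."""
    scale = 2 ** (bits - 1)
    return round(x * scale) / scale


print(round(fixed_point_step_db(8), 1))   # -42.1
print(round(fixed_point_step_db(64), 1))  # -379.3

# A sample at -60 dBFS (0.001) vanishes entirely in the 8-bit format.
print(quantize(0.001, 8))   # 0.0
print(quantize(0.001, 64))
```

With one step at roughly -42 dBFS, plain 8-bit rounding would erase quiet detail unless its error is noise-shaped out of the audio band, while a 64-bit intermediate makes rounding error a non-issue.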