These comments suggest that you might be over-indexing on the 1-bit-ness of DSD, but that’s not a criticism – it’s just another example of why I hold a general disdain for the activities of marketing departments in any technical context.
Repeating a comment I made in a different thread recently, the 1-bit sampling of the DSD format is like jitter in that it truly only matters in precisely two places: the A-to-D conversion of recording and the D-to-A conversion of playback. And it only matters in those places because it allows us to build more accurate devices from physical materials, i.e. you can get closer to a perfectly linear response by switching one device between just two states than you can by combining multiple devices to represent four or more states.
Once you’ve done the A-to-D conversion and you’re in the purely mathematical realm there’s absolutely nothing lost by representing the signal with other symbols of as many bits as you care to use. In truth, multi-bit representations provide a more intuitively correct representation of the signal than DSD itself does, because in DSD 1 and 0 don’t mean “on” and “off”, they mean “positive” and “negative” values of equal magnitude. Zero isn’t negative the last time I checked. But I can use any multi-bit signed numeric system, whether integers or floats, and transcribe the DSD waveform into equal-valued positive and negative samples and the signal remains utterly true to the original.
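To make the transcription point concrete, here is a minimal Python sketch. The function names are mine, purely for illustration, and the bit pattern is made up rather than taken from a real DSD file. It maps each DSD bit to a signed sample of equal magnitude and then back again, showing that nothing is lost on the round trip.

```python
def dsd_bits_to_samples(bits):
    """Transcribe DSD bits to signed samples: 1 -> +1, 0 -> -1."""
    return [1 if b else -1 for b in bits]


def samples_to_dsd_bits(samples):
    """Inverse mapping: positive sample -> 1, negative sample -> 0."""
    return [1 if s > 0 else 0 for s in samples]


# A made-up fragment of a 1-bit stream (illustrative only).
dsd_bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]

signed = dsd_bits_to_samples(dsd_bits)
round_trip = samples_to_dsd_bits(signed)

print(signed)
print(round_trip == dsd_bits)
```

The same values could just as well be stored as 32-bit floats or 24-bit integers; the choice of container does not change the waveform.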
Multi-bit does not mean “chopped-up”. It means nothing more than the ability to define the amplitude of the waveform at a particular point in time with more precision than just “positive” or “negative”. It’s that extra precision which allows for mixing, by making the DSD waveform of track 1 smaller or larger relative to track 2 and then having enough room to add them together without losing any of their individual contribution.
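As a rough illustration of why the extra precision matters, the sketch below mixes two short streams of +1/-1 samples with different gains. The track data, the gain values and the `mix` helper are all assumptions for the example; exact fractions are used so the arithmetic can be checked without rounding. The mixed signal takes on in-between values that a single bit cannot express, yet each track's contribution can still be recovered exactly.

```python
from fractions import Fraction


def mix(track_a, track_b, gain_a, gain_b):
    """Scale two equal-length sample streams and sum them sample by sample."""
    return [gain_a * a + gain_b * b for a, b in zip(track_a, track_b)]


# Two made-up DSD streams, already transcribed to +1/-1 samples.
track_1 = [1, -1, 1, 1, -1, -1, 1, -1]
track_2 = [1, 1, -1, 1, -1, 1, -1, -1]

# Illustrative gains: track 1 louder than track 2, sum never exceeds full scale.
gain_1, gain_2 = Fraction(3, 4), Fraction(1, 4)
mixed = mix(track_1, track_2, gain_1, gain_2)

# Removing the scaled track 2 and undoing the gain gives track 1 back exactly.
recovered_1 = [(m - gain_2 * b) / gain_1 for m, b in zip(mixed, track_2)]

print(sorted(set(mixed)))
print(recovered_1 == track_1)
```

With these gains the mix needs four distinct levels (+1, +1/2, -1/2, -1), which is already more than one bit per sample can carry.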
If we want to deliver the final product as DSD (again, because that’s potentially beneficial for D-to-A conversion using physical devices) it does have to go through another sigma-delta modulation process to requantize our mixed high-precision waveform back to a 1-bit representation. But guess what…?
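For anyone who hasn't seen one, a sigma-delta modulator can be sketched in a few lines. What follows is a deliberately simple first-order version, which is an assumption on my part for readability; real DSD encoders use higher-order noise-shaping loops, and the function name is hypothetical. It accumulates the difference between the input and the previous 1-bit output and emits the sign of the accumulator.

```python
def sigma_delta_first_order(samples):
    """Requantize samples in [-1, 1] to a 1-bit stream (1 = +1, 0 = -1).

    First-order loop only: a teaching sketch, not a production DSD encoder.
    """
    integrator = 0.0
    feedback = 0.0  # previous output as a signed value; 0 before the first bit
    bits = []
    for x in samples:
        integrator += x - feedback          # accumulate the running error
        bit = 1 if integrator >= 0 else 0   # 1-bit quantizer
        feedback = 1.0 if bit else -1.0     # feed the quantized value back
        bits.append(bit)
    return bits


# A constant multi-bit level of +0.5, as might come out of a mix.
bits = sigma_delta_first_order([0.5] * 10_000)

# The average of the 1-bit output tracks the input level.
average = sum(1 if b else -1 for b in bits) / len(bits)
print(bits[:8], average)
```

The 1-bit output never equals 0.5 at any instant, but its density of ones does, which is the whole idea of the format.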
That’s exactly what happens when you mix DSD using analog systems too! And I promise you the pure digital approach has less noise and other distortions than the analog gear necessarily introduces.
The reason it hasn’t been done routinely in the past is not because it was impossible or sonically compromised but because it didn’t stack up in terms of cost/benefit. Most entry-level audio work is done with 96kHz sample rates. Working at DSD rates is roughly 30, 60, or 120 times more demanding on your workstation for DSD64, DSD128, and DSD256 (4xDSD) respectively. So people make trade-offs and do the one thing which actually does damage your original DSD waveform: they downsample to lower sampling frequencies. Even DXD, which is a very high res PCM encoding, is only about four times the sampling rate of 24/96.
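Those multiples are easy to check. The sketch below assumes the usual definitions: DSD rates are multiples of the 44.1kHz CD rate (DSD64 is 64 x 44.1kHz, and so on) and DXD runs at 352.8kHz. It only compares raw sample rates against 96kHz; actual workstation load also depends on word length and the processing involved.

```python
BASE_RATE_HZ = 44_100   # CD sample rate, the base for the DSD multiples
PCM_96K_HZ = 96_000     # the 96kHz working rate used for comparison

rates_hz = {
    "DSD64": 64 * BASE_RATE_HZ,    # 2.8224 MHz
    "DSD128": 128 * BASE_RATE_HZ,  # 5.6448 MHz
    "DSD256": 256 * BASE_RATE_HZ,  # 11.2896 MHz, i.e. 4xDSD
    "DXD": 8 * BASE_RATE_HZ,       # 352.8 kHz PCM
}

# Ratio of each format's sample rate to 96kHz.
ratios = {name: rate / PCM_96K_HZ for name, rate in rates_hz.items()}

for name, ratio in ratios.items():
    print(f"{name}: {rates_hz[name]} Hz, {ratio:.1f}x the 96kHz rate")
```

That gives about 29.4, 58.8 and 117.6 for the three DSD rates, and a little under 3.7 for DXD.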
But please note carefully: it’s the reduction in sample rate, not the use of more bits per sample, which does the damage to the original DSD signal from the recording desk. Single-bit-ness is only valuable right at the beginning and right at the end of the process.