Have you tried GPU Audio? I ask because I wonder whether the latest NVIDIA/AMD GPUs are powerful enough to handle 32 mono channels of audio as 32-bit floats at 11.3 MHz (the DSD256 sample rate). With this, conversion to DXD would no longer be necessary.
(32 channels in + 2 channels out) * 4 bytes/sample * 11289600 samples/s = 1,535,385,600 bytes/s, i.e. about 1.54 GB/s (1.43 GiB/s)
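For anyone who wants to check or vary the numbers, here is a minimal sketch of that bandwidth arithmetic in Python. The constant and function names are my own, and it assumes uncompressed, non-interleaved-overhead streams (just raw samples).

```python
# Raw streaming bandwidth for multichannel audio at the DSD256 sample rate.
# All names here are illustrative, not taken from any audio library.

SAMPLE_RATE_HZ = 256 * 44_100   # DSD256 rate = 11_289_600 samples/s
BYTES_PER_SAMPLE = 4            # 32-bit float
CHANNELS_IN = 32
CHANNELS_OUT = 2


def stream_bytes_per_second(channels, bytes_per_sample=BYTES_PER_SAMPLE,
                            rate_hz=SAMPLE_RATE_HZ):
    """Bytes per second needed to move `channels` raw streams."""
    return channels * bytes_per_sample * rate_hz


total = stream_bytes_per_second(CHANNELS_IN + CHANNELS_OUT)
print(f"{total} B/s = {total / 1e9:.2f} GB/s = {total / 2**30:.2f} GiB/s")
```

Changing `BYTES_PER_SAMPLE` to 2 gives the FP16 figure, which is half of the above.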
Recent SSDs can handle that, especially since audio files are mostly accessed sequentially.
GPUs have a host-to-device bandwidth at least 10x faster than that: a PCIe 3.0 x16 link is nominally about 16 GB/s, and PCIe 4.0 x16 about 32 GB/s.
NVIDIA A100 and H100 are available with 80 GB per device. That holds a bit over 50 seconds of those 32 channels.
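A rough sketch of how the buffer duration falls out, assuming "80 GB" means 80 * 10^9 bytes and that the whole device memory is used for samples (in practice some of it is not available). The names are hypothetical.

```python
# How many seconds of 32-bit float audio at the DSD256 rate fit in GPU memory.
# Assumption: 80 GB = 80e9 bytes, all of it usable for sample data.

SAMPLE_RATE_HZ = 11_289_600
BYTES_PER_SAMPLE = 4
DEVICE_MEMORY_BYTES = 80 * 10**9


def seconds_of_audio(memory_bytes, channels,
                     bytes_per_sample=BYTES_PER_SAMPLE,
                     rate_hz=SAMPLE_RATE_HZ):
    """Duration in seconds that `memory_bytes` can hold for `channels` streams."""
    return memory_bytes / (channels * bytes_per_sample * rate_hz)


inputs_only = seconds_of_audio(DEVICE_MEMORY_BYTES, channels=32)
inputs_and_outputs = seconds_of_audio(DEVICE_MEMORY_BYTES, channels=34)
print(f"32 channels: {inputs_only:.1f} s")
print(f"32 in + 2 out: {inputs_and_outputs:.1f} s")
```

Either way it lands a little above 50 seconds, so the device memory would act as a large streaming buffer rather than hold a whole session.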
Optionally, the above GPUs can compute in FP16 to halve the memory footprint, while keeping ~11 bits of resolution (the FP16 significand: 10 stored bits plus the implicit leading one).
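To make the ~11 bits concrete, here is a small sketch that rounds values to IEEE 754 half precision using only Python's standard `struct` module (format code `e`). The helper name `to_fp16` is my own; this only illustrates the number format, not how a GPU kernel would do the conversion.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Between 1.0 and 2.0, adjacent FP16 values are 2**-10 apart.
step = 2.0 ** -10
kept = to_fp16(1.0 + step)        # exactly representable, survives
lost = to_fp16(1.0 + step / 4)    # too fine for FP16, rounds back to 1.0

print(kept)  # 1.0009765625
print(lost)  # 1.0
```

So detail smaller than about one part in 2^11 relative to the sample's magnitude is lost when a value is stored as FP16.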
There may be a lot of software development to do before this can work, but it seems we have the technology to process at the DSD256 sample rate with 16- or 32-bit floats.