I’m considering the DS given its logical simplicity and obviously very positive reviews. I’ve been trying to piece together bits of information from this forum, other forums, PSA videos, etc. about the DS. There has been a lot of speculation that an input DSD signal is somehow “degraded by being converted to PCM first” (rather than some kind of “pure DSD passthrough”) before being put out at 2xDSD.
I don’t believe this is the case (that the upsampling in any way “degrades” the DSD signal) and want to see if I have this roughly right. I apologize if this was clearly explained somewhere else.
To upsample PCM/DSD you essentially:
(1) add least significant bits to the signal (up to a total of 30 or 50?)
(2) duplicate data (or fill 0s) “in time”.
Thus there is no effective mathematical change to either a PCM or DSD signal: the signal is being multiplied by an integer factor in amplitude and being stretched by an integer factor in time. Keeping everything integer means the PCM or DSD signal is perfectly represented for any current sampling rate and bit depth (e.g. 1-bit DSD, or 16/24-bit PCM). Do I have this right so far?
Then you do (delta sigma?) magic to convert PCM to DSD, downsample and reclock it on the way out, and the LRC filter does its thing. Thus a DSD input signal isn’t really being converted to PCM in the traditional sense but simply scaled in amplitude and time to be a perfect multiple of the original. By construction there is no loss of information for DSD (or 2xDSD) input. Thus there is no difference between a theoretical “pure DSD passthrough” and what you are doing, since you’d probably buffer anyway (to remove jitter) and the signal ends up being the same. Is this correct?
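To make the “no information is lost” part concrete, here’s a toy sketch of what I mean in Python (just my own illustration of integer widening and zero stuffing, nothing to do with the DS’s actual internals):

```python
import numpy as np

# Toy illustration: widening the sample word (adding least significant bits)
# and stuffing zeros in time are both exactly invertible, so no information
# is lost.  The values and widths are arbitrary.
x = np.array([3, -2, 7, 0], dtype=np.int64)   # pretend these are 16-bit samples

widened = x << 14              # pad 14 extra LSBs: same values on a finer grid
stretched = np.zeros(len(widened) * 4, dtype=np.int64)
stretched[::4] = widened       # zero-stuff: 4x the sample rate, data unchanged

# Both steps can be undone exactly, recovering the original samples.
assert np.array_equal(stretched[::4] >> 14, x)
```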
Also, since everything is buffered, as long as the input signal from any input channel (coax, optical, USB) is bit-wise identical there is no effect on sound quality due to input type. This is what I would hope for any digital device (ignoring jitter), but perhaps the DS is one of the few that ensures this by design (by removing jitter from the source data).
Is this at all in the ballpark?
Thanks!
(I also have a few questions about FPGA but want to see if I’m starting from the right understanding of the basic process.)
Mostly correct. The step you left out is that after samples are widened (lossless) and zero stuffed (lossless) they are low pass filtered to “spread the signal into the zeros”. This is part of any upsampling - you need to remove the extra images caused by zero stuffing, or equivalently keep any downstream processing from seeing “aliases” (images) of the original signal. The frequency response and phase response of that filter determine how much “damage” is done to the input signal (PCM or DSD). DSD allows very gentle filters which can have clean time domain behavior (e.g. it allows keeping close to the original wave shape.) That filtering is required to upsample single rate DSD to higher rate DSD as well as to upsample PCM to that higher rate - I manage to use the same filter to upsample both PCM and DSD, allowing both to have the same math done on them: the only real difference is how many zeros are stuffed.
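(For anyone who wants to see the generic “zero stuff, then low pass filter” step in code, here’s a rough Python sketch - purely illustrative, and nothing like the actual filters in the DS:)

```python
import numpy as np
from scipy import signal

def upsample(x, factor, fs_in, numtaps=255):
    """Zero-stuff by `factor`, then low pass filter to remove the images,
    i.e. "spread the signal into the zeros".  The filter here is an arbitrary
    windowed-sinc design; its frequency/phase response is where any "damage"
    to the signal would come from."""
    stuffed = np.zeros(len(x) * factor)
    stuffed[::factor] = x * factor                    # keep the overall gain
    lp = signal.firwin(numtaps, cutoff=fs_in / 2,     # pass the original band
                       fs=fs_in * factor)
    return signal.lfilter(lp, 1.0, stuffed)

# e.g. 44.1k material -> 88.2k; conceptually, DSD just stuffs a different
# number of zeros through the same kind of filter.
fs = 44100
t = np.arange(1024) / fs
y = upsample(np.sin(2 * np.pi * 1000 * t), 2, fs)
```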
All of that aside, the issue of how much damage is done to single or double rate DSD is more related to the sigma delta modulator that produces the final DSD output from the FPGA. If my sigma delta modulator is well done there’s little damage; if it were poorly done there’d be more damage. You can think of sigma delta modulating DSD as a similar process to dithering PCM: both add some noise to allow keeping more resolution than might be expected from the final sample width. The sigma delta remodulator can add less noise in the audio band than the dithering noise does for PCM. It certainly adds much more noise overall, but that’s why we have a very high sampling rate - to have a place for that noise outside the audio band.
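Just for intuition, here’s a toy first order sigma delta modulator in Python - the real modulator in the DS is far more sophisticated, this only shows the “quantize, feed the error back, push the noise up and out of the audio band” idea:

```python
import numpy as np

def sigma_delta_first_order(x):
    """Toy first order sigma delta modulator: quantize each sample to +/-1
    and feed the quantization error back.  The feedback pushes the added
    noise toward high frequencies, away from the audio band, which is why
    a very high sample rate matters."""
    out = np.empty_like(x)
    acc = 0.0
    for i, sample in enumerate(x):
        acc += sample - (out[i - 1] if i else 0.0)
        out[i] = 1.0 if acc >= 0 else -1.0
    return out

# A 1kHz tone, heavily oversampled: the 1-bit stream still carries it,
# plus shaped noise that lives mostly far above the audio band.
fs = 64 * 44100
t = np.arange(fs // 100) / fs
bits = sigma_delta_first_order(0.5 * np.sin(2 * np.pi * 1000 * t))
```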
Ted, thanks very much. I think I get it (but I’ll have to study more on noise shaping/filtering, dithering etc.).
An overall benefit of this design is a unified treatment of both PCM and DSD, with both converted to DSD and with what you believe are the SQ benefits of the DSD format, if all of this is “done well”.
So far so good. I think what I don’t get is how converting PCM to DSD improves matters (subjective SQ) if essentially the same information is there. Jitter aside, does it come down to using different (cleaner/smoother?) filters (e.g. low/no pre-ringing) and different (or potentially less) “dithering” noise compared to typical PCM DAC implementations?
On the other hand, with this approach a DSD signal sees some degradation due to the upsampling/downsampling compared to a straight passthrough (which would only require an LP filter?), but going to higher rates plus good filtering at least minimizes the damage, correct?
I knew that DSD could sound correct and great and that it “only” needed a simple passive low pass filter for the analog output - You can’t have a simple passive low pass filter as the only analog output hardware for a PCM DAC implementation.
To simplify the passive output filter (which should lead to better sound) the filter is optimized for one frequency. I chose double rate DSD since single rate DSD needs analog output filters steeper than I’d like.
To get single rate DSD to double rate DSD there’s no choice but to implement an upsampling filter.
It turns out that I could get PCM upsampled to that same output rate by using the same math, just stuffing more zeros.
In my opinion the feature of DSD that makes the biggest difference in resultant sound is slow rolloff filtering (everywhere). The slower the better. Slow rolloff filters allow you to keep the timing information more correct: you can more easily have the output waveform look like the input waveform - transients aren’t distorted. Most PCM DACs have filters that mess with transients: they often have preringing for example. You can’t have (only) slow rolloff filters in a PCM DAC, you need steep filters somewhere, either digital or analog (or more likely both).
To get slow rolloff filters you need a high sample rate. So I consider the most important features of a DSD DAC to be a high sample rate and using only slowly rolling off filters. Too many people think that DSD is defined by a high sample rate and a single bit per sample. The number of bits in a sample isn’t important: one bit is sufficient for the final output but not necessary, more don’t hurt.
Since one bit output is sufficient for DSD and a one bit DAC is much easier to implement well with simple analog hardware I chose a one bit output from the FPGA - but the real “goodness” comes from a high sample rate and simple slow rolloff filters.
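As a rough numeric illustration (arbitrary example filters, not the ones in the DS): the same “protect the audio band” job needs a long, ringing filter at a PCM rate but only a short, gentle one at a DSD-like rate.

```python
import numpy as np
from scipy import signal

fs_pcm = 44100
steep = signal.firwin(1001, 20000, width=2050, fs=fs_pcm)   # brick-wall-ish

fs_dsd2 = 2 * 64 * 44100                                    # double rate DSD
gentle = signal.firwin(31, 50000, fs=fs_dsd2)               # slow rolloff

# A transient through the steep filter is smeared (and pre-rung) over tens of
# milliseconds; through the gentle filter it stays essentially intact.
print("steep filter impulse response: %.2f ms" % (len(steep) / fs_pcm * 1e3))
print("gentle filter impulse response: %.4f ms" % (len(gentle) / fs_dsd2 * 1e3))
```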
Ted Smith said
You can't have (only) slow rolloff filters in a PCM DAC, you need steep filters somewhere, either digital or analog (or more likely both).
Does this apply only to typical 24/192 upsampling DACs? If there was a PCM DAC that upsampled to 10xDSD, would there still be a need to have "steep filters somewhere"?
My other curiosity is how you end up with multiple versions of the FPGA “compile” which the PSA team then subjectively evaluates. Are these versions due to you choosing different filter parameters? Or are they due to the “random compiling” aspect of FPGA – which I admit I don’t understand, as I would hope that the “compile” would give a fixed deterministic result for a given source code/parameter configuration. I’m a programmer, but FPGA programming seems to involve trial & error which doesn’t exist in standard procedural/imperative (“do what I tell you”) programming approaches. (Again, pardon my ignorance here if I’ve got this entirely wrong :)
If a PCM DAC wants to keep the full bandwidth and S/N ratio of the original signal it needs a filter with an extremely steep rolloff (on the order of hundreds or thousands of dB / octave.) If you allow some compromise in frequency response or S/N ratio you can use shallower rolloffs, but they are still in the hundreds of dB / octave range.
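A rough back of the envelope for the 44.1k case - staying flat to 20kHz and being fully attenuated (say ~144dB, roughly a 24 bit noise floor) by the 22.05kHz Nyquist frequency crams all of the rolloff into about a seventh of an octave:

```python
import math

octaves = math.log2(22050 / 20000)   # ~0.14 of an octave of transition band
print(144 / octaves)                 # ~1000 dB per octave
```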
The FPGA compiler is solving a very complicated problem - doing it deterministically would require more resources than anyone has… By using simulated annealing (or other probabilistic algorithms) you may get an answer much quicker. Typically big FPGAs are compiled on a sea of machines and you take the first one that succeeds in compiling. The FPGA we are using isn’t huge and we aren’t stuffing it to the gills so all compiles typically complete, but each one has a different random number seed and some compiles take 2 - 10 times as long as others.
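If you want a feel for why different seeds give different (but equally valid) results, here’s a generic toy of the simulated annealing idea in Python - it has nothing to do with the internals of the real FPGA tools, it just shows the flavor of the algorithm: random moves, occasionally accepting a worse layout, different seeds wandering to different answers.

```python
import math
import random

def anneal(cost, neighbor, state, seed, temp=1.0, cooling=0.999, steps=20000):
    """Generic simulated annealing: accept worse states with a probability
    that shrinks as the temperature drops.  Every completed run satisfies
    the same 'specification' (the cost/constraints), but different seeds
    end up with different layouts."""
    rng = random.Random(seed)
    best = state
    for _ in range(steps):
        cand = neighbor(state, rng)
        delta = cost(cand) - cost(state)
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            state = cand
            if cost(state) < cost(best):
                best = state
        temp *= cooling
    return best

# Toy "placement": order 8 blocks over 8 fixed slots to minimize total wire
# length between consecutive blocks (purely illustrative).
wires = [(i, i + 1) for i in range(7)]
cost = lambda pos: sum(abs(pos[a] - pos[b]) for a, b in wires)

def neighbor(pos, rng):
    pos = list(pos)
    i, j = rng.randrange(len(pos)), rng.randrange(len(pos))
    pos[i], pos[j] = pos[j], pos[i]          # swap two blocks
    return pos

start = random.Random(0).sample(range(100), 8)
for seed in (1, 2, 3):
    print("seed", seed, "-> wire length", cost(anneal(cost, neighbor, start, seed)))
```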
Ted Smith said
By using simulated annealing (or other probabilistic algorithms) you may get an answer much quicker. [...] so all compiles typically complete, but each one has a different random number seed and some compiles take 2 - 10 times as long as others.
Interesting. I guess my real question is not whether (or even why) compiling requires something like simulated annealing to find a viable solution, but whether the results of all "complete" and "viable" FPGA compiles are functionally identical to your original transfer function.
That is, if your designed transfer function (the pure digital transformation up until the analog filter) maps input x(t) to output y(t’), do all “valid FPGA compilations” of that function implement this mathematical function exactly (though perhaps with different inefficiencies based on the specific resulting network)? Or does the FPGA approach itself mean that the mapping of x(t) to your target y(t’) is necessarily approximate?
The short answer is all compiles are functionally identical to the original “code”.
The FPGA is basically a sea of a few flavors of logic blocks and potential interconnections of them. Each compile is trying to find a mapping from the specification provided by the user to those FPGA resources. That specification includes things like logical equations, state machines and timing requirements. So all FPGA compiles will be functionally identical within the freedom allowed by the specifications. In addition the compile process produces warnings and information messages about potential mistakes that might trip someone up as well as reports about which resources are used for each piece of “code”, etc. There are also simulators that do behavioral simulation (so you can check your specification for completeness and correctness) or logic level simulations so you can check the translation to FPGA resources…
The compiles differ in which and how many FPGA resources are used for each piece of source code. The resultant compile has to fit in the FPGA and it has to meet the timing specifications (e.g. run with the clocks specified and meet setup and hold times for all FPGA resources and the specified input and output timings between the FPGA and the outside world.) There are many options which allow the user to suggest whether a piece of source should be optimized for speed, time, or, for example, which kind of FPGA resources should be favored. The user can let the compiler do all of the work and/or restrict certain pieces of source to specific FPGA resources and/or hand place and wire small or large portions of the source. (Sort of like having the choice of using machine code, assembly code, procedural code or functional code for each piece of the source in more traditional programming.)
Is there a way to configure SA parameters prior to compiling the FPGA code?
I just wonder if additional time and effort spent here could constrain the variation in the outcome and thus make it homogeneous and predictable, resulting in less time spent on the listening/voicing task?
I realize that this could be time consuming and resource demanding, but I would guess it would also add quality and increased understanding = competitive edge.
Ted Smith said
So all FPGA compiles will be functionally identical within the freedom allowed by the specifications.
The code is identical but the specific "layout" is randomly generated to satisfy the functional specification and other constraints. Makes sense.
My main follow up is then: what “versions” are you and PSA listening to in order to determine the one that gets released? Are you selecting from a set of compiles that match the spec (identical “code”) but can still sound different? Or are you evaluating different codes?
Secondly (and this might also dovetail into Frode’s question): if you could identify the parameters (or error metrics) that correlate with “high SQ” (or inversely those that likely result in lower SQ), then perhaps this could be codified in an automatic filtering process. However, I would imagine that certain vague subjective parameters like “bass slam”, “imaging”, “attack”, “air”, “PRaT” etc are all but impossible to codify as input parameters or even measurable as outputs (e.g. version A has quantifiably more “bass slam” than version B).
We use identical code (even changing a version number can affect sound quality significantly) for a set of compiles - the only difference compile to compile is the seed for the random number generator.
All parameters of the simulated annealing are fixed before any compile (including the specific random number seeds.) But, if the placement and routing is not precisely specified, changing even one wire somewhere can result in a completely different layout. There are a lot of levels of locking placement and layout from region based or neighbor based, down to locking individual wires. When the FPGA is fairly full it paid for us to lock some of the bigger resources, i.e. we got similar enough routes that they had a sound quality flavor in common. The FPGA is now a little emptier and it pays to let the route be unrestricted in case we can find a shining route or two as far as sound quality.
The major thing that’s changing is the output noise/jitter of the FPGA DSD output lines. In essence there isn’t any control of the sound quality short of picking a noise/jitter level to get the most pleasing sound. I believe that if we could directly measure that jitter we could just pick the compile with the lowest jitter and probably not have to pick by ear. But remember, a little noise/jitter adds the impression of more detail so comparing the “optimal” build to the ones we are used to may leave a few people thinking we’ve lost some detail when in fact we are more faithfully reproducing the input.
Ted Smith said
I believe that if we could directly measure that jitter we could just pick the compile with the lowest jitter and probably not have to pick by ear. But remember, a little noise/jitter adds the impression of more detail so comparing the "optimal" build to the ones we are used to may leave a few people thinking we've lost some detail when in fact we are more faithfully reproducing the input.
If you could measure it (which is at least theoretically possible) AND minimize it (by automatic trial-and-error search?), then you'd more "faithfully reproduce the input" which seems like the holy grail of digital-to-analog conversion!
If some customers feel certain aspects “go missing”, you could say, “no, nothing is missing and nothing is added” or “the least possible is missing or added”. Then audiophiles could use their components to tweak and “voice” the results to taste, knowing that at least they are starting with the closest translation of the original source material.
Have you thought about simulating a typical PCM DAC on the FPGA as a “before” case to contrast to the current/best you can do now (though I don’t know how you’d simulate true jitter)? It might then make the differences more plainly obvious. The risk might be that some might like it better… but then you might increase your customer base;)
What ult67 says resembles the PCM pre-ringing that creates an extra snap in the sound that many prefer over a smoother-sounding DSD. So, being truthful to the origin doesn’t necessarily bring about the most preferred sound.
ult67 said
Have you thought about simulating a typical PCM DAC on the FPGA as a "before" case to contrast to the current/best you can do now (though I don't know how you'd simulate true jitter)?
In essence I am simulating a typical PCM DAC, I need a steep cutoff filter for 16/44.1 ... 24/96 so I use a typical PCM upsampling filter for those PCM upsample by 2 filters. For 44.1kHz I go from unity response at 20kHz to -144dB at 22.05kHz (and similarly for the 48, 88.2 and 96kHz sample rates.) Tho I could add some varied rolloff filters like most DACs (in reality they are simply choosing different cheap to implement filters), instead I use a single expensive filter that doesn't have the various anomalies that each of those cheap to implement filters has. For these filters I chose the kind of filters that the mastering engineers probably used so I can most accurately reproduce their intentions - i.e. tho in general I try to not mess with the wave shape, I do have some preringing in the PCM upsampling filters for the lower PCM rates.
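Just to give a feel for the scale of a filter like that, here’s a throwaway window-method design with roughly that spec in Python (emphatically not the actual filter in the DS):

```python
import numpy as np
from scipy import signal

# Rough sketch: flat to 20kHz, ~144dB down by 22.05kHz, at the 2x (88.2kHz)
# output rate.  Kaiser window method; the real filter is designed differently.
fs_out = 88200
numtaps, beta = signal.kaiserord(144, (22050 - 20000) / (0.5 * fs_out))
numtaps |= 1                                   # odd length -> linear phase
taps = signal.firwin(numtaps, 21025, window=('kaiser', beta), fs=fs_out)

w, h = signal.freqz(taps, worN=8192, fs=fs_out)
stop = np.abs(h[w >= 22050]).max()
print(numtaps, "taps,", 20 * np.log10(stop + 1e-300), "dB at/above 22.05kHz")
```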
In general in the DirectStream development if I tried every idea I had I’d have taken significantly longer than I did to develop it. At each juncture I tried to do the best I could (within budgets) based on my experience and research. When I noticed a problem or Paul (and others) complained about something I’d try to figure out what I did that could cause the complaint and often I could find a better way of doing it to avoid their complaint. If I were doing the DS from scratch I believe I could make a better version with what I’ve learned since its release, but as we’ve seen, tho the release process isn’t ideal (e.g. requiring extensive listening), we can still make significant improvements just in software.
There’s suddenly this big question of “how do we consistently minimise the noise and jitter of the 2xDSD stream going to the output stage, such that sound quality is impacted only by deliberate algorithm choices and not by random FPGA layout characteristics?” Is there some way you could separate the processing, the FPGA itself, away from the output stage and have some kind of ultra clean 2xDSD input buffer which can be optimised once and for all?
I’m even thinking so far as breaking the DirectStream into two separate boxes, with the inputs, controls and FPGA in one box, streaming raw DSD over two high-spec optical fibres to the second box which contains not much more than the smallest buffer you can get away with for clock rate tracking, feeding into the output stage. Less extreme would just be optocouplers and a buffer inside the existing DS chassis.
I don’t think you need to go to those extremes. In order to keep in budget I made an explicit choice to simplify the reclocking outside of the FPGA. We knew that that would have some repercussions, but not doing it would have added significantly to the cost of the DS. A little extra work for us on each FPGA software release is probably worth the lower cost to you guys. If I were building a cost no object DAC I’d certainly add back in some more reclocking after the FPGA. Having the lines between the FPGA and the reclocker too long will also add some jitter and noise so there’s always a tradeoff.
In the FPGA it’s the output buffer that matters, but the jitter and noise don’t come from the input jitter or jitter added during processing, it comes from the very gates the FPGA uses to drive the outside world - they are CMOS and hence make more jitter than we want. Also, altho I use a separate power supply for the DSD output lines of the FPGA, a little noise still pollutes that supply inside the FPGA and adds noise. FPGAs aren’t optimized for low analog noise, they are optimized for digital speed and reliability. Speed causes noise…
I should have mentioned that the FPGA clocks are low jitter as far as they are concerned (low enough to not cause setup and hold errors when they are used), but high jitter compared to what we need. No amount of buffering with an FPGA will lower jitter enough since the clock already has too much jitter. They do have multiple different kinds of flip flops available and different qualities of clocks, but all in all everything has too much jitter and noise for us.
Ted Smith said
In essence I am simulating a typical PCM DAC, I need a steep cutoff filter for 16/44.1 ... 24/96 so I use a typical PCM upsampling filter for those PCM upsample by 2 filters. For 44.1kHz I go from unity response at 20kHz to -144dB at 22.05kHz (and similarly for the 48, 88.2 and 96kHz sample rates.)
I thought you upsampled all inputs to the 10xDSD rate, did the delta sigma, etc., applying shallow filters everywhere to finally get to 2xDSD and then a final shallow LP output filter.
With a traditional steep LP filter on 16/44 you have preringing, phase effects, etc., don’t you? How is there an SQ improvement over traditional approaches – is it mostly jitter rejection for the lower-res inputs?
If the signal is 44.1k or 48k PCM it’s upsampled to 88.2 or 96k
Then if the signal is 88.2k or 96k PCM it’s upsampled to 176.4k or 192k
Then if the signal is PCM it’s upsampled to 10 x the DSD rate.
A steep filter is required for 44.1k and 48k PCM; if any other filter is used the signal will lose a lot of high frequency info.
It’s arguable how steep a filter is required for 88.2k and 96k PCM; there are differing opinions about the right balance of filter slope and attenuation of ultrasonic signals…
The output of the DS DAC has a gentle passive analog filter that starts rolling off at about 50k so 176.4k and 192k PCM can be more gently filtered since the signals above 88.2k will be gently filtered anyway.
There are choices for these filters, and for lower rate PCM upsampling filters I always leaned towards constant group delay filters (they preserve wave shape) instead of, say, minimum phase PCM filters. A constant group delay PCM filter will have a symmetrical time response, i.e. preringing. This sounds more correct to me than attempting to get rid of preringing at the expense of changing the wave shape. I always thought that preringing was the root of all evil in PCM processing, but if I want to preserve wave shape and not roll off the high frequencies of PCM below, say, 100kHz then there has to be preringing.
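If you want to see that tradeoff for yourself, scipy can turn a linear phase (constant group delay) lowpass into an approximately minimum phase one - the preringing goes away, but the impulse response (and so the wave shape) is no longer symmetric. A generic illustration, not the DS’s filters:

```python
import numpy as np
from scipy import signal

# Constant group delay (linear phase) lowpass: symmetric impulse response,
# so it rings before as well as after the main peak (preringing) but
# preserves wave shape.
lin = signal.firwin(255, 20000, fs=88200)

# Approximate minimum phase counterpart (scipy's homomorphic method only
# approximates the original magnitude response): the ringing all moves to
# after the peak, at the cost of an asymmetric response.
minph = signal.minimum_phase(lin, method='homomorphic')

for name, taps in (("linear phase", lin), ("minimum phase", minph)):
    peak = np.argmax(np.abs(taps))
    pre = np.sum(taps[:peak] ** 2) / np.sum(taps ** 2)
    print("%s: %.1f%% of impulse energy before the peak" % (name, 100 * pre))
```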