P.S. For those with an unhealthy curiosity about clocking in the FPGA, here’s the 114-page description of it:
So, reading between the lines: buying an expensive DS DAC here in the UK is a waste, as the Mk II is far superior, and even more expensive. That’s a joke, BTW.
Ted, were any of the parts changes such that, had the originally intended part still been available, the MkII would be “better”?
No, in general we could find a part that was only slightly more expensive. In the last fix we definitely were better off after the change: the isolated 5V to 5V power supplies on the original MK II were using a part specifically made to do what we were doing; then it (and a few of its relatives) disappeared. Now we’re doing what the TSS did, which takes more parts, costs more, but is quieter. Having two FPGAs is the same kind of thing. All of the medium-sized FPGAs went away quite a while back. Using two smaller FPGAs seemed like a good idea from many points of view. (Each of the ones we are using is still larger than the DS’s FPGA.)
For I2S and USB galvanic isolation the chunk of hardware that’s physically connected to the connectors needs power. That power has to be galvanically isolated from the rest of the DAC. A simple transformer won’t pass DC power so something else is needed. Most of the time a switching power supply which incorporates a transformer is used. By using a pair of caps I can avoid the EMI downsides of a transformer. So the main purpose is breaking ground loops, but I use a balanced implementation to reject common mode noise also.
I’m not sure what you mean by “final DA conversion”. After upsampling the sigma delta modulator is what converts the very wide samples into one bit.
But the two FPGAs have 10 (9?) high-speed connections between them, so I can divide up the software however I want from release to release: one channel in each, or upsampling in one and everything else in the other, etc. With the multiple connections I could go back and forth between the FPGAs multiple times if needed. Right now the code is using approx one half of one FPGA, and its buffering and upsamplers are twice as big as the DS’s.
Nothing in the real world is perfect. So, just for argument’s sake let’s assume that a reclocker is, say, 90% effective in reducing jitter/noise. Doing another one could get to 99% effective?
One thing that’s happening is that when signals arrive at the reclocker they cause small amounts of current to flow simply because there’s something recording that they got there. Those small changes in current affect the power supply which itself isn’t perfect. These changes in the power supply voltages ripple for a little while and may cause later transitions in the reclocker to happen minutely early or late… A simpler way of seeing things is that jitter can cause noise in the system and noise in the system can cause jitter… So we need the best power supplies we can get along with the most consistent timing. Having two reclockers with separate power supplies will make a difference. There’s clearly a point of diminishing returns and probably two reclockers is near that.
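As a rough illustration of the arithmetic behind the 90% / 99% question: if each reclocker independently removed a fixed fraction of the incoming jitter, cascaded stages would combine as in the sketch below. That independence is an idealized assumption; as described above, real stages interact through their power supplies, so actual results will fall short of this. The function name and the 90% figure are illustrative only.

```python
def residual_after_stages(effectiveness, stages):
    """Fraction of the original jitter left after `stages` identical,
    fully independent reclockers (an idealized assumption)."""
    return (1.0 - effectiveness) ** stages


# Hypothetical 90%-effective stage, as in the question.
for n in (1, 2, 3):
    left = residual_after_stages(0.90, n)
    print(f"{n} stage(s): {100 * (1 - left):.1f}% of the jitter removed")
```

Each extra stage only works on what the previous one left behind, which is one way to see the diminishing returns.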
The jitter in the FPGA causes different problems in the FPGA than we usually talk about in audio. If you don’t quite know when a clock is going to transition (the very definition of jitter) then you need to allow for bigger margins in the timing in case the clock is late. The more jitter there is, the smaller the part of each clock cycle that can be counted on to be transition free. Tightening up jitter can allow more work per clock cycle. On the other hand it can cost more power and that has its own downsides. At times the incoming clock to the FPGA has too much jitter. Then it may be important to spend resources filtering the incoming jitter to the PLL rather than controlling the outgoing jitter from the PLL.
It is the case that jitter in the FPGA causes the outputs of the DSD from the FPGA to have a little unwanted jitter, but jitter and noise go hand in hand inside the FPGA and some of that generated jitter and noise affects the FPGA’s power supplies which, in turn, affects the rest of the board. That in turn will cause small effects in the outgoing audio. Keeping jitter and noise down everywhere is the best way to get a clean output. Simply assuming a reclocker can fix everything isn’t enough.
Just for fun here is the report for one path (of 31133 paths analyzed). Note the Clock Uncertainty about 15 lines down. It shows some of the jitter considerations that could cause uncertainty in the period of the clock.
Slack (MET) : 0.268ns (required time - arrival time)
Source: SDM/SDM[0].co_in_tristate_oe_reg[0]/C
(rising edge-triggered cell FDRE clocked by clk_80x_out_clk_local {rise@0.000ns fall@2.214ns period=4.429ns})
Destination: SDM/SDM[0].co3a_reg[44]/D
(rising edge-triggered cell FDRE clocked by clk_80x_out_clk_local {rise@0.000ns fall@2.214ns period=4.429ns})
Path Group: clk_80x_out_clk_local
Path Type: Setup (Max at Slow Process Corner)
Requirement: 4.429ns (clk_80x_out_clk_local rise@4.429ns - clk_80x_out_clk_local rise@0.000ns)
Data Path Delay: 3.978ns (logic 2.446ns (61.493%) route 1.532ns (38.507%))
Logic Levels: 13 (CARRY4=11 LUT2=1 LUT4=1)
Clock Path Skew: -0.119ns (DCD - SCD + CPR)
Destination Clock Delay (DCD): -0.908ns = ( 3.521 - 4.429 )
Source Clock Delay (SCD): -0.438ns
Clock Pessimism Removal (CPR): 0.350ns
Clock Uncertainty: 0.123ns ((TSJ^2 + DJ^2)^1/2) / 2 + PE
Total System Jitter (TSJ): 0.071ns
Discrete Jitter (DJ): 0.235ns
Phase Error (PE): 0.000ns
Location Delay type Incr(ns) Path(ns) Netlist Resource(s)
------------------------------------------------------------------- -------------------
(clock clk_80x_out_clk_local rise edge)
0.000 0.000 r
D14 0.000 0.000 r Analog_Clock (IN)
net (fo=0) 0.000 0.000 AnalogClockMMCM/inst/clk_in
D14 IBUF (Prop_ibuf_I_O) 1.416 1.416 r AnalogClockMMCM/inst/clkin1_ibufg/O
net (fo=1, routed) 1.065 2.482 AnalogClockMMCM/inst/clk_in_clk_local
MMCME2_ADV_X0Y0 MMCME2_ADV (Prop_mmcme2_adv_CLKIN1_CLKOUT2)
-5.773 -3.292 r AnalogClockMMCM/inst/mmcm_adv_inst/CLKOUT2
net (fo=2, routed) 1.419 -1.873 AnalogClockMMCM/inst/clk_80x_out_clk_local
BUFGCTRL_X0Y0 BUFGCTRL (Prop_bufgctrl_I0_O)
0.081 -1.792 r AnalogClockMMCM/inst/clkout3_buf/O
net (fo=12961, routed) 1.354 -0.438 SDM/Clock80x
SLICE_X30Y55 FDRE r SDM/SDM[0].co_in_tristate_oe_reg[0]/C
------------------------------------------------------------------- -------------------
SLICE_X30Y55 FDRE (Prop_fdre_C_Q) 0.433 -0.005 r SDM/SDM[0].co_in_tristate_oe_reg[0]/Q
net (fo=1, routed) 0.416 0.411 SDM/SDM[0].co_in_tristate_oe_reg_n_0_[0]
SLICE_X30Y55 LUT2 (Prop_lut2_I0_O) 0.126 0.537 r SDM/SDM[0].co3b[0]_i_1/O
net (fo=6, routed) 1.107 1.645 SDM/SDM[0].co_in[0]
SLICE_X13Y67 LUT4 (Prop_lut4_I0_O) 0.283 1.928 r SDM/SDM[0].co3a[6]_i_7/O
net (fo=1, routed) 0.000 1.928 SDM/SDM[0].co3a[6]_i_7_n_0
SLICE_X13Y67 CARRY4 (Prop_carry4_S[1]_CO[3])
0.457 2.385 r SDM/SDM[0].co3a_reg[6]_i_1/CO[3]
net (fo=1, routed) 0.000 2.385 SDM/SDM[0].co3a_reg[6]_i_1_n_0
SLICE_X13Y68 CARRY4 (Prop_carry4_CI_CO[3])
0.098 2.483 r SDM/SDM[0].co3a_reg[10]_i_1/CO[3]
net (fo=1, routed) 0.000 2.483 SDM/SDM[0].co3a_reg[10]_i_1_n_0
SLICE_X13Y69 CARRY4 (Prop_carry4_CI_CO[3])
0.098 2.581 r SDM/SDM[0].co3a_reg[14]_i_1/CO[3]
net (fo=1, routed) 0.000 2.581 SDM/SDM[0].co3a_reg[14]_i_1_n_0
SLICE_X13Y70 CARRY4 (Prop_carry4_CI_CO[3])
0.098 2.679 r SDM/SDM[0].co3a_reg[18]_i_1/CO[3]
net (fo=1, routed) 0.000 2.679 SDM/SDM[0].co3a_reg[18]_i_1_n_0
SLICE_X13Y71 CARRY4 (Prop_carry4_CI_CO[3])
0.098 2.777 r SDM/SDM[0].co3a_reg[22]_i_1/CO[3]
net (fo=1, routed) 0.000 2.777 SDM/SDM[0].co3a_reg[22]_i_1_n_0
SLICE_X13Y72 CARRY4 (Prop_carry4_CI_CO[3])
0.098 2.875 r SDM/SDM[0].co3a_reg[26]_i_1/CO[3]
net (fo=1, routed) 0.000 2.875 SDM/SDM[0].co3a_reg[26]_i_1_n_0
SLICE_X13Y73 CARRY4 (Prop_carry4_CI_CO[3])
0.098 2.973 r SDM/SDM[0].co3a_reg[30]_i_1/CO[3]
net (fo=1, routed) 0.000 2.973 SDM/SDM[0].co3a_reg[30]_i_1_n_0
SLICE_X13Y74 CARRY4 (Prop_carry4_CI_CO[3])
0.098 3.071 r SDM/SDM[0].co3a_reg[34]_i_1/CO[3]
net (fo=1, routed) 0.008 3.079 SDM/SDM[0].co3a_reg[34]_i_1_n_0
SLICE_X13Y75 CARRY4 (Prop_carry4_CI_CO[3])
0.098 3.177 r SDM/SDM[0].co3a_reg[38]_i_1/CO[3]
net (fo=1, routed) 0.000 3.177 SDM/SDM[0].co3a_reg[38]_i_1_n_0
SLICE_X13Y76 CARRY4 (Prop_carry4_CI_CO[3])
0.098 3.275 r SDM/SDM[0].co3a_reg[42]_i_1/CO[3]
net (fo=1, routed) 0.000 3.275 SDM/SDM[0].co3a_reg[42]_i_1_n_0
SLICE_X13Y77 CARRY4 (Prop_carry4_CI_O[1])
0.265 3.540 r SDM/SDM[0].co3a_reg[46]_i_1/O[1]
net (fo=1, routed) 0.000 3.540 SDM/SDM[0].co3a_reg[46]_i_1_n_6
SLICE_X13Y77 FDRE r SDM/SDM[0].co3a_reg[44]/D
------------------------------------------------------------------- -------------------
(clock clk_80x_out_clk_local rise edge)
4.429 4.429 r
D14 0.000 4.429 r Analog_Clock (IN)
net (fo=0) 0.000 4.429 AnalogClockMMCM/inst/clk_in
D14 IBUF (Prop_ibuf_I_O) 1.351 5.779 r AnalogClockMMCM/inst/clkin1_ibufg/O
net (fo=1, routed) 1.004 6.783 AnalogClockMMCM/inst/clk_in_clk_local
MMCME2_ADV_X0Y0 MMCME2_ADV (Prop_mmcme2_adv_CLKIN1_CLKOUT2)
-5.919 0.864 r AnalogClockMMCM/inst/mmcm_adv_inst/CLKOUT2
net (fo=2, routed) 1.352 2.216 AnalogClockMMCM/inst/clk_80x_out_clk_local
BUFGCTRL_X0Y0 BUFGCTRL (Prop_bufgctrl_I0_O)
0.077 2.293 r AnalogClockMMCM/inst/clkout3_buf/O
net (fo=12961, routed) 1.228 3.521 SDM/Clock80x
SLICE_X13Y77 FDRE r SDM/SDM[0].co3a_reg[44]/C
clock pessimism 0.350 3.872
clock uncertainty -0.123 3.749
SLICE_X13Y77 FDRE (Setup_fdre_C_D) 0.059 3.808 SDM/SDM[0].co3a_reg[44]
-------------------------------------------------------------------
required time 3.808
arrival time -3.540
-------------------------------------------------------------------
slack 0.268
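For anyone who wants to check the headline numbers in that report, here is a small sketch that recomputes them from the values printed above. The formulas are the ones the report itself states (clock uncertainty, and slack as required time minus arrival time); the function names are mine.

```python
import math

# Values copied from the report above (all in ns).
TSJ = 0.071   # Total System Jitter
DJ = 0.235    # Discrete Jitter
PE = 0.000    # Phase Error


def clock_uncertainty(tsj, dj, pe):
    """Clock uncertainty as printed in the report: sqrt(TSJ^2 + DJ^2) / 2 + PE."""
    return math.sqrt(tsj ** 2 + dj ** 2) / 2 + pe


def setup_slack(required_time, arrival_time):
    """Slack = required time - arrival time (positive means timing is met)."""
    return required_time - arrival_time


uncertainty = clock_uncertainty(TSJ, DJ, PE)
slack = setup_slack(3.808, 3.540)
# Data path delay = arrival time at the destination minus source clock delay.
data_path_delay = 3.540 - (-0.438)

print(f"clock uncertainty = {uncertainty:.3f} ns")
print(f"slack             = {slack:.3f} ns")
print(f"data path delay   = {data_path_delay:.3f} ns")
```

The uncertainty comes straight off the required time, so on this path roughly a third of the remaining 0.268 ns margin would be recovered if the jitter terms were zero.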
I thought I’d turn my attention to the analog card for a little.
Here’s the “big picture” (it’s actually a physical rev older than the current boards, but I’d have to tear my Mk II apart to get current pictures.)
Everything on the top half of the board is power supplies.
The power factor correction is the area defined by the 6 large (for surface mount) caps. Down the center between the two rows of caps are diodes and resistors.
In the top right are the high level 14V and 7V supplies. Below them on the right are the top level 12V analog supply and the top level 5V digital supply, both with LT3045 regulators with plenty of bypassing. Then below those on the right, just above the halfway point, are the 3.3V CMOS and 3.3V relay regulators.
The center of the bottom of the board is the DSD coming from the digital card, the isolators, reclocker and VCXO (I turned it on its side.) I forgot to label the 1st reclocker, it’s the black chip between the isolators and the main reclocker.
Then on either side of the middle are the attenuators and the “digital switches” (really video opamps) which convert from a digital signal to the prefiltered analog output signal. There are four in parallel for each channel. The 10V regulator for this channel is on the right: another LT3045 with lots of bypass. Along the bottom of the image are the voltage stabilization caps for this channel’s 10V and the “analog ground”, a.k.a. the Vocm (output common mode voltage), which is set at one half of the analog 10V.
And below that is the audio output transformer and the output connector configuration. (We’ll talk about that later.)
We’ll drill down on these various sections in later notes.
Ted, as a transformer seems to be very sensitive to shielding, isn’t it also sensitive to being placed asymmetrically close to the casework sidewalls?
The case walls are pretty far from the transformers. The effect of shielding goes down quickly with distance.
All things being equal I’d like things to be symmetric, but putting the power in the middle of the box is the worst place. I’d also like to have more room between the power inlet (and fuse) and the right channel output and I’ve shoved things over twice to get a little more space, but squeezing the two channels together too tight is also a problem.
[Edit: The current boards do have more space between the fuse and the right channel output than the picture above.]
Thank goodness I only have to look at one or two of these when the FPGA doesn’t route. Unlike the DS, the DS Mk II almost always routes. Periodically I do about 300 different routes with different settings and see if they all route. If they don’t, I look at the ones that failed to see whether they have something in common that I can fix simply and get more clean routes.
@tedsmith Ted - Did you manage to get the 16fS sample rates working? You mentioned you were hoping for 705.6kHz …and this would be great to support software upsampling.
Let’s take a look at the analog output options, here’s the output schematic (slightly abbreviated):
The green boxes are solid state relays; they can be either closed or open. They are by default closed with no power which, for example, engages the mute.
All of the green box settings will default to the settings that match the DS. Grounds not lifted, etc.
The bottommost green box is the simplest: the ground lift for this channel.
Then there’s the rightmost green box: the balanced output’s (XLR’s) shield lift.
Both of these will probably sound a little better lifted in most systems. But some interconnects have a shield that’s only connected on the source end. Systems with these interconnects may need to tie the shield or XLR ground to the system ground to drain off any RFI that the cable’s shield picks up.
Mute is pretty obvious; they just short their outputs to the connector’s ground (which itself may or may not be grounded.)
Symmetric vs. unsymmetric drive is perhaps not expected. Balanced outputs don’t imply that there’s a ground reference. An ideal balanced receiver will provide the local ground on its outputs but only passes the differential signal that it receives over the cable. Any source to destination ground connection via the cable can only serve to unbalance one side or the other of the balanced connection, with the possibility of sending noise or lessening the receiver’s ability to cancel common mode noise (e.g. ground loop hum.) That’s unsymmetric drive.
Symmetric drive references both signals to ground and will be symmetric around the ground reference.
A transformer in the preamp or amp will work best with unsymmetric drive, then the transformer can better cancel any common mode noise.
If the preamp has active/opamp based inputs, it may need a ground reference to keep the signal within the limits of its power supply. That’s one downside of non-transformer coupling.
The last option is the two green boxes on the upper right. They hook up the shield/ground reference of the RCA connection to either the ground as expected or to the other side of the output transformer making a balanced connection. The DS Mk II will make sure that the grounds of the output are lifted when the RCA is balanced to not interfere with the balanced signal. Not every unbalanced input on a preamp or an amp can accept a 4VRMS input, but for those that do this could be useful for systems that don’t have enough gain.
Yes, the DS Mk II upsamples 44.1k, 88.2k, 176.4k and 352.8k to 705.6k. I run 705.6k here (tho I don’t have much PCM music at that rate.) I also simplified the conversion to DSD by directly upsampling from 705.6k to quad rate (11.2896MHz). I.e. I no longer upsample the 44.1k family to 20 x the DSD sample rate. I can’t swear that makes a sound quality difference, but it doesn’t hurt.
48k, 96k, and 192k take the old path to 56.448MHz (20 x) and then back down to 11.2896MHz (4 x)
That’s fascinating. If you’re not chasing lowest common multiples of the highest rates any more, why keep the 20x path instead of dropping back to 6x (for 384) or 12x (if you want to enable 768)? Am I doing the arithmetic right?
I don’t think so. You probably forgot the 000 after 384.
lcm(44100*64*4,384000) ===> 56.448MHz
56.448e6/384000 ==> 147
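The same arithmetic as a few lines of Python, in case anyone wants to play with other rates. The variable names are mine; the numbers are the ones in the post.

```python
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


quad_dsd_hz = 44_100 * 64 * 4            # quad-rate DSD: 11_289_600 Hz
common_hz = lcm(quad_dsd_hz, 384_000)    # lowest rate both divide into

print(common_hz)                 # 56448000, i.e. 56.448 MHz
print(common_hz // 384_000)      # 147 (upsample ratio from 384k)
print(common_hz // quad_dsd_hz)  # 5   (then divide by 5 to reach quad rate)
```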
Thank you Professor.
Just for fun I wanted to show my original thinking on sizing resources for upsampling:
We’ll use a parallel polyphase systolic multiply accumulate FIR, if you want to look up details. I started going into details describing it, but that’s what textbooks are for.
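For readers without a textbook handy, here is a minimal sketch of the polyphase idea only, in plain Python. It is not the FPGA implementation (nothing here is parallel or systolic), and the function names, the toy 6-tap filter and the factor of 3 are made up for illustration. The point is that splitting the filter into L sub-filters gives the same output as zero-stuffing and filtering, without ever multiplying by the stuffed zeros.

```python
def upsample_naive(x, h, L):
    """Insert L-1 zeros after each sample, then FIR filter with h."""
    stuffed = []
    for s in x:
        stuffed.append(s)
        stuffed.extend([0.0] * (L - 1))
    y = []
    for n in range(len(stuffed)):
        acc = 0.0
        for k, c in enumerate(h):
            if n - k >= 0:
                acc += c * stuffed[n - k]
        y.append(acc)
    return y


def upsample_polyphase(x, h, L):
    """Same result, using L short sub-filters (phases) on the original samples."""
    phases = [h[p::L] for p in range(L)]   # phase p holds h[p], h[p+L], ...
    y = []
    for m in range(len(x)):                # one input sample in...
        for p in range(L):                 # ...L output samples out
            acc = 0.0
            for j, c in enumerate(phases[p]):
                if m - j >= 0:
                    acc += c * x[m - j]
            y.append(acc)
    return y


x = [1.0, 2.0, 3.0, 4.0]
h = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]         # toy coefficients, not a real filter
print(upsample_polyphase(x, h, 3) == upsample_naive(x, h, 3))
```

Each output sample costs len(h)/L multiplies instead of len(h), which is why the budget that matters is multiplies per input sample rather than per output sample.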
Our input clock is at 80 * 64 * 44.1k (= 225.792 MHz)
Since I use the same filter to upsample 44.1 to, say, 705.6 as I do to upsample 48 to 768, our maximum input data rate is (48k + ~100ppm) stereo samples per second.
These together mean we’ll have at most 4703 clocks available / input sample pair.
If we, for example, are doing a 100,000 coefficient filter, then we’ll need to do at least 21.26 multiplies per clock tick. Not really a problem.
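The clock budget above, worked through as a quick sketch. The constant names are mine, and the multiply count follows the post's figure of one multiply per coefficient per input sample pair.

```python
CLOCK_HZ = 80 * 64 * 44_100                   # 225_792_000 Hz
MAX_INPUT_RATE_HZ = 48_000 * (1 + 100e-6)     # 48k plus ~100 ppm
TAPS = 100_000                                # the example filter length

# Whole clock ticks guaranteed between consecutive input sample pairs.
clocks_per_sample_pair = int(CLOCK_HZ // MAX_INPUT_RATE_HZ)
multiplies_per_clock = TAPS / clocks_per_sample_pair

print(clocks_per_sample_pair)             # 4703
print(f"{multiplies_per_clock:.2f}")      # 21.26
```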
BUT
if we are using 35 bit coefficients then we need to have memory for 100,000 x 35 bits which is 3.5Mbits. The smallest FPGA which has enough memory for that costs $125 each (!)
What if we store the bits in some external DRAM? But we need to read all 100,000 35 bit coefficients for each of the 48k samples per second. Hmm, that’s only 21 Gbytes per second (!) (We only need to read approx the first half, but it’s still a ridiculous bandwidth. That would require quite a few pins on the FPGA just dedicated to DDR3 (or whatever) interfaces.)
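The memory and bandwidth figures come out of the same kind of back-of-the-envelope arithmetic. A sketch, with constant names of my own choosing:

```python
TAPS = 100_000
COEFF_BITS = 35
SAMPLE_RATE_HZ = 48_000

storage_bits = TAPS * COEFF_BITS                   # on-chip memory needed
read_bits_per_s = storage_bits * SAMPLE_RATE_HZ    # every coefficient, every sample
read_bytes_per_s = read_bits_per_s // 8
half_read_bytes_per_s = read_bytes_per_s // 2      # reading only about half of them

print(storage_bits / 1e6, "Mbit")                  # 3.5 Mbit
print(read_bytes_per_s / 1e9, "GB/s")              # 21.0 GB/s
print(half_read_bytes_per_s / 1e9, "GB/s")         # 10.5 GB/s
```

Even the halved figure is far more than a single DDR3 interface would comfortably supply, which is the point of the "(!)".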
Perhaps we need to set our sights a little lower
Right now I’m doing approx 16384 coefficients for 44.1k (of which I only need to store approx. half.) But since I have separate filters for 44.1, 88.2, 176.4 and 352.8, if I use the same FPGA hardware (multipliers, etc.) for all of them it turns out that I’ll need approx twice as much coefficient memory as I need for 44.1k. Doable.
I have plenty of space to expand into with two FPGAs and, in the future, I’ll be doing bigger filters than 16k taps. But the filters I’m now using are significantly better than those in the DS.
How many more ways can there be for you to convince us to purchase a MKII?
I was sold when I read “Ted is designing a new DAC”.