Diretta network protocol breakthrough

Diretta was an inexpensive audiophile’s alternative to TCP/IP which I adopted in January 2024 to get greater fidelity out of a Raspberry Pi Roon endpoint into my new Meridian DSP9 active speakers. Benefits included an expanded soundstage, more focused/realistic imaging and a blacker background, so I was pretty happy spending 100 Euro for a perpetual license.

Then, in August this year, an online acquaintance named David Snyder suggested that I add a second Raspberry Pi between my Roon server and this endpoint to create a completely isolated, very steady Diretta link, separated from everything else on the LAN. This dramatically reduced electrical/CPU activity and elevated my humble Pi transport to superstar status. Big WOW, but we’re not done yet.

Now, the developer in Japan has transitioned to an even quieter way to transfer the bits, prompting a flurry of discussion on the Roon forum and on the Audiophile Style thread David started in August which you might want to check out. BTW Diretta works just as well with UPnP or LMS, not just Roon.

For those who haven’t taken the time to digest all that is being discussed elsewhere, here is the latest bottom line from David Snyder:

“For those following the rapid succession of releases and wondering why we are seeing frequent updates (versions 147_08, _09, _10, _12, _13, etc.) and why I’ve been obsessing over “100Mbps” and “EEE,” it’s because we are currently living through a major architectural shift in the Diretta protocol itself.

We are moving from what Yu-san, author of Diretta, calls Mode 2 to Mode 3. He also refers to Mode 3 as Diretta Direct Stream (DDS).”

The Evolution of the Stream

  • Mode 1 (The Past): Traditional buffered transmission. Reliable, but heavy on CPU usage.

  • Mode 2 (The Standard): This used UDP packets. It was faster and lighter than TCP, but the Target computer still had to process the full OS Network Stack. This means parsing IP headers (Layer 3) and managing UDP ports and sockets (Layer 4). Every cycle spent validating a checksum or routing data to a socket is a cycle not spent moving audio.

  • Mode 3 / DDS (The New Hotness): Yu-san has recently shifted to Layer 2 Ethernet Frames, removing two layers of the network protocol stack.

    • This bypasses both the IP layer (L3) and the Transport layer (L4) entirely.

    • No IP addresses to parse, no UDP ports to manage, and no socket overhead. It is essentially a direct memory transfer over the wire.

    • The Host talks directly to the Target’s MAC address using raw frames with a custom EtherType (0x88b5).

    • The Target computer no longer wastes time inspecting IP headers. It just sees a frame and hands the payload to the USB bus.
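To make the Mode 3 description a bit more concrete, here is a minimal Python sketch of what a raw Layer 2 frame with EtherType 0x88b5 looks like on the wire. This is not Diretta's actual frame format, which isn't documented in this thread; the MAC addresses, the payload bytes and the helper names (`build_frame`, `parse_frame`) are all made up for illustration. It only shows the standard 14-byte Ethernet header and how much header a UDP/IPv4 packet carries by comparison.

```python
import struct

# EtherType quoted in the post. The IEEE currently lists 0x88B5 as
# "Local Experimental EtherType 1".
ETHERTYPE_DDS = 0x88B5

# Standard header sizes in bytes (IPv4 without options).
ETH_HEADER_LEN = 14   # dst MAC (6) + src MAC (6) + EtherType (2)
IPV4_HEADER_LEN = 20
UDP_HEADER_LEN = 8


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_frame(dst_mac: str, src_mac: str, payload: bytes,
                ethertype: int = ETHERTYPE_DDS) -> bytes:
    """Build an Ethernet II frame: header followed directly by the payload.

    No padding or frame check sequence is handled here; this is only
    the part an application would hand to a raw socket.
    """
    header = struct.pack("!6s6sH", mac_to_bytes(dst_mac),
                         mac_to_bytes(src_mac), ethertype)
    return header + payload


def parse_frame(frame: bytes):
    """Split a frame back into (dst, src, ethertype, payload)."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:ETH_HEADER_LEN])
    return dst, src, ethertype, frame[ETH_HEADER_LEN:]


# Hypothetical locally administered MAC addresses and dummy "audio" bytes.
frame = build_frame("02:00:00:00:00:02", "02:00:00:00:00:01",
                    b"\x01\x02\x03\x04")
dst, src, ethertype, payload = parse_frame(frame)

# Header bytes the receiver must look at before it reaches the payload.
l2_only_overhead = ETH_HEADER_LEN
udp_ipv4_overhead = ETH_HEADER_LEN + IPV4_HEADER_LEN + UDP_HEADER_LEN
```

On Linux, a frame like this could in principle be sent through an `AF_PACKET` raw socket (which needs root), but whether Diretta does it that way is an assumption on my part. The point is simply that the receiver only has a 14-byte header to look at, versus 42 bytes of Ethernet + IPv4 + UDP headers in the Mode 2 case.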


Standardization Note: Yu-san is actually working with the IEEE to have 0x88b5 formally defined and reserved as a standard Layer 2 frame type for audio data transmission. This is a serious move toward establishing a new industry standard.


Why I think this matters for Sound Quality

This transition to Layer 2 (L2) is the ultimate expression of the “low noise” philosophy. By removing two layers of network stack processing, we further reduce the CPU instruction count on the Target. Less code execution = less power draw modulation = less electrical noise.
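As an illustration of the kind of per-packet work the IP layer involves, here is a small Python sketch of the standard IPv4 header checksum (a ones' complement sum over the 16-bit words of the header). It is only an example of Layer 3 processing that a pure Layer 2 stream has no need for; it is not taken from Diretta, the function names are my own, and in practice many network cards offload checksums, so the real CPU cost on a given Target will vary.

```python
import struct


def ones_complement_sum16(data: bytes) -> int:
    """Sum 16-bit big-endian words, folding carries back into the low 16 bits."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ipv4_header_checksum(header: bytes) -> int:
    """Compute the checksum with the checksum field (bytes 10-11) set to zero."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    return ~ones_complement_sum16(zeroed) & 0xFFFF


def ipv4_header_is_valid(header: bytes) -> bool:
    """A header is valid when the sum over all words, checksum included, is 0xFFFF."""
    return ones_complement_sum16(header) == 0xFFFF


# A well-known 20-byte example header (UDP, 192.168.0.1 -> 192.168.0.199).
example_header = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")

# The same header with one byte (the TTL) altered, as a corruption check.
corrupted_header = example_header[:8] + b"\x41" + example_header[9:]
```

A receiver in Mode 2 has to do this sort of validation, plus UDP port and socket lookup, for every packet before the audio payload is touched; with raw frames that step simply doesn't exist.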

The Results (Version 147_13)

The transition has been a bit of a bumpy ride (as bleeding-edge tinkering often is!), but version 147_13 seems to have turned the corner.

I just ran a benchmark on my system (RPi 4 for Host and Target) using the new L2 protocol. The timing precision is startling:

  • CycleTime: 514 µs
    (time gap between L2 data frames)

  • Measured Core Jitter: 1.68 µs
    (variation in the time gap between frames)

That means the variance in packet arrival time is less than 2 microseconds. This is truly impressive timing precision that rivals hardware-based FPGA solutions, but we are doing it on a pair of Raspberry Pi 4 computers.
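For anyone wondering how numbers like these can be derived, here is a rough Python sketch that computes a cycle time and a jitter figure from a list of frame arrival timestamps. The benchmark's own definition of "Core Jitter" isn't given here, so I am assuming it is the standard deviation of the gaps between frames; the timestamps below are invented, and the function names are hypothetical.

```python
import statistics


def inter_frame_gaps(timestamps_us):
    """Return the time gap between each pair of consecutive arrivals."""
    return [later - earlier
            for earlier, later in zip(timestamps_us, timestamps_us[1:])]


def cycle_time_and_jitter(timestamps_us):
    """Mean gap (cycle time) and population standard deviation of the gaps.

    Using the standard deviation as "jitter" is an assumption; other
    definitions (peak-to-peak, mean absolute deviation) are also common.
    """
    gaps = inter_frame_gaps(timestamps_us)
    return statistics.mean(gaps), statistics.pstdev(gaps)


def relative_jitter_percent(cycle_time_us, jitter_us):
    """Jitter expressed as a percentage of the cycle time."""
    return 100.0 * jitter_us / cycle_time_us


# Invented arrival times in microseconds, roughly 514 µs apart.
arrivals_us = [0, 514, 1030, 1542, 2056]
cycle_us, jitter_us = cycle_time_and_jitter(arrivals_us)

# The figures reported above: 1.68 µs of jitter on a 514 µs cycle.
reported_percent = relative_jitter_percent(514, 1.68)
```

With the reported figures, the jitter works out to roughly a third of one percent of the cycle time.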

How or why this correlates to better sound quality we can only speculate, but those who have tried 147_13 will tell you it’s the best sounding version so far, and it happens to also have the lowest jitter. It will be interesting to see if this trend continues.

diretta_bench_20251124_081244_report

When is Diretta coming to the PMG DAC streamer?

“Oooh,” thought I, “something to play with over Christmas!”

Cutting out IP and UDP/TCP sounds fun, but then I saw the license cost, and GitHub seems to talk only about Roon-based setups.
Shame…

Well, the license cost is modest IMO and it can be used with any player: LMS, Audirvana etc.

Hmmm ok now you’re talking.
Maybe I shall investigate further, thanks 🙂

He demo’d it for our AZAVClub. Using DSP for bass is “OK”, but for the mids and highs… it defeats the purpose of all the engineering that went into all the expensive “tchotchkes” that we’ve spent years investing in. Plus, I’m a DSD fanboy. I wouldn’t dare put a nice DSD stream into the Diretta’s ADC.

But there are people in our Club that swear by Diretta. That’s cool. I have room treatment….that’s my “DSP”.

You will get over that thinking one of these days. Stone tablets were cool at one time. 😀

Yeah, I had a quick look and saw all sorts of gadgets. I was just interested in having an Ethernet-level network stream without IP and all that stack tottering above it, and I have the spare RasPi boxes to implement…

Huh? I don’t understand what an ADC is. Other than your concern about sending hi-res through the isolated Diretta link, what is the issue here?

An ADC is an Analog-to-Digital Converter. It’s what a recording studio uses to convert the analog sine waves to digital: 1s and 0s, bits. Then on our end the DAC turns the bits back into sine waves. Make sense?

Not really. Diretta is an all-digital network protocol with no ADC involved. An ADC would only have been used earlier, when the recording it is transmitting was made.