Diretta network protocol breakthrough

Diretta is an inexpensive audiophile alternative to TCP/IP that I adopted in January 2024 to get greater fidelity from a Raspberry Pi Roon endpoint feeding my new Meridian DSP9 active speakers. The benefits included an expanded soundstage, more focused and realistic imaging, and a blacker background, so I was pretty happy to spend 100 euros on a perpetual license.

Then, in August this year, an online acquaintance named David Snyder suggested that I add a second Raspberry Pi between my Roon server and this endpoint to create a completely isolated, very steady Diretta link, separated from everything else on the LAN. This dramatically reduced electrical/CPU activity and elevated my humble Pi transport to superstar status. Big WOW, but we’re not done yet.

Now the developer in Japan has transitioned to an even quieter way to transfer the bits, prompting a flurry of discussion on the Roon forum and on the Audiophile Style thread David started in August, which you might want to check out. BTW, Diretta works just as well with UPnP or LMS, not just Roon.

For those who haven’t taken the time to digest all that is being discussed elsewhere, here is the latest bottom line from David Snyder:

“For those following the rapid succession of releases and wondering why we are seeing frequent updates (versions 147_08, _09, _10, _12, _13, etc.) and why I’ve been obsessing over “100Mbps” and “EEE,” it’s because we are currently living through a major architectural shift in the Diretta protocol itself.

We are moving from what Yu-san, author of Diretta, calls Mode 2 to Mode 3. He also refers to Mode 3 as Diretta Direct Stream (DDS).”

The Evolution of the Stream

  • Mode 1 (The Past): Traditional buffered transmission. Reliable, but heavy on CPU usage.

  • Mode 2 (The Standard): This used UDP packets. It was faster and lighter than TCP, but the Target computer still had to process the full OS network stack. That meant parsing IP headers (Layer 3) and managing UDP ports and sockets (Layer 4). Every cycle spent validating a checksum or routing data to a socket is a cycle not spent moving audio.

  • Mode 3 / DDS (The New Hotness): Yu-san has recently shifted to Layer 2 Ethernet Frames, removing two layers of the network protocol stack.

    • This bypasses both the IP layer (L3) and the Transport layer (L4) entirely.

    • No IP addresses to parse, no UDP ports to manage, and no socket overhead. It is essentially a direct memory transfer over the wire.

    • The Host talks directly to the Target’s MAC address using raw frames with a custom EtherType (0x88b5).

    • The Target computer no longer wastes time inspecting IP headers. It just sees a frame and hands the payload to the USB bus.
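For anyone who wants to see what "raw frames with a custom EtherType" looks like in practice, here is a minimal Python sketch that builds and parses a Layer 2 frame in memory. Only the EtherType 0x88b5 comes from the post above. The MAC addresses, the function names and the idea that the payload is plain audio bytes are assumptions for illustration, since the real Diretta payload format is not described here.

```python
import struct

# Ethernet II header: destination MAC (6 bytes), source MAC (6 bytes), EtherType (2 bytes).
ETH_HEADER = struct.Struct("!6s6sH")
ETHERTYPE_DDS = 0x88B5  # EtherType quoted in the post


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to its 6-byte form."""
    return bytes.fromhex(mac.replace(":", ""))


def build_l2_frame(dst_mac: str, src_mac: str, payload: bytes) -> bytes:
    """Prefix the payload with a 14-byte Ethernet header and nothing else."""
    header = ETH_HEADER.pack(mac_to_bytes(dst_mac), mac_to_bytes(src_mac), ETHERTYPE_DDS)
    return header + payload


def parse_l2_frame(frame: bytes):
    """Return the payload if the EtherType matches, otherwise None."""
    _dst, _src, ethertype = ETH_HEADER.unpack_from(frame)
    if ethertype != ETHERTYPE_DDS:
        return None
    return frame[ETH_HEADER.size:]


# Hypothetical locally administered MAC addresses and a dummy "audio" payload.
HOST_MAC = "02:00:00:00:00:01"
TARGET_MAC = "02:00:00:00:00:02"
audio = bytes(range(16))

frame = build_l2_frame(TARGET_MAC, HOST_MAC, audio)
recovered = parse_l2_frame(frame)
```

Note that the receive side is one fixed-size header unpack, one comparison and one slice. Actually putting such a frame on the wire under Linux would typically need a raw `AF_PACKET` socket and root privileges, which this sketch deliberately leaves out.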


Standardization Note: Yu-san is actually working with the IEEE to have 0x88b5 formally defined and reserved as a standard Layer 2 frame type for audio data transmission. (For context, 0x88B5 is currently one of the two EtherTypes that IEEE 802 sets aside for local experimental use.) This is a serious move toward establishing a new industry standard.


Why I think this matters for Sound Quality

This transition to Layer 2 (L2) is the ultimate expression of the “low noise” philosophy. By removing two layers of network stack processing, we further reduce the CPU instruction count on the Target. Less code execution = less power draw modulation = less electrical noise.
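To make "two layers of network stack processing" a bit more concrete, here is a rough Python sketch of the kind of per-packet work a UDP-over-IPv4 receive path involves: verifying the IPv4 header checksum, checking the protocol field, and matching the UDP destination port before the payload is handed over. This is an illustration only. In a real system the kernel does this work (and NICs often offload checksums), and nothing here is taken from Diretta's code. The addresses, ports and function names are made up.

```python
import socket
import struct

IPV4_HEADER_LEN = 20  # bytes, without options
UDP_HEADER_LEN = 8    # bytes
PROTO_UDP = 17


def ipv4_checksum(header: bytes) -> int:
    """Ones'-complement checksum over 16-bit words (RFC 1071 style)."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_udp_ipv4_packet(payload: bytes, src_ip: str, dst_ip: str,
                          src_port: int, dst_port: int) -> bytes:
    """Build IPv4 + UDP headers around the payload (UDP checksum left at 0)."""
    udp_len = UDP_HEADER_LEN + len(payload)
    udp = struct.pack("!HHHH", src_port, dst_port, udp_len, 0)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, IPV4_HEADER_LEN + udp_len, 0, 0,
                     64, PROTO_UDP, 0,
                     socket.inet_aton(src_ip), socket.inet_aton(dst_ip))
    # Insert the header checksum into bytes 10..11 of the IPv4 header.
    ip = ip[:10] + struct.pack("!H", ipv4_checksum(ip)) + ip[12:]
    return ip + udp + payload


def parse_udp_ipv4_packet(packet: bytes, expected_port: int):
    """Layer 3 and Layer 4 checks that a raw Layer 2 design skips."""
    ihl = (packet[0] & 0x0F) * 4
    if ipv4_checksum(packet[:ihl]) != 0:      # a valid header sums to zero
        return None
    if packet[9] != PROTO_UDP:                # protocol field
        return None
    _sport, dport, udp_len, _csum = struct.unpack_from("!HHHH", packet, ihl)
    if dport != expected_port:                # socket/port demultiplexing
        return None
    return packet[ihl + UDP_HEADER_LEN: ihl + udp_len]


payload = bytes(range(16))
packet = build_udp_ipv4_packet(payload, "192.168.1.10", "192.168.1.20", 40000, 50000)
extra_header_bytes = IPV4_HEADER_LEN + UDP_HEADER_LEN  # 28 bytes per packet
```

With raw Layer 2 frames, those 28 bytes of IPv4 and UDP headers and the three checks in `parse_udp_ipv4_packet` simply are not there. How much that saves on a real Raspberry Pi is not something this sketch can tell you.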

The Results (Version 147_13)

The transition has been a bit of a bumpy ride (as bleeding-edge tinkering often is!), but version 147_13 seems to have turned the corner.

I just ran a benchmark on my system (RPi 4 for Host and Target) using the new L2 protocol. The timing precision is startling:

  • CycleTime: 514 µs
    (time gap between L2 data frames)

  • Measured Core Jitter: 1.68 µs
    (variation in the time gap between frames)

That means the variation in frame arrival time is less than 2 microseconds. This is truly impressive timing precision, of the kind usually associated with hardware-based FPGA solutions, but we are doing it on a pair of Raspberry Pi 4 computers.
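To put those two numbers in perspective, here is a small Python sketch of how a cycle time and jitter figure could be derived from frame arrival timestamps. It assumes jitter means the standard deviation of the gaps between frames. The post does not say how the benchmark tool defines "Core Jitter", so treat that as a guess, and note that the timestamps below are synthetic, not taken from the report.

```python
import statistics


def cycle_stats(arrival_times_us):
    """Return (mean gap, standard deviation of gaps) in microseconds."""
    gaps = [later - earlier
            for earlier, later in zip(arrival_times_us, arrival_times_us[1:])]
    return statistics.fmean(gaps), statistics.pstdev(gaps)


# Synthetic arrival times in microseconds (made up for illustration).
timestamps = [0.0, 514.0, 1030.0, 1542.0, 2056.0]
mean_gap, jitter = cycle_stats(timestamps)

# The figures quoted in the post.
CYCLE_TIME_US = 514.0
CORE_JITTER_US = 1.68

relative_jitter_pct = CORE_JITTER_US / CYCLE_TIME_US * 100  # roughly a third of a percent
frames_per_second = 1_000_000 / CYCLE_TIME_US               # a little under 2000

print(f"synthetic: mean gap {mean_gap:.1f} µs, jitter {jitter:.2f} µs")
print(f"reported: jitter is {relative_jitter_pct:.2f}% of the cycle, "
      f"{frames_per_second:.0f} frames/s")
```

In other words, the reported jitter is about 0.33% of the cycle time, at a rate of roughly 1,945 frames per second.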

How or why this correlates to better sound quality we can only speculate, but those who have tried 147_13 will tell you it’s the best sounding version so far, and it happens to also have the lowest jitter. It will be interesting to see if this trend continues.

diretta_bench_20251124_081244_report
