
The network isn't the bottleneck.

Buy a faster link and your data moves faster, right? At petabyte scale, usually not. Here's what actually limits end-to-end data movement.

October 14, 2025 · by Dr. Chin Fang, Founder & CEO

For years, the instinct when a transfer was slow was to buy more bandwidth. Light up a 100 Gb/s link and the data should fly. Then it doesn't: the shiny new link sits half-empty while a "simple" copy drags on for days.

That gap between the bandwidth you pay for and the throughput you actually get is the central problem of data movement. As laid out in "Reexamining Paradigms of End-to-End Data Movement," once a fast link is in place, the network is rarely what's holding you back.

Where the time actually goes

The real bottlenecks live on the ends, not on the wire:

  • Storage I/O — especially writes. The destination filesystem, not the network, often sets the ceiling.
  • Host architecture — CPU, memory bandwidth, NUMA layout, and where the NIC sits.
  • Software design — single-stream tools like rsync and scp can't fill a 100 Gb/s pipe, no matter how fast the link.
  • Security — encryption and integrity checking that aren't engineered into the data path will throttle everything.

A petabyte transfer touches all of these at once. Optimize the network alone and you've tuned one link in a chain of five.
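To make the chain concrete, here is a toy model, not a measurement: end-to-end throughput can be no higher than the slowest stage. The stage names mirror the list above. The Gb/s figures and the end_to_end_gbps helper are hypothetical, chosen only to show the effect.

```python
def end_to_end_gbps(stages: dict[str, float]) -> tuple[str, float]:
    """Return the slowest stage and its rate: a pipeline moves data no faster than its bottleneck."""
    name = min(stages, key=stages.get)
    return name, stages[name]

# Hypothetical per-stage throughputs in Gb/s (illustrative, not measured).
path = {
    "storage I/O (destination writes)": 35.0,
    "host architecture": 70.0,
    "data mover software": 10.0,  # e.g. a single-stream tool
    "encryption + integrity": 45.0,
    "network link": 100.0,
}

print(end_to_end_gbps(path))      # ('data mover software', 10.0)

# Quadruple the link: the bottleneck doesn't move.
faster_link = {**path, "network link": 400.0}
print(end_to_end_gbps(faster_link))  # ('data mover software', 10.0)

# Parallelize the mover: destination writes become the next ceiling.
parallel_mover = {**path, "data mover software": 90.0}
print(end_to_end_gbps(parallel_mover))  # ('storage I/O (destination writes)', 35.0)
```

Real stages also interact (encryption competes with the mover for CPU cycles and memory bandwidth, for example), so a strict minimum is an optimistic simplification. The direction of the lesson holds either way.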

Why congestion control barely matters

A counter-intuitive finding from the production runs: on a well-provisioned research-and-education backbone, the choice of TCP congestion-control algorithm — CUBIC, BBR, Reno — made little difference, and packet loss was negligible. The record 1 PB run used plain CUBIC. The lesson: if you're tuning the congestion-control algorithm before you've fixed storage I/O and host architecture, you're optimizing the wrong layer.
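One way to see why the algorithm fades into the background is the well-known Mathis et al. approximation for a loss-limited, Reno-style TCP stream: rate ≈ (MSS / RTT) · (C / √p), with C ≈ √(3/2). It models classic AIMD, not CUBIC or BBR specifically, so treat it only as a rough ceiling. The MSS (a 9000-byte jumbo MTU), the 80 ms RTT, and the loss rates below are assumptions, not figures from the production runs.

```python
import math

def mathis_ceiling_gbps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate loss-limited throughput of one Reno-style TCP stream (Mathis et al. model)."""
    c = math.sqrt(1.5)                   # constant for periodic loss with per-segment ACKs
    bits_per_second = mss_bytes * 8 / rtt_s  # one MSS delivered per round trip
    return bits_per_second * c / math.sqrt(loss_rate) / 1e9

MSS = 8960  # assumed: 9000-byte MTU minus 40 bytes of IPv4 + TCP headers
RTT = 0.080  # assumed round-trip time in seconds

for p in (1e-4, 1e-6, 1e-9):
    print(f"loss {p:g}: ~{mathis_ceiling_gbps(MSS, RTT, p):.1f} Gb/s per stream")
```

Under these assumptions the per-stream ceiling is roughly 0.1 Gb/s at 10^-4 loss, about 1.1 Gb/s at 10^-6, and around 35 Gb/s at 10^-9. Because it grows as 1/√p, and parallel streams multiply it, negligible loss pushes this limit out of the way, and the storage, host, and software layers decide the throughput instead.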

The answer is co-design

If the bottleneck is the whole path, the fix has to be the whole path. That's the idea behind the Zettar zx Appliance: hardware and software engineered together — storage, host, NICs, and the data mover — so every layer is balanced. Burst buffers (NVMe staging between production storage and the WAN) absorb the writes. The software scales out across nodes to fill the fabric. Integrity and encryption are built in, not bolted on.
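The scale-out and built-in integrity ideas can be sketched on a single machine: split a file into fixed-size chunks, copy them concurrently with positional I/O, and record a SHA-256 digest per chunk so the destination can be verified. This is a hypothetical illustration, not the zx data mover. The 4 MiB chunk size, the worker count, and the function names are assumptions, a local file stands in for the WAN, and it is POSIX-only because it relies on os.pread and os.pwrite.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4 * 1024 * 1024  # assumed chunk size: 4 MiB

def copy_chunk(src_fd: int, dst_fd: int, offset: int, length: int) -> tuple[int, str]:
    """Copy one byte range with positional I/O and return (offset, SHA-256 hex digest)."""
    data = os.pread(src_fd, length, offset)  # regular files only return short reads at EOF
    view = memoryview(data)
    written = 0
    while written < len(data):  # pwrite may write fewer bytes than requested
        written += os.pwrite(dst_fd, view[written:], offset + written)
    return offset, hashlib.sha256(data).hexdigest()

def parallel_copy(src: str, dst: str, workers: int = 8) -> dict[int, str]:
    """Copy src to dst in concurrent chunks; return a map of chunk offset to digest."""
    size = os.path.getsize(src)
    src_fd = os.open(src, os.O_RDONLY)
    dst_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.ftruncate(dst_fd, size)  # size the file up front so chunks can land in any order
        # pread/pwrite and large-buffer hashing release the GIL, so threads overlap real work.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [
                pool.submit(copy_chunk, src_fd, dst_fd, off, min(CHUNK, size - off))
                for off in range(0, size, CHUNK)
            ]
            return dict(f.result() for f in futures)
    finally:
        os.close(src_fd)
        os.close(dst_fd)

def verify(path: str, digests: dict[int, str]) -> bool:
    """Re-read each chunk from path and check it against the recorded digest."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return all(
            hashlib.sha256(os.pread(fd, CHUNK, off)).hexdigest() == digest
            for off, digest in digests.items()
        )
    finally:
        os.close(fd)
```

Per-chunk digests mean a corrupted range can be detected and resent on its own rather than restarting the whole file, and the same chunking is what lets work spread across streams or nodes.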

The result closes the gap: with SLAC and ESnet, zx moved 1 PB in 29 hours at 96% link utilization over a 5,000-mile loop, practically line rate, encrypted and checksummed. Not because the link was special, but because the whole path was designed to keep it full.
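As a quick sanity check on those numbers, here is the arithmetic, assuming 1 PB means 10^15 bytes (decimal) and that 96% refers to utilization of the usable link capacity:

```python
PB_BITS = 1e15 * 8  # assumed: 1 PB = 10^15 bytes
seconds = 29 * 3600  # 29 hours

avg_gbps = PB_BITS / seconds / 1e9
implied_capacity_gbps = avg_gbps / 0.96

print(f"average rate ≈ {avg_gbps:.1f} Gb/s")  # average rate ≈ 76.6 Gb/s
print(f"implied capacity at 96% ≈ {implied_capacity_gbps:.0f} Gb/s")  # implied capacity at 96% ≈ 80 Gb/s
```

Sustaining about 76.6 Gb/s for more than a day, through storage, hosts, encryption, and checksumming, is the point: every stage in the chain had to keep up, not just the wire.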

What this means for you

If your transfers are slow, more bandwidth probably isn't the answer. Look at the writes at the destination, the host your mover runs on, and whether your tool can actually parallelize. Or skip the months of tuning: a pre-tuned appliance addresses the whole path so your team doesn't have to.
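For the first check, a rough probe of destination write throughput takes only a few lines: write incompressible data to a temporary file, fsync it so the page cache doesn't flatter the result, and compare the rate to your link. This is a minimal single-writer sketch; the function name, sizes, and the /mnt/dest path are hypothetical, and a dedicated benchmark such as fio will give a fuller picture, since many filesystems only reach their ceiling with several parallel writers.

```python
import os
import tempfile
import time

def write_throughput_gbps(directory: str, total_mib: int = 256, block_mib: int = 4) -> float:
    """Sequentially write about total_mib MiB into directory, fsync, and return Gb/s."""
    block = os.urandom(block_mib * 1024 * 1024)  # random bytes defeat compression
    blocks = total_mib // block_mib
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(blocks):
            view = memoryview(block)
            while view:  # os.write may write fewer bytes than requested
                n = os.write(fd, view)
                view = view[n:]
        os.fsync(fd)  # count the flush to stable storage, not just the page cache
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return blocks * len(block) * 8 / elapsed / 1e9

if __name__ == "__main__":
    # Hypothetical mount point for the destination filesystem.
    print(f"{write_throughput_gbps('/mnt/dest'):.1f} Gb/s")
```

If a single writer lands well below your link rate, the destination storage is a likelier bottleneck than the network.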

See it for yourself

Move petabytes at wire speed.

See how the appliance closes the gap — or schedule a demo on your data.