Why Consistent High Bandwidth and Low Latency Are Critical for AI Factories

Summary

Combining Everpure FlashBlade//EXA and NVIDIA Spectrum-X creates a high-bandwidth, low-latency AI factory fabric that eliminates GPU idle time, boosts throughput, and delivers predictable performance at scale.

Modern AI/ML workloads, particularly large-scale training and high-volume inferencing, are fundamentally data-intensive. Success hinges on more than just powerful GPUs: The infrastructure that connects compute and storage resources is often a key performance bottleneck. To prevent idle GPUs and inefficient model iterations, a high-performance network that delivers consistent high bandwidth and low latency is non-negotiable.

Training models requires rapidly loading petabytes of data, and inferencing requires fast access to large model weights and incoming data streams. The GPU, which serves as the main engine for tensor processing, can’t operate at full efficiency while it waits for data. Therefore, the connection between the GPU and the storage layer (ideally an all-flash solution like Everpure™ FlashBlade//EXA™) must be capable of keeping the GPU fed.

Everpure FlashBlade//EXA: High-performance storage for AI factories

FlashBlade//EXA is an ultra-scale storage platform specifically designed to meet the extreme performance demands of large AI environments. It effectively addresses the infrastructure and storage bottlenecks that often cause costly GPUs to idle while waiting for data. FlashBlade//EXA builds upon the established Purity//FB software, combining a dedicated FlashBlade® array for metadata with purpose-built Purity//DN software running on the data nodes. By leveraging industry-standard protocols such as NFS, pNFS, and RDMA, FlashBlade//EXA delivers both low-latency and high-throughput storage access.

Its disaggregated architecture separates data and metadata to avoid performance contention, while its resilient design supports high availability even as the focus remains on performance at scale. Built on a 400G infrastructure with multiple 400G interfaces, FlashBlade//EXA requires a robust underlying I/O and network fabric. Its data nodes leverage the latest commodity hardware, including PCIe Gen 5, modern CPUs, NVMe SSD drives, and 400G adapters. This sophisticated architecture allows FlashBlade//EXA to handle the massive scaling needs of modern AI factories, supporting trillions of files and exabytes of capacity, while delivering the necessary high read and write bandwidth.

NVIDIA Spectrum-X Ethernet: Networking optimized for AI and GPU clusters

NVIDIA Spectrum-X Ethernet is a networking platform purpose-built to eliminate the bottlenecks in AI environments. By integrating the NVIDIA Spectrum-4 SN5600/SN5610 switch with the NVIDIA ConnectX®-8 SuperNIC™, it delivers a specialized fabric that bridges the gap between traditional Ethernet and the demands of massive GPU clusters.

Unlike off-the-shelf Ethernet, Spectrum-X Ethernet leverages NVIDIA adaptive routing and advanced congestion control to maximize link utilization and neutralize “hot spots.” The result is a highly predictable, low-latency infrastructure that accelerates data movement between storage and compute, ensuring GPUs remain fully utilized even under the most intense AI workloads.

Powering the AI factory: Beyond traditional networking

In the modern AI factory, data is the raw material and tokens are the finished product. This production line requires a lossless environment where storage and compute are in constant, high-speed sync. However, traditional Ethernet technologies designed for general-purpose web traffic struggle with the all-to-all communication patterns of distributed AI.

Off-the-shelf Ethernet deployments typically rely on Equal-Cost Multi-Path (ECMP) for traffic distribution. ECMP uses a static hashing algorithm to assign data flows to specific paths. While it works for general data, it’s unaware of the actual state of the network. It does not account for real-time link utilization or buffer depth. It can shove two large AI data flows onto the same path, creating a hot spot while other links sit idle. This creates two primary bottlenecks:

Tail latency and GPU stalls: Because AI training is a synchronous process (e.g., AllReduce operations), the entire GPU cluster is gated by the slowest packet. A single congested link increases tail latency, forcing compute into an idle state.
Storage throughput capping: For parallel file systems like FlashBlade//EXA, network congestion creates artificial backpressure. Even if the storage tier has the IOPS and bandwidth to fulfill a request, the network path becomes the limiting factor. This results in poor ROI on storage hardware, as the effective throughput is capped by the congested link rather than the storage’s native performance.
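
The hot-spot risk of static hashing described above can be estimated with a short calculation. The sketch below is illustrative only: it assumes each flow is hashed uniformly and independently onto one of several equal-cost links, and the flow and link counts are hypothetical values, not figures from any test.

```python
from math import perm


def collision_probability(flows: int, links: int) -> float:
    """Chance that at least two flows land on the same link.

    Assumes every flow is placed uniformly and independently,
    which is an idealized model of static ECMP hashing.
    """
    if flows > links:
        return 1.0  # pigeonhole: some link must carry two flows
    # P(no collision) = links * (links-1) * ... / links**flows
    return 1.0 - perm(links, flows) / links**flows


def expected_idle_links(flows: int, links: int) -> float:
    """Expected number of links that receive no flow at all."""
    return links * (1.0 - 1.0 / links) ** flows


if __name__ == "__main__":
    flows, links = 4, 4  # hypothetical: four large flows, four equal-cost links
    print(f"P(hot spot)    = {collision_probability(flows, links):.4f}")
    print(f"E[idle links]  = {expected_idle_links(flows, links):.4f}")
```

Under these assumptions, four flows over four links collide about 91% of the time, and on average more than one link carries no traffic at all. That is the sense in which static hashing can leave capacity idle while another path is congested.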

The Solution: Everpure FlashBlade//EXA + NVIDIA Spectrum-X

The combination of FlashBlade//EXA storage and the Spectrum-X Ethernet platform transforms the network from a passive pipe into an intelligent, end-to-end fabric, providing:

Predictable performance for AI training: FlashBlade//EXA provides the massive, parallel throughput needed for checkpointing and data ingestion, while Spectrum-X Ethernet ensures those data bursts don’t cause network incast or congestion.
Workload isolation in multi-tenant clouds: Spectrum-X Ethernet ensures that a massive training job in one “cell” of the AI factory doesn’t create “noisy neighbor” interference for an inference workload in another.

Performance validation: FlashBlade//EXA + Spectrum-X lab results

To quantify the impact of an integrated fabric, Everpure performed a controlled lab test using a 400G end-to-end Spectrum-X Ethernet environment. The setup utilized two NVIDIA SN5600 switches, four data nodes, and four initiators, testing a 4x400G inter-switch topology under heavy, multi-group traffic loads.

Note: Validation results are provided for informational purposes only. Figures are based on a test-bed environment, and client results may differ depending on environmental and other factors. These results are not a guarantee of outcome.

The results: Breaking the ECMP ceiling

The testing compared Spectrum-X Ethernet against an off-the-shelf Ethernet configuration that used standard ECMP and DCQCN. The findings highlight a significant departure from the statistical luck inherent in traditional hashing:

Throughput gains: Spectrum-X Ethernet delivered a 25% improvement in read bandwidth and a 23% improvement in writes.

Efficiency: While ECMP can often leave links underutilized due to hot spot collisions, the adaptive routing in Spectrum-X Ethernet maintains a higher average link utilization across the fabric.

Figure 1: Read and write bandwidth test results for Spectrum-X vs. standard Ethernet.

Delivering predictable low latency

Lab testing of FlashBlade//EXA with Spectrum-X Ethernet revealed that while latency improvements are negligible for small metadata-style messages, the benefits scale dramatically as payload sizes increase. This is critical because AI workloads are dominated by elephant flows: massive data transfers for data set ingestion and model checkpointing.

By replacing the statistical uncertainty of ECMP with hardware-driven adaptive routing, the platform achieved:

38% reduction in read latency for large-block transfers
29.5% reduction in write latency for larger data payloads
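
To put these percentages in more familiar terms, a latency reduction can be converted into an equivalent speedup factor. The sketch below is a simple worked calculation, not part of the lab methodology. It assumes a latency-bound pattern in which each request waits for the previous one to complete, so completed operations per second scale as the inverse of latency. The function name is hypothetical.

```python
def speedup_from_latency_reduction(reduction_pct: float) -> float:
    """Convert a percentage latency reduction into a speedup factor.

    Assumes serialized, latency-bound requests, so that rate is
    proportional to 1 / latency. Workloads with many requests in
    flight will not follow this relationship directly.
    """
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction_pct must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


if __name__ == "__main__":
    for label, pct in (("read", 38.0), ("write", 29.5)):
        factor = speedup_from_latency_reduction(pct)
        print(f"{label}: {pct}% lower latency is about {factor:.2f}x faster")
    # read: 38.0% lower latency is about 1.61x faster
    # write: 29.5% lower latency is about 1.42x faster
```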

Why block size matters

Traditional Ethernet treats every packet as an independent event. For small messages, the overhead of standard hashing is manageable. However, during large transfers, standard Ethernet’s inability to dynamically reroute around micro-congestion causes bufferbloat and packet queuing.

Spectrum-X uses a sub-microsecond feedback loop between the Spectrum-X Ethernet switch and the ConnectX-8 SuperNIC. This ensures that the massive data bursts from a FlashBlade//EXA storage node are sprayed across all available paths at the packet level. By preventing these packets from bunching up on a single link, the fabric ensures that the storage-to-GPU path remains deterministic.
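
The difference between pinning a flow to one path and spraying its packets across all paths can be illustrated with a toy model. The sketch below is not the Spectrum-X algorithm: it uses a SHA-256 hash as a stand-in for static flow hashing and plain round-robin as a simplified stand-in for adaptive, packet-level distribution. The flow names and packet counts are hypothetical.

```python
import hashlib


def link_for_flow(flow_id: str, links: int) -> int:
    """Static placement: hash the flow identifier onto one link."""
    digest = hashlib.sha256(flow_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % links


def per_flow_load(flows: dict[str, int], links: int) -> list[int]:
    """Packets per link when every packet of a flow follows its hash."""
    load = [0] * links
    for flow_id, packets in flows.items():
        load[link_for_flow(flow_id, links)] += packets
    return load


def per_packet_load(flows: dict[str, int], links: int) -> list[int]:
    """Packets per link when packets are sprayed round-robin.

    Round-robin is an assumption made for simplicity; a real adaptive
    fabric would choose paths from live congestion information.
    """
    load = [0] * links
    next_link = 0
    for packets in flows.values():
        for _ in range(packets):
            load[next_link] += 1
            next_link = (next_link + 1) % links
    return load


if __name__ == "__main__":
    # Hypothetical elephant flows from storage data nodes to initiators.
    flows = {f"dn{i}->client{i}": 10_000 for i in range(4)}
    print("per-flow hashing :", per_flow_load(flows, links=4))
    print("packet spraying  :", per_packet_load(flows, links=4))
```

In this model the busiest link under per-flow hashing always carries at least as many packets as the busiest link under spraying, because spraying keeps every link within one packet of the average.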

Figure 2: Average write latency for Ethernet vs. Spectrum-X.

Figure 3: Average read latency for Ethernet vs. Spectrum-X.

Performance isolation from ‘noisy neighbors’

In multi-tenant AI environments, a primary risk to productivity is the noisy neighbor effect. This occurs when a bursty or poorly optimized workload on one node consumes disproportionate fabric resources, creating congestion that impacts unrelated, mission-critical jobs.

The Spectrum-X congestion control mechanism provides isolation to protect primary workloads. In lab testing with FlashBlade//EXA storage, this was validated through a noisy neighbor simulation:

The scenario: A disruptive, periodic short-write workload was introduced alongside a continuous, mission-critical write job.
The result: Spectrum-X congestion control effectively isolated the bursty traffic, ensuring the primary write job maintained consistent, near-line-rate bandwidth. The DCQCN chart shows that Client 1’s RDMA traffic (green) is throttled, and its write throughput drops, whenever Client 2 (brown) initiates writes. In contrast, the Spectrum-X chart shows that Client 1 experiences minimal impact, thanks to the Spectrum-X congestion control (CC) implementation.

Figure 4: Noisy neighbor simulation results using DCQCN. DN represents the FlashBlade//EXA data node storage device.

Figure 5: Noisy neighbor simulation results using Spectrum-X.

This capability ensures that mission-critical training or inferencing jobs running on FlashBlade//EXA storage are not affected by less-critical or bursty workloads running elsewhere on the network, guaranteeing the predictable performance necessary for production AI operations.

Conclusion: A fabric built for the AI era

In our testing, we found the following key improvements by leveraging FlashBlade//EXA with NVIDIA Spectrum-X Ethernet vs. standard Ethernet:

Throughput: 25% read/23% write improvement
Latency: Up to 38% reduction for large-block transfers

The transition to NVIDIA Spectrum-X Ethernet represents a shift from generic networking to an intelligent, end-to-end fabric specifically engineered for the high-concurrency demands of the AI factory. By pairing the parallel performance of FlashBlade//EXA storage with the deterministic routing of Spectrum-X Ethernet switches and ConnectX-8 SuperNICs, organizations can finally eliminate the network tax that has traditionally capped GPU and storage performance.

As AI models grow in complexity, the efficiency of the underlying infrastructure becomes the primary competitive advantage. The combination of FlashBlade//EXA and NVIDIA Spectrum-X Ethernet doesn’t just make the network faster; it makes it predictable. By providing the bandwidth, low latency, and performance isolation required for industrial-scale AI, this integrated solution ensures that your most expensive resources—your GPUs and your data—are never left waiting on the wire.

AI Factory

An AI factory is a specialized computing infrastructure designed to industrialize the creation, training, and deployment of artificial intelligence models at production scale.
