Summary
FlashBlade//S, Red Hat OpenShift, NVIDIA GPUs, AWS, and Portworx combine to deliver a hybrid cloud AI factory that can reduce infrastructure TCO and securely scale AI training and inference across on-premises and public cloud environments.
What is an AI Factory?
An AI factory is specialized computing infrastructure that manages the entire AI lifecycle at production scale, from data ingestion through training to high-volume inference.
Red Hat and Everpure are partnering to build an AI factory that helps enterprises accelerate AI adoption across training, inference, and hybrid cloud environments. Powered by a modern AI infrastructure stack, this collaboration enables customers to run diverse workloads with consistency, scale, and flexibility. In this blog, we’ll focus on the hybrid cloud AI infrastructure narrative: how organizations can seamlessly build, deploy, and operate AI across on-premises, private cloud, and public cloud environments.
In the era of generative AI, the AI factory has evolved from a conceptual framework into a strategic imperative for the modern enterprise. Turning raw data into industrial-scale intelligence requires a foundation that is as agile as the public cloud but as predictable and cost-effective as the private data center. However, the path to a production-ready AI factory is often blocked by the “GPU tax” of the cloud, high data egress costs, and the complexity of moving massive workloads.
The alliance between Everpure, Red Hat, and NVIDIA provides the blueprint to overcome these hurdles. By integrating Everpure™ FlashBlade//S™ with Red Hat OpenShift, NVIDIA RTX Pro/Data Center GPUs, and Portworx®, organizations can build a seamless hybrid environment optimized for cloud repatriation and cloud bursting.
The strategic business case: AI infrastructure economics
Enterprises are increasingly realizing that a “cloud-only” AI strategy often leads to diminishing returns. The business case for the hybrid AI factory is built on:
- TCO optimization: Repatriating steady-state training to on-premises clusters powered by FlashBlade//S and NVIDIA RTX Pro hardware can reduce infrastructure TCO by 40%–60% compared to equivalent AWS P4/P5 instances.
- Eliminating data egress: Moving petabytes of data into the cloud for training is effectively a one-way street, because egress fees make it costly to bring that data back out. Keeping the “data lakehouse” on FlashBlade//S ensures data remains local and accessible.
- Consistent governance: Red Hat OpenShift unifies hybrid governance, allowing, for example, healthcare providers to repatriate HIPAA-sensitive inference to on-prem RTX 6000s without creating security silos. By mirroring AWS policies locally, organizations escape the 24x7 “cloud tax,” reclaiming millions in OPEX while maintaining automated, ironclad compliance across their entire footprint.
Detailed motivation: Why repatriate and why burst?
The case for repatriation: Reclaiming the baseline
While the cloud is excellent for “zero-to-one” experimentation, the “one-to-n” scaling phase is where the cloud tax becomes visible.
- Data gravity and latency: As data sets reach petabyte scale, the cost and time required to move data to the compute (cloud) become prohibitive. Repatriation moves the compute (NVIDIA GPUs) to where the data lives (FlashBlade//S).
- Predictable unit economics: AI training is a high-duty-cycle workload. Unlike traditional web apps that scale up and down, AI training often runs at 100% utilization for weeks. In this scenario, CAPEX (owning the hardware) significantly outperforms OPEX (renting it).
- The GPU scarcity hedge: Public cloud providers often face GPU shortages during peak demand cycles. Repatriating to a local fleet of NVIDIA RTX Pro, Hopper, or Blackwell GPUs ensures that your R&D pipeline is never gated by a cloud provider’s inventory.
- Fine-tuning in the cloud and inference at scale on premises: Use the cloud for AI fine-tuning and experimentation where elastic GPU scale and speed matter most. Run production inference on premises for lower cost, stronger security, lower latency, and better control. This hybrid model gives enterprises the best balance of innovation in cloud and operational efficiency on-prem.
The case for bursting: Handling the ‘black swan’ events
Bursting is the release valve that ensures your local AI factory never hits a ceiling.
- Hyper-scaling for “sprint” cycles: When a model needs a massive re-training/fine-tuning cycle for a new product launch, the local cluster may be at 100% capacity. Bursting to AWS allows you to temporarily double or triple your compute power without purchasing hardware that would sit idle 90% of the year.
- Global inference distribution: A model fine-tuned on-prem can be “burst” to global cloud regions to provide low-latency inference to users in London, Tokyo, or New York, while keeping the core IP in the home data center.

Figure 2: Benefits of repatriation and bursting with a hybrid cloud infrastructure.
Value proposition: The ‘uninterrupted innovation’ framework
The value proposition of this joint solution leveraging FlashBlade//S, Red Hat, NVIDIA, and Portworx is centered on removing the friction between “thinking” and “doing.”
- Agility without lock-in: Everpure provides a “cloud-like” experience on premises. Data scientists get the same Jupyter notebooks and PyTorch environments they love in the cloud, but with the performance of local FlashBlade//S storage.
- Maximized hardware ROI: By using NVIDIA Base Command Manager (BCM) and FlashBlade//S, we ensure that every dollar spent on GPUs is utilized. Eliminating “I/O wait” bottlenecks means training can finish up to 30% faster than on traditional NAS systems.
- Future-proofing: Because the stack is built on Red Hat OpenShift and Portworx, your infrastructure is abstracted. If you want to move from AWS to Azure, or from RTX Pro to the next generation of NVIDIA Blackwell chips, the software layer remains identical.

Differentiation: Why this specific stack?
Many vendors offer “hybrid cloud,” but they usually deliver two separate silos. Our differentiation lies in integrated mobility.

Figure 3: Hybrid AI optimizes workload placement—bursting to cloud for training and experimentation while repatriating inference and RAG on premises for cost, control, and performance.
Traditional infrastructure vs. hybrid cloud AI factory with Everpure
| Feature | Traditional Solutions | AI Factory with Everpure |
|---|---|---|
| Storage Performance | Legacy NAS (slow metadata, high latency) | FlashBlade//S: Parallel architecture designed for AI throughput |
| Data Mobility | Manual scripts, slow FTP, or AWS Snowball | Portworx Async Migration: Live block-level replication between clusters |
| GPU Management | Manual driver installs and SSH scripts | NVIDIA BCM & GPU Operator: Automated, policy-based fleet management |
| Cloud Strategy | Fragmented (one team for cloud, one for on-prem) | Unified control plane: One OpenShift dashboard for the entire global estate |
Technical deep dive: The migration engine
The core of the repatriation story is Portworx Async Migration between two Red Hat OpenShift clusters. The solution enables asynchronous, application-consistent migration using:
- Portworx → Data replication layer
- STORK → Migration orchestration layer
Together, they form a policy-driven, incremental migration engine that operates at the application and storage layers, rather than relying on infrastructure-level data movement.
Step-by-step procedure: Migrating from AWS to on premises
- Cluster pairing and validation: Before any data transfer occurs, a ClusterPair must be established to create a secure relationship between the two clusters.
This includes:
- API connectivity between clusters
- Portworx authentication (cluster token)
- Shared object store configuration
Key requirements:
- Bidirectional or reachable network path between clusters
- Object store accessible from both clusters (used for metadata exchange and coordination between clusters)
```
storkctl create clusterpair <name-clusterpair> \
  --src-kube-file <source-kubeconfig> \
  --dest-kube-file <destination-kubeconfig> \
  --mode migration \
  --provider s3 \
  --bucket <bucket-name>
```
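Once the pair is created, it is worth verifying before any migration is triggered. A quick check with the standard STORK tooling (object name matches the example above) looks like this:
```
# Confirm the pair exists and both storage and scheduler status report Ready
storkctl get clusterpair <name-clusterpair>
```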
- Application resource discovery: STORK identifies all Kubernetes resources associated with the AI application. The goal is to capture a complete application definition, including namespaces, pods, PersistentVolumeClaims (PVCs), and ConfigMaps/secrets. This ensures the migration preserves application state, not just raw data.
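Before triggering the migration, it can help to preview what STORK will capture. A simple spot check of the application namespace (resource types chosen for illustration) might look like:
```
# List the Kubernetes resources that make up the application's state
kubectl get deployments,statefulsets,pvc,configmaps,secrets -n <app-namespace>
```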
- Volume snapshot and data sync: Portworx takes snapshots of the volumes backing the application and performs incremental data transfer (changed blocks only).
Key characteristics:
- Storage-level replication
- Delta-based transfer (not full copy)
- Scales efficiently to large data sets
- Transfer and rehydration: Data is transferred to the destination cluster, where volumes are recreated, data is restored, and storage is provisioned using the destination StorageClass.
An important abstraction to consider is that migration is storage-agnostic at the Kubernetes layer, so underlying storage types may differ (e.g., cloud block storage vs. on-prem systems).
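In practice, this means the destination cluster needs a suitable StorageClass for Portworx to provision against. A minimal pre-flight check on the destination (a sketch, assuming a Portworx-backed class has already been defined there) is:
```
# Verify a suitable Portworx StorageClass exists on the destination cluster
kubectl get storageclass
```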
- Migration execution: Migration can be triggered in two modes:
- One-time migration (full application transfer)
- Scheduled migration (continuous synchronization)
```
storkctl create migration migrate-app \
  --clusterPair remotecluster \
  --namespaces <app-namespace> \
  --includeResources \
  --startApplications
```
This operation triggers Portworx to create snapshots, transfer data, and recreate applications onto the NVIDIA RTX Pro nodes in the destination cluster.
- Application deployment: STORK recreates the application:
- PVCs are bound to restored volumes
- Pods are scheduled
- Finally, the application starts on the destination cluster
Deployment modes:
- Application automatically starts up
- Manual validation before cutover
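For the manual-validation path, one approach is to omit --startApplications when creating the migration and bring the workload up only after checks pass, using the activation flow from STORK’s async DR tooling. A hedged sketch:
```
# Stage data and resources without starting pods on the destination
storkctl create migration migrate-app \
  --clusterPair remotecluster \
  --namespaces <app-namespace> \
  --includeResources

# After validating the restored volumes and manifests, scale the apps up
storkctl activate migration -n <app-namespace>
```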
- Continuous synchronization: Using a Portworx MigrationSchedule, AI applications can be kept in sync with the active training job on AWS or on-prem.
```
storkctl create migrationschedule app-schedule \
  --clusterPair remotecluster \
  --namespaces <app-namespace> \
  --schedule-policy <policy-name>
```
This enables:
- Periodic execution (e.g., every 15 minutes)
- Incremental updates only
- Near-real-time consistency (RPO-driven)
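The <policy-name> above refers to a STORK SchedulePolicy object. A minimal sketch matching the 15-minute cadence mentioned above might look like:
```
cat <<EOF | kubectl apply -f -
apiVersion: stork.libopenstorage.org/v1alpha1
kind: SchedulePolicy
metadata:
  name: <policy-name>
policy:
  interval:
    # Run an incremental migration every 15 minutes
    intervalMinutes: 15
EOF
```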
- Monitor migration: Track migration progress and watch the key fields:
- Stage → Progress (Volumes, Final, etc.)
- Status → Successful/Failed
```
storkctl get migration -n portworx
```
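For deeper troubleshooting, the Migration object is a regular custom resource, so standard Kubernetes inspection applies (the namespace here matches the example above):
```
# Inspect per-volume progress and events for a specific migration
kubectl describe migration migrate-app -n portworx
```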

Figure 4: Cross-cluster Portworx DR replication using ObjectStore for Kubernetes workload recovery across sites.
Common use cases
Smart manufacturing: High-speed defect detection via hybrid AI
By fine-tuning computer vision models in the cloud, manufacturers capitalize on cost-effective, on-demand burst compute. The finalized models are then deployed directly to the factory floor for continuous, on-premises inference powered by NVIDIA RTX Pro GPUs. This hybrid strategy allows real-time processing of heavy, live video feeds ingested directly onto FlashBlade//S to instantly catch product defects—helping guarantee the ultra-low latency required for automated assembly lines while eliminating the exorbitant bandwidth costs of streaming video to the cloud.
Personalized medicine and genomics: Secure drug discovery at scale
By maintaining sensitive genomic data and patient Personally Identifiable Information (PII) on premises on FlashBlade//S, biotech firms ensure ironclad HIPAA compliance and data sovereignty. This hybrid strategy utilizes Red Hat OpenShift and Portworx to securely burst compute-intensive protein-folding simulations to elastic AWS GPU instances during peak research windows. Researchers can then repatriate refined results to the local AI factory for high-performance clinical decision support—providing the massive scale needed for rapid drug discovery while ensuring raw patient records never leave the secure perimeter.
Financial fraud detection: Global edge inference with secure training
Major financial institutions train sophisticated deep learning models on premises using massive transaction histories stored on FlashBlade//S to protect proprietary IP and meet strict data residency requirements. Once optimized, these models are burst to AWS global regions to provide the sub-millisecond inference required for real-time fraud detection in local markets worldwide. This approach eliminates the “one-way street” of cloud egress fees and the risks of moving sensitive data sets, allowing banks to stop fraudulent activity instantly at the edge while keeping the primary training loop secure in their own data center.
Energy sector: Seismic imaging and subsurface modeling
Energy companies move massive volumes of raw 3D seismic imaging data from remote field sites to the cloud for rapid initial triage and automated interpretation using elastic cloud compute. Once high-priority subsurface areas are identified, the multi-petabyte data sets are repatriated to the on-premises AI factory and stored on FlashBlade//S for long-term, high-fidelity refinement and proprietary modeling. This hybrid lifecycle optimizes operational costs by using the cloud for project-based “sprints” while leveraging local high-performance storage to feed NVIDIA-powered clusters for the deep analysis required to minimize drilling risk.
Outcomes and success metrics
Financial: TCO reduction via repatriation
Figure 6: On-prem requires higher upfront investment but breaks even around month 13 and delivers growing cumulative savings versus AWS by year 5.
AWS has a low upfront cost, but cumulative spend rises quickly, reaching roughly $245K by month 60 (about $4.1K per month). On-prem starts higher at about $45K upfront but grows slowly to around $80K by month 60 (roughly $600 per month of ongoing cost). The gap of about $3.5K per month recovers the $45K outlay in roughly 13 months, which is where the on-prem model breaks even; after that, the shaded area represents cumulative savings versus AWS.

Figure 7: As AI workload scale increases, on-prem RTX 6000 Ada shifts from near cost parity to major 3-year TCO savings versus AWS.
This graph compares three-year total cost of ownership for running AI workloads on AWS cloud versus repatriating on-prem with RTX 6000 Ada.
At the smallest workload, AWS is slightly cheaper: $31.4K vs. $34.3K on-prem. But as usage scales, AWS costs grow much faster: At the middle tier, AWS is $149K vs. $65.8K on-prem, saving about $83K. At the largest tier, AWS is $422.2K vs. $113.8K on-prem, saving about $308K.
The key message: Cloud is attractive for low usage, but once the workload is steady and large, owning the GPU infrastructure becomes dramatically cheaper.
Operational: Significant reduction in time to move data sets and models between environments

Figure 8: Portworx and Red Hat OpenShift replace manual, weeks-long cloud repatriation with automated, app-aware migration completed in hours or days.
Before repatriation, moving AI workloads requires manual data copy, YAML/config rewrites, app reconfiguration, debugging, and integrity checks, which can take weeks or months.
With Portworx and Red Hat OpenShift, the process becomes automated through storage migration, infrastructure abstraction, and app-aware migration.
The key message: Cloud repatriation becomes faster, safer, and more repeatable—reducing migration time from weeks/months to hours/days.
Performance: 100% GPU saturation through FlashBlade//S high-throughput architecture
Public cloud storage can create major bottlenecks for GPU workloads. Volume caps, instance throughput limits, noisy neighbors, and throttling can restrict throughput and leave GPU clusters idle or waiting. By contrast, an on-prem FlashBlade//S architecture provides scale-out storage with linear, scalable throughput to GPU servers, helping keep GPUs fully utilized and productive.
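Before pointing a GPU cluster at any storage tier, it is worth measuring delivered throughput directly. An illustrative fio sequential-read test (the mount point and sizing are hypothetical, roughly approximating a multi-worker training data loader) could look like:
```
# Hypothetical sequential-read benchmark against a FlashBlade//S NFS mount
fio --name=seqread --directory=/mnt/flashblade --rw=read \
    --bs=1M --size=10G --numjobs=8 --iodepth=32 \
    --ioengine=libaio --direct=1 --group_reporting
```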
The AI factory with Everpure—powered by Red Hat, NVIDIA, and Portworx—is more than an integrated stack. It’s a strategic control plane for enterprise AI, enabling organizations to repatriate high-cost cloud workloads back on premises, burst seamlessly into the cloud when demand spikes, and place every training or inference workload where cost, performance, and governance align best. This gives enterprises the freedom to scale AI without lock-in, runaway cloud spend, or operational compromise.