TL;DR
I benchmarked EBS and S3 Express One Zone across EC2 and EKS setups to settle a question we kept hearing from customers:
“What’s the best temp storage for ML pipelines?”
The short answer:
- Small files – Use EBS.
- Large files – S3 Express One Zone is your friend.
- Running on EKS – Expect some I/O tax.
Here’s the full story and why it matters to your ML pipelines.

Why I Ran This Test
As more customers build ML training pipelines on AWS, we’ve seen a recurring question:
“Should I use EBS or S3 for my temporary storage?”
It’s not just about cost; it’s about performance. You’ve got data preprocessing, model checkpoints, and intermediate artifacts all hammering your storage layer. One wrong call and your training jobs turn into a bottleneck festival.
So, I ran a benchmark to give you answers backed by data, not guesswork.
Test Setup
I compared four real-world configurations, all running on identical compute instances (c6g.large):
- EC2 + EBS (mounted volume) – standard disk-based I/O
- EC2 + S3 Express One Zone (SDK access via IAM Role) – network-based object storage
- EKS (EC2-based) + PVC (EBS) – Kubernetes orchestration with persistent volume claim
- EKS (EC2-based) + S3 Express One Zone (via Pod Identity) – containerized object access through the S3 SDK
I tested four file sizes – 1 MB, 15 MB, 100 MB, and 1 GB – over a three-day continuous run, measuring both upload and download performance at multiple levels of parallelism.
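To make the methodology concrete, here is a minimal sketch of a harness for this kind of test. It is not the harness behind the numbers in this post; the function names (`local_backend`, `run_case`, `benchmark_matrix`) and the two-files-per-worker default are assumptions made for illustration. It times parallel writes and then parallel reads for each file size and worker count, against any backend exposed as a pair of `write(key, data)` and `read(key)` callables.

```python
import functools
import os
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

MB = 1024 * 1024
# File sizes from the article; the harness itself accepts any list of sizes.
ARTICLE_SIZES_BYTES = [1 * MB, 15 * MB, 100 * MB, 1024 * MB]

WriteFn = Callable[[str, bytes], None]
ReadFn = Callable[[str], bytes]


def _local_write(root: str, key: str, data: bytes) -> None:
    """Write one file under a mounted path (for example an EBS mount or a PVC)."""
    with open(os.path.join(root, key), "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # force data to the device, not just the page cache


def _local_read(root: str, key: str) -> bytes:
    """Read one file back. Note: recent writes may be served from the page cache."""
    with open(os.path.join(root, key), "rb") as f:
        return f.read()


def local_backend(root: str) -> Tuple[WriteFn, ReadFn]:
    """Return (write, read) callables bound to a directory (hypothetical helper)."""
    return functools.partial(_local_write, root), functools.partial(_local_read, root)


def run_case(write: WriteFn, read: ReadFn, size_bytes: int, workers: int,
             files_per_worker: int = 2) -> Dict[str, float]:
    """Time parallel uploads, then parallel downloads; return throughput in MB/s."""
    payload = os.urandom(size_bytes)
    keys = [f"bench-{size_bytes}-{i}.bin" for i in range(workers * files_per_worker)]
    total_mb = len(keys) * size_bytes / MB

    with ThreadPoolExecutor(max_workers=workers) as pool:
        start = time.perf_counter()
        list(pool.map(lambda k: write(k, payload), keys))
        upload_s = max(time.perf_counter() - start, 1e-9)

        start = time.perf_counter()
        blobs = list(pool.map(read, keys))
        download_s = max(time.perf_counter() - start, 1e-9)

    if any(len(b) != size_bytes for b in blobs):
        raise RuntimeError("read back a file with an unexpected size")
    return {
        "size_mb": size_bytes / MB,
        "workers": float(workers),
        "upload_mb_s": total_mb / upload_s,
        "download_mb_s": total_mb / download_s,
    }


def benchmark_matrix(write: WriteFn, read: ReadFn, sizes_bytes: List[int],
                     worker_counts: List[int]) -> List[Dict[str, float]]:
    """Run every (file size, parallelism) combination and collect the results."""
    return [run_case(write, read, size, workers)
            for size in sizes_bytes
            for workers in worker_counts]
```

For the EBS and PVC cases, `local_backend("/mnt/data")` (path hypothetical) is enough. For S3 Express One Zone, the same two callables would wrap SDK calls instead, for example boto3's `put_object` and `get_object` against a directory bucket, so all four configurations run through identical timing code.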
What Surprised Me
🧠 1. EKS Adds Overhead
Using EKS introduces extra network hops, especially when accessing storage.
I saw 10–30% performance degradation compared to EC2, depending on file size and concurrency.
Translation: Orchestration flexibility comes at a cost – raw I/O speed.
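If you want to compute the same kind of figure for your own runs, the arithmetic is a relative throughput loss against the EC2 baseline. A minimal sketch follows; the function name and the sample numbers are hypothetical and are not measurements from this benchmark.

```python
def slowdown_pct(baseline_mb_s: float, measured_mb_s: float) -> float:
    """Percent of throughput lost relative to the baseline (positive means slower)."""
    if baseline_mb_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return (baseline_mb_s - measured_mb_s) / baseline_mb_s * 100.0


# Hypothetical throughputs, for illustration only.
ec2_mb_s = 200.0  # same workload on plain EC2
eks_mb_s = 160.0  # same workload inside an EKS pod
print(f"{slowdown_pct(ec2_mb_s, eks_mb_s):.1f}% slower")  # prints "20.0% slower"
```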
⚙️ 2. EBS Dominates with Small Files
If your workload deals with tons of small files (like logs, model shards, or preprocessed features under 1 MB), EBS is faster. Period.
EKS-backed PVCs still did OK, though they ran 40–80% slower than EC2, and if you’re doing tight loops or real-time I/O, that gap matters.
☁️ 3. S3 Express Wins at Scale
For larger files (100 MB and up), S3 Express One Zone crushed it, especially once I scaled up parallel I/O, where its low-latency, single-Availability-Zone design showed its strength.
It’s also cheaper and plays nicer with lifecycle policies. Great for temp storage and checkpoints.
Pros and Cons Cheat Sheet
| Method | Pros | Cons |
| --- | --- | --- |
| EC2 + EBS | Fastest for small files, predictable low latency | Costlier, less scalable |
| EC2 + S3 Express | Great for large files, cheaper, easy replication | Some network latency |
| EKS + PVC (EBS) | Kubernetes-native simplicity | 10–30% slower than EC2 + EBS |
| EKS + S3 Express | Highly scalable, multi-pod access, cross-account ready | Slight throughput hit vs. EC2 |
What This Means for You
If you’re running ML pipelines on AWS, here’s the simple breakdown:
Use EBS (or PVC) for:
- Frequent small-file reads/writes
- Single-node operations
Use S3 Express One Zone for:
- Staging large datasets
- Model checkpoints
- Multi-node or cross-account scenarios
And in many cases? Use both. Let S3 handle the heavy payloads, while EBS deals with high-churn temp data. Optimize for cost and performance.
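The "use both" rule can be written down as a small routing function. This is a sketch under stated assumptions: the 1 MB and 100 MB cutoffs come from the file sizes discussed above, but the function name, the return labels, and the behavior for mid-sized files (which the breakdown above does not cover, so the caller picks a default) are all hypothetical.

```python
MB = 1024 * 1024
SMALL_FILE_MAX = 1 * MB    # below this, EBS was faster in the results above
LARGE_FILE_MIN = 100 * MB  # at or above this, S3 Express One Zone was faster


def choose_storage(size_bytes: int, shared_across_nodes: bool = False,
                   mid_range_default: str = "ebs") -> str:
    """Pick a temp-storage target for one artifact; returns "ebs" or "s3-express"."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if shared_across_nodes:
        # Multi-node or cross-account access: an EBS volume is the wrong fit.
        return "s3-express"
    if size_bytes < SMALL_FILE_MAX:
        return "ebs"
    if size_bytes >= LARGE_FILE_MIN:
        return "s3-express"
    # Assumption: no clear winner is stated for 1 MB to 100 MB, so defer to the caller.
    return mid_range_default


print(choose_storage(200 * 1024))                           # ebs
print(choose_storage(1024 * MB))                            # s3-express
print(choose_storage(200 * 1024, shared_across_nodes=True)) # s3-express
```

For the mid-range, the safest default is whichever backend wins when you benchmark your own file mix.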
One Less Thing You Have to Benchmark
I ran this benchmark because customers kept asking, and frankly, so would we if we were in your shoes. Now you’ve got the data without spending your own compute budget.
If you’re building complex training pipelines and want to avoid the storage tax traps, steal our setup. Or better yet – talk to us. We’ve already walked this path.
Evgeny, Solution Architect