Benchmark
Environment
- The EC2 instance and the S3 bucket are located in the same region
fio
Create a 100GB test file:
fio --name=create_100gb_file \
--filename=/mnt/fuse/100gb \
--ioengine=libaio \
--direct=1 \
--group_reporting \
--fallocate=none \
--create_on_open=1 \
--end_fsync=1 \
--size=100000M \
--rw=write \
--bs=10M \
--numjobs=1
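The size suffixes in these commands are easy to misread, so here is a small sketch of the arithmetic. It assumes fio's default `kb_base=1024`, under which the `M` and `G` suffixes are powers of two; `fio_size_bytes` is a hypothetical helper written for this illustration, not part of fio or mapfs.

```shell
#!/usr/bin/env bash
# Sketch: convert a fio-style size (count + suffix) to bytes,
# assuming fio's default kb_base=1024 (K=2^10, M=2^20, G=2^30).
fio_size_bytes() {
  local count=$1 suffix=$2 factor
  case "$suffix" in
    K) factor=$((1024)) ;;
    M) factor=$((1024 * 1024)) ;;
    G) factor=$((1024 * 1024 * 1024)) ;;
    *) echo "unsupported suffix: $suffix" >&2; return 1 ;;
  esac
  echo $((count * factor))
}

# --size=100000M, as used for the test file above
fio_size_bytes 100000 M
# --size=100G, as used by the write benchmark
fio_size_bytes 100 G
```

Under that assumption, `--size=100000M` is slightly smaller than `--size=100G`, which matters only if the two files are expected to be the same size.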
Set up the cache volume and load the test file into the local cache:
$ lsblk
$ sudo mkfs.xfs /dev/nvme1n1
$ sudo mkdir /mnt/fuse /data
$ sudo chmod 0777 /mnt/fuse /data -R
$ sudo mount /dev/nvme1n1 /data
$ sudo mapfs add vol_benchmark aws <AWSAccessKey> <AWSSecretKey> <S3-BucketName> <Region> cache_dir=/data
$ mapfs load /mnt/fuse/100gb
Read Performance with cache:
Network Write Performance:
fio --name=write_benchmark \
--directory=/mnt/fuse \
--ioengine=libaio \
--direct=1 \
--group_reporting \
--fallocate=none \
--create_on_open=1 \
--end_fsync=1 \
--runtime=60 \
--time_based \
--size=100G \
--rw=write \
--bs=<1M|4M|8M> \
--numjobs=<1|32|128>
Note:
After each fio test, delete the large test files that fio generated from cloud storage:
$ rm -f /mnt/fuse/write_benchmark.*
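The write benchmark and the cleanup step above can be combined into one sweep over the listed block sizes and job counts. This is a minimal sketch, not the script used to produce the results: `write_benchmark_cmd`, `run_sweep`, `MOUNT_DIR`, and `DRY_RUN` are names introduced here for illustration, and the fio options are copied from the command above.

```shell
#!/usr/bin/env bash
# Sketch: sweep the write benchmark over block sizes and job counts.
MOUNT_DIR=${MOUNT_DIR:-/mnt/fuse}   # mount point used in this document
DRY_RUN=${DRY_RUN:-1}               # hypothetical switch: 1 only prints commands

# Print the fio command line for one (block size, numjobs) combination.
write_benchmark_cmd() {
  local bs=$1 jobs=$2
  echo "fio --name=write_benchmark --directory=$MOUNT_DIR" \
       "--ioengine=libaio --direct=1 --group_reporting" \
       "--fallocate=none --create_on_open=1 --end_fsync=1" \
       "--runtime=60 --time_based --size=100G --rw=write" \
       "--bs=$bs --numjobs=$jobs"
}

# Run (or print) every combination, deleting fio's files after each run.
run_sweep() {
  local bs jobs cmd
  for bs in 1M 4M 8M; do
    for jobs in 1 32 128; do
      cmd=$(write_benchmark_cmd "$bs" "$jobs")
      if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
      else
        # Word splitting is intentional; assumes MOUNT_DIR has no spaces.
        $cmd
        rm -f "$MOUNT_DIR"/write_benchmark.*
      fi
    done
  done
}
```

Calling `run_sweep` with the default `DRY_RUN=1` prints the nine command lines for review; setting `DRY_RUN=0` runs them and cleans up after each one.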
c6id Series: performance overview
Each instance type has one NVMe SSD instance store volume.
| Instance Type | Baseline IOPS | Peak IOPS | Baseline Throughput (MB/s) | Peak Throughput (MB/s) | Baseline Bandwidth (Mbps) | Peak Bandwidth (Mbps) |
| c6id.large | 3600 | 40000 | 81.25 | 1250 | 650 | 10000 |
| c6id.xlarge | 6000 | 40000 | 156.25 | 1250 | 1250 | 10000 |
| c6id.2xlarge | 12000 | 40000 | 312.5 | 1250 | 2500 | 10000 |
| c6id.4xlarge | 20000 | 40000 | 625 | 1250 | 5000 | 10000 |
| c6id.8xlarge | 40000 | 40000 | 1250 | 1250 | 10000 | 10000 |
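The throughput and bandwidth columns in this table appear to state the same limit in two units, megabytes per second and megabits per second. The sketch below shows the conversion; that reading is inferred from the numbers, and `mbps_to_mbs` is a hypothetical helper written for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: convert megabits per second to megabytes per second (8 bits per byte).
mbps_to_mbs() {
  awk -v mbps="$1" 'BEGIN { printf "%g\n", mbps / 8 }'
}

# Baseline and peak bandwidth for c6id.large, taken from the table above
mbps_to_mbs 650
mbps_to_mbs 10000
```

Dividing each bandwidth value by 8 gives the throughput value in the same row, for example 650 Mbps and 81.25 MB/s for c6id.large.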
Benchmark Results
c6id.large
| Network Read | |
| Speed | 70.8 MiB/s |
| CPU(cores) | 57.48% |
| RSS(KB) | 404785 |
| %MEM | 10.36% |
| Throughput (bs=1M) | Sequential, numjobs=1 | Sequential, numjobs=32 | Sequential, numjobs=128 | Random, numjobs=1 | Random, numjobs=32 | Random, numjobs=128 |
| Bandwidth | 152 MiB/s | 4839 MiB/s | 6440 MiB/s | 152MiB/s | 152MiB/s | 152MiB/s |
| CPU (cores) | 3.36% | 75.22% | 162.88% | 3% | 4% | 4% |
| RSS(KB) | 390373 | 408380 | 410700 | 224360 | 232953 | 233083 |
| %MEM | 10% | 10.45% | 10.51% | 6% | 6% | 6% |
| IOPS (4k block size) | Sequential, numjobs=1 | Sequential, numjobs=32 | Sequential, numjobs=128 | Random, numjobs=1 | Random, numjobs=32 | Random, numjobs=128 |
| IOPS | 39k | 182k | 132k | 8.2k | 34.1k | 34.2k |
| CPU (cores) | 31.68% | 93.89% | 80% | 27.2% | 97.96% | 101.48% |
| RSS(KB) | 397051 | 418732 | 403244 | 212342 | 213160 | 213258 |
| %MEM | 10.16% | 10.72% | 10.32% | 5.43% | 5.46% | 5.46% |
| WRITE | bs=1MB, numjobs=1 | bs=1MB, numjobs=32 | bs=4MB, numjobs=1 | bs=4MB, numjobs=32 |
| Bandwidth | 791MiB/s | 831MiB/s | 986MiB/s | 850MiB/s |
| CPU (cores) | 152.61% | 140% | 161.14% | 145.84% |
| RSS(KB) | 663370 | 674742 | 731934 | 697581 |
| %MEM | 16.98% | 17.27% | 18.73% | 17.85% |
c6id.xlarge
4 vCPU, 8GB RAM
On-Demand Linux pricing: 0.231 USD per Hour
Network: baseline 1.25Gbps (156MB/s), up to 12.5Gbps; VPC credits can be exhausted within one minute
NVME SSD: 1 x 237 GiB Instance Store
| Network Read | |
| Speed | 141.8 MiB/s |
| CPU(cores) | 85% |
| RSS(KB) | 630752 |
| %MEM | 7.92% |
| Throughput (bs=1M) | Sequential, numjobs=1 | Sequential, numjobs=32 | Sequential, numjobs=128 | Random, numjobs=1 | Random, numjobs=32 | Random, numjobs=128 |
| Bandwidth | 305 MiB/s | 9754 MiB/s | 11.3 GiB/s | 309 MiB/s | 305 MiB/s | 305 MiB/s |
| CPU (cores) | 6.62% | 177% | 312% | 8.64% | 9.72% | 8.90% |
| RSS(KB) | 717340 | 718324 | 742341 | 316319 | 316609 | 316707 |
| %MEM | 9.01% | 9.02% | 9.32% | 3.97% | 3.97% | 3.98% |
| IOPS (4k block size) | Sequential, numjobs=1 | Sequential, numjobs=32 | Sequential, numjobs=128 | Random, numjobs=1 | Random, numjobs=32 | Random, numjobs=128 |
| IOPS | 53.1k | 319k | 198k | 8.2k | 68.2k | 68.3k |
| CPU (cores) | 41% | 199% | 172% | 26% | 223% | 236% |
| RSS(KB) | 719117 | 697986 | 473843 | 307216 | 308542 | 309972 |
| %MEM | 9.03% | 8.76% | 5.95% | 3.86% | 3.87% | 3.89% |
| WRITE | bs=1MB, numjobs=1 | bs=1MB, numjobs=32 | bs=4MB, numjobs=1 | bs=4MB, numjobs=32 |
| Bandwidth | 917 MiB/s | 1200 MiB/s | 1144 MiB/s | 879 MiB/s |
| CPU (cores) | 163% | 180% | 191% | 165% |
| RSS(KB) | 1119798 | 1073321 | 1061085 | 1066169 |
| %MEM | 14.06% | 13.47% | 13.32% | 13.38% |
c6id.2xlarge
8 vCPU, 16GB RAM
On-Demand Linux pricing: 0.4620 USD per Hour
Network: Up to 12.5Gbps
NVME SSD: 1 x 474 GiB Instance Store
| Network Read | |
| Speed | 283.3 MiB/s |
| CPU(cores) | 112% |
| RSS(KB) | 753241 |
| %MEM | 4.68% |
| Throughput (bs=1M) | Sequential, numjobs=1 | Sequential, numjobs=32 | Sequential, numjobs=128 | Random, numjobs=1 | Random, numjobs=32 | Random, numjobs=128 |
| Bandwidth | 615MiB/s | 19.0GiB/s | 18.8GiB/s | 612MiB/s | 610MiB/s | 610MiB/s |
| CPU (cores) | 17% | 427% | 589% | 18% | 20% | 22% |
| RSS(KB) | 721862 | 778279 | 814246 | 258948 | 258213 | 260116 |
| IOPS (4k block size) | Sequential, numjobs=1 | Sequential, numjobs=32 | Sequential, numjobs=128 | Random, numjobs=1 | Random, numjobs=32 | Random, numjobs=128 |
| IOPS | 51.2k | 472k | 282k | 7.7k | 116k | 103k |
| CPU (cores) | 43% | 384% | 347% | 30% | 385% | 391% |
| RSS(KB) | 788131 | 775740 | 514511 | 242278 | 244264 | 244523 |
| WRITE | bs=1MB, numjobs=1 | bs=1MB, numjobs=32 | bs=4MB, numjobs=1 | bs=4MB, numjobs=32 |
| Bandwidth | 1319MiB/s | 1254MiB/s | 1255MiB/s | 1250MiB/s |
| CPU (cores) | 186% | 183% | 193% | 182% |
| RSS(KB) | 1999296 | 2032385 | 2076298 | 2052195 |
| %MEM | 12.43% | 12.64% | 12.91% | 12.76% |
c6id.4xlarge
16 vCPU, 32GB RAM
On-Demand Linux pricing: 0.9240 USD per Hour
Network: Up to 12.5Gbps
NVME SSD: 1 x 950 GiB Instance Store
| Network Read | |
| Speed | 568.2 MiB/s |
| CPU(cores) | 125% |
| RSS(KB) | 796710 |
| %MEM | 2.47% |
| Throughput (bs=1M) | Sequential, numjobs=1 | Sequential, numjobs=32 | Sequential, numjobs=128 | Random, numjobs=1 | Random, numjobs=32 | Random, numjobs=128 |
| Bandwidth | 1202MiB/s | 38.0GiB/s | 45.8GiB/s | 1222MiB/s | 1221MiB/s | 1221MiB/s |
| CPU (cores) | 29% | 645% | 1236% | 26% | 31% | 32% |
| RSS(KB) | 759424 | 844728 | 932179 | 352842 | 352286 | 350937 |
| IOPS (4k block size, iouring) | Sequential, numjobs=1 | Sequential, numjobs=32 | Sequential, numjobs=128 | Random, numjobs=1 | Random, numjobs=32 | Random, numjobs=128 |
| IOPS | 147k | 1010k | 1124k | 9.5k | 112k | 124k |
| CPU(cores) | 41% | 591% | 705% | 17% | 250% | 285% |
| RSS(KB) | 719477 | 742808 | 751704 | 208218 | 209705 | 209728 |
| WRITE | bs=1MB, numjobs=1 | bs=1MB, numjobs=32 | bs=4MB, numjobs=1 | bs=4MB, numjobs=32 | bs=8MB, numjobs=1 | bs=8MB, numjobs=32 |
| Bandwidth | 1429MiB/s | 1370MiB/s | 1375MiB/s | 1380MiB/s | 1387MiB/s | 1051MiB/s |
| CPU (cores) | 169% | 174% | 190% | 180% | 206% | 162% |
| CPU (usr+sys) | ||||||
| RSS(KB) | 2447013 | 2532508 | 2617603 | 2576234 | 2617686 | 2612897 |
| %MEM | 7.57% | 7.84% | 8.10% | 7.97% | 8.10% | 8.09% |
c6id.8xlarge
32 vCPU, 64GB RAM
On-Demand Linux pricing: 1.8480 USD per Hour
Network: 12.5Gbps
NVME SSD: 1 x 1900 GiB Instance Store
| Network Read | |
| Speed | 1124 MiB/s |
| CPU(cores) | 150% |
| RSS(KB) | 1194014 |
| %MEM | 1.84% |
| Throughput (bs=1M) | Sequential, numjobs=1 | Sequential, numjobs=32 | Sequential, numjobs=128 | Random, numjobs=1 | Random, numjobs=32 | Random, numjobs=128 |
| Bandwidth | 1671 MiB/s | 51.8 GiB/s | 80.4 GiB/s | 1471 MiB/s | 2487 MiB/s | 2482 MiB/s |
| CPU (cores) | 35% | 820% | 2666% | 36% | 68% | 71% |
| CPU (usr+sys) | ||||||
| RSS(KB) | 906482 | 925460 | 888389 | 442029 | 445317 | 445010 |
| %MEM | 1.40% | 1.43% | 1.37% | 0.68% | 0.69% | 0.69% |
| IOPS (4k block size, iouring) | Sequential, numjobs=1 | Sequential, numjobs=32 | Sequential, numjobs=128 | Random, numjobs=1 | Random, numjobs=32 | Random, numjobs=128 |
| IOPS | 147k | 829k | 818k | 9.6k | 165k | 201k |
| CPU(cores) | 40% | 1893% | 1889% | 17% | 345% | 432% |
| CPU(usr+sys) | ||||||
| RSS(KB) | 721082 | 751799 | 731597 | 262532 | 262532 | 262532 |
| %MEM | 1.11% | 1.16% | 1.13% | 0.41% | 0.41% | 0.41% |
| WRITE | bs=1MB, numjobs=1 | bs=1MB, numjobs=32 | bs=4MB, numjobs=1 | bs=4MB, numjobs=32 | bs=8MB, numjobs=1 | bs=8MB, numjobs=32 |
| Bandwidth | 1460MiB/s | 1457MiB/s | 1453MiB/s | 1419MiB/s | 1399MiB/s | 1434MiB/s |
| CPU (cores) | 166% | 167% | 192% | 185% | 197% | 186% |
| CPU (usr+sys) | ||||||
| RSS(KB) | 2443310 | 2584623 | 2721486 | 2673981 | 2726733 | 2731521 |
| %MEM | 3.77% | 3.99% | 4.20% | 4.13% | 4.21% | 4.22% |
Performance Notes
1. On c6id.large, xlarge, 2xlarge, and 4xlarge, EBS and network performance each have a baseline level and a higher burst limit.
2. "mapfs mount <VolumeName> <MountPoint>" mounts in traditional mode by default.
To enable "fuse over iouring" mode, specify "iouring" when mounting:
$ mapfs mount <VolumeName> <MountPoint> iouring
Note:
Specifying iouring does not guarantee that "fuse over iouring" is enabled. The mode also requires Linux kernel version >= 6.18, which is typically available in these distributions:
- Amazon Linux 2023, kernel 6.18
- Ubuntu 26.04
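The kernel requirement in the note above can be checked before mounting. The following is a minimal sketch: `kernel_at_least` is a hypothetical helper written for this illustration, it only compares the major and minor version numbers reported by `uname -r`, and passing the check does not by itself confirm that "fuse over iouring" is active.

```shell
#!/usr/bin/env bash
# Sketch: test whether a kernel release string is at least MAJOR.MINOR.
# Usage: kernel_at_least "<release>" <major> <minor>
kernel_at_least() {
  local release=$1 want_major=$2 want_minor=$3
  local major=${release%%.*}          # text before the first dot
  local rest=${release#*.}            # text after the first dot
  local minor=${rest%%[!0-9]*}        # leading digits of the remainder
  (( major > want_major || (major == want_major && minor >= want_minor) ))
}

if kernel_at_least "$(uname -r)" 6 18; then
  echo "kernel version meets the 6.18 requirement"
else
  echo "kernel version is below 6.18; expect the traditional mount mode"
fi
```

Comparing numerically rather than as strings matters here, since a release such as 6.8 would otherwise sort after 6.18.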
3. Advantages and limitations of iouring mode
Traditional mount: When the CPU count exceeds 16 and IOPS exceeds 300K, contention on a single lock can cause CPU usage to spike while I/O performance fails to improve or even degrades sharply. Adding more CPUs makes the degradation worse because of scheduling pressure, frequent L3 cache invalidations, and CPU time spent spinning on the single kernel spinlock.
iouring mode mount: Removes the single-lock contention and scales better when the CPU count exceeds 16 and IOPS exceeds 300K.
Its limitation is that when IOPS exceeds 1M and the CPU count reaches 32 or more, fuse uring threads may degrade into polling mode, causing high CPU usage without a corresponding increase in I/O performance.
4. Common benchmark bottlenecks:
- EBS throughput and IOPS limits
- c6id instances include an instance store NVMe SSD. If you use another instance type, such as t3.small with an attached gp3 SSD volume, you are limited by gp3 throughput and IOPS.
- Network bandwidth
- If you use burstable EC2 instances (such as the t3 series), CPU performance may fluctuate between the baseline and burst limits.
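One way to judge whether network bandwidth is the limiting factor is to compare a measured rate against the baseline bandwidth of the instance type. The sketch below does that arithmetic; it assumes the Network Read speed in the results is directly comparable to the Baseline Bandwidth column of the overview table, and `baseline_utilization` is a hypothetical helper written for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: express a measured rate (MiB/s) as a percentage of a baseline (Mbps).
# Usage: baseline_utilization <measured MiB/s> <baseline Mbps>
baseline_utilization() {
  awk -v mib="$1" -v mbps="$2" 'BEGIN {
    measured_mbs = mib * 1048576 / 1000000   # MiB/s to MB/s
    baseline_mbs = mbps / 8                  # Mbps to MB/s
    printf "%.1f\n", 100 * measured_mbs / baseline_mbs
  }'
}

# c6id.xlarge: Network Read 141.8 MiB/s against a 1250 Mbps baseline
baseline_utilization 141.8 1250
```

A result close to 100 suggests the run was bounded by the baseline rather than by the burst limit, which is consistent with burst credits being exhausted early in a long transfer.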