Next-Gen Reference Architecture: KIOXIA CD9P-R, Micron 7600 PRO, Xeon 6, and 400GbE
Over a year ago, we published our first reference architecture with KIOXIA, a 2U building block based on the Supermicro SYS-211GT-HNTR, KIOXIA CD8P-R NVMe drives, and ConnectX-6 at 2×100GbE per node. It proved the Enakta Storage Platform could deliver ~200 GiB/s reads from a single 2U chassis on production-grade hardware.
Since then, every component in the stack has a next-generation successor. The drives are faster and denser. The CPUs have more cores and memory bandwidth. And critically, the network has jumped from 100GbE to 400GbE, removing what was previously the bottleneck in the system.
This is the updated reference architecture.
The Building Block
The design philosophy is the same: a dense 2U chassis with 4 independent single-socket nodes, each running the DAOS core as part of a unified storage cluster. The Enakta Storage Platform orchestrates the entire stack, from provisioning to monitoring to recovery.
| Component | Specification |
|---|---|
| Chassis | Supermicro SYS-212GT-HNR (2U, 4 nodes, GrandTwin) |
| CPU | Intel Xeon 6 (P-core), up to 86 cores per node |
| Memory | 1TB DDR5-6400 ECC Registered per node |
| Storage | 6 × NVMe 15.36TB per node (KIOXIA CD9P-R or Micron 7600 PRO) |
| Network | 2 × NVIDIA ConnectX-7 400GbE (800 Gbps per node) |
| Boot | PXE (stateless, no local boot drives) |
Every node boots via PXE with immutable OS images. No local boot drives, no state to manage, no disks to replace when a boot drive fails. The Enakta Storage Platform provisions and manages the entire cluster lifecycle.
Memory is set at 1TB per node. The DAOS core currently uses DRAM for metadata, but the roadmap includes metadata-on-NVMe (MD-on-SSD stage 2) which will reduce memory requirements further, lowering per-node cost without sacrificing performance.
Each node has 2 PCIe 5.0 x16 slots, one ConnectX-7 in each, giving every node 800 Gbps of network bandwidth. Across the 2U chassis, that's 3.2 Tbps of aggregate network capacity.
KIOXIA CD9P-R: 8th Gen BiCS FLASH
The CD9P-R is the direct successor to the CD8P-R we used in our original reference architecture. Built on KIOXIA's 8th Generation BiCS FLASH with CBA (CMOS directly Bonded to Array) architecture, it delivers meaningful gains across the board:
| Spec | CD8P-R (Previous) | CD9P-R (New) | Change |
|---|---|---|---|
| Max Capacity | 30.72 TB | 61.44 TB | +2× |
| Sequential Read | 12,000 MB/s | 14,800 MB/s | +23% |
| Sequential Write | 5,500 MB/s | 7,000 MB/s | +27% |
| Random Read IOPS | 2,000K | 2,600K | +30% |
| NAND Generation | 5th Gen TLC | 8th Gen TLC (CBA) | +3 gens |
| Form Factors | 2.5", E3.S | 2.5", E3.S | unchanged |
We use the 15.36 TB capacity point as the baseline for this reference architecture because it delivers the highest per-drive sequential read performance at 14,800 MB/s. For deployments that need more density, the CD9P-R is also available at 30.72 TB (13,500 MB/s read) and 61.44 TB per drive.
Performance: The DAOS Core at 90% of NVMe Datasheet
The DAOS core operates entirely in user-space with zero kernel I/O overhead. Through direct NVMe access via SPDK and RDMA-based networking, the Enakta Platform consistently extracts approximately 90% of the underlying NVMe datasheet performance and delivers it over the network to clients.
Here's what that means for the new building block with 6× 15.36 TB drives per node, depending on drive choice:
| Metric | KIOXIA CD9P-R 15.36TB | Micron 7600 PRO 15.36TB |
|---|---|---|
| Sequential Read (per drive) | 14,800 MB/s | 12,000 MB/s |
| Sequential Write (per drive) | 7,000 MB/s | 7,000 MB/s |
| Random Read IOPS (per drive) | 2,600K | 2,100K |
At the 15.36 TB capacity point, the CD9P-R leads on sequential reads (+23%) and random read IOPS (+24%), while both drives match at 7,000 MB/s sequential writes. The CD9P-R is the stronger all-round option at this capacity, particularly for read-heavy workloads like media streaming, AI inference, and dataset serving. The 7600 PRO remains a solid alternative with broad ecosystem support and identical write performance.
Write Amplification from Data Protection
The raw throughput numbers above reflect what the DAOS core can push to and from the drives. But client-visible write throughput depends on the data protection scheme, because every write generates additional data for parity or replication:
- 6+2 erasure coding: 8 chunks written for every 6 of client data, so client sees 75% of raw write throughput
- 3-way replication: 3 copies of every write, so client sees 33% of raw write throughput
- No protection: 1:1, client sees full raw write throughput (not recommended for production)
Reads are unaffected in normal operation. The DAOS core reads only the data chunks, not parity. The numbers below show both raw drive throughput and client-visible writes with 6+2 EC, which is the most common production configuration.
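To make the overhead concrete, here's a minimal sketch that scales raw write throughput by each scheme's factor. This is purely illustrative arithmetic based on the ratios above; the function and scheme names are ours, not a platform API:

```python
# Client-visible write throughput under each protection scheme.
# Illustrative sketch using the ratios above; names are ours.

def client_write_throughput(raw_gb_s: float, scheme: str) -> float:
    """Scale raw write throughput by the protection scheme's write overhead."""
    factors = {
        "ec_6p2": 6 / 8,  # 8 chunks written per 6 chunks of client data (75%)
        "rep_3x": 1 / 3,  # 3 full copies of every write (~33%)
        "none": 1.0,      # 1:1, not recommended for production
    }
    return raw_gb_s * factors[scheme]

# One CD9P-R node writes ~37.8 GB/s raw (6 drives x 7.0 GB/s x 90%)
raw_per_node = 6 * 7.0 * 0.9
print(f"6+2 EC: {client_write_throughput(raw_per_node, 'ec_6p2'):.1f} GB/s")
print(f"3-way:  {client_write_throughput(raw_per_node, 'rep_3x'):.1f} GB/s")
```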
Per 2U Chassis (4 nodes, 24 drives), KIOXIA CD9P-R:
- Read: ~320 GB/s (90% of 24×14.8 GB/s)
- Raw write: ~151 GB/s (90% of 24×7.0 GB/s)
- Client write: ~113 GB/s (6+2 EC, 75% of raw)
Per 2U Chassis (4 nodes, 24 drives), Micron 7600 PRO:
- Read: ~259 GB/s (90% of 24×12 GB/s)
- Raw write: ~151 GB/s (90% of 24×7 GB/s)
- Client write: ~113 GB/s (6+2 EC, 75% of raw)
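These chassis figures can be re-derived directly from the datasheet numbers. A quick sketch (our helper, using the 90% DAOS-core efficiency and 6+2 EC factors from this post):

```python
# Per-2U throughput from datasheet numbers: 24 drives, 90% DAOS-core
# efficiency, 75% client-visible writes under 6+2 erasure coding.
DRIVES_PER_2U = 24
EFFICIENCY = 0.90
EC_FACTOR = 6 / 8  # 6+2 erasure coding

def chassis_numbers(read_gb_s: float, write_gb_s: float) -> dict:
    raw_read = DRIVES_PER_2U * read_gb_s * EFFICIENCY
    raw_write = DRIVES_PER_2U * write_gb_s * EFFICIENCY
    return {
        "read": round(raw_read),        # GB/s; reads carry no parity overhead
        "raw_write": round(raw_write),  # GB/s into the drives
        "client_write": round(raw_write * EC_FACTOR),  # GB/s seen by clients
    }

print(chassis_numbers(14.8, 7.0))  # KIOXIA CD9P-R 15.36 TB
print(chassis_numbers(12.0, 7.0))  # Micron 7600 PRO 15.36 TB
```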
The Network Is No Longer the Bottleneck
This is the single most important change from the previous generation.
In our 2024 reference architecture, each node had 2×100GbE, or 200 Gbps, roughly 25 GB/s of network capacity. But the six CD8P-R drives could deliver over 64 GB/s of read throughput from the DAOS core. The drives could push far more data than the network could carry. The network was the ceiling.
With 2×400GbE ConnectX-7, each node now has 800 Gbps, or 100 GB/s of network capacity. With the CD9P-R at 15.36 TB, the DAOS core delivers ~80 GB/s reads per node, consuming 80% of available network bandwidth. With the Micron 7600 PRO, it's ~65 GB/s reads per node (65%). Both drives write at ~38 GB/s per node (38% of network). Either way, there's substantial headroom for metadata operations, erasure coding rebuild traffic, and replication, without ever contending with client I/O.
The drives are now the bottleneck, not the network. That's exactly where you want the constraint to be in a storage system. Every byte the NVMe can deliver reaches the client. No stranded drive performance. No wasted hardware spend.
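The headroom arithmetic is simple enough to sketch (our helper; 800 Gbps of Ethernet is 100 GB/s of payload capacity per node, ignoring protocol overhead):

```python
# Per-node network utilisation: does drive-side read throughput fit in
# 2x400GbE? Sketch using this post's figures; helper name is ours.
NET_GBPS = 2 * 400        # two ConnectX-7 400GbE ports per node
NET_GB_S = NET_GBPS / 8   # 100 GB/s of network capacity per node

def node_utilisation(per_drive_read_gb_s: float, drives: int = 6,
                     efficiency: float = 0.90) -> float:
    """Fraction of node network bandwidth consumed by full-rate reads."""
    return drives * per_drive_read_gb_s * efficiency / NET_GB_S

print(f"CD9P-R:   {node_utilisation(14.8):.0%}")
print(f"7600 PRO: {node_utilisation(12.0):.0%}")
```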
And for environments that need even more, RoCEv2 and InfiniBand fabrics are fully supported. The same DAOS core that runs on TCP/Ethernet scales to NDR InfiniBand at 400 Gb/s per port and beyond, up to non-blocking fabrics with thousands of nodes.
Comparison: 2024 vs 2026 Reference Architecture
| Metric (per 2U) | 2024 Architecture | 2026 (CD9P-R) | 2026 (7600 PRO) |
|---|---|---|---|
| Chassis | SYS-211GT-HNTR | SYS-212GT-HNR | SYS-212GT-HNR |
| CPU | Xeon Gold 6430 (32C) | Xeon 6 P-core (up to 86C) | Xeon 6 P-core (up to 86C) |
| NVMe Drives | 6× CD8P-R 15TB /node | 6× CD9P-R 15.36TB | 6× 7600 PRO 15.36TB |
| Network per Node | 2×100GbE (CX-6) | 2×400GbE (CX-7) | 2×400GbE (CX-7) |
| Read Throughput | ~259 GB/s | ~320 GB/s | ~259 GB/s |
| Raw Write Throughput | ~119 GB/s | ~151 GB/s | ~151 GB/s |
| Client Write (6+2 EC) | ~89 GB/s | ~113 GB/s | ~113 GB/s |
| Raw Capacity | 360 TB | 369 TB | 369 TB |
| Network Bandwidth | 800 Gbps | 3,200 Gbps | 3,200 Gbps |
| Network Bottleneck? | Yes, drives outpace NICs | No, 20% headroom | No, 35% headroom |
Linear Scaling: Proven, Not Theoretical
The DAOS core scales linearly. This isn't a marketing claim. It's a property of the architecture. There are no central metadata servers, no global locks, no single points of contention. Every node added to the cluster adds its full share of throughput, IOPS, and capacity. The same engine scales from 4 nodes to over 1,000 in production at Argonne National Laboratory's Aurora exascale supercomputer.
Here's what the 2026 reference architecture looks like as you scale out. Reads shown as CD9P-R / 7600 PRO. Client writes are identical for both drives at 15.36 TB (both do 7,000 MB/s), shown with 6+2 EC:
| Nodes | Chassis (2U) | Read (GB/s) | Client Write (GB/s) | Raw Capacity | Network |
|---|---|---|---|---|---|
| 4 | 1 | 320 / 259 | 113 | 369 TB | 3.2 Tbps |
| 8 | 2 | 640 / 518 | 226 | 738 TB | 6.4 Tbps |
| 20 | 5 | 1,600 / 1,295 | 565 | 1.8 PB | 16 Tbps |
| 40 | 10 | 3,200 / 2,590 | 1,130 | 3.7 PB | 32 Tbps |
| 100 | 25 | 8,000 / 6,475 | 2,825 | 9.2 PB | 80 Tbps |
| 200 | 50 | 16,000 / 12,950 | 5,650 | 18.4 PB | 160 Tbps |
At 25 chassis (50U of rack space, or just over one standard 42U rack), the Enakta Storage Platform delivers up to 8 TB/s of read throughput across 9.2 PB of raw capacity. With 6+2 erasure coding, that's approximately 6.9 PB usable, enough to store and serve every frame of an entire studio's production catalogue at speeds that keep hundreds of editing workstations and render nodes fed simultaneously. For higher density, CD9P-R drives are available at 30.72 TB and 61.44 TB capacity points.
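Because scaling is linear, the whole table reduces to per-node constants multiplied by node count. A sketch (per-node values derived from this post's per-chassis figures divided by four; helper name is ours):

```python
# Linear scaling reduced to per-node constants (values from this post:
# per-chassis figures / 4 nodes). Illustrative sketch, not platform code.
READ_PER_NODE = {"CD9P-R": 80.0, "7600 PRO": 64.75}  # GB/s, 90% of datasheet
CLIENT_WRITE_PER_NODE = 28.25  # GB/s with 6+2 EC (identical for both drives)
RAW_TB_PER_NODE = 6 * 15.36    # six 15.36 TB drives
NET_TBPS_PER_NODE = 0.8        # 2 x 400GbE

def cluster_figures(nodes: int, drive: str = "CD9P-R") -> dict:
    """Whole-cluster figures: per-node constants times node count."""
    return {
        "read_gbs": nodes * READ_PER_NODE[drive],
        "client_write_gbs": nodes * CLIENT_WRITE_PER_NODE,
        "raw_tb": nodes * RAW_TB_PER_NODE,
        "net_tbps": nodes * NET_TBPS_PER_NODE,
    }

for n in (4, 8, 20, 40, 100, 200):
    f = cluster_figures(n)
    print(f"{n:>3} nodes: {f['read_gbs']:,.0f} GB/s read, "
          f"{f['client_write_gbs']:,.0f} GB/s client write, "
          f"{f['raw_tb']:,.0f} TB raw, {f['net_tbps']:.1f} Tbps")
```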
Data Protection at Scale
The Enakta Storage Platform supports flexible data protection including N-way replication and erasure coding. Different datasets on the same cluster can use different protection schemes: 3-way replication for hot working data, 6+2 erasure coding for archive, or any combination.
With 6+2 erasure coding (75% usable capacity):
- ~277 TB usable per 2U chassis (75% of 369 TB)
- ~6.9 PB usable at 100 nodes
- Rebuilds at network speed
When a node fails, the DAOS core rebuilds at network speed, under 10 minutes, not the hours or days that legacy RAID and filesystem rebuilds impose. At 400GbE, rebuild completes even faster than with the previous 100GbE architecture.
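As a back-of-envelope check (our sketch, not a measured figure): rebuild time is roughly the data on the failed node divided by the bandwidth the surviving nodes can bring to bear. The 50% utilisation and 100 GB/s rebuild bandwidth below are assumptions we chose for illustration:

```python
# Back-of-envelope rebuild estimate (our sketch, not a measured figure):
# time = data on the failed node / aggregate rebuild bandwidth.
def rebuild_minutes(node_data_tb: float, rebuild_gb_s: float) -> float:
    return node_data_tb * 1000 / rebuild_gb_s / 60

# Assumed example: a node at 50% utilisation (6 x 15.36 TB drives),
# rebuilt at 100 GB/s -- one node's worth of 2x400GbE bandwidth.
half_full_node_tb = 0.5 * 6 * 15.36
print(f"{rebuild_minutes(half_full_node_tb, 100):.1f} minutes")
```

Under these assumptions the estimate comes out below eight minutes, consistent with the sub-10-minute figure above.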
Platform Management: Highly Available Head Nodes
Every Enakta Storage Platform deployment includes a dedicated management chassis: one 2U block using the same GrandTwin form factor. Three of the four nodes run in a high-availability cluster, with the fourth as a cold spare ready for immediate failover. A single management chassis can manage multiple storage clusters in the same datacentre, so you deploy it once and it grows with you.
| Component | Specification |
|---|---|
| Chassis | Supermicro SYS-212GT-HNR (2U, 4 nodes) |
| Role | 3 nodes HA cluster + 1 cold spare |
| CPU | Xeon 6 (16 cores per node) |
| Memory | 128 GB per node |
| Storage | 2 × NVMe 3.84 TB per node |
| Network | Management network + storage fabric access |
The management nodes run the full Enakta platform stack: PXE boot services for bare-metal provisioning, the web management interface, logging, metrics, alerting, and cluster orchestration. Every subsystem is replicated across the three active nodes, so losing any single management node has zero impact on operations. Storage I/O continues uninterrupted regardless of management node state.
The local NVMe drives store PXE images, OS images, log archives, and metrics history. At 3.84 TB per node, there's plenty of room for long-term retention without relying on the storage cluster itself.
For a typical starting deployment (2 storage chassis + 1 management chassis), that's 6U total: compact enough for a single quarter-rack, with the full enterprise platform running from the first boot. As you add more storage clusters, the same management block handles them all.
Looking Ahead: PCIe 6.0 with Micron 9650
The Micron 9650 is the industry's first PCIe Gen 6.0 data centre SSD. It requires a PCIe Gen6-capable platform (not yet available in the GrandTwin form factor), but the numbers show where the next architectural jump is heading:
| Spec | KIOXIA CD9P-R | Micron 9650 PRO |
|---|---|---|
| Interface | PCIe 5.0 | PCIe 6.0 |
| Max Capacity | 61.44 TB | 30.72 TB |
| Sequential Read | 13,500 MB/s | 28,000 MB/s |
| Sequential Write | 7,000 MB/s | 14,000 MB/s |
| Random Read IOPS | 2,600K | 5,500K |
| DWPD | 1 | 1 |
| Form Factors | 2.5", E3.S | E1.S, E3.S |
At 28 GB/s reads per drive, a future 6-drive-per-node configuration would deliver 168 GB/s of raw read throughput per node. At the DAOS core's 90% efficiency, that's roughly 151 GB/s per node, or 604 GB/s per 2U, more than double the current Gen5 architecture. When PCIe Gen6 platforms ship in the GrandTwin or equivalent form factor, the Enakta Storage Platform will be ready.
And if Micron is delivering these numbers at PCIe 6.0, we can't wait to see what KIOXIA brings to the table with their next generation. Given how the CD9P-R already leads on sequential reads at Gen5, a KIOXIA Gen6 drive could push the per-2U envelope even further.
What's Next: CD9P-R at 61.44 TB
KIOXIA's CD9P-R is also available at 61.44 TB per drive, the highest-capacity data centre NVMe SSD in the CD9P lineup. In the same SYS-212GT-HNR chassis, swapping in 61.44 TB drives doubles the density:
- 368 TB raw per node (6 × 61.44 TB)
- 1.47 PB per 2U chassis
- 36.9 PB in 25 chassis (27.6 PB usable with 6+2 EC)
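The density figures above fall straight out of the drive capacity; a quick sketch of the arithmetic:

```python
# Density arithmetic for the 61.44 TB CD9P-R configuration
# (sketch of the figures quoted above; 6+2 EC leaves 75% usable).
DRIVE_TB = 61.44
raw_per_node = 6 * DRIVE_TB         # TB raw per node
raw_per_2u = 4 * raw_per_node       # TB per 2U chassis (~1.47 PB)
raw_25_chassis = 25 * raw_per_2u    # TB across 25 chassis (~36.9 PB)
usable_25 = raw_25_chassis * 6 / 8  # TB usable with 6+2 EC (~27.6 PB)
print(raw_per_node, raw_per_2u / 1000, raw_25_chassis / 1000, usable_25 / 1000)
```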
As KIOXIA continues to push density with their 8th Generation BiCS FLASH, the same chassis and the same Enakta Storage Platform software stack absorbs every generation upgrade without any architectural changes.
Capacity-Optimised Configurations: TLC + QLC
As DAOS matures its metadata-on-NVMe (MD-on-SSD stage 2) capability, we plan to explore hybrid TLC + QLC configurations, specifically 1 TLC + 5 QLC drives per node. The TLC drive handles metadata and hot data with high IOPS and endurance, while the QLC drives provide massive, cost-effective bulk capacity for cold and warm data.
The QLC landscape is getting very interesting:
| Drive | Capacity | Seq. Read | Seq. Write | Interface |
|---|---|---|---|---|
| Solidigm D5-P5336 | up to 122.88 TB | 7,000 MB/s | 3,300 MB/s | PCIe 4.0 |
| Micron 6500 ION | up to 61.44 TB | 12,000 MB/s | 5,000 MB/s | PCIe 5.0 |
| KIOXIA LC9 | up to 245.76 TB | TBD | TBD | PCIe 5.0 |
KIOXIA's LC9 series, built on the same BiCS8 architecture as the CD9P-R but with QLC NAND and a 32-die stack, reaches 245.76 TB in a single drive. Five of those per node would deliver nearly 1.23 PB raw per node, or 4.9 PB per 2U chassis. Micron's 6500 ION brings PCIe 5.0 performance to QLC at up to 61.44 TB, with 12 GB/s reads matching the 7600 PRO TLC drive, making it a strong mid-range option. And the Solidigm D5-P5336 at 122.88 TB puts over 2.4 PB in a single 2U on the proven PCIe 4.0 interface.
The economics shift dramatically at these densities. QLC won't match TLC on write endurance or random IOPS, but for read-heavy archive, media asset libraries, and AI training datasets where the data is written once and read many times, the cost-per-terabyte advantage is compelling. Pair that with the DAOS core's erasure coding and the Enakta Platform's ability to tier data across different protection schemes, and you get a system that's both fast where it matters and affordable where it doesn't.
This is still on our roadmap, pending MD-on-SSD stage 2 landing in the DAOS core, but we're actively evaluating these drives and will publish updated configurations as the platform matures.
About the numbers: All throughput estimates in this post are based on published datasheet performance for the KIOXIA CD9P-R and Micron 7600 PRO at the 15.36 TB capacity point, with the DAOS core delivering approximately 90% of NVMe-layer throughput to clients over the network. Actual performance varies with workload, protection scheme, network fabric, and cluster configuration. Contact us for validated benchmarks on your specific workload profile.
Ready to spec your next storage deployment?
We can help you size and configure the right architecture for your workloads, whether it's media post-production, AI training, HPC simulation, or enterprise file services.
Enakta Storage Platform → Let's Talk →