Ginger6 G6 Cobalt Max — RTX 5090 AI Workstation with CPU Offloading
Description
G6 Cobalt Max — CPU Offloading Workstation for DeepSpeed ZeRO, PyTorch FSDP, and 30B+ Models
The G6 Cobalt Max is built for AI and ML researchers who use DeepSpeed ZeRO-Offload or PyTorch FSDP CPU offloading to train models whose memory footprint exceeds the RTX 5090's 32GB of VRAM on a single GPU. Its 128GB of DDR5 system RAM gives optimiser states, gradients, and model parameters somewhere to go when they are offloaded from the GPU during training, which extends the trainable model size well beyond what 32GB of VRAM alone supports. The Ryzen 9 9950X's 16 cores and DDR5 memory bandwidth help keep the offloading process from bottlenecking GPU utilisation. From £4200, built and stress-tested in Wolverhampton. The Cobalt Max is part of our AI and ML workstations range.
The Cobalt Max and the G6 Cobalt use the same RTX 5090 32GB GPU. If your model fits within 32GB VRAM and you do not use ZeRO or FSDP CPU offloading, the Cobalt with 64GB system RAM is the correct machine — the Cobalt Max adds cost without benefit for that workflow. Call Kevin before ordering if you are uncertain whether your training pipeline requires CPU offloading.
Not sure whether you need the Cobalt Max or the Cobalt? Call Kevin on 01902 714533 — describe your training framework and model size and he will give you a straight answer.
G6 Cobalt Max — Full Specification
RTX 5090 32GB, 128GB DDR5, and Ryzen 9 9950X — the maximum VRAM and maximum system RAM configuration for CPU offloading research workloads.
Why This Specification for CPU Offloading and Large Model Training
Every component in the Cobalt Max is chosen for the specific demands of DeepSpeed ZeRO and PyTorch FSDP offloading workloads, where system RAM capacity and CPU bandwidth determine whether the offloading process extends or limits training.
128GB system RAM: enables CPU offloading for models beyond 32GB VRAM
DeepSpeed ZeRO and PyTorch FSDP move optimiser states and gradients to system RAM, allowing models whose training state exceeds 32GB of VRAM to train on a single GPU. With less than 128GB of system RAM, the offload process itself becomes memory-limited at scale. This machine extends the effective training range significantly beyond what the standard G6 Cobalt with 64GB supports.
RTX 5090 + CPU offload: a practical path to large model training
Multi-GPU training with consumer cards does not pool VRAM; each GPU holds its own model shard. CPU offloading with a single RTX 5090 and 128GB of system RAM makes it possible to train models whose footprint exceeds 32GB of VRAM on one machine, at lower complexity and cost than a multi-GPU setup. For researchers who need to go beyond 32GB of VRAM without the infrastructure overhead of multi-GPU training, this is the practical path.
Ryzen 9 9950X: offloading demands CPU memory bandwidth
CPU offloading puts sustained load on the host. DeepSpeed ZeRO and PyTorch FSDP move data between GPU and system RAM continuously during training, and this data movement is bandwidth-intensive. The Ryzen 9 9950X's 16 cores and DDR5 memory bandwidth help keep the offload process from bottlenecking GPU utilisation. An underpowered CPU lengthens the time the GPU spends waiting for the host between steps.
APNX C1: RTX 5090 capable with distinctive character
The APNX C1 provides RTX 5090-compatible airflow in a case with strong visual presence. For a researcher's workstation in a lab or studio, thermal performance is the reason for the case choice: the APNX C1 handles the RTX 5090's 575W TDP under sustained training load. The appearance is incidental.
What the G6 Cobalt Max Handles
Confirmed software performance at the G6 Cobalt Max specification. Workload scales based on Ryzen 9 9950X, 128GB DDR5, RTX 5090 32GB, and 2TB plus 4TB NVMe.
Performance descriptors are indicative. Actual performance depends on project complexity, settings, and system configuration. Kevin can advise on the right spec for your specific workflow.
CPU Offloading Extends the Effective Model Size. 128GB System RAM Is What Makes It Work.
DeepSpeed ZeRO-Offload and PyTorch FSDP are techniques for training models whose total parameter, gradient, and optimiser state footprint exceeds GPU VRAM capacity. The mechanism is straightforward: during training, the states that are not needed for the current computation step — the optimiser states between parameter updates, the gradients between backward and update steps — are moved from GPU VRAM to system RAM. The GPU is freed to hold only the parameters needed for the current forward and backward pass. System RAM holds the offloaded states until they are needed again. This allows a single RTX 5090 with 32GB VRAM to participate in training workloads whose full memory footprint significantly exceeds 32GB, as long as system RAM is large enough to hold the offloaded states.
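For illustration, the optimiser-state offloading described above is typically switched on through DeepSpeed's JSON configuration. The sketch below builds a minimal config as a Python dictionary, assuming ZeRO stage 2 with optimiser states held in pinned system RAM. The function name and the micro-batch size are placeholders for this example, not settings Ginger6 ships or recommends.

```python
import json


def zero_offload_config(micro_batch_size: int = 1) -> dict:
    """Return a minimal DeepSpeed config: ZeRO stage 2, optimiser states on the CPU.

    micro_batch_size is a hypothetical placeholder; choose it for your own model.
    """
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "fp16": {"enabled": True},  # mixed-precision training, as assumed in the text
        "zero_optimization": {
            "stage": 2,  # shard optimiser states and gradients
            "offload_optimizer": {
                "device": "cpu",      # keep optimiser states in system RAM
                "pin_memory": True,   # page-locked host memory for faster transfers
            },
        },
    }


if __name__ == "__main__":
    # Print the config as JSON, the form DeepSpeed also accepts from a file.
    print(json.dumps(zero_offload_config(), indent=2))
```

The resulting dictionary can be passed to `deepspeed.initialize` through its `config` argument alongside your own model. Treat it as a starting point to adapt to your pipeline.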
The reason 128GB of system RAM is required, and not 64GB, is the scale of what is being offloaded at the 30B+ parameter level. The optimiser states for a 30B parameter model in fp16 with an Adam optimiser occupy approximately 120GB in system RAM under ZeRO-2 offloading. 64GB is insufficient to hold these states alongside the operating system, the Python environment, and the data pipeline. 128GB provides the capacity for the full offloaded state at this model scale. The Ryzen 9 9950X's 16 cores and DDR5 memory bandwidth matter here too: the data movement between GPU and system RAM during training is CPU-bandwidth-bound, and an underpowered CPU on a slower memory platform introduces delays between steps that reduce effective GPU utilisation during the training loop.
Kevin's conversation before the order confirms your model size, your training framework, and whether ZeRO-Offload or FSDP is in your pipeline. The 3-year warranty and post-delivery support apply from day one.
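The 30B sizing above can be checked with simple arithmetic. The sketch below is an estimate under stated assumptions, not a measurement. It assumes 4 bytes of offloaded optimiser state per parameter, which is the figure that reproduces the 120GB quoted above. Setups that keep several fp32 values per parameter will need proportionally more. Both function names are hypothetical, and sizes are in decimal gigabytes throughout.

```python
def offloaded_state_gb(params_billion: float, bytes_per_param: float) -> float:
    """Estimated offloaded state in decimal GB.

    Billions of parameters times bytes per parameter gives gigabytes directly.
    bytes_per_param is an assumption that depends on your optimiser and precision.
    """
    return params_billion * bytes_per_param


def headroom_gb(installed_ram_gb: float, offloaded_gb: float) -> float:
    """RAM left over for the OS, Python environment, and data pipeline.

    A negative result means the offloaded state alone does not fit.
    """
    return installed_ram_gb - offloaded_gb


if __name__ == "__main__":
    state = offloaded_state_gb(params_billion=30, bytes_per_param=4)
    for ram in (64, 128):
        print(f"{ram}GB RAM: state {state:.0f}GB, headroom {headroom_gb(ram, state):.0f}GB")
```

Changing `bytes_per_param` to match your own optimiser and precision settings gives a quick first check before discussing the specification with Kevin.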
Tell Kevin:
- The software you use most and the version
- Your typical file sizes or project scales
- Whether you need to run multiple applications simultaneously — and which ones
- Your approximate budget and whether this is for one machine or a team
No charge for the conversation. No pressure to buy.
93% Five-Star Reviews on Trustpilot
93% of Ginger6 customers leave five-star reviews. A research workstation at this level needs support that remains available after delivery. Kevin builds the Cobalt Max, stress-tests it at sustained load, and is reachable when training pipelines evolve or questions arise.
"I upgraded my PC to one that was Windows 11 compatible. I have been using it for about 3 months with no problems. The service from Ginger 6 has been great."
"Placed order, and received it earlier than expected. Windows and drivers already installed so computer was good to go right out of the box. Runs perfectly, have no complaints, only good things to say! Recommended!!"
"I have been using Ginger 6 since 2014 for gaming PCs for wife and myself. That's 5 purchased in total. Never had a technical issue with any of the builds and the only reason for new purchases is technical obsolescence. Highly recommend them."
Built by Hand in Wolverhampton
Every G6 Cobalt Max is assembled, configured, and tested by Kevin's team. The 128GB DDR5 memory configuration is verified at rated speed before the 24-hour test begins — an additional step that is specific to this machine.
Before the build begins, the configuration is reviewed against your training framework, model architecture, and offloading setup. If you have spoken to Kevin, the spec is confirmed against your use of DeepSpeed ZeRO or PyTorch FSDP, your model size, and your sequence lengths. The Ryzen 9 9950X and X870 Eagle WIFI7 are verified for AM5 compatibility at 128GB DDR5. RTX 5090 stock is confirmed before the build begins. The dual NVMe layout is confirmed for your dataset and checkpoint requirements. Components are staged before assembly.
The Cobalt Max is assembled by hand in Wolverhampton. Inside the APNX C1, cables are routed to maintain clear airflow paths to the RTX 5090 and the 360mm radiator, reduce dust build-up around both NVMe drives, and keep future maintenance accessible. BIOS settings and DDR5 memory profiles are confirmed at 128GB before the 24-hour stress test begins — this is an additional verification step specific to the 128GB configuration, ensuring the RAM runs at rated speed and the memory subsystem is stable under the bandwidth demands of the offloading process. The 1000W PSU is confirmed for stable power delivery before the test starts.
Every Cobalt Max runs under sustained GPU and CPU load for a full day before it ships. The test replicates the combined demand of an overnight training run with CPU offloading — the RTX 5090 held at sustained high utilisation at its 575W operating envelope, with the Ryzen 9 9950X managing the continuous data movement between GPU and system RAM. The 128GB DDR5 subsystem stability is confirmed under this sustained bandwidth demand. Windows 11 Pro, drivers, and both NVMe drives are confirmed before packaging.
- Thermal behaviour under sustained RTX 5090 training load
- Processor and graphics stability during extended use
- Memory responsiveness and stability at 128GB under offloading bandwidth
- Storage performance and consistency across both NVMe drives
- BIOS and firmware stability
- System stability under extended use
G6 Cobalt Max — Common Questions
DeepSpeed ZeRO-Offload is a technique for training models whose total memory footprint exceeds GPU VRAM capacity. During training, the states that are not needed for the current computation step — the optimiser states between parameter updates, the gradients between the backward pass and the parameter update — are moved from GPU VRAM to system RAM. The GPU is then freed to hold only the parameters and activations needed for the current step. This allows training of models whose full state footprint significantly exceeds GPU VRAM, as long as system RAM is large enough to hold the offloaded states and the CPU is fast enough to manage the data movement without becoming the bottleneck. You need ZeRO-Offload when your model's combined parameter, gradient, and optimiser state footprint exceeds your GPU VRAM — typically when training 30B+ parameter models in fp16 on a single GPU. If your model fits within 32GB VRAM, you do not need ZeRO-Offload, and the G6 Cobalt with 64GB is the correct machine.
How much system RAM FSDP CPU offloading needs depends on your model size and which FSDP sharding strategy you use. With CPU offloading enabled, whether through FSDP full sharding or DeepSpeed ZeRO stage 2, the optimiser states for a 30B parameter model in fp16 with an Adam optimiser occupy approximately 120GB in system RAM. The operating system, Python environment, data loader, and other processes consume further RAM on top of that. 128GB is the practical minimum to hold the offloaded states for a 30B parameter model alongside normal system processes. For models significantly above 30B, or for complex offloading configurations where more states are offloaded simultaneously, RAM requirements increase further. Call Kevin and describe your model size, parameter count, and offloading configuration before ordering, and he will confirm whether 128GB covers your specific setup.
The G6 Cobalt and the Cobalt Max use the same RTX 5090 32GB GPU and Ryzen 9 9950X processor. The main difference is system RAM: the G6 Cobalt has 64GB DDR5, the Cobalt Max has 128GB DDR5. 128GB is required when your training pipeline uses DeepSpeed ZeRO-Offload or PyTorch FSDP CPU offloading to train models whose full state footprint exceeds 32GB VRAM, because 64GB of system RAM is insufficient for the offloading process at 30B+ scale. If your model fits within 32GB VRAM and you do not use CPU offloading, the Cobalt with 64GB system RAM is the correct machine and the Cobalt Max adds cost without benefit for your workflow. The case also differs: the Cobalt uses a Corsair 5000D, the Cobalt Max uses an APNX C1.
CPU offloading does add latency compared to keeping the full training state in VRAM, because data movement between GPU and system RAM takes time that would otherwise be used for computation. The throughput reduction depends on the offloading configuration (ZeRO stage 2 with only optimiser state offloading has less overhead than full parameter offloading) and on the CPU and memory bandwidth available on the host side. The Ryzen 9 9950X's DDR5 memory bandwidth minimises the host-side latency in this data movement, but some throughput reduction compared to a pure-VRAM training configuration is unavoidable. The trade-off is that CPU offloading makes it possible to train models that would otherwise require multiple high-end GPUs or cloud instances with HBM-based VRAM. For researchers whose model genuinely exceeds 32GB VRAM, the throughput trade-off of offloading on a single RTX 5090 is typically preferable to the cost and complexity of the alternatives.
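A rough way to reason about this trade-off is a simple step-time model: time per training step is GPU compute time plus whatever share of the GPU-to-RAM transfer cannot be hidden behind that compute. The sketch below is a simplified model with hypothetical function names and example figures, not a benchmark of the Cobalt Max.

```python
def transfer_seconds(gigabytes: float, link_gb_per_s: float) -> float:
    """Time to move a given amount of data over a link of the given bandwidth."""
    return gigabytes / link_gb_per_s


def step_seconds(compute_s: float, transfer_s: float, overlap: float) -> float:
    """Estimated time per training step.

    overlap is the fraction (0.0 to 1.0) of the transfer that runs concurrently
    with GPU compute and therefore adds no time to the step.
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0.0 and 1.0")
    return compute_s + (1.0 - overlap) * transfer_s


if __name__ == "__main__":
    # Hypothetical figures: 8GB moved per step at 32GB/s, 2s of GPU compute.
    t = transfer_seconds(gigabytes=8, link_gb_per_s=32)
    for overlap in (0.0, 0.5, 1.0):
        print(f"overlap {overlap:.1f}: {step_seconds(2.0, t, overlap):.3f}s per step")
```

In this model, offloading fewer states reduces the data moved per step, and faster host memory raises the effective bandwidth. Both shrink the transfer term.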
Build time is 3 to 5 working days from order confirmation, including the 24-hour stress test and the 128GB DDR5 memory profile verification applied before dispatch. Delivery to UK mainland addresses is free and fully tracked. RTX 5090 stock is subject to availability. Call Kevin on 01902 714533 before ordering to confirm current stock and the build timeline, particularly if you have a research project start date in mind.
Ready to Order the G6 Cobalt Max?
Ginger6 has been building custom workstations in Wolverhampton since 2001. Kevin confirms RTX 5090 stock and your offloading setup before the build, verifies the 128GB DDR5 configuration, stress-tests the machine at sustained training load, and is available after delivery. 93% five-star reviews. 3-year warranty with lifetime support.
Custom Options
£7,211.69
£6,769.99
Specifications
Additional Information
| Component | Specification |
|---|---|
| Processor | AMD Ryzen 9 9950X |
| Processor Type | AMD Ryzen 9 |
| No of Cores | 16 |
| Max Core Speed | 5.70GHz |
| CPU Cooler | 360mm ARGB AIO Liquid Cooler |
| Motherboard | Gigabyte X870 EAGLE WIFI7 |
| Case | APNX Creator C1 Black |
| Power Supply | Corsair RM1000e 1000W 80+ Gold Fully Modular |
| Memory Size | 128GB |
| Solid State Drive Size | 2TB |
| 2nd SSD | 4TB |
| Graphics | Nvidia RTX 5090 32GB |
| Graphics Card Connections | DisplayPort (x3), HDMI |
| Audio | Realtek ALC (HD Audio) |
| LAN | 2.5GbE LAN, Wi-Fi 7 |
| Ethernet | Realtek 2.5GbE |
| Wi-Fi | WiFi 7 (MediaTek MT7925 rev1.0 / Realtek RTL8922AE rev1.1) |
| Bluetooth | 5.4 |
| Connections | Rear: 2x USB-C, 1x USB 3.2 Gen2, 3x USB 3.2 Gen1, 4x USB 2.0 |
| Front Panel Connections | 2x USB-A 3.x, 1x USB-C, HD Audio + Mic |
| USB2 Ports | 4 |
| USB3 Ports | 6 |
| USB-C Ports | 3 |
| Operating System | Windows 11 Pro |
| Monitors | Optional (See Custom Options) |
| Warranty | 3 Year Bronze Warranty |