NVIDIA Vera Rubin Architecture Unveiled: Next-Generation AI Supercomputer Reduces Inference Costs by 10x

NVIDIA has unveiled its next-generation AI supercomputer platform, ‘Vera Rubin.’ It targets 5x the inference performance of Blackwell and a 10x reduction in the cost per token. Scheduled for release in the second half of 2026, it is positioned to set a new standard for AI computing.

Announced at CES 2026, the Vera Rubin platform comprises six new chips. According to NVIDIA’s official announcement, its core is the NVL72 rack-scale configuration, which combines the Rubin GPU and the Vera CPU and bundles 72 GPUs into a single system to handle inference for large-scale AI models.

Cost-effectiveness is the most notable point. According to a Tom’s Hardware report, the cost per token can be cut to one-tenth of that of the Blackwell architecture. Inference costs are one of the biggest burdens for AI service operators, so if this figure is realized, it is expected to have a significant ripple effect throughout the industry.

The Vera CPU adopts an Arm-based architecture to improve power efficiency, and the sixth-generation NVLink interconnect greatly expands the communication bandwidth between GPUs. The NVIDIA blog explains that the Rubin platform is part of a blueprint that encompasses autonomous driving and the open model ecosystem. The aim is not just to improve hardware performance, but to redesign the entire AI infrastructure.

The emergence of Vera Rubin has the potential to fundamentally change the cost structure of the AI industry. If inference costs are actually reduced by 10x, it could usher in an era in which small and medium-sized enterprises can also operate large-scale AI services. Actual performance still needs to be verified after release, but it is clear that NVIDIA’s roadmap will once again reshape industry standards. How quickly cloud service providers adopt the platform after its release in the second half of 2026 will be key.
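To make the claimed 10x reduction concrete, here is a minimal Python sketch of the arithmetic. The workload size and the baseline price per million tokens are hypothetical placeholders chosen for illustration; they are not figures from NVIDIA or Tom’s Hardware. Only the 10x factor comes from the reported claim.

```python
def monthly_inference_cost(tokens_per_month: int, cost_per_million_tokens: float) -> float:
    """Return the monthly inference bill for a given token volume and unit price."""
    return tokens_per_month / 1_000_000 * cost_per_million_tokens


# Hypothetical inputs: neither value is a published figure.
TOKENS_PER_MONTH = 5_000_000_000      # assumed workload: 5 billion tokens per month
BASELINE_COST_PER_MILLION = 2.00      # assumed current cost in USD per million tokens

# The reported claim: cost per token falls to one-tenth of the baseline.
REDUCTION_FACTOR = 10

baseline = monthly_inference_cost(TOKENS_PER_MONTH, BASELINE_COST_PER_MILLION)
reduced = monthly_inference_cost(
    TOKENS_PER_MONTH, BASELINE_COST_PER_MILLION / REDUCTION_FACTOR
)

print(f"Baseline monthly cost: ${baseline:,.2f}")
print(f"Monthly cost at 1/10 per token: ${reduced:,.2f}")
print(f"Monthly savings: ${baseline - reduced:,.2f}")
```

Under these assumptions the monthly bill falls in direct proportion to the per-token cost, which is why the figure matters most to operators whose spending is dominated by inference volume.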

FAQ

Q: When will NVIDIA Vera Rubin be released?

A: NVIDIA has announced its release in the second half of 2026. The exact month has not yet been disclosed.

Q: What improvements have been made compared to Blackwell?

A: Inference performance is improved by up to 5x, and the cost per token is reduced to one-tenth. The sixth-generation NVLink interconnect and the Arm-based Vera CPU are also new.

Q: What is the Vera Rubin NVL72 configuration?

A: It is a configuration that integrates 72 Rubin GPUs into a single rack-scale system, designed to handle both the training and inference of large-scale AI models.
