Thinking Machines Among First to Deploy NVIDIA Blackwell on Google Cloud

Thinking Machines Lab (TML) has significantly expanded its infrastructure agreement with Google Cloud, becoming one of the first customers to use the NVIDIA Blackwell-based A4X Max virtual machines. The new deal centers on deploying NVIDIA GB300 NVL72 systems within Google’s AI Hypercomputer architecture. In early benchmarking, TML reported a 2x increase in training and serving speeds compared to previous-generation hardware, a leap critical for the lab’s increasingly complex reinforcement learning workloads.

The partnership goes beyond raw compute, drawing on Google’s integrated AI stack to address persistent data bottlenecks. TML uses the Jupiter network for near-instantaneous weight transfers, Google Kubernetes Engine for massive-scale orchestration, and Spanner for managing transactional metadata. Combined with a custom node-level caching solution, this setup lets TML maintain continuous model training while serving production-grade workloads at global scale.

The expanded capacity is specifically designed to support the development of TML’s frontier models and its fine-tuning product, Tinker. Myle Ott, Founding Researcher at TML, noted that the automated remediation provided by Google’s Cluster Director and the reliability of the integrated stack allow the team to focus on high-level research rather than infrastructure maintenance.
