Thinking Machines Lab (TML) has significantly expanded its infrastructure agreement with Google Cloud, becoming one of the first customers to use the NVIDIA Blackwell-based A4X Max virtual machines. The new deal centers on the deployment of NVIDIA GB300 NVL72 systems within Google’s AI Hypercomputer architecture. In early benchmarking, TML reported a 2x increase in training and serving speeds compared to previous-generation hardware, a leap critical for the lab’s increasingly complex reinforcement learning workloads.

The partnership moves beyond raw compute, drawing on Google’s integrated AI stack to address persistent data bottlenecks. TML uses the Jupiter network for near-instantaneous weight transfers, alongside Google Kubernetes Engine for massive-scale orchestration and Spanner for managing transactional metadata. By combining these with a custom node-level caching solution, TML can maintain continuous model training while simultaneously serving production-grade workloads at global scale.

The expanded capacity is specifically designed to support the development of TML’s frontier models and its fine-tuning product, Tinker. Myle Ott, Founding Researcher at TML, noted that the automated remediation provided by Google’s Cluster Director and the reliability of the integrated stack allow the team to focus on high-level research rather than infrastructure maintenance.
