Quantum-inspired math wasn’t expected to reshape AI models—until it did. Enrique Lizaso, Founder and CEO of Multiverse Computing, launched the company in 2019 to apply quantum and tensor-network methods to hard optimization problems. What the team didn’t anticipate was that those same methods would turn out to be highly effective for compressing neural networks.
That insight pushed the team toward a harder challenge. In 2024, they tested their methods on full-scale LLMs, the largest structures they could stress-test. The result was CompactifAI, a system that cuts model size by roughly 80–90% while reducing inference energy by about half. The impact shows up immediately in real deployments. Companies running continuous inference can lower costs, enterprises can bring models on-prem without assembling oversized GPU clusters, and device makers can run meaningful LLMs directly on local hardware. Even data centers see a measurable lift in throughput on fixed power budgets.
Lizaso walks through that shift from quantum R&D to an applied AI company, and how compression—rather than sheer scale—is opening new paths for where and how large models can run.