Beyond the GPU Bottleneck: Reflections from Milken on AI’s Hypergrowth Era
This week at the Milken Institute Global Conference I joined a panel discussing the constraints builders are running into as AI enters hypergrowth. The conversation centered on the physical foundations of AI: GPUs, memory, data centers, power, supply chains, and the extraordinary pressure that AI growth is placing on each of them.
Though it is a fascinating problem that will require an extraordinary level of global coordination to solve, I was struck by the thought that the premise of the panel, and the conversation that followed, was a bit like talking to an LLM. If you start early with an assumption, the hallucination in this analogy, that all useful AI will be token-based and will require massive data centers, then no problem seems more pressing.
I believe that assumption is incorrect. The roadmap for AI looks more like an ecosystem of efficient, purpose-built models (or perhaps a sandwich of models), each designed for a different kind of intelligence, and not all requiring the massive infrastructure we’ve seen with LLMs.
Even if we solved the GPU bottleneck tomorrow, we would still face a deeper question: what kinds of AI systems are we actually building on top of all this infrastructure?
The answer cannot simply be “larger language models.”
If we assume that the future of AI is simply larger and larger language models, then the infrastructure roadmap becomes a race for more: more compute, more data, more capital expenditure. That may be necessary, but it does not answer the deeper question of whether the systems we are building are reliable and accurate enough for the places we want to deploy them.
At Logical Intelligence, we build energy-based reasoning models, or EBMs. The simplest way to explain an EBM is this: instead of asking, “What is the most likely answer?” it asks, “Which answers are valid under the constraints of this system?” It evaluates possible solutions against rules, relationships, and constraints, and rejects invalid answers outright.
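To make the distinction concrete, here is a minimal, illustrative sketch of the “valid versus likely” idea in Python. The energy function, the candidate format, and the constraints are all invented for this example; this is not Logical Intelligence’s actual model or API. A candidate answer is scored by how much it violates the constraints, and anything with nonzero energy is rejected outright rather than ranked by probability.

```python
# Toy sketch of constraint-based evaluation (hypothetical, for illustration only).
from typing import Callable

Candidate = dict  # a proposed solution, e.g. {"load": 90, "temp": 70}

def energy(candidate: Candidate, constraints: list[Callable[[Candidate], float]]) -> float:
    """Sum of constraint violations; 0.0 means the candidate is valid."""
    return sum(max(0.0, violation(candidate)) for violation in constraints)

# Hard constraints expressed as violation amounts (<= 0 means satisfied).
constraints = [
    lambda s: s["load"] - 100,  # machine load must not exceed 100%
    lambda s: s["temp"] - 80,   # temperature must stay below 80 degrees
]

candidates = [{"load": 90, "temp": 70}, {"load": 120, "temp": 75}]

# Unlike sampling the most likely next token, we keep only zero-energy
# answers and discard constraint-violating ones entirely.
valid = [c for c in candidates if energy(c, constraints) == 0.0]
print(valid)  # -> [{'load': 90, 'temp': 70}]
```

The point of the toy is the decision rule: an invalid answer is not merely penalized or deemed improbable, it is excluded.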
During the panel, I tried to explain this distinction by comparing LLMs to the way humans reason. Human beings do not reason only through language. Language is how we express many of our thoughts, but it is not the full substrate of reasoning. We reason spatially, causally, physically, mathematically, and through constraints. We understand that some things are not merely unlikely; they are impossible. We know that a bridge cannot violate physics, a financial system cannot ignore balance-sheet constraints, and a factory line cannot operate in a state that breaks the machine.
EBMs are built for that kind of constraint-based reasoning: a different architecture for a different class of problem. Because they are not token-based, they do not have the hallucination issues that worry the designers of critical systems, and they do not need anywhere near the scale of physical inputs that other models demand.
The compute bottleneck is real. But the deeper bottleneck is reliable reasoning. More infrastructure can make AI bigger. It does not automatically make that particular AI useful for every purpose. Designing at the speed of AI should not mean accepting the trending assumptions. It should mean asking the better question: not how much scale AI will require, but what kind of intelligence we are actually trying to build.