As much as the tech world is excited by “AI”, for makers of semiconductors the excitement is much more tempered. Everyone recognizes that there is a big opportunity out there, but there is considerable trepidation as to who will actually share in those benefits. One year into the large language model (LLM) revolution, the biggest beneficiary by far is Nvidia. So much so that many worry Nvidia will come to dominate not only AI but all of compute. That is the extreme view, but there are many reasons to think that Nvidia will not leave much room for anyone else.
Nvidia now captures 70%+ share of data center compute spend. They are also set to dominate the market for AI training semis for the foreseeable future. Hence the shift in most semis companies’ messaging toward the market for AI inference. This is going to be a much larger market than training, but how exactly that happens, and who will participate, remains unclear.
First and foremost, there is the very big question as to where inference will take place – in the Cloud in some data center or at the Edge on user devices. For some time, we have operated on the assumption that the only way the economics of AI inference pencil out is for much of the workload to end up being done at the edge, meaning the end user pays the ‘capex’. Inference is going to be very expensive and even the deep pockets of the hyperscalers will struggle to afford the necessary build out. Beyond that, there are reasons around security, privacy and “data sovereignty” which lend themselves to edge inference.
At CES, we had that view seriously challenged several times. For instance, we met with privately-held Groq. They are a new-model company that designed their own inference chip (the LPU), but monetize through software, providing low-cost, high-speed inference via their PaaS platform. Their contention is that their approach is sufficiently superior to the alternatives that it bends the cost curve enough to make cloud inference economical and edge inference superfluous. Of course, we have many questions about Groq – how good is their performance really, and can they compete with the hyperscalers when it comes to providing cloud services? That being said, they have a company and an approach that merit more attention.
Separately, we spoke to several companies making large moving objects that felt fairly strongly that cloud inference worked for most of their needs. For instance, we spent a lot of time in the John Deere booth. Deere is probably the most advanced industrial company when it comes to compute and autonomy. We spoke with someone from their software team about the edge vs cloud debate. According to him, much of the Deere software stack can be run perfectly well in the cloud. Some tasks like vehicle safety (i.e. do not run over farm workers) probably need to be run on the vehicle, but many other tasks, including navigation, run perfectly well for them from the cloud. (They also run a robust cellular service to provide that low-latency connectivity.) It was clear from our conversation that, as a software designer, he felt far fewer compute constraints in the cloud than at the edge, implying a much smaller uplift for edge silicon.
Admittedly, these are anecdotal data points from the far end of the capabilities spectrum. Most other companies will have to contend with very different environments. Nonetheless, they give us pause and prompt us to reconsider our assumptions about how this will play out.
Ultimately, this debate will be settled on the software side. Can developers continue to bring out models and tools which run on the edge? The answer to this question is important, and the stakes are huge. Companies like AMD, Intel and Qualcomm are all counting on a big uplift for edge inference.