CES 2024: Begun the Inference Wars Have

As much as the tech world is excited by “AI”, for makers of semiconductors the excitement is much more tempered. Everyone recognizes that there is a big opportunity out there, but there is considerable trepidation as to who will actually share in those benefits. One year into the large language model (LLM) revolution, the biggest beneficiary, by far, is Nvidia. So much so that many worry Nvidia will come to dominate not only AI but all of compute. That is the extreme view, but there are many reasons to think that Nvidia will not leave much room for anyone else.

Nvidia now captures 70%+ share of data center compute spend. They are also set to dominate the market for AI training semis for the foreseeable future. Hence the shift in messaging from most semis companies to talk about the market for AI inference. This is going to be a much larger market than training, but how exactly that happens and who will participate remains unclear.

First and foremost, there is the very big question of where inference will take place – in the Cloud in some data center or at the Edge on user devices. For some time, we have operated on the assumption that the only way the economics of AI inference pencil out is for much of the workload to end up being done at the edge, meaning the end user pays the ‘capex’. Inference is going to be very expensive, and even the deep pockets of the hyperscalers will struggle to afford the necessary build-out. Beyond that, there are arguments around security, privacy and “data sovereignty” that favor edge inference.

At CES, we had that view seriously challenged several times. For instance, we met with privately-held Groq. They are a new-model company that designed their own inference chip (the LPU) but monetize through software, providing low-cost, high-speed inference via their PaaS platform. Their contention is that their approach is sufficiently superior to the alternatives that it bends the cost curve enough to make cloud inference economical and edge inference superfluous. Of course, we have many questions about Groq – how good is their performance really, and can they compete with the hyperscalers when it comes to providing cloud services? That being said, they have a company and an approach that merit more attention.

Separately, we spoke to several companies making large moving objects that felt fairly strongly that cloud inference worked for most of their needs. For instance, we spent a lot of time in the John Deere booth. Deere is probably the most advanced industrial company when it comes to compute and autonomy. We spoke with someone from their software team about the edge vs. cloud debate. According to him, much of the Deere software stack can be run perfectly well in the Cloud. Some tasks like vehicle safety (i.e. do not run over farm workers) probably need to be run on the vehicle, but many other tasks, including navigation, run perfectly well for them from the cloud. (They also run a robust cellular service to provide the necessary low-latency connectivity.) It was clear from our conversation that, as a software designer, he felt far smaller compute constraints in the cloud than at the edge, implying a much smaller uplift for edge silicon.

Admittedly, these are anecdotal data points from the far end of the capabilities spectrum. Most other companies will have to contend with very different environments. Nonetheless, they give us reason to pause and reconsider our assumptions about how this will play out.

Ultimately, this debate will be settled on the software side. Can developers continue to bring out models and tools which run on the edge? The answer to this question is important, and the stakes are huge. Companies like AMD, Intel and Qualcomm are all counting on a big uplift for edge inference.

2 responses to “CES 2024: Begun the Inference Wars Have”

  1. Sam

    Hope CES went well! Why do we think the economics would not work for inference done in the cloud? At the edge, it appears consumers might not be willing to pay a higher price for a vague inference concept either, though that may just be the case for now. Appreciate it!

    1. D/D Advisors

      Inference is very expensive. Most of the estimates I have seen, and the two napkins and half a model I have built, all make it seem like cloud inference is going to require a lot of capex. If the gen AI operators can move half that inference load to the edge, then they probably do not need to build any more data centers than they already had planned.
      I totally agree that there is no reason why consumers today would pay more for AI on their devices, but that is a problem for the chip vendors; it means they can’t raise prices or charge a premium for the on-chip AI they are already designing into their CPUs. We already have laptops with AI built in (i.e. MacBooks) and we are getting Windows AI PCs this year, but as it stands, those will all still sell at the same prices as today’s PCs with no AI. Same for phones. That would be a huge win for the hyperscalers if they can then push some share of AI inference to those chips.
