OpenAI’s New Model Is Definitely Something

OpenAI released a new model today called OpenAI o1. We read through their materials and a host of coverage and analysis about it, and we can definitely conclude that people have opinions.

OpenAI positions the new model as capable of multi-step reasoning, able to work through problems that require several steps to solve. They demonstrate this by showing o1 working through a host of math problems. OpenAI says the new system is very good at math and programming, with capabilities in other physical sciences coming soon. At one level, this sounds somewhat trivial, like ChatGPT but better at actually getting the right answer. But o1 is fairly significant in that this kind of reasoning is hard to do and points to possibly much greater capabilities down the road.

That being said, for the moment all of that is a bit speculative. Aside from the limited-preview nature of the o1 model available so far, the company did not provide much information about what is technically different about the new model. Their blog post mentions that they added reinforcement learning to their training. Reinforcement learning is not new; prior to the rise of Transformer-based models, it was a favored method for training AI models.

The company spends a lot of time talking about how well the new model performs on various examinations intended for humans. The company details where o1 ranks on things like the AIME math competition (83rd percentile) and Codeforces competitive programming contests (89th percentile). We probably could not pass any of these exams, so their significance is beyond us.

OpenAI also includes a sizable section on safety. They claim to have reduced, but not entirely eliminated, hallucinations. They also seem to have embedded many safety protocols into the model training itself. Current versions of ChatGPT embed those protocols into the prompt in a manner that is not apparent to users. This seems to have caused serious performance degradation over time, as more and more safety protocols were added. The o1 model seems to have fixed that problem, although we imagine many safety concerns still linger.

Probably the most interesting thing that comes out of the model is what it tells us about the need for semis. First, o1 demonstrates that there are still large gains to be made by growing model sizes. There is a lot of debate in AI software circles as to how important ever-larger models are. In theory, at some point the gains from model size will plateau, but if that is true, o1 shows we are nowhere near that point. Second, OpenAI concedes that inference for this model is much more expensive. In their analysis they point out that scaling inference is a key constraint, and pricing for the new inference service is triple that of previous models. Our first impression of all this is that the world (or at least OpenAI) needs more Nvidia GPUs. The second is that the market for inference semis is still wide open. We have to imagine the team at Groq is parsing OpenAI's data closely. Groq is now positioning itself as among the lowest-cost inference providers. It is too soon to tell if they have an advantage with an o1-type model, but there is clearly a big opportunity for someone here.

Photo by Google Gemini