OpenAI's o1-preview: A summary

The News

On September 12th, OpenAI introduced o1-preview to the world. While OpenAI is claiming big improvements on science, coding, and math benchmarks, what is the real business impact of this model?

We’ve been experimenting with it in our current Proofs of Concept (POCs) for clients who have the option of using closed Large Language Models (LLMs). o1 does stand out in areas where previous models have struggled, particularly in tasks requiring reflective thinking and complex problem-solving. Calling it a complete “game changer” may be premature as we need more perspective on the model itself, but we think it will have a major impact on agentic system designs.

o1 Itself

The approach builds upon established techniques, notably Chain of Thought (CoT), combined with specialized training. CoT, introduced in 2022, encourages models to think through problems step by step, leading to fewer errors and hallucinations. However, OpenAI's special sauce makes it more than just an implementation of CoT, and gives it abilities we haven't seen in other models. An example from yesterday showed a researcher recreating, in 6 prompts, code that took him 10 months to write during the first year of his PhD.
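To make the CoT idea concrete, here is a minimal sketch of the difference between a plain zero-shot prompt and a zero-shot CoT prompt. The helper names and the example question are our own illustration, not OpenAI's implementation; o1's internal reasoning is trained in and hidden, not appended as a prompt suffix like this.

```python
# Minimal illustration of Chain-of-Thought (CoT) prompting: the same
# question asked directly vs. with an explicit step-by-step instruction.
# Function names and the sample question are illustrative assumptions.

def direct_prompt(question: str) -> str:
    """Plain zero-shot prompt: the model answers immediately."""
    return f"Q: {question}\nA:"

def cot_prompt(question: str) -> str:
    """Zero-shot CoT prompt: nudges the model to reason step by step
    before committing to an answer, which tends to reduce errors."""
    return f"Q: {question}\nA: Let's think step by step."

question = ("A bat and a ball cost $1.10 together. The bat costs $1.00 "
            "more than the ball. How much does the ball cost?")
print(direct_prompt(question))
print(cot_prompt(question))
```

On questions like this one, the direct prompt often elicits the intuitive-but-wrong answer ($0.10), while the step-by-step variant gives the model room to work out the correct one ($0.05).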

Despite these strengths, which we explore in our in-depth blog post, the model is still in preview, and key features like file processing, function calling, streaming, and system prompts are not yet available. This limits its immediate usability for certain customer scenarios. But it also means that the preview model may not be as good as the final version, and it is already showing very strong performance: it makes no mistakes on a considerable number of puzzles previously reported to be hard for LLMs, and it can solve some problems that were previously impossible for an LLM to handle natively.

On a side note, OpenAI has opted to keep its Chain of Thought hidden from end users, citing reasons like security and user experience. However, this move likely reflects strategic considerations as the company moves towards a more closed and competitive environment, especially given reports that OpenAI may shift to a for-profit structure.

What Should You Do?

First, there’s no immediate rush. While we’ve recommended that our clients with Azure subscriptions request access as soon as possible, and that OpenAI’s Tier 5 API clients start testing, others will need to wait for the model to become more broadly available. Depending on how much you have already spent on API calls, it might also make sense to add credit to your account to reach Tier 5, but beware that the current limitations of the model may make it unsuitable for production use.

As with any major release, the advice remains straightforward: gain access to the model and test it in your staging pipelines. If you are seeking performance improvements, especially in reasoning tasks, o1 may offer significant benefits. However, if your current system is functioning well, we suggest maintaining your existing setup and exploring R&D in other areas.

Systems with a focus on speed won’t benefit much from o1, as CoT introduces extra token generation for the intermediate thinking steps, adding latency to the overall process. As a result, these systems remain largely unaffected.
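A back-of-envelope calculation shows why the latency penalty matters. All numbers below are illustrative assumptions (generation speed and token counts vary widely by deployment and task); the point is only that hidden reasoning tokens multiply time-to-answer.

```python
# Rough latency impact of hidden reasoning tokens.
# Every number here is an illustrative assumption, not a measured figure.

TOKENS_PER_SECOND = 50        # assumed generation speed
VISIBLE_ANSWER_TOKENS = 200   # tokens the user actually sees
REASONING_TOKENS = 1_000      # assumed hidden chain-of-thought tokens

baseline_latency = VISIBLE_ANSWER_TOKENS / TOKENS_PER_SECOND
o1_latency = (VISIBLE_ANSWER_TOKENS + REASONING_TOKENS) / TOKENS_PER_SECOND

print(f"baseline: {baseline_latency:.0f}s, with reasoning: {o1_latency:.0f}s")
# Under these assumptions, a 4-second answer becomes a 24-second one.
```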

Clients using only locally-based LLMs are not directly impacted, but o1’s architecture may influence the direction of some open-source projects in the mid-to-long term.

The Cost Factor

Currently, o1-preview is significantly more expensive than gpt-4o. With the gpt-4o-2024-08-06 model becoming the default on October 2nd, 2024, priced at $2.50 per million input tokens and $10.00 per million output tokens, o1-preview’s cost of $15.00 per million input tokens and $60.00 per million output tokens is a significant consideration. The o1-mini version is more affordable at $3.00 per million input tokens and $12.00 per million output tokens.
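Plugging the prices above into a small helper makes the gap easy to compare for a given workload. The example workload (2M input, 0.5M output tokens) is an arbitrary illustration.

```python
# Cost comparison using the per-million-token prices quoted above.
PRICES = {  # model: (input $/1M tokens, output $/1M tokens)
    "gpt-4o-2024-08-06": (2.50, 10.00),
    "o1-preview": (15.00, 60.00),
    "o1-mini": (3.00, 12.00),
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload at the listed prices."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Illustrative monthly workload: 2M input tokens, 0.5M output tokens.
for model in PRICES:
    print(f"{model}: ${cost(model, 2_000_000, 500_000):.2f}")
```

At this workload the same traffic costs $10.00 on gpt-4o-2024-08-06, $12.00 on o1-mini, and $60.00 on o1-preview, before accounting for o1's hidden reasoning tokens, which are billed as output.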

Additionally, o1-preview and o1-mini do not currently offer a batching option, preventing cost reductions for non-urgent computations. While batching may be added in the full release, for now, costs cannot be reduced by half as they can with other GPT models.

But pricing is ultimately a question of how many tokens you use, so don’t be fooled by the price per token. If your system uses fewer tokens because of o1, you might gain performance for little to no extra cost, or even come out cheaper. See our in-depth blog post for details.
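The break-even point follows directly from the prices quoted earlier: o1-preview is 6x gpt-4o per token (both $15.00 vs $2.50 on input and $60.00 vs $10.00 on output), so it matches gpt-4o's bill only if the overall pipeline ends up using at most one sixth of the tokens, for example by replacing several chained gpt-4o calls with a single o1 call.

```python
# Break-even token ratio between o1-preview and gpt-4o, using the
# prices quoted above. The input ratio is identical (15.00 / 2.50 = 6x),
# so a single ratio captures both sides of the bill.

GPT4O_OUTPUT_PRICE = 10.00   # $/1M output tokens
O1_OUTPUT_PRICE = 60.00      # $/1M output tokens

break_even_ratio = GPT4O_OUTPUT_PRICE / O1_OUTPUT_PRICE
print(f"o1 breaks even if it uses at most "
      f"{break_even_ratio:.1%} of the tokens gpt-4o would use")
```

Keep in mind that o1's hidden reasoning tokens count against this budget, so the savings have to come from pipeline-level simplification, not from shorter answers alone.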

Final Note

For us at Bot Resources, o1 reaffirms the importance of intelligence in system architecture.

As always, test thoroughly, evaluate carefully, and ensure that your systems are designed to deliver the most value for your business needs.

If you want to know more about o1, check out Jean-Xavier’s blog post about his experiences with the model!