o1: A True Game Changer?

The Night AI Surprised Us All

Last week, I had it all planned out. I was going to write an article addressing a significant challenge faced by AI language models. It was going to be an eye-opening piece for non-specialists in the field. But then, OpenAI decided to throw a curveball by releasing o1. Thanks for the surprise, OpenAI.

Picture this: It’s late at night in Europe, and I’m about to call it a day when the news breaks. Suddenly, I’m wide awake, running tests until 3:30 AM like a kid with a new toy. My weekend plans? Let’s just say they took an unexpected AI-themed detour.

To add to the mix, I had just finished a kickoff meeting with a leading financial institution for a complex AI project mere hours before the announcement. Nothing like having to revamp the entire “What model will we use?” section of your presentation overnight. And you know how it is with financial institutions – getting anything whitelisted is about as easy as teaching a cat to bark. At least the client was outside the EU, allowing us to use US-East2 servers and sidestep those pesky GDPR delays. Small victories, right?

Despite the sleepless night and frantic reworking of plans, I can’t help but be intrigued by o1’s potential. It’s not just another incremental update; it represents a significant step forward in AI capabilities. So, grab your coffee (or your beverage of choice), and let’s dive into why o1 is shaping up to be a game-changer in the world of AI.

Putting o1 to the Test: Self-Awareness and Complex Problem-Solving

The Self-Aware AI

When a new Generative AI model hits the scene, we AI enthusiasts can’t resist putting it through its paces. With o1, we decided to start with a deceptively simple question that’s been giving AI a headache for years: “How many words will you use in the response to this question?”

Now, you might be thinking, “What’s so hard about that?” Well, imagine trying to write a report while only being able to see one word at a time - and you have to guess how long your report will be before you’ve finished writing it. That’s the challenge AI faces with this question.

Traditional AI models struggle with this task because they generate responses one piece at a time, unable to look ahead or reflect on what they’ve already produced. It’s like trying to plan a road trip without a map - you’re just guessing where you’ll end up.

But o1? It handled this question with surprising ease. o1 can think about its answer while creating it - much like how we humans can plan out a sentence before saying it aloud. This self-reflection ability is a game-changer, opening up new possibilities for more accurate and context-aware AI responses in business applications.
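To see why this question is a real test, it helps to make the check concrete. Here is a tiny, hypothetical harness (not part of any official evaluation) that verifies whether a model’s answer actually contains the number of words it claimed it would use:

```python
import re

def check_word_count_claim(response: str, claimed_count: int) -> bool:
    """Return True if the response really contains the number of words
    the model claimed. Hypothetical test harness, purely illustrative."""
    words = re.findall(r"\b[\w'-]+\b", response)
    return len(words) == claimed_count

# A correct seven-word answer to the question:
print(check_word_count_claim("This response contains exactly seven words total.", 7))  # → True
```

A model without any look-ahead has to commit to a number before it knows how long its own answer will be, which is exactly why this check trips up traditional models.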

Cracking the Mathematical Conundrum

To really push o1’s limits, we threw a complex mathematical problem at it - one that even human mathematicians sometimes struggle with. Without getting too deep into the math (I promise!), let’s just say it’s the kind of problem that requires not just calculation, but real reasoning and problem-solving skills.

The results were impressive. O1 not only solved the problem but approached it from multiple angles, demonstrating a level of flexibility and depth of understanding we haven’t seen before in AI models.

Putting o1 to the Test: Writing This Article with an Agentic System

In a moment of what I can only describe as “journalistic mad science,” I decided to turn this article into a testing ground for o1. Picture this: me, playing the role of the slightly frazzled idea generator, o1 as the prompt engineer (because apparently, I needed help telling other AIs what to do), and Claude 3.5 Sonnet as our diligent copywriter. It was like assembling an AI boyband, but with less choreography and more XML.

I provided the brainpower (or what passes for it after my third coffee), supplying detailed instructions and ideas. O1 took these ramblings and, like a digital Marie Kondo, organized them into tidy XML instructions for Claude. Meanwhile, Claude sat in the corner, probably wondering why it couldn’t just write the article itself.

Now, here’s where it gets interesting (and slightly embarrassing for yours truly). All the core ideas, analysis, and structural elements were human-made. Yes, that’s right - my brain cells actually had to fire up for this one. O1’s job was to make sure my ideas didn’t sound like they came from a sleep-deprived hamster on a wheel. And boy, did it excel at that.
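For the curious, the three-role pipeline looks roughly like this as code. The model calls are stubbed out with stand-in functions (in practice they would hit the OpenAI and Anthropic APIs), and the XML tags are my own illustrative format, not an official schema:

```python
# Sketch of the three-role writing pipeline: human ideas -> o1 as
# prompt engineer -> Claude 3.5 Sonnet as copywriter. All names and
# tags are illustrative stand-ins, not real API calls.

def o1_prompt_engineer(raw_ideas: str) -> str:
    """Stand-in for o1: organize loose human ideas into structured
    XML instructions for the copywriting model."""
    return (
        "<instructions>\n"
        f"  <ideas>{raw_ideas}</ideas>\n"
        "  <tone>witty, accessible</tone>\n"
        "</instructions>"
    )

def claude_copywriter(xml_instructions: str) -> str:
    """Stand-in for Claude 3.5 Sonnet: draft prose from the instructions."""
    return f"Draft article based on: {xml_instructions}"

def write_article(human_ideas: str) -> str:
    # Human supplies ideas, o1 structures them, Claude writes the prose.
    instructions = o1_prompt_engineer(human_ideas)
    return claude_copywriter(instructions)

print(write_article("o1 is a strong orchestrator but a flat writer"))
```

The point of the structure is the division of labor: the orchestrating model never writes reader-facing prose, it only produces machine-readable instructions.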

But here’s the kicker: when I tried to get o1 to write the content directly, it was about as engaging as watching paint dry in slow motion. The writing was flatter than my attempts at homemade sourdough during lockdown. No humor, no personality - just pure, unadulterated AI blandness.

Claude 3.5 Sonnet, on the other hand, took o1’s instructions and turned them into something people might actually want to read. It added flair, humor, and dare I say, a touch of sass. It was like watching a master chef turn my grocery list into a gourmet meal.

The most humbling part of this whole experience? Realizing that o1 is a far better prompt engineer than I could ever hope to be. It’s like watching a toddler solve a Rubik’s cube while you’re still trying to figure out which end is up. O1’s ability to craft precise instructions for Claude was nothing short of impressive, making me question my career choices and wonder if I should just let the AIs take over already. Every time I tried to give instructions to Claude myself, the overall quality of the article dropped significantly, while o1 had no issue whatsoever. Humbling, to say the least.

In the end, this experiment taught me that o1 truly shines as a manager and orchestrator in AI systems. It’s like having a super-efficient, never-sleeps, never-needs-coffee project manager who can herd cats (or in this case, other AI models) with ease. But when it comes to creative writing, I’ll stick with Claude. O1 can structure my workflow, but I’ll leave the witty one-liners to the professionals.

So there you have it - a behind-the-scenes look at how this article came to be, with a healthy dose of human ideas, AI assistance, and a sprinkling of existential crisis on my part. Now, if you’ll excuse me, I need to go contemplate my place in this brave new AI world. Perhaps over another cup of coffee.

What These Tests Tell Us About o1

O1’s performance in these tests shows a leap forward in AI capabilities. It’s not just about getting the right answer; it’s about the flexibility and depth of understanding demonstrated by the model. And remember, we’re seeing these impressive results in the preview version – imagine what the full version might be capable of!

o1 and AI Teamwork: A Game-Changing Approach

Teaching AI to Double-Check Itself

Traditionally, when we want agentic systems to produce high-quality, accurate content, we use what’s called a “generator-critic pattern.” Imagine you’re writing an important email, and you have a friend read it over before you send it. In agentic terms, one agent (the generator) writes the email, and another agent (the critic) checks it for mistakes or weird phrasing.

This process often required multiple back-and-forth interactions, kind of like if you and your friend kept passing the email draft back and forth, making tweaks each time. It works, but it’s time-consuming and uses a lot of processing power.

Enter o1’s reflexivity. This new model can reflect and self-correct during the generation process, like having a writer and editor in one brain. It potentially reduces the need for as many separate AI agents or as many interactions between them.

However, it’s worth noting that while o1 can self-correct, we might still need “critic” agents for double-checking, especially in high-stakes tasks like legal or financial analysis. We’re still testing to see just how far we can trust o1’s self-editing skills. For this article, o1 made a few mistakes when calculating the fees below; critic agents would have caught them.
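The generator-critic loop described above can be sketched in a few lines. This is a minimal illustration under my own assumptions: `generate` and `critique` are placeholders for real model calls, and the `max_rounds` cutoff is an arbitrary choice, not a prescribed value:

```python
# Minimal sketch of the generator-critic pattern. The generate() and
# critique() callables are placeholders for real model calls.

def refine(task: str, generate, critique, max_rounds: int = 3) -> str:
    """Alternate generator and critic until the critic approves
    (returns None) or we hit the round limit."""
    draft = generate(task, feedback=None)
    for _ in range(max_rounds):
        feedback = critique(task, draft)
        if feedback is None:           # critic is satisfied
            return draft
        draft = generate(task, feedback=feedback)
    return draft                        # best effort after max_rounds

# Toy stand-ins: the critic rejects drafts that lack a closing line.
def toy_generate(task, feedback):
    return task + (" Regards." if feedback else "")

def toy_critique(task, draft):
    return None if draft.endswith("Regards.") else "Add a closing line."

print(refine("Write the email.", toy_generate, toy_critique))  # → "Write the email. Regards."
```

Each pass through the loop is an extra round of API calls - which is exactly the cost that a reflexive model like o1, correcting itself mid-generation, has the chance to eliminate.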

o1’s Knack for Following Complex Instructions

One of the most impressive things about o1 is its ability to understand and follow complex instructions. It excels at breaking down tasks into clear, manageable steps and working through them methodically. This capability is particularly valuable in managing AI systems for complex projects.

For example, in our work at Bot Resources, we often need to create intricate agentic systems for our clients. With o1, we can provide detailed, multi-step instructions for a task, and it will follow them precisely, without losing focus or mixing up the steps. This level of instruction-following is a significant improvement in model capabilities and especially critical for agentic systems.
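In practice, handing a model a multi-step task usually starts with rendering the steps into an explicit, ordered prompt. Here is a hedged sketch of what that can look like - the wording and function name are my own illustration, not a format o1 requires:

```python
# Illustrative helper: encode a detailed multi-step task as a numbered
# prompt so the model can work through the steps without mixing them up.

def build_task_prompt(goal: str, steps: list[str]) -> str:
    """Render an ordered instruction block for a manager model."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, 1))
    return (
        f"Goal: {goal}\n"
        f"Follow these steps in order, without skipping any:\n{numbered}"
    )

prompt = build_task_prompt(
    "Summarize the quarterly report",
    ["Extract key figures", "Flag anomalies", "Draft a one-page summary"],
)
print(prompt)
```

The value of a model like o1 here is that it respects the ordering and scope of each step; weaker models tend to merge steps or wander off before the list is done.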

The Potential and Limitations of o1

The benefits of o1’s reflexivity are clear: it could significantly reduce the complexity of agentic systems and cut down on the number of interactions needed to complete a task. This translates to time and cost savings for businesses using AI solutions. It also opens up new possibilities for more efficient manager agents in agentic systems which is a key factor in the success of the most complex AI projects.

However, it’s important to note that o1 isn’t a magic solution for every AI challenge. It’s still primarily a language model, which means it excels at text-based tasks but isn’t designed for tasks involving images, videos, or sound. For these, we still need specialized models.

Additionally, while o1’s self-correction abilities are impressive, we’re not ready to completely do away with critic agents, especially in high-stakes situations as shown above.

Looking to the future, o1’s capabilities represent a significant advancement, but there’s still much to explore. We need more testing to fully understand how it will perform in complex multi-agent systems and how it might integrate with other AI models for multi-modal tasks (tasks that involve multiple types of data, like text and images together).

The Economics of Innovation: Balancing Cost and Efficiency

Understanding o1’s Price Tag

Let’s talk money, shall we? Because when it comes to cutting-edge AI, the price tag can sometimes make your eyes water. But stick with me – there’s more to this story than just numbers.

Brace yourselves, because these figures might make your credit card weep:

  • o1 Preview: A cool $15.00 per million input tokens and a wallet-busting $60.00 per million output tokens.
  • GPT4o: A more modest $5.00 per million input tokens and $15.00 per million output tokens.

Now, I know what you’re thinking: “Holy smokes, o1 is three to four times as expensive!” And you’re not wrong. At first glance, o1 looks like it’s trying to empty your wallet faster than a teenager with your credit card at a gaming convention.

But here’s where it gets interesting. The real magic of o1 isn’t in its price per token – it’s in how it might slash the number of tokens you need to get the job done.

So, before you start sweating about the cost, let’s dive into how this actually plays out in the real world.

Efficiency Gains: When Less Is More (Expensive)

o1 costs more per token than GPT4o. It’s like comparing a fancy espresso machine to your trusty old drip coffee maker. But here’s the kicker: o1 is so efficient, especially in agentic systems, that it might actually save you money in the long run. How? By being the Usain Bolt of AI – getting the job done in fewer steps. In an agentic system, using GPT4 or Claude 3.5, you can end up with a situation where the back and forth between agents grows exponentially with the complexity of the task. A reflexive model like o1 can cut down on this complexity, greatly reducing the number of tokens needed to complete a task.

Let me break it down for you with a little example, assuming that because o1 is more efficient, it needs fewer back-and-forth exchanges in an agentic system:

  • Old way (GPT4o): 5 API calls, 1,000 input and 1,000 output tokens each. Cost: 5 × ($0.005 + $0.015) = $0.10
  • New way (o1): 2 API calls, 1,000 input and 1,000 output tokens each. Cost: 2 × ($0.015 + $0.060) = $0.15

Okay, I see you squinting at those numbers. Yes, o1 is still a tiny bit more expensive in this scenario. But it’s a much more reasonable 50% increase, not 3-4 times more expensive.
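If you want to plug in your own numbers, the back-of-the-envelope math above boils down to one small function (prices are per million tokens, and the 1,000-token-per-call figure is just the assumption from the example):

```python
# Reproduce the cost comparison: prices are in dollars per million
# tokens; each call uses `tokens` input tokens and `tokens` output tokens.

def scenario_cost(calls: int, in_price: float, out_price: float,
                  tokens: int = 1_000) -> float:
    """Total cost of `calls` API calls at the given per-million-token prices."""
    per_call = tokens / 1_000_000 * (in_price + out_price)
    return calls * per_call

gpt4o = scenario_cost(5, in_price=5.00, out_price=15.00)   # old way
o1 = scenario_cost(2, in_price=15.00, out_price=60.00)     # new way
print(f"GPT4o: ${gpt4o:.2f}, o1: ${o1:.2f}")               # $0.10 vs $0.15
```

The break-even point moves around with the ratio of calls saved, so the fewer rounds o1 needs on a given task, the faster the per-token premium pays for itself.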

So, what does this mean for your business? Well, for those really gnarly, complex tasks that would have your regular AI model spinning its wheels, o1 could be a game-changer. We’re talking about tasks that might have taken multiple rounds of back-and-forth with GPT4o, now potentially solved in one go with o1.

Quality Over Quantity: When the Price Is Right

O1 isn’t just another model upgrade – it’s a potential game-changer for how businesses use teams of AI agents to solve complex problems.

The idea of a manager agent coordinating the work of other agents is nothing new - it dates back to mid-2023, which is ancient history in agentic system terms. But it has been really hard to achieve: models have a tendency to stop either too early or too late. O1’s reflexive capabilities could be the key to finally cracking this nut, making it easier to manage complex AI systems and reducing the need for multiple agents.

We’re still in early stages, and o1 needs to prove itself in real-world business scenarios. But for companies willing to explore, this technology could offer a significant edge in applying agentic systems to complex problems that were previously out of reach.

That’s why o1 is potentially a true game changer.

Reshaping the AI Landscape: New Players, New Rules?

The introduction of o1 is OpenAI’s latest move in the rapidly evolving AI landscape. But it’s worth remembering the “We have no moat” memo - a leaked Google document from May 2023 that argued open-source AI would outcompete both Google and OpenAI. Just a year later, it feels like ancient history, yet its core message remains relevant.

O1’s release might be seen as OpenAI’s attempt to build that moat, with its reflexive capabilities potentially becoming the new focus in AI development - and with very little information released about how exactly they achieve this feat. The “Open” in OpenAI is becoming rather misleading.

But in a field moving this quickly, can any company maintain a significant lead for long? This development puts pressure on competitors like Google, Anthropic, Meta, and Mistral. We’re likely to see a surge in research and products centered on self-improving, self-checking AI systems. For businesses, this could mean more robust and reliable AI tools in the near future.

As the dust settles in the coming weeks, we’ll get a clearer picture of o1’s true impact. For now, one thing is certain: the AI landscape continues to evolve at breakneck speed, and staying alert to these changes is crucial for both businesses and AI professionals.