OpenAI's latest model unleashes some powerful new functionality
Sep 19, 2024
Introduction
The release of OpenAI's o1 model is significant. For many teams it is going to unlock capabilities that 12 months ago were well beyond their reach. The primary beneficiaries are likely to be analysts and those looking to leverage simpler prompts. We have done deep benchmarking internally and wanted to share some of the insights.
Key Features of OpenAI's o1 Model
Multi-Step Reasoning: The o1 model excels at multi-step reasoning, allowing it to perform complex tasks that require several stages of thought and analysis. It plans its own strategy, essentially running the model multiple times and building prompts for the next cycle so that each cycle of the AI tackles a more focused set of issues and problems. This behaviour, often replicated by teams using 'agentic' or multi-agent design patterns, is now available out of the box, making it accessible to everyone, not just the teams who have built their own frameworks. A sketch of that hand-built loop follows below.
Scalability and Adaptability: Designed to handle large-scale data and adapt to various contexts, the o1 model can process vast amounts of information quickly and efficiently, making it ideal for complex legal tasks. Because it runs the model again and again, it forgets fewer details when you feed lots of detail in. This was a major drawback of previous generations.
Mathematical and Analytical Capabilities: Beyond multi-step reasoning, OpenAI has improved the handling of numbers, likely through some kind of memory module that helps the LLM keep count. This addresses a hard foundational challenge that many teams have been tackling in their own way.
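To make the multi-step contrast concrete, here is a minimal sketch of the kind of hand-built loop that o1 now runs internally. It assumes the official openai Python SDK with an OPENAI_API_KEY in the environment; the model names, prompts, and helper function are illustrative, not our production pipeline.

```python
# A minimal sketch of the multi-step loop that o1 now runs internally.
# Assumes the official `openai` Python SDK (v1.x) and an OPENAI_API_KEY
# in the environment; model names and prompts are illustrative.
from openai import OpenAI

client = OpenAI()

def ask(model: str, prompt: str) -> str:
    """Send a single user prompt and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Before o1: teams chained calls themselves, feeding each answer into a
# fresh, more focused prompt (a simple 'agentic' loop).
def hand_built_review(document: str) -> str:
    plan = ask("gpt-4o", f"List the steps needed to review this contract:\n{document}")
    findings = ask("gpt-4o", f"Carry out this review plan step by step.\nPlan:\n{plan}\nContract:\n{document}")
    return ask("gpt-4o", f"Summarise these findings for a legal team:\n{findings}")

# With o1, the planning and iteration happen inside a single call.
def o1_review(document: str) -> str:
    return ask("o1-preview", f"Review this contract and summarise the key risks:\n{document}")
```

The design point is simply that the planning, execution, and summarisation stages collapse into one request; no framework code is needed to get the multi-step behaviour.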
Implications for Wordsmith
Improved Legal Calculations and Legal Scenario Modeling
With its advanced reasoning, the o1 model can improve analysis of numerical legal concepts by identifying and modeling potential risks.
Some of our most common use cases at Wordsmith involve legal and finance teams throwing hypothetical scenarios against agreements to stress test how they will behave. Typical risk modeling includes:
"What would happen if we wanted to terminate this agreement in the event that the provider had 2 days of downtime?"
"How much would it cost us to terminate without cause after 15 months?"
"Theoretically, what is the max liability exposure that this agreement could create for us in a scenario where our platform had major downtime?" - the CrowdStrike scenario.
For these situations, o1 is a major step up, breaking down each scenario into the steps required to give a great answer. By far the biggest improvements we have found are in numerical modeling and liability scenarios: the model performs about 17% better on our mathematical evaluations.
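As a rough illustration, this is how you might run those scenario questions against an agreement, one pass each. The file name, model name, and prompt framing here are assumptions, not our actual evaluation harness.

```python
# A hedged sketch of running the scenario questions above against an
# agreement; the file name, model, and prompt framing are assumptions.
from openai import OpenAI

client = OpenAI()

with open("master_services_agreement.txt") as f:
    agreement = f.read()

scenarios = [
    "What would happen if we wanted to terminate this agreement in the "
    "event that the provider had 2 days of downtime?",
    "How much would it cost us to terminate without cause after 15 months?",
    "What is the max liability exposure this agreement could create for "
    "us in a scenario where our platform had major downtime?",
]

for question in scenarios:
    response = client.chat.completions.create(
        model="o1-preview",
        messages=[{"role": "user", "content": f"{question}\n\nAgreement:\n{agreement}"}],
    )
    print(f"Q: {question}\nA: {response.choices[0].message.content}\n")
```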
Better Outcomes with Simpler Prompts
The o1 model simplifies complex prompt design. Previously, a question might need to be very detailed to get a comprehensive analysis. Now you can simply ask "analyze this email thread," and the model will infer and perform the necessary pre-planning.
Unlike previous models, the o1 model does not require overly detailed prompts. Instead, it is designed to "fill in the gaps" on its own, making it easier to use. The downside is that it might not analyze the same things you would; as a way to get going, though, it is very powerful.
For example, instead of asking, "Can you break down this email and identify what actions I need to take, prioritizing them by urgency and the date of the request, giving extra weight to anything that feels time sensitive?" you can now say, "What actions do I need to take here?" and the model will deliver a detailed response, making a great stab at the most important priorities and breaking down what it thinks you should be looking at.
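Side by side, the two prompt styles look something like this. The model name and email file are placeholders for illustration.

```python
# An illustrative side-by-side of the two prompt styles; the model name
# and input file are assumptions.
from openai import OpenAI

client = OpenAI()

with open("email_thread.txt") as f:
    thread = f.read()

# Old style: spell out every instruction yourself.
detailed_prompt = (
    "Can you break down this email and identify what actions I need to "
    "take, prioritizing them by urgency and the date of the request, "
    "giving extra weight to anything that feels time sensitive?\n\n" + thread
)

# o1 style: a terse ask; the model plans the breakdown itself.
terse_prompt = "What actions do I need to take here?\n\n" + thread

for prompt in (detailed_prompt, terse_prompt):
    response = client.chat.completions.create(
        model="o1-preview",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)
```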
Considerations
Lack of Customization Over Planning
Because o1 handles its own planning, you are not able to be explicit about the steps you want it to take and how you want it to think at each stage. Depending on your workflows and how your system is designed, this might in places lead to a degradation in the desired output.
Overthinking
There is a limit to the step-up that additional cycles of AI will give you when iterating on a problem. Somewhat connected to the above, sometimes you need to introduce domain knowledge or change the sequence of steps to get the output you want.
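For contrast, a hand-built pipeline keeps every step explicit and lets you inject domain knowledge exactly where it is needed, which is the control you give up with o1's internal planning. This sketch is hypothetical: the playbook text, model name, and prompts are made up for illustration.

```python
# A sketch of the explicit, hand-built alternative: the sequence of
# steps is fixed, and domain knowledge is injected at a known point.
# The playbook, model name, and prompts are hypothetical.
from openai import OpenAI

client = OpenAI()

PLAYBOOK = "Our standard position: liability is capped at 12 months of fees."

def step(prompt: str) -> str:
    """One focused call in a fixed pipeline."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def review_liability(contract: str) -> str:
    # Step 1: narrow the context to the clauses that matter.
    clauses = step(f"Extract the liability and indemnity clauses:\n{contract}")
    # Step 2: inject domain knowledge at exactly this point.
    analysis = step(f"Assess these clauses against our playbook.\nPlaybook: {PLAYBOOK}\nClauses:\n{clauses}")
    # Step 3: one final focused pass rather than more open-ended cycles.
    return step(f"Summarise the key deviations in three bullet points:\n{analysis}")
```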
Cost and Speed
The advanced capabilities of the o1 model come at a higher cost, which is an important consideration for deployment. Additionally, the model's latency can limit its applications, especially in scenarios requiring quick responses. Sometimes 10-20 second delays are simply not an acceptable user experience, and it's better to iterate in smaller, sharper cycles.
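One pattern we find useful is routing by latency budget: fast models on interactive paths, with o1 reserved for deeper background analysis. A rough sketch, with assumed model names and no tuned thresholds:

```python
# A hedged sketch of routing by latency budget; the model names and
# helper are assumptions rather than measured guidance.
import time

from openai import OpenAI

client = OpenAI()

def answer(prompt: str, interactive: bool) -> str:
    """Use a fast model on interactive paths; reserve o1 for deep analysis."""
    model = "gpt-4o-mini" if interactive else "o1-preview"
    start = time.monotonic()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"{model} answered in {time.monotonic() - start:.1f}s")
    return response.choices[0].message.content
```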
Conclusion
The lessons learned from OpenAI's o1 model present exciting opportunities for Wordsmith and the legal AI industry. While it's not perfect for every scenario, it will be a level up for many parts of the legal tech experience.