In this tutorial, we explore OpenAI's o1 reasoning model and compare its performance against a traditional GPT model (GPT-4o) for data analysis and image interpretation. Reasoning models have been a hot topic recently, with OpenAI and other companies like DeepSeek developing their own versions.
Unlike traditional GPT models, the o1 reasoning model takes multiple steps to process a prompt before generating responses. This approach allows the model to evaluate different solutions and refine its reasoning iteratively. It also uses reasoning tokens to explore multiple options before settling on the best response. These tokens are used internally and are not included in the final output.
A key distinction between OpenAI's o1 model and DeepSeek's R1 model is that o1 does not reveal its reasoning process, whereas R1 provides visible steps in its output. This fundamental difference impacts how we interpret responses from these models.
Key Differences: o1 Reasoning Model vs. GPT
Multi-step reasoning: o1 iterates through different solutions before producing a final response.
Optimized for complex tasks: The model excels in data analysis, coding, planning, and understanding data visualizations.
Reasoning tokens: These help the model refine its output but are not shown in the response.
Reasoning effort parameter: Controls the depth of reasoning, with options for low, medium, and high effort levels. Higher effort leads to more thorough responses but requires more computational resources.
No temperature setting: Unlike GPT models, o1 does not support the temperature parameter.
Setting Up the OpenAI API
To use OpenAI's API, you need an API key from your OpenAI account, and you should store it securely. We recommend using AWS Secrets Manager for safe storage, which ensures the key is not exposed in your code.
Additionally, install the required libraries:
pip install openai boto3 pandas
These libraries allow us to interact with OpenAI's models and manage our API keys securely.
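As a minimal sketch of the Secrets Manager approach: the secret name, region, and the JSON shape of the stored secret below are assumptions for illustration, not values from this article.

```python
import json

def get_openai_key(secret_name, region="us-east-1"):
    """Fetch an OpenAI API key from AWS Secrets Manager.

    Assumes the secret stores JSON like {"OPENAI_API_KEY": "sk-..."}.
    """
    import boto3  # imported here so the sketch can be defined without AWS set up

    sm = boto3.client("secretsmanager", region_name=region)
    payload = sm.get_secret_value(SecretId=secret_name)["SecretString"]
    return json.loads(payload)["OPENAI_API_KEY"]

# With a real secret in place, the client would be built like this:
# from openai import OpenAI
# client = OpenAI(api_key=get_openai_key("my/openai/key"))
```

The key never appears in source code; only the secret's name does.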
Analyzing Data with the o1 Model
Our dataset for this experiment is diamonds.csv, which contains details about diamonds, including weight, color, clarity, and price. One interesting observation is that diamonds with "IF" (Internally Flawless) clarity tend to have the lowest average price, which seems counterintuitive.
To investigate this, we first analyze the dataset using GPT-4o and then compare the results with the o1 reasoning model.
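The puzzling pattern can be reproduced with a quick pandas check. The tiny inline frame below is illustrative stand-in data, not the real dataset; with the actual file you would load it via `pd.read_csv("diamonds.csv")`.

```python
import pandas as pd

# Stand-in rows mimicking the diamonds data (clarity, weight in carats, price).
df = pd.DataFrame({
    "clarity": ["IF", "SI1", "IF", "SI1"],
    "carat":   [0.30, 1.10, 0.35, 1.00],
    "price":   [800, 1500, 900, 1400],
})

# Average price and average weight per clarity grade.
avg_price = df.groupby("clarity")["price"].mean()
avg_carat = df.groupby("clarity")["carat"].mean()
print(avg_price)  # IF averages a lower price...
print(avg_carat)  # ...but also a lower carat weight
```

Seeing both aggregates side by side is exactly the kind of confound (price tracks weight, and IF stones tend to be small) that we want the models to surface on their own.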
Using GPT-4o for Data Analysis
We define a function to process the prompt using GPT-4o:
from pprint import pprint
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def openai_gpt_help(prompt):
    """Send a single-turn prompt to GPT-4o and return its reply."""
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        temperature=0,  # deterministic output for a fair comparison
    )
    pprint(f"Tokens used: {response.usage}")
    return response.choices[0].message.content
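A usage sketch follows; the prompt wording is illustrative, not the article's original.

```python
# Hypothetical prompt for the clarity/price question. The actual API call is
# commented out because it requires a configured OpenAI client and key.
prompt = (
    "In diamonds.csv, diamonds with IF (Internally Flawless) clarity have the "
    "lowest average price. Why might that be? "
    "Columns: carat (weight), color, clarity, price."
)
# answer = openai_gpt_help(prompt)
# print(answer)
```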
GPT-4o generates a response but does not provide a final answer. Instead, it suggests analyzing different factors like weight and rating, providing Python code to generate charts for further investigation. This means the burden of interpretation still falls on the user.
Using o1 for Data Analysis
Next, we use the o1 model with a high reasoning effort setting:
def openai_o_help(prompt):
    """Send a single-turn prompt to the o1 reasoning model and return its reply."""
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model="o1",
        reasoning_effort="high",  # "low" | "medium" | "high"
        messages=messages,  # note: no temperature parameter for o1
    )
    pprint(f"Tokens used: {response.usage}")
    return response.choices[0].message.content
Unlike GPT, the o1 model identifies the key insight—that IF diamonds tend to be smaller, which explains their lower price. It also provides Python code to generate supporting visualizations, offering a more direct and insightful response.
Token Usage Comparison
GPT-4o: No reasoning tokens used
o1 Model: Approximately 2,000 reasoning tokens used to refine the response
The o1 model takes longer and costs more, but the quality of its response is significantly better for complex reasoning tasks.
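To see the reasoning-token count yourself, you can read it off the `usage` object that `pprint` printed above. The attribute names below follow the openai Python SDK; the `fake_usage` stand-in and its numbers are illustrative, not results from this run.

```python
from types import SimpleNamespace

def summarize_usage(usage):
    """Return (prompt, completion, reasoning) token counts from a usage object."""
    details = getattr(usage, "completion_tokens_details", None)
    reasoning = getattr(details, "reasoning_tokens", 0) if details is not None else 0
    return usage.prompt_tokens, usage.completion_tokens, reasoning

# Stand-in for response.usage (GPT-4o would report 0 reasoning tokens).
fake_usage = SimpleNamespace(
    prompt_tokens=120,
    completion_tokens=2600,
    completion_tokens_details=SimpleNamespace(reasoning_tokens=2000),
)
print(summarize_usage(fake_usage))  # (120, 2600, 2000)
```

Note that reasoning tokens are billed as completion tokens even though they never appear in the output, which is where o1's extra cost comes from.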
Analyzing Images with AI
Next, we test how these models interpret images by analyzing a misleading line chart where the Y-axis is flipped. This makes a decreasing trend appear as an increasing one.
We use the following prompt:
image_prompt = [
    {"type": "text", "text": "What is wrong with this image?"},
    {"type": "image_url", "image_url": {"url": image_url}},
]
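This content block then becomes the user message of a chat-completions call. A sketch of that wiring is below; the helper name and the example URL are illustrative assumptions.

```python
def build_image_messages(image_url, question="What is wrong with this image?"):
    """Wrap a text question and an image URL into a chat-completions messages list."""
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }]

msgs = build_image_messages("https://example.com/flipped_axis_chart.png")
# response = client.chat.completions.create(model="o1", messages=msgs)
```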
GPT-4o Image Analysis
GPT-4o notices that something is off but misidentifies the problem, claiming that the X-axis starts at 600 instead of 0, which is not the chart's actual flaw.
o1 Image Analysis
The o1 model correctly identifies that the Y-axis is upside down, leading to a misleading visualization. This demonstrates its stronger reasoning capabilities when interpreting complex visual data.
Conclusion
In this experiment, OpenAI’s o1 reasoning model outperforms GPT-4o for tasks requiring deep analysis:
For data analysis, o1 provides a clear, structured answer instead of just suggesting exploratory steps.
For image interpretation, o1 gives a more precise analysis of misleading charts.
However, o1 is more expensive and slower than GPT-4o due to the computational cost of reasoning tokens. This makes it ideal for high-stakes decision-making tasks but potentially overkill for simpler queries.