In this tutorial, we explore OpenAI's o1 reasoning model and compare its performance against a traditional GPT model (GPT-4o) for data analysis and image interpretation. Reasoning models have been a hot topic recently, with OpenAI and other companies like DeepSeek developing their own versions.
Unlike traditional GPT models, the o1 reasoning model takes multiple steps to process a prompt before generating responses. This approach allows the model to evaluate different solutions and refine its reasoning iteratively. It also uses reasoning tokens to explore multiple options before settling on the best response. These tokens are used internally and are not included in the final output.
A key distinction between OpenAI's o1 model and DeepSeek's R1 model is that o1 does not reveal its reasoning process, whereas R1 provides visible steps in its output. This fundamental difference impacts how we interpret responses from these models.
Key Differences: o1 Reasoning Model vs. GPT
Multi-step reasoning: o1 iterates through different solutions before producing a final response.
Optimized for complex tasks: The model excels in data analysis, coding, planning, and understanding data visualizations.
Reasoning tokens: These help the model refine its output but are not shown in the response.
Reasoning effort parameter: Controls the depth of reasoning, with options for low, medium, and high effort levels. Higher effort leads to more thorough responses but requires more computational resources.
No temperature setting: Unlike GPT models, o1 does not support the temperature parameter.
Setting Up the OpenAI API
To use OpenAI's API, you need an API key from your OpenAI account, and you should store it securely. We recommend using AWS Secrets Manager for safe storage, which ensures the key is not exposed in your code.
Additionally, install the required libraries:
pip install openai boto3 pandas
These libraries allow us to interact with OpenAI's models and manage our API keys securely.
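As a minimal sketch of the Secrets Manager approach: the secret name, region, and the JSON shape of the stored secret below are assumptions for illustration, not values from this article.

```python
import json

def get_openai_key(secret_name, region="us-east-1"):
    """Fetch an OpenAI API key from AWS Secrets Manager.

    Assumes the secret stores JSON like {"OPENAI_API_KEY": "sk-..."}.
    """
    import boto3  # imported here so the sketch can be defined without AWS set up

    sm = boto3.client("secretsmanager", region_name=region)
    payload = sm.get_secret_value(SecretId=secret_name)["SecretString"]
    return json.loads(payload)["OPENAI_API_KEY"]

# With a real secret in place, the client would be built like this:
# from openai import OpenAI
# client = OpenAI(api_key=get_openai_key("my/openai/key"))
```

The key never appears in source code; only the secret's name does.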
Analyzing Data with the o1 Model
Our dataset for this experiment is diamonds.csv, which contains details about diamonds, including weight, color, clarity, and price. One interesting observation is that diamonds with "IF" (Internally Flawless) clarity tend to have the lowest average price, which seems counterintuitive.
To investigate this, we first analyze the dataset using GPT-4o and then compare the results with the o1 reasoning model.
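The puzzling pattern can be reproduced with a quick pandas check. The tiny inline frame below is illustrative stand-in data, not the real dataset; with the actual file you would load it via `pd.read_csv("diamonds.csv")`.

```python
import pandas as pd

# Stand-in rows mimicking the diamonds data (clarity, weight in carats, price).
df = pd.DataFrame({
    "clarity": ["IF", "SI1", "IF", "SI1"],
    "carat":   [0.30, 1.10, 0.35, 1.00],
    "price":   [800, 1500, 900, 1400],
})

# Average price and average weight per clarity grade.
avg_price = df.groupby("clarity")["price"].mean()
avg_carat = df.groupby("clarity")["carat"].mean()
print(avg_price)  # IF averages a lower price...
print(avg_carat)  # ...but also a lower carat weight
```

Seeing both aggregates side by side is exactly the kind of confound (price tracks weight, and IF stones tend to be small) that we want the models to surface on their own.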
Using GPT-4o for Data Analysis
We define a function to process the prompt using GPT-4o:
from pprint import pprint
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def openai_gpt_help(prompt):
    """Send a single-turn prompt to GPT-4o and return its reply."""
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        temperature=0,  # deterministic output for a fair comparison
    )
    pprint(f"Tokens used: {response.usage}")
    return response.choices[0].message.content
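A usage sketch follows; the prompt wording is illustrative, not the article's original.

```python
# Hypothetical prompt for the clarity/price question. The actual API call is
# commented out because it requires a configured OpenAI client and key.
prompt = (
    "In diamonds.csv, diamonds with IF (Internally Flawless) clarity have the "
    "lowest average price. Why might that be? "
    "Columns: carat (weight), color, clarity, price."
)
# answer = openai_gpt_help(prompt)
# print(answer)
```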
GPT-4o generates a response but does not provide a final answer. Instead, it suggests analyzing different factors like weight and rating, providing Python code to generate charts for further investigation. This means the burden of interpretation still falls on the user.
Using o1 for Data Analysis
Next, we use the o1 model with a high reasoning effort setting:
def openai_o_help(prompt):
    """Send a single-turn prompt to the o1 reasoning model and return its reply."""
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model="o1",
        reasoning_effort="high",  # "low" | "medium" | "high"
        messages=messages,  # note: no temperature parameter for o1
    )
    pprint(f"Tokens used: {response.usage}")
    return response.choices[0].message.content
Unlike GPT, the o1 model identifies the key insight—that IF diamonds tend to be smaller, which explains their lower price. It also provides Python code to generate supporting visualizations, offering a more direct and insightful response.
Token Usage Comparison
GPT-4o: No reasoning tokens used
o1 Model: Approximately 2,000 reasoning tokens used to refine the response
The o1 model takes longer and costs more, but the quality of its response is significantly better for complex reasoning tasks.
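To see the reasoning-token count yourself, you can read it off the `usage` object that `pprint` printed above. The attribute names below follow the openai Python SDK; the `fake_usage` stand-in and its numbers are illustrative, not results from this run.

```python
from types import SimpleNamespace

def summarize_usage(usage):
    """Return (prompt, completion, reasoning) token counts from a usage object."""
    details = getattr(usage, "completion_tokens_details", None)
    reasoning = getattr(details, "reasoning_tokens", 0) if details is not None else 0
    return usage.prompt_tokens, usage.completion_tokens, reasoning

# Stand-in for response.usage (GPT-4o would report 0 reasoning tokens).
fake_usage = SimpleNamespace(
    prompt_tokens=120,
    completion_tokens=2600,
    completion_tokens_details=SimpleNamespace(reasoning_tokens=2000),
)
print(summarize_usage(fake_usage))  # (120, 2600, 2000)
```

Note that reasoning tokens are billed as completion tokens even though they never appear in the output, which is where o1's extra cost comes from.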
Analyzing Images with AI
Next, we test how these models interpret images by analyzing a misleading line chart where the Y-axis is flipped. This makes a decreasing trend appear as an increasing one.
We use the following prompt:
image_prompt = [
    {"type": "text", "text": "What is wrong with this image?"},
    {"type": "image_url", "image_url": {"url": image_url}},
]
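This content block then becomes the user message of a chat-completions call. A sketch of that wiring is below; the helper name and the example URL are illustrative assumptions.

```python
def build_image_messages(image_url, question="What is wrong with this image?"):
    """Wrap a text question and an image URL into a chat-completions messages list."""
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }]

msgs = build_image_messages("https://example.com/flipped_axis_chart.png")
# response = client.chat.completions.create(model="o1", messages=msgs)
```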
GPT-4o Image Analysis
GPT-4o notices that something is off but misidentifies the problem, claiming that the X-axis starts at 600 instead of 0, which is not the chart's actual flaw.
o1 Image Analysis
The o1 model correctly identifies that the Y-axis is upside down, leading to a misleading visualization. This demonstrates its stronger reasoning capabilities when interpreting complex visual data.
Conclusion
In this experiment, OpenAI’s o1 reasoning model outperforms GPT-4o for tasks requiring deep analysis:
For data analysis, o1 provides a clear, structured answer instead of just suggesting exploratory steps.
For image interpretation, o1 gives a more precise analysis of misleading charts.
However, o1 is more expensive and slower than GPT-4o due to the computational cost of reasoning tokens. This makes it ideal for high-stakes decision-making tasks but potentially overkill for simpler queries.