Do new AI reasoning models require new approaches to prompting?

The era of reasoning AI is well underway.

After OpenAI once again kickstarted an AI revolution with its o1 reasoning model introduced back in September 2024 — which takes longer to answer questions but with the payoff of higher performance, especially on complex, multi-step problems in math and science — the commercial AI field has been flooded with copycats and competitors.

There’s DeepSeek’s R1, Google’s Gemini 2.0 Flash Thinking and, just today, LlamaV-o1, all of which seek to offer built-in “reasoning” similar to OpenAI’s new o1 and upcoming o3 model families. These models engage in “chain-of-thought” (CoT) prompting, or “self-prompting”: they reflect on their own analysis midstream, double back, check their work and ultimately arrive at a better answer, rather than simply generating the fastest possible response the way other large language models (LLMs) do.
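
For readers who haven’t tried the manual version of this, a minimal sketch of explicit chain-of-thought prompting with a standard (non-reasoning) chat model follows. It assumes the OpenAI Python SDK and an API key in the environment; the model name and the wording of the question are illustrative, not anything taken from the article.

```python
# Manual chain-of-thought prompting with a standard chat model.
# Sketch only: assumes the OpenAI Python SDK and OPENAI_API_KEY in the
# environment; the model name and question are placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # a standard, non-reasoning chat model
    messages=[
        {
            "role": "user",
            "content": (
                "A train leaves at 9:40 and arrives at 13:05. How long is the "
                "trip? Think step by step and check your work before answering."
            ),
        },
    ],
)

print(response.choices[0].message.content)
```

Reasoning models bake that step-by-step behavior in, which is why, as discussed below, the prompting advice for them is different.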

Yet the high cost of o1 and o1-mini ($15.00/1M input tokens for o1, versus $1.25/1M input tokens for GPT-4o on OpenAI’s API) has caused some to balk, questioning whether the supposed performance gains justify the price. Is it really worth paying 12X as much as a typical, state-of-the-art LLM?

As it turns out, there are a growing number of converts — but the key to unlocking reasoning models’ true value may lie in the user prompting them differently.

Shawn Wang (founder of AI news service Smol) featured a guest post on his Substack over the weekend from Ben Hylak, a former Apple interface designer for visionOS (the software that powers the Vision Pro spatial computing headset). The post has gone viral because it convincingly explains how Hylak prompts OpenAI’s o1 model to get outputs he finds incredibly valuable.

In short, instead of writing prompts for the o1 model, users should think in terms of writing “briefs”: detailed explanations that front-load context about what they want the model to output, who they are, and the format in which they want the answer delivered.

As Hylak writes on Substack:

With most models, we’ve been trained to tell the model how we want it to answer us. e.g. ‘You are an expert software engineer. Think slowly and carefully.’

This is the opposite of how I’ve found success with o1. I don’t instruct it on the how — only the what. Then let o1 take over and plan and resolve its own steps. This is what the autonomous reasoning is for, and can actually be much faster than if you were to manually review and chat as the “human in the loop”.

Hylak also includes a great annotated screenshot of an example prompt for o1 that produced a useful result: a list of recommended hikes.
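
In the same spirit, a brief-style request sent through OpenAI’s API might look roughly like the sketch below. This is an illustration, not Hylak’s actual prompt: the model name, the hiker’s constraints and the requested table format are all assumed for the example.

```python
# A "brief"-style prompt for a reasoning model: context up front, the "what"
# rather than the "how", and the desired output format spelled out.
# Sketch only: assumes the OpenAI Python SDK, OPENAI_API_KEY in the
# environment, and API access to the "o1" model; all trip details are invented.
from openai import OpenAI

client = OpenAI()

brief = """
I want a list of the best medium-length hikes within two hours of San Francisco.

Context about me: I hike most weekends, prefer 6-10 mile loops with under
2,500 feet of elevation gain, and I'm bringing a dog, so trails must allow dogs.
I've already done Mount Tam and Lands End, so don't suggest those.

Return the answer as a table with columns for trail name, distance, elevation
gain, drive time from San Francisco, and why it fits the criteria above.
"""

# The brief states the goal and constraints; the model plans its own steps.
response = client.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content": brief}],
)

print(response.choices[0].message.content)
```

The shape of the request is the point: the brief supplies the goal, the context and the desired output format, then leaves the planning to the model.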

This blog post was so helpful that OpenAI’s own president and co-founder Greg Brockman re-shared it on his X account with the message: “o1 is a different kind of model. Great performance requires using it in a new way relative to standard chat models.”

I tried it myself on my recurring quest to learn to speak fluent Spanish and here was the result, for those curious. Perhaps not as impressive as Hylak’s well-constructed prompt and response, but definitely showing strong potential.

Separately, even when it comes to non-reasoning LLMs such as Claude 3.5 Sonnet, there may be room for regular users to improve their prompting to get better, less constrained results.

As Louis Arge, former Teton.ai engineer and current creator of neuromodulation device openFUS, wrote on X, “one trick i’ve discovered is that LLMs trust their own prompts more than my prompts,” and provided an example of how he convinced Claude to be “less of a coward” by first “trigger[ing] a fight” with him over its outputs.
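
Arge’s exchange played out in a chat interface, but the general idea, that a model gives extra weight to words attributed to itself, can be approximated over an API by supplying earlier assistant-role turns in the conversation history. The sketch below assumes the Anthropic Python SDK and an invented scenario; it illustrates the pattern rather than reproducing Arge’s exchange.

```python
# Seeding a conversation with caller-written assistant turns, so later replies
# stay consistent with what the model "already said".
# Sketch only: assumes the Anthropic Python SDK and ANTHROPIC_API_KEY in the
# environment; the scenario and wording are invented, not Arge's exchange.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # a Claude 3.5 Sonnet model ID
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Critique this plan: one shared Postgres database for "
                       "all twelve microservices, no message queue, and "
                       "production deploys every Friday afternoon.",
        },
        # A caller-written assistant turn: the model reads it as its own prior
        # statement, nudging the follow-up toward blunter answers.
        {
            "role": "assistant",
            "content": "I'll be blunt and specific rather than hedging. Here "
                       "are the real problems:",
        },
        {"role": "user", "content": "Go on."},
    ],
)

print(response.content[0].text)
```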

All of which goes to show that prompt engineering remains a valuable skill as the AI era wears on.
