Reading Time: 5 minutes
Hey Prompt Lover,
New month. We're still in the research.
March is here and we are deep into The Prompt Report — the 200-page academic paper covering 1,565 studies on prompting and prompt engineering that I've been breaking down for you piece by piece since we started this series.
If you're new here, here's the short version of what's happened so far. I read a 200-page research paper so you don't have to. Every newsletter in this series takes one technique from that paper, explains why it works, and gives you a prompt you can copy and use today. No theory without application. No technique without testing.
Here's where we've been.
Module 1 covered the five-component prompt structure and why prompt sensitivity can collapse your results from a single formatting change.
Module 2 went deep on few-shot prompting — the most cited technique in the entire paper — including why example order matters as much as example quality and how to generate your own examples when you don't have any.
Module 3 covered reasoning techniques: Chain-of-Thought, Contrastive examples, and Self-Consistency.
Module 4 covered complexity: breaking big tasks into sequences and using Tree-of-Thought to search for answers instead of walking toward the first one.
Last issue we opened Module 5 with Self-Refine — building a two-stage process where AI critiques its own draft against specific criteria before you ever see it. A lot of you tested it. The replies were consistent. The critique pass was finding the same problems you'd been catching manually. Which means you were doing work the AI could have done for you the whole time.
Today we close Module 5.
And this newsletter covers the technique I'd argue matters most for anyone using AI on work where being wrong has real consequences.

Here's Why This Matters
Here's something worth sitting with for a moment.
AI doesn't know when it's wrong.
Not in the way you might hope. The model produces confident language regardless of whether the underlying information is accurate.
A correct fact and an incorrect one come out in the same tone, with the same grammatical confidence, formatted the same way.
You cannot tell them apart by reading the output. You can only tell them apart by checking.
Most people don't check. Not because they're careless. Because checking every claim in an AI output manually is slow, and the whole point of using AI was to go faster.
Chain-of-Verification gives you a structured way to make the AI do the checking. Not perfectly. Not comprehensively. But systematically enough that the errors most likely to cause problems get surfaced before they reach your client, your boss, your investor, or your audience.
The research found this technique significantly improves factual accuracy on knowledge-intensive tasks compared with unverified outputs.
The structure forces the AI to look at its own claims from the outside — as statements to be tested rather than conclusions to be trusted — and that shift in perspective catches errors that generation never would.

Your Tax Data, Finally in One Place

Are you tired of hunting down data, fixing errors, and manually updating disconnected spreadsheets?
Tax reporting isn’t as simple as it used to be. You need real-time, flexible reporting so you can confidently make decisions backed by accurate, centralized data.
Learn how bringing all your tax information into one central system automates repetitive tasks, improves scenario planning, and frees your team to focus on strategy instead of data entry.
Whether you operate in one country or dozens, Longview Tax scales with you—reducing risk, speeding up your close process, and helping you optimize tax policies across all jurisdictions.
Download the paper

What You'll Learn In This Newsletter
By the end of this issue, you'll have:
• A clear explanation of why AI outputs contain confident errors and how verification catches them
• The exact Chain-of-Verification structure from the research
• A working template for any task where factual or logical accuracy matters
• A way to combine this with Self-Refine for a complete two-layer quality process
Let's get started.

What Most People Do Wrong
Most people read AI output the same way they read something a trusted colleague sent them.
With a baseline assumption of accuracy. Checking the parts that feel obviously wrong. Accepting the parts that feel plausible. Moving on.
That approach works for human-generated content because humans usually know when they're uncertain. They hedge. They qualify. They say "I think" or "you might want to check this." The uncertainty shows up in the language.
AI doesn't do that. AI produces the same confident tone whether it knows something or half-knows it. The hedge that would tell you to verify something never arrives. So you don't verify. And occasionally something wrong makes it through because it sounded exactly like everything that was right.
The fix isn't reading more carefully. The fix is building verification into the process before you read it at all.

Quick Reality Check
I now run Chain-of-Verification on any AI output that contains facts, figures, attributions, or claims I'm going to repeat to someone else. Not because AI is unreliable. Because confident wrong information looks identical to confident right information and I'd rather find out which is which before the meeting than during it.

The Prompt That Works
Chain-of-Verification runs in four stages. Each one builds on the last.

▼ COPY THIS PROMPT:
Stage 1 — Generate the output:
[Run your normal task prompt here. Get your first output. Then continue to Stage 2.]


Stage 2 — Generate verification questions:
Here is an output I need to verify:
[Paste Stage 1 output here]
Generate a list of specific verification questions that would test whether the claims in this output are accurate. Focus on:
- Factual claims that could be checked against a source
- Numbers, dates, percentages, or figures
- Attributions — claims that something belongs to or was said by a specific person or company
- Causal claims — statements that one thing caused or leads to another
- Comparative claims — statements that one thing is better, larger, faster, or more effective than another
List each question separately. Be specific. "Is this number correct?" is not a useful question. "Was Company X's Series B funding round $40 million or a different amount?" is.


Stage 3 — Answer the verification questions:
Here are the verification questions from the previous step:
[Paste Stage 2 output here]
Answer each question as accurately as you can. For each answer, state your confidence level: High, Medium, or Low. For any answer where your confidence is Medium or Low, explain specifically what you're uncertain about and what a reliable source would look like to verify it.


Stage 4 — Produce verified final output:
Here is the original output:
[Paste Stage 1 output here]
Here are the verification results:
[Paste Stage 3 output here]
Produce a revised version of the original output that:
- Corrects any claims the verification identified as likely wrong
- Qualifies any claims where confidence was Medium or Low with appropriate language
- Flags any claims that could not be verified with a note that external verification is recommended
- Leaves confirmed accurate claims unchanged


How To Use This Prompt
Step 1: Run your normal task prompt first and get your Stage 1 output. Don't change anything about how you normally prompt. Chain-of-Verification is a layer on top of your existing process, not a replacement for it.
Step 2: Run Stage 2 on the output. Read the verification questions before moving on. If a question is too vague to be testable, tighten it yourself before Stage 3. The quality of Stage 3 depends entirely on the quality of Stage 2.
Step 3: Run Stage 3. Pay close attention to the Medium and Low confidence answers. These are your flags. Anything the AI marks as uncertain is a candidate for manual verification before you use the output.
Step 4: Run Stage 4 to get the verified final output. This version will have corrections, qualifications, and flags built in. It's not a clean document yet — it's a draft with the problems surfaced. Your job now is to decide which flagged items you'll verify manually and which you'll remove from the output entirely if they can't be confirmed.
Step 5: Do a final pass on the flagged items yourself. Chain-of-Verification doesn't replace human judgment on high-stakes claims. It narrows the field dramatically. Instead of checking everything, you're checking the specific items the process flagged. That's a manageable task instead of an overwhelming one.
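If you run this often, the five steps above are easy to wire into a script. Here's a minimal Python sketch. Everything in it is an assumption for illustration: `ask_llm` is a hypothetical placeholder you'd replace with your actual model client (stubbed here to echo the prompt so the sketch runs), and the stage prompts are condensed versions of the templates above, not the exact wording.

```python
# Sketch of the four-stage Chain-of-Verification pipeline.
# ask_llm is a hypothetical stand-in for a real model call; this stub
# just labels the prompt so the pipeline is runnable as-is. Swap in
# your own client before using it for real work.
def ask_llm(prompt: str) -> str:
    return f"[model response to: {prompt[:60]}]"

def chain_of_verification(task_prompt: str) -> dict:
    # Stage 1: run the normal task prompt, unchanged.
    draft = ask_llm(task_prompt)

    # Stage 2: generate specific, testable verification questions.
    questions = ask_llm(
        "Here is an output I need to verify:\n" + draft + "\n"
        "Generate specific verification questions covering factual, "
        "numeric, attribution, causal, and comparative claims."
    )

    # Stage 3: answer each question with a stated confidence level.
    answers = ask_llm(
        "Here are the verification questions:\n" + questions + "\n"
        "Answer each one, marking confidence High, Medium, or Low, and "
        "explain what you're uncertain about for Medium/Low answers."
    )

    # Stage 4: revise the draft using the verification results.
    final = ask_llm(
        "Original output:\n" + draft + "\n"
        "Verification results:\n" + answers + "\n"
        "Correct likely-wrong claims, qualify Medium/Low-confidence "
        "ones, flag anything unverifiable, and leave confirmed claims "
        "unchanged."
    )

    # Keep the intermediates so you can review the Medium/Low flags
    # yourself before trusting the final version.
    return {"draft": draft, "questions": questions,
            "answers": answers, "final": final}

result = chain_of_verification("Summarize Company X's funding history.")
```

Note that each stage is a separate model call, not one combined prompt, and the intermediate outputs are kept so Step 5's manual pass on flagged items stays possible.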

Why This Prompt Works
Chain-of-Verification works because it separates production from scrutiny.
When AI generates an output, it's producing. Moving forward. The claims it makes are a byproduct of generation, not the result of checking. When you force it to generate verification questions about its own claims, you're making it look at those claims as statements to be tested. That's a different cognitive task. And different tasks surface different problems.
The research found that the question-generation step is the most important part of the process. The act of generating specific, testable questions about the output forces a level of analytical attention that generation alone never activates. The AI is essentially auditing itself. And auditing finds things that producing never would.
The four-stage structure also matters. Running it all in one prompt produces weaker results than running each stage separately. The separation forces genuine attention at each step rather than a compressed version of all four happening simultaneously.

Combining Self-Refine And Chain-of-Verification
Last newsletter covered Self-Refine — critiquing the writing quality of the output before you use it.
This newsletter covers Chain-of-Verification — checking the factual and logical accuracy of the output before you use it.
For high-stakes work, run both. In this order:
First, Self-Refine. Fix the writing problems. Tighten the structure. Remove the generic language.
Then Chain-of-Verification. Check the factual claims in the improved draft.
You end up with output that's both well-written and factually checked. That combination is what professional-grade AI-assisted work actually looks like. Not a first draft cleaned up quickly. A two-layer process that catches the two different types of problems AI output consistently contains.
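In code terms, the two-layer order is just composition: refine first, then verify the refined draft. A hedged sketch — `ask_llm` is again a hypothetical stub so the example runs, and both functions use condensed, illustrative versions of the real templates:

```python
# Two-layer quality process: Self-Refine first, then verification.
# ask_llm is a hypothetical stand-in for a real model call.
def ask_llm(prompt: str) -> str:
    return f"[model response to: {prompt[:60]}]"

def self_refine(draft: str) -> str:
    # Layer 1: critique the writing against specific criteria, then revise.
    critique = ask_llm("Critique this draft against clarity, structure, "
                       "and specificity criteria:\n" + draft)
    return ask_llm("Revise the draft to address this critique:\n"
                   + critique + "\nDraft:\n" + draft)

def verify(draft: str) -> str:
    # Layer 2: condensed Chain-of-Verification on the improved draft.
    questions = ask_llm("Generate verification questions for:\n" + draft)
    answers = ask_llm("Answer with High/Medium/Low confidence:\n" + questions)
    return ask_llm("Revise using these verification results:\n" + answers
                   + "\nDraft:\n" + draft)

# Order matters: fix the writing first, then check the facts, so
# verification runs on the version you'll actually use.
final = verify(self_refine("First draft of the quarterly analysis."))
```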

The Bigger Lesson Here
Confident language is not the same as accurate information.
That distinction matters more with AI than with almost any other writing tool because AI never signals its own uncertainty the way a careful human writer does. Every claim comes out wrapped in the same confident prose. The errors hide in plain sight.
Chain-of-Verification doesn't make AI more accurate at the generation stage. It creates a process that finds inaccuracies before they reach anyone who matters. That's a different kind of reliability. Not better inputs. Better checking.
In any workflow where the output will be read by someone whose trust you can't afford to lose, that checking is not optional. It's the job.

What Changes After Using This
The first time you run Chain-of-Verification on an output you were about to send, you will find something.
Maybe a figure that needs qualifying. Maybe an attribution that's slightly off. Maybe a causal claim that sounds logical but isn't supported by the context you provided. Something.
After a few weeks of running this on anything high-stakes, you'll develop a different relationship with AI output. Less default trust. More structured verification. Not because AI is unreliable but because you'll have built a process that makes reliability something you can actually check rather than something you have to hope for.
That shift — from hoping the output is right to knowing which parts have been checked — is the most professionally important change this series can produce in how you work.

Try This Right Now
Find the last piece of AI output you used for something that mattered. A report. An analysis. A client deliverable. An email making a claim you needed to be right.
Run Stage 2 on it right now. Just Stage 2. Generate the verification questions.
Read them. Count how many of those questions you actually checked before you sent the original output.
That number is the gap Chain-of-Verification closes.

What's Coming Next
Module 5 is done.
Next we move into Module 6: Prompt Engineering Automation. The first newsletter covers Meta-Prompting — using AI to improve your prompt before you run it. Not after the output disappoints you. Before.
The technique takes a broken or underperforming prompt, asks AI to diagnose what's wrong with it, and generates an improved version.
It's the technique I wish I'd had in my first year of prompt work. The one that makes the AI fix the problem you don't know how to fix yourself.
New module. New month. We keep going.

Reply With Your Results
Run Chain-of-Verification on something factual this week and reply with what it flagged.
Tell me what the verification questions caught. Tell me if anything in the output was wrong that you would have used without checking. Tell me if the process saved you from a mistake that would have mattered.
I read every reply.
— Prompt Guy