Intellectually Curious

Intellectually Curious is a podcast by Mike Breault featuring over 1,800 AI-powered explorations across science, mathematics, philosophy, and personal growth. Each short-form episode is generated, refined, and published with the help of large language models—turning curiosity into an ongoing audio encyclopedia. Designed for anyone who loves learning, it offers quick dives into everything from combinatorics and cryptography to systems thinking and psychology.

Inspiration for this podcast:

"Muad'Dib learned rapidly because his first training was in how to learn. And the first lesson of all was the basic trust that he could learn. It's shocking to find how many people do not believe they can learn, and how many more believe learning to be difficult. Muad'Dib knew that every experience carries its lesson."

― Frank Herbert, Dune

Note: These podcasts were made with NotebookLM. AI can make mistakes. Please double-check any critical information.

How Claude Reached 95% Analytics Accuracy

June 04, 2026 • Mike Breault

We dissect how Anthropic tackled data ambiguity, staleness, and retrieval chaos to automate the majority of business analytics with Claude. Anthropic's technical guide describes the development of an agentic analytics stack designed to automate business data insights using Claude. The strategy centers on overcoming three primary obstacles: conceptual ambiguity, data staleness, and retrieval failures. To ensure high accuracy, the framework prioritizes robust data foundations, a strictly enforced semantic layer, and specialized procedural skills that guide the AI's reasoning. The methodology also incorporates adversarial reviews and continuous offline evaluations to maintain the integrity of automated reports. Ultimately, this system allows data teams to shift their focus from repetitive queries to high-level strategic modeling.

Sponsored by Embersilk LLC

SPEAKER_00 0:00

So I will literally never forget this boardroom meeting from a few years ago. Just an absolute spreadsheet disaster. We had uh three department heads sitting around this big table, all looking at their own reports, and we realized that every single one of them had a completely different definition for the term active user.

SPEAKER_01 0:18

Oh no, that is a classic trap.

SPEAKER_00 0:20

Right. Like one included trial accounts, one didn't, one excluded weekend logins. I mean, nobody actually knew how many users we had.

SPEAKER_01 0:27

It is so common though. We all want this, you know, single source of truth, but raw business data is just inherently messy. And that specific ambiguity is exactly why AI has traditionally just crashed and burned when trying to handle business analytics.

SPEAKER_00 0:41

Which is exactly what we're jumping into today for you. We're looking at how Anthropic actually cracked this code, figuring out how to get their AI, Claude, to accurately automate like 95% of business analytics queries.

SPEAKER_01 0:54

It is a massive leap forward, truly.

SPEAKER_00 0:56

Yeah. And real quick before we get into the actual mechanics of that, if you need help with AI training or automation or integration or software development, or you're just uncovering where agents could make the most impact for your business or personal life, check out Embersilk.com for your AI needs. They are the sponsor of today's deep dive.

SPEAKER_01 1:15

Highly recommend them.

SPEAKER_00 1:16

So looking at Anthropic's challenge here, it really makes sense that AI struggles with data. Like generating code is easier because code either runs or it doesn't. There are natural guardrails. But with data, there's usually only one correct answer and it's buried in this massive database.

SPEAKER_01 1:32

Right, exactly. The AI has absolutely no context for what it's looking at.

SPEAKER_00 1:36

Yeah, I picture it like sending this master chef into a giant pantry where all the cans have their labels ripped off. Like the chef knows how to cook, obviously, but they have no idea if they are grabbing crushed tomatoes or uh dog food. It just creates this false sense of precision.

SPEAKER_01 1:50

That is a great analogy. And if they just start cooking, you get a disaster. I mean, Anthropic saw this happening in three main ways. First is concept ambiguity, just like your boardroom story.

SPEAKER_00 2:00

Right, the active user problem.

SPEAKER_01 2:02

Exactly. The AI grabs the active user can, but it doesn't know if that includes fraudulent accounts. Second is data staleness, because, well, business definitions change over time.

SPEAKER_00 2:13

Oh, for sure. Definitions evolve constantly.

SPEAKER_01 2:15

Yeah. And the third is retrieval failure, where the AI just gets completely lost in the sheer volume of data tables. It is overwhelming.

SPEAKER_00 2:23

So if the core issue is that the AI doesn't know what's inside the cans, you can't just fix that by giving it access to a bigger pantry, right?

SPEAKER_01 2:30

You really can't. Though it's funny because they actually tried that. Early on, Anthropic ran an experiment where they gave Claude raw access to thousands of perfectly executed historical queries.

SPEAKER_00 2:41

Oh, wow. They thought it would just learn the patterns.

SPEAKER_01 2:43

They did. But the accuracy improved by less than one percentage point. Just throwing raw data at the AI didn't work at all because it lacked underlying structure.

SPEAKER_00 2:52

Huh. So they had to build like a master inventory list first.

SPEAKER_01 2:57

Yeah. They built what they call the agentic data stack. The first layer is data foundations, which basically forces the company to establish one governed official answer for every metric. But the real key is adding a semantic layer on top of that.

SPEAKER_00 3:12

A semantic layer. How does that work in practice?

SPEAKER_01 3:15

Think of it as a strict internal dictionary. Before the AI does any math, this dictionary explicitly forces it to understand that, say, revenue means the exact same thing every single time.
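To make the "strict internal dictionary" idea concrete, here is a minimal sketch of a semantic layer as a lookup of governed metric definitions. It is an illustration only, not Anthropic's implementation: the names MetricDefinition, SEMANTIC_LAYER, and resolve_metric, along with every table, column, and SQL fragment, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """One governed definition for a business term (hypothetical structure)."""
    name: str
    sql: str          # the single approved expression for this metric
    description: str  # plain-language meaning, including exclusions


# Hypothetical entries; the columns and filters are invented for illustration.
SEMANTIC_LAYER = {
    "revenue": MetricDefinition(
        name="revenue",
        sql="SUM(amount_usd) FILTER (WHERE status = 'settled')",
        description="Settled payments only, in USD.",
    ),
    "active_users": MetricDefinition(
        name="active_users",
        sql="COUNT(DISTINCT user_id) FILTER (WHERE NOT is_trial)",
        description="Distinct non-trial users with at least one login.",
    ),
}


def resolve_metric(term: str) -> MetricDefinition:
    """Return the governed definition for a term, or fail instead of guessing."""
    key = term.strip().lower().replace(" ", "_")
    if key not in SEMANTIC_LAYER:
        raise KeyError(f"No governed definition for {term!r}; refusing to guess.")
    return SEMANTIC_LAYER[key]
```

The point of the sketch is the failure mode: a term without a governed definition raises an error rather than letting the caller improvise its own meaning of "revenue" or "active user".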

SPEAKER_00 3:26

Okay, so the AI knows the definitions now, but how does it actually navigate the database without getting lost in the weeds?

SPEAKER_01 3:33

So they introduced this routing system called skills. These are basically just simple text folders full of step-by-step instructions.

SPEAKER_00 3:40

Oh, I see.

SPEAKER_01 3:40

Yeah. So if a user asks about user growth, the skill acts as a checklist saying, stop before you search the whole database, read this specific reference file on growth metrics first.
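The checklist behavior described here can be sketched roughly as follows. The episode only says that skills are text folders of step-by-step instructions, so the keyword matching, the Skill structure, the select_skill function, and the file paths below are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Skill:
    """A routing skill: when it applies and what to read first (hypothetical)."""
    name: str
    keywords: tuple[str, ...]
    reference_file: str      # hypothetical path inside the skill folder
    steps: tuple[str, ...]   # the checklist the agent follows in order


# Hypothetical skills; a real system would load these from text files.
SKILLS = (
    Skill(
        name="user_growth",
        keywords=("user growth", "signups", "active users"),
        reference_file="skills/user_growth/REFERENCE.md",
        steps=(
            "Read the growth metrics reference file before searching tables.",
            "Use only the governed definition of each metric.",
            "Query the tables the reference file names.",
        ),
    ),
    Skill(
        name="revenue",
        keywords=("revenue", "bookings"),
        reference_file="skills/revenue/REFERENCE.md",
        steps=(
            "Read the revenue reference file before searching tables.",
            "Confirm the currency and the reporting period.",
        ),
    ),
)


def select_skill(question: str) -> Optional[Skill]:
    """Pick the first skill whose keywords appear in the question, if any."""
    text = question.lower()
    for skill in SKILLS:
        if any(keyword in text for keyword in skill.keywords):
            return skill
    return None
```

For a question such as "How did user growth trend last quarter?", select_skill returns the user_growth skill, so the first step is reading the reference file rather than scanning the whole database. Simple substring matching stands in for whatever selection mechanism the real system uses.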

SPEAKER_00 3:51

That makes total sense. It's essentially giving the AI a standard operating procedure. And the jump in performance is wild, right? Because without those routing skills, Claude was only getting what, 21% of analytics questions right?

SPEAKER_01 4:03

Right, just 21%. But with them, it hit over 95%.

SPEAKER_00 4:07

That is incredible. Over 95% accuracy.

SPEAKER_01 4:09

And to catch that remaining tiny margin of error, they built in an adversarial review phase. So cool. Before Claude gives the user a final answer, it spins up a separate subagent.

SPEAKER_00 4:22

Wait, a separate agent?

SPEAKER_01 4:23

Yeah. And its only job is to aggressively audit the math and challenge the main AI's assumptions. That internal devil's advocate boosted accuracy by another six percent. Plus, they attach a provenance footer to the bottom of the output showing the exact data sources used.
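A rough sketch of the review-then-footer flow might look like the following. In the system described, the reviewer is a separate subagent; here a few deterministic checks stand in for it, and DraftAnswer, adversarial_review, finalize, and the source name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DraftAnswer:
    """The main agent's proposed answer before review (hypothetical structure)."""
    metric: str
    value: float
    sources: list[str]  # tables or views the number was computed from


def adversarial_review(draft: DraftAnswer) -> list[str]:
    """Stand-in for the reviewing subagent: return a list of objections."""
    objections = []
    if not draft.sources:
        objections.append("No data sources cited.")
    if draft.value < 0:
        objections.append(f"{draft.metric} is negative, which a count or total should not be.")
    return objections


def finalize(draft: DraftAnswer) -> str:
    """Release the answer with a provenance footer only if review raises nothing."""
    objections = adversarial_review(draft)
    if objections:
        return "Answer withheld pending review:\n" + "\n".join(f"- {o}" for o in objections)
    footer = "Sources: " + ", ".join(draft.sources)
    return f"{draft.metric}: {draft.value}\n---\n{footer}"
```

The design choice worth noting is the ordering: the reviewer runs before anything reaches the user, and the footer is built from the same source list the reviewer inspected, so an answer with no traceable sources never ships.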

SPEAKER_00 4:39

So you aren't just blindly trusting a black box, but let me push back on this for a second. If you have an AI agent successfully navigating the database, auditing its own math, and hitting 95% accuracy, what are the human analysts actually doing?

SPEAKER_01 4:53

You mean are they obsolete?

SPEAKER_00 4:54

Right. Doesn't the company just look at this and see a massive opportunity to cut headcount?

SPEAKER_01 4:59

It is a totally fair fear, but it really misunderstands what analysts actually want to be doing and what humans are best at. Automating the rote slog of pulling routine numbers doesn't make humans obsolete at all. It frees them up.

SPEAKER_00 5:12

It frees them from just being SQL query monkeys.

SPEAKER_01 5:14

Exactly. When an AI handles the repetitive task of fetching the daily active user count, human teams are finally empowered to tackle like complex causal modeling, meaning they can investigate why user retention dropped or they can design entirely new metrics.

SPEAKER_00 5:30

So they can stop acting like human calculators and really start acting like strategic innovators, focusing on advanced forecasting and all that.

SPEAKER_01 5:37

Yes. It is a huge leap forward for human potential. We get to do the creative, complex thinking that we are naturally built for. It's incredibly optimistic for the future of the industry.

SPEAKER_00 5:47

I absolutely love that perspective. So for everyone listening to Intellectually Curious today, I want to leave you with a thought to mull over. If AI agents can now be programmed to aggressively cross-examine their own underlying data assumptions before presenting a final answer, how might that inspire you to cross-examine your own daily cognitive biases?

SPEAKER_01 6:08

Oh, that is a great question to end on.

SPEAKER_00 6:10

Right. Well, if you enjoyed this deep dive, please subscribe to the show. And leave us a five-star review if you can. It really does help get the word out. Thanks for tuning in and stay curious.