Intellectually Curious

How Claude Reached 95% Analytics Accuracy

Mike Breault

We dissect how Anthropic tackled data ambiguity, staleness, and retrieval chaos to automate the majority of business analytics with Claude. Anthropic's technical guide describes the development of an agentic analytics stack designed to automate business data insights using Claude. The strategy centers on overcoming three primary obstacles: conceptual ambiguity, data staleness, and retrieval failures. To ensure high accuracy, the framework prioritizes robust data foundations, a strictly enforced semantic layer, and specialized procedural skills that guide the AI's reasoning. The methodology also incorporates adversarial reviews and continuous offline evaluations to maintain the integrity of automated reports. Ultimately, this system allows data teams to shift their focus from repetitive queries to high-level strategic modeling.


Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.

Sponsored by Embersilk LLC

SPEAKER_00

So I will literally never forget this boardroom meeting from a few years ago. Just an absolute spreadsheet disaster. We had uh three department heads sitting around this big table, all looking at their own reports, and we realized that every single one of them had a completely different definition for the term active user.

SPEAKER_01

Oh no, that is a classic trap.

SPEAKER_00

Right. Like one included trial accounts, one didn't, one excluded weekend logins. I mean, nobody actually knew how many users we had.
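
To make the boardroom problem concrete, here is a small Python sketch showing how the three departments' definitions of "active user" produce three different counts from the same data. The `Login` record, the five sample accounts, and the `active_users` function are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Login:
    account_id: str
    is_trial: bool
    on_weekend: bool


# Hypothetical sample data: five accounts, one login each.
LOGINS = [
    Login("a1", is_trial=False, on_weekend=False),
    Login("a2", is_trial=True, on_weekend=False),
    Login("a3", is_trial=False, on_weekend=True),
    Login("a4", is_trial=True, on_weekend=True),
    Login("a5", is_trial=False, on_weekend=True),
]


def active_users(logins, include_trials=True, include_weekends=True):
    """Count distinct accounts under one department's definition of 'active'."""
    return len({
        login.account_id
        for login in logins
        if (include_trials or not login.is_trial)
        and (include_weekends or not login.on_weekend)
    })


print(active_users(LOGINS))                          # dept A, everything: 5
print(active_users(LOGINS, include_trials=False))    # dept B, no trials: 3
print(active_users(LOGINS, include_weekends=False))  # dept C, no weekend logins: 2
```

Same table, three "correct" answers: the disagreement comes entirely from the definitions, not the data.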

SPEAKER_01

It is so common, though. We all want this, you know, single source of truth, but raw business data is just inherently messy. And that specific ambiguity is exactly why AI has traditionally just crashed and burned when trying to handle business analytics.

SPEAKER_00

Which is exactly what we're jumping into today for you. We're looking at how Anthropic actually cracked this code, figuring out how to get their AI, Claude, to accurately automate like 95% of business analytics queries.

SPEAKER_01

It is a massive leap forward, truly.

SPEAKER_00

Yeah. And real quick before we get into the actual mechanics of that, if you need help with AI training, automation, integration, or software development, or you're just trying to uncover where agents could make the most impact for your business or personal life, check out Embersilk.com for your AI needs. They are the sponsor of today's deep dive.

SPEAKER_01

Highly recommend them.

SPEAKER_00

So looking at Anthropic's challenge here, it really makes sense that AI struggles with data. Like generating code is easier because code either runs or it doesn't. There are natural guardrails. But with data, there's usually only one correct answer and it's buried in this massive database.

SPEAKER_01

Right, exactly. The AI has absolutely no context for what it's looking at.

SPEAKER_00

Yeah, I picture it like sending this master chef into a giant pantry where all the cans have their labels ripped off. Like the chef knows how to cook, obviously, but they have no idea if they are grabbing crushed tomatoes or uh dog food. It just creates this false sense of precision.

SPEAKER_01

That is a great analogy. And if they just start cooking, you get a disaster. I mean, Anthropic saw this happening in three main ways. First is concept ambiguity, just like your boardroom story.

SPEAKER_00

Right, the active user problem.

SPEAKER_01

Exactly. The AI grabs the active user can, but it doesn't know if that includes fraudulent accounts. Second is data staleness, because, well, business definitions change over time.

SPEAKER_00

Oh, for sure. Definitions evolve constantly.

SPEAKER_01

Yeah. And the third is retrieval failure, where the AI just gets completely lost in the sheer volume of data tables. It is overwhelming.

SPEAKER_00

So if the core issue is that the AI doesn't know what's inside the cans, you can't just fix that by giving it access to a bigger pantry, right?

SPEAKER_01

You really can't. Though it's funny, because they actually tried that. Early on, Anthropic ran an experiment where they gave Claude raw access to thousands of perfectly executed historical queries.

SPEAKER_00

Oh, wow. They thought it would just learn the patterns.

SPEAKER_01

They did. But the accuracy improved by less than one percentage point. Just throwing raw data at the AI didn't work, because it lacked underlying structure.

SPEAKER_00

Huh. So they had to build like a master inventory list first.

SPEAKER_01

Yeah. They built what they call the agentic data stack. The first layer is data foundations, which basically forces the company to establish one governed, official answer for every metric. But the real key is adding a semantic layer on top of that.

SPEAKER_00

A semantic layer. How does that work in practice?

SPEAKER_01

Think of it as a strict internal dictionary. Before the AI does any math, this dictionary explicitly forces it to understand that, say, revenue means the exact same thing every single time.
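
A minimal sketch of that "strict internal dictionary" idea, assuming a semantic layer can be modeled as a registry that maps each business term to exactly one governed definition and refuses any term it does not know. The `MetricDefinition` class, `SEMANTIC_LAYER`, `resolve_metric`, and the SQL fragments and table names are hypothetical, not Anthropic's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    sql: str          # the single governed expression for this metric
    description: str


# Hypothetical governed definitions; table and column names are invented.
SEMANTIC_LAYER = {
    "revenue": MetricDefinition(
        name="revenue",
        sql="SUM(amount_usd) FROM invoices WHERE status = 'paid'",
        description="Sum of paid invoice amounts in USD.",
    ),
    "active_users": MetricDefinition(
        name="active_users",
        sql="COUNT(DISTINCT account_id) FROM logins WHERE NOT is_trial",
        description="Distinct non-trial accounts with at least one login.",
    ),
}


class UnknownMetricError(KeyError):
    """Raised when a term has no governed definition."""


def resolve_metric(term: str) -> MetricDefinition:
    """Map a user's wording to its one governed definition, or refuse."""
    key = term.strip().lower().replace(" ", "_")
    if key not in SEMANTIC_LAYER:
        # Refuse instead of letting the agent improvise its own definition.
        raise UnknownMetricError(f"No governed definition for {term!r}")
    return SEMANTIC_LAYER[key]
```

Whether a user types "Revenue" or " revenue ", the agent gets back the same definition object, and an undefined term such as "engagement score" fails loudly rather than being guessed at.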

SPEAKER_00

Okay, so the AI knows the definitions now, but how does it actually navigate the database without getting lost in the weeds?

SPEAKER_01

So they introduced this routing system called skills. These are basically just simple text folders full of step-by-step instructions.

SPEAKER_00

Oh, I see.

SPEAKER_01

Yeah. So if a user asks about user growth, the skill acts as a checklist saying, stop before you search the whole database, read this specific reference file on growth metrics first.
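
The routing step can be sketched in Python, assuming a skill can be represented as a few trigger phrases, a list of reference files to read first, and a short checklist. In the episode, skills are described as folders of text instructions; the `Skill` class, the `route` function, and the file path below are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Skill:
    name: str
    triggers: Tuple[str, ...]    # phrases that route a question to this skill
    read_first: Tuple[str, ...]  # reference files to load before any table search
    steps: Tuple[str, ...]       # the checklist the agent follows


# Hypothetical skill; the path and wording are illustrative only.
SKILLS = [
    Skill(
        name="growth_metrics",
        triggers=("user growth", "signups", "retention"),
        read_first=("references/growth_metrics.md",),
        steps=(
            "Read the reference file before searching any tables.",
            "Use only the governed metric definitions it names.",
            "Report the date range and data sources used.",
        ),
    ),
]


def route(question: str) -> Optional[Skill]:
    """Return the first skill whose trigger phrase appears in the question."""
    q = question.lower()
    for skill in SKILLS:
        if any(trigger in q for trigger in skill.triggers):
            return skill
    return None
```

A question like "How did user growth change last quarter?" routes to `growth_metrics`, so the agent opens the growth reference file before touching the warehouse; a question no skill covers returns `None` and would fall back to a general search.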

SPEAKER_00

That makes total sense. It's essentially giving the AI a standard operating procedure. And the jump in performance is wild, right? Because without those routing skills, Claude was only getting, what, 21% of analytics questions right?

SPEAKER_01

Right, just 21%. But with them, it hit over 95%.

SPEAKER_00

That is incredible. Over 95% accuracy.

SPEAKER_01

And to catch that remaining tiny margin of error, they built in an adversarial review phase. So cool. Before Claude gives the user a final answer, it spins up a separate subagent.

SPEAKER_00

Wait, a separate agent?

SPEAKER_01

Yeah. And its only job is to aggressively audit the math and challenge the main AI's assumptions. That internal devil's advocate boosted accuracy by another six percent. Plus, they attach a provenance footer to the bottom of the output showing the exact data sources used.
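
A rough sketch of the review-then-footer flow, under heavy assumptions: in the episode the reviewer is a separate Claude subagent, whereas here a plain `recompute` callback stands in for it. `DraftAnswer`, `review`, and `finalize` are hypothetical names. The sketch only shows the shape of the idea: an independent check that can block the answer, followed by a provenance footer listing the sources.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DraftAnswer:
    question: str
    value: float
    sources: List[str] = field(default_factory=list)


def review(draft: DraftAnswer, recompute: Callable[[str], float]) -> List[str]:
    """Stand-in for the reviewer subagent: recompute independently, then challenge."""
    issues = []
    independent = recompute(draft.question)
    if not math.isclose(independent, draft.value, rel_tol=1e-9):
        issues.append(f"Value mismatch: draft={draft.value}, reviewer={independent}")
    if not draft.sources:
        issues.append("No data sources cited.")
    return issues


def finalize(draft: DraftAnswer, recompute: Callable[[str], float]) -> str:
    """Return the answer with a provenance footer, or a revision request."""
    issues = review(draft, recompute)
    if issues:
        return "NEEDS REVISION\n" + "\n".join(f"- {issue}" for issue in issues)
    footer = "Sources: " + ", ".join(draft.sources)
    return f"{draft.question}: {draft.value}\n---\n{footer}"
```

For example, a draft of 1200.0 weekly active users citing a hypothetical `analytics.logins` table passes only when the independent recompute agrees; a disagreement or an empty source list produces a revision request instead of an answer.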

SPEAKER_00

So you aren't just blindly trusting a black box, but let me push back on this for a second. If you have an AI agent successfully navigating the database, auditing its own math, and hitting 95% accuracy, what are the human analysts actually doing?

SPEAKER_01

You mean are they obsolete?

SPEAKER_00

Right. Doesn't the company just look at this and see a massive opportunity to cut headcount?

SPEAKER_01

It is a totally fair fear, but it really misunderstands what analysts actually want to be doing and what humans are best at. Automating the rote slog of pulling routine numbers doesn't make humans obsolete at all. It frees them up.

SPEAKER_00

It frees them from just being SQL query monkeys.

SPEAKER_01

Exactly. When an AI handles the repetitive task of fetching the daily active user count, human teams are finally empowered to tackle like complex causal modeling, meaning they can investigate why user retention dropped or they can design entirely new metrics.

SPEAKER_00

So they can stop acting like human calculators and really start acting like strategic innovators, focusing on advanced forecasting and all that.

SPEAKER_01

Yes. It is a huge leap forward for human potential. We get to do the creative, complex thinking that we are naturally built for. It's incredibly optimistic for the future of the industry.

SPEAKER_00

I absolutely love that perspective. So for everyone listening to Intellectually Curious today, I want to leave you with a thought to mull over. If AI agents can now be programmed to aggressively cross-examine their own underlying data assumptions before presenting a final answer, how might that inspire you to cross-examine your own daily cognitive biases?

SPEAKER_01

Oh, that is a great question to end on.

SPEAKER_00

Right. Well, if you enjoyed this deep dive, please subscribe to the show. Hey, leave us a five-star review if you can. It really does help get the word out. Thanks for tuning in and stay curious.