Intellectually Curious

SSD Unleashed: How Simple Self-Distillation Turns AI Guesses into Mastery

Mike Breault

A deep dive into Simple Self-Distillation (SSD): how large language models can improve by training on their own unverified outputs with zero external supervision. We unpack the Precision Exploration Conflict, the roles of locks (need for precision) and forks (creative exploration), and how SSD reshapes token distributions to sharpen precision while preserving exploration. We review the Qwen3 30B Instruct results on LiveCodeBench (notable ~30% relative gains and stronger improvements on hard problems) and discuss the surprising finding that even training data that is mostly gibberish can help models learn the geometry of problem-solving. Finally, we consider what latent capabilities might be unlocked when models learn from their own guesses and what this could mean for AI-assisted problem solving.


Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.

Sponsored by Embersilk LLC

SPEAKER_02

I recently tried to assemble um one of those incredibly complex flat pack entertainment centers.

SPEAKER_00

Oh no. Those are a nightmare.

SPEAKER_02

Right. And of course I lost the manual immediately, so I just started, you know, blindly guessing the steps.

SPEAKER_00

Let me guess it didn't end well.

SPEAKER_02

It ended in total disaster, just a backwards, wobbly mess. But uh researchers at Apple actually just proved that when an AI blindly guesses its own steps without a manual, it doesn't build a wobbly bookshelf.

SPEAKER_00

Yeah. It actually figures out how to become a master builder.

SPEAKER_02

Which is wild. Exactly. So today, our mission for this deep dive into the source material is unpacking a fascinating paper on something called Simple Self-Distillation, or SSD.

SPEAKER_00

It's an incredibly optimistic breakthrough.

SPEAKER_02

It really is. It basically shows how large language models can autonomously unlock their latent coding potential. The future of human and AI problem solving is looking so bright here. So let's jump right in. How does this work?

SPEAKER_00

Well, the method is, I mean, it's shockingly straightforward. Usually to train an AI, you need external teachers or human labels or you know, feedback on whether the code actually executes properly.

SPEAKER_02

Right. Someone has to tell it what it did wrong.

SPEAKER_00

Exactly. But SSD strips all of that entirely. The AI generates code using specific temperature and truncation settings, and then, get this, it fine-tunes itself on its own unverified raw outputs.

SPEAKER_02

With zero external teachers, just learning from itself.

SPEAKER_00

Zero. No execution feedback, nothing.
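
To make that loop concrete, here is a minimal sketch of one SSD round as described in this conversation. The model interface is hypothetical: generate and fine_tune are placeholder names, and the temperature and top-p values are illustrative assumptions, not the paper's actual settings.

```python
# Minimal sketch of one Simple Self-Distillation (SSD) round.
# `model.generate` and `model.fine_tune` are hypothetical placeholder
# methods; the temperature / top-p values are illustrative, not the
# paper's exact hyperparameters.

def ssd_round(model, prompts, temperature=0.8, top_p=0.9):
    """Sample completions, then fine-tune the model on them, unverified."""
    samples = []
    for prompt in prompts:
        # Generate code with specific temperature and truncation settings.
        completion = model.generate(prompt, temperature=temperature, top_p=top_p)
        # No execution feedback, no verifier, no human labels: the raw
        # output is kept as a training example exactly as generated.
        samples.append((prompt, completion))
    # Self-distillation step: the model trains on its own unverified outputs.
    model.fine_tune(samples)
    return model
```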

SPEAKER_02

Wow. It's doing all of this without any human hand holding.

SPEAKER_00

Yeah.

SPEAKER_02

But you know, for those of you who do need a little hand holding to integrate AI into your businesses, that is exactly what Embersilk specializes in. They're the sponsor of today's deep dive.

SPEAKER_00

A very helpful resource for sure.

SPEAKER_02

Definitely. If you need help with AI training or automation or integration or software development, basically uncovering where agents can make the most impact for your business or personal life, check out Embersilk.com for your AI needs.

SPEAKER_00

Highly recommend them.

SPEAKER_02

So getting back to SSD, the data here is just phenomenal. They tested this straightforward method on the Qwen3 30B Instruct model, and the pass rate on LiveCodeBench went from about 42.4% to 55.3%.

SPEAKER_00

That is a 30% relative gain.
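
That figure checks out arithmetically from the pass rates just quoted; a quick computation for the record:

```python
# Pass rates quoted above: 42.4% -> 55.3% on LiveCodeBench.
before, after = 42.4, 55.3
relative_gain = (after - before) / before
print(f"{relative_gain:.1%}")  # 30.4%, i.e. roughly a 30% relative gain
```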

SPEAKER_02

And the biggest improvements were on the absolute hardest coding problems. But okay, I have to push back here for a second. Let's unpack this. If I practice bad habits playing the piano, I just get worse, right? I memorize the mistakes. So how does an AI training on its own potentially flawed, unverified code actually improve?

SPEAKER_00

That's the million-dollar question. To answer it, we have to look at what researchers call the precision-exploration conflict.

SPEAKER_02

Okay, what is that?

SPEAKER_00

Well, generating code involves two distinct things: locks and forks. Locks are those moments of strict syntax that demand absolute precision.

SPEAKER_02

Like putting a bracket in the exact right place.

SPEAKER_00

Exactly. But forks are different. Forks are the creative algorithmic choices where you actually need exploration. There are multiple valid paths.

SPEAKER_02

So a lock is like making sure the word is spelled right, and a fork is like deciding which word tells the best story.

SPEAKER_00

That's a perfect way to put it. And standard decoding forces this really clumsy compromise between the two.

SPEAKER_02

Because it can't be precise and creative at the same time.

SPEAKER_00

Right. But SSD fundamentally reshapes the token distributions. It suppresses distractors at those locks, which creates these sharp spikes of precision. Oh, I see. And at the same time, it preserves diverse valid choices at the forks. So it creates these broad plateaus of exploration.
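
Here is a toy illustration of what "sharp spikes at locks, broad plateaus at forks" means for next-token probabilities. The distributions are made up for illustration, not taken from the paper; the point is just that entropy collapses at locks while staying high at forks.

```python
import math

def entropy(p):
    """Shannon entropy of a next-token distribution, in bits."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Hypothetical next-token distributions before and after SSD (invented numbers).
lock_before = [0.60, 0.20, 0.10, 0.10]   # syntax token: probability leaks to distractors
lock_after  = [0.97, 0.01, 0.01, 0.01]   # sharp spike: distractors suppressed
fork_before = [0.30, 0.28, 0.22, 0.20]   # algorithmic choice: several valid paths
fork_after  = [0.32, 0.28, 0.22, 0.18]   # broad plateau preserved: diversity kept

print(f"lock: {entropy(lock_before):.2f} -> {entropy(lock_after):.2f} bits")  # drops sharply
print(f"fork: {entropy(fork_before):.2f} -> {entropy(fork_after):.2f} bits")  # stays high
```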

SPEAKER_01

And researchers did a stress test to prove this, right? Which kind of blew my mind.

SPEAKER_00

Yes. They trained the model on data where 62% was literal gibberish.

SPEAKER_01

62% gibberish.

SPEAKER_00

Complete nonsense. And incredibly, the model still improved.

SPEAKER_01

Wait, really? Even with gibberish?

SPEAKER_00

Yeah.

SPEAKER_01

Yeah.

SPEAKER_00

Because it proves it isn't just memorizing correct code, it's learning the underlying geometry of token probabilities. It's figuring out the mathematical shape of problem solving itself.

SPEAKER_02

That is so inspiring. It's finding its own brilliance just by analyzing the shape of its own unverified guesses.

SPEAKER_00

It really shows how much untapped potential these models already possess.

SPEAKER_02

It totally does. Which leaves me with a thought for you to ponder: if AI models have massive untapped potential just waiting to be unlocked by looking at their own guesses, what other latent capabilities are hiding in the tools you use every day?

SPEAKER_00

There is so much more to discover.

SPEAKER_02

So much more. It's a wonderful time to be intellectually curious. If you enjoyed this podcast, please subscribe to the show. Hey, leave us a five star review if you can. It really does help get the word out. Thanks for tuning in.