Intellectually Curious

SSD Unleashed: How Simple Self-Distillation Turns AI Guesses into Mastery

Mike Breault

A deep dive into Simple Self-Distillation (SSD): how large language models can improve by training on their own unverified outputs with zero external supervision. We unpack the Precision Exploration Conflict, the roles of locks (need for precision) and forks (creative exploration), and how SSD reshapes token distributions to sharpen precision while preserving exploration. We review the Qwen3 30B Instruct results on LiveCodeBench (notable ~30% relative gains and stronger improvements on hard problems) and discuss the surprising finding that even training data that is mostly gibberish can help models learn the geometry of problem-solving. Finally, we consider what latent capabilities might be unlocked when models learn from their own guesses and what this could mean for AI-assisted problem solving.


Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.

Sponsored by Embersilk LLC

SPEAKER_02

I recently tried to assemble um one of those incredibly complex flat pack entertainment centers.

SPEAKER_00

Oh no. Those are a nightmare.

SPEAKER_02

Right. And of course I lost the manual immediately, so I just started, you know, blindly guessing the steps.

SPEAKER_00

Let me guess it didn't end well.

SPEAKER_02

It ended in total disaster, just a backwards, wobbly mess. But uh researchers at Apple actually just proved that when an AI blindly guesses its own steps without a manual, it doesn't build a wobbly bookshelf.

SPEAKER_00

Yeah. It actually figures out how to become a master builder.

SPEAKER_02

Which is wild. Exactly. So today, our mission for this deep dive into the source material is unpacking a fascinating paper on something called Simple Self-Distillation, or SSD.

SPEAKER_00

It's an incredibly optimistic breakthrough.

SPEAKER_02

It really is. It basically shows how large language models can autonomously unlock their latent coding potential. The future of human and AI problem solving is looking so bright here. So let's jump right in. How does this work?

SPEAKER_00

Well, the method is, I mean, it's shockingly straightforward. Usually to train an AI, you need external teachers or human labels or you know, feedback on whether the code actually executes properly.

SPEAKER_02

Right. Someone has to tell it what it did wrong.

SPEAKER_00

Exactly. But SSD strips all of that entirely. The AI generates code using specific temperature and truncation settings, and then, get this, it fine-tunes itself on its own unverified raw outputs.

SPEAKER_02

With zero external teachers, just learning from itself.

SPEAKER_00

Zero. No execution feedback, nothing.
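
To make that loop concrete, here is a minimal sketch of one SSD round as described in this conversation. The model interface is hypothetical: generate and fine_tune are placeholder names, and the temperature and top-p values are illustrative assumptions, not the paper's actual settings.

```python
# Minimal sketch of one Simple Self-Distillation (SSD) round.
# `model.generate` and `model.fine_tune` are hypothetical placeholder
# methods; the temperature / top-p values are illustrative, not the
# paper's exact hyperparameters.

def ssd_round(model, prompts, temperature=0.8, top_p=0.9):
    """Sample completions, then fine-tune the model on them, unverified."""
    samples = []
    for prompt in prompts:
        # Generate code with specific temperature and truncation settings.
        completion = model.generate(prompt, temperature=temperature, top_p=top_p)
        # No execution feedback, no verifier, no human labels: the raw
        # output is kept as a training example exactly as generated.
        samples.append((prompt, completion))
    # Self-distillation step: the model trains on its own unverified outputs.
    model.fine_tune(samples)
    return model
```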

SPEAKER_02

Wow. It's doing all of this without any human hand holding.

SPEAKER_00

Yeah.

SPEAKER_02

But you know, for those of you who do need a little hand holding to integrate AI into your businesses, that is exactly what Embersilk specializes in. They're the sponsor of today's deep dive.

SPEAKER_00

A very helpful resource for sure.

SPEAKER_02

Definitely. If you need help with AI training or automation or integration or software development, basically uncovering where agents can make the most impact for your business or personal life, check out Embersilk.com for your AI needs.

SPEAKER_00

Highly recommend them.

SPEAKER_02

So getting back to SSD, the data here is just phenomenal. They tested this straightforward method on the Qwen3 30B Instruct model, and the pass rate on LiveCodeBench went from about 42.4% to 55.3%.

SPEAKER_00

That is a 30% relative gain.
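
That figure checks out arithmetically from the pass rates just quoted; a quick computation for the record:

```python
# Pass rates quoted above: 42.4% -> 55.3% on LiveCodeBench.
before, after = 42.4, 55.3
relative_gain = (after - before) / before
print(f"{relative_gain:.1%}")  # 30.4%, i.e. roughly a 30% relative gain
```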

SPEAKER_02

And the biggest improvements were on the absolute hardest coding problems. But okay, I have to push back here for a second. Let's unpack this. If I practice bad habits playing the piano, I just get worse, right? I memorize the mistakes. So how does an AI training on its own potentially flawed, unverified code actually improve?

SPEAKER_00

That's the million-dollar question. To answer it, we have to look at what researchers call the precision-exploration conflict.

SPEAKER_02

Okay, what is that?

SPEAKER_00

Well, generating code involves two distinct things: locks and forks. Locks are those moments of strict syntax that demand absolute precision.

SPEAKER_02

Like putting a bracket in the exact right place.

SPEAKER_00

Exactly. But forks are different. Forks are the creative algorithmic choices where you actually need exploration. There are multiple valid paths.

SPEAKER_02

So a lock is like making sure the word is spelled right, and a fork is like deciding which word tells the best story.

SPEAKER_00

That's a perfect way to put it. And standard decoding forces this really clumsy compromise between the two.

SPEAKER_02

Because it can't be precise and creative at the same time.

SPEAKER_00

Right. But SSD fundamentally reshapes the token distributions. It suppresses distractors at those locks, which creates these sharp spikes of precision. Oh, I see. And at the same time, it preserves diverse valid choices at the forks. So it creates these broad plateaus of exploration.
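
Here is a toy illustration of what "sharp spikes at locks, broad plateaus at forks" means for next-token probabilities. The distributions are made up for illustration, not taken from the paper; the point is just that entropy collapses at locks while staying high at forks.

```python
import math

def entropy(p):
    """Shannon entropy of a next-token distribution, in bits."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Hypothetical next-token distributions before and after SSD (invented numbers).
lock_before = [0.60, 0.20, 0.10, 0.10]   # syntax token: probability leaks to distractors
lock_after  = [0.97, 0.01, 0.01, 0.01]   # sharp spike: distractors suppressed
fork_before = [0.30, 0.28, 0.22, 0.20]   # algorithmic choice: several valid paths
fork_after  = [0.32, 0.28, 0.22, 0.18]   # broad plateau preserved: diversity kept

print(f"lock: {entropy(lock_before):.2f} -> {entropy(lock_after):.2f} bits")  # drops sharply
print(f"fork: {entropy(fork_before):.2f} -> {entropy(fork_after):.2f} bits")  # stays high
```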

SPEAKER_01

And researchers did a stress test to prove this, right? Which kind of blew my mind.

SPEAKER_00

Yes. They trained the model on data where 62% was literal gibberish.

SPEAKER_01

62% gibberish.

SPEAKER_00

Complete nonsense. And incredibly, the model still improved.

SPEAKER_01

Wait, really? Even with gibberish?

SPEAKER_00

Yeah.

SPEAKER_01

Yeah.

SPEAKER_00

Because it proves it isn't just memorizing correct code, it's learning the underlying geometry of token probabilities. It's figuring out the mathematical shape of problem solving itself.

SPEAKER_02

That is so inspiring. It's finding its own brilliance just by analyzing the shape of its own unverified guesses.

SPEAKER_00

It really shows how much untapped potential these models already possess.

SPEAKER_02

It totally does. Which leaves me with a thought for you to ponder: if AI models have massive untapped potential just waiting to be unlocked by looking at their own guesses, what other latent capabilities are hiding in the tools you use every day?

SPEAKER_00

There is so much more to discover.

SPEAKER_02

So much more. It's a wonderful time to be intellectually curious. If you enjoyed this podcast, please subscribe to the show. Hey, leave us a five star review if you can. It really does help get the word out. Thanks for tuning in.