
Intellectually Curious

Intellectually Curious is a podcast by Mike Breault featuring over 1,800 AI-powered explorations across science, mathematics, philosophy, and personal growth. Each short-form episode is generated, refined, and published with the help of large language models—turning curiosity into an ongoing audio encyclopedia. Designed for anyone who loves learning, it offers quick dives into everything from combinatorics and cryptography to systems thinking and psychology.

Inspiration for this podcast:

"Muad'Dib learned rapidly because his first training was in how to learn. And the first lesson of all was the basic trust that he could learn. It's shocking to find how many people do not believe they can learn, and how many more believe learning to be difficult. Muad'Dib knew that every experience carries its lesson."

― Frank Herbert, Dune

Note: These podcasts were made with NotebookLM. AI can make mistakes. Please double-check any critical information.


Self-Harness: Can AI Rewrite Its Own Operating Rules?

June 11, 2026 • Mike Breault


We dive into the Shanghai AI Lab’s self-harness idea—a three-stage loop (weakness mining, harness proposal, and proposal validation) that lets AI models inspect their own failures, propose minimal workspace edits, and sandbox-test changes before evolving. Explore how personalized, autonomous fixes improve unseen-task performance, the risks of self-modification, and what this could mean for scalable AI agents and future scientific discovery.

Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.

Sponsored by Embersilk LLC

SPEAKER_00 0:00

So last weekend I spent, I don't know, three hours stubbornly trying to assemble this complex piece of IKEA furniture.

SPEAKER_01 0:08

Oh no, three hours.

SPEAKER_00 0:09

Yeah, I'm sweating, totally convinced the manufacturer had mispacked, like, a dozen screws.

SPEAKER_01 0:14

Right.

SPEAKER_00 0:15

Only to realize, uh, I was looking at the instructions for a completely different cabinet.

SPEAKER_01 0:20

Oh man. That is a classic case of capable hardware failing because of a mismatched software environment.

SPEAKER_00 0:25

Right.

SPEAKER_01 0:26

Your underlying processing power was totally fine, but your rules of engagement were just entirely wrong.

SPEAKER_00 0:31

Exactly. And that perfectly sets up our mission for today. We're diving into this fascinating new paper from the Shanghai AI Lab, exploring a concept called self-harness.

SPEAKER_01 0:43

It is such a groundbreaking paradigm.

SPEAKER_00 0:45

We're basically looking at whether AI models have finally figured out how to, you know, rewrite their own setup instructions.

SPEAKER_01 0:51

Right, which totally changes how we think about an agent's workspace.

SPEAKER_00 0:54

Speaking of making sure your technical setup instructions are perfectly tailored, by the way, this deep dive is sponsored by Embersilk.

SPEAKER_01 1:00

Oh, nice.

SPEAKER_00 1:01

Yeah. If you need help with AI training or automation or integration or software development, they've got you covered. If you're uncovering where agents can make the most impact for your business or personal life, just check out Embersilk.com for all your AI needs.

SPEAKER_01 1:15

Which is actually exactly what this paper is trying to automate.

SPEAKER_00 1:18

Wait, the setup?

SPEAKER_01 1:18

Yeah, because an AI's harness is, well, it's its workspace. It's the system prompts, the tools, the rules mediating how it interacts with the world.

SPEAKER_00 1:28

And up until now, human engineers have had to, like, painstakingly hand-code these workspaces, right?

SPEAKER_01 1:33

Exactly, which is exhausting, and it just doesn't scale at all given how fast new models are evolving. A workspace tailored for one model often causes another to just trip over its own logic.

SPEAKER_00 1:45

Because what works for a heavy-duty reasoning model might completely derail a smaller, faster one.

SPEAKER_01 1:51

Precisely. So what the Shanghai AI Lab proposes with self-harness is shifting that entire burden. Instead of humans constantly patching the workspace or relying on massive external AIs, the model basically looks in the mirror and fixes itself. Yeah, the researchers actually drop this beautiful quote from the philosopher Henri Bergson in the paper. They wrote, uh, "To exist is to change, to change is to mature, to mature is to go on creating oneself endlessly."

SPEAKER_00 2:19

Okay. Bergson is a pretty heavy philosophical pull for a computer science paper.

SPEAKER_01 2:24

It really is.

SPEAKER_00 2:25

But I mean, does the architecture actually live up to that claim? Like, endless self-creation? Because honestly, if an AI is dynamically changing its own core operating rules, my immediate fear is, I don't know, some cascading failure where it just corrupts its own logic.

SPEAKER_01 2:40

Oh, sure. That's the big risk.

SPEAKER_00 2:41

So how do they stop it from accidentally breaking itself? There has to be, like, a strict testing sandbox before these changes go live, right?

SPEAKER_01 2:48

You hit the nail on the head. To prevent total chaos, they've implemented this rigorous three-stage loop. The first stage is called weakness mining.

SPEAKER_00 2:56

Weakness mining. Okay.

SPEAKER_01 2:57

Yeah. Think of it like a professional athlete watching game tape. The AI doesn't just see, oh, I dropped the ball. It looks at a massive cluster of its own execution traces and realizes, oh, I drop the ball every time I try to catch it over my left shoulder.

SPEAKER_00 3:10

Oh, I see. So it's identifying the underlying structural flaw.

SPEAKER_01 3:12

Exactly. It clusters verifiable patterns of failure, not just one-off glitches.
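
As a rough illustration of the weakness-mining idea described here, the sketch below groups failed execution traces by a failure label and keeps only the patterns that recur. The `Trace` schema, the `failure_signature` labels, and the `min_count` threshold are all assumptions made for this example; the episode does not say how the paper represents traces or forms clusters.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    """One recorded task attempt (hypothetical schema, not from the paper)."""
    task_id: str
    passed: bool
    failure_signature: str = ""  # assumed label, e.g. "repeated_failed_command"


def mine_weaknesses(traces, min_count=3):
    """Group failed traces by signature and keep only recurring patterns."""
    clusters = defaultdict(list)
    for trace in traces:
        if not trace.passed:
            clusters[trace.failure_signature].append(trace.task_id)
    # One-off glitches fall below min_count and are dropped.
    return {sig: ids for sig, ids in clusters.items() if len(ids) >= min_count}


traces = [
    Trace("t1", False, "repeated_failed_command"),
    Trace("t2", False, "repeated_failed_command"),
    Trace("t3", False, "repeated_failed_command"),
    Trace("t4", False, "timeout"),
    Trace("t5", True),
]
weaknesses = mine_weaknesses(traces)
print(weaknesses)
```

Here the single `timeout` failure is treated as a one-off and ignored, while the three repeated-command failures surface as a pattern worth fixing.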

SPEAKER_00 3:17

Okay. That makes sense.

SPEAKER_01 3:18

Which leads to stage two, harness proposal. Based on that really specific pattern, the agent generates targeted minimal edits to its workspace.

SPEAKER_00 3:28

Ah, minimal edits. So it's not tossing out the whole manual.

SPEAKER_01 3:31

Right. It doesn't rewrite the entire instruction booklet. It surgically modifies just the broken step.
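
To make the "minimal edit" idea concrete, here is a small sketch that models a harness as a system prompt, a set of tools, and a list of rules, then produces a candidate that differs by exactly one appended rule. The `Harness` class, its field names, and `propose_edit` are hypothetical names chosen for illustration; the paper's actual workspace format is not described in the episode.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Harness:
    """Hypothetical workspace: prompt, tool names, and behavior rules."""
    system_prompt: str
    tools: tuple
    rules: tuple = ()


def propose_edit(harness, new_rule):
    """Return a copy with one rule appended; every other field is untouched."""
    return replace(harness, rules=harness.rules + (new_rule,))


base = Harness(system_prompt="You are a terminal agent.", tools=("shell",))
candidate = propose_edit(
    base, "After a failed command, read the error output before retrying."
)
print(candidate.rules)
```

Keeping the harness immutable means the original stays available, so a rejected candidate can simply be dropped.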

SPEAKER_00 3:37

And then stage three has to be that sandbox we talked about.

SPEAKER_01 3:39

Yes. Proposal validation. This is strict regression testing. The AI takes that proposed rule change and runs it against a battery of tests to ensure it actually improves performance, you know, without causing backward steps somewhere else.

SPEAKER_00 3:52

And if it passes?

SPEAKER_01 3:53

If it passes, the workspace evolves. If not, it's just discarded.
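
One way to picture this accept-or-discard gate is the sketch below: a candidate harness is kept only if it passes more validation tasks than the baseline and breaks none that previously passed. This acceptance rule is an assumption that paraphrases the description above, and the `validate` function and result dictionaries are hypothetical; the paper's exact criterion may differ.

```python
def validate(baseline_results, candidate_results):
    """Accept only if the candidate passes more tasks and breaks none.

    Both arguments map task_id -> bool (passed) on the same validation tasks.
    """
    regressions = [
        task
        for task, passed in baseline_results.items()
        if passed and not candidate_results.get(task, False)
    ]
    improved = sum(candidate_results.values()) > sum(baseline_results.values())
    return improved and not regressions


baseline = {"a": True, "b": False, "c": False}
fixes_b = {"a": True, "b": True, "c": False}   # one new pass, nothing broken
breaks_a = {"a": False, "b": True, "c": True}  # more passes, but "a" regressed

accepted_good = validate(baseline, fixes_b)
accepted_bad = validate(baseline, breaks_a)
print(accepted_good, accepted_bad)
```

The second candidate passes more tasks overall yet is still rejected, which is the "no backward steps" part of the check.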

SPEAKER_00 3:56

I mean, if you've ever spent hours tweaking system prompts to get an LLM to just behave, you're going to appreciate these results. On the Terminal-Bench 2.0 tests, the MiniMax M2.5 model jumped from a 40.5% pass rate to nearly 62%.

SPEAKER_01 4:13

And that's on completely unseen tasks.

SPEAKER_00 4:16

That is a massive leap. But what's truly compelling to me is how personalized these fixes get.

SPEAKER_01 4:21

Oh, totally. The models develop highly individualized workflows. Like, the Qwen 3.5 model had this bad habit of stubbornly repeating failed shell commands.

SPEAKER_00 4:32

Just hitting its head against the wall.

SPEAKER_01 4:33

Exactly. But through self-harness, it noticed this loop in its own game tape and literally wrote a rule into its harness, forcing itself to pause, analyze the error output, and generate a totally different syntax before trying again.
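
A rule like that could be enforced in code roughly as follows. This is only a sketch of the general idea (refuse to re-run a command that has already failed unchanged); the `RetryGuard` class and its methods are hypothetical and are not taken from the paper or from any model's actual harness.

```python
class RetryGuard:
    """Blocks re-running a command that already failed (hypothetical rule)."""

    def __init__(self):
        self.failed = set()

    def record(self, command, exit_code):
        """Remember commands that exited with a non-zero status."""
        if exit_code != 0:
            self.failed.add(command)

    def allow(self, command):
        """A previously failed command must be changed before it runs again."""
        return command not in self.failed


guard = RetryGuard()
guard.record("pip install foo", exit_code=1)

same_again = guard.allow("pip install foo")
revised = guard.allow("pip install --user foo")
print(same_again, revised)
```

An identical retry is refused, while a revised command goes through, which nudges the agent to read the error and change its approach.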

SPEAKER_00 4:47

That is wild. And didn't the GLM-5 model teach itself to, uh, save its environment variables across different sessions?

SPEAKER_01 4:55

It did, yeah. So it wouldn't lose context. They are literally autonomously cleaning up their own bad habits.

SPEAKER_00 5:00

Incredible.

SPEAKER_01 5:01

They are. And this brings us to such a hopeful frontier for anyone intellectually curious about the future. I mean, Demis Hassabis recently won a Nobel Prize for using AI to predict protein structures.

SPEAKER_00 5:11

Right. Proving AI is this profound tool for scientific discovery.

SPEAKER_01 5:15

Exactly. But imagine a scenario where a scientific AI agent uses self-harness to optimize its own scaffolding specifically for quantum physics or, you know, molecular biology.

SPEAKER_00 5:26

Oh, wow. It could quietly rewrite its operating rules overnight to process data in ways human engineers couldn't even conceptualize.

SPEAKER_01 5:35

Right.

SPEAKER_00 5:36

We might eventually have AI agents constructing entirely new research methodologies, allowing humanity to cure diseases or solve the wonders of the universe faster than ever before.

SPEAKER_01 5:46

It's an amazingly optimistic vision.

SPEAKER_00 5:48

It really is. What an inspiring thought to leave you with. If you enjoyed this deep dive, please subscribe to the show. Hey, leave us a five-star review if you can. It really does help get the word out. Thanks for tuning in.