Intellectually Curious

Bootstrapping AI Training with Composer Autoinstall

Mike Breault

We dive into Cursor’s May 2026 work on Composer Auto Install, a two-stage bootstrapping system that auto-generates runnable training environments for AI coders. An initial agent drafts setup commands; a second agent tests them, fabricating missing pieces and even patching dependencies live to get code running. The result is a dramatic jump in TerminalBench scores (61.7% vs 47.9%) and a scalable path to teaching AI to code—without getting bogged down by messy environment setup.


Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.

Sponsored by Embersilk LLC

SPEAKER_00

Last week I decided uh I was finally gonna learn how to cook a proper beef wellington. But like imagine if before I could even chop an onion or, you know, sear the meat, the instructor just hands me a toolbox and tells me to go plumb the kitchen sink.

SPEAKER_01

Right, and wire the oven while you're at it?

SPEAKER_00

Exactly. I mean, I'd never learn to cook, I'd just be a terrible plumber. And that is exactly the uh the frustrating hurdle AI developers hit when trying to train coding models.

SPEAKER_01

Yeah, because if an AI model is spending all its computational power trying to figure out a broken package manager or like a missing dependency, it's not actually learning how to code.

SPEAKER_00

Right. So instead of training on Python or whatever, the model is burning its compute just trying to get the environment to run.

SPEAKER_01

I don't know.

SPEAKER_00

Which brings us to today's deep dive into a really fascinating May 2026 research post from Cursor on uh Composer Auto Install.

SPEAKER_01

It's such an ingenious system.

SPEAKER_00

It really is. And you know, before we get into it, if you are inspired by this kind of AI automation and uh you need help uncovering where AI agents can make the most impact for your business, you should really check out Embersilk.com.

SPEAKER_01

Oh, definitely.

SPEAKER_00

Yeah. They are today's sponsor and they help with everything from AI training to integration and software development. So check out Embersilk.com for your AI needs. But uh getting back to Cursor, how exactly do they stop their models from acting like frustrated plumbers?

SPEAKER_01

So it really comes down to automating the setup. Like when you're training a coding model using reinforcement learning, the AI absolutely needs a runnable environment. It requires that um that feedback loop of execution, you know?

SPEAKER_00

Like trying the code, seeing if it compiles, adjusting, like paying an expensive tutor, but you spend the whole hour just looking for your textbook.

SPEAKER_01

Right, exactly. But doing that manually across thousands of unique, messy code bases is physically impossible. Yeah. You would spend lifetimes just configuring files.

SPEAKER_00

But wait, shouldn't human developers just, I don't know, ensure these training environments work perfectly from the start?

SPEAKER_01

Um, in a perfect world, sure. But at scale, across thousands of diverse repositories? No way. So Cursor used their older model, Composer 1.5, to automatically build the training environments for the new model, Composer 2.

SPEAKER_00

Wait, the older AI builds the classroom for the newer one. How does it actually do that without a human stepping in to fix the inevitable bugs?

SPEAKER_01

It uses this uh two-stage bootstrapping process. First is goal setting. An agent scans the target code base, it checks the READMEs, the Makefiles, and it proposes 10 setup commands.

SPEAKER_00

Okay, and I assume it predicts what the successful output should look like.

SPEAKER_01

Yep, exactly. Then the second agent takes three of those commands and actually tries to execute them.

SPEAKER_00

And when things inevitably break, because you know it's software.

SPEAKER_01

That is where the second agent gets really creative. It actively problem-solves. It'll like mock missing files or create placeholder database tables or even spin up dummy Docker containers just to force the code to run.

SPEAKER_00

Wait, really? It just fakes them.

SPEAKER_01

Yeah, and it loops up to five times if it hits an error, just trying new workarounds.
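
The two stages described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Cursor's implementation: `SetupGoal`, `run_goal`, `verify_goals`, and the `propose_fix` callback are hypothetical names, the callback stands in for the second agent's workaround logic, and "up to five times" is read here as five attempts per command.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

MAX_ATTEMPTS = 5  # assumption: "loops up to five times" means five attempts per command


@dataclass
class SetupGoal:
    """One stage-one proposal: a setup command plus its predicted output."""
    command: str          # shell command proposed by the first agent
    expected_output: str  # text a successful run is predicted to print


def run_goal(goal: SetupGoal) -> Tuple[bool, str]:
    """Run the command; succeed only if it exits 0 and prints the predicted text."""
    proc = subprocess.run(goal.command, shell=True, capture_output=True, text=True)
    output = proc.stdout + proc.stderr
    return proc.returncode == 0 and goal.expected_output in output, output


def verify_goals(
    goals: List[SetupGoal],
    propose_fix: Callable[[SetupGoal, str], Optional[str]],
    sample_size: int = 3,  # the episode says the second agent takes three commands
) -> Dict[str, bool]:
    """Stage two: execute a sample of goals, applying workarounds between attempts."""
    results: Dict[str, bool] = {}
    for goal in goals[:sample_size]:
        ok = False
        for _ in range(MAX_ATTEMPTS):
            ok, output = run_goal(goal)
            if ok:
                break
            # Hypothetical hook: returns a shell command that patches the
            # environment (for example, creating a placeholder file), or None.
            fix = propose_fix(goal, output)
            if fix is None:
                break
            subprocess.run(fix, shell=True, capture_output=True, text=True)
        results[goal.command] = ok
    return results
```

In use, `propose_fix` might return something like `echo ready > config.txt` when a command fails because `config.txt` is missing, so the next attempt runs against the placeholder.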

SPEAKER_00

Hold on, if we're generating fake tables and dummy containers just to pass a setup test, aren't we training the new model in a completely like hallucinated environment? It sounds like building a movie set where the doors don't actually lead anywhere.

SPEAKER_01

It does sound counterintuitive, I'll give you that. But think about what the reinforcement learning model actually needs here. It doesn't care about the real data in the database.

SPEAKER_00

Oh, so it just needs the underlying logic to compile.

SPEAKER_01

Right. It just needs to execute, so it gets that reward signal for writing correct code syntax. The uh the movie set is all it needs to practice the actual mechanics of coding.

SPEAKER_00

Okay, so the structural feedback is there, even if the data isn't. But I mean, does this actually work on a genuinely messy project out in the wild?

SPEAKER_01

Oh, it does. They tested this on the Celo monorepo, which is this really large blockchain project with um pretty sparse documentation.

SPEAKER_00

A total nightmare to set up, basically.

SPEAKER_01

Exactly. And the AI realized it was missing a dependency called Foundry, but it didn't just fake it, it actively searched the live web, read the Foundry docs, saw it needed authentication, and then created a localized mock user just to successfully run the app.

SPEAKER_00

Wait, so it's dynamically patching the environment on the fly using the internet?

SPEAKER_01

Yes. And the benchmark results show exactly why this matters. Because Composer 1.5 could set up these environments so effectively, Composer 2 jumped to a 61.7% score on Terminal Bench.

SPEAKER_00

Wow. And that's the benchmark for agents working in terminal environments, right?

SPEAKER_01

Yep, and that score is a huge leap from the older model's 47.9%.
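
For a sense of scale, the gap between the two quoted scores can be worked out directly. The snippet below only does arithmetic on the two figures mentioned in the episode; the variable names are illustrative.

```python
new_score = 61.7  # Composer 2 score quoted in the episode (percent)
old_score = 47.9  # older model's score quoted in the episode (percent)

# Absolute gain in percentage points.
absolute_gain = round(new_score - old_score, 1)

# Relative improvement over the older score, in percent.
relative_gain = round(100 * (new_score - old_score) / old_score, 1)

print(f"{absolute_gain} points, {relative_gain}% relative")
```

That is a gain of 13.8 percentage points, or roughly 28.8% relative to the older score.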

SPEAKER_00

That is just such a brilliant positive feedback loop. Like the older generation clears the brush, paving a wider, faster road for the next generation to learn on. It actually reminds me a lot of um Demis Hassabis recently winning the Nobel Prize for using AI to predict protein structures.

SPEAKER_01

Oh, absolutely. He sees AI as this incredible tool that will help scientists make even more discoveries in the years to come.

SPEAKER_00

Right. By having AI pave the way for its successors, we're removing all these tedious roadblocks. It's so inspiring.

SPEAKER_01

It really is. Just imagine the unprecedented scientific and creative breakthroughs humanity will achieve when AI models handle their own run management and data pre-processing. We won't be bogged down by the setup.

SPEAKER_00

I love that. Well, if you enjoyed this deep dive, please subscribe to the show and hey, leave us a five-star review if you can. It really does help get the word out. Thanks for tuning in.