Intellectually Curious

Autonomous AI Agents in Research: Codex, Claude Code, and the Future of the Workflow

Mike Breault


In this Intellectually Curious deep dive, we unpack a VoxDev webinar featuring Aniket Panjwani on how autonomous AI agents are transforming research workflows. From iterative loops and skill-based wrappers to Git-backed safety and disciplined planning, Codex and Claude Code can run regressions, critique hypotheses, and accelerate learning with minimal human busywork. We cover practical setups, how to structure context windows, and the director-vs-micromanager mindset.


Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.

Sponsored by Embersilk LLC

SPEAKER_01

I mean, I once spent literally three agonizing weeks in grad school trying to format uh a single data table, just tweaking margins, fighting with R libraries, and basically questioning every life choice that led me to that exact moment.

SPEAKER_00

Oh, I remember those days, the endless syntax battles. But well, those days are officially over.

SPEAKER_01

Right. It is such a huge relief. For those of you listening to Intellectually Curious today, our deep dive is into a VoxDev webinar by Aniket Panjwani. He is an economist and AI director. And we are looking at how autonomous AI agents, uh specifically Codex and Claude Code, are completely transforming the research workflow.

SPEAKER_00

Yeah, and the mission today is to really show you how to apply these exact tools to supercharge your own learning, you know, to just skip the busy work entirely.

SPEAKER_01

Okay, so let's unpack this. We hear so much about AI speed, like uh Dartmouth economist Paul Novosad writing a functional paper in just three hours. But I want to know how that is actually happening under the hood.

SPEAKER_00

Well, the critical shift is that these agents are now executing iterative loops. So they don't just generate a block of text and then stop. They actually chain reasoning steps together. Like if an agent writes code to run a regression and hits an error, it reads that error, rewrites the code, and tests it again.
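The loop described here can be sketched in a few lines of Python. This is purely illustrative: fix_code stands in for the model's rewrite step, and the broken snippet is a made-up example, not anything from the webinar.

```python
# Minimal sketch of an agent-style iterative loop: run a snippet,
# read the error, apply a "fix", and retry. A real agent (Codex,
# Claude Code) would ask the model for the rewrite; fix_code below
# is a toy stand-in for that model call.

def run_snippet(code: str) -> tuple[bool, str]:
    """Execute code and report success or the error message."""
    try:
        exec(code, {})
        return True, ""
    except Exception as e:
        return False, str(e)

def fix_code(code: str, error: str) -> str:
    """Stand-in for the model: patch the code based on the error text."""
    if "not defined" in error:
        # e.g. prepend a missing import the error complained about
        return "import math\n" + code
    return code

def iterate(code: str, max_steps: int = 5) -> bool:
    """Run, read the error, rewrite, and test again until it passes."""
    for _ in range(max_steps):
        ok, err = run_snippet(code)
        if ok:
            return True
        code = fix_code(code, err)
    return False

# A broken snippet: it uses math without importing it.
print(iterate("x = math.sqrt(9)"))
```

The point is the shape of the loop, not the toy fix: the agent's output feeds back in as its next input until the code runs clean.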

SPEAKER_01

All completely autonomously.

SPEAKER_00

Exactly, completely on its own. And what's fascinating here is that this drastically lowers the cost of human experimentation. When testing a hypothesis takes hours instead of like months, you can explore wildly different theories.

SPEAKER_01

And it isn't just basic data cleaning either, is it? I mean, Terence Tao, who is one of the world's preeminent mathematicians, he watched an AI solve an open Erdős math problem.

SPEAKER_00

Yeah, by stringing together these incredibly complex logical steps, it is wild.

SPEAKER_01

Which brings us to how researchers are actually harnessing this power without it, you know, going completely off the rails. Panjwani highlights this mechanism called skills.

SPEAKER_00

Right. Skills are huge. Econometrician Pedro Sant'Anna, for instance, built these specific skills named review paper and data analysis.

SPEAKER_01

So if I am understanding this, a skill isn't just a clever prompt. It is essentially putting the AI on a specialized set of tracks, right?

SPEAKER_00

Yes, that is a great way to put it. You feed it pre-written scripts and specific R libraries. So it isn't just guessing how to format a table or evaluate an econometric specification.

SPEAKER_01

It is executing your exact methodological recipe.

SPEAKER_00

Precisely. That structural constraint is what makes it so reliable. The review paper skill, for example, is coded to systematically scan a PDF against a known database of common referee objections.

SPEAKER_01

Oh wow.

SPEAKER_00

Yeah, it basically forces the AI to evaluate the robustness of the methodology rather than just spitting out a generic summary of the text.
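As a toy illustration of that kind of skill wrapper: the checklist below is hypothetical, not Sant'Anna's actual skill, but it shows the structural idea, a fixed list of common referee objections the text is systematically checked against, instead of a free-form summary.

```python
# Hypothetical "review paper" skill sketch: force a systematic pass
# over the text against a fixed checklist of referee objections.
# The checklist entries are illustrative examples only.

OBJECTIONS = {
    "parallel trends": "Does the design justify the parallel-trends assumption?",
    "clustered": "Are standard errors clustered at the right level?",
    "robustness": "Are robustness checks reported?",
}

def review_paper(text: str) -> list[str]:
    """Return the checklist questions the paper never addresses."""
    lower = text.lower()
    return [q for key, q in OBJECTIONS.items() if key not in lower]

draft = "We estimate a diff-in-diff with clustered standard errors."
gaps = review_paper(draft)
for question in gaps:
    print("Unaddressed:", question)
```

A real skill would hand the flagged gaps back to the model as structured prompts, which is what constrains it to evaluating methodology rather than summarizing.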

SPEAKER_01

Now building those custom wrappers and integrating them into a daily workflow is usually where people hit a wall. But if you need help uncovering where autonomous agents could make the most impact for your business or personal projects, our sponsor, Embersilk, actually specializes in this.

SPEAKER_00

They really do make it accessible.

SPEAKER_01

Absolutely. You can visit Embersilk.com for AI training, automation, integration, or software development to get these systems up and running. Because once they are running, you definitely need a strategy to manage them.

SPEAKER_00

Oh, for sure. Letting an agent rewrite its own code autonomously introduces real operational risks if you aren't careful.

SPEAKER_01

Right. Let's say I unleash Claude Code on my project directory. How do I stop it from just overwriting my entire life's work with a flawed mathematical assumption?

SPEAKER_00

Well, you instruct the AI to use git. It acts as the agent's memory and sandbox.

SPEAKER_01

Okay, so it creates a separate branch every time it attempts a new analytical approach.

SPEAKER_00

Exactly. Git provides a rollback. But you know, git doesn't stop an AI from making a bad assumption early on and then spiraling out of control for 20 automated steps.
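The branch-per-attempt safety net might look like this in practice. Everything here is illustrative: the branch and file names are made up, git is driven through subprocess for the sake of a self-contained sketch, and the `git init -b` flag assumes git 2.28 or newer.

```python
# Sketch of the branch-per-attempt pattern: each new analytical
# approach goes on its own throwaway branch, so a bad run is undone
# with a single checkout. Names here are illustrative.
import subprocess
import tempfile
from pathlib import Path

def git(*args, cwd):
    """Run a git command in the given repo, failing loudly on error."""
    subprocess.run(["git", *args], cwd=cwd, check=True,
                   capture_output=True, text=True)

repo = Path(tempfile.mkdtemp())
script = repo / "analysis.py"

git("init", "-b", "main", cwd=repo)           # needs git >= 2.28
git("config", "user.email", "agent@example.com", cwd=repo)
git("config", "user.name", "agent", cwd=repo)
script.write_text("# baseline analysis\n")
git("add", ".", cwd=repo)
git("commit", "-m", "baseline", cwd=repo)

# Each new analytical approach gets its own throwaway branch.
git("checkout", "-b", "attempt-1-new-spec", cwd=repo)
script.write_text("# risky rewrite by the agent\n")
git("commit", "-am", "attempt 1", cwd=repo)

# The attempt went wrong: one checkout rolls everything back.
git("checkout", "main", cwd=repo)
print(script.read_text())
```

In a real workflow the agent itself is instructed to run these git commands, which is exactly the "memory and sandbox" role described above.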

SPEAKER_01

Yeah. How do you catch that hallucination before it wastes all that compute?

SPEAKER_00

You have to manage the AI's context window. The most common mistake people make is letting the agent brainstorm, debug, and execute all in one giant continuous chat thread.

SPEAKER_01

So the context window just fills up.

SPEAKER_00

Yeah, and the AI gets overwhelmed by its own conversational noise.

SPEAKER_01

The actual instructions get diluted.

SPEAKER_00

Precisely. So the key tip here is to separate the planning from the execution. Use one session purely to brainstorm and force the AI to output a finalized plan document.

SPEAKER_01

Oh, I see. Then you open a fresh session, hand it that static plan, and just say, execute these steps.

SPEAKER_00

Exactly. It keeps the agent constrained and sharply focused on your approved methodology.
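A minimal sketch of that plan-then-execute separation: ask() below is a stand-in for a call to the agent, not any real API, and the prompts are invented examples. The only point is that the execution session's context contains nothing but the approved plan.

```python
# Toy sketch of plan-then-execute session separation. Session one
# accumulates noisy brainstorming turns and ends with a plan artifact;
# session two starts fresh and receives only that static plan.

def ask(session: list[str], prompt: str) -> str:
    """Stand-in for an agent call: record the prompt in this
    session's context and return a canned reply."""
    session.append(prompt)
    return f"reply to: {prompt}"

# Session 1: brainstorm freely, then force out a finalized plan.
planning: list[str] = []
ask(planning, "Brainstorm identification strategies for the wage data.")
plan = ask(planning, "Write the final step-by-step plan as plan.md.")

# Session 2: fresh context window; only the approved plan goes in.
execution: list[str] = []
ask(execution, f"Execute exactly these steps:\n{plan}")

print(len(planning), len(execution))
```

The design choice this mirrors: the executor never sees the brainstorming noise, so the instructions it follows are never diluted by its own earlier conversation.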

SPEAKER_01

That completely changes what it means to be a learner. You step into the role of a director rather than a micromanager.

SPEAKER_00

It really is such an optimistic time to dive into these tools.

SPEAKER_01

It truly is. Before I share my final thought on what that means for you, if you enjoyed the show, please subscribe. Hey, leave us a five-star review if you can. It really does help get the word out. Thanks for tuning in.

SPEAKER_00

We really appreciate you listening.

SPEAKER_01

We do. Because here is the real question for you to ponder today. If autonomous agents can reliably iterate the code, run the regressions, and anticipate the referee objections, what entirely undiscovered fields of human imagination will you unlock when the friction of execution is completely removed?