Intellectually Curious

NVIDIA Cosmos 3: Foundations for Physical AI Reasoning and Action

Mike Breault

Dive into NVIDIA’s Cosmos 3, an open, omni‑modal foundation model that treats physical action as a native modality. Rather than merely predicting video frames, Cosmos 3 reasons about physics and outputs precise trajectories and torques, enabling physics‑accurate simulations for real‑world scenarios. We unpack its mixture of transformers, edge‑to‑cloud compute tiers, and the Cosmos Coalition, and explore how robotics, autonomous driving, and smart infrastructure use it to pre‑test innovations and generate safe, edge‑case scenarios without risk.


Note:  This podcast was AI-generated, and sometimes AI can make mistakes.  Please double-check any critical information.

Sponsored by Embersilk LLC

SPEAKER_01

So uh picture this. I'm reaching for my favorite coffee mug this morning and well, it slips. Oh no. Yeah. And my brain instantly runs this like super complex physics simulation, calculating velocity, trajectory, mass, you know.

SPEAKER_00

Right, doing the whole mental math thing.

SPEAKER_01

Exactly. So I dive to catch it and I completely miss. It just shatters everywhere.

SPEAKER_00

Well, I mean, so your hardware failed, but the internal simulation was flawless.

SPEAKER_01

Right. I knew exactly where it should have landed. And that innate physical intuition is, well, it's something we kind of take for granted.

SPEAKER_00

It really is.

SPEAKER_01

But it's the exact hurdle that has historically held back robotics. So today's deep dive is into how we're finally clearing that hurdle.

SPEAKER_00

Yeah, we're looking at NVIDIA's May 31st, 2026 release of Cosmos 3.

SPEAKER_01

Exactly. It's a new open foundation model, giving physical AI the ability to actually understand and like simulate the real world before acting.

SPEAKER_00

Which is huge. And honestly, bridging that gap between software intelligence and physical action is, well, it's incredibly complex.

SPEAKER_01

Oh, absolutely.

SPEAKER_00

And if you're trying to navigate this transition yourself, whether you need help with AI training, automation, or software development, you should check out Embersilk.com.

SPEAKER_01

Good shout.

SPEAKER_00

Yeah. Uncovering where agents can make the most impact for your business or personal life is exactly what Embersilk does best.

SPEAKER_01

Perfect. Okay, so let's look under the hood of Cosmos 3.

SPEAKER_00

Let's do it.

SPEAKER_01

Because for anyone following AI, the foundational shift here isn't just, you know, scaling up more compute. It's this mixture of Transformers architecture.

SPEAKER_00

Right. The mixture of transformers.

SPEAKER_01

Yeah, like we've seen mixture of experts route text queries to specialized subnetworks, but how is Cosmos applying that to actual physical space?

SPEAKER_00

Well, basically by treating action as a native modality.

SPEAKER_01

Meaning what exactly?

SPEAKER_00

So Cosmos 3 is an omni-model, which means it processes text, video, and audio natively. But it divides the physical processing into specialized blocks.

SPEAKER_01

Okay, so breaking down the scene.

SPEAKER_00

Right. First, a reasoning block analyzes the physics of a scene, like why your mug is falling.

SPEAKER_01

Ouch. Still too soon.

SPEAKER_00

Sorry. Then a routing mechanism hands that context to a generation block. But here's the cool part. It doesn't just output a video prediction.

SPEAKER_01

Wait, it doesn't?

SPEAKER_00

No, it actually generates precise numerical trajectories.
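
To make the hand-off described here concrete, the following is a toy Python sketch of a reasoning step passing context to a modality-specific generator that returns numbers rather than frames. It is only an illustration under stated assumptions: the names reasoning_block, generate_action, generate_caption, GENERATORS, and run are hypothetical, the physics values are hard-coded, and nothing here reflects the actual Cosmos 3 architecture or API.

```python
import math
from typing import Callable, Dict, List


def reasoning_block(scene: str) -> Dict[str, float]:
    """Hypothetical stand-in for learned physical reasoning about a scene."""
    # A real model would infer these values; here they are hard-coded.
    return {"gravity": 9.81, "drop_height_m": 1.0}


def generate_action(context: Dict[str, float], steps: int = 5) -> List[float]:
    """Return the object's height (m) at evenly spaced times until impact."""
    g, h = context["gravity"], context["drop_height_m"]
    t_impact = math.sqrt(2.0 * h / g)  # time for a free fall from height h
    times = [t_impact * i / (steps - 1) for i in range(steps)]
    return [max(h - 0.5 * g * t * t, 0.0) for t in times]


def generate_caption(context: Dict[str, float]) -> str:
    """Return a text description built from the same shared context."""
    return f"Object falls {context['drop_height_m']} m under gravity."


# Toy router: the requested output modality selects the generator.
GENERATORS: Dict[str, Callable] = {
    "action": generate_action,
    "text": generate_caption,
}


def run(scene: str, modality: str):
    context = reasoning_block(scene)
    return GENERATORS[modality](context)


trajectory = run("mug slips off the counter", "action")
print([round(height, 3) for height in trajectory])
```

The point of the sketch is that the "action" output is a list of heights over time, something a controller could consume directly, while the "text" output is built from the same shared context.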

SPEAKER_01

Hold on. So it's not just generating like the next frame of a video to visually guess what happened.

SPEAKER_00

Exactly. It's not just a video generator.

SPEAKER_01

It's functioning more like a video game physics engine then. Like calculating gravity, mass, and friction to output the actual mechanical telemetry. Yes. Like joint angles and torque needed to move a physical machine.

SPEAKER_00

That is the exact mechanism. To learn a complex task, a robot really needs the underlying physics, not just a pretty picture of the room.
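
For a sense of what "joint angles and torque" means numerically, here is a minimal Python sketch using textbook rigid-body mechanics for a single uniform link rotating about one end. It is not taken from Cosmos 3; the function name joint_torque and the 2 kg, 0.5 m link are assumptions chosen for illustration, and friction is ignored.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def joint_torque(mass_kg: float, length_m: float,
                 angle_rad: float, angular_accel: float) -> float:
    """Torque (N*m) at the pivot of a uniform rigid link.

    angle_rad is measured from horizontal; friction is ignored.
    """
    # Moment of inertia of a uniform rod about its end: m * L^2 / 3.
    inertia = mass_kg * length_m ** 2 / 3.0
    # Gravity acts at the center of mass, halfway along the link.
    gravity_term = mass_kg * G * (length_m / 2.0) * math.cos(angle_rad)
    return inertia * angular_accel + gravity_term


# Hypothetical 2 kg, 0.5 m link held still at horizontal, then at vertical.
hold_horizontal = joint_torque(2.0, 0.5, 0.0, 0.0)
hold_vertical = joint_torque(2.0, 0.5, math.pi / 2, 0.0)
print(round(hold_horizontal, 3), round(hold_vertical, 3))
```

Holding the link horizontal takes the most torque because gravity has the longest lever arm there; held vertical, the gravity term vanishes.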

SPEAKER_01

That makes total sense. It's almost like a mental dress rehearsal, like um visualizing a perfect golf swing before you even hit the ball.

SPEAKER_00

That's a great way to put it. And it's already translating to industry.

SPEAKER_01

Oh, really? Yeah.

SPEAKER_00

Agile Robots is currently using Cosmos 3 to train their Thor 3 humanoids for precise, autonomous industrial manufacturing.

SPEAKER_01

Wow, Thor 3. But okay, that makes sense for a controlled assembly line. But how does a physics simulation scale to totally unpredictable environments? Like, say an autonomous vehicle navigating a super busy city.

SPEAKER_00

Right. That's where the model's ability to safely generate long-tail edge cases comes in.

SPEAKER_01

Edge cases, like what?

SPEAKER_00

So say a pedestrian steps out from between parked cars. You obviously can't train an AI by hitting real pedestrians.

SPEAKER_01

Yeah, definitely not.

SPEAKER_00

Instead, Cosmos 3 generates that rare scenario in a physics-accurate simulation.

SPEAKER_01

Ah, I see.

SPEAKER_00

Because it fundamentally calculates momentum and spatial limits, the car's AI can run thousands of variations of that sudden stop in a completely risk-free virtual environment.
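
The idea of running thousands of variations of a sudden stop can be sketched with a simple Monte Carlo loop in Python. This is a hedged illustration, not how Cosmos 3 works: it uses the standard stopping-distance formula (reaction distance plus v^2 / (2 * mu * g)), and the speed, reaction-time, and friction ranges, along with the names stopping_distance and collision_rate, are assumptions made up for the example.

```python
import random

G = 9.81  # gravitational acceleration, m/s^2


def stopping_distance(speed_mps: float, reaction_s: float,
                      friction: float) -> float:
    """Distance covered while reacting plus distance covered while braking."""
    return speed_mps * reaction_s + speed_mps ** 2 / (2.0 * friction * G)


def collision_rate(gap_m: float, trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of randomized scenarios where the car cannot stop in gap_m."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    collisions = 0
    for _ in range(trials):
        speed = rng.uniform(8.0, 14.0)     # m/s, assumed urban speeds
        reaction = rng.uniform(0.1, 0.5)   # s, assumed system reaction time
        friction = rng.uniform(0.4, 0.9)   # assumed wet-to-dry road range
        if stopping_distance(speed, reaction, friction) > gap_m:
            collisions += 1
    return collisions / trials


for gap in (10.0, 20.0, 30.0):
    print(gap, collision_rate(gap))
```

Each trial is one virtual pedestrian stepping out at a given distance; no real road test is needed to see how the failure rate changes as the gap shrinks.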

SPEAKER_01

Which completely explains why NVIDIA made this open source via the Cosmos Coalition.

SPEAKER_00

Exactly.

SPEAKER_01

Simulating the infinite randomness of the physical world is just, well, it's way too massive for one company's servers.

SPEAKER_00

Oh, absolutely. They need partners like Runway and Skild AI contributing to the ecosystem just to cover all those edge cases.

SPEAKER_01

Yeah, it's a super pragmatic move to accelerate the whole field.

SPEAKER_00

It is. And it's exactly why they released it in distinct compute tiers, too.

SPEAKER_01

Right. There's a super tier, a nano tier.

SPEAKER_00

Yep. Super for high-fidelity lab calculations, nano for high-speed throughput, and soon an edge version.

SPEAKER_01

Because you can't really have a self-driving car waiting for a cloud server in another state to calculate braking physics.

SPEAKER_00

No, definitely not.

SPEAKER_01

The compute has to happen locally, on the edge, like instantly.

SPEAKER_00

The latency trade-off is absolutely critical in the physical world.
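
A quick back-of-the-envelope calculation in Python shows why latency matters so much for a moving vehicle. The speed and latency figures below are hypothetical round numbers chosen for illustration, not measurements of any Cosmos 3 tier, and distance_during_latency is a made-up helper name.

```python
def distance_during_latency(speed_kmh: float, latency_ms: float) -> float:
    """Meters traveled while waiting latency_ms for a decision."""
    speed_mps = speed_kmh / 3.6          # convert km/h to m/s
    return speed_mps * (latency_ms / 1000.0)


# Hypothetical comparison: a slow cloud round trip vs. fast on-board compute.
cloud = distance_during_latency(100.0, 100.0)
edge = distance_during_latency(100.0, 10.0)
print(f"cloud: {cloud:.2f} m, edge: {edge:.2f} m")
```

Under these assumed numbers, the car covers roughly ten times more road before it can even begin to react when the decision comes from a distant server.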

SPEAKER_01

Right. So for you listening right now, the impact of this isn't just better factory robots.

SPEAKER_00

Not at all.

SPEAKER_01

Imagine your city completely optimized by partners like Linkervision, where smart infrastructure processes spatial contexts locally in real time.

SPEAKER_00

Predicting traffic anomalies and just rerouting cars to dissolve gridlock before it even forms.

SPEAKER_01

Exactly. It's a beautifully seamless vision of daily life.

SPEAKER_00

It completely shifts our technology from being just a reactive tool to a proactive physical collaborator.

SPEAKER_01

Which kind of leaves me thinking.

SPEAKER_00

Yeah.

SPEAKER_01

If we now have this open AI model that perfectly simulates the physical consequences of actions before they happen, how soon do we start pretesting brilliant, delicate physical inventions?

SPEAKER_00

Oh, that is a great point.

SPEAKER_01

Like imagine a team designing an advanced surgical robot and perfecting every single millimeter of a complex procedure entirely in a risk-free simulation before it ever touches a patient.

SPEAKER_00

Wow. The ceiling for human progress here is just incredibly high.

SPEAKER_01

It really is. The capability to innovate without physical risk is staggering.

SPEAKER_00

Absolutely.

SPEAKER_01

Well, let's just hope my next coffee mug is simulated first.

SPEAKER_00

Good luck with that.

SPEAKER_01

Thanks. And hey, if you enjoyed this deep dive, please subscribe to the show. Leave us a five-star review if you can. It really does help get the word out to other curious minds. Thanks for tuning in.