NVIDIA Cosmos 3: Foundations for Physical AI Reasoning and Action

Intellectually Curious

Intellectually Curious is a podcast by Mike Breault featuring over 1,800 AI-powered explorations across science, mathematics, philosophy, and personal growth. Each short-form episode is generated, refined, and published with the help of large language models—turning curiosity into an ongoing audio encyclopedia. Designed for anyone who loves learning, it offers quick dives into everything from combinatorics and cryptography to systems thinking and psychology.

Inspiration for this podcast:

"Muad'Dib learned rapidly because his first training was in how to learn. And the first lesson of all was the basic trust that he could learn. It's shocking to find how many people do not believe they can learn, and how many more believe learning to be difficult. Muad'Dib knew that every experience carries its lesson."

― Frank Herbert, Dune

Note: These podcasts were made with NotebookLM. AI can make mistakes. Please double-check any critical information.


June 01, 2026 • Mike Breault


Dive into NVIDIA’s Cosmos 3, an open, omni‑modal foundation model that treats physical action as a native modality. Rather than merely predicting video frames, Cosmos 3 reasons about physics and outputs precise trajectories and torques, enabling physics‑accurate simulations for real‑world scenarios. We unpack its mixture of transformers, edge‑to‑cloud compute tiers, and the Cosmos Coalition, and explore how robotics, autonomous driving, and smart infrastructure use it to pre‑test innovations and generate safe, edge‑case scenarios without risk.

Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.

Sponsored by Embersilk LLC

SPEAKER_01 0:00

So uh picture this. I'm reaching for my favorite coffee mug this morning and well, it slips. Oh no. Yeah. And my brain instantly runs this like super complex physics simulation, calculating velocity, trajectory, mass, you know.

SPEAKER_00 0:16

Right, doing the whole mental math thing.

SPEAKER_01 0:17

Exactly. So I dive to catch it and I completely miss. It just shatters everywhere.

SPEAKER_00 0:22

Well, I mean, so your hardware failed, but the internal simulation was flawless.

SPEAKER_01 0:26

Right. I knew exactly where it should have landed. And that innate physical intuition is, well, it's something we kind of take for granted.

SPEAKER_00 0:33

It really is.

SPEAKER_01 0:34

But it's the exact hurdle that has historically held back robotics. So today's deep dive is into how we're finally clearing that hurdle.

SPEAKER_00 0:41

Yeah, we're looking at NVIDIA's May 31st, 2026 release of Cosmos 3.

SPEAKER_01 0:47

Exactly. It's a new open foundation model, giving physical AI the ability to actually understand and like simulate the real world before acting.

SPEAKER_00 0:55

Which is huge. And honestly, bridging that gap between software intelligence and physical action is, well, it's incredibly complex.

SPEAKER_01 1:02

Oh, absolutely.

SPEAKER_00 1:02

And if you're trying to navigate this transition yourself, whether you need help with AI training, automation, or software development, you should check out Embersilk.com.

SPEAKER_01 1:11

Good shout.

SPEAKER_00 1:12

Yeah. Uncovering where agents can make the most impact for your business or personal life is exactly what Embersilk does best.

SPEAKER_01 1:19

Perfect. Okay, so let's look under the hood of Cosmos 3.

SPEAKER_00 1:22

Let's do it.

SPEAKER_01 1:23

Because for anyone following AI, the foundational shift here isn't just, you know, scaling up more compute. It's this mixture-of-transformers architecture.

SPEAKER_00 1:32

Right. The mixture of transformers.

SPEAKER_01 1:33

Yeah, like we've seen mixture-of-experts models route text tokens to specialized subnetworks, but how is Cosmos applying that to actual physical space?

SPEAKER_00 1:41

Well, basically by treating action as a native modality.

SPEAKER_01 1:45

Meaning what exactly?

SPEAKER_00 1:46

So Cosmos 3 is an omni-modal model, which means it processes text, video, and audio natively. But it divides the physical processing into specialized blocks.
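To make "action as a native modality" and "specialized blocks" a little more concrete, here is a toy Python sketch of routing tokens by their modality tag, the general idea behind mixture-of-transformers designs. It is not Cosmos 3's architecture, which the episode only describes at a high level: the `Token` type, the `BLOCKS` table, and the scalar "blocks" are hypothetical stand-ins for full transformer layers, and the attention that such designs typically share across all modalities is left out.

```python
from dataclasses import dataclass


@dataclass
class Token:
    modality: str          # e.g. "text", "video", "action"
    features: list[float]  # toy feature vector


def scale(weight):
    """Return a toy 'block' that multiplies every feature by `weight`."""
    return lambda xs: [weight * x for x in xs]


# Hypothetical modality-specific blocks; a real model would use separate
# transformer parameters per modality instead of a single scalar.
BLOCKS = {
    "text": scale(1.0),
    "video": scale(0.5),
    "action": scale(2.0),
}


def route(tokens):
    """Send each token through the block that owns its modality.

    There is no learned gate here: the route is fixed by the token's tag.
    """
    return [Token(t.modality, BLOCKS[t.modality](t.features)) for t in tokens]
```

The design choice it illustrates: unlike mixture-of-experts, where a learned router picks experts per token, the route here is determined entirely by which modality a token belongs to.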

SPEAKER_01 1:55

Okay, so breaking down the scene.

SPEAKER_00 1:57

Right. First, a reasoning block analyzes the physics of a scene like why your mug is falling.

SPEAKER_01 2:02

Ouch. Still too soon.

SPEAKER_00 2:03

Sorry. Then a routing mechanism hands that context to a generation block. But here's the cool part. It doesn't just output a video prediction.

SPEAKER_01 2:13

Wait, it doesn't?

SPEAKER_00 2:14

No, it actually generates precise numerical trajectories.

SPEAKER_01 2:17

Hold on. So it's not just generating like the next frame of a video to visually guess what happened.

SPEAKER_00 2:24

Exactly. It's not just a video generator.

SPEAKER_01 2:26

It's functioning more like a video game physics engine then. Like calculating gravity, mass, and friction to output the actual mechanical telemetry. Yes. Like joint angles and torque needed to move a physical machine.

SPEAKER_00 2:39

That is the exact mechanism. To learn a complex task, a robot really needs the underlying physics, not just a pretty picture of the room.
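The joint angles and torques mentioned here can be made concrete with textbook inverse dynamics for a single rotating joint. This is a minimal sketch, assuming a point mass on a massless link with no friction or motor limits; the function name and parameters are illustrative, and it says nothing about how Cosmos 3 itself produces actions. Given a sampled angle trajectory, it estimates angular acceleration by finite differences and applies τ = m·l²·θ̈ + m·g·l·sin θ, with θ measured from straight down.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def joint_torques(angles, dt, mass, length):
    """Torque (N·m) a hypothetical single-joint arm needs at each interior
    sample of `angles` (radians, spaced `dt` seconds apart).

    Assumes a point mass `mass` (kg) at distance `length` (m) from the joint,
    angle measured from straight down, and no friction.
    """
    inertia = mass * length ** 2
    torques = []
    for i in range(1, len(angles) - 1):
        # Central finite difference for angular acceleration.
        accel = (angles[i + 1] - 2 * angles[i] + angles[i - 1]) / dt ** 2
        # Torque needed to counteract gravity at the current angle.
        gravity = mass * G * length * math.sin(angles[i])
        torques.append(inertia * accel + gravity)
    return torques
```

For example, holding a 1 kg mass still at 0.5 m with the arm horizontal (θ = π/2) needs about 4.9 N·m, all of it spent resisting gravity.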

SPEAKER_01 2:47

That makes total sense. It's almost like a mental dress rehearsal, like um visualizing a perfect golf swing before you even hit the ball.

SPEAKER_00 2:54

That's a great way to put it. And it's already translating to industry.

SPEAKER_01 2:57

Oh, really? Yeah.

SPEAKER_00 2:58

Agile Robots is currently using Cosmos 3 to train their Thor 3 humanoids for precise, autonomous industrial manufacturing.

SPEAKER_01 3:05

Wow, Thor 3. But okay, that makes sense for a controlled assembly line. But how does a physics simulation scale to totally unpredictable environments? Like, say an autonomous vehicle navigating a super busy city.

SPEAKER_00 3:17

Right. That's where the model's ability to safely generate long-tail edge cases comes in.

SPEAKER_01 3:22

Edge cases, like what?

SPEAKER_00 3:24

So say a pedestrian steps out from between parked cars. You obviously can't train an AI by hitting real pedestrians.

SPEAKER_01 3:31

Yeah, definitely not.

SPEAKER_00 3:32

Instead, Cosmos 3 generates that rare scenario in a physics-accurate simulation.

SPEAKER_01 3:37

Ah, I see.

SPEAKER_00 3:38

Because it fundamentally calculates momentum and spatial limits, the car's AI can run thousands of variations of that sudden stop in a completely risk-free virtual environment.
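The idea of running thousands of variations of a sudden stop can be illustrated with a far simpler model than a learned world model: plain constant-deceleration kinematics plus randomized parameters. This is a sketch under stated assumptions, not what Cosmos 3 does internally; the function names and every parameter range below are hypothetical, chosen only to be roughly plausible for urban driving.

```python
import random


def stopping_distance(speed, reaction_time, decel):
    """Distance (m) covered while reacting, then braking at constant decel.

    speed in m/s, reaction_time in s, decel in m/s^2.
    """
    return speed * reaction_time + speed ** 2 / (2 * decel)


def run_variations(n, seed=0):
    """Sample n hypothetical 'pedestrian steps out' scenarios and return the
    fraction in which the car stops before reaching the pedestrian."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    safe = 0
    for _ in range(n):
        speed = rng.uniform(8.0, 15.0)     # m/s, roughly 29-54 km/h
        reaction = rng.uniform(0.1, 0.5)   # s, perception plus actuation delay
        decel = rng.uniform(4.0, 8.0)      # m/s^2, depends on road surface
        gap = rng.uniform(10.0, 30.0)      # m, distance to the pedestrian
        if stopping_distance(speed, reaction, decel) < gap:
            safe += 1
    return safe / n
```

Each variation costs microseconds and risks nothing; a physics-aware generative model extends the same idea to scenarios whose outcomes no closed-form formula can capture.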

SPEAKER_01 3:48

Which completely explains why NVIDIA made this open source via the Cosmos Coalition.

SPEAKER_00 3:53

Exactly.

SPEAKER_01 3:54

Simulating the infinite randomness of the physical world is just, well, it's way too massive for one company's servers.

SPEAKER_00 4:00

Oh, absolutely. They need partners like Runway and Skild AI contributing to the ecosystem just to cover all those edge cases.

SPEAKER_01 4:07

Yeah, it's a super pragmatic move to accelerate the whole field.

SPEAKER_00 4:10

It is. And it's exactly why they released it in distinct compute tiers, too.

SPEAKER_01 4:14

Right. There's a Super tier and a Nano tier.

SPEAKER_00 4:16

Yep. Super for high-fidelity lab calculations, Nano for high-speed throughput, and soon an Edge version.

SPEAKER_01 4:22

Because you can't really have a self-driving car waiting for a cloud server in another state to calculate braking physics.

SPEAKER_00 4:29

No, definitely not.

SPEAKER_01 4:30

The compute has to happen locally, on the edge, like instantly.

SPEAKER_00 4:33

The latency trade-off is absolutely critical in the physical world.
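The latency trade-off is easy to quantify: every millisecond spent waiting for a decision is distance the vehicle covers before it can react. The 100 ms and 10 ms figures below are hypothetical round numbers for a remote round trip versus on-board inference, not measurements of any Cosmos 3 tier.

```python
def latency_distance(speed_kmh, latency_ms):
    """Meters a vehicle covers while waiting `latency_ms` for a decision."""
    speed_m_per_s = speed_kmh / 3.6              # km/h -> m/s
    return speed_m_per_s * latency_ms / 1000.0   # ms -> s


# Hypothetical latencies: remote round trip versus on-board inference.
for label, latency in [("remote round trip", 100), ("on-board", 10)]:
    print(f"{label}: {latency_distance(100, latency):.2f} m at 100 km/h")
```

At highway speed this prints roughly 2.78 m versus 0.28 m, which is the gap between braking before a hazard and braking at it.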

SPEAKER_01 4:36

Right. So for you listening right now, the impact of this isn't just better factory robots.

SPEAKER_00 4:41

Not at all.

SPEAKER_01 4:42

Imagine your city completely optimized by partners like Linker Vision, where smart infrastructure processes spatial context locally in real time.

SPEAKER_00 4:50

Predicting traffic anomalies and just rerouting cars to dissolve gridlock before it even forms.

SPEAKER_01 4:55

Exactly. It's a beautifully seamless vision of daily life.

SPEAKER_00 4:58

It completely shifts our technology from being just a reactive tool to a proactive physical collaborator.

SPEAKER_01 5:03

Which kind of leaves me thinking.

SPEAKER_00 5:05

Yeah.

SPEAKER_01 5:05

If we now have this open AI model that perfectly simulates the physical consequences of actions before they happen, how soon do we start pre-testing brilliant, delicate physical inventions?

SPEAKER_00 5:17

Oh, that is a great point.

SPEAKER_01 5:19

Like imagine a team designing an advanced surgical robot and perfecting every single millimeter of a complex procedure entirely in a risk-free simulation before it ever touches a patient.

SPEAKER_00 5:30

Wow. The ceiling for human progress here is just incredibly high.

SPEAKER_01 5:35

It really is. The capability to innovate without physical risk is staggering.

SPEAKER_00 5:39

Absolutely.

SPEAKER_01 5:40

Well, let's just hope my next coffee mug is simulated first.

SPEAKER_00 5:42

Good luck with that.

SPEAKER_01 5:43

Thanks. And hey, if you enjoyed this deep dive, please subscribe to the show. Leave us a five star review if you can. It really does help get the word out to other curious minds. Thanks for tuning in.