Intellectually Curious

Intellectually Curious is a podcast by Mike Breault featuring over 1,800 AI-powered explorations across science, mathematics, philosophy, and personal growth. Each short-form episode is generated, refined, and published with the help of large language models—turning curiosity into an ongoing audio encyclopedia. Designed for anyone who loves learning, it offers quick dives into everything from combinatorics and cryptography to systems thinking and psychology.

Inspiration for this podcast:

"Muad'Dib learned rapidly because his first training was in how to learn. And the first lesson of all was the basic trust that he could learn. It's shocking to find how many people do not believe they can learn, and how many more believe learning to be difficult. Muad'Dib knew that every experience carries its lesson."

― Frank Herbert, Dune

Note: These podcasts were made with NotebookLM. AI can make mistakes. Please double-check any critical information.

Building AlphaGo from Scratch

May 17, 2026 • Mike Breault

0:00 | 5:57

A deep dive into Dwarkesh Patel's interview with Eric Jang on how AlphaGo conquered Go by combining a value network, a policy network, and Monte Carlo tree search. We unpack how these two nets shrink the game's vast search space, how self-play trains better strategies, and what this implies for solving hard real-world problems in science and education, while noting the limits when moving from closed games to open-ended tasks like language models.

Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.

Sponsored by Embersilk LLC

SPEAKER_00 0:00

So uh the other day I'm playing tic-tac-toe with my nephew, right? Oh boy. Yeah. And I am trying and just totally failing to plan, like maybe five moves ahead. My brain is literally sweating over a simple three by three grid.

SPEAKER_01 0:14

It happens to the best of us.

SPEAKER_00 0:15

Right. But it actually got me thinking about the game of Go. Because, you know, in Go, you have roughly 361 to the power of 300 possible game outcomes.

SPEAKER_01 0:25

Which is just a totally absurd number.

SPEAKER_00 0:27

It is. That is literally more possibilities than there are atoms in the observable universe.

SPEAKER_01 0:32

And that is exactly why, uh, for decades, honestly, computer scientists thought mastering Go was practically impossible. I mean, you can't just brute-force search a game tree that massive.

SPEAKER_00 0:41

But we did. And well, that's our mission for this deep dive in our Intellectually Curious series.

SPEAKER_01 0:47

That's a great topic.

SPEAKER_00 0:48

We are looking at, uh, primitives of intelligence, building AlphaGo from scratch. We're drawing from this really amazing breakdown by AI researcher Eric Jang. And it's just wild to think that we beat an infinitely complex math problem by, you know, basically teaching a computer how to vibe.

SPEAKER_01 1:05

I love that. Vibe is honestly a really great way to put it.

SPEAKER_00 1:08

Right.

SPEAKER_01 1:08

Because to solve a game tree bigger than the universe, developers didn't try to map the universe at all. They actually used two distinct neural networks to just shrink the problem down.

SPEAKER_00 1:21

Okay. So how does that work?

SPEAKER_01 1:22

Well, first you have the value network, which, uh, shrinks the depth of the game. So it looks at a board state and essentially predicts, hey, am I gonna win or lose from here? Gotcha. And then second is the policy network, which shrinks the breadth. It looks at all, you know, 361 possible moves and basically says, okay, these five are the most promising. Let's just ignore the rest.
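
To put rough numbers on that shrinking, here is a back-of-the-envelope sketch in Python. It treats a game tree as roughly breadth ** depth move sequences. The 361 and 300 figures and the "five most promising moves" come from the conversation; the 20-move lookahead is a made-up cutoff chosen only for illustration, not a value AlphaGo is stated to use.

```python
import math

def tree_size_log10(breadth, depth):
    """Return log10 of breadth**depth, a crude count of move sequences in a game tree."""
    return depth * math.log10(breadth)

# Figures quoted in the episode: 361 moves per turn, about 300 turns.
full = tree_size_log10(361, 300)

# Policy network keeps about 5 candidate moves (episode); the value network
# lets search stop early. Depth 20 is a hypothetical cutoff for this sketch.
pruned = tree_size_log10(5, 20)

print(f"full tree:   about 10^{full:.0f} sequences")
print(f"pruned tree: about 10^{pruned:.0f} sequences")
print("atoms in the observable universe: commonly estimated near 10^80")
```

Under these assumptions the full tree is on the order of 10^767 sequences, while the pruned one is on the order of 10^14, which is why cutting both breadth and depth matters so much.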

SPEAKER_00 1:43

Okay. Let's unpack this a bit because the value network is basically just human intuition.

SPEAKER_01 1:48

Yes, exactly.

SPEAKER_00 1:49

Okay, it's like when you glance into a teenager's messy bedroom.

SPEAKER_01 1:52

Oh, perfect analogy.

SPEAKER_00 1:53

You don't need to, like, individually count every single sock and candy wrapper to know what you're dealing with. You just take one look and your brain instantly goes, yep, that's a three-hour cleaning job.

SPEAKER_01 2:03

Right. You're condensing this massive calculation into just one quick glance. And that value function is what prevents the AI from endlessly calculating down a path that's already losing.

SPEAKER_00 2:14

Wait, okay, but as a skeptical learner here, if the value network is just making a guess to save time, doesn't it run the risk of, you know, guessing wrong, like missing a brilliant unconventional move?

SPEAKER_01 2:27

Well, that's the beauty of it. It is a guess initially, sure, but it gets refined through something called Monte Carlo Tree Search, or MCTS. MCTS, right? This is where it gets truly superhuman. The AI makes that initial guess using its policy network, but then it actually simulates thousands of future games from that specific point.

SPEAKER_00 2:45

So it's playing out all those chaotic futures.

SPEAKER_01 2:47

Exactly. It plays them out, sees which moves mathematically lead to a win, and it scores them. And through that simulation, it finds a move that works better than its first guess. Oh wow. And here is the absolute genius part. It then trains its own neural network to predict that better proven move from the get-go next time.
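
The loop described here (search from a position, score the outcomes, then treat the improved move as a training target) can be sketched on a toy game. The code below is a minimal sketch, not AlphaGo's implementation: it swaps Go for a tiny Nim variant (take 1 to 3 stones, whoever takes the last stone wins), replaces both neural networks with random playouts, and uses the plain UCT selection rule. All names (`Node`, `rollout`, `simulate`, `search`) and constants are hypothetical choices for illustration.

```python
import math
import random

def legal_moves(pile):
    """Toy Nim rules (an assumption for this sketch): remove 1, 2, or 3 stones."""
    return [m for m in (1, 2, 3) if m <= pile]

def rollout(pile, rng):
    """Play random moves to the end. Return +1 if the player to move first wins, else -1."""
    player = 1
    while True:
        pile -= rng.choice(legal_moves(pile))
        if pile == 0:
            return player  # whoever just took the last stone wins
        player = -player

class Node:
    def __init__(self, pile):
        self.pile = pile
        self.children = {}    # move -> Node
        self.visits = 0
        self.value_sum = 0.0  # total result, as seen by the player who moved into this node

def simulate(node, rng, c=1.4):
    """Run one simulation. Return the result for the player to move at `node`."""
    if node.pile == 0:
        result = -1.0  # the opponent took the last stone
    elif node.visits == 0:
        result = float(rollout(node.pile, rng))  # new leaf: estimate by random playout
    else:
        best_child, best_score = None, -math.inf
        for move in legal_moves(node.pile):
            if move not in node.children:
                node.children[move] = Node(node.pile - move)
            child = node.children[move]
            if child.visits == 0:
                score = math.inf  # try every move at least once
            else:
                mean = child.value_sum / child.visits
                score = mean + c * math.sqrt(math.log(node.visits) / child.visits)
            if score > best_score:
                best_child, best_score = child, score
        result = -simulate(best_child, rng, c)  # the child's result belongs to the opponent
    node.visits += 1
    node.value_sum -= result  # store from the point of view of the player who moved here
    return result

def search(pile, simulations=2000, seed=0):
    """Return the most-visited move and the visit fractions over root moves."""
    rng = random.Random(seed)
    root = Node(pile)
    for _ in range(simulations):
        simulate(root, rng)
    visits = {move: child.visits for move, child in root.children.items()}
    total = sum(visits.values())
    target = {move: count / total for move, count in visits.items()}
    return max(visits, key=visits.get), target

best, target = search(5)
print("chosen move:", best)
print("visit fractions:", target)
```

With a pile of 5, taking 1 stone leaves the opponent on a multiple of 4, which is the known winning idea in this Nim variant, and the search concentrates its visits there. In the AlphaGo Zero family of methods, visit fractions like `target` are what the policy network is trained to predict; in this sketch they are only printed.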

SPEAKER_00 3:04

So it essentially becomes its own teacher. Okay, so if this MCTS magic works so well, why can't we just apply it to, I don't know, large language models right now and make them infinitely smarter?

SPEAKER_01 3:17

That is the million-dollar question. It comes down to verifiable truth. You know, in Go, the game eventually ends. You definitively know if you won or lost.

SPEAKER_00 3:26

So you have perfect objective reality.

SPEAKER_01 3:28

Exactly, to ground the simulation. But language is completely open-ended. There isn't a strict mathematical win or loss to train that value function against.

SPEAKER_00 3:37

Which makes total sense. And honestly, since LLMs can't use this self-play magic yet, it's no wonder businesses are struggling to figure out how to actually get value out of them.

SPEAKER_01 3:46

Yeah, it's a huge hurdle right now.

SPEAKER_00 3:47

And that's exactly where today's sponsor, Embersilk, comes in. If you need help uncovering where AI agents could make the most impact for your business or even your personal life, you should check them out. You can go to embersilk.com for all your AI training, automation, integration, or software development needs.

SPEAKER_01 4:04

And you know, what's truly fascinating here, beyond just the board game, is how this concept completely redefines humanity's approach to massive real world problems.

SPEAKER_00 4:13

Right, because the original research breaks down how a relatively simple, what, 10-layer neural network can tackle famously impossible NP-hard problems.

SPEAKER_01 4:24

Yeah, in what they call a single forward pass.

SPEAKER_00 4:26

Which sounds super intimidating. That basically means it processes the massive data and spits out the answer in one single continuous calculation, right? Instead of having to stop, check its work, and calculate again.

SPEAKER_01 4:39

Exactly. If we connect this to the bigger picture, Eric Jang compares it to tracking the weather. To predict a hurricane, you don't actually need to perfectly simulate the trajectory of every single wind molecule.

SPEAKER_00 4:52

Right. That would be impossible.

SPEAKER_01 4:53

You just need to capture the macroscopic structure of the system.

SPEAKER_00 4:56

And that is incredibly optimistic for all of us. Right. I mean, it proves we can crack impossibly complex scientific hurdles without needing infinite computing power.

SPEAKER_01 5:04

We really can.

SPEAKER_00 5:05

We've already seen this exact logic decode protein structures with AlphaFold. It just shows that human ingenuity can unlock endless progress and solve challenges we once thought were literally impossible.

SPEAKER_01 5:17

It really does.

SPEAKER_00 5:18

Yeah.

SPEAKER_01 5:18

Which uh brings up a pretty exciting thought for you to ponder.

SPEAKER_00 5:21

Ooh, let's hear it.

SPEAKER_01 5:23

Well, if a small neural net can intuitively compress the complexity of the universe's largest board game into a few intelligent guesses, what happens when we apply this to mapping out the perfect personalized learning pathway for every single student on Earth? Well, wow. Just imagine a world where the AI intuits exactly what you need to learn next to reach your ultimate potential. It could completely revolutionize global education.

SPEAKER_00 5:48

I absolutely love that vision. So hopeful. If you enjoyed this podcast, please subscribe to the show. Hey, leave us a five-star review if you can. It really does help get the word out. Thanks for tuning in.