Intellectually Curious
Intellectually Curious is a podcast by Mike Breault featuring over 1,800 AI-powered explorations across science, mathematics, philosophy, and personal growth. Each short-form episode is generated, refined, and published with the help of large language models—turning curiosity into an ongoing audio encyclopedia. Designed for anyone who loves learning, it offers quick dives into everything from combinatorics and cryptography to systems thinking and psychology.
Inspiration for this podcast:
"Muad'Dib learned rapidly because his first training was in how to learn. And the first lesson of all was the basic trust that he could learn. It's shocking to find how many people do not believe they can learn, and how many more believe learning to be difficult. Muad'Dib knew that every experience carries its lesson."
― Frank Herbert, Dune
Note: These podcasts were made with NotebookLM. AI can make mistakes. Please double-check any critical information.
Talkie Time Machine: A 13B AI Trained on the 1930s Library
We dive into Talkie, a 13‑billion‑parameter AI raised in a sealed pre‑1931 library. Trained on 260 billion words published before 1931 and guided by etiquette manuals, Victorian prose, and historical letters, Talkie challenges our ideas of AI reasoning, generalization, and how a mind built from the past perceives the future. We explore how it learns to converse without modern data, its surprising ability to encode modern concepts like programming languages, and the engineering battles against temporal leakage and OCR quirks. A thought-provoking look at how training data shape intelligence—and what a mind forged in the past can reveal about the future of AI.
Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.
Sponsored by Embersilk LLC
SPEAKER_01You know, looking over the research logs you sent for today's deep dive, I was instantly reminded of this ridiculous fantasy I've actually had for years.
SPEAKER_00Oh yeah. What's that?
SPEAKER_01Well, I just want to build a time machine, grab some Victorian gentleman right off the streets of London, hand him a modern smartphone, and just, you know, watch his mind completely melt.
SPEAKER_00Right. Yeah. It's the ultimate thought experiment. Like, how would a mind built totally by the past process the realities of the future?
SPEAKER_01Exactly. And that is really what we are unpacking today. Your sources introduce us to this thing called Talkie, which is essentially that time machine, but um as a 13-billion-parameter AI model.
SPEAKER_00Yeah, 13 billion is massive.
SPEAKER_01Right. So think of those parameters as the synthetic brain connections processing everything. It was trained exclusively on texts from before 1931. So it's this brilliant entity just trapped in the past. But uh before we get into what this vintage AI teaches us about machine learning, I should mention that if you want to uncover where AI agents can make a real impact in your own life or business, you should check out our sponsor, Embersilk.
SPEAKER_00They're great for that.
SPEAKER_01Yeah, you can go to Embersilk.com for AI training, automation, integration, software development, all your AI needs. So, okay, let's get into the mechanics here. How do you actually build a digital time traveler without accidentally giving it a modern smartphone?
SPEAKER_00Well, it's really all about meticulously curating its environment. The researchers fed Talkie this massive corpus of uh 260 billion words, but with a hard cutoff.
SPEAKER_01Like an absolute cutoff.
SPEAKER_00Exactly. Everything had to be published before December 31st, 1930. They chose that date specifically because everything published through 1930 is now in the public domain in the U.S., which meant they had free access to these huge archives.
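Note: to make that hard cutoff concrete, here's a minimal sketch of a date-based corpus filter. The Doc structure and its published field are invented for illustration; the researchers' actual pipeline isn't shown in the episode.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical document record; the real corpus format isn't described here.
@dataclass
class Doc:
    title: str
    published: date  # publication date from the archive's catalog
    text: str

CUTOFF = date(1930, 12, 31)  # anything later is "the future"

def in_period(doc: Doc) -> bool:
    """Keep only documents published on or before the cutoff."""
    return doc.published <= CUTOFF

corpus = [
    Doc("The Wonderful Wizard of Oz", date(1900, 5, 17), "..."),
    Doc("A 1954 newspaper piece", date(1954, 3, 1), "..."),
]
training_set = [d for d in corpus if in_period(d)]  # the 1954 piece is dropped
```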
SPEAKER_01So it's basically like raising an AI inside a sealed 1930s library. But looking at your notes, I mean, modern AIs learned to chat by scanning billions of modern internet conversations, right?
SPEAKER_00Right. Reddit threads, social media, all of it.
SPEAKER_01So if Talkie is sealed in this old library without the Internet, how does it even know how to hold a back-and-forth dialogue?
SPEAKER_00See, that's where the researchers had to get really creative. Because to teach it how to interact, they actually used 19th-century etiquette manuals. Yeah, etiquette manuals, old parlor game guides, and historical letter-writing books.
SPEAKER_01That is wild. So it basically learned the rules of a parlor conversation. But manners are just memorized social formulas, right? How do we know it actually learned to think and isn't just, you know, parroting back old Victorian greetings?
SPEAKER_00Well, they tested it against one of the biggest mysteries in AI right now, which is the contamination problem. You see, when a modern AI aces a complex coding test, we don't actually know if it's reasoning through the problem or if it just, you know, memorized the exact answer from some programming website.
SPEAKER_01Oh, because it's read the whole internet already.
SPEAKER_00Exactly. But Talkie solves this. Its training data is provably uncontaminated by the modern internet.
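Note: "provably uncontaminated" boils down to a checkable property: every benchmark item postdates anything the model could possibly have memorized. A sketch of that check, with invented field names:

```python
from datetime import date

CUTOFF = date(1930, 12, 31)  # nothing after this date is in the training data

def is_clean_eval_item(item: dict) -> bool:
    """A test item can't have been memorized if its source postdates the corpus."""
    return item["first_published"] > CUTOFF

# Python itself (first released in 1991) postdates everything Talkie has read,
# so a coding test is contamination-free by construction.
coding_test = {"task": "Write a function that adds two numbers.",
               "first_published": date(1991, 2, 20)}
assert is_clean_eval_item(coding_test)
```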
SPEAKER_01Wait, I'm struggling with this part though. Like, if this AI has literally never seen a single line of computer code, because code didn't exist in 1930, how is it physically possible for it to output functioning Python? It shouldn't even know what a programming language is.
SPEAKER_00You're right, it absolutely shouldn't. But underneath those etiquette manuals and old novels, there's a fundamental structure of human logic.
SPEAKER_01Oh, like cause and effect.
SPEAKER_00Right, cause and effect, sequential steps, grammar. And Python is really just another grammar. So once researchers showed Talkie a few examples of the rules of Python in their prompt, it applied that underlying 1930s logic to the new syntax and successfully wrote simple programs.
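Note: that trick is usually called few-shot or in-context learning. The episode doesn't show the researchers' actual prompt, but the general pattern looks something like this:

```python
# A few-shot prompt: demonstrate the "grammar" of Python with worked
# examples, then ask the model to continue the pattern on a new task.
FEW_SHOT_PROMPT = """\
Below are examples of a notation called Python for describing procedures.

Task: add two numbers.
def add(a, b):
    return a + b

Task: double a number.
def double(x):
    return x * 2

Task: greet a person by name.
"""

# Hypothetical model call; Talkie's real interface isn't documented here.
# completion = talkie.generate(FEW_SHOT_PROMPT, max_tokens=40)
# A model that generalizes the pattern should produce something like:
#     def greet(name):
#         return "Good day to you, " + name
```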
SPEAKER_01That is incredible.
SPEAKER_00It really is. It proves AI can genuinely generalize logic to entirely new concepts.
SPEAKER_01Okay, that makes perfect sense. It maps old logic onto a new framework. Which leads to my next question. Um, if it maps logic that well, can it predict the future?
SPEAKER_00Well, the researchers tested exactly that. They fed Talkie historical New York Times headlines from after 1930 to measure its surprisal.
SPEAKER_01Surprisal?
SPEAKER_00Yeah, it works by uh having the AI predict the next word. So when actual history drastically differed from its 1930s predictions, its error rate or its surprise just spiked.
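Note: surprisal here is the standard language-modeling quantity: the negative log-probability the model assigns to the word that actually comes next. A rough sketch, with a hypothetical next_word_probs interface standing in for the real model:

```python
import math

def headline_surprisal(model, headline: str) -> float:
    """Average surprisal (in bits) of a headline under the model.

    model.next_word_probs(context) is a hypothetical method returning a
    dict that maps candidate next words to probabilities.
    """
    words = headline.split()
    total_bits = 0.0
    for i in range(1, len(words)):
        context = " ".join(words[:i])
        probs = model.next_word_probs(context)
        p = probs.get(words[i], 1e-9)   # tiny floor for unseen words
        total_bits += -math.log2(p)     # improbable words = high surprisal
    return total_bits / max(1, len(words) - 1)

# A 1960s headline about spaceflight should score far higher than a
# 1930s headline the model can extrapolate from its training data.
```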
SPEAKER_01So when was it most surprised? Like, when did history break its brain?
SPEAKER_00There were massive spikes in surprise during the 1950s and 60s. The AI could actually extrapolate the immediate future of the 1930s pretty logically.
SPEAKER_01Right. The trajectory made sense to it.
SPEAKER_00Exactly. But the rapid technological and cultural divergence of the mid-20th century just completely defied its historical expectations.
SPEAKER_01I mean that tracks. But you know, executing this perfectly sounds impossible, honestly. We leave modern fingerprints on everything. How does the data not get contaminated?
SPEAKER_00Oh, it's a huge engineering challenge. They call it temporal leakage. For instance, an early version of Talkie accidentally knew all about Franklin D. Roosevelt's New Deal.
SPEAKER_01Wait, really? How did it know that if the cutoff was 1930?
SPEAKER_00Well, it turned out some of the historical documents had modern digital metadata attached to the files, and the AI just absorbed it.
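Note: a simple version of that fix is to scrub every record down to the period text before training, dropping the archive's modern metadata entirely. The field names below are invented for illustration:

```python
# Hypothetical raw record: digitized text plus the archive's modern metadata.
record = {
    "text": "The market crash of October last has shaken confidence...",
    "scan_date": "2019-06-12",             # modern fingerprint
    "catalog_note": "See also: New Deal",  # post-1930 knowledge leaking in
    "source_url": "https://archive.example/item/123",
}

# Allowlist only what existed in the original document.
TRAINING_FIELDS = {"text"}

def scrub(record: dict) -> dict:
    """Drop every field except the period text itself."""
    return {k: v for k, v in record.items() if k in TRAINING_FIELDS}

clean = scrub(record)  # {'text': 'The market crash of October last...'}
```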
SPEAKER_01Wow, just from the file tags. And that's not to mention the physical scanning of the text, right? The sources mentioned that optical character recognition, or OCR, on texts from the 1930s and earlier is pretty terrible.
SPEAKER_00Yeah, classic OCR really struggles with degraded paper. When they scanned The Wizard of Oz, the computer read it as absolute gibberish.
SPEAKER_01Just complete nonsense.
SPEAKER_00Right. But rather than giving up, the researchers are actually building a brand new vintage OCR system from the ground up just to recognize old fonts and perfectly transcribe historical text.
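Note: one common way to catch that kind of gibberish automatically is a quality gate that checks what fraction of the OCR output is recognizable words and routes low-scoring pages to re-transcription. A toy version, with a deliberately tiny lexicon:

```python
# Toy OCR quality gate: flag pages whose output is mostly non-words.
KNOWN_WORDS = {"dorothy", "lived", "in", "the", "midst", "of", "great",
               "kansas", "prairies", "with", "her", "aunt", "and", "uncle"}

def ocr_quality(text: str) -> float:
    """Fraction of tokens that look like real words."""
    tokens = [t.strip(".,;!?").lower() for t in text.split()]
    if not tokens:
        return 0.0
    return sum(t in KNOWN_WORDS for t in tokens) / len(tokens)

good = "Dorothy lived in the midst of the great Kansas prairies"
bad = "D0r0thv l1vcd !n thc m1dst 0f thc grcat Kan5as pra1r1es"
print(ocr_quality(good))  # high: keep the page
print(ocr_quality(bad))   # low: send it to the new vintage OCR pipeline
```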
SPEAKER_01That's amazing. They're literally engineering their way through the past.
SPEAKER_00They are, yeah.
SPEAKER_01Well, I hope this deep dive gave you exactly the insights you were looking for when you sent us these sources. If you enjoyed this deep dive, please subscribe to the show. And hey, leave us a five-star review if you can. It really does help get the word out. Thanks for tuning in. But I do want to leave you with one last thought to chew on today. If an AI built entirely on dusty novels and parlor guides can extract the underlying logic to write computer code, imagine what incredible unseen problem-solving frameworks might be hiding out there in music or architecture or disciplines we haven't even thought to train AI on yet. The future of human ingenuity is looking incredibly bright.