Intellectually Curious

Talkie Time Machine: A 13B AI Trained on the 1930s Library

Mike Breault


We dive into Talkie, a 13‑billion‑parameter AI raised in a sealed pre‑1931 library. Trained on 260 billion words published before 1931 and guided by etiquette manuals, Victorian prose, and historical letters, Talkie challenges our ideas of AI reasoning, generalization, and how a mind built from the past perceives the future. We explore how it learns to converse without modern data, its surprising ability to encode modern concepts like programming languages, and the engineering battles against temporal leakage and OCR quirks. A thought-provoking look at how training data shape intelligence—and what a mind forged in the past can reveal about the future of AI.


Note:  This podcast was AI-generated, and sometimes AI can make mistakes.  Please double-check any critical information.

Sponsored by Embersilk LLC

SPEAKER_01

You know, looking over the research logs you sent for today's deep dive, I instantly had this ridiculous fantasy I've actually had for years.

SPEAKER_00

Oh yeah. What's that?

SPEAKER_01

Well, I just want to build a time machine, grab some Victorian gentleman right off the streets of London, hand him a modern smartphone, and just, you know, watch his mind completely melt.

SPEAKER_00

Right. Yeah. It's the ultimate thought experiment. Like, how would a mind built totally by the past process the realities of the future?

SPEAKER_01

Exactly. And that is really what we are unpacking today. Your sources introduce us to this thing called Talkie, which is essentially that time machine, but um as a 13 billion parameter AI model.

SPEAKER_00

Yeah, 13 billion is massive.

SPEAKER_01

Right. So think of those parameters as the synthetic brain connections processing everything. It was trained exclusively on texts from before 1931. So it's this brilliant entity just trapped in the past. But uh before we get into what this vintage AI teaches us about machine learning, I should mention that if you want to uncover where AI agents can make a real impact in your own life or business, you should check out our sponsor, Embersilk.

SPEAKER_00

They're great for that.

SPEAKER_01

Yeah, you can go to Embersilk.com for AI training, automation, integration, software development, all your AI needs. So, okay, let's get into the mechanics here. How do you actually build a digital time traveler without accidentally giving it a modern smartphone?

SPEAKER_00

Well, it's really all about meticulously curating its environment. The researchers fed Talkie this massive corpus of uh 260 billion words, but with a hard physical cutoff.

SPEAKER_01

Like an absolute cutoff.

SPEAKER_00

Exactly. Everything had to be published by December 31st, 1930. They chose that date specifically because works from before 1931 are in the public domain in the U.S., which meant they had free access to these huge archives.
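The hard cutoff described here amounts to a simple corpus filter. A minimal sketch, assuming a hypothetical document record format (the project's actual pipeline isn't shown in the sources):

```python
from datetime import date

# Hard cutoff from the episode: nothing published after December 31st, 1930.
CUTOFF = date(1930, 12, 31)

def keep_document(doc: dict) -> bool:
    """Keep only documents with a known publication date on or before the cutoff."""
    pub = doc.get("published")  # assumed to be a datetime.date, or None if unknown
    # Undated documents are dropped: a missing date is a leakage risk, not a free pass.
    return pub is not None and pub <= CUTOFF

corpus = [
    {"title": "Etiquette", "published": date(1922, 1, 1)},
    {"title": "Modern Blog Post", "published": date(2019, 5, 4)},
    {"title": "Undated Pamphlet", "published": None},
]
kept = [d["title"] for d in corpus if keep_document(d)]
# kept == ["Etiquette"]
```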

SPEAKER_01

So it's basically like raising an AI inside a sealed 1930s library. But looking at your notes, I mean, modern AI learns to chat by scanning billions of modern internet conversations, right?

SPEAKER_00

Right. Reddit threads, social media, all of it.

SPEAKER_01

So if Talkie is sealed in this old library without the Internet, how does it even know how to hold a back-and-forth dialogue?

SPEAKER_00

See, that's where the researchers had to get really creative. Because to teach it how to interact, they actually used 19th century etiquette manuals. Oh. Yeah, etiquette manuals, old parlor game guides, and historical letter writing books.

SPEAKER_01

That is wild. So it basically learned the rules of a parlor conversation. But manners are just memorized social formulas, right? How do we know it actually learned to think and isn't just, you know, parroting back old Victorian greetings?

SPEAKER_00

Well, they tested it against one of the biggest mysteries in AI right now, which is the contamination problem. You see, when a modern AI aces a complex coding test, we don't actually know if it's reasoning through the problem or if it just, you know, memorized the exact answer from some programming website.

SPEAKER_01

Oh, because it's read the whole internet already.

SPEAKER_00

Exactly. But Talkie solves this. Its training data is provably uncontaminated by the modern internet.

SPEAKER_01

Wait, I'm struggling with this part though. Like, if this AI has literally never seen a single line of computer code, because code didn't exist in 1930, how is it physically possible for it to output functioning Python? It shouldn't even know what a programming language is.

SPEAKER_00

You're right, it absolutely shouldn't. But underneath those etiquette manuals and old novels, there's a fundamental structure of human logic.

SPEAKER_01

Oh, like cause and effect.

SPEAKER_00

Right, cause and effect, sequential steps, grammar. And Python is really just another grammar. So once researchers showed Talkie a few examples of the rules of Python in their prompt, it applied that underlying 1930s logic to the new syntax and successfully wrote simple programs.
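The setup described here is few-shot prompting: since the model never saw code in training, the "rules" of Python live entirely in the prompt. A minimal sketch, where `query_model` is a hypothetical stand-in for the actual Talkie inference call (the real prompt text is not in the sources):

```python
# The prompt supplies the entire concept of "code" as in-context examples.
FEW_SHOT_PROMPT = """\
Python is a notation for giving a machine instructions.
Example 1:
  Task: add two numbers
  Code: def add(a, b): return a + b
Example 2:
  Task: double a number
  Code: def double(x): return x * 2
Task: subtract two numbers
Code:"""

def query_model(prompt: str) -> str:
    # Placeholder: a real run would send `prompt` to the model and
    # return its completion. The string below mimics a successful answer.
    return "def subtract(a, b): return a - b"

completion = query_model(FEW_SHOT_PROMPT)
```

The point of the pattern: the model only has to map the examples' task-to-code structure onto a new task, which is exactly the "old logic onto new grammar" generalization the episode describes.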

SPEAKER_01

That is incredible.

SPEAKER_00

It really is. It proves AI can genuinely generalize logic to entirely new concepts.

SPEAKER_01

Okay, that makes perfect sense. It maps old logic onto a new framework. Which leads to my next question. Um, if it maps logic that well, can it predict the future?

SPEAKER_00

Well, the researchers tested exactly that. They fed Talkie historical New York Times headlines from after 1930 to measure its surprisal.

SPEAKER_01

Surprisal?

SPEAKER_00

Yeah, it works by uh having the AI predict the next word. So when actual history drastically differed from its 1930s predictions, its error rate or its surprise just spiked.
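The "surprise" being measured here is surprisal: the negative log probability a model assigns to the word that actually came next. A toy illustration with made-up probabilities (the real experiment would read them off Talkie's next-word distribution over headlines):

```python
import math

def surprisal_bits(prob: float) -> float:
    """Surprisal of an event in bits: -log2 of the probability the model gave it."""
    return -math.log2(prob)

# Hypothetical probabilities for a headline word, as a 1930s-trained model might score them:
expected_1930s = surprisal_bits(0.5)    # a plausible 1930s continuation: low surprise
defied_history = surprisal_bits(0.001)  # e.g. a Space Age term: surprise spikes

# Averaging surprisal over a headline gives the "error rate" the episode
# describes spiking when history diverged from the model's expectations.
```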

SPEAKER_01

So when did it get the most surprise? Like when did history break its brain?

SPEAKER_00

There were massive spikes in surprise during the 1950s and 60s. The AI could actually extrapolate the immediate future of the 1930s pretty logically.

SPEAKER_01

Right. The trajectory made sense to it.

SPEAKER_00

Exactly. But the rapid technological and cultural divergence of the mid-20th century, it just completely defied its historical expectations.

SPEAKER_01

I mean that tracks. But you know, executing this perfectly sounds impossible, honestly. We leave modern fingerprints on everything. How does the data not get contaminated?

SPEAKER_00

Oh, it's a huge engineering challenge. They call it temporal leakage. For instance, an early version of Talkie accidentally knew all about Franklin D. Roosevelt's New Deal.

SPEAKER_01

Wait, really? How did it know that if the cutoff was 1930?

SPEAKER_00

Well, it turned out some of the historical documents had modern digital metadata attached to the files, and the AI just absorbed it.
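The fix implied here is scrubbing modern metadata from digitized files before the text reaches training. A minimal sketch; the field names are hypothetical, since real archive formats vary:

```python
# Fields added during modern digitization, not present in the 1930s original.
MODERN_FIELDS = {"scan_date", "uploader", "ocr_software", "tags"}

def scrub(record: dict) -> dict:
    """Return a copy of a document record with modern metadata fields removed."""
    return {k: v for k, v in record.items() if k not in MODERN_FIELDS}

doc = {
    "title": "Collected Letters, 1885",
    "body": "My dear friend, ...",
    "scan_date": "2014-06-02",       # post-1930 context attached to the file
    "tags": "New Deal, Roosevelt",   # the kind of tag that leaked FDR into training
}
clean = scrub(doc)
# clean keeps only "title" and "body"
```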

SPEAKER_01

Wow, just from the file tags. And not to mention the physical scanning of the text, right? The sources mentioned that optical character recognition, or OCR, on 1930s print is pretty terrible.

SPEAKER_00

Yeah, classic OCR really struggles with degraded paper. When they scanned The Wizard of Oz, the computer read it as absolute gibberish.

SPEAKER_01

Just complete nonsense.

SPEAKER_00

Right. But rather than giving up, the researchers are actually building a brand new vintage OCR system from the ground up just to recognize old fonts and perfectly transcribe historical text.

SPEAKER_01

That's amazing. They're literally engineering their way through the past.

SPEAKER_00

They are, yeah.

SPEAKER_01

Well, I hope this deep dive gave you exactly the insights you were looking for when you sent us these sources. If you enjoyed this deep dive, please subscribe to the show. Hey, leave us a five-star review if you can. It really does help get the word out. Thanks for tuning in. But I do want to leave you with one last thought to chew on today. If an AI built entirely on dusty novels and parlor guides can extract the underlying logic to write computer code, imagine what incredible unseen problem-solving frameworks might be hiding out there in music or architecture or disciplines we haven't even thought to train AI on yet. The future of human ingenuity is looking incredibly bright.