Joe Weisenthal and Tracy Alloway host Max Spero, founder and CEO of Pangram Labs, a company specializing in AI-generated content detection.
The conversation explores the technical challenges of distinguishing human from AI writing as artificial intelligence becomes increasingly sophisticated and prevalent across internet platforms.
Spero explains that his detection system works by analyzing thousands of micro-decisions in text that neither humans nor AI can easily articulate, achieving high accuracy through deep learning models trained on tens of millions of examples.
The discussion covers the broader implications for internet authenticity, from Reddit bot farms gaming product mentions to journalism ethics, as society grapples with an estimated 40% of web content now being AI-generated.
The Technical Foundation of AI Detection
Pangram Labs reports a false positive rate of 1 in 10,000 on human writing and 99% accuracy in detecting AI text, achieved by training models on 'tens of millions of examples' that pair human writing with AI-generated mirrors.
The detection works by analyzing thousands of micro-decisions in text: 'There's dozens or hundreds of ways to phrase every single phrase, and over the course of fifty or one hundred or two hundred words, you're making thousands of decisions' - Max.
Unlike perplexity-based methods, which fail on writing by non-native speakers, Pangram uses deep learning models that learn decision patterns neither humans nor AI can articulate. Capturing the behavior of frontier models requires ever-larger parameter counts.
The system can differentiate between AI-assisted and AI-generated content by measuring 'cosine difference' - the distance between original human text and AI-edited versions in multidimensional space.
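As a rough illustration of the 'cosine difference' idea, here is a minimal sketch that assumes the term refers to the standard cosine distance (1 minus cosine similarity) between two embedding vectors. The episode notes do not describe Pangram's actual embedding model or thresholds, so the function name and the three-dimensional vectors below are hypothetical stand-ins, not real model output.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical embeddings (made-up numbers for illustration only).
original = [1.0, 0.0, 0.0]        # the human-written draft
light_edit = [1.0, 0.1, 0.0]      # the draft after a light AI touch-up
heavy_rewrite = [0.0, 1.0, 0.0]   # the draft after a full AI rewrite

light_distance = cosine_distance(original, light_edit)
heavy_distance = cosine_distance(original, heavy_rewrite)
```

Under this assumption, a small distance suggests the edited text stayed close to the human original (AI-assisted), while a large distance suggests the text was substantially regenerated. Where to draw the line between the two is not stated in the source.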
The Scale of AI Content Infiltration
An estimated 40% of internet content is now AI-generated, with Medium above 50% for new articles and Reddit at approximately 10%, a dramatic shift since 2023.
When the historical content of a Guardian writer covering the Winter Olympics was analyzed, it showed clear evidence of a switch to AI use in 'mid to late 2024', demonstrating that such behavioral changes are detectable.
SEO article farms, which drive much of the 40% figure, have 'switched over to using AI because then instead of having to pay writers you could turn out articles for pennies on the dollar' - Max.
Startups now promise companies 'organic mentions on Reddit' using AI bots that 'seem organic' while recommending products, corrupting the platforms whose content trains future AI models.
Gaming Detection and Future Challenges
In adversarial testing, text passed through a chain of translations (English→Chinese→Hebrew→English) was still detected as AI-generated, demonstrating robustness against obfuscation attempts.
A friend's overnight experiment using the Claude API to generate human-passing text succeeded, but only by producing 'incoherent' and 'grammatically incorrect' output, which suggests how hard it currently is to game the system.
Training data scarcity poses a future challenge: pre-2023 human content remains the 'near infinite data reservoir,' but covering modern topics requires identifying 'trusted actors' for ongoing model training.
As frontier models become more capable, 'their output distribution gets more complex,' requiring Pangram to continuously increase model size to capture higher-complexity functions.
Societal Implications and Platform Responses
The fundamental heuristic that 'if you came across a piece of writing and the punctuation was excellent and the spelling was excellent...this has been written by a smart person' no longer holds: the link between polished prose and a capable human author has been completely severed.
Major platforms show 'mixed incentives' - Google simultaneously promotes AI writing tools while 'working very hard to deal with the AI slop on the Internet in search results' to serve real content.
Max advocates for social norms where 'it is rude to send other people undisclosed AI outputs' because people seek human opinions, not ChatGPT responses they 'could have done myself.'
The worst-case scenario resembles 'dead internet theory,' in which 'every space that's open and accessible is just flooded by bots,' forcing authentic communication into 'very walled garden like closed servers like Discord.'
From Odd Lots.