Python Is Poisoning AI-Generated Code (And Developers Keep Drinking)
The language of AI might be the worst choice for the developers AI created
Python’s massive training corpus is mostly garbage. Jupyter notebooks, Stack Overflow snippets, tutorial code, university assignments. AI models learned the median Python programmer’s output — and that median is catastrophically low. Anthropic knows this. They built Claude Code in TypeScript.
A note on honesty: this is a hypothesis with substantial evidence, not settled science. No study directly measures “training data quality by language.” We’re assembling circumstantial evidence. The case is strong. It is not proven.
The Uncomfortable Truth
Here’s something nobody in the Python community wants to hear: the language that powers every major AI model is one of the worst choices for the code those models generate.
Not because Python is a bad language. It’s excellent at what it was designed for. But AI code generation doesn’t care about what a language was designed for. It cares about what exists in training data. And Python’s training data is a dumpster fire.
What’s Actually in Python’s Training Data
Python has more code on GitHub than any other language. Sounds like an advantage, right? It’s a liability.
The Python corpus is dominated by:
- Jupyter notebooks — experimental code optimized for exploration, not production
- Stack Overflow answers — isolated snippets stripped of context
- Tutorial code — written to teach, not to maintain
- University assignments — first attempts by learners
- Quick scripts — hacks with no thought to longevity
When an AI model trains on this corpus, it learns to generate the median. And the median Python code is a Jupyter notebook written by someone who learned programming last Tuesday.
The language that enabled the AI revolution may be among the worst choices for the developers that revolution created.
Anthropic Knows. They Chose TypeScript.
Anthropic — the company behind Claude — built their AI coding tool in TypeScript. Not Python. Multiple factors influenced this: Python is poorly suited for distributing CLI tools as consumer products. But Anthropic also cited the model’s proficiency explicitly, wanting a stack — in their words — “which we didn’t need to teach.”
This is one data point, not proof. But it’s a revealing one. A company with deep insight into model capabilities chose a statically-typed language for an AI-assisted project. 90% of Claude Code was written by Claude itself — in TypeScript.
If Anthropic’s own engineers don’t trust AI-generated Python enough to build their flagship product with it, why should you?
The Vibecoder Trap
This matters most for vibecoders — people building software primarily through AI assistance. They choose Python because it looks simple, because AI seems fluent in it, because it’s “the language of AI.”
What they get: a Flask endpoint with no error handling. SQL injection vulnerabilities. Missing authentication. Unpinned dependencies. Code that runs on whatever Python version happens to exist.
Python’s permissiveness ensures nothing stops this code from reaching production. There’s no compiler saying “stop.” No type system saying “think.” Just a runtime error waiting for the first real traffic.
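The SQL injection failure above is concrete enough to show. A typical AI-generated snippet interpolates user input straight into the query string; the fix is a parameterized query. A minimal sketch using the stdlib `sqlite3` module (the table and function names here are illustrative, not from any real codebase):

```python
import sqlite3

def find_user(conn: sqlite3.Connection, username: str):
    # Unsafe (the pattern AI often generates): f-string interpolation,
    # where input like "x' OR '1'='1" rewrites the query itself:
    #   conn.execute(f"SELECT * FROM users WHERE name = '{username}'")
    # Safe: a parameterized query treats input as data, never as SQL.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

print(find_user(conn, "alice"))         # the legitimate lookup succeeds
print(find_user(conn, "x' OR '1'='1"))  # the payload is just a weird username
```

The same two-line difference applies to any database driver: the unsafe version runs fine in the demo and in the tutorial it was trained on, which is exactly why nothing stops it before production.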
The Feedback Loop That Makes It Worse
AI generates mediocre Python. Developers commit it. That code enters training datasets. The next generation of models trains on it. Quality regresses toward a mean that was already low.
Nature published the proof in July 2024: “AI models collapse when trained on recursively generated data.” The mechanism is mathematically demonstrated. Model collapse is not a risk — it’s a certainty without active intervention.
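The mechanism is easy to see in miniature. Here is a toy statistical illustration of the effect (not the Nature paper’s experiment): fit a Gaussian to samples, then sample from the fit, refit, and repeat. With no fresh real data entering the loop, estimation error compounds and the distribution’s spread collapses — the model loses its tails first, the way code models lose rare, high-quality patterns.

```python
import random
import statistics

random.seed(0)

def next_generation(mu: float, sigma: float, n: int = 20) -> tuple[float, float]:
    """Sample n points from the current model, then refit the model to them."""
    samples = [random.gauss(mu, sigma) for _ in range(n)]
    return statistics.mean(samples), statistics.stdev(samples)

mu, sigma = 0.0, 1.0      # generation 0: the "real" data distribution
initial_sigma = sigma
for gen in range(1000):   # each generation trains only on the previous one's output
    mu, sigma = next_generation(mu, sigma)

print(f"sigma after 1000 generations: {sigma:.2e}")  # collapses toward 0
```

Mixing some original data back in at every generation slows the collapse, which is the kind of active intervention the paper points to.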
So What Do You Do?
If you’re a vibecoder starting a new project: use TypeScript. The type system provides guardrails. The training data is cleaner. The compiler catches errors you can’t see. This isn’t permanent advice — Python with aggressive curation might catch up. But today, TypeScript offers the best chance of shipping code that works.
If you’re stuck with Python: review every line. Enforce type hints with mypy. Add the error handling the AI forgot. Don’t trust output just because it runs.
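What “enforce type hints with mypy” buys you in practice: a sketch of the class of bug a checker catches before runtime (the function names here are invented for illustration):

```python
from typing import Optional

def lookup_price(catalog: dict[str, float], sku: str) -> Optional[float]:
    """Return the price for a SKU, or None if it's unknown."""
    return catalog.get(sku)

def order_total(catalog: dict[str, float], skus: list[str]) -> float:
    # AI-generated code often writes `lookup_price(...) + total` here and
    # crashes on the first missing SKU. The Optional return type lets
    # `mypy --strict` flag that mistake before the code ever runs.
    total = 0.0
    for sku in skus:
        price = lookup_price(catalog, sku)
        if price is not None:  # the check mypy forces you to write
            total += price
    return total

print(order_total({"a": 2.0, "b": 3.5}, ["a", "b", "missing"]))
```

Without the annotations, the same bug passes review, passes the happy-path test, and fails on real traffic — which is the pattern this whole section is about.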
And if you’re building AI tools? We need a new default: Foundation Standard code. Not experimental-grade. Not production-hardened. The middle tier — code with brakes and seatbelts, even if it doesn’t have AC and a stereo yet. Code with the structure to become production code when the time comes.
Want the full evidence?
This is the short version. The deep dive has 18 peer-reviewed citations, the Anthropic evidence in detail, and the complete Foundation Standard proposal.
- Part 1: Python Is Poisoning AI-Generated Code (You are here)
- Part 2: The Full Evidence — A Deep Dive
- Part 3: The Fix — Foundation Standard & Why AI Needs Airbags