Why We Abandoned Python After 18 Months
18 months of production Python. A working orchestrator. Tests passing. Then we rewrote it in Go. Not a port — a new system. Here’s why.
The RapidLaunch orchestrator was built in Python/Flask over 18 months. It worked. We rewrote it in Go anyway. Python’s duck typing hides errors until runtime, and in infrastructure code a runtime error means a misconfigured server. Celery’s distributed task queue proved more fragile than in-process goroutines. Deploying Python meant virtualenv hell; deploying Go means copying a single binary. And in our experience, AI-generated Python has the worst signal-to-noise ratio of any major language. The internal documentation says “NEW system, not a port.” The Python codebase was frozen and reclassified as reference material.
The Decision
The RapidLaunch orchestrator was originally built in Python. Flask for the API. Celery for async processing. SQLAlchemy for the ORM. Redis for the message broker. 18 months of development, a working product, tests passing, deployed in production.
We rewrote it in Go.
Not a port — the internal documentation explicitly says “NEW system, not a port.” The Python codebase was frozen and reclassified as “reference material for understanding requirements, not a codebase to translate.” We didn’t translate line by line. We took the requirements, the lessons, and the domain knowledge — and built it properly in a language suited to what we were actually building.
Duck Typing Hides Errors Until Production
Python’s dynamic typing is a feature in exploratory code and a liability in infrastructure. A function that accepts Any and returns Any works fine until a string arrives where an integer was expected — and in infrastructure code, that runtime error doesn’t mean a failed page render. It means:
- A container provisioned with the wrong VMID
- A firewall rule applied to the wrong interface
- A DNS record pointing at the wrong IP
- A secret injected into the wrong container
We enforced type hints. We ran mypy. We added runtime validation. And still, Python’s type system is advisory — it doesn’t prevent the code from running with wrong types. It just warns you, if you remember to check.
Go’s type system catches these at compile time. The code doesn’t run until the types are correct. In infrastructure orchestration, that’s not a convenience — it’s a safety requirement.
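A minimal sketch of what that looks like in practice. The names here (VMID, ContainerSpec, ParseSpec, Describe) are hypothetical and chosen for illustration; they are not the RapidLaunch API. The idea is to convert untrusted text into typed values once, at the boundary, so that everything downstream can only be called with the right types.

```go
package main

import (
	"fmt"
	"net/netip"
	"strconv"
)

// VMID is a hypothetical named type for a container ID. Because it is a
// distinct type, a string, or a variable of another integer type, cannot be
// passed in its place without an explicit conversion.
type VMID uint32

// ContainerSpec is an illustrative request shape, not the real RapidLaunch schema.
type ContainerSpec struct {
	ID   VMID
	Addr netip.Addr
}

// ParseSpec converts untrusted text into typed values once, at the boundary.
func ParseSpec(id, ip string) (ContainerSpec, error) {
	n, err := strconv.ParseUint(id, 10, 32)
	if err != nil {
		return ContainerSpec{}, fmt.Errorf("invalid VMID %q: %w", id, err)
	}
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return ContainerSpec{}, fmt.Errorf("invalid IP %q: %w", ip, err)
	}
	return ContainerSpec{ID: VMID(n), Addr: addr}, nil
}

// Describe stands in for the call that would actually provision the container.
func Describe(s ContainerSpec) string {
	return fmt.Sprintf("vmid=%d ip=%s", s.ID, s.Addr)
}

func main() {
	spec, err := ParseSpec("101", "10.0.0.5")
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println(Describe(spec))

	// The next line, if uncommented, is rejected by the compiler because a
	// string constant is not a VMID:
	// Describe(ContainerSpec{ID: "101"})
}
```

The compiler only checks types, not values: a well-typed but wrong VMID still gets through, so validation at the boundary remains necessary.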
Deployment Hell vs. Single Binary
Deploying a Python application means managing virtualenvs, pip dependencies, version conflicts, system Python vs. project Python, and the occasional C extension that won’t compile on the target. Every dependency is a potential failure point, and dependencies have dependencies.
Go compiles to a single self-contained binary (fully static when cgo is disabled). Copy the file. Run it. No interpreter to install. No package manager. No virtual environment. No “works on my machine.”
For an orchestrator that runs on multiple Proxmox nodes across multiple data centres, this is the difference between reliable deployment and deployment roulette. One binary per node. scp it. systemctl restart. Done.
Celery Fragility vs. Goroutines
Celery is powerful, but it’s a complex distributed system: Redis as a broker, a separate worker process, serialisation between them, failure modes when workers die or Redis restarts. In 18 months of production, we hit every edge case:
- Lost jobs when Redis restarted before AOF flush
- Stuck workers that stopped processing but didn’t die
- Serialisation errors when object shapes changed between deployments
- Broker reconnection failures after network blips
- Memory leaks in long-running workers
Go’s goroutines are built into the language. No external broker. No serialisation. No separate process to monitor. Concurrent operations are a language primitive, not a distributed systems problem. For an orchestrator that manages thousands of async container operations, removing that entire failure surface was the single biggest reliability improvement we made.
The AI-Generated Code Quality Problem
This one is specific to AI-driven development — and it’s the reason this lesson connects to everything else we do.
Python has the largest training corpus of any programming language — and the worst signal-to-noise ratio. The massive volume of Jupyter notebooks, tutorials, Stack Overflow snippets, and beginner code has trained AI models to generate experimental-quality Python by default. Ask an LLM to write Python and you’ll get code that works, runs, and passes basic tests — but lacks type annotations, uses bare except clauses, mixes concerns, and takes shortcuts that no production codebase should accept.
Getting AI to produce robust, typed, well-structured Python requires constant correction. Getting AI to produce robust Go requires… asking it to write Go. The language’s strict compiler, explicit error returns, and convention-over-configuration culture mean the training data is already closer to production quality. AI-generated Go needs fewer adversarial review cycles to reach 100/100.
AI-generated Python enters training data. Future models learn from it. The next generation risks producing slightly worse Python, which enters training data in turn. A 2024 paper in Nature reported that models trained on recursively generated data collapse. Python’s position as the most popular AI-assisted language means it’s the most exposed to this degradation loop. The signal-to-noise ratio is getting worse, not better.
What We Kept
Python isn’t gone entirely. The Flask API still runs in production while the Go rewrite progresses — it works, it’s tested, and replacing a working system is not the same as building a new one. Ansible playbooks are still in Python/YAML. Some utility scripts remain.
The lesson isn’t “Python is bad.” Python is excellent for data science, scripting, rapid prototyping, and any domain where flexibility matters more than safety. The lesson is that Python is the wrong choice for infrastructure code that AI agents will write, maintain, and extend — because the type system is too loose, the deployment is too fragile, and the AI training data is too polluted.
The lesson isn’t “don’t use Python.” It’s “choose the language that makes AI-generated code safest by default.” For infrastructure, that’s a compiled, statically typed language with strict error handling. Today, that’s Go.