Lessons Learned

Why We Abandoned Python After 18 Months

18 months of production Python. A working orchestrator. Tests passing. Then we rewrote it in Go. Not a port — a new system. Here’s why.

Applied Minds AI · Architecture Decision

The short version

The RapidLaunch orchestrator was built in Python/Flask over 18 months. It worked. We rewrote it in Go anyway. Python’s duck typing hides errors until runtime, and in infrastructure code, runtime errors mean misconfigured servers. Celery’s distributed task processing is fragile compared to goroutines. Python deployment means virtualenv hell; Go deploys as a single binary. And AI-generated Python has the worst signal-to-noise ratio of any major language. The internal documentation says “NEW system, not a port.” The Python codebase was frozen and reclassified as reference material.

The Decision

The RapidLaunch orchestrator was originally built in Python. Flask for the API. Celery for async processing. SQLAlchemy for the ORM. Redis for the message broker. 18 months of development, a working product, tests passing, deployed in production.

We rewrote it in Go.

Not a port — the internal documentation explicitly says “NEW system, not a port.” The Python codebase was frozen and reclassified as “reference material for understanding requirements, not a codebase to translate.” We didn’t translate line by line. We took the requirements, the lessons, and the domain knowledge — and built it properly in a language suited to what we were actually building.

Duck Typing Hides Errors Until Production

Python’s dynamic typing is a feature in exploratory code and a liability in infrastructure. A function that accepts Any and returns Any works fine until a string arrives where an integer was expected — and in infrastructure code, that runtime error doesn’t mean a failed page render. It means:

A container provisioned with the wrong VMID
A firewall rule applied to the wrong interface
A DNS record pointing at the wrong IP
A secret injected into the wrong container

[Figure: Architecture comparison. Side by side, the Python orchestrator (left) with duck-typed modules, runtime errors, and a GIL bottleneck vs. the Go rewrite (right) with typed interfaces, compile-time guarantees, and goroutine concurrency.]

We enforced type hints. We ran mypy. We added runtime validation. And still, Python’s type system is advisory — it doesn’t prevent the code from running with wrong types. It just warns you, if you remember to check.

Go’s type system catches these at compile time. The code doesn’t run until the types are correct. In infrastructure orchestration, that’s not a convenience — it’s a safety requirement.

Deployment Hell vs. Single Binary

Deploying a Python application means managing virtualenvs, pip dependencies, version conflicts, system Python vs. project Python, and the occasional C extension that won’t compile on the target. Every dependency is a potential failure point, and dependencies have dependencies.

Go compiles to a single binary, and with cgo disabled that binary is statically linked. Copy the file. Run it. No separate runtime to install. No package manager. No virtual environment. No “works on my machine.”

For an orchestrator that runs on multiple Proxmox nodes across multiple data centres, this is the difference between reliable deployment and deployment roulette. One binary per node. scp it. systemctl restart. Done.
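With one binary per node, the practical question becomes which build each node is actually running. A minimal sketch, assuming a hypothetical package-level `version` variable that the Go linker's `-X` flag overrides at build time, combined with the metadata `runtime/debug.ReadBuildInfo` exposes. Building with `CGO_ENABLED=0` keeps the output statically linked, so the same file runs on any Linux node of the same architecture.

```go
package main

import (
	"flag"
	"fmt"
	"runtime/debug"
)

// version is a hypothetical build label set by the linker, for example:
//
//	CGO_ENABLED=0 go build -ldflags "-X main.version=1.4.2" -o orchestrator .
//
// Without the -X flag it stays "dev".
var version = "dev"

// buildSummary combines the linker-stamped version with metadata the Go
// toolchain embeds in module-aware builds: the Go version and, when built
// from a VCS checkout, the commit hash.
func buildSummary() string {
	s := "orchestrator " + version
	info, ok := debug.ReadBuildInfo()
	if !ok {
		return s
	}
	s += " (" + info.GoVersion
	for _, setting := range info.Settings {
		if setting.Key == "vcs.revision" {
			s += ", commit " + setting.Value
		}
	}
	return s + ")"
}

func main() {
	showVersion := flag.Bool("version", false, "print build information and exit")
	flag.Parse()
	if *showVersion {
		fmt.Println(buildSummary())
		return
	}
	fmt.Println(buildSummary(), "starting")
	// ... the orchestrator's main loop would start here ...
}
```

After `scp` and `systemctl restart`, running `./orchestrator -version` on a node prints the label and commit it is really running.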

Celery Fragility vs. Goroutines

Celery is powerful, but it’s a complex distributed system: Redis as a broker, a separate worker process, serialisation between them, failure modes when workers die or Redis restarts. In 18 months of production, we hit every edge case:

Lost jobs when Redis restarted before AOF flush
Stuck workers that stopped processing but didn’t die
Serialisation errors when object shapes changed between deployments
Broker reconnection failures after network blips
Memory leaks in long-running workers

Go’s goroutines are built into the language. No external broker. No serialisation. No separate process to monitor. Concurrent operations are a language primitive, not a distributed systems problem. For an orchestrator that manages thousands of async container operations, removing that entire failure surface was the single biggest reliability improvement we made.

The AI-Generated Code Quality Problem

This one is specific to AI-driven development — and it’s the reason this lesson connects to everything else we do.

Python has the largest training corpus of any programming language — and the worst signal-to-noise ratio. The massive volume of Jupyter notebooks, tutorials, Stack Overflow snippets, and beginner code has trained AI models to generate experimental-quality Python by default. Ask an LLM to write Python and you’ll get code that works, runs, and passes basic tests — but lacks type annotations, uses bare except clauses, mixes concerns, and takes shortcuts that no production codebase should accept.

Getting AI to produce robust, typed, well-structured Python requires constant correction. Getting AI to produce robust Go requires… asking it to write Go. The language’s strict compiler, explicit error returns, and convention-over-configuration culture mean the training data is already closer to production quality. AI-generated Go needs fewer adversarial review cycles to reach 100/100.

The feedback loop

AI-generated Python enters training data. Future models learn from it. The next generation produces slightly worse Python. That enters training data too. A 2024 paper in Nature (Shumailov et al.) showed that models trained on recursively generated data collapse. Python’s position as the most popular AI-assisted language means it’s the most exposed to this degradation loop. The signal-to-noise ratio is getting worse, not better.

What We Kept

Python isn’t gone entirely. The Flask API still runs in production while the Go rewrite progresses — it works, it’s tested, and replacing a working system is not the same as building a new one. Ansible playbooks are still in Python/YAML. Some utility scripts remain.

The lesson isn’t “Python is bad.” Python is excellent for data science, scripting, rapid prototyping, and any domain where flexibility matters more than safety. The lesson is that Python is the wrong choice for infrastructure code that AI agents will write, maintain, and extend — because the type system is too loose, the deployment is too fragile, and the AI training data is too polluted.

18 months of Python development before the Go decision

1 binary: Go compiles to a single deployable file

0 external brokers: goroutines replace Celery + Redis

100% test coverage maintained throughout the entire migration

The lesson isn’t “don’t use Python.” It’s “choose the language that makes AI-generated code safest by default.” For infrastructure, that’s a compiled, statically typed language with strict error handling. Today, that’s Go.

Planning a migration or choosing a stack for AI-driven development?

Language choice matters more when AI is writing the code. We can help you make the right call — and build the process to back it up.
