The 9 Stages of AI-Assisted Software Development

Stages 1–3

The developer and the assistant

Where most teams are today. The AI helps — but the developer owns every decision, every review, every integration.

Stage 1

Developer does

Writes code manually
Reviews AI suggestions line by line
Full cognitive load stays with the developer

AI does

Completes lines and functions from context
Developer and AI work at the same level — same task, same pace

Automated: nothing

Stage 2

Developer does

Describes what needs changing
Reviews all output — still owns every line
Writes fewer lines, but review burden grows

AI does

Rewrites functions, files, components on instruction
Output volume increases significantly

Automated: nothing

Stage 3

Developer does

Prompts with a brief, reviews the PRD (product requirements document), approves the approach
Tests and debugs with AI — conversational back-and-forth
Still deeply involved in every decision

AI does

Drafts PRDs, generates implementation
Iterates on feedback across multiple rounds

This is where most “vibe coders” operate. LLM-controlled TDD, brute-force retry loops, and conversational debugging. It works — until it doesn’t. The productivity gain is real but linear: one developer, one conversation, one task. The ceiling is the developer’s attention.

Automated: some development. Review stays manual.

Stages 4–6

Building the infrastructure

The transition from Stage 3 to Stage 4 is the largest single leap — from ad hoc prompting to a designed, repeatable process. Something has to be built.

Stage 4

Developer does

Configures pipeline steps manually
Monitors output, intervenes on failures
Fills gaps between automated phases

AI does

Executes specific phases: code gen, tests, basic review
Phase transitions are manual or fragile

Automated: code generation, test execution. Manual: review, integration, deployment.

Stage 5

Developer does

Writes specs, reviews pipeline output
Makes architectural decisions
Pipeline handles everything else

AI does

Full TDD cycle: spec → red tests → green → review → remediation
Quality gates between every phase
Deterministic orchestration — scripts control, LLMs execute

~50%

Tests are harder than code

Half of pipeline remediation cycles are triggered by test quality, not implementation quality. Writing tests that verify behavior rather than implementation is the hardest part.

≠

Retry loops are not pipelines

A generate-fail-retry loop is not a pipeline. A designed pipeline adds adversarial review gates, scoring, stall detection, and circuit breakers. The difference is structural.
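To make the structural difference concrete, here is a minimal Python sketch of a gated remediation loop. Everything in it is hypothetical: the names `run_gated_loop`, `generate`, and `review`, the 0-to-1 score scale, and the default thresholds are assumptions chosen for illustration, not a description of any particular pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class LoopResult:
    status: str          # "passed", "stalled", or "circuit_open"
    attempts: int
    best_score: float


def run_gated_loop(
    generate: Callable[[Optional[str]], str],    # produces a candidate from prior feedback
    review: Callable[[str], Tuple[float, str]],  # separate reviewer: (score, feedback)
    pass_score: float = 0.8,   # gate threshold (assumed 0..1 scale)
    max_attempts: int = 5,     # circuit breaker: hard cap on attempts
    stall_limit: int = 2,      # stop after this many attempts without improvement
) -> LoopResult:
    best_score = 0.0
    stalled_for = 0
    feedback: Optional[str] = None

    for attempt in range(1, max_attempts + 1):
        candidate = generate(feedback)
        score, feedback = review(candidate)

        if score >= pass_score:
            return LoopResult("passed", attempt, score)

        # Stall detection: the score must improve, not merely change.
        if score > best_score:
            best_score = score
            stalled_for = 0
        else:
            stalled_for += 1
            if stalled_for >= stall_limit:
                return LoopResult("stalled", attempt, best_score)

    # Circuit breaker tripped: hand off to a human instead of retrying forever.
    return LoopResult("circuit_open", max_attempts, best_score)
```

A plain retry loop has only the `for` statement. The gate, the stall counter, and the explicit non-success statuses are what let an orchestrator decide what happens next.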

Pre-built pipelines

At this stage, pipelines are pre-configured — different gate thresholds, review depth, model selection, and retry limits for each context. The system defaults to the appropriate pipeline; users can request new ones, but most work runs through what’s already there. Includes pipelines for non-software artifacts like documentation and training materials.

Pipeline types: PoC, MVP, Team, Org, SaaS, Mission-Critical
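One way to picture pre-configured pipelines is as a table of presets keyed by context. The sketch below is an assumption throughout: `PipelineConfig`, `select_pipeline`, the model tier names, and every number are invented for illustration, and only three of the contexts are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    gate_threshold: float   # minimum review score to pass a gate (assumed 0..1 scale)
    review_depth: str       # e.g. "basic" or "adversarial"
    model: str              # placeholder model tier, not a real model name
    max_retries: int


# Illustrative presets only; the values are invented for this sketch.
PRESETS = {
    "poc": PipelineConfig(0.5, "basic", "fast-model", 2),
    "mvp": PipelineConfig(0.7, "basic", "fast-model", 3),
    "mission_critical": PipelineConfig(0.95, "adversarial", "strong-model", 5),
}


def select_pipeline(context: str, default: str = "mvp") -> PipelineConfig:
    """Return the preset for a context, falling back to a default preset."""
    return PRESETS.get(context, PRESETS[default])
```

Calling `select_pipeline("poc")` returns the minimal-gate preset, while an unknown context falls back to the default rather than failing.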

3–4× net output increase

Per-task, the pipeline is slower — but devs run 4–6 tasks at once

Review gates, adversarial checks, and remediation loops add hours to each task. That sounds like a problem — until you compare it to a world where devs handle one or two tasks and context-switch constantly. The pipeline builds overnight. Net delivery is 3–4× higher, not lower.
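The arithmetic behind that claim can be sketched as tasks in flight divided by time per task. The function name and the 1.5× slowdown below are assumptions made up for illustration; the article does not state how its 3–4× figure was derived.

```python
def net_output_ratio(concurrent_tasks: float, per_task_slowdown: float,
                     baseline_concurrency: float = 1.0) -> float:
    """Throughput relative to a baseline: tasks in flight divided by time per task."""
    return (concurrent_tasks / per_task_slowdown) / baseline_concurrency


# Hypothetical inputs: 5 tasks in flight, each taking 1.5x as long as before.
ratio = net_output_ratio(concurrent_tasks=5, per_task_slowdown=1.5)
print(round(ratio, 2))  # 3.33
```

Under these assumed inputs, a pipeline that is 50% slower per task still comes out more than three times ahead, because concurrency grows faster than per-task latency.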

Single pipeline — spec to deploy

Automated: everything except spec authoring and architectural decisions.

Deep dive: Stage 5 →

Why deterministic orchestration matters

At Stage 5+, a critical choice: who controls the workflow?

LLM-directed (path of least resistance) — unbounded retries, step skipping, context loss, self-review blind spots
Deterministic (scripts enforce sequence, LLMs execute steps) — reproducible, auditable, decisions from state files not model judgment
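A minimal sketch of the deterministic option, assuming a JSON state file and one executor function per phase. The phase names follow the TDD cycle described earlier and are treated as a simple linear sequence here; `run_pipeline`, the state layout, and the executor signature are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Dict

# Fixed phase order, mirroring "spec → red tests → green → review → remediation".
PHASES = ["spec", "red_tests", "green", "review", "remediation"]


def run_pipeline(state_path: Path, executors: Dict[str, Callable[[dict], bool]]) -> dict:
    """Advance through PHASES in order; the script, not the model, picks the next step."""
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"completed": []}

    for phase in PHASES:
        if phase in state["completed"]:
            continue  # resume from the state file rather than from model memory
        ok = executors[phase](state)  # an LLM call would live inside the executor
        if not ok:
            state["halted_at"] = phase  # record where a human needs to look
            break
        state["completed"].append(phase)
        state.pop("halted_at", None)
        state_path.write_text(json.dumps(state))  # persist after every phase

    state_path.write_text(json.dumps(state))
    return state
```

Because the sequence lives in `PHASES` and progress lives in the state file, a rerun skips completed phases and cannot reorder or drop a step, whatever the model outputs.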

Why deterministic pipelines outperform LLM-directed workflows →

Stage 6

Developer does

Provides specifications or intent
Reviews generated pipeline configuration
Selects pipeline type — PoC gets minimal gates, production gets full adversarial review

AI does

Reads spec, generates complete pipeline — stages, agents, gates, retry logic
Generates pipelines on demand for new artifact types
Makes non-software outputs possible without manual pipeline building

Beyond software

The system generates customized pipelines for any task, so it can produce any digital artifact, not just software. A single PRD now triggers specs across multiple domains. At Stage 5 you’d rely on pre-built pipelines; at Stage 6, the system generates new ones as needed:

PRD: “Add patient intake to the portal”

↓ generates specs for:

Intake form + API: software specs → code, tests, deploy
Staff training materials: tutorial videos, walkthroughs, quizzes
Patient-facing docs: help pages, FAQ, onboarding emails
Compliance & ops: audit logging, monitoring, runbooks

Each output is produced by its own pipeline — with validation, review, and quality gates appropriate to that artifact type. They stay consistent because they’re all derived from the same PRD.

Automated: pipeline generation, configuration, execution. Manual: specifications, architectural review.

Deep dive: Stage 6 →

A different kind of capability

Stages 1–6 are about production — building things faster, more reliably, at scale. Stages 7–9 are about awareness — what the system knows about itself, the organization it serves, and the world around it. These aren’t levels you unlock in sequence. They’re capabilities that become effective when the production infrastructure is mature enough to act on what the system learns.

At these stages, “software” is just one output. The same pipeline discipline applies to documentation, training materials, marketing, monitoring — everything the organization produces digitally.

Stage 7

Knows itself

What it built, what failed, what was learned. Manager Agents own each major feature.

Stage 8

Knows who it serves

Policies, regulations, domain conventions. Manager Agents apply institutional context automatically.

Stage 9

Acts on the world

Manager Agents don’t wait for specs — they originate work from goals, signals, and constraints.

Stages 7–9

Expanding awareness

Stages 1–6 describe production maturity — building things faster, more reliably, at scale. Stages 7–9 describe something different: an expanding sphere of awareness. The innermost ring is the system itself — what it built, what failed, what was learned. The next ring is the organization it serves — policies, standards, regulations, institutional knowledge. The outermost ring is the environment around it — technology shifts, regulatory changes, user behavior, ecosystem health. At each ring, the system can govern more — because stewards can only govern what the system can see.

Stage 7

Developer does

Governs knowledge — promotion rules, contradiction resolution, what retires
Sets approval thresholds for autonomous changes
Reviews and approves Manager Agent proposals
Defines self-reference boundaries — what the system can modify about itself

AI does

Accumulates structured knowledge across every run: specs, deficiency records, test outcomes, remediation history, model performance
Maintains the current-state system descriptor — as-built reality, continuously updated, separate from PRDs and design specs
Classifies all knowledge by type: intent, constraint, design, reality, outcome, procedural, runtime, data, security
A formal reasoning layer (logic engine, not LLM) governs policy applicability, contradiction handling, and escalation
Manager Agents introduced — reactive but informed. They know production history, not just current health signals
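The nine knowledge types listed above map naturally onto an enumeration. The sketch below assumes a simple record shape; `KnowledgeRecord`, its fields, and `index_by_type` are hypothetical names used only to show how typed knowledge might be stored and grouped.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class KnowledgeType(Enum):
    INTENT = "intent"
    CONSTRAINT = "constraint"
    DESIGN = "design"
    REALITY = "reality"
    OUTCOME = "outcome"
    PROCEDURAL = "procedural"
    RUNTIME = "runtime"
    DATA = "data"
    SECURITY = "security"


@dataclass(frozen=True)
class KnowledgeRecord:
    kind: KnowledgeType
    feature: str      # hypothetical: the feature a Manager Agent owns
    summary: str
    source_run: str   # hypothetical: id of the pipeline run that produced it


def index_by_type(records: List[KnowledgeRecord]) -> Dict[KnowledgeType, List[KnowledgeRecord]]:
    """Group records so as-built 'reality' stays separate from 'intent'."""
    index: Dict[KnowledgeType, List[KnowledgeRecord]] = defaultdict(list)
    for record in records:
        index[record.kind].append(record)
    return dict(index)
```

Keeping the type on every record is what lets a later step ask narrow questions, such as "which constraints apply to this feature", without re-reading free text.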

The defining capability — Knowledge Synthesis

PoC pipeline: build fast, minimal gates
→ Iterate: tickets, bugs, edge cases
→ Knowledge: specs, contracts, test suites
→ Synthesized MVP: built from everything learned

At Stage 7, we fully embrace what earlier stages implied: code is a disposable artifact. A proof-of-concept (PoC) accumulates knowledge over weeks of iteration — every deficiency, edge case, security finding, behavioral contract. The system synthesizes this into a new spec set and produces a clean codebase with all hard-won knowledge baked in from day one. The organization’s investment is not in the code. It is in the spec history, the deficiency records, and the behavioral contracts. The code is a rendered artifact — the latest expression of accumulated knowledge. When technology changes, the code is regenerated. Nothing learned is lost.

Automated: knowledge accumulation, feature health monitoring, Knowledge Synthesis. Manual: knowledge governance, synthesis approval.

Deep dive: Stage 7 →

Stage 8

Developer does

Connects knowledge sources — wikis, runbooks, compliance files, infrastructure repos
Resolves policy conflicts and exception approvals
Reviews escalations when regulatory implications are uncertain

AI does

Discovers, synthesizes, and applies org knowledge automatically during artifact production
Infers applicable policies — “Toronto hospital” implies PHIPA, Ontario regs, audit logging, data residency — without the developer writing any of it
Treats security as a parallel reasoning domain — trust boundaries, threat models, supply-chain risk — not just a validator step
Evaluates impact with institutional context: regulatory consequences, policy compliance, cross-system effects
Refuses to produce artifacts when compliance cannot be verified — escalates rather than guesses
Manager Agents expand: detect regulatory changes affecting their feature, monitor for institutional drift

The defining capability — automatic policy inference

Feature brief: “Patient intake form — Toronto hospital”

→ Policy inference (developer wrote none of this):

PHIPA patient data requirements
Ontario healthcare regulations
Org audit logging standards
Data residency constraints

→ Correct spec: institutionally correct before a line of code is written

⛔

The system also refuses

If compliance can’t be verified — policy conflict, uncertain regulatory implications, missing validation — the system escalates to a human rather than produce an artifact it can’t stand behind.
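Inference followed by refusal can be sketched as a small rule table plus an escalation flag. This is a deliberately simplified illustration: the rule table, the keyword-to-fact lookup, and the names `infer_policies` and `Inference` are all assumptions, and a real reasoning layer would be far richer than substring matching.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set

# Hypothetical rule table: if every fact in the key holds, the policies apply.
RULES: Dict[FrozenSet[str], List[str]] = {
    frozenset({"region:ontario", "sector:healthcare"}): [
        "PHIPA patient data requirements",
        "Ontario healthcare regulations",
        "Data residency constraints",
    ],
    frozenset({"data:patient"}): ["Org audit logging standards"],
}

# Hypothetical lookup from words in a brief to structured facts.
FACTS_FOR_TERM: Dict[str, Set[str]] = {
    "toronto": {"region:ontario"},
    "hospital": {"sector:healthcare"},
    "patient": {"data:patient"},
}


@dataclass
class Inference:
    policies: List[str] = field(default_factory=list)
    escalate: bool = False   # True means: stop and hand the brief to a human


def infer_policies(brief: str) -> Inference:
    facts: Set[str] = set()
    for term, implied in FACTS_FOR_TERM.items():
        if term in brief.lower():
            facts |= implied

    result = Inference()
    for condition, policies in RULES.items():
        if condition <= facts:  # every required fact is present
            result.policies.extend(policies)

    # Refuse rather than guess: with no recognized facts, compliance is unverifiable.
    result.escalate = not facts
    return result
```

With the brief from the example above, the sketch returns all four policies. A brief it cannot ground in any known fact comes back with `escalate=True` and no policies.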

Automated: compliance application, cross-system consistency, policy enforcement. Manual: knowledge curation, policy decisions.

Deep dive: Stage 8 →

Stage 9

Developer does

Sets goals and strategic intent
Governs budget, risk, scope, and approval thresholds
Approves or rejects proposals — including those originating from goals or user conversations
Governance design is now the most consequential activity — the system self-organizes toward whatever configuration the constraints make stable

AI does

Monitors four signal domains: operational (crashes, latency, errors), artifact integrity (stale docs, inconsistent diagrams), environmental (regulatory updates, dependency CVEs, API changes), and human (bug reports, feature requests, user conversations)
Converts user natural-language ideas into structured proposals — “It would be nice if this exported to Excel” becomes a spec candidate
Originates work from goals: goal → inferred specs → pipelines → artifacts — without a developer writing a single ticket
Coordinates via the artifact dependency graph — a change in one artifact cascades to docs, SDKs, diagrams, monitoring rules, runbooks, each handled by the responsible Manager Agent
Every proposal feeds back into Stage 7 knowledge — the cycle is continuous
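The cascade through the artifact dependency graph is, at its core, a graph traversal. The sketch below uses a breadth-first walk over a hypothetical graph; the artifact names and the `affected_artifacts` function are invented for illustration.

```python
from collections import deque
from typing import Dict, List

# Hypothetical graph: each artifact maps to the artifacts derived from it.
DEPENDENTS: Dict[str, List[str]] = {
    "intake_api": ["sdk", "api_docs", "monitoring_rules"],
    "sdk": ["tutorial"],
    "api_docs": ["tutorial"],
    "monitoring_rules": ["runbook"],
}


def affected_artifacts(changed: str, graph: Dict[str, List[str]]) -> List[str]:
    """Breadth-first walk returning every downstream artifact once, nearest first."""
    seen = {changed}
    order: List[str] = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in graph.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


print(affected_artifacts("intake_api", DEPENDENTS))
# ['sdk', 'api_docs', 'monitoring_rules', 'tutorial', 'runbook']
```

Each name in the result would then be routed to whichever Manager Agent is responsible for that artifact. The `seen` set ensures the tutorial is scheduled once even though two upstream artifacts feed it.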

The defining capability — goal-directed origination

Four signal domains

● Operational: latency spike, validator failure
● Artifact integrity: stale docs, broken tutorial
● Environmental: CVE published, regulation updated
● Human: “can the form auto-fill insurance data?”

→ Manager Agents assess: Auth Agent, Intake Agent, Reports Agent

→ Proposed work: “Patch 3 affected modules, update audit logs, regenerate docs” (est. 2h, low risk)

→ Human decision: approve or reject

The complete steward workflow: signals → analysis → impact evaluation → proposal → spec → Stage 6 pipeline → artifact update → back into Stage 7 knowledge. The cycle is continuous. Every proposal requires human approval before execution.
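The rule that nothing executes without human approval can be enforced in code rather than by convention. The sketch below is one hypothetical way to do it; `Proposal`, `Status`, `decide`, and `execute` are illustrative names, not part of any described system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Proposal:
    summary: str
    estimate_hours: float
    risk: str
    status: Status = Status.PENDING

    def decide(self, approved: bool) -> None:
        """Record the human decision; a proposal can only be decided once."""
        if self.status is not Status.PENDING:
            raise ValueError("proposal already decided")
        self.status = Status.APPROVED if approved else Status.REJECTED


def execute(proposal: Proposal, run_pipeline: Callable[[Proposal], str]) -> str:
    """Trigger the pipeline only for proposals a human has approved."""
    if proposal.status is not Status.APPROVED:
        raise PermissionError(f"cannot execute: proposal is {proposal.status.value}")
    return run_pipeline(proposal)
```

Putting the check inside `execute` means an agent that skips the approval step gets an exception instead of a silently triggered pipeline.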

Goal: “Improve patient onboarding”

Redesigned intake workflows, updated patient documentation, compliance changes, staff training materials, monitoring dashboards

All inferred from the goal. All coordinated across Manager Agents. All governed.

Automated: signal monitoring, impact assessment, pipeline triggering, cross-agent coordination. Manual: goal setting, budget governance, approval.

Deep dive: Stage 9 — Proactive Origination →
