A structured workflow for building features with AI coding agents
The biggest risk with AI coding agents isn’t that they write bad code. It’s that they solve the wrong problem. Hand an agent a vague prompt and it will confidently produce something — often impressive, occasionally correct, rarely what you actually needed.
The fix isn’t better prompts. It’s structure. A workflow that separates thinking from doing, uses the agent’s strengths (tireless research, parallel execution, consistent follow-through), and keeps human judgement where it matters most: deciding what to build and whether it’s right.
Here’s the seven-step process I use with Claude Code. The principles apply to any AI coding agent, but the specifics lean on Claude Code’s sub-agent system and task management.
Step 1: Gather requirements
Nothing changes here. Talk to stakeholders, understand the problem, capture constraints. No AI agent can substitute for the conversation where a product manager says “oh, and it also needs to work offline” or a designer sketches a flow on a whiteboard.
The AI workflow starts after you know what you’re building. Requirements don’t need to be a formal PRD — a clear written summary of what the feature should do, who it’s for, and what constraints exist is enough. The important thing is that it’s written down, because everything downstream depends on it.
Step 2: High-level scoping with parallel research
This is where the agent earns its keep. Feed the requirements into Claude Code and instruct it to research the codebase before proposing anything. The key is parallelism — spawn multiple sub-agents, each investigating a different angle simultaneously.
A prompt might look like this:
Here are the requirements for [feature]. Before proposing an approach,
I need you to research our codebase thoroughly. Launch parallel sub-agents to:
1. Find all existing code related to [domain area] — models, services,
API endpoints, tests. Map out the current architecture.
2. Identify similar patterns we've used before for [comparable feature].
How did we handle [specific concern]?
3. Check for potential conflicts — what existing functionality might
this feature affect? Are there shared components or database tables
that would need changes?
Synthesise the findings into a high-level scoping document covering:
proposed approach, files and systems affected, risks, and open questions.
The sub-agents fan out across the codebase, each working in its own context window. One might trace the data model, another map the API layer, a third check test coverage. The results come back to the main session, which synthesises them into a single scoping document.
The output is a markdown file — not code, not a PR. Just a clear description of the proposed approach, the parts of the system it touches, and the questions that still need answering.
Step 3: Review the scope with specialised sub-agents
Read the scope yourself first. You’ll catch things the agent missed — business logic nuances, team conventions that aren’t documented, political considerations that don’t live in code.
Then run it through a second round of AI review. This time, the prompt dispatches specialised reviewers:
Read the scoping document at [path]. Dispatch parallel review agents:
1. Architecture review — does the proposed approach fit our existing
patterns? Are there simpler alternatives?
2. Edge case analysis — what failure modes, race conditions, or data
integrity issues could arise?
3. Dependency review — what existing tests, features, or integrations
might break?
Compile the findings into a review summary with specific concerns
and suggested changes.
This is a feedback loop. The review produces concerns, you update the scope (or ask the agent to), and run the review again. Two or three rounds is usually enough. The goal isn’t perfection — it’s catching the structural problems before they become implementation problems.
The writer/reviewer separation matters here. A fresh agent reviewing a document it didn’t write will catch things the original author missed, just like human code review. If the same session both writes and reviews, it’s biased toward defending its own decisions.
Step 4: Human sign-off
Share the scope with relevant dev team members. This is a sanity check, not a rubber stamp. You’re looking for:
- “We tried something similar last year and it didn’t work because…”
- “This would conflict with what Team X is building right now”
- “The data model assumption in section 3 is wrong”
AI-generated scope is a starting point for discussion, not a substitute for team buy-in. The document is a communication tool — it gives the team something concrete to react to rather than debating in the abstract.
Once the team is satisfied, sign it off. This is the commitment point.
Step 5: Detailed implementation plan
Convert the approved scope into a step-by-step implementation plan. The scope says what and why; the plan says how, in order.
Read the approved scoping document at [path]. Convert it into a
detailed implementation plan. For each step, specify:
- Which files to create or modify
- What functions, classes, or components to write
- What tests to add (unit, integration, or both)
- Dependencies on previous steps
- Verification criteria — how do we know this step is done?
Order the steps so each one builds on the last and can be tested
independently.
The plan should be granular enough that each step could be a single commit. “Add user authentication” is too vague. “Add validateSession middleware to src/middleware/auth.ts that checks JWT expiry and returns 401 for expired tokens” is about right.
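To make that level of granularity concrete, here is a minimal sketch of what such a step might produce. It assumes an Express app and the jsonwebtoken package; the file path and middleware name come from the example above, but the secret handling and error messages are illustrative rather than part of the workflow.

```typescript
// src/middleware/auth.ts: hypothetical sketch, assuming Express and jsonwebtoken.
import { Request, Response, NextFunction } from 'express';
import jwt, { TokenExpiredError } from 'jsonwebtoken';

export function validateSession(req: Request, res: Response, next: NextFunction) {
  const header = req.headers.authorization ?? '';
  const token = header.startsWith('Bearer ') ? header.slice(7) : null;

  if (!token) {
    return res.status(401).json({ error: 'Missing session token' });
  }

  try {
    // jwt.verify throws TokenExpiredError when the token's exp claim is in the past.
    const payload = jwt.verify(token, process.env.JWT_SECRET as string);
    res.locals.session = payload;
    return next();
  } catch (err) {
    if (err instanceof TokenExpiredError) {
      return res.status(401).json({ error: 'Session expired' });
    }
    return res.status(401).json({ error: 'Invalid session token' });
  }
}
```

A step written at this level of detail leaves the agent almost nothing to guess about, which is exactly the point.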
Step 6: Plan review feedback loop
Same process as step 3, applied to the implementation plan. The scope review asked “are we building the right thing?” — the plan review asks “will this sequence of steps actually get us there?”
Specific things the review agents should check:
- Ordering — are there hidden dependencies between steps? Will step 4 fail because step 3 didn’t set up a prerequisite?
- Completeness — does the plan cover everything in the scope? Are there scope items with no corresponding implementation step?
- Testability — can each step be verified independently, or are there steps that only work once everything is wired together?
- Migration safety — if the plan involves database changes, are they backwards-compatible? Is there a rollback path? (See the sketch after this list.)
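For the migration point above, a backwards-compatible change with a rollback path might look like this hypothetical Knex migration: the new column is nullable, so existing rows and old code keep working, and down() reverses the change. The table and column names are illustrative.

```typescript
// Hypothetical Knex migration: additive and reversible.
import { Knex } from 'knex';

export async function up(knex: Knex): Promise<void> {
  await knex.schema.alterTable('sessions', (table) => {
    // Nullable column: existing rows and application code that never writes it keep working.
    table.timestamp('expires_at').nullable();
  });
}

export async function down(knex: Knex): Promise<void> {
  // Rollback path: drop the column again.
  await knex.schema.alterTable('sessions', (table) => {
    table.dropColumn('expires_at');
  });
}
```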
This feedback loop is typically faster than the scope review — the structural decisions are already made, so you’re mostly catching gaps and ordering issues.
Step 7: Execute with small, testable tasks
Hand the final plan to a fresh Claude Code instance. A clean context window is important here — you don’t want the implementation session carrying baggage from hours of scoping and review.
The instruction is straightforward:
Read the implementation plan at [path]. Break it down into the
smallest possible testable tasks. Work through them one at a time:
- Before implementing each task, add it to your task list
- Write tests first where applicable
- After completing each task, verify it against the implementation plan
- Run the test suite before moving to the next task
- If a task reveals a problem with the plan, stop and flag it
Do not move to the next task until the current one passes verification.
The agent works through the task list methodically — writing a test, implementing the code, running the suite, checking it off, moving on. The task list is the accountability mechanism. If something goes wrong at step 12 of 20, you can see exactly where things stand and what’s left.
This is where test-driven development really shines with AI agents. A failing test is unambiguous feedback. The agent doesn’t need to guess whether its implementation is correct — the test tells it. And if a task turns out to be more complex than the plan anticipated, the agent flags it rather than improvising.
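As an illustration, a test-first task for the validateSession step sketched earlier might begin with something like this, assuming Vitest; the agent writes the test, watches it fail, then implements until it passes.

```typescript
// tests/auth.test.ts: hypothetical failing-first test, assuming Vitest and jsonwebtoken.
import { describe, it, expect, vi } from 'vitest';
import jwt from 'jsonwebtoken';
import { validateSession } from '../src/middleware/auth';

describe('validateSession', () => {
  it('returns 401 for an expired token', () => {
    process.env.JWT_SECRET = 'test-secret';
    // Sign a token whose exp claim is already one minute in the past.
    const expired = jwt.sign(
      { sub: 'user-1', exp: Math.floor(Date.now() / 1000) - 60 },
      'test-secret'
    );

    const req = { headers: { authorization: `Bearer ${expired}` } } as any;
    const res = { status: vi.fn().mockReturnThis(), json: vi.fn(), locals: {} } as any;
    const next = vi.fn();

    validateSession(req, res, next);

    expect(res.status).toHaveBeenCalledWith(401);
    expect(next).not.toHaveBeenCalled();
  });
});
```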
Why this works (and the common objections)
The obvious criticism: this looks like waterfall with extra steps. Marmelab published a pointed critique of spec-driven development, arguing that it reintroduces the planning overhead that agile was supposed to eliminate, and that agents don’t reliably follow their own specifications anyway.
Fair points. But the comparison misses something. Traditional waterfall failed because the feedback loop between “write the spec” and “discover it’s wrong” was measured in months. Here, the entire scoping and planning cycle takes hours, not weeks. You’re doing what Addy Osmani calls “a waterfall in 15 minutes” — compressing the planning phase until the overhead is negligible compared to the bugs it prevents.
A peer-reviewed study of professional developers found they overwhelmingly reject “vibe coding” — trusting agents blindly — in favour of deliberate planning and validation. The developers who get the best results from AI agents aren’t the ones with the cleverest prompts. They’re the ones who plan before implementing and verify every output.
The real tradeoff is upfront structure versus rework. An hour of scoping and review is cheap compared to discovering at implementation time that the approach doesn’t fit the existing architecture, or that you missed a critical edge case that requires rethinking the data model.
Practical takeaways
- Planning is the multiplier. The quality of AI-generated code is determined by how clearly you scope and describe the task, not by which model you use.
- Separate writing from reviewing. Fresh context catches things the original author misses — this applies to AI sessions just as much as human code review.
- Use parallelism for research, not implementation. Sub-agents are excellent for investigating a codebase from multiple angles simultaneously. Implementation should be sequential and verified at each step.
- Keep humans in the loop at decision points. AI handles the volume; humans handle the judgement. Team sign-off on the scope is non-negotiable.
- Small tasks with tests beat large tasks without them. The tighter the feedback loop, the less damage a wrong turn can do.
None of this is revolutionary. It’s the same discipline that makes any engineering process work — clear requirements, reviewed plans, incremental execution, continuous verification. AI agents just make each phase faster while making the discipline more important, not less.