The AI coding workflow is getting simpler

The useful part of Theo Browne’s latest video on how he codes with AI is not the tool swap. Everyone is swapping tools. The useful part is that his workflow has become more conversational.

Five months ago, his process leaned heavily on Cursor, Opus, plan mode, and detailed plans. Now the emphasis has moved toward Codex, T3 Code, fresh threads, shorter prompts, better project context, screenshots, remote control, and verification. The model choice will age quickly. The more durable lesson is simpler: tell the model what you’re actually trying to build, let it tell you how it thinks the thing should work, then push back until you share the same shape.

That lesson maps closely to where my own usage has landed. The best results do not come from feeding the model a giant bundle of potentially relevant details. They come from stating the goal clearly, having the conversation, and adding detail only when the model shows you which detail matters.

Theo’s best line in the video is also the whole point: “The simpler your flow, the better.”

The tool choice is a symptom

Theo spends plenty of time talking about Codex, T3 Code, Claude Code, Cursor, and the difference between harnesses and interfaces. That part is useful, but it is easy to mistake it for a buyer’s guide.

The real change sits underneath the tool choice: the center of gravity has moved away from the IDE sidebar and toward the conversation as the workspace.

OpenAI describes the Codex app as a command center for agents: separate threads, project organization, diff review, worktrees, and continuity across the CLI and IDE extension. T3 Code describes itself as a minimal web GUI for coding agents, currently wrapping Codex, Claude, and OpenCode rather than being its own model or harness.

Agentic coding now means managing conversations, diffs, terminals, previews, screenshots, logs, review comments, and long-running tasks. If the tool makes conversation awkward, you will compensate with bad habits.

If starting a new thread feels expensive, you keep shoving unrelated work into the same context. If pasting a screenshot is annoying, you describe visual bugs badly. If browser verification is hard, you stop asking the agent to check what it built. If remote work only works through a brittle terminal session, you avoid kicking off work unless you are sitting at the right machine.

The interface has to fit that reality. The workflow has moved beyond “run a command and read stdout.”

Context is not a bigger prompt

The easiest mistake is to respond to better models by writing longer instructions.

Theo’s updated approach goes the other way. His AGENTS.md for Lakebed is not a giant file full of file paths and rigid implementation rules. It is closer to an onboarding note: what the project is, why it exists, how to think about users, and what certain terms mean.

Persistent instructions are good at this. Codex reads AGENTS.md before doing work, layering global and project-specific guidance. Claude Code has the same broad pattern with CLAUDE.md, which it uses for project instructions, conventions, and shared context.
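To make the layering idea concrete, here is a minimal Python sketch that collects instruction files from the broadest scope to the most specific one. It is an illustration only, not how Codex or Claude Code resolves these files internally. The function names, the `global_dir` argument, and the "most specific file goes last" ordering are all assumptions made for the example.

```python
from pathlib import Path
from typing import List, Optional


def collect_instruction_files(
    global_dir: Optional[Path],
    repo_root: Path,
    cwd: Path,
    filename: str = "AGENTS.md",
) -> List[Path]:
    """Return existing instruction files, broadest scope first.

    Hypothetical helper: raises ValueError if cwd is not inside repo_root.
    """
    candidates = []
    if global_dir is not None:
        candidates.append(global_dir / filename)

    # Walk from the repo root down to the working directory.
    root = repo_root.resolve()
    relative = cwd.resolve().relative_to(root)
    current = root
    candidates.append(current / filename)
    for part in relative.parts:
        current = current / part
        candidates.append(current / filename)

    # Keep only the files that actually exist.
    return [path for path in candidates if path.is_file()]


def layered_instructions(paths: List[Path]) -> str:
    """Concatenate files in order so more specific guidance appears last."""
    return "\n\n".join(p.read_text(encoding="utf-8").strip() for p in paths)
```

The point of the sketch is the ordering, not the mechanics: broad, human-level context comes first, and narrower notes only appear where a subdirectory genuinely needs them.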

But those files are not magic. They work best when they remove friction from normal conversation. The goal is not to preload every possible implementation detail. The goal is to make plain language work:

```text
We are building a small product for developers who use agents to ship apps.

"Developer" means the person using the product.
"Agent" means the AI system the developer delegates work to.
"We" means the humans building this codebase.

Prefer changes that keep the local workflow simple, even if that means
the first version supports fewer edge cases.
```

That kind of context does not tell the model which file to edit. It tells the model how to interpret the work when you say what you want.

Context makes short prompts viable. Ceremony makes every prompt carry the weight of a tiny spec document.

Start with the goal, then talk

Theo’s examples have very little prompt engineering.

He shows prompts that are two sentences: add support for server-side environment variables, or let an admin manually bump a user’s rate limits. The agent explores the codebase, proposes the shape of the change, then builds it.

These prompts work because each one is aimed at the right level. It says what should become true. It does not pretend to know every file, function, and edge case before the model has even looked around.

Claude Code’s prompt library makes the same point in its own way: describe the outcome, not the steps; give the model a way to check its work; point it at a reference when consistency matters. That is delegation, not incantation.

A good prompt for this style is often plain:

```text
I want users to be able to bring server-side environment variables
for deployments.

The local source of truth should be a file in the project. Running
deploy should sync it to the hosted environment. First tell me the
shape of the change and any tradeoffs I should decide on.
```

The last sentence does the work. Do not rush straight to implementation when the shape is still fuzzy. Let the model explain how it thinks the feature should work. Read that answer as a design review, not as disposable preamble.

Then have the conversation.

If the model proposes too much, narrow it. If it misses the user goal, restate the goal. If it makes a bad product assumption, correct the assumption. If it hand-waves cost, operations, permissions, naming, or backwards compatibility, ask about that specific tradeoff. If the idea is easier to explain with an example, give it an example instead of adding three more paragraphs of abstract instruction.

Only once the model is thinking about the problem the same way you are should you ask it to build.

The agent does not need a recipe for every step. It needs the outcome, the constraints, and enough conversation to make sane tradeoffs.

Goal modes are for the stopping condition

/goal becomes useful after the conversational part has done its job.

Codex’s goal docs describe /goal as a way to give Codex one durable objective with a verifiable stopping condition. Claude Code’s /goal docs use the same basic shape: set a completion condition, then let the agent keep working across turns until the condition is met.

This should not push you back into giant upfront prompts. Codex puts the order in the right place: start by having a conversation about what you want to build, then ask it to set a goal and start working. I like that framing. The conversation is where you discover the real goal. Goal mode is where you write down the contract.

A bad goal is just a backlog disguised as autonomy:

```text
/goal improve the app, fix bugs, make the UI better, and clean up the code
```

A useful goal is narrower:

```text
/goal Implement server-side environment variables for deployments. Stop when
deploy reads the local env file, syncs missing variables to the hosted
environment, the existing deploy tests pass, and you have shown the affected
flow still works.
```

I still would not send that as the first message. First I would talk through where the file should live, how it should interact with secrets, what should happen on conflict, and what counts as a safe deployment. Once those tradeoffs are clear, /goal is a good way to keep the agent moving without turning the thread into a babysitting session.

Read the conversation, not just the diff

Theo says this directly: you have to “read what it says.”

It sounds obvious until you watch developers use these tools. We skim the text, jump to the diff, then complain when the agent built the wrong thing. But the conversation is the work. It is where the model exposes its assumptions. If the assumptions are wrong, the code is already suspect.

For bigger changes, I care less about whether the first answer is perfectly formatted and more about whether the agent is thinking in the right frame:

Does it understand the user problem?
Is it choosing the same tradeoffs I would choose?
Is it naming the risks I was worried about?
Is it asking about the parts that are genuinely ambiguous?
Is it making the problem simpler before writing code?

If not, I steer the conversation before asking for implementation. Sometimes that means pushing back. Sometimes it means adding a concrete example. Sometimes it means telling the model to be shorter because the answer is too padded to read carefully.

Unread context is wasted context.

Fresh threads beat heroic context windows

The most useful habit in the video is starting fresh threads constantly.

Theo says he started more than a hundred threads on one project in five days, but most of them were not running in parallel worktrees. He would do one task, finish it, then start the next task in a new thread.

This feels counterintuitive if you have been trained to preserve as much context as possible. But old context is not neutral. A thread about environment variables leaves residue when the next task is about rate limits. It changes what the model pays attention to. It makes some files, terms, and decisions feel more important than they are.

For me, the rule is simple:

Same concern, same thread.
New concern, new thread.
Big uncertainty, planning thread first.
Risky implementation, fresh build thread after the plan is clear.

Yes, the agent has to re-explore the codebase. That is fine. Exploration is cheap compared with debugging a model that is confidently optimizing for the wrong local history.

Parallel worktrees still have a place. They are useful for isolated experiments, risky refactors, or tasks that can proceed independently. But running ten agents does not mean you can keep ten pieces of work in your head.

The limiting factor is often human review bandwidth, not agent throughput.

Remote work changes the shape of delegation

Theo’s remote coding section resonated because it gets at a practical annoyance: agent work should not depend on your laptop staying open in exactly the right place.

OpenAI added Codex remote access from the ChatGPT mobile app, so you can continue threads, answer questions, approve actions, review diffs, and move across connected hosts from your phone while Codex runs on a Mac host. T3 Code’s remote access docs recommend using a trusted private network such as a tailnet when connecting to a T3 Code server from another device.

The product details will keep changing. The stable principle is that the agent should run where the repo, tools, secrets, browsers, and test environment live. You should be able to supervise it from wherever you are.

Pure SSH terminal workflows increasingly feel wrong for agent coding. Terminals are great for many things. They are poor containers for screenshots, visual diffs, browser checks, image prompts, multiple threads, and review queues.

I still use CLIs for quick local tasks. For real product work, I want the agent in an environment where it can see enough, verify enough, and hand back enough evidence that I can make a decision quickly.

Verification is the workflow

The prompt is incomplete if the agent does not know how to prove it is done.

Codex is interesting here. OpenAI’s April update says Codex can operate computer apps with its own cursor, use an in-app browser, review PRs, work with multiple terminal tabs, and connect to remote devboxes over SSH in alpha. That moves verification closer to the coding loop instead of leaving it as a separate human chore.

I usually make that explicit:

```text
Build the change.

Before handing it back:
- run the relevant tests
- start the app if needed
- open the affected flow in the browser
- check the UI against the screenshot
- summarize what passed and what you could not verify
```

For backend work, the proof might be a test suite, a migration dry run, or a CLI command. For frontend work, it is often a screenshot. For integration work, it might be a deployed preview plus logs. For review feedback, it can be a loop that keeps pulling comments until no actionable items remain.
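The "summarize what passed and what you could not verify" step can also be expressed as a small script the agent runs before handing work back. What follows is a minimal Python sketch under stated assumptions: `Check`, `run_checks`, and `summarize` are hypothetical names, and the example command is a placeholder for whatever your project actually uses. It only covers command-line checks, not browser or screenshot verification.

```python
import shutil
import subprocess
import sys
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Check:
    name: str
    command: List[str]  # placeholder; use your project's real commands


def run_checks(checks: List[Check]) -> Dict[str, List[str]]:
    """Run each check and sort it into passed, failed, or not verified."""
    report: Dict[str, List[str]] = {"passed": [], "failed": [], "not_verified": []}
    for check in checks:
        # A missing tool means "could not verify", which is not a failure.
        if shutil.which(check.command[0]) is None:
            report["not_verified"].append(check.name)
            continue
        result = subprocess.run(check.command, capture_output=True, text=True)
        bucket = "passed" if result.returncode == 0 else "failed"
        report[bucket].append(check.name)
    return report


def summarize(report: Dict[str, List[str]]) -> str:
    """Render the report as the short evidence summary to hand back."""
    lines = []
    for bucket in ("passed", "failed", "not_verified"):
        names = ", ".join(report[bucket]) or "none"
        lines.append(f"{bucket.replace('_', ' ')}: {names}")
    return "\n".join(lines)


# Example check list; the single entry just confirms the interpreter runs.
EXAMPLE_CHECKS = [Check("python is runnable", [sys.executable, "--version"])]
```

Keeping "not verified" separate from "failed" is the design choice that matters here: it stops the agent from silently skipping a check and still reporting success.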

The habit matters more than the tool: never let the agent stop at “I changed the files.” Make it show the evidence.

PRs are review containers, not proof

Theo’s PR flow is also more nuanced than the usual “everything should be a PR” advice.

On solo work, he often commits straight to main when the change is small and obvious. For security-sensitive changes, hosting changes, or work that needs another set of eyes, he uses PRs as review containers. Then he brings in CodeRabbit, other agents, or CLI review tools to find issues and asks the agent to fix them.

A PR is not proof that a change is good. It is a place to collect review, CI, discussion, and a merge decision.

When tools make PR creation one-click, they also make PR bloat easy. You can fill a repo with branches that felt promising for ten minutes and then quietly went stale. At that point, the agent can help again: compare the branch to latest main, decide whether any useful work remains, fix conflicts if it is worth keeping, or recommend closing it.

My default is:

Use direct commits when the blast radius is small and you are the only reviewer.
Use a PR when the decision needs review, CI, history, or second opinions.
Use an agent to clean up stale branches before they become archaeology.
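The stale-branch cleanup can start from a very small amount of data: how far a branch has drifted from main. Here is a rough Python sketch of that first pass. It relies on `git rev-list --left-right --count`, which is a real git command, but the `recommend` rule and its `max_behind` threshold are arbitrary assumptions for illustration, not something from the video.

```python
import subprocess
from typing import Tuple


def parse_counts(rev_list_output: str) -> Tuple[int, int]:
    """Parse `git rev-list --left-right --count base...branch` output.

    Git prints two numbers: commits only on the left side (base), then
    commits only on the right side (branch).
    """
    behind, ahead = (int(n) for n in rev_list_output.split())
    return behind, ahead


def branch_divergence(branch: str, base: str = "main") -> Tuple[int, int]:
    """Return (behind, ahead) for branch relative to base. Needs a git repo."""
    result = subprocess.run(
        ["git", "rev-list", "--left-right", "--count", f"{base}...{branch}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_counts(result.stdout)


def recommend(behind: int, ahead: int, max_behind: int = 50) -> str:
    """Hypothetical triage rule; the threshold is an arbitrary assumption."""
    if ahead == 0:
        return "close"  # nothing on the branch that base does not already have
    if behind > max_behind:
        return "ask agent to evaluate"  # far behind: rebasing may cost more than it is worth
    return "keep"
```

Note that `ahead == 0` will not catch branches that were squash-merged, because their commits differ from what landed on main. That judgment call is exactly the part worth handing to the agent.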

Teams need stricter defaults than solo projects, but the principle holds. PRs are a boundary for review. They are not the unit of productivity.

What I am taking from it

The video is chaotic in the useful way: a working developer showing the mess of a live workflow rather than presenting a tidy framework after the fact.

The lesson is smaller than a framework.

Say what you are trying to build before you say how to build it. Do not open with the implementation unless the implementation is the point.

Let the model tell you how it thinks the thing should work. Treat that answer as something to discuss, not something to skim on the way to the diff. Push back in normal language: correct assumptions, add examples, narrow scope, ask about tradeoffs, and steer the model until the shape feels right.

Keep persistent instructions short and human. Explain the product, terms, values, and constraints. Do not turn AGENTS.md into a brittle operating manual full of details that may not matter for the next task.

Use an app-style surface when the task needs screenshots, diffs, browsers, review comments, or parallel threads. The CLI is still useful, but serious agent work now needs a richer workspace.

Start new threads more often. A clean context is usually worth more than a long memory of unrelated work.

Give examples before you over-explain. A concrete user flow, screenshot, command, or URL often beats a pile of abstract requirements.

Read the agent’s reasoning before the diff. If the frame is wrong, the implementation probably is too.

Make verification part of the prompt. Tests, browser checks, screenshots, logs, and review loops are what turn generated code into shippable work.

Use PRs deliberately. They are review boundaries, not proof that work happened.

None of this needs to become a grand methodology. The best AI coding workflow is becoming less about clever prompt systems and more about boring delegation: tell the agent the goal, talk through the shape, keep the thread clean, and make it verify the work.