Stop giving your AI agent tasks — give it goals
You’ve managed people before. You know the type of lead who says “move that button 3px to the left, change the font to 14px, add 8px padding.” And you know the one who says “users are abandoning the checkout flow — figure out why and fix it.”
The second person gets better results. Not because they’re lazier, but because they’re giving their team the one thing that makes autonomous work possible: context about what actually matters.
Most people working with AI agents today are the first type. They write meticulous step-by-step instructions, and then wonder why the output feels brittle, slightly off, or weirdly literal. The agent did exactly what you asked. That’s the problem.
The task trap
Our instinct with AI is to be specific. Painfully, exhaustively specific. It feels responsible — you’re not “vibe coding,” you’re being precise. You write a prompt that reads like a recipe: do this, then this, then that, in this order, using these exact tools.
It backfires for the same reason micro-management backfires with humans. When the agent hits something unexpected — and it always does — it has no judgement to fall back on. You told it what to do, but never why. So when step 4 doesn’t work as planned, the agent either barrels ahead anyway or stops dead. It can’t adapt because you never gave it the information it would need to improvise.
There’s something deeper here too. When you over-specify, you’re essentially pre-solving the problem and then asking the AI to type up your solution. You’re paying for a senior engineer and using them as a transcriptionist. The model’s actual strength — reasoning about problems, exploring alternatives, catching things you missed — gets bypassed entirely.
What changes when you share the goal
The difference is concrete. Anthropic’s own Claude Code documentation lays out before-and-after examples that make the pattern obvious:
Task-style: “Add tests for foo.py.”
Goal-style: “Write tests for foo.py covering the edge case where the user is logged out. Avoid mocks.”
Task-style: “Fix the login bug.”
Goal-style: “Users report that login fails after session timeout. Check the auth flow in src/auth/, especially token refresh. Write a failing test that reproduces the issue, then fix it.”
Task-style: “Make the dashboard look better.”
Goal-style: “[paste screenshot] Implement this design. Take a screenshot of the result and compare it to the original. List differences and fix them.”
The goal-style prompts aren’t longer for the sake of it. They do three things the task-style versions don’t: they explain why (the user-facing symptom), point to where (specific files or patterns to reference), and define what success looks like (verification criteria the agent can check itself).
That last one matters more than people realise. When you give an agent a way to verify its own work — a test to pass, a screenshot to match, a build to succeed — you’ve replaced micro-management with accountability. You’re managing by outcomes instead of by process.
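To make "a test to pass" concrete, here is a minimal sketch of what a machine-checkable success criterion for the login-after-timeout example might look like. Everything here is hypothetical — the `Session` class, the timeout, and the `refresh()` fix stand in for whatever the real auth module contains:

```python
import time

# Toy stand-in for a session with a time-to-live; the real auth
# flow, names, and timeout values are all assumptions.
class Session:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.issued_at = time.monotonic()

    def expired(self) -> bool:
        return time.monotonic() - self.issued_at > self.ttl

    def refresh(self) -> None:
        # The fix under test: re-issue the token instead of failing.
        self.issued_at = time.monotonic()

def login_after_timeout(session: Session) -> bool:
    """The success criterion the agent can check itself:
    login still works after the session has timed out."""
    if session.expired():
        session.refresh()
    return not session.expired()

# A verification the agent can run on its own after each change:
session = Session(ttl_seconds=0.01)
time.sleep(0.05)                     # force the timeout
assert session.expired()             # reproduces the reported symptom
assert login_after_timeout(session)  # passes once refresh() exists
```

The point isn't the toy code — it's that the assertion encodes the user-facing symptom, so the agent can tell on its own whether the fix actually worked.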
The senior coworker model
OpenAI’s documentation for their reasoning models makes this distinction explicit. They describe reasoning models as “senior coworkers” — you give them a goal to achieve and trust them to work out the details. Their GPT models, by contrast, are “junior coworkers” who perform best with explicit instructions.
Andrej Karpathy put it more bluntly: “It’s not magic, it’s delegation.” He draws a direct parallel to managing human teams: “People who decompose work well for junior engineers decompose it well for agents too, while those who micromanage humans micromanage machines and get about as far.”
I keep coming back to that line. The skills that make someone a good engineering manager — scoping work clearly, providing context, defining success criteria, trusting people to figure out the details — turn out to be exactly the skills that make someone effective with AI agents. The people who struggle to delegate to humans struggle to delegate to AI for the same reasons.
Karel Van den Bussche captured the abstraction shift well in a recent dev.to post: “A prompt says ‘write me a function that does X.’ A workstream says ‘here is the problem, here are the files involved, here are the acceptance criteria, and here is how I will verify the result.’” His key observation: if you’re rewriting most of what AI produces, the problem probably isn’t the AI. You’re operating at too low an abstraction level.
Context over commands
The industry is starting to call this “context engineering” — a deliberate shift from crafting the perfect prompt to architecting the information environment an agent operates within. Simon Willison defines it as “carefully and skillfully constructing the right context to get great results from LLMs.” The question stops being “how do I phrase this instruction?” and becomes “what does the model need to know to act correctly?”
This plays out practically in how you set up your CLAUDE.md or AGENTS.md files. The best ones read like onboarding docs for a new hire, not scripts for a robot. They share architectural intent: here’s our tech stack, here’s why we structured it this way, here are the conventions that differ from defaults, here’s what good code looks like in this codebase.
GitHub analysed over 2,500 repositories with AGENTS.md files and found a consistent pattern: “One real code snippet showing your style beats three paragraphs describing it.” The top-performing agent config files cover six areas — commands, testing, project structure, code style, git workflow, and boundaries — and they stay lean. Under 300 lines. Because here’s the thing: every line in your agent config competes for attention with the actual work. Bloat causes the important stuff to get lost.
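As a rough illustration of those six areas, here is a sketch of what a lean agent config might look like. Every command, path, and rule below is invented for the example, not taken from GitHub's analysis:

```markdown
# AGENTS.md (sketch — all commands and paths are hypothetical)

## Commands
- `npm run dev` — local server
- `npm test` — unit tests; run before every commit

## Testing
- Tests live next to source: `src/foo.ts` → `src/foo.test.ts`
- Prefer in-memory fakes over mocks

## Project structure
- `src/auth/` — session and token handling
- `src/ui/` — components only, no business logic

## Code style
One real snippet beats three paragraphs:

    export async function getUser(id: string): Promise<User | null>

## Git workflow
- Small PRs, imperative commit messages

## Boundaries
- Never edit `migrations/` by hand
```

Note what isn't here: nothing the agent could learn by reading the code, and nothing restating defaults. Every line earns its place.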
Anthropic’s own guidance describes a “Goldilocks zone” between two failure modes: hardcoding complex, brittle logic (micro-management) and vague, high-level guidance that fails to give the model concrete signals (under-management). The sweet spot is clear goals with enough context to act on them.
The “interview me” inversion
There’s a pattern in the Claude Code docs that flips the whole dynamic on its head. Instead of you prescribing what the agent should do, you tell it what you want to build and ask it to interview you:
I want to build [brief description]. Interview me about
technical implementation, UI/UX, edge cases, and tradeoffs.
Don't ask obvious questions — dig into the hard parts
I might not have considered.
Keep going until we've covered everything, then write
a complete spec.

This is managing at its purest. You’re not scripting the agent’s behaviour. You’re defining a goal (understand this problem deeply enough to write a spec) and letting the agent determine what information it needs. The result is usually better than what you’d get by trying to anticipate every detail upfront, because the agent asks questions you didn’t think to answer.

Once the spec exists, you start a fresh session and hand it off for implementation. The spec becomes the shared understanding — the “why” and “what success looks like” that the implementing agent references throughout its work.
Addy Osmani describes this broader pattern well: “You start with a plan. Before prompting anything, you write a design doc or spec. You break the work into well-defined tasks. You decide on the architecture. This is the part vibe coders skip, and it’s exactly where projects go off the rails.”
What this actually looks like day-to-day
The shift is less dramatic than it sounds. You don’t need a formal methodology. It’s mostly about catching yourself when you’re about to write a step-by-step recipe and asking: what if I just told the agent what I’m trying to achieve instead?
Tell it why, not just what. “We need to handle session timeouts gracefully because users are losing work” gives the agent a north star that “add a timeout handler to auth.ts” doesn’t.
Define success criteria, not steps. “The user should be able to resume where they left off after re-authenticating” is more useful than a sequence of implementation instructions. The agent can figure out the steps. It can’t figure out what you actually wanted.
Keep your config files lean and goal-oriented. If a rule in your CLAUDE.md is something the agent would figure out by reading the code, cut it. Every unnecessary instruction dilutes the important ones.
Restart instead of correcting. If you’ve course-corrected twice and the agent still isn’t getting it, the problem is almost certainly the initial framing. Start a fresh session with a better-scoped goal rather than layering patches on a confused context.
Verify outcomes, not process. Give the agent a test suite, a screenshot, a build command — something it can check against. Self-verification beats step-by-step oversight every time.
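The "verify outcomes, not process" idea can be sketched in a few lines: instead of prescribing steps, you hand the agent a list of machine-checkable criteria and let it iterate until they all pass. The checks below are toy stand-ins — a real setup would shell out to a test suite or build command:

```python
# Outcome-based verification: each criterion is a named check the
# agent can run itself; it fixes failures and re-runs, rather than
# following a prescribed sequence of steps.
def failing_outcomes(outcome_checks):
    """Return the names of criteria that do not yet pass."""
    return [name for name, check in outcome_checks if not check()]

# Toy criteria standing in for "tests pass" and "build succeeds":
checks = [
    ("tests pass", lambda: 1 + 1 == 2),
    ("build succeeds", lambda: True),
]
assert failing_outcomes(checks) == []  # nothing left to fix
```

The shape matters more than the code: success is a list the agent can empty, not a script it must follow.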
None of this means you should be vague. “Make it good” is not a goal. The difference between a task and a goal isn’t specificity — it’s altitude. A goal can be extremely specific about what success looks like while leaving the path open. “Reduce the checkout abandonment rate by making the flow completable in under 60 seconds” is specific, measurable, and gives the agent complete freedom in how to get there.
The people getting the best results from AI agents right now aren’t writing better prompts. They’re better managers.