Superpowers: the Claude Code plugin that enforces what you already know you should do
The biggest problem with AI coding agents isn’t capability. Claude can write a full-stack feature, refactor a legacy module, and generate comprehensive test suites. The problem is discipline. Left to its own devices, Claude will skip the design phase, jump straight to code, write tests after the fact (if at all), and confidently rationalize every shortcut along the way.
You know this because you’ve watched it happen. You type “add a notification system” and thirty seconds later there’s code in your editor — code that doesn’t fit your existing patterns, misses edge cases you would have caught during planning, and has no tests. The agent did exactly what you asked. It just didn’t do what you needed.
Superpowers is a Claude Code plugin that exists to fix this. With roughly 53,000 GitHub stars and over 52,000 installs through Anthropic’s official marketplace, it’s the most widely adopted Claude Code plugin by a significant margin. Its premise is simple: teach Claude structured software development methodology through composable, reusable “skills” — and make it very difficult for the agent to skip them.
What Superpowers actually is
Superpowers isn’t a prompt library or a collection of code snippets. It’s a methodology-as-code framework created by Jesse Vincent (@obra) that ships as a set of markdown files — called skills — which Claude reads and follows as workflow instructions.
Each skill is a SKILL.md file that defines when it should be triggered, what process to follow, and what guardrails to enforce. The TDD skill, for example, doesn’t just suggest writing tests first — it establishes an “iron law” that production code written before a failing test must be deleted and started over. The brainstorming skill doesn’t just recommend thinking before coding — it includes a hard gate that literally prevents implementation until a design has been presented and approved.
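To make that concrete, here is a minimal sketch of what a skill file can look like. The frontmatter fields, wording, and structure are illustrative rather than copied from the plugin; actual Superpowers skills are longer and more carefully pressure-tested.

```markdown
---
name: test-driven-development
description: Use when writing or changing production code, fixing a bug, or adding any feature that will end in a commit.
---

<!-- Illustrative sketch only, not the plugin's actual skill text. -->

# Test-driven development

## The iron law
No production code without a failing test first. Code written before its test gets deleted and restarted.

## Process
1. Write one failing test. Run it and confirm it fails for the expected reason.
2. Write the minimum code that makes it pass. Run the test again.
3. Refactor while the tests stay green, then commit.
```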
The plugin launched on October 9, 2025 — the same day Anthropic shipped their plugin system for Claude Code. It’s now at version 4.3.0, has been accepted into Anthropic’s official verified plugins marketplace, and supports Claude Code, OpenCode, and OpenAI’s Codex. Simon Willison, the Django co-creator and one of the most respected voices in developer tooling, described Vincent as “one of the most creative users of coding agents.”
The enforced workflow
The core of Superpowers is a mandatory progression: brainstorm → plan → implement → review. Each phase has its own skill with explicit rules about what must happen before the next phase begins.
Brainstorming comes first. When you ask Claude to build something, the brainstorming skill intercepts and forces a Socratic design conversation. Claude asks clarifying questions — not generic ones, but questions informed by reading your actual codebase. Building a notification system? It’ll ask about per-notification preferences, rate limiting, failure retry strategies, and how notifications interact with your existing event system. The output is a design document, not code.
Planning converts the approved design into a step-by-step implementation plan. Each task is scoped to roughly 2–5 minutes of work, with exact file paths, complete code samples, and verification criteria. The plan is granular enough that each step could be a single commit.
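A single task in such a plan might look something like the sketch below. The format, file paths, and numbers are invented for illustration; the plugin's actual plan template may differ.

```markdown
<!-- Hypothetical plan excerpt, continuing the notification-system example. -->

### Task 7: Rate-limit the notification dispatcher

Files: src/notifications/dispatcher.ts, src/notifications/dispatcher.test.ts

1. Write a failing test: a user who received 5 notifications in the last minute
   has the 6th queued instead of sent.
2. Implement a sliding-window check in NotificationDispatcher.send() and make
   the test pass.
3. Run the full notification test suite.

Verify: the new test passes, existing dispatcher tests still pass, and no files
outside the two listed above were touched.
```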
Implementation follows the plan using test-driven development. Every feature begins with a failing test. The agent watches the test fail, writes minimal code to make it pass, then refactors. This cycle is enforced, not suggested.
Review uses a two-stage process introduced in version 4.0. A spec compliance reviewer checks that the implementation actually matches the plan. A separate code quality reviewer assesses architecture, test coverage, and maintainability. The separation matters — a single reviewer tends to conflate “well-written” with “correct,” and those are different things.
The contrast with unstructured usage is stark. Richard Porter, who wrote about his experience with the plugin, described pre-Superpowers feature development as “watching Claude write code for 10 minutes, then spending an hour fixing inconsistencies and adding skipped tests.” With Superpowers, features spanning 15+ files execute consistently because every architectural decision was captured during planning and every step was verified against the plan.
The skills that matter most
Superpowers ships 14 skills across testing, debugging, design, execution, and collaboration. Three stand out.
Brainstorming
The brainstorming skill is arguably the plugin’s most important contribution. It enforces what experienced engineers do naturally — think before you build — but what AI agents almost never do unprompted.
The skill works through progressive questioning. Rather than asking all questions upfront, it asks a few, processes the answers, then asks deeper follow-up questions informed by what it learned. The output is a design document saved to docs/plans/, which becomes the input for planning.
Version 4.3.0 added hard gates to the brainstorming skill — explicit blocks in the skill definition that prevent Claude from writing any implementation code until the design is presented and the user approves it. This was necessary because earlier versions relied on instructions alone, and Claude would routinely rationalize skipping the design phase with thoughts like “this is too simple to need a design.” The hard gate makes that impossible.
Test-driven development
The TDD skill enforces strict red-green-refactor cycles:
- Red — write a failing test. Run it. Confirm it fails for the expected reason (a missing feature, not a syntax error).
- Green — write the minimum code to make the test pass. No more.
- Refactor — clean up while keeping tests green. Extract helpers, improve naming, remove duplication.
The skill includes a list of red flags that require starting over: code written before a test, a test that passes immediately, any inability to explain why the test initially failed. It also explicitly addresses the rationalizations Claude uses to skip TDD — “just this once,” “too simple to test,” “I’ll test after,” “I already manually tested it” — and rejects all of them.
This matters because LLMs are particularly prone to writing tests that confirm their own implementation rather than tests that define expected behaviour. When the same session writes code and then writes tests for that code, the tests tend to be tautological. Red-green-refactor breaks this cycle by requiring the test to exist and fail before the implementation exists at all.
Subagent-driven development
This skill addresses context window pollution — the problem where a long-running Claude session accumulates so much context from research, planning, and earlier implementation that it starts losing track of its own decisions.
The approach: dispatch a fresh sub-agent for each task in the implementation plan. Each sub-agent gets the full task description and relevant context but starts with a clean slate. After implementation, two separate review agents evaluate the work:
- A spec compliance reviewer independently reads the task specification and the implementation, checking whether they actually match. This reviewer is explicitly instructed to be sceptical — to read the code itself rather than trusting the implementer’s summary.
- A code quality reviewer assesses patterns, error handling, type safety, naming, and test coverage. This reviewer only runs after spec compliance passes.
If either reviewer finds issues, the implementer fixes them and the reviewer checks again. It’s a loop, not a single pass.
Under the hood
The plugin bootstraps itself through a SessionStart hook — a shell script that fires every time a Claude Code session begins, resumes, or clears context. The hook reads the using-superpowers meta-skill and injects it into Claude’s system prompt wrapped in <EXTREMELY_IMPORTANT> tags. This establishes what Vincent calls “THE RULE”: if there’s even a 1% chance a skill applies to the current task, Claude must invoke it.
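In shape, that hook is not much more complicated than the sketch below, which assumes the hook's standard output is what gets injected as context. The path and environment variable are illustrative, and the real script handles more cases.

```sh
#!/bin/sh
# Simplified sketch of a SessionStart hook; not the plugin's actual script.
# Assumes whatever the hook prints is added to Claude's context when a
# session starts, resumes, or clears. CLAUDE_PLUGIN_ROOT is illustrative.
SKILL_FILE="${CLAUDE_PLUGIN_ROOT:-$HOME/.claude/plugins/superpowers}/skills/using-superpowers/SKILL.md"

echo "<EXTREMELY_IMPORTANT>"
cat "$SKILL_FILE"
echo "</EXTREMELY_IMPORTANT>"
```

Because the hook fires on resume and on context clears as well as at session start, the meta-skill is re-established at exactly the moments when a long session is most likely to have lost it.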
The meta-skill includes a rationalization table — twelve common thoughts that signal Claude is about to skip a workflow, each paired with a correction. A sample:
| Thought | Reality |
|---|---|
| “This is just a simple question” | Questions are tasks. Check for skills. |
| “I need more context first” | Skill check comes before clarifying questions. |
| “Let me explore the codebase first” | Skills tell you how to explore. Check first. |
| “The skill is overkill” | Simple things become complex. Use it. |
| “I’ll just do this one thing first” | Check before doing anything. |
This table was developed through what Vincent calls pressure testing — deliberately probing where Claude will deviate from instructions and then hardening the skill’s language against those specific failure modes.
The architecture uses a dual-repository design. The plugin infrastructure (hooks, commands, bootstrap logic) lives in one repository, while the actual skill content lives in another. This allows skills to be updated independently and community contributors to submit new skills without touching the plugin’s integration code.
Skill descriptions are carefully crafted for what might be called “Claude Search Optimisation” — analogous to SEO, but for Claude’s skill discovery system. A skill’s description field doesn’t explain what the skill does; it describes the triggering conditions — the symptoms and scenarios where the skill should activate. Version 4.0 overhauled all skill descriptions after discovering that Claude was making incorrect assumptions about skill capabilities based on descriptions that were too detailed about internal processes.
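A hypothetical before-and-after shows the difference; the wording is invented for this article, not taken from the plugin's skills.

```markdown
<!-- Process-focused: describes internals, invites wrong assumptions. -->
description: Runs a five-phase root-cause analysis with hypothesis logging and bisection.

<!-- Trigger-focused: describes the symptoms that should activate it. -->
description: Use when a bug cannot be reproduced, a test fails intermittently, or behaviour differs between environments.
```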
Latent space engineering
Vincent’s most provocative contribution isn’t a skill — it’s a concept. In a January 2026 blog post titled “Latent Space Engineering,” he draws a distinction between context engineering (providing factual information to a model) and latent space engineering (influencing the model’s internal state to shape its behaviour).
The practical techniques include emotional framing (“You’ve totally got this” in skill instructions), competitive motivation for reviewer agents (framing reviews as tests of the reviewer’s rigour, not just checklists to complete), and style transfer by including exemplar code that demonstrates the desired quality bar.
Vincent references peer-reviewed research on applying Robert Cialdini’s persuasion principles — authority, commitment, reciprocity, scarcity — to LLM behaviour, and uses these principles to pressure-test skills. That raises an obvious question: if an authority framing (“as an expert senior developer…”) makes Claude more compliant with a workflow, is that effective engineering or pseudoscientific manipulation?
The Hacker News discussion surfaced this tension directly. Some commenters called the approach “voodoo nonsense,” arguing that psychological persuasion tactics are meaningless when applied to statistical language models. Others pointed out that if the techniques produce measurably better outputs — regardless of whether the mechanism is “real” persuasion — the pragmatic case is strong enough.
Vincent’s position is pragmatic: “I don’t care if it works for the reasons I think it works. I care that it works.” The techniques are embedded throughout Superpowers’ skills, and the plugin’s adoption suggests they produce results that users find valuable, even if the theoretical foundations are debated.
Where it falls short
Superpowers has real limitations, and the community is clear-eyed about them.
Overhead for small tasks. The brainstorm-plan-execute cycle adds 10–20 minutes of upfront work. For a single-file bug fix or a quick refactor, that’s more structure than the task warrants. Richard Porter recommends reserving Superpowers for features touching three or more files or requiring architectural decisions. For everything else, vanilla Claude Code is faster.
Token cost. Sub-agent-driven development dispatches fresh agents for each task, each consuming its own context window. Simon Willison noted on Hacker News that five sub-tasks in one of his projects each consumed 50,000+ tokens due to duplicated context. For large implementation plans with many tasks, the token bill adds up.
No rigorous benchmarks. The most common criticism on Hacker News was the lack of controlled studies. Practitioner testimonials are overwhelmingly positive — Colin McNamara claimed his productivity “exceeds what my entire teams at Oracle Cloud Infrastructure could produce” — but no one has published A/B comparisons measuring output quality or development speed with and without the plugin.
The prompt engineering question. Sceptics ask whether Superpowers is meaningfully different from a well-crafted CLAUDE.md file that says “always brainstorm before coding, always write tests first.” The answer is probably yes — the enforcement mechanisms (hard gates, session hooks, rationalization tables) go beyond what static instructions can achieve — but the degree of difference is hard to quantify without the benchmarks that don’t yet exist.
Getting started
Installation takes two commands in Claude Code:
/plugin marketplace add obra/superpowers-marketplace
/plugin install superpowers@superpowers-marketplace

After installation, the plugin activates automatically on every session. There’s nothing to configure — the SessionStart hook handles bootstrapping. You can invoke skills explicitly via slash commands (/superpowers:brainstorm, /superpowers:write-plan), but the plugin is designed to trigger skills automatically based on what you’re doing.
The broader principle matters more than the specific plugin. AI coding agents are powerful enough to build complex features autonomously. They are not disciplined enough to do it well without structure. Whether that structure comes from Superpowers, a custom plugin, or a meticulously maintained CLAUDE.md file, the pattern is the same: separate thinking from doing, enforce testing, verify against the plan, and never let the agent skip the boring parts. The boring parts are where the quality lives.