Two competing Claude Code skill frameworks. Both ballooned on GitHub (superpowers ~161K stars, gstack ~78K). Different target users, different failure modes.
At a glance
| Dimension | obra/superpowers | garrytan/gstack |
|---|---|---|
| Author | Jesse Vincent (Prime Radiant) | Garry Tan (YC President & CEO) |
| Target user | Engineers writing production code | Founders/CEOs shipping products solo |
| Core metaphor | TDD methodology + subagent-driven dev | Virtual eng team (23 role-based specialists) |
| Install | Plugin marketplace (one command) | git clone + ./setup + optional team mode |
| Key bet | Rigor: spec → plan → RED-GREEN-REFACTOR → review | Process: Think → Plan → Build → Review → Test → Ship → Reflect |
| Distinctive skill | subagent-driven-development with 2-stage review | /cso (OWASP+STRIDE), /qa (real browser), /design-html (Pretext), /pair-agent |
| Voice | Methodology/philosophy | Founder bravado, LOC-per-day flex, Karpathy framing |
Where each wins
superpowers wins on discipline. TDD enforcement, worktree isolation, pre-review checklists, evidence-over-claims. If the code is going to production and a bug costs real money, this framework prevents sloppy work.
gstack wins on breadth. A real-browser QA loop (/qa with Chromium clicks + screenshots), a security audit skill (/cso — OWASP Top 10 + STRIDE threat model with 8/10 confidence gate), mockup-to-HTML (/design-html with Pretext for reflowing text), cross-agent coordination (/pair-agent via ngrok + scoped tokens). It’s a wider surface area than superpowers — design, DX, security, QA, release, monitoring, retro.
Failure modes
superpowers — already evaluated and skipped 2026-04-08: 22K tokens at startup (~11% of context), forces full brainstorm-TDD cycle on every task regardless of size. Heavy for vault automation, agent pipelines, or content work where TDD is overkill. Fine for building production SaaS.
gstack — heavier still (23 skills + 8 power tools + standalone CLIs). Opinionated framing (“I ship 810× my 2013 pace”) suggests the author optimizes for personal heroics, not team legibility. The WIP: continuous-checkpoint mode and filter-squash on ship are clever but add surface area. No token-cost figure published — would need to measure.
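The ~11% figure for superpowers is simple arithmetic, and the same check applies to any framework once its startup cost is known. A minimal sketch, assuming a 200K-token context window (the note does not state the window size; 200K is only what 22K at ~11% implies) and a hypothetical 5% budget chosen purely for illustration:

```python
# Assumption: 200K-token context window; the note only gives "~11%".
CONTEXT_WINDOW = 200_000


def context_share(startup_tokens: int, context_window: int = CONTEXT_WINDOW) -> float:
    """Fraction of the context window consumed before any work starts."""
    return startup_tokens / context_window


def within_budget(startup_tokens: int, max_share: float,
                  context_window: int = CONTEXT_WINDOW) -> bool:
    """True if the startup cost stays at or under the chosen share of context."""
    return context_share(startup_tokens, context_window) <= max_share


superpowers_share = context_share(22_000)
print(f"superpowers startup share: {superpowers_share:.0%}")

# Hypothetical budget: allow a framework at most 5% of context at startup.
print("fits a 5% budget:", within_budget(22_000, max_share=0.05))
```

The gstack number stays unknown until it is measured, so only the superpowers figure is plugged in here.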
Fit for a mixed, non-SaaS workflow
For work that is mostly vault automation, agent pipelines, content, and strategy — not “write React components with tests” — neither framework is a clean wholesale fit:
- superpowers: Skip wholesale. TDD rigor doesn’t match most non-code work. /implement-spec already covers the spec→build loop.
- gstack: Don’t install wholesale either: same context-bloat problem, scaled up. But three primitives are genuinely valuable for any code you ship to others:
  - /cso: security audit for shipped code. Higher signal than the anthropics/security-guidance plugin.
  - /qa: real-browser testing is the right abstraction for operator-facing UI. Auto-generates regression tests per fix.
  - /design-html: mockup → production HTML with reflowing text. Useful for landing pages and storefront explorations.
The rest (/office-hours, /plan-ceo-review, /retro) duplicates judgment you apply yourself or have existing skills for.
Recommendation
Cherry-pick from gstack; don’t install the framework. Extract the three skills above as standalone ~/.claude/skills/ entries. Measure token cost before layering. Re-run the superpowers evaluation only if Anthropic ships native context trimming for plugins.
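One way to "measure token cost before layering" is a rough per-skill estimate from the files on disk. The sketch below assumes each skill is a folder of Markdown files under ~/.claude/skills/ and uses a crude 4-characters-per-token heuristic instead of a real tokenizer; both are assumptions, and the function names are hypothetical. It counts everything on disk, so treat the result as an upper bound: how much of a skill is actually loaded at startup depends on the harness.

```python
from pathlib import Path

# Assumption: ~4 characters per token. A rough heuristic, not a tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def skill_costs(skills_dir: Path) -> dict[str, int]:
    """Estimated tokens of all Markdown files under each skill folder."""
    costs: dict[str, int] = {}
    for skill in sorted(p for p in skills_dir.iterdir() if p.is_dir()):
        costs[skill.name] = sum(
            estimate_tokens(f.read_text(encoding="utf-8", errors="ignore"))
            for f in skill.rglob("*.md")
        )
    return costs


if __name__ == "__main__":
    root = Path.home() / ".claude" / "skills"
    if root.is_dir():
        # Largest skills first, so the expensive ones are easy to spot.
        for name, tokens in sorted(skill_costs(root).items(), key=lambda kv: -kv[1]):
            print(f"{name:30s} ~{tokens:>7,d} tokens")
```

Running it before and after copying in each extracted skill gives a per-skill delta, which is enough to decide whether a skill earns its place.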