gstack launched on March 12 and hit 40,000 GitHub stars in under two weeks. The excitement is justified. Assigning AI agents specific roles instead of treating them as generic assistants produces measurably better output.

I have been running a similar workflow since mid-2025 while building Luminik across product, GTM, and content systems. Instruction files with detailed coding conventions. A specs repo that agents reference every session. Multiple agents running in parallel across repos overnight. gstack formalizes part of what I arrived at through trial and error.

Where the Approaches Overlap

gstack’s core idea: when you give an AI agent a specific persona with constrained responsibilities (a reviewer who only reviews, a planner who only plans), the output improves. This is correct. It maps to how engineering organizations actually work. A QA engineer thinks differently from a product lead. gstack codifies the separation into slash commands like /build, /review, /ship.

I arrived at the same principle from a different direction. I kept watching Claude Code produce code that compiled and passed checks but felt subtly wrong. The patterns drifted. Naming was inconsistent. Architecture choices were locally reasonable and globally incoherent. The fix was writing CLAUDE.md files: project-level instruction files that Claude Code reads at session start. Cursor gets equivalent instructions via .cursorrules.
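
To make that concrete, here is the shape of such a file. Every rule and identifier below is invented for illustration rather than copied from my actual setup, but the level of specificity is representative: vague guidance gets ignored, concrete rules get followed.

```markdown
# CLAUDE.md (illustrative sketch, not my real file)

## Naming
- React components: PascalCase, one component per file.
- API handlers: verbNoun (`createWorkflow`), never nounVerb (`workflowCreate`).

## Architecture
- UI code never talks to the data layer directly; it goes through a service module.
- Prefer extending an existing module over adding a parallel one. If unsure, stop and ask.

## Error handling
- External calls go through the shared retry helper; no bare try/catch in route handlers.
```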

Where the agent gets its judgment

| Layer | Mechanism | What it encodes |
| --- | --- | --- |
| Role | Slash command | Planner, builder, reviewer, shipper. |
| Repo | CLAUDE.md | Coding conventions and local boundaries. |
| Product | Specs repo | Business rules, data model, workflows. |
| Memory | Queryable brain | Cross-session context that survives a single run. |

gstack helps most with the first layer. My setup spends more effort on the later layers.

Where gstack’s Scope Stops

gstack covers the development lifecycle: think, plan, build, review, test, ship, reflect. Within that scope it is well-designed. /plan-ceo-review asking “what is the 10-star product hiding inside this request” is a genuinely useful forcing function. The engineering workflow is roughly 40% of what agents do for me. The other 60% is domain-specific work no generic framework can anticipate.

| | What gstack handles | What stays a founder's problem |
| --- | --- | --- |
| Role separation | Planner, builder, reviewer, shipper as slash commands | Domain context (vocabulary, workflow boundaries, data-model rules) only exists if you write it down |
| Per-session quality | Constraint plus persona produces sharper output | Cross-session memory needs a specs repo. The framework cannot maintain it for you. |
| Single-agent flow | Sequential think-plan-build-review-test-ship loop | Parallel agent work needs branch isolation and scope discipline. Concurrency is on you. |

Three problems the framework cannot fix on your behalf. They are the founder's problem regardless of which tools you pick.

1. Domain Context Is Harder to Encode Than Code Conventions

My CLAUDE.md files do not just encode engineering patterns. They encode business logic. For Luminik that means event-pipeline vocabulary, workflow boundaries, data-model assumptions, and the rules that keep source, enrich, sequence, capture, and attribute from drifting into five disconnected features. None of that is engineering knowledge. It is domain judgment, and it changes as the product learns.
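
The domain sections look different from the convention sections. The rules below are invented stand-ins, not Luminik's real business logic, but they show the kind of judgment that has to be written down before an agent can apply it:

```markdown
## Domain: event pipeline (invented example)
- The stages are always source → enrich → sequence → capture → attribute. Do not introduce synonyms.
- A stage may only consume output from the stage directly upstream; nothing reaches around a stage.
- Anything that adds a field to an event is an enrich concern, even if the request frames it as capture.
```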

Engineering conventions are relatively stable. Error handling patterns do not change week to week. Domain context does. Keeping instruction files current is significant ongoing work that any framework-level approach leaves to the user.

2. Cross-Session Coordination Needs a Specs Repo

gstack’s slash commands are stateless. Each one runs in its own context. That works for a single feature cycle. It does not work for a product under continuous development for months, where decisions made in November constrain what is possible in March.

I maintain a dedicated specs repo: business requirements broken into workflows, data model definitions, integration contracts, UI/UX requirements, testing criteria. Every agent session references it. After every PR merge, specs get updated. The moment specs drift from the codebase, agents start making decisions on outdated assumptions. Features still ship. Then something breaks because the agent was building against a system that no longer exists.
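
A minimal sketch of that layout, with placeholder names; the point is that each category from the list above gets its own home that agents can be pointed at:

```text
specs/
  workflows/       # business requirements, one file per workflow
  data-model/      # entity definitions and the invariants between them
  integrations/    # contracts for each external system
  ui-ux/           # screen-level requirements and states
  testing/         # acceptance criteria tied back to workflows
  decisions/       # dated records of choices that constrain later work
```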

3. Parallel Agents Need Isolation and Roles

gstack’s model is sequential: one agent, switching roles. My workflow runs multiple agents in parallel. On a typical evening, Cursor on a frontend feature, Claude Code on a backend refactor, a cloud-hosted agent on test cleanup. Each works in a separate repo or branch. Each produces a discrete PR by morning.

The constraint that matters most is isolation. If two agents touch overlapping parts of the codebase, I wake up to merge conflicts and divergent assumptions. The biggest time compression I have found in solo development comes from running three or four agents simultaneously on well-separated work, not from making one agent cycle through roles faster.
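
When the work has to happen inside a single repo, git worktrees are one way to enforce that isolation: each agent gets its own checkout and branch, so edits cannot collide on disk. The branch names here are hypothetical.

```shell
# One worktree per agent; each is a full checkout on its own branch.
git worktree add -b agent/frontend-feature ../luminik-frontend-feature
git worktree add -b agent/backend-refactor ../luminik-backend-refactor
git worktree add -b agent/test-cleanup ../luminik-test-cleanup
```

Merge conflicts can still surface at PR time, so the scope discipline has to come from how the tasks are cut, not from the tooling.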

A Note on Output Metrics

gstack’s README reports impressive output numbers: 10,000 to 20,000 lines per day, 600,000-plus lines in 60 days. Those numbers are plausible with AI agents; I have had weeks with similar volume.

My most valuable days on Luminik are often the ones with the fewest lines committed. Sometimes a customer conversation reveals that the spec is carrying the wrong assumption. Rewriting that assumption before code hardens around it can save weeks.

Deciding what to build, maintaining system coherence, encoding domain judgment, manually testing UI flows that agents consistently get wrong. That is where my hours go. None of it shows up in LOC counters.

What I Would Recommend to Someone Starting This Workflow

Prasad Subrahmanya

Founder of Luminik. Built Aura at Bain to $3.6M ARR, co-founded Mainteny through its $2.7M seed, and shipped the first MVP solo in 3 months.