gstack launched on March 12 and hit 40,000 GitHub stars in under two weeks. The excitement is justified. The core approach, assigning AI agents specific roles instead of treating them as generic assistants, produces measurably better output.

I've been running a similar workflow since mid-2025, shipping Luminik to a paying client. CLAUDE.md files with detailed coding conventions. A specs repo that agents reference every session. Multiple agents running in parallel across repos overnight. gstack formalizes part of what I arrived at through trial and error.

What I want to do here is compare approaches. Where does gstack's model overlap with what I've built? Where does it stop? And what problems remain unsolved regardless of which framework you use?

Where the Approaches Overlap

gstack's core idea: when you give an AI agent a specific persona with constrained responsibilities (a reviewer who only reviews, a planner who only plans), the output improves. This is correct. It maps to how engineering organizations actually work. A QA engineer thinks differently from a product lead. gstack codifies that separation into slash commands like /build, /review, /ship.

I arrived at the same principle from a different direction. I kept watching Claude Code produce code that compiled and passed checks but felt subtly wrong. The patterns drifted. Naming was inconsistent. Architecture choices were locally reasonable and globally incoherent.

The fix was writing CLAUDE.md files: project-level instruction files that Claude Code reads at session start. Mine cover Quarkus conventions, React component patterns, error handling rules, module boundary definitions. Cursor gets equivalent instructions via .cursorrules. Both tools read these files automatically when a session starts.
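To make the idea concrete, here is a hypothetical excerpt of such a file. Every rule below is illustrative, invented for this sketch, not taken from my actual project:

```markdown
# CLAUDE.md (illustrative excerpt)

## Error handling
- REST resources return structured problem-detail responses; never leak stack traces.
- Wrap downstream client failures in a domain exception before rethrowing.

## React components
- Function components only; colocate the component, its test, and its styles.
- No direct fetch calls inside components; go through the shared API client module.

## Module boundaries
- The campaign module may depend on the CRM module, never the reverse.
```

The file reads like a code-review checklist written down once, so the agent applies it on every session instead of rediscovering your preferences.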

The output went from "plausible code for a generic project" to "recognizably my codebase." The underlying principle is the same one gstack uses: constrain the agent's context, and it performs better. gstack does it through role-based slash commands. I do it through per-repo instruction files. Both work.

What gstack's Scope Doesn't Cover

gstack covers the development lifecycle: think, plan, build, review, test, ship, reflect. The /plan-ceo-review command that asks "what's the 10-star product hiding inside this request" is a genuinely useful forcing function. Within that scope, it's well-designed.

But when I'm building Luminik, the engineering workflow is maybe 40% of what agents do for me. The other 60% is domain-specific work that no generic framework can anticipate. Three specific gaps stand out.

1. Domain Context Is Harder to Encode Than Code Conventions

My CLAUDE.md files don't just encode engineering patterns. They encode business logic. Example: Luminik runs event marketing campaigns. My instruction files define ICP criteria for specific events, outreach sequence structures by persona segment, CRM field mappings, lead scoring thresholds, and attribution model rules.
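A hypothetical sketch of what domain rules like these look like alongside the engineering rules. All names, mappings, and numbers here are invented for illustration:

```markdown
## Lead scoring (illustrative thresholds)
- Score = fit (0-50) + intent (0-50); "qualified" means total >= 70.
- Attended a target session at the event: +20 intent. Generic badge scan: +5.

## CRM field mappings
- Inbound `lead_source` maps to the CRM's `utm_source`; never overwrite a
  non-empty existing value.

## Outreach sequences (exec persona)
- Three touches over ten days, at most one channel per day.
```

Rules like these are opinions about the business, not the code, which is exactly why no generic framework can ship them for you.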

None of this is engineering knowledge. It's domain knowledge. And it changes constantly. After every client call, I'm updating instruction files with new target lists, revised ICP definitions, shifted priorities.

gstack's /plan-ceo-review asks good product questions. But it can't ask "does this outreach sequence match the methodology we agreed on for Web Summit" or "is this lead classification consistent with the ICP criteria for this specific campaign." That judgment lives in my CLAUDE.md files and specs, and I update them weekly.

Engineering conventions are relatively stable. Your error handling patterns don't change week to week. Domain context does. Keeping instruction files current is a significant, ongoing cost that any framework-level approach leaves to the user.

2. Cross-Session Coordination Needs a Specs Repo

gstack's slash commands are stateless. Each one runs in its own context. That works for a single feature cycle. It doesn't work for a product under continuous development for months, where decisions made in November constrain what's possible in March.

I maintain a dedicated specs repo. Not a task board. Not Jira. It contains business requirements broken into workflows, specific tasks with implementation details, data model definitions, integration contracts, UI/UX requirements, and testing criteria. All my agent sessions reference it.

After every PR merge, specs get updated. This is non-negotiable. The moment specs drift from the codebase, agents start making decisions based on outdated assumptions. You don't notice it immediately. Features still ship. Then something breaks because the agent was building against a version of the system that no longer exists.
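As a sketch, a specs repo covering the categories above might be laid out like this. The directory and file names are illustrative, not a real layout:

```text
specs/
├── workflows/        # business requirements broken into workflows
│   └── post-event-lead-routing.md
├── tasks/            # specific tasks with implementation details
├── data-model/       # entity and field definitions
├── integrations/     # API contracts with the CRM, email tooling, etc.
├── ui-ux/            # screen-level requirements
└── testing/          # acceptance criteria per workflow
```

The structure matters less than the discipline: every merged PR that changes behavior gets a matching spec update, so the next agent session starts from reality.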

This is the living document problem: how do you maintain a single source of truth that agents can reference across sessions, across days, across weeks? gstack doesn't address it. I'm not sure any framework can. It might be inherently manual work.

3. Parallel Agents Need Isolation, Not Just Roles

gstack's model is sequential: think, plan, build, review, test, ship. One agent, switching roles. My workflow runs multiple agents in parallel. On a typical evening, I'll task Cursor on a frontend feature, Claude Code on a backend refactor, and a cloud-hosted agent on test cleanup. Each works in a separate repo or branch. Each produces a discrete PR by morning.

The constraint that matters most here is isolation. If two agents touch overlapping parts of the codebase, I wake up to merge conflicts and divergent assumptions. The work has to be cleanly separated: different branches, different modules, different repos.
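One way to get that isolation, sketched below, is git worktrees: each agent gets its own checkout on its own branch, so overnight runs can't touch each other's files. The repo and branch names are hypothetical:

```shell
# Create a throwaway repo to demonstrate the layout.
git init -q demo && cd demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "baseline"

# One worktree per agent task, each on its own branch (names are illustrative).
git worktree add -b feat/frontend-dashboard ../agent-frontend
git worktree add -b refactor/event-pipeline ../agent-backend
git worktree add -b chore/test-cleanup      ../agent-tests

git worktree list   # main checkout plus one per agent
```

Each agent is pointed at its own directory; merge conflicts only appear if the branches genuinely overlap, which is exactly the signal that the work wasn't separated well enough.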

This is an orchestration problem. gstack is designed for one agent at a time, which is a reasonable starting point. But the biggest time compression I've found in solo development comes from running 3-4 agents simultaneously on well-separated work, not from making one agent cycle through roles faster.

A Note on Output Metrics

gstack's README reports impressive output numbers: 10,000 to 20,000 lines per day, 600,000+ lines in 60 days. These numbers are plausible with AI agents. I've had weeks at similar volume.

But my most valuable days on Luminik are often the ones with the fewest lines committed. Last month I spent two days rewriting specs after a client conversation revealed a wrong assumption about how their team assigns leads post-event. Zero lines of product code shipped. The specs rewrite saved weeks of building the wrong thing.

Deciding what to build, maintaining system coherence, encoding domain judgment into instruction files, manually testing UI flows that agents consistently get wrong: this is where my hours go. None of it shows up in LOC counters.

What I'd Recommend to Someone Starting This Workflow

gstack is a good starting point. So is building your own system from scratch. Either way, you'll end up dealing with the same set of problems. Here's what I wish I'd known earlier:

Write CLAUDE.md files that encode business logic, not just code style.

My CLAUDE.md files started as coding conventions (naming, error handling, module structure). The real value came when I added domain-specific rules: how lead scoring works, what fields map to what in our CRM, how outreach sequences should be structured per persona type. The agent went from "generically correct code" to "code that reflects how our product actually works."

Keep a specs repo and update it after every merge.

I use a dedicated repository with business requirements, data model definitions, integration contracts, and testing criteria. Every agent session references it. The update discipline is the hard part. Skip one week and agents start building against stale assumptions.

Structure your codebase for parallel agent work.

I run Cursor, Claude Code, and cloud-hosted agents simultaneously. This only works because the repos have clear module boundaries. If I need to refactor the backend event processing pipeline and build a new frontend dashboard, those can run in parallel because they touch different repos. Design for this from the start.

Manually test everything.

Agents generate code fast. They also miss edge cases consistently. I click through every UI flow myself. At a 0-to-1 stage, there's no substitute for having your eyes on every interaction path.

gstack is well-packaged, open source, and validates a pattern that works. The problems I've described here (domain context encoding, cross-session coordination, parallel orchestration) exist regardless of which framework you use. They're the unsolved parts of this workflow, and I expect they'll get easier as tooling matures.

If you're running a similar workflow and have figured out parts I haven't, I'd like to hear about it. Reach out on LinkedIn.

Prasad Subrahmanya

Founder & CEO at Luminik. Previously built Aura at Bain ($3.6M ARR) and led Mainteny ($2.7M seed). Building an AI co-pilot for event marketing teams.