I started my career at BlueJeans in 2014. The team had backend engineers, frontend engineers, dedicated QA, people handling deployments and infrastructure. Even for a focused product, you needed bodies—because the work demanded coordination at every step.

Setting up a new developer's machine meant following a guide that was always slightly out of date. Someone would sit with you. Half a day gone, sometimes more for complex setups. Integrations meant weeks of reading API documentation page by page, mapping fields by hand, writing boilerplate that everyone knew would be fragile and nobody wanted to touch again. Seeding a staging environment with realistic data was its own mini-project—separate tickets, separate timelines. These were real costs, paid in hours and attention, by real people sitting next to each other.

The work moved through people. Requirements got clarified in hallway conversations. Design patterns got transmitted through code reviews and whiteboard sessions. When something broke, you'd walk over to someone's desk. When something was ambiguous, you'd hash it out in a meeting. A lot of what made teams effective was implicit—never documented, never versioned, just carried in people's heads and habits.

That was the cost structure of the time. And it worked, within its constraints.

The Setup I'm Running Now

I'm building Luminik alone. Backend, frontend, mobile, CRM integrations, infrastructure, security, data pipelines. The kind of scope that in 2014 would have assumed five or six people before anyone discussed timelines.

My monthly tooling spend looks like this:

[Chart: Monthly AI Tooling Costs]

That's roughly $1,500–2,000/month on AI tooling alone, as a bootstrapped solo founder. This isn't indulgence. It's the infrastructure cost of compressing a team into one desk.

What a Working Day Actually Looks Like

The tools divide along capability lines.

Cursor handles active feature development—new flows, UI work, wiring things together. This is where I spend most of my coding hours. It's fast for focused, contextual work within a specific part of the codebase.

Claude Code handles large refactors where I need the agent to hold broader context and use tools more aggressively—file system access, terminal commands, multi-file changes that ripple across the codebase. I've found it more reliable than Cursor for structural changes that touch many files.

I run both simultaneously on different parts of the codebase. While Cursor is building a new feature in the frontend, Claude Code might be refactoring a backend module. Parallel development is where the leverage multiplies.

Codex and CodeRabbit sit as automated reviewers on every pull request. CodeRabbit catches issues I wouldn't think to check when I'm tired—security-adjacent concerns, scope mismatches, documentation drift. Codex provides a second automated perspective. Between the two, most PRs get substantive review before I look at them.

GitHub CLI is wired in throughout the workflow. Agents create pull requests, read review comments, and address feedback themselves. The loop from code → PR → review → fix → merge happens with minimal manual intervention, though I always review the final state myself.
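For a sense of what that loop looks like mechanically, here is a rough Python sketch that shells out to GitHub CLI commands. The titles, branch, and spec path are hypothetical, and in practice the agents drive these commands themselves rather than running a script like this.

```python
# Sketch of the PR loop an agent can drive through the GitHub CLI.
# Hypothetical titles and paths; assumes `gh` is authenticated for the repo.
import json
import subprocess

def run(*args: str) -> str:
    """Run a command and return its stdout."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout

# 1. Open a PR for the branch the agent just pushed.
run("gh", "pr", "create",
    "--title", "Refactor: extract CRM field mapping",
    "--body", "Agent-generated refactor; see specs/crm-mapping.md",
    "--base", "main")

# 2. Pull review comments (CodeRabbit / Codex post as regular PR comments).
comments = json.loads(run("gh", "pr", "view", "--json", "comments"))["comments"]
feedback = [c["body"] for c in comments]

# 3. Hand the feedback back to the agent, let it push fixes, then merge
#    once checks pass. I still review the final diff myself before this step.
run("gh", "pr", "merge", "--squash", "--auto")
```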

Where Compression Shows Up

Certain categories of work that used to take weeks now take hours or days. These are specific, measurable examples from the last few months:

CRM Integrations: I built Salesforce and HubSpot integrations within a week. Agents read the API documentation themselves, inspected fields, matched them to my data model, and generated the connection layer. In 2014, this would have been a multi-week project involving dedicated developers reading every page of API docs, writing mapping code manually, debugging serialization issues, handling edge cases in field types.
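To make "connection layer" concrete: at its core it is field mapping. A minimal sketch of the idea, using standard HubSpot and Salesforce contact fields and an invented internal Contact model (not Luminik's actual schema):

```python
# Minimal sketch of a CRM -> internal model mapping layer.
# The Contact model is hypothetical, not the real schema.
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Contact:
    external_id: str
    email: Optional[str]
    full_name: str
    company: Optional[str]
    source: str  # "hubspot" or "salesforce"

def from_hubspot(record: dict[str, Any]) -> Contact:
    props = record.get("properties", {})
    return Contact(
        external_id=record["id"],
        email=props.get("email"),
        full_name=f'{props.get("firstname", "")} {props.get("lastname", "")}'.strip(),
        company=props.get("company"),
        source="hubspot",
    )

def from_salesforce(record: dict[str, Any]) -> Contact:
    return Contact(
        external_id=record["Id"],
        email=record.get("Email"),
        full_name=record.get("Name", ""),
        company=(record.get("Account") or {}).get("Name"),
        source="salesforce",
    )
```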

Data Seeding: Seeding my staging environment plus test Salesforce and HubSpot accounts with realistic data took under an hour via scripts the agents wrote. This used to be a separate development effort with its own timeline and tickets.
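The scripts themselves are plain, which is the point. A sketch of the shape, using the Faker library against a hypothetical staging endpoint:

```python
# Sketch of a staging-data seeder. The /contacts endpoint, payload shape,
# and URL are hypothetical stand-ins for the real staging API.
import requests
from faker import Faker

fake = Faker()
STAGING_API = "https://staging.example.com/api"  # placeholder URL

def seed_contacts(n: int = 200) -> None:
    for _ in range(n):
        payload = {
            "full_name": fake.name(),
            "email": fake.company_email(),
            "company": fake.company(),
            "title": fake.job(),
        }
        resp = requests.post(f"{STAGING_API}/contacts", json=payload, timeout=10)
        resp.raise_for_status()

if __name__ == "__main__":
    seed_contacts()
```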

Live Debugging: Yesterday, I hit a staging issue. Cursor checked my ECS task logs via AWS CLI access, found the failing code path, looked at my Flyway migration files, and pointed to the exact error. End-to-end debugging across infrastructure, migration files, and application code—in one session, without switching tools.
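For illustration, here is roughly the same log lookup expressed with boto3 instead of the raw AWS CLI calls the agent ran; the log group name and filter pattern are placeholders.

```python
# Roughly what the agent did, expressed with boto3 instead of the AWS CLI.
# Log group name and filter pattern are hypothetical.
from datetime import datetime, timedelta, timezone
import boto3

logs = boto3.client("logs")
since = int((datetime.now(timezone.utc) - timedelta(hours=1)).timestamp() * 1000)

events = logs.filter_log_events(
    logGroupName="/ecs/luminik-backend-staging",   # placeholder log group
    startTime=since,
    filterPattern="?ERROR ?Exception ?FlywayException",
)["events"]

for e in events:
    ts = datetime.fromtimestamp(e["timestamp"] / 1000, tz=timezone.utc)
    print(ts, e["message"][:200])
```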

Security: I ran Aikido Security across my repos, got the findings report, fed it to Claude Code, got fixes pushed, and CodeRabbit verified the fixes on the PR. An audit → fix → verify cycle that would once have involved multiple people and multiple handoffs compressed into a single afternoon.

MCP and Browser Access: Agents can access documentation directly, check fields in external systems, and match them to my data model. These were manual research tasks for a developer in the pre-AI era: open a browser, read the docs, take notes, translate to code. Now the agent does the lookup and the translation in the same session.

Skills Files: Encoding Judgment

This is the part most people skip when they talk about AI-assisted development, and it's the part that makes the biggest difference.

I've built skills files, detailed instruction sets tuned specifically to my codebase and my preferences: the design patterns I want applied, naming and error-handling conventions, module boundaries, and the architectural principles behind them.

Every new session starts by loading these alongside full repo context. This is what makes agents behave like they understand your system instead of generating plausible code for a generic one.
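Mechanically, the loading step is simple. A minimal sketch, assuming the skills files are markdown documents living in the specs repo (the paths are hypothetical):

```python
# Sketch: assemble a session preamble from skills files before any task prompt.
# Directory and file names are hypothetical; the real files live in the specs repo.
from pathlib import Path

SKILLS_DIR = Path("specs/skills")

def build_preamble() -> str:
    sections = []
    for path in sorted(SKILLS_DIR.glob("*.md")):  # e.g. conventions.md, error-handling.md
        sections.append(f"## {path.stem}\n\n{path.read_text()}")
    return "\n\n".join(sections)

# The preamble is prepended to every agent session, so the agent starts from
# my conventions rather than generic defaults.
```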

Without skills files, agents produce code that compiles, passes basic checks, and feels slightly wrong everywhere. The patterns drift. The naming is inconsistent. The architecture choices are locally reasonable and globally incoherent.

With well-maintained skills files, the output is recognizably "my" codebase. The agent makes choices I would make—because I've told it how I think about these decisions.

The deeper issue here is that in 2014, this kind of judgment lived in people's heads. Senior engineers transmitted it through code reviews, pair programming, whiteboard sessions, lunch conversations. It was implicit and social. When a junior engineer made an architectural choice that felt wrong, someone would walk over and explain why. That correction was immediate, contextual, and unrecorded.

Now it has to be written down, versioned, and maintained—or it simply doesn't exist in the system. Agents don't absorb judgment from proximity. They absorb it from explicit instruction.

That encoding work—capturing decisions, conventions, patterns, preferences in a format agents can consume—is a genuinely new category of engineering labor. It didn't exist in 2014 because it wasn't needed. Now it's foundational.

Workspace Architecture: Single Project, Multiple Repos

All my repos—backend, frontend, mobile, integrations, and specs—live in a single project workspace. Separate repos, shared context.

This is deliberately different from a monorepo. I wanted agents to see the full system without coupling the deployment pipeline. Release workflows stay independent per repo. But when an agent is working on the backend, it can see the frontend contracts. When it's working on the frontend, it can see the API schemas. When it's working on integrations, it can see both.

What this improves immediately: agents introduce fewer regressions because they understand cross-repo dependencies. Backend changes that would break frontend implementations get caught at development time. Mobile and web stay aligned because the agent sees both simultaneously.

The new failure mode: everything is visible, so drift shows up immediately. An inconsistency between backend and frontend that might have lived undetected for a sprint in a team environment surfaces the moment context is loaded. That's a good kind of discomfort—it forces resolution early.

Specs Driven Development: The Most Important Repo

I don't use a task board. No Jira. No Linear. No Trello.

The specs repo is the plan and the coordination layer. It contains the roadmap, the requirements, and the intent behind every feature.

All written using what I call Specs Driven Development—the idea that specifications should be detailed enough to be directly consumable by both humans and AI agents, and that they should be the single source of coordination truth.

After every PR merge, specs get updated to reflect current state. This is non-negotiable. The moment specs drift from the codebase, two things happen: agents start making decisions based on outdated assumptions, and my own mental model starts diverging from reality.

Agents are explicitly instructed to treat code as the source of truth and specs as the intent layer. Code tells you what the system does. Specs tell you what the system should do and why.

When specs lag reality, velocity drops. You don't notice it at first. Features still ship. PRs still merge. Then something breaks in an unexpected place, and you realize the agent was building against a version of the system that no longer exists. That debugging session costs more than the time it would have taken to keep specs current.
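One cheap guardrail, sketched below under the assumption that the specs repo and the code repos sit as sibling directories in the workspace: flag any repo whose latest commit is newer than the specs repo's. It is crude, repo-level granularity, and an illustration rather than a tool the agents actually run, but it shows the kind of check that catches silent lag.

```python
# Sketch of a drift check to run after merges: flag repos whose code has
# moved more recently than the specs that describe them. Paths are hypothetical
# and assume all repos are siblings in one workspace directory.
import subprocess
from pathlib import Path

REPOS = ["backend", "frontend", "mobile", "integrations"]  # placeholder repo dirs
SPECS = Path("specs")

def last_commit_ts(path: Path) -> int:
    out = subprocess.run(
        ["git", "-C", str(path), "log", "-1", "--format=%ct"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    return int(out)

specs_ts = last_commit_ts(SPECS)
for repo in REPOS:
    if last_commit_ts(Path(repo)) > specs_ts:
        print(f"specs may be stale relative to {repo}")
```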

When specs encode assumptions too early, they start lying. You write down how a workflow should behave before you've watched a customer use it, and the agent builds exactly what you specified—which turns out to be wrong. Keeping specs honest means being willing to rewrite them frequently, which feels like wasted work and is actually the most valuable work.

Keeping specs alive while multiple agents work in parallel is constant, unglamorous, necessary work. It's the coordination cost that used to be distributed across a team and now lands entirely on me.

Overnight Agents and Mobile Steering

Before bed, I task agents via cloud environments—individual repos for tests, refactoring, cleanup—so there's a PR waiting for review in the morning.

The key constraint is isolation. If you task two agents on overlapping parts of the codebase overnight, you wake up to merge conflicts and divergent assumptions. So the work has to be cleanly separated: one agent refactoring a backend module, another running test suites, a third cleaning up a specific integration. Each on its own branch, each producing a discrete PR.
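The isolation constraint holds whether the runs happen in cloud environments or locally. Here is a local sketch of the pattern, using git worktrees and Claude Code's non-interactive mode; the task prompts and branch names are made up, and my actual overnight runs go through cloud environments rather than this script.

```python
# Sketch of isolated overnight tasks: one worktree and one branch per task,
# each driven by a headless agent run. Prompts and branch names are hypothetical;
# `claude -p` is Claude Code's non-interactive (print) mode.
import subprocess

TASKS = {
    "chore/test-coverage": "Raise test coverage for the billing module; touch only tests.",
    "refactor/crm-mapping": "Extract the CRM field mapping into its own module.",
}

for branch, prompt in TASKS.items():
    workdir = f"../overnight/{branch.replace('/', '-')}"
    subprocess.run(["git", "worktree", "add", "-b", branch, workdir], check=True)
    subprocess.run(["claude", "-p", prompt], cwd=workdir, check=True)
    # Each run ends by pushing its branch and opening a PR via the GitHub CLI,
    # so there is a discrete, reviewable unit waiting in the morning.
```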

I've steered development sessions from my phone while sitting at a coffee shop. The loop is tighter than it has any right to be for a single person. There are tools like Omnara that sync local Claude Code sessions to the cloud—I haven't tried them yet, but the direction is clear: development is becoming location-independent in a way that doesn't require a laptop.

Claude Flow is the tool I most want to see work. The promise of fully autonomous overnight coding sessions is real. In practice, I haven't found the right configuration for my use cases yet. The agent either stays too conservative (producing trivial changes that aren't worth the session) or gets too ambitious (introducing structural decisions I haven't approved). I'm sure there's a learning curve I haven't crested yet, or maybe a setup that's more custom to my codebase. I keep experimenting.

Where Things Still Break Down

Agents miss edge cases consistently. This is the most common failure mode I encounter. The happy path works. The second and third paths work. The fourth path—the one that depends on a specific state combination that only happens when a user does something slightly unusual—breaks.

When backend and frontend move together, subtle breakage goes up even when both sides look correct independently. An API returns data in a shape the frontend technically handles, but the UI renders it wrong because the mapping assumes a different ordering or null behavior. Both sides pass their own tests. The integration fails.

UI flows work in isolation and break when a real person clicks through them in sequence. Step 1 sets state. Step 2 reads it. Step 3 modifies it. Step 4 depends on the result of steps 2 and 3 together. Agents test each step. They rarely test the chain.

I test every button, every path, manually. At a 0-to-1 stage, I want my eyes on everything. This is deliberate. I haven't yet found a tool that can reliably test real business use cases on the UI—clicking through actual workflows, checking visual consistency, verifying interaction patterns. Claude Code can open a browser and click buttons. The capability exists. The reliability for complex, multi-step business flows isn't there yet.

AI also drifts on UI/UX consistency. Spacing, alignment, interaction patterns, visual hierarchy—these compound into the product feeling "off" if you let them slide. Each individual choice might be reasonable. The aggregate is subtly wrong. Catching this requires aesthetic judgment that agents don't have, and I'm not sure how you'd encode it into skills files.

The Deeper Issue: What Agents Can and Can't Do

Agents operate on what you give them.

If you don't instruct them to use a decorator pattern for a specific use case, they won't arrive at it independently. If you don't specify error handling conventions, each module will handle errors differently. If you don't define module boundaries, the agent will make locally optimal choices that create globally messy architecture.
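A hypothetical example of the kind of convention that has to be spelled out: if the skills file says every outbound CRM call goes through a retry decorator, the agent applies it everywhere; if it doesn't, each module invents its own error handling.

```python
# Hypothetical example of a convention worth encoding: all outbound CRM calls
# go through this retry decorator. An agent won't adopt the pattern on its own
# unless the skills file says so.
import functools
import time

def retry_crm_call(attempts: int = 3, backoff_s: float = 1.0):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except ConnectionError:
                    if attempt == attempts:
                        raise
                    time.sleep(backoff_s * attempt)
        return wrapper
    return decorator

@retry_crm_call()
def fetch_hubspot_contacts(page_token=None):
    ...  # actual HTTP call lives here
```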

The quality of what comes out tracks directly to the quality of your skills files, your specs, your prompts. This has always been true of software development—well-documented teams and products moved faster at scale than undocumented ones. The difference now is that the cost of poor documentation is immediate and visible, not gradual and hidden.

In a team environment, implicit knowledge circulated through social mechanisms. Code reviews caught pattern violations. Pair programming transmitted conventions. Team meetings aligned understanding. None of that exists when you're building alone with agents. Everything that was social is now textual.

This is the actual job now. Writing code is a shrinking fraction of my day. Deciding what agents should build, reviewing what they produced, updating the skills files and specs and rules that keep the whole system coherent—that's where the hours go.

The Cost Structure Has Changed, the Work Hasn't Disappeared

People sometimes describe AI-assisted development as if the work got easier. From the inside, it feels different.

The mechanical parts compressed dramatically. Boilerplate generation, API integration scaffolding, data seeding, documentation lookup, test writing, security scanning—all of these are faster by an order of magnitude.

The judgment parts didn't compress at all. What to build. How to structure it. When to stop adding and start removing. How to keep the system coherent as it grows. How to maintain quality when the agent is happy to ship something that technically works and aesthetically drifts.

The new cost is encoding. Every decision that used to live in someone's head now needs to be written down in a format an agent can consume. Every convention that used to spread through proximity now needs to be maintained as a versioned file. Every correction that used to happen in a code review comment now needs to be generalized into a rule.

I can experiment faster than at any point in twelve years of building software. I can also introduce structural mistakes faster. The only reliable counterweight I've found is discipline about specs, skills files, standards, and manual verification.

The total cost is different from 2014. Lower in some dimensions, higher in others. The shape changed. The amount of care required didn't.

My Full Toolchain

For anyone building a similar setup, here's the complete list:

| Tool | Role | Notes |
| --- | --- | --- |
| Cursor | Active feature development | Highest token spend; used for focused, contextual work |
| Claude Code | Large refactors, tool-heavy tasks | Better for structural changes across many files |
| Codex | Automated PR review | Second review perspective alongside CodeRabbit |
| CodeRabbit | Automated PR review | Catches security issues, scope drift, documentation gaps |
| GitHub CLI | Workflow automation | Agents create PRs, read comments, push fixes |
| Aikido Security | Security scanning | Findings fed to agents for automated fixes |
| Claude Flow | Overnight autonomous sessions | Still experimenting; hasn't clicked yet |
| Single project workspace | Context sharing | All repos visible to agents simultaneously |
| Specs repo | Coordination layer | Roadmap, requirements, intent; updated after every merge |
| Skills files | Judgment encoding | Design patterns, conventions, principles tuned to my codebase |
| AWS CLI | Live debugging | Agents access ECS logs and infra directly |

If you're building alone or thinking about it, I'm happy to share more details about any part of this setup. The tools change fast, but the underlying patterns—encoding judgment, maintaining specs, parallel agent workflows—seem durable. Reach out on LinkedIn.

Prasad Subrahmanya

Founder & CEO at Luminik. Previously built Aura at Bain ($3.6M ARR) and led Mainteny ($2.7M seed). Building an AI co-pilot for event marketing teams.
