I Built an AI Orchestrator...

The Spark

I recently attended a webinar from Codex and learned how they build autonomously. I've always been curious about how others approach agentic development, and I keep learning from them. One of the pain points I've been dealing with is having an agent, or multiple agents, wait for my decisions and approvals on the simplest commands. Yes, I do turn auto-accept on AFTER a good planning session, but there are still commands that require my attention. Oftentimes, working in parallel worktrees adds complexity because I have to shift gears and change contexts. AI does a good job at that, but my brain does not.

In the webinar, I learned how Codex optimized the prompt-to-PR loop, in which Codex can autonomously gather requirements, make an implementation plan, implement, test, validate, self-drive feedback loops to test different scenarios, and finally submit a PR.
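To make the shape of that loop concrete, here is a minimal sketch in Python. It is my own illustration, not Codex's implementation: the stage names come from the description above, while `RunState`, `run_prompt_to_pr`, the `run_stage` callback, and the `max_attempts` limit are all hypothetical names I made up for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Stages that run once up front (names taken from the loop described above).
SETUP_STAGES = ("gather_requirements", "plan")
# Stages that repeat until they all pass: the self-driven feedback loop.
FEEDBACK_STAGES = ("implement", "test", "validate")


@dataclass
class RunState:
    """Hypothetical record of one prompt-to-PR run."""
    story: str
    attempts: int = 0
    status: str = "in_progress"
    log: List[str] = field(default_factory=list)


def run_prompt_to_pr(
    story: str,
    run_stage: Callable[[str, RunState], bool],
    max_attempts: int = 3,  # assumption: give up and ask a human after this
) -> RunState:
    """Run the setup stages once, then retry the feedback stages until they pass."""
    state = RunState(story=story)
    for stage in SETUP_STAGES:
        run_stage(stage, state)
        state.log.append(stage)

    for attempt in range(1, max_attempts + 1):
        state.attempts = attempt
        # all() stops at the first failing stage, so a failed test skips validate.
        passed = all(run_stage(stage, state) for stage in FEEDBACK_STAGES)
        state.log.append(f"attempt {attempt}: {'pass' if passed else 'retry'}")
        if passed:
            state.status = "pr_submitted"
            return state

    # The agent could not converge, so a person has to step in.
    state.status = "needs_human"
    return state
```

In a real setup, `run_stage` would hand each stage to the agent. The point of the sketch is only that I stay out of the loop until a PR exists or the retries run out.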

I wanted to try it. First at work. I'm currently working on a project with multiple stories that I know agents can work on in parallel. The downside was setting aside a few work hours to create the orchestrator, skills, and tools. The code was relatively cheap to generate, but being intentional in steering the agent to understand the infrastructure, constraints, and patterns of our codebase took time. You definitely cannot vibe code that.

Ship workflow orchestrator flowchart

At Work: 2 Weeks in a Day

It took a lot of reviewing the markdown files, fine-tuning some skills, and setting up the right tools. But once I got it running, I got 2 weeks of work done in a day, from story to implementation plan to implementation to testing to validation to PR.

The shift was simple. Instead of me being the bottleneck making every micro-decision, I became the reviewer. The orchestrator agent reasons and does the work. I review the code with intent and make sure it passes requirements, meets our coding standards, and follows our architectural patterns.

This Weekend: The Passion Project

This weekend, I applied the same orchestrator workflow to a project I've been wanting to release but have felt overwhelmed by. Same approach. Same pattern. Let the agents build. I review with intent.

I'm building a personal plugin marketplace for Claude Code skills and workflows, and I plan on open sourcing it and sharing it with the community. Stay tuned.

Reviewing With Intent

Reviewing generated output has never been more important than it is right now. I think that's the biggest insight I got from this whole experience. It's not a cost. It's the skill. At the same time, I learned that agents do a great job explaining their reasoning and decisions, which helps me upskill and expand my knowledge in areas I'm not as familiar with.

Agents can write code fast. They can scaffold, test, validate. But they will still miss things, and engineers will still be needed to verify their output. Requirements that don't translate perfectly. Edge cases that live in your head and not in the docs. Business logic that only makes sense if you've been in the meetings. Gaps that are introduced by async communication.

Over the course of the weekend, my confidence in this workflow increased dramatically, from about 60% to around 85%. Every session sharpens the constraints. Every review teaches the orchestrator what "good" looks like in my codebase.

What I've Learned

This workflow has taught me that it really is important to understand the infrastructure, constraints, and patterns of your codebase before you start building. That's where the beauty and craft of agentic development are. It is not a one-size-fits-all approach, but everything is customizable and can be tailored to your specific needs. In a few weeks, I'll probably have another approach that is totally opposite to this one.

As a side quest, I also learned how to screen mirror my Mac mini to our Samsung TV so I could hang out with my wife and kid while developing. Very fun times! Here is a photo.

TV dev setup - screen mirroring Mac mini while hanging out with family

Next Steps

One thing I'm still figuring out is how to include screenshot validations directly in the PR. Right now the orchestrator takes UI screenshots during validation, but they stay local. Attaching them to the PR body or as comments would give human reviewers much better context, especially for UI changes where a diff alone doesn't tell the full story.
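One direction I'm considering is generating a Markdown comment that embeds the screenshots, since PR descriptions and comments on GitHub render Markdown images. The sketch below is only an idea, not something the orchestrator does today. It assumes the screenshots have already been uploaded somewhere reachable by URL, which is exactly the part I haven't solved, and `screenshots_markdown` and the example URLs are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable


def screenshots_markdown(urls: Iterable[str], title: str = "Validation screenshots") -> str:
    """Build a Markdown section that embeds one image per screenshot URL.

    Assumption: every URL already points at an uploaded image. Getting the
    local screenshots uploaded is out of scope for this sketch.
    """
    urls = list(urls)
    if not urls:
        return ""  # nothing to attach, so add nothing to the PR

    lines = [f"### {title}", ""]
    for url in urls:
        # Use the file name without its extension as a label,
        # e.g. ".../login-page.png" becomes "login-page".
        name = PurePosixPath(url).stem
        lines += [f"**{name}**", "", f"![{name}]({url})", ""]
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    print(screenshots_markdown([
        "https://example.com/shots/login-page.png",
        "https://example.com/shots/settings-modal.png",
    ]))
```

The resulting text could then be appended to the PR body or posted as a comment, so reviewers see the UI state next to the diff.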