In 6-12 months, agents may be the exclusive reviewer for most of our code. The job is changing from “how does the code get written and reviewed” to three different questions:
- Should we build this?
- Is this a solid implementation plan I can hand to agents?
- How do we prove the agents built what we wanted?
A few people pushed back with a related question: if agents are writing and reviewing our code, how do humans keep them from producing increasingly bad code at incredible speed, especially as we have less context on the actual code?
My answer: agents will soon write better code than we do, but they still struggle with the bigger picture - what to build, where it fits, what else it touches. All of this boils down to human judgment in the face of sycophantic models, which tend to think all of your ideas are great unless you tell them to disagree, and then they hate all of your ideas.
The fix is, in part, more agents.
Agents already write better code than humans
If I took one of Opus’s PRs with me in a time machine to two years ago and had an engineer at that time implement the same feature, I’d bet Opus’s PR wins more often than not. Layer on a /simplify pass and a GPT 5.5 code review and that gap widens substantially. In 6-12 months, models will consistently write better code than humans.
Assuming the code should exist in the first place, the problem isn’t “agents can’t write good code.” They can, and we should treat situations where they don’t as bugs to fix through better rules and more context.
Where they do struggle is the bigger picture: should this exist, where does it fit in the system, what else does it touch. Two examples:
- Solving flaky tests without backend context. Agents almost always add retries instead of asking for more context so they can find the root cause.
- Fixing an issue in one repo. Agents don’t know whether similar issues exist in other repos, which leads to config drift, code drift, and partial fixes.
More agents, not fewer
How do we make them better at the bigger picture? More agents. Agents could:
- Explore each repo and create a Devin-style wiki with architecture diagrams, upstream/downstream services, and links to important files.
- Take each individual wiki and create a higher-level “System Architecture” wiki.
- Update these wikis weekly.
- Work with us to create our ideal north star architecture.
- Compare existing wikis with the north star and help plan the highest-impact refactors to get closer and prevent drift through linters and rules.
This isn’t fantasy. It’s achievable today.
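As one illustration of the last step in the list above, here is a minimal sketch of a drift check. It assumes the wikis can be reduced to a map of service to downstream dependencies; the service names, the data shapes, and the `find_drift` helper are all hypothetical and not taken from any real tool.

```python
# Hypothetical data: the dependencies each service is allowed to have in the
# north star architecture, versus what an agent-generated wiki reports today.
NORTH_STAR = {
    "checkout": {"payments", "inventory"},
    "payments": set(),
    "inventory": set(),
}

CURRENT_WIKI = {
    "checkout": {"payments", "inventory", "email"},
    "payments": {"inventory"},
    "inventory": set(),
}


def find_drift(north_star, current):
    """Return sorted (service, dependency) pairs that exist today
    but are not part of the north star architecture."""
    drift = []
    for service, deps in current.items():
        allowed = north_star.get(service, set())
        for dep in deps - allowed:
            drift.append((service, dep))
    return sorted(drift)


if __name__ == "__main__":
    for service, dep in find_drift(NORTH_STAR, CURRENT_WIKI):
        print(f"{service} -> {dep} is not in the north star")
```

A check like this could run in CI as one of the "linters and rules", failing the build when a new dependency appears that the north star does not allow.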
Engineering principles matter more, not less
Matt Pocock’s Claude Code for Real Engineers course introduced his now-viral skills. They’re informed by real engineering principles from books like Domain-Driven Design, Extreme Programming, A Philosophy of Software Design, and The Pragmatic Programmer: ideas like bounded contexts, deep modules, and tracer bullets.
Engineers should still read these books. With agent speed, the engineering principles are arguably more important than ever. The skills don’t answer questions for you - they guide you. They pose alternatives with tradeoffs and have you decide. You can zoom out on unfamiliar topics to learn more. You’re pair programming with an all-knowing wizard that lacks your context and judgment.
What is a senior engineer, anyway?
Another question I get: “How will we get new senior engineers if junior engineers aren’t writing or reviewing code?”
Google’s AI overview says a senior engineer is “an experienced technical professional responsible for leading complex projects, designing systems, and mentoring junior staff.”
Notice there’s no mention of writing or reviewing code. The role’s processes, languages, and tools have changed drastically across decades, but at the end of the day, we lead complex projects to solve customer problems.
In five years, we won’t have senior engineers as they are today, because the shape of the job will have changed. Senior engineers will still lead complex projects, using and adapting to the tools available to them.
Does that mean human code-writing and reviewing skills atrophy? Yes. The job’s shape has changed. I haven’t written a line of assembly since college. That doesn’t stop me from leading complex projects and solving customer problems.
But we should be careful here: some amount of code fluency is still part of how engineers build judgment. The goal isn’t zero code reading. It’s not needing to read every line.
How the job changes
For all but the most exploratory tasks, we’ll spend less time as humans-in-the-loop watching over individual agent shoulders and more time orchestrating many parallel threads. Past some number of parallel sessions, local agents put too high a tax on your hard drive and RAM, and we’ll need robust remote agents. The job shifts to:
- Pair programming with agents to decide what to build and create solid plans.
- Writing rules and feedback loops so agents write excellent code and catch bugs during review while running autonomously.
- Validating like PMs - checking acceptance criteria and doing exploratory QA until we can confidently deliver code we’ve proven to work.
This means we need:
- Better ways of validating agent-written code than line-by-line diffs. The plan is the most important review artifact, not the resulting code.
- Lower release risk through rollback-safe, backward-compatible changes and feature flags.
- Improved production monitoring and faster incident response.
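To make "rollback-safe, backward-compatible changes and feature flags" concrete, here is a minimal sketch. The flag name `discounted_totals`, the discount rule, and the dict-based flag store are assumptions for illustration only. The point is that the new path ships dark, the default keeps the old behavior, and rolling back means flipping a flag rather than reverting a deploy.

```python
def legacy_total(prices_cents):
    """Existing behavior: plain sum of item prices, in cents."""
    return sum(prices_cents)


def new_total(prices_cents):
    """New behavior behind the flag (hypothetical rule):
    10% off orders over 10000 cents."""
    total = sum(prices_cents)
    if total > 10000:
        return total - total // 10
    return total


def compute_total(prices_cents, flags):
    """Route to the new code path only when the flag is explicitly on.
    A missing flag falls back to the legacy path, so the change is safe
    to deploy before it is enabled."""
    if flags.get("discounted_totals", False):
        return new_total(prices_cents)
    return legacy_total(prices_cents)


if __name__ == "__main__":
    order = [5000, 7000]
    print(compute_total(order, {}))
    print(compute_total(order, {"discounted_totals": True}))
```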
Cosine: an orchestration runner
Before we get there, we need to improve our patterns, prompts, and workflows to allow for more and longer autonomous delivery sessions. Local agents give the tightest feedback loop.
I’ve been building a tool called Cosine - an orchestration runner that watches a project board and farms tickets to local coding agents running Claude Code and Codex CLIs. The agents use enterprise plans we’re already paying for instead of separate API charges, and the orchestrator stops when you’re close to running out of usage so there’s always some left for you.
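The "stop before usage runs out" behavior could be sketched roughly as follows. This is not Cosine's actual implementation: how plan usage is measured, the `can_dispatch` name, and the 20% reserve are all assumptions made for the example.

```python
def can_dispatch(used, limit, reserve_percent=20):
    """Return True while current usage still leaves `reserve_percent` of the
    plan's limit free for the human. `used` and `limit` are in whatever unit
    the plan reports (hypothetical here); integer math avoids float rounding."""
    if limit <= 0:
        return False
    return used * 100 < limit * (100 - reserve_percent)


if __name__ == "__main__":
    # With a limit of 100 units and a 20% reserve, dispatching stops at 80.
    for used in (50, 79, 80, 95):
        print(used, can_dispatch(used, 100))
```

The orchestrator would call a check like this before picking up each ticket, so a long autonomous session never consumes the last slice of the plan.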
The workflow:
- Run `cosine run --watch` to watch a project board for tickets in TODO status.
- On each detected issue, create a ticket with the `agent-codex` or `agent-claude` label.
- Cosine starts a local agent session in a separate worktree using the ticket description as the prompt.
- Upon completion, the agent marks the original ticket Done and creates a new ticket with the fix plan for human review.
- I review the plan, move approved plans to TODO, and close or ask for further investigation on others.
- Cosine picks up the approved ticket, executes the fix, and opens a PR.
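The two-pass flow above (investigate, human review, execute) can be modeled as a small in-memory sketch. Everything here is hypothetical: the `Ticket` fields, the `investigate`/`execute` kinds, the status names, and the `run_agent` callback stand in for the real project board and agent sessions, whose interfaces are not described in this post.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # simple ticket id generator for the sketch


@dataclass
class Ticket:
    title: str
    kind: str               # "investigate" or "execute" (hypothetical names)
    status: str = "TODO"    # "TODO", "REVIEW", or "DONE"
    id: int = field(default_factory=lambda: next(_ids))


def run_once(board, run_agent):
    """Process every TODO ticket once and return the PRs that were opened.

    Investigation tickets produce a plan ticket that waits in REVIEW for a
    human; execution tickets (approved plans) result in a PR."""
    prs = []
    for ticket in list(board):  # copy, because we append while iterating
        if ticket.status != "TODO":
            continue
        output = run_agent(ticket)  # stand-in for a local agent session
        ticket.status = "DONE"
        if ticket.kind == "investigate":
            board.append(Ticket(f"Plan: {output}", "execute", status="REVIEW"))
        else:
            prs.append(f"PR for ticket {ticket.id}")
    return prs


if __name__ == "__main__":
    board = [Ticket("Flaky login test", "investigate")]
    run_once(board, lambda t: t.title)         # pass 1: agent writes a plan
    board[1].status = "TODO"                   # human approves the plan
    print(run_once(board, lambda t: t.title))  # pass 2: agent opens a PR
```

The human gate is the status change between the two passes: nothing moves from REVIEW to TODO without a person approving the plan.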
The agents run in tmux workspaces backed by git worktrees in sandboxes so you can more safely use auto modes. They’re more steerable than hosted agents, and since they use the native CLI harnesses, they perform better too.
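For readers unfamiliar with the worktree-per-session setup, here is a rough sketch of the commands such a runner might issue for each ticket. The directory layout, branch naming, and the `session_commands` helper are assumptions, not Cosine's real behavior, and sandboxing is left out entirely. The function only builds the command lists, so it is safe to run anywhere.

```python
from pathlib import PurePosixPath


def session_commands(ticket_id, base_branch="main", root="../worktrees"):
    """Build (but do not run) the commands for an isolated agent session.

    `git worktree add -b <branch> <path> <commit-ish>` creates a new branch
    checked out in its own directory; `tmux new-session -d -s <name> -c <dir>`
    starts a detached session whose working directory is that worktree."""
    name = f"ticket-{ticket_id}"
    path = str(PurePosixPath(root) / name)
    return [
        ["git", "worktree", "add", "-b", f"agent/{name}", path, base_branch],
        ["tmux", "new-session", "-d", "-s", name, "-c", path],
    ]


if __name__ == "__main__":
    for cmd in session_commands(42):
        print(" ".join(cmd))
```

Because each ticket gets its own branch and directory, parallel agents never edit the same checkout, and a failed session can be discarded by removing its worktree.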
The shape of the job has changed
The engineering principles from Domain-Driven Design, A Philosophy of Software Design, and The Pragmatic Programmer aren’t going away. They’re more important than ever. Bounded contexts, deep modules, tracer bullets - these ideas guide judgment. The difference is how that judgment gets applied.
We spend less time typing and reviewing code, more time:
- Deciding what to build and what not to build
- Creating solid plans that agents can execute without clarification
- Writing rules and feedback loops so agents produce excellent output autonomously
- Validating behavior like product managers, not reading diffs like code reviewers
The shape of the job has already changed. Lean into more agents for the wikis, the architecture, the plans, the reviews. Spend your time on what only humans can do: deciding what to build and proving we built the right thing.
References
- The Five Levels: from Spicy Autocomplete to the Dark Factory - Dan Shapiro
- Matt Pocock’s Skills - Engineering skills for the agent era
- Claude Code for Real Engineers - Matt Pocock’s course
- Code Proven to Work - Simon Willison on validation
- OpenAI Symphony - Agent orchestration framework