08 / 25
Code Review in an Agentic World
Core Questions
- Which reviews are automated?
- What do humans still review?
- How is drift prevented?
Human code review for agent PRs doesn't scale. If agents can produce 50 PRs a day, humans can't review 50 PRs a day — not carefully, anyway. The answer isn't to skip review; it's to redesign what gets reviewed, by whom, and how.
The review scaling problem
Traditional code review assumes humans write code at human speed. One developer, a few PRs per day. Reviewers can read every line, understand the context, catch bugs.
Agents break this assumption. They can produce more code, faster. If you try to maintain the same review process, you get:
- Review backlog: PRs pile up faster than humans can review them.
- Rubber stamping: Reviewers skim and approve to clear the queue.
- Reviewer burnout: Humans spend all day reading agent code instead of writing their own.
The new model: tiered review
Not all code needs the same level of review. Tier your review process:
- Automated review: Agents and tools check for common issues. No human needed unless something fails.
- Lightweight human review: Human glances at the diff, checks the summary, approves if nothing looks wrong.
- Deep human review: Human reads every line, understands the design, thinks about edge cases.
The tier depends on what changed, not who wrote it.
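As a concrete sketch of the tier model (the type and field names below are illustrative assumptions, not any particular tool's API), the three tiers can be expressed as data describing what each one demands before a merge:

```typescript
// Illustrative sketch only: type and field names are assumptions for this example.

type ReviewTier = "automated" | "lightweight-human" | "deep-human";

// What each tier demands before a PR can merge.
interface TierRequirements {
  humanApprovalRequired: boolean; // does a person have to sign off?
  readEveryLine: boolean;         // is the reviewer expected to read the full diff?
}

const TIERS: Record<ReviewTier, TierRequirements> = {
  "automated":         { humanApprovalRequired: false, readEveryLine: false },
  "lightweight-human": { humanApprovalRequired: true,  readEveryLine: false },
  "deep-human":        { humanApprovalRequired: true,  readEveryLine: true },
};
```

Note that the PR's author never appears in this model: the tier is a property of the change. Routing a diff to one of these tiers is sketched later, under the review tier matrix.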
What can be automated
Some review tasks are mechanical. They follow rules. They can be automated completely:
Automatable Review Tasks
- Linting & formatting: ESLint, Prettier, gofmt. Either it passes or it doesn't. No judgment needed.
- Type checking: TypeScript, mypy, type annotations. The compiler is a reviewer.
- Test coverage: Did coverage decrease? Are new functions tested? Measurable.
- Security scanning: Semgrep, CodeQL, dependency vulnerabilities. Known patterns, automated detection.
- Architecture rules: Import restrictions, layer violations, dependency cycles. Enforced by tools like dependency-cruiser or ArchUnit.
- API compatibility: Did the public API change? Breaking changes detected by schema diffing.
If these checks pass, a human reviewer doesn't need to verify them. The human can focus on things automation can't catch.
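For illustration, here's a minimal gate script that runs the mechanical checks and blocks the merge if any fail. The specific commands (ESLint, Prettier, tsc, Vitest) are assumptions standing in for whatever your pipeline actually runs:

```typescript
// Minimal sketch of an automated review gate: run the mechanical checks and
// fail fast if any of them fail. The exact commands are assumptions; substitute
// whatever your project actually uses for linting, types, tests, and scanning.
import { spawnSync } from "node:child_process";

const checks: Array<{ name: string; cmd: string; args: string[] }> = [
  { name: "lint",   cmd: "npx", args: ["eslint", "."] },
  { name: "format", cmd: "npx", args: ["prettier", "--check", "."] },
  { name: "types",  cmd: "npx", args: ["tsc", "--noEmit"] },
  { name: "tests",  cmd: "npx", args: ["vitest", "run", "--coverage"] },
];

let failed = false;
for (const check of checks) {
  const result = spawnSync(check.cmd, check.args, { stdio: "inherit" });
  if (result.status !== 0) {
    console.error(`check failed: ${check.name}`);
    failed = true;
  }
}

// A non-zero exit blocks the merge; a human only gets involved when something fails.
process.exit(failed ? 1 : 0);
```

Wire something like this into CI so it runs on every PR; the human reviewer sees only the PRs where a check did not pass.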
What humans still review
Some things require human judgment. These are the high-value review tasks:
Human Review Required
- Design & architecture: Is this the right approach? Does it fit the system's patterns? Will it scale? These require understanding context that agents don't have.
- Business logic correctness: Does this actually implement what was requested? Are the edge cases handled according to business rules? Only someone who understands the domain can judge.
- Security-sensitive code: Authentication, authorization, cryptography, data handling. Even with security scanners, human review is essential for sensitive areas.
- Novel patterns: First use of a new library, a new architectural pattern, a new integration. Humans decide whether this sets a good precedent.
- User-facing copy: Error messages, help text, UI labels. Tone and clarity require human judgment.
Agent reviewers
Agents can review code too. Not just linting — actual code review. An agent reviewer can:
- Summarize what the PR does
- Flag potential issues
- Check if tests cover the changes
- Verify the PR matches the spec
- Suggest improvements
The agent review workflow
This is agents reviewing agents. It sounds recursive but it works. The reviewer agent has a different job than the author agent — it's looking for problems, not building features.
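Here's a rough sketch of what a reviewer-agent pass could look like. `callModel`, the prompt, and the response shape are placeholder assumptions for illustration, not a specific agent framework's API:

```typescript
// Sketch of an agent reviewer pass. `callModel` stands in for whatever LLM
// client you use; its signature here is a placeholder, not a real library API.
declare function callModel(prompt: string): Promise<string>;

interface AgentReview {
  summary: string;      // what the PR does, in the reviewer agent's words
  issues: string[];     // potential problems worth a human's attention
  matchesSpec: boolean; // does the diff implement what was asked for?
}

async function reviewPullRequest(diff: string, spec: string): Promise<AgentReview> {
  const prompt = [
    "You are reviewing a pull request written by another agent.",
    "Summarize what it does, flag potential issues, check whether the tests cover the changes,",
    'and state whether it matches the spec. Respond as JSON with keys "summary", "issues", and "matchesSpec".',
    `Spec:\n${spec}`,
    `Diff:\n${diff}`,
  ].join("\n\n");

  // The author agent and the reviewer agent should be different agents, or at
  // least different configurations, so they don't share blind spots.
  return JSON.parse(await callModel(prompt)) as AgentReview;
}
```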
Review tiers by change type
Define which tier of review applies based on what the PR changes:
Review Tier Matrix
- Documentation only: README, comments, JSDoc. Low risk.
- Test changes only: New tests, test refactoring. Improves safety.
- Standard code changes: Bug fixes, features, refactoring in non-sensitive areas.
- Sensitive areas: Auth, payments, data handling, security configs.
- Architecture changes: New patterns, API changes, infrastructure modifications.
Implement this with CODEOWNERS and branch protection rules. Files in /docs have different rules than files in /src/auth.
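As an illustration of the matrix in code, here's a sketch that maps changed paths to tiers. The path patterns and tier assignments are example choices for an assumed repo layout; in a real setup you'd express the same intent through CODEOWNERS and branch protection rather than a one-off script:

```typescript
// Example routing from changed paths to review tiers, mirroring the matrix above.
// The patterns and tier assignments are assumptions for a hypothetical repo layout.

type ReviewTier = "automated" | "lightweight-human" | "deep-human";

// Ordered rules: the first matching pattern wins, so sensitive areas come first.
const rules: Array<{ pattern: RegExp; tier: ReviewTier }> = [
  { pattern: /^src\/auth\//,     tier: "deep-human" },  // sensitive area
  { pattern: /^src\/payments\//, tier: "deep-human" },  // sensitive area
  { pattern: /^infra\//,         tier: "deep-human" },  // architecture/infrastructure
  { pattern: /^docs\//,          tier: "automated" },   // documentation only
  { pattern: /\.test\.ts$/,      tier: "automated" },   // test changes only
];

function tierForFile(file: string): ReviewTier {
  const rule = rules.find((r) => r.pattern.test(file));
  return rule ? rule.tier : "lightweight-human"; // default: standard code change
}

// The PR as a whole takes the strictest tier of any file it touches.
function tierForChange(changedFiles: string[]): ReviewTier {
  const order: ReviewTier[] = ["automated", "lightweight-human", "deep-human"];
  return changedFiles
    .map(tierForFile)
    .reduce<ReviewTier>((a, b) => (order.indexOf(a) >= order.indexOf(b) ? a : b), "automated");
}
```

With these example rules, `tierForChange(["docs/setup.md", "src/auth/session.ts"])` comes back as "deep-human": one sensitive file pulls the whole PR up to the strictest tier.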
Preventing drift
When agents produce lots of code quickly, architectural drift happens fast. Small deviations compound. Suddenly your codebase has three different ways to do the same thing.
Anti-drift strategies
- Architecture decision records (ADRs): write down the agreed approach so authors and reviewers, human or agent, work from the same reference.
- Pattern libraries: keep canonical examples of how things are done in this codebase, and point agents at them.
- Automated enforcement: encode architectural rules in tooling so violations fail CI instead of relying on reviewer memory. A minimal hand-rolled version is sketched below.
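A tiny sketch of the enforcement idea: scan a layer's files for imports it isn't allowed to make and fail CI on violations. The layer names, paths, and naive string matching are assumptions; purpose-built tools like dependency-cruiser or ArchUnit do this far more robustly:

```typescript
// Minimal sketch of automated architectural enforcement: fail CI when a file
// in one layer imports from a layer it shouldn't. Layer names, paths, and the
// naive string matching are assumptions for a hypothetical repo layout.
import { readFileSync, readdirSync, statSync } from "node:fs";
import { join } from "node:path";

// Example rule: UI code must not import persistence code directly.
const forbidden = [{ layer: "src/ui", mustNotImport: "/db/" }];

// Recursively collect .ts files under a directory.
function tsFilesUnder(dir: string): string[] {
  return readdirSync(dir).flatMap((name) => {
    const full = join(dir, name);
    if (statSync(full).isDirectory()) return tsFilesUnder(full);
    return full.endsWith(".ts") ? [full] : [];
  });
}

const violations: string[] = [];
for (const rule of forbidden) {
  for (const file of tsFilesUnder(rule.layer)) {
    const lines = readFileSync(file, "utf8").split("\n");
    // Naive check: flag any import line that mentions the forbidden path segment.
    if (lines.some((l) => l.trimStart().startsWith("import") && l.includes(rule.mustNotImport))) {
      violations.push(`${file} imports from a forbidden layer (${rule.mustNotImport})`);
    }
  }
}

if (violations.length > 0) {
  console.error("Architecture rule violations:\n" + violations.join("\n"));
  process.exit(1); // non-zero exit blocks the merge
}
```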
What goes wrong
- Rubber stamp reviews: Humans can't keep up. They approve without reading. Bad code ships, bugs pile up, and trust in the agent erodes.
- Review theater: Lots of comments, no substance. Nitpicking style while missing logic bugs. Reviews feel thorough but catch nothing important.
- Agents reviewing themselves: The same agent writes and reviews, so blind spots are shared. Use different agents or configurations for author and reviewer.
- Over-automation: Everything auto-merges and nobody looks at anything. Subtle bugs accumulate until something big breaks and nobody understands the code.
Summary
- Tier your reviews: not all code needs the same level of scrutiny.
- Automate mechanical checks. Reserve humans for judgment calls.
- Agent reviewers can pre-screen PRs, catching issues before humans look.
- Prevent drift with ADRs, pattern libraries, and automated enforcement.