Code Review in an Agentic World

Core Questions

  • Which reviews are automated?
  • What do humans still review?
  • How is drift prevented?

Human code review for agent PRs doesn't scale. If agents can produce 50 PRs a day, humans can't review 50 PRs a day — not carefully, anyway. The answer isn't to skip review; it's to redesign what gets reviewed, by whom, and how.

The review scaling problem

Traditional code review assumes humans write code at human speed. One developer, a few PRs per day. Reviewers can read every line, understand the context, catch bugs.

Agents break this assumption. They can produce more code, faster. If you try to maintain the same review process, you get:

  • Review backlog: PRs pile up faster than humans can review them.
  • Rubber stamping: Reviewers skim and approve to clear the queue.
  • Reviewer burnout: Humans spend all day reading agent code instead of writing their own.

The new model: tiered review

Not all code needs the same level of review. Tier your review process:

  • Automated review: Agents and tools check for common issues. No human needed unless something fails.
  • Lightweight human review: Human glances at the diff, checks the summary, approves if nothing looks wrong.
  • Deep human review: Human reads every line, understands the design, thinks about edge cases.

The tier depends on what changed, not who wrote it.

What can be automated

Some review tasks are mechanical. They follow rules. They can be automated completely:

Automatable Review Tasks

  • Linting & formatting: ESLint, Prettier, gofmt. Either it passes or it doesn't. No judgment needed.
  • Type checking: TypeScript, mypy, type annotations. The compiler is a reviewer.
  • Test coverage: Did coverage decrease? Are new functions tested? Measurable.
  • Security scanning: Semgrep, CodeQL, dependency vulnerabilities. Known patterns, automated detection.
  • Architecture rules: Import restrictions, layer violations, dependency cycles. Enforced by tools like dependency-cruiser or ArchUnit.
  • API compatibility: Did the public API change? Breaking changes detected by schema diffing.

If these checks pass, a human reviewer doesn't need to verify them. The human can focus on things automation can't catch.
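
In CI this split can be enforced mechanically: run the automated checks first, and only route the PR onward when they all pass. A minimal sketch in TypeScript, assuming the repo exposes npm scripts named lint, typecheck, and test (swap in whatever your pipeline actually runs):

```ts
// ci-gate.ts - run the mechanical checks before any reviewer (human or agent) looks.
// Assumes npm scripts named "lint", "typecheck", and "test" exist in this repo;
// adjust the commands to whatever your pipeline actually uses.
import { execSync } from "node:child_process";

const checks: Array<{ name: string; command: string }> = [
  { name: "Linting & formatting", command: "npm run lint" },
  { name: "Type checking", command: "npm run typecheck" },
  { name: "Tests & coverage", command: "npm test -- --coverage" },
];

let failed = false;

for (const check of checks) {
  try {
    execSync(check.command, { stdio: "inherit" });
    console.log(`PASS: ${check.name}`);
  } catch {
    console.error(`FAIL: ${check.name}`);
    failed = true;
  }
}

// A non-zero exit blocks the PR, so humans only ever see PRs where all of this passed.
process.exit(failed ? 1 : 0);
```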

What humans still review

Some things require human judgment. These are the high-value review tasks:

Human Review Required

  • Design & architecture: Is this the right approach? Does it fit the system's patterns? Will it scale? These require understanding context that agents don't have.
  • Business logic correctness: Does this actually implement what was requested? Are the edge cases handled according to business rules? Only someone who understands the domain can judge.
  • Security-sensitive code: Authentication, authorization, cryptography, data handling. Even with security scanners, human review is essential for sensitive areas.
  • Novel patterns: First use of a new library, new architectural pattern, new integration. Humans decide if this is a good precedent.
  • User-facing copy: Error messages, help text, UI labels. Tone and clarity require human judgment.

Agent reviewers

Agents can review code too. Not just linting — actual code review. An agent reviewer can:

  • Summarize what the PR does
  • Flag potential issues
  • Check if tests cover the changes
  • Verify the PR matches the spec
  • Suggest improvements
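
Agent reviews are easier to act on (and to audit) when the reviewer returns a structured result instead of free text. Here is a sketch of one possible shape; the interface and field names are illustrative assumptions, not a standard format:

```ts
// Hypothetical structured output for a reviewer agent; field names are assumptions.
interface AgentReview {
  summary: string;          // what the PR does, in the agent's words
  matchesSpec: boolean;     // does the diff implement the linked issue?
  concerns: string[];       // potential bugs, missing tests, risky changes
  questions: string[];      // things the author should answer
  suggestedTier: "auto-merge" | "lightweight-human" | "deep-human";
}

// Turn the structured review into the comment the agent posts on the PR.
function formatReviewComment(review: AgentReview): string {
  return [
    `**Summary:** ${review.summary}`,
    `**Matches spec:** ${review.matchesSpec ? "yes" : "no"}`,
    review.concerns.length
      ? `**Concerns:**\n${review.concerns.map((c) => `- ${c}`).join("\n")}`
      : "**Concerns:** none",
    review.questions.length
      ? `**Questions:**\n${review.questions.map((q) => `- ${q}`).join("\n")}`
      : "**Questions:** none",
    `**Suggested review tier:** ${review.suggestedTier}`,
  ].join("\n\n");
}
```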

The agent review workflow

1. PR opened: Agent-authored or human-authored, doesn't matter. Triggers the reviewer agent.
2. Agent review: The reviewer agent reads the diff, the linked issue, the tests. Posts a review comment with summary, concerns, questions.
3. Author response: If the reviewer raised issues, the author (agent or human) addresses them. May require iteration.
4. Human review: For PRs that require it (based on what changed), a human does final review. Agent review has already caught the obvious issues.

This is agents reviewing agents. It sounds recursive but it works. The reviewer agent has a different job than the author agent — it's looking for problems, not building features.
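
To make the hand-off concrete, here is a sketch of the decision at the end of step 2, assuming a reviewer agent that reports blocking concerns and a suggested tier; the types and helper names are hypothetical, not taken from any particular tool:

```ts
// Hypothetical escalation logic between steps 2, 3, and 4.
type ReviewTier = "auto-merge" | "lightweight-human" | "deep-human";

interface AgentReviewOutcome {
  blockingConcerns: string[]; // issues the author must address before a human looks
  suggestedTier: ReviewTier;  // based on what the diff touches
}

function nextStep(outcome: AgentReviewOutcome): string {
  if (outcome.blockingConcerns.length > 0) {
    // Step 3: send the PR back to its author (agent or human) before spending human time.
    return "request-changes";
  }
  // Step 4: the remaining review effort depends on the tier, not on who wrote the code.
  if (outcome.suggestedTier === "auto-merge") return "merge-when-ci-is-green";
  if (outcome.suggestedTier === "lightweight-human") return "request-one-human-approval";
  return "request-deep-review-from-code-owners";
}
```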

Review tiers by change type

Define which tier of review applies based on what the PR changes:

Review Tier Matrix

  • Documentation only (README, comments, JSDoc; low risk): auto-merge if CI passes.
  • Test changes only (new tests, test refactoring; improves safety): agent review + auto-merge.
  • Standard code changes (bug fixes, features, refactoring in non-sensitive areas): agent review + lightweight human review.
  • Sensitive areas (auth, payments, data handling, security configs): agent review + deep human review.
  • Architecture changes (new patterns, API changes, infrastructure modifications): design review + multiple humans.
Implement this with CODEOWNERS and branch protection rules. Files in /docs have different rules than files in /src/auth.
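
A minimal sketch of that routing in TypeScript, assuming an illustrative repository layout; the path prefixes and tier names are examples, and CODEOWNERS plus branch protection still decide who has to approve each tier:

```ts
// Map changed file paths to a review tier. The prefixes and tier names are illustrative;
// CODEOWNERS and branch protection enforce who must approve, this just picks the tier.
type ReviewTier =
  | "auto-merge"
  | "agent-plus-auto-merge"
  | "agent-plus-lightweight-human"
  | "agent-plus-deep-human"
  | "design-review";

// Ordered from least to most scrutiny; the strictest matching tier wins.
const tierOrder: ReviewTier[] = [
  "auto-merge",
  "agent-plus-auto-merge",
  "agent-plus-lightweight-human",
  "agent-plus-deep-human",
  "design-review",
];

const rules: Array<{ prefix: string; tier: ReviewTier }> = [
  { prefix: "docs/", tier: "auto-merge" },                    // documentation only
  { prefix: "tests/", tier: "agent-plus-auto-merge" },        // test changes only
  { prefix: "src/auth/", tier: "agent-plus-deep-human" },     // sensitive area
  { prefix: "src/payments/", tier: "agent-plus-deep-human" }, // sensitive area
  { prefix: "infra/", tier: "design-review" },                // architecture changes
];

function tierForFile(file: string): ReviewTier {
  const rule = rules.find((r) => file.startsWith(r.prefix));
  return rule ? rule.tier : "agent-plus-lightweight-human";   // standard code change
}

function requiredTier(changedFiles: string[]): ReviewTier {
  return changedFiles
    .map(tierForFile)
    .reduce<ReviewTier>(
      (strictest, tier) =>
        tierOrder.indexOf(tier) > tierOrder.indexOf(strictest) ? tier : strictest,
      "auto-merge",
    );
}

// A docs-only PR auto-merges; one that also touches auth still needs deep human review.
console.log(requiredTier(["docs/setup.md"]));                        // "auto-merge"
console.log(requiredTier(["docs/setup.md", "src/auth/session.ts"])); // "agent-plus-deep-human"
```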

Preventing drift

When agents produce lots of code quickly, architectural drift happens fast. Small deviations compound. Suddenly your codebase has three different ways to do the same thing.

Anti-drift strategies

  • Architecture decision records (ADRs): Document decisions about patterns. Agents can read them. Reviewers can check compliance.
  • Pattern libraries: Instead of "don't do X," show "here's how we do X." Example code that agents can reference.
  • Automated pattern detection: Lint rules that catch known anti-patterns. Fail the build when patterns drift (see the sketch after this list).
  • Periodic architecture review: Weekly or monthly human review of overall patterns. Catch drift before it compounds.
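
As one concrete form of automated pattern detection, the sketch below fails the build when a banned pattern drifts back in. The directory layout, the pattern (bare fetch() outside src/api/), and the script itself are assumptions; in practice this is often expressed as a lint rule, such as an ESLint import restriction, rather than a standalone script:

```ts
// anti-pattern-check.ts - fail the build when a known anti-pattern drifts back in.
// Hypothetical example: the team decided all HTTP calls go through src/api/,
// so a bare fetch() anywhere else is flagged. Directory and pattern are assumptions.
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

const SRC_DIR = "src";
const ALLOWED_DIR = join("src", "api");
const ANTI_PATTERN = /\bfetch\s*\(/;

// Recursively collect .ts files under a directory.
function sourceFiles(dir: string): string[] {
  return readdirSync(dir, { withFileTypes: true }).flatMap((entry) => {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) return sourceFiles(full);
    return full.endsWith(".ts") ? [full] : [];
  });
}

const violations = sourceFiles(SRC_DIR)
  .filter((file) => !file.startsWith(ALLOWED_DIR))
  .filter((file) => ANTI_PATTERN.test(readFileSync(file, "utf8")));

for (const file of violations) {
  console.error(`Anti-pattern: direct fetch() outside src/api/ in ${file}`);
}

process.exit(violations.length > 0 ? 1 : 0);
```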

What goes wrong

  • Rubber stamp reviews: Humans can't keep up. They approve without reading. Bad code ships. Bugs pile up. Trust in the agent erodes.
  • Review theater: Lots of comments, no substance. Nitpicking style while missing logic bugs. Reviews feel thorough but catch nothing important.
  • Agents reviewing themselves: Same agent writes and reviews. Blind spots are shared. Use different agents or configurations for author vs. reviewer.
  • Over-automation: Everything auto-merges. Nobody looks at anything. Subtle bugs accumulate. Eventually something big breaks and nobody understands the code.

Summary

  • Tier your reviews: not all code needs the same level of scrutiny.
  • Automate mechanical checks. Reserve humans for judgment calls.
  • Agent reviewers can pre-screen PRs, catching issues before humans look.
  • Prevent drift with ADRs, pattern libraries, and automated enforcement.
