
PR Review in Practice

Core Questions

  • What does a real agent-authored PR review look like?
  • Who reviews what, and when does a human step in?
  • How do you scale review when agents are opening dozens of PRs a day?

An agent opens 20 PRs today. Tomorrow it opens 30. The week after, 50. Traditional code review doesn't scale to agentic workloads. You need a different approach: tiered review, auto-merge criteria, and humans focusing on what humans are good at — intent, architecture, and edge cases — not syntax and formatting.

The review scaling problem

Traditional code review assumes humans write code at human speed. A developer might open 2-3 PRs a day. With two reviewers per PR and a 24-hour turnaround, the math works. Agents break this assumption.

PR Volume Comparison

Human developer

2-3 PRs/day, each representing hours of work. Deep context, complex changes. Worth spending 30 minutes to review.

Agent (current)

20-50 PRs/day, each representing minutes of work. Narrowly scoped, well-defined changes. At 30 minutes per PR, the high end is 25 hours of review a day.

Agent (scaled)

100+ PRs/day across multiple repos. Human review for all of them is impossible. Need automation.

The solution isn't to skip review — it's to review differently. Automate what can be automated. Have agents review agents. Reserve human attention for what humans uniquely provide.

Tiered review system

Not all PRs need the same level of review. A one-line typo fix and a new authentication system are fundamentally different. Tier your review process by risk and complexity.

Review Tiers

Tier 1: Auto-merge

Low-risk, well-defined changes. Dependency updates within semver bounds, typo fixes, formatting, documentation. CI passes → merge.

Tier 2: Agent review

Medium-risk changes. Bug fixes, small features, refactors within established patterns. Another agent reviews before merge.

Tier 3: Human review

Higher-risk changes. New APIs, database schema changes, security-adjacent code. Human reviews architecture and intent.

Tier 4: Deep human review

Critical changes. Auth systems, payment flows, data migrations. Multiple humans, extended review period, extra scrutiny.

# .github/review-tiers.yml
tiers:
  auto_merge:
    criteria:
      - path_match: ["*.md", "docs/**"]
      - path_match: ["package-lock.json", "pnpm-lock.yaml"]
        condition: "semver_compatible"
      - label: "typo-fix"
    requirements:
      - ci_pass: true
      
  agent_review:
    criteria:
      - lines_changed: "<100"
      - files_changed: "<5"
      - no_path_match: ["**/auth/**", "**/payment/**", "**/migrations/**"]
    requirements:
      - ci_pass: true
      - agent_approval: 1
      
  human_review:
    criteria:
      - path_match: ["**/api/**", "lib/db/**"]
      - lines_changed: ">=100"
      - new_dependencies: true
    requirements:
      - ci_pass: true
      - human_approval: 1
      
  deep_review:
    criteria:
      - path_match: ["**/auth/**", "**/payment/**", "**/migrations/**"]
      - label: "security"
      - label: "breaking-change"
    requirements:
      - ci_pass: true
      - human_approval: 2
      - review_period: "24h"
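
A small script run in CI can apply rules like these to each PR. The sketch below is illustrative rather than a drop-in implementation: the PrMetadata shape, the thresholds, and the path checks are assumptions, and glob patterns are reduced to simple substring and prefix matches.

// scripts/classify-review-tier.ts (hypothetical path; types and thresholds are illustrative)

type ReviewTier = "auto_merge" | "agent_review" | "human_review" | "deep_review";

interface PrMetadata {
  files: string[];        // changed file paths
  linesChanged: number;   // additions plus deletions
  labels: string[];
  newDependencies: boolean;
}

const SENSITIVE = ["/auth/", "/payment/", "/migrations/"];
const isDocsOnly = (f: string) => f.endsWith(".md") || f.startsWith("docs/");

function classify(pr: PrMetadata): ReviewTier {
  const touchesSensitive = pr.files.some((f) => SENSITIVE.some((p) => f.includes(p)));

  // Evaluate the strictest tiers first so a risky change can never fall through to auto-merge.
  if (touchesSensitive || pr.labels.includes("security") || pr.labels.includes("breaking-change")) {
    return "deep_review";
  }
  if (
    pr.files.some((f) => f.includes("/api/") || f.startsWith("lib/db/")) ||
    pr.linesChanged >= 100 ||
    pr.newDependencies
  ) {
    return "human_review";
  }
  if (pr.files.every(isDocsOnly) || pr.labels.includes("typo-fix")) {
    return "auto_merge";
  }
  return "agent_review"; // small, non-sensitive changes fall to agent review by default
}

Checking the strictest tier first means a PR that matches both auto-merge and deep-review criteria lands in deep review, which is the conservative failure mode you want.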

Auto-merge criteria

Auto-merge isn't "no review" — it's "automated review." The criteria define what automated checks must pass before merge happens without human intervention.

Auto-Merge Requirements

All CI checks pass

Tests, linting, type checking, build — everything green. No exceptions.

No new warnings

Linter warnings, deprecation notices, security advisories. Can't auto-merge if it makes things worse.

Coverage not decreased

Test coverage must stay the same or improve. Dropping coverage requires human review.

No security vulnerabilities

Dependency scanning, SAST, secret detection. Any security finding blocks auto-merge.

Matches change scope

PR description matches actual changes. If the agent says "fix typo" but touches API code, escalate.
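
A common way to enforce these gates is a single required status check that runs each one in order and reports the first failure. The sketch below is one possible shape, assuming a Checks interface you would wire to your own CI, coverage, and scanning tools; none of the method names refer to a specific vendor API.

// scripts/auto-merge-gate.ts (illustrative sketch; the Checks interface is an assumption)

interface Checks {
  ciAllGreen(): Promise<boolean>;             // tests, lint, types, build
  newWarnings(): Promise<number>;             // lint warnings, deprecations, advisories
  coverageDelta(): Promise<number>;           // head coverage minus base coverage
  securityFindings(): Promise<number>;        // dependency scan, SAST, secret detection
  descriptionMatchesDiff(): Promise<boolean>; // "fix typo" must not touch API code
}

interface GateResult { ok: boolean; reason?: string }

async function autoMergeGate(checks: Checks): Promise<GateResult> {
  if (!(await checks.ciAllGreen()))             return { ok: false, reason: "CI not green" };
  if ((await checks.newWarnings()) > 0)         return { ok: false, reason: "introduces new warnings" };
  if ((await checks.coverageDelta()) < 0)       return { ok: false, reason: "test coverage decreased" };
  if ((await checks.securityFindings()) > 0)    return { ok: false, reason: "security findings present" };
  if (!(await checks.descriptionMatchesDiff())) return { ok: false, reason: "scope mismatch, escalate" };
  return { ok: true }; // every gate passed: safe to auto-merge
}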

Agent-to-agent review

When humans can't review every PR, agents review agents. A separate reviewer agent examines the PR with different context and priorities than the authoring agent.

Agent Reviewer Checklist

Does it match the spec?

Compare the PR against the original task specification. Does the implementation address what was asked?

Are there obvious bugs?

Off-by-one errors, missing null checks, async/await issues. Pattern matching for common mistakes.

Does it follow conventions?

Naming patterns, file structure, code organization. Check against codebase conventions documented in AGENTS.md.

Are edge cases handled?

Empty inputs, error states, boundary conditions. Generate edge cases and verify they're addressed.

Is the scope appropriate?

Did the author stay focused or wander? Flag scope creep or unrelated changes for human attention.

# Example agent review comment

## Review Summary

**Verdict: ✅ Approve with minor suggestions**

### Spec Alignment
The PR addresses the task "Fix login validation error messages" correctly:
- ✅ Error messages are now user-friendly
- ✅ Error display component added
- ✅ Tests cover validation cases

### Code Quality
- **Line 24**: Consider extracting the validation regex to a constant
- **Line 45**: The error message could include the invalid character for better debugging

### Edge Cases
- ✅ Empty email handled
- ✅ Invalid format handled  
- ⚠️ Consider: What happens with very long email addresses? (>254 chars)

### Scope
- ✅ Changes are focused on validation
- ✅ No unrelated modifications

### Recommendation
Approve and merge. Minor suggestions are optional improvements, not blockers.
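
One way to drive the reviewer agent is to turn the checklist above into its instructions and require a structured verdict it can post as a comment like the one shown. A rough sketch under assumptions: the ReviewVerdict shape, the prompt text, and callReviewerModel are all placeholders for whatever agent runtime you use.

// scripts/agent-review.ts (sketch; callReviewerModel stands in for your agent runtime)

interface ReviewVerdict {
  verdict: "approve" | "request_changes" | "escalate_to_human";
  specAlignment: string[];  // findings against the original task specification
  codeQuality: string[];    // file:line comments with suggested fixes
  edgeCases: string[];      // unhandled inputs or boundary conditions
  scopeNotes: string[];     // scope creep or unrelated changes
  confidence: number;       // 0..1; low confidence forces escalation
}

const CHECKLIST_PROMPT = `
Review the pull request against the original task specification.
Check, in order: spec alignment, obvious bugs, codebase conventions
(see AGENTS.md), edge cases, and scope. Respond as JSON matching the
ReviewVerdict shape. If you cannot evaluate something, escalate.
`;

async function reviewPr(
  spec: string,
  diff: string,
  callReviewerModel: (prompt: string) => Promise<ReviewVerdict>,
): Promise<ReviewVerdict> {
  const prompt = `${CHECKLIST_PROMPT}\n\nTask spec:\n${spec}\n\nDiff:\n${diff}`;
  const verdict = await callReviewerModel(prompt);

  // Low confidence is an escalation trigger, never a quiet approval.
  if (verdict.verdict === "approve" && verdict.confidence < 0.7) {
    return { ...verdict, verdict: "escalate_to_human" };
  }
  return verdict;
}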

Human review focus

When humans review agent PRs, they should focus on what humans do best: understanding intent, evaluating architecture decisions, and catching subtle issues that require domain expertise.

Human Review Priorities

Focus on

  • Is this the right approach?
  • Does it fit the architecture?
  • Are there security implications?
  • Will this scale?
  • Edge cases that need domain knowledge
  • User experience considerations

Skip (automated)

  • Formatting and style
  • Import ordering
  • Variable naming conventions
  • Test coverage numbers
  • Dependency versions
  • Linting errors

Principle

Review the intent and the architecture, not the syntax

If an agent wrote syntactically correct code that solves the wrong problem, that's the failure that matters. If the code has a typo but the approach is sound, that's an easy fix. Prioritize your attention accordingly.

Review SLAs

Agents work fast. If reviews take days, you lose the throughput benefit. Set SLAs for each review tier to keep the pipeline flowing.

Review SLA Targets

Tier 1 (auto-merge)

< 30 minutes

CI completes, auto-merge triggers. No human wait time.

Tier 2 (agent review)

< 2 hours

Agent reviewer picks up PR, reviews, approves or requests changes.

Tier 3 (human review)

< 24 hours

Human reviewer assigned, completes review within a business day.

Tier 4 (deep review)

< 48 hours

Extended review period. Multiple reviewers, thorough examination.

Track SLA compliance. If human reviews consistently miss targets, either add reviewers or shift more PRs to the agent review tier.
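
Tracking SLA compliance can be a scheduled job that ages open PRs against their tier's target and alerts on breaches. A minimal sketch, assuming a generic OpenPr shape rather than any particular hosting API:

// scripts/check-review-slas.ts (sketch; OpenPr and the tier names are illustrative)

const SLA_HOURS: Record<string, number> = {
  auto_merge: 0.5,
  agent_review: 2,
  human_review: 24,
  deep_review: 48,
};

interface OpenPr {
  number: number;
  tier: string;
  openedAt: Date;
}

function findSlaBreaches(prs: OpenPr[], now = new Date()): OpenPr[] {
  return prs.filter((pr) => {
    const ageHours = (now.getTime() - pr.openedAt.getTime()) / 3_600_000;
    return ageHours > (SLA_HOURS[pr.tier] ?? 24); // unknown tiers default to the human target
  });
}

// Run on a schedule: alert on breaches, and track the breach rate per tier over time.
// A persistently high rate in one tier means adding reviewers or re-tiering PRs.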

Escalation paths

Sometimes an agent reviewer finds something it can't evaluate. Or a human reviewer finds something that needs specialist attention. Clear escalation paths prevent PRs from getting stuck.

Escalation Triggers

Agent → Human

  • Agent reviewer confidence below threshold
  • Changes touch paths flagged for escalation (auth, payments)
  • Agent can't determine if change matches spec
  • Unusual patterns not seen in training data

Human → Specialist

  • Security implications identified
  • Performance concerns requiring load testing
  • Legal/compliance questions
  • Architecture decisions above the reviewer's expertise

Any → Original Author

  • Requirements unclear
  • Spec seems wrong or incomplete
  • Multiple valid approaches, need decision
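
Escalation works better when the triggers are encoded rather than decided in the moment. Below is a small sketch mapping the triggers above to a destination; the trigger names and the routing are illustrative assumptions, not a fixed scheme.

// scripts/route-escalation.ts (illustrative sketch; trigger names are assumptions)

type EscalationTarget = "human_reviewer" | "specialist" | "original_author";

type Trigger =
  | "low_reviewer_confidence"   // agent reviewer unsure of its own verdict
  | "sensitive_path"            // auth, payments, migrations
  | "spec_mismatch_unclear"     // can't tell whether the change matches the spec
  | "security_implication"      // identified during human review
  | "performance_concern"       // needs load testing
  | "requirements_unclear";     // spec seems wrong or incomplete

function routeEscalation(trigger: Trigger): EscalationTarget {
  switch (trigger) {
    case "low_reviewer_confidence":
    case "sensitive_path":
    case "spec_mismatch_unclear":
      return "human_reviewer";    // agent to human
    case "security_implication":
    case "performance_concern":
      return "specialist";        // human to specialist
    case "requirements_unclear":
      return "original_author";   // any to author: needs a decision, not a review
  }
}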

Comment quality and signal

Review comments should be actionable. "This looks wrong" is noise. "This will fail when email is empty — add a null check" is signal.

Comment Categories

BLOCKER

Must fix before merge. Bug, security issue, or broken functionality. Clear action item with suggested fix.

SUGGESTION

Improvement opportunity. Not required but recommended. Author decides whether to address.

QUESTION

Need clarification. Why was this approach chosen? Is this behavior intentional? Doesn't block but needs answer.

PRAISE

Good work callout. Reinforces positive patterns. Yes, even for agents — it helps tune behavior.
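
Keeping the category in the comment's structure, not just its prose, keeps the signal machine-readable, which matters when agents are both writing and reading reviews. A minimal sketch of what that shape might look like; the fields are assumptions, not a specific platform's comment API.

// Hypothetical structured review comment; the shape is illustrative.

type CommentCategory = "BLOCKER" | "SUGGESTION" | "QUESTION" | "PRAISE";

interface ReviewComment {
  category: CommentCategory;
  file: string;
  line: number;
  body: string;       // the actionable part: what's wrong and the suggested fix
  blocking: boolean;  // only BLOCKER comments should ever block the merge
}

const example: ReviewComment = {
  category: "BLOCKER",
  file: "lib/validation/email.ts",
  line: 45,
  body: "This will fail when email is empty. Add a null/empty check before validating.",
  blocking: true,
};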

What goes wrong

Review backlog grows

Agents produce PRs faster than reviewers process them. Backlog grows. Agents sit idle waiting for approval.

Fix: Shift more PRs to auto-merge and agent review tiers. Add human reviewers. Set stricter SLAs with alerts when breached.

Rubber-stamping

Overwhelmed reviewers approve without reading. Quality slips. Bugs reach production that should have been caught.

Fix: Track metrics per reviewer: time spent, bugs caught, post-merge issues. Make review quality visible. Reduce load by improving auto-merge criteria.

Wrong tier classification

High-risk change gets auto-merged because it didn't match escalation patterns. Security vulnerability ships.

Fix: Conservative tier rules — err toward human review. Audit auto-merged PRs periodically. Update the rules when gaps are found.

Agent reviewer misses issues

Agent approves a PR with a subtle bug. A human might not have caught it either, but now there's no human in the loop.

Fix: Sample audit of agent-approved PRs. Track post-merge bug rate by review type. Tune agent reviewer or escalate more to humans if quality drops.

Summary

  • Traditional code review doesn't scale to agentic workloads — you need tiered review
  • Auto-merge for low-risk changes, agent review for medium, human review for high-risk
  • Humans should focus on intent and architecture, not syntax — automate the rest
  • Set SLAs per tier and track compliance — fast review preserves agent throughput
  • Clear escalation paths prevent PRs from getting stuck when reviewers are uncertain
