PR Review in Practice
Core Questions
- What does a real agent-authored PR review look like?
- Who reviews what, and when does a human step in?
- How do you scale review when agents are opening dozens of PRs a day?
An agent opens 20 PRs today. Tomorrow it opens 30. The week after, 50. Traditional code review doesn't scale to agentic workloads. You need a different approach: tiered review, auto-merge criteria, and humans focusing on what humans are good at — intent, architecture, and edge cases — not syntax and formatting.
The review scaling problem
Traditional code review assumes humans write code at human speed. A developer might open 2-3 PRs a day. With two reviewers per PR and a 24-hour turnaround, the math works. Agents break this assumption.
PR Volume Comparison
Human developer
2-3 PRs/day, each representing hours of work. Deep context, complex changes. Worth spending 30 minutes to review.
Agent (current)
20-50 PRs/day, each representing minutes of work. Narrowly scoped, well-defined changes. At 30 minutes per PR, that's up to 25 hours of review per day.
Agent (scaled)
100+ PRs/day across multiple repos. Human review for all of them is impossible. Need automation.
The solution isn't to skip review — it's to review differently. Automate what can be automated. Have agents review agents. Reserve human attention for what humans uniquely provide.
Tiered review system
Not all PRs need the same level of review. A one-line typo fix and a new authentication system are fundamentally different. Tier your review process by risk and complexity.
Review Tiers
Tier 1 (auto-merge): Low-risk, well-defined changes. Dependency updates within semver bounds, typo fixes, formatting, documentation. CI passes → merge.
Tier 2 (agent review): Medium-risk changes. Bug fixes, small features, refactors within established patterns. Another agent reviews before merge.
Tier 3 (human review): Higher-risk changes. New APIs, database schema changes, security-adjacent code. A human reviews architecture and intent.
Tier 4 (deep review): Critical changes. Auth systems, payment flows, data migrations. Multiple humans, an extended review period, extra scrutiny.
# .github/review-tiers.yml
tiers:
  auto_merge:
    criteria:
      - path_match: ["*.md", "docs/**"]
      - path_match: ["package-lock.json", "pnpm-lock.yaml"]
        condition: "semver_compatible"
      - label: "typo-fix"
    requirements:
      - ci_pass: true
  agent_review:
    criteria:
      - lines_changed: "<100"
      - files_changed: "<5"
      - no_path_match: ["**/auth/**", "**/payment/**", "**/migrations/**"]
    requirements:
      - ci_pass: true
      - agent_approval: 1
  human_review:
    criteria:
      - path_match: ["**/api/**", "lib/db/**"]
      - lines_changed: ">=100"
      - new_dependencies: true
    requirements:
      - ci_pass: true
      - human_approval: 1
  deep_review:
    criteria:
      - path_match: ["**/auth/**", "**/payment/**", "**/migrations/**"]
      - label: "security"
      - label: "breaking-change"
    requirements:
      - ci_pass: true
      - human_approval: 2
      - review_period: "24h"
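To make the routing concrete, here is a minimal TypeScript sketch of a classifier that mirrors the config above. Everything in it is illustrative: the type names, path patterns, and thresholds are assumptions, and the lockfile semver check is omitted for brevity.

// Hypothetical tier classifier mirroring review-tiers.yml (illustrative only)
type Tier = "auto_merge" | "agent_review" | "human_review" | "deep_review";

interface PrFacts {
  files: string[];        // changed file paths
  linesChanged: number;
  labels: string[];
  newDependencies: boolean;
}

const SENSITIVE = [/(^|\/)auth\//, /(^|\/)payment\//, /(^|\/)migrations\//];
const LOW_RISK = [/\.md$/, /^docs\//, /package-lock\.json$/, /pnpm-lock\.yaml$/];

function classify(pr: PrFacts): Tier {
  // Tier 4: sensitive paths or explicit risk labels always get deep review.
  const sensitive = pr.files.some(f => SENSITIVE.some(re => re.test(f)));
  const risky = pr.labels.includes("security") || pr.labels.includes("breaking-change");
  if (sensitive || risky) return "deep_review";

  // Tier 1: docs, lockfiles, or an explicit typo-fix label.
  const lowRiskOnly = pr.files.every(f => LOW_RISK.some(re => re.test(f)));
  if (lowRiskOnly || pr.labels.includes("typo-fix")) return "auto_merge";

  // Tier 3: large diffs, new dependencies, or API/DB surface area.
  const touchesApiOrDb = pr.files.some(f => /(^|\/)api\//.test(f) || f.startsWith("lib/db/"));
  if (pr.linesChanged >= 100 || pr.files.length >= 5 || pr.newDependencies || touchesApiOrDb) {
    return "human_review";
  }

  // Tier 2: small, well-scoped changes outside sensitive paths.
  return "agent_review";
}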
Auto-merge criteria
Auto-merge isn't "no review" — it's "automated review." The criteria define what automated checks must pass before merge happens without human intervention.
Auto-Merge Requirements
All CI checks pass
Tests, linting, type checking, build — everything green. No exceptions.
No new warnings
Linter warnings, deprecation notices, security advisories. Can't auto-merge if it makes things worse.
Coverage not decreased
Test coverage must stay the same or improve. Dropping coverage requires human review.
No security vulnerabilities
Dependency scanning, SAST, secret detection. Any security finding blocks auto-merge.
Matches change scope
PR description matches actual changes. If the agent says "fix typo" but touches API code, escalate.
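One way to picture the gate is as a single predicate over signals your CI already produces. This is a rough sketch; the field names below are assumptions, not an existing API.

// Hypothetical auto-merge gate over signals CI already produces (illustrative only)
interface MergeSignals {
  ciPassed: boolean;
  newWarnings: number;
  coverageDelta: number;     // percentage points vs. the base branch
  securityFindings: number;  // dependency scan + SAST + secret detection
  declaredScope: string[];   // paths the PR description claims to touch
  actualPaths: string[];     // paths the diff actually touches
}

function canAutoMerge(s: MergeSignals): boolean {
  // Escalate if the diff wanders outside the declared scope.
  const scopeMatches = s.actualPaths.every(p =>
    s.declaredScope.some(scope => p.startsWith(scope))
  );
  return (
    s.ciPassed &&
    s.newWarnings === 0 &&
    s.coverageDelta >= 0 &&
    s.securityFindings === 0 &&
    scopeMatches
  );
}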
Agent-to-agent review
When humans can't review every PR, agents review agents. A separate reviewer agent examines the PR with different context and priorities than the authoring agent.
Agent Reviewer Checklist
Does it match the spec?
Compare the PR against the original task specification. Does the implementation address what was asked?
Are there obvious bugs?
Off-by-one errors, null checks missing, async/await issues. Pattern matching for common mistakes.
Does it follow conventions?
Naming patterns, file structure, code organization. Check against codebase conventions documented in AGENTS.md.
Are edge cases handled?
Empty inputs, error states, boundary conditions. Generate edge cases and verify they're addressed.
Is the scope appropriate?
Did the author stay focused or wander? Flag scope creep or unrelated changes for human attention.
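One way to wire this up is to hand the checklist to a reviewer agent along with the spec, the diff, and your conventions. Below is a minimal sketch, assuming a hypothetical runAgent helper that calls whatever model or agent runner you use; the example comment that follows shows the kind of output such a reviewer might post.

// Hypothetical reviewer-agent dispatch; runAgent() stands in for your agent runner.
interface ReviewInput {
  taskSpec: string;     // the original task the authoring agent worked from
  diff: string;         // unified diff of the PR
  conventions: string;  // e.g. the contents of AGENTS.md
}

const REVIEW_CHECKLIST = `
1. Does the implementation match the spec?
2. Are there obvious bugs (off-by-one, missing null checks, async/await issues)?
3. Does it follow the conventions documented in AGENTS.md?
4. Are edge cases handled (empty inputs, error states, boundary conditions)?
5. Is the scope appropriate, or is there unrelated change creep?
Return a verdict (approve / request changes / escalate to human) with reasons.
`;

async function reviewPullRequest(input: ReviewInput): Promise<string> {
  const prompt = [
    "You are reviewing a pull request written by another agent.",
    `Task spec:\n${input.taskSpec}`,
    `Team conventions:\n${input.conventions}`,
    `Diff:\n${input.diff}`,
    `Checklist:\n${REVIEW_CHECKLIST}`,
  ].join("\n\n");
  return runAgent(prompt); // hypothetical helper, not a real library call
}

declare function runAgent(prompt: string): Promise<string>;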
# Example agent review comment
## Review Summary
**Verdict: ✅ Approve with minor suggestions**
### Spec Alignment
The PR addresses the task "Fix login validation error messages" correctly:
- ✅ Error messages are now user-friendly
- ✅ Error display component added
- ✅ Tests cover validation cases
### Code Quality
- **Line 24**: Consider extracting the validation regex to a constant
- **Line 45**: The error message could include the invalid character for better debugging
### Edge Cases
- ✅ Empty email handled
- ✅ Invalid format handled
- ⚠️ Consider: What happens with very long email addresses? (>254 chars)
### Scope
- ✅ Changes are focused on validation
- ✅ No unrelated modifications
### Recommendation
Approve and merge. Minor suggestions are optional improvements, not blockers.
Human review focus
When humans review agent PRs, they should focus on what humans do best: understanding intent, evaluating architecture decisions, and catching subtle issues that require domain expertise.
Human Review Priorities
Focus on
- • Is this the right approach?
- • Does it fit the architecture?
- • Are there security implications?
- • Will this scale?
- • Edge cases that need domain knowledge
- • User experience considerations
Skip (automated)
- • Formatting and style
- • Import ordering
- • Variable naming conventions
- • Test coverage numbers
- • Dependency versions
- • Linting errors
Principle
Review the intent and the architecture, not the syntax
If an agent wrote syntactically correct code that solves the wrong problem, that's the failure. If the code has a typo but the approach is sound, that's an easy fix. Prioritize your attention accordingly.
Review SLAs
Agents work fast. If reviews take days, you lose the throughput benefit. Set SLAs for each review tier to keep the pipeline flowing.
Review SLA Targets
Tier 1 (auto-merge)
< 30 minutes: CI completes, auto-merge triggers. No human wait time.
Tier 2 (agent review)
< 2 hours: Agent reviewer picks up the PR, reviews, approves or requests changes.
Tier 3 (human review)
< 24 hours: Human reviewer assigned, completes review within a business day.
Tier 4 (deep review)
< 48 hours: Extended review period. Multiple reviewers, thorough examination.
Track SLA compliance. If human reviews consistently miss targets, either add reviewers or shift more PRs to the agent review tier.
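Tracking compliance can start as a simple script that compares each open PR's age against its tier's target. Here is a sketch, assuming you already record each PR's tier and opening time; the numbers mirror the targets above.

// Hypothetical SLA checker: flags open PRs older than their tier's target (illustrative only)
type Tier = "auto_merge" | "agent_review" | "human_review" | "deep_review";

const SLA_HOURS: Record<Tier, number> = {
  auto_merge: 0.5,   // < 30 minutes
  agent_review: 2,   // < 2 hours
  human_review: 24,  // < 24 hours
  deep_review: 48,   // < 48 hours
};

interface OpenPr {
  number: number;
  tier: Tier;
  openedAt: Date;
}

function slaBreaches(prs: OpenPr[], now: Date = new Date()): OpenPr[] {
  return prs.filter(pr => {
    const ageHours = (now.getTime() - pr.openedAt.getTime()) / (1000 * 60 * 60);
    return ageHours > SLA_HOURS[pr.tier];
  });
}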
Escalation paths
Sometimes an agent reviewer finds something it can't evaluate. Or a human reviewer finds something that needs specialist attention. Clear escalation paths prevent PRs from getting stuck.
Escalation Triggers
Agent → Human
- • Agent reviewer confidence below threshold
- • Changes touch escalation paths (auth, payments)
- • Agent can't determine if change matches spec
- • Unusual patterns not seen in training data
Human → Specialist
- • Security implications identified
- • Performance concerns requiring load testing
- • Legal/compliance questions
- • Architecture decisions above reviewer's expertise
Any → Original Author
- • Requirements unclear
- • Spec seems wrong or incomplete
- • Multiple valid approaches, need decision
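The agent-side triggers lend themselves to a simple routing function; human-to-specialist escalation remains a judgment call. A sketch with hypothetical field names and an assumed confidence threshold:

// Hypothetical escalation routing for agent reviewers (illustrative only)
type Escalation = "human" | "original_author" | null;

interface AgentReviewResult {
  confidence: number;            // reviewer agent's self-reported confidence, 0..1
  touchesSensitivePaths: boolean; // auth, payments, migrations
  specMatchUnclear: boolean;      // agent can't tell if the change matches the spec
  specSeemsWrong: boolean;        // requirements unclear, incomplete, or contradictory
}

function escalate(r: AgentReviewResult, confidenceThreshold = 0.7): Escalation {
  if (r.specSeemsWrong) return "original_author";
  if (r.touchesSensitivePaths) return "human";
  if (r.confidence < confidenceThreshold) return "human";
  if (r.specMatchUnclear) return "human";
  return null; // no escalation needed; proceed with the agent's verdict
}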
Comment quality and signal
Review comments should be actionable. "This looks wrong" is noise. "This will fail when email is empty — add a null check" is signal.
Comment Categories
Must fix before merge. Bug, security issue, or broken functionality. Clear action item with suggested fix.
Improvement opportunity. Not required but recommended. Author decides whether to address.
Need clarification. Why was this approach chosen? Is this behavior intentional? Doesn't block but needs answer.
Good work callout. Reinforces positive patterns. Yes, even for agents — it helps tune behavior.
What goes wrong
Review backlog grows
Agents produce PRs faster than reviewers process them. Backlog grows. Agents sit idle waiting for approval.
Fix: Shift more PRs to auto-merge and agent review tiers. Add human reviewers. Set stricter SLAs with alerts when breached.
Rubber-stamping
Overwhelmed reviewers approve without reading. Quality slips. Bugs reach production that should have been caught.
Fix: Track metrics per reviewer: time spent, bugs caught, post-merge issues. Make review quality visible. Reduce load by improving auto-merge criteria.
Wrong tier classification
High-risk change gets auto-merged because it didn't match escalation patterns. Security vulnerability ships.
Fix: Conservative tier rules — err toward human review. Audit auto-merged PRs periodically. Update rules when gaps found.
Agent reviewer misses issues
Agent approves PR with subtle bug. Human wouldn't have caught it either, but now there's no human in the loop.
Fix: Sample audit of agent-approved PRs. Track post-merge bug rate by review type. Tune agent reviewer or escalate more to humans if quality drops.
Summary
- • Traditional code review doesn't scale to agentic workloads — you need tiered review
- • Auto-merge for low-risk changes, agent review for medium, human review for high-risk
- • Humans should focus on intent and architecture, not syntax — automate the rest
- • Set SLAs per tier and track compliance — fast review preserves agent throughput
- • Clear escalation paths prevent PRs from getting stuck when reviewers are uncertain