Reproduce Steps
Core Questions
- Before any implementation begins, how do you verify the bug or behavior exists?
- How does an agent confirm it can reproduce the issue?
- What happens if reproduction fails?
You can't fix what you can't see. Before an agent writes a single line of fix, it needs to witness the bug with its own eyes (metaphorically). Reproduction is the foundation — skip it, and you're building on hallucination.
Why reproduction comes first
When a human debugs, they naturally try to reproduce the issue first. It's instinct — you want to see the problem before you fix it. Agents don't have this instinct. Given a bug report, they'll happily reason about what might be wrong and write a fix for that imagined problem.
This leads to a specific failure mode:
The hallucination trap
- Agent reads bug report: "Users see a crash when clicking Save"
- Agent scans the code, finds something that could cause a crash
- Agent writes a fix for that potential issue
- Agent runs tests — they pass (they always passed)
- Agent declares the bug fixed
- The actual bug? Still there. The agent fixed a different, hypothetical problem.
Reproduction breaks this cycle. When the agent must demonstrate the bug before fixing it, several things happen:
- Reality check: The agent confirms the bug actually exists in the current codebase. Maybe it's already fixed. Maybe it's environment-specific.
- Understanding: Seeing the actual error — the stack trace, the wrong output, the crash — gives the agent real information to work with.
- Verification baseline: The agent now has a concrete test: run these steps, see this failure. After the fix, run the same steps, see success.
What reproduction looks like
Reproduction isn't just "I tried it and it broke." It's structured, documented proof that the agent observed the problem.
Reproduction Artifacts
Console output / logs
The actual error message, stack trace, or unexpected output. Copy-pasted, not paraphrased. This is evidence.
Screenshots
For UI bugs, a screenshot of the broken state. Shows exactly what the user sees. Agents can capture these with headless browsers (see the sketch after this list).
Video / screen recording
For interaction bugs, a recording showing the steps and the failure. More context than a screenshot; shows timing and sequence.
Reproduction script
A script that triggers the bug automatically. Best artifact — it's repeatable and can become a regression test.
Environment state
What versions, what config, what data state. Bugs are often environment-specific; capturing this helps.
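Screenshot capture, for instance, is something the agent can do itself with a headless browser. A minimal sketch using Playwright — the URL, selectors, and output path are placeholders for whatever the bug report describes:

```typescript
// capture-repro-screenshot.ts
// Minimal sketch: drive headless Chromium through the failing steps and
// save a screenshot of whatever state the page ends up in.
import { chromium } from 'playwright';

async function captureReproScreenshot() {
  const browser = await chromium.launch(); // headless by default
  const page = await browser.newPage();
  await page.goto('http://localhost:3000/profile/edit'); // placeholder URL
  await page.fill('#email', '');                          // placeholder steps
  await page.click('button[type="submit"]');
  // Capture the broken state as an artifact for the reproduction report.
  await page.screenshot({ path: 'artifacts/profile-crash.png', fullPage: true });
  await browser.close();
}

captureReproScreenshot().catch(console.error);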
Example: Reproduction report
## Reproduction Report

**Bug:** User profile page crashes when email is empty
**Environment:** Node 20.11, macOS 14.2, Chrome 121
**Branch:** main @ commit abc123

### Steps executed:
1. Started dev server: `npm run dev`
2. Logged in as test user ([email protected])
3. Navigated to /profile/edit
4. Cleared the email field
5. Clicked "Save Changes"

### Observed behavior:
- Page displayed white screen
- Console error:
```
TypeError: Cannot read properties of null (reading 'toLowerCase')
    at validateEmail (src/utils/validation.ts:23:15)
    at ProfileForm.handleSubmit (src/components/ProfileForm.tsx:45:12)
```

### Screenshot:
[profile-crash-screenshot.png]

### Reproduction script:
```bash
# Automated repro
npm run test:e2e -- --spec cypress/e2e/profile-empty-email.cy.ts
```

**Reproduction confirmed:** ✓ Bug is reproducible
Reproduction as a gate
Reproduction shouldn't be optional — it should be a gate. The agent cannot proceed to implementation until reproduction is confirmed. This is a workflow constraint, not just a best practice.
Gated Workflow
Receive task
Agent gets a bug report with reproduction steps.
Attempt reproduction
Agent follows the steps exactly. Documents what happens.
Gate: Did reproduction succeed?
If YES → proceed to implementation.
If NO → stop and report. Do not guess.
Implement fix
Only reached if reproduction succeeded. Agent has real data to work with.
Encode this in your agent instructions:
CRITICAL: Before implementing any fix:
1. You MUST first reproduce the reported issue
2. Document the reproduction with actual output/screenshots
3. If you cannot reproduce after 3 attempts, STOP and report:
   - What you tried
   - What you observed
   - What environment details might be relevant
4. Do NOT proceed to implementation without confirmed reproduction
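If your agent runs inside a programmatic harness, the same gate can be enforced in code rather than relying on the prompt alone. A minimal sketch, assuming a hypothetical ReproductionResult produced by the reproduction step:

```typescript
// reproduction-gate.ts
// Minimal sketch of a workflow gate: implementation only runs if the
// reproduction step produced confirmed evidence. Types are hypothetical.
interface ReproductionResult {
  confirmed: boolean;
  attempts: number;
  artifacts: string[]; // paths to logs, screenshots, repro scripts
  notes: string;
}

function assertReproduced(result: ReproductionResult): void {
  if (!result.confirmed || result.artifacts.length === 0) {
    // Stop the workflow and surface the report instead of guessing at a fix.
    throw new Error(
      `Reproduction not confirmed after ${result.attempts} attempt(s): ${result.notes}`
    );
  }
}

// Usage: call the gate before any implementation step is allowed to run.
// assertReproduced(reproResult);
// implementFix(reproResult.artifacts);
```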
When reproduction fails
Sometimes the agent can't reproduce the bug. This is valuable information — not a failure. The correct response is to stop and report, not to guess.
Reproduction Failure Scenarios
Bug was already fixed
Someone else fixed it, or it was fixed as a side effect of another change. Agent should check recent commits and report.
Environment mismatch
Bug only happens in production, or with specific data, or on certain OS. Agent should report what environment it tested and ask for clarification.
Incomplete reproduction steps
The bug report is missing crucial details. Agent should ask for more specific steps rather than guessing.
Flaky / timing-dependent
The bug only happens sometimes. Agent should attempt multiple times and report the flakiness. May need different debugging approach.
Example: Reproduction failure report
## Reproduction Attempt Failed

**Bug:** User profile page crashes when email is empty
**Attempts:** 3

### What I tried:
1. Started dev server on main branch (commit abc123)
2. Logged in as [email protected]
3. Navigated to /profile/edit
4. Cleared email field
5. Clicked "Save Changes"

### What I observed:
- Form showed validation error "Email is required"
- No crash occurred
- Console showed no errors

### Possible explanations:
- Bug may have been fixed in recent commit def456 (Jan 15) which added client-side validation
- Bug may be environment-specific (I tested on Node 20.11)
- Bug may require specific user data state

### Request:
Please provide:
- Specific commit where bug was observed
- Any specific user account or data state needed
- Browser/OS where bug was seen

**Proceeding to implementation: NO (reproduction not confirmed)**
Automated reproduction
The gold standard: reproduction steps that can be executed automatically. This isn't always possible, but when it is, it's powerful.
Reproduction script patterns
API bug
#!/bin/bash
# repro-api-crash.sh
curl -X POST http://localhost:3000/api/users \
-H "Content-Type: application/json" \
-d '{"email": ""}' \
-w "\nHTTP Status: %{http_code}\n"
# Expected: HTTP 400 with validation error
# Actual: HTTP 500 with server crash
UI bug (Playwright)
// repro-profile-crash.spec.ts
import { test, expect } from '@playwright/test';

test('empty email should show validation error', async ({ page }) => {
await page.goto('/profile/edit');
await page.fill('#email', '');
await page.click('button[type="submit"]');
// This should pass but currently crashes
await expect(page.locator('.error')).toHaveText('Email is required');
});
Unit-level bug
// repro-validation.test.ts
// Import path depends on where the repro test lives relative to the source.
import { validateEmail } from '../src/utils/validation';

test('validateEmail handles null input', () => {
// This should return false but currently throws
expect(validateEmail(null)).toBe(false);
});
When the agent writes a reproduction script, it becomes a regression test. After the fix, the script should pass. Check it in. Now you have automated verification that this bug never comes back.
Handling flaky reproduction
Some bugs are timing-dependent, load-dependent, or otherwise flaky. They don't reproduce every time. This is tricky for agents because they may try once, fail to reproduce, and incorrectly conclude the bug doesn't exist.
Strategies for flaky bugs
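One simple strategy is to run the reproduction steps repeatedly and report a failure rate rather than a single yes/no. A minimal sketch, assuming a hypothetical runReproOnce() that returns whether the bug appeared on that attempt:

```typescript
// flaky-repro.ts
// Minimal sketch: attempt reproduction several times and report how often
// the bug appeared. runReproOnce() is a hypothetical single-attempt runner.
async function measureFlakiness(
  runReproOnce: () => Promise<boolean>,
  attempts = 10
): Promise<void> {
  let failures = 0;
  for (let i = 0; i < attempts; i++) {
    if (await runReproOnce()) failures++;
  }
  // Report the rate, not a single observation; "0 of 10" and "3 of 10"
  // call for very different next steps.
  console.log(`Bug reproduced in ${failures} of ${attempts} attempts`);
}
```

A report like "reproduced in 3 of 10 runs" tells the human the bug is real but timing-dependent, which is far more useful than a single failed attempt and a premature "cannot reproduce".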
What goes wrong
Skipped reproduction
Agent dives straight into fixing. Writes a plausible fix for an imagined problem. The real bug remains. Hours wasted on the wrong thing.
Fake reproduction
Agent claims to have reproduced but didn't actually run the steps. "I can see how this would fail" is not reproduction. Require artifacts.
Wrong reproduction
Agent reproduces a different bug than the one reported. Finds a crash, not the crash. Fixes the wrong thing. Compare against the original report carefully.
Giving up too early
Agent tries once, fails to reproduce, declares bug invalid. Many bugs require specific conditions. Persistence and systematic variation are needed.
Summary
- Reproduction must happen before implementation. Make it a gate, not a suggestion.
- Require artifacts: logs, screenshots, scripts. "I tried it" is not enough.
- If reproduction fails, stop and report. Don't guess at fixes.
- Reproduction scripts become regression tests. Check them in.