Reproduce Steps
Core Questions
- Before any implementation begins, how do you verify the bug or behavior exists?
- How does an agent confirm it can reproduce the issue?
- What happens if reproduction fails?
You can't fix what you can't see. Before an agent writes a single line of fix, it needs to witness the bug with its own eyes (metaphorically). Reproduction is the foundation — skip it, and you're building on hallucination.
Why reproduction comes first
When a human debugs, they naturally try to reproduce the issue first. It's instinct — you want to see the problem before you fix it. Agents don't have this instinct. Given a bug report, they'll happily reason about what might be wrong and write a fix for that imagined problem.
This leads to a specific failure mode:
The hallucination trap
- Agent reads bug report: "Users see a crash when clicking Save"
- Agent scans the code, finds something that could cause a crash
- Agent writes a fix for that potential issue
- Agent runs tests — they pass (they always passed)
- Agent declares the bug fixed
- The actual bug? Still there. The agent fixed a different, hypothetical problem.
Reproduction breaks this cycle. When the agent must demonstrate the bug before fixing it, several things happen:
- Reality check: The agent confirms the bug actually exists in the current codebase. Maybe it's already fixed. Maybe it's environment-specific.
- Understanding: Seeing the actual error — the stack trace, the wrong output, the crash — gives the agent real information to work with.
- Verification baseline: The agent now has a concrete test: run these steps, see this failure. After the fix, run the same steps, see success.
What reproduction looks like
Reproduction isn't just "I tried it and it broke." It's structured, documented proof that the agent observed the problem.
Reproduction Artifacts
Console output / logs
The actual error message, stack trace, or unexpected output. Copy-pasted, not paraphrased. This is evidence.
Screenshots
For UI bugs, a screenshot of the broken state. Shows exactly what the user sees. Agents can capture these with headless browsers (see the sketch after this list).
Video / screen recording
For interaction bugs, a recording showing the steps and the failure. More context than a screenshot; shows timing and sequence.
Reproduction script
A script that triggers the bug automatically. Best artifact — it's repeatable and can become a regression test.
Environment state
What versions, what config, what data state. Bugs are often environment-specific; capturing this helps.
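Screenshot capture, for instance, is something the agent can do itself with a headless browser. A minimal sketch using Playwright — the URL, selectors, and output path are placeholders for whatever the bug report describes:

```typescript
// capture-repro-screenshot.ts
// Minimal sketch: drive headless Chromium through the failing steps and
// save a screenshot of whatever state the page ends up in.
import { chromium } from 'playwright';

async function captureReproScreenshot() {
  const browser = await chromium.launch(); // headless by default
  const page = await browser.newPage();
  await page.goto('http://localhost:3000/profile/edit'); // placeholder URL
  await page.fill('#email', '');                          // placeholder steps
  await page.click('button[type="submit"]');
  // Capture the broken state as an artifact for the reproduction report.
  await page.screenshot({ path: 'artifacts/profile-crash.png', fullPage: true });
  await browser.close();
}

captureReproScreenshot().catch(console.error);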
Example: Reproduction report
## Reproduction Report

**Bug:** User profile page crashes when email is empty
**Environment:** Node 20.11, macOS 14.2, Chrome 121
**Branch:** main @ commit abc123

### Steps executed:
1. Started dev server: `npm run dev`
2. Logged in as test user ([email protected])
3. Navigated to /profile/edit
4. Cleared the email field
5. Clicked "Save Changes"

### Observed behavior:
- Page displayed white screen
- Console error:
```
TypeError: Cannot read properties of null (reading 'toLowerCase')
    at validateEmail (src/utils/validation.ts:23:15)
    at ProfileForm.handleSubmit (src/components/ProfileForm.tsx:45:12)
```

### Screenshot:
[profile-crash-screenshot.png]

### Reproduction script:
```bash
# Automated repro
npm run test:e2e -- --spec cypress/e2e/profile-empty-email.cy.ts
```

**Reproduction confirmed:** ✓ Bug is reproducible
Reproduction as a gate
Reproduction shouldn't be optional — it should be a gate. The agent cannot proceed to implementation until reproduction is confirmed. This is a workflow constraint, not just a best practice.
Gated Workflow
Receive task
Agent gets a bug report with reproduction steps.
Attempt reproduction
Agent follows the steps exactly. Documents what happens.
Gate: Did reproduction succeed?
If YES → proceed to implementation.
If NO → stop and report. Do not guess.
Implement fix
Only reached if reproduction succeeded. Agent has real data to work with.
Encode this in your agent instructions:
CRITICAL: Before implementing any fix:
1. You MUST first reproduce the reported issue
2. Document the reproduction with actual output/screenshots
3. If you cannot reproduce after 3 attempts, STOP and report:
   - What you tried
   - What you observed
   - What environment details might be relevant
4. Do NOT proceed to implementation without confirmed reproduction
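If your agent runs inside a programmatic harness, the same gate can be enforced in code rather than relying on the prompt alone. A minimal sketch, assuming a hypothetical ReproductionResult produced by the reproduction step:

```typescript
// reproduction-gate.ts
// Minimal sketch of a workflow gate: implementation only runs if the
// reproduction step produced confirmed evidence. Types are hypothetical.
interface ReproductionResult {
  confirmed: boolean;
  attempts: number;
  artifacts: string[]; // paths to logs, screenshots, repro scripts
  notes: string;
}

function assertReproduced(result: ReproductionResult): void {
  if (!result.confirmed || result.artifacts.length === 0) {
    // Stop the workflow and surface the report instead of guessing at a fix.
    throw new Error(
      `Reproduction not confirmed after ${result.attempts} attempt(s): ${result.notes}`
    );
  }
}

// Usage: call the gate before any implementation step is allowed to run.
// assertReproduced(reproResult);
// implementFix(reproResult.artifacts);
```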
When reproduction fails
Sometimes the agent can't reproduce the bug. This is valuable information — not a failure. The correct response is to stop and report, not to guess.
Reproduction Failure Scenarios
Bug was already fixed
Someone else fixed it, or it was fixed as a side effect of another change. Agent should check recent commits and report.
Environment mismatch
Bug only happens in production, or with specific data, or on certain OS. Agent should report what environment it tested and ask for clarification.
Incomplete reproduction steps
The bug report is missing crucial details. Agent should ask for more specific steps rather than guessing.
Flaky / timing-dependent
The bug only happens sometimes. Agent should attempt multiple times and report the flakiness. May need different debugging approach.
Example: Reproduction failure report
## Reproduction Attempt Failed

**Bug:** User profile page crashes when email is empty
**Attempts:** 3

### What I tried:
1. Started dev server on main branch (commit abc123)
2. Logged in as [email protected]
3. Navigated to /profile/edit
4. Cleared email field
5. Clicked "Save Changes"

### What I observed:
- Form showed validation error "Email is required"
- No crash occurred
- Console showed no errors

### Possible explanations:
- Bug may have been fixed in recent commit def456 (Jan 15) which added client-side validation
- Bug may be environment-specific (I tested on Node 20.11)
- Bug may require specific user data state

### Request:
Please provide:
- Specific commit where bug was observed
- Any specific user account or data state needed
- Browser/OS where bug was seen

**Proceeding to implementation: NO (reproduction not confirmed)**
Automated reproduction
The gold standard: reproduction steps that can be executed automatically. This isn't always possible, but when it is, it's powerful.
Reproduction script patterns
API bug
#!/bin/bash
# repro-api-crash.sh
curl -X POST http://localhost:3000/api/users \
-H "Content-Type: application/json" \
-d '{"email": ""}' \
-w "\nHTTP Status: %{http_code}\n"
# Expected: HTTP 400 with validation error
# Actual: HTTP 500 with server crash
UI bug (Playwright)
// repro-profile-crash.spec.ts
import { test, expect } from '@playwright/test';

test('empty email should show validation error', async ({ page }) => {
await page.goto('/profile/edit');
await page.fill('#email', '');
await page.click('button[type="submit"]');
// This should pass but currently crashes
await expect(page.locator('.error')).toHaveText('Email is required');
});
Unit-level bug
// repro-validation.test.ts
// Import path depends on where the repro test lives relative to the source.
import { validateEmail } from '../src/utils/validation';

test('validateEmail handles null input', () => {
// This should return false but currently throws
expect(validateEmail(null)).toBe(false);
});
When the agent writes a reproduction script, it becomes a regression test. After the fix, the script should pass. Check it in. Now you have automated verification that this bug never comes back.
Handling flaky reproduction
Some bugs are timing-dependent, load-dependent, or otherwise flaky. They don't reproduce every time. This is tricky for agents because they may try once, fail to reproduce, and incorrectly conclude the bug doesn't exist.
Strategies for flaky bugs
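One simple strategy is to run the reproduction steps repeatedly and report a failure rate rather than a single yes/no. A minimal sketch, assuming a hypothetical runReproOnce() that returns whether the bug appeared on that attempt:

```typescript
// flaky-repro.ts
// Minimal sketch: attempt reproduction several times and report how often
// the bug appeared. runReproOnce() is a hypothetical single-attempt runner.
async function measureFlakiness(
  runReproOnce: () => Promise<boolean>,
  attempts = 10
): Promise<void> {
  let failures = 0;
  for (let i = 0; i < attempts; i++) {
    if (await runReproOnce()) failures++;
  }
  // Report the rate, not a single observation; "0 of 10" and "3 of 10"
  // call for very different next steps.
  console.log(`Bug reproduced in ${failures} of ${attempts} attempts`);
}
```

A report like "reproduced in 3 of 10 runs" tells the human the bug is real but timing-dependent, which is far more useful than a single failed attempt and a premature "cannot reproduce".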
What goes wrong
Skipped reproduction
Agent dives straight into fixing. Writes a plausible fix for an imagined problem. The real bug remains. Hours wasted on the wrong thing.
Fake reproduction
Agent claims to have reproduced but didn't actually run the steps. "I can see how this would fail" is not reproduction. Require artifacts.
Wrong reproduction
Agent reproduces a different bug than the one reported. Finds a crash, not the crash. Fixes the wrong thing. Compare against the original report carefully.
Giving up too early
Agent tries once, fails to reproduce, declares bug invalid. Many bugs require specific conditions. Persistence and systematic variation are needed.
Summary
- Reproduction must happen before implementation. Make it a gate, not a suggestion.
- Require artifacts: logs, screenshots, scripts. "I tried it" is not enough.
- If reproduction fails, stop and report. Don't guess at fixes.
- Reproduction scripts become regression tests. Check them in.