Expect & Demo Steps
Core Questions
- After implementation, how do you verify the change works?
- What does the acceptance loop look like?
- How do you automate the 'does this actually work' check?
Reproduction proves the bug exists. Demo steps prove the fix works. This is the other half of the verification loop — after implementation, the agent must demonstrate that the new behavior matches expectations. "It compiles" is not verification. "Tests pass" is closer. "Here's a screenshot of it working" is best.
What demo steps prove
Demo steps answer a simple question: does the implementation actually work? Not in theory. Not according to the tests. In practice, running the actual application, doing the actual thing.
This matters because:
- Tests can be wrong. The agent might write tests that pass but don't actually verify the right behavior.
- Integration gaps exist. Unit tests pass but the feature doesn't work end-to-end.
- UI matters. The logic is correct but the user can't actually use it because a button is hidden or a form doesn't submit.
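The first failure mode is worth seeing concretely. Below is a sketch of a weak test next to a strong one; the `validateEmail` helper mirrors the fix described later in this guide, but its signature and body here are invented for illustration.

```typescript
import { test, expect } from '@playwright/test';

// Invented for illustration; the real validateEmail may differ.
function validateEmail(email: string | null): { valid: boolean; error?: string } {
  if (!email || email.trim() === '') {
    return { valid: false, error: 'Email is required' };
  }
  if (!/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(email)) {
    return { valid: false, error: 'Please enter a valid email' };
  }
  return { valid: true };
}

// Weak: passes as long as the function returns anything at all. It would
// keep passing even if empty emails were silently accepted.
test('validateEmail handles empty input', () => {
  expect(validateEmail('')).toBeDefined();
});

// Strong: asserts the requirement itself. Empty input must be rejected
// with the exact message the user is supposed to see.
test('empty email is rejected with the right message', () => {
  expect(validateEmail('')).toEqual({ valid: false, error: 'Email is required' });
});
```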
The verification loop
1. Reproduce: see the bug
2. Implement: write the fix
3. Demo: prove it works
All three steps use the same scenarios. The bug that reproduced in step 1 should not reproduce in step 3.
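One way to keep the scenarios identical across all three steps is to define each scenario once and run the same code for reproduction and demo. A minimal sketch, assuming the selectors from the examples later in this guide; the shared helper is an invented convention, not a Playwright feature:

```typescript
import { test, expect, type Page } from '@playwright/test';

// One scenario definition, shared by reproduction and demo.
async function emptyEmailScenario(page: Page) {
  await page.goto('/profile/edit');
  await page.fill('#email', '');
  await page.click('button[type="submit"]');
  await expect(page.locator('.error-message')).toHaveText('Email is required');
}

// Step 1 (reproduce): before the fix this test fails, and that failure is the bug report.
// Step 3 (demo): after the fix the identical test passes, and that pass is the evidence.
test('empty email shows validation error', async ({ page }) => {
  await emptyEmailScenario(page);
});
```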
What good demo steps look like
Demo steps mirror reproduction steps — same scenarios, opposite expected outcome. They should be:
Demo Step Qualities
Executable
Concrete commands or actions, not descriptions. "Click the Save button" not "save the form."
Observable
Expected outcomes that can be seen or measured. "Toast shows 'Profile updated'" not "it works."
Complete
Cover all acceptance criteria, not just the happy path. Include edge cases that were part of the original bug.
Evidenced
Produce artifacts: screenshots, logs, test output. "I ran it and it worked" is not evidence.
Example: Demo documentation
```markdown
## Demo: Empty Email Validation Fix

### Setup
- Branch: fix/empty-email-validation
- Dev server running on localhost:3000
- Logged in as [email protected]

### Demo Steps

**Scenario 1: Empty email shows validation error**
1. Navigate to /profile/edit
2. Clear the email field
3. Click "Save Changes"
4. ✓ EXPECTED: Red border on email field
5. ✓ EXPECTED: Error message "Email is required"
6. ✓ EXPECTED: Form does not submit

**Scenario 2: Invalid email shows validation error**
1. Enter "not-an-email" in email field
2. Click "Save Changes"
3. ✓ EXPECTED: Error message "Please enter a valid email"

**Scenario 3: Valid email submits successfully**
1. Enter "[email protected]" in email field
2. Click "Save Changes"
3. ✓ EXPECTED: Toast shows "Profile updated"
4. ✓ EXPECTED: Page redirects to /profile

### Evidence
- Screenshot: demo-validation-error.png
- Screenshot: demo-success-toast.png
- Console log: No errors

### Verification
All acceptance criteria demonstrated. Ready for review.
```
Automating demos
Manual demos work but don't scale. When an agent can run automated demos, verification becomes consistent and fast.
Automation Approaches
E2E tests as demos
Playwright, Cypress, or Selenium tests that exercise the actual UI. Run them, capture screenshots, report results.
API contract tests
For backend changes, hit the actual endpoints and verify responses. Show the requests and responses as evidence (see the sketch after this list).
Visual regression tests
Capture screenshots before and after. Diff them. Show that the intended change happened and nothing else broke visually.
Recorded sessions
Record a video of the demo. Tools like Playwright can record tests. The video becomes documentation.
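For the API contract approach, Playwright's built-in `request` fixture is one way to hit an endpoint and log the exchange as evidence. The `/api/profile` endpoint, the PUT method, and the response shape are assumptions carried over from this guide's examples; a configured `baseURL` is assumed for the relative URL:

```typescript
import { test, expect } from '@playwright/test';

// API contract demo: hit the real endpoint and record the exchange as evidence.
test('PUT /api/profile rejects an empty email', async ({ request }) => {
  const response = await request.put('/api/profile', {
    data: { email: '' },
  });

  expect(response.status()).toBe(400);
  const body = await response.json();
  expect(body.error).toBe('Email is required');

  // Log the request and response so the run itself produces reviewable evidence.
  console.log('PUT /api/profile ->', response.status(), JSON.stringify(body));
});
```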
Playwright demo script
```typescript
// demo/empty-email-validation.spec.ts
import { test, expect } from '@playwright/test';

test.describe('Empty Email Validation Demo', () => {
  test.beforeEach(async ({ page }) => {
    await page.goto('/login');
    await page.fill('#email', '[email protected]');
    await page.fill('#password', 'testpass');
    await page.click('button[type="submit"]');
  });

  test('shows validation error for empty email', async ({ page }) => {
    await page.goto('/profile/edit');
    await page.fill('#email', '');
    await page.click('button[type="submit"]');

    // Capture screenshot as evidence
    await page.screenshot({ path: 'demo-validation-error.png' });

    await expect(page.locator('#email')).toHaveClass(/border-red/);
    await expect(page.locator('.error-message')).toHaveText('Email is required');
  });

  test('submits successfully with valid email', async ({ page }) => {
    await page.goto('/profile/edit');
    await page.fill('#email', '[email protected]');
    await page.click('button[type="submit"]');

    // Capture screenshot as evidence
    await page.screenshot({ path: 'demo-success.png' });

    await expect(page.locator('.toast')).toHaveText('Profile updated');
    await expect(page).toHaveURL('/profile');
  });
});
```
Acceptance criteria as code
The best demo steps are acceptance criteria that became tests. When you write specs (see Guide 15), phrase acceptance criteria as testable assertions:
Hard to automate
- Form should feel responsive
- Error messages should be helpful
- The fix should be clean
- It should work correctly
Easy to automate
- Form submits in < 200ms
- Empty email shows "Email is required"
- No new ESLint warnings introduced
- /api/profile returns 200 with valid data
When acceptance criteria are specific and measurable, agents can verify them automatically. When they're vague, agents either skip verification or make up their own interpretation.
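To make that concrete, here is a sketch of the "easy to automate" column turned into assertions. The selectors and endpoint come from this guide's earlier examples; the 200ms budget is measured crudely with wall-clock time, which is one reasonable approximation:

```typescript
import { test, expect } from '@playwright/test';

test('form submits in under 200ms and the API returns valid data', async ({ page }) => {
  await page.goto('/profile/edit');
  await page.fill('#email', '[email protected]');

  const start = Date.now();
  const [response] = await Promise.all([
    page.waitForResponse((r) => r.url().includes('/api/profile')),
    page.click('button[type="submit"]'),
  ]);
  const elapsed = Date.now() - start;

  expect(response.status()).toBe(200); // "/api/profile returns 200 with valid data"
  expect(elapsed).toBeLessThan(200);   // "Form submits in < 200ms"
});
```

The ESLint criterion needs no test at all: a CI step running `npx eslint . --max-warnings 0` fails the build if any new warning appears.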
When to require human sign-off
Not everything can be automated. Some demos require human judgment. Define when human sign-off is needed:
Human Sign-off Triggers
Visual/UX changes
New UI, layout changes, styling updates. Automated tests can check that elements exist; humans judge whether they look right.
Copy/content changes
New error messages, help text, user-facing strings. Automated tests can check presence; humans judge tone and clarity.
Security-sensitive changes
Auth, permissions, data access. Even if tests pass, a human should verify the approach is sound.
Subjective quality
"Does this feel right?" "Is this the right abstraction?"Code review territory — not demo territory.
In your specs, mark which acceptance criteria require human verification:
```markdown
## Acceptance criteria

- [ ] Empty email shows validation error [AUTO]
- [ ] Invalid email shows format error [AUTO]
- [ ] Error message styling matches design system [HUMAN]
- [ ] Success toast appears on save [AUTO]
- [ ] Copy is clear and helpful [HUMAN]
```
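Tags like these can also be enforced mechanically. Below is a hypothetical sketch of a CI gate that blocks merge while any [HUMAN] criterion is still unchecked; the spec path and tag format are assumptions based on the example above:

```typescript
import { readFileSync } from 'node:fs';

// Hypothetical CI gate: fail while any [HUMAN] criterion remains unchecked.
const spec = readFileSync('specs/empty-email-validation.md', 'utf8');

const uncheckedHuman = spec
  .split('\n')
  .map((line) => line.trim())
  .filter((line) => line.startsWith('- [ ]') && line.includes('[HUMAN]'));

if (uncheckedHuman.length > 0) {
  console.error('Criteria awaiting human sign-off:');
  uncheckedHuman.forEach((line) => console.error('  ' + line));
  process.exit(1); // block merge until a reviewer checks these off
}
console.log('All [HUMAN] criteria signed off.');
```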
Demo artifacts in PRs
Demo evidence should be attached to the PR. This makes review easier and creates a record of what was verified.
PR with demo section
````markdown
## Summary
Fix empty email validation crash on profile page.

## Changes
- Added null check in validateEmail()
- Added client-side required validation
- Added test coverage

## Demo

### Before (reproduction)

- Empty email caused TypeError crash

### After (verification)

- Empty email now shows validation error

### Test run
```
$ npm test -- --grep "email validation"
✓ shows error for empty email (45ms)
✓ shows error for invalid format (38ms)
✓ accepts valid email (52ms)
3 passing (135ms)
```

### E2E demo
```
$ npx playwright test demo/email-validation.spec.ts
✓ shows validation error for empty email (1.2s)
✓ submits successfully with valid email (0.9s)
2 passed (2.1s)
```

## Checklist
- [x] Reproduction confirmed before fix
- [x] Demo shows fix works
- [x] Tests added
- [x] Screenshots attached
````
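Producing these artifacts should not depend on the agent remembering to take screenshots. If your demos run through Playwright, the test config can capture evidence on every run. A minimal sketch, where the specific option values are one reasonable choice rather than the only one:

```typescript
// playwright.config.ts: capture demo evidence automatically on every run.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    baseURL: 'http://localhost:3000',
    screenshot: 'on',            // a screenshot for every test, pass or fail
    video: 'retain-on-failure',  // keep video when a demo scenario fails
    trace: 'on-first-retry',     // full trace for debugging flaky demos
  },
  reporter: [['html', { outputFolder: 'demo-report' }]],
});
```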
What goes wrong
"Tests pass" as only verification
Agent writes tests for its fix. Tests pass. But the tests are wrong — they test the implementation, not the requirement. The bug is still there.
No evidence
Agent says "I verified it works" but provides no screenshots, logs, or output. Reviewer has no way to confirm without re-doing the demo.
Happy path only
Agent demos the main flow but skips edge cases. The original bug was an edge case. It's still broken.
Demo doesn't match reproduction
Agent reproduced one scenario but demos a different one. The fix works for the demo case but not the original bug. Always demo the exact reproduction steps.
Summary
- Demo steps prove the fix works — "tests pass" is not enough.
- Demos should mirror reproduction steps — same scenarios, opposite expected outcome.
- Automate demos where possible — E2E tests, API tests, visual regression.
- Attach demo evidence to PRs — screenshots, logs, test output.