Expect & Demo Steps
Core Questions
- After implementation, how do you verify the change works?
- What does the acceptance loop look like?
- How do you automate the 'does this actually work' check?
Reproduction proves the bug exists. Demo steps prove the fix works. This is the other half of the verification loop — after implementation, the agent must demonstrate that the new behavior matches expectations. "It compiles" is not verification. "Tests pass" is closer. "Here's a screenshot of it working" is best.
What demo steps prove
Demo steps answer a simple question: does the implementation actually work? Not in theory. Not according to the tests. In practice, running the actual application, doing the actual thing.
This matters because:
- Tests can be wrong. The agent might write tests that pass but don't actually verify the right behavior.
- Integration gaps exist. Unit tests pass but the feature doesn't work end-to-end.
- UI matters. The logic is correct but the user can't actually use it because a button is hidden or a form doesn't submit.
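The first failure mode is worth seeing concretely. Below is a sketch of a weak test next to a strong one; the `validateEmail` helper mirrors the fix described later in this guide, but its signature and body here are invented for illustration.

```typescript
import { test, expect } from '@playwright/test';

// Invented for illustration; the real validateEmail may differ.
function validateEmail(email: string | null): { valid: boolean; error?: string } {
  if (!email || email.trim() === '') {
    return { valid: false, error: 'Email is required' };
  }
  if (!/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(email)) {
    return { valid: false, error: 'Please enter a valid email' };
  }
  return { valid: true };
}

// Weak: passes as long as the function returns anything at all. It would
// keep passing even if empty emails were silently accepted.
test('validateEmail handles empty input', () => {
  expect(validateEmail('')).toBeDefined();
});

// Strong: asserts the requirement itself. Empty input must be rejected
// with the exact message the user is supposed to see.
test('empty email is rejected with the right message', () => {
  expect(validateEmail('')).toEqual({ valid: false, error: 'Email is required' });
});
```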
The verification loop
1. Reproduce: see the bug
2. Implement: write the fix
3. Demo: prove it works
All three steps use the same scenarios. The bug that reproduced in step 1 should not reproduce in step 3.
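One way to keep the scenarios identical across all three steps is to define each scenario once and run the same code for reproduction and demo. A minimal sketch, assuming the selectors from the examples later in this guide; the shared helper is an invented convention, not a Playwright feature:

```typescript
import { test, expect, type Page } from '@playwright/test';

// One scenario definition, shared by reproduction and demo.
async function emptyEmailScenario(page: Page) {
  await page.goto('/profile/edit');
  await page.fill('#email', '');
  await page.click('button[type="submit"]');
  await expect(page.locator('.error-message')).toHaveText('Email is required');
}

// Step 1 (reproduce): before the fix this test fails, and that failure is the bug report.
// Step 3 (demo): after the fix the identical test passes, and that pass is the evidence.
test('empty email shows validation error', async ({ page }) => {
  await emptyEmailScenario(page);
});
```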
What good demo steps look like
Demo steps mirror reproduction steps — same scenarios, opposite expected outcome. They should be:
Demo Step Qualities
Executable
Concrete commands or actions, not descriptions. "Click the Save button" not "save the form."
Observable
Expected outcomes that can be seen or measured. "Toast shows 'Profile updated'" not "it works."
Complete
Cover all acceptance criteria, not just the happy path. Include edge cases that were part of the original bug.
Evidenced
Produce artifacts: screenshots, logs, test output. "I ran it and it worked" is not evidence.
Example: Demo documentation
```markdown
## Demo: Empty Email Validation Fix

### Setup
- Branch: fix/empty-email-validation
- Dev server running on localhost:3000
- Logged in as [email protected]

### Demo Steps

**Scenario 1: Empty email shows validation error**
1. Navigate to /profile/edit
2. Clear the email field
3. Click "Save Changes"
4. ✓ EXPECTED: Red border on email field
5. ✓ EXPECTED: Error message "Email is required"
6. ✓ EXPECTED: Form does not submit

**Scenario 2: Invalid email shows validation error**
1. Enter "not-an-email" in email field
2. Click "Save Changes"
3. ✓ EXPECTED: Error message "Please enter a valid email"

**Scenario 3: Valid email submits successfully**
1. Enter "[email protected]" in email field
2. Click "Save Changes"
3. ✓ EXPECTED: Toast shows "Profile updated"
4. ✓ EXPECTED: Page redirects to /profile

### Evidence
- Screenshot: demo-validation-error.png
- Screenshot: demo-success-toast.png
- Console log: No errors

### Verification
All acceptance criteria demonstrated. Ready for review.
```
Automating demos
Manual demos work but don't scale. When an agent can run automated demos, verification becomes consistent and fast.
Automation Approaches
E2E tests as demos
Playwright, Cypress, or Selenium tests that exercise the actual UI. Run them, capture screenshots, report results.
API contract tests
For backend changes, hit the actual endpoints and verify responses. Show the requests and responses as evidence (see the sketch after this list).
Visual regression tests
Capture screenshots before and after. Diff them. Show that the intended change happened and nothing else broke visually.
Recorded sessions
Record a video of the demo. Tools like Playwright can record tests. The video becomes documentation.
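For the API contract approach, Playwright's built-in `request` fixture is one way to hit an endpoint and log the exchange as evidence. The `/api/profile` endpoint, the PUT method, and the response shape are assumptions carried over from this guide's examples; a configured `baseURL` is assumed for the relative URL:

```typescript
import { test, expect } from '@playwright/test';

// API contract demo: hit the real endpoint and record the exchange as evidence.
test('PUT /api/profile rejects an empty email', async ({ request }) => {
  const response = await request.put('/api/profile', {
    data: { email: '' },
  });

  expect(response.status()).toBe(400);
  const body = await response.json();
  expect(body.error).toBe('Email is required');

  // Log the request and response so the run itself produces reviewable evidence.
  console.log('PUT /api/profile ->', response.status(), JSON.stringify(body));
});
```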
Playwright demo script
```typescript
// demo/empty-email-validation.spec.ts
import { test, expect } from '@playwright/test';

test.describe('Empty Email Validation Demo', () => {
  test.beforeEach(async ({ page }) => {
    await page.goto('/login');
    await page.fill('#email', '[email protected]');
    await page.fill('#password', 'testpass');
    await page.click('button[type="submit"]');
  });

  test('shows validation error for empty email', async ({ page }) => {
    await page.goto('/profile/edit');
    await page.fill('#email', '');
    await page.click('button[type="submit"]');

    // Capture screenshot as evidence
    await page.screenshot({ path: 'demo-validation-error.png' });

    await expect(page.locator('#email')).toHaveClass(/border-red/);
    await expect(page.locator('.error-message')).toHaveText('Email is required');
  });

  test('submits successfully with valid email', async ({ page }) => {
    await page.goto('/profile/edit');
    await page.fill('#email', '[email protected]');
    await page.click('button[type="submit"]');

    // Capture screenshot as evidence
    await page.screenshot({ path: 'demo-success.png' });

    await expect(page.locator('.toast')).toHaveText('Profile updated');
    await expect(page).toHaveURL('/profile');
  });
});
```
Acceptance criteria as code
The best demo steps are acceptance criteria that became tests. When you write specs (see Guide 15), phrase acceptance criteria as testable assertions:
Hard to automate
- Form should feel responsive
- Error messages should be helpful
- The fix should be clean
- It should work correctly
Easy to automate
- Form submits in < 200ms
- Empty email shows "Email is required"
- No new ESLint warnings introduced
- /api/profile returns 200 with valid data
When acceptance criteria are specific and measurable, agents can verify them automatically. When they're vague, agents either skip verification or make up their own interpretation.
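To make that concrete, here is a sketch of the "easy to automate" column turned into assertions. The selectors and endpoint come from this guide's earlier examples; the 200ms budget is measured crudely with wall-clock time, which is one reasonable approximation:

```typescript
import { test, expect } from '@playwright/test';

test('form submits in under 200ms and the API returns valid data', async ({ page }) => {
  await page.goto('/profile/edit');
  await page.fill('#email', '[email protected]');

  const start = Date.now();
  const [response] = await Promise.all([
    page.waitForResponse((r) => r.url().includes('/api/profile')),
    page.click('button[type="submit"]'),
  ]);
  const elapsed = Date.now() - start;

  expect(response.status()).toBe(200); // "/api/profile returns 200 with valid data"
  expect(elapsed).toBeLessThan(200);   // "Form submits in < 200ms"
});
```

The ESLint criterion needs no test at all: a CI step running `npx eslint . --max-warnings 0` fails the build if any new warning appears.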
When to require human sign-off
Not everything can be automated. Some demos require human judgment. Define when human sign-off is needed:
Human Sign-off Triggers
Visual/UX changes
New UI, layout changes, styling updates. Automated tests can check that elements exist; humans judge whether they look right.
Copy/content changes
New error messages, help text, user-facing strings. Automated tests can check presence; humans judge tone and clarity.
Security-sensitive changes
Auth, permissions, data access. Even if tests pass, a human should verify the approach is sound.
Subjective quality
"Does this feel right?" "Is this the right abstraction?"Code review territory — not demo territory.
In your specs, mark which acceptance criteria require human verification:
```markdown
## Acceptance criteria

- [ ] Empty email shows validation error [AUTO]
- [ ] Invalid email shows format error [AUTO]
- [ ] Error message styling matches design system [HUMAN]
- [ ] Success toast appears on save [AUTO]
- [ ] Copy is clear and helpful [HUMAN]
```
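Tags like these can also be enforced mechanically. Below is a hypothetical sketch of a CI gate that blocks merge while any [HUMAN] criterion is still unchecked; the spec path and tag format are assumptions based on the example above:

```typescript
import { readFileSync } from 'node:fs';

// Hypothetical CI gate: fail while any [HUMAN] criterion remains unchecked.
const spec = readFileSync('specs/empty-email-validation.md', 'utf8');

const uncheckedHuman = spec
  .split('\n')
  .map((line) => line.trim())
  .filter((line) => line.startsWith('- [ ]') && line.includes('[HUMAN]'));

if (uncheckedHuman.length > 0) {
  console.error('Criteria awaiting human sign-off:');
  uncheckedHuman.forEach((line) => console.error('  ' + line));
  process.exit(1); // block merge until a reviewer checks these off
}
console.log('All [HUMAN] criteria signed off.');
```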
Demo artifacts in PRs
Demo evidence should be attached to the PR. This makes review easier and creates a record of what was verified.
PR with demo section
````markdown
## Summary
Fix empty email validation crash on profile page.

## Changes
- Added null check in validateEmail()
- Added client-side required validation
- Added test coverage

## Demo

### Before (reproduction)

- Empty email caused TypeError crash

### After (verification)

- Empty email now shows validation error

### Test run
```
$ npm test -- --grep "email validation"
✓ shows error for empty email (45ms)
✓ shows error for invalid format (38ms)
✓ accepts valid email (52ms)
3 passing (135ms)
```

### E2E demo
```
$ npx playwright test demo/email-validation.spec.ts
✓ shows validation error for empty email (1.2s)
✓ submits successfully with valid email (0.9s)
2 passed (2.1s)
```

## Checklist
- [x] Reproduction confirmed before fix
- [x] Demo shows fix works
- [x] Tests added
- [x] Screenshots attached
````
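Producing these artifacts should not depend on the agent remembering to take screenshots. If your demos run through Playwright, the test config can capture evidence on every run. A minimal sketch, where the specific option values are one reasonable choice rather than the only one:

```typescript
// playwright.config.ts: capture demo evidence automatically on every run.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    baseURL: 'http://localhost:3000',
    screenshot: 'on',            // a screenshot for every test, pass or fail
    video: 'retain-on-failure',  // keep video when a demo scenario fails
    trace: 'on-first-retry',     // full trace for debugging flaky demos
  },
  reporter: [['html', { outputFolder: 'demo-report' }]],
});
```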
What goes wrong
"Tests pass" as only verification
Agent writes tests for its fix. Tests pass. But the tests are wrong — they test the implementation, not the requirement. The bug is still there.
No evidence
Agent says "I verified it works" but provides no screenshots, logs, or output. Reviewer has no way to confirm without re-doing the demo.
Happy path only
Agent demos the main flow but skips edge cases. The original bug was an edge case. It's still broken.
Demo doesn't match reproduction
Agent reproduced one scenario but demos a different one. The fix works for the demo case but not the original bug. Always demo the exact reproduction steps.
Summary
- Demo steps prove the fix works — "tests pass" is not enough.
- Demos should mirror reproduction steps — same scenarios, opposite expected outcome.
- Automate demos where possible — E2E tests, API tests, visual regression.
- Attach demo evidence to PRs — screenshots, logs, test output.