Reference Example

An Example Agentic Dev Loop

This is a concrete example of what "good" looks like when an agent can take a ticket from intake to merge, with reproducible evidence, scoped identity, and clear human checkpoints. Swap vendors freely. The contracts and artifacts matter more than the brand names.


Workflow Diagram

The loop is linear most of the time. When CI or Bugbot flags issues, the agent iterates back through implementation with bounded retries and escalation rules.

Trigger

A ticket is assigned to an agent identity in Linear (or Jira, GitHub Issues, etc.). Assignment is the start signal.

Environment

The agent boots a remote environment on `exe.dev`, pinned to a reproducible toolchain, with an explicit environment identity.

Evidence

The agent reproduces the issue with `browser-use`, captures screenshots, and produces a GIF with `ffmpeg`. If it cannot reproduce, it stops and escalates.

The full run, end to end

Think of this as a status ladder with artifacts attached. The agent is responsible for posting updates and evidence back to the tracker and PR. Humans should not have to guess what is happening.

Step 01

Ticket intake (assigned in Linear)

Issue is assigned to an agent user. The agent immediately acknowledges and links the run ID it will use for audit and traceability.

Artifacts required

Linear comment: acknowledged
Run ID
Links: repo, branch plan
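
As an illustration of the intake artifacts, the sketch below mints a run ID and renders the acknowledgement comment. The `Acknowledgement` class, the `acknowledge` helper, the `run-` prefix, and the branch naming scheme are all assumptions made for this example. Posting the comment to Linear is left out because it depends on the tracker integration in use.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Acknowledgement:
    """Intake artifacts for one run (hypothetical shape, not a tracker API)."""
    run_id: str
    ticket_id: str
    repo_url: str
    branch: str

    def to_comment(self) -> str:
        # Plain-text body the agent would post on the ticket.
        return (
            "Status: acknowledged\n"
            f"Run ID: {self.run_id}\n"
            f"Repo: {self.repo_url}\n"
            f"Planned branch: {self.branch}"
        )


def acknowledge(ticket_id: str, repo_url: str) -> Acknowledgement:
    # Assumed convention: a short random run ID that every later artifact cites.
    run_id = f"run-{uuid.uuid4().hex[:12]}"
    branch = f"agent/{ticket_id.lower()}-{run_id}"
    return Acknowledgement(run_id, ticket_id, repo_url, branch)
```

Every later artifact (credential lease, environment, PR) can then carry the same `run_id`, which is what makes the run traceable.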

Step 02

Provision identity (Creddy)

The agent requests scoped, short-lived credentials for every system it touches. No user impersonation. Credentials are attached to the run identity and expire.

Artifacts required

Cred lease ID
Scopes summary
Expiry timestamp
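
One way to picture the three artifacts above is as a single lease record. This is a minimal sketch, not Creddy's actual API: the `CredLease` class, its field names, and the scope strings are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CredLease:
    """A scoped, expiring credential lease tied to one run (hypothetical shape)."""
    lease_id: str
    run_id: str
    scopes: tuple[str, ...]
    expires_at: datetime

    def is_valid(self, now: datetime) -> bool:
        return now < self.expires_at

    def allows(self, scope: str, now: datetime) -> bool:
        # A scope is usable only while the lease is live and the scope was granted.
        return self.is_valid(now) and scope in self.scopes

    def summary(self) -> str:
        scopes = ", ".join(self.scopes)
        return f"lease={self.lease_id} scopes=[{scopes}] expires={self.expires_at.isoformat()}"


# Example values are illustrative only.
issued = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
lease = CredLease("lease-1", "run-abc", ("repo:write", "ci:read"), issued + timedelta(minutes=30))
```

Checking `lease.allows(...)` before each privileged call keeps the expiry and the scope list enforceable in code rather than by convention.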

Step 03

Boot environment (exe.dev)

Agent creates a fresh remote environment, pins toolchain, and records environment identity so humans can reproduce the exact run later.

Artifacts required

Environment URL
Environment identity
Toolchain fingerprint
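
A toolchain fingerprint can be as simple as a hash over the pinned tool versions. The sketch below assumes the versions are already collected into a dictionary; the `toolchain_fingerprint` function and the example tool names are illustrative, not something `exe.dev` provides.

```python
import hashlib
import json


def toolchain_fingerprint(versions: dict[str, str]) -> str:
    """Return a stable SHA-256 digest of a tool-name to version mapping."""
    # Sorting keys makes the digest independent of dictionary insertion order.
    canonical = json.dumps(versions, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Example values are illustrative only.
fingerprint = toolchain_fingerprint({"node": "20.11.0", "python": "3.12.1"})
```

Two environments with the same fingerprint were built from the same pinned versions, so a human can compare a single string instead of diffing version lists.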

Step 04

Reproduce (browser-use)

Agent reproduces the bug using scripted, repeatable steps. It collects evidence. If it cannot reproduce, it stops. No fixes without evidence.

Artifacts required

Repro steps
Screenshots
GIF recording
Console logs
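
The GIF artifact can be produced by stitching the captured screenshots with `ffmpeg`. The sketch below only builds and runs the command. It assumes the screenshots were saved as numbered PNG files in one directory and that the local `ffmpeg` build supports glob input patterns (some builds, notably on Windows, do not). The function names, frame rate, and width are arbitrary choices for the example.

```python
import subprocess
from pathlib import Path


def gif_command(frames_dir: Path, out_path: Path, fps: int = 2, width: int = 800) -> list[str]:
    """Build an ffmpeg invocation that turns PNG frames into a GIF."""
    return [
        "ffmpeg", "-y",                      # overwrite an existing output file
        "-framerate", str(fps),              # input frame rate for the image sequence
        "-pattern_type", "glob",             # treat the input as a glob pattern
        "-i", str(frames_dir / "*.png"),
        "-vf", f"scale={width}:-1:flags=lanczos",  # resize, keep aspect ratio
        str(out_path),
    ]


def record_gif(frames_dir: Path, out_path: Path) -> None:
    # check=True raises if ffmpeg exits non-zero, so a failed recording is never silent.
    subprocess.run(gif_command(frames_dir, out_path), check=True)
```

Keeping the command builder separate from the call that executes it makes the exact invocation easy to log alongside the repro steps.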

Step 05

Open PR stub (GitHub)

Agent opens a draft PR early, links the ticket, and posts the reproduction evidence before implementation. The PR is the workspace for all discussion.

Artifacts required

Draft PR
Spec link
Repro evidence comment

Step 06

Plan with Opus, implement with Sonnet

Agent writes a plan, identifies risk boundaries, then implements in small checkpoints. Each checkpoint keeps the PR reviewable and debuggable.

Artifacts required

Plan comment
Checkpoint commits
Local notes attached to PR

Step 07

Commit, sign, and run CI

Agent authors and commits under its own identity, with signed commits. CI is treated as a gate, not a suggestion.

Artifacts required

Signed commits
CI logs
Test outputs
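
A CI job can enforce the signed-commit requirement by inspecting git's `%G?` signature status for each commit in the PR range. The sketch below assumes a policy of accepting only status `G` (good, valid signature). That policy, the helper names, and the revision range are assumptions, and a team might reasonably also accept other statuses.

```python
import subprocess

ACCEPTED_STATUSES = {"G"}  # assumed policy: only fully valid signatures pass


def signature_log(rev_range: str) -> str:
    """Return 'sha status' lines for a revision range, e.g. 'main..HEAD'."""
    result = subprocess.run(
        ["git", "log", "--format=%H %G?", rev_range],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def unsigned_commits(log_output: str) -> list[str]:
    """Return the SHAs whose signature status is not accepted."""
    rejected = []
    for line in log_output.strip().splitlines():
        sha, status = line.split()
        if status not in ACCEPTED_STATUSES:
            rejected.append(sha)
    return rejected
```

A gate would call `unsigned_commits(signature_log("main..HEAD"))` and fail the build when the list is non-empty.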

Step 08

Watchers and responders

Agent keeps a watcher loop for CI failures, Cursor Bugbot feedback, and human review comments. It responds with bounded iterations and escalation rules.

Artifacts required

Bugbot thread links
CI fix iterations
Escalation note if blocked
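
The "bounded iterations and escalation" rule can be sketched as a loop with a fixed fix budget. In this sketch `run_checks` and `apply_fix` are placeholders for whatever polls CI or review feedback and whatever produces a fix. The names and the default budget of three are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class WatchResult:
    passed: bool
    fixes_applied: int
    escalated: bool
    notes: list[str] = field(default_factory=list)


def fix_until_green(
    run_checks: Callable[[], list[str]],      # returns names of failing checks
    apply_fix: Callable[[list[str]], None],   # attempts to fix the given failures
    max_fixes: int = 3,
) -> WatchResult:
    notes: list[str] = []
    fixes = 0
    while True:
        failures = run_checks()
        if not failures:
            return WatchResult(True, fixes, False, notes)
        if fixes >= max_fixes:
            # Budget exhausted: stop iterating and hand off to a human.
            notes.append(f"escalating: {len(failures)} check(s) still failing after {fixes} fix iteration(s)")
            return WatchResult(False, fixes, True, notes)
        fixes += 1
        notes.append(f"fix iteration {fixes}: addressing {', '.join(failures)}")
        apply_fix(failures)
```

The `notes` list doubles as the raw material for the "CI fix iterations" artifact and, when the budget runs out, for the escalation note.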

Step 09

Demo with evidence (browser-use and ffmpeg)

After implementation, the agent runs a demo flow in the real environment. It captures screenshots and records a short GIF. The artifact is attached to the PR before human review begins.

Artifacts required

Demo steps
Screenshots
GIF artifact
PR comment with evidence

Step 10

Human review stays on track

Guards enforce that humans review the right surfaces. Review focuses on high-risk logic and contracts, not on linting or formatting.

Artifacts required

Focused review checklist
Highlighted files
Demo steps
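
One simple guard for producing the "Highlighted files" artifact is to split the changed files by path pattern. The patterns below are hypothetical examples of high-risk surfaces. A real repository would define its own, and the `split_for_review` helper is an assumption for this sketch.

```python
from fnmatch import fnmatchcase

# Hypothetical high-risk surfaces; replace with patterns that fit the repository.
HIGH_RISK_PATTERNS = ("src/auth/*", "src/billing/*", "migrations/*", "*.proto")


def split_for_review(
    changed_files: list[str],
    patterns: tuple[str, ...] = HIGH_RISK_PATTERNS,
) -> tuple[list[str], list[str]]:
    """Split changed paths into (high_risk, routine), preserving input order."""
    high_risk: list[str] = []
    routine: list[str] = []
    for path in changed_files:
        # fnmatchcase's '*' also matches '/', so 'src/auth/*' covers nested paths.
        is_high_risk = any(fnmatchcase(path, pattern) for pattern in patterns)
        (high_risk if is_high_risk else routine).append(path)
    return high_risk, routine
```

The agent can list the high-risk paths at the top of the review checklist and leave the routine ones to automated checks.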

Step 11

Merge and teardown

After approval, PR is merged. The remote environment spins down, credentials expire, and the agent posts a final summary and marks the ticket done.

Artifacts required

Merge SHA
Teardown confirmation
Final summary
Ticket closed

Status updates as a contract

The agent should post status changes back to the ticket and PR as it crosses gates. These gates make the run legible to humans, and they reduce hallucinated fixes.

acknowledged
env_ready
reproduced
plan_posted
fix_implemented
tests_added
demo_recorded
ready_for_human
merged
teardown_complete

Guardrails that keep it safe

This loop only works when identity and boundaries are real. The agent is a first-class actor, not a human impersonator.

Agent has its own identity everywhere, including GitHub, secret manager, VPN, and internal services.
Credentials are scoped and short-lived. No long-lived user tokens. Credentials expire at teardown.
Evidence before fixes. Reproduction is a gate. If the agent cannot reproduce, it escalates.
Human review focuses on high-risk surfaces, not formatting or static analysis output.

Read: Identity, Secrets and Trust