Reference Example
An Example Agentic Dev Loop
This is a concrete example of what "good" looks like when an agent can take a ticket from intake to merge, with reproducible evidence, scoped identity, and clear human checkpoints. Swap vendors freely. The contracts and artifacts matter more than the brand names.
Workflow Diagram
The loop is linear most of the time. When CI or Bugbot flags issues, the agent iterates back through implementation with bounded retries and escalation rules.
Trigger
A ticket is assigned to an agent identity in Linear (or Jira, GitHub Issues, etc.). Assignment is the start signal.
Environment
The agent boots a remote environment on `exe.dev`, pinned to a reproducible toolchain, with an explicit environment identity.
Evidence
The agent reproduces the issue with `browser-use`, captures screenshots, and produces a GIF with `ffmpeg`. If it cannot reproduce, it stops and escalates.
The full run, end to end
Think of this as a status ladder with artifacts attached. The agent is responsible for posting updates and evidence back to the tracker and PR. Humans should not have to guess what is happening.
Step 01
Ticket intake (assigned in Linear)
The issue is assigned to an agent user. The agent immediately acknowledges it and links the run ID it will use for audit and traceability.
- Linear comment: acknowledged
- Run ID
- Links: repo, branch plan
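As a rough illustration of the intake artifacts above, the sketch below generates a run ID and renders an acknowledgement comment. The `Acknowledgement` class, the `acknowledge` helper, the branch naming scheme, and the comment layout are all assumptions made for this example; posting the comment to the tracker is left out.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Acknowledgement:
    """Intake artifacts for one run (hypothetical shape)."""

    ticket_id: str
    run_id: str
    repo_url: str
    branch: str

    def to_comment(self) -> str:
        # Plain-text comment body; the exact layout is an assumption.
        return "\n".join(
            [
                "Status: acknowledged",
                f"Run ID: {self.run_id}",
                f"Repo: {self.repo_url}",
                f"Planned branch: {self.branch}",
            ]
        )


def acknowledge(
    ticket_id: str, repo_url: str, run_id: Optional[str] = None
) -> Acknowledgement:
    # A random hex suffix is one simple way to get a unique run ID.
    run_id = run_id or f"run-{uuid.uuid4().hex[:12]}"
    # Hypothetical branch naming: agent/<ticket>-<run id>.
    branch = f"agent/{ticket_id.lower()}-{run_id}"
    return Acknowledgement(ticket_id, run_id, repo_url, branch)
```

Carrying the same run ID into the branch name, the PR, and later status comments is what makes the run traceable from any one of them.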
Step 02
Provision identity (Creddy)
The agent requests scoped, short-lived credentials for every system it touches. No user impersonation. Credentials are attached to the run identity and expire.
- Cred lease ID
- Scopes summary
- Expiry timestamp
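One way to picture a credential lease is as a small record that carries its scopes and expiry and can answer "is this action allowed right now?". This is a minimal sketch, not Creddy's actual API: `CredLease`, `issue_lease`, the scope strings, and the 30-minute default are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class CredLease:
    """A scoped, expiring lease tied to one run (hypothetical shape)."""

    lease_id: str
    run_id: str
    scopes: FrozenSet[str]
    expires_at: datetime

    def allows(self, scope: str, now: Optional[datetime] = None) -> bool:
        # Valid only if the scope was granted and the lease has not expired.
        now = now or datetime.now(timezone.utc)
        return scope in self.scopes and now < self.expires_at

    def summary(self) -> str:
        scopes = ",".join(sorted(self.scopes))
        return f"lease={self.lease_id} scopes={scopes} expires={self.expires_at.isoformat()}"


def issue_lease(
    run_id: str,
    scopes: Iterable[str],
    ttl: timedelta = timedelta(minutes=30),
    now: Optional[datetime] = None,
) -> CredLease:
    # In a real system a credential broker would mint this; here it is local.
    now = now or datetime.now(timezone.utc)
    return CredLease(f"lease-{run_id}", run_id, frozenset(scopes), now + ttl)
```

The `summary()` string covers the three artifacts listed for this step: lease ID, scopes summary, and expiry timestamp.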
Step 03
Boot environment (exe.dev)
The agent creates a fresh remote environment, pins the toolchain, and records the environment identity so humans can reproduce the exact run later.
- Environment URL
- Environment identity
- Toolchain fingerprint
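A toolchain fingerprint can be as simple as a hash over the pinned tool versions. The sketch below assumes the versions are available as a name-to-version mapping; the function name and the choice of SHA-256 over canonical JSON are illustrative, not something `exe.dev` prescribes.

```python
import hashlib
import json
from typing import Dict


def toolchain_fingerprint(versions: Dict[str, str]) -> str:
    """Return a stable hex digest for a set of pinned tool versions."""
    # Sorting keys makes the digest independent of insertion order.
    canonical = json.dumps(versions, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two environments with the same pins produce the same digest, so a human comparing a later reproduction against the original run only has to compare one string.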
Step 04
Reproduce (browser-use)
The agent reproduces the bug using scripted, repeatable steps and collects evidence. If it cannot reproduce, it stops. No fixes without evidence.
- Repro steps
- Screenshots
- GIF recording
- Console logs
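The "no fixes without evidence" rule can be expressed as a gate that either returns the next status or stops the run. This is a sketch under assumptions: `ReproEvidence`, `EscalationRequired`, and `reproduction_gate` are hypothetical names, and the evidence is represented only by file paths and step strings rather than anything `browser-use` itself returns.

```python
from dataclasses import dataclass, field
from typing import List


class EscalationRequired(Exception):
    """Raised when the run must stop and hand off to a human."""


@dataclass
class ReproEvidence:
    reproduced: bool
    steps: List[str] = field(default_factory=list)
    screenshots: List[str] = field(default_factory=list)
    gif_path: str = ""
    console_log_path: str = ""

    def missing(self) -> List[str]:
        # Every artifact listed for this step must be present.
        required = {
            "steps": self.steps,
            "screenshots": self.screenshots,
            "gif": self.gif_path,
            "console logs": self.console_log_path,
        }
        return [name for name, value in required.items() if not value]


def reproduction_gate(evidence: ReproEvidence) -> str:
    if not evidence.reproduced:
        raise EscalationRequired("could not reproduce; stopping before any fix")
    missing = evidence.missing()
    if missing:
        raise EscalationRequired("evidence incomplete: " + ", ".join(missing))
    return "reproduced"
```

Raising instead of returning a flag makes it hard for later steps to proceed by accident when the gate has not been passed.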
Step 05
Open PR stub (GitHub)
The agent opens a draft PR early, links the ticket, and posts the reproduction evidence before implementation. The PR is the workspace for all discussion.
- Draft PR
- Spec link
- Repro evidence comment
Step 06
Plan with Opus, implement with Sonnet
The agent writes a plan, identifies risk boundaries, then implements in small checkpoints. Each checkpoint keeps the PR reviewable and debuggable.
- Plan comment
- Checkpoint commits
- Local notes attached to PR
Step 07
Commit, sign, and run CI
The agent authors signed commits under its own identity. CI is treated as a gate, not a suggestion.
- Signed commits
- CI logs
- Test outputs
Step 08
Watchers and responders
The agent runs a watcher loop for CI failures, Cursor Bugbot feedback, and human review comments. It responds with bounded iterations and escalation rules.
- Bugbot thread links
- CI fix iterations
- Escalation note if blocked
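"Bounded iterations and escalation rules" can be sketched as a loop with a hard cap on fix attempts. The function below is illustrative only: `ci_passes` and `attempt_fix` are hypothetical callables standing in for whatever checks CI and pushes a fix, and the cap of three is an arbitrary default.

```python
from typing import Callable


def fix_until_green(
    ci_passes: Callable[[], bool],
    attempt_fix: Callable[[int], None],
    max_iterations: int = 3,
) -> str:
    """Try at most max_iterations fixes; return "green" or "escalate"."""
    for iteration in range(1, max_iterations + 1):
        if ci_passes():
            return "green"
        # Each attempt gets its iteration number so it can be logged on the PR.
        attempt_fix(iteration)
    # One final check after the last allowed fix, then hand off to a human.
    return "green" if ci_passes() else "escalate"
```

The important property is that the loop always terminates: after the cap is reached the agent writes an escalation note instead of trying again.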
Step 09
Demo with evidence (browser-use and ffmpeg)
After implementation, the agent runs a demo flow in the real environment. It captures screenshots and records a short GIF. The artifact is attached to the PR before human review begins.
- Demo steps
- Screenshots
- GIF artifact
- PR comment with evidence
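Turning a numbered series of screenshots into a GIF is a single `ffmpeg` invocation. The helper below only builds the argument list; the function name, the frame naming pattern, and the default width and pacing are assumptions for this sketch, while the flags used (`-y`, `-framerate`, `-i`, `-vf scale=...`) are standard `ffmpeg` options.

```python
from typing import List


def gif_command(
    frame_pattern: str,
    output_path: str,
    seconds_per_frame: float = 1.0,
    width: int = 960,
) -> List[str]:
    """Build an ffmpeg argv that stitches numbered screenshots into a GIF."""
    if seconds_per_frame <= 0:
        raise ValueError("seconds_per_frame must be positive")
    framerate = 1.0 / seconds_per_frame
    return [
        "ffmpeg",
        "-y",  # overwrite the output file if it already exists
        "-framerate", f"{framerate:g}",  # input frames per second
        "-i", frame_pattern,  # printf-style pattern, e.g. shots/step_%03d.png
        "-vf", f"scale={width}:-1:flags=lanczos",  # fixed width, keep aspect ratio
        output_path,
    ]
```

With `ffmpeg` installed, the list can be passed directly to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems in file names.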
Step 10
Human review stays on track
Guards ensure that humans review the right surfaces. Review focuses on high-risk logic and contracts, not on linting or formatting.
- Focused review checklist
- Highlighted files
- Demo steps
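One simple way to produce the "highlighted files" artifact is to match the changed paths against a list of high-risk patterns. The patterns and the `highlight_files` helper below are purely hypothetical; a real project would choose its own high-risk surfaces.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Sequence

# Hypothetical examples of surfaces a team might treat as high risk.
HIGH_RISK_PATTERNS = ("*/auth/*", "*/billing/*", "*/migrations/*", "*.proto")


def highlight_files(
    changed: Iterable[str], patterns: Sequence[str] = HIGH_RISK_PATTERNS
) -> List[str]:
    """Return the changed paths that match any high-risk pattern, in order."""
    return [
        path
        for path in changed
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]
```

Everything not returned here is still in the diff; the list only tells the reviewer where to spend attention first.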
Step 11
Merge and teardown
After approval, the PR is merged. The remote environment spins down, credentials expire, and the agent posts a final summary and marks the ticket done.
- Merge SHA
- Teardown confirmation
- Final summary
- Ticket closed
Status updates as a contract
The agent should post status changes back to the ticket and PR as it crosses gates. These gates make the run legible to humans, and because each one is tied to an artifact, they reduce hallucinated fixes.
`acknowledged` → `env_ready` → `reproduced` → `plan_posted` → `fix_implemented` → `tests_added` → `demo_recorded` → `ready_for_human` → `merged` → `teardown_complete`
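Treating the statuses as a contract suggests a small state machine that refuses to skip gates. The sketch below assumes a strictly linear order, which is a simplification (the real loop can iterate back through implementation when CI fails); `RunStatus` and its message format are hypothetical.

```python
from typing import List, Optional

GATES = [
    "acknowledged", "env_ready", "reproduced", "plan_posted",
    "fix_implemented", "tests_added", "demo_recorded",
    "ready_for_human", "merged", "teardown_complete",
]


class RunStatus:
    """Tracks which gates a run has crossed, in order."""

    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.history: List[str] = []

    @property
    def current(self) -> Optional[str]:
        return self.history[-1] if self.history else None

    def advance(self, status: str) -> str:
        if len(self.history) == len(GATES):
            raise ValueError("run is already complete")
        expected = GATES[len(self.history)]
        if status != expected:
            # Skipping a gate (e.g. fixing before reproducing) is rejected.
            raise ValueError(f"expected {expected!r}, got {status!r}")
        self.history.append(status)
        # The returned line is what would be posted to the ticket and PR.
        return f"[{self.run_id}] status: {status}"
```

For example, calling `advance("fix_implemented")` on a run that has only reached `env_ready` raises, which is the "evidence before fixes" rule enforced mechanically.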
Guardrails that keep it safe
This loop only works when identity and boundaries are real. The agent is a first-class actor, not a human impersonator.
- Agent has its own identity everywhere, including GitHub, the secret manager, the VPN, and internal services.
- Credentials are scoped and short-lived. No long-lived user tokens. Creds expire at teardown.
- Evidence before fixes. Reproduction is a gate. If the agent cannot reproduce, it escalates.
- Human review focuses on high-risk surfaces, not formatting or static analysis output.