Task Routing & Orchestration
Core Questions
- Which agent handles which task?
- How is concurrency controlled?
- How are failures retried?
This guide isn't about building a fancy router. It's about the real problems platform teams face: how agent work starts, how you keep it from growing unbounded, and how humans stay in the loop via the bug tracker.
How an agent starts (recommended: assignment)
The easiest way to make agent work legible is to use the system you already use to track work: GitHub issues, Jira, Linear, etc. The trigger should be explicit and visible to humans.
A common pattern is to create an agent identity in your tracker and treat it like a teammate: when an issue is assigned to that agent, the agent starts.
Trigger options (in practice)
- Assignment (recommended): assign the issue to an agent user. Agent acknowledges and starts.
- Slash commands: a comment like /agent fix starts work on demand.
- Chat triggers (Slack, etc.): convenient for small “prompt tasks” and triage, but easy to lose context and hard to make reproducible.
- Labels/workflow state: applying agent:run or moving to “Ready for agent”.
- Scheduled/batch: nightly triage, flaky test sweeps, backlog cleanup.
Default recommendation: start with assignment. It makes the “why did an agent start?” question answerable without new tooling.
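As a minimal sketch of how the explicit triggers above might be recognized, the function below maps a tracker event to a trigger name. The `TrackerEvent` shape, the `agent-bot` login, and the event kind strings are assumptions for illustration, not any particular tracker's API; only `/agent fix` and `agent:run` come from the list above.

```python
from dataclasses import dataclass
from typing import Optional

AGENT_LOGIN = "agent-bot"  # hypothetical agent identity in the tracker


@dataclass(frozen=True)
class TrackerEvent:
    kind: str                       # assumed values: "assigned", "comment", "labeled"
    issue_id: str
    actor: str                      # the human who caused the event
    assignee: Optional[str] = None
    body: Optional[str] = None
    label: Optional[str] = None


def trigger_for(event: TrackerEvent) -> Optional[str]:
    """Return the name of the trigger that should start agent work, or None."""
    if event.kind == "assigned" and event.assignee == AGENT_LOGIN:
        return "assignment"
    if event.kind == "comment" and (event.body or "").strip().startswith("/agent "):
        return "slash_command"
    if event.kind == "labeled" and event.label == "agent:run":
        return "label"
    return None
```

Recording the returned trigger name and the `actor` alongside the task is one way to keep the “why did an agent start?” question answerable from the tracker alone.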
Task types: prompt vs spec
Not every task should start from a chat prompt. Agents are great at deeper work, but deeper work needs durable input: acceptance criteria, constraints, and links. That usually means a markdown spec attached to an issue/ticket.
Prompt tasks (chat is fine)
- “Investigate this error log”
- “Summarize this PR”
- “Triage which test is flaky”
- “Generate a quick patch idea”
Spec tasks (tracker is better)
- Multi-file refactors
- New features
- Large bug fixes with edge cases
- Migrations and changes that touch prod-adjacent code
Rule of thumb: if a human would write a 1-2 page markdown design note, the agent needs one too. Assignment + a spec beats a Slack thread.
Status updates in the bug tracker
If assignment is your trigger, the tracker becomes your UI. The agent should post status when it starts, when it hits key phases, and when it needs help. Otherwise humans can't tell progress from silence.
Treat status updates as part of the platform contract. A task that doesn't report status is a task you can't trust.
Recommended status events
Acknowledged
Agent posts within minutes of assignment: task accepted, and what it will do next.
Environment ready
Runtime/toolchain is up; agent can run tests and commands. If it can't, it should say so explicitly.
Reproduced (when applicable)
For bugs: confirm the bug exists and attach evidence (logs, failing test, screenshot). If it can’t reproduce, it should stop and ask.
Plan posted
Short plan, assumptions, and what signals will be used to verify (tests, screenshots, logs).
Progress checkpoints
When it finishes a major step: reproduced, fixed, tests passing, PR opened.
Needs help / blocked
If stuck: what failed, what it tried, and the smallest question a human can answer to unblock it.
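One way to keep these events consistent is to render every status comment from the same small helper. The sketch below is illustrative only: the `StatusEvent` enum mirrors the event names above, while the function names and the "- key: value" layout are assumptions, and posting the comment to a tracker is left out.

```python
from enum import Enum
from typing import Mapping


class StatusEvent(Enum):
    ACKNOWLEDGED = "Acknowledged"
    ENVIRONMENT_READY = "Environment ready"
    REPRODUCED = "Reproduced"
    PLAN_POSTED = "Plan posted"
    CHECKPOINT = "Progress checkpoint"
    BLOCKED = "Needs help / blocked"


def render_status(event: StatusEvent, details: Mapping[str, str]) -> str:
    """Render a tracker comment: a header line plus one bullet per detail."""
    lines = [f"[agent] {event.value}"]
    lines += [f"- {key}: {value}" for key, value in details.items()]
    return "\n".join(lines)


def render_blocked(failed: str, tried: str, question: str) -> str:
    """A blocked comment always carries the three fields a human needs."""
    return render_status(
        StatusEvent.BLOCKED,
        {"What failed": failed, "What I tried": tried, "Smallest question": question},
    )
```

Making the three "blocked" fields required arguments means an agent cannot post a bare "I'm stuck" without saying what it needs.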
Example: start comment template
[agent] Starting work
- Task: investigate + fix
- Repo/branch: <repo>@<branch>
- Commit: <sha> (if pinned)
- Environment: <nix flake.lock hash | image digest>
- Next: reproduce issue + add a failing test (if applicable)
- Updates: I will post here at each checkpoint and when blocked
Lifecycle checkpoints (what “progress” looks like)
Agents should report progress when they cross meaningful boundaries. This keeps humans oriented and prevents “silent failure.”
- Reproduced: confirmed the bug and captured evidence
- Fix in progress: approach selected and constraints noted
- Verified: tests pass, repro no longer occurs, artifacts saved
- PR opened: link + summary + what to review
- Blocked: smallest human decision needed
When the pool is full
With fixed concurrency, an assignment doesn’t always mean “starting now.” That’s fine, but it must be visible. On assignment, either start immediately or acknowledge that the issue is queued.
[agent] Queued
- Reason: agent pool is full
- Queue: <global|team|repo>
- Position: <n> (optional)
- ETA: <estimate> (optional)
- I will post again when I actually start
Concurrency: start fixed, not unbounded
The temptation is to let agents scale without limit: every new assignment starts immediately. That feels fast, until it becomes chaos: cost spikes, conflicts multiply, and human review turns into a bottleneck.
Default recommendation: start with a fixed-size pool of concurrent agent tasks. Treat it like a queue of work items, not an unlimited swarm.
Concurrency caps (practical defaults)
Global pool size
Max tasks running across the org. Everything else waits, even if it's assigned.
Per-repo cap
Avoid multiple tasks contending for the same codebase and reviewers.
Per-area / per-file cap
Prevent two agents from editing the same surface area simultaneously.
Per-team quota (optional)
One team can’t starve everyone else. Useful once adoption spreads.
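The caps above can be sketched as a single admission check that reports which cap, if any, keeps a task waiting. The `Task` fields, the default numbers, and the idea of an "area" as a named slice of a repo are assumptions for illustration; tune them to your own setup.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Task:
    issue_id: str
    repo: str
    area: str   # hypothetical granularity, e.g. a top-level directory
    team: str


@dataclass(frozen=True)
class Caps:
    global_pool: int = 8            # illustrative defaults, not recommendations
    per_repo: int = 2
    per_area: int = 1
    per_team: Optional[int] = None  # optional quota; None disables it


def blocked_by(task: Task, running: Iterable[Task], caps: Caps) -> Optional[str]:
    """Return the first cap that blocks `task`, or None if it may start now."""
    running = list(running)
    if len(running) >= caps.global_pool:
        return "global"
    if sum(t.repo == task.repo for t in running) >= caps.per_repo:
        return "repo"
    same_area = sum((t.repo, t.area) == (task.repo, task.area) for t in running)
    if same_area >= caps.per_area:
        return "area"
    if caps.per_team is not None:
        if sum(t.team == task.team for t in running) >= caps.per_team:
            return "team"
    return None
```

The returned name can go straight into the queued comment, so the human sees which limit the issue is waiting on.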
When the pool is full, you need admission control: which assigned issue starts next? FIFO is fine at first. As you scale, add priority (P0s jump the line) and aging (nothing starves forever).
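Priority with aging can be sketched as an "effective priority" that improves the longer an issue waits. The `Queued` shape and the four-hours-per-level rate below are assumptions chosen for illustration; with equal priorities the rule reduces to FIFO.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed aging rate: four hours of waiting is worth one priority level.
AGING_SECONDS_PER_LEVEL = 4 * 3600


@dataclass(frozen=True)
class Queued:
    issue_id: str
    priority: int        # 0 = P0 (most urgent); larger numbers are less urgent
    enqueued_at: float   # seconds since epoch


def effective_priority(item: Queued, now: float) -> float:
    """Lower is better. Waiting time steadily lowers the value."""
    waited = max(0.0, now - item.enqueued_at)
    return item.priority - waited / AGING_SECONDS_PER_LEVEL


def next_to_start(queue: List[Queued], now: float) -> Optional[Queued]:
    """Pick the queued issue with the best effective priority; ties go to the oldest."""
    if not queue:
        return None
    return min(queue, key=lambda q: (effective_priority(q, now), q.enqueued_at))
```

Note the trade-off in this particular formula: a low-priority issue that has waited long enough will eventually outrank a freshly filed P0. If P0s must always win, compare on priority first and use aging only among lower levels.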
Failure modes and escalation
Agents fail in different ways than CI. Some failures are transient. Some are environmental. Some are “I am stuck in a loop and need a human decision.” Your system should push those states back to the tracker instead of silently retrying forever.
Failure classification (and what to do)
Loop detection (the common failure mode)
When an agent gets stuck, it often looks like “make a change, run tests, fail, try again” without new information. Don’t let it burn the whole pool. Set a threshold and escalate.
- Max failed attempts on the same command/test
- Max wall-clock time without producing new artifacts (tests, logs, PR)
- Max cost / tokens / tool calls for a single task
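The three thresholds above can be sketched as a small guard that the agent runner consults after every tool call. The class name, the default limits, and the choice to count tool calls (rather than cost or tokens) are assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LoopGuard:
    max_same_failures: int = 3            # illustrative defaults
    max_idle_seconds: float = 30 * 60
    max_tool_calls: int = 200
    failures: Counter = field(default_factory=Counter)
    tool_calls: int = 0
    last_artifact_at: float = 0.0

    def record_tool_call(self, command: str, ok: bool) -> None:
        self.tool_calls += 1
        if ok:
            self.failures.pop(command, None)  # a passing run resets that command
        else:
            self.failures[command] += 1

    def record_artifact(self, now: float) -> None:
        """Call when the task produces something new: a test, a log, a PR."""
        self.last_artifact_at = now

    def should_escalate(self, now: float) -> Optional[str]:
        """Return a human-readable reason to stop and escalate, or None."""
        for command, count in self.failures.items():
            if count >= self.max_same_failures:
                return f"{command!r} failed {count} times without a passing run"
        idle = now - self.last_artifact_at
        if idle > self.max_idle_seconds:
            return f"no new artifacts for {idle:.0f}s"
        if self.tool_calls >= self.max_tool_calls:
            return f"tool call budget reached ({self.tool_calls})"
        return None
```

The returned reason is meant to be posted to the tracker as part of the blocked comment, so the escalation explains itself.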
Budgets and stop conditions
Fixed concurrency controls surprise spend. Budgets prevent runaway tasks inside that fixed pool. The key is to enforce budgets and report outcomes back to the tracker.
Budget levels (enforce, don’t suggest)
Per-task budget
Max cost/time per task. On exceed: stop, post status, and request human input.
Hourly budget
Max spend per hour. When exceeded: pause starts; leave assigned issues queued.
Daily budget
Hard daily cap. When reached: stop starting tasks and post a global status/alert.
Per-repo/team budget
Allocate budgets to teams. Prevent one repo or team from consuming the whole pool.
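To make "enforce, don't suggest" concrete, here is a minimal sketch that turns current spend into an action for the first three budget levels. The dollar figures and names are assumptions, and per-repo/team allocation is omitted; it would apply the same checks to per-team totals.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    STOP_TASK = "stop task, post status, request human input"
    PAUSE_STARTS = "pause starts, leave assigned issues queued"
    HALT_AND_ALERT = "stop starting tasks, post global status/alert"


@dataclass(frozen=True)
class Budgets:
    per_task_usd: float = 20.0   # illustrative figures only
    hourly_usd: float = 100.0
    daily_usd: float = 500.0


def task_action(task_spend_usd: float, budgets: Budgets) -> Action:
    """Checked while a task runs: stop once it exceeds its own budget."""
    if task_spend_usd > budgets.per_task_usd:
        return Action.STOP_TASK
    return Action.CONTINUE


def admission_action(hour_spend_usd: float, day_spend_usd: float,
                     budgets: Budgets) -> Action:
    """Checked before starting a queued task; the daily cap takes precedence."""
    if day_spend_usd >= budgets.daily_usd:
        return Action.HALT_AND_ALERT
    if hour_spend_usd > budgets.hourly_usd:
        return Action.PAUSE_STARTS
    return Action.CONTINUE
```

Keeping the per-task check separate from the admission check reflects the split above: one stops a running task, the other only stops new starts.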
One good default: if the agent can’t get to “tests running” quickly, it should stop and ask for help. Environment problems aren’t fixed by more retries.
Pair budgets with permission guardrails: agents can open PRs and propose changes, but protected branches and merges remain human-owned. See Identity, Secrets & Trust Boundaries for why agent identity and scoped credentials matter here.
Minimum observability
You don’t need a complex system to start. You do need to be able to answer: “what is running, what is stuck, and what is costing us.”
Signals to track
Pool utilization
How often you’re saturated; whether the fixed pool is too small or too large.
Queue wait time
Time from assignment to start (especially for high priority issues).
Completion and escalation rates
What fraction completes cleanly vs gets blocked vs loops.
Cost per task
By repo/type. Use this to tune budgets and pool size.
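Most of these signals can be computed from a flat list of finished task records. The sketch below assumes a hypothetical `TaskRecord` shape and outcome labels ("completed", "blocked", "looped"); pool utilization is left out because it needs samples of the running count over time rather than per-task records.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass(frozen=True)
class TaskRecord:
    repo: str
    assigned_at: float   # seconds since epoch
    started_at: float
    outcome: str         # assumed labels: "completed", "blocked", "looped"
    cost_usd: float


def summarize(records: List[TaskRecord]) -> Dict[str, object]:
    """Queue wait, outcome rates, and mean cost per repo for finished tasks."""
    if not records:
        return {}
    total = len(records)
    waits = [r.started_at - r.assigned_at for r in records]
    costs = defaultdict(list)
    for r in records:
        costs[r.repo].append(r.cost_usd)
    return {
        "median_queue_wait_s": median(waits),
        "completed_rate": sum(r.outcome == "completed" for r in records) / total,
        "blocked_rate": sum(r.outcome == "blocked" for r in records) / total,
        "looped_rate": sum(r.outcome == "looped" for r in records) / total,
        "mean_cost_usd_by_repo": {repo: sum(c) / len(c) for repo, c in costs.items()},
    }
```

Even a daily run of something this small over the tracker's history is enough to answer the three questions above before investing in dashboards.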
What goes wrong
Unbounded starts
Every assignment starts immediately. Costs spike, tasks conflict, and humans can’t review the output fast enough. Start with a fixed pool.
Silent agent
The issue is assigned, but no status arrives. Nobody knows if it started, failed, or is stuck. Make status updates mandatory.
Looping on environment failures
The dev environment is broken and the agent keeps retrying installs/tests. It burns budget and blocks the pool. Escalate early with a minimal ask.
Review bottleneck
Agents produce PRs faster than humans can review. Throughput becomes limited by approvals, not compute. That’s normal: adjust pool size and priorities to match human capacity.
Summary
- Default trigger: assign issues to an agent identity. Make agent work visible in the tracker.
- Require status updates at key phases: acknowledged, environment ready, plan, checkpoints, blocked.
- Start with a fixed-size concurrency pool. Add priorities and quotas as adoption grows.
- Design for failure modes: transient retries, environment/infra blocks, and loop detection with escalation.