CI-Native Agents
Core Questions
- How do agents integrate with pipelines?
- When do they run?
- How are flaky behaviors detected?
If agents aren't in CI, they're not in your workflow. They're just expensive toys making suggestions nobody acts on. CI-native means agents are triggered by your pipeline, run in your infrastructure, and produce results that block or advance your workflow.
When agents run in CI
Different triggers for different purposes:
CI Trigger Points
On PR open/update
Agent reviews code, suggests improvements, checks for issues. Runs alongside tests and linting. Results appear as PR comments.
On PR approval
Final check before merge. Agent verifies nothing was missed. Last gate before code hits main.
On merge to main
Post-merge analysis. Check for integration issues. Update documentation. Trigger downstream tasks.
Scheduled (nightly/weekly)
Periodic health checks. Security scans. Dependency updates. Architecture drift detection.
On issue/task creation
Agent picks up new issues and starts working. Fully automated task intake.
GitHub Actions integration
Most common CI platform. Here's how to wire up agents:
PR review agent workflow
# .github/workflows/agent-review.yml
name: Agent Review
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Get diff
        run: |
          git diff origin/main...HEAD > diff.txt
      - name: Run agent review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
        run: |
          agent-review \
            --diff diff.txt \
            --output review.json
      - name: Post review
        uses: actions/github-script@v7
        with:
          script: |
            const review = require('./review.json');
            await github.rest.pulls.createReview({
              owner: context.repo.owner,
              repo: context.repo.repo,
              // read the PR number from the event payload; step-scoped env vars
              // from earlier steps are not visible here
              pull_number: context.payload.pull_request.number,
              // without an explicit event, the review would stay pending and invisible
              event: 'COMMENT',
              body: review.summary,
              comments: review.comments
            });
Scheduled analysis workflow
# .github/workflows/nightly-analysis.yml
name: Nightly Analysis
on:
  schedule:
    - cron: '0 2 * * *' # 2 AM UTC daily
jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run security scan
        run: agent-security-scan --output security.json
      - name: Run architecture check
        run: agent-arch-check --output arch.json
      - name: Run dependency analysis
        run: agent-deps --output deps.json
      - name: Create issues for findings
        run: |
          agent-create-issues \
            --security security.json \
            --arch arch.json \
            --deps deps.json
Blocking vs non-blocking
Agent results can block merges or just inform. Choose based on confidence:
Blocking
- Security vulnerabilities found
- Tests fail
- Lint errors
- Type errors
- Required review not done
High confidence rules. False positives are rare and acceptable.
Non-blocking (advisory)
- Style suggestions
- Performance hints
- Complexity warnings
- Documentation gaps
- Potential improvements
Helpful but subjective. Human decides whether to act.
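One way to wire this split into the PR review workflow above is to have the agent tag each finding and fail the job only when a blocking one is present. This is a minimal sketch: the findings array and blocking flag are assumed shapes for review.json, not something the agent-review command shown earlier is guaranteed to emit.
      - name: Enforce blocking findings
        run: |
          # count findings the agent marked as blocking (assumed field names)
          blocking=$(jq '[.findings[] | select(.blocking == true)] | length' review.json)
          echo "Blocking findings: $blocking"
          if [ "$blocking" -gt 0 ]; then
            exit 1  # a non-zero exit fails the check and blocks the merge
          fi
Advisory findings still reach the PR through the review comments; only the exit code decides whether the required check blocks the merge.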
Handling flaky agents
Like flaky tests, agents can produce inconsistent results. Same input, different output. Detect and handle this:
Flakiness detection
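A blunt but workable check is to run the review twice on the same diff and only trust findings that appear in both runs; keeping the agent's sampling temperature low (or zero) reduces run-to-run variance in the first place. In this sketch the --diff and --output flags reuse the earlier example, and the findings[].id field is an assumed output shape.
      - name: Detect flaky findings
        run: |
          # run the same review twice and compare which findings appear in each
          agent-review --diff diff.txt --output run1.json
          agent-review --diff diff.txt --output run2.json
          jq '[.findings[].id] | sort' run1.json > ids1.txt
          jq '[.findings[].id] | sort' run2.json > ids2.txt
          if ! diff -q ids1.txt ids2.txt; then
            echo "::warning::Agent findings differ between identical runs; treating results as advisory"
          fi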
CI-triggered task creation
CI can do more than run agents: it can create work for them. A sketch of the first flow follows the list.
CI → Agent Task Flows
Test failure → fix task
CI detects test failure. Creates issue with failure details. Agent picks up and attempts fix.
Security scan → remediation task
Scan finds vulnerability. Creates prioritized issue. Agent updates dependencies or patches code.
Coverage drop → test task
Coverage decreased in PR. Creates task to add tests for uncovered code.
Stale dependencies → update task
Weekly scan finds outdated packages. Creates update PR automatically.
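Here is a minimal sketch of the test failure flow, using the gh CLI that GitHub-hosted runners ship with. The agent-fix label is an assumed convention for routing issues to the agent, npm test stands in for whatever your test command is, and the job needs issues: write permission for the default token.
      - name: Run tests
        id: tests
        run: |
          set -o pipefail
          npm test 2>&1 | tee test-output.txt
      - name: File a fix task for the agent
        if: failure()
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          gh issue create \
            --title "CI: test failure on ${{ github.ref_name }}" \
            --body-file test-output.txt \
            --label "agent-fix"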
What goes wrong
CI timeout
Agent takes too long. CI job times out. No result posted. Set appropriate timeouts and have fallback behavior.
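A sketch of that guardrail, assuming a timed-out review should degrade to a warning rather than block the PR:
      - name: Run agent review
        id: agent
        timeout-minutes: 10      # hard cap on the agent step
        continue-on-error: true  # a hung agent should not fail the whole job
        run: agent-review --diff diff.txt --output review.json
      - name: Note missing review
        if: steps.agent.outcome != 'success'
        run: echo "::warning::Agent review did not complete; proceeding without agent feedback"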
Rate limiting
Many PRs trigger many agent runs. API rate limits hit. Runs fail. Batch requests or queue during high volume.
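Within a single repository, GitHub Actions concurrency groups give you a simple form of queueing. This sketch serializes runs per PR so a new push supersedes the run already in flight; it does not batch across PRs or smooth API-level limits.
concurrency:
  group: agent-review-${{ github.event.pull_request.number }}
  cancel-in-progress: true  # a newer push replaces the in-flight agent run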
Secret exposure
Agent output includes secrets from environment. Posted to PR comment. Scrub outputs before posting publicly.
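Actions masks registered secrets in workflow logs, but not in files you post back through the API. A crude sketch is to strip known secret values from the agent's output before the posting step; it assumes the key contains no characters that sed treats specially.
      - name: Scrub agent output
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          # replace the literal key value anywhere the agent echoed it back
          sed -i "s|${ANTHROPIC_API_KEY}|[REDACTED]|g" review.json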
Noise overload
Agent comments on every PR with low-value suggestions. Developers start ignoring. Only comment when meaningful.
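One guard is to skip the posting step entirely when the agent found nothing substantive; the findings array below is the same assumed review.json shape as in the earlier sketches.
      - name: Count findings
        id: findings
        run: |
          count=$(jq '.findings | length' review.json)
          echo "count=$count" >> "$GITHUB_OUTPUT"
      - name: Post review
        # only runs when there is something substantive to say
        if: steps.findings.outputs.count != '0'
        uses: actions/github-script@v7
        with:
          script: |
            const review = require('./review.json');
            // ...post the review as in the PR review workflow above...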
Summary
- Trigger agents at PR open, approval, merge, and on schedule.
- Make high-confidence checks blocking; keep subjective ones advisory.
- Detect and handle flaky agent results. Use low temperature for consistency.
- Use CI findings to create tasks for agents automatically.