CI-Native Agents

Core Questions

  • How do agents integrate with pipelines?
  • When do they run?
  • How are flaky behaviors detected?

If agents aren't in CI, they're not in your workflow. They're just expensive toys making suggestions nobody acts on. CI-native means agents are triggered by your pipeline, run in your infrastructure, and produce results that block or advance your workflow.

When agents run in CI

Different triggers for different purposes:

CI Trigger Points

On PR open/update

Agent reviews code, suggests improvements, checks for issues. Runs alongside tests and linting. Results appear as PR comments.

On PR approval

Final check before merge. Agent verifies nothing was missed. Last gate before code hits main.

On merge to main

Post-merge analysis. Check for integration issues. Update documentation. Trigger downstream tasks.

Scheduled (nightly/weekly)

Periodic health checks. Security scans. Dependency updates. Architecture drift detection.

On issue/task creation

Agent picks up new issues and starts working. Fully automated task intake.
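
A single workflow can subscribe to several of these triggers at once. A minimal sketch, where agent-task stands in for whatever agent CLI you run:

# .github/workflows/agent-triggers.yml
name: Agent Triggers
on:
  pull_request:
    types: [opened, synchronize]
  pull_request_review:
    types: [submitted]
  push:
    branches: [main]
  schedule:
    - cron: '0 3 * * 1'  # weekly, Monday 3 AM
  issues:
    types: [opened]

jobs:
  dispatch:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # github.event_name tells the agent which trigger fired
      - run: agent-task --trigger "${{ github.event_name }}"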

GitHub Actions integration

GitHub Actions is the most common CI platform. Here's how to wire up agents:

PR review agent workflow

# .github/workflows/agent-review.yml
name: Agent Review
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
          
      - name: Get diff
        run: |
          git diff origin/main...HEAD > diff.txt
          
      - name: Run agent review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
        run: |
          agent-review \
            --diff diff.txt \
            --output review.json
            
      - name: Post review
        uses: actions/github-script@v7
        with:
          script: |
            // Read the agent's output from the workspace
            const fs = require('fs');
            const review = JSON.parse(fs.readFileSync('review.json', 'utf8'));
            // Submit as a comment review; without event, it stays pending
            await github.rest.pulls.createReview({
              owner: context.repo.owner,
              repo: context.repo.repo,
              pull_number: context.payload.pull_request.number,
              event: 'COMMENT',
              body: review.summary,
              comments: review.comments
            });

Scheduled analysis workflow

# .github/workflows/nightly-analysis.yml
name: Nightly Analysis
on:
  schedule:
    - cron: '0 2 * * *'  # 2 AM daily

jobs:
  analyze:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      issues: write  # needed so the final step can open issues
    steps:
      - uses: actions/checkout@v4
      
      - name: Run security scan
        run: agent-security-scan --output security.json
        
      - name: Run architecture check
        run: agent-arch-check --output arch.json
        
      - name: Run dependency analysis
        run: agent-deps --output deps.json
        
      - name: Create issues for findings
        run: |
          agent-create-issues \
            --security security.json \
            --arch arch.json \
            --deps deps.json

Blocking vs non-blocking

Agent results can block merges or just inform. Choose based on confidence:

Blocking

  • Security vulnerabilities found
  • Tests fail
  • Lint errors
  • Type errors
  • Required review not done

High-confidence rules. False positives are rare and acceptable.

Non-blocking (advisory)

  • Style suggestions
  • Performance hints
  • Complexity warnings
  • Documentation gaps
  • Potential improvements

Helpful but subjective. Human decides whether to act.
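
In GitHub Actions the split comes down to exit codes: a blocking check fails the job (and, with branch protection, the required status check), while an advisory one sets continue-on-error so findings are reported without gating the merge. A sketch reusing the hypothetical agent-* commands from above; the --fail-on-findings flag is an assumption:

      # Blocking: a nonzero exit fails the job and the required check
      - name: Security scan (blocking)
        run: agent-security-scan --fail-on-findings

      # Advisory: the job stays green even if the agent reports issues
      - name: Style suggestions (advisory)
        continue-on-error: true
        run: agent-review --diff diff.txt --output advisory.json

Remember that a blocking check only actually blocks if it's marked required in branch protection settings.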

Handling flaky agents

Like flaky tests, agents can produce inconsistent results. Same input, different output. Detect and handle this:

Flakiness detection

Run twice: For critical checks, run the agent twice. If results differ significantly, flag for human review.
Track consistency: Log whether reruns produce same results. High variance = flaky agent.
Temperature 0: For deterministic checks, use temperature 0. Reduces variance.
Seed consistency: If model supports it, use consistent seeds for reproducibility.
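
Here's the run-twice idea as a CI step. It's a sketch: the --temperature flag is assumed, and jq -S just normalizes key order so only real content differences count:

      - name: Consistency check (run twice, compare)
        run: |
          agent-review --diff diff.txt --temperature 0 --output run1.json
          agent-review --diff diff.txt --temperature 0 --output run2.json
          # Sort keys so formatting differences don't trigger a mismatch
          if ! diff <(jq -S . run1.json) <(jq -S . run2.json) > /dev/null; then
            echo "::warning::Agent output differed between identical runs; flag for human review"
          fi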

CI-triggered task creation

CI doesn't just run agents; it can also create work for them:

CI → Agent Task Flows

Test failure → fix task

CI detects test failure. Creates issue with failure details. Agent picks up and attempts fix.

Security scan → remediation task

Scan finds vulnerability. Creates prioritized issue. Agent updates dependencies or patches code.

Coverage drop → test task

Coverage decreased in PR. Creates task to add tests for uncovered code.

Stale dependencies → update task

Weekly scan finds outdated packages. Creates update PR automatically.
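
A sketch of the first flow, using the gh CLI preinstalled on GitHub-hosted runners. The CI workflow name and the agent-task label are assumptions; adjust to your setup:

# .github/workflows/test-failure-task.yml
name: Test Failure Task
on:
  workflow_run:
    workflows: [CI]
    types: [completed]

jobs:
  create-task:
    if: github.event.workflow_run.conclusion == 'failure'
    runs-on: ubuntu-latest
    permissions:
      issues: write
    steps:
      - name: File an issue for the agent
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          gh issue create \
            --repo "${{ github.repository }}" \
            --title "CI failure on ${{ github.event.workflow_run.head_branch }}" \
            --body "Failed run: ${{ github.event.workflow_run.html_url }}" \
            --label agent-task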

What goes wrong

CI timeout

Agent takes too long. CI job times out. No result posted. Set appropriate timeouts and have fallback behavior.
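
GitHub Actions supports timeouts at both the job and step level; the default job timeout is 360 minutes, far longer than any agent check should run. A sketch:

  review:
    runs-on: ubuntu-latest
    timeout-minutes: 10      # kill the whole job if the agent hangs
    steps:
      - name: Run agent review
        timeout-minutes: 5   # tighter bound on the agent step itself
        run: agent-review --diff diff.txt --output review.json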

Rate limiting

Many PRs trigger many agent runs. API rate limits hit. Runs fail. Batch requests or queue during high volume.
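
A concurrency group caps this fan-out: rapid pushes to the same PR cancel the superseded run instead of stacking API calls. Standard GitHub Actions syntax, scoped per PR:

concurrency:
  # One agent run per PR; a new push cancels the in-flight one
  group: agent-review-${{ github.event.pull_request.number }}
  cancel-in-progress: true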

Secret exposure

Agent output includes secrets from environment. Posted to PR comment. Scrub outputs before posting publicly.
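
Actions masks registered secrets in its own logs, but not in comments you post through the API. A naive scrub pass before posting, as a sketch (the regex is illustrative and incomplete, not a substitute for a real secret scanner):

      - name: Scrub agent output before posting
        run: |
          # Redact values that look like keys or tokens (illustrative, not exhaustive)
          sed -E -i 's/(api[_-]?key|token|secret)(["[:space:]:=]+)[A-Za-z0-9_-]{8,}/\1\2REDACTED/gI' review.json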

Noise overload

Agent comments on every PR with low-value suggestions. Developers start ignoring. Only comment when meaningful.
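
One way to enforce this: have the agent attach a confidence score to each comment and drop the low-confidence ones before posting. The confidence field here is an assumed output schema, not a standard:

      - name: Filter low-value comments
        run: |
          # Keep only comments the agent is confident about (assumed schema)
          jq '.comments |= map(select(.confidence >= 0.8))' review.json > filtered.json
          mv filtered.json review.json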

Summary

  • Trigger agents at PR open, approval, merge, and on schedule.
  • Make high-confidence checks blocking; keep subjective ones advisory.
  • Detect and handle flaky agent results. Use low temperature for consistency.
  • Use CI findings to create tasks for agents automatically.
