Building with the Claude Agent SDK
Core Questions
- How do you use the Claude Agent SDK to wire up agentic workflows?
- What are the key patterns, gotchas, and practical examples?
- How does the SDK fit into the rest of your agentic infrastructure?
The Claude Agent SDK provides the primitives for building agentic systems: tool use, multi-turn conversations, structured outputs, and computer use. But the SDK is just the wire. The real work is designing what's on either end — the tools you expose, the constraints you enforce, and the feedback loops you build.
SDK fundamentals
The SDK handles the conversation with Claude. You provide context, tools, and constraints. Claude reasons, calls tools, and produces outputs. Your code orchestrates the loop.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Basic agentic loop
async function runAgent(task: string) {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: task }
  ];
  while (true) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-20250514",
      max_tokens: 4096,
      system: "You are a coding assistant. Use tools to complete tasks.",
      tools: getTools(),
      messages,
    });
    // Add assistant response to history
    messages.push({ role: "assistant", content: response.content });
    // Check if we're done
    if (response.stop_reason === "end_turn") {
      return extractResult(response);
    }
    // Process tool calls
    if (response.stop_reason === "tool_use") {
      const toolResults = await processToolCalls(response.content);
      messages.push({ role: "user", content: toolResults });
      continue;
    }
    // max_tokens or another unexpected stop reason: fail fast instead of looping
    throw new Error(`Unexpected stop reason: ${response.stop_reason}`);
  }
}
Core Loop Components
Messages array
Accumulates conversation history. Each turn adds assistant response and tool results. Context grows until task completes.
Tool definitions
JSON schema describing available tools. Claude decides which to call based on the task. You implement the actual tool logic.
Stop conditions
end_turn means Claude is done. tool_use means execute the requested tools and continue the loop. max_tokens means the response was truncated before completion.
Tool results
Execute tools, return results as user message. Claude sees what happened and decides next step.
Designing effective tools
Tools are how Claude interacts with the world. Well-designed tools are focused, predictable, and give useful feedback. Poorly designed tools lead to confusion and errors.
// Good tool design
const tools: Anthropic.Tool[] = [
  {
    name: "read_file",
    description: "Read the contents of a file. Returns the file content as a string.",
    input_schema: {
      type: "object",
      properties: {
        path: {
          type: "string",
          description: "Path to the file, relative to project root"
        }
      },
      required: ["path"]
    }
  },
  {
    name: "write_file",
    description: "Write content to a file. Creates the file if it doesn't exist, overwrites if it does.",
    input_schema: {
      type: "object",
      properties: {
        path: {
          type: "string",
          description: "Path to the file, relative to project root"
        },
        content: {
          type: "string",
          description: "Content to write to the file"
        }
      },
      required: ["path", "content"]
    }
  },
  {
    name: "run_command",
    description: "Execute a shell command. Returns stdout, stderr, and exit code.",
    input_schema: {
      type: "object",
      properties: {
        command: {
          type: "string",
          description: "The command to execute"
        },
        cwd: {
          type: "string",
          description: "Working directory (optional, defaults to project root)"
        }
      },
      required: ["command"]
    }
  }
];
Tool Design Principles
Single responsibility
One tool, one action. read_file reads. write_file writes. Don't combine into read_or_write_file.
Clear descriptions
Claude uses descriptions to decide when to call tools. Be specific about what the tool does and what it returns.
Predictable outputs
Return structured data when possible. JSON over free text. Include success/failure status. Don't surprise the model.
Useful errors
When tools fail, explain why. "File not found: path/to/file.ts" is actionable. "Error" is not.
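Putting the last two principles together: here is a minimal sketch of what a `readFileSafe` helper (referenced in the tool dispatcher below, but hypothetical in its details) might look like. It confines reads to the project root and fails with an actionable message; `PROJECT_ROOT` and the exact error wording are assumptions.

```typescript
import { readFile } from "node:fs/promises";
import { resolve, relative, isAbsolute } from "node:path";

// Assumption: the agent process runs from the project root.
const PROJECT_ROOT = process.cwd();

// Resolve a path and reject anything that escapes the project root.
function resolveInsideRoot(path: string): string {
  const full = resolve(PROJECT_ROOT, path);
  const rel = relative(PROJECT_ROOT, full);
  if (rel.startsWith("..") || isAbsolute(rel)) {
    throw new Error(`Path escapes project root: ${path}`);
  }
  return full;
}

async function readFileSafe(path: string): Promise<{ content: string }> {
  const full = resolveInsideRoot(path);
  try {
    return { content: await readFile(full, "utf8") };
  } catch {
    // Actionable error the model can recover from, not a bare "Error"
    throw new Error(`File not found or unreadable: ${path}`);
  }
}
```

The structured `{ content }` return and the specific error text both feed the model something it can act on in the next turn.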
Processing tool calls
Claude returns tool calls as structured content. Your code executes them and returns results in the expected format.
async function processToolCalls(
  content: Anthropic.ContentBlock[]
): Promise<Anthropic.ToolResultBlockParam[]> {
  const results: Anthropic.ToolResultBlockParam[] = [];
  for (const block of content) {
    if (block.type !== "tool_use") continue;
    const { id, name, input } = block;
    try {
      const output = await executeToolSafe(name, input);
      results.push({
        type: "tool_result",
        tool_use_id: id,
        content: JSON.stringify(output),
      });
    } catch (error) {
      results.push({
        type: "tool_result",
        tool_use_id: id,
        content: `Error: ${error instanceof Error ? error.message : String(error)}`,
        is_error: true,
      });
    }
  }
  return results;
}

async function executeToolSafe(name: string, input: unknown) {
  // Validate and sanitize inputs before execution.
  // Tool inputs arrive untyped from the API — cast only after validation.
  const args = input as Record<string, string>;
  switch (name) {
    case "read_file":
      return await readFileSafe(args.path);
    case "write_file":
      return await writeFileSafe(args.path, args.content);
    case "run_command":
      return await runCommandSafe(args.command, args.cwd);
    default:
      throw new Error(`Unknown tool: ${name}`);
  }
}
System prompts for agents
The system prompt defines agent behavior. It sets the role, constraints, and working style. A good system prompt prevents common mistakes before they happen.
const systemPrompt = `You are a senior software engineer working on a TypeScript codebase.
## Your Role
- Implement features and fix bugs based on task descriptions
- Write clean, tested, production-ready code
- Follow existing patterns in the codebase
## Working Style
1. Read relevant files first to understand context
2. Plan your approach before writing code
3. Make incremental changes, testing as you go
4. Commit logical units of work
## Constraints
- Do not modify files outside the task scope
- Do not add new dependencies without explicit approval
- Do not delete tests unless the tested code is removed
- Ask for clarification if requirements are ambiguous
## Code Standards
- Use TypeScript strict mode
- Follow existing naming conventions
- Write tests for new functionality
- Keep functions under 50 lines
## When Stuck
If you cannot complete a task:
1. Explain what you tried
2. Explain what blocked you
3. Suggest alternative approaches
`;
Principle
Constraints in the system prompt are suggestions, not enforcement
Claude tries to follow system prompt constraints, but it's not guaranteed. Critical constraints need enforcement in code — input validation, tool restrictions, output filtering.
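As a sketch of what enforcement in code can look like: a hypothetical policy check that rejects writes to protected paths before the tool runs, regardless of what the system prompt says. The pattern list and error wording are illustrative assumptions, not part of the SDK.

```typescript
// Hypothetical enforcement layer: paths the agent may never write to.
// Checked in the write_file tool implementation, not just in the prompt.
const PROTECTED_PATTERNS: RegExp[] = [
  /^src\/auth\//,   // auth code is off-limits
  /^\.env/,         // secrets
  /^package\.json$/ // no dependency changes
];

function assertWritable(path: string): void {
  for (const pattern of PROTECTED_PATTERNS) {
    if (pattern.test(path)) {
      throw new Error(`Write rejected by policy: ${path} is protected`);
    }
  }
}
```

Calling `assertWritable` at the top of the `write_file` implementation turns a soft prompt constraint into a hard guarantee, and the thrown error doubles as feedback the agent can route around.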
Error handling and retries
Agents fail. API rate limits, tool errors, model mistakes. Robust error handling is essential for production agent systems.
async function runAgentWithRetry(
  task: string,
  maxRetries = 3,
  maxTurns = 20
) {
  let lastError: Error | null = null;
  for (let retry = 0; retry < maxRetries; retry++) {
    try {
      return await runAgentLoop(task, maxTurns);
    } catch (error) {
      lastError = error instanceof Error ? error : new Error(String(error));
      if (error instanceof Anthropic.RateLimitError) {
        // Back off exponentially and retry
        await sleep(Math.pow(2, retry) * 1000);
        continue;
      }
      if (error instanceof Anthropic.APIError && (error.status ?? 0) >= 500) {
        // Server error, retry
        await sleep(1000);
        continue;
      }
      // Non-retryable error
      throw error;
    }
  }
  throw new Error(`Failed after ${maxRetries} retries: ${lastError?.message}`);
}

async function runAgentLoop(task: string, maxTurns: number) {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: task }
  ];
  for (let turn = 0; turn < maxTurns; turn++) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-20250514",
      max_tokens: 4096,
      system: systemPrompt,
      tools: getTools(),
      messages,
    });
    messages.push({ role: "assistant", content: response.content });
    if (response.stop_reason === "end_turn") {
      return extractResult(response);
    }
    if (response.stop_reason === "tool_use") {
      const results = await processToolCalls(response.content);
      messages.push({ role: "user", content: results });
    }
  }
  throw new Error("Max turns exceeded");
}
Structured outputs
When you need specific output formats, use structured generation. Define a schema and Claude will conform to it.
// Using tool_choice to force structured output
const response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Analyze this code for bugs..." }],
  tools: [{
    name: "report_analysis",
    description: "Report the code analysis results",
    input_schema: {
      type: "object",
      properties: {
        bugs: {
          type: "array",
          items: {
            type: "object",
            properties: {
              file: { type: "string" },
              line: { type: "number" },
              severity: { enum: ["low", "medium", "high", "critical"] },
              description: { type: "string" },
              suggestion: { type: "string" }
            },
            required: ["file", "line", "severity", "description"]
          }
        },
        summary: { type: "string" },
        overall_quality: { enum: ["good", "needs_work", "critical_issues"] }
      },
      required: ["bugs", "summary", "overall_quality"]
    }
  }],
  tool_choice: { type: "tool", name: "report_analysis" }
});

// Extract structured result
const toolUse = response.content.find(b => b.type === "tool_use");
if (!toolUse || toolUse.type !== "tool_use") {
  throw new Error("Expected a tool_use block in the response");
}
const analysis = toolUse.input; // Conforms to the schema above
Context management
As conversations grow, you'll hit context limits. Manage context proactively — summarize, truncate, or reset when needed.
Context Management Strategies
Token counting
Track token usage. When approaching limits, take action before hitting them. Use tiktoken or similar for accurate counts.
Sliding window
Keep recent messages, drop old ones. Maintains recency at cost of long-term context. Good for long-running tasks.
Summarization
Periodically summarize conversation history into a compact form. Replace old messages with summary. Preserves key decisions.
Checkpoint and reset
Save state externally, start fresh conversation with state summary. Useful for very long workflows.
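The sliding-window strategy can be sketched in a few lines. This is an assumed helper, not an SDK feature: it keeps the first message (the original task) plus the most recent N messages. Note that in a real tool-using conversation you must drop assistant `tool_use` blocks together with their matching `tool_result` messages, or the API will reject the history; this sketch ignores that pairing for brevity.

```typescript
// Minimal sliding-window sketch over a simplified message shape.
type Message = { role: "user" | "assistant"; content: unknown };

function slideWindow(messages: Message[], keepRecent: number): Message[] {
  // Nothing to drop yet
  if (messages.length <= keepRecent + 1) return messages;
  // Keep the original task plus the most recent keepRecent messages
  return [messages[0], ...messages.slice(-keepRecent)];
}
```

Run this before each `messages.create` call once token usage (reported in `response.usage`) approaches your budget.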
Integration with CI/CD
Agents that run in CI need to fit the pipeline model: triggered by events, run to completion, produce artifacts, exit with status codes.
# .github/workflows/agent-task.yml
name: Agent Task

on:
  issues:
    types: [labeled]

jobs:
  run-agent:
    if: github.event.label.name == 'agent-task'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm ci
      - name: Run agent
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          ISSUE_BODY: ${{ github.event.issue.body }}
        run: npm run agent:execute
      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: agent-output
          path: artifacts/
      - name: Create PR if changes
        if: success()
        run: |
          git config user.name "Agent Bot"
          git config user.email "[email protected]"
          git checkout -b agent/issue-${{ github.event.issue.number }}
          git add -A
          git commit -m "Agent: ${{ github.event.issue.title }}"
          git push origin agent/issue-${{ github.event.issue.number }}
          gh pr create --fill
What goes wrong
Infinite loops
Agent keeps calling tools without making progress. Token costs explode. Task never completes.
Fix: Set max turns. Detect repetitive tool calls. Add convergence detection — if output isn't changing, stop.
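Repetitive-call detection can be a small stateful check inside the agent loop. This is a hypothetical sketch: `makeLoopDetector` and the repeat threshold are illustrative names and defaults, not SDK features.

```typescript
// Hypothetical convergence check: signal a stop when the same tool call
// (name + serialized arguments) repeats several times in a row.
function makeLoopDetector(maxRepeats = 3) {
  let lastCall = "";
  let repeats = 0;
  return (name: string, input: unknown): boolean => {
    const key = name + JSON.stringify(input);
    repeats = key === lastCall ? repeats + 1 : 1;
    lastCall = key;
    return repeats >= maxRepeats; // true => abort the agent loop
  };
}
```

Call the returned function for each `tool_use` block; when it returns true, break out of the loop and surface a "no progress" error instead of burning tokens.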
Tool misuse
Agent calls wrong tool or passes invalid arguments. Tool fails or does unexpected things.
Fix: Better tool descriptions. Input validation in tool implementation. Return helpful errors so agent can self-correct.
Context window exhaustion
Long task fills context. Model starts forgetting early instructions or files it read. Output quality degrades.
Fix: Monitor token usage. Implement context management. For long tasks, checkpoint and restart with summarized state.
Ignoring constraints
System prompt says don't modify auth code. Agent modifies auth code anyway. Constraint was suggestion, not enforcement.
Fix: Enforce constraints in tool code, not just prompts. Validate tool inputs against allowlists. Reject operations that violate rules.
Summary
- The SDK provides the agentic loop — messages, tools, stop conditions. You provide the logic.
- Design tools with single responsibility, clear descriptions, and useful error messages
- System prompts guide behavior but don't enforce it — enforce critical constraints in code
- Handle errors gracefully: retries for transient failures, useful messages for permanent ones
- Manage context proactively — summarize, truncate, or checkpoint before hitting limits