How to Create an AI Agent with Claude

Foundations

What Is an AI Agent?

An AI agent is a large language model (LLM) placed inside a loop — one that can use tools, read from and write to memory, and make autonomous decisions about what to do next to accomplish a goal.

If you’ve used Claude Code, you’ve seen what an AI agent can actually do: read files, run commands, edit code, figure out the steps to accomplish a task — it doesn’t just help you write code, it takes ownership of problems and works through them the way a thoughtful engineer would.

— Nader Dabit, Claude Agent SDK Guide

The key insight that separates an AI agent from a simple chatbot is autonomy over multi-step tasks. A chatbot responds to a message; an agent accepts a goal and executes a sequence of actions — using tools, inspecting results, and deciding what to do next — until that goal is reached or it needs human input.

This working definition (an LLM in a loop with tools, memory, and the ability to decide its next step) is small enough to build in an afternoon and powerful enough to automate real work.

The Three Core Properties of Any Agent

🔁

Looping Execution

The agent continues calling the model and executing tools iteratively until a task is complete — not just a single prompt-response exchange.

🔧

Tool Access

An LLM without tools can only produce text. With tools, it can read files, call APIs, query databases, run code, and trigger real-world effects.

🧠

Memory & State

Agents maintain context across steps, whether in the conversation history, external storage, or a scratchpad, so they remember what they’ve done.

Agents vs. Chatbots: A Clear Comparison

Aspect	Chatbot	AI Agent
Interaction model	Single-turn Q&A	Multi-step autonomous task execution
Tools	None (text only)	Files, APIs, code execution, web search
State	Stateless or short context	Persistent memory and context management
Goal handling	Answers a question	Decomposes and executes a complex goal
Human involvement	Per turn	Only when needed (approval gates, errors)
Duration	Seconds	Minutes to hours on complex tasks

The Agent Loop (Conceptually)

Fig 1.1 — The Core Agent Execution Loop

User Goal (natural language task) → LLM Reason (decide next action) → Execute Tool (call API / run code) → Observe Result (parse tool output) → Done? (return or loop back)
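
The loop in Fig 1.1 can be sketched in a few lines of plain Python. This is a toy illustration only: `fake_model` is a hypothetical scripted stand-in for the LLM, and `TOOLS`, `run_loop`, and the message shapes are invented for the example rather than taken from any SDK.

```python
from typing import Callable

# Hypothetical stand-in for the LLM: a scripted policy that picks the next action.
def fake_model(history: list[dict]) -> dict:
    """Ask for a tool until a tool result is in the history, then answer."""
    for entry in reversed(history):
        if entry["role"] == "tool":
            return {"type": "final", "text": f"The answer is {entry['content']}."}
    return {"type": "tool", "name": "add", "input": {"a": 2, "b": 3}}

# Tool registry: name -> callable (illustrative only).
TOOLS: dict[str, Callable[..., object]] = {"add": lambda a, b: a + b}

def run_loop(goal: str, max_steps: int = 5) -> str:
    history = [{"role": "user", "content": goal}]   # memory: everything seen so far
    for _ in range(max_steps):                      # looping execution
        action = fake_model(history)                # reason: decide the next action
        if action["type"] == "final":               # done? return the answer
            return action["text"]
        result = TOOLS[action["name"]](**action["input"])     # execute the tool
        history.append({"role": "tool", "content": result})   # observe the result
    return "Stopped: step limit reached."

print(run_loop("What is 2 + 3?"))
```

Swapping `fake_model` for a real model call and `TOOLS` for real functions gives the same structure the later sections build out.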

Platform Choice

Why Build with Claude?

Claude by Anthropic is the leading model for agentic tasks in 2026. Claude Sonnet 4.6 is the world’s best coding model, and the Claude Opus family delivers frontier-level reasoning — both are purpose-built for long-horizon, multi-step work.

61.4%

OSWorld Score

30+

Hours Sustained Focus

SWE-bench Verified

ASL-3

Safety Safeguards

Claude Sonnet 4.6 has been observed maintaining focus for over 30 hours on complex, multi-step tasks, making it the current best choice for production agents that need sustained execution. On the OSWorld leaderboard — the benchmark for computer-use agents — it achieved 61.4%, up from 42.2% just four months earlier.

Claude’s Agent-Specific Advantages

🛡️

Safety & Alignment

The most aligned frontier model to date. Reduced sycophancy, deception, and power-seeking. Strengthened prompt injection defenses.

⚡

Prompt Caching

Automatic caching of repeated context reduces cost and latency in long agent runs dramatically — critical for production viability.

🔌

MCP Ecosystem

Claude is the primary driver of the Model Context Protocol standard, which has official SDKs for TypeScript, Python, and other languages, plus a rapidly growing tool ecosystem.

📦

Multiple SDKs

Python, TypeScript, Java, Go, C#, Ruby, PHP — official SDKs for every major language with shared patterns and concepts.

☁️

Managed Execution

Anthropic’s Managed Agents platform handles environment provisioning, sandboxing, and session lifecycle — no infrastructure setup needed.

🌐

Multi-Platform

Route through Amazon Bedrock, Google Vertex AI, or Anthropic directly. Flexible deployment across cloud providers.

Available Claude Models for Agents (2026)

claude-opus-4-8 — Frontier reasoning, best for complex multi-step analysis. Highest capability.
claude-sonnet-4-6 — Best for most agents. #1 on SWE-bench. Best speed/capability ratio.
claude-haiku-4-5 — Fastest, lowest cost. Best for high-volume, simpler agentic subtasks.

Vocabulary

Core Concepts & Terminology

Before writing a single line of code, it’s essential to understand the vocabulary of AI agents. These terms map directly to API concepts and architectural decisions you’ll make throughout your build.

Term	Definition	In Practice
Agent	A configuration defining a model, system prompt, and available tools	Created once, reused across many sessions
Session	A running instance of an agent executing a specific task	Like a job run — has a start, work, and end
Tool / Function	An external capability the LLM can invoke by name with parameters	Web search, file read, API call, calculator
Tool Use	The mechanism by which Claude requests to call a tool	Claude returns a `tool_use` stop reason
ReAct Loop	Reasoning + Acting: the iterative cycle of thought → action → observation	The foundational pattern for all agents
System Prompt	Instructions defining the agent’s role, constraints, and tool guidance	The agent’s job description — shapes all behaviour
Context Window	The total tokens (input + output) the model can process at once	Determines how much history an agent can hold
Compaction	Automatic summarisation of old context when nearing the window limit	Enables long-running agents without overflow
MCP	Model Context Protocol — standard for exposing tools to AI models	Re-usable tool servers (Slack, Postgres, GitHub)
Environment	The sandboxed compute container where an agent session runs	Cloud sandbox with bash, Python, Node.js, internet
Subagent	An agent spawned by another agent to handle a subtask in parallel	Parallelise research, code review, data extraction
Permission Mode	Controls what actions require human approval vs. auto-execute	default, acceptEdits, plan, bypassPermissions

The Agent Architecture Stack

Fig 3.1 — Claude Agent Architecture Layers

Managed Agents Platform (Anthropic Cloud)

Agent SDK + Tool Loop

Claude API + Tool Use

Claude LLM

LLM → API → SDK → Managed Platform. Each layer adds more agent infrastructure.

Getting Started

Prerequisites & Environment Setup

Before writing any agent code, you need an Anthropic account, an API key, the right language runtime, and the SDK installed. This section walks through the full environment setup.

What You Need

Prerequisites Checklist

Anthropic Console Account — Sign up at console.anthropic.com
API Key — Create at console.anthropic.com/settings/keys. Store securely.
Python 3.10+ OR Node.js 18+ — Pick your preferred language
NPM — Required to install Claude Code CLI (even for Python users)
Terminal access — macOS Terminal, Linux shell, or Windows Terminal with WSL

Step 1: Install the Claude Code CLI

The Claude Code CLI is the runtime that powers the Agent SDK. It must be installed regardless of whether you use Python or TypeScript.

Shell — macOS / Linux

# Option A: NPM (cross-platform)
npm install -g @anthropic-ai/claude-code

# Option B: curl installer (Linux / macOS)
curl -fsSL https://claude.ai/install.sh | bash

# Option C: Homebrew (macOS)
brew install anthropic/tap/claude-code

# Verify installation
claude --version
claude doctor

PowerShell — Windows

# Download and run the installer
irm https://claude.ai/install.ps1 | iex

# Then add to PATH: C:\Users\<user>\.local\bin
# Restart PowerShell, then verify:
claude --version

Step 2: Set Your API Key

Shell

# Set for current session
export ANTHROPIC_API_KEY="sk-ant-your-key-here"

# Persist in shell profile (bash/zsh)
echo 'export ANTHROPIC_API_KEY="sk-ant-your-key-here"' >> ~/.zshrc

# OR use a .env file (recommended for projects); keep it out of version control
echo "ANTHROPIC_API_KEY=sk-ant-your-key-here" > .env
echo ".env" >> .gitignore
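
Whichever option you choose, it can help to fail fast when the key is missing rather than hit an authentication error mid-run. A minimal sketch, assuming the key is already in the process environment; `load_api_key` is a hypothetical helper, not part of any SDK.

```python
import os

def load_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error if absent."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it or add it to your .env file"
        )
    return key
```

If you use a .env file, call `load_dotenv()` from python-dotenv before `load_api_key()` so the variable is populated first.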

Step 3: Install the SDK

Python

# Install the Anthropic Python SDK
pip install anthropic python-dotenv

# Install the Claude Agent SDK (for agent loop features)
pip install claude-code-sdk

TypeScript / Node.js

# Create a new project
mkdir my-claude-agent && cd my-claude-agent
npm init -y

# Install SDKs and TypeScript tooling
npm install @anthropic-ai/sdk @anthropic-ai/claude-agent-sdk
npm install -D typescript @types/node tsx

Quick Sanity Check

Before building anything complex, run a one-liner to confirm your credentials work. In Python: python -c "import anthropic; c=anthropic.Anthropic(); r=c.messages.create(model='claude-sonnet-4-6', max_tokens=20, messages=[{'role':'user','content':'Say hello'}]); print(r.content[0].text)"

API Fundamentals

Claude API Basics

All Claude agents ultimately communicate through the Messages API. Understanding this foundation — models, roles, tokens, stop reasons — is essential before adding the complexity of tool use.

Your First API Call

Python — Basic API Call

import os
import anthropic
from dotenv import load_dotenv

load_dotenv()
client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

def ask_claude(prompt: str) -> str:
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[
            {"role": "user", "content": prompt}
        ]
    )
    return message.content[0].text

# Quick test
print(ask_claude("What is 2 + 2? Answer in one word."))

Understanding the Message Structure

The Messages API uses a simple role-based conversation format. Every request is a list of messages with role (either "user" or "assistant") and content.

Python — Multi-Turn Conversation

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system="You are a helpful coding assistant.",  # System prompt
    messages=[
        {"role": "user", "content": "Write a Python function to reverse a string"},
        {"role": "assistant", "content": "def reverse_string(s): return s[::-1]"},
        {"role": "user", "content": "Add input validation"}
    ]
)

Key API Parameters

Parameter	Type	Description
model	string	Which Claude model to use. e.g. `claude-sonnet-4-6`
max_tokens	integer	Max output tokens. Required. Set to 4096+ for agents.
messages	array	Conversation history. Alternating user/assistant turns.
system	string	System prompt — the agent’s job description and constraints.
tools	array	List of tool schemas the model can invoke.
temperature	float 0–1	Randomness. Lower = more deterministic. Default is 1.
stop_sequences	array	Strings that stop generation when encountered.

Stop Reasons — Critical for Agents

The stop_reason field in the response tells you why Claude stopped generating. Two values drive the agent loop, end_turn and tool_use; the other two are conditions your code should still handle:

Stop Reasons

end_turn — Claude finished its response naturally. Extract the text and present it as the final answer.
tool_use — Claude wants to call a tool. Extract the tool call details, execute the tool, and send the result back in the next message.
max_tokens — Hit the token limit. Increase max_tokens or use compaction.
stop_sequence — A custom stop sequence was hit.
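
One way to keep these cases explicit is a small dispatch function. This is a sketch: the returned action labels are hypothetical names chosen for the example, and `SimpleNamespace` stands in for a real response object so the snippet runs without an API call.

```python
from types import SimpleNamespace

def next_step(response) -> str:
    """Map a Messages API stop_reason to what the agent loop should do next."""
    reason = response.stop_reason
    if reason == "end_turn":
        return "return_answer"       # final text is ready
    if reason == "tool_use":
        return "run_tools"           # execute tools, send results, loop again
    if reason == "max_tokens":
        return "handle_truncation"   # output was cut off at the token limit
    if reason == "stop_sequence":
        return "return_answer"       # a custom stop string ended generation
    return "raise_unexpected"        # anything else deserves a loud failure

# Stand-in for an API response; only the field used here is modelled.
demo = SimpleNamespace(stop_reason="tool_use")
print(next_step(demo))
```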

Core Capability

Tool Use & Function Calling

Tools are the agent’s hands. Without them, Claude can only produce text. With tools, it can query databases, call APIs, execute code, search the web, and create files. Defining tools correctly is the most important skill in agent development.

How Tool Use Works

Tools are defined as JSON Schema objects. You pass them in the tools parameter of the API request. Claude reads these schemas at inference time and decides whether to call them. The flow is always:

You define tools with a name, description, and input schema in your API request
Claude decides to call a tool and returns stop_reason: "tool_use" plus a tool_use block with the tool name and inputs
You execute the tool in your own code and get a result
You send the result back as a tool_result block inside a user message
Claude reads the result and either calls another tool or returns a final answer
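
The shape of one round trip in that flow can be sketched with plain dictionaries. The block id and values below are made up for illustration, and `tool_result_turn` is a hypothetical helper; the point is that every tool_use block gets a tool_result whose tool_use_id matches.

```python
import json

def tool_result_turn(assistant_content: list[dict], run_tool) -> dict:
    """Build the user message that answers every tool_use block in an assistant turn."""
    results = []
    for block in assistant_content:
        if block["type"] == "tool_use":
            output = run_tool(block["name"], block["input"])
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],   # must match the id Claude supplied
                "content": str(output),
            })
    return {"role": "user", "content": results}

# Illustrative assistant turn (id and values are invented for the example).
assistant_content = [
    {"type": "text", "text": "I'll calculate that."},
    {"type": "tool_use", "id": "toolu_example_1", "name": "calculator",
     "input": {"expression": "15 * 24 + 100"}},
]
reply = tool_result_turn(assistant_content, lambda name, args: 460)
print(json.dumps(reply, indent=2))
```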

Defining a Tool

Python — Tool Definition Schema

tools = [
    {
        "name": "calculator",
        "description": "Performs basic arithmetic. Use this for any math operations.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Math expression to evaluate, e.g. '15 * 24 + 100'"
                }
            },
            "required": ["expression"]
        }
    },
    {
        "name": "web_search",
        "description": "Searches the web for current information on a topic.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query"}
            },
            "required": ["query"]
        }
    },
    {
        "name": "save_to_file",
        "description": "Saves text content to a local file.",
        "input_schema": {
            "type": "object",
            "properties": {
                "filename": {"type": "string"},
                "content": {"type": "string"}
            },
            "required": ["filename", "content"]
        }
    }
]

Implementing the Tool Functions

Python — Tool Implementation & Dispatcher

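import math

def calculator(expression: str) -> str:
    # Caution: eval() is not a sandbox, even with builtins removed.
    # Treat model-supplied expressions as untrusted input in production.
    try:
        allowed = {k: v for k, v in math.__dict__.items()
                   if not k.startswith("__")}
        result = eval(expression, {"__builtins__": {}}, allowed)
        return f"Result: {result}"
    except Exception as e:
        return f"Error: {str(e)}"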

def web_search(query: str) -> str:
    # In production: wire to SerpAPI, Tavily, or Brave Search
    return f"Search results for '{query}': [relevant results here]"

def save_to_file(filename: str, content: str) -> str:
    try:
        with open(filename, 'w') as f:
            f.write(content)
        return f"Successfully saved to {filename}"
    except Exception as e:
        return f"Error saving file: {str(e)}"

# Tool dispatcher — routes tool calls to implementations
def execute_tool(tool_name: str, tool_input: dict) -> str:
    if tool_name == "calculator":
        return calculator(tool_input["expression"])
    elif tool_name == "web_search":
        return web_search(tool_input["query"])
    elif tool_name == "save_to_file":
        return save_to_file(tool_input["filename"], tool_input["content"])
    else:
        return f"Unknown tool: {tool_name}"
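
Because the calculator receives whatever expression the model writes, a stricter evaluator is worth considering. The sketch below is one possible alternative, not the guide's original method: it parses the expression with Python's `ast` module and allows only numeric literals and basic arithmetic operators. `safe_calculator` and `_evaluate` are hypothetical names, and math functions such as `sqrt` are deliberately not supported here.

```python
import ast
import operator

# Whitelisted operators; anything else in the expression is rejected.
_BINARY = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def _evaluate(node: ast.AST):
    """Recursively evaluate a parsed expression made of numbers and + - * / %."""
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")

def safe_calculator(expression: str) -> str:
    try:
        tree = ast.parse(expression, mode="eval")
        return f"Result: {_evaluate(tree.body)}"
    except Exception as e:
        return f"Error: {e}"

print(safe_calculator("15 * 24 + 100"))
print(safe_calculator("__import__('os').getcwd()"))
```

To try it in the dispatcher, route the "calculator" tool name to `safe_calculator` instead of `calculator`.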

Tool Description Best Practices

Be explicit about when to use a tool, not just what it does — “Use this when the user asks for current information after 2024”
Describe the format of inputs Claude should provide — “Query should be 2–5 keywords, no question marks”
List edge cases and limitations — “This tool only works with public URLs, not authenticated pages”
The tool description is the most impactful part of tool use. More detail = better model decisions.
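
Applied to the web_search tool from earlier, those practices might look like the following. The wording of the description is illustrative, and `description_gaps` is a hypothetical, deliberately crude lint helper for spotting thin definitions before they reach the model.

```python
web_search_tool = {
    "name": "web_search",
    "description": (
        "Searches the public web and returns result snippets. "
        "Use this when the user asks about current events or facts that may "
        "have changed recently. "
        "The query should be 2-5 keywords with no question marks. "
        "Only public pages are searched; authenticated pages are not available."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "2-5 keywords, e.g. 'python 3.13 release notes'",
            },
        },
        "required": ["query"],
    },
}

def description_gaps(tool: dict) -> list[str]:
    """Return simple, heuristic complaints about a tool definition."""
    gaps = []
    if len(tool.get("description", "")) < 80:
        gaps.append("description is very short")
    for name, spec in tool["input_schema"]["properties"].items():
        if not spec.get("description"):
            gaps.append(f"parameter '{name}' has no description")
    return gaps

print(description_gaps(web_search_tool))
```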

Agent Architecture

The ReAct Agent Loop

ReAct (Reasoning + Acting) is the backbone of most production agents. It maps cleanly onto how Claude’s tool use API works: think → act → observe → repeat. Understanding this pattern makes every agent tutorial after this one click into place.

The Complete ReAct Implementation

Python — Full ReAct Agent Loop

def run_agent(user_query: str, max_iterations: int = 10) -> str:
    print(f"\nUser: {user_query}")
    messages = [{"role": "user", "content": user_query}]
    system_prompt = """You are a helpful AI agent with access to tools.
    Think step by step. Use tools when you need real data or calculations.
    When you have enough information, provide a clear final answer."""

    for iteration in range(max_iterations):
        print(f"\n[Iteration {iteration + 1}]")
        # Call Claude with tools
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            system=system_prompt,
            tools=tools,
            messages=messages
        )
        print(f"Stop reason: {response.stop_reason}")

        # If Claude wants to use tools, run them and loop again
        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    print(f"  Tool: {block.name} | Input: {block.input}")
                    result = execute_tool(block.name, block.input)
                    print(f"  Result: {result}")
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })
            messages.append({"role": "user", "content": tool_results})
            continue

        # Any other stop reason ends the run: return the text produced so far.
        # Without this, a max_tokens stop would repeat the same request.
        final_answer = "".join(
            block.text for block in response.content if block.type == "text"
        )
        if response.stop_reason != "end_turn":
            final_answer += f" [stopped: {response.stop_reason}]"
        return final_answer

    return "Max iterations reached without a final answer."

# Run the agent
result = run_agent("What is 15% of 847 and save the result to result.txt?")
print(f"\nFinal Answer: {result}")

The Key Insight: Message History is Everything

Every tool call and result is appended to the messages list. Claude always receives full context of what it’s already done. This conversation history is the agent’s working memory — without it, each step would be blind to previous steps.
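
Since the history is the agent's only record of what happened, failed tool calls belong in it too, so Claude can see the failure and try something else. A minimal sketch, assuming the tool_result block's optional `is_error` field; `run_tool_safely` and the `registry` mapping are hypothetical names for the example.

```python
def run_tool_safely(block_id: str, name: str, tool_input: dict, registry: dict) -> dict:
    """Execute one tool call and always return a tool_result block, even on failure."""
    try:
        if name not in registry:
            raise KeyError(f"unknown tool: {name}")
        output = registry[name](**tool_input)
        return {"type": "tool_result", "tool_use_id": block_id, "content": str(output)}
    except Exception as exc:
        # Report the failure back to the model instead of crashing the loop.
        return {
            "type": "tool_result",
            "tool_use_id": block_id,
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }

registry = {"divide": lambda a, b: a / b}
print(run_tool_safely("toolu_example_1", "divide", {"a": 5, "b": 2}, registry))
print(run_tool_safely("toolu_example_2", "divide", {"a": 5, "b": 0}, registry))
```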

Framework

Claude Agent SDK

The Claude Agent SDK is the same engine powering Claude Code — exposed as a developer library. It handles the agent loop, built-in tools, and context management so you don’t have to build them yourself.

The SDK handles the loop. Without it, you manage the loop manually. With it, Claude manages it — you just stream messages as Claude reads files, runs commands, finds bugs, and edits code.

— Nader Dabit, The Complete Guide to Building Agents

SDK vs. Raw API

TypeScript — SDK vs Raw API Comparison

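// Without the SDK: you manage the loop manually
// (sketch: `client`, `tools`, and `runTool` are assumed to be defined elsewhere)
const params = { model: "claude-sonnet-4-6", max_tokens: 4096, tools };
const messages: Anthropic.MessageParam[] = [{ role: "user", content: "Fix the bug in auth.py" }];
let response = await client.messages.create({ ...params, messages });
while (response.stop_reason === "tool_use") {
  messages.push({ role: "assistant", content: response.content });
  const results: Anthropic.ToolResultBlockParam[] = [];
  for (const block of response.content) {
    if (block.type !== "tool_use") continue;
    results.push({ type: "tool_result", tool_use_id: block.id, content: await runTool(block.name, block.input) });
  }
  messages.push({ role: "user", content: results });
  response = await client.messages.create({ ...params, messages });
}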

// With the SDK: Claude manages the loop for you
for await (const message of query({ prompt: "Fix the bug in auth.py" })) {
  console.log(message); // Claude reads files, finds bugs, edits code automatically
}

Built-in Tools

The Agent SDK comes with working tools out of the box — no implementation needed:

📖

Read

Read any file in the working directory. Respects .gitignore.

✏️

Write

Create new files with specified content anywhere in the project.

🔧

Edit

Make precise, targeted edits to existing files without rewriting.

💻

Bash

Run terminal commands — npm install, python scripts, git operations.

🔍

Glob / Grep

Find files by pattern or search file contents with regex.

🌐

WebSearch

Search the web for current information beyond training data.

📡

WebFetch

Fetch and parse web pages for content extraction and research.

🖥️

Computer Use

Control a desktop GUI — click, type, scroll, take screenshots.

Your First SDK Agent (TypeScript)

TypeScript — SDK Quick Start

import { query } from "@anthropic-ai/claude-agent-sdk";

async function main() {
  for await (const message of query({
    prompt: "What files are in this directory? Summarise each one.",
    options: {
      model: "claude-sonnet-4-6",
      allowedTools: ["Glob", "Read"],
      maxTurns: 250
    }
  })) {
    if (message.type === "assistant") {
      for (const block of message.message.content) {
        if ("text" in block) console.log(block.text);
      }
    }
    if (message.type === "result") {
      console.log("Done: " + message.subtype + " | Cost: $" + message.total_cost_usd);
    }
  }
}

main();

SDK Agent in Python

Python — SDK Agent with Streaming

import asyncio
from claude_code_sdk import (
    query, ClaudeCodeOptions, AssistantMessage, ResultMessage, TextBlock
)

async def run_agent():
    async for message in query(
        prompt="Analyse the Python files in this directory and find any bugs",
        options=ClaudeCodeOptions(
            model="claude-sonnet-4-6",
            allowed_tools=["Read", "Glob", "Grep"],
            max_turns=50,
            system_prompt="You are a senior Python engineer. Be thorough."
        )
    ):
        # The Python SDK yields typed message objects, so dispatch on class
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if isinstance(block, TextBlock):
                    print(block.text, end="", flush=True)
        elif isinstance(message, ResultMessage):
            cost = message.total_cost_usd or 0.0  # cost may be absent
            print(f"\nComplete | Cost: ${cost:.4f}")

asyncio.run(run_agent())

Real-World Example: Code Review Agent

TypeScript — Code Review Agent

import { query } from "@anthropic-ai/claude-agent-sdk";

async function reviewCode(directory: string) {
  console.log("Starting code review for: " + directory);

  for await (const message of query({
    prompt: "Review the code in " + directory + " for: 1) bugs, 2) security issues, 3) performance, 4) code quality. Be specific about file names and line numbers.",
    options: {
      model: "claude-opus-4-8",
      allowedTools: ["Read", "Glob", "Grep"],
      permissionMode: "bypassPermissions",
      maxTurns: 250
    }
  })) {
    if (message.type === "assistant") {
      for (const block of message.message.content) {
        if ("text" in block) console.log(block.text);
        else if ("name" in block) console.log("Using: " + block.name);
      }
    }
  }
}

reviewCode(".");

Cloud Platform

Managed Agents (Cloud Platform)

Claude Managed Agents is Anthropic’s fully-managed cloud platform for running production agents. You don’t need Docker, Kubernetes, or any infrastructure — just an API key and your agent configuration.

Platform Overview

Launched April 8, 2026, Managed Agents provides sandboxed Linux containers with Python, Node.js, and common tools pre-installed. Environments support unrestricted or restricted networking. All sessions have persistent file systems, web access, and code execution without any setup.

Core Managed Agents Concepts

Concept	Description	Usage
Agent	Reusable config: model + system prompt + tools	Create once, reuse across sessions
Environment	Sandboxed Linux container template	Configure networking, packages, mounts
Session	A running agent instance on a task	Start, stream events, stop
Events	Stream of messages between app and agent	Tool calls, outputs, status updates
Vault	Secure secrets store for agent credentials	GitHub tokens, API keys, passwords

Step 1: Install the CLI and SDK

Shell

# Install the Managed Agents CLI (ant)
brew install anthropic/tap/ant       # macOS
curl -fsSL https://claude.ai/install.sh | bash  # Linux

# Verify
ant --version

# Install Python SDK
pip install anthropic

# Set API key
export ANTHROPIC_API_KEY="your-api-key-here"

Step 2: Create an Agent

Python — Create Agent

from anthropic import Anthropic
client = Anthropic()

agent = client.beta.agents.create(
    name="Coding Assistant",
    model="claude-sonnet-4-6",
    system="You are a helpful coding assistant. Write clean, well-documented code.",
    tools=[
        {"type": "agent_toolset_20260401"},  # Full built-in toolset
    ],
)
print(f"Agent ID: {agent.id}")

Step 3: Create an Environment

Python — Create Environment

environment = client.beta.environments.create(
    name="quickstart-env",
    config={
        "type": "cloud",
        "networking": {"type": "unrestricted"},
        # In production: restrict to specific domains
        # "networking": {"type": "restricted", "allowed_domains": ["api.example.com"]}
    },
)
print(f"Environment ID: {environment.id}")

Step 4: Start a Session & Stream Events

Python — Session + Streaming

import anthropic
client = anthropic.Anthropic()

session = client.beta.sessions.create(
    agent_id="agent_abc123",        # placeholder: use agent.id from Step 2
    environment_id="env_xyz456",    # placeholder: use environment.id from Step 3
)

with client.beta.sessions.send_message.stream(
    session_id=session.id,
    content="Create a simple Python web scraper for https://httpbin.org/json"
) as stream:
    for event in stream:
        if event.type == "content_block_delta":
            print(event.delta.text, end="", flush=True)
        elif event.type == "tool_use":
            print(f"\nTool: {event.tool_name}({event.tool_input})")
        elif event.type == "message_stop":
            print("\nSession complete")

The Key Events

💬

content_block_delta

Streaming text from the agent. Print to screen in real time for a live typing effect.

🔧

tool_use

Claude is calling a built-in tool. Contains tool_name and tool_input. Monitor for auditing.

📊

tool_result

Result of a tool call. Useful for logging and debugging agent behavior.

🏁

message_stop

Session complete or needs input. Contains final status, cost, and any error details.

❓

input_required

Agent hit a decision point and needs human input before continuing. Implement approval flows here.

⚠️

error

Session-level error. Implement retry logic and fallback handling for production agents.

Extensibility

Agent Skills & MCP Integration

Agent Skills are reusable, versioned packages of tools and instructions that extend Claude’s capabilities. The Model Context Protocol (MCP) is the open standard for connecting AI models to external tools, databases, and services.

What is MCP?

The Model Context Protocol is Anthropic’s open standard for exposing tools to AI agents. Instead of re-implementing Gmail, Slack, or Postgres integrations for every project, you run (or build) an MCP server once and plug it into any agent that needs it.

MCP vs Custom Tools

Custom Tools — One-off functions for your specific project. Good for private APIs and internal logic.
MCP Servers — Standardised, reusable tool servers. Official TypeScript SDKs. Growing ecosystem of community servers.
Use custom tools for project-specific logic; use MCP for anything you’d want to share across agents or projects.

Connecting a Remote MCP Server

Python — MCP Connector in API Request

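import os

# The MCP connector is a beta feature: servers go in `mcp_servers` (not `tools`)
# and the request must opt in with the connector's beta flag.
# Check the current API reference for the exact beta version string and fields.
response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    betas=["mcp-client-2025-04-04"],
    mcp_servers=[
        {
            "type": "url",
            "name": "github",
            "url": "https://mcp.github.com",  # confirm the endpoint in the server's docs
            "authorization_token": os.getenv("GITHUB_TOKEN"),
            "tool_configuration": {
                "enabled": True,
                "allowed_tools": ["list_repos", "create_issue", "get_file_contents"],
            },
        }
    ],
    messages=[{"role": "user", "content": "List my recent GitHub repos"}]
)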

Popular MCP Servers

💻

GitHub MCP

Read repos, create issues, list PRs, get file contents. Official Anthropic-backed server.

🗄️

PostgreSQL MCP

Query databases, inspect schemas, run migrations. Connect your agents to production data.

📧

Gmail / Outlook MCP

Read, send, search emails. Build agents that triage inbox, draft replies, or monitor alerts.

💬

Slack MCP

Post messages, read channel history, manage threads. Build team-aware agents.

📅

Google Calendar

Read/write events, schedule meetings, check availability. Scheduling agent workflows.

🔔

Jira / Linear MCP

Create tickets, update statuses, query sprints. Bridge agents to project management.

Agent Skills (Managed Platform)

On the Managed Agents platform, Skills are versioned packages deployed to Anthropic’s infrastructure. Create a skill once and share it across agents, teams, or publish to the Skills marketplace.

Shell — Agent Skills Quickstart (CLI)

# Create a new skill project
ant beta:skills init my-data-skill

# Define tools in skill.yaml, implement in index.ts/main.py
# Then deploy:
ant beta:skills deploy

# Attach to an agent
ant beta:agents update AGENT_ID \
  --skill 'my-data-skill@1.0.0'

State Management

Memory & Context Management

A single-shot agent is stateless — each run starts fresh. Production agents need memory. The right memory architecture depends on your use case: short-term, long-term, or semantic retrieval.

Four Memory Patterns

📝

In-Context (Short-term)

The conversation history itself. Limited by context window. Free and fast. Best for within-session state.

🗒️

Scratchpad

A markdown file the agent writes to and reads from. Cheap persistent memory. Best for tracking progress within long tasks.

💾

External Database

Postgres, Redis, SQLite via tool calls. Structured, persistent, queryable memory. Best for cross-session state.

🔍

Semantic / Vector Store

Embeddings + vector database (Pinecone, pgvector). Retrieve by meaning not exact match. Best for large knowledge bases.
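
Of the four patterns, the scratchpad is the quickest to try. A minimal sketch, assuming a single local markdown file; the `Scratchpad` class and the file name are hypothetical, and in a real agent `note` and `read` would be exposed as tools.

```python
import tempfile
from pathlib import Path

class Scratchpad:
    """Append-only notes file an agent can write to and re-read between steps."""

    def __init__(self, path: Path):
        self.path = path

    def note(self, text: str) -> None:
        # Append one markdown bullet per note so progress stays readable.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(f"- {text}\n")

    def read(self) -> str:
        return self.path.read_text(encoding="utf-8") if self.path.exists() else ""

# Demo in a temporary directory so nothing is left in the project tree.
pad = Scratchpad(Path(tempfile.mkdtemp()) / "progress.md")
pad.note("Parsed 3 of 10 files")
pad.note("Found failing test in auth module")
print(pad.read())
```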

Context Window Management

Claude’s context window is large but finite. For long-running agents, the SDK provides automatic compaction — summarising old context when nearing the limit to prevent overflow while preserving essential information.
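
To make the idea concrete, here is a deliberately naive compaction sketch for a hand-rolled loop. It is not how the SDK implements compaction: `compact` and `summarise` are hypothetical names, the summary is produced by a caller-supplied function, and the sketch assumes plain text messages. Real compaction also has to keep each tool_use block together with its tool_result.

```python
from typing import Callable

def compact(
    messages: list[dict],
    keep_recent: int,
    summarise: Callable[[list[dict]], str],
) -> list[dict]:
    """Replace all but the most recent messages with a single summary message."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    if len(messages) <= keep_recent:
        return list(messages)
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {
        "role": "user",
        "content": f"Summary of earlier work: {summarise(old)}",
    }
    return [summary] + recent

history = [
    {"role": "user", "content": "Audit the repo"},
    {"role": "assistant", "content": "Scanned src/"},
    {"role": "user", "content": "Continue"},
    {"role": "assistant", "content": "Scanned tests/"},
    {"role": "user", "content": "Continue"},
    {"role": "assistant", "content": "Scanned docs/"},
]
compacted = compact(history, keep_recent=2, summarise=lambda old: f"{len(old)} earlier messages")
print(compacted[0]["content"])
```

In practice `summarise` would itself be a model call that condenses the older turns.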

Prompt Caching

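Prompt caching dramatically reduces cost for agents that reuse long system prompts or reference documents. When a request repeats a prefix that is already cached, that prefix is served from cache at a fraction of the normal input-token cost. With the Messages API, cacheable prefixes are marked with cache_control breakpoints; the Agent SDK applies caching for you.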

Caching Rules of Thumb

System prompts are the best caching target — put stable, reusable instructions at the top
Large reference documents (codebase context, knowledge bases) should be placed before variable user content
Cache reads cost ~10% of the normal input-token price, while writing a prefix to the cache carries a small premium, so caching pays off when a large prefix is reused across calls
The SDK handles caching automatically; no explicit configuration needed in most cases
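
When calling the Messages API directly, the rules above translate into where you place the cache breakpoint. A sketch that only builds the request payload, so it runs without an API call; `build_cached_request` is a hypothetical helper, and the `cache_control` placement reflects the API as commonly documented at the time of writing.

```python
def build_cached_request(system_text: str, reference_doc: str, user_message: str) -> dict:
    """Put stable content first and mark the end of the cacheable prefix."""
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": system_text},
            {
                "type": "text",
                "text": reference_doc,
                # Breakpoint: everything up to and including this block is cacheable.
                "cache_control": {"type": "ephemeral"},
            },
        ],
        # Variable content goes after the cached prefix.
        "messages": [{"role": "user", "content": user_message}],
    }

request = build_cached_request(
    "You are a code review agent.",
    "<large, stable reference document>",
    "Review the latest change.",
)
print(request["system"][-1]["cache_control"])
```

The payload can then be passed as `client.messages.create(**request)`.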

Advanced Architecture

Multi-Agent Systems

When a single agent isn’t enough — because the task is too long for one context window, requires parallelism, or benefits from specialisation — you orchestrate multiple agents working together.

When to Use Multiple Agents

Pattern	Use Case	Example
Orchestrator + Workers	Complex tasks decomposed into parallel subtasks	Research agent spawns 5 subagents for each source
Pipeline	Sequential processing with specialised stages	Scraper → Extractor → Formatter → Publisher
Review Pattern	One agent generates, another critiques	Code writer → Security reviewer → Code writer
Specialist Pool	Route tasks to domain-expert agents	Query → Classifier → SQL / Code / Research Agent

Orchestrator Pattern (Python)

Python — Multi-Agent Orchestration

import asyncio
from anthropic import Anthropic
client = Anthropic()

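def run_subagent_blocking(agent_id: str, env_id: str, task: str) -> str:
    """Run a single subagent and return its final output (blocking)."""
    session = client.beta.sessions.create(
        agent_id=agent_id,
        environment_id=env_id,
    )
    result = ""
    with client.beta.sessions.send_message.stream(
        session_id=session.id, content=task
    ) as stream:
        for event in stream:
            if event.type == "content_block_delta":
                result += event.delta.text
    return result

async def run_subagent(agent_id: str, env_id: str, task: str) -> str:
    # The client calls above are synchronous, so run them in a worker thread;
    # otherwise asyncio.gather would execute the subagents one after another.
    return await asyncio.to_thread(run_subagent_blocking, agent_id, env_id, task)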

async def orchestrate_research(topic: str) -> str:
    subtasks = [
        f"Research the history of {topic} from primary sources",
        f"Find recent developments and news about {topic} in 2025-2026",
        f"Analyse the technical aspects and future outlook of {topic}",
    ]
    # Run subagents in parallel
    results = await asyncio.gather(*[
        run_subagent("agent_research_worker", "env_sandbox", task)
        for task in subtasks
    ])
    combined = "\n\n---\n\n".join(results)
    return await run_subagent(
        "agent_synthesiser", "env_sandbox",
        f"Synthesise these research results into a coherent report:\n\n{combined}"
    )

report = asyncio.run(orchestrate_research("quantum computing"))

Multi-Agent Safety Principles

Give each subagent minimal permissions — only the tools it needs for its specific role
Subagents should have explicit scope limits — “only read files, never write” or “only query this database”
The orchestrator should validate subagent outputs before using them as inputs to other agents
Always set a max_turns limit or a timeout to prevent runaway agent loops
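
The last two principles can be sketched with the standard library alone. This is a minimal illustration, not part of any SDK: run_guarded, validate_output, SubagentError, and the 20,000-character cap are hypothetical names and values, and fake_worker stands in for a real subagent call.

```python
import asyncio
from typing import Awaitable, Callable

MAX_OUTPUT_CHARS = 20_000  # hypothetical cap; tune for your own agents


class SubagentError(Exception):
    """Raised when a subagent times out or returns unusable output."""


def validate_output(text: str) -> str:
    """Reject empty or oversized output before it is passed to another agent."""
    cleaned = text.strip()
    if not cleaned:
        raise SubagentError("subagent returned empty output")
    if len(cleaned) > MAX_OUTPUT_CHARS:
        raise SubagentError(f"subagent output exceeds {MAX_OUTPUT_CHARS} characters")
    return cleaned


async def run_guarded(
    worker: Callable[[str], Awaitable[str]],
    task: str,
    timeout_s: float = 120.0,
) -> str:
    """Run one subagent with a hard timeout, then validate what it returned."""
    try:
        raw = await asyncio.wait_for(worker(task), timeout=timeout_s)
    except asyncio.TimeoutError as exc:
        raise SubagentError(f"subagent timed out after {timeout_s}s") from exc
    return validate_output(raw)


async def fake_worker(task: str) -> str:
    """Stand-in for a real subagent call (assumption for the demo)."""
    await asyncio.sleep(0.01)
    return f"  findings for: {task}  "


if __name__ == "__main__":
    print(asyncio.run(run_guarded(fake_worker, "history of quantum computing")))
```

Raising a dedicated exception lets the orchestrator decide whether to retry, skip the subtask, or escalate to a human.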

Production Readiness

Best Practices, Safety & Debugging

Building an agent that works in a demo is straightforward. Building one that’s reliable, safe, and cost-effective in production requires intentional design decisions at every layer.

Step 1: Define the Agent’s Job Narrowly

The single biggest predictor of whether your agent will work is how clearly you can describe its job. Vague goals produce vague, expensive, unreliable agents.

Job Scope Template

Before writing any code, write down three things:
Input: what the agent receives
Output: what it produces
Boundary: what it is explicitly NOT allowed to do (send emails, spend money, delete files)

System Prompt (CLAUDE.md) Best Practices

The system prompt is the agent’s job description. A good system prompt has four sections:

Markdown — CLAUDE.md Template

# Role
You are an invoice processing agent for Acme Corp.

# Inputs & Outputs
Input: Email messages containing invoice attachments (PDF)
Output: Structured JSON with vendor, amount, due_date, line items

# Tool Usage
1. Use read_email to fetch the latest unprocessed invoice emails
2. Use extract_pdf to parse the PDF attachment
3. Use validate_vendor to confirm the vendor exists in our system
4. Use save_to_db to persist the structured record

# Guardrails (NEVER DO)
- Do NOT send any emails or notifications
- Do NOT approve or reject invoices — only extract data
- Do NOT access any files outside the invoices/ directory
- If the invoice total exceeds $50,000, flag for human review
- If vendor is not found, escalate — do not create new vendors
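
A system prompt states the rules but cannot enforce them, so it usually helps to repeat the critical guardrails in the tool layer. The sketch below checks two of the rules from the template in plain Python. The function names and the assumption that the extracted record stores its total under an amount key are illustrative, not part of any SDK.

```python
from pathlib import Path

INVOICE_ROOT = Path("invoices")   # directory named in the guardrails above
REVIEW_THRESHOLD_USD = 50_000     # threshold named in the guardrails above


def is_allowed_path(candidate: str, root: Path = INVOICE_ROOT) -> bool:
    """Return True only if candidate resolves to a location inside root."""
    root_resolved = root.resolve()
    target = (root / candidate).resolve()
    return target == root_resolved or root_resolved in target.parents


def needs_human_review(record: dict) -> bool:
    """Flag invoices whose total exceeds the review threshold."""
    return float(record["amount"]) > REVIEW_THRESHOLD_USD


if __name__ == "__main__":
    print(is_allowed_path("2024/acme.pdf"))        # inside invoices/
    print(is_allowed_path("../secrets.txt"))       # escapes invoices/
    print(needs_human_review({"amount": 72_500}))  # above the threshold
```

Resolving the path before comparing it is what blocks ../ traversal; a plain string prefix check would not.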

Cost Management

💰

Model Selection

Use Haiku for simple classification and routing tasks, Sonnet for most agents, and Opus only when deep reasoning is genuinely needed.
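
One way to make this rule explicit is a small routing function. The task categories and the placeholder model IDs below are assumptions for illustration; substitute the model names listed in your own account.

```python
# Placeholder IDs (assumptions): replace with the model names you actually use.
MODEL_BY_TIER = {
    "haiku": "<haiku-model-id>",
    "sonnet": "<sonnet-model-id>",
    "opus": "<opus-model-id>",
}


def pick_tier(task_kind: str) -> str:
    """Map a coarse task category to a model tier."""
    if task_kind in {"classification", "routing"}:
        return "haiku"
    if task_kind == "deep_reasoning":
        return "opus"
    return "sonnet"  # sensible default for most agent work


def pick_model(task_kind: str) -> str:
    """Return the configured model ID for a task category."""
    return MODEL_BY_TIER[pick_tier(task_kind)]
```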

📦

Prompt Caching

Structure system prompts to maximise cache hits. Move static content before dynamic user input.
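
As a rough sketch of that ordering, the helper below puts the static instructions in a system block marked with cache_control and leaves the changing user input for the final message. build_request is a hypothetical helper, and the max_tokens value is arbitrary. Very short prompts may fall below the model's minimum cacheable length and will not be cached.

```python
def build_request(static_instructions: str, user_input: str, model: str) -> dict:
    """Assemble Messages API arguments with the static prefix marked cacheable."""
    return {
        "model": model,
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": static_instructions,
                # Marks the end of the reusable prefix for prompt caching.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Dynamic content goes last so it never invalidates the cached prefix.
        "messages": [{"role": "user", "content": user_input}],
    }
```

The result can be passed to the client as client.messages.create(**build_request(...)).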

🛑

max_turns Limits

Always set a maximum iteration count to prevent runaway loops that burn through tokens.
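
If you drive the loop yourself, the cap can be as small as the sketch below. run_loop and MaxTurnsExceeded are hypothetical names, and step stands in for one model call plus any tool execution, returning None until the agent has a final answer.

```python
from typing import Callable, Optional


class MaxTurnsExceeded(RuntimeError):
    """Raised when the agent has not finished within the turn budget."""


def run_loop(step: Callable[[int], Optional[str]], max_turns: int = 10) -> str:
    """Call step once per turn until it returns a final answer or turns run out."""
    for turn in range(1, max_turns + 1):
        result = step(turn)
        if result is not None:
            return result
    raise MaxTurnsExceeded(f"no final answer after {max_turns} turns")
```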

🔍

Monitoring

Track cost_usd per session. Set up alerts when a single run exceeds your budget threshold.
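
When you call the raw API, each response reports usage.input_tokens and usage.output_tokens, which is enough for a simple per-session tracker. SessionCost below is a hypothetical helper, and the prices are parameters you supply from the current pricing page rather than values this sketch knows.

```python
from dataclasses import dataclass


@dataclass
class SessionCost:
    """Accumulate estimated spend for one session and flag budget overruns."""
    budget_usd: float
    input_price_per_mtok: float    # USD per million input tokens (you supply this)
    output_price_per_mtok: float   # USD per million output tokens (you supply this)
    cost_usd: float = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> bool:
        """Add one response's usage; return True once the budget is exceeded."""
        self.cost_usd += (
            input_tokens * self.input_price_per_mtok
            + output_tokens * self.output_price_per_mtok
        ) / 1_000_000
        return self.cost_usd > self.budget_usd
```

A True return value is the natural place to raise an alert or stop the run.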

🗜️

Tool Output Size

Truncate large tool outputs before returning them to Claude. Returning 100KB from a web search wastes tokens.
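
A minimal sketch of that truncation, with a hypothetical helper name and an arbitrary default limit, might look like this. The marker tells the model that content was cut so it can request a narrower query.

```python
def truncate_tool_output(text: str, max_chars: int = 4000) -> str:
    """Cap a tool result and note how much was dropped."""
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return text[:max_chars] + f"\n[truncated {omitted} characters]"
```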

⚡

Streaming

Use streaming for user-facing agents. Perceived response time drops dramatically even when total time is the same.

Security & Permission Hardening

⚠️ Production Security Checklist

Never store API keys in code — use environment variables or a secrets manager
Use allowedTools to restrict agents to only the tools they need for their role
In Managed Agents, use restricted networking — only allow access to specific domains
Implement human-in-the-loop gates for irreversible actions (deleting files, sending emails, API calls that cost money)
Validate all tool inputs before execution — Claude can be manipulated by injected content in documents it reads
Log all tool calls and their results for audit trails and debugging
Use sandboxed environments, and do not give agents direct access to production databases, at least initially
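
Several items on this checklist can live in a single tool dispatcher. The sketch below combines an allowlist, a human approval gate, and audit logging. The tool names, execute_tool, and the approve callback are all hypothetical, so adapt them to however your agent actually executes tools.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("agent.audit")

ALLOWED_TOOLS = {"read_file", "delete_file"}   # hypothetical tool names
NEEDS_APPROVAL = {"delete_file"}               # irreversible actions


def execute_tool(
    name: str,
    args: Dict[str, Any],
    registry: Dict[str, Callable[..., Any]],
    approve: Callable[[str, Dict[str, Any]], bool],
) -> Any:
    """Run a tool only if it is allowlisted and, where required, approved."""
    if name not in ALLOWED_TOOLS or name not in registry:
        logger.warning("blocked tool call: %s %r", name, args)
        raise PermissionError(f"tool not permitted: {name}")
    if name in NEEDS_APPROVAL and not approve(name, args):
        logger.info("denied by reviewer: %s %r", name, args)
        return {"status": "denied", "tool": name}
    result = registry[name](**args)
    logger.info("tool call: %s %r -> %r", name, args, result)
    return result
```

In a real deployment approve would prompt a person, for example through a chat message or a ticket, and block until they answer.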

Agent Development Timeline

Day 1

Basic Tool-Using Agent

Raw API + 2–3 custom tools + ReAct loop. Get familiar with stop_reason handling and message history structure.

Day 2–3

SDK Agent

Migrate to the Claude Agent SDK. Add streaming. Test with real file system operations using built-in tools.

Week 1

Managed Agent

Deploy to Managed Agents platform. Set up agent + environment + session management. Add structured output.

Week 2

MCP Integration

Connect external MCP servers (GitHub, Slack, database). Implement permission policies. Add monitoring.

Month 1

Production Multi-Agent

Orchestrator + specialist subagents. Long-term memory. Human-in-the-loop gates. Cost dashboards. CI/CD pipeline.

“The single biggest predictor of whether your agent will work is how clearly you can describe its job. Vague goals produce vague agents.”

— Claude Code Playbooks, How to Build an AI Agent from Scratch

Sources & References

Anthropic — Build a Tool-Using Agent

Official tutorial for building tool-using agents with the Claude API.

Anthropic — Managed Agents Quickstart

Create your first autonomous agent on Anthropic’s cloud platform.

Dextra Labs — Build AI Agent from Scratch

ReAct loop, custom tools, and full production code walkthrough.

Nader Dabit — Complete Guide to Building Agents

SDK deep-dive with TypeScript, code review agent example.

Helply — Create AI Agents with Claude SDK

Step-by-step guide to the Claude Agent SDK setup and usage.

SmythOS — Create AI Agents Using Claude

Developer integration guide for Claude-powered agents.

Claude Code Playbooks — Build AI Agent

End-to-end walkthrough from goal definition to multi-agent orchestration.

FindSkill.ai — Build First Claude Agent

Managed Agents hands-on tutorial with Python and TypeScript code.

ClickUp — Build an AI Agent with Claude

Practical guide to Claude agent creation for business automation.

Anthropic — Agent Skills Quickstart

Building reusable, versioned tool packages for Claude agents.

DataCamp — Claude Agent SDK Tutorial

Three hands-on projects from one-shot to full custom-tool agents.
