Building AI Agents
From Scratch

AI agents are autonomous software systems that perceive their environment, reason about it, and take actions to achieve goals. This guide gives you a structured, beginner-friendly path from understanding what an agent is — all the way to building, testing, and deploying one safely.

Chapters

60+

Topics

Prerequisites

∞

Possibilities

📚 How to Use This Guide

Each chapter builds on the previous one. Chapters 1–3 cover theory and foundations. Chapters 4–6 introduce the key building blocks every agent needs. Chapters 7–8 show you how to code real agents using popular frameworks. Chapters 9–10 prepare you to ship agents responsibly. Beginners are encouraged to read chapters sequentially before jumping to implementation.

Chapter One · Foundations

What Are AI Agents?

Chapter Overview

Before writing a single line of code, you need a crisp mental model of what an AI agent actually is, how it differs from traditional software, and where it fits in the broader AI landscape. This chapter builds that foundation.

An AI agent is a software entity that senses its environment through inputs, processes those inputs using an AI model, decides on actions, and executes those actions — often in a loop, without constant human direction.

“An AI agent is any system that perceives its environment and takes actions that maximise its chances of achieving its goals.”

— Russell & Norvig, Artificial Intelligence: A Modern Approach

Topics Covered in This Chapter

1.1

Definition of an AI Agent

What separates an agent from a regular program. The four defining properties: autonomy, reactivity, pro-activeness, and social ability. Why the word “agent” matters and how it differs from “chatbot”, “assistant”, or “script”.
1.2

The Perception–Reasoning–Action Loop

The canonical agent loop: observe → think → act → observe again. How inputs (text, images, API data) feed reasoning, and how actions (tool calls, API requests, file writes) feed back into the environment.
1.3

Agents vs. Chatbots vs. Automation Scripts

A clear comparison of agents, traditional chatbots, RPA bots, and workflow automation. When to use an agent vs. a simpler tool. Understanding the spectrum from deterministic to autonomous behaviour.
1.4

Types of AI Agents

Simple reflex agents, model-based reflex agents, goal-based agents, utility-based agents, and learning agents. Real-world examples of each type with concrete use cases (customer support, coding assistant, research agent, etc.).
1.5

Real-World Applications of AI Agents

Agents in use today: GitHub Copilot Workspace, Devin, AutoGPT, Claude Computer Use, Salesforce Agentforce. Industry verticals — customer service, software development, healthcare, finance, and research — and what makes each domain suitable for agents.
1.6

The Agentic AI Landscape in 2025–2026

Where the field stands today. The shift from single-turn LLM calls to multi-step, tool-using agents. Key milestones: ReAct, Toolformer, GPT-4 function calling, Claude tool use, and the emergence of full agentic frameworks.

🤖

Reactive Agent

Acts purely on current perception. No memory, no planning. Fast and predictable but limited.

Simplest

🧭

Goal-Based Agent

Works toward a defined objective. Plans sequences of actions. Most common LLM-powered type.

Common

🧠

Learning Agent

Improves behaviour from experience. Requires feedback loops and evaluation. Closest to AGI.

Advanced

Chapter Two · Foundations

Core Concepts & Terminology

Chapter Overview

AI agent development has its own vocabulary. Mastering these terms early prevents confusion when reading documentation, papers, and framework guides. Every concept below appears repeatedly in later chapters.

Like any engineering discipline, building AI agents requires command of its language. This chapter defines every key term you will encounter — from tokens and context windows to tool calls and hallucinations.

Topics Covered in This Chapter

2.1

Tokens, Context Windows & Latency

What a token is and why it matters for cost and speed. Context window limits (4K, 8K, 128K, 1M tokens). How latency compounds in multi-step agents. Practical rules of thumb for context management.
2.2

System Prompts, User Messages & Roles

The message structure of LLM APIs: system, user, assistant roles. How to craft an effective system prompt for an agent. The difference between stateless and stateful message histories.
2.3

Temperature, Top-P & Sampling Parameters

How temperature controls randomness. What top-p (nucleus sampling) does. Practical settings for different agent tasks: deterministic code generation (low temp) vs. creative brainstorming (high temp).
2.4

Hallucination & Grounding

Why LLMs confidently produce wrong facts. Grounding techniques: RAG, tool use, structured output validation. How to design agents that minimise hallucination risk through verification steps.
2.5

Tool Calls & Function Calling

What a “tool” is in agent terms. How OpenAI, Anthropic, and Gemini implement function/tool calling. JSON schema definitions for tools. The tool call → tool result → continuation cycle.
2.6

Reasoning Patterns: Chain-of-Thought & ReAct

Chain-of-Thought (CoT) prompting and why it improves complex reasoning. The ReAct (Reason + Act) framework: interleaving thinking and tool use. Scratchpad reasoning vs. final answer output.
2.7

Memory Types: In-Context, External & Episodic

In-context memory (conversation history). External memory (vector databases, key-value stores). Episodic memory (summaries of past interactions). When to use each and the trade-offs involved.
2.8

Embeddings & Semantic Search

How text is converted into numerical vectors. Cosine similarity for semantic matching. The role of embeddings in retrieval-augmented generation (RAG). Popular embedding models: OpenAI ada-002, Cohere, sentence-transformers.

💡 Key Mental Model

Think of an LLM as the “brain” of an agent. It receives context (everything in the context window), reasons about it, and outputs either a final answer or a tool call request. The scaffolding around the LLM — memory, tools, loops — is what turns a brain into an agent.

Chapter Three · Foundations

LLMs & Foundation Models

Chapter Overview

Almost every modern AI agent is built on a Large Language Model. Understanding what LLMs can and cannot do — and how to choose the right one — is essential before writing agent code.

Large Language Models are the reasoning engines of modern AI agents. Choosing the right model and understanding its capabilities and limits will determine what your agent can reliably accomplish.

Topics Covered in This Chapter

3.1

How Large Language Models Work (Conceptually)

Transformers and self-attention explained for beginners. Pre-training on text corpora. Instruction tuning and RLHF (Reinforcement Learning from Human Feedback). Why scale matters and what “emergent capabilities” mean in practice.
3.2

Surveying Available Models: GPT-4, Claude, Gemini, Llama

Comparative overview of frontier models: OpenAI GPT-4o, Anthropic Claude 4, Google Gemini 2, Meta Llama 3. Strengths and weaknesses of each. Open-source vs. proprietary trade-offs.
3.3

Choosing the Right Model for Your Agent

Decision framework: task complexity, latency requirements, cost per token, context window needs, tool-calling support, and data privacy. When to use a smaller model (Haiku, GPT-4o-mini) vs. a frontier model.
3.4

Fine-Tuning vs. Prompting vs. RAG

Three ways to specialise a model for your domain. Prompt engineering: zero-shot and few-shot. Fine-tuning: when it’s worth the cost. RAG: injecting real-time knowledge without re-training. Practical decision tree for beginners.
3.5

Multimodal Models: Text, Images, Audio & Code

How vision-language models work. Using image input for agents that read screenshots, diagrams, or documents. Code-specific models (CodeLlama, GitHub Copilot models). When to route tasks to specialised models.
3.6

Model APIs: REST, SDKs & Rate Limits

Making your first API call to OpenAI or Anthropic. Understanding request/response structure. Managing API keys securely. Rate limits, retries, and exponential backoff. Cost tracking and token budgets.

Model	Provider	Context Window	Tool Calling	Best For
GPT-4o	OpenAI	128K tokens	✅ Native	Versatile agents, vision tasks
Claude Sonnet 4	Anthropic	200K tokens	✅ Native	Reasoning, long documents
Gemini 2 Flash	Google	1M tokens	✅ Native	Very long context tasks
Llama 3.1 70B	Meta (OSS)	128K tokens	✅ via Ollama	On-premise, privacy-sensitive
Mistral Large	Mistral AI	128K tokens	✅ Native	European data residency

Chapter Four · Building Blocks

Prompt Engineering for Agents

Chapter Overview

Prompt engineering is the craft of communicating effectively with LLMs. For agents, it goes far beyond simple question-answering — it involves structuring multi-step reasoning, defining personas, constraining output formats, and orchestrating tool use.

The quality of your agent’s outputs is directly proportional to the quality of your prompts. Prompt engineering for agents is a discipline in itself — one that rewards careful study.

Topics Covered in This Chapter

4.1

Anatomy of a Great System Prompt

The five elements of an effective agent system prompt: role definition, capability declaration, behavioural constraints, output format instructions, and tool use guidelines. Before-and-after examples comparing weak and strong prompts.
4.2

Zero-Shot, One-Shot & Few-Shot Prompting

Zero-shot: ask without examples. One-shot: one example. Few-shot: 2–8 examples. How to select good examples. The diminishing returns of adding too many examples. When few-shot prompting outperforms fine-tuning.
4.3

Chain-of-Thought (CoT) Prompting

“Think step by step” and why it works. Automatic CoT vs. manual CoT. Self-consistency: sampling multiple reasoning chains and picking the most common answer. CoT for arithmetic, logic, planning, and coding tasks.
4.4

Structured Output Prompting: JSON, XML & Markdown

Instructing models to return machine-parseable output. JSON mode in OpenAI API. XML tags in Anthropic Claude. Pydantic models for output validation. Why structured output is essential for reliable agents.
4.5

ReAct Prompting Pattern

Implementing the Reason + Act pattern in your prompts. Structuring Thought / Action / Observation triplets. How ReAct agents decide when to use a tool vs. answer directly. Debugging ReAct traces to improve performance.
4.6

Prompt Templates & Variables

Building reusable prompt templates with Jinja2 or f-strings. Dynamic injection of user context, tool results, and memory. Prompt versioning and change management. Testing prompt changes safely.
4.7

Prompt Injection & Security

What prompt injection is and why it’s a critical agent vulnerability. Direct vs. indirect injection attacks. Defensive techniques: input sanitisation, privilege separation, sandboxing. Real-world attack examples and mitigations.

        System Prompt Example
SYSTEM: You are ResearchAgent, an expert research assistant.
# Capabilities
You can: search the web, read URLs, summarise documents, write reports.
# Reasoning Style
Always think step-by-step. Use <thinking> tags for reasoning.
Output your final answer inside <answer> tags.
# Tool Use
Call tools when you need real-time or external information.
Never fabricate URLs, citations, or statistics.
      

⚡ Prompt Engineering Tip

Always test your prompts against adversarial inputs before deploying an agent. Ask yourself: “What would happen if a malicious user tried to override my system prompt?” Build constraints that hold even under pressure.

Chapter Five · Building Blocks

Tools, APIs & Memory

Chapter Overview

An LLM alone cannot browse the web, run code, or remember past conversations. Tools and memory are what transform a language model into a capable agent. This chapter teaches you to build and integrate both.

Tools extend what an agent can do. Memory extends what it can know. Together, they bridge the gap between a static language model and a dynamic, capable agent that acts in the real world.

Topics Covered in This Chapter

5.1

What Is a Tool? Defining & Registering Tools

Tools as functions exposed to the LLM. JSON schema definitions: name, description, parameters, required fields. Why clear tool descriptions matter for model decision-making. Registering tools with OpenAI, Anthropic, and LangChain.
5.2

Web Search Tools

Integrating Serper, Tavily, Brave Search, and Bing Search APIs. Designing a search tool that returns structured results. Handling pagination, rate limits, and irrelevant results. Building a research agent with web search as its backbone.
5.3

Code Execution Tools

Giving an agent the ability to write and run Python code. Using sandboxed environments (E2B, Modal, Docker) for safe execution. The code interpreter pattern. Handling errors and feeding results back into the agent loop.
5.4

File System & Document Tools

Reading, writing, and searching files. PDF parsing with PyMuPDF. Spreadsheet tools with openpyxl. Building a document QA agent. Handling large files without blowing the context window.
5.5

External API Tools (REST & GraphQL)

Wrapping any REST API as an agent tool. Authentication (API keys, OAuth 2.0). Error handling and retries. Auto-generating tool schemas from OpenAPI specs. Example: building a weather tool, a calendar tool, a Jira tool.
5.6

In-Context Memory Management

Managing the conversation history array. Strategies for long conversations: sliding window, summarisation, selective retention. Token counting with tiktoken. When context grows too large — what to do and what not to do.
5.7

Vector Databases & Retrieval-Augmented Generation (RAG)

What a vector database is and why it matters. Embedding documents and storing them in Chroma, Pinecone, Weaviate, or pgvector. Semantic retrieval at query time. Building a basic RAG pipeline from scratch. Chunking strategies that affect retrieval quality.
5.8

Persistent Memory: Redis, SQLite & Key-Value Stores

Saving agent state between sessions. Using Redis for fast key-value memory. SQLite for structured agent logs and user profiles. Designing a memory schema that agents can query efficiently. Privacy considerations when storing user data.

🔍

Search Tools

Connect your agent to live web information. Essential for research, fact-checking, and current events.

⚙️

Code Execution

Let your agent write and run Python to perform calculations, data analysis, and automations.

🧠

Vector Memory

Enable your agent to recall relevant information from large document stores using semantic search.

Chapter Six · Building Blocks

Agent Architectures & Design Patterns

Chapter Overview

How you structure an agent’s decision loop, planning strategy, and tool-use pattern has a huge impact on reliability and capability. This chapter covers the major architectures every agent builder needs to understand.

Architecture is to agents what algorithms are to data structures — the invisible skeleton that determines capability, efficiency, and failure modes. Choosing the right pattern for your task is half the battle.

Topics Covered in This Chapter

6.1

The ReAct Architecture (Reason + Act)

The foundational pattern for most LLM agents. How the model interleaves reasoning steps and tool calls. Implementing a ReAct loop from scratch in Python. Debugging: reading a ReAct trace and identifying failure points.
6.2

Plan-and-Execute Architecture

Two-phase agents: a planner LLM generates a task list, an executor LLM carries out each step. Advantages over ReAct for complex, multi-day tasks. Handling plan revisions when steps fail. OpenAI’s task decomposition approach.
6.3

Reflexion: Self-Critiquing Agents

Agents that evaluate their own outputs and revise them. The Reflexion pattern: act → evaluate → reflect → retry. How verbal reinforcement improves agent performance without gradient descent. Implementing a self-correction loop.
6.4

Tree of Thoughts (ToT)

Exploring multiple reasoning paths in parallel. When branching outperforms linear reasoning. Beam search and DFS/BFS strategies for thought trees. Computational cost trade-offs. Best suited for creative problem-solving and puzzles.
6.5

Router Agents & Specialised Sub-Agents

A dispatcher agent that routes tasks to the best-fit specialist. Designing a router prompt that accurately classifies intents. Building a library of specialist agents (code agent, search agent, data agent). Cascading failures and fallback strategies.
6.6

Human-in-the-Loop (HITL) Patterns

When agents should pause and ask for confirmation. Designing approval gates for high-stakes actions. Async human review in long-running agent workflows. Balancing autonomy with oversight — finding the right intervention points.
6.7

State Machines & Workflow Graphs

Modelling agent behaviour as a directed graph of states. LangGraph’s graph-based approach. Benefits over simple loops: predictability, debuggability, auditability. Handling conditional branches and parallel execution.

“The best architecture for an agent is the simplest one that reliably solves your problem — start with ReAct, only add complexity when you hit its limits.”

— Agent Engineering Best Practice, 2025

Chapter Seven · Implementation

Frameworks & SDKs

Chapter Overview

You don’t have to build everything from scratch. A rich ecosystem of open-source frameworks makes agent development faster. This chapter gives you hands-on guidance for the most popular tools.

Frameworks like LangChain, LlamaIndex, and CrewAI provide pre-built components for tool integration, memory, and agent loops — letting you focus on your agent’s purpose rather than its plumbing.

Topics Covered in This Chapter

7.1

LangChain: Agents, Chains & Tools

LangChain’s core abstractions: LLMs, Chains, Agents, Tools, and Memory. Building a ReAct agent with AgentExecutor. Using LangChain’s built-in tool library. When LangChain is a good choice — and when it adds unnecessary complexity.
7.2

LangGraph: Stateful, Graph-Based Agents

LangGraph’s node/edge model. Adding persistence with checkpointers. Building branching, cyclical agent workflows. Human-in-the-loop with interrupt() and resume(). Streaming agent outputs in real time. Deploying with LangServe.
7.3

LlamaIndex: RAG & Data Agents

LlamaIndex’s Document → Node → Index → Query pipeline. Building an agent over a PDF library. Sub-question query engine for complex questions. LlamaIndex vs. LangChain for knowledge-intensive agents.
7.4

CrewAI: Role-Based Multi-Agent Teams

Defining Agents with roles, backstories, and goals. Assigning Tasks to agents. Crew orchestration: sequential vs. hierarchical process. Use cases: content teams, research crews, engineering squads. CrewAI vs. LangGraph trade-offs.
7.5

Anthropic Claude SDK & Tool Use

Using the official Anthropic Python SDK. Defining tools with JSON schema. The tool-use request/response cycle. Handling multi-turn tool calls. Streaming responses. Extended Thinking for complex reasoning tasks.
7.6

OpenAI Agents SDK (Swarm / Assistants API)

OpenAI Assistants API: threads, runs, and built-in tools. The Swarm framework for lightweight multi-agent handoffs. Function calling best practices. File search and code interpreter as managed tools. Comparing Assistants API to custom frameworks.
7.7

Building Without a Framework: Vanilla Python

Why every agent builder should understand the raw API. Implementing a minimal ReAct loop in <100 lines of Python. Managing conversation state manually. When vanilla Python outperforms a framework (small, fast, auditable agents).

        LangGraph · Minimal Agent
from langgraph.graph import StateGraph, END
from langgraph.prebuilt import ToolNode

# Define the graph
graph = StateGraph(AgentState)
graph.add_node(“agent”, call_model)
graph.add_node(“tools”, ToolNode(tools))
graph.add_conditional_edges(“agent”, route_tools)
graph.add_edge(“tools”, “agent”)
graph.set_entry_point(“agent”)

app = graph.compile()
      

LangChain LangGraph LlamaIndex CrewAI AutoGen Anthropic SDK OpenAI SDK Haystack

Chapter Eight · Implementation

Multi-Agent Systems

Chapter Overview

Some tasks are too complex for a single agent. Multi-agent systems divide complex problems among specialised agents, enabling parallelism, specialisation, and cross-checking. This chapter shows you how to orchestrate them.

Multi-agent systems are the next step after mastering single agents. They bring modularity, specialisation, and parallelism — but also new coordination challenges that require careful design.

Topics Covered in This Chapter

8.1

Why Multi-Agent Systems?

The limits of a single agent. Benefits of parallelism, specialisation, and redundancy. When to split a workflow into multiple agents vs. keeping it in one. Cost vs. capability trade-offs. Real-world examples: coding pipelines, research workflows, customer support hierarchies.
8.2

Orchestrator–Subagent Pattern

A central orchestrator agent that breaks tasks into subtasks and delegates. Subagents specialised for specific tools or domains. Communication protocol between orchestrator and subagents. Error handling when a subagent fails.
8.3

Agent Communication Protocols

How agents pass information to each other: shared state, message passing, shared memory. Structured message formats (JSON payloads). Synchronous vs. asynchronous agent communication. Using message queues (Redis, Kafka) for decoupled agent pipelines.
8.4

Parallel Agent Execution

Running multiple agents simultaneously using asyncio, concurrent.futures, or task queues. Map-reduce patterns for agent workflows. Collecting and aggregating results from parallel agents. Race conditions and idempotency.
8.5

Debate & Verification: Critic Agents

Using a second agent to review, critique, or verify the first agent’s output. The Generator–Critic–Reviser pattern. Reducing hallucinations through adversarial checking. Implementing a fact-checking agent pipeline.
8.6

Shared Memory & Blackboard Architecture

A shared data store that all agents can read and write. The blackboard model from classical AI. Implementing shared memory with Redis or an in-memory dict. Conflict resolution when agents write to the same key.
8.7

Building a Full Multi-Agent Pipeline: End-to-End Example

Step-by-step walkthrough: a research + writing pipeline with four agents (Researcher, Fact-Checker, Writer, Editor). Defining agent roles, tools, and handoff logic. Running the pipeline and inspecting intermediate results. Iterating on the design.

🏗️ Architecture Tip

Start with the simplest possible multi-agent design: one orchestrator, two specialists. Only add agents when you can point to a specific bottleneck or capability gap. Every additional agent adds communication overhead, failure modes, and cost. Complexity is not a feature.

Chapter Nine · Production

Evaluation, Testing & Debugging

Chapter Overview

Agents are non-deterministic — the same input can produce different outputs on different runs. Rigorous evaluation and testing are the only way to build confidence in your agent before shipping it to real users.

An untested agent is a liability. This chapter teaches you to measure, stress-test, and systematically improve your agent’s performance before it ever touches production data.

Topics Covered in This Chapter

9.1

Defining Agent Success Metrics

Task completion rate, correctness, latency, cost-per-task, tool call accuracy, and hallucination rate. Writing a metrics spec before building. Goal-oriented metrics (did the agent achieve its goal?) vs. process metrics (did it take the right steps?).
9.2

Building an Evaluation Dataset (Evals)

What makes a good eval dataset. Curating representative, diverse, and adversarial test cases. Gold-standard answers vs. rubric-based evaluation. Avoiding data contamination. Using LLMs as evaluators (LLM-as-judge pattern).
9.3

Unit Testing Agent Components

Testing individual tools in isolation. Mocking LLM responses for deterministic tests with pytest. Testing tool schemas, input validation, and error handling. CI/CD pipelines that run agent tests on every code change.
9.4

End-to-End Agent Testing

Running full agent scenarios against a test environment. Record-and-replay testing. Canary testing with a subset of real traffic. A/B testing prompt changes. Regression suites that catch performance degradation.
9.5

Tracing, Logging & Observability

Capturing every step of an agent’s execution: LLM calls, tool calls, and state transitions. OpenTelemetry for agent traces. LangSmith, Langfuse, and Arize Phoenix for visual tracing. Structured logging for production analysis. Debugging a failed run from its trace.
9.6

Common Agent Failure Modes & Fixes

The taxonomy of agent failures: infinite loops, tool misuse, context overflow, hallucinated tool calls, planning failures, and prompt injection. Systematic debugging checklist. Pattern-matching symptoms to root causes. Defensive coding techniques.
9.7

Evaluation Frameworks: RAGAS, PromptFoo & Evals

Using RAGAS to evaluate RAG pipelines. PromptFoo for comparing prompt versions. OpenAI Evals library. Anthropic’s Constitutional AI evaluation approach. Building a lightweight custom eval harness in Python.

Failure Mode	Symptom	Root Cause	Fix
Infinite Loop	Agent never returns final answer	Missing termination condition	Add max_iterations limit
Tool Misuse	Wrong tool called for task	Ambiguous tool descriptions	Rewrite tool descriptions, add examples
Hallucinated Citation	Fake URLs or sources cited	No grounding / verification step	Add search + fact-check tool
Context Overflow	Truncated or forgotten history	Context window exceeded	Implement memory summarisation
Prompt Injection	Agent behaves unexpectedly	Malicious input in tool results	Sanitise tool outputs before context injection

Chapter Ten · Production

Safety, Ethics & Deployment

Chapter Overview

Deploying an agent in production is not just a technical act — it carries responsibility. This final chapter covers the principles, practices, and infrastructure needed to run agents safely, ethically, and reliably at scale.

The most capable agent is worthless — or dangerous — if it cannot be trusted. Safety, ethics, and production-grade infrastructure are not afterthoughts. They are the foundation on which every real-world agent is built.

Topics Covered in This Chapter

10.1

The Principle of Minimal Footprint

Why agents should request only the permissions they need. Scoping tool access by task. Time-limited credentials. Avoiding irreversible actions without confirmation. The principle of least privilege applied to AI agents.
10.2

Guardrails & Content Moderation

Input and output guardrails. Using moderation APIs (OpenAI Moderation, Llama Guard, Guardrails AI). Blocking harmful, off-topic, or policy-violating outputs. Rate limiting and abuse prevention. Designing graceful refusals that don’t break user experience.
10.3

Data Privacy & Compliance

What data flows through your agent and where it’s stored. GDPR and CCPA obligations for agent developers. Data minimisation in prompts. Avoiding PII in LLM context windows. On-premise vs. cloud model trade-offs for sensitive data.
10.4

Responsible AI Principles for Agent Builders

Fairness, accountability, transparency, and safety (FATE). Bias in agent outputs: sources, examples, and mitigations. Who is responsible when an agent causes harm? Documenting agent capabilities and limitations honestly. Communicating to users that they’re interacting with AI.
10.5

Deploying Agents: APIs, Serverless & Containers

Packaging an agent as a REST API with FastAPI or Flask. Serverless deployment on AWS Lambda, GCP Cloud Run, or Modal. Docker containers for portable, reproducible agents. Managing environment variables and secrets securely (Vault, AWS Secrets Manager).
10.6

Scaling Agents: Queues, Concurrency & Cost Control

Handling high request volumes with Celery, BullMQ, or AWS SQS. Async Python with asyncio for concurrent agent calls. LLM cost optimisation: caching, prompt compression, model routing. Setting hard cost limits per task. Horizontal scaling strategies.
10.7

Monitoring, Alerting & Incident Response

Production dashboards: latency P50/P95, error rate, cost/day. Alerts for cost spikes, failure rate increases, and unusual tool usage. Rollback strategies for bad deployments. Post-incident reviews for agent failures. The importance of human oversight in production agent systems.
10.8

What to Build Next: Your Agent Learning Path

Suggested projects for each experience level. Resources for going deeper: papers, courses, communities. The evolving agent ecosystem — what to watch in 2026 and beyond. From beginner to practitioner: a 90-day roadmap.

🛡️

Safety First

Never deploy an agent without guardrails. Define what your agent should never do before defining what it should do.

🔒

Privacy by Design

Minimise PII in prompts. Log only what you need. Treat every piece of user data as sensitive until proven otherwise.

📊

Monitor Everything

Agents fail in unexpected ways. Comprehensive tracing and alerting will catch issues before your users do.

🎓 Congratulations — You’ve Completed the Curriculum

By working through these ten chapters, you now have a comprehensive map of the AI agent landscape — from first principles to production deployment. The field moves fast. Stay curious, build projects, engage with the community, and revisit these foundations as you grow. The best agents you’ll ever build are ahead of you.

Suggested First Projects by Level

Wk 1

Hello Agent — Your First Tool-Using Bot

Build a simple agent that answers questions using a web search tool. No framework — raw API calls only. Goal: understand the request/response loop end-to-end.

Wk 2–3

Research Agent with RAG Memory

An agent that can ingest a set of PDF documents and answer questions about them using vector search. Add persistent memory across sessions.

Wk 4–6

Multi-Step Task Agent with LangGraph

A planning agent that can execute a 5-step task plan, handle failures, and ask for human confirmation on irreversible actions. Observe it with LangSmith.

Wk 7–10

Production-Ready Multi-Agent API

Deploy a multi-agent pipeline as a FastAPI service. Add guardrails, monitoring, cost tracking, and a test suite. Invite feedback from real users.

Recommended Resources & References

Anthropic — Building Effective Agents

Anthropic’s official guide covering patterns, architectures, and best practices for agentic systems — essential reading.

LangChain Documentation — Agents

Official LangChain docs on building agents with AgentExecutor, custom tools, and memory modules.

LangGraph — Official Documentation

Comprehensive guide to building stateful, graph-based agents with LangGraph, including human-in-the-loop patterns.

ReAct: Synergizing Reasoning and Acting in Language Models (arxiv)

The original ReAct paper by Shunyu Yao et al. — the foundational architecture behind most modern LLM agents.

Anthropic Claude — Tool Use Documentation

Official documentation on defining and using tools with Claude’s API, including JSON schema formats.

OpenAI — Function Calling Guide

OpenAI’s guide to function/tool calling with GPT-4, including structured output and parallel tool calls.

DeepLearning.AI — AI Agents in LangGraph (Short Course)

Free short course from Andrew Ng’s platform covering ReAct, memory, and multi-agent patterns with hands-on exercises.

CrewAI — Documentation

Official CrewAI docs for building role-based multi-agent teams, including crew composition and task assignment patterns.

Building AI AgentsFrom Scratch

Topics Covered in This Chapter

Definition of an AI Agent

The Perception–Reasoning–Action Loop

Agents vs. Chatbots vs. Automation Scripts

Types of AI Agents

Real-World Applications of AI Agents

The Agentic AI Landscape in 2025–2026

Topics Covered in This Chapter

Tokens, Context Windows & Latency

System Prompts, User Messages & Roles

Temperature, Top-P & Sampling Parameters

Hallucination & Grounding

Tool Calls & Function Calling

Reasoning Patterns: Chain-of-Thought & ReAct

Memory Types: In-Context, External & Episodic

Embeddings & Semantic Search

Topics Covered in This Chapter

How Large Language Models Work (Conceptually)

Surveying Available Models: GPT-4, Claude, Gemini, Llama

Choosing the Right Model for Your Agent

Fine-Tuning vs. Prompting vs. RAG

Multimodal Models: Text, Images, Audio & Code

Model APIs: REST, SDKs & Rate Limits

Topics Covered in This Chapter

Anatomy of a Great System Prompt

Zero-Shot, One-Shot & Few-Shot Prompting

Chain-of-Thought (CoT) Prompting

Structured Output Prompting: JSON, XML & Markdown

ReAct Prompting Pattern

Prompt Templates & Variables

Prompt Injection & Security

Topics Covered in This Chapter

What Is a Tool? Defining & Registering Tools

Web Search Tools

Code Execution Tools

File System & Document Tools

External API Tools (REST & GraphQL)

In-Context Memory Management

Vector Databases & Retrieval-Augmented Generation (RAG)

Persistent Memory: Redis, SQLite & Key-Value Stores

Search Tools

Code Execution

Vector Memory

Topics Covered in This Chapter

The ReAct Architecture (Reason + Act)

Plan-and-Execute Architecture

Reflexion: Self-Critiquing Agents

Tree of Thoughts (ToT)

Router Agents & Specialised Sub-Agents

Human-in-the-Loop (HITL) Patterns

State Machines & Workflow Graphs

Topics Covered in This Chapter

LangChain: Agents, Chains & Tools

LangGraph: Stateful, Graph-Based Agents

LlamaIndex: RAG & Data Agents

CrewAI: Role-Based Multi-Agent Teams

Anthropic Claude SDK & Tool Use

OpenAI Agents SDK (Swarm / Assistants API)

Building Without a Framework: Vanilla Python

Topics Covered in This Chapter

Why Multi-Agent Systems?

Orchestrator–Subagent Pattern

Agent Communication Protocols

Parallel Agent Execution

Debate & Verification: Critic Agents

Shared Memory & Blackboard Architecture

Building a Full Multi-Agent Pipeline: End-to-End Example

Topics Covered in This Chapter

Defining Agent Success Metrics

Building an Evaluation Dataset (Evals)

Unit Testing Agent Components

End-to-End Agent Testing

Tracing, Logging & Observability

Common Agent Failure Modes & Fixes

Evaluation Frameworks: RAGAS, PromptFoo & Evals

Topics Covered in This Chapter

The Principle of Minimal Footprint

Guardrails & Content Moderation

Data Privacy & Compliance

Building AI Agents
From Scratch