Building AI Agents
From Scratch
A comprehensive, chapter-by-chapter curriculum for beginners who want to understand, design, and build intelligent AI agents — from first principles to production-ready systems.
AI agents are autonomous software systems that perceive their environment, reason about it, and take actions to achieve goals. This guide gives you a structured, beginner-friendly path from understanding what an agent is — all the way to building, testing, and deploying one safely.
Each chapter builds on the previous one. Chapters 1–3 cover theory and foundations. Chapters 4–6 introduce the key building blocks every agent needs. Chapters 7–8 show you how to code real agents using popular frameworks. Chapters 9–10 prepare you to ship agents responsibly. Beginners are encouraged to read chapters sequentially before jumping to implementation.
Before writing a single line of code, you need a crisp mental model of what an AI agent actually is, how it differs from traditional software, and where it fits in the broader AI landscape. This chapter builds that foundation.
An AI agent is a software entity that senses its environment through inputs, processes those inputs using an AI model, decides on actions, and executes those actions — often in a loop, without constant human direction.
“An AI agent is any system that perceives its environment and takes actions that maximise its chances of achieving its goals.”— Russell & Norvig, Artificial Intelligence: A Modern Approach
Topics Covered in This Chapter
-
1.1
Definition of an AI Agent
What separates an agent from a regular program. The four defining properties: autonomy, reactivity, pro-activeness, and social ability. Why the word “agent” matters and how it differs from “chatbot”, “assistant”, or “script”.
-
1.2
The Perception–Reasoning–Action Loop
The canonical agent loop: observe → think → act → observe again. How inputs (text, images, API data) feed reasoning, and how actions (tool calls, API requests, file writes) feed back into the environment.
-
1.3
Agents vs. Chatbots vs. Automation Scripts
A clear comparison of agents, traditional chatbots, RPA bots, and workflow automation. When to use an agent vs. a simpler tool. Understanding the spectrum from deterministic to autonomous behaviour.
-
1.4
Types of AI Agents
Simple reflex agents, model-based reflex agents, goal-based agents, utility-based agents, and learning agents. Real-world examples of each type with concrete use cases (customer support, coding assistant, research agent, etc.).
-
1.5
Real-World Applications of AI Agents
Agents in use today: GitHub Copilot Workspace, Devin, AutoGPT, Claude Computer Use, Salesforce Agentforce. Industry verticals — customer service, software development, healthcare, finance, and research — and what makes each domain suitable for agents.
-
1.6
The Agentic AI Landscape in 2025–2026
Where the field stands today. The shift from single-turn LLM calls to multi-step, tool-using agents. Key milestones: ReAct, Toolformer, GPT-4 function calling, Claude tool use, and the emergence of full agentic frameworks.
Acts purely on current perception. No memory, no planning. Fast and predictable but limited.
SimplestWorks toward a defined objective. Plans sequences of actions. Most common LLM-powered type.
CommonImproves behaviour from experience. Requires feedback loops and evaluation. Closest to AGI.
AdvancedAI agent development has its own vocabulary. Mastering these terms early prevents confusion when reading documentation, papers, and framework guides. Every concept below appears repeatedly in later chapters.
Like any engineering discipline, building AI agents requires command of its language. This chapter defines every key term you will encounter — from tokens and context windows to tool calls and hallucinations.
Topics Covered in This Chapter
-
2.1
Tokens, Context Windows & Latency
What a token is and why it matters for cost and speed. Context window limits (4K, 8K, 128K, 1M tokens). How latency compounds in multi-step agents. Practical rules of thumb for context management.
-
2.2
System Prompts, User Messages & Roles
The message structure of LLM APIs: system, user, assistant roles. How to craft an effective system prompt for an agent. The difference between stateless and stateful message histories.
-
2.3
Temperature, Top-P & Sampling Parameters
How temperature controls randomness. What top-p (nucleus sampling) does. Practical settings for different agent tasks: deterministic code generation (low temp) vs. creative brainstorming (high temp).
-
2.4
Hallucination & Grounding
Why LLMs confidently produce wrong facts. Grounding techniques: RAG, tool use, structured output validation. How to design agents that minimise hallucination risk through verification steps.
-
2.5
Tool Calls & Function Calling
What a “tool” is in agent terms. How OpenAI, Anthropic, and Gemini implement function/tool calling. JSON schema definitions for tools. The tool call → tool result → continuation cycle.
-
2.6
Reasoning Patterns: Chain-of-Thought & ReAct
Chain-of-Thought (CoT) prompting and why it improves complex reasoning. The ReAct (Reason + Act) framework: interleaving thinking and tool use. Scratchpad reasoning vs. final answer output.
-
2.7
Memory Types: In-Context, External & Episodic
In-context memory (conversation history). External memory (vector databases, key-value stores). Episodic memory (summaries of past interactions). When to use each and the trade-offs involved.
-
2.8
Embeddings & Semantic Search
How text is converted into numerical vectors. Cosine similarity for semantic matching. The role of embeddings in retrieval-augmented generation (RAG). Popular embedding models: OpenAI ada-002, Cohere, sentence-transformers.
Think of an LLM as the “brain” of an agent. It receives context (everything in the context window), reasons about it, and outputs either a final answer or a tool call request. The scaffolding around the LLM — memory, tools, loops — is what turns a brain into an agent.
Almost every modern AI agent is built on a Large Language Model. Understanding what LLMs can and cannot do — and how to choose the right one — is essential before writing agent code.
Large Language Models are the reasoning engines of modern AI agents. Choosing the right model and understanding its capabilities and limits will determine what your agent can reliably accomplish.
Topics Covered in This Chapter
-
3.1
How Large Language Models Work (Conceptually)
Transformers and self-attention explained for beginners. Pre-training on text corpora. Instruction tuning and RLHF (Reinforcement Learning from Human Feedback). Why scale matters and what “emergent capabilities” mean in practice.
-
3.2
Surveying Available Models: GPT-4, Claude, Gemini, Llama
Comparative overview of frontier models: OpenAI GPT-4o, Anthropic Claude 4, Google Gemini 2, Meta Llama 3. Strengths and weaknesses of each. Open-source vs. proprietary trade-offs.
-
3.3
Choosing the Right Model for Your Agent
Decision framework: task complexity, latency requirements, cost per token, context window needs, tool-calling support, and data privacy. When to use a smaller model (Haiku, GPT-4o-mini) vs. a frontier model.
-
3.4
Fine-Tuning vs. Prompting vs. RAG
Three ways to specialise a model for your domain. Prompt engineering: zero-shot and few-shot. Fine-tuning: when it’s worth the cost. RAG: injecting real-time knowledge without re-training. Practical decision tree for beginners.
-
3.5
Multimodal Models: Text, Images, Audio & Code
How vision-language models work. Using image input for agents that read screenshots, diagrams, or documents. Code-specific models (CodeLlama, GitHub Copilot models). When to route tasks to specialised models.
-
3.6
Model APIs: REST, SDKs & Rate Limits
Making your first API call to OpenAI or Anthropic. Understanding request/response structure. Managing API keys securely. Rate limits, retries, and exponential backoff. Cost tracking and token budgets.
| Model | Provider | Context Window | Tool Calling | Best For |
|---|---|---|---|---|
| GPT-4o | OpenAI | 128K tokens | ✅ Native | Versatile agents, vision tasks |
| Claude Sonnet 4 | Anthropic | 200K tokens | ✅ Native | Reasoning, long documents |
| Gemini 2 Flash | 1M tokens | ✅ Native | Very long context tasks | |
| Llama 3.1 70B | Meta (OSS) | 128K tokens | ✅ via Ollama | On-premise, privacy-sensitive |
| Mistral Large | Mistral AI | 128K tokens | ✅ Native | European data residency |
Prompt engineering is the craft of communicating effectively with LLMs. For agents, it goes far beyond simple question-answering — it involves structuring multi-step reasoning, defining personas, constraining output formats, and orchestrating tool use.
The quality of your agent’s outputs is directly proportional to the quality of your prompts. Prompt engineering for agents is a discipline in itself — one that rewards careful study.
Topics Covered in This Chapter
-
4.1
Anatomy of a Great System Prompt
The five elements of an effective agent system prompt: role definition, capability declaration, behavioural constraints, output format instructions, and tool use guidelines. Before-and-after examples comparing weak and strong prompts.
-
4.2
Zero-Shot, One-Shot & Few-Shot Prompting
Zero-shot: ask without examples. One-shot: one example. Few-shot: 2–8 examples. How to select good examples. The diminishing returns of adding too many examples. When few-shot prompting outperforms fine-tuning.
-
4.3
Chain-of-Thought (CoT) Prompting
“Think step by step” and why it works. Automatic CoT vs. manual CoT. Self-consistency: sampling multiple reasoning chains and picking the most common answer. CoT for arithmetic, logic, planning, and coding tasks.
-
4.4
Structured Output Prompting: JSON, XML & Markdown
Instructing models to return machine-parseable output. JSON mode in OpenAI API. XML tags in Anthropic Claude. Pydantic models for output validation. Why structured output is essential for reliable agents.
-
4.5
ReAct Prompting Pattern
Implementing the Reason + Act pattern in your prompts. Structuring Thought / Action / Observation triplets. How ReAct agents decide when to use a tool vs. answer directly. Debugging ReAct traces to improve performance.
-
4.6
Prompt Templates & Variables
Building reusable prompt templates with Jinja2 or f-strings. Dynamic injection of user context, tool results, and memory. Prompt versioning and change management. Testing prompt changes safely.
-
4.7
Prompt Injection & Security
What prompt injection is and why it’s a critical agent vulnerability. Direct vs. indirect injection attacks. Defensive techniques: input sanitisation, privilege separation, sandboxing. Real-world attack examples and mitigations.
Always test your prompts against adversarial inputs before deploying an agent. Ask yourself: “What would happen if a malicious user tried to override my system prompt?” Build constraints that hold even under pressure.
An LLM alone cannot browse the web, run code, or remember past conversations. Tools and memory are what transform a language model into a capable agent. This chapter teaches you to build and integrate both.
Tools extend what an agent can do. Memory extends what it can know. Together, they bridge the gap between a static language model and a dynamic, capable agent that acts in the real world.
Topics Covered in This Chapter
-
5.1
What Is a Tool? Defining & Registering Tools
Tools as functions exposed to the LLM. JSON schema definitions: name, description, parameters, required fields. Why clear tool descriptions matter for model decision-making. Registering tools with OpenAI, Anthropic, and LangChain.
-
5.2
Web Search Tools
Integrating Serper, Tavily, Brave Search, and Bing Search APIs. Designing a search tool that returns structured results. Handling pagination, rate limits, and irrelevant results. Building a research agent with web search as its backbone.
-
5.3
Code Execution Tools
Giving an agent the ability to write and run Python code. Using sandboxed environments (E2B, Modal, Docker) for safe execution. The code interpreter pattern. Handling errors and feeding results back into the agent loop.
-
5.4
File System & Document Tools
Reading, writing, and searching files. PDF parsing with PyMuPDF. Spreadsheet tools with openpyxl. Building a document QA agent. Handling large files without blowing the context window.
-
5.5
External API Tools (REST & GraphQL)
Wrapping any REST API as an agent tool. Authentication (API keys, OAuth 2.0). Error handling and retries. Auto-generating tool schemas from OpenAPI specs. Example: building a weather tool, a calendar tool, a Jira tool.
-
5.6
In-Context Memory Management
Managing the conversation history array. Strategies for long conversations: sliding window, summarisation, selective retention. Token counting with tiktoken. When context grows too large — what to do and what not to do.
-
5.7
Vector Databases & Retrieval-Augmented Generation (RAG)
What a vector database is and why it matters. Embedding documents and storing them in Chroma, Pinecone, Weaviate, or pgvector. Semantic retrieval at query time. Building a basic RAG pipeline from scratch. Chunking strategies that affect retrieval quality.
-
5.8
Persistent Memory: Redis, SQLite & Key-Value Stores
Saving agent state between sessions. Using Redis for fast key-value memory. SQLite for structured agent logs and user profiles. Designing a memory schema that agents can query efficiently. Privacy considerations when storing user data.
Search Tools
Connect your agent to live web information. Essential for research, fact-checking, and current events.
Code Execution
Let your agent write and run Python to perform calculations, data analysis, and automations.
Vector Memory
Enable your agent to recall relevant information from large document stores using semantic search.
How you structure an agent’s decision loop, planning strategy, and tool-use pattern has a huge impact on reliability and capability. This chapter covers the major architectures every agent builder needs to understand.
Architecture is to agents what algorithms are to data structures — the invisible skeleton that determines capability, efficiency, and failure modes. Choosing the right pattern for your task is half the battle.
Topics Covered in This Chapter
-
6.1
The ReAct Architecture (Reason + Act)
The foundational pattern for most LLM agents. How the model interleaves reasoning steps and tool calls. Implementing a ReAct loop from scratch in Python. Debugging: reading a ReAct trace and identifying failure points.
-
6.2
Plan-and-Execute Architecture
Two-phase agents: a planner LLM generates a task list, an executor LLM carries out each step. Advantages over ReAct for complex, multi-day tasks. Handling plan revisions when steps fail. OpenAI’s task decomposition approach.
-
6.3
Reflexion: Self-Critiquing Agents
Agents that evaluate their own outputs and revise them. The Reflexion pattern: act → evaluate → reflect → retry. How verbal reinforcement improves agent performance without gradient descent. Implementing a self-correction loop.
-
6.4
Tree of Thoughts (ToT)
Exploring multiple reasoning paths in parallel. When branching outperforms linear reasoning. Beam search and DFS/BFS strategies for thought trees. Computational cost trade-offs. Best suited for creative problem-solving and puzzles.
-
6.5
Router Agents & Specialised Sub-Agents
A dispatcher agent that routes tasks to the best-fit specialist. Designing a router prompt that accurately classifies intents. Building a library of specialist agents (code agent, search agent, data agent). Cascading failures and fallback strategies.
-
6.6
Human-in-the-Loop (HITL) Patterns
When agents should pause and ask for confirmation. Designing approval gates for high-stakes actions. Async human review in long-running agent workflows. Balancing autonomy with oversight — finding the right intervention points.
-
6.7
State Machines & Workflow Graphs
Modelling agent behaviour as a directed graph of states. LangGraph’s graph-based approach. Benefits over simple loops: predictability, debuggability, auditability. Handling conditional branches and parallel execution.
“The best architecture for an agent is the simplest one that reliably solves your problem — start with ReAct, only add complexity when you hit its limits.”
— Agent Engineering Best Practice, 2025You don’t have to build everything from scratch. A rich ecosystem of open-source frameworks makes agent development faster. This chapter gives you hands-on guidance for the most popular tools.
Frameworks like LangChain, LlamaIndex, and CrewAI provide pre-built components for tool integration, memory, and agent loops — letting you focus on your agent’s purpose rather than its plumbing.
Topics Covered in This Chapter
-
7.1
LangChain: Agents, Chains & Tools
LangChain’s core abstractions: LLMs, Chains, Agents, Tools, and Memory. Building a ReAct agent with AgentExecutor. Using LangChain’s built-in tool library. When LangChain is a good choice — and when it adds unnecessary complexity.
-
7.2
LangGraph: Stateful, Graph-Based Agents
LangGraph’s node/edge model. Adding persistence with checkpointers. Building branching, cyclical agent workflows. Human-in-the-loop with interrupt() and resume(). Streaming agent outputs in real time. Deploying with LangServe.
-
7.3
LlamaIndex: RAG & Data Agents
LlamaIndex’s Document → Node → Index → Query pipeline. Building an agent over a PDF library. Sub-question query engine for complex questions. LlamaIndex vs. LangChain for knowledge-intensive agents.
-
7.4
CrewAI: Role-Based Multi-Agent Teams
Defining Agents with roles, backstories, and goals. Assigning Tasks to agents. Crew orchestration: sequential vs. hierarchical process. Use cases: content teams, research crews, engineering squads. CrewAI vs. LangGraph trade-offs.
-
7.5
Anthropic Claude SDK & Tool Use
Using the official Anthropic Python SDK. Defining tools with JSON schema. The tool-use request/response cycle. Handling multi-turn tool calls. Streaming responses. Extended Thinking for complex reasoning tasks.
-
7.6
OpenAI Agents SDK (Swarm / Assistants API)
OpenAI Assistants API: threads, runs, and built-in tools. The Swarm framework for lightweight multi-agent handoffs. Function calling best practices. File search and code interpreter as managed tools. Comparing Assistants API to custom frameworks.
-
7.7
Building Without a Framework: Vanilla Python
Why every agent builder should understand the raw API. Implementing a minimal ReAct loop in <100 lines of Python. Managing conversation state manually. When vanilla Python outperforms a framework (small, fast, auditable agents).
Some tasks are too complex for a single agent. Multi-agent systems divide complex problems among specialised agents, enabling parallelism, specialisation, and cross-checking. This chapter shows you how to orchestrate them.
Multi-agent systems are the next step after mastering single agents. They bring modularity, specialisation, and parallelism — but also new coordination challenges that require careful design.
Topics Covered in This Chapter
-
8.1
Why Multi-Agent Systems?
The limits of a single agent. Benefits of parallelism, specialisation, and redundancy. When to split a workflow into multiple agents vs. keeping it in one. Cost vs. capability trade-offs. Real-world examples: coding pipelines, research workflows, customer support hierarchies.
-
8.2
Orchestrator–Subagent Pattern
A central orchestrator agent that breaks tasks into subtasks and delegates. Subagents specialised for specific tools or domains. Communication protocol between orchestrator and subagents. Error handling when a subagent fails.
-
8.3
Agent Communication Protocols
How agents pass information to each other: shared state, message passing, shared memory. Structured message formats (JSON payloads). Synchronous vs. asynchronous agent communication. Using message queues (Redis, Kafka) for decoupled agent pipelines.
-
8.4
Parallel Agent Execution
Running multiple agents simultaneously using asyncio, concurrent.futures, or task queues. Map-reduce patterns for agent workflows. Collecting and aggregating results from parallel agents. Race conditions and idempotency.
-
8.5
Debate & Verification: Critic Agents
Using a second agent to review, critique, or verify the first agent’s output. The Generator–Critic–Reviser pattern. Reducing hallucinations through adversarial checking. Implementing a fact-checking agent pipeline.
-
8.6
Shared Memory & Blackboard Architecture
A shared data store that all agents can read and write. The blackboard model from classical AI. Implementing shared memory with Redis or an in-memory dict. Conflict resolution when agents write to the same key.
-
8.7
Building a Full Multi-Agent Pipeline: End-to-End Example
Step-by-step walkthrough: a research + writing pipeline with four agents (Researcher, Fact-Checker, Writer, Editor). Defining agent roles, tools, and handoff logic. Running the pipeline and inspecting intermediate results. Iterating on the design.
Start with the simplest possible multi-agent design: one orchestrator, two specialists. Only add agents when you can point to a specific bottleneck or capability gap. Every additional agent adds communication overhead, failure modes, and cost. Complexity is not a feature.
Agents are non-deterministic — the same input can produce different outputs on different runs. Rigorous evaluation and testing are the only way to build confidence in your agent before shipping it to real users.
An untested agent is a liability. This chapter teaches you to measure, stress-test, and systematically improve your agent’s performance before it ever touches production data.
Topics Covered in This Chapter
-
9.1
Defining Agent Success Metrics
Task completion rate, correctness, latency, cost-per-task, tool call accuracy, and hallucination rate. Writing a metrics spec before building. Goal-oriented metrics (did the agent achieve its goal?) vs. process metrics (did it take the right steps?).
-
9.2
Building an Evaluation Dataset (Evals)
What makes a good eval dataset. Curating representative, diverse, and adversarial test cases. Gold-standard answers vs. rubric-based evaluation. Avoiding data contamination. Using LLMs as evaluators (LLM-as-judge pattern).
-
9.3
Unit Testing Agent Components
Testing individual tools in isolation. Mocking LLM responses for deterministic tests with pytest. Testing tool schemas, input validation, and error handling. CI/CD pipelines that run agent tests on every code change.
-
9.4
End-to-End Agent Testing
Running full agent scenarios against a test environment. Record-and-replay testing. Canary testing with a subset of real traffic. A/B testing prompt changes. Regression suites that catch performance degradation.
-
9.5
Tracing, Logging & Observability
Capturing every step of an agent’s execution: LLM calls, tool calls, and state transitions. OpenTelemetry for agent traces. LangSmith, Langfuse, and Arize Phoenix for visual tracing. Structured logging for production analysis. Debugging a failed run from its trace.
-
9.6
Common Agent Failure Modes & Fixes
The taxonomy of agent failures: infinite loops, tool misuse, context overflow, hallucinated tool calls, planning failures, and prompt injection. Systematic debugging checklist. Pattern-matching symptoms to root causes. Defensive coding techniques.
-
9.7
Evaluation Frameworks: RAGAS, PromptFoo & Evals
Using RAGAS to evaluate RAG pipelines. PromptFoo for comparing prompt versions. OpenAI Evals library. Anthropic’s Constitutional AI evaluation approach. Building a lightweight custom eval harness in Python.
| Failure Mode | Symptom | Root Cause | Fix |
|---|---|---|---|
| Infinite Loop | Agent never returns final answer | Missing termination condition | Add max_iterations limit |
| Tool Misuse | Wrong tool called for task | Ambiguous tool descriptions | Rewrite tool descriptions, add examples |
| Hallucinated Citation | Fake URLs or sources cited | No grounding / verification step | Add search + fact-check tool |
| Context Overflow | Truncated or forgotten history | Context window exceeded | Implement memory summarisation |
| Prompt Injection | Agent behaves unexpectedly | Malicious input in tool results | Sanitise tool outputs before context injection |
Deploying an agent in production is not just a technical act — it carries responsibility. This final chapter covers the principles, practices, and infrastructure needed to run agents safely, ethically, and reliably at scale.
The most capable agent is worthless — or dangerous — if it cannot be trusted. Safety, ethics, and production-grade infrastructure are not afterthoughts. They are the foundation on which every real-world agent is built.
Topics Covered in This Chapter
-
10.1
The Principle of Minimal Footprint
Why agents should request only the permissions they need. Scoping tool access by task. Time-limited credentials. Avoiding irreversible actions without confirmation. The principle of least privilege applied to AI agents.
-
10.2
Guardrails & Content Moderation
Input and output guardrails. Using moderation APIs (OpenAI Moderation, Llama Guard, Guardrails AI). Blocking harmful, off-topic, or policy-violating outputs. Rate limiting and abuse prevention. Designing graceful refusals that don’t break user experience.
-
10.3
Data Privacy & Compliance
What data flows through your agent and where it’s stored. GDPR and CCPA obligations for agent developers. Data minimisation in prompts. Avoiding PII in LLM context windows. On-premise vs. cloud model trade-offs for sensitive data.
-
10.4
Responsible AI Principles for Agent Builders
Fairness, accountability, transparency, and safety (FATE). Bias in agent outputs: sources, examples, and mitigations. Who is responsible when an agent causes harm? Documenting agent capabilities and limitations honestly. Communicating to users that they’re interacting with AI.
-
10.5
Deploying Agents: APIs, Serverless & Containers
Packaging an agent as a REST API with FastAPI or Flask. Serverless deployment on AWS Lambda, GCP Cloud Run, or Modal. Docker containers for portable, reproducible agents. Managing environment variables and secrets securely (Vault, AWS Secrets Manager).
-
10.6
Scaling Agents: Queues, Concurrency & Cost Control
Handling high request volumes with Celery, BullMQ, or AWS SQS. Async Python with asyncio for concurrent agent calls. LLM cost optimisation: caching, prompt compression, model routing. Setting hard cost limits per task. Horizontal scaling strategies.
-
10.7
Monitoring, Alerting & Incident Response
Production dashboards: latency P50/P95, error rate, cost/day. Alerts for cost spikes, failure rate increases, and unusual tool usage. Rollback strategies for bad deployments. Post-incident reviews for agent failures. The importance of human oversight in production agent systems.
-
10.8
What to Build Next: Your Agent Learning Path
Suggested projects for each experience level. Resources for going deeper: papers, courses, communities. The evolving agent ecosystem — what to watch in 2026 and beyond. From beginner to practitioner: a 90-day roadmap.
Safety First
Never deploy an agent without guardrails. Define what your agent should never do before defining what it should do.
Privacy by Design
Minimise PII in prompts. Log only what you need. Treat every piece of user data as sensitive until proven otherwise.
Monitor Everything
Agents fail in unexpected ways. Comprehensive tracing and alerting will catch issues before your users do.
By working through these ten chapters, you now have a comprehensive map of the AI agent landscape — from first principles to production deployment. The field moves fast. Stay curious, build projects, engage with the community, and revisit these foundations as you grow. The best agents you’ll ever build are ahead of you.
Suggested First Projects by Level
Hello Agent — Your First Tool-Using Bot
Build a simple agent that answers questions using a web search tool. No framework — raw API calls only. Goal: understand the request/response loop end-to-end.
Research Agent with RAG Memory
An agent that can ingest a set of PDF documents and answer questions about them using vector search. Add persistent memory across sessions.
Multi-Step Task Agent with LangGraph
A planning agent that can execute a 5-step task plan, handle failures, and ask for human confirmation on irreversible actions. Observe it with LangSmith.
Production-Ready Multi-Agent API
Deploy a multi-agent pipeline as a FastAPI service. Add guardrails, monitoring, cost tracking, and a test suite. Invite feedback from real users.
Recommended Resources & References
Anthropic’s official guide covering patterns, architectures, and best practices for agentic systems — essential reading.
Official LangChain docs on building agents with AgentExecutor, custom tools, and memory modules.
Comprehensive guide to building stateful, graph-based agents with LangGraph, including human-in-the-loop patterns.
The original ReAct paper by Shunyu Yao et al. — the foundational architecture behind most modern LLM agents.
Official documentation on defining and using tools with Claude’s API, including JSON schema formats.
OpenAI’s guide to function/tool calling with GPT-4, including structured output and parallel tool calls.
Free short course from Andrew Ng’s platform covering ReAct, memory, and multi-agent patterns with hands-on exercises.
Official CrewAI docs for building role-based multi-agent teams, including crew composition and task assignment patterns.