1.3 Compound AI Systems: A New Paradigm
1.4 The State Problem
1.5 The Case Study: Media Rights and Content Licensing Engine
1.6 Chapter Summary
The Multi-Agent Paradigm: Principles of Production AI
2.1 The Seven Properties of Production Multi-Agent Systems
Property 1: Composability
Property 2: Observability
Property 3: Reliability
Property 4: Security
Property 5: Evaluability
Property 6: Cost Accountability
Property 7: Governance
2.2 Agent Roles and Responsibilities
2.3 The Evolution from Chatbots to Autonomous Workflows
2.4 The Orchestrator Pattern
2.5 Agent Communication Protocols
2.6 Designing for Failure
2.7 Chapter Summary
Layers of Intelligence: Anatomy of the Agentic Stack
3.1 The LLM Layer
3.2 The Memory Layer
3.3 The Tool Layer
3.4 The Orchestration Layer
3.5 The Evaluation Layer
3.6 The Observability Layer
3.7 The Security Layer
3.8 Chapter Summary
Framework Wars: LangGraph, CrewAI, AutoGen, and Semantic Kernel
4.1 LangGraph: Graph-Based Workflow Orchestration
4.2 CrewAI: Role-Based Collaborative Agents
4.3 AutoGen: Conversational Multi-Agent Coordination
4.4 Semantic Kernel: Enterprise Integration Focus
4.5 The Case for Framework Independence
4.6 Chapter Summary
Stack Selection and Integration Strategy
5.1 The Reference Stack
5.2 The Pydantic Contract
5.3 Integrating the Reference Stack
5.4 Chapter Summary
Designing Agents: Roles, Capabilities, and Reasoning Strategies
6.1 The Single Responsibility Principle for Agents
6.2 Reasoning Strategies
Direct Completion
Chain-of-Thought (CoT)
ReAct: Reason + Act
6.3 Task Decomposition
6.4 State Persistence Strategies
6.5 Chapter Summary
Design Patterns for Multi-Agent Systems
7.1 The Supervisor Pattern
7.2 The Planner-Executor Pattern
7.3 Hierarchical Agent Networks
7.4 The Critic-Actor Pattern
7.5 The Fan-Out/Fan-In Pattern
7.6 Chapter Summary
State Machines and Persistence: Building Durable Workflows
8.1 Modeling Workflows as State Machines
8.2 Event Sourcing for Agent Workflows
8.3 The Dead-Agent Queue
8.4 Chapter Summary
Connecting Agents to the Real World: APIs, Databases, and Documents
9.1 The Anatomy of a Production Tool
9.2 SQL Query Tools
9.3 Document Parsing Tools
9.4 Knowledge Graph Tools
9.5 Chapter Summary
Typed Agent Communication: The Pydantic Protocol
10.1 Why Typed Schemas Are Non-Negotiable
10.2 Designing Schema Hierarchies
10.3 LLM Structured Output Configuration
10.4 Schema Evolution and Versioning
10.5 Chapter Summary
Building the Case Study Tool Layer
11.1 Contract Retrieval Tools
11.2 Regulatory Database Tools
11.3 Pricing Calculation Tools
11.4 Chapter Summary
Approval Workflows: Engineering Human Oversight
12.1 The Four Modes of Human Involvement
12.2 Implementing Durable Pauses with LangGraph
12.3 Confidence Scoring and Automatic Escalation
12.4 The Reviewer Interface
12.5 Chapter Summary
Risk Escalation, Manual Overrides, and Audit Trails
13.1 Structured Risk Escalation
13.2 Manual Override Design
13.3 Immutable Audit Trails
13.4 Chapter Summary
Enterprise RAG: Beyond Basic Vector Search
14.1 The Failures of Basic RAG
14.2 Hybrid Search: Combining Semantic and Lexical Retrieval
14.3 Query Rewriting
14.4 Multi-Stage Retrieval
Measuring What Matters: Evaluation Frameworks for Agent Systems
16.1 The Evaluation Stack
16.2 Building Evaluation Datasets
16.3 LLM-as-Judge for Subjective Quality
16.4 Regression Testing for Agent Systems
16.5 Chapter Summary
Synthetic Testing and Edge-Case Simulation
17.1 Generating Synthetic Test Cases
17.2 Adversarial Testing
17.3 Failure Mode Taxonomy
17.4 Chapter Summary
Tracing Multi-Agent Workflows: Seeing Inside the Black Box
18.1 Distributed Tracing for Multi-Agent Systems
18.2 LangSmith Integration
18.3 Debugging Multi-Step Workflow Failures
18.4 Token Usage Monitoring
18.5 Chapter Summary
Production Debugging: Diagnosing Agent Failures in the Wild
19.1 Categories of Production Failures
19.2 Reproducing Intermittent Failures
19.3 Operational Runbooks
19.4 Chapter Summary
Fault Tolerance: Retries, Timeouts, and the Dead-Agent Queue
20.1 Transient vs. Persistent Failures
20.2 Circuit Breakers for Agent Reliability
20.3 Timeout Budgets
20.4 Loop Detection
20.5 The Dead-Agent Queue: Complete Implementation
20.6 Chapter Summary
Event-Driven Agent Architectures
21.1 Event-Driven vs. Request-Response Architectures
21.2 Designing the Event Schema
21.3 The Saga Pattern for Distributed Transactions
21.4 Chapter Summary
Token Economics: Understanding and Controlling AI Costs
22.1 The Cost Model for Multi-Agent Systems
22.2 Model Routing Strategies
22.3 Response Caching
22.4 Prompt Efficiency
22.5 Chapter Summary
Cost Optimization at Scale: Rate Limiting, Caching, and Loop Prevention
23.1 Rate Limiting Architecture
23.2 Semantic Caching
23.3 Batch Processing for Cost Efficiency
23.4 Chapter Summary
Distributed Agent Deployment: Microservices and Containers
24.1 Agent Microservice Architecture
24.2 Kubernetes Deployment for Agent Services
24.3 Queue-Based Agent Scaling
24.4 Multi-Region Deployment
24.5 Chapter Summary
Cloud-Native Agent Infrastructure: AWS, Azure, and GCP
25.1 AWS: Bedrock, SageMaker, and Step Functions
25.2 Azure: OpenAI Service and AI Foundry
25.3 GCP: Vertex AI and Gemini
25.4 Chapter Summary
Emerging Paradigms: Compound AI, Agent Economies, and Self-Improving Systems
26.1 Compound AI Systems: Combining Neural and Symbolic Reasoning
26.2 Agent Operating Systems
26.3 Self-Improving Agent Systems
26.4 Agent Economies
26.5 What Changes and What Stays the Same
26.6 Chapter Summary
Agent Identity and Access Management
27.1 The Agent Identity Model
27.2 Authentication: Proving Agent Identity
27.3 Role-Based Access Control for Agents
27.4 Agent-to-Agent Authorization
27.5 Secret Management Architecture
27.6 Chapter Summary
Prompt Injection Defense: The Agent Attack Surface
28.1 The Taxonomy of Prompt Injection
28.2 Structural Defense: Prompt Architecture
28.3 Input Sanitization Layer
28.4 Output Validation as Injection Containment
28.5 Multi-Hop Injection Defense
28.6 Chapter Summary
Sandboxing Tool Execution
29.1 The Threat Model for Tool Execution
29.2 The Tool Permission Model
29.3 Container Isolation for Code Execution
29.4 Seccomp and Linux Capability Restriction
29.5 Chapter Summary
Data Sovereignty and Cross-Border Compliance
30.1 The Regulatory Landscape
30.2 Data Classification Architecture
30.3 The Right to Erasure in Multi-Agent Systems
30.4 Chapter Summary
Agent Versioning, Drift Detection, and A/B Testing
31.1 Agent Versioning
31.2 Model Drift Detection with CUSUM
31.3 Champion-Challenger A/B Testing
About the Author