Section 01
Study Guide
Long-form notes and references.
Domain 5 covers the disciplines that keep Claude grounded across long, complex tasks: context architecture, retrieval, summarization, caching, and reliability patterns for production agents and large knowledge bases.
North Star
Long-context reliability comes from intelligent context engineering and deterministic architecture — not from blindly increasing the token window.
Glossary of Key Terms
- Lost-in-the-Middle
- The structural tendency of LLMs to attend more strongly to the beginning and end of context while weakening attention in the middle.
- Pinned Preamble
- A static, never-removed section at the top of the prompt holding system instructions, safety policies, agent identity, and critical constraints.
- Sliding Window
- A context-management pattern that keeps only the most recent N turns active, evicting the oldest as new turns arrive.
- Rolling Summary
- A compressed structured representation of older conversation turns preserving key facts, decisions, goals, and open tasks.
- External Memory Layer
- Persistent state stored outside the model (DB, vector store, Redis, JSON file) and reinjected on demand — long-term memory engineered separately from the context window.
- Progressive Summarization at Threshold
- Proactively compressing older context when usage hits ~70–85% of the window, before reasoning quality degrades.
- Prompt Caching
- Reusing a precomputed static prefix across requests for ~80–90% input-cost savings, lower latency, and stable high-attention placement.
- Heartbeat / Warm-up Cron
- Periodic lightweight requests sent to keep a prompt cache warm and prevent cold-start latency spikes.
- Message Batches API
- An asynchronous bulk-processing API for high-volume, non-urgent, cost-sensitive workloads, typically with up to a 24-hour completion window.
- Real-time API
- The synchronous, low-latency API used for chatbots, live support, and interactive multi-turn workflows.
- Transient Failure
- A temporary, recoverable error (5xx, 429, timeouts, network blips) that should be retried with backoff and jitter.
- Permanent Failure
- A structural or invalid condition (401, 403, malformed JSON, schema violation) that will not succeed on retry — fail fast.
- Exponential Backoff
- A retry strategy that doubles (or otherwise grows) the wait interval between attempts to relieve pressure on a struggling service.
- Jitter
- Randomness added to retry delays to desynchronize clients and prevent thundering-herd retry storms.
- Thundering Herd
- Synchronized traffic spikes caused when many clients fail and retry at exactly the same time, overwhelming an already-stressed service.
- Boundary Validation
- Deterministic schema/type/business-rule checks applied between agents to contain malformed state before it cascades.
- Silent Failure
- An execution that returns successfully at the API level but produces semantically invalid or corrupted internal state.
- Retrieve-and-Read
- A two-pass pattern: first retrieve the most relevant chunks, then perform deep reasoning only over that focused subset.
- RAG (Retrieval-Augmented Generation)
- An architecture that retrieves relevant external evidence first and generates an answer over it, instead of relying on the model's parametric memory or full-context dumps.
- Attention Dilution
- Loss of effective focus on important tokens caused by overloading the prompt with too much irrelevant or middle-positioned content.