
Context Management & Reliability

Context window strategy, retrieval, summarization, caching, and reliability patterns for long-running agents and large knowledge bases.

Section 01

Study Guide

Long-form notes and references.

Domain 5 covers the disciplines that keep Claude grounded across long, complex tasks: context architecture, retrieval, summarization, caching, and reliability patterns for production agents and large knowledge bases.

North Star

Long-context reliability comes from deliberate context engineering and deterministic architecture, not from blindly enlarging the context window.

Glossary of Key Terms

Lost-in-the-Middle
The structural tendency of LLMs to attend more strongly to the beginning and end of context while weakening attention in the middle.
Pinned Preamble
A static, never-removed section at the top of the prompt holding system instructions, safety policies, agent identity, and critical constraints.
Sliding Window
A context-management pattern that keeps only the most recent N turns active, evicting the oldest as new turns arrive.
Rolling Summary
A compressed structured representation of older conversation turns preserving key facts, decisions, goals, and open tasks.
External Memory Layer
Persistent state stored outside the model (database, vector store, Redis, JSON file) and reinjected on demand; long-term memory is engineered separately from the context window.
Progressive Summarization at Threshold
Proactively compressing older context when usage hits ~70–85% of the window, before reasoning quality degrades.
Prompt Caching
Reusing a precomputed static prefix across requests, which cuts input cost on the cached portion by roughly 80–90%, lowers latency, and keeps stable instructions at the high-attention start of the prompt.
Heartbeat / Warm-up Cron
Periodic lightweight requests sent to keep a prompt cache warm and prevent cold-start latency spikes.
Message Batches API
An asynchronous bulk-processing API for high-volume, non-urgent, cost-sensitive workloads, typically with up to a 24-hour completion window.
Real-time API
The synchronous, low-latency API used for chatbots, live support, and interactive multi-turn workflows.
Transient Failure
A temporary, recoverable error (5xx, 429, timeouts, network blips) that should be retried with backoff and jitter.
Permanent Failure
A structural or invalid condition (401, 403, malformed JSON, schema violation) that will not succeed on retry, so the caller should fail fast instead of retrying.
Exponential Backoff
A retry strategy that doubles (or otherwise grows) the wait interval between attempts to relieve pressure on a struggling service.
Jitter
Randomness added to retry delays to desynchronize clients and prevent thundering-herd retry storms.
Thundering Herd
Synchronized traffic spikes caused when many clients fail and retry at exactly the same time, overwhelming an already-stressed service.
Boundary Validation
Deterministic schema/type/business-rule checks applied between agents to contain malformed state before it cascades.
Silent Failure
An execution that returns successfully at the API level but produces semantically invalid or corrupted internal state.
Retrieve-and-Read
A two-pass pattern: first retrieve the most relevant chunks, then perform deep reasoning only over that focused subset.
RAG (Retrieval-Augmented Generation)
An architecture that retrieves relevant external evidence first and generates an answer over it, instead of relying on the model's parametric memory or full-context dumps.
Attention Dilution
Loss of effective focus on important tokens caused by overloading the prompt with too much irrelevant or middle-positioned content.
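
Worked example: retries. The Transient Failure, Permanent Failure, Exponential Backoff, and Jitter entries above fit together roughly as in the sketch below. It is a minimal illustration, not a production client: `ApiError`, `is_transient`, `backoff_delay`, and `call_with_retry` are hypothetical names, and the base delay, cap, and attempt count are assumed values. The jitter variant shown is "full jitter" (a uniform draw between zero and the exponential ceiling), which is one common choice among several.

```python
import random
import time


class ApiError(Exception):
    """Hypothetical error type that carries an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"API error {status}")
        self.status = status


def is_transient(error):
    """Return True for errors worth retrying: timeouts, 429, and 5xx."""
    if isinstance(error, TimeoutError):
        return True
    if isinstance(error, ApiError):
        return error.status == 429 or 500 <= error.status < 600
    return False


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Full jitter: uniform draw from [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


def call_with_retry(fn, max_attempts=5, sleep=time.sleep, rng=random):
    """Call fn, retrying transient failures and failing fast on the rest."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as error:
            out_of_attempts = attempt == max_attempts - 1
            # Permanent failures (401, 403, ...) are re-raised immediately.
            if not is_transient(error) or out_of_attempts:
                raise
            sleep(backoff_delay(attempt, rng=rng))
```

Because each client draws its own random delay, clients that failed at the same moment retry at different moments, which is what prevents a thundering herd. Passing `sleep` and `rng` as parameters is a design choice made here so the function can be exercised without real waiting.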
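
Worked example: context budget. The Pinned Preamble, Sliding Window, Rolling Summary, and Progressive Summarization at Threshold entries can be combined as in the following sketch. Everything here is an assumption made for illustration: `ContextManager`, `estimate_tokens`, and `summarize` are hypothetical names, the 4-characters-per-token estimate is a crude stand-in for a real tokenizer, and `summarize` is a placeholder where a real system would ask a model for a structured summary.

```python
from dataclasses import dataclass, field


def estimate_tokens(text):
    """Crude stand-in for a tokenizer: roughly 4 characters per token."""
    return max(1, len(text) // 4)


def summarize(old_turns, previous_summary):
    """Placeholder summarizer: keeps the first 40 characters of each turn.

    A real system would call a model here and bound the summary's size.
    """
    bullets = [f"- {turn[:40]}" for turn in old_turns]
    return "\n".join(filter(None, [previous_summary, *bullets]))


@dataclass
class ContextManager:
    preamble: str              # pinned: never evicted or summarized
    window_tokens: int         # assumed context budget
    threshold: float = 0.75    # compress at 75%, inside the 70-85% band
    keep_recent: int = 4       # sliding window: turns kept verbatim
    summary: str = ""
    turns: list = field(default_factory=list)

    def usage(self):
        parts = [self.preamble, self.summary, *self.turns]
        return sum(estimate_tokens(p) for p in parts if p)

    def add_turn(self, text):
        self.turns.append(text)
        over = self.usage() >= self.threshold * self.window_tokens
        if over and len(self.turns) > self.keep_recent:
            # Evict the oldest turns into the rolling summary.
            old = self.turns[:-self.keep_recent]
            self.turns = self.turns[-self.keep_recent:]
            self.summary = summarize(old, self.summary)

    def build_prompt(self):
        parts = [self.preamble]
        if self.summary:
            parts.append("Summary of earlier turns:\n" + self.summary)
        parts.extend(self.turns)
        return "\n\n".join(parts)
```

The preamble stays first in every prompt, the summary sits between it and the recent turns, and compression runs before the window is full, not after it overflows.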
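
Worked example: boundary validation. The Boundary Validation and Silent Failure entries describe a deterministic check placed between agents. The sketch below assumes a hypothetical handoff payload with `order_id`, `amount_cents`, and `currency` fields; the field names, the allowed currencies, and the `BoundaryError` type are all invented for illustration.

```python
import json

# Hypothetical schema for a handoff between two agents.
REQUIRED_FIELDS = {"order_id": str, "amount_cents": int, "currency": str}
ALLOWED_CURRENCIES = {"USD", "EUR"}


class BoundaryError(ValueError):
    """Raised when an upstream agent's output fails validation."""


def validate_handoff(raw):
    """Parse and check a JSON handoff; return the payload or raise."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise BoundaryError(f"malformed JSON: {exc.msg}") from exc

    if not isinstance(payload, dict):
        raise BoundaryError("payload must be a JSON object")

    # Schema and type checks.
    for name, expected in REQUIRED_FIELDS.items():
        if name not in payload:
            raise BoundaryError(f"missing field: {name}")
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise BoundaryError(f"wrong type for field: {name}")

    # Business rules: these catch output that parses but is invalid.
    if payload["amount_cents"] <= 0:
        raise BoundaryError("amount_cents must be positive")
    if payload["currency"] not in ALLOWED_CURRENCIES:
        raise BoundaryError("unsupported currency")

    return payload
```

A payload such as `{"order_id": "A1", "amount_cents": -500, "currency": "USD"}` is valid JSON with the right types, so it would pass at the API level; the business-rule check is what stops it from becoming a silent failure downstream.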
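
Worked example: retrieve-and-read. The Retrieve-and-Read, RAG, and Attention Dilution entries share one idea: narrow the context before reasoning over it. The toy sketch below ranks chunks by word overlap with the query, which is an assumption made to keep the example dependency-free; a real system would more likely use embeddings or a search index. `tokenize`, `retrieve`, and `build_read_prompt` are hypothetical helper names.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, chunks, k=2):
    """Pass 1: return up to k chunks that share words with the query."""
    query_words = tokenize(query)
    scored = [
        (len(query_words & tokenize(chunk)), index, chunk)
        for index, chunk in enumerate(chunks)
    ]
    # Highest overlap first; ties keep the original chunk order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [chunk for score, _, chunk in scored[:k] if score > 0]


def build_read_prompt(query, chunks, k=2):
    """Pass 2 input: a prompt that contains only the retrieved subset."""
    evidence = retrieve(query, chunks, k)
    numbered = "\n".join(
        f"[{i + 1}] {chunk}" for i, chunk in enumerate(evidence)
    )
    return (
        "Answer using only the evidence below.\n\n"
        f"{numbered}\n\nQuestion: {query}"
    )
```

Only the selected chunks reach the model, so irrelevant material never competes for attention in the middle of the prompt.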