Home Exam InfoDomain 05

Context Management & Reliability

Context window strategy, retrieval, summarization, caching, and reliability patterns for long-running agents and large knowledge bases.

Section 01

Study Guide

Long-form notes and references.

Domain 5 covers the disciplines that keep Claude grounded across long, complex tasks: context architecture, retrieval, summarization, caching, and reliability patterns for production agents and large knowledge bases.

North Star

Long-context reliability comes from intelligent context engineering and deterministic architecture — not from blindly increasing the token window.

Glossary of Key Terms

Lost-in-the-Middle: The structural tendency of LLMs to attend more strongly to the beginning and end of context while weakening attention in the middle.
Pinned Preamble: A static, never-removed section at the top of the prompt holding system instructions, safety policies, agent identity, and critical constraints.
Sliding Window: A context-management pattern that keeps only the most recent N turns active, evicting the oldest as new turns arrive.
Rolling Summary: A compressed structured representation of older conversation turns preserving key facts, decisions, goals, and open tasks.
External Memory Layer: Persistent state stored outside the model (DB, vector store, Redis, JSON file) and reinjected on demand — long-term memory engineered separately from the context window.
Progressive Summarization at Threshold: Proactively compressing older context when usage hits ~70–85% of the window, before reasoning quality degrades.
Prompt Caching: Reusing a precomputed static prefix across requests for ~80–90% input-cost savings, lower latency, and stable high-attention placement.
Heartbeat / Warm-up Cron: Periodic lightweight requests sent to keep a prompt cache warm and prevent cold-start latency spikes.
Message Batches API: An asynchronous bulk-processing API for high-volume, non-urgent, cost-sensitive workloads, typically with up to a 24-hour completion window.
Real-time API: The synchronous, low-latency API used for chatbots, live support, and interactive multi-turn workflows.
Transient Failure: A temporary, recoverable error (5xx, 429, timeouts, network blips) that should be retried with backoff and jitter.
Permanent Failure: A structural or invalid condition (401, 403, malformed JSON, schema violation) that will not succeed on retry — fail fast.
Exponential Backoff: A retry strategy that doubles (or otherwise grows) the wait interval between attempts to relieve pressure on a struggling service.
Jitter: Randomness added to retry delays to desynchronize clients and prevent thundering-herd retry storms.
Thundering Herd: Synchronized traffic spikes caused when many clients fail and retry at exactly the same time, overwhelming an already-stressed service.
Boundary Validation: Deterministic schema/type/business-rule checks applied between agents to contain malformed state before it cascades.
Silent Failure: An execution that returns successfully at the API level but produces semantically invalid or corrupted internal state.
Retrieve-and-Read: A two-pass pattern: first retrieve the most relevant chunks, then perform deep reasoning only over that focused subset.
RAG (Retrieval-Augmented Generation): An architecture that retrieves relevant external evidence first and generates an answer over it, instead of relying on the model's parametric memory or full-context dumps.
Attention Dilution: Loss of effective focus on important tokens caused by overloading the prompt with too much irrelevant or middle-positioned content.

← Previous

Prompt Engineering

Context Management & Reliability

Study Guide

Lost-in-the-Middle

Sliding Window with Pinned Preamble

External Memory Layer

Progressive Summarization at Threshold

Prompt Caching: Cost, Latency & Attention

Message Batches vs. Real-time API

Transient vs. Permanent Failures

Jitter & the Thundering Herd

Boundary Validation in Multi-Agent Pipelines

Retrieve-and-Read for Long Documents

Glossary of Key Terms

Context Management & Reliability

Study Guide

01Lost-in-the-MiddleWhat is the 'Lost-in-the-Middle' phenomenon, and how does it affect long-context performance?

Lost-in-the-Middle

02Sliding Window with Pinned PreambleDescribe the structure of a 'Sliding Window with Pinned Preamble' architecture for long conversations.

Sliding Window with Pinned Preamble

03External Memory LayerHow does an External Memory Layer differ from a standard context window in long-running sessions?

External Memory Layer

04Progressive Summarization at ThresholdExplain the 'Progressive Summarization at Threshold' strategy.

Progressive Summarization at Threshold

05Prompt Caching: Cost, Latency & AttentionWhat are the primary benefits of Prompt Caching regarding cost, latency, and attention?

Prompt Caching: Cost, Latency & Attention

06Message Batches vs. Real-time APIUnder what specific conditions should an architect choose the Message Batches API over the Real-time API?

Message Batches vs. Real-time API

07Transient vs. Permanent FailuresWhat is the difference between transient and permanent failures, and how should a system handle each?

Transient vs. Permanent Failures

08Jitter & the Thundering HerdHow does 'Jitter' assist in preventing a 'thundering herd' problem during API retries?

Jitter & the Thundering Herd

09Boundary Validation in Multi-Agent PipelinesWhy is 'Boundary Validation' critical in multi-agent pipelines?

Boundary Validation in Multi-Agent Pipelines

10Retrieve-and-Read for Long DocumentsExplain the 'Retrieve-and-Read' pattern for processing very long documents.

Retrieve-and-Read for Long Documents

Glossary of Key Terms