Domain 04

Prompt Engineering & Structured Output

System prompts, role framing, examples, XML scaffolding, and structured output techniques that turn Claude into a dependable component of larger systems.

Section 01

Study Guide

Long-form notes and references.

Domain 4 shifts prompt engineering from 'writing instructions' to designing multi-layered contracts. Schemas, tool use, defensive parsing, and code-level guardrails work together to produce production-grade reliability from a probabilistic generator.

North Star

Use prompts for semantic problems, schemas for structural enforcement, and code for deterministic repair. Don't fight the model — architect around its probabilistic nature.

Glossary of Key Terms

tool_use (Structured Generation Mode)
An API mechanism that constrains model output to a defined input_schema, dramatically improving JSON validity even when no external tool actually executes.
Dummy Tool
A tool defined purely to force schema-constrained generation; never executed externally. Often called 'tool-use-as-generation-mode'.
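A minimal sketch of tool-use-as-generation-mode. It only builds plain dicts mirroring the Messages API request shape (no API call is made); the tool name `record_ticket`, the schema fields, and the model string are illustrative assumptions, not a prescribed contract.

```python
# Dummy tool: defined purely so tool_choice can force schema-constrained
# JSON output. It is never executed by anything external.
record_ticket = {
    "name": "record_ticket",
    "description": "Record a structured support-ticket classification.",
    "input_schema": {
        "type": "object",
        "properties": {
            "category": {"type": "string", "enum": ["billing", "bug", "feature_request"]},
            "priority": {"type": "integer", "minimum": 1, "maximum": 5},
            "summary": {"type": "string"},
        },
        "required": ["category", "priority", "summary"],
    },
}

def build_request(ticket_text: str) -> dict:
    """Assemble a Messages-API-style request that forces the dummy tool."""
    return {
        "model": "claude-sonnet-4-5",  # assumed model name
        "max_tokens": 1024,
        "tools": [record_ticket],
        # tool_choice forces a record_ticket tool_use block, i.e.
        # schema-constrained JSON instead of free-form text.
        "tool_choice": {"type": "tool", "name": "record_ticket"},
        "messages": [{"role": "user", "content": ticket_text}],
    }

req = build_request("My invoice is wrong and I was double-charged.")
```

Because the schema (not the prompt) carries the structural contract, the prompt itself can stay focused on the semantic task.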
Sentinel Value
A reserved explicit placeholder (e.g., 'not_specified', 'unknown') that represents missing or unspecified data in a structured schema, replacing inconsistent omissions/nulls/blanks.
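A sketch of sentinel normalization in post-processing code: the many ways "missing" shows up in model output (absent key, `None`, empty string, "N/A") collapse into one reserved value. Field names and the set of missing forms are hypothetical.

```python
SENTINEL = "not_specified"
MISSING_FORMS = {"", "n/a", "none", "unknown", "null"}

def apply_sentinels(record: dict, fields: list[str]) -> dict:
    """Replace every flavor of 'missing' with a single explicit sentinel."""
    out = dict(record)
    for field in fields:
        if field not in out:
            out[field] = SENTINEL        # absent key
            continue
        value = out[field]
        if value is None:
            out[field] = SENTINEL        # explicit null
        elif isinstance(value, str) and value.strip().lower() in MISSING_FORMS:
            out[field] = SENTINEL        # blank / "N/A"-style string
    return out
```

Downstream code then checks one value (`== SENTINEL`) instead of juggling nulls, blanks, and omissions.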
Few-Shot Curation
The strategic selection of edge cases, decision boundaries, and contrastive pairs as in-context examples — chosen for informational value rather than quantity.
Contrastive Example
Two nearly-identical inputs with different outputs presented together so the model learns which features actually drive the decision.
Decision Boundary
The fuzzy zone between two similar labels where classification ambiguity exists — the highest-value target for few-shot examples.
Chain-of-Thought (CoT)
A technique in which the model works through a problem step-by-step before producing a conclusion, improving accuracy on multi-step tasks.
Grounded CoT
CoT in which every reasoning step must cite explicit evidence from the provided source material, producing inspectable, verifiable, audit-friendly logic.
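One way to scaffold grounded CoT is an XML-tagged prompt template; the tag names (`<step>`, `<evidence>`, `<inference>`, `<conclusion>`) here are illustrative, not a prescribed format.

```python
# Grounded-CoT prompt sketch: each reasoning step must quote evidence,
# making the chain inspectable and auditable after the fact.
GROUNDED_COT_TEMPLATE = """\
<document>
{document}
</document>

Answer the question using only the document above.
For each reasoning step, emit:
<step>
  <evidence>exact quote from the document</evidence>
  <inference>what that quote implies</inference>
</step>
Finish with <conclusion>your answer</conclusion>.
If no evidence supports a step, say so instead of guessing.

Question: {question}"""

prompt = GROUNDED_COT_TEMPLATE.format(
    document="Invoice #81 was paid on March 3.",
    question="Has invoice #81 been paid?",
)
```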
Two-Pass Pipeline
An architecture that splits a task into separate identification and execution stages to improve reliability, scalability, and interpretability.
Decompose-and-Filter
A pipeline pattern where one pass identifies/filters relevant items and another pass processes them — used for nested tasks, large volumes, or fuzzy-to-deterministic handoffs.
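A decompose-and-filter sketch with stubbed passes. The stub functions stand in for real model calls; the email-triage task, keywords, and field names are all hypothetical.

```python
def identify_pass(emails: list[str]) -> list[str]:
    """Pass 1 (fuzzy identification): keep only complaint-like emails.
    In production this would be a cheap classification call."""
    return [e for e in emails if "refund" in e.lower() or "broken" in e.lower()]

def execute_pass(email: str) -> dict:
    """Pass 2 (focused execution): extract a structured record from one
    email. In production this would be a schema-constrained call."""
    return {"text": email, "needs_refund": "refund" in email.lower()}

def pipeline(emails: list[str]) -> list[dict]:
    """Identification and execution stay separate, so each stage can be
    inspected, tested, and scaled independently."""
    return [execute_pass(e) for e in identify_pass(emails)]

results = pipeline([
    "Love the product, thanks!",
    "The unit arrived broken, I want a refund.",
])
```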
Defensive Parsing
Repairing minor structural drift (string→array, '3'→3, missing fields) in deterministic code rather than over-prompting the model to be perfectly formatted.
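A sketch of a defensive-parsing repair step covering the three drifts named above; the `defaults` dict doubles as the expected-type spec, which is a design assumption for this example.

```python
def repair(record: dict, defaults: dict) -> dict:
    """Deterministically fix minor structural drift in model output
    instead of re-prompting for perfect formatting."""
    fixed = {**defaults, **record}            # missing fields <- defaults
    for key, expected in defaults.items():
        value = fixed[key]
        if isinstance(expected, list) and isinstance(value, str):
            fixed[key] = [value]              # string -> single-item array
        elif (isinstance(expected, int) and isinstance(value, str)
              and value.strip().lstrip("-").isdigit()):
            fixed[key] = int(value)           # "3" -> 3
    return fixed

fixed = repair(
    {"tags": "urgent", "priority": "3"},
    {"tags": [], "priority": 1, "owner": "unassigned"},
)
```

Three lines of deterministic code absorb the variability that would otherwise cost extra prompt tokens and retries.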
Don't Fight the Model
The principle that probabilistic formatting variability should be handled with code, not with longer prompts and retries.
Prefilling
Supplying the beginning of the assistant's response to anchor the autoregressive generation path toward the intended task format and prevent false refusals.
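A prefilling sketch: the request ends with a partial assistant turn, and the model's reply continues from that text. This builds plain dicts mirroring the Messages API message shape; the model string and the `{"diagnosis":` prefill are assumptions for illustration.

```python
def build_prefilled_request(user_text: str) -> dict:
    """End the message list with a partial assistant message (the
    prefill); generation is appended to it, anchoring the output."""
    return {
        "model": "claude-sonnet-4-5",  # assumed model name
        "max_tokens": 512,
        "messages": [
            {"role": "user", "content": user_text},
            # Prefill: the model continues from '{"diagnosis":', locking
            # it onto the JSON path rather than preamble or a refusal.
            {"role": "assistant", "content": '{"diagnosis":'},
        ],
    }

req = build_prefilled_request("Summarize this chart note as JSON.")
```

Remember to prepend the prefill text back onto the completion before parsing, since the API returns only the continuation.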
Autoregressive Anchoring
The phenomenon where early generated tokens strongly bias the trajectory of subsequent tokens — exploited by prefilling.
False Refusal
The model incorrectly refusing a benign task because keywords (medical, security, legal) resemble risky topics during initial classification.
Primacy Effect
LLMs allocate disproportionate attention to the beginning of the context, making it the optimal location for critical safety constraints.
Lost-in-the-Middle
Reduced model attention to instructions buried in the middle of long contexts, weakening compliance with rules placed there.
Defense-in-Depth (Prompt)
Reinforcing safety rules across multiple layers — system prompt placement, schemas, tool permissions, infrastructure — so no single layer is solely trusted.
Message Batches API
An asynchronous bulk-processing mode (~24h window) offering ~50% lower cost in exchange for flexible latency, ideal for offline pipelines.
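A sketch of assembling a batch request list: each independent request carries a `custom_id` for matching results back to inputs. The shapes mirror the Message Batches API request list, but nothing is sent; document ids and the model string are hypothetical.

```python
def build_batch(documents: dict[str, str]) -> list[dict]:
    """One entry per document; results arrive asynchronously and are
    joined back to inputs via custom_id."""
    return [
        {
            "custom_id": doc_id,
            "params": {
                "model": "claude-sonnet-4-5",  # assumed model name
                "max_tokens": 256,
                "messages": [{"role": "user", "content": f"Summarize:\n{text}"}],
            },
        }
        for doc_id, text in documents.items()
    ]

batch = build_batch({"doc-1": "First quarterly report...", "doc-2": "Second quarterly report..."})
```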
Prompt Caching
Reusing computation on repeated context prefixes (system prompts, knowledge bases, style guides) so the same prefix is not reprocessed on every request, cutting cost and latency.
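A caching sketch: the large, stable prefix (here a style guide in the system prompt) is marked with a `cache_control` content block so repeated requests reuse its computation. The block shape mirrors the API; the style-guide text is a placeholder.

```python
STYLE_GUIDE = "(imagine thousands of tokens of stable house-style rules here)"

def build_cached_request(user_text: str) -> dict:
    """Stable prefix first, volatile user content last, so the cached
    prefix stays byte-identical across requests."""
    return {
        "model": "claude-sonnet-4-5",  # assumed model name
        "max_tokens": 512,
        "system": [
            {
                "type": "text",
                "text": STYLE_GUIDE,
                # Everything up to and including this block is cached and
                # reused by later requests that share the same prefix.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }

req = build_cached_request("Rewrite this paragraph in house style.")
```

The ordering matters: anything placed before the cache marker must not vary between requests, or the cache never hits.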