Section 01
Study Guide
Long-form notes and references.
Domain 4 shifts prompt engineering from 'writing instructions' to designing multi-layered contracts. Schemas, tool use, defensive parsing, and code-level guardrails work together to produce production-grade reliability from a probabilistic generator.
North Star
Use prompts for semantic problems, schemas for structural enforcement, and code for deterministic repair. Don't fight the model — architect around its probabilistic nature.
Glossary of Key Terms
- tool_use (Structured Generation Mode)
- An API mechanism that constrains model output to a defined input_schema, dramatically improving JSON validity even when no external tool actually executes.
- Dummy Tool
- A tool defined purely to force schema-constrained generation; never executed externally. Often called 'tool-use-as-generation-mode'.
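A minimal sketch of the dummy-tool pattern. The tool name, schema fields, and mocked response are all illustrative, and the `input_schema` shape assumes an Anthropic-style Messages API; nothing here executes a real tool.

```python
# Hypothetical "dummy tool": the schema is the point; it is never run.
record_ticket = {
    "name": "record_ticket",
    "description": "Record the structured fields extracted from a support ticket.",
    "input_schema": {
        "type": "object",
        "properties": {
            "category": {"type": "string",
                         "enum": ["billing", "bug", "feature_request"]},
            "urgency": {"type": "integer", "minimum": 1, "maximum": 5},
        },
        "required": ["category", "urgency"],
    },
}

def extract_tool_input(response_content):
    """Pull the schema-constrained arguments out of a tool_use content block."""
    for block in response_content:
        if block.get("type") == "tool_use":
            return block["input"]
    raise ValueError("model did not emit a tool_use block")

# Mocked response content, standing in for an API reply:
mocked = [{"type": "tool_use", "name": "record_ticket",
           "input": {"category": "billing", "urgency": 2}}]
fields = extract_tool_input(mocked)
```

The model is forced to emit arguments conforming to `input_schema`; the code then treats those arguments as the real output and never dispatches the "tool" anywhere.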
- Sentinel Value
- A reserved explicit placeholder (e.g., 'not_specified', 'unknown') that represents missing or unspecified data in a structured schema, replacing inconsistent omissions/nulls/blanks.
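The normalization side of this pattern can be sketched in plain Python (field names are illustrative):

```python
SENTINEL = "not_specified"

def with_sentinels(record, expected_fields):
    """Replace the many faces of 'missing' (absent key, None, empty string)
    with one explicit sentinel so downstream code branches on a single value."""
    return {
        field: (record.get(field)
                if record.get(field) not in (None, "") else SENTINEL)
        for field in expected_fields
    }

row = with_sentinels({"name": "Ada", "email": ""}, ["name", "email", "phone"])
# row == {"name": "Ada", "email": "not_specified", "phone": "not_specified"}
```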
- Few-Shot Curation
- The strategic selection of edge cases, decision boundaries, and contrastive pairs as in-context examples — chosen for informational value rather than quantity.
- Contrastive Example
- Two nearly-identical inputs with different outputs presented together so the model learns which features actually drive the decision.
- Decision Boundary
- The fuzzy zone between two similar labels where classification ambiguity exists — the highest-value target for few-shot examples.
- Chain-of-Thought (CoT)
- A technique in which the model works through a problem step by step before producing a conclusion.
- Grounded CoT
- CoT in which every reasoning step must cite explicit evidence from the provided source material, producing inspectable, verifiable, audit-friendly logic.
- Two-Pass Pipeline
- An architecture that splits a task into separate identification and execution stages to improve reliability, scalability, and interpretability.
- Decompose-and-Filter
- A pipeline pattern where one pass identifies/filters relevant items and another pass processes them — used for nested tasks, large volumes, or fuzzy-to-deterministic handoffs.
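A toy sketch of decompose-and-filter. The `TODO:` convention and both stub functions are illustrative; in production, pass 1 would be an LLM call that identifies items and pass 2 would process each one in isolation.

```python
def identify_pass(document):
    """Pass 1 (fuzzy): list candidate items. Stubbed deterministically here."""
    return [line for line in document.splitlines() if line.startswith("TODO:")]

def execute_pass(item):
    """Pass 2: process each identified item on its own, so a failure is
    scoped to one item instead of the whole document."""
    return {"task": item.removeprefix("TODO: "), "status": "queued"}

doc = "intro text\nTODO: rotate keys\nTODO: update schema\nfooter"
results = [execute_pass(item) for item in identify_pass(doc)]
```

Splitting the stages also makes each one independently testable and lets the second pass scale out over the filtered items.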
- Defensive Parsing
- Repairing minor structural drift (string→array, '3'→3, missing fields) in deterministic code rather than over-prompting the model to be perfectly formatted.
- Don't Fight the Model
- The principle that probabilistic formatting variability should be handled with code, not with longer prompts and retries.
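The two entries above can be sketched as one repair function. The drift cases handled (scalar where a list was expected, numeric strings, absent fields) come straight from the definition; the function and field names are illustrative.

```python
def repair(payload, defaults):
    """Deterministically patch common structural drift instead of re-prompting."""
    fixed = dict(defaults)           # missing fields fall back to defaults
    fixed.update(payload)
    for key, default in defaults.items():
        value = fixed[key]
        if isinstance(default, list) and not isinstance(value, list):
            fixed[key] = [value]     # "tags": "urgent"  ->  ["urgent"]
        if isinstance(default, int) and isinstance(value, str) and value.isdigit():
            fixed[key] = int(value)  # "count": "3"      ->  3
    return fixed

out = repair({"tags": "urgent", "count": "3"},
             defaults={"tags": [], "count": 0, "notes": ""})
# out == {"tags": ["urgent"], "count": 3, "notes": ""}
```

A dozen lines of deterministic repair replaces prompt bloat and retry loops for formatting-only failures.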
- Prefilling
- Supplying the beginning of the assistant's response to anchor the autoregressive generation path toward the intended task format and prevent false refusals.
- Autoregressive Anchoring
- The phenomenon where early generated tokens strongly bias the trajectory of subsequent tokens — exploited by prefilling.
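A minimal sketch of prefilling for JSON output. The messages shape assumes an Anthropic-style API where the final assistant turn seeds the response, and the completion below is simulated; the key detail is stitching the prefill back onto the continuation before parsing.

```python
import json

PREFILL = '{"'   # anchor the very first generated tokens inside a JSON object

# Request messages (shape is an assumption of this sketch):
messages = [
    {"role": "user", "content": "Extract the invoice fields as JSON."},
    {"role": "assistant", "content": PREFILL},  # the model continues from here
]

# The API returns only the continuation; re-attach the prefill before parsing.
simulated_completion = 'amount": 120, "currency": "EUR"}'
parsed = json.loads(PREFILL + simulated_completion)
```

Because the opening `{"` is already on the generation path, preamble text and false refusals are far less likely to appear before the JSON.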
- False Refusal
- The model incorrectly refusing a benign task because keywords (medical, security, legal) resemble risky topics during initial classification.
- Primacy Effect
- LLMs allocate disproportionate attention to the beginning of the context, making it the optimal location for critical safety constraints.
- Lost-in-the-Middle
- Reduced model attention to instructions buried in the middle of long contexts, weakening compliance with rules placed there.
- Defense-in-Depth (Prompt)
- Reinforcing safety rules across multiple layers — system prompt placement, schemas, tool permissions, infrastructure — so no single layer is solely trusted.
- Message Batches API
- An asynchronous bulk-processing mode (~24h window) offering ~50% lower cost in exchange for flexible latency, ideal for offline pipelines.
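A sketch of a bulk payload. The `{"custom_id", "params"}` request shape follows the Anthropic Message Batches API as an assumption, and the model name and documents are placeholders; `custom_id` is what joins asynchronous results back to your own records.

```python
documents = {"doc-001": "First contract text...",
             "doc-002": "Second contract text..."}

requests = [
    {
        "custom_id": doc_id,   # echoed back with each result
        "params": {
            "model": "example-model",   # placeholder model name
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": f"Summarize:\n{text}"}],
        },
    }
    for doc_id, text in documents.items()
]
```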
- Prompt Caching
- Reusing computation on repeated context prefixes (system prompts, knowledge bases, style guides) so the model does not recompute attention over the same prefix tokens on every request.
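A sketch of marking a stable prefix as cacheable. The `cache_control` block shape follows Anthropic's prompt caching API as an assumption; the style guide and question are placeholders. The essential property is that everything before the user turn is byte-identical across calls.

```python
STYLE_GUIDE = "...many thousands of tokens of style rules..."

system = [
    {"type": "text", "text": STYLE_GUIDE,
     "cache_control": {"type": "ephemeral"}},  # cache breakpoint after this block
]

def build_request(user_question):
    """Only the trailing user turn varies; the identical prefix is what
    allows repeated requests to hit the cache."""
    return {"system": system,
            "messages": [{"role": "user", "content": user_question}]}

req = build_request("Rewrite this paragraph in house style: ...")
```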