Overview
Orient
7 sectionsobjective Objective
Understand how agentic systems actually work turn by turn, where deterministic controls belong, and how multi-agent systems should be decomposed and coordinated.
chapter map Chapter Map
Estimated study time:
- Reading: 2.5 to 3.5 hours
- Whiteboarding and drills: 2 hours
- Lab design work: 2 to 3 hours
- Quiz and review: 1 to 1.5 hours
Sub-lessons:
- Why agentic systems need loops instead of one-shot prompting
- Why explicit control signals beat natural-language guesses
- When deterministic enforcement is mandatory
- How coordinators and subagents divide responsibility
- How subagent invocation and spawning actually work
- How decomposition quality shapes coverage quality
- How session state, resumption, and forking affect reliability
Suggested 5-Day Teaching Flow
Day 1:
- teach the agentic loop from first principles
- diagram a tool-using request across multiple turns
- compare correct loop control with brittle text-parsing approaches
Day 2:
- teach deterministic enforcement versus prompt guidance
- workshop business-critical prerequisite examples
- introduce coordinator-subagent orchestration
Day 3:
- teach
Task-based subagent invocation and configuration - run a guided architecture walkthrough for support and research scenarios
- practice explicit context passing between subagents
- compare prompt chaining and adaptive decomposition
Day 4:
- hold a whiteboard session on decomposition failures
- review common misconceptions
- run the weekly quiz and discussion review
Day 5:
- run the week test under time pressure
- debrief why wrong answers solve the wrong system layer
End-of-Lecture Recap and Homework
Lecture 1.1 Recap Questions
- Why is
stop_reasonthe real control surface for the loop? - What breaks if tool results are not appended back into the conversation?
- Why is an iteration cap a safeguard instead of the main stopping rule?
Homework:
- Write agent-loop pseudocode for one support scenario and one developer-tooling scenario.
- Mark where
tool_useandend_turnappear in each flow.
Lecture 1.2 Recap Questions
- When is prompt guidance acceptable?
- When is deterministic enforcement required?
- What makes a workflow "business critical" from an enforcement perspective?
Homework:
- Take three workflows and classify each control as prompt-based or deterministic.
- Justify each classification in two sentences.
Lecture 1.3 Recap Questions
- What does the coordinator own that subagents do not?
- Why can downstream agents perform well and still produce incomplete outputs?
- Why is automatic context inheritance a bad assumption?
Homework:
- Design one coordinator prompt for a research system and one for a support system.
- Make both prompts goal-driven rather than step-by-step.
Lecture 1.4 Recap Questions
- What mechanism actually spawns subagents in the guide’s architecture?
- Why must
allowedToolsinclude"Task"? - What belongs in an
AgentDefinition? - Why is fork-based session management different from ordinary resumption?
Homework:
- Write a coordinator checklist for spawning subagents safely.
- Define two subagent roles, each with a description, a system prompt goal, and a restricted tool set.
Lecture 1.5 Recap Questions
- What kind of task fits prompt chaining best?
- What kind of task fits adaptive decomposition best?
- Why does narrow decomposition create hidden coverage risk?
Homework:
- Break one code-review workflow into prompt-chained steps.
- Break one open-ended investigation task into adaptive steps.
Lecture 1.6 Recap Questions
- Why should subagent context often be structured rather than free-form?
- When does parallelism help?
- What is the main risk of parallelism?
Homework:
- Create a structured handoff format for claim, source, excerpt, and date.
- Define one case where subagents should run sequentially instead.
Lecture 1.7 Recap Questions
- When is session resumption appropriate?
- When is starting fresh with a structured summary safer?
- What is a good use of forking?
Homework:
- Describe one stale-session failure mode and how to prevent it.
- Sketch a fork-based comparison between two refactoring approaches.
lecture summary Lecture Summary
By the end of Week 1, the student should understand that agentic systems are controlled workflows, not loose prompt chains. The most important rules are to drive loop control from stop_reason, to enforce critical prerequisites structurally when failure is costly, to understand that subagent spawning depends on the Task mechanism and correct allowedTools configuration, and to treat the coordinator as the owner of decomposition quality. If a multi-agent system produces incomplete work, the failure is often upstream in scope design rather than downstream in execution.
Memorize What To Memorize 0 / 10
addendum Task Statement Coverage Addendum
Use this section as the explicit checklist that Week 1 must cover.
Task Statement 1.1: Agentic Loops
Students must know:
- the full loop lifecycle: request, inspect
stop_reason, run tools, append results, continue - the difference between model-guided tool selection and hard-coded decision trees
- why parsing assistant prose for completion is an anti-pattern
Students must be able to:
- continue when
stop_reasonistool_use - stop when
stop_reasonisend_turn - append tool results correctly between iterations
Task Statement 1.2: Coordinator-Subagent Orchestration
Students must know:
- hub-and-spoke coordination patterns
- coordinator ownership of routing, aggregation, and recovery
- isolated subagent context
- why narrow coordinator decomposition causes incomplete coverage
Students must be able to:
- choose subagents dynamically based on task needs
- partition scope to reduce overlap
- re-delegate when synthesis reveals gaps
- keep inter-subagent traffic routed through the coordinator
Task Statement 1.3: Subagent Invocation, Context Passing, and Spawning
Students must know:
Taskis the spawning mechanismallowedToolsmust include"Task"for the coordinator to delegate- subagents do not automatically inherit parent state
AgentDefinitionshould include a role description, system prompt, and tool restrictions- fork-based sessions support divergent exploration from a shared baseline
Students must be able to:
- pass complete prior findings directly into downstream subagent prompts
- separate content from metadata in handoffs
- spawn parallel subagents in a single coordinator response
- write coordinator prompts that specify goals and quality criteria instead of brittle step lists
Task Statement 1.4: Multi-Step Workflows, Enforcement, and Handoff
Students must know:
- the difference between prompt guidance and programmatic enforcement
- why deterministic compliance matters for risky ordered workflows
- what a structured handoff must include for mid-process escalation
Students must be able to:
- implement prerequisite gates before downstream tool calls
- decompose multi-concern requests and investigate them in parallel where appropriate
- compile human-handoff summaries with customer details, root cause, amounts, and recommended action
Task Statement 1.5: Hooks and Interception
Students must know:
- post-tool hooks can normalize data before the model sees it
- outgoing tool-call hooks can block or redirect unsafe actions
- hooks provide deterministic guarantees when prompts are not enough
Students must be able to:
- normalize heterogeneous formats such as Unix timestamps, ISO 8601 values, and numeric codes
- intercept policy-violating actions and route them into escalation workflows
- explain when hooks are preferable to stronger prompt wording
Task Statement 1.6: Task Decomposition
Students must know:
- when prompt chaining fits better than adaptive decomposition
- how large reviews can be split into per-file and cross-file passes
- why open-ended work benefits from evolving plans
Students must be able to:
- choose decomposition patterns appropriate to the task
- split reviews into local and integration phases
- map unfamiliar codebases and adapt the plan as dependencies are discovered
Task Statement 1.7: Session State, Resumption, and Forking
Students must know:
- named
--resumecontinues specific prior work fork_sessionsupports independent branches from one baseline- stale tool results make naive resumption risky
- resumed work should be told what changed
Students must be able to:
- resume named investigations appropriately
- fork sessions to compare alternatives
- choose between resumption and fresh-start summary injection
- target re-analysis by communicating changed files explicitly
Lecture
Read in depth
17 sectionsFrom First Principles
At the most basic level, an agentic system exists because one model response is often insufficient for real work. Production tasks regularly require:
- retrieving information the model does not already have
- taking actions through tools
- reevaluating decisions after new facts arrive
- splitting work into specialized subproblems
That means the system cannot be designed as a single static prompt. It must behave like a controlled reasoning loop over changing state.
From first principles, Week 1 is about three realities:
- the model needs an explicit control loop
- high-risk steps need structural enforcement, not hopeful wording
- complex work needs deliberate decomposition, not just more tokens
If a student understands those three principles deeply, many Week 1 exam questions become straightforward.
Guided Walkthrough: Building A Refund Agent Correctly
Walkthrough goal:
- understand how loop control, prerequisites, and escalation fit together in one system
Step 1: Start with the user request
Example:
- "I was charged twice for order 12345 and I want my money back."
First-principles question:
- What facts does the model need that it cannot safely infer from the user message alone?
Expected answer:
- verified customer identity
- order ownership
- charge status
- refund eligibility
Step 2: Decide whether the system can rely on prompting alone
Ask:
- If a mistaken refund is costly, should the system merely instruct the model to verify identity first?
Expected answer:
- no, because the workflow needs a deterministic prerequisite
Step 3: Design the loop
The loop should:
- ask Claude for the next step
- inspect
stop_reason - run required tools
- append results
- continue until
end_turn
Step 4: Add enforcement
Before process_refund can run, the application should verify:
- a verified customer ID exists
- the order belongs to that customer
- the refund amount is within policy
Step 5: Add escalation
Escalation is required if:
- the user asks for a human
- policy is ambiguous
- identity remains unresolved
- the refund exceeds the autonomous threshold
Step 6: Consider multi-issue requests
If the user also says:
- "The replacement item was damaged too"
the coordinator should decompose the conversation into at least two issue tracks:
- duplicate charge
- damaged replacement
Those tracks can share verified identity context but still require separate investigation.
Teaching point:
The important lesson is that "agent intelligence" alone is not enough. Reliable systems are built by combining model-guided reasoning with deterministic workflow structure.
Week 1 Worked Pseudo-Architecture
User Request
|
v
Coordinator Loop
|
+--> allowedTools includes "Task"
|
+--> Task -> Search Subagent
| AgentDefinition:
| - description
| - system prompt
| - restricted tools
|
+--> Task -> Analysis Subagent
| AgentDefinition:
| - description
| - system prompt
| - restricted tools
|
+--> get_customer --------------+
| |
| verified customer ID
| |
+--> lookup_order --------------+
| |
| order ownership and status
| |
+--> policy gate / threshold check
| | |
| | +--> escalate_to_human
| |
| +--> process_refund
|
+--> final unified response
Week 1 Board Teaching Notes
- Draw the loop before discussing prompts. Students retain control flow better when they can point to state transitions.
- Ask students which parts are model-guided and which parts are deterministic. Keep pressing until they separate those cleanly.
- When teaching coordinator-subagent architecture, ask "who owns completeness?" The answer should be the coordinator, not the synthesis agent.
- Add a separate board segment for
Task,allowedTools, andAgentDefinition. Students should be able to explain exactly what enables spawning and what shapes each subagent role.
deep dive Deep Dive: Hooks, Enforcement, and Handoff Reliability
Some Week 1 concepts are easy to mention and easy to underteach. Hooks are one of them.
From first principles, a hook is useful when the application needs to alter or inspect a tool interaction at a point where the model itself should not be the only enforcement layer. The guide calls out hook patterns like PostToolUse and outgoing tool-call interception. These matter because they let the application enforce or normalize behavior deterministically.
Deep Dive A: PostToolUse as a Normalization Layer
Suppose three backend tools return dates in different formats:
- Unix timestamps
- ISO 8601 strings
- numeric status codes plus separate reason fields
If the model has to interpret each raw format directly every time, cognitive load rises and inconsistencies spread through the workflow. A PostToolUse hook can normalize those outputs before the model sees them, so the model reasons over a common representation.
Why that matters:
- it reduces accidental format confusion
- it prevents downstream prompt complexity from ballooning
- it keeps the model focused on decision quality rather than data cleanup
Deep Dive B: Outgoing Tool Interception
Imagine an autonomous refund workflow with a hard rule:
- refunds above
$500must go to a human
There are two possible designs:
- prompt Claude to remember the policy
- intercept the refund tool call and block or reroute it
The second is stronger because it guarantees compliance even when the model’s reasoning path varies.
Deep Dive C: Handoff Quality
Handoffs are not just summaries. In a real escalation, the human may not have the conversation transcript. A proper handoff should therefore stand on its own:
- customer or case ID
- issue type
- facts established so far
- root cause or likely root cause
- action already attempted
- recommendation for the next human step
A weak handoff says:
- "Customer upset. Needs help."
A strong handoff says:
- "Customer ID 48291. Duplicate charge confirmed on order 12345. Verified refund amount $84.50. Refund exceeds auto-threshold because second issue involves damaged replacement requiring manual override. Recommended action: review damaged-item exception and approve combined handling."
This level of structure is what exam questions are trying to reward.
Lecture 1.1: The Agentic Loop
Key Distinctions:
- loop control comes from API state, not from reading assistant prose for hints
tool_usemeans continue with tool execution, whileend_turnmeans stop the loop- having tools available is not the same as being instructed to use them
Common Wrong Answers:
- "Continue whenever the reply feels incomplete."
- "Stop once the assistant writes a natural-sounding answer."
- "Use a decision tree instead of inspecting
stop_reason."
What To Memorize:
stop_reasonis the control surface- append tool results and continue only on
tool_use - stop on
end_turn
Try It Yourself:
No single right answer — draft your attempt, then compare against the lecture's worked examples.
- Label three sample turns as
tool_useorend_turn. - Rewrite a brittle prose-parsing loop into a
stop_reason-based loop.
Check Your Understanding:
- Why is assistant wording a weak completion signal?
Show answer
The model's prose is a probabilistic surface, and the same wording can appear when more tool calls are still required. The API'sstop_reasonis the explicit completion signal — it tells the application whether the loop should continue (tool_use) or stop (end_turn). Driving control flow off interpreted text instead of explicit state is brittle by construction. - What must happen after a tool result is returned?
Show answer
The result must be appended to the message history and the loop must continue, sending the updated conversation back to Claude. The model cannot reason over information it has not seen, so a tool that executes but whose output never re-enters context creates a broken half-loop where work is done but never used.
An agentic loop is not "send one prompt and hope for the best." It is a control structure. Claude reasons over the current conversation, decides whether a tool is needed, requests that tool, receives the tool result back in context, then continues reasoning. This repeats until Claude reaches a natural stopping point.
The key control signal is stop_reason. For this exam, the distinction that matters most is:
tool_use: Claude wants one or more tools to run, so your loop should continue.end_turn: Claude is done with the current task and can produce the final answer for that turn.
This matters because many fragile implementations try to infer completion from assistant text. That is weak engineering. A model may say "I’m done" and still require a tool in the next turn if the loop is structured incorrectly. Or the opposite: it may produce text that looks incomplete even though end_turn has occurred. Control flow should follow explicit API signals, not prose interpretation.
Another core rule: tool results must be returned to Claude as part of the conversation history. The model cannot reason over information it has not seen. If a tool call fetches customer data, order details, or document metadata, that result must be injected back into the context for the next iteration. Otherwise the system becomes a broken half-loop where tools execute but the model does not get to use the output.
In production, iteration caps are still useful, but only as a guardrail. They are not the primary completion signal. A cap prevents runaway loops; it should not decide that normal work is done.
Example
Bad logic:
- ask Claude for a response
- if the response text contains "final answer", stop
- otherwise try to parse whether a tool is needed
Better logic:
- send the current conversation to Claude
- inspect
stop_reason - if
tool_use, execute the requested tool calls - append tool results to the conversation
- repeat
- if
end_turn, return the answer
Why this shows up on the exam
The exam likes tradeoff questions where one option is "add stronger prompting" and another is "enforce the control flow programmatically." If a workflow requires guaranteed ordering or deterministic compliance, the right answer is usually structural enforcement, not stronger prose.
📐 See the diagram: stop_reason as control surface.
exercise Guided Exercise 1.1
Write pseudocode for an agent loop that:
- receives a user request
- allows Claude to call tools
- continues while
stop_reason == "tool_use" - ends when
stop_reason == "end_turn"
Sample Answer
messages = [user_message]
while True:
response = call_claude(messages, tools=toolset)
if response.stop_reason == "tool_use":
messages.append(response.assistant_message)
for tool_call in response.tool_calls:
result = run_tool(tool_call)
messages.append(tool_result_message(tool_call.id, result))
continue
if response.stop_reason == "end_turn":
return response.final_text
raise UnexpectedStateError(response.stop_reason)
Lecture 1.2: Deterministic Enforcement vs Prompt Guidance
Key Distinctions:
- prompts guide model behavior, while deterministic controls guarantee compliance
- risky ordered workflows need gates and interception, not stronger wording
- policy enforcement belongs in system structure, not just in prompts
Common Wrong Answers:
- "Add more examples and the ordering issue will disappear."
- "Use stronger cautionary wording for mandatory business rules."
- "Trust the model if it usually follows the policy."
What To Memorize:
- deterministic gates beat probabilistic compliance
- hooks can normalize or block behavior
- structural enforcement is for costly failure modes
Try It Yourself:
No single right answer — draft your attempt, then compare against the lecture's worked examples.
- Classify three workflow controls as prompt-based or deterministic.
- Rewrite one risky prompt rule as a prerequisite gate.
Check Your Understanding:
- When is stronger prompting still insufficient?
Show answer
Whenever the failure mode is costly enough that probabilistic compliance is unacceptable — identity verification before financial actions, threshold-bound approvals, irreversible operations, legally mandated steps. Stronger wording reduces the rate of mistakes but does not eliminate them, and a system that "usually" enforces a critical rule has not enforced it. - What problem does an outgoing hook solve better than a prompt?
Show answer
It guarantees enforcement at the point of action. A prompt asks the model to remember and apply a rule; an outgoing hook intercepts the tool call itself and can block, rewrite, or redirect it regardless of the model's reasoning path. That guarantee is what makes hooks the right answer for high-cost policy breaches.
Not every workflow should be left entirely to model judgment. This exam expects you to know when probabilistic behavior is acceptable and when it is not.
Prompt guidance is useful for:
- prioritizing one reasonable tool over another
- giving escalation criteria
- describing quality standards
- nudging the model toward better decomposition
Prompt guidance is not enough for:
- identity verification before financial actions
- policy thresholds that must never be exceeded
- steps that are legally or operationally mandatory
- actions that can cause irreversible damage
If a support agent must never issue a refund above a threshold without human review, the correct fix is not "remind the model more strongly." The correct fix is to intercept or block the tool call programmatically. The same principle applies to prerequisite gates. If get_customer must happen before process_refund, enforce the dependency.
This is one of the highest-value distinctions in the exam.
exercise Guided Exercise 1.2
A support system sometimes processes refunds before identity verification. Choose the better fix and explain why:
- Add three more few-shot examples showing identity verification first.
- Block refund tools until verification returns a valid customer ID.
Sample Answer
The second fix is better. The first is still probabilistic and can fail on edge cases. The second gives a deterministic guarantee for a business-critical prerequisite.
Lecture 1.3: Coordinator-Subagent Architecture
Key Distinctions:
- the coordinator owns decomposition, routing, aggregation, and recovery
- subagents do bounded specialist work rather than global orchestration
- complete-looking synthesis can still hide upstream decomposition failure
Common Wrong Answers:
- "If subagents are strong enough, the coordinator does not matter much."
- "Coverage quality is mainly a synthesis problem."
- "Direct subagent-to-subagent traffic is preferable because it is faster."
What To Memorize:
- hub-and-spoke is the core pattern
- coordinator owns completeness
- subagent isolation is a feature, not a flaw
Try It Yourself:
No single right answer — draft your attempt, then compare against the lecture's worked examples.
- Diagnose whether a failure belongs to the coordinator or a subagent.
- Split one broad task into coordinator-owned and subagent-owned responsibilities.
Check Your Understanding:
- Who owns completeness in a multi-agent design?
Show answer
The coordinator. Subagents are responsible for the bounded slices they are assigned, but the question of whether the slices add up to a complete answer is a decomposition question — it lives at the layer that decided how to partition the work and which subagents to invoke. - Why can a polished report still indicate coordinator failure?
Show answer
Because synthesis quality and coverage quality are independent. A synthesis subagent given a narrow set of findings can produce a fluent, well-structured report on those findings while the broader topic remains under-covered. A polished output is evidence that the synthesis layer worked; it is not evidence that the coordinator decomposed correctly.
A multi-agent system is not just "many agents." It needs a coordination model. The exam focuses on the coordinator-subagent pattern, especially hub-and-spoke designs.
In this pattern:
- the coordinator receives the top-level task
- it decomposes the work
- it decides which subagents to invoke
- it routes information between them
- it handles recovery and aggregation
- it owns the final answer
Subagents do not automatically inherit the coordinator’s context. This is another trap the exam uses repeatedly. If the synthesis agent needs the findings from the web-search and document-analysis agents, those findings must be explicitly passed into its prompt or its structured inputs.
The coordinator should also avoid overly narrow decomposition. A common failure mode is when the coordinator breaks a broad problem into only one slice of the topic. If the task is "AI impact on creative industries" and the coordinator decomposes only into visual-art subtasks, the subagents may perform perfectly and still produce an incomplete report. In that case the subagents are not the problem; decomposition is.
Lecture 1.4: Subagent Invocation, `Task`, and `AgentDefinition`
Key Distinctions:
- spawning depends on
Task, not on vague multi-agent prompting - available delegation requires
allowedToolsto include"Task" - subagents need explicit context because they do not inherit parent memory automatically
Common Wrong Answers:
- "Subagents can infer the parent context from the session."
- "Good system prompts make
Taskconfiguration unnecessary." - "Agent roles matter more than tool restrictions."
What To Memorize:
Taskis the spawning mechanismAgentDefinitionshould include description, system prompt, and tool restrictions- forked sessions support divergent analysis from a shared baseline
Try It Yourself:
No single right answer — draft your attempt, then compare against the lecture's worked examples.
- Identify why a coordinator cannot spawn when
Taskis missing. - Rewrite a weak handoff to include explicit context and quality criteria.
Check Your Understanding:
- Why do subagents not automatically inherit parent context?
Show answer
Each subagent runs as its own model invocation with its own message list; there is no implicit shared memory acrossTaskcalls. The architecture treats subagents as isolated workers, which is a feature — it forces the coordinator to be explicit about what each subagent needs and prevents leak-through of irrelevant or sensitive context. - What belongs inside an
AgentDefinition?Show answer
A description of the role, the system prompt that shapes the subagent's behavior, and the tool restrictions that scope what it can do. Together those three configure the subagent as a specialist; leaving any of them generic weakens specialization and increases the chance of tool misuse or off-task work.
This lesson covers a mechanism that is explicitly named in the source guide and is important enough that students should be able to state it precisely.
Subagents are not invoked abstractly. In the architecture described by the guide, the coordinator uses the Task tool to spawn subagents. That means delegation depends on actual tool availability, not just good prompt wording. If a coordinator is expected to invoke subagents, its allowedTools must include "Task".
That gives us a concrete exam distinction:
- describing delegation in the prompt is not the same as enabling delegation in the system
- the coordinator can only spawn subagents if the spawning mechanism is actually allowed
This matters because many wrong answers on architecture questions sound plausible at the prompt layer while the real failure is at the configuration layer.
The second key concept is explicit context passing. Subagents do not automatically inherit the parent’s full history or shared memory across invocations. If the coordinator wants a synthesis subagent to use the findings from a web-search subagent and a document-analysis subagent, it must pass those findings explicitly.
Weak handoff:
- "Use what the previous agents found and produce a report."
Strong handoff:
- pass the actual claims, evidence excerpts, source URLs, dates, and document identifiers needed for synthesis
The third concept is AgentDefinition. A subagent should be configured intentionally rather than treated as a generic secondary model invocation. The guide explicitly calls out:
- description
- system prompt
- tool restrictions
Those settings define the role. A web-search agent, document-analysis agent, and synthesis agent should not all share the same instruction surface or tool access. If they do, specialization weakens and tool misuse becomes more likely.
The fourth concept is fork-based session management. Forking is useful when you want to branch from a shared analysis baseline into multiple possible approaches without contaminating the original line of reasoning. This is especially useful for:
- comparing two migration plans
- testing multiple investigation strategies
- exploring alternative synthesis structures from the same evidence base
Forking is not the same as ordinary resumption. Resumption continues one path. Forking creates multiple paths from a shared starting point.
Minimal Operational Checklist
For a coordinator to spawn subagents correctly:
- the coordinator must have access to the
Tasktool allowedToolsmust include"Task"- each subagent should have a clear
AgentDefinition - the coordinator should pass context explicitly
- the coordinator should scope each subagent’s tool access to its role
Failure Mode Example 1
A team writes a detailed coordinator prompt that says:
- "Delegate to specialized subagents when useful."
But no subagents are ever invoked.
The likely problem is not prompt wording. The likely problem is that the coordinator does not actually have access to Task, or allowedTools does not include "Task".
Failure Mode Example 2
A synthesis agent produces weak output and misses citations. The team blames the synthesis agent’s prompt.
The deeper issue may be that the coordinator handed off only a vague prose summary instead of explicit structured findings with provenance fields.
exercise Guided Exercise 1.3
A coordinator is supposed to spawn subagents, but this never happens in practice. What are the first three things you should verify?
Sample Answer
- Verify that the coordinator has access to the
Tasktool. - Verify that
allowedToolsincludes"Task". - Verify that the subagents are actually defined with usable role configuration and that the coordinator prompt can choose delegation.
exercise Guided Exercise 1.4
Why is this handoff weak?
- "Use the previous agents' findings and produce a final report."
Sample Answer
It assumes implicit inheritance and does not pass the actual information needed for synthesis. A stronger handoff would explicitly include the findings, source metadata, and evidence needed by the downstream subagent.
Lecture 1.5: Decomposition Strategies
Key Distinctions:
- prompt chaining fits fixed ordered workflows, while adaptive decomposition fits open-ended work
- decomposition quality determines coverage quality
- broad tasks need evolving plans rather than one static breakdown
Common Wrong Answers:
- "Always use prompt chains because they are simpler."
- "Adding more subagents automatically improves coverage."
- "Planning quality matters less than final synthesis quality."
What To Memorize:
- choose decomposition pattern by task shape
- use adaptive decomposition for uncertain or broad tasks
- split local versus integration review concerns
Try It Yourself:
No single right answer — draft your attempt, then compare against the lecture's worked examples.
- Choose prompt chaining or adaptive decomposition for three scenarios.
- Break one large review into local and cross-file passes.
Check Your Understanding:
- What kind of task is a poor fit for prompt chaining?
Show answer
Open-ended or exploratory work where the next step depends on what was discovered in the previous one. A fixed chain commits to a sequence in advance; if the early steps surface a finding that should redirect the investigation, the chain has no way to incorporate it. Adaptive decomposition is the right pattern for that shape of work. - Why does decomposition quality affect coverage quality?
Show answer
Coverage is bounded by what the decomposition asked for. If the coordinator partitions a broad topic into a narrow slice, the subagents can execute that slice perfectly and the final answer will still be incomplete. Improving the synthesis layer cannot recover information that was never gathered, which is why coverage failures usually trace back to scoping decisions made upstream.
The course guide distinguishes between two useful patterns:
- prompt chaining for predictable multi-step work
- adaptive decomposition for open-ended investigation
Prompt chaining works well when the workflow is known in advance. For example:
- analyze each file individually
- summarize file-level findings
- run a cross-file integration pass
Adaptive decomposition works better for open-ended work where the next step depends on what is discovered. For example:
- map the codebase
- identify high-risk modules
- inspect dependencies
- revise the plan after new findings emerge
The exam may ask which pattern fits a scenario. The right answer depends on predictability. If the work has known stages, prompt chaining is usually correct. If the work is exploratory and branching, adaptive decomposition is stronger.
📐 See the diagram: Prompt chain vs adaptive decomposition.
Lecture 1.6: Context Passing and Parallelism
Key Distinctions:
- explicit handoff beats assumed shared memory
- parallelism helps only when subtasks are independently scoped
- quality criteria should be passed with the task, not left implicit
Common Wrong Answers:
- "Spawn parallel agents first and clarify context later."
- "Metadata and content can be mixed loosely in handoffs."
- "Parallelization always improves quality."
What To Memorize:
- pass findings explicitly
- separate content from routing metadata
- parallelize only when subtasks are clearly separable
Try It Yourself:
No single right answer — draft your attempt, then compare against the lecture's worked examples.
- Rewrite a vague handoff into a complete subagent prompt.
- Decide whether two subtasks should run sequentially or in parallel.
Check Your Understanding:
- Why is assumed shared memory dangerous?
Show answer
Subagents only see what is passed to them, but a handoff written as if they already knew the context will produce silent gaps. The downstream subagent fills in plausible defaults, the synthesis layer treats those defaults as findings, and provenance is lost. Explicit handoff prevents the failure by making the unknowns visible. - What belongs in a high-quality handoff?
Show answer
The actual content the downstream subagent needs (claims, excerpts, source URLs, dates, document identifiers) separated from routing metadata, plus the explicit success criteria for the work being handed off. Vague summaries collapse content and metadata together and force the downstream agent to guess at structure.
Subagents need explicit context. That context should often be structured, not free-form. A strong design passes content and metadata separately, for example:
- claim
- supporting excerpt
- source URL
- publication date
- document name
That separation matters because the downstream agent must preserve provenance. If you only pass a flattened summary, the synthesis layer may lose attribution.
Parallelism also matters. If a coordinator can invoke multiple Task calls in one response, latency can be reduced significantly. But parallelization should not create duplicated work. Scope each subagent carefully:
- by subtopic
- by source type
- by question type
Lecture 1.7: Sessions, Resumption, and Forking
Key Distinctions:
- resumption continues prior work, while forking explores alternatives from a shared baseline
- stale tool outputs make naive resumption risky
- changed files or facts should be communicated explicitly on resume
Common Wrong Answers:
- "A resumed session automatically knows what changed."
- "Forking is only for experimentation, not for disciplined comparison."
- "Fresh restarts are always safer than targeted resumption."
What To Memorize:
- use named resume deliberately
- use forks for divergent approaches
- stale state is the main resumption risk
Try It Yourself:
No single right answer — draft your attempt, then compare against the lecture's worked examples.
- Choose resume, fork, or fresh start for three change scenarios.
- List what changed information should be passed into a resumed session.
Check Your Understanding:
- What is the main risk in naive session resumption?
Show answer
Stale state. The resumed session still trusts tool outputs and analysis from the original run, but the underlying world — files, data, system state — may have changed. Acting on stale evidence as if it were current is a quiet failure that produces confidently wrong work. Either tell the resumed session what changed or start fresh with a structured summary. - When is forking better than resuming?
Show answer
When you want multiple independent paths from a shared baseline — comparing two refactoring approaches, exploring alternative synthesis structures, isolating a verbose workflow from the main conversation. Resumption continues one line; forking creates parallel lines that can be evaluated against each other without contaminating the original.
The exam guide expects you to understand session state at a practical level:
- named resumption continues a prior investigation when the context is still mostly valid
- forking creates independent branches from a shared baseline
- fresh starts with injected summaries are better when old tool outputs have become stale
This is an engineering judgment issue. Resuming a session that analyzed old code and then blindly trusting that analysis after major changes is weak. In that case, either tell the resumed session what changed or start fresh with a structured summary.
Lecture 1.8: Independent Review — Why a Generator Should Not Grade Itself
Key Distinctions:
- a same-session reviewer inherits the generator's reasoning trail and tends to ratify it
- independence comes from a fresh context, not from a different system prompt in the same session
- "self-critique" prompts produce confidence calibration, not real review
Common Wrong Answers:
- "Add a 'now critically review your previous answer' prompt to the same session."
- "A more skeptical system prompt is enough to make a reviewer independent."
- "If the generator is strong enough, an independent reviewer is unnecessary overhead."
What To Memorize:
- the reviewer must not see the generator's chain of reasoning before forming its own opinion
- spawn the reviewer as a separate
Taskwith only the artifact and the criteria - a forked session counts as independent only if the fork point precedes the generator's reasoning
Try It Yourself:
No single right answer — draft your attempt, then compare against the lecture's worked examples.
- Sketch the message list a coordinator would pass to an independent code reviewer for a generated patch. Strip everything not needed.
- A team adds "Please check your work carefully and disagree if needed" to its synthesis prompt and reports better quality. Critique that intervention.
Check Your Understanding:
- Why does adding a self-critique step to the same session usually fail to catch the generator's mistakes?
Show answer
The model already committed to a reasoning path and the messages preserving that commitment are still in context. A self-critique prompt is steered by the same evidence and the same prior conclusions, so the model tends to defend the answer rather than re-evaluate it. Independence requires a context that does not include the prior reasoning trail. - A coordinator forks a session to run a reviewer subagent. Is that automatically independent?
Show answer
Not by itself. A fork inherits the baseline messages; if the generator's reasoning was already in that baseline, the reviewer sees it and can be primed by it. Independence requires either a fresh session seeded only with the artifact and criteria, or a fork from a baseline cut before the generator produced its output.
A reviewer needs to disagree with the generator. That sounds like a prompting problem, but it is mostly a context problem. When the same session that produced an answer is asked to grade it, the answer's reasoning chain is still in the model's view. The model has already justified the answer, and most of the messages that follow will continue along the same line. A "critically review your work" prompt arrives at the worst possible moment — after commitment, against the grain of the prior text, and without any new evidence to anchor a different conclusion.
The failure mode is quiet. The reviewer produces a plausible critique that catches surface issues — typos, formatting, the kind of thing the generator was already going to fix on a re-read — and ratifies the substantive decisions. Stakeholders see "review passed" and trust it. The structural mistakes that the reviewer would have caught with a clean view of just the artifact survive into production. This is the pattern behind exam questions that ask why same-session self-review is weaker than independent review: it is not a quality of the prompt, it is a property of the conversation.
The intervention is to construct a context that does not contain the generator's reasoning. The cleanest version is a separate Task-spawned subagent whose prompt contains only the artifact under review (the patch, the report, the plan), the explicit criteria, and any reference material. No transcript, no draft history, no "the previous agent thought X." If a fork is used, fork from a point before the generator started, and pass only the artifact across. The reviewer then produces an opinion against the artifact, not against the prior model's defense of it.
Caveat: independent review is not free. The reviewer pays the full context cost again, and you lose any context-sensitive judgment the generator was able to apply. For low-stakes work — a draft email, a one-off summary — same-session re-reads are fine. The independence rule applies when a wrong answer is expensive enough that ratification by the same reasoning chain would be a real failure mode.
📐 See the diagram: Independent review — what the reviewer sees.
Lecture 1.9: Subagent Failure Modes — Partial Results, Timeouts, and Re-delegation
Key Distinctions:
- a subagent that times out is not a subagent that returned nothing
- "the call failed" is not enough information for the coordinator to recover safely
- gap detection during synthesis is the coordinator's job, not the synthesis subagent's
Common Wrong Answers:
- "If a subagent times out, drop the partial results and rerun the whole task."
- "Surface a generic 'something went wrong' to the user and stop."
- "If synthesis looks complete, no follow-up delegation is needed."
What To Memorize:
- preserve partial results from a failed subagent; the coordinator decides whether they are usable
- a structured failure record carries: what was attempted, what completed, what failed, and why
- when synthesis surfaces a coverage gap, the coordinator re-delegates the gap, not the entire task
Try It Yourself:
No single right answer — draft your attempt, then compare against the lecture's worked examples.
- A research subagent retrieves three sources, then times out on the fourth. Write the structured failure context the subagent should hand back to the coordinator.
- A synthesis pass produces a confident-sounding report but the coordinator notices one of the requested subtopics is missing. Outline the re-delegation step rather than restarting.
Check Your Understanding:
- Why is "rerun the whole subagent task" the wrong default response to a partial-results timeout?
Show answer
The completed work is real evidence, and discarding it costs another full execution and another chance to time out. The right default is to surface the partial results plus a structured failure record, then let the coordinator decide whether to fill the gap with a narrowly scoped follow-up call instead of repeating the whole task. - The coordinator detects a gap during synthesis — one requested subtopic was never covered. What is the correct response?
Show answer
Re-delegate the specific gap. Spawn a focused subagent with the missing subtopic and the constraints needed to cover it, then re-synthesize. Restarting the full investigation wastes work, and silently shipping the incomplete report misrepresents what is known.
Subagent failures are rarely binary. A search subagent fetches three of four sources before its time budget runs out. A document-analysis subagent extracts most of a long PDF before hitting an exception on a malformed page. A coordinator-spawned tool call returns a permission error after one valid result. In every case, real work was done. The naive recovery — discard, retry — throws away evidence that is more reliable than anything a second attempt is likely to produce, and often hits the same boundary the second time.
The failure mode is two-sided. On one side, callers swallow the error and present partial results as if the run completed; the synthesis layer reports confidently on incomplete evidence and the user does not know the difference. On the other side, callers raise a generic "operation failed," drop the partial work, and force a full rerun. Both sides are wrong because they collapse three separate facts — what was attempted, what completed, what failed — into one signal.
The intervention is a structured failure context. When a subagent cannot finish, it returns the work it did complete, an explicit statement of what was not attempted or not finished, and the reason (timeout, validation error, permission denial, upstream 5xx). The coordinator now has enough to choose: synthesize on what is available with explicit gaps, re-delegate the unfinished portion to a narrower subagent with a fresh budget, or escalate. The same logic applies when synthesis itself surfaces a gap that the original decomposition missed — re-delegate the gap with focused scope, do not restart the whole investigation.
Caveat: structured failure context only helps if the coordinator actually inspects it. A coordinator that handles every failure with the same retry-or-give-up policy gains nothing from richer error data. The architectural commitment is upstream: error envelopes that the coordinator's synthesis logic is built to read.
Lecture 1.10: Handoff Quality and Human Escalation
Key Distinctions:
- a handoff is a self-contained brief, not a transcript reference
- explicit human requests are escalation triggers and should not be re-evaluated for "complexity"
- escalation criteria belong to the system, not to the model's discretion alone
Common Wrong Answers:
- "If the user asks for a human but the issue looks easy, the agent should keep trying first."
- "A short status update like 'customer needs help with refund' is enough for a human to take over."
- "Escalation is a fallback for when the agent gets stuck, not a normal control path."
What To Memorize:
- a strong handoff includes case identifier, issue type, established facts, root cause hypothesis, actions attempted, and a recommended next step
- an explicit user request for a human is honored immediately, regardless of perceived issue difficulty
- escalation criteria — policy thresholds, identity gaps, explicit requests — are deterministic triggers, not nudges
Try It Yourself:
No single right answer — draft your attempt, then compare against the lecture's worked examples.
- Take a weak handoff like "Customer upset, refund issue" and rewrite it for a human who has no transcript access.
- A user with a $20 billing question writes "I want to talk to a person." Decide whether the agent should escalate immediately and justify the choice.
Check Your Understanding:
- Why does an agent honor an explicit human-request even when the underlying issue looks simple?
Show answer
The user has stated a preference about how the issue should be handled, and that preference is itself the request. Re-evaluating it against the agent's own difficulty estimate substitutes the agent's judgment for the user's, which both delays resolution and damages trust. Honoring the request is the correct default. - What turns a status update into a usable handoff?
Show answer
Self-containment. A handoff is read by a human who likely cannot scroll the transcript, so it must carry the case identifier, what is established, what was attempted, and what the next human step should be. A status update like "customer needs help" describes the situation; a handoff describes what the human needs to do.
Two patterns recur in escalation questions. The first is the explicit human request. A user types "I want a human" or "transfer me to a person" or "stop, I want to talk to someone real." The user's words are the escalation trigger, full stop. An agent that answers "I can help with that — what is your order number?" or that runs through a complexity check first is overriding a stated preference, and the exam treats this as a clear miss. The same reasoning applies to ambiguous-but-emphatic frustration when paired with policy-sensitive operations: route, do not improvise.
The second pattern is the handoff itself. Escalation without a usable handoff is just abandonment. The default failure mode is a one-line status — "customer upset, needs refund help" — that forces the human to read the entire transcript before they can act, and most escalation surfaces do not show the transcript anyway. The result is wait time, repeated questions to the user, and a worse experience than if the agent had stayed with the issue. A strong handoff stands on its own: case identifier, issue type, facts established (verified customer ID, order ID, charge status), root cause if known, actions attempted by the agent, and a specific recommended next step for the human.
The intervention is a templated escalation tool, not a free-form prompt. The escalation tool's schema requires the structured fields, and a hook can validate the handoff before the escalation actually fires. That makes the escalation deterministic both at the trigger (explicit request, threshold breach, identity gap) and at the message (validated structure, no fields left blank). Prompting alone cannot guarantee either side.
Caveat: there is a real cost to over-escalation, especially for systems where humans are scarce and slow. The deterministic triggers should be calibrated — the explicit-request rule is unconditional, but the threshold and identity-gap rules should be set with the operational team that will absorb the volume. Honoring "I want a human" is non-negotiable; defining "policy threshold" requires a real number.
Drill
Memorize & spot misconceptions
4 sectionsFlashcards Core Vocabulary 19 terms
Click a card to flip it. Keyboard: space toggles focused card.
Common misconceptions Common Misconceptions
-
“If the model says it is done, the loop should end.”
Wrong because explicit API state is the correct control surface.
-
“More examples can replace business-rule enforcement.”
Wrong because probabilistic compliance is not enough for mandatory constraints.
-
“Subagents know what the coordinator knows.”
Wrong because context must be passed explicitly.
-
“If the final answer is coherent, the decomposition must have been good.”
Wrong because coherent outputs can still be incomplete.
-
“Same-session self-review catches the generator's mistakes.”
Wrong because the reviewer inherits the generator's reasoning trail and tends to ratify it; independence requires a context that does not contain the prior chain.
-
“If a subagent fails, the work it already completed should be discarded.”
Wrong because partial results are real evidence; the coordinator's job is to decide whether they are usable, not to throw them away.
-
“If the user asks for a human but the issue looks easy, the agent should still try to resolve it first.”
Wrong because an explicit human request is itself the escalation trigger, and overriding it substitutes the agent's judgment for the user's stated preference.
-
“Forking is only useful when comparing alternative paths.”
Wrong because forking also isolates verbose or exploratory work from the main conversation, keeping the primary context clean.
-
“Always invoking every subagent guarantees coverage.”
Wrong because routing is itself a coverage decision; invoking everything wastes effort and produces noisy synthesis without addressing query-specific needs.
Key distinctions Key Distinctions
-
tool_usevsend_turncontinue the loop only when the API state requires tool execution, not when prose merely sounds unfinished.
- prompt guidance vs deterministic control
use prompts for judgment and routing, but use gates, hooks, and interception for mandatory policy or ordering constraints.
- coordinator failure vs subagent failure
incomplete coverage often starts in decomposition, even when each subagent executes its assigned task well.
- prompt chaining vs adaptive decomposition
fixed chains fit stable workflows, while broad or uncertain tasks need evolving decomposition.
- context presence vs context inheritance
subagents use only what the coordinator explicitly passes, not what the parent session "already knows."
- same-session self-review vs independent review
a reviewer in the same conversation inherits the generator's reasoning chain; independence requires a context that does not contain it.
- partial-result preservation vs generic failure surfacing
a structured failure record carries what completed and what failed; "operation failed" is the wrong abstraction.
- escalation trigger vs model discretion
explicit human requests, threshold breaches, and identity gaps are deterministic triggers — not optional nudges the model can override.
- structured state export vs session resumption
long-running workflows recover from explicit state manifests, not from trusting that a resumed session still understands the world.
Don't say this Common Wrong Answers
- "Add more prompting so the model remembers to verify identity first."
- "If the report reads well, the orchestration must be correct."
- "Invoke all subagents every time to guarantee coverage."
- "End the workflow when the assistant sounds finished."
- "Assume subagents can infer missing context from the larger conversation."
- "Add 'now critically review your answer' to the same session."
- "If a subagent times out, drop the partial results and rerun."
- "Escalate only after the agent has tried everything else."
- "Forking is only for exploring alternative paths, not for keeping the main conversation clean."
Lab
Practice
2 sectionscase study Worked Case Study
Case:
A returns assistant performs well on simple requests but occasionally refunds the wrong account after matching a customer by name only.
Analysis:
- The primary failure is not "lack of examples."
- The critical issue is that identity verification is not enforced before order or refund operations.
- A secondary risk is that the agent may be using ambiguous lookup inputs without requiring a unique identifier.
Best redesign:
- require
get_customerto return a verified customer ID beforelookup_orderorprocess_refund - ask for clarification when multiple customer matches exist
- preserve customer ID and order ID in a structured facts block
- escalate when policy or identity remains unresolved
lab Lab
Design a customer support resolution agent that handles returns, disputes, and account issues.
Requirements:
- tools:
get_customer,lookup_order,process_refund,escalate_to_human - refunds require prior identity verification
- multi-issue requests should be decomposed
- escalations must include customer ID, root cause, refund amount if relevant, and recommended action
What a strong design includes
- loop control based on
stop_reason - programmatic prerequisite gate before refund
- decomposition of multi-concern requests into separate tracks
- structured escalation summary for humans who do not have the full conversation transcript
Quiz
Test yourself
2 sectionsQuiz
- What is the strongest signal that an agentic loop should continue?
A. Assistant text looks incomplete
B.
stop_reason == "tool_use"C. There are tools available D. The system prompt requests another pass
Answer: B
- Why is checking assistant prose for completion weak? A. It is expensive B. It prevents tool use C. It relies on natural-language interpretation instead of explicit API state D. It only works for JSON
Answer: C
- Which is true of subagents? A. They inherit parent context automatically B. They require explicit context injection C. They cannot run in parallel D. They do not need tool restrictions
Answer: B
- What is the best first response when a critical workflow step must always happen before another? A. Add more examples B. Enforce the prerequisite programmatically C. Raise the context window D. Use sentiment analysis
Answer: B
- If a broad topic is consistently under-covered, what is the most likely root cause? A. The synthesis agent is too slow B. The coordinator decomposed the task too narrowly C. The web agent needs more tokens D. The user prompt is too short
Answer: B
- A coordinator is expected to spawn subagents but never does. Which is the best first thing to verify?
A. The context window is large enough
B. The coordinator has
Taskavailable andallowedToolsincludes"Task"C. The synthesis agent has more examples D. The final answer prompt is more explicit
Answer: B
Week 1 Quiz Explanations
Bis correct because loop progression should follow explicit API state.AandDare indirect signals.Csays nothing about whether the model requested tool execution.Cis correct because prose interpretation is probabilistic and brittle.Ais not the main issue.Bis false.Dis unrelated.Bis correct because subagents require explicit context passing.AandDare incorrect assumptions.Cis false because parallel spawning is explicitly supported.Bis correct because critical ordering constraints require deterministic enforcement.Astill leaves failure probability.Cis irrelevant.Dsolves the wrong problem.Bis correct because incomplete coverage often begins with narrow decomposition by the coordinator.A,C, andDare downstream or weaker explanations.Bis correct because subagent spawning depends on the actual mechanism being available.Ais unrelated.CandDaddress prompt quality rather than enabling delegation.
Test
Short Answer
- Explain the difference between
tool_useandend_turn. - When should a system choose adaptive decomposition instead of prompt chaining?
- Why is structured escalation data important for human handoff?
Scenario Question
Your multi-agent research system produces well-written but incomplete reports. Logs show the coordinator always invokes all subagents, but on broad topics it assigns overly narrow subtasks. What is the architectural fix?
Sample Answer
The problem is coordinator decomposition, not downstream execution quality. The coordinator should inspect query breadth, partition the scope more comprehensively, and use iterative gap checking before final synthesis. It should invoke only the necessary subagents and should re-delegate targeted follow-up tasks when coverage gaps are detected.
Week 1 Test Rubric
- Full credit: explains
stop_reasoncorrectly, identifies coordinator decomposition as the root issue, and proposes explicit gap detection or re-delegation. - Partial credit: identifies the right component but proposes only vague prompt improvements.
- Low credit: blames synthesis quality or recommends adding more tools without fixing decomposition.