Agentic · 27%
32 sections · 19 concepts · 9 misconceptions · 6 exercises

Overview

Orient

7 sections

objective Objective

Understand how agentic systems actually work turn by turn, where deterministic controls belong, and how multi-agent systems should be decomposed and coordinated.

chapter map Chapter Map

Estimated study time:

  • Reading: 2.5 to 3.5 hours
  • Whiteboarding and drills: 2 hours
  • Lab design work: 2 to 3 hours
  • Quiz and review: 1 to 1.5 hours

Sub-lessons:

  1. Why agentic systems need loops instead of one-shot prompting
  2. Why explicit control signals beat natural-language guesses
  3. When deterministic enforcement is mandatory
  4. How coordinators and subagents divide responsibility
  5. How subagent invocation and spawning actually work
  6. How decomposition quality shapes coverage quality
  7. How session state, resumption, and forking affect reliability

Suggested 5-Day Teaching Flow

Day 1:

  • teach the agentic loop from first principles
  • diagram a tool-using request across multiple turns
  • compare correct loop control with brittle text-parsing approaches

Day 2:

  • teach deterministic enforcement versus prompt guidance
  • workshop business-critical prerequisite examples
  • introduce coordinator-subagent orchestration

Day 3:

  • teach Task-based subagent invocation and configuration
  • run a guided architecture walkthrough for support and research scenarios
  • practice explicit context passing between subagents
  • compare prompt chaining and adaptive decomposition

Day 4:

  • hold a whiteboard session on decomposition failures
  • review common misconceptions
  • run the weekly quiz and discussion review

Day 5:

  • run the week test under time pressure
  • debrief why wrong answers solve the wrong system layer

End-of-Lecture Recap and Homework

Lecture 1.1 Recap Questions

  1. Why is stop_reason the real control surface for the loop?
  2. What breaks if tool results are not appended back into the conversation?
  3. Why is an iteration cap a safeguard instead of the main stopping rule?

Homework:

  • Write agent-loop pseudocode for one support scenario and one developer-tooling scenario.
  • Mark where tool_use and end_turn appear in each flow.

Lecture 1.2 Recap Questions

  1. When is prompt guidance acceptable?
  2. When is deterministic enforcement required?
  3. What makes a workflow "business critical" from an enforcement perspective?

Homework:

  • Take three workflows and classify each control as prompt-based or deterministic.
  • Justify each classification in two sentences.

Lecture 1.3 Recap Questions

  1. What does the coordinator own that subagents do not?
  2. Why can downstream agents perform well and still produce incomplete outputs?
  3. Why is automatic context inheritance a bad assumption?

Homework:

  • Design one coordinator prompt for a research system and one for a support system.
  • Make both prompts goal-driven rather than step-by-step.

Lecture 1.4 Recap Questions

  1. What mechanism actually spawns subagents in the guide’s architecture?
  2. Why must allowedTools include "Task"?
  3. What belongs in an AgentDefinition?
  4. Why is fork-based session management different from ordinary resumption?

Homework:

  • Write a coordinator checklist for spawning subagents safely.
  • Define two subagent roles, each with a description, a system prompt goal, and a restricted tool set.

Lecture 1.5 Recap Questions

  1. What kind of task fits prompt chaining best?
  2. What kind of task fits adaptive decomposition best?
  3. Why does narrow decomposition create hidden coverage risk?

Homework:

  • Break one code-review workflow into prompt-chained steps.
  • Break one open-ended investigation task into adaptive steps.

Lecture 1.6 Recap Questions

  1. Why should subagent context often be structured rather than free-form?
  2. When does parallelism help?
  3. What is the main risk of parallelism?

Homework:

  • Create a structured handoff format for claim, source, excerpt, and date.
  • Define one case where subagents should run sequentially instead.

Lecture 1.7 Recap Questions

  1. When is session resumption appropriate?
  2. When is starting fresh with a structured summary safer?
  3. What is a good use of forking?

Homework:

  • Describe one stale-session failure mode and how to prevent it.
  • Sketch a fork-based comparison between two refactoring approaches.

lecture summary Lecture Summary

By the end of Week 1, the student should understand that agentic systems are controlled workflows, not loose prompt chains. The most important rules are to drive loop control from stop_reason, to enforce critical prerequisites structurally when failure is costly, to understand that subagent spawning depends on the Task mechanism and correct allowedTools configuration, and to treat the coordinator as the owner of decomposition quality. If a multi-agent system produces incomplete work, the failure is often upstream in scope design rather than downstream in execution.

Memorize What To Memorize 0 / 10

addendum Task Statement Coverage Addendum

Use this section as the explicit checklist that Week 1 must cover.

Task Statement 1.1: Agentic Loops

Students must know:

  • the full loop lifecycle: request, inspect stop_reason, run tools, append results, continue
  • the difference between model-guided tool selection and hard-coded decision trees
  • why parsing assistant prose for completion is an anti-pattern

Students must be able to:

  • continue when stop_reason is tool_use
  • stop when stop_reason is end_turn
  • append tool results correctly between iterations

Task Statement 1.2: Coordinator-Subagent Orchestration

Students must know:

  • hub-and-spoke coordination patterns
  • coordinator ownership of routing, aggregation, and recovery
  • isolated subagent context
  • why narrow coordinator decomposition causes incomplete coverage

Students must be able to:

  • choose subagents dynamically based on task needs
  • partition scope to reduce overlap
  • re-delegate when synthesis reveals gaps
  • keep inter-subagent traffic routed through the coordinator

Task Statement 1.3: Subagent Invocation, Context Passing, and Spawning

Students must know:

  • Task is the spawning mechanism
  • allowedTools must include "Task" for the coordinator to delegate
  • subagents do not automatically inherit parent state
  • AgentDefinition should include a role description, system prompt, and tool restrictions
  • fork-based sessions support divergent exploration from a shared baseline

Students must be able to:

  • pass complete prior findings directly into downstream subagent prompts
  • separate content from metadata in handoffs
  • spawn parallel subagents in a single coordinator response
  • write coordinator prompts that specify goals and quality criteria instead of brittle step lists

Task Statement 1.4: Multi-Step Workflows, Enforcement, and Handoff

Students must know:

  • the difference between prompt guidance and programmatic enforcement
  • why deterministic compliance matters for risky ordered workflows
  • what a structured handoff must include for mid-process escalation

Students must be able to:

  • implement prerequisite gates before downstream tool calls
  • decompose multi-concern requests and investigate them in parallel where appropriate
  • compile human-handoff summaries with customer details, root cause, amounts, and recommended action

Task Statement 1.5: Hooks and Interception

Students must know:

  • post-tool hooks can normalize data before the model sees it
  • outgoing tool-call hooks can block or redirect unsafe actions
  • hooks provide deterministic guarantees when prompts are not enough

Students must be able to:

  • normalize heterogeneous formats such as Unix timestamps, ISO 8601 values, and numeric codes
  • intercept policy-violating actions and route them into escalation workflows
  • explain when hooks are preferable to stronger prompt wording

Task Statement 1.6: Task Decomposition

Students must know:

  • when prompt chaining fits better than adaptive decomposition
  • how large reviews can be split into per-file and cross-file passes
  • why open-ended work benefits from evolving plans

Students must be able to:

  • choose decomposition patterns appropriate to the task
  • split reviews into local and integration phases
  • map unfamiliar codebases and adapt the plan as dependencies are discovered

Task Statement 1.7: Session State, Resumption, and Forking

Students must know:

  • named --resume continues specific prior work
  • fork_session supports independent branches from one baseline
  • stale tool results make naive resumption risky
  • resumed work should be told what changed

Students must be able to:

  • resume named investigations appropriately
  • fork sessions to compare alternatives
  • choose between resumption and fresh-start summary injection
  • target re-analysis by communicating changed files explicitly

Lecture

Read in depth

17 sections

From First Principles

At the most basic level, an agentic system exists because one model response is often insufficient for real work. Production tasks regularly require:

  • retrieving information the model does not already have
  • taking actions through tools
  • reevaluating decisions after new facts arrive
  • splitting work into specialized subproblems

That means the system cannot be designed as a single static prompt. It must behave like a controlled reasoning loop over changing state.

From first principles, Week 1 is about three realities:

  • the model needs an explicit control loop
  • high-risk steps need structural enforcement, not hopeful wording
  • complex work needs deliberate decomposition, not just more tokens

If a student understands those three principles deeply, many Week 1 exam questions become straightforward.

Guided Walkthrough: Building A Refund Agent Correctly

Walkthrough goal:

  • understand how loop control, prerequisites, and escalation fit together in one system

Step 1: Start with the user request

Example:

  • "I was charged twice for order 12345 and I want my money back."

First-principles question:

  • What facts does the model need that it cannot safely infer from the user message alone?

Expected answer:

  • verified customer identity
  • order ownership
  • charge status
  • refund eligibility

Step 2: Decide whether the system can rely on prompting alone

Ask:

  • If a mistaken refund is costly, should the system merely instruct the model to verify identity first?

Expected answer:

  • no, because the workflow needs a deterministic prerequisite

Step 3: Design the loop

The loop should:

  • ask Claude for the next step
  • inspect stop_reason
  • run required tools
  • append results
  • continue until end_turn

Step 4: Add enforcement

Before process_refund can run, the application should verify:

  • a verified customer ID exists
  • the order belongs to that customer
  • the refund amount is within policy

Step 5: Add escalation

Escalation is required if:

  • the user asks for a human
  • policy is ambiguous
  • identity remains unresolved
  • the refund exceeds the autonomous threshold

Step 6: Consider multi-issue requests

If the user also says:

  • "The replacement item was damaged too"

the coordinator should decompose the conversation into at least two issue tracks:

  • duplicate charge
  • damaged replacement

Those tracks can share verified identity context but still require separate investigation.

Teaching point:

The important lesson is that "agent intelligence" alone is not enough. Reliable systems are built by combining model-guided reasoning with deterministic workflow structure.

Week 1 Worked Pseudo-Architecture

User Request
   |
   v
Coordinator Loop
   |
   +--> allowedTools includes "Task"
   |
   +--> Task -> Search Subagent
   |         AgentDefinition:
   |         - description
   |         - system prompt
   |         - restricted tools
   |
   +--> Task -> Analysis Subagent
   |         AgentDefinition:
   |         - description
   |         - system prompt
   |         - restricted tools
   |
   +--> get_customer --------------+
   |                               |
   |                      verified customer ID
   |                               |
   +--> lookup_order --------------+
   |                               |
   |                    order ownership and status
   |                               |
   +--> policy gate / threshold check
   |          |             |
   |          |             +--> escalate_to_human
   |          |
   |          +--> process_refund
   |
   +--> final unified response

Week 1 Board Teaching Notes

  • Draw the loop before discussing prompts. Students retain control flow better when they can point to state transitions.
  • Ask students which parts are model-guided and which parts are deterministic. Keep pressing until they separate those cleanly.
  • When teaching coordinator-subagent architecture, ask "who owns completeness?" The answer should be the coordinator, not the synthesis agent.
  • Add a separate board segment for Task, allowedTools, and AgentDefinition. Students should be able to explain exactly what enables spawning and what shapes each subagent role.

deep dive Deep Dive: Hooks, Enforcement, and Handoff Reliability

Some Week 1 concepts are easy to mention and easy to underteach. Hooks are one of them.

From first principles, a hook is useful when the application needs to alter or inspect a tool interaction at a point where the model itself should not be the only enforcement layer. The guide calls out hook patterns like PostToolUse and outgoing tool-call interception. These matter because they let the application enforce or normalize behavior deterministically.

Deep Dive A: PostToolUse as a Normalization Layer

Suppose three backend tools return dates in different formats:

  • Unix timestamps
  • ISO 8601 strings
  • numeric status codes plus separate reason fields

If the model has to interpret each raw format directly every time, cognitive load rises and inconsistencies spread through the workflow. A PostToolUse hook can normalize those outputs before the model sees them, so the model reasons over a common representation.

Why that matters:

  • it reduces accidental format confusion
  • it prevents downstream prompt complexity from ballooning
  • it keeps the model focused on decision quality rather than data cleanup

Deep Dive B: Outgoing Tool Interception

Imagine an autonomous refund workflow with a hard rule:

  • refunds above $500 must go to a human

There are two possible designs:

  • prompt Claude to remember the policy
  • intercept the refund tool call and block or reroute it

The second is stronger because it guarantees compliance even when the model’s reasoning path varies.

Deep Dive C: Handoff Quality

Handoffs are not just summaries. In a real escalation, the human may not have the conversation transcript. A proper handoff should therefore stand on its own:

  • customer or case ID
  • issue type
  • facts established so far
  • root cause or likely root cause
  • action already attempted
  • recommendation for the next human step

A weak handoff says:

  • "Customer upset. Needs help."

A strong handoff says:

  • "Customer ID 48291. Duplicate charge confirmed on order 12345. Verified refund amount $84.50. Refund exceeds auto-threshold because second issue involves damaged replacement requiring manual override. Recommended action: review damaged-item exception and approve combined handling."

This level of structure is what exam questions are trying to reward.

Lecture 1.1: The Agentic Loop

Key Distinctions:

  • loop control comes from API state, not from reading assistant prose for hints
  • tool_use means continue with tool execution, while end_turn means stop the loop
  • having tools available is not the same as being instructed to use them

Common Wrong Answers:

  • "Continue whenever the reply feels incomplete."
  • "Stop once the assistant writes a natural-sounding answer."
  • "Use a decision tree instead of inspecting stop_reason."

What To Memorize:

  • stop_reason is the control surface
  • append tool results and continue only on tool_use
  • stop on end_turn

Try It Yourself:

No single right answer — draft your attempt, then compare against the lecture's worked examples.

  • Label three sample turns as tool_use or end_turn.
  • Rewrite a brittle prose-parsing loop into a stop_reason-based loop.

Check Your Understanding:

  • Why is assistant wording a weak completion signal?
    Show answerThe model's prose is a probabilistic surface, and the same wording can appear when more tool calls are still required. The API's stop_reason is the explicit completion signal — it tells the application whether the loop should continue (tool_use) or stop (end_turn). Driving control flow off interpreted text instead of explicit state is brittle by construction.
  • What must happen after a tool result is returned?
    Show answerThe result must be appended to the message history and the loop must continue, sending the updated conversation back to Claude. The model cannot reason over information it has not seen, so a tool that executes but whose output never re-enters context creates a broken half-loop where work is done but never used.

An agentic loop is not "send one prompt and hope for the best." It is a control structure. Claude reasons over the current conversation, decides whether a tool is needed, requests that tool, receives the tool result back in context, then continues reasoning. This repeats until Claude reaches a natural stopping point.

The key control signal is stop_reason. For this exam, the distinction that matters most is:

  • tool_use: Claude wants one or more tools to run, so your loop should continue.
  • end_turn: Claude is done with the current task and can produce the final answer for that turn.

This matters because many fragile implementations try to infer completion from assistant text. That is weak engineering. A model may say "I’m done" and still require a tool in the next turn if the loop is structured incorrectly. Or the opposite: it may produce text that looks incomplete even though end_turn has occurred. Control flow should follow explicit API signals, not prose interpretation.

Another core rule: tool results must be returned to Claude as part of the conversation history. The model cannot reason over information it has not seen. If a tool call fetches customer data, order details, or document metadata, that result must be injected back into the context for the next iteration. Otherwise the system becomes a broken half-loop where tools execute but the model does not get to use the output.

In production, iteration caps are still useful, but only as a guardrail. They are not the primary completion signal. A cap prevents runaway loops; it should not decide that normal work is done.

Example

Bad logic:

  • ask Claude for a response
  • if the response text contains "final answer", stop
  • otherwise try to parse whether a tool is needed

Better logic:

  1. send the current conversation to Claude
  2. inspect stop_reason
  3. if tool_use, execute the requested tool calls
  4. append tool results to the conversation
  5. repeat
  6. if end_turn, return the answer

Why this shows up on the exam

The exam likes tradeoff questions where one option is "add stronger prompting" and another is "enforce the control flow programmatically." If a workflow requires guaranteed ordering or deterministic compliance, the right answer is usually structural enforcement, not stronger prose.

📐 See the diagram: stop_reason as control surface.

exercise Guided Exercise 1.1

Write pseudocode for an agent loop that:

  • receives a user request
  • allows Claude to call tools
  • continues while stop_reason == "tool_use"
  • ends when stop_reason == "end_turn"

Sample Answer

messages = [user_message]

while True:
    response = call_claude(messages, tools=toolset)

    if response.stop_reason == "tool_use":
        messages.append(response.assistant_message)
        for tool_call in response.tool_calls:
            result = run_tool(tool_call)
            messages.append(tool_result_message(tool_call.id, result))
        continue

    if response.stop_reason == "end_turn":
        return response.final_text

    raise UnexpectedStateError(response.stop_reason)

Lecture 1.2: Deterministic Enforcement vs Prompt Guidance

Key Distinctions:

  • prompts guide model behavior, while deterministic controls guarantee compliance
  • risky ordered workflows need gates and interception, not stronger wording
  • policy enforcement belongs in system structure, not just in prompts

Common Wrong Answers:

  • "Add more examples and the ordering issue will disappear."
  • "Use stronger cautionary wording for mandatory business rules."
  • "Trust the model if it usually follows the policy."

What To Memorize:

  • deterministic gates beat probabilistic compliance
  • hooks can normalize or block behavior
  • structural enforcement is for costly failure modes

Try It Yourself:

No single right answer — draft your attempt, then compare against the lecture's worked examples.

  • Classify three workflow controls as prompt-based or deterministic.
  • Rewrite one risky prompt rule as a prerequisite gate.

Check Your Understanding:

  • When is stronger prompting still insufficient?
    Show answerWhenever the failure mode is costly enough that probabilistic compliance is unacceptable — identity verification before financial actions, threshold-bound approvals, irreversible operations, legally mandated steps. Stronger wording reduces the rate of mistakes but does not eliminate them, and a system that "usually" enforces a critical rule has not enforced it.
  • What problem does an outgoing hook solve better than a prompt?
    Show answerIt guarantees enforcement at the point of action. A prompt asks the model to remember and apply a rule; an outgoing hook intercepts the tool call itself and can block, rewrite, or redirect it regardless of the model's reasoning path. That guarantee is what makes hooks the right answer for high-cost policy breaches.

Not every workflow should be left entirely to model judgment. This exam expects you to know when probabilistic behavior is acceptable and when it is not.

Prompt guidance is useful for:

  • prioritizing one reasonable tool over another
  • giving escalation criteria
  • describing quality standards
  • nudging the model toward better decomposition

Prompt guidance is not enough for:

  • identity verification before financial actions
  • policy thresholds that must never be exceeded
  • steps that are legally or operationally mandatory
  • actions that can cause irreversible damage

If a support agent must never issue a refund above a threshold without human review, the correct fix is not "remind the model more strongly." The correct fix is to intercept or block the tool call programmatically. The same principle applies to prerequisite gates. If get_customer must happen before process_refund, enforce the dependency.

This is one of the highest-value distinctions in the exam.

exercise Guided Exercise 1.2

A support system sometimes processes refunds before identity verification. Choose the better fix and explain why:

  • Add three more few-shot examples showing identity verification first.
  • Block refund tools until verification returns a valid customer ID.

Sample Answer

The second fix is better. The first is still probabilistic and can fail on edge cases. The second gives a deterministic guarantee for a business-critical prerequisite.

Lecture 1.3: Coordinator-Subagent Architecture

Key Distinctions:

  • the coordinator owns decomposition, routing, aggregation, and recovery
  • subagents do bounded specialist work rather than global orchestration
  • complete-looking synthesis can still hide upstream decomposition failure

Common Wrong Answers:

  • "If subagents are strong enough, the coordinator does not matter much."
  • "Coverage quality is mainly a synthesis problem."
  • "Direct subagent-to-subagent traffic is preferable because it is faster."

What To Memorize:

  • hub-and-spoke is the core pattern
  • coordinator owns completeness
  • subagent isolation is a feature, not a flaw

Try It Yourself:

No single right answer — draft your attempt, then compare against the lecture's worked examples.

  • Diagnose whether a failure belongs to the coordinator or a subagent.
  • Split one broad task into coordinator-owned and subagent-owned responsibilities.

Check Your Understanding:

  • Who owns completeness in a multi-agent design?
    Show answerThe coordinator. Subagents are responsible for the bounded slices they are assigned, but the question of whether the slices add up to a complete answer is a decomposition question — it lives at the layer that decided how to partition the work and which subagents to invoke.
  • Why can a polished report still indicate coordinator failure?
    Show answerBecause synthesis quality and coverage quality are independent. A synthesis subagent given a narrow set of findings can produce a fluent, well-structured report on those findings while the broader topic remains under-covered. A polished output is evidence that the synthesis layer worked; it is not evidence that the coordinator decomposed correctly.

A multi-agent system is not just "many agents." It needs a coordination model. The exam focuses on the coordinator-subagent pattern, especially hub-and-spoke designs.

In this pattern:

  • the coordinator receives the top-level task
  • it decomposes the work
  • it decides which subagents to invoke
  • it routes information between them
  • it handles recovery and aggregation
  • it owns the final answer

Subagents do not automatically inherit the coordinator’s context. This is another trap the exam uses repeatedly. If the synthesis agent needs the findings from the web-search and document-analysis agents, those findings must be explicitly passed into its prompt or its structured inputs.

The coordinator should also avoid overly narrow decomposition. A common failure mode is when the coordinator breaks a broad problem into only one slice of the topic. If the task is "AI impact on creative industries" and the coordinator decomposes only into visual-art subtasks, the subagents may perform perfectly and still produce an incomplete report. In that case the subagents are not the problem; decomposition is.

Lecture 1.4: Subagent Invocation, `Task`, and `AgentDefinition`

Key Distinctions:

  • spawning depends on Task, not on vague multi-agent prompting
  • available delegation requires allowedTools to include "Task"
  • subagents need explicit context because they do not inherit parent memory automatically

Common Wrong Answers:

  • "Subagents can infer the parent context from the session."
  • "Good system prompts make Task configuration unnecessary."
  • "Agent roles matter more than tool restrictions."

What To Memorize:

  • Task is the spawning mechanism
  • AgentDefinition should include description, system prompt, and tool restrictions
  • forked sessions support divergent analysis from a shared baseline

Try It Yourself:

No single right answer — draft your attempt, then compare against the lecture's worked examples.

  • Identify why a coordinator cannot spawn when Task is missing.
  • Rewrite a weak handoff to include explicit context and quality criteria.

Check Your Understanding:

  • Why do subagents not automatically inherit parent context?
    Show answerEach subagent runs as its own model invocation with its own message list; there is no implicit shared memory across Task calls. The architecture treats subagents as isolated workers, which is a feature — it forces the coordinator to be explicit about what each subagent needs and prevents leak-through of irrelevant or sensitive context.
  • What belongs inside an AgentDefinition?
    Show answerA description of the role, the system prompt that shapes the subagent's behavior, and the tool restrictions that scope what it can do. Together those three configure the subagent as a specialist; leaving any of them generic weakens specialization and increases the chance of tool misuse or off-task work.

This lesson covers a mechanism that is explicitly named in the source guide and is important enough that students should be able to state it precisely.

Subagents are not invoked abstractly. In the architecture described by the guide, the coordinator uses the Task tool to spawn subagents. That means delegation depends on actual tool availability, not just good prompt wording. If a coordinator is expected to invoke subagents, its allowedTools must include "Task".

That gives us a concrete exam distinction:

  • describing delegation in the prompt is not the same as enabling delegation in the system
  • the coordinator can only spawn subagents if the spawning mechanism is actually allowed

This matters because many wrong answers on architecture questions sound plausible at the prompt layer while the real failure is at the configuration layer.

The second key concept is explicit context passing. Subagents do not automatically inherit the parent’s full history or shared memory across invocations. If the coordinator wants a synthesis subagent to use the findings from a web-search subagent and a document-analysis subagent, it must pass those findings explicitly.

Weak handoff:

  • "Use what the previous agents found and produce a report."

Strong handoff:

  • pass the actual claims, evidence excerpts, source URLs, dates, and document identifiers needed for synthesis

The third concept is AgentDefinition. A subagent should be configured intentionally rather than treated as a generic secondary model invocation. The guide explicitly calls out:

  • description
  • system prompt
  • tool restrictions

Those settings define the role. A web-search agent, document-analysis agent, and synthesis agent should not all share the same instruction surface or tool access. If they do, specialization weakens and tool misuse becomes more likely.

The fourth concept is fork-based session management. Forking is useful when you want to branch from a shared analysis baseline into multiple possible approaches without contaminating the original line of reasoning. This is especially useful for:

  • comparing two migration plans
  • testing multiple investigation strategies
  • exploring alternative synthesis structures from the same evidence base

Forking is not the same as ordinary resumption. Resumption continues one path. Forking creates multiple paths from a shared starting point.

Minimal Operational Checklist

For a coordinator to spawn subagents correctly:

  • the coordinator must have access to the Task tool
  • allowedTools must include "Task"
  • each subagent should have a clear AgentDefinition
  • the coordinator should pass context explicitly
  • the coordinator should scope each subagent’s tool access to its role

Failure Mode Example 1

A team writes a detailed coordinator prompt that says:

  • "Delegate to specialized subagents when useful."

But no subagents are ever invoked.

The likely problem is not prompt wording. The likely problem is that the coordinator does not actually have access to Task, or allowedTools does not include "Task".

Failure Mode Example 2

A synthesis agent produces weak output and misses citations. The team blames the synthesis agent’s prompt.

The deeper issue may be that the coordinator handed off only a vague prose summary instead of explicit structured findings with provenance fields.

exercise Guided Exercise 1.3

A coordinator is supposed to spawn subagents, but this never happens in practice. What are the first three things you should verify?

Sample Answer

  1. Verify that the coordinator has access to the Task tool.
  2. Verify that allowedTools includes "Task".
  3. Verify that the subagents are actually defined with usable role configuration and that the coordinator prompt can choose delegation.

exercise Guided Exercise 1.4

Why is this handoff weak?

  • "Use the previous agents' findings and produce a final report."

Sample Answer

It assumes implicit inheritance and does not pass the actual information needed for synthesis. A stronger handoff would explicitly include the findings, source metadata, and evidence needed by the downstream subagent.

Lecture 1.5: Decomposition Strategies

Key Distinctions:

  • prompt chaining fits fixed ordered workflows, while adaptive decomposition fits open-ended work
  • decomposition quality determines coverage quality
  • broad tasks need evolving plans rather than one static breakdown

Common Wrong Answers:

  • "Always use prompt chains because they are simpler."
  • "Adding more subagents automatically improves coverage."
  • "Planning quality matters less than final synthesis quality."

What To Memorize:

  • choose decomposition pattern by task shape
  • use adaptive decomposition for uncertain or broad tasks
  • split local versus integration review concerns

Try It Yourself:

No single right answer — draft your attempt, then compare against the lecture's worked examples.

  • Choose prompt chaining or adaptive decomposition for three scenarios.
  • Break one large review into local and cross-file passes.

Check Your Understanding:

  • What kind of task is a poor fit for prompt chaining?
    Show answerOpen-ended or exploratory work where the next step depends on what was discovered in the previous one. A fixed chain commits to a sequence in advance; if the early steps surface a finding that should redirect the investigation, the chain has no way to incorporate it. Adaptive decomposition is the right pattern for that shape of work.
  • Why does decomposition quality affect coverage quality?
    Show answerCoverage is bounded by what the decomposition asked for. If the coordinator partitions a broad topic into a narrow slice, the subagents can execute that slice perfectly and the final answer will still be incomplete. Improving the synthesis layer cannot recover information that was never gathered, which is why coverage failures usually trace back to scoping decisions made upstream.

The course guide distinguishes between two useful patterns:

  • prompt chaining for predictable multi-step work
  • adaptive decomposition for open-ended investigation

Prompt chaining works well when the workflow is known in advance. For example:

  1. analyze each file individually
  2. summarize file-level findings
  3. run a cross-file integration pass

Adaptive decomposition works better for open-ended work where the next step depends on what is discovered. For example:

  • map the codebase
  • identify high-risk modules
  • inspect dependencies
  • revise the plan after new findings emerge

The exam may ask which pattern fits a scenario. The right answer depends on predictability. If the work has known stages, prompt chaining is usually correct. If the work is exploratory and branching, adaptive decomposition is stronger.

📐 See the diagram: Prompt chain vs adaptive decomposition.

Lecture 1.6: Context Passing and Parallelism

Key Distinctions:

  • explicit handoff beats assumed shared memory
  • parallelism helps only when subtasks are independently scoped
  • quality criteria should be passed with the task, not left implicit

Common Wrong Answers:

  • "Spawn parallel agents first and clarify context later."
  • "Metadata and content can be mixed loosely in handoffs."
  • "Parallelization always improves quality."

What To Memorize:

  • pass findings explicitly
  • separate content from routing metadata
  • parallelize only when subtasks are clearly separable

Try It Yourself:

No single right answer — draft your attempt, then compare against the lecture's worked examples.

  • Rewrite a vague handoff into a complete subagent prompt.
  • Decide whether two subtasks should run sequentially or in parallel.

Check Your Understanding:

  • Why is assumed shared memory dangerous?
    Show answerSubagents only see what is passed to them, but a handoff written as if they already knew the context will produce silent gaps. The downstream subagent fills in plausible defaults, the synthesis layer treats those defaults as findings, and provenance is lost. Explicit handoff prevents the failure by making the unknowns visible.
  • What belongs in a high-quality handoff?
    Show answerThe actual content the downstream subagent needs (claims, excerpts, source URLs, dates, document identifiers) separated from routing metadata, plus the explicit success criteria for the work being handed off. Vague summaries collapse content and metadata together and force the downstream agent to guess at structure.

Subagents need explicit context. That context should often be structured, not free-form. A strong design passes content and metadata separately, for example:

  • claim
  • supporting excerpt
  • source URL
  • publication date
  • document name

That separation matters because the downstream agent must preserve provenance. If you only pass a flattened summary, the synthesis layer may lose attribution.

Parallelism also matters. If a coordinator can invoke multiple Task calls in one response, latency can be reduced significantly. But parallelization should not create duplicated work. Scope each subagent carefully:

  • by subtopic
  • by source type
  • by question type

Lecture 1.7: Sessions, Resumption, and Forking

Key Distinctions:

  • resumption continues prior work, while forking explores alternatives from a shared baseline
  • stale tool outputs make naive resumption risky
  • changed files or facts should be communicated explicitly on resume

Common Wrong Answers:

  • "A resumed session automatically knows what changed."
  • "Forking is only for experimentation, not for disciplined comparison."
  • "Fresh restarts are always safer than targeted resumption."

What To Memorize:

  • use named resume deliberately
  • use forks for divergent approaches
  • stale state is the main resumption risk

Try It Yourself:

No single right answer — draft your attempt, then compare against the lecture's worked examples.

  • Choose resume, fork, or fresh start for three change scenarios.
  • List what changed information should be passed into a resumed session.

Check Your Understanding:

  • What is the main risk in naive session resumption?
    Show answerStale state. The resumed session still trusts tool outputs and analysis from the original run, but the underlying world — files, data, system state — may have changed. Acting on stale evidence as if it were current is a quiet failure that produces confidently wrong work. Either tell the resumed session what changed or start fresh with a structured summary.
  • When is forking better than resuming?
    Show answerWhen you want multiple independent paths from a shared baseline — comparing two refactoring approaches, exploring alternative synthesis structures, isolating a verbose workflow from the main conversation. Resumption continues one line; forking creates parallel lines that can be evaluated against each other without contaminating the original.

The exam guide expects you to understand session state at a practical level:

  • named resumption continues a prior investigation when the context is still mostly valid
  • forking creates independent branches from a shared baseline
  • fresh starts with injected summaries are better when old tool outputs have become stale

This is an engineering judgment issue. Resuming a session that analyzed old code and then blindly trusting that analysis after major changes is weak. In that case, either tell the resumed session what changed or start fresh with a structured summary.

Lecture 1.8: Independent Review — Why a Generator Should Not Grade Itself

Key Distinctions:

  • a same-session reviewer inherits the generator's reasoning trail and tends to ratify it
  • independence comes from a fresh context, not from a different system prompt in the same session
  • "self-critique" prompts produce confidence calibration, not real review

Common Wrong Answers:

  • "Add a 'now critically review your previous answer' prompt to the same session."
  • "A more skeptical system prompt is enough to make a reviewer independent."
  • "If the generator is strong enough, an independent reviewer is unnecessary overhead."

What To Memorize:

  • the reviewer must not see the generator's chain of reasoning before forming its own opinion
  • spawn the reviewer as a separate Task with only the artifact and the criteria
  • a forked session counts as independent only if the fork point precedes the generator's reasoning

Try It Yourself:

No single right answer — draft your attempt, then compare against the lecture's worked examples.

  • Sketch the message list a coordinator would pass to an independent code reviewer for a generated patch. Strip everything not needed.
  • A team adds "Please check your work carefully and disagree if needed" to its synthesis prompt and reports better quality. Critique that intervention.

Check Your Understanding:

  • Why does adding a self-critique step to the same session usually fail to catch the generator's mistakes?
    Show answerThe model already committed to a reasoning path and the messages preserving that commitment are still in context. A self-critique prompt is steered by the same evidence and the same prior conclusions, so the model tends to defend the answer rather than re-evaluate it. Independence requires a context that does not include the prior reasoning trail.
  • A coordinator forks a session to run a reviewer subagent. Is that automatically independent?
    Show answerNot by itself. A fork inherits the baseline messages; if the generator's reasoning was already in that baseline, the reviewer sees it and can be primed by it. Independence requires either a fresh session seeded only with the artifact and criteria, or a fork from a baseline cut before the generator produced its output.

A reviewer needs to disagree with the generator. That sounds like a prompting problem, but it is mostly a context problem. When the same session that produced an answer is asked to grade it, the answer's reasoning chain is still in the model's view. The model has already justified the answer, and most of the messages that follow will continue along the same line. A "critically review your work" prompt arrives at the worst possible moment — after commitment, against the grain of the prior text, and without any new evidence to anchor a different conclusion.

The failure mode is quiet. The reviewer produces a plausible critique that catches surface issues — typos, formatting, the kind of thing the generator was already going to fix on a re-read — and ratifies the substantive decisions. Stakeholders see "review passed" and trust it. The structural mistakes that the reviewer would have caught with a clean view of just the artifact survive into production. This is the pattern behind exam questions that ask why same-session self-review is weaker than independent review: it is not a quality of the prompt, it is a property of the conversation.

The intervention is to construct a context that does not contain the generator's reasoning. The cleanest version is a separate Task-spawned subagent whose prompt contains only the artifact under review (the patch, the report, the plan), the explicit criteria, and any reference material. No transcript, no draft history, no "the previous agent thought X." If a fork is used, fork from a point before the generator started, and pass only the artifact across. The reviewer then produces an opinion against the artifact, not against the prior model's defense of it.

Caveat: independent review is not free. The reviewer pays the full context cost again, and you lose any context-sensitive judgment the generator was able to apply. For low-stakes work — a draft email, a one-off summary — same-session re-reads are fine. The independence rule applies when a wrong answer is expensive enough that ratification by the same reasoning chain would be a real failure mode.

📐 See the diagram: Independent review — what the reviewer sees.

Lecture 1.9: Subagent Failure Modes — Partial Results, Timeouts, and Re-delegation

Key Distinctions:

  • a subagent that times out is not a subagent that returned nothing
  • "the call failed" is not enough information for the coordinator to recover safely
  • gap detection during synthesis is the coordinator's job, not the synthesis subagent's

Common Wrong Answers:

  • "If a subagent times out, drop the partial results and rerun the whole task."
  • "Surface a generic 'something went wrong' to the user and stop."
  • "If synthesis looks complete, no follow-up delegation is needed."

What To Memorize:

  • preserve partial results from a failed subagent; the coordinator decides whether they are usable
  • a structured failure record carries: what was attempted, what completed, what failed, and why
  • when synthesis surfaces a coverage gap, the coordinator re-delegates the gap, not the entire task

Try It Yourself:

No single right answer — draft your attempt, then compare against the lecture's worked examples.

  • A research subagent retrieves three sources, then times out on the fourth. Write the structured failure context the subagent should hand back to the coordinator.
  • A synthesis pass produces a confident-sounding report but the coordinator notices one of the requested subtopics is missing. Outline the re-delegation step rather than restarting.

Check Your Understanding:

  • Why is "rerun the whole subagent task" the wrong default response to a partial-results timeout?
    Show answerThe completed work is real evidence, and discarding it costs another full execution and another chance to time out. The right default is to surface the partial results plus a structured failure record, then let the coordinator decide whether to fill the gap with a narrowly scoped follow-up call instead of repeating the whole task.
  • The coordinator detects a gap during synthesis — one requested subtopic was never covered. What is the correct response?
    Show answerRe-delegate the specific gap. Spawn a focused subagent with the missing subtopic and the constraints needed to cover it, then re-synthesize. Restarting the full investigation wastes work, and silently shipping the incomplete report misrepresents what is known.

Subagent failures are rarely binary. A search subagent fetches three of four sources before its time budget runs out. A document-analysis subagent extracts most of a long PDF before hitting an exception on a malformed page. A coordinator-spawned tool call returns a permission error after one valid result. In every case, real work was done. The naive recovery — discard, retry — throws away evidence that is more reliable than anything a second attempt is likely to produce, and often hits the same boundary the second time.

The failure mode is two-sided. On one side, callers swallow the error and present partial results as if the run completed; the synthesis layer reports confidently on incomplete evidence and the user does not know the difference. On the other side, callers raise a generic "operation failed," drop the partial work, and force a full rerun. Both sides are wrong because they collapse three separate facts — what was attempted, what completed, what failed — into one signal.

The intervention is a structured failure context. When a subagent cannot finish, it returns the work it did complete, an explicit statement of what was not attempted or not finished, and the reason (timeout, validation error, permission denial, upstream 5xx). The coordinator now has enough to choose: synthesize on what is available with explicit gaps, re-delegate the unfinished portion to a narrower subagent with a fresh budget, or escalate. The same logic applies when synthesis itself surfaces a gap that the original decomposition missed — re-delegate the gap with focused scope, do not restart the whole investigation.

Caveat: structured failure context only helps if the coordinator actually inspects it. A coordinator that handles every failure with the same retry-or-give-up policy gains nothing from richer error data. The architectural commitment is upstream: error envelopes that the coordinator's synthesis logic is built to read.

Lecture 1.10: Handoff Quality and Human Escalation

Key Distinctions:

  • a handoff is a self-contained brief, not a transcript reference
  • explicit human requests are escalation triggers and should not be re-evaluated for "complexity"
  • escalation criteria belong to the system, not to the model's discretion alone

Common Wrong Answers:

  • "If the user asks for a human but the issue looks easy, the agent should keep trying first."
  • "A short status update like 'customer needs help with refund' is enough for a human to take over."
  • "Escalation is a fallback for when the agent gets stuck, not a normal control path."

What To Memorize:

  • a strong handoff includes case identifier, issue type, established facts, root cause hypothesis, actions attempted, and a recommended next step
  • an explicit user request for a human is honored immediately, regardless of perceived issue difficulty
  • escalation criteria — policy thresholds, identity gaps, explicit requests — are deterministic triggers, not nudges

Try It Yourself:

No single right answer — draft your attempt, then compare against the lecture's worked examples.

  • Take a weak handoff like "Customer upset, refund issue" and rewrite it for a human who has no transcript access.
  • A user with a $20 billing question writes "I want to talk to a person." Decide whether the agent should escalate immediately and justify the choice.

Check Your Understanding:

  • Why does an agent honor an explicit human-request even when the underlying issue looks simple?
    Show answerThe user has stated a preference about how the issue should be handled, and that preference is itself the request. Re-evaluating it against the agent's own difficulty estimate substitutes the agent's judgment for the user's, which both delays resolution and damages trust. Honoring the request is the correct default.
  • What turns a status update into a usable handoff?
    Show answerSelf-containment. A handoff is read by a human who likely cannot scroll the transcript, so it must carry the case identifier, what is established, what was attempted, and what the next human step should be. A status update like "customer needs help" describes the situation; a handoff describes what the human needs to do.

Two patterns recur in escalation questions. The first is the explicit human request. A user types "I want a human" or "transfer me to a person" or "stop, I want to talk to someone real." The user's words are the escalation trigger, full stop. An agent that answers "I can help with that — what is your order number?" or that runs through a complexity check first is overriding a stated preference, and the exam treats this as a clear miss. The same reasoning applies to ambiguous-but-emphatic frustration when paired with policy-sensitive operations: route, do not improvise.

The second pattern is the handoff itself. Escalation without a usable handoff is just abandonment. The default failure mode is a one-line status — "customer upset, needs refund help" — that forces the human to read the entire transcript before they can act, and most escalation surfaces do not show the transcript anyway. The result is wait time, repeated questions to the user, and a worse experience than if the agent had stayed with the issue. A strong handoff stands on its own: case identifier, issue type, facts established (verified customer ID, order ID, charge status), root cause if known, actions attempted by the agent, and a specific recommended next step for the human.

The intervention is a templated escalation tool, not a free-form prompt. The escalation tool's schema requires the structured fields, and a hook can validate the handoff before the escalation actually fires. That makes the escalation deterministic both at the trigger (explicit request, threshold breach, identity gap) and at the message (validated structure, no fields left blank). Prompting alone cannot guarantee either side.

Caveat: there is a real cost to over-escalation, especially for systems where humans are scarce and slow. The deterministic triggers should be calibrated — the explicit-request rule is unconditional, but the threshold and identity-gap rules should be set with the operational team that will absorb the volume. Honoring "I want a human" is non-negotiable; defining "policy threshold" requires a real number.

Drill

Memorize & spot misconceptions

4 sections

Flashcards Core Vocabulary 19 terms

Click a card to flip it. Keyboard: space toggles focused card.

Common misconceptions Common Misconceptions

  1. “If the model says it is done, the loop should end.”

  2. “More examples can replace business-rule enforcement.”

  3. “Subagents know what the coordinator knows.”

  4. “If the final answer is coherent, the decomposition must have been good.”

  5. “Same-session self-review catches the generator's mistakes.”

  6. “If a subagent fails, the work it already completed should be discarded.”

  7. “If the user asks for a human but the issue looks easy, the agent should still try to resolve it first.”

  8. “Forking is only useful when comparing alternative paths.”

  9. “Always invoking every subagent guarantees coverage.”

Key distinctions Key Distinctions

  • tool_use vs end_turn

    continue the loop only when the API state requires tool execution, not when prose merely sounds unfinished.

  • prompt guidance vs deterministic control

    use prompts for judgment and routing, but use gates, hooks, and interception for mandatory policy or ordering constraints.

  • coordinator failure vs subagent failure

    incomplete coverage often starts in decomposition, even when each subagent executes its assigned task well.

  • prompt chaining vs adaptive decomposition

    fixed chains fit stable workflows, while broad or uncertain tasks need evolving decomposition.

  • context presence vs context inheritance

    subagents use only what the coordinator explicitly passes, not what the parent session "already knows."

  • same-session self-review vs independent review

    a reviewer in the same conversation inherits the generator's reasoning chain; independence requires a context that does not contain it.

  • partial-result preservation vs generic failure surfacing

    a structured failure record carries what completed and what failed; "operation failed" is the wrong abstraction.

  • escalation trigger vs model discretion

    explicit human requests, threshold breaches, and identity gaps are deterministic triggers — not optional nudges the model can override.

  • structured state export vs session resumption

    long-running workflows recover from explicit state manifests, not from trusting that a resumed session still understands the world.

Lab

Practice

2 sections

case study Worked Case Study

Case:

A returns assistant performs well on simple requests but occasionally refunds the wrong account after matching a customer by name only.

Analysis:

  • The primary failure is not "lack of examples."
  • The critical issue is that identity verification is not enforced before order or refund operations.
  • A secondary risk is that the agent may be using ambiguous lookup inputs without requiring a unique identifier.

Best redesign:

  • require get_customer to return a verified customer ID before lookup_order or process_refund
  • ask for clarification when multiple customer matches exist
  • preserve customer ID and order ID in a structured facts block
  • escalate when policy or identity remains unresolved

lab Lab

Design a customer support resolution agent that handles returns, disputes, and account issues.

Requirements:

  • tools: get_customer, lookup_order, process_refund, escalate_to_human
  • refunds require prior identity verification
  • multi-issue requests should be decomposed
  • escalations must include customer ID, root cause, refund amount if relevant, and recommended action

What a strong design includes

  • loop control based on stop_reason
  • programmatic prerequisite gate before refund
  • decomposition of multi-concern requests into separate tracks
  • structured escalation summary for humans who do not have the full conversation transcript

Quiz

Test yourself

2 sections

Quiz

  1. What is the strongest signal that an agentic loop should continue? A. Assistant text looks incomplete B. stop_reason == "tool_use" C. There are tools available D. The system prompt requests another pass

Answer: B

  1. Why is checking assistant prose for completion weak? A. It is expensive B. It prevents tool use C. It relies on natural-language interpretation instead of explicit API state D. It only works for JSON

Answer: C

  1. Which is true of subagents? A. They inherit parent context automatically B. They require explicit context injection C. They cannot run in parallel D. They do not need tool restrictions

Answer: B

  1. What is the best first response when a critical workflow step must always happen before another? A. Add more examples B. Enforce the prerequisite programmatically C. Raise the context window D. Use sentiment analysis

Answer: B

  1. If a broad topic is consistently under-covered, what is the most likely root cause? A. The synthesis agent is too slow B. The coordinator decomposed the task too narrowly C. The web agent needs more tokens D. The user prompt is too short

Answer: B

  1. A coordinator is expected to spawn subagents but never does. Which is the best first thing to verify? A. The context window is large enough B. The coordinator has Task available and allowedTools includes "Task" C. The synthesis agent has more examples D. The final answer prompt is more explicit

Answer: B

Week 1 Quiz Explanations

  1. B is correct because loop progression should follow explicit API state. A and D are indirect signals. C says nothing about whether the model requested tool execution.
  2. C is correct because prose interpretation is probabilistic and brittle. A is not the main issue. B is false. D is unrelated.
  3. B is correct because subagents require explicit context passing. A and D are incorrect assumptions. C is false because parallel spawning is explicitly supported.
  4. B is correct because critical ordering constraints require deterministic enforcement. A still leaves failure probability. C is irrelevant. D solves the wrong problem.
  5. B is correct because incomplete coverage often begins with narrow decomposition by the coordinator. A, C, and D are downstream or weaker explanations.
  6. B is correct because subagent spawning depends on the actual mechanism being available. A is unrelated. C and D address prompt quality rather than enabling delegation.

Test

Short Answer

  1. Explain the difference between tool_use and end_turn.
  2. When should a system choose adaptive decomposition instead of prompt chaining?
  3. Why is structured escalation data important for human handoff?

Scenario Question

Your multi-agent research system produces well-written but incomplete reports. Logs show the coordinator always invokes all subagents, but on broad topics it assigns overly narrow subtasks. What is the architectural fix?

Sample Answer

The problem is coordinator decomposition, not downstream execution quality. The coordinator should inspect query breadth, partition the scope more comprehensively, and use iterative gap checking before final synthesis. It should invoke only the necessary subagents and should re-delegate targeted follow-up tasks when coverage gaps are detected.

Week 1 Test Rubric

  • Full credit: explains stop_reason correctly, identifies coordinator decomposition as the root issue, and proposes explicit gap detection or re-delegation.
  • Partial credit: identifies the right component but proposes only vague prompt improvements.
  • Low credit: blames synthesis quality or recommends adding more tools without fixing decomposition.