SL#73 - AWS AI Series (14/30) - Multi-Agent Patterns in Strands: Agents-as-Tools, Swarm, and Graph (and When Each Is Wrong)

What we are building

By the end you will have one task ("write a short briefing on a technical topic") solved by a three-agent team in three different orchestration shapes, all running locally against Amazon Bedrock.

The three agents never change: a researcher that gathers facts, a writer that turns facts into prose, and a reviewer that checks the draft and either approves it or sends it back. What changes is who decides the order of operations.

In agents-as-tools, an orchestrator agent decides at runtime which specialist to call, like a manager delegating. In a swarm, the agents hand off to each other autonomously with shared memory, no manager. In a graph, you wire the edges yourself and the framework just executes them, optionally with conditions and loops. Same three workers, three different management structures.

The non-obvious lesson is that these are not interchangeable. The reason most multi-agent demos feel flaky is that people pick the autonomous patterns (orchestrator, swarm) for problems that are actually deterministic pipelines. The research-to-writing flow is a pipeline. By the end you will see exactly why the graph is the right call here and the other two are interesting but wrong for this job.

Prerequisites

You need Python 3.10 or newer, an AWS account, and the AWS CLI configured with credentials (aws configure). You need Amazon Bedrock model access enabled for at least one Anthropic Claude model in your region. Bedrock model access is per-account, per-region, and off by default. Open the Bedrock console, go to Model access, and request access to a Claude model. Approval for Anthropic models is usually instant.

You should be comfortable reading Python and have a rough mental model of what an LLM agent is (a model in a loop with tools). If you followed episode 7 of this series ("Strands Agents 101") you already have everything. If you skipped it, the setup below is self-contained.

This tutorial uses the Bedrock model provider, which is the Strands default. You are billed per token for every model call. The full tutorial costs well under a dollar to run end to end (see the cost section).

Setup

Create a clean virtual environment and install the SDK plus the community tools package. Pin the major version so this tutorial still runs in six months.

python -m venv .venv
source .venv/bin/activate
pip install "strands-agents>=1.0,<2.0" "strands-agents-tools>=0.2,<1.0"
export AWS_REGION=us-east-1

Strands defaults to the Amazon Bedrock provider, so you do not need an API key, just working AWS credentials. To be explicit about which model every agent uses, define a BedrockModel once and reuse it. Pin the model ID rather than relying on the default, which has changed across SDK versions.

# common.py
from strands.models import BedrockModel

# Use a Claude model you have enabled in Bedrock model access.
# Swap this for the latest Sonnet or an Amazon Nova model available in your region.
MODEL = BedrockModel(
    model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    region_name="us-east-1",
    temperature=0.3,
)

Smoke test before writing any orchestration. If this prints a sentence, your credentials and model access are good.

# smoke.py
from strands import Agent
from common import MODEL

agent = Agent(model=MODEL, system_prompt="You are terse.")
print(agent("Say 'setup works' and nothing else."))

Run python smoke.py. If you see setup works, you are ready. If you see an AccessDeniedException, your model access is not enabled for that model ID in that region. Fix that before continuing.

Step 1: Build the three specialists once

Every pattern below reuses the same three agents, so define them in one place. Each gets a focused system prompt and a name. Naming matters: the swarm uses the agent name as the node ID and as the handoff target, the graph uses whatever node ID you pass to add_node, and the orchestrator picks a specialist by the name of the tool it sees.

# agents.py
from strands import Agent
from common import MODEL

researcher = Agent(
    name="researcher",
    model=MODEL,
    system_prompt=(
        "You are a research specialist. Given a topic, produce 5-8 concise, "
        "factual bullet points a writer can use. No prose, just facts."
    ),
)

writer = Agent(
    name="writer",
    model=MODEL,
    system_prompt=(
        "You are a technical writer. Turn the research notes into a tight "
        "200-word briefing for senior engineers. No fluff, no marketing words."
    ),
)

reviewer = Agent(
    name="reviewer",
    model=MODEL,
    system_prompt=(
        "You are an editor. Check the briefing for accuracy and clarity. "
        "If it is publishable, reply with the final text prefixed 'APPROVED:'. "
        "If not, reply with specific fixes prefixed 'REVISE:'."
    ),
)

Three agents, three jobs, no orchestration yet. The reviewer prompt is the interesting one. It returns a machine-readable signal (APPROVED: or REVISE:) that a graph can branch on later. Designing your agents to emit signals the orchestration layer can read is half the battle in multi-agent systems. An agent that just rambles cannot be wired into a conditional.
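To make the "machine-readable signal" idea concrete, here is a minimal sketch of the code that would consume the reviewer's reply. The helper name parse_verdict and the Verdict type are hypothetical and not part of Strands; the only assumption taken from the prompt above is that replies start with APPROVED: or REVISE:.

```python
from typing import NamedTuple


class Verdict(NamedTuple):
    approved: bool
    body: str  # final text if approved, requested fixes otherwise


def parse_verdict(reply: str) -> Verdict:
    """Split a reviewer reply into a decision and its payload.

    Hypothetical helper, not part of Strands. A reply that does not start
    with one of the two agreed prefixes is treated as a protocol error.
    """
    text = reply.strip()
    for prefix, approved in (("APPROVED:", True), ("REVISE:", False)):
        if text.startswith(prefix):
            return Verdict(approved, text[len(prefix):].strip())
    raise ValueError(f"reviewer broke the protocol: {text[:40]!r}")
```

Raising on anything else is a deliberate choice in this sketch: a reviewer that rambles fails loudly instead of being silently treated as an approval.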

Step 2: Pattern one, agents-as-tools

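The first pattern puts a manager on top. An orchestrator agent receives the task and decides, at runtime, which specialist to invoke. In Strands, the documented way to make an agent callable by another agent is to wrap it in a function decorated with @tool and pass that function in the tools list. The tool takes an input string and returns the agent's text.

# pattern_tools.py
from strands import Agent, tool
from common import MODEL
from agents import researcher, writer, reviewer

# Wrap each specialist in a @tool function. The function name and docstring
# are what the orchestrator sees when it decides which tool to call.
@tool
def research(topic: str) -> str:
    """Gather 5-8 factual bullet points on a topic."""
    return str(researcher(topic))

@tool
def write(notes: str) -> str:
    """Turn research notes (plus any reviewer feedback) into a 200-word briefing."""
    return str(writer(notes))

@tool
def review(draft: str) -> str:
    """Review a draft. The reply starts with APPROVED: or REVISE:."""
    return str(reviewer(draft))

orchestrator = Agent(
    model=MODEL,
    system_prompt=(
        "You produce a reviewed briefing on the user's topic. "
        "First call research with the topic. Then call write with the "
        "research notes. Then call review with the draft. If the review "
        "says REVISE, call write again with the feedback, then review "
        "again. Return the APPROVED text."
    ),
    tools=[research, write, review],
)

result = orchestrator("Write a briefing on what S3 Vectors is and when to use it.")
print(result)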

Run python pattern_tools.py. The orchestrator calls the three agents in sequence, and because you spelled out the revise loop in its prompt, it will sometimes bounce a draft back to the writer.

Here is the catch. The order is a suggestion, not a guarantee. The orchestrator is an LLM, and on some runs it will skip the reviewer, or call the writer before the researcher, or decide one pass is enough when your prompt said two. You bought flexibility (the manager can react to weird inputs) and you paid for it in determinism. For a workflow this fixed, that is a bad trade. Agents-as-tools shines when you genuinely do not know the path in advance, like a customer-service router that picks billing versus technical versus returns based on the message. For a known pipeline, it is the wrong tool.

Step 3: Pattern two, swarm

The swarm removes the manager entirely. You hand the agents a shared task and a shared memory, and they decide among themselves who works next by calling a handoff_to_agent tool that Strands injects automatically.

# pattern_swarm.py
from strands.multiagent import Swarm
from agents import researcher, writer, reviewer

swarm = Swarm(
    [researcher, writer, reviewer],
    entry_point=researcher,
    max_handoffs=10,
    max_iterations=10,
    execution_timeout=300.0,
    node_timeout=120.0,
    repetitive_handoff_detection_window=6,
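    # Set this above 2 to catch a two-agent ping-pong: the check trips when the
    # window holds fewer unique agents than this value.
    repetitive_handoff_min_unique_agents=3,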
)

result = swarm("Write a briefing on what S3 Vectors is and when to use it.")
print(f"Status: {result.status}")
print(f"Path: {[n.node_id for n in result.node_history]}")
print(result.results["reviewer"].result)

Run python pattern_swarm.py. Watch the Path line. On a good run it reads researcher -> writer -> reviewer. On a bad run you will see the reviewer hand back to the writer, the writer hand to the researcher, the researcher hand back to the writer, and so on until the handoff limits kick in.

Those limits are not decoration. max_handoffs and max_iterations cap the total work, execution_timeout and node_timeout cap the wall-clock time, and repetitive_handoff_detection_window with repetitive_handoff_min_unique_agents breaks ping-pong loops where two agents keep tossing the task back and forth. The fact that the swarm needs six separate safety settings tells you something: maximum autonomy means maximum ways to misbehave. Swarms earn their keep on open-ended problems where the right sequence genuinely depends on what each agent discovers. A linear research-to-writing flow is not that. You are paying for emergent coordination you do not need.
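The ping-pong check is easier to reason about as plain code. The sketch below illustrates the idea only, a sliding window over the handoff history that counts distinct agents. It is not the SDK's implementation, and the function name is_ping_pong is hypothetical.

```python
def is_ping_pong(history: list[str], window: int, min_unique: int) -> bool:
    """Return True when the last `window` turns involve fewer than `min_unique` agents."""
    if window <= 0 or len(history) < window:
        # Not enough history yet, or the check is disabled.
        return False
    return len(set(history[-window:])) < min_unique


healthy = ["researcher", "writer", "reviewer"]
stuck = ["researcher", "writer", "reviewer", "writer", "reviewer", "writer", "reviewer"]

print(is_ping_pong(healthy, window=6, min_unique=3))  # False: history shorter than the window
print(is_ping_pong(stuck, window=6, min_unique=3))    # True: only writer and reviewer in the last 6
```

Under this rule a threshold of 2 would not flag the stuck run, because two distinct agents is not fewer than two. If you want to stop a two-agent loop, the threshold has to be at least 3.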

Step 4: Pattern three, graph

The graph is the honest model for this problem. You declare the nodes and the edges yourself, and the framework executes them in dependency order. Output from each node flows to its dependents. No LLM decides the topology; you do.

# pattern_graph.py
from strands.multiagent import GraphBuilder
from agents import researcher, writer, reviewer

builder = GraphBuilder()
builder.add_node(researcher, "research")
builder.add_node(writer, "write")
builder.add_node(reviewer, "review")

builder.add_edge("research", "write")
builder.add_edge("write", "review")
builder.set_entry_point("research")

graph = builder.build()
result = graph("Write a briefing on what S3 Vectors is and when to use it.")

print(f"Status: {result.status}")
print(f"Order: {[n.node_id for n in result.execution_order]}")
print(result.results["review"].result)

Run python pattern_graph.py. The Order line is research -> write -> review every single time, because you wired it that way. The researcher's bullet points become the writer's input, the writer's draft becomes the reviewer's input, and you get a deterministic pipeline with three model calls and zero coordination overhead.
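If "dependency order" feels abstract, the following sketch reproduces the same shape with no model calls at all. It uses the standard library's graphlib rather than Strands, and the three stub functions are hypothetical stand-ins for the agents. The point is only that the order falls out of the edges, not out of anyone's judgment.

```python
from graphlib import TopologicalSorter


# Stub "agents": plain functions standing in for model calls (hypothetical).
def research(task, inputs):
    return f"notes({task})"


def write(task, inputs):
    return f"draft({inputs['research']})"


def review(task, inputs):
    return f"APPROVED: {inputs['write']}"


NODES = {"research": research, "write": write, "review": review}
# Each node maps to the set of nodes it depends on.
DEPENDS_ON = {"research": set(), "write": {"research"}, "review": {"write"}}


def run_pipeline(task: str):
    """Run every node after its dependencies, feeding each one its upstream outputs."""
    order, results = [], {}
    for node_id in TopologicalSorter(DEPENDS_ON).static_order():
        inputs = {dep: results[dep] for dep in DEPENDS_ON[node_id]}
        results[node_id] = NODES[node_id](task, inputs)
        order.append(node_id)
    return order, results


order, results = run_pipeline("S3 Vectors")
print(order)              # ['research', 'write', 'review']
print(results["review"])  # APPROVED: draft(notes(S3 Vectors))
```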

The graph also gives you the revise loop the orchestrator only promised. Edges can carry conditions, and conditions read the accumulated graph state, so you can route back to the writer only when the reviewer asked for changes.

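# add to pattern_graph.py, before build()
from strands import Agent
from common import MODEL

def needs_revision(state):
    review = state.results.get("review")
    if not review:
        return False
    return "REVISE:" in str(review.result)

# Each node needs its own Agent instance; GraphBuilder rejects the same
# `writer` object registered under a second node ID as a duplicate.
rewriter = Agent(name="rewriter", model=MODEL, system_prompt=writer.system_prompt)

builder.add_node(rewriter, "rewrite")
builder.add_edge("review", "rewrite", condition=needs_revision)
builder.add_edge("rewrite", "review")
builder.set_max_node_executions(8)  # hard cap (recent 1.x releases) so the loop cannot run forever

Now the loop is explicit and bounded: it only fires when the reviewer's output literally contains REVISE:, and the cap on node executions stops it from looping forever. That is the same revise behavior the orchestrator performed only probabilistically, expressed as a rule you can read, test, and trust.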
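"Test" is meant literally. Because the condition is an ordinary function, you can exercise it without Bedrock by handing it a stand-in for the graph state. In this sketch fake_state is a hypothetical helper that mimics only the two attributes the condition reads, results and result. The real state object has more fields.

```python
from types import SimpleNamespace


def needs_revision(state):
    review = state.results.get("review")
    if not review:
        return False
    return "REVISE:" in str(review.result)


def fake_state(review_text=None):
    """Build a stand-in for the graph state with only the fields the condition reads."""
    results = {}
    if review_text is not None:
        results["review"] = SimpleNamespace(result=review_text)
    return SimpleNamespace(results=results)


print(needs_revision(fake_state("REVISE: tighten the intro")))  # True
print(needs_revision(fake_state("APPROVED: final text")))       # False
print(needs_revision(fake_state()))                             # False, review has not run yet
```

One thing such a test surfaces quickly: substring matching also fires on an approved briefing that happens to quote the token REVISE:. Checking that the stripped text starts with the prefix is the stricter option if that matters for your content.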

Verify it works

Run all three scripts. Each should print a briefing of roughly 200 words about S3 Vectors, prefixed with the reviewer's APPROVED: marker.

For pattern_graph.py, the Order: line must start with exactly ['research', 'write', 'review'], followed by 'rewrite' and another 'review' for each revision that fired. That determinism is the whole point and your proof the graph is wired right.

For pattern_swarm.py, Status should be Status.COMPLETED and the Path should start with researcher. If the path is long and jumps around, that is the swarm's nondeterminism, not a bug, and exactly the behavior that makes it wrong for this task.

For pattern_tools.py, you should see the three specialists invoked as tool calls in the streamed output. Run it three or four times. If the sequence or the number of review passes changes between runs, you have just observed why an LLM-driven orchestrator is the wrong choice for a fixed pipeline.

A useful side-by-side: the graph run makes a predictable 3 model calls, or 5 when one revision fires (one rewrite plus one more review). The swarm and orchestrator runs vary, sometimes 3, sometimes 8 or more when they loop. That variance is tokens, latency, and money.

When it breaks

If you get AccessDeniedException or ValidationException: model ID is not supported, the model is not enabled in that region. Enable it in Bedrock model access, or change model_id in common.py to a model you do have, and confirm AWS_REGION matches.

If the swarm never terminates or hits its iteration cap with Status.FAILED, two agents are ping-ponging. Tighten repetitive_handoff_detection_window and repetitive_handoff_min_unique_agents, or sharpen the agent prompts so each one knows when to stop handing off.

If the graph condition never fires, your condition function is reading state wrong. The function receives the graph state, and you read a node's output with state.results.get("review").result. Print str(review.result) inside the function to see what the reviewer actually returned. The most common mistake is the reviewer not emitting the exact REVISE: token, which means the bug is in the agent prompt, not the graph.

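If a node throws, the graph marks the run as FAILED. Depending on the SDK version, the exception may also propagate to your caller instead of coming back as a result, so wrap the call in try/except if you want to handle it. Inspect result.results (or the exception) to see which node failed and why. A failed node does not silently skip; downstream nodes that depended on it will not run.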

If you see throttling (ThrottlingException), you are hitting Bedrock's per-model requests-per-minute limit. Add a short sleep between runs, or request a quota increase. The swarm and orchestrator trigger this faster than the graph because they make more calls.
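A short sleep can be made systematic with exponential backoff. The wrapper below is a minimal sketch, not a Strands or boto3 feature: call_with_backoff and looks_like_throttle are hypothetical names, and how the throttling error surfaces (its type name or message containing ThrottlingException) is an assumption you should check against the traceback you actually see.

```python
import random
import time


def looks_like_throttle(exc: Exception) -> bool:
    # Assumption: the error carries "ThrottlingException" in its type name or message.
    return "ThrottlingException" in type(exc).__name__ or "ThrottlingException" in str(exc)


def call_with_backoff(fn, is_throttle=looks_like_throttle, max_attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a throttling error wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Re-raise anything that is not throttling, and give up on the last attempt.
            if not is_throttle(exc) or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

Usage would look like call_with_backoff(lambda: graph("Write a briefing on ...")). The sleep parameter exists so the wrapper can be tested without actually waiting.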

Cost of running this

Everything here is local Python calling Bedrock on demand. There is no provisioned infrastructure, so the only cost is per-token model usage. Each full run of the pipeline is roughly three to ten model calls of a few hundred tokens each. With a mid-tier Claude model that is a small fraction of a cent to a couple of cents per run. Running all three scripts a handful of times while you experiment will cost well under a dollar. If you swap in a smaller model like Amazon Nova Lite or Nova Micro, it is cheaper still.
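The arithmetic behind that estimate is simple enough to keep in a helper. The prices below are placeholders chosen for illustration, not quoted Bedrock rates, and the token counts are guesses. Replace both with the numbers from the Bedrock pricing page and from your own runs.

```python
def estimate_cost(calls, input_tokens, output_tokens, usd_per_m_input, usd_per_m_output):
    """Estimated USD for `calls` model calls with the given average token counts per call."""
    per_call = (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000
    return calls * per_call


# Placeholder prices in USD per million tokens (assumption, check current pricing).
PRICE_IN, PRICE_OUT = 3.00, 15.00

for calls in (3, 10):
    cost = estimate_cost(calls, input_tokens=500, output_tokens=300,
                         usd_per_m_input=PRICE_IN, usd_per_m_output=PRICE_OUT)
    print(f"{calls} calls: ${cost:.3f}")
```

The useful part is the multiplier: a swarm or orchestrator run that loops to ten calls costs about three times what the three-call graph run does, whatever the unit price.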

Cleanup

There is nothing to tear down. These agents create no AWS resources, no Bedrock provisioned throughput, no endpoints, no storage. When you stop running the scripts, billing stops. If you enabled Bedrock model access only for this tutorial and want to leave no trace, you can leave it enabled (model access has no standing cost) or remove it in the Bedrock console. Deactivate your virtual environment with deactivate and delete the .venv folder if you are done.

Where to take it next

First, make the swarm and graph share state without going through the LLM. Both patterns support a shared state object you can pass at invocation so agents read configuration without burning it as context tokens. Move the topic and target word count there.

Second, put a real tool in the researcher. Right now it researches from the model's memory. Give it the http_request or retrieve tool from strands-agents-tools so it pulls live facts, and watch the graph stay deterministic even as the researcher's behavior gets richer.

Third, nest a swarm inside a graph. A GraphBuilder node can be another multi-agent system, so you can make "research" a small swarm of two researchers that collaborate, while the outer write-and-review flow stays a rigid graph. That hybrid, autonomy where you want exploration and determinism where you want a pipeline, is the pattern most production systems actually converge on. The question to keep asking is not "which pattern is best" but "which part of this problem is actually a decision, and which part is just a sequence I already know."