SL#65 - AWS AI Series (7/30) - Strands Agents 101: Build a Code-Defined Agent With 3 Custom Tools in ~50 Lines of Python

This is episode 7 of the AWS AI Series, and the first one about agents rather than raw model calls. The previous six episodes leaned on the Bedrock Runtime API directly: you sent messages, you got tokens back, you parsed them yourself. That works until the moment you want the model to do something between thinking and answering, like read a file, call an API, or run a query. The plumbing for that loop, deciding which tool to call, calling it, feeding the result back, deciding again, is the part nobody wants to write by hand.

Strands Agents is AWS's open-source SDK for exactly that loop. It is the same framework AWS uses internally for Amazon Q Developer and parts of the AWS support tooling, released under Apache 2.0. The pitch is "model-driven": you describe tools as plain Python functions, hand them to an Agent, and the model decides when to use them. No graph to wire up, no state machine to define. In this tutorial we use it to build something genuinely useful in a single sitting, then in the next episodes we take that same agent and deploy it to production with AgentCore.

What we are building

The deliverable is a command-line agent called RepoScout that answers natural-language questions about a code directory. You point it at a folder and ask things like "where is the retry logic in this project?" or "which files import boto3?" and it figures out which files to open, reads them, and answers with specifics.

Under the hood it is one Agent instance with three custom tools: list_files to see what is in a directory, read_file to open a specific file, and search_text to grep across the tree. The model gets these three tools and a question, and runs its own loop: list the directory, decide a file looks relevant, read it, maybe search for a symbol, then answer. We never script that sequence. That is the non-obvious part of the model-driven approach, and the thing worth internalizing in episode 7: you are not writing control flow, you are writing capabilities and letting the model sequence them.

The whole core is about 50 lines of Python. The tools are stdlib only, so the only thing that costs money or needs the network is the Bedrock model call.

Prerequisites

You need Python 3.10 or newer. Strands dropped support for 3.9 some time ago, and the current SDK (1.42.0 as of June 2026) targets 3.10 through 3.14.

You need an AWS account with Amazon Bedrock model access enabled for Claude Sonnet 4 in a US region. This is the single most common thing people miss. Bedrock does not grant model access by default. Go to the Bedrock console, open Model access, and request access to Anthropic Claude Sonnet 4. For most accounts it is granted instantly, but a brand-new account can take a few minutes.

You need AWS credentials configured locally, either through aws configure, environment variables, or an IAM role if you are on an EC2 box. The credentials need Bedrock invoke permissions, and because Strands defaults to a cross-region inference profile, the IAM policy is slightly more involved than a single-region grant. The exact policy is in Setup below.

Assumed knowledge: you are comfortable reading Python and have used a terminal. You do not need any prior agent-framework experience. If you have done a raw bedrock-runtime Converse call before (episode 1 of this series), you will recognize what Strands is doing for you, but it is not required.

Setup

Create a project and a virtual environment, then pin the SDK versions so this tutorial still works in three months when the SDK has moved on:

mkdir reposcout && cd reposcout
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install "strands-agents==1.42.0" "strands-agents-tools==0.8.0"

Now the IAM policy. Strands' default model is the US geographic cross-region inference profile us.anthropic.claude-sonnet-4-20250514-v1:0. Cross-region inference means Bedrock can route your request to us-east-1, us-east-2, or us-west-2 for capacity, so your IAM policy has to allow invoking both the inference profile and the underlying foundation model in each destination region. Attach this to your user or role, replacing ACCOUNT_ID:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "InvokeInferenceProfile",
      "Effect": "Allow",
      "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
      "Resource": "arn:aws:bedrock:us-east-1:ACCOUNT_ID:inference-profile/us.anthropic.claude-sonnet-4-20250514-v1:0"
    },
    {
      "Sid": "InvokeFoundationModelInAllRegions",
      "Effect": "Allow",
      "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
      "Resource": [
        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-20250514-v1:0",
        "arn:aws:bedrock:us-east-2::foundation-model/anthropic.claude-sonnet-4-20250514-v1:0",
        "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-sonnet-4-20250514-v1:0"
      ]
    }
  ]
}

Set your default region so boto3 picks it up, then run a one-line smoke test before writing any real code:

export AWS_REGION=us-east-1
python -c "from strands import Agent; print(Agent()('Say hello in 5 words'))"

If that prints a short greeting, your install, credentials, model access, and IAM are all correct. If it errors, jump to "When it breaks" before going further. Do not skip the smoke test. Ninety percent of the pain in this tutorial is environment setup, and isolating it now saves you debugging the agent logic and the credentials at the same time later.

Step 1: Write the first custom tool

A Strands tool is a plain Python function with the @tool decorator. The decorator reads the function signature and the docstring and turns them into a tool schema the model can see. This is the whole trick: the docstring is not documentation for humans, it is the prompt the model uses to decide whether and how to call the function.

Create reposcout.py and start with the directory-listing tool:

from pathlib import Path
from strands import Agent, tool
ROOT = Path(".").resolve()
@tool
def list_files(directory: str = ".") -> str:
    """List files and folders in a directory of the project.
    Args:
        directory: Path relative to the project root. Defaults to root.
    Returns:
        A newline-separated listing, with [dir] markers for folders.
    """
    target = (ROOT / directory).resolve()
    if not str(target).startswith(str(ROOT)):
        return "Error: path is outside the project root."
    if not target.is_dir():
        return f"Error: {directory} is not a directory."
    entries = []
    for p in sorted(target.iterdir()):
        tag = "[dir] " if p.is_dir() else "      "
        entries.append(f"{tag}{p.relative_to(ROOT)}")
    return "\n".join(entries) or "(empty)"

Two things matter here. First, the type hints (directory: str, -> str) become the tool's input and output schema, so the model knows it must pass a string. Second, the path check that rejects anything outside ROOT is not optional. The model decides what to read, and a vague question can make it wander up into your home directory. Sandboxing the tool to the project root is the kind of guardrail you put in from line one, not after an incident.

Step 2: Add the read and search tools

The same pattern for the other two capabilities. Add these below list_files:

@tool
def read_file(path: str) -> str:
    """Read a UTF-8 text file from the project.
    Args:
        path: File path relative to the project root.
    Returns:
        The file contents, truncated to 8000 characters.
    """
    target = (ROOT / path).resolve()
    if not str(target).startswith(str(ROOT)) or not target.is_file():
        return f"Error: cannot read {path}."
    text = target.read_text(encoding="utf-8", errors="replace")
    return text[:8000]
@tool
def search_text(query: str, extension: str = ".py") -> str:
    """Search the project for a string, returning matching lines.
    Args:
        query: The substring to search for.
        extension: Restrict to files with this extension. Default .py.
    Returns:
        Up to 40 "path:line: text" matches.
    """
    hits = []
    for p in ROOT.rglob(f"*{extension}"):
        try:
            for n, line in enumerate(p.read_text(encoding="utf-8", errors="replace").splitlines(), 1):
                if query in line:
                    hits.append(f"{p.relative_to(ROOT)}:{n}: {line.strip()}")
                    if len(hits) >= 40:
                        return "\n".join(hits)
        except (OSError, UnicodeError):
            continue
    return "\n".join(hits) or f"No matches for '{query}'."

Notice the truncation in read_file (8000 characters) and the cap in search_text (40 hits). These are not arbitrary. Every character a tool returns becomes input tokens on the next model call, which you pay for and which eats the context window. A tool that returns a 200KB file will blow your latency, your bill, and possibly the model's context limit in one call. Bounding tool output is a core discipline of agent engineering, and it is far more important than people expect coming from regular Python where returning a big string is free.

The errors="replace" and the broad except are there because real directories contain binary files, broken encodings, and permission-denied paths. A tool that throws on the first weird file makes the whole agent loop fail. Tools should degrade to a useful error string, not raise.

Step 3: Assemble the agent

Now the part that would be a hundred lines of orchestration in a hand-rolled loop, and is four lines here. Add this to the bottom of reposcout.py:

from strands.models import BedrockModel
model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    region_name="us-east-1",
    temperature=0.2,
)
SYSTEM = (
    "You are RepoScout, a precise code assistant. Answer questions about "
    "the local project by using your tools to inspect real files. Always "
    "cite the file paths and line numbers you used. If you are unsure, say so."
)
agent = Agent(model=model, system_prompt=SYSTEM,
              tools=[list_files, read_file, search_text])

The Agent constructor takes the model, a system prompt, and the list of tool functions. That is the entire wiring. When you call the agent, it sends your question plus the three tool schemas to Claude. Claude replies with either text or a request to call a tool. Strands executes the tool, appends the result to the conversation, and calls Claude again. This repeats until Claude answers in plain text. That loop is the agent loop, and the reason Strands calls itself model-driven is that the model, not your code, drives which tool runs and when.

I set temperature=0.2 deliberately. For a tool-using agent that should behave consistently, you want low temperature. High temperature is for creative generation, not for "decide which file to open." The default region is set explicitly so the example is reproducible regardless of your shell environment.

Step 4: Make it a CLI

Wrap the agent in a tiny command-line entry point so you can point it at a directory and ask a question. Add this at the very bottom:

import sys
if __name__ == "__main__":
    if len(sys.argv) < 2:
        print('Usage: python reposcout.py "your question about this repo"')
        sys.exit(1)
    question = " ".join(sys.argv[1:])
    result = agent(question)
    print("\n--- token usage ---")
    print(result.metrics.get_summary()["accumulated_usage"])

Calling agent(question) returns an AgentResult. By default Strands streams the agent's reasoning and answer to your console as it goes, so you see it work in real time. The AgentResult also carries result.metrics, which is how we print token usage at the end. Watching accumulated_usage after each run is the fastest way to build intuition for what agent calls actually cost, and it will surprise you: a single question that opens three files can easily run 10,000 input tokens, because every file the agent reads gets fed back into the next model call.

If you want the agent silent except for the final answer, pass callback_handler=None to the Agent constructor and print result.message yourself. For this tutorial we keep the streaming output on, because watching the tool calls happen is the whole point of the exercise.

Verify it works

Point RepoScout at its own directory and ask it about itself:

python reposcout.py "What custom tools does this project define and what does each one do?"

You should see the agent stream something like a call to list_files, then read_file on reposcout.py, then a final answer that names list_files, read_file, and search_text with an accurate one-line description of each, citing reposcout.py. At the end you will see the token usage summary, something like {'inputTokens': 8200, 'outputTokens': 240, 'totalTokens': 8440}.

Try a harder one that forces a search:

python reposcout.py "Which functions enforce that paths stay inside the project root?"

The agent should call search_text for something like startswith or ROOT, find the checks in list_files and read_file, and report both with line numbers. If it answers with specific file paths and line numbers rather than a vague summary, the tutorial worked. That citation behavior is coming from the system prompt, and it is the difference between an agent you can trust and one that confabulates.

When it breaks

AccessDeniedException or "You don't have access to the model". Your IAM policy or Bedrock model access is the problem, not your code. Confirm model access is granted in the Bedrock console under Model access, and that your policy includes the foundation-model ARN for the region your request actually lands in. With cross-region inference the request can route to us-east-2 or us-west-2 even though you set us-east-1, which is exactly why the Step-0 policy lists all three. A single-region policy works until the day Bedrock reroutes you and then fails intermittently, which is maddening to debug.

ValidationException about the model identifier. You probably used the bare model ID anthropic.claude-sonnet-4-20250514-v1:0 instead of the inference profile ID with the us. prefix. On-demand throughput for Claude Sonnet 4 requires the inference profile, not the raw foundation-model ID. Keep the us. prefix.

NoRegionError or botocore cannot find a region. You did not set AWS_REGION and have no default in your AWS config. Either export AWS_REGION=us-east-1 or pass region_name to BedrockModel, which the code above already does.

The agent answers without calling any tools. Usually a system-prompt problem. If the model thinks it can answer from general knowledge, it will. Make the instruction explicit that it must inspect real files, which the SYSTEM string above does. If it still skips tools on a vague question, the question itself may not require them.

The agent loops or returns enormous output. A tool is returning too much. Check that your truncation limits are in place. An unbounded read_file on a large file, or a search_text without the 40-hit cap, will balloon the context and the cost. Bounded tools are not a nicety, they are what keeps the loop stable.

Estimated AWS cost

This tutorial creates no standing AWS resources. There is no endpoint, no S3 bucket, no Knowledge Base to leave running. The only charge is per-token Bedrock inference. At the time of writing, Claude Sonnet 4 on Bedrock is billed on-demand at $3 per million input tokens and $15 per million output tokens (check the Bedrock pricing page for current rates). A typical RepoScout question runs roughly 8,000 to 15,000 input tokens and a few hundred output tokens because each file read is fed back in, so each question costs on the order of $0.03 to $0.06. Running the whole tutorial a dozen times costs well under a dollar. The token-usage printout in Step 4 lets you watch this directly.

Cleanup

Because there are no provisioned resources, cleanup is just deactivating the environment with deactivate and deleting the reposcout folder if you do not want to keep it. If you created a dedicated IAM policy for this, detach and delete it when you are done. Confirm in the Bedrock console that you have no leftover provisioned throughput or evaluation jobs from earlier episodes, since those do bill hourly, but nothing in this episode creates them.

Where to take it next

Three extensions, easiest to hardest. First, add a fourth tool that runs the project's tests by shelling out to pytest and returning the summary, so RepoScout can answer "do the tests pass?" Mind the sandboxing: running arbitrary commands is a much bigger security surface than reading files, so restrict it hard. Second, swap the model provider: change BedrockModel to us.amazon.nova-pro-v1:0 and compare answer quality and cost on the same questions, since Strands is model-agnostic and this is a one-line change. Third, give RepoScout memory across questions so it remembers what you already asked, which is exactly what episode 9 covers with AgentCore Memory.

The bigger arc: right now RepoScout runs on your laptop with your personal credentials. That is fine for a tool you use, and useless for a tool your team uses. Episode 8 takes this exact agent and deploys it to Bedrock AgentCore so it runs in an isolated managed runtime with its own identity, reachable as an endpoint. The agent code barely changes. That is the payoff of writing capabilities instead of orchestration: the thing you built today is already the thing you ship.