SL#74 - AWS AI Series (15/30) - Nova Act: Script a Browser Agent That Extracts Data and Fills Forms on a Real Site

What we are building

Most browser automation breaks the same way. You write a CSS selector, the site ships a redesign, your selector points at nothing, and your pipeline dies at 3am. Selenium and Playwright are precise and brittle. You tell them exactly which div to click, and they do exactly that until the div moves.

Nova Act takes the other side of the trade. You describe the task in plain English, and a model that AWS fine-tuned specifically for driving browsers figures out where to click. The headline demo writes itself, but the interesting engineering question is the one nobody answers in the launch posts: how do you get a non-deterministic model to produce data you can actually trust in code? That is what this tutorial is about.

We will build watcher.py, a script that does three things a real automation needs. It navigates a site and extracts fields into a typed Pydantic object, not a blob of text. It fills and submits a form. And it does both with retries, step limits, and a saved login session so the agent starts already authenticated. The non-obvious design choice, and the one most people skip, is that every extraction goes through a schema. A model that returns prose is a model you cannot branch on. A model that returns a validated FlightResults object is a model you can build a system around.

By the end you have a runnable browser agent you could push to GitHub and, with one more step, deploy to AWS.

Prerequisites

You need Python 3.10 or newer and a terminal you are comfortable in. You need an Amazon account to generate a Nova Act API key at nova.amazon.com/act, which is free to create. You should be comfortable reading Python and have seen Pydantic models before, though I will explain the parts that matter.

One hard constraint to know up front: Nova Act is currently available in a single AWS region, US East (N. Virginia), and the SDK calls a hosted model, so your machine needs network access. The SDK drives a local Chromium through Playwright, which it installs for you on first run, so you do not need Chrome preinstalled. If you are on a corporate machine that blocks Playwright's browser download, that is the one thing that will bite you, so check it before you start.

Cost: generating an API key and running the examples in this tutorial through the Nova Act developer tier is free under the nova.amazon.com terms of use at the time of writing. There is no AWS bill for the local SDK runs here. You only start paying when you deploy workflows to the Nova Act AWS service or run them through Bedrock AgentCore, which we cover at the end as an extension. Check the Amazon Nova pricing page before you deploy anything to production, because consumption pricing changes.

Setup

Create a clean virtual environment and install the SDK. Nova Act SDK versions older than 3.0 are no longer supported, so pin to a current release and upgrade if you have an old one lying around.

python -m venv .venv
source .venv/bin/activate
pip install --upgrade nova-act
pip show nova-act        # confirm version is 3.x or newer

Generate an API key at nova.amazon.com/act, then export it. The SDK reads it from the environment.

export NOVA_ACT_API_KEY="your_api_key_here"

Now the smoke test. Before writing any real logic, prove the whole chain works: SDK installed, key valid, Playwright able to launch. Nova Act ships a set of practice sites under nova.amazon.com/act/gym that behave like real web apps but stay stable, which makes them perfect for a tutorial that needs to be reproducible. Save this as smoke.py:

from nova_act import NovaAct

with NovaAct(starting_page="https://nova.amazon.com/act/gym/next-dot/search") as nova:
    nova.act("Find flights from Boston to Seattle on February 22nd")

Run it with python smoke.py. The first run takes one to two minutes because Playwright downloads Chromium; later runs start in seconds. A Chrome window opens, the agent types into the search box, picks dates, and submits. If you see the browser drive itself, your setup is good and you can move on. If it hangs on launch, jump to the troubleshooting section before going further.

Step 1: Run your first act and read what came back

The NovaAct context manager opens a browser at starting_page, hands you a nova object, and closes the browser when the block exits. Each act() call passes a natural language instruction to the model, which then runs an agentic loop: it looks at the page, takes one action, looks again, and repeats until the task is done or it gives up.

from nova_act import NovaAct

with NovaAct(starting_page="https://nova.amazon.com/act/gym/next-dot/search") as nova:
    result = nova.act("Search for one-way flights from Boston to Seattle on February 22nd")
    print(result.response)

The thing to internalize early is the difference between an action and an answer. act() is built to do things, so its return value is not guaranteed to contain structured data. If you ask act() a question and expect a clean answer back, you will sometimes get None in the response. That surprises people on day one and sends them down a debugging rabbit hole. The fix is not to fight act() but to use the right call for extraction, which is the next step.

Keep each act() to a single logical move. A prompt like "search for flights, then sort by price, then book the cheapest, then enter my card" asks the model to hold four goals at once and gives the loop four chances to wander. Three short acts beat one long one almost every time, and they are far easier to debug because you can see exactly which step failed.
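
One way to keep acts small is to hold the prompts in a list and run them one at a time, so a failure names the exact step. The sketch below assumes only that the session object exposes act(prompt). run_steps and FakeSession are hypothetical names for illustration, and the stub stands in for a real NovaAct session so the snippet runs without a browser.

```python
class FakeSession:
    """Hypothetical stand-in for a NovaAct session; fails on any 'book' prompt."""

    def __init__(self):
        self.seen = []

    def act(self, prompt):
        self.seen.append(prompt)
        if "book" in prompt.lower():
            raise RuntimeError("simulated failure")


def run_steps(session, steps):
    """Run each prompt as its own act(); return (completed_count, failed_prompt or None)."""
    for index, prompt in enumerate(steps):
        try:
            session.act(prompt)
        except Exception as exc:  # with the real SDK you would likely catch ActError here
            print(f"Step {index + 1} failed: {prompt!r} ({exc})")
            return index, prompt
    return len(steps), None


steps = [
    "Search for one-way flights from Boston to Seattle on February 22nd",
    "Sort the results by price",
    "Book the cheapest flight",
]
session = FakeSession()
completed, failed = run_steps(session, steps)
```

With a real session you would pass the nova object in place of FakeSession and keep the list of prompts unchanged.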

Step 2: Extract structured data with a Pydantic schema

This is the step that turns a party trick into something you can build on. To get data you can branch on, hand act_get() a JSON schema describing the shape you want. act_get() works like act() but always returns a parsed response, and when you pass a schema, it returns data matching that schema instead of free text.

from nova_act import NovaAct
from pydantic import BaseModel

class Flight(BaseModel):
    airline: str
    departure_time: str
    price_usd: float

class FlightResults(BaseModel):
    cheapest: Flight
    total_results: int

with NovaAct(starting_page="https://nova.amazon.com/act/gym/next-dot/search") as nova:
    nova.act("Search for one-way flights from Boston to Seattle on February 22nd")
    result = nova.act_get(
        "Return the cheapest flight and the total number of results shown.",
        schema=FlightResults.model_json_schema(),
    )
    flights = FlightResults.model_validate(result.parsed_response)
    print(f"Cheapest: {flights.cheapest.airline} at ${flights.cheapest.price_usd}")
    print(f"Out of {flights.total_results} results")

Three things are happening here that matter. First, model_json_schema() converts your Pydantic class into the JSON Schema the model needs, so you define the contract once in Python and never hand-write JSON. Second, result.parsed_response is plain parsed data, and model_validate() runs it back through Pydantic so you get a real typed object with real attributes, and a loud validation error if the model returned garbage. Third, notice the extraction lives in its own act_get() call, separate from the search. AWS's own guidance is to put extraction in a dedicated call rather than bolting "and tell me the price" onto an action prompt, because the two tasks pull the model in different directions.
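
To see the contract without launching a browser, you can inspect the generated schema and validate hand-written payloads. This is a minimal sketch assuming Pydantic v2; the two dictionaries are made-up stand-ins for result.parsed_response, not real agent output.

```python
import json

from pydantic import BaseModel, ValidationError


class Flight(BaseModel):
    airline: str
    departure_time: str
    price_usd: float


class FlightResults(BaseModel):
    cheapest: Flight
    total_results: int


# The JSON Schema that would be handed to act_get()
schema = FlightResults.model_json_schema()
print(json.dumps(schema, indent=2))

# Made-up payload with the right shape: validation yields a typed object
good = {
    "cheapest": {"airline": "SkyHop", "departure_time": "08:15", "price_usd": 129},
    "total_results": 24,
}
flights = FlightResults.model_validate(good)

# Made-up payload with the wrong shape: validation fails loudly
bad = {"cheapest": "SkyHop is the cheapest", "total_results": "many"}
try:
    FlightResults.model_validate(bad)
    rejected = False
except ValidationError as exc:
    rejected = True
    print(f"Rejected with {exc.error_count()} validation errors")
```

The nested Flight model appears under the schema's $defs key, which is why defining the classes once in Python is enough.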

If all you need is a yes or no, there is a BOOL_SCHEMA constant for exactly that, which is handy for control flow like checking whether a login actually succeeded before you continue.

from nova_act import NovaAct, BOOL_SCHEMA

with NovaAct(starting_page="https://nova.amazon.com/act") as nova:
    result = nova.act_get("Am I logged in?", schema=BOOL_SCHEMA)
    if result.parsed_response:
        print("Logged in, continuing")
    else:
        print("Not logged in, stopping")

Step 3: Fill and submit a form

Form filling is where the natural language approach earns its keep, because forms are exactly the surface that breaks selector-based scripts every time a field gets reordered. Break the form into small acts, one logical group of fields per call, then a final act to submit. Keep the values in Python and interpolate them so the same script works for any input.

from nova_act import NovaAct

def submit_contact(nova, name: str, email: str, message: str):
    nova.act(f"Type '{name}' into the name field")
    nova.act(f"Type '{email}' into the email field")
    nova.act(f"Type the following into the message box: {message}")
    nova.act("Submit the form")

with NovaAct(starting_page="https://nova.amazon.com/act") as nova:
    submit_contact(
        nova,
        name="Houssem Ben Slama",
        email="you@example.com",
        message="Testing Nova Act form fill.",
    )

Two habits worth forming. Use single quotes around the literal values inside the prompt so the model knows where each value starts and stops, which cuts down on it grabbing the wrong text. And split the submit into its own act so you can insert a confirmation check between filling and submitting. On any form that does something irreversible, you want a BOOL_SCHEMA gate ("are all required fields filled correctly?") before the click, not after.
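
The gate described above could look roughly like this. It is a sketch, not the SDK's prescribed pattern: submit_if_confirmed and FakeSession are hypothetical names, the stub replaces a real NovaAct session so the snippet runs offline, and the plain boolean schema is an assumption standing in for BOOL_SCHEMA.

```python
from types import SimpleNamespace

# Assumption: a plain JSON Schema boolean; with the real SDK, pass BOOL_SCHEMA instead.
BOOLEAN_SCHEMA = {"type": "boolean"}


class FakeSession:
    """Hypothetical stand-in for a NovaAct session that records every prompt."""

    def __init__(self, form_ok):
        self.form_ok = form_ok
        self.prompts = []

    def act(self, prompt):
        self.prompts.append(prompt)

    def act_get(self, prompt, schema=None):
        self.prompts.append(prompt)
        return SimpleNamespace(parsed_response=self.form_ok)


def submit_if_confirmed(session, name, email, message, schema=BOOLEAN_SCHEMA):
    """Fill the form, ask for a yes/no check, and submit only when the check passes."""
    session.act(f"Type '{name}' into the name field")
    session.act(f"Type '{email}' into the email field")
    session.act(f"Type the following into the message box: {message}")
    check = session.act_get("Are all required fields filled correctly?", schema=schema)
    if not check.parsed_response:
        return False  # stop before the irreversible click
    session.act("Submit the form")
    return True


good_session = FakeSession(form_ok=True)
submitted = submit_if_confirmed(good_session, "Ada Example", "ada@example.com", "Hello")

bad_session = FakeSession(form_ok=False)
blocked = submit_if_confirmed(bad_session, "Ada Example", "ada@example.com", "Hello")
```

Returning a boolean lets the caller decide whether a blocked submit should be retried, logged, or escalated to a person.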

For genuinely sensitive steps, payment, final purchase confirmation, anything you would not want a model doing unsupervised, Nova Act has a human-in-the-loop facility where the agent pauses, captures a screenshot, and hands control to a person to approve or take over the browser. You implement it by subclassing HumanInputCallbacksBase. That is beyond our scope here, but it is the right tool the moment money or irreversible actions enter the workflow, so know it exists.

Step 4: Start the agent already logged in

Most real automation runs behind a login, and you do not want the agent fighting an auth wall on every run. By default Nova Act starts each session with a clean browser by cloning a fresh Chromium profile and deleting it on exit. To persist cookies and local storage between runs, point user_data_dir at a profile directory and tell the SDK not to clone it.

The SDK ships a helper that opens a browser so you can log in by hand once and save the session. Run it like this:

python -m nova_act.samples.setup_chrome_user_data_dir

Log into your target site in the window it opens, then close it. It prints the path to the saved profile. From then on, pass that path and disable cloning so the agent reuses your authenticated session:

import os
from nova_act import NovaAct, BOOL_SCHEMA

# Use the path the setup helper printed; adjust this if yours differs
user_data_dir = os.path.expanduser("~/nova-act-profile")

with NovaAct(
    starting_page="https://nova.amazon.com/act",
    user_data_dir=user_data_dir,
    clone_user_data_dir=False,
) as nova:
    logged_in = nova.act_get("Am I logged in?", schema=BOOL_SCHEMA)
    print("Session restored:", logged_in.parsed_response)

One caveat that will trip you up if you scale out: when you run multiple NovaAct instances in parallel, they cannot share one profile directory, because two browsers writing the same profile corrupt it. In that case you must let each instance clone its own copy with clone_user_data_dir=True. Single sequential runs can keep cloning off and reuse the directory in place.

Step 5: Make it robust with step limits and error handling

A browser agent left to its own devices can loop. The page does not load, the model keeps trying, and you burn time and quota. Two guards keep it honest. Cap the agentic loop with max_steps on the act() call, and catch the typed errors the SDK raises so a single failed task does not take down the whole run.

from nova_act import NovaAct
from nova_act.types.act_errors import ActError, ActExceededMaxStepsError

with NovaAct(starting_page="https://nova.amazon.com/act/gym/next-dot/search") as nova:
    try:
        nova.act(
            "Search for one-way flights from Boston to Seattle on February 22nd",
            max_steps=15,
        )
    except ActExceededMaxStepsError:
        print("Agent could not finish within 15 steps, skipping this task")
    except ActError as e:
        print(f"Act failed: {e}")

Nova Act organizes its errors into families, and knowing them tells you whether to retry or give up. ActAgentError means the prompt could not be completed, including ActExceededMaxStepsError when the loop runs out of steps and ActInvalidModelGenerationError when the model returns output that does not fit your schema, both worth a retry with a clearer prompt. ActClientError covers things like ActGuardrailsError, where the responsible-AI guardrails blocked the request, and ActRateLimitExceededError, where you should back off and slow down. ActExecutionError and ActServerError point at problems on the execution or service side that retrying blindly will not fix. Catching the right family means your automation degrades gracefully instead of either crashing or silently looping.
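
The retry-or-give-up decision can be captured in a small wrapper that retries only the exception types you name and lets everything else propagate. This is a generic sketch rather than an SDK feature: call_with_retry is a hypothetical helper, and SimulatedRateLimit is a local stand-in for a retryable error such as ActRateLimitExceededError so the snippet runs without the SDK.

```python
import time


def call_with_retry(task, retryable, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call task(); retry only on exceptions in `retryable`, doubling the delay each time."""
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except retryable as exc:
            if attempt == attempts:
                raise  # out of attempts, surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc!r}); retrying in {delay}s")
            sleep(delay)


class SimulatedRateLimit(Exception):
    """Hypothetical stand-in for a retryable SDK error."""


calls = {"count": 0}


def flaky_task():
    """Fails twice, then succeeds, to exercise the retry path."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise SimulatedRateLimit("slow down")
    return "done"


delays = []  # record delays instead of sleeping so the demo finishes instantly
outcome = call_with_retry(flaky_task, retryable=(SimulatedRateLimit,), sleep=delays.append)
```

In a real script, task would be a small function wrapping one act() or act_get() call, and retryable would hold the SDK error classes you consider worth another attempt.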

When you do want throughput, one NovaAct instance drives exactly one browser, but the instances are lightweight, so you spin up several in their own threads to run tasks in parallel. Think of it as map-reduce over the web: fan out one session per target, each cloning its own profile, then collect the typed results. That pattern is how you turn a single-page scraper into something that watches fifty pages at once.
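
The fan-out can be sketched with the standard library's thread pool. Everything here is illustrative: watch_page is a placeholder worker that returns canned data, watch_all is a hypothetical helper, and the example.com URLs are made up. In a real run each worker would open its own NovaAct session for its URL and return the validated fields.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def watch_page(url):
    """Placeholder worker. A real one would open its own NovaAct session for
    `url`, extract with act_get(), and return the validated result."""
    return {"url": url, "ok": True}


def watch_all(urls, worker=watch_page, max_workers=4):
    """Run one worker per URL in parallel; collect results and failures separately."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # one failed page should not sink the batch
                failures[url] = exc
    return results, failures


urls = [f"https://example.com/item/{i}" for i in range(5)]
results, failures = watch_all(urls)
print(f"{len(results)} succeeded, {len(failures)} failed")
```

Keeping failures in their own dictionary means a single bad page shows up in the report instead of aborting the other sessions.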

Verify it works

Here is the contract. Save the Step 2 script as watcher.py and run it with python watcher.py against the gym flight site. You should see a Chrome window open, the agent run a search, and then your terminal print two lines that look like real data, for example Cheapest: SkyHop at $129.0 followed by Out of 24 results. The exact airline and number will vary because the gym data shifts, but the shape is the test: a parsed float price and an integer count, not a paragraph of text.

If the print statements show populated fields and no Pydantic ValidationError fired, the extraction contract held and your schema is doing its job. That is the moment that proves the tutorial worked, because it means you went from "a model looked at a page" to "I have typed data in Python I can branch on." Add a print(result.parsed_response) line if you want to see the raw dictionary the model returned before validation, which is the single most useful thing to look at when something is off.

For the login step, run the Step 4 script twice. The first run after setup_chrome_user_data_dir should print Session restored: True. If the second run also prints True without you logging in again, persistence is working.

When it breaks

The launch hangs for one to two minutes on the very first run. That is expected and not a bug: Playwright is downloading Chromium. If it hangs much longer or errors out, your network is probably blocking the browser download. Set NOVA_ACT_SKIP_PLAYWRIGHT_INSTALL only if you already have the browsers installed; otherwise let the download finish once.

result.parsed_response is empty or None. You almost certainly used act() instead of act_get(), or you called act_get() without a schema for a structured answer. Extraction without a schema gives the model permission to answer in prose. Always pass a schema when you expect data, and prefer act_get() over act() for anything you plan to read in code.

A ValidationError from Pydantic. The model returned data that does not fit your class, which usually means your schema is more specific than what the page actually shows, for example you typed price_usd: float but the page lists "Call for price." Loosen the field to str or make it Optional, or tighten the prompt to tell the model what to do when a value is missing.
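
Loosening the field might look like the sketch below, assuming Pydantic v2. StrictQuote, LenientQuote, and price_note are hypothetical names, and the payloads are made up to mimic a page that shows "Call for price". The lenient model only helps if the prompt also tells the model to return null for a missing price.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class StrictQuote(BaseModel):
    price_usd: float  # fails when the page has no numeric price


class LenientQuote(BaseModel):
    price_usd: Optional[float] = None  # None means no numeric price was shown
    price_note: Optional[str] = None   # free-text fallback such as "Call for price"


# Made-up payload: the model copied the page text into a numeric field
raw = {"price_usd": "Call for price"}
try:
    StrictQuote.model_validate(raw)
    strict_failed = False
except ValidationError:
    strict_failed = True

# Made-up payload: the prompt asked for null plus a note when the price is missing
quote = LenientQuote.model_validate({"price_usd": None, "price_note": "Call for price"})
if quote.price_usd is None:
    print(f"No numeric price, note says: {quote.price_note}")
```

Downstream code then branches on price_usd being None instead of catching a ValidationError on every unpriced listing.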

The agent wanders or loops on a complex prompt. You packed too much into one act(). Split it into smaller single-purpose calls and add max_steps. A task that needs the model to juggle four goals at once is a task that needs four act() calls.

ActGuardrailsError or ActRateLimitExceededError. The first means your request tripped the service's responsible-AI guardrails, so rephrase the task. The second means you are going too fast, so add a backoff between calls or reduce how many parallel sessions you run.

Where to take it next

First, move it off your laptop. The Nova Act CLI packages a Python workflow and deploys it to Amazon Bedrock AgentCore Runtime, handling the containerization, ECR, and IAM roles for you, so the same script you wrote here runs as a managed cloud job instead of a window on your desk.

Second, swap the local Chromium for the Bedrock AgentCore Browser Tool, a fully managed cloud browser with session isolation that AWS built for exactly this. The combination of Nova Act for the driving and AgentCore Browser for the runtime is the production story, and AWS has a worked example of agentic QA testing built on the pair that is worth reading before you commit to an architecture.

Third, give the agent tools beyond the browser. You can decorate a Python function with @tool and pass it in, or hand it tools from an MCP server through a Strands MCP client, so your browser agent can call an API or query a database mid-workflow instead of being trapped on the page. That is the bridge from "scraper" to "agent," and it connects this episode straight back to the AgentCore Gateway and MCP work from earlier in the series.

The real shift here is not that a model can click buttons. It is that schemas turn a probabilistic browser into a typed data source. Everything downstream, retries, branching, parallelism, depends on the agent handing you a Flight object instead of a sentence. Get the contract right and the rest is ordinary engineering.