SL#76 - AWS AI Series (16/30) - Ship a Feature Through Kiro Specs With Property-Based Tests

What we are building

Kiro is AWS's agentic IDE. Its headline idea is spec-driven development: instead of chatting your way to code and hoping it matches your architecture, you make Kiro write down what it is going to build first, in three reviewable files, and only then let it write code. The three files are requirements.md (user stories and acceptance criteria in EARS notation), design.md (architecture, data flow, testing strategy), and tasks.md (a checklist of discrete, trackable implementation tasks).

The feature we will ship is deliberately tiny and deliberately full of edge cases: a function that splits a money amount across N people so that the parts sum back to the exact total, no cent is lost to rounding, and the parts are as even as possible. This is the kind of code that passes three hand-written unit tests and then loses a penny in production. It is the perfect target for property-based testing, where you assert invariants and let a generator hunt for the input that violates them.

By the end you will have a Python package with a working split_amount function, a Hypothesis test suite that checks the invariants against hundreds of generated inputs, a steering file that makes Kiro write those tests by default on every future feature, and an agent hook that reruns the suite every time you save. The non-obvious part is that the spec, not the prompt, is what gets the tests right. You write the invariants once as acceptance criteria, and they flow through design into actual property tests.

Prerequisites

You need Kiro installed (download from kiro.dev/downloads, available for macOS, Windows, and Linux) and signed in. Kiro's free tier covers everything in this tutorial, so you do not need a paid plan or an AWS account with billing attached. Authentication is through a Kiro login; follow the in-app prompt the first time you open it.

On the local side you need Python 3.10 or newer and the ability to create a virtual environment. We use pytest and hypothesis, both installed with pip. You should be comfortable reading Python and reading a Markdown file, and it helps if you have seen the idea of a "property" in testing before, though I will explain it as we go.

No IAM roles, no S3 buckets, no Bedrock model access are required for this episode. Kiro runs the model calls for you under your Kiro account. That is a change from most of this series, where the deliverable lives in your AWS account. Here the deliverable is a repository on your laptop.

Setup

Create the project and the virtual environment first, so Kiro has a real folder to reason about.

mkdir fair-split && cd fair-split
python3 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install pytest==8.3.4 hypothesis==6.122.3
git init
mkdir -p src/fairsplit tests
touch src/fairsplit/__init__.py

Now open the folder in Kiro. From the project directory you can run kiro ., or use File then Open Folder inside the app. Click the Kiro ghost icon in the activity bar on the left to open the Kiro panel. You will see sections for Specs, Agent Hooks, Agent Steering, and MCP Servers. The chat pane opens by default on the right.

Smoke test before any real work: in the chat pane, ask Kiro "What files are in this workspace?" It should list your src and tests directories and the virtual environment. If it can see the tree, it is pointed at the right folder and you are ready. If it cannot, you opened the wrong directory; reopen the fair-split folder specifically.

Step 1: Lay down steering so the tests are not an afterthought

Steering files give Kiro persistent context about your project so you do not repeat conventions in every chat. They live in .kiro/steering/ and, by default, are pulled into every interaction. We are going to use one to make property-based testing the house style before we write a single requirement.

In the Kiro panel, open the Agent Steering section and click the + button, choose workspace scope, and name the file testing-standards.md. Put this in it:

---
inclusion: always
---
# Testing standards
All non-trivial logic must ship with property-based tests using Hypothesis,
not only example-based unit tests.
For every pure function, identify its invariants and assert them with
@given strategies. Prefer invariants (properties that hold for all inputs)
over specific input/output pairs.
When a function deals with money, represent amounts as integer cents.
Never use floats for currency. Every split or allocation must conserve
the total exactly.

The front matter matters. inclusion: always is the default mode and means this file loads into every Kiro interaction, so the testing rule influences requirements, design, and code generation alike. Kiro also supports fileMatch (load only when editing matching files), manual (load only when you type #testing-standards in chat), and auto (load when your request matches a description). For a rule this fundamental, always-on is correct.

This is the lever that makes the rest of the tutorial work. Without it, Kiro defaults to a couple of example tests. With it, the spec's design and tasks phases will explicitly plan property tests because the steering told them to. You are programming the planner, not just the coder.

Step 2: Create a Requirements-First spec and write the invariants as EARS

In the Kiro panel, click the + under Specs, or click the Spec button in the chat pane. Kiro asks whether you are building a Feature or fixing a Bug. Choose Feature, then choose the Requirements-First workflow, which is the right pick when you know the behavior you want but the architecture is flexible.

Give it this prompt:

Build a fair money splitter. A function split_amount(total_cents, n)
takes a non-negative integer amount in cents and a positive integer
number of recipients. It returns a list of n integer cent amounts that
sum exactly to total_cents, where any two amounts differ by at most one
cent, and the larger amounts come first. Invalid inputs (negative total,
n <= 0) raise ValueError.

Kiro generates requirements.md under .kiro/specs/fair-money-splitter/. It writes the behavior in EARS notation, which is the format "WHEN [condition] THE SYSTEM SHALL [behavior]". Expect something close to this:

WHEN split_amount is called with a valid total_cents and n
THE SYSTEM SHALL return a list of exactly n integers
WHEN split_amount returns a result
THE SYSTEM SHALL make the elements sum to exactly total_cents
WHEN split_amount returns a result
THE SYSTEM SHALL make any two elements differ by at most one cent
WHEN split_amount returns a result
THE SYSTEM SHALL place the larger amounts first
WHEN total_cents is negative or n is less than or equal to zero
THE SYSTEM SHALL raise ValueError

Read these as test oracles, not prose. Each EARS line is an invariant a property test can check directly: sum equals total, max minus min is at most one, length equals n. That one-to-one mapping from acceptance criterion to property is the whole reason EARS is worth the ceremony. Review the generated requirements, add any edge case Kiro missed (the total_cents equals zero case is a good one to confirm explicitly), and approve the phase.
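To make the mapping concrete, here is a hypothetical oracle (criteria_violations is illustrative, not something Kiro generates) that checks a candidate result against each acceptance criterion from the prompt, including the larger-amounts-first rule. Each if statement corresponds to one EARS line:

```python
def criteria_violations(total_cents: int, n: int, parts: list[int]) -> list[str]:
    """Return the names of the acceptance criteria that `parts` violates.

    Hypothetical helper: each check mirrors one EARS line in requirements.md.
    """
    violations = []
    if len(parts) != n:
        violations.append("length equals n")
    if sum(parts) != total_cents:
        violations.append("sum equals total_cents")
    if parts and max(parts) - min(parts) > 1:
        violations.append("parts differ by at most one cent")
    if parts != sorted(parts, reverse=True):
        violations.append("larger amounts come first")
    return violations


print(criteria_violations(100, 3, [34, 33, 33]))  # []
print(criteria_violations(100, 3, [33, 33, 33]))  # ['sum equals total_cents']
print(criteria_violations(100, 3, [33, 33, 34]))  # ['larger amounts come first']
```

A property test is essentially this oracle run against generated inputs instead of hand-picked lists.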

Step 3: Generate design and tasks, and watch the tests show up by name

Once you confirm requirements, Kiro generates design.md: the function signature, the allocation algorithm (integer division for the base amount, then the remainder handed out one cent at a time, so the first r recipients each get one extra cent, where r is the remainder), data types, error handling, and a testing strategy section. Because your steering file says property tests are mandatory, the testing strategy will name Hypothesis and list the properties to check rather than a vague "add unit tests" line. If it does not, that is a signal your steering file did not load; reopen it and confirm the front matter is the very first content, with no blank line above it.
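The allocation step can be written as a loop that literally hands out one cent at a time. This sketch (split_cent_by_cent is a hypothetical name, and it skips input validation for brevity) should produce the same lists as the one-line comprehension in the Step 3 implementation:

```python
def split_cent_by_cent(total_cents: int, n: int) -> list[int]:
    """Illustrative loop version of the design's allocation step (no validation)."""
    base, remainder = divmod(total_cents, n)
    parts = [base] * n  # everyone starts at the integer-division base
    for i in range(remainder):  # remainder < n, so nobody gets more than one extra cent
        parts[i] += 1
    return parts


print(split_cent_by_cent(100, 3))  # [34, 33, 33]  divmod(100, 3) == (33, 1)
print(split_cent_by_cent(2, 5))    # [1, 1, 0, 0, 0]  divmod(2, 5) == (0, 2)
print(split_cent_by_cent(0, 4))    # [0, 0, 0, 0]
```

Because the extra cents always go to the front, the result is sorted largest-first without any explicit sort.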

Approve the design, and Kiro generates tasks.md, a checklist of discrete tasks. Expect entries along the lines of "implement the function", "write example tests", "write property-based tests", and "handle invalid input". Each task is tracked individually, and its status moves from in progress to done as it runs.

Here is a feature worth knowing: click Run all Tasks and Kiro builds a dependency graph of the task list, groups independent tasks into waves, and runs the tasks in a wave concurrently. An implementation task and a docs task with no shared dependency can run at the same time; a test task that depends on the implementation runs in a later wave, after the implementation finishes. For a four-task spec the speedup is modest, but on a twenty-task feature it is the difference between watching one task at a time and watching the whole thing fan out. You can also run tasks one by one if you want to review each diff, which is what I would do the first time.
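The wave idea is a level-by-level topological sort. The sketch below illustrates the concept under simple assumptions (every dependency is itself a task in the graph, and the task names are hypothetical); it is not Kiro's scheduler:

```python
def group_into_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; a task is ready once all of its dependencies are done.

    `deps` maps each task to the set of tasks it depends on.
    Raises ValueError if some tasks can never become ready (a cycle).
    """
    remaining = {task: set(d) for task, d in deps.items()}
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        # Every task whose dependencies are all finished can run in this wave.
        ready = sorted(t for t, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle among: " + ", ".join(sorted(remaining)))
        waves.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return waves


tasks = {
    "implement split_amount": set(),
    "write docs": set(),
    "write example tests": {"implement split_amount"},
    "write property tests": {"implement split_amount"},
}
print(group_into_waves(tasks))
# [['implement split_amount', 'write docs'], ['write example tests', 'write property tests']]
```

Tasks inside one inner list have no dependency on each other, which is what makes running them concurrently safe.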

Let the implementation task run. Kiro writes src/fairsplit/splitter.py. A correct version looks like this:

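def split_amount(total_cents: int, n: int) -> list[int]:
    # bool is a subclass of int, so True/False would otherwise slip through.
    if isinstance(total_cents, bool) or isinstance(n, bool):
        raise ValueError("total_cents and n must be integers, not booleans")
    if not isinstance(total_cents, int) or not isinstance(n, int):
        raise ValueError("total_cents and n must be integers")
    if total_cents < 0:
        raise ValueError("total_cents must be non-negative")
    if n <= 0:
        raise ValueError("n must be positive")
    # Everyone gets the integer-division base; the first `remainder` recipients
    # get one extra cent, which also puts the larger amounts first.
    base, remainder = divmod(total_cents, n)
    return [base + 1 if i < remainder else base for i in range(n)]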

Step 4: Let the spec write the property tests

The property-test task produces tests/test_splitter.py. This is the artifact that makes the feature trustworthy. Each EARS criterion becomes a @given property:

import pytest
from hypothesis import given, strategies as st

from fairsplit.splitter import split_amount

# Totals from 0 to 10,000,000 cents ($100,000) across 1 to 1000 recipients.
amounts = st.integers(min_value=0, max_value=10_000_000)
counts = st.integers(min_value=1, max_value=1000)

@given(total=amounts, n=counts)
def test_sum_is_conserved(total, n):
    assert sum(split_amount(total, n)) == total

@given(total=amounts, n=counts)
def test_parts_differ_by_at_most_one(total, n):
    parts = split_amount(total, n)
    assert max(parts) - min(parts) <= 1

@given(total=amounts, n=counts)
def test_length_matches_n(total, n):
    assert len(split_amount(total, n)) == n

@given(total=amounts, bad_n=st.integers(max_value=0))
def test_invalid_n_raises(total, bad_n):
    with pytest.raises(ValueError):
        split_amount(total, bad_n)

The difference from example-based testing is that you never picked the inputs. Hypothesis generates 100 cases per property by default (its max_examples setting), including the ones you would never think to type: a total of zero, a total smaller than n, a prime total across a prime number of people. The conservation property, sum(parts) == total, is the one that catches the classic bug where someone implements the split with floating point and round() and quietly loses a cent on totals that do not divide evenly. Run that broken version against this test and Hypothesis shrinks the failure to a minimal input such as total=1, n=2 (two equal whole-cent shares always sum to an even number, never to 1), rather than a vague "expected 100 got 99".
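Here is a hypothetical version of that classic bug (split_amount_float is an illustrative name, not code from the tutorial) so you can see which inputs trip it:

```python
def split_amount_float(total_cents: int, n: int) -> list[int]:
    """Deliberately broken: rounds a single float share and gives it to everyone."""
    share = round(total_cents / n)  # Python rounds halves to even: round(0.5) == 0
    return [share] * n


for total, n in [(100, 3), (1, 2), (1, 3)]:
    parts = split_amount_float(total, n)
    print(total, n, parts, sum(parts))
# 100 3 [33, 33, 33] 99
# 1 2 [0, 0] 0
# 1 3 [0, 0, 0] 0
```

Rounding half up instead would not save it: split_amount_float(1, 2) would then return [1, 1], which sums to 2. Any odd total split two ways fails no matter how the shares are rounded, because equal shares always add up to an even number.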

To make from fairsplit.splitter import split_amount resolve, either install the package in editable mode (pip install -e ., which needs a minimal pyproject.toml) or put src on the import path. The fastest path for the tutorial is a small conftest.py that does the latter:

# tests/conftest.py
# Put src/ on sys.path so the tests can import fairsplit without installing it.
import pathlib
import sys
sys.path.insert(0, str(pathlib.Path(__file__).parent.parent / "src"))
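The four properties in test_splitter.py cover length, conservation, spread, and invalid n, but the prompt also asked for larger amounts first and for negative totals to raise ValueError. A sketch of two more properties in the same style follows. It inlines a trimmed copy of split_amount (type checks omitted) only so it runs on its own; in tests/test_splitter.py you would keep the from fairsplit.splitter import instead.

```python
import pytest
from hypothesis import given, strategies as st


# Trimmed copy for a standalone run; in the repo, import it from fairsplit.splitter.
def split_amount(total_cents: int, n: int) -> list[int]:
    if total_cents < 0:
        raise ValueError("total_cents must be non-negative")
    if n <= 0:
        raise ValueError("n must be positive")
    base, remainder = divmod(total_cents, n)
    return [base + 1 if i < remainder else base for i in range(n)]


amounts = st.integers(min_value=0, max_value=10_000_000)
counts = st.integers(min_value=1, max_value=1000)


@given(total=amounts, n=counts)
def test_larger_amounts_come_first(total, n):
    parts = split_amount(total, n)
    assert parts == sorted(parts, reverse=True)


@given(bad_total=st.integers(max_value=-1), n=counts)
def test_negative_total_raises(bad_total, n):
    with pytest.raises(ValueError):
        split_amount(bad_total, n)
```

Both functions are ordinary Hypothesis tests, so pytest collects them like the others; calling one directly, as in test_larger_amounts_come_first(), also runs the full property.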

Step 5: Add an agent hook that reruns the suite on every save

Agent hooks are event-driven automations: when something happens in the IDE, Kiro runs an agent prompt or a shell command. We want the property suite to run whenever a source file changes, so a regression announces itself in seconds instead of at the next manual test run.

Open the Agent Hooks section in the Kiro panel and click +. You can describe the hook in natural language and let Kiro configure it, or fill the form by hand. By hand, set the Event to File Save, the File pattern to src/**/*.py, the Action to Run Command, and the command to:

.venv/bin/pytest -q tests/

Save the hook. Now edit splitter.py, change base + 1 to base (deliberately break the remainder distribution), and save. The hook fires, pytest runs, and the conservation property fails immediately with a falsifying example. Revert the change, save again, green. You have a feedback loop where the invariants guard the file every time it is touched, with no CI round trip.
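Why does only the conservation property go red? A sketch of the broken edit makes it visible. split_dropping_remainder is a hypothetical name for splitter.py after changing base + 1 to base:

```python
def split_dropping_remainder(total_cents: int, n: int) -> list[int]:
    """splitter.py after the deliberate break: `base + 1` edited to `base`."""
    base, remainder = divmod(total_cents, n)
    # Both branches now yield `base`, so the leftover cents are silently dropped.
    return [base if i < remainder else base for i in range(n)]


parts = split_dropping_remainder(100, 3)
print(parts)                         # [33, 33, 33]
print(sum(parts) == 100)             # False: conservation fails
print(max(parts) - min(parts) <= 1)  # True: the spread property still passes
print(len(parts) == 3)               # True: the length property still passes
```

Every share is equal, so spread and length stay green and only the sum exposes the missing cents. Totals that divide evenly by n also pass, which is why trying many (total, n) pairs finds the break where a single hand-picked example such as (100, 4) would not.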

Hooks can trigger on far more than saves: file create and delete, prompt submission, agent turn completion, before or after a tool call, and before or after a spec task executes. A pre-task-execution hook that runs the linter, or a post-tool-use hook that checks for secrets, are the kind of guardrails that scale across a team. You can open the same UI from the command palette with Cmd+Shift+P (Ctrl+Shift+P on Windows or Linux) and "Kiro: Open Kiro Hook UI".

Verify it works

Run the suite yourself to confirm the whole thing hangs together:

pytest -q tests/

Expected output is four passing properties with Hypothesis reporting no falsifying examples:

....                                                     [100%]
4 passed in 0.93s

Then prove the tests have teeth. Temporarily replace the function body with a naive float version that rounds each share, save, and run again. You should see Hypothesis fail the conservation test and print a shrunk counterexample along these lines:

Falsifying example: test_sum_is_conserved(
    total=1,
    n=2,
)

That short report is the contract of this tutorial. If you see four green properties on the correct implementation and a minimal falsifying example on the broken one, the spec, the steering file, the generated code, and the hook are all wired correctly. Restore the correct implementation before moving on.

When it breaks

If the design or tasks phase plans only example tests and ignores Hypothesis, your steering file is not loading. The most common cause is a blank line or a comment above the --- front matter; the inclusion block must be the very first content in the file. Fix that and regenerate the design with the Refine button on design.md.
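A quick way to rule this out is a small script that checks the first line of the file. The sketch below is a hypothetical helper (steering_inclusion is not a Kiro API) that assumes the simple key: value front matter used in Step 1, and treats a missing inclusion key as the default always mode:

```python
def steering_inclusion(text: str) -> str:
    """Return the inclusion mode declared in a steering file's front matter.

    Hypothetical helper, not part of Kiro. Raises ValueError when the file
    does not open with a '---' line (blank line, comment, or BOM above it).
    """
    lines = text.splitlines()
    if not lines or lines[0].rstrip() != "---":
        raise ValueError("front matter must be the very first content in the file")
    for line in lines[1:]:
        if line.strip() == "---":
            # Front matter closed without an inclusion key: assume the default.
            return "always"
        key, sep, value = line.partition(":")
        if sep and key.strip() == "inclusion":
            return value.strip()
    raise ValueError("front matter is never closed with a second '---' line")


# Usage against the Step 1 file:
# from pathlib import Path
# print(steering_inclusion(Path(".kiro/steering/testing-standards.md").read_text(encoding="utf-8")))
```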

If pytest reports ModuleNotFoundError: No module named 'fairsplit', the src layout is not on the import path. Add the conftest.py shown above, or run pip install -e . with a pyproject.toml that declares the package. The editable install is the cleaner long-term answer.

If the hook does not fire on save, check the file pattern. src/**/*.py matches nested files; a pattern of *.py will not match src/fairsplit/splitter.py because the pattern has no directory component. Also confirm the command path to pytest matches your virtual environment (on Windows it is .venv\Scripts\pytest.exe rather than .venv/bin/pytest); an absolute path, or running .venv/bin/python -m pytest, avoids "command not found" and guarantees the venv's interpreter is used.

If a property fails on the correct implementation, read the falsifying example before assuming the code is wrong. More often the property is stronger than the requirement. A common mistake here is asserting that every share is at least one cent: the EARS criteria allow zero-cent shares, so Hypothesis quickly finds a total of zero, or a total smaller than n, that breaks it. Loosen the property to match the actual EARS criterion, not a stricter one you imagined; if the stricter rule really is what you want, change the requirement first and let the code follow.

If Kiro stalls during Run all Tasks, run the tasks individually. Parallel waves are convenient but a single failing task can leave the run waiting; sequential execution makes it obvious which task broke and lets you read each diff.

Where to take it next

First, add a Bugfix Spec for a real defect. Bugfix Specs in Kiro capture "unchanged behavior" alongside the fix, in the form "WHEN [condition] THEN the system SHALL CONTINUE TO [existing behavior]", and they generate property-based tests that validate both the fix and the preservation of everything else. It is the same machinery you just used, pointed at regressions.

Second, run Analyze Requirements on a larger spec before design. It does a slower pass that hunts for logical inconsistencies, ambiguities, and gaps across the whole requirement set, which pays off most on compliance-sensitive features where an ambiguous acceptance criterion is expensive.

Third, promote your testing rule from one workspace to all of them by moving testing-standards.md to ~/.kiro/steering/, the global steering location. Teams can push the same file to every developer's machine through MDM or a shared repo, so property-based testing becomes the default for everyone, not just you. The interesting question this raises: once the invariants live in the spec and the spec drives the tests, what is left for the code review to check besides whether the invariants were the right ones?