SL#58 - AWS AI Series (2/30) - Production RAG in an Afternoon: Build a Bedrock Knowledge Base on S3 Vectors

What we are building

Episode 1 of this series gave us a CLI that talks to two model families through the Converse API. Today we give those models something to know. The artifact is a retrieval augmented generation (RAG) pipeline: you drop PDFs or markdown files into an S3 bucket, Bedrock chunks them, embeds them with Titan Text Embeddings V2, stores the vectors in an S3 vector index, and you query the whole thing with two API calls, one that returns raw chunks and one that returns a generated answer with citations pointing back to your source files.

The non-obvious design choice is the vector store. Until recently, "Bedrock Knowledge Base" implied OpenSearch Serverless, which bills you for compute that idles at roughly $175 a month minimum even when nobody queries anything. S3 Vectors went GA in December 2025 and changed that math: it is object storage with a native vector query API, fully serverless, billed per GB stored and per query. AWS claims up to 90% cost reduction versus dedicated vector databases, and for the read-mostly, latency-tolerant workloads that most internal RAG actually is, the claim holds up. You trade hybrid search and single-digit-millisecond latency for a bill that rounds to zero when idle. For a documentation assistant, an internal wiki search, or the retrieval layer of an agent, that is the right trade.

By the end you will have: a vector bucket and index, a knowledge base wired to it, an ingestion pipeline, and a small Python module you can call from any app, including the episode 1 CLI.

Prerequisites

You need an AWS account with Bedrock model access enabled for Amazon Titan Text Embeddings V2 and one generation model (we use Nova Pro through its cross-region inference profile; any Converse-capable model works). Model access is granted per region in the Bedrock console under Model access.

Region matters. S3 Vectors is GA in 14 regions including us-east-1, us-east-2, us-west-2, eu-west-1, eu-central-1, and ap-southeast-2. We use us-east-1 throughout. If you pick another region, check both the S3 Vectors region list and the embedding model region table.

You also need Python 3.10+, boto3 1.40 or newer (the s3vectors client appeared in mid-2025; older versions will throw UnknownServiceError), and an IAM principal that can create roles and policies, because Bedrock Knowledge Bases runs under a service role you will create in step 2.

Estimated cost of following along: embedding a few hundred pages costs about a cent with Titan V2, vector PUTs are $0.20 per logical GB (a few thousand 1024-dim vectors are megabytes, so fractions of a cent), storage is $0.06 per GB-month, and each RetrieveAndGenerate call costs a normal model invocation. Realistically under $0.50 total if you clean up at the end.

Setup

Install and verify:

python -m venv .venv && source .venv/bin/activate
pip install "boto3>=1.40"
export AWS_REGION=us-east-1
python -c "import boto3; boto3.client('s3vectors'); print('s3vectors client OK, boto3', boto3.__version__)"

If that prints s3vectors client OK, your boto3 is new enough. Create a config module the rest of the code imports, config.py:

import boto3
REGION = "us-east-1"
ACCOUNT_ID = boto3.client("sts").get_caller_identity()["Account"]
DOCS_BUCKET = f"sl58-rag-docs-{ACCOUNT_ID}"          # source documents
VECTOR_BUCKET = f"sl58-rag-vectors-{ACCOUNT_ID}"     # S3 vector bucket
INDEX_NAME = "sl58-kb-index"
KB_NAME = "sl58-knowledge-base"
ROLE_NAME = "SL58BedrockKBRole"
EMBED_MODEL_ARN = (
    f"arn:aws:bedrock:{REGION}::foundation-model/amazon.titan-embed-text-v2:0"
)
GEN_MODEL_ID = "us.amazon.nova-pro-v1:0"  # cross-region inference profile
EMBED_DIMENSIONS = 1024

Two things to notice. First, bucket names embed your account ID because S3 names are global. Second, EMBED_DIMENSIONS is 1024: Titan V2 supports 256, 512, and 1024 dimensions, and the index dimension must match exactly what the knowledge base will write, or ingestion fails with a dimension mismatch error you will only see in the ingestion job statistics.

Step 1: Create the vector bucket and index

S3 Vectors introduces two new resource types: vector buckets and vector indexes inside them. We create both with the s3vectors client. Save as step1_vector_store.py:

import boto3
from config import REGION, VECTOR_BUCKET, INDEX_NAME, EMBED_DIMENSIONS
s3v = boto3.client("s3vectors", region_name=REGION)
s3v.create_vector_bucket(vectorBucketName=VECTOR_BUCKET)
s3v.create_index(
    vectorBucketName=VECTOR_BUCKET,
    indexName=INDEX_NAME,
    dataType="float32",
    dimension=EMBED_DIMENSIONS,
    distanceMetric="cosine",
    metadataConfiguration={
        "nonFilterableMetadataKeys": [
            "AMAZON_BEDROCK_TEXT",
            "AMAZON_BEDROCK_METADATA",
        ]
    },
)
idx = s3v.get_index(vectorBucketName=VECTOR_BUCKET, indexName=INDEX_NAME)
print("Index ARN:", idx["index"]["indexArn"])

The metadataConfiguration line is the part everyone gets wrong. Every vector in S3 Vectors carries metadata, split into filterable (you can use it in query-time filters) and non-filterable (stored, returned, but not indexed for filtering). Bedrock Knowledge Bases stores the raw chunk text under AMAZON_BEDROCK_TEXT and its bookkeeping under AMAZON_BEDROCK_METADATA. Chunk text is big, and filterable metadata has a tight size budget (with Knowledge Bases you get about 1KB of custom filterable metadata and at most 35 metadata keys per vector). If you forget to declare those two keys as non-filterable, ingestion blows past the filterable metadata limit and documents fail to index. The console's quick-create flow does this for you; since we are building it ourselves, we do it explicitly.

distanceMetric="cosine" matches what Titan embeddings expect. S3 Vectors supports cosine and euclidean only, which is one of several signs this is a deliberately narrow product: no hybrid search, no BM25, floating-point vectors only. Semantic search, done cheaply, at up to 2 billion vectors per index since GA.

Step 2: Create the service role Bedrock will assume

Bedrock Knowledge Bases does its work (reading your docs, calling the embedding model, writing vectors) under a service role you provide. This is where most first attempts die with The knowledge base storage configuration provided is invalid, so we build the role carefully. Save as step2_iam.py:

import json, time, boto3
from config import (REGION, ACCOUNT_ID, DOCS_BUCKET, VECTOR_BUCKET,
                    INDEX_NAME, ROLE_NAME, EMBED_MODEL_ARN)
iam = boto3.client("iam")
trust = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "bedrock.amazonaws.com"},
        "Action": "sts:AssumeRole",
        "Condition": {
            "StringEquals": {"aws:SourceAccount": ACCOUNT_ID},
            "ArnLike": {"aws:SourceArn":
                f"arn:aws:bedrock:{REGION}:{ACCOUNT_ID}:knowledge-base/*"}
        }
    }]
}
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["bedrock:InvokeModel"],
         "Resource": [EMBED_MODEL_ARN]},
        {"Effect": "Allow",
         "Action": ["s3:GetObject", "s3:ListBucket"],
         "Resource": [f"arn:aws:s3:::{DOCS_BUCKET}",
                      f"arn:aws:s3:::{DOCS_BUCKET}/*"]},
        {"Effect": "Allow",
         "Action": ["s3vectors:GetIndex", "s3vectors:PutVectors",
                    "s3vectors:GetVectors", "s3vectors:QueryVectors",
                    "s3vectors:DeleteVectors", "s3vectors:ListVectors"],
         "Resource": [f"arn:aws:s3vectors:{REGION}:{ACCOUNT_ID}:bucket/"
                      f"{VECTOR_BUCKET}/index/{INDEX_NAME}"]},
    ],
}
iam.create_role(RoleName=ROLE_NAME, AssumeRolePolicyDocument=json.dumps(trust))
iam.put_role_policy(RoleName=ROLE_NAME, PolicyName="sl58-kb-access",
                    PolicyDocument=json.dumps(policy))
time.sleep(10)  # IAM propagation
print("Role ARN:", f"arn:aws:iam::{ACCOUNT_ID}:role/{ROLE_NAME}")

Three details worth your attention. The trust policy uses aws:SourceArn scoped to knowledge bases in your account, which prevents the confused-deputy problem where another account's Bedrock resources could assume your role. The S3 Vectors statement uses the s3vectors: action namespace, not s3:; it is a separate service from IAM's point of view, and the resource ARN format (bucket/<name>/index/<name>) is also its own thing. And the time.sleep(10): IAM is eventually consistent, and create_knowledge_base will fail with a validation error if it checks the role before propagation finishes. Ten seconds is usually enough; the retry logic in step 3 covers the rest.

Step 3: Create the knowledge base and data source

Now the Bedrock side. A knowledge base ties together the embedding model, the vector store, and one or more data sources. Save as step3_kb.py:

import time, boto3
from botocore.exceptions import ClientError
from config import (REGION, ACCOUNT_ID, DOCS_BUCKET, VECTOR_BUCKET,
                    INDEX_NAME, KB_NAME, ROLE_NAME,
                    EMBED_MODEL_ARN, EMBED_DIMENSIONS)
agent = boto3.client("bedrock-agent", region_name=REGION)
boto3.client("s3", region_name=REGION).create_bucket(Bucket=DOCS_BUCKET)
index_arn = (f"arn:aws:s3vectors:{REGION}:{ACCOUNT_ID}:bucket/"
             f"{VECTOR_BUCKET}/index/{INDEX_NAME}")
for attempt in range(6):
    try:
        kb = agent.create_knowledge_base(
            name=KB_NAME,
            roleArn=f"arn:aws:iam::{ACCOUNT_ID}:role/{ROLE_NAME}",
            knowledgeBaseConfiguration={
                "type": "VECTOR",
                "vectorKnowledgeBaseConfiguration": {
                    "embeddingModelArn": EMBED_MODEL_ARN,
                    "embeddingModelConfiguration": {
                        "bedrockEmbeddingModelConfiguration": {
                            "dimensions": EMBED_DIMENSIONS,
                            "embeddingDataType": "FLOAT32",
                        }
                    },
                },
            },
            storageConfiguration={
                "type": "S3_VECTORS",
                "s3VectorsConfiguration": {"indexArn": index_arn},
            },
        )
        break
    except ClientError as e:
        if attempt == 5: raise
        print("retrying:", e.response["Error"]["Code"]); time.sleep(15)
kb_id = kb["knowledgeBase"]["knowledgeBaseId"]
ds = agent.create_data_source(
    knowledgeBaseId=kb_id,
    name="sl58-docs",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": f"arn:aws:s3:::{DOCS_BUCKET}"},
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": 512, "overlapPercentage": 20,
            },
        }
    },
)
print("KB:", kb_id, "| DS:", ds["dataSource"]["dataSourceId"])

The storageConfiguration block is the whole reason this episode exists: type: "S3_VECTORS" plus the index ARN, and Bedrock manages everything inside the index from then on. Compare that to the OpenSearch Serverless configuration, which needs a collection ARN, an index you pre-created with an exact field mapping, and three field name parameters that have to match it. S3 Vectors took the part of Knowledge Bases setup that generated the most support threads and deleted it.

On chunking: fixed-size at 512 tokens with 20% overlap is the boring default that works for prose documentation. Bedrock also offers hierarchical and semantic chunking strategies. Be careful with hierarchical chunking on S3 Vectors specifically: parent-child chunk context is stored as non-filterable metadata, and very large token settings can exceed the per-vector metadata size limit. The S3 Vectors + Knowledge Bases docs call this out as a known constraint.

Step 4: Ingest your documents

Upload some real documents and run an ingestion job. Use whatever you have: a few PDFs, your project's docs folder, exported runbooks. Save as step4_ingest.py:

import sys, time, pathlib, boto3
from config import REGION, DOCS_BUCKET
KB_ID, DS_ID = sys.argv[1], sys.argv[2]
s3 = boto3.client("s3", region_name=REGION)
agent = boto3.client("bedrock-agent", region_name=REGION)
for path in pathlib.Path("docs").rglob("*"):
    if path.is_file() and path.suffix.lower() in (".pdf", ".md", ".txt", ".html", ".docx"):
        s3.upload_file(str(path), DOCS_BUCKET, path.name)
        print("uploaded", path.name)
job = agent.start_ingestion_job(knowledgeBaseId=KB_ID, dataSourceId=DS_ID)
job_id = job["ingestionJob"]["ingestionJobId"]
while True:
    j = agent.get_ingestion_job(knowledgeBaseId=KB_ID,
                                dataSourceId=DS_ID,
                                ingestionJobId=job_id)["ingestionJob"]
    print("status:", j["status"], "|", j.get("statistics", {}))
    if j["status"] in ("COMPLETE", "FAILED"):
        break
    time.sleep(15)

Run it as python step4_ingest.py <KB_ID> <DS_ID> with the IDs printed by step 3, after putting your files in a local docs/ folder. The ingestion job is where Bedrock earns its keep: it fetches each object, parses it (PDFs included), chunks per your strategy, calls Titan V2 once per chunk, and PUTs the vectors into your index. The statistics dict tells you exactly how many documents were scanned, indexed, and failed. Read it. A job can finish COMPLETE with failed documents, and the only place you find out is here.

Sync is incremental: re-running an ingestion job re-scans the source and only processes new, modified, or deleted objects. For continuous pipelines there is also an IngestKnowledgeBaseDocuments API to push documents directly without a sync, which pairs well with event-driven ingestion (S3 event to Lambda to direct ingest). We will use exactly that pattern in episode 25 when we build real-time RAG.

Step 5: Query it, two ways

Retrieval comes in two flavors from the bedrock-agent-runtime client. Retrieve gives you scored chunks, which is what you want when your agent or app does its own generation. RetrieveAndGenerate is the full managed RAG loop with citations. Save as step5_query.py:

import sys, boto3
from config import REGION, GEN_MODEL_ID
KB_ID, QUESTION = sys.argv[1], " ".join(sys.argv[2:])
rt = boto3.client("bedrock-agent-runtime", region_name=REGION)
chunks = rt.retrieve(
    knowledgeBaseId=KB_ID,
    retrievalQuery={"text": QUESTION},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)
print("--- top chunks ---")
for r in chunks["retrievalResults"]:
    print(f"[{r['score']:.3f}]", r["content"]["text"][:120].replace("\n", " "), "...")
resp = rt.retrieve_and_generate(
    input={"text": QUESTION},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": KB_ID,
            "modelArn": GEN_MODEL_ID,
        },
    },
)
print("\n--- answer ---\n", resp["output"]["text"])
print("\n--- citations ---")
for c in resp["citations"]:
    for ref in c["retrievedReferences"]:
        print("-", ref["location"]["s3Location"]["uri"])

Note that modelArn accepts a cross-region inference profile ID like us.amazon.nova-pro-v1:0 directly. Citations come back as structured spans: each cited passage of the generated answer maps to the exact chunks (and source S3 URIs) that grounded it. This is the feature that separates a demo from something you can put in front of users, because "the answer came from page 98 of this PDF" is checkable and "the model said so" is not.

Verify it works

Put at least one document in docs/ whose content you can verify, then run the pipeline end to end:

python step1_vector_store.py
python step2_iam.py
python step3_kb.py            # prints KB: XXXXXXXXXX | DS: YYYYYYYYYY
python step4_ingest.py XXXXXXXXXX YYYYYYYYYY
python step5_query.py XXXXXXXXXX "What does our runbook say about rollbacks?"

Expected output shape for step 4: status lines progressing STARTING, IN_PROGRESS, then COMPLETE with statistics like {'numberOfDocumentsScanned': 12, 'numberOfNewDocumentsIndexed': 12, 'numberOfDocumentsFailed': 0}. For step 5: five scored chunks (cosine scores typically land between 0.3 and 0.8 for relevant hits), then a generated answer, then one or more s3://sl58-rag-docs-.../<your-file> citation lines. If the citations point at the document you expected, the contract is fulfilled: ingestion, embedding, storage, retrieval, and grounded generation all work.

When it breaks

UnknownServiceError: Unknown service: 's3vectors'. Your boto3 predates the S3 Vectors GA SDK. pip install -U boto3 inside the venv and re-check with the smoke test from Setup.

ValidationException: The knowledge base storage configuration provided is invalid on create_knowledge_base. Three usual causes, in order of likelihood: the service role is missing the s3vectors:* statement or has a typo in the index ARN (check the bucket/<name>/index/<name> format); IAM hasn't propagated yet (the retry loop in step 3 handles it, but only if the policy is actually correct); or the index dimension doesn't match embeddingModelConfiguration.dimensions.

Ingestion completes but numberOfDocumentsFailed > 0. Call get_ingestion_job and look at failureReasons. The S3-Vectors-specific cause is metadata overflow: you skipped the nonFilterableMetadataKeys configuration in step 1, or you used hierarchical chunking with large token settings. Recreate the index with the two AMAZON_BEDROCK_* keys declared non-filterable.

AccessDeniedException on retrieve_and_generate but retrieve works. Retrieval runs under the KB service role, generation runs under your caller identity. Your IAM user/role needs bedrock:InvokeModel on the generation model (and on the inference profile if you used a us. prefixed ID).

Empty or irrelevant results. Remember S3 Vectors is semantic-only, no hybrid search. Exact-match queries on part numbers, error codes, or function names that an OpenSearch BM25 index would nail can come back weak. If your corpus is keyword-heavy, that is the signal you are in OpenSearch territory; episode 26 covers the tiered strategy and the cost math for choosing.

Cleanup

Everything here is pay-per-use, but clean up anyway. Order matters: data source, then KB, then the vector resources, then IAM.

import boto3
from config import REGION, VECTOR_BUCKET, INDEX_NAME, DOCS_BUCKET, ROLE_NAME
agent = boto3.client("bedrock-agent", region_name=REGION)
kbs = agent.list_knowledge_bases()["knowledgeBaseSummaries"]
for kb in [k for k in kbs if k["name"] == "sl58-knowledge-base"]:
    for ds in agent.list_data_sources(knowledgeBaseId=kb["knowledgeBaseId"])["dataSourceSummaries"]:
        agent.delete_data_source(knowledgeBaseId=kb["knowledgeBaseId"],
                                 dataSourceId=ds["dataSourceId"])
    agent.delete_knowledge_base(knowledgeBaseId=kb["knowledgeBaseId"])
s3v = boto3.client("s3vectors", region_name=REGION)
s3v.delete_index(vectorBucketName=VECTOR_BUCKET, indexName=INDEX_NAME)
s3v.delete_vector_bucket(vectorBucketName=VECTOR_BUCKET)
s3 = boto3.resource("s3", region_name=REGION)
s3.Bucket(DOCS_BUCKET).objects.all().delete()
s3.Bucket(DOCS_BUCKET).delete()
iam = boto3.client("iam")
iam.delete_role_policy(RoleName=ROLE_NAME, PolicyName="sl58-kb-access")
iam.delete_role(RoleName=ROLE_NAME)
print("cleaned up")

The default data deletion policy on the data source is Delete, which removes the vectors Bedrock wrote when the data source is deleted, so the index should already be empty by the time you delete it.

Where to take it next

Easiest: add a metadata filter. Drop a <filename>.metadata.json sidecar next to each document in the docs bucket with custom attributes (team, product, date), re-ingest, and pass a filter inside vectorSearchConfiguration to scope retrieval; just respect the 1KB filterable metadata budget per vector.

Middle: wire this KB into the episode 1 CLI as a --kb flag, calling Retrieve yourself and stuffing chunks into the Converse API system prompt. You get control over the prompt template that RetrieveAndGenerate hides from you, and you can compare answer quality between Nova and Claude on identical retrieved context.

Hardest: skip ingestion jobs entirely and go event-driven with IngestKnowledgeBaseDocuments from a Lambda triggered by S3 events, so new documents become searchable in seconds rather than on the next sync. That is the skeleton of episode 25.

Next episode: the knowledge base answers questions, but nothing stops it from answering questions it shouldn't. We put Bedrock Guardrails in front of it, with before-and-after attack demos.