SL#90 - AWS AI Series (24/30) - Zero-ETL in Practice: Query Live DynamoDB Data in Redshift With No Pipeline Code

What we are building

DynamoDB is where your app writes. Redshift is where your analysts and your AI features read. The old way to bridge them was a pipeline: a scheduled export, a Glue job, a Lambda, a Step Function, something that read DynamoDB, reshaped it, and wrote it to the warehouse. That pipeline was code you owned, code that timed out at 3am, code that drifted out of sync with the table schema.

Zero-ETL deletes that pipeline. You declare an integration between a DynamoDB table and a Redshift namespace, and AWS keeps the warehouse copy current for you. No Glue job, no Lambda, no cron. The data shows up in a Redshift table and stays fresh.

By the end of this tutorial you will have a DynamoDB table named orders, a Redshift Serverless workgroup, and a live zero-ETL integration between them. You will insert an item into DynamoDB with the CLI, wait, and watch it appear in a Redshift query. The whole thing is driven from boto3 and the Redshift Data API, so there is no psql client to install and nothing to click in the console.

The one design choice worth naming up front: this is near-real-time, not real-time. The integration refreshes every 15 to 30 minutes using DynamoDB incremental exports. If you need sub-second freshness, this is the wrong tool, and episode 25 (Kinesis into a live knowledge base) is the right one. For dashboards, reporting, and feeding batch AI features from operational data, 15 minutes of lag is a fair trade for zero pipeline code.

Prerequisites

You need an AWS account with permissions to create DynamoDB tables, Redshift Serverless namespaces and workgroups, and Redshift zero-ETL integrations. The IAM identity running the scripts needs dynamodb:CreateTable, dynamodb:UpdateContinuousBackups, dynamodb:PutResourcePolicy, dynamodb:GetResourcePolicy, the redshift-serverless:* actions for creating the namespace and workgroup, and the integration actions redshift:CreateIntegration, redshift:DescribeIntegrations, redshift:CreateInboundIntegration, plus redshift-data:ExecuteStatement and redshift-data:DescribeStatement. If you run as an account admin in a sandbox account, you already have these.

You need Python 3.9 or newer and a recent boto3. Pin it: pip install boto3==1.40.15. You need the AWS CLI v2 configured with a default region, because the DynamoDB write test uses it.

You should be comfortable reading Python and SQL. You do not need to know Redshift internals or the SUPER data type in advance; we cover the one part that matters.

One hard constraint: the DynamoDB table and the Redshift namespace must be in the same Region. Cross-region zero-ETL does not exist for this source. Pick one region and use it everywhere. The examples use us-east-1.

Setup

Create a working directory and set the shared names as environment variables so every script agrees on them.

mkdir zetl-demo && cd zetl-demo
export AWS_REGION=us-east-1
export DDB_TABLE=orders
export RS_NAMESPACE=zetl-ns
export RS_WORKGROUP=zetl-wg
export RS_ADMIN_USER=zetladmin
export RS_ADMIN_PW='Zetl-Demo-2026!'
pip install boto3==1.40.15

Smoke test that your credentials and region resolve before you provision anything:

aws sts get-caller-identity --query Account --output text
aws configure get region

The first command prints your 12-digit account ID. The second should print us-east-1. If either fails, fix your credentials now. Every step below assumes both work.

Step 1: Create the DynamoDB source table with PITR and a resource policy

The source table has three requirements that are easy to miss: point-in-time recovery (PITR) must be on, the table must be encrypted with an AWS-owned or customer-managed KMS key (the default AWS-owned key is fine, but AWS-managed keys are not supported), and the table needs a resource-based policy that lets the Redshift service export it.

# 01_create_source.py
import os, json, boto3

region = os.environ["AWS_REGION"]
table = os.environ["DDB_TABLE"]
acct = boto3.client("sts").get_caller_identity()["Account"]
ddb = boto3.client("dynamodb", region_name=region)

ddb.create_table(
    TableName=table,
    AttributeDefinitions=[{"AttributeName": "order_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "order_id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)
ddb.get_waiter("table_exists").wait(TableName=table)

ddb.update_continuous_backups(
    TableName=table,
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowRedshiftExport",
        "Effect": "Allow",
        "Principal": {"Service": "redshift.amazonaws.com"},
        "Action": ["dynamodb:ExportTableToPointInTime", "dynamodb:DescribeTable",
                   "dynamodb:DescribeExport"],
        "Resource": "*",
        "Condition": {
            "StringEquals": {"aws:SourceAccount": acct},
            "ArnEquals": {"aws:SourceArn": f"arn:aws:redshift:{region}:{acct}:integration:*"},
        },
    }],
}
ddb.put_resource_policy(
    ResourceArn=f"arn:aws:dynamodb:{region}:{acct}:table/{table}",
    Policy=json.dumps(policy),
)
print("source table ready:", table)

Run it with python 01_create_source.py. The update_continuous_backups call is what enables PITR, and without PITR the integration will refuse to select this table as a source. The resource policy is the part people forget: it grants redshift.amazonaws.com permission to run ExportTableToPointInTime against your table, scoped by aws:SourceAccount and aws:SourceArn so only integrations in your own account can use it. In the console there is a "Fix it for me" button that writes this policy for you, but doing it explicitly means you can see exactly what access you granted.

Step 2: Create the Redshift Serverless target with case sensitivity on

The target can be a Redshift Serverless workgroup or a provisioned RA3 cluster. Serverless is cheaper to spin up and tear down for a demo. The one non-obvious requirement is enable_case_sensitive_identifier: the integration fails to activate unless the target warehouse has this parameter set to true. On Serverless you set it as a config parameter on the workgroup.

# 02_create_target.py
import os, time, boto3

region = os.environ["AWS_REGION"]
ns = os.environ["RS_NAMESPACE"]
wg = os.environ["RS_WORKGROUP"]
rss = boto3.client("redshift-serverless", region_name=region)

rss.create_namespace(
    namespaceName=ns,
    adminUsername=os.environ["RS_ADMIN_USER"],
    adminUserPassword=os.environ["RS_ADMIN_PW"],
    dbName="dev",
)
rss.create_workgroup(
    workgroupName=wg,
    namespaceName=ns,
    baseCapacity=8,
    publiclyAccessible=False,
    configParameters=[
        {"parameterKey": "enable_case_sensitive_identifier", "parameterValue": "true"},
    ],
)

while rss.get_workgroup(workgroupName=wg)["workgroup"]["status"] != "AVAILABLE":
    print("workgroup provisioning...")
    time.sleep(30)
print("target ready:", wg)

Run python 02_create_target.py. A workgroup takes a few minutes to reach AVAILABLE, so the polling loop prints a status line every 30 seconds. baseCapacity=8 is the smallest Serverless size (8 RPUs), which is plenty for this demo and keeps the bill low. The dev database is created automatically inside the namespace; we will create a second database from the integration in step 4.

If you already run a provisioned RA3 cluster, the equivalent is a parameter group with enable_case_sensitive_identifier set to true followed by a reboot. Everything else in this tutorial is identical, only the target ARN changes.

Step 3: Create the zero-ETL integration and wait for Active

Now the actual integration. This is a single API call that ties the DynamoDB table ARN (source) to the Redshift namespace ARN (target). Note the client: create_integration lives on the redshift client, not redshift-serverless, even when the target is Serverless.

# 03_create_integration.py
import os, time, boto3

region = os.environ["AWS_REGION"]
acct = boto3.client("sts").get_caller_identity()["Account"]
table = os.environ["DDB_TABLE"]
ns = os.environ["RS_NAMESPACE"]

rss = boto3.client("redshift-serverless", region_name=region)
namespace_arn = rss.get_namespace(namespaceName=ns)["namespace"]["namespaceArn"]
source_arn = f"arn:aws:dynamodb:{region}:{acct}:table/{table}"

rs = boto3.client("redshift", region_name=region)
resp = rs.create_integration(
    IntegrationName="orders-ddb-to-rs",
    SourceArn=source_arn,
    TargetArn=namespace_arn,
)
integ_arn = resp["IntegrationArn"]
print("creating:", integ_arn)

while True:
    integ = rs.describe_integrations(IntegrationArn=integ_arn)["Integrations"][0]
    status = integ["Status"]
    print("status:", status)
    if status == "active":
        break
    if status in ("failed", "deleting"):
        raise SystemExit(f"integration {status}: {integ.get('Errors')}")
    time.sleep(60)
print("integration active")

Run python 03_create_integration.py. The status walks through creating and then active. The initial move is a full export of the DynamoDB table, so on a large table this takes a while; on our tiny demo table it is quick. If the status goes to failed, the Errors field usually names the missing prerequisite (most often PITR or the case-sensitivity parameter), which is why we print it before exiting.

The reason there is no schema to define here is worth pausing on. You never told Redshift what columns orders has. That is because DynamoDB does not have a fixed schema beyond its keys, and neither will the Redshift copy. We deal with that in the next step.

Step 4: Create the destination database and understand the SUPER column

The integration being active does not mean data is flowing yet. You have to create a database in Redshift from the integration. That database is where the replicated table lands. We do this over the Redshift Data API so there is no SQL client to install.

# 04_create_db_and_query.py
import os, time, boto3

region = os.environ["AWS_REGION"]
wg = os.environ["RS_WORKGROUP"]
data = boto3.client("redshift-data", region_name=region)

def run_sql(sql, db="dev"):
    sid = data.execute_statement(workgroupName=wg, database=db, sql=sql)["Id"]
    while True:
        d = data.describe_statement(Id=sid)
        if d["Status"] == "FINISHED":
            return data.get_statement_result(Id=sid) if d["HasResultSet"] else None
        if d["Status"] in ("FAILED", "ABORTED"):
            raise SystemExit(d.get("Error", d["Status"]))
        time.sleep(2)

integ_id = run_sql("SELECT integration_id FROM SVV_INTEGRATION")\
    ["Records"][0][0]["stringValue"]
print("integration_id:", integ_id)

run_sql(f"CREATE DATABASE orders_db FROM INTEGRATION '{integ_id}'")
print("database created, replication starting")

Run python 04_create_db_and_query.py. It reads the integration_id from the SVV_INTEGRATION system view, then runs CREATE DATABASE orders_db FROM INTEGRATION '<id>'. That statement is what starts replication. Within a minute or two Redshift creates a table named orders (same name as the DynamoDB table) in the public schema of orders_db.

Here is the part that trips everyone up. DynamoDB only enforces a schema on its key attributes, so the replicated table has exactly three columns: the partition key (order_id), the sort key if you had one (we did not), and a column named value of type SUPER that holds every other attribute as DynamoDB JSON. You do not get one clean column per attribute. You get one semistructured blob you unnest at query time. That is not a bug, it is the only honest way to project a schemaless store into a columnar warehouse.

To read a field out of value, you navigate into it and cast the DynamoDB type wrapper. A string attribute customer stored as {"S": "acme"} is read as value.customer."S"::text. Add this query function and call to the bottom of the script to see it:

rows = run_sql(
    'SELECT order_id, value.customer."S"::text AS customer, '
    'value.amount."N"::float AS amount '
    'FROM orders_db.public.orders',
    db="dev",
)
for r in (rows["Records"] if rows else []):
    print(r)

If you want clean columns for a dashboard or an AI feature, wrap that projection in a materialized view with AUTO REFRESH YES. Redshift refreshes it incrementally as new data arrives, so downstream queries hit flat typed columns instead of unnesting SUPER every time:

CREATE MATERIALIZED VIEW public.orders_flat AUTO REFRESH YES AS
SELECT order_id,
       value.customer."S"::varchar(60) AS customer,
       value.amount."N"::float          AS amount,
       value.status."S"::varchar(20)    AS status
FROM orders_db.public.orders;

Verify it works

Write an item to DynamoDB with the CLI, then query Redshift and watch it appear. Because replication runs every 15 to 30 minutes, this is where you make coffee.

aws dynamodb put-item --table-name orders --item '{
  "order_id": {"S": "o-1001"},
  "customer": {"S": "acme"},
  "amount":   {"N": "249.90"},
  "status":   {"S": "paid"}
}'

Then run the projection query from step 4 (rerun 04_create_db_and_query.py after adding the query block). Within one refresh cycle you should see:

[{'stringValue': 'o-1001'}, {'stringValue': 'acme'}, {'doubleValue': 249.9}]

That is the contract. You wrote to an operational NoSQL store and read the same record out of a SQL warehouse, and you wrote zero lines of pipeline code to move it. To confirm the integration is healthy while you wait, query the table-state system view: SELECT * FROM SVV_INTEGRATION_TABLE_STATE. A Synced state on the orders table means the last refresh landed. You can also watch the IntegrationLag metric in CloudWatch under the Redshift namespace, which reports replication lag in minutes.

When it breaks

Integration status goes to failed immediately. The Errors field is your friend here. The two most common causes are PITR not enabled on the source table and enable_case_sensitive_identifier not set to true on the target. Rerun step 1 and step 2, confirm both, then recreate the integration. You cannot patch a failed integration, you delete and recreate it.

Integration is active but no orders table appears in Redshift. You probably skipped CREATE DATABASE ... FROM INTEGRATION. The integration being active only means the plumbing is connected; the database creation is what starts the actual replication. Check SELECT * FROM SVV_INTEGRATION to confirm the database exists and is tied to the integration.

put-item succeeds but the row never shows up in Redshift. Give it a full 30 minutes before assuming failure. This is not streaming. If it is still missing after two refresh cycles, check SVV_INTEGRATION_TABLE_STATE for an error state on the table, and confirm the DynamoDB resource policy from step 1 is still attached (a PutResourcePolicy from another tool can overwrite it).

A write to Redshift fails with a permissions error. The destination database is read-only by design. Only the integration source can change the data. You can run DDL and read-only queries against orders_db, but any INSERT, UPDATE, or DELETE has to happen on the DynamoDB side. If your app needs to mutate a row, it mutates DynamoDB and the change replicates.

Access denied creating the integration. The identity-based policy for the caller needs redshift:CreateInboundIntegration on the namespace ARN in addition to redshift:CreateIntegration on the integration ARN. Missing the inbound action is the usual culprit, and the error message does not always make that obvious.

What it costs

There is no separate charge for the zero-ETL integration itself. You pay for the resources it uses: DynamoDB PITR (billed per GB-month of continuous backup), the DynamoDB incremental exports that drive replication, the extra Redshift storage for the replicated copy, and the Redshift compute that runs the refreshes and your queries. For this demo with a handful of items, the DynamoDB side is cents. The real line item is Redshift Serverless compute at roughly 0.375 USD per RPU-hour, so an 8-RPU workgroup that stays up costs a few dollars a day even mostly idle. Tear it down when you are done and the whole exercise stays under a couple of dollars.

Clean up

Deleting the integration stops future replication but does not delete data already in Redshift or anything in DynamoDB. Remove all three resources so nothing keeps billing:

# find and delete the integration
INTEG=$(aws redshift describe-integrations \
  --query "Integrations[?IntegrationName=='orders-ddb-to-rs'].IntegrationArn" \
  --output text)
aws redshift delete-integration --integration-arn "$INTEG"

# delete the Redshift Serverless workgroup then namespace
aws redshift-serverless delete-workgroup --workgroup-name "$RS_WORKGROUP"
aws redshift-serverless delete-namespace --namespace-name "$RS_NAMESPACE"

# delete the DynamoDB table (PITR goes with it)
aws dynamodb delete-table --table-name "$DDB_TABLE"

Delete the workgroup before the namespace; the namespace delete fails while a workgroup still references it. Once all three are gone, billing stops.

Where to take it next

First, swap DynamoDB for an Aurora source. The integration mechanics are the same API, but Aurora is relational, so the replicated tables come across as flat typed columns with no SUPER blob to unnest. You point SourceArn at the Aurora cluster ARN and, for PostgreSQL sources, name the source database in the CREATE DATABASE ... FROM INTEGRATION ... DATABASE "named_db" statement. That is the cleaner half of "Aurora or DynamoDB to Redshift" when your operational store is already relational.

Second, feed the materialized view into a Bedrock prompt. You built orders_flat in step 4; a nightly job that pulls aggregates from it (top customers, refund rate, order velocity) and drops them into a Converse call turns operational data into an AI feature with no bespoke ETL between the two. That is the whole point of getting the data into Redshift in the first place.

Third, add a second DynamoDB table. Zero-ETL integrations are one table per integration, but multiple integrations can target the same Redshift namespace, so you can assemble a warehouse view across several operational tables and join them in SQL. The question worth sitting with: once moving data is free, how much of your "data platform" was ever really about the platform, and how much was just plumbing you no longer have to own?