ETL compute savings estimator
Full refresh reprocesses data that did not change. Estimate your monthly compute savings from migrating to an incremental CDC pipeline.
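The estimate comes down to simple arithmetic. Full refresh pays for every row on every run. CDC pays only for the changed rows, plus some fixed overhead for the replication infrastructure. The sketch below assumes a linear per-row cost model. `Workload`, its fields, and the unit prices in the usage lines are hypothetical inputs, not figures taken from the estimator.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    total_rows: int                 # rows in the source tables (hypothetical input)
    changed_rows_per_run: int       # rows inserted, updated, or deleted between runs
    runs_per_month: int             # how often the pipeline runs
    cost_per_million_rows: float    # hypothetical compute cost, USD per million rows processed
    cdc_fixed_monthly: float = 0.0  # hypothetical fixed CDC infrastructure cost, USD/month


def full_refresh_cost(w: Workload) -> float:
    # Full refresh reprocesses every row on every run, whether it changed or not.
    return w.runs_per_month * w.total_rows / 1e6 * w.cost_per_million_rows


def cdc_cost(w: Workload) -> float:
    # CDC processes only the changed rows, plus a fixed infrastructure cost.
    variable = w.runs_per_month * w.changed_rows_per_run / 1e6 * w.cost_per_million_rows
    return variable + w.cdc_fixed_monthly


def monthly_savings(w: Workload) -> float:
    return full_refresh_cost(w) - cdc_cost(w)


# Hypothetical example: 50M-row table, 1% changes per daily run.
w = Workload(total_rows=50_000_000, changed_rows_per_run=500_000,
             runs_per_month=30, cost_per_million_rows=2.0, cdc_fixed_monthly=100.0)
print(full_refresh_cost(w), cdc_cost(w), monthly_savings(w))  # 3000.0 130.0 2870.0
```

The fixed term matters. When only a small fraction of rows changes per run, the variable CDC cost becomes negligible and the fixed overhead dominates.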
New data freshness with CDC: under 10 seconds, plus automatic capture of hard deletes (op = 'd').
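To see how hard deletes surface, here is a minimal sketch that flattens a Debezium-style change event into a staging row. It assumes the JSON converter with schemas disabled, so `op`, `before`, `after`, and `ts_ms` sit at the top level of each message value. `flatten_event` is a hypothetical helper, and it reuses the `deleted_ts` column name from the DuckDB MERGE step as its delete marker.

```python
from typing import Any, Optional


def flatten_event(event: Optional[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Flatten one Debezium change-event value into a staging-table row."""
    if event is None:
        # Tombstone record that Debezium emits after a delete (tombstones.on.delete=true)
        return None
    op = event["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, update: the new row image is in "after"
        row = dict(event["after"])
        row["deleted_ts"] = None
    elif op == "d":
        # delete: "after" is null and "before" holds the old row image
        # (with the default REPLICA IDENTITY, only primary-key values are populated)
        row = dict(event["before"])
        row["deleted_ts"] = event["ts_ms"]  # epoch ms when Debezium processed the event
    else:
        # other operations, e.g. "t" (truncate), are not handled in this sketch
        return None
    return row
```

A delete therefore arrives as an explicit row with a non-null `deleted_ts`, rather than as the silent absence of a row.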
Should you migrate?
CDC is not free. Answer these four structural questions to find out if full refresh still wins for your workload, or if you are a clear candidate for log-based CDC.
Architecture and code by tier
Each tier below lists its recommended stack and implementation code.
Tier 1: dlt replication + DuckDB
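For PostgreSQL sources with fewer than 10 downstream consumers. Pure Python: no Java, no Kafka. Uses dlt's pg_replication verified source, which reads changes through PostgreSQL's built-in pgoutput logical decoding plugin (the source database must run with wal_level = logical).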
Python: extract and load
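```python
import dlt
from pg_replication import replication_resource
from pg_replication.helpers import init_replication

# Example connection string; the Postgres user needs the REPLICATION attribute.
CREDENTIALS = "postgresql://postgres:postgres@localhost:5432/appdb"

pipeline = dlt.pipeline(
    pipeline_name="cdc_pipeline", destination="duckdb", dataset_name="staging"
)

# 1. Run once: create the replication slot and publication and capture an initial snapshot.
#    persist_snapshots=True makes init_replication return snapshot resources to load;
#    reset=True drops and recreates an existing slot/publication with these names.
snapshot = init_replication(
    slot_name="cdc_slot", pub_name="cdc_pub", schema_name="public",
    table_names=["orders"], credentials=CREDENTIALS,
    persist_snapshots=True, reset=True,
)
pipeline.run(snapshot)

# 2. Run on a schedule: stream ongoing changes (inserts, updates, deletes) from the slot.
changes = replication_resource("cdc_slot", "cdc_pub", credentials=CREDENTIALS)
pipeline.run(changes)
```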
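Tier 2: Debezium + Redpanda
The standard production stack when you need durable replay, have non-Postgres sources, or have more than 10 downstream consumers. The downstream DuckDB MERGE logic is the same as in Tier 1, provided the change events land in a staging table with the same columns.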
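The tier criteria from the two descriptions can be encoded as a small decision function. This is a sketch: `recommend_tier` is a hypothetical helper, and treating exactly 10 consumers as Tier 2 is an assumption, since the text only says "fewer than 10" and "more than 10".

```python
def recommend_tier(source_is_postgres: bool, downstream_consumers: int,
                   needs_durable_replay: bool) -> str:
    """Pick a CDC tier using the criteria stated in the two tier descriptions."""
    if not source_is_postgres or needs_durable_replay or downstream_consumers >= 10:
        # Non-Postgres source, durable replay, or 10+ consumers (boundary is an assumption)
        return "Tier 2: Debezium + Redpanda"
    # PostgreSQL source with fewer than 10 downstream consumers
    return "Tier 1: dlt replication + DuckDB"


print(recommend_tier(True, 3, False))   # Tier 1: dlt replication + DuckDB
print(recommend_tier(False, 3, False))  # Tier 2: Debezium + Redpanda
```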
Docker Compose: infrastructure
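```yaml
services:
  redpanda:
    image: docker.redpanda.com/redpandadata/redpanda:latest
    command:
      - redpanda
      - start
      - --kafka-addr internal://0.0.0.0:9092,external://0.0.0.0:19092
      # Addresses the broker advertises to clients; 0.0.0.0 is not reachable
      - --advertise-kafka-addr internal://redpanda:9092,external://localhost:19092
    ports:
      - 19092:19092
  debezium:
    image: quay.io/debezium/connect:2.5
    ports:
      - 8083:8083   # Kafka Connect REST API, used to register connectors
    environment:
      - BOOTSTRAP_SERVERS=redpanda:9092
      - GROUP_ID=1
      - CONFIG_STORAGE_TOPIC=my_connect_configs
      - OFFSET_STORAGE_TOPIC=my_connect_offsets
      - STATUS_STORAGE_TOPIC=my_connect_statuses
    depends_on:
      - redpanda
```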
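Debezium connectors are not declared in the Compose file. They are registered at runtime through the Kafka Connect REST API (`POST /connectors`). Below is a minimal standard-library sketch. It assumes the Connect REST port 8083 is published to the host and that a PostgreSQL server is reachable from the Connect container. `postgres_connector_config`, `register_connector`, and every hostname and credential value are hypothetical.

```python
import json
import urllib.request


def postgres_connector_config(name: str, hostname: str, dbname: str, user: str,
                              password: str, tables: list[str], topic_prefix: str) -> dict:
    """Build a Debezium 2.x PostgreSQL connector registration payload."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",           # same built-in decoder as Tier 1
            "database.hostname": hostname,
            "database.port": "5432",
            "database.user": user,
            "database.password": password,
            "database.dbname": dbname,
            "topic.prefix": topic_prefix,        # topics are named <prefix>.<schema>.<table>
            "table.include.list": ",".join(tables),
        },
    }


def register_connector(payload: dict, connect_url: str = "http://localhost:8083") -> dict:
    """POST the payload to Kafka Connect and return its JSON response."""
    req = urllib.request.Request(
        f"{connect_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = postgres_connector_config(
    name="orders-cdc", hostname="postgres", dbname="appdb",
    user="postgres", password="postgres",
    tables=["public.orders"], topic_prefix="appdb",
)
# register_connector(payload)  # call once the stack is running
```

With these values, changes to `public.orders` would be published to the `appdb.public.orders` topic.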
DuckDB: apply changes (both tiers)
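```sql
-- Apply CDC events: upsert inserted/updated rows, remove deleted rows.
-- MERGE INTO requires DuckDB 1.4 or later.
MERGE INTO orders AS target
USING (
    -- Rows from the most recent dlt load, collapsed to the latest change per
    -- order_id so each target row matches at most one source row.
    -- `lsn` is the log sequence number column pg_replication adds to change rows.
    SELECT *
    FROM cdc.staging_staging.orders
    WHERE _dlt_load_id = (SELECT MAX(_dlt_load_id) FROM cdc.staging_staging.orders)
    QUALIFY ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY lsn DESC) = 1
) AS source
ON target.order_id = source.order_id
WHEN MATCHED AND source.deleted_ts IS NOT NULL THEN
    DELETE
WHEN MATCHED AND source.deleted_ts IS NULL THEN
    UPDATE SET status = source.status, amount = source.amount, updated_at = source.updated_at
WHEN NOT MATCHED AND source.deleted_ts IS NULL THEN
    INSERT (order_id, status, amount, updated_at)
    VALUES (source.order_id, source.status, source.amount, source.updated_at);
```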
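For unit tests, it can help to model the same apply rules in plain Python. This sketch keys on `order_id` and assumes staged rows carry the `lsn` and `deleted_ts` columns used in the Tier 1 pipeline. `apply_changes` and `ROW_COLUMNS` are hypothetical names. A key that changed several times in one load is first collapsed to its highest-`lsn` change, so each key is applied once.

```python
ROW_COLUMNS = ("order_id", "status", "amount", "updated_at")


def apply_changes(target: dict[int, dict], staged: list[dict]) -> dict[int, dict]:
    """Apply one batch of staged CDC rows to a table keyed by order_id."""
    # Keep only the latest change per key (highest lsn)
    latest: dict[int, dict] = {}
    for row in staged:
        key = row["order_id"]
        if key not in latest or row["lsn"] > latest[key]["lsn"]:
            latest[key] = row

    result = dict(target)  # leave the input table untouched
    for key, row in latest.items():
        if row["deleted_ts"] is not None:
            # WHEN MATCHED ... THEN DELETE; an unmatched delete is simply ignored
            result.pop(key, None)
        else:
            # WHEN MATCHED ... THEN UPDATE / WHEN NOT MATCHED ... THEN INSERT
            result[key] = {col: row[col] for col in ROW_COLUMNS}
    return result
```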