Real-time change streams

The engineer's reference for Change Data Capture.

In-depth guides and interactive tools for building real-time data pipelines across Oracle, Debezium, GoldenGate, Kafka, Redpanda, and Apache Iceberg.

Pipeline anatomy: Oracle redo log (source) → Debezium (capture) → Kafka / Iceberg (sink)
Position: SCN 0x0A3F·C21 · Lag: 240 ms · Offset: committed · Throughput: 18.4k ev/s
01

Debezium with Oracle LogMiner

2 guides
01
Debezium Oracle SCN Recovery

Resiliency and recovery architectures for Debezium Server with Oracle LogMiner in Kubernetes: monitoring visualizations, root-cause analysis, recovery playbooks, and K8s topology patterns.

LOGMINER · KUBERNETES · RECOVERY
02
Debezium Oracle Performance Tuning

Interactive tuning studio that generates optimized configurations for throughput, latency, CPU, and memory profiles, with trade-off analysis for each.

TUNING · THROUGHPUT · LATENCY
02

Debezium with Oracle XStream

1 guide
03
XStream Configuration Explorer

Dashboard-style explorer for Debezium Server with Oracle XStream: connection pooling, JDBC state storage, SMT pipelines, CloudEvents formatting, HTTP sink batching, and retry strategies.

XSTREAM · JDBC STATE · HTTP SINK
03

Oracle GoldenGate CDC

soon
In development: GoldenGate-specific topics for heterogeneous replication and zero-downtime migration.
04

CDC Patterns & Architecture

1 guide
04
Connector Concurrency & Scaling

XStream vs LogMiner for multiple concurrent readers, same schema vs different schemas, horizontal scaling patterns, and high availability in Kubernetes.

XSTREAM · SCALE-OUT · HA
05

Kafka & Streaming Infrastructure

soon
Queued: Kafka vs Redpanda · multi-tenant topic strategy · disaster recovery · Strimzi deployments for CDC pipelines.
06

Data Lake & Apache Iceberg

soon
Queued: Debezium to Iceberg configuration · Oracle LOBs in Iceberg · multi-tenant CDC ingestion · Iceberg sink schema design.

What is Change Data Capture?

Change Data Capture (CDC) is a design pattern that tracks row-level changes in a database (inserts, updates, and deletes) and delivers them as a stream of events to downstream systems in real time. Instead of periodically querying for differences, log-based CDC reads the database's transaction log (redo logs in Oracle, WAL in PostgreSQL, binlog in MySQL), capturing each change shortly after it is committed.
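To make the idea of a change stream concrete, here is a minimal Python sketch that replays a handful of change events into an in-memory replica. The events are hand-written samples shaped like Debezium's envelope (`op`, `before`, `after`). The table contents, the `id` key, and the `apply_event` helper are assumptions made for illustration.

```python
from typing import Any, Dict, Optional

Row = Optional[Dict[str, Any]]

# Hypothetical sample events. "op" follows Debezium's convention:
# "c" = insert, "u" = update, "d" = delete, "r" = snapshot read.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "NEW"}},
    {"op": "u", "before": {"id": 1, "status": "NEW"}, "after": {"id": 1, "status": "PAID"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "NEW"}},
    {"op": "d", "before": {"id": 2, "status": "NEW"}, "after": None},
]


def apply_event(replica: Dict[Any, Dict[str, Any]], event: Dict[str, Any], key: str = "id") -> None:
    """Apply one change event to the replica, keyed by the row's primary key."""
    op = event["op"]
    if op in ("c", "u", "r"):
        # Inserts, updates, and snapshot reads are all treated as upserts,
        # so replaying the same event twice leaves the replica unchanged.
        row = event["after"]
        replica[row[key]] = row
    elif op == "d":
        # Deletes carry the old row image in "before".
        replica.pop(event["before"][key], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


replica: Dict[Any, Dict[str, Any]] = {}
for event in events:
    apply_event(replica, event)

print(replica)  # {1: {'id': 1, 'status': 'PAID'}}
```

Treating inserts and updates as upserts keeps the consumer idempotent, which matters when a pipeline redelivers events after a restart.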

CDC is the foundation of modern event-driven architectures, enabling real-time data replication, cache invalidation, audit logging, microservice synchronisation, and streaming analytics. In a typical CDC data stack, change events flow from the source database into a message broker such as Apache Kafka or Redpanda, and from there into downstream stores such as data lakes built on Apache Iceberg, analytical warehouses, or search indexes. Debezium (open-source) and Oracle GoldenGate (commercial) are the most widely deployed CDC tools for Oracle Database environments.

Technologies Covered

Debezium Server

Open-source CDC platform that streams database changes as events. Runs as a standalone Quarkus application in Kubernetes, supporting Oracle, PostgreSQL, MySQL, SQL Server, and MongoDB.

Oracle LogMiner

Oracle's built-in utility for reading redo logs via SQL. No additional license required. Used by Debezium with the online_catalog or redo_log_catalog strategies.

Oracle XStream

Oracle's high-performance streaming API built on GoldenGate internals. Provides lower latency than LogMiner but requires an Oracle GoldenGate license. Used by Debezium as an alternative capture adapter.

Oracle GoldenGate

Enterprise-grade real-time data integration platform for heterogeneous replication, zero-downtime migrations, and active-active database synchronization across Oracle and non-Oracle targets.

Apache Kafka & Redpanda

The most widely used event streaming platforms for CDC pipelines. Kafka Connect hosts Debezium as a source connector, publishing change events to topics. Redpanda is a Kafka-API-compatible alternative that runs as a single binary with no JVM or ZooKeeper dependency.

Apache Iceberg

Open table format for large-scale data lakes in object storage (S3, Azure Data Lake, GCS). CDC events flow through Kafka Connect sink connectors into Iceberg tables, enabling real-time analytics on operational Oracle data.

Frequently Asked Questions

What is the difference between Debezium and Oracle GoldenGate?
Debezium is an open-source CDC tool that captures changes from database transaction logs and streams them as events. It runs on the JVM and is typically deployed in Kubernetes. Oracle GoldenGate is a commercial, enterprise-grade replication platform with support for heterogeneous databases, active-active replication, and conflict resolution. Debezium is ideal for event streaming use cases, while GoldenGate excels at database replication and migration scenarios.
Should I use LogMiner or XStream with Debezium for Oracle?
LogMiner is free and requires no additional Oracle licensing; it works by querying redo logs via SQL. XStream provides lower latency and better performance but requires an Oracle GoldenGate license. For most use cases, LogMiner is the recommended starting point. Switch to XStream when you need sub-second latency or are processing very high transaction volumes.
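As a rough illustration of how the adapter choice shows up in configuration, the sketch below builds the adapter-specific properties for Debezium Server. `database.connection.adapter`, `log.mining.strategy`, and `database.out.server.name` are Debezium Oracle connector properties. The helper name, the default strategy chosen here, and the outbound server name `dbzxout` are placeholders, not recommendations.

```python
from typing import Dict

PREFIX = "debezium.source."  # Debezium Server prefix for source connector properties


def oracle_adapter_properties(adapter: str, outbound_server: str = "") -> Dict[str, str]:
    """Return the adapter-specific properties for the chosen capture path."""
    if adapter == "logminer":
        return {
            PREFIX + "database.connection.adapter": "logminer",
            # Assumption: online_catalog is picked only as an example;
            # redo_log_catalog is the other strategy named in the text above.
            PREFIX + "log.mining.strategy": "online_catalog",
        }
    if adapter == "xstream":
        if not outbound_server:
            raise ValueError("XStream needs the name of an outbound server")
        return {
            PREFIX + "database.connection.adapter": "xstream",
            PREFIX + "database.out.server.name": outbound_server,
        }
    raise ValueError(f"unknown adapter: {adapter!r}")


for name, value in oracle_adapter_properties("xstream", "dbzxout").items():
    print(f"{name}={value}")
```

Keeping the two paths behind one function makes it easy to start on LogMiner and switch later without touching the rest of the configuration.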
Can Debezium Server run in Kubernetes?
Yes. Debezium Server is a standalone Quarkus application packaged as a container image. It can be deployed as a Kubernetes Deployment or StatefulSet. Key considerations include JVM heap sizing aligned with container memory limits, JDBC-based offset and schema history storage (rather than local files), and proper liveness/readiness probes.
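One way to keep the JVM heap aligned with the container memory limit is to derive the heap flags from the limit rather than hard-coding them. The sketch below assumes a 70% heap share, leaving the rest for metaspace, thread stacks, and direct buffers. That percentage and both helper names are illustrative assumptions and should be validated against real memory usage.

```python
def heap_mib_for_container(limit_mib: int, heap_percent: int = 70) -> int:
    """Return a heap size in MiB as a fixed share of the container memory limit."""
    if limit_mib <= 0:
        raise ValueError("limit_mib must be positive")
    if not 1 <= heap_percent <= 99:
        raise ValueError("heap_percent must be between 1 and 99")
    # Integer arithmetic avoids floating-point rounding surprises.
    return limit_mib * heap_percent // 100


def java_heap_flags(limit_mib: int, heap_percent: int = 70) -> str:
    """Build matching -Xms/-Xmx flags so the heap is sized once at startup."""
    heap = heap_mib_for_container(limit_mib, heap_percent)
    return f"-Xms{heap}m -Xmx{heap}m"


print(java_heap_flags(4096))  # -Xms2867m -Xmx2867m
```

An alternative with the same intent is the JVM's `-XX:MaxRAMPercentage` flag, which lets the JVM compute the heap from the container limit itself.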
What is an SCN in Oracle CDC?
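A System Change Number (SCN) is Oracle's internal logical clock: a monotonically increasing number that orders every change recorded in the redo stream, with each transaction receiving a commit SCN. CDC tools like Debezium record the SCN as their position in the redo logs so they can resume from that point after a restart. Events after the last stored offset may be re-read, so delivery is typically at-least-once and consumers should tolerate duplicates. "SCN not found" errors typically indicate that the connector has fallen behind and the redo or archived logs containing its stored SCN have been overwritten or purged.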
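The recovery question behind an "SCN not found" error can be sketched as a coverage check: is the connector's stored SCN still inside an unbroken chain of available logs? The example below works on plain SCN ranges such as those exposed by the `FIRST_CHANGE#` and `NEXT_CHANGE#` columns of `V$ARCHIVED_LOG`. It is a simplification that ignores online redo logs, RAC threads, and multiple archive destinations, and the type and function names are hypothetical.

```python
from typing import Iterable, NamedTuple


class ArchivedLog(NamedTuple):
    first_change: int  # lowest SCN contained in the log (FIRST_CHANGE#)
    next_change: int   # first SCN of the following log, exclusive (NEXT_CHANGE#)


def scn_covered_by_logs(offset_scn: int, logs: Iterable[ArchivedLog]) -> bool:
    """True if the given logs cover offset_scn and form a gap-free chain after it."""
    needed = offset_scn
    for log in sorted(logs):
        if log.next_change <= needed:
            continue  # log ends before the SCN we still need
        if log.first_change > needed:
            return False  # gap: the log holding `needed` is missing
        needed = log.next_change  # chain extends to the end of this log
    # If nothing advanced, no supplied log contained the offset at all.
    return needed > offset_scn


available = [ArchivedLog(100, 200), ArchivedLog(200, 300), ArchivedLog(300, 400)]
print(scn_covered_by_logs(250, available))  # True
print(scn_covered_by_logs(50, available))   # False
```

A `False` result here only means the supplied ranges do not cover the offset. In a real system the SCN may still be present in the online redo logs.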
How do I tune Debezium for high throughput with Oracle?
Key parameters include log.mining.batch.size.default (increase to 100,000+ for larger mining windows), max.queue.size (buffer between source and sink), poll.interval.ms (reduce to minimize idle time), and HTTP sink batch size. Use ZGC for garbage collection on large heaps, and ensure the Kubernetes pod has guaranteed QoS with requests equal to limits.
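The parameters named above can be collected into a small, checked configuration. In the sketch below the numeric values are illustrative starting points, not measured recommendations. `max.batch.size` is an additional Debezium property not mentioned in the text, included because the queue must be larger than the batch. The `debezium.source.` prefix assumes Debezium Server.

```python
from typing import Dict

PREFIX = "debezium.source."

# Illustrative values only; tune against real workload measurements.
tuning: Dict[str, int] = {
    PREFIX + "log.mining.batch.size.default": 100_000,  # larger mining window
    PREFIX + "max.queue.size": 65_536,                  # buffer between source and sink
    PREFIX + "max.batch.size": 16_384,                  # events handed over per batch
    PREFIX + "poll.interval.ms": 100,                   # shorter idle wait between polls
}


def validate(props: Dict[str, int]) -> None:
    """Check the one relationship this sketch relies on: queue larger than batch."""
    if props[PREFIX + "max.queue.size"] <= props[PREFIX + "max.batch.size"]:
        raise ValueError("max.queue.size must be greater than max.batch.size")


def render_properties(props: Dict[str, int]) -> str:
    """Render the settings as application.properties lines, sorted by key."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items()))


validate(tuning)
print(render_properties(tuning))
```

Generating the file from a validated dictionary makes it harder to ship a profile whose settings contradict each other.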