Optimizing Debezium Server for Oracle

A research report on tuning Debezium Server (standalone, on Kubernetes) that connects Oracle to HTTP sinks, focusing on the online log mining strategy.

Scenario

Source: Oracle DB
Strategy: Online Log Mining
Sink: HTTP / Custom Batch

Primary Goal

Balancing Throughput vs. Latency within resource constraints (K8s limits).

Critical Parameter

debezium.source.max.batch.size
Determines memory pressure & throughput.

Why does this matter?

Debezium Server is a change data capture (CDC) runtime. Unlike a Kafka Connect deployment, it runs as a standalone Java process. With an HTTP sink, network round-trips become the bottleneck. Tuning the read buffer (Oracle) and the write buffer (HTTP batch) is essential to prevent backpressure or OutOfMemoryError failures in Kubernetes.

Configuration Tuning Studio

Each optimization profile generates a configuration and a set of trade-offs. The High Throughput profile is shown here.

Generated `application.properties`

High Throughput

# Core Engine
debezium.source.max.batch.size=2048
debezium.source.max.queue.size=16384
debezium.source.poll.interval.ms=500

# Oracle Specific
debezium.source.log.mining.strategy=online_catalog
debezium.source.log.mining.batch.size.default=2000
debezium.source.log.mining.batch.size.max=10000

# Sink (Custom Batch)
debezium.sink.type=http-batch
debezium.sink.http.batch.size=100
debezium.sink.http.url=http://sink-service:8080/batch

# Java/K8s Hints
# JAVA_OPTS="-Xms1G -Xmx2G"

Rationale

Maximizes the amount of data processed in each cycle. Large batches spread the cost of each network round-trip to the HTTP sink over more events and reduce the number of iterations of the connector loop.

Critical Settings Explained

max.batch.size (2048): Hands large batches of change events to the sink in each iteration. The amount read from Oracle LogMiner per query is set separately by log.mining.batch.size.*.
http.batch.size (100): Uses the maximum capacity of your custom sink to minimize HTTP POST overhead.
poll.interval.ms (500): Short enough to keep the pipeline busy, but long enough to let buffers fill.
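The relationships between these settings can be checked mechanically before deploying. The sketch below is a minimal, hypothetical validator (the function names `parse_properties` and `check_profile` are invented for illustration). It assumes two rules: `max.queue.size` should be larger than `max.batch.size`, and the default LogMiner batch size should not exceed its maximum.

```python
PROFILE = """
# Core Engine
debezium.source.max.batch.size=2048
debezium.source.max.queue.size=16384
debezium.source.poll.interval.ms=500
# Oracle Specific
debezium.source.log.mining.batch.size.default=2000
debezium.source.log.mining.batch.size.max=10000
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_profile(props):
    """Return a list of warnings; an empty list means no rule was violated."""
    warnings = []
    batch = int(props["debezium.source.max.batch.size"])
    queue = int(props["debezium.source.max.queue.size"])
    mining_default = int(props["debezium.source.log.mining.batch.size.default"])
    mining_max = int(props["debezium.source.log.mining.batch.size.max"])
    if queue <= batch:
        warnings.append("max.queue.size should be larger than max.batch.size")
    if mining_default > mining_max:
        warnings.append("log.mining.batch.size.default exceeds batch.size.max")
    return warnings


if __name__ == "__main__":
    print(check_profile(parse_properties(PROFILE)) or "profile looks consistent")
```

A check like this could run in CI against the generated `application.properties` so that a later edit to one value does not silently break its relationship to another.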

Technical Deep Dive

Source: Oracle Log Mining

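The online_catalog strategy tells Debezium to resolve table and column names with the database's current data dictionary instead of writing the dictionary to the redo logs. This is generally faster to start and to mine, but it couples the connector tightly to the current DB state: DDL changes cannot be tracked through the logs, so schema changes to captured tables need extra care.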

Polling vs. Streaming

Although Oracle LogMiner reads the redo logs, Debezium does not receive a pushed stream. It "polls" the LogMiner view with repeated queries. poll.interval.ms does not control how often the database is read. It controls how long the engine waits for new change events to appear before it processes a batch. How much redo is mined per query is governed by the log.mining.batch.size.* parameters.
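The hand-off between the log reader and the sink can be pictured as a bounded queue that is drained in batches. The sketch below is a toy model only, not Debezium's internal implementation. The function `poll_batch` and its parameters are hypothetical stand-ins for the roles of `poll.interval.ms` and `max.batch.size`.

```python
import queue


def poll_batch(q, max_batch_size, poll_interval_s):
    """Wait up to poll_interval_s for a first event, then drain up to
    max_batch_size events that are already in the queue."""
    batch = []
    try:
        batch.append(q.get(timeout=poll_interval_s))
    except queue.Empty:
        return batch  # nothing arrived within the poll interval
    while len(batch) < max_batch_size:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


if __name__ == "__main__":
    q = queue.Queue(maxsize=8)  # plays the role of max.queue.size
    for event in range(5):
        q.put(event)
    print(poll_batch(q, max_batch_size=3, poll_interval_s=0.01))
    print(poll_batch(q, max_batch_size=3, poll_interval_s=0.01))
    print(poll_batch(q, max_batch_size=3, poll_interval_s=0.01))
```

In this model the poll interval only bounds how long the consumer waits on an empty queue. It does not affect how fast the producer side fills the queue.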

Custom HTTP Batch Sink

A standard HTTP sink sends one POST request per event, so throughput is capped at one network round-trip per event. Your custom batch sink (up to 100 events per request) is critical for performance.
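A rough upper bound makes the difference concrete. The sketch below assumes requests are sent one at a time and that each request costs a fixed round-trip time. The 5 ms figure is an illustrative assumption, not a measurement, and the function name is hypothetical.

```python
def max_events_per_second(batch_size, round_trip_ms):
    """Upper bound on throughput for sequential requests, ignoring
    serialization and server processing time."""
    return batch_size * 1000.0 / round_trip_ms


if __name__ == "__main__":
    ROUND_TRIP_MS = 5  # assumed round-trip time to the sink
    print("per-event POST:", max_events_per_second(1, ROUND_TRIP_MS))
    print("batch of 100:  ", max_events_per_second(100, ROUND_TRIP_MS))
```

Under these assumptions, one POST per event tops out at 200 events per second, while batches of 100 raise the ceiling to 20,000 events per second. Real numbers depend on payload size and on how long the sink takes to process each request.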

✓ High Throughput: Wait until the batch is full (100 events) before sending.
✓ Low Latency: Reduce the batch size (10-20) or add a time-flush (e.g., send the batch after 50 ms even if it is not full).
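The two options can be combined: flush when the batch is full or when the oldest buffered event has waited too long, whichever comes first. The sketch below is a minimal illustration in Python rather than the custom sink's actual code. The class `TimedBatcher`, its parameters, and the `tick` method are all hypothetical.

```python
import time


class TimedBatcher:
    """Buffers events and flushes on size or on age of the oldest event."""

    def __init__(self, send, max_size=100, max_wait_s=0.05, clock=time.monotonic):
        self._send = send              # callable that receives a list of events
        self._max_size = max_size
        self._max_wait_s = max_wait_s
        self._clock = clock            # injectable so the timing can be tested
        self._buffer = []
        self._first_at = None

    def add(self, event):
        if not self._buffer:
            self._first_at = self._clock()
        self._buffer.append(event)
        if len(self._buffer) >= self._max_size:
            self.flush()

    def tick(self):
        """Call periodically; flushes a partial batch that has waited too long."""
        if self._buffer and self._clock() - self._first_at >= self._max_wait_s:
            self.flush()

    def flush(self):
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._send(batch)


if __name__ == "__main__":
    batcher = TimedBatcher(print, max_size=3, max_wait_s=0.05)
    for event in ["a", "b", "c", "d"]:
        batcher.add(event)   # prints ['a', 'b', 'c'] once the batch is full
    time.sleep(0.06)
    batcher.tick()           # prints ['d'] because it waited past 50 ms
```

A larger `max_size` with a generous `max_wait_s` favors throughput. A small `max_wait_s` caps the extra latency any single event can pick up while waiting for the batch to fill.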

Resource Constraints

Low Memory: The internal queue, sized by max.queue.size, is typically the biggest memory consumer. If the HTTP sink is slow, this queue fills up and holds every queued event on the heap. Reduce max.queue.size to avoid OOM.
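A back-of-the-envelope estimate shows how the queue size translates into heap. The sketch below assumes an average in-memory size per event. The 2 KiB and 16 KiB figures are illustrative assumptions, and the function name is hypothetical. Actual event sizes depend on row width and should be measured.

```python
def queue_heap_mib(max_queue_size, avg_event_bytes):
    """Worst-case heap held by a full queue, in MiB."""
    return max_queue_size * avg_event_bytes / (1024 * 1024)


if __name__ == "__main__":
    for avg_event_bytes in (2 * 1024, 16 * 1024):  # assumed event sizes
        print(avg_event_bytes, "bytes/event ->",
              queue_heap_mib(16384, avg_event_bytes), "MiB")
```

With max.queue.size=16384, a full queue holds about 32 MiB at 2 KiB per event but about 256 MiB at 16 KiB per event, which is a meaningful share of a 2 GiB heap.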

Low CPU: Parsing Oracle redo logs is CPU-intensive. Increasing poll.interval.ms gives the CPU "breathing room" between batches and reduces context switching, at the cost of higher latency.
