The CDC data flow


Oracle Database (LGWR)

The primary operational database. Every log switch triggers a checkpoint. When switches come too often, LGWR (Log Writer) stalls while the checkpoint and database writer processes catch up, which briefly blocks DML and raises I/O latency. This directly affects the stability of the redo log stream that Debezium reads.

Key configuration parameters

Parameter	Target	Rationale
Switch frequency	4-6 per hour at peak	Avoids checkpoint stalls; manageable crash recovery time
Log file size	300MB - 1GB	Derived from switch frequency target; tune per workload
Log groups	3-5 minimum	Prevents archiver stalls during log switches
Members per group	2 minimum	Eliminates single-disk failure as outage cause
log.mining.strategy	online_catalog	No dictionary flush redo; simpler operations
Archive retention	24-48 hours of peak redo	Covers typical connector downtime without triggering SCN not found errors
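The log file size row is derived from the switch frequency row: divide peak hourly redo volume by the target number of switches per hour. The following is a minimal sketch of that arithmetic. The function name, the 50 MB rounding step, and the 2,400 MB/hour peak figure are illustrative assumptions, not values from a real system.

```python
import math


def redo_log_size_mb(peak_redo_mb_per_hour: float, switches_per_hour: float,
                     round_to_mb: int = 50) -> int:
    """Per-file redo log size (MB) that yields the target switch rate at peak.

    size = peak redo generated per hour / target switches per hour,
    rounded up to a multiple of round_to_mb (an arbitrary convenience).
    """
    if peak_redo_mb_per_hour <= 0 or switches_per_hour <= 0:
        raise ValueError("redo rate and switch target must be positive")
    raw_size = peak_redo_mb_per_hour / switches_per_hour
    return int(math.ceil(raw_size / round_to_mb) * round_to_mb)


# Hypothetical workload: 2,400 MB of redo per hour at peak.
peak_mb_per_hour = 2400.0
for target in (4, 6):
    size = redo_log_size_mb(peak_mb_per_hour, target)
    print(f"{target} switches/hour -> {size} MB per log file")
```

Measure the peak redo rate on your own system before applying the result. The table's 300MB - 1GB range is a starting point, not a guarantee.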

System performance simulation

Database I/O latency and CDC extraction lag over 60 minutes. Target: one log switch every 10-15 minutes (4-6 per hour).

Frequent high-latency spikes represent checkpoint waits (the "log file switch (checkpoint incomplete)" wait event) recurring every 5 minutes, which is what undersized logs produce. Each spike delays the redo stream that Debezium reads.

Checkpoint not complete: what it looks like

When redo logs fill too fast, LGWR cannot reuse the next log group until DBWR has flushed the dirty buffers that group still protects, so the log switch stalls. You'll see this in the alert log:

Thread 1 cannot allocate new log, sequence 892
Checkpoint not complete
  Current log# 3 seq# 892 mem# 0: /oracle/redo/redo3.log
  Current log# 3 seq# 892 mem# 1: /oracle/redo/redo3b.log
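To track how often this happens, the alert log can be scanned for the two-line pattern shown above. This is a rough sketch, assuming the message text matches the excerpt exactly. The function name is hypothetical, and real alert logs also contain timestamps and other messages that this code simply ignores.

```python
import re

# The excerpt from the text, used here as sample input.
ALERT_LOG_SAMPLE = """\
Thread 1 cannot allocate new log, sequence 892
Checkpoint not complete
  Current log# 3 seq# 892 mem# 0: /oracle/redo/redo3.log
  Current log# 3 seq# 892 mem# 1: /oracle/redo/redo3b.log
"""

_CANNOT_ALLOCATE = re.compile(
    r"Thread (\d+) cannot allocate new log, sequence (\d+)"
)


def checkpoint_stalls(alert_log_text: str) -> list:
    """Return (thread, sequence) for each stalled switch.

    A stall is counted only when 'cannot allocate new log' is immediately
    followed by 'Checkpoint not complete'. Other causes of a failed
    allocation are skipped.
    """
    lines = alert_log_text.splitlines()
    stalls = []
    for previous, current in zip(lines, lines[1:]):
        match = _CANNOT_ALLOCATE.search(previous)
        if match and "Checkpoint not complete" in current:
            stalls.append((int(match.group(1)), int(match.group(2))))
    return stalls


print(checkpoint_stalls(ALERT_LOG_SAMPLE))
```

A count that stays above zero after resizing suggests the logs are still too small or too few for the workload.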

Increasing log file size from 100MB to 500MB and adding more groups usually eliminates these waits on high-volume databases. Monitor V$ARCHIVED_LOG to verify your switch frequency after resizing.
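One way to check switch frequency is to bucket log start times by clock hour and flag any hour above the target. The sketch below assumes the timestamps were already fetched, for example from the FIRST_TIME column of V$ARCHIVED_LOG for a single archive destination. The function names and the sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def switches_per_hour(first_times):
    """Map each clock hour to the number of logs that started in it."""
    counts = Counter(
        t.replace(minute=0, second=0, microsecond=0) for t in first_times
    )
    return dict(sorted(counts.items()))


def hours_over_target(first_times, max_per_hour=6):
    """Return the clock hours whose switch count exceeds max_per_hour."""
    per_hour = switches_per_hour(first_times)
    return [hour for hour, count in per_hour.items() if count > max_per_hour]


# Hypothetical data: a switch every 5 minutes from 09:00 (undersized logs),
# then every 15 minutes from 10:00 (after resizing).
nine = datetime(2024, 1, 1, 9, 0)
ten = datetime(2024, 1, 1, 10, 0)
sample = [nine + timedelta(minutes=5 * i) for i in range(12)]
sample += [ten + timedelta(minutes=15 * i) for i in range(4)]

for hour, count in switches_per_hour(sample).items():
    print(hour.strftime("%Y-%m-%d %H:00"), count)
print("over target:", hours_over_target(sample))
```

If the database archives to more than one destination, deduplicate by sequence number first, otherwise each switch is counted once per destination.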

The most critical failure in an Oracle CDC pipeline is the SCN not found error. It occurs when the connector's last recorded System Change Number (SCN) is no longer present in any available online redo or archive log, so LogMiner cannot resume from that position. Two resolution paths exist.

Root causes

Short retention: Archive log policy purges logs before Debezium can read them after downtime.
Long downtime: Connector was offline longer than the retention window.
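Stale SCN: Monitored tables have low activity, so the connector's stored offset is not updated while other activity keeps advancing the database SCN. The logs containing the stored offset eventually age out.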
Log relocation: DBAs moved archive logs to a destination not configured in log.mining.archive.destination.name.

Resolution path 1: restore the log

Zero data loss. Preferred approach.

Identify the missing SCN from Kafka Connect logs.
DBA identifies the specific archive log sequence containing that SCN.
DBA restores the archive log from backup to the original configured destination.
Restart the Debezium connector.
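The lookup in the second step amounts to a range search: an archive log covers an SCN when FIRST_CHANGE# <= SCN < NEXT_CHANGE#. Here is a small sketch of that check, assuming the sequence and SCN ranges were already read from V$ARCHIVED_LOG (or the backup catalog) for a single redo thread. The type, function names, and SCN values are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence, Set


class ArchivedLog(NamedTuple):
    sequence: int      # SEQUENCE#
    first_change: int  # FIRST_CHANGE#, lowest SCN in the log (inclusive)
    next_change: int   # NEXT_CHANGE#, first SCN of the following log


def log_containing_scn(logs: Sequence[ArchivedLog],
                       scn: int) -> Optional[ArchivedLog]:
    """Return the log whose SCN range covers scn, or None if none does."""
    for log in logs:
        if log.first_change <= scn < log.next_change:
            return log
    return None


def sequences_to_restore(logs: Sequence[ArchivedLog], scn: int,
                         on_disk: Set[int]) -> list:
    """Sequences from the one holding scn onward that are not on disk."""
    start = log_containing_scn(logs, scn)
    if start is None:
        return []
    return sorted(
        log.sequence for log in logs
        if log.sequence >= start.sequence and log.sequence not in on_disk
    )


# Hypothetical catalog entries and a connector offset of SCN 5,120,000.
catalog = [
    ArchivedLog(890, 5_000_000, 5_100_000),
    ArchivedLog(891, 5_100_000, 5_250_000),
    ArchivedLog(892, 5_250_000, 5_400_000),
]
print(log_containing_scn(catalog, 5_120_000))
print(sequences_to_restore(catalog, 5_120_000, on_disk={892}))
```

Every log from the one holding the SCN up to the current log must be available, not just the first one, which is why the second function returns a list.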

Resolution path 2: re-snapshot

Use only when logs cannot be restored. This causes downtime and may produce duplicate events downstream.

Step 1

Stop the Debezium connector.

Step 2

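Remove the connector's stored offsets and delete its schema history topic in Kafka. Offsets normally live in the Kafka Connect offsets topic that all connectors on the cluster share, so reset only this connector's entries (or register the connector under a new name) rather than deleting that topic.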

Step 3

Restart. Debezium performs a fresh initial snapshot from the current SCN.

Prevention checklist

✓ Size redo logs for 4-6 switches per hour at peak (one switch every 10-15 minutes).

✓ Set archive log retention to at least 24-48 hours of peak redo generation.

✓ Configure log.mining.strategy=online_catalog to avoid dictionary flush overhead.

✓ Set heartbeat.interval.ms=300000 to prevent stale SCN on low-traffic databases.

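✓ Enable supplemental logging: minimal at the database level, plus ALL COLUMNS on each captured table. Database-wide ALL COLUMNS also works but increases redo volume.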

✓ Monitor V$ARCHIVED_LOG switch frequency after any log resizing.
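The connector-side items in this checklist could be collected into a Kafka Connect registration payload along the following lines. The log.mining.strategy, heartbeat.interval.ms, and log.mining.archive.destination.name settings come from the text above. The connector name, host, credentials, topic names, table list, and the LOG_ARCHIVE_DEST_1 value are placeholders, so treat this as a sketch to adapt rather than a complete configuration.

```python
import json


def build_connector_payload(name: str, overrides: dict = None) -> dict:
    """Build a Kafka Connect payload for a Debezium Oracle connector."""
    config = {
        "connector.class": "io.debezium.connector.oracle.OracleConnector",
        # Placeholder connection details: replace with real values.
        "database.hostname": "oracle.example.internal",
        "database.port": "1521",
        "database.user": "c##dbzuser",
        "database.password": "CHANGE_ME",
        "database.dbname": "ORCLCDB",
        "topic.prefix": "oracle-cdc",
        "table.include.list": "INVENTORY.CUSTOMERS",
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-history.oracle-cdc",
        # Settings taken from the checklist.
        "log.mining.strategy": "online_catalog",
        "heartbeat.interval.ms": "300000",
        # Assumed destination name: must match where archive logs are kept.
        "log.mining.archive.destination.name": "LOG_ARCHIVE_DEST_1",
    }
    config.update(overrides or {})
    return {"name": name, "config": config}


payload = build_connector_payload("oracle-cdc-connector")
print(json.dumps(payload, indent=2))
```

Redo log sizing, archive retention, and supplemental logging are database-side settings and are not part of this payload.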

Oracle Redo Log Sizing for Debezium LogMiner