changedatacapture.net How-To Guide

Debezium Oracle Connector Concurrency

Analysis of XStream vs. LogMiner for multiple readers, scaling, and high availability

Verdict Summary

This section provides a high-level summary of whether multiple Debezium connectors can concurrently read from the same Oracle database using XStream or LogMiner. The verdict depends heavily on whether the connectors are targeting the same or different schemas. The table below outlines the support status, rationale, and key risks for each scenario.

| Mode | Allowed? | Why | Required Setup | Risks |
|---|---|---|---|---|
| XStream (Same Schema) | No | An XStream outbound server is designed for a single attached client. Multiple clients would conflict. | N/A (unsupported pattern) | Connection errors, missed data, unpredictable behavior. |
| XStream (Different Schemas) | Yes | Supported scale-out pattern where each connector has its own dedicated outbound server. | One outbound server per Debezium connector instance. Each must be configured for a different schema/table set. | Increased database resource overhead. XStream requires a GoldenGate license, and each outbound server may have licensing implications. |
| LogMiner (Same Schema) | It depends | Oracle allows multiple LogMiner sessions, but without table filtering the connectors emit duplicate change events. | Connectors must have disjoint (non-overlapping) table.include.list configurations to partition the workload. | High risk of data duplication if table filters overlap or are misconfigured. |
| LogMiner (Different Schemas) | Yes | This is the standard and recommended scale-out pattern for LogMiner. | Run one Debezium connector per schema. Ensure supplemental logging is enabled for all captured tables. | Increased I/O and CPU load on the database server from multiple concurrent LogMiner sessions. |

XStream Deep Dive

Oracle XStream provides an API for a client application to receive changes from an Oracle database. The architecture is designed around a dedicated outbound server that streams Logical Change Records (LCRs) to an attached client. This section explores the concurrency constraints and supported patterns for using XStream with Debezium.

1. Concurrent Readers (Same Schema)

Attaching more than one Debezium connector instance to the same XStream outbound server is not supported. The outbound server maintains a single stream position for its client. If two clients read and acknowledge LCRs from the same stream, they race against each other: each acknowledgment advances the shared position, so one client's progress causes the other to miss data.

Verdict: Unsupported Configuration
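
The race described above can be illustrated with a toy model. This is a minimal sketch, not the XStream API: the OutboundServer class, its single position field, and the alternating read order are all assumptions made for illustration, and a real second attach may simply be rejected by Oracle rather than silently splitting the stream.

```python
from dataclasses import dataclass


@dataclass
class OutboundServer:
    """Toy stand-in for an outbound server: one ordered stream, one position."""
    lcrs: list
    position: int = 0  # the single shared stream position (illustrative)

    def next_lcr(self):
        """Hand out the next LCR and advance the only position there is."""
        if self.position >= len(self.lcrs):
            return None
        lcr = self.lcrs[self.position]
        self.position += 1
        return lcr


def drain_alternating(server, client_count):
    """Let clients take turns reading; return what each client received."""
    received = [[] for _ in range(client_count)]
    turn = 0
    while (lcr := server.next_lcr()) is not None:
        received[turn % client_count].append(lcr)
        turn += 1
    return received


changes = ["insert-1", "update-2", "delete-3", "insert-4"]
one_client = drain_alternating(OutboundServer(list(changes)), 1)
two_clients = drain_alternating(OutboundServer(list(changes)), 2)

print(one_client)   # the single client sees every change
print(two_clients)  # each of the two clients sees only part of the stream
```

With one client, the full change list arrives in order. With two clients sharing the position, neither receives a complete stream, which is the "missed data" risk named in the verdict table.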

Unsupported Architecture:

[Diagram: Debezium #1 and Debezium #2 both attach to a single XStream Outbound Server on the Oracle DB, resulting in a CONFLICT.]

"An outbound server in an XStream Out configuration streams Oracle database changes to a client application. The client application attaches to the outbound server..." - Oracle Streams Documentation (implies a single client-server relationship)

2. Multiple Schemas (Scale-Out Pattern)

The correct and supported way to capture data from multiple schemas (or shard a single large schema) using XStream is to deploy multiple, independent Debezium connectors. Each connector must be configured with its own dedicated XStream outbound server. This ensures isolation and prevents any conflicts.

Verdict: Supported Configuration

Supported Architecture:

[Diagram: Debezium #1 attaches to Outbound Server A (Schema A) and Debezium #2 attaches to Outbound Server B (Schema B), both on the same Oracle DB.]

Configuration & Licensing Notes:

  • Each Debezium instance requires a unique database.out.server.name.
  • Each outbound server must be created and configured separately within the Oracle database.
  • Using Oracle XStream requires a GoldenGate license. Each outbound server may have licensing implications.
  • This pattern increases resource consumption (CPU, memory) on the database server.
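
As a rough sketch of the first two notes, the snippet below builds the XStream-related part of two connector configurations and checks that no outbound server is shared. The property keys are Debezium Oracle connector settings; the connector names, the outbound server names (DBZ_OUT_A, DBZ_OUT_B), the schema names, and the topic prefixes are hypothetical, and connection settings are omitted.

```python
CONNECTOR_CLASS = "io.debezium.connector.oracle.OracleConnector"


def xstream_config(outbound_server, schema, topic_prefix):
    """Build the XStream-specific part of a connector config (connection settings omitted)."""
    return {
        "connector.class": CONNECTOR_CLASS,
        "database.connection.adapter": "xstream",
        "database.out.server.name": outbound_server,
        "schema.include.list": schema,
        "topic.prefix": topic_prefix,
    }


def shared_outbound_servers(configs):
    """Return outbound server names that are used by more than one connector."""
    owners = {}
    for name, cfg in configs.items():
        owners.setdefault(cfg["database.out.server.name"], []).append(name)
    return {server: names for server, names in owners.items() if len(names) > 1}


# Hypothetical connector, server, and schema names.
configs = {
    "oracle-schema-a": xstream_config("DBZ_OUT_A", "SCHEMA_A", "oracle_a"),
    "oracle-schema-b": xstream_config("DBZ_OUT_B", "SCHEMA_B", "oracle_b"),
}
print(shared_outbound_servers(configs))  # {} means no outbound server is shared
```

A check like this only validates the connector side. Each outbound server named here still has to exist in the database before the connector can attach to it.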

LogMiner Deep Dive

The LogMiner strategy uses Oracle's built-in LogMiner utility to query the online and archived redo logs for changes. This approach is more flexible regarding concurrent sessions, but it introduces a critical risk of data duplication if it is not configured carefully. This section details the supported patterns and necessary configurations.

1. Concurrent Readers (Same Schema)

Oracle allows multiple, concurrent LogMiner sessions against the same database logs. If two Debezium connectors are configured to capture the same schema without any further filtering, both will read the same changes and produce duplicate events downstream. This pattern is only viable if the workload is explicitly partitioned.

Verdict: Supported with caution (requires partitioning)

Configuration Notes for Partitioning:

To prevent duplicates, each connector must capture a mutually exclusive set of tables.

Connector 1 Config:

table.include.list=SCHEMA.TABLE_A,SCHEMA.TABLE_B

Connector 2 Config:

table.include.list=SCHEMA.TABLE_C,SCHEMA.TABLE_D

Risk: Any overlap in the table.include.list will result in duplicate change events.
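
One way to catch this risk before deployment is to compare the table lists of all connectors pairwise. The sketch below is a simplification under two stated assumptions: entries are treated as literal SCHEMA.TABLE names (Debezium actually interprets table.include.list entries as regular expressions, which this check does not evaluate), and names are compared case-insensitively. The connector names are hypothetical.

```python
from itertools import combinations


def parse_table_list(value):
    """Split a table.include.list value into a set of normalized table names."""
    return {item.strip().upper() for item in value.split(",") if item.strip()}


def find_overlaps(connectors):
    """Return (connector_a, connector_b, shared_tables) for every overlapping pair."""
    parsed = {name: parse_table_list(value) for name, value in connectors.items()}
    overlaps = []
    for a, b in combinations(sorted(parsed), 2):
        shared = parsed[a] & parsed[b]
        if shared:
            overlaps.append((a, b, sorted(shared)))
    return overlaps


# The two configurations shown above, keyed by hypothetical connector names.
connectors = {
    "connector-1": "SCHEMA.TABLE_A,SCHEMA.TABLE_B",
    "connector-2": "SCHEMA.TABLE_C,SCHEMA.TABLE_D",
}
print(find_overlaps(connectors))  # [] because the lists are disjoint
```

An empty result means the literal lists are disjoint. Lists that rely on regular-expression patterns would need a check against the actual table names in the database instead.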

2. Multiple Schemas (Scale-Out Pattern)

Running multiple Debezium connectors where each captures a different schema is a standard, supported, and recommended scale-out pattern for LogMiner. Each connector runs an independent LogMiner session and emits only the changes for its configured schema. Every session still mines the same redo logs, which is why database load grows with each additional connector.

Verdict: Recommended Configuration

Supported Architecture:

[Diagram: Debezium #1 (Schema A) and Debezium #2 (Schema B) each run their own LogMiner session against the same Oracle DB.]

"If two connectors are configured to capture changes from different tables or schemas within the same database, they can operate concurrently without interference. Each connector would only read the log entries relevant to the tables it is configured to observe." - Debezium Documentation Principles

Performance & Setup Considerations:

  • Supplemental logging must be enabled on the source database for all tables being captured by any connector.
  • Each active LogMiner session adds CPU and I/O overhead. Monitor database performance closely as you add more connectors.
  • Ensure proper sizing of redo logs and archive log retention policies to support the combined read activity.
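
For the supplemental logging requirement, a small helper can generate the statements for the union of tables captured by all connectors. This is a sketch that assumes the commonly documented approach of minimal database-level supplemental logging plus (ALL) COLUMNS logging per captured table; the table names are hypothetical, and the generated statements should be reviewed by a DBA before they are run.

```python
def supplemental_logging_statements(tables):
    """Return SQL statements that enable supplemental logging for the given tables."""
    # Minimal supplemental logging at the database level.
    statements = ["ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;"]
    # Full column logging for each captured table, de-duplicated and sorted.
    for table in sorted(set(tables)):
        statements.append(
            f"ALTER TABLE {table} ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;"
        )
    return statements


# Hypothetical union of the tables captured by two connectors.
captured = ["SCHEMA_A.ORDERS", "SCHEMA_B.CUSTOMERS", "SCHEMA_A.ORDERS"]
for statement in supplemental_logging_statements(captured):
    print(statement)
```

Generating the list from the combined connector configurations helps avoid the case where a newly added connector captures a table that nobody enabled logging for.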

Scaling & High Availability (HA)

Understanding how Debezium scales is crucial for a robust deployment. The Debezium Oracle connector is a single-task connector, which has specific implications for how High Availability (HA) and horizontal scaling are achieved in an environment like Kubernetes.

Connector Parallelism: Single Task Only

The Debezium Oracle Connector does not support internal parallelism. Setting the Kafka Connect tasks.max property to a value greater than 1 will have no effect. The connector will always run as a single task.
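
Because a larger tasks.max is silently ignored, it can be useful to flag it during config review so nobody expects parallelism that will not happen. The helper below is a hypothetical lint function, not part of Debezium or Kafka Connect.

```python
def check_tasks_max(config):
    """Return warnings about the tasks.max setting in a connector config dict."""
    warnings = []
    requested = int(config.get("tasks.max", "1"))
    if requested > 1:
        warnings.append(
            f"tasks.max={requested} requested, but the connector runs a single task; "
            "scale out with additional connectors instead."
        )
    return warnings


print(check_tasks_max({"tasks.max": "4"}))
print(check_tasks_max({"tasks.max": "1"}))
```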

"The spec.class names the Debezium...connector and spec.tasksMax must be 1 because that's all this connector ever uses." - Strimzi Blog (Debezium Deployment Guide)

HA vs. Scale-Out

It is vital to distinguish between High Availability and scaling out the workload.

High Availability (HA)

Achieved by running a single instance with a restart policy. If the instance fails, it is restarted and resumes from its last recorded offset.

[Diagram: a single Debezium Server (replicas: 1) that is restarted on failure.]

Provides: Fault Tolerance
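
The restart-and-resume behavior can be sketched as a small simulation. This is a toy model under stated assumptions: the offsets dict stands in for durable offset storage, the supervise function stands in for a restart policy, and the offset is recorded immediately after each event. A real deployment commits offsets periodically, so some events may be re-delivered after a restart.

```python
def run_connector(changes, offsets, sink, crash_before=None):
    """Emit changes starting at the stored offset; optionally crash at one index."""
    for index in range(offsets.get("position", 0), len(changes)):
        if index == crash_before:
            raise RuntimeError("simulated crash")
        sink.append(changes[index])      # emit the event
        offsets["position"] = index + 1  # then record progress


def supervise(changes, offsets, sink, crash_before=None, max_restarts=3):
    """Restart the connector after a failure; return how many restarts were needed."""
    restarts = 0
    while True:
        try:
            run_connector(changes, offsets, sink, crash_before)
            return restarts
        except RuntimeError:
            if restarts >= max_restarts:
                raise
            restarts += 1
            crash_before = None  # the simulated fault does not recur after restart


changes = ["c1", "c2", "c3", "c4"]
offsets, sink = {}, []
restarts = supervise(changes, offsets, sink, crash_before=2)
print(restarts, sink)
```

The single instance recovers without a second replica because its progress lives in the offset store rather than in the process itself.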

Horizontal Scale-Out

Achieved by running multiple, independent instances, each with a partitioned workload.

[Diagram: Connector A (Schema A) and Connector B (Schema B) running side by side as independent instances.]

Provides: Increased Throughput

Recommended Scaling Pattern

The only way to horizontally scale the capture process is to deploy multiple, independent Debezium Server instances (or, on Kafka Connect, multiple independent connectors). Each one should be configured to handle a disjoint part of the total workload.

Implementation Strategy:

  • Deploy multiple Debezium Server resources in Kubernetes.
  • Set replicas: 1 on each deployment. Fault tolerance comes from the restart policy, not from additional replicas.
  • Assign a specific schema or a non-overlapping list of tables to each deployment.
    • Use table.include.list or schema.include.list to partition the work.
  • This creates multiple parallel streams, increasing overall throughput.
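
The partitioning step in this strategy can be sketched as a round-robin split of the table list. The function names, the deployment name prefix, and the table names below are hypothetical. The sketch balances by table count only, whereas a real split would more likely balance by change volume. The key shown is the Kafka Connect form; Debezium Server prefixes source properties with debezium.source.

```python
def partition_tables(tables, deployments):
    """Split tables into disjoint groups, one per deployment, round-robin."""
    groups = [[] for _ in range(deployments)]
    for index, table in enumerate(sorted(set(tables))):
        groups[index % deployments].append(table)
    return groups


def deployment_properties(tables, deployments, prefix="oracle-cdc"):
    """Map a hypothetical deployment name to its table.include.list value."""
    return {
        f"{prefix}-{i}": {"table.include.list": ",".join(group)}
        for i, group in enumerate(partition_tables(tables, deployments))
        if group  # skip deployments that would have nothing to capture
    }


tables = ["S.ORDERS", "S.CUSTOMERS", "S.ITEMS", "S.PAYMENTS", "S.SHIPMENTS"]
for name, props in deployment_properties(tables, 2).items():
    print(name, props)
```

Because each table is assigned to exactly one group, the resulting lists are disjoint by construction, which is the property the scale-out pattern depends on.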