The Hidden Challenges of the Outbox Pattern

A Quick Refresher: What Is the Outbox Pattern?

Consider the following scenario:

Save Order -> Database Success
Publish Event -> Kafka Failure

The order exists in the database, but no downstream service receives the event.

The opposite scenario is equally problematic:

Publish Event -> Kafka Success
Save Order -> Database Failure

Now, other services believe the order exists even though it was never committed.

This is known as the dual-write problem.

The Outbox Pattern addresses this by storing business data and event data within the same database transaction.

+----------------+
| Business Table |
+----------------+
| Outbox Table   |
+----------------+

If the transaction commits, both records are persisted. A separate processor then reads pending outbox records and publishes them to Kafka.

This guarantees that events are never lost due to application crashes between database writes and message publication.
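Below is a minimal sketch of the write side. It uses Python's built-in `sqlite3` module as a stand-in for the production database. The `orders` and `outbox_events` layouts and the `place_order` helper are illustrative assumptions, not a prescribed schema. Both inserts run inside one transaction, so a failure in either statement rolls back the other.

```python
import json
import sqlite3
import uuid
from typing import Optional


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical tables: one business table plus the outbox table.
    conn.executescript(
        """
        CREATE TABLE orders (
            id     TEXT PRIMARY KEY,
            amount INTEGER NOT NULL
        );
        CREATE TABLE outbox_events (
            id           INTEGER PRIMARY KEY AUTOINCREMENT,
            event_id     TEXT NOT NULL UNIQUE,
            aggregate_id TEXT NOT NULL,
            event_type   TEXT NOT NULL,
            payload      TEXT NOT NULL,
            status       TEXT NOT NULL DEFAULT 'PENDING',
            created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        """
    )


def place_order(conn: sqlite3.Connection, order_id: str, amount: int,
                event_id: Optional[str] = None) -> None:
    event_id = event_id or str(uuid.uuid4())
    payload = json.dumps({"orderId": order_id, "amount": amount})
    # "with conn" commits if both statements succeed and rolls back if either
    # raises, so the order and its OrderCreated event are stored together or not at all.
    with conn:
        conn.execute("INSERT INTO orders (id, amount) VALUES (?, ?)", (order_id, amount))
        conn.execute(
            "INSERT INTO outbox_events (event_id, aggregate_id, event_type, payload) "
            "VALUES (?, ?, 'OrderCreated', ?)",
            (event_id, order_id, payload),
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    place_order(conn, "order-1", 100, event_id="evt-1")
    try:
        # The order insert succeeds, then the outbox insert violates UNIQUE(event_id) ...
        place_order(conn, "order-2", 50, event_id="evt-1")
    except sqlite3.IntegrityError:
        pass
    # ... and the rollback removes order-2 as well.
    print(conn.execute("SELECT id FROM orders").fetchall())  # [('order-1',)]
```

The usage block forces the outbox insert to fail on a duplicate `event_id`. The order written moments earlier in the same transaction is rolled back with it.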

However, guaranteeing consistency is only the beginning.

Challenge #1: Duplicate Messages

One of the most overlooked issues is duplicate event delivery.

Consider the following sequence:

Publish Event -> Kafka Success
Update Outbox Status -> Database Failure

The event has already been published successfully, but the outbox record still appears as pending.

During the next polling cycle:

Processor republishes the same event

The consumer receives:

OrderCreated
OrderCreated
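The sequence above can be reproduced with a small in-memory sketch. `OutboxRecord`, `poll_once`, the `publish` callback (a stand-in for a Kafka producer), and the `mark_sent_fails` flag (which simulates the failed status update) are all hypothetical. The relay publishes first and updates the status second. A failure between the two steps therefore leaves the record pending, and the next cycle sends it again.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class OutboxRecord:
    event_id: str
    event_type: str
    status: str = "PENDING"


def poll_once(outbox: List[OutboxRecord],
              publish: Callable[[OutboxRecord], None],
              mark_sent_fails: bool = False) -> None:
    """One polling cycle: publish every pending record, then mark it as sent."""
    for record in outbox:
        if record.status != "PENDING":
            continue
        publish(record)  # step 1 succeeds: the broker now has the event
        if mark_sent_fails:
            continue  # step 2 fails (simulated database error): status stays PENDING
        record.status = "SENT"


if __name__ == "__main__":
    delivered: List[str] = []
    outbox = [OutboxRecord("evt-1", "OrderCreated")]
    poll_once(outbox, lambda r: delivered.append(r.event_type), mark_sent_fails=True)
    poll_once(outbox, lambda r: delivered.append(r.event_type))
    print(delivered)  # ['OrderCreated', 'OrderCreated']
```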

The Outbox Pattern provides at-least-once delivery, not exactly-once delivery.

As a result, consumers must be idempotent.

Common approaches include:

Event IDs
Business keys
Deduplication tables
Idempotent state transitions

Never assume an event will be delivered only once.

Challenge #2: Event Ordering

Imagine a product receives three updates:

ProductUpdated V1
ProductUpdated V2
ProductUpdated V3

A poller retrieves all three records.

During processing:

V1 -> Failed
V2 -> Success
V3 -> Success

Because V1 is only retried on a later polling cycle, consumers may observe:

V2
V3
V1

If consumers apply updates in arrival order, the product ends up in its V1 state, even though V3 is the latest version.

Possible Solutions

Key-Based Ordering

Group events by aggregate identifier:

product_id
order_id
customer_id

Events that share a key are processed sequentially, while events with different keys can still be processed in parallel.
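Here is a minimal sketch of key-based routing. It assumes a fixed number of worker lanes, each processing its events one at a time. Every event is assigned to a lane by hashing its aggregate identifier, so all events for one key land in the same lane in their original order, while different keys spread across lanes. Kafka's default partitioner applies the same idea to keyed messages, using a murmur2 hash rather than the CRC32 used here, and Kafka preserves order only within a partition. The `(key, event)` tuple shape and the lane count are assumptions.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

# Illustrative event shape: (aggregate key, event name), e.g. ("product-1", "V2").
Event = Tuple[str, str]


def lane_for(key: str, num_lanes: int) -> int:
    # A stable hash keeps a key on the same lane across processes and restarts;
    # Python's built-in hash() of str is randomized per process, so it is avoided.
    return zlib.crc32(key.encode("utf-8")) % num_lanes


def route(events: List[Event], num_lanes: int) -> Dict[int, List[Event]]:
    """Split events (already in created_at order) into per-lane FIFO lists."""
    lanes: Dict[int, List[Event]] = defaultdict(list)
    for key, name in events:
        lanes[lane_for(key, num_lanes)].append((key, name))
    return dict(lanes)


if __name__ == "__main__":
    events = [("product-1", "V1"), ("product-2", "V1"),
              ("product-1", "V2"), ("product-1", "V3")]
    for lane, queue in sorted(route(events, num_lanes=4).items()):
        # One worker per lane would process its queue strictly in this order.
        print(lane, queue)
```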

Skip Subsequent Events

If an event fails:

V1 -> Failed

then:

Skip V2
Skip V3

until V1 succeeds.

This preserves ordering at the cost of throughput: a single failing event holds back every later event for the same key until it is published.
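The skip rule can be sketched as a single polling pass over records in creation order. Once a record for a key fails, every later record for that key stays pending until the next cycle, while records for other keys keep flowing. The `Record` type and the `publish` callback, which returns whether the send succeeded, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Record:
    key: str
    version: str
    status: str = "PENDING"


def poll_preserving_order(records: List[Record],
                          publish: Callable[[Record], bool]) -> None:
    """Publish pending records in order; block a key after its first failure."""
    blocked: Set[str] = set()
    for record in records:  # assumed sorted by created_at
        if record.status != "PENDING":
            continue
        if record.key in blocked:
            continue  # an earlier event for this key is still unpublished
        if publish(record):
            record.status = "SENT"
        else:
            blocked.add(record.key)  # e.g. V1 failed, so V2 and V3 wait
```

Other keys are unaffected. A failure on `product-1` does not delay `product-2`.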

Challenge #3: Polling Performance

Polling works well initially.

Eventually, the outbox table grows large, and every polling cycle runs a query like this against it:

SELECT *
FROM outbox_events
WHERE status = 'PENDING'
ORDER BY created_at
LIMIT 1000;

Problems start appearing:

Full table scans
Increased I/O
Longer query execution times
Lock contention

Mitigations

Create an index that matches the polling query's filter and sort order:

CREATE INDEX idx_outbox_status_created_at
ON outbox_events(status, created_at);

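Consider range-partitioning the table by creation time. In PostgreSQL, the clause is part of the table definition:

CREATE TABLE outbox_events (...)
PARTITION BY RANGE (created_at);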

Without proper indexing, polling quickly becomes a bottleneck.
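The effect of the composite index can be checked with the query planner. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` because SQLite ships with Python. PostgreSQL's `EXPLAIN` output looks different, and the three-column table is a simplified assumption. Before the index exists, the plan is a full scan followed by a temporary B-tree for the `ORDER BY`. Once the index exists, the planner seeks directly to the pending rows, which the index already keeps sorted by `created_at`.

```python
import sqlite3

# Simplified polling query: filters on status, sorts on created_at.
POLL_QUERY = (
    "SELECT id FROM outbox_events "
    "WHERE status = 'PENDING' ORDER BY created_at LIMIT 1000"
)


def plan(conn: sqlite3.Connection) -> list:
    # Each EXPLAIN QUERY PLAN row ends with a human-readable "detail" column.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + POLL_QUERY)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE outbox_events ("
        "id INTEGER PRIMARY KEY, status TEXT NOT NULL, created_at TEXT NOT NULL)"
    )
    print(plan(conn))  # expect a SCAN plus "USE TEMP B-TREE FOR ORDER BY"
    conn.execute(
        "CREATE INDEX idx_outbox_status_created_at "
        "ON outbox_events(status, created_at)"
    )
    print(plan(conn))  # expect a SEARCH using idx_outbox_status_created_at, no sort step
```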

Challenge #4: Multiple Pollers

A single poller is often not enough in production, either for throughput or for availability.

Suppose two pollers run simultaneously:

Poller A
Poller B

Both select the same record:

Event #123

Result:

Duplicate publication

Solution: SKIP LOCKED

In PostgreSQL:

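SELECT *
FROM outbox_events
WHERE status = 'PENDING'
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED;

Rows already locked by another poller's transaction are skipped rather than waited on. Each poller therefore claims a different set of rows, and no two pollers publish the same event concurrently. The locks last only until the transaction ends. The poller must either publish the rows and mark them as sent before committing, or record its claim (for example, with an in-progress status) inside that transaction.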

This technique is essential when horizontally scaling Outbox processors.
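The skipping behavior can be illustrated without a database. In the in-memory model below, each row carries a lock, and a poller claims a row only if it can take that lock without waiting. This simulates the semantics for illustration only; it is not how PostgreSQL implements row locks. `OutboxRow`, `claim_batch`, and `commit_batch` are hypothetical names.

```python
import threading
from dataclasses import dataclass, field
from typing import List


@dataclass
class OutboxRow:
    id: int
    status: str = "PENDING"
    lock: threading.Lock = field(default_factory=threading.Lock,
                                 repr=False, compare=False)


def claim_batch(rows: List[OutboxRow], limit: int) -> List[OutboxRow]:
    """Model of: WHERE status = 'PENDING' ORDER BY created_at LIMIT n FOR UPDATE SKIP LOCKED."""
    claimed: List[OutboxRow] = []
    for row in rows:  # rows assumed to be in created_at order
        if len(claimed) == limit:
            break
        if row.status != "PENDING":
            continue
        if not row.lock.acquire(blocking=False):
            continue  # held by another poller: skip instead of waiting
        if row.status == "PENDING":  # re-check after locking
            claimed.append(row)
        else:
            row.lock.release()
    return claimed


def commit_batch(batch: List[OutboxRow]) -> None:
    """Mark claimed rows as sent and release their locks (the COMMIT)."""
    for row in batch:
        row.status = "SENT"
        row.lock.release()
```

With five pending rows, two back-to-back calls to `claim_batch(rows, 2)` return rows 0 and 1, then rows 2 and 3. The second call skips the rows the first call still holds.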

Challenge #5: Outbox Table Growth

Outbox records accumulate rapidly.

After months of operation:

10 million rows
100 million rows
1 billion rows

Even if records are marked as processed:

status = SENT

they still consume storage and impact query performance.

Common Strategies

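Delete

Delete rows after successful publication

Archive

Move published rows to an archive table before deleting them

Partition

Use monthly partitions and drop each partition once it falls outside the retention window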

Choose based on auditing requirements and retention policies.
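A periodic cleanup job for published rows might look like the sketch below, with `sqlite3` standing in for the real database. The `outbox_archive` table (assumed to have the same columns as `outbox_events`), the ISO-8601 `cutoff` string, and the batch size are assumptions. Each batch is removed in its own short transaction. When archiving is enabled, the copy and the delete use the same predicate inside that transaction, so a row ends up in exactly one of the two tables.

```python
import sqlite3


def cleanup_batch(conn: sqlite3.Connection, cutoff: str,
                  batch_size: int = 500, archive: bool = False) -> int:
    """Remove one batch of SENT rows created before cutoff; return the count."""
    # Highest id among the oldest eligible rows; cutoff compares as an ISO-8601 string.
    (max_id,) = conn.execute(
        "SELECT MAX(id) FROM (SELECT id FROM outbox_events "
        "WHERE status = 'SENT' AND created_at < ? ORDER BY id LIMIT ?)",
        (cutoff, batch_size),
    ).fetchone()
    if max_id is None:
        return 0  # nothing left to clean up
    predicate = "status = 'SENT' AND created_at < ? AND id <= ?"
    params = (cutoff, max_id)
    with conn:  # archive copy and delete commit (or roll back) together
        if archive:
            conn.execute(
                f"INSERT INTO outbox_archive SELECT * FROM outbox_events WHERE {predicate}",
                params,
            )
        cur = conn.execute(f"DELETE FROM outbox_events WHERE {predicate}", params)
    return cur.rowcount


def cleanup(conn: sqlite3.Connection, cutoff: str,
            batch_size: int = 500, archive: bool = False) -> int:
    """Repeat short batches until no eligible rows remain; return the total removed."""
    total = 0
    while True:
        removed = cleanup_batch(conn, cutoff, batch_size, archive)
        if removed == 0:
            return total
        total += removed
```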

Challenge #6: Batch vs Row-by-Row Publishing

There are two common publishing approaches.

Row-by-Row

1000 rows
1000 Kafka sends

Advantages:

Simple
Easier error handling

Disadvantages:

Lower throughput

Batch Publishing

1000 rows
1 batch operation

Advantages:

Higher throughput
Reduced network overhead

Disadvantages:

Partial success scenarios
More complex recovery logic

The right choice depends on your throughput requirements and operational complexity tolerance.
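The partial-success case is where most of the extra batch logic goes. One way to handle it, sketched below, is to collect one outcome per record and mark only the acknowledged records as sent, leaving the rest pending for the next cycle. The `send_batch` callback and its list-of-booleans return value are assumptions. Real Kafka clients report per-record outcomes through mechanisms such as send futures or delivery callbacks.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class PendingEvent:
    event_id: str
    payload: str
    status: str = "PENDING"


def publish_batch(events: List[PendingEvent],
                  send_batch: Callable[[Sequence[PendingEvent]], List[bool]]
                  ) -> List[PendingEvent]:
    """Send all pending events in one batch; return the ones that must be retried."""
    pending = [e for e in events if e.status == "PENDING"]
    if not pending:
        return []
    results = send_batch(pending)  # one outcome per event, in the same order
    if len(results) != len(pending):
        raise ValueError("send_batch must return one result per event")
    retry: List[PendingEvent] = []
    for event, ok in zip(pending, results):
        if ok:
            event.status = "SENT"
        else:
            retry.append(event)  # stays PENDING and is picked up on the next poll
    return retry
```

Suppose events for the same key share a batch and one in the middle fails. Later events for that key still get through, which reintroduces the ordering problem from Challenge #2. Combining batch mode with the skip rule is one way to close that gap.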

Challenge #7: Payload Storage Design

The outbox payload itself deserves careful consideration.

JSONB

Advantages:

Human-readable
Easy debugging
Queryable

Disadvantages:

Larger storage footprint
Potentially higher serialization overhead

TEXT / LOB

Advantages:

Simpler storage model
Potentially faster reads

Disadvantages:

Harder to inspect manually
Limited querying capabilities

For most business systems, the operational visibility of JSONB outweighs the modest storage savings.

When Outbox Is Not Enough

As throughput increases, polling may become the dominant bottleneck.

At that point, teams often evaluate Change Data Capture (CDC) solutions such as Debezium.

Outbox:

Application
    ↓
Outbox Table
    ↓
Poller
    ↓
Kafka

CDC:

Application
    ↓
Database
    ↓
Transaction Log
    ↓
Debezium
    ↓
Kafka

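CDC removes the need for application-level polling and can significantly improve scalability. It does not have to replace the outbox table. Debezium, for example, can capture the outbox table's inserts from the transaction log and route them to Kafka topics, keeping the transactional write while dropping the poller.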

Another alternative is Event Sourcing, where events become the primary source of truth rather than a side effect of state changes.

Final Thoughts

The Outbox Pattern is one of the most practical solutions for addressing the dual-write problem.

However, adopting it means accepting a different set of challenges.

You trade:

Distributed Transaction Complexity

for:

Polling Complexity
Ordering Complexity
Retry Complexity
Cleanup Complexity
Idempotency Complexity

For most systems, that trade-off is absolutely worth it.

The important thing is understanding that the Outbox Pattern is not a silver bullet. It guarantees consistency between your database and your event stream, but building a reliable production-grade implementation requires careful consideration of ordering, retries, scalability, and operational maintenance.

