The Hidden Challenges of the Outbox Pattern
A Quick Refresher: What Is the Outbox Pattern?
Consider a service that saves an order to its database and then publishes an event to Kafka as two independent writes:
Save Order -> Database Success
Publish Event -> Kafka Failure
The order exists in the database, but no downstream service receives the event.
The opposite scenario is equally problematic:
Publish Event -> Kafka Success
Save Order -> Database Failure
Now, other services believe the order exists even though it was never committed.
This is known as the dual-write problem.
The Outbox Pattern addresses this by storing business data and event data within the same database transaction.
+----------------+
| Business Table |
+----------------+
| Outbox Table |
+----------------+
If the transaction commits, both records are persisted; if it rolls back, neither is. A separate processor (often called a relay or poller) then reads pending outbox records and publishes them to Kafka.
This guarantees that an event is not lost when the application crashes between the database write and message publication: the pending outbox record survives the crash and is published on a later run.
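The transactional write can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module; the table names, column names, and the create_order helper are assumptions made for the example, not part of any particular framework.

```python
import json
import sqlite3
import uuid

# Hypothetical schema for illustration; all names here are assumptions.
SCHEMA = """
CREATE TABLE orders (
    id TEXT PRIMARY KEY,
    customer_id TEXT NOT NULL,
    total_cents INTEGER NOT NULL
);
CREATE TABLE outbox_events (
    id TEXT PRIMARY KEY,
    aggregate_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'PENDING',
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def create_order(conn, customer_id, total_cents):
    """Insert the order and its OrderCreated event in one transaction."""
    order_id = str(uuid.uuid4())
    payload = json.dumps(
        {"order_id": order_id, "customer_id": customer_id, "total_cents": total_cents}
    )
    # `with conn` commits if the block succeeds and rolls back on any
    # exception, so either both rows are written or neither is.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, customer_id, total_cents) VALUES (?, ?, ?)",
            (order_id, customer_id, total_cents),
        )
        conn.execute(
            "INSERT INTO outbox_events (id, aggregate_id, event_type, payload) "
            "VALUES (?, ?, ?, ?)",
            (str(uuid.uuid4()), order_id, "OrderCreated", payload),
        )
    return order_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
order_id = create_order(conn, "customer-1", 4999)
```

Note that nothing is sent to Kafka here. The application only writes rows; publishing is left entirely to the separate processor.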
However, guaranteeing consistency is only the beginning.
Challenge #1: Duplicate Messages
One of the most overlooked issues is duplicate event delivery.
Consider the following sequence:
Publish Event -> Kafka Success
Update Outbox Status -> Database Failure
The event has already been published successfully, but the outbox record still appears as pending.
During the next polling cycle:
Processor republishes the same event
The consumer receives:
OrderCreated
OrderCreated
The Outbox Pattern provides at-least-once delivery, not exactly-once delivery.
As a result, consumers must be idempotent.
Common approaches include:
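Event IDs: give every event a unique ID and ignore IDs that were already processed
Business keys: derive uniqueness from the data itself, such as an order number
Deduplication tables: record processed event IDs in the same transaction as the side effect
Idempotent state transitions: make applying the same change twice leave the state unchanged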
Never assume an event will be delivered only once.
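As one illustration of the deduplication-table approach, the sketch below records each event ID in the same transaction as the consumer's side effect. It uses sqlite3 for self-containment; the processed_events and order_projection tables, the event shape, and the handle_order_created function are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
    CREATE TABLE order_projection (order_id TEXT PRIMARY KEY, status TEXT NOT NULL);
    """
)


def handle_order_created(conn, event):
    """Apply the event once; return False if it was already processed."""
    with conn:  # the dedup record and the side effect commit together
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event["event_id"],),
        )
        if cur.rowcount == 0:
            return False  # duplicate delivery: this event ID is already recorded
        conn.execute(
            "INSERT INTO order_projection (order_id, status) VALUES (?, 'CREATED')",
            (event["order_id"],),
        )
    return True


event = {"event_id": "evt-1", "order_id": "order-42"}
first = handle_order_created(conn, event)
second = handle_order_created(conn, event)  # redelivery of the same event
```

The second call is a no-op, so a republished OrderCreated does not create a second order in the consumer's view.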
Challenge #2: Event Ordering
Imagine a product receives three updates:
ProductUpdated V1
ProductUpdated V2
ProductUpdated V3
A poller retrieves all three records.
During processing:
V1 -> Failed
V2 -> Success
V3 -> Success
Once V1 is retried and succeeds, consumers observe:
V2
V3
V1
A consumer that applies updates in arrival order ends up with V1 as its final state instead of V3.
Possible Solutions
Key-Based Ordering
Group events by aggregate identifier:
product_id
order_id
customer_id
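Events sharing the same key are processed sequentially, while events with different keys can still proceed in parallel. With Kafka, using the aggregate identifier as the message key also routes those events to the same partition, which preserves their order for consumers.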
Skip Subsequent Events
If an event fails:
V1 -> Failed
then:
Skip V2
Skip V3
until V1 succeeds.
This preserves ordering at the cost of throughput, since one failing event holds back every later event for the same key.
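The two ideas combine naturally: process events in creation order, and once an event fails, hold back later events for the same key only. A minimal sketch, assuming a hypothetical publish callable that raises on failure and events given as (event_id, key, payload) tuples:

```python
def publish_in_key_order(events, publish):
    """Publish events in order; once a key fails, defer its later events.

    `events` is a list of (event_id, key, payload) tuples sorted by creation
    order. `publish` is a hypothetical callable that raises on failure.
    Returns (sent_ids, deferred_ids).
    """
    blocked_keys = set()
    sent, deferred = [], []
    for event_id, key, payload in events:
        if key in blocked_keys:
            deferred.append(event_id)  # an earlier event for this key failed
            continue
        try:
            publish(key, payload)
            sent.append(event_id)
        except Exception:
            blocked_keys.add(key)
            deferred.append(event_id)
    return sent, deferred


events = [
    ("e1", "product-1", "V1"),
    ("e2", "product-1", "V2"),
    ("e3", "product-1", "V3"),
    ("e4", "product-2", "V1"),
]


def flaky_publish(key, payload):
    # Simulated broker failure for one specific event.
    if (key, payload) == ("product-1", "V1"):
        raise ConnectionError("broker unavailable")


sent, deferred = publish_in_key_order(events, flaky_publish)
```

Here only product-2's event is sent. All three product-1 events stay pending and are retried together, in order, on the next cycle.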
Challenge #3: Polling Performance
Polling works well while the outbox table is small.
As the table grows, a query like the following becomes expensive:
SELECT *
FROM outbox_events
WHERE status = 'PENDING'
ORDER BY created_at
LIMIT 1000;
Problems start appearing:
Full table scans
Increased I/O
Longer query execution times
Lock contention
Mitigations
Create proper indexes:
CREATE INDEX idx_outbox_status_created_at
ON outbox_events(status, created_at);
Consider partitioning:
PARTITION BY RANGE(created_at);
Without proper indexing, polling quickly becomes a bottleneck.
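One way to check that the index is actually used is to compare query plans before and after creating it. The sketch below does this with SQLite as a stand-in, purely so it runs without a server; the table definition is an assumption, and PostgreSQL's EXPLAIN output and planner decisions will look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE outbox_events (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL,
        created_at TEXT NOT NULL,
        payload TEXT NOT NULL
    )
    """
)

POLL_QUERY = """
SELECT id, payload
FROM outbox_events
WHERE status = 'PENDING'
ORDER BY created_at
LIMIT 1000
"""


def query_plan(conn, sql):
    """Return SQLite's plan description for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail


plan_before = query_plan(conn, POLL_QUERY)
conn.execute(
    "CREATE INDEX idx_outbox_status_created_at ON outbox_events(status, created_at)"
)
plan_after = query_plan(conn, POLL_QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```

Because the index leads with status and then created_at, it can serve both the filter and the sort order of the polling query.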
Challenge #4: Multiple Pollers
A single poller is rarely enough in production.
Suppose two pollers run simultaneously:
Poller A
Poller B
Both select the same record:
Event #123
Result:
Duplicate publication
Solution: SKIP LOCKED
In PostgreSQL:
SELECT *
FROM outbox_events
WHERE status = 'PENDING'
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED;
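Rows locked by one poller are skipped by the others, so each poller works on a different batch. The locks are held only until the transaction ends, so the SELECT, the publish, and the status update must all run inside the same transaction.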
This technique is essential when horizontally scaling Outbox processors.
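A poller built on this query might look like the sketch below. It assumes a DB-API connection to PostgreSQL with autocommit off and the %s parameter style (as in psycopg), plus a hypothetical publish callable that raises on failure. The poll_once function and the id and payload columns are illustrative.

```python
CLAIM_SQL = """
SELECT id, payload
FROM outbox_events
WHERE status = 'PENDING'
ORDER BY created_at
LIMIT %s
FOR UPDATE SKIP LOCKED
"""

MARK_SENT_SQL = "UPDATE outbox_events SET status = 'SENT' WHERE id = %s"


def poll_once(conn, publish, batch_size=100):
    """Claim a batch, publish it, and mark it sent in one transaction.

    `conn` is assumed to be a DB-API connection to PostgreSQL with
    autocommit off; `publish` is a hypothetical callable that raises on
    failure. Returns the number of events published.
    """
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_SQL, (batch_size,))
        rows = cur.fetchall()
        for event_id, payload in rows:
            publish(payload)
            cur.execute(MARK_SENT_SQL, (event_id,))
        conn.commit()  # ends the transaction and releases the row locks
        return len(rows)
    except Exception:
        conn.rollback()  # rows stay PENDING and become visible to other pollers
        raise
    finally:
        cur.close()
```

If the transaction rolls back after some events were already published, those events are sent again on a later cycle. SKIP LOCKED prevents two pollers from working on the same row at the same time; it does not remove the need for idempotent consumers.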
Challenge #5: Outbox Table Growth
Outbox records accumulate rapidly.
After months of operation:
10 million rows
100 million rows
1 billion rows
Even if records are marked as processed:
status = SENT
they still consume storage and impact query performance.
Common Strategies
Delete: remove records after successful publication
Archive: move processed records to a history table
Partition: use monthly partitions so that old data can be dropped one partition at a time
Choose based on auditing requirements and retention policies.
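If you choose deletion with a retention window, deleting in small batches keeps each transaction short. A minimal sketch, again using sqlite3 for self-containment; the purge_sent_events function, the batch size, and the date-string cutoff are assumptions for illustration.

```python
import sqlite3


def purge_sent_events(conn, older_than, batch_size=1000):
    """Delete SENT rows created before `older_than`, one batch per transaction."""
    total = 0
    while True:
        with conn:  # each batch commits on its own
            cur = conn.execute(
                """
                DELETE FROM outbox_events
                WHERE id IN (
                    SELECT id FROM outbox_events
                    WHERE status = 'SENT' AND created_at < ?
                    LIMIT ?
                )
                """,
                (older_than, batch_size),
            )
        if cur.rowcount == 0:
            return total  # nothing left to delete
        total += cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox_events ("
    "id INTEGER PRIMARY KEY, status TEXT NOT NULL, created_at TEXT NOT NULL)"
)
with conn:
    conn.executemany(
        "INSERT INTO outbox_events (status, created_at) VALUES (?, ?)",
        [("SENT", "2024-01-01")] * 5
        + [("SENT", "2024-06-01"), ("PENDING", "2024-01-01")],
    )

deleted = purge_sent_events(conn, older_than="2024-03-01", batch_size=2)
```

Pending rows are never touched, regardless of age, so an event that has not been published yet cannot be purged by accident.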
Challenge #6: Batch vs Row-by-Row Publishing
There are two common publishing approaches.
Row-by-Row
1000 rows
1000 Kafka sends
Advantages:
Simple
Easier error handling
Disadvantages:
Lower throughput
Batch Publishing
1000 rows
1 batch operation
Advantages:
Higher throughput
Reduced network overhead
Disadvantages:
Partial success scenarios
More complex recovery logic
The right choice depends on your throughput requirements and operational complexity tolerance.
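The partial-success case is the part of batch publishing that needs the most care. The sketch below shows one way to split a batch by outcome so that only acknowledged events are marked as sent. The send_batch callable and its one-boolean-per-payload return shape are assumptions for illustration, not a real Kafka client API.

```python
def publish_batch(rows, send_batch):
    """Publish `rows` as one batch and split them by outcome.

    `rows` is a list of (event_id, payload) tuples. `send_batch` is a
    hypothetical callable that takes the payloads and returns one boolean
    per payload, where True means the broker acknowledged it.
    Returns (sent_ids, retry_ids).
    """
    results = send_batch([payload for _, payload in rows])
    if len(results) != len(rows):
        raise ValueError("send_batch must return one result per row")
    sent_ids = [event_id for (event_id, _), ok in zip(rows, results) if ok]
    retry_ids = [event_id for (event_id, _), ok in zip(rows, results) if not ok]
    return sent_ids, retry_ids


rows = [(1, "a"), (2, "b"), (3, "c")]


def flaky_send(payloads):
    # Simulated broker that rejects the second message in the batch.
    return [True, False, True]


sent_ids, retry_ids = publish_batch(rows, flaky_send)
```

Only the IDs in sent_ids should be updated to SENT; the rest remain pending for the next cycle. If ordering matters, events that follow a failed event for the same key may also need to be held back.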
Challenge #7: Payload Storage Design
The outbox payload itself deserves careful consideration.
JSONB
Advantages:
Human-readable
Easy debugging
Queryable
Disadvantages:
Larger storage footprint
Potentially higher serialization overhead
TEXT / LOB
Advantages:
Simpler storage model
Potentially faster reads
Disadvantages:
Harder to inspect manually when the payload is binary or compressed
Limited querying capabilities
For most business systems, operational visibility outweighs minor storage savings.
When Outbox Is Not Enough
As throughput increases, polling may become the dominant bottleneck.
At that point, teams often evaluate Change Data Capture (CDC) solutions such as Debezium.
Outbox:
Application
↓
Outbox Table
↓
Poller
↓
Kafka
CDC:
Application
↓
Database
↓
Transaction Log
↓
Debezium
↓
Kafka
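CDC removes the need for application-level polling because changes are read from the database's transaction log instead of being queried from a table, which can significantly improve scalability. CDC can also be combined with an outbox table: the application still writes outbox records, and Debezium streams them to Kafka in place of a poller.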
Another alternative is Event Sourcing, where events become the primary source of truth rather than a side effect of state changes.
Final Thoughts
The Outbox Pattern is one of the most practical solutions for addressing the dual-write problem.
However, adopting it means accepting a different set of challenges.
You trade:
Distributed Transaction Complexity
for:
Polling Complexity
Ordering Complexity
Retry Complexity
Cleanup Complexity
Idempotency Complexity
For most systems, that trade-off is absolutely worth it.
The important thing is understanding that the Outbox Pattern is not a silver bullet. It guarantees that every committed change is eventually published to your event stream, at least once, but building a reliable production-grade implementation requires careful consideration of ordering, retries, scalability, and operational maintenance.