Mechanics of durable subscription
There are still three spots left on Greg Young’s Advanced Class in Krakow (February 6th-8th). Go register here before they are gone.
Today I’d like to talk a bit about the mechanics of a robust durable subscription system. First, what is a durable subscription? Here’s a description from Enterprise Integration Patterns (I really recommend this book):
Use a Durable Subscriber to make the messaging system save messages published while the subscriber is disconnected.
The goal is not to lose messages while the subscriber is offline (for whatever reason). There are many possible implementations, depending on the specifics of the messaging system we are working with. The main distinction between them is who is responsible for maintaining the state of the subscription: the publisher or the subscriber.
In theory, the mechanics are simple, no matter who is responsible for the state. If the subscriber goes down, the state simply stops being updated while new messages are generated (and stored). When the subscriber comes back up, the publisher picks up where it left off based on the state and sends all the missed messages. When the publisher is responsible for the state, the only additional step is an acknowledgement the subscriber has to send every N messages (where N depends on how much you dislike duplicates).
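A minimal sketch of the publisher-side variant, assuming an in-memory log; the names (`DurablePublisher`, `ack`, `missed`) are illustrative, not any real messaging API:

```python
# Sketch: publisher tracks subscription state as "last acknowledged position".
# On reconnect, everything after that position is redelivered.

class DurablePublisher:
    def __init__(self):
        self.log = []     # all messages ever published, in order
        self.acked = {}   # subscriber id -> index just past the last acked message

    def publish(self, message):
        self.log.append(message)

    def missed(self, subscriber_id):
        """Messages generated since the subscriber's last acknowledgement."""
        start = self.acked.get(subscriber_id, 0)
        return self.log[start:]

    def ack(self, subscriber_id, position):
        # The subscriber acknowledges every N messages; a smaller N means
        # fewer duplicates after a crash, at the cost of more ack traffic.
        self.acked[subscriber_id] = position


pub = DurablePublisher()
for i in range(7):
    pub.publish(f"msg-{i}")
pub.ack("sub-1", 5)          # subscriber confirmed processing up to msg-4
print(pub.missed("sub-1"))   # → ['msg-5', 'msg-6']
```

Note that anything published after the last acknowledgement, but before a crash, will be delivered again, which is why the subscriber's processing should tolerate duplicates.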
In practice, however, many messaging platforms (including EventStore, which has recently become my favourite) offer two distinct APIs for implementing subscriptions:
- push API, where a client (subscriber) opens a persistent channel (a TCP connection) through which a server (publisher) sends messages as soon as they are generated
- pull API, where a client (subscriber) periodically polls a server (publisher), asking for any messages newer than X, where X is the current state of the subscription
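The two shapes can be contrasted in a toy sketch, assuming an in-memory server; the names (`subscribe_push`, `read_after`) are made up for illustration, not a real client API:

```python
# Sketch: the same log exposed through a push callback and a pull read.

class Server:
    def __init__(self):
        self.log = []
        self.push_handlers = []

    def append(self, message):
        self.log.append(message)
        for handler in self.push_handlers:
            handler(message)        # push API: delivered as soon as generated

    def subscribe_push(self, handler):
        self.push_handlers.append(handler)

    def read_after(self, position, batch_size=100):
        # pull API: "give me anything newer than my current position"
        return self.log[position:position + batch_size]


server = Server()
received = []
server.subscribe_push(received.append)
server.append("a")                  # arrives via push immediately
print(received)                     # → ['a']

# A pull client instead keeps its own position and polls:
position = 0
batch = server.read_after(position)
position += len(batch)
print(batch, position)              # → ['a'] 1
```

The key structural difference is visible here: with push, the server holds per-subscriber delivery state (and buffers); with pull, the client owns its position and the server stays stateless.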
The advantage of the push API is that it is usually a lot faster: less effort on the publisher side and minimal latency. The problem is that it works only when both parties are up and running and there is no lag. If the subscriber processes messages more slowly than the publisher generates them, sooner or later the buffers on the publisher side grow too big to fit in memory. EventStore is smart enough to drop the subscription automatically in such a case.
There is no such problem with the pull API, because the client requests messages at its own pace and can even crash every now and then without the publisher ever noticing. This comes at a price, however: the pull API is inherently slower (meaning higher latency) and it requires additional reads on the publisher side.
As usual, the best solution combines both approaches. The idea is simple: when the subscriber is up to date with the message stream, it switches to the push API, and whenever it falls behind, it switches back to pull. The actual working algorithm is slightly more complex:
1. The subscriber always starts with pull, assuming some messages were generated while it was offline.
2. The subscriber pulls messages until there is nothing left to pull (it is up to date with the stream).
3. A push subscription is started, but arriving messages are not processed immediately; they are temporarily redirected to a buffer.
4. One last pull is done to cover anything that happened between steps 2 and 3.
5. Messages from this last pull are processed.
6. Processing of messages from the push buffer begins. As they are processed, they are checked against the IDs of messages processed in step 5 to ensure there are no duplicates.
7. The system works in push mode until the subscriber is killed or the publisher drops the push subscription.
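The steps above can be sketched end to end in a single-threaded toy, assuming an in-memory stream; the class and method names (`Stream`, `CatchUpSubscriber`, `catch_up`, `go_live`) are illustrative, not the EventStore client API:

```python
# Sketch: pull until caught up, buffer pushed messages, do one last pull,
# then drain the buffer while skipping duplicates by message ID.

class Stream:
    def __init__(self):
        self.events = []            # the full log, in order
        self.push_handler = None

    def append(self, event):
        self.events.append(event)
        if self.push_handler:       # push API: deliver immediately
            self.push_handler(event)

    def read_after(self, position):
        return self.events[position:]   # pull API: everything newer


class CatchUpSubscriber:
    def __init__(self, stream):
        self.stream = stream
        self.position = 0           # subscription state
        self.buffer = []            # push messages parked during catch-up
        self.processed = []
        self.live = False

    def _process(self, event):
        self.processed.append(event["id"])
        self.position += 1

    def catch_up(self):
        # Steps 1-2: pull until there is nothing left, then start the push
        # subscription, redirecting arriving messages to a buffer (step 3).
        while True:
            batch = self.stream.read_after(self.position)
            if not batch:
                break
            for event in batch:
                self._process(event)
        self.stream.push_handler = self._on_push

    def _on_push(self, event):
        if self.live:
            self._process(event)
        else:
            self.buffer.append(event)

    def go_live(self):
        # Steps 4-5: one last pull covers the gap between steps 2 and 3.
        last_pull_ids = set()
        for event in self.stream.read_after(self.position):
            self._process(event)
            last_pull_ids.add(event["id"])
        # Step 6: drain the push buffer, skipping duplicates from the last pull.
        for event in self.buffer:
            if event["id"] not in last_pull_ids:
                self._process(event)
        self.buffer.clear()
        self.live = True            # step 7: pure push from now on


stream = Stream()
for i in range(3):
    stream.append({"id": i})

sub = CatchUpSubscriber(stream)
sub.catch_up()                 # processes 0, 1, 2 via pull
stream.append({"id": 3})       # lands in the gap: buffered AND visible to pull
sub.go_live()                  # last pull processes 3; the buffered copy is skipped
stream.append({"id": 4})       # live push, processed immediately
print(sub.processed)           # → [0, 1, 2, 3, 4]
```

A real implementation would run the push delivery on another thread and guard the buffer with a lock, but the ordering of the phases, and the duplicate check against the last pull, stay the same.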
This algorithm is the single most important thing in my new project.