For the last couple of days I've been digging deeper and deeper into EventStore, a database for storing event streams. I want to share some of my findings, starting with the client API. First, there is no dualism of *Session and *SessionFactory. This means that the assumptions you carry over from working with NHibernate and RavenDB might not hold. In EventStore there is only one façade class in the client API: EventStoreConnection.
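To give you an idea, this is roughly what working with that single façade looks like. It is only a minimal sketch: treat the exact overloads and method names as an approximation, since they have shifted between client versions.

    using System;
    using System.Net;
    using System.Text;
    using EventStore.ClientAPI;

    class Program
    {
        static void Main()
        {
            // One connection object plays the role of both "session" and "session factory".
            var connection = EventStoreConnection.Create(new IPEndPoint(IPAddress.Loopback, 1113));
            connection.ConnectAsync().Wait();

            var payload = Encoding.UTF8.GetBytes("{\"value\":42}");
            var evt = new EventData(Guid.NewGuid(), "SomethingHappened", true, payload, null);

            // All reads and writes go through this single facade.
            connection.AppendToStreamAsync("test-stream", ExpectedVersion.Any, evt).Wait();

            connection.Close();
        }
    }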

Thread safety

The connection is thread-safe, meaning you can share it between all threads in the application. This is not surprising, as all this class does is send commands to the server, receive the responses, and pair up requests with responses so the results reach the proper thread. At this point you may ask about the exceptions coming from the connection. All NHibernate users know that after ISession throws an exception it is no longer usable, because its internal state may be corrupted. Fortunately that is not the case with the EventStore client API. Otherwise it would be a nightmare to use from multiple threads.
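In other words, the pattern below is perfectly fine: one connection instance shared by many threads writing concurrently, with no locking on the caller's side. This is only a sketch with made-up stream and event names.

    using System;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using EventStore.ClientAPI;

    static class SharedConnectionDemo
    {
        // The same IEventStoreConnection instance is handed to every worker.
        public static void Run(IEventStoreConnection connection)
        {
            var writers = Enumerable.Range(0, 10).Select(i => Task.Run(async () =>
            {
                var data = Encoding.UTF8.GetBytes("{\"worker\":" + i + "}");
                var evt = new EventData(Guid.NewGuid(), "WorkerEvent", true, data, null);

                // No locking around the connection: it pairs up requests and responses internally.
                await connection.AppendToStreamAsync("worker-stream-" + i, ExpectedVersion.Any, evt);
            }));

            Task.WaitAll(writers.ToArray());
        }
    }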

Persistence

Another nice thing about EventStoreConnection is that you can tell it to be persistent, i.e. to keep trying to reconnect to the server forever. You can work with the very same connection instance for weeks, and during that time there may be periods when the server is unavailable. The connection class will automatically reconnect when the server is back online. What happens to the calls made while the server was offline? I'll tell you, but first let me explain how EventStoreConnection works internally.
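Turning this behaviour on is a matter of connection settings; something along these lines. Again a sketch only: the builder method names such as KeepReconnecting are how I remember them and may differ between client versions.

    using System;
    using System.Net;
    using EventStore.ClientAPI;

    static class PersistentConnectionDemo
    {
        public static IEventStoreConnection CreatePersistent()
        {
            var settings = ConnectionSettings.Create()
                .KeepReconnecting()                              // never give up on the TCP connection
                .SetReconnectionDelayTo(TimeSpan.FromSeconds(1)) // pause between reconnection attempts
                .SetOperationTimeoutTo(TimeSpan.FromSeconds(5)); // per-operation timeout, discussed below

            return EventStoreConnection.Create(settings, new IPEndPoint(IPAddress.Loopback, 1113));
        }
    }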

Internal structure

The algorithm used by EventStoreConnection to execute a client's request is as follows (I sketch the idea in code after the list):

  1. A client calls the API method on the connection.
  2. The connection creates an object representing the invoked operation, e.g. DeleteStreamOperation.
  3. The operation object is enqueued for execution. If the queue size limit has been reached, the client is blocked until there is a free slot in the queue.
  4. If the API call was non-blocking (the -Async suffix), control is returned to the client at this point.
  5. A worker thread of the connection picks up the queued item and, if the work-in-progress limit allows, sends the command to the server and marks the request as in progress.
  6. A response package arrives on the TCP connection and is matched (using a correlation ID) with the in-progress request.
  7. The result of the call is delivered to the client.
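To make the flow more tangible, here is a heavily simplified sketch of the queue-and-correlate idea. The names (OperationsManager, IClientOperation and so on) are mine, not the actual types of the client, and the real internals also deal with subscriptions, heartbeats and much more.

    using System;
    using System.Collections.Concurrent;

    // Simplified model of steps 3 to 7: a bounded input queue, a worker that sends
    // commands, and a dictionary keyed by correlation ID for in-progress requests.
    class OperationsManager
    {
        private readonly BlockingCollection<IClientOperation> _queue =
            new BlockingCollection<IClientOperation>(boundedCapacity: 5000);
        private readonly ConcurrentDictionary<Guid, InFlightOperation> _inProgress =
            new ConcurrentDictionary<Guid, InFlightOperation>();

        // Step 3: blocks the caller when the queue is full.
        public void Enqueue(IClientOperation operation)
        {
            _queue.Add(operation);
        }

        // Step 5: runs on the connection's worker thread.
        public void ProcessNextItem(ITcpConnection tcp, int maxConcurrentItems)
        {
            if (_inProgress.Count >= maxConcurrentItems) return;

            var operation = _queue.Take();
            var correlationId = Guid.NewGuid();
            _inProgress[correlationId] = new InFlightOperation(operation, DateTime.UtcNow);
            tcp.Send(operation.CreateNetworkPackage(correlationId));
        }

        // Steps 6 and 7: a package arrives and the result reaches the waiting client.
        public void HandlePackage(Guid correlationId, byte[] payload)
        {
            if (_inProgress.TryRemove(correlationId, out var inFlight))
                inFlight.Operation.Complete(payload);
        }
    }

    interface IClientOperation
    {
        byte[] CreateNetworkPackage(Guid correlationId);
        void Complete(byte[] payload);
        void Fail(Exception error);
    }

    interface ITcpConnection
    {
        void Send(byte[] package);
    }

    class InFlightOperation
    {
        public InFlightOperation(IClientOperation operation, DateTime sentAt)
        {
            Operation = operation;
            SentAt = sentAt;
        }

        public IClientOperation Operation { get; }
        public DateTime SentAt { get; }
    }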

There are a number of things that can go wrong in this nice flow. If the TCP connection breaks, the worker thread tries to re-establish it before proceeding with sending commands to the server. In the meantime, the items in the input queue and those in progress are waiting. For the items in progress, the connection keeps track of the time each request was sent to the server. The worker thread periodically checks these times against the configured timeout and, if necessary, fails (ends with an exception) the requests that have timed out. So the answer to the question of what happens to the calls made while the server was offline is: they eventually time out and end with a nasty exception.
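From the caller's perspective it looks roughly like this. Another sketch: OperationTimedOutException is what I would expect the client to surface, but the exact exception type may vary between client versions.

    using System;
    using System.Text;
    using EventStore.ClientAPI;
    using EventStore.ClientAPI.Exceptions;

    static class TimeoutDemo
    {
        // A write issued while the server is unreachable eventually fails with a timeout.
        public static void TryWrite(IEventStoreConnection connection)
        {
            var evt = new EventData(Guid.NewGuid(), "SomethingHappened", true,
                                    Encoding.UTF8.GetBytes("{}"), null);
            try
            {
                connection.AppendToStreamAsync("test-stream", ExpectedVersion.Any, evt).Wait();
            }
            catch (AggregateException ex) when (ex.InnerException is OperationTimedOutException)
            {
                Console.WriteLine("The call timed out while the server was offline.");
            }
        }
    }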

An interesting question is what happens when the reconnection limit is reached and the server is still down. The original code just threw an exception from the worker thread, which unfortunately resulted in the process being killed because nobody could catch it. The behaviour I proposed through this pull request is to close/dispose the connection in such a case, which effectively renders it unusable. A closed connection cannot be re-opened. Feel free to discuss the approach in the comments on the pull request, but keep in mind that this behaviour is meant for scenarios where, for some reason, you want to know immediately that the connection is broken. Sharing a connection between threads in such a case might be difficult or impossible.
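If that fail-fast behaviour is what you want, the configuration would look something like the sketch below. The builder and event names (LimitReconnectionsTo, Closed) are how I remember them, so double-check them against your client version.

    using System;
    using System.Net;
    using EventStore.ClientAPI;

    static class LimitedReconnectionDemo
    {
        // Cap the number of reconnection attempts and watch for the connection
        // being closed for good once that cap is exceeded.
        public static IEventStoreConnection Create()
        {
            var settings = ConnectionSettings.Create()
                .LimitReconnectionsTo(10)
                .SetReconnectionDelayTo(TimeSpan.FromSeconds(1));

            var connection = EventStoreConnection.Create(settings, new IPEndPoint(IPAddress.Loopback, 1113));

            connection.Closed += (_, args) =>
            {
                // Once closed, the connection cannot be re-opened; a new one must be created.
                Console.WriteLine("Connection closed: " + args.Reason);
            };

            return connection;
        }
    }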
