In this first blog entry, we will talk about the choice we made to use a persistent event bus in our Kill Bill infrastructure, and how it works. Kill Bill is comprised of several core components or sub-systems– for e.g catalog, account, entitlement, invoice, payment, overdue,… — each of which being responsible for a specific set of tasks, and offering an set of well defined APIs. The communication across these components could therefore rely solely on these APIs but this is not optimal because we want sub-systems to be able to react to various changes– for instance the invoice module need to create a new invoice when a subscription transitions out of trial. So in addition to the API, we also made the choice to create an asynchronous persistent event bus, where events are emitted by Kill Bill core sub systems and listeners can be registered by any piece of code running on the platform — whether a Kill Bill core component or a Kill Bill plugin. Our set of requirements were the following:
- Persistency: Posting an event on the bus needs to be made persistent for auditing purpose and for addressing the failure scenario
- Asynchronous delivery: Posting an event on the bus should return as soon as the event has been made persistent; the dispatching to the various listeners should occur outside the context of the thread posting the event.
- Atomicity: Posting an event on the bus should either occur or not occur at all; we also decided to offer an API to post an event from within a database transaction so that the atomicity of the transaction includes the success/failure of posting the event. As an example, when the invoice submodule generates a new invoice it also sends an event on the bus to notify listeners about that invoice. The bus event and the invoice creation are atomic.
- Isolation: When the invoice subsystem generates an invoice, the payment subsystem may need to trigger a payment; the invoice subsystem could notify the payment subsystem through API call, but we would rather have the invoice subsystem generate the invoice and not have to know anything about what happens to that invoice. Posting an event on the bus solves that problem nicely.
- Distributed event bus; Kill Bill can be made of several instances running on different nodes. The events must be seen from the different nodes.
It is also important to state that we are addressing those requirements in the context of a billing system, where the rate of events is fairly low (a few per seconds maybe, but not thousands per seconds). We therefore implemented that bus on top of a transactional database (mysql). We also decided to reuse the guava API which offers a nice way to register listeners using java annotations. The drawing below summarizes the flow:
- A thread (in purple) will post an event to the bus– whether from within or outside a database transaction. The event will be persisted on disk
- Later a set of threads (in orange) will dispatch the event to the various registered listeners.
Our implementation relies on two tables:
- One table where the active list of events reside; the dispatch threads will poll that table periodically, and atomically claim the entries before they dispatch them
- Another table where the processed events reside; Upon success the dispatch threads will move the entries in that table, so that the active table does not grow but we get to keep an audit of what happened. That second table is only there for autiding purpose and could be truncated periodically if the need arose.
The bus guarantees that each event will be delivered but in some rare occasions events could be delivered multiple times, so the handlers from the listeners must be idempotent.
The code has been open sourced and it has been running on production for about a year.