As a follow-up to our Payments Tech Talk at Groupon, I would like to go into more detail about the performance work we have been doing and what you can expect in the next release.
Kill Bill is not designed to handle tens of thousands of requests per second. You won’t have that type of traffic, even if you are a large company. Think about it: customers don’t often create or change their subscriptions. Even in an e-commerce shopping cart scenario, the rate of checkout is fairly low: 10 payments of $10 per second already add up to over $8M per day, or over $3B per year.
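For readers who want to check the figures above, here is a quick back-of-the-envelope sketch. The constant names are purely illustrative, and it assumes a steady rate around the clock, every day of the year.

```python
# Back-of-the-envelope check of the payment volume quoted above.
# Assumption: a constant 10 payments of $10 every second, 24/7, 365 days a year.
PAYMENTS_PER_SECOND = 10
AMOUNT_USD = 10

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_volume = PAYMENTS_PER_SECOND * AMOUNT_USD * SECONDS_PER_DAY
yearly_volume = daily_volume * 365

print(f"per day:  ${daily_volume:,}")   # $8,640,000
print(f"per year: ${yearly_volume:,}")  # $3,153,600,000
```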
What we do care about, though, is being able to support bursts: while you might typically handle 10 payments per minute, or even per hour, a limited-time promotion might result in 50 or 100 transactions per second for a day or two. Additionally, we want to minimize latency. While the lower bound will ultimately be dictated by the gateway, we want to minimize the overhead added by Kill Bill as it creates audit logs, checks the overdue status, moves the payment to the next state in its automaton, etc.
To do so, we regularly load test scenarios such as the following: create a new account, add a payment method and trigger a payment. This simulates a new user going to your site and proceeding to the checkout page, which is what would typically happen during peak demand as described above. During and after the test, we extract various metrics. First, we look at data from the Kill Bill profiling framework, which gives us high-level insights into how much time is spent at the API, DAO, and plugin(s) layers. We also investigate, at a lower granularity, the rate of cache misses, time spent checking out database connections, JVM health, etc.
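To make the shape of such a scenario concrete, here is a minimal sketch of a load test harness. It is not our actual tooling: `create_account`, `add_payment_method` and `trigger_payment` are hypothetical stubs (as are their return values) that you would replace with real calls against your own deployment, and the harness only records end-to-end latency per simulated user.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the three steps of the scenario. They are NOT
# Kill Bill APIs: replace each one with a real call to your deployment.
def create_account(user_id):
    return {"account_id": f"acct-{user_id}"}


def add_payment_method(account):
    return {"account_id": account["account_id"], "payment_method_id": "pm-1"}


def trigger_payment(payment_method, amount):
    return {"status": "SUCCESS", "amount": amount}


def run_scenario(user_id):
    """Run one new-user checkout and return (latency_seconds, payment)."""
    start = time.perf_counter()
    account = create_account(user_id)
    payment_method = add_payment_method(account)
    payment = trigger_payment(payment_method, amount=10)
    return time.perf_counter() - start, payment


def load_test(users, concurrency):
    """Run the scenario once per simulated user and summarize the latencies."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(run_scenario, range(users)))
    latencies = sorted(latency for latency, _ in results)
    return {
        "runs": len(results),
        "failures": sum(1 for _, p in results if p["status"] != "SUCCESS"),
        "median_s": statistics.median(latencies),
        # Simple index-based estimate of the 95th percentile.
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))],
    }


summary = load_test(users=200, concurrency=20)
print(summary)
```

End-to-end latency is only the starting point: it tells you that something is slow, while the per-layer timings, cache miss rates and connection pool metrics mentioned above tell you where.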
Last year, we spent a few months before the 0.12.x release analyzing this data and identified a few bottlenecks, especially around the JRuby stack. This is when we reached out to Karol Buček. Besides identifying and fixing issues in JRuby and Kill Bill, he improved his activerecord-bogacs gem to bypass the ActiveRecord pool and let us directly use the HikariCP OSGi pool that Kill Bill exports, which led to better system stability and simplified production monitoring.
As we near the release of 0.14.x, we are currently integrating all of his changes (you can follow along here). We will likely wait for the release of activerecord-jdbc-adapter 1.4 and JRuby 1.7.20, as well as the merge of a couple of pull requests in jdbi and Logback, before finalizing 0.14.0.
It is still too early for us to share graphs and numbers on the improvements you will get by upgrading to 0.14.0. I’m hoping we’ll be able to post an update in the next few months. In the meantime, you can take a look at Karol’s current findings here, as he has been running thousands of tests on EC2, using various Kill Bill and AWS configurations.
While we do our best to optimize the Kill Bill platform, it is ultimately your responsibility to load test your Kill Bill deployment before going to production, as your hardware, set of plugins and overall configuration will greatly influence the numbers. In the same way that we provide an integration test framework to verify the correctness of invoices and payments for your specific catalog configuration, we provide a load testing framework to test your workload. It is even integrated with flood.io, so you can easily run it on a grid in the cloud.