I am currently working on a project at ThoughtWorks that required a lot of operations work. Translated into English: we wrote a lot of code that had to be shared by an online application and a set of background jobs. Not to mention that a large part of these operations involved talking to REST APIs. Not rocket science, right? So here is what we did.
The online application was written as a conventional MVC app in which some of the models made REST API calls (and you thought they were all ActiveRecord?), parsed the responses, and returned results that the controllers could understand.
The jobs used these models in exactly the same way the controllers did, except that these processes were of course offline and each job invoked the model methods thousands of times.
Towards the end of the project we reached a point where we wanted to parallelize some of these jobs to make them run faster, and it quickly became clear that the blocking calls to the REST APIs were the biggest bottleneck. For starters, parallelizing these calls could save us time. These were the factors that had to be addressed:
- All calls were made from inside the models, which were (in a good way) unaware of whether their methods were being invoked by (online) controllers or by (offline) jobs. This meant that a lot of the model code was tuned for a request-response system and, as in most conventional request-response code, no effort had gone into keeping it thread-safe. What did this mean? A good amount of refactoring to make the code thread-safe.
- Some of the most common XML parsing libraries in Ruby (the likes of Nokogiri and HappyMapper) are not thread-safe.
- Ruby is not really the king of concurrency. Threads in Ruby let you make parts of your code concurrent, but in the standard interpreter (MRI) that concurrency does not turn into parallelism. Ruby 1.8 schedules its threads inside the interpreter (green threads) rather than handing them to the operating system, and 1.9 and later use native threads but let only one of them run Ruby code at a time because of the global interpreter lock. What this means is that a threaded application can never really benefit directly from more cores on your processor, since all its threads live inside a single interpreter process. This topic deserves a much larger debate and I am not going to address it in this post. Let's assume for now that Ruby’s concurrency is not so cool!
For me, the second point was the most compelling reason to rethink our strategy. There was no way I was going to write a thread-safe XML parsing library of my own (for reasons of time and personal ability combined).
I remembered a chat with Mark Needham in which we talked about pushing the concurrency out of `your` system. Given the risk of refactoring and replacing core libraries in the last few days before a release, pushing the concurrency out of the system was clearly the option that would win it for us. Here is what we did!
Before the jobs ran, we made a quick computation of all the REST URLs that were going to be hit during the course of the job. We hit these from the job with the help of an asynchronous HTTP library, Typhoeus. Not only were the calls asynchronous, we also made sure that the rest of the job did not wait for their responses to come in but simply carried on.
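Here is a rough sketch of what such a warm-up step could look like. It is not our project code: `urls_for_job`, `warm_up`, `warm_up_in_background`, the `/records/<id>.xml` URL layout and the concurrency limit are all made up for illustration, and it uses the current Typhoeus `Hydra` API rather than whatever version we had at the time. The optional `proxy` argument only matters once a caching proxy sits in front of the services.

```ruby
# Build the de-duplicated list of URLs a job is about to request.
# The base URL and the /records/<id>.xml layout are assumptions for this sketch.
def urls_for_job(record_ids, base_url)
  record_ids.uniq.map { |id| "#{base_url}/records/#{id}.xml" }
end

# Fire all requests concurrently. The responses are deliberately ignored:
# the point is only to get them fetched, not to use them here.
def warm_up(urls, proxy: nil, max_concurrency: 20)
  require "typhoeus" # third-party gem, loaded only when warming actually runs
  hydra = Typhoeus::Hydra.new(max_concurrency: max_concurrency)
  urls.each do |url|
    options = { method: :get }
    options[:proxy] = proxy if proxy
    hydra.queue(Typhoeus::Request.new(url, options))
  end
  hydra.run # blocks until every queued request has finished
end

# hydra.run blocks, so run the warm-up off to the side and let the job carry on.
# This thread touches only Typhoeus, never the (non-thread-safe) models.
def warm_up_in_background(urls, proxy: nil)
  Thread.new { warm_up(urls, proxy: proxy) }
end
```

A job would call something like `warm_up_in_background(urls_for_job(ids, "http://api.example.com"))` as its first step and then get on with its normal work. A separate process would do just as well as the background thread; the thread is simply the shortest way to show the idea.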
We also put a caching forward proxy (Squid) between our system and the REST services. Once the calls were bundled and fired asynchronously, their responses were cached at the proxy, which absorbed the asynchronous nature of the HTTP responses. The jobs simply kept running, hoping that by the time a model made a call, the equivalent response had already been pre-emptively fetched into the forward proxy. This improved our job performance by over 100%. Clearly a win as a result of pushing the concurrency onto Squid and out of our system. #win :)
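For the cache to help, the ordinary blocking calls made by the models have to go through the same proxy as the warm-up requests. A minimal sketch of that with the standard library's `Net::HTTP` follows. The helper names `http_via_proxy` and `fetch_xml` and the proxy host are assumptions for illustration (3128 is Squid's default port), and it only handles plain HTTP.

```ruby
require "net/http"
require "uri"

PROXY_HOST = "localhost" # assumption: Squid runs on the same machine
PROXY_PORT = 3128        # Squid's default listening port

# Build an HTTP client for the URL's host that sends its requests via the proxy.
# No connection is opened until a request is actually made.
def http_via_proxy(url, proxy_host: PROXY_HOST, proxy_port: PROXY_PORT)
  uri = URI.parse(url)
  Net::HTTP.new(uri.host, uri.port, proxy_host, proxy_port)
end

# The same blocking GET the models always made, now answered from the
# proxy's cache whenever the warm-up got there first.
def fetch_xml(url)
  uri = URI.parse(url)
  http_via_proxy(url).start { |http| http.get(uri.request_uri).body }
end
```

The model code stays synchronous and single-threaded. Whether a given call is fast depends only on whether the proxy already holds the response, and that in turn depends on the REST services sending headers that allow caching (or on the proxy being configured to cache them anyway).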
Of course, from an OO perspective, we did reduce the cohesion of the models by moving knowledge of the REST API calls out of them. But by inserting a decorator between the models and the jobs/controllers, a clear strategy could be devised to avoid this loss of cohesion. #refactoring
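One way such a decorator might look, as a sketch only: the model keeps the knowledge of its own URLs, and a wrapper warms the cache before delegating to it. `Account`, `url_for`, `find_all`, `PrefetchingModel` and the `warmer` callable are all hypothetical names, and `Account.find` is stubbed so that the example runs without a network.

```ruby
require "delegate"

# Hypothetical model. It alone knows which URL backs a record.
class Account
  def self.url_for(id)
    "http://api.example.com/accounts/#{id}.xml"
  end

  # Stand-in for the real blocking REST call plus XML parsing.
  def self.find(id)
    "account-#{id}"
  end
end

# Decorator that sits between a model and its callers. Everything it does
# not define itself is passed straight through to the wrapped model.
class PrefetchingModel < SimpleDelegator
  # warmer: anything that responds to #call(urls), e.g. an async prefetcher
  def initialize(model, warmer)
    super(model)
    @warmer = warmer
  end

  # Warm the cache for the whole batch, then use the model as usual.
  def find_all(ids)
    model = __getobj__
    @warmer.call(ids.map { |id| model.url_for(id) })
    ids.map { |id| model.find(id) }
  end
end
```

A job would wrap the model, for example `PrefetchingModel.new(Account, warmer).find_all(ids)`, while controllers keep calling `Account.find` directly, so the URL knowledge never leaves the model.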