Scaling Rails at Melbourne RORO

So last night I did a talk on scaling Rails at the Melbourne Ruby meetup, and a few people wanted copies of my slides. Normally I would just link off to SlideShare, but I wanted to write a few words to elaborate on my slides. A couple of them aren’t very clear to read, and because the talk was a bit of a freewheeling rant, they weren’t very clear as I presented them either.
I would have posted this on the work blog, but this is more of an off the cuff personal musing than a solid Goodfilms tech post.
I’ve embedded the slides below. Read those, then I’ll quickly elaborate on a few points.
I briefly touch on coupling being the #1 enemy. I think this is true both in scaling for traffic and in scaling your codebase. This is where I risk veering into my “entire universe of software development” rant, but I’ll quickly explain why it’s #1 on my hit list.
In my model of the universe, all good Rails developers fall somewhere on a scale between Avdi Grimm/Gary Bernhardt and DHH. At the Avdi/Gary end, you’ve got a lot of good, proper OO following SOLID principles, and at the DHH end you have highly productive, pragmatic Law of Demeter trainwreck violators.
I like to have a team with both sorts working on it, or better yet, find individuals who can work at both ends of the scale, as it gives you a lot more flexibility to deal with business challenges as they come up. The only safe way to mix the approaches is by militantly keeping coupling low. It lets you have a module over here that is in a messy MVP stage, while a module over there is core business and looks really nice.
Always follow the Single Responsibility Principle (SRP), but sometimes the single responsibility is “throw this on the wall and see if it sticks”, and sometimes it is “do this very specific task”. Any module should have one, and only one, reason to change, and that change should be in one place. Sometimes that reason is “we have changed a specific business rule”, and sometimes it is “shit-can that whole module because users didn’t like it”.
There’s a lot more to my point of view than this, but that’s enough to get us back onto servers and the rails stack.
I’m a big believer in SRP for servers as well as classes. This server is for web traffic, and web traffic only. This server is for the database, this server is for emails, this server is for uploads.
Following SRP with your server setup makes for much simpler troubleshooting and change control. A server should have one, and only one reason to change. You only upgrade your mail server if you need to serve more mail. You upgrade your database because you are storing more data.
So that’s the kind of OO end of ops. At the other end, there’s the shared hosting/single VPS approach: keep things simple and cheap in one place.
Once again, I like a mix of both, but you do need to be careful about coupling, and what you allow to mix together, and what is important to separate early.
One load balancer, two Rails servers, and one database plus frequent backups is what I recommend as a good balance between those two points. It suits any app that has just come out of MVP, knows it is going to grow in future, is still unsure how quickly it will grow and how much revenue it will have to pay for hosting, and has a higher write load than a blog/news site (so maybe something like ecommerce, a marketplace, or something with user generated content).
The load balancer: Amazon and Rackspace have, as far as I’m concerned, functionally equivalent offerings, so use one of those and you get the benefits of outsourcing the complexity and high availability without the vendor lock-in.
The database: scaling databases is kind of “solved”. It’s not to say it’s easy, but you’ve got plenty of options both vertically (bigger servers) and horizontally (more servers). By keeping this off your app servers, you are free to solve this problem independently of your rails app, or even pay an expert to do it for you.
A proper HA environment will have two databases replicating for failover, and as you’re deployed to the cloud, you will need to fail over. The problem is that replicated environments are more complex, and in the early stages complexity is a worse killer than the odd error page on your web app. Quick cloud provisioning, plus automated setup, plus frequent backups will give you OK reliability for your budget and headaches.
The app servers: Rails by default uses cookies for sessions, which makes it dead easy to scale horizontally, because no app server has to remember anything about a user between requests. Given that it’s memory hungry, it’s not great at scaling vertically anyway, so you should get used to scaling out.
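To make the cookie point a bit more concrete, here is a minimal sketch of why a signed cookie session doesn't care which server handles the request. This is not Rails's actual cookie store, just an illustration of the idea: `SECRET`, `encode_session`, `decode_session`, and `secure_equal?` are names I've made up, and the only assumption is that every app server shares the same secret.

```ruby
require "openssl"
require "json"

# Hypothetical shared secret. The assumption is that every app server loads
# the same value from its configuration.
SECRET = "change-me-shared-across-all-app-servers"

# Compare two strings without bailing out at the first differing byte.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# Serialise the session and attach an HMAC so tampering can be detected.
def encode_session(data, secret = SECRET)
  payload = [JSON.generate(data)].pack("m0") # strict Base64, no newlines
  digest  = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), secret, payload)
  "#{payload}--#{digest}"
end

# Verify the signature and return the session hash, or nil if it is invalid.
def decode_session(cookie, secret = SECRET)
  payload, digest = cookie.to_s.split("--", 2)
  return nil if payload.nil? || digest.nil?

  expected = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), secret, payload)
  return nil unless secure_equal?(expected, digest)

  JSON.parse(payload.unpack1("m0"))
end
```

Any box that knows the secret can call `decode_session` on a cookie that a different box produced with `encode_session`, so the load balancer is free to send each request wherever it likes.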
Starting with two servers gives you a couple of benefits.
As mentioned before, cloud servers crash and disappear a lot. By having two, your magic cloud load balancer can keep routing traffic “somewhere” when one is off the air. With only one server you are relying on users forgiving downtime, which they will do if there’s a friendly maintenance page and they like you. If you’re missing one of those two elements, a single server might not work for you.
By splitting early, you’ll discover if you’ve accidentally coupled app serving stuff to your one machine. This is difficult to untangle under the stress of higher traffic, and better to do in a low stress time.
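The first benefit, the load balancer routing traffic “somewhere” when a box disappears, can be sketched roughly like this. It's a toy round-robin picker that skips servers marked as down, not how Amazon's or Rackspace's load balancers actually work; `RoundRobinBalancer`, `mark_down`, `mark_up`, and `next_backend` are all hypothetical names.

```ruby
# Hypothetical round-robin balancer that skips backends marked unhealthy.
class RoundRobinBalancer
  def initialize(backends)
    @backends = backends
    @healthy  = backends.map { |b| [b, true] }.to_h
    @cursor   = 0
  end

  # A real balancer would flip these flags based on health checks.
  def mark_down(backend)
    @healthy[backend] = false
  end

  def mark_up(backend)
    @healthy[backend] = true
  end

  # Returns the next healthy backend, or nil if every backend is down
  # (which is when the friendly maintenance page comes out).
  def next_backend
    @backends.size.times do
      candidate = @backends[@cursor % @backends.size]
      @cursor += 1
      return candidate if @healthy[candidate]
    end
    nil
  end
end

lb = RoundRobinBalancer.new(["app1", "app2"])
lb.mark_down("app1")
lb.next_backend # "app2" keeps taking traffic while "app1" is off the air
```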
You break SRP by doing web traffic and offline jobs on the same boxes if you define the responsibility as “serve web” or “do jobs”, but in the early days of a monolithic rails app, it’s better to define the responsibility as “runs ruby code”. It means just one class of servers to receive your app code, and one class of servers to keep your ruby packages up to date. The complexity of two different kinds of jobs on one box in the early stages is less than managing the complexity of two kinds of boxes to run the same codebase.
The maths for when to scale more horizontally is easier too: too much ruby work happening (of any sort), spin up an identical new box. Too much database work, upgrade the box, or scale out.
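As a rough sketch, that rule of thumb is small enough to write down. The `scaling_action` name, the load figures, and the 0.8 threshold are all made up for illustration; plug in whatever numbers your monitoring gives you.

```ruby
# Hypothetical decision rule: loads are fractions of capacity (0.0 to 1.0),
# and the default threshold is an arbitrary placeholder.
def scaling_action(ruby_load:, db_load:, threshold: 0.8)
  actions = []
  # Too much Ruby work of any sort (web or jobs): add an identical box.
  actions << :add_identical_app_server if ruby_load > threshold
  # Too much database work: upgrade the box, or scale out.
  actions << :upgrade_or_scale_out_database if db_load > threshold
  actions
end

scaling_action(ruby_load: 0.9, db_load: 0.3)
```

Because web traffic and offline jobs live on the same class of box, there is only one number to watch on the Ruby side.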
Goodfilms is set up in a very similar way to what I’ve described, but has added complexity to do with our “magical film compatibility” system, and our consumption of 3rd party data sources. I’ll do a write up on that stack over on our team blog when I get some spare time.
EDIT Feel free to chat about this over at Hacker News
EDIT I finally got around to writing the follow up on the team blog