Rails and Scaling with Multiple Databases

Posted about 7 years back at Ryan Tomayko's Writings

Joe points out the following from this interview with Twitter’s Alex Payne:

The problem is that more instances of Rails (running as part of a Mongrel cluster, in our case) means more requests to your database. At this point in time there’s no facility in Rails to talk to more than one database at a time.

(David and Rafe weigh in also. Good stuff)

We've run against multiple (five now) separate PostgreSQL servers for a long time now. To be clear, that’s five separate “databases” in the sense that PostgreSQL uses the term. Not a replicating / mirror setup – separate databases with different data but with similar structure.

Each of our clients gets a schema (again, in the postgres sense of the word) on one of the database boxes. There are multiple client schemas per database. When a box becomes overutilized we get another box and move schemas around.

Sidebar: this whole schema-per-client thing might seem like overkill but when we say client, we mean a company that processes health claims for 50 to 1000 self-funded employers. Each client is fairly massive in the volume of data they load into the system and our entire market consists of maybe 400 total prospects. Those 400 companies process something like 60% of the covered US population’s health claims every year.

In addition to the client databases, we have a master-slave balanced and replicated shared schema that stores user account information and other data applicable to all clients.

We run multiple fastcgi dispatchers on two severely underutilized web boxes. Our database load dwarfs our web load as we’re doing live data-warehouse style queries, which are insanely intensive when compared to generating the HTML or PDFs to display the results.

When an HTTP request comes in, it can go to any of the fastcgi dispatchers. We practice shared nothing. We establish, from the shared schema, which client the user belongs to and can then determine the server and schema that houses their data. Each fastcgi dispatcher has a pool of connections: one dispatcher has one connection to each postgres server. At the beginning of the request, we scope the connection using a set of Rails extensions similar to ActiveRecord::Base::with_scope.
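
A minimal sketch of that per-request flow, in plain Ruby rather than the actual Rails extensions (which aren't shown in this post). The client names, server names, lookup tables, and the ClientScope module are all hypothetical stand-ins; symbols take the place of real connection objects.

```ruby
# Hypothetical directory standing in for the shared-schema lookup:
# which server and schema house each client's data.
CLIENT_LOCATIONS = {
  "acme"   => { server: "pg1", schema: "acme" },
  "globex" => { server: "pg3", schema: "globex" }
}.freeze

# One long-lived connection per PostgreSQL server, per dispatcher process.
# Symbols stand in for real connection objects.
CONNECTIONS = { "pg1" => :conn_pg1, "pg3" => :conn_pg3 }.freeze

module ClientScope
  # Runs the block with the client's connection and schema selected, then
  # restores whatever was selected before, even if the block raises.
  def self.with_client(client)
    previous = Thread.current[:client_scope]
    location = CLIENT_LOCATIONS.fetch(client)
    Thread.current[:client_scope] = {
      connection: CONNECTIONS.fetch(location[:server]),
      schema: location[:schema]
    }
    yield
  ensure
    Thread.current[:client_scope] = previous
  end

  # The scope chosen for the request currently being handled, or nil.
  def self.current
    Thread.current[:client_scope]
  end
end

# Usage: wrap the handling of one request.
ClientScope.with_client("acme") do
  scope = ClientScope.current
  # Model code would run its queries through scope[:connection] here.
end
```

The block form is what makes this resemble with_scope: the selection only lives for the duration of the request and cannot leak into the next one handled by the same dispatcher.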

This situation is surely very different from Twitter’s but I hope it shows the difference between the statement made by Alex, “there’s no facility in Rails to talk to more than one database at a time,” and the significantly more problematic, “you cannot talk to more than one database at a time.” While the former may be true (and I’ll argue in a moment that it’s not), the latter is clearly false.

Talking about this on the level of having “multi-database connectivity” come out of the box in Rails is simplifying the problem to an unreasonable level, I think. Would the multi-database connectivity features meet my needs or Twitter’s? They are different problems with minor overlap and we’re talking about the minor overlapping piece like it’s the biggest part of the problem.

Most of the time spent getting our setup running was in the conceptual and data wrangling phases. The amount of time it took to implement the multi-database connectivity was negligible compared to the amount of time it took to devise a method of splitting things out at the data level. When all was said and done, the Ruby/Rails related bits were implemented in no more than 40-50 lines of code.

In my case, ActiveRecord provided exactly the right level of functionality. I can have multiple database connections established and write code to manage when each should be used. Control of which connection is used is managed at the model level. Connections cascade up inheritance chains and I can specify that one model use the connection specified on another model using a simple delegate statement:

class A < ActiveRecord::Base
end

class B < ActiveRecord::Base
  class << self
    delegate :connection, :to => A
  end
end

Changing A’s connection changes B’s without affecting the connection used by any other model. We have a simple macro (uses_connection_of) that brings this down to a one-liner for each top-level model class:

class C < ActiveRecord::Base
  uses_connection_of B
end
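
The macro itself isn't shown here, so the following is only a guess at its shape, written against a plain-Ruby stand-in so it runs without Rails. FakeBase is hypothetical and models nothing but a class-level connection that cascades up the inheritance chain; the real macro would sit on ActiveRecord::Base and may well be built on delegate instead.

```ruby
# Hypothetical stand-in for ActiveRecord::Base. It only models a class-level
# connection that is looked up through the inheritance chain.
class FakeBase
  class << self
    attr_writer :connection

    # Use our own connection if one is set, otherwise ask the superclass.
    def connection
      return @connection if @connection
      superclass.respond_to?(:connection) ? superclass.connection : nil
    end

    # Assumed shape of the macro: always resolve the connection via +other+.
    def uses_connection_of(other)
      define_singleton_method(:connection) { other.connection }
    end
  end
end

class A < FakeBase; end

class B < FakeBase
  uses_connection_of A
end

class C < FakeBase
  uses_connection_of B
end

class D < FakeBase; end

A.connection = :server_one
D.connection = :server_two
```

With this in place, B.connection and C.connection both resolve through A, so pointing A at another server moves B and C with it while D keeps its own connection.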

This is only the tip of the framework level customizations we've made to Rails over almost two years of development. In most cases, I find the base functionality well balanced for the general (80%) case. We expect to write additional framework code when we get into special case territory, which our multi-database/schema setup clearly is, and which Twitter’s seems to also be.

When I consider what contributed to the unraveling of J2EE, one thing that stands out is that it tried to do too much. The promise was that of infinite scalability based on tooling, which assumes that designing scalable systems is a general case problem. I now firmly believe that this is flawed reasoning. Frameworks don’t solve scalability problems; design solves scalability problems.

I picked up a word from Joe a few years back and find myself using it a lot: “friction.” When referring to frameworks and tooling, “friction” is a (subjective) measure of how much the tooling gets in your way when trying to solve a specific-case problem. I've come to evaluate frameworks based on two rough metrics: how far the framework goes in solving the general case problem out of the box and how little friction the framework creates when you have to solve the specific-case problem yourself. When a framework finds a balance between these two areas, we call it “well designed.”

Measured along these lines, there are portions of Rails that have a less than perfect balance but I don’t think multi-database connectivity is one of them. It seems to me that moving too far in one direction on this would cause lots of friction for moving in other directions. There just doesn’t seem to be a lot of general case to solve here when you dig into the details.

Bottom line for me is that Twitter’s scaling and multi-database connection issues seem to be just that: Twitter’s issues. David’s response seems to indicate that he believes Rails could probably do more here but how far could framework level support really go and how much friction would be created?

Episode 18: Looping Through Flash

Posted about 7 years back at Railscasts

Displaying flash messages in the layout can be a pain at times. In this episode you will learn an easy way to display any kind of flash message by looping through the hash.
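
As a rough illustration of the idea (not the episode's actual code), here is a plain-Ruby sketch that treats the flash as an ordinary hash and emits one element per entry. The flash_markup helper and the CSS class names are hypothetical; in a real layout the loop would live in the view template.

```ruby
require "erb"

# Plain-Ruby stand-in for what a layout might do with the flash hash:
# one <div> per entry, using the key as a CSS class.
def flash_markup(flash)
  flash.map do |key, message|
    css_class = ERB::Util.html_escape(key.to_s)
    text = ERB::Util.html_escape(message.to_s)
    %(<div class="flash #{css_class}">#{text}</div>)
  end.join("\n")
end

puts flash_markup({ notice: "Saved", error: "A < B" })
```

Because the loop doesn't name any particular key, a new kind of message (say, :warning) shows up without touching the layout.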

Episode 17: HABTM Checkboxes

Posted about 7 years back at Railscasts

It is often asked: how do I create a list of checkboxes for managing a HABTM association? Ask no more because this episode will show you how to do exactly that.

Productivity enhancers

Posted about 7 years back at work.rowanhick.com

Tannoy nearfield active monitors. Let's face it: churning through documentation or code can sometimes be mind-numbing. No matter how much delegation you do, you still have to deal with it, and the only way to keep your mind in check, and sometimes to speed things along, is to be bathed in your favourite music. Now, I have a penchant for all things musical. Whilst I don't spend 40k on audio gear at home, I also can't stomach the $50 computer speakers. As a previous fan of Klipsch systems, I decided it was time to step up the game. Enter these bad boys. "Nearfield active monitors" = speakers designed to be on a studio recording desk, up close to your ears versus across the living room floor. As they're monitors, they're designed to be very neutral and flat sounding, allowing the detail of the music to present itself, not be coloured like most residential systems. So you won't have bass or mid-range heavy music. To my ears, at low volumes everything is very crisp, with tonnes of detail; I've picked out things I've only heard with Sennheisers on my head. All this comes at a cost: these clock in at $800 retail CAD, before taxes, if you know where to look. However, eBay came to the rescue: I found a pair and won the auction for under half that, and now they're gracing my desk getting burned in. Nice. (with a capital N). Not Pants.

Welcome

Posted about 7 years back at work.rowanhick.com

Welcome to my blog...after many years of reading others' blogs it's about time I gave back nuggets of wisdom. So here it is. I'll keep this bad boy updated on (almost) a daily basis - subscribe to the RSS, sit back and relax...

RailRoad Class Visualization

Posted about 7 years back at zerosum dirt(nap) - Home

Just saw this InfoQ article about RailRoad and had to check it out. Gotta say, this is by far the best class visualization tool for RoR I’ve seen yet. Set your options, generate those DOT files, and then run them through GraphViz to export your image format of choice. Couldn’t be easier.

Check out some of the examples on the RailRoad RubyForge site, including the diagrams of the popular Depot example app and the much more complex Typo blog package. The latter is a good illustration of why the brief option is provided, heh. If you’re in the UML camp, you might be a little disappointed as the diagrams it produces are closer to BON, but personally I think they’re very straightforward and natural.

Whatever your modeling language preference is, I think we can agree that tools like this go a long way towards legitimizing Rails use in large multi-person projects and are, well, just plain helpful. Big thumbs up.

Paginating Associations

Posted about 7 years back at zerosum dirt(nap) - Home

It’s no real secret that the default Rails pagination helpers are kind of awful. Sure, you can use them, but I wouldn’t recommend it if you expect to scale. Instead, go snag yourself the wonderful paginating_find plugin. And then, if you’re going to be using them with your model associations, whip up an association extension like this:

module PaginationExtension
  def paginate(current = 1, size = 10, options = {})
    options[:page] = {:current => current, :size => size}
    find(:all, options)
  end
end

Now just extend the has_many association on your City class and you can call city.bars.paginate(2) to get the second 10-element page of bars associated with your city.

class City < ActiveRecord::Base
  has_many :bars, :extend => PaginationExtension
end

city.bars.paginate(2)
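
For a feel of what "the second 10-element page" means in terms of rows, here is a small plain-Ruby sketch of the usual page arithmetic. It is not the plugin's implementation; page_window is a hypothetical helper, and an array stands in for the ordered result set.

```ruby
# Hypothetical helper: which slice of an ordered result set a page covers.
def page_window(current = 1, size = 10)
  raise ArgumentError, "page numbers start at 1" if current < 1
  { offset: (current - 1) * size, limit: size }
end

# An array standing in for the bars a city would return, in order.
bars = (1..25).map { |n| "bar #{n}" }

window = page_window(2)
second_page = bars[window[:offset], window[:limit]]
# second_page now holds "bar 11" through "bar 20"
```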

The good bars are all on the first page though, so consider yourself warned.

Episode 16: Virtual Attributes

Posted about 7 years back at Railscasts

Keep your controllers clean and forms flexible by adding virtual attributes to your model. This very powerful technique allows you to create form fields which may not directly relate to the database.
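
A minimal sketch of the technique, assuming a model with first_name and last_name columns; a Struct stands in for the ActiveRecord class so the example runs on its own, and the full_name attribute is just an illustrative choice.

```ruby
# Plain-Ruby stand-in for a model with first_name and last_name columns.
User = Struct.new(:first_name, :last_name) do
  # Virtual attribute reader: derived from the stored fields.
  def full_name
    [first_name, last_name].compact.join(" ")
  end

  # Virtual attribute writer: splits the form value back into stored fields.
  def full_name=(name)
    self.first_name, self.last_name = name.to_s.split(" ", 2)
  end
end

user = User.new
user.full_name = "Ada Lovelace"
```

A form can then expose a single full_name field, and the controller passes it along like any other attribute without knowing it isn't a column.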

Episode 15: Fun with Find Conditions

Posted about 7 years back at Railscasts

You can pass more than simple strings to find conditions. Arrays, ranges, and nil values can be passed as well. In this episode you will see the tricks involved with passing these odd objects to find conditions. (Update: audio fixed).

Jack Dorsey and Alex Payne of Twitter - Ruby on Rails Podcast

Posted about 7 years back at Ruby on Rails Podcast

The creator of Twitter talks about developing the popular messaging site.

Episode 14: Performing Calculations on Models

Posted about 7 years back at Railscasts

Did you know ActiveRecord provides class methods for performing calculations on models? You can even use these methods through associations.

Go get scope_out - it should be in core!

Posted about 7 years back at The Hobo Blog

The scope_out plugin is such a great extension to ActiveRecord I can’t imagine any non-trivial app not benefiting from it.

I have just two reservations: I would have thought it could be named better (OK that’s a niggle), and the customised with_scope methods it creates (e.g. with_active) should really be protected. Why? See this thread.
