Setting up a public Git repository

Posted 5 days back at Barking Iguana

More and more these days I'm using Git as my version control system. I want to make code available to the general public. The easy choice would be to use a service like GitHub or repo.or.cz but I'm vain and want to serve my code from barkingiguana.com. I don't need to support multiple committers and I'd like to learn more about how Git works so I don't want to use Gitosis. It turns out that it's pretty easy to set up your own public repository...

Passenger

Posted 5 days back at Too-biased - Home

So there is a lot of talk about Phusion Passenger lately and I feel the need to chime in here. David pointed out that Shopify is running on Passenger, which is something I announced on Twitter a few months ago.

Some context on Shopify’s installation: We launched Shopify originally on Lighttpd with FastCGI and later migrated to Nginx with mongrels. Obviously we had to use HAProxy between Nginx and the mongrels to avoid the dreaded “queue behind long running process” problem. We also added Monit to the mix, which observed all mongrels to make sure that everything was running according to plan. Once a process reached 260MB of memory we signalled it to shut down after its next request so that a new one could start out with less memory. For this we added runit to the mix, which supervised the mongrels and started them back up quickly once they hit the ground.
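The "shut down after the next request" rule can be sketched in a few lines of plain Ruby. This is only an illustration of the idea, not the Monit or runit configuration that was actually used: the RecyclingWorker class, the memory probe and the retiring? flag are all hypothetical names.

```ruby
# Hypothetical sketch: a worker finishes the request it is serving and only
# then flags itself for replacement once it has grown past the ceiling.
MEMORY_CEILING_MB = 260

class RecyclingWorker
  # memory_probe is any callable returning the process size in MB
  # (an assumption; a real setup would ask the OS or a monitor).
  def initialize(memory_probe)
    @memory_probe = memory_probe
    @retiring = false
  end

  # Serve one request, then check memory. A request is never cut short.
  def handle(request)
    response = yield(request)
    @retiring = true if @memory_probe.call >= MEMORY_CEILING_MB
    response
  end

  # True once a supervisor should let this process exit and start a new one.
  def retiring?
    @retiring
  end
end

sizes = [140, 260] # pretend the process jumps from 140MB to 260MB
worker = RecyclingWorker.new(-> { sizes.shift })

worker.handle("GET /") { |req| "200 for #{req}" }
puts worker.retiring? # still under the ceiling
worker.handle("GET /cart") { |req| "200 for #{req}" }
puts worker.retiring? # ceiling reached, but the request was completed first
```

The point of the design is that the memory check sits after the response is produced, so no customer request is dropped when a process is recycled.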

It’s important to note that we are not talking about a memory leak here. The 260MB ceiling comes from two issues with Ruby’s garbage collector:

  1. It allocates memory in very large chunks once the available memory gets low. This means a 140MB process increases to 260MB in a single go. It also never gives memory back to the operating system because Ruby’s GC is not able to move objects. Once it adds an object into the newly allocated space and that object remains alive, it cannot yield memory back to the OS.
  2. Because Ruby’s garbage collector uses mark and sweep, it has to traverse the entire memory space in search of pointers. There are no generations that help with that. It means that GC cycles become longer and longer the more memory is available. Rails mitigates these issues by moving a full GC run behind an HTTP response, into the time period when the process is waiting for a new request (Update: Rails doesn’t do this anymore), but performance monitoring tools such as NewRelic clearly show that average response time is directly correlated with the amount of memory used across the server farm.

Now why did we switch to Passenger? Simple: the key phrase is "remove moving parts".

Every additional tool you add will come with its own bugs. Many people I talked to over the past years considered HAProxy to be the most solid piece of infrastructure in their stack, but even it had a really nasty bug recently (search for request queue handling).

We treat our server farm very similarly to Shopify’s codebase. We are in this for the long haul and we cannot accept complex solutions when simple ones present themselves. Maintainability of our code and servers is paramount to the long term success of our product. Yes, the Mongrel setup worked very well, but Passenger allowed us to remove Nginx, HAProxy, runit and Monit. That’s a nice refactoring!

At the same time Passenger introduced some tangible improvements. We switched to Ruby Enterprise Edition to get the full benefit of its copy-on-write (COW) memory characteristics and we can absolutely confirm the memory savings of 30% some others have reported. This is many thousands of dollars of savings even at today’s hardware prices. We allow Passenger to adaptively spawn more processes with demand, but most of the time our application servers are running about 40 processes to handle more than a million dynamic requests a day. However, because Passenger constantly despawns and respawns Rails processes, they always stay fresh, run short GC cycles and are generally a lot more responsive. All this means that the total amount of memory used by Shopify during normal operations went from an average of 9GB to an average of 5GB. We evenly distributed the savings amongst more Shopify processes and more memcached space, which moved our average response time from 210ms to 130ms while traffic grew 30% in the last few months.

In conclusion: I cannot see any reason to choose a different deployment strategy at this point. It’s simple, complete, fast and well documented.

On the Existence of Struct::Group in Rails

Posted 5 days back at almost effortless

I ran into a really strange case yesterday while working to move an app from bj to delayed_job. I won't spend much time going into the details about why we're making this switch, but suffice to say that we had a problem similar to the one GitHub describes in their blog post. The problem is that bj reloads the entire Rails stack for every request, which is terribly inefficient. Imagine if you had to restart your web browser every time you went to a new page or submitted a form. You'd be paying a "startup tax" to launch your browser with every single request. It doesn't make sense architecturally, and it absolutely kills your CPU. The delayed_job plugin operates by leaving a single Rails instance open and available for processing requests asynchronously. It's proven to be much faster in my limited testing.

In making the move to delayed_job, I checked out the readme, which suggests structuring things like so:

 
class NewsletterJob < Struct.new(:text, :emails)
  def perform
    emails.each { |e| NewsletterMailer.deliver_text_to_email(text, e) }
  end
end
 
Delayed::Job.enqueue NewsletterJob.new('lorem ipsum...', Customers.find(:all).collect(&:email))
 

The idea here is that you can use a Struct to quickly create a class with a method named perform. When you enqueue a job for later, the perform method will be called with the parameters you provided. However cool this may be, it introduces a really interesting gotcha that I ran into almost immediately.

If your app has a Group model, you won't be able to use it within your perform method.

Why is that? Because of the way that Ruby namespaces work, the etc module, and the fact that something called Struct::Group already exists in your Rails app.

Perhaps a code example will help explain how this could happen:

 
require 'etc' # in Rails, rails/railties/lib/rails/mongrel_server requires 'etc' 
 
class Group
  def foo
    puts "hello"
  end
end
 
class WTF < Struct.new(:whatever)
  def foo
   Group.new.foo
  end
end
 
Group.new.foo
WTF.new.foo
 
# OUTPUT #
# hello
# NoMethodError: undefined method ‘foo’ for #<struct Struct::Group name=nil, passwd=nil, gid=nil, mem=nil>
 

The Group.new.foo call will work as expected, but the WTF.new.foo call will fail because it's calling the foo method on Struct::Group, which (surprisingly enough) exists, and doesn't have a method named foo. It exists because Rails has required the 'etc' module. This creates a couple of Structs on your behalf, which is the source of our problem.

Luckily, there's an easy workaround. If you prefix your calls to Group with two colons, you'll get access to the Group class that you expect. In our example, the foo method in WTF would be changed like so:

 
class WTF < Struct.new(:whatever)
  def foo
   ::Group.new.foo
  end
end
 

Totally weird. I know.
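For anyone who wants to poke at the lookup order without depending on 'etc', here is a minimal sketch of the same collision. Widget, Job and the label method are made-up names, and the constant is planted under Struct by hand with const_set to stand in for what 'etc' does with Group. A bare constant inside the class is searched for in the class's ancestors, and Struct comes before Object in that chain.

```ruby
# Top-level class, standing in for the app's Group model.
class Widget
  def label
    "top-level Widget"
  end
end

# Stand-in for the constant that 'etc' puts under Struct (an assumption
# made for illustration; 'etc' is not required here).
Struct.const_set(:Widget, Class.new do
  def label
    "Struct::Widget"
  end
end)

class Job < Struct.new(:payload)
  # Bare constant: resolved through Job's ancestors, where Struct is
  # checked before Object.
  def bare_lookup
    Widget.new.label
  end

  # Leading "::" starts the search at the top level instead.
  def rooted_lookup
    ::Widget.new.label
  end
end

puts Job.ancestors.index(Struct) < Job.ancestors.index(Object)
puts Job.new.bare_lookup
puts Job.new.rooted_lookup
```

The first line printed confirms that Struct sits ahead of Object in the ancestor chain, which is why the bare lookup lands on the wrong class.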

Myth #4: Rails is a monolith

Posted 5 days back at Loud Thinking

Rails is often accused of being a big monolithic framework. The charges usually contend that its intense mass makes it hard for people to understand the inner workings, thus making it hard to patch the framework, and that it results in slow running applications. Oy, let's start at the beginning.

Measuring lines of code is used to gauge the rough complexity of software. It's an easy but also incredibly crude way of measuring that rarely yields anything meaningful unless you apply intense rigor to the specifics. Most measurements of LOCs apply hardly any rigor and reduce what could otherwise be a somewhat useful indicator to an inverse dick measurement match.

Applying rigor to measuring LOCs in Rails
The measurements of LOC in Rails have not failed to live up to the low standards traditionally set for these pull-down-your-pants experiments. Let's look at a few common mistakes people commit when trying to measure the LOCs in Rails:

  • They count all lines including comments and whitespace in Ruby files, thus punishing well-documented and formatted code
  • They count tests, thus punishing well-tested code
  • They count bundled dependencies, thus punishing dependency-free code
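Avoiding the first mistake takes very little code. The following is a rough sketch of a counter that skips blank lines and comments; it is a line-based heuristic rather than a Ruby parser, it is not the tool used to produce the numbers in this post, and code_lines is a made-up name.

```ruby
# Count only lines that carry code. Blank lines, lines that start with "#",
# and =begin/=end comment blocks are skipped. Heuristic: a "#" line inside
# a heredoc would be skipped too.
def code_lines(source)
  in_block_comment = false
  source.each_line.count do |line|
    stripped = line.strip
    if in_block_comment
      in_block_comment = false if line.start_with?("=end")
      false
    elsif line.start_with?("=begin")
      in_block_comment = true
      false
    else
      !stripped.empty? && !stripped.start_with?("#")
    end
  end
end

sample = <<~RUBY
  # Sends the newsletter.
  class Mailer

    def deliver
      true # a trailing comment does not disqualify the line
    end
  end
RUBY

puts sample.lines.size   # 7 raw lines
puts code_lines(sample)  # 5 lines of code
```

Avoiding the other two mistakes is a matter of which directories you feed it: leave out the test and vendor directories.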

Now let's take a simple example of committing all these mistakes against a part of Rails and see how misleading the results turn out to be. I'm going to use Action Mailer as an example here:

  • 12,406 lines including comments, whitespace, tests, and dependencies
  • 7,912 lines including tests and dependencies
  • 6,409 lines including dependencies (t-mail and text-format)
  • 667 lines with none of the above

So the difference between committing all the mistakes and reality is a factor of 20. Even just the difference between committing the dependency mistake and reality is a factor of 10! In reality, if you were to work on Action Mailer for a patch, you would only have to comprehend a framework of 667 lines. A much less challenging task than digging into 12,406 lines.
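The factors are rounded. A quick sketch that divides each Action Mailer count quoted above by the 667-line core shows the actual ratios (the hash keys are just labels chosen for this example):

```ruby
# Action Mailer line counts as quoted in the list above.
counts = {
  everything:         12_406, # comments, whitespace, tests, dependencies
  tests_and_deps:      7_912,
  dependencies_only:   6_409,
  framework_only:        667
}

core = counts[:framework_only]

counts.each do |label, lines|
  ratio = (lines.to_f / core).round(1)
  puts format("%-18s %6d lines  %5.1fx", label, lines, ratio)
end
```

That gives roughly 18.6x for all the mistakes combined and 9.6x for the dependency mistake alone, which is where the "factor of 20" and "factor of 10" come from.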

Rails measured with all six of its major components, without the mistakes, is 34,097 lines, divided across Action Mailer at 667, Active Resource at 878, Active Support at 6,684, Active Record at 9,295, Action Pack at 11,117 (the single piece most web frameworks should be comparing themselves to unless they also ship as a full stack), and Rail Ties at 5,447.

Looking at the monolithic charge
That Rails is big in terms of lines of code is just one of the charges, though. More vague and insidious is the charge that Rails is monolithic. That it is one giant mass where all the pieces depend on each other and are intertwined in hard-to-understand ways. That it lacks coherence and cohesion.

First, Rails can include almost as much or as little of the six major pieces as you prefer. If you're making an application that doesn't need Action Mailer, Active Resource, or Active Record, you can swiftly cut them out of your runtime by uncommenting the following statement in config/environment.rb:

# config.frameworks -= [ :active_record, :active_resource, :action_mailer ]

Now you've reduced your reliance on Rails to the 23,248 lines in Action Pack, Active Support, and Rail Ties. But let's dig deeper and look at the inner workings of Action Pack and how much of that fits the monolithic charge.

Taking out the optional parts
The Action Controller part of Action Pack consists of 8,282 lines, which break down into two major halves: the essential stuff that's needed to run the bare minimum of controllers, and the optional stuff that adds specific features, which you could do without.

First the essentials of which there are 3,797 lines spread across these files and directories: base.rb, cgi_ext, cgi_ext.rb, cgi_process.rb, cookies.rb, dispatcher.rb, headers.rb, layout.rb, mime_type.rb, mime_types.rb, request.rb, response.rb, routing, routing.rb, session, session_management.rb, status_codes.rb, url_rewriter.rb.

The more interesting part is the optional parts of which there are 3,481 lines spread across these files and directories: assertions, assertions.rb, benchmarking.rb, caching, caching.rb, components.rb, filters.rb, flash.rb, helpers.rb, http_authentication.rb, integration.rb, mime_responds.rb, performance_test.rb, polymorphic_routes.rb, rack_process.rb, record_identifier.rb, request_forgery_protection.rb, request_profiler.rb, rescue.rb, resources.rb, streaming.rb, test_case.rb, test_process.rb, translation.rb, verification.rb.

All these optional parts can actually very easily be turned off as well, if you so please. If you look at actionpack/lib/action_controller.rb, you'll see something like the following:

ActionController::Base.class_eval do

include ActionController::Flash
include ActionController::Benchmarking
include ActionController::Caching
...

This is where all the optional bits are being mixed into Action Pack. But they didn't need to be. If you really wanted to, you could just edit this one file and remove the optional bits you didn't need, and you'd have some 3,500 lines of optional goodies to pick from.

For example, let's say you didn't need caching in your application. You comment the include ActionController::Caching line out and delete the associated files and that's 349 lines for the savings there. Or let's say that you don't like the flash, that's another 96 lines.

The reason many of these pieces can be optional is because of a wonderful part of Active Support called alias_method_chain. With alias_method_chain, you can latch on to a method to embellish it with more stuff. For example, the Benchmarking module uses alias_method_chain like this to hook into perform_action and render:


module Benchmarking
  def self.included(base)
    base.extend(ClassMethods)

    base.class_eval do
      alias_method_chain :perform_action, :benchmark
      alias_method_chain :render, :benchmark
    end
  end
  # ... (rest of the module not shown)
end

ActionController::Base declares render and perform_action, but doesn't know anything about benchmarking (why should it?). The Benchmarking module adds in these concerns when it's included, similar to how aspects work. So as you can see, alias_method_chain is a great enabler for clearly defined modules in Rails.
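Stripped of Rails, the trick boils down to two alias_method calls: keep the original method under a "without" name, then point the original name at the "with" version. The sketch below spells that out in plain Ruby so it runs on its own; TinyController, its render method and last_render_seconds are invented for the example and are not Action Controller code.

```ruby
module Benchmarking
  def self.included(base)
    base.class_eval do
      # What alias_method_chain :render, :benchmark amounts to:
      alias_method :render_without_benchmark, :render
      alias_method :render, :render_with_benchmark
    end
  end

  attr_reader :last_render_seconds

  # Wraps the original render and records how long it took.
  def render_with_benchmark(*args)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = render_without_benchmark(*args)
    @last_render_seconds = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    result
  end
end

# Hypothetical stand-in for ActionController::Base. It knows nothing
# about benchmarking.
class TinyController
  def render(text)
    "rendered: #{text}"
  end

  # Must come after render is defined, so there is something to alias.
  include Benchmarking
end

controller = TinyController.new
puts controller.render("hello")
puts controller.last_render_seconds.class
```

Remove the include line and TinyController behaves exactly as before, which is the property that makes these modules optional.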

All the other frameworks in Rails work in a similar fashion. There's a handful of essential parts and then a handful of optional parts, which can use alias_method_chain if they need to decorate some of the essential pieces. This means that the code is very well defined and you can look at just a single piece in isolation.

But why on earth would you bother?
The analysis above of how you can bring Action Controller down to some 3,500 lines carefully side-stepped one important question: Why would you bother? And that's an answer I don't quite have for you.

The important part about being modular is that the pieces are understandable in isolation. That the individual modules have coherence and cohesion. Not that they're actually handed to you as a puzzle for you to figure out how to put together.

I'd much rather give someone a complete picture, which they can then turn into a puzzle if they're so inclined. As I've shown you above, it's actually really simple to deconstruct the frameworks in Rails and you can make them much smaller really easily if you decide that's a good use of your time and energy.

See the Rails Myths index for more myths about Rails.

Rails/jQuery UI sortables with single UPDATE query

Posted 5 days back at The Pug Automatic

I just wrote some sortable code for a Rails/jQuery app and figured I would blog just how little code it takes, and also the single MySQL query I used on the backend.

I have a #images div containing several .image divs. I want the .image divs to be drag-and-drop sortable, and for the ordering to be persisted to the database (in a column named "ordinal").

The JavaScript sorting, using Sortables from jQuery UI:

$('#images').sortable({items:'.image', containment:'parent', axis:'y', update: function() {
  $.post('/admin/images/sort', '_method=put&authenticity_token='+AUTH_TOKEN+'&'+$(this).sortable('serialize'));
}});

So my .image divs are sortable within their containing #images, and can only be dragged on the y axis (up and down). When the sorting is done, an Ajax request is sent to /admin/images/sort. The AUTH_TOKEN bit is Rails CSRF protection – see this post for more details and another way of handling it.

The Ajax request contains params like image[]=3&image[]=1&image[]=2, reflecting the order. The parameter name and values are taken from the element ids (e.g. "image_1").

I route the path:

admin.resources :images, :collection => { :sort => :put }

Then make a controller action:

def sort
  order = params[:image]
  Image.order(order)
  render :text => order.inspect
end

What's rendered isn't important, but you should render something or you get a 404.

The model method is just this:

# Set passed-in order for passed-in ids.
def self.order(ids)
  update_all(
    ['ordinal = FIND_IN_SET(id, ?)', ids.join(',')],
    { :id => ids }
  )
end

This generates a query like

UPDATE images SET ordinal = FIND_IN_SET(id, "3,1,2") WHERE id IN (3,1,2)

which sets the ordinal column to the position of the record id in that set.
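If FIND_IN_SET is new to you: it returns the 1-based position of a value in a comma-separated string, or 0 if the value isn't there. Here is a small Ruby emulation, just to show which ordinal each id ends up with. The find_in_set helper is written for this example; it is not part of Rails or any MySQL client library.

```ruby
# Emulates MySQL's FIND_IN_SET(needle, 'a,b,c'): 1-based position of
# needle in the comma-separated list, 0 when it is absent.
def find_in_set(needle, csv)
  position = csv.split(",").index(needle.to_s)
  position ? position + 1 : 0
end

ids = %w[3 1 2]          # params[:image] after the drag-and-drop
set = ids.join(",")      # "3,1,2", the string bound to the ? placeholder

ordinals = ids.to_h { |id| [id.to_i, find_in_set(id, set)] }

ordinals.each do |id, ordinal|
  puts "image #{id} gets ordinal #{ordinal}"
end
```

So image 3 gets ordinal 1, image 1 gets ordinal 2 and image 2 gets ordinal 3, all in one statement rather than one UPDATE per row.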

Whenever I need the images ordered, I just make sure they're sorted by ordinal ASC, created_at ASC.

That's all the code it takes.

Speeding Up Rails Development

Posted 5 days back at Jim Neath

Over the last few months I’ve realised that the speed at which I develop new projects is a lot quicker than it used to be. So I thought I’d share some of the things I’ve learned and also some quite obvious things (to me at least).

Use a Base Application

I’m obviously going to be horrifically biased due to the fact that I helped to develop Bort, but I think that base apps are the way to roll. They save you about half a day’s worth of development and let you get straight into developing your application rather than fucking around doing the same monotonous stuff every time.

So here’s a run down of base apps floating around:

I haven’t used any of these apart from Bort, so I can’t really give you any opinion, but everything I’ve seen by Thoughtbot and James Golick has always been awesome. Just look through them and find which one suits your needs.

I would like to end this section with a nice graph taken from Rails Rumble Observations, part II :)

Bort

Write Your Own Scaffold Generator

The default Rails scaffold generator is alright for prototyping an app but let’s face it, you wouldn’t use it for everything. So why don’t you make your own that you can use for everything? At the start of the last project we worked on, we spent 2-3 days working on a scaffold generator that would help to generate parts of the admin.

We made the generator generate all the search stuff, add sortable tables, generate basic specs and a whole bunch of other awesome stuff. Now we can get an awesome admin section set up for a model by running one line from the terminal.

This must have saved us at least a week’s worth of time. Time that we can now spend making sure that the rest of the site is as brilliant as possible. With the extra time, you could take it easy, or you could add extra features, improve the UI, whatever. Keep it RESTful, kids.

Use a Form Builder

I hate forms. No secret there. But alas, nearly every application you’ll develop needs to have forms. I wrote a custom form builder for the chaps at Fudge and it saves us a hell of a lot of time.

Now instead of writing something like the following:

<% form_for @story do |f| %>
  <%= f.error_messages %>
  <fieldset>
    <legend>Story Details</legend>
    <ol>
      <li>
        <%= f.label :title %>
        <%= f.text_field :title %>
      </li>
      <li>
        <%= f.label :body, 'Content' %>
        <%= f.text_area :body %>
      </li>
    </ol>
    <div class="buttons">
      <%= f.submit 'Create' %>
    </div>
  </fieldset>
<% end %>

Using our form builder we write:

<% form_for @story do |f| %>
  <%= f.error_messages %>
  <% f.field_set "Story Details" do %>
    <%= f.text_field :title %>
    <%= f.text_area :body, :label => 'Content' %>
  <% end %>
  <%= f.submit 'Create' %>
<% end %>

Now imagine you’ve got to write close to 50 forms for an application. Can you guess which one saves you time? Which one is more enjoyable to use? You got it.
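To give a feel for what such a builder does under the hood, here is a stripped-down sketch in plain Ruby. It is not our form builder and it doesn't touch the Rails FormBuilder API; ListFormBuilder and its helpers are invented for this example, and it skips things a real one needs, like HTML escaping and field values. The idea is simply that each field helper emits its own list item and label, so the template doesn't have to.

```ruby
# Hypothetical stand-in for a custom form builder: every field helper
# wraps its input in <li> plus <label>. Sketch only, no HTML escaping.
class ListFormBuilder
  def initialize(object_name)
    @object_name = object_name
  end

  def text_field(attribute, options = {})
    wrap(attribute, options,
         %(<input type="text" id="#{dom_id(attribute)}" name="#{field_name(attribute)}" />))
  end

  def text_area(attribute, options = {})
    wrap(attribute, options,
         %(<textarea id="#{dom_id(attribute)}" name="#{field_name(attribute)}"></textarea>))
  end

  private

  # Label text defaults to the attribute name unless :label is given.
  def wrap(attribute, options, control)
    label = options.fetch(:label) { attribute.to_s.capitalize }
    %(<li><label for="#{dom_id(attribute)}">#{label}</label>#{control}</li>)
  end

  def dom_id(attribute)
    "#{@object_name}_#{attribute}"
  end

  def field_name(attribute)
    "#{@object_name}[#{attribute}]"
  end
end

f = ListFormBuilder.new(:story)
puts f.text_field(:title)
puts f.text_area(:body, :label => 'Content')
```

All the repetitive markup lives in wrap, so changing how every form in the app is laid out means changing one method.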

Now while I wouldn’t say our form builder is ready for the general coding public (it isn’t), there are still a few out there.

I have used Semantic Form Builder by RubyPond before and it also happens to be the one we based our form builder on.

Build a Populate Rake Task

We started using Populator/Faker a couple of months ago and this is probably one of our biggest time savers. It’s a pain in the ass adding test data into your applications.

Ryan Bates has made a great railscast on how to use Populator along with Faker to generate fake data using a rake task so I’ll leave it to his awesome video to tell you all about it.

There are also a couple of other options out there for generating fake data: the random-data gem and the Forgery plugin.

Peter Cooper has a more thorough run down of all three options over at Rails Inside.

Use Plugins/Gems

This should really go without saying, but I’ve seen a few people trying to write (poor) code for tasks that have already been solved, tested and improved on.

Gems and plugins are probably your biggest time saver. One of the things I love about the ruby community is that a lot of people give back to it.

If you have a problem, have a look on the awesome GitHub and see if there’s a plugin/gem floating around that looks like it could solve your problem. Try it out. If it works, brilliant. If not, see if you can fix it and improve the original code. Then if someone else has the same problem, they can use the plugin. If everyone helps out, we all have an easier job, we can do less work and enjoy life more.

Seriously, Just Buy a Fucking Mac

Just do it. Stop making excuses. I was a Windows user for about ten years but mainly because I didn’t know any better. I now work full time on a mac, both at home and at work, and there’s not a thing you could do to make me go back to Windows.

Windows simply won’t do a lot of things that you’ll want to do. Background jobs? Not a chance. Git? oh yeah, you can use msysgit but who the fuck wants to open up a separate program just to use git? Fuck off Windows. You’re slow and you suck.

Why get a mac? Rails runs faster. You can use the best text editor around, TextMate. You can install all those gems and plugins that all say: “This won’t work on Windows”.

Think getting a mac is too expensive? Get a low spec mac mini for $599. That’s what I started using and even though it’s low spec I never had a problem with it. You can use your USB keyboard, mouse and your monitor from your Windows machine. Still think it’s too much? Have a look on Amazon… Preowned Mac Mini for $350

So do you, my lovely readers, have any more suggestions/tips to speed up your development?

Self Clearing Floats in CSS

Posted 5 days back at Jim Neath

This is a quickie mainly to remind myself. My friend and fellow Fudge developer, Mike “1312” Byrne showed me a CSS trick to have divs clear themselves.

div#container
{
  height: 1%;
}

div#container:after
{
  content: ".";
  display: block;
  height: 0;
  clear: both;
  visibility: hidden;
}

Brilliant. This works in Safari, Firefox and IE6+ as far as I know.

new plugin: acts_as_git

Posted 6 days back at ~:caboose :: the ruby on rails developer underground

With the help of Jamie van Dyke at Parfait and Scott Chacon at GitHub, I'm pleased to announce Acts As Git (no, I don't like the name either). It's a simple plugin which stores all changes you make to a text field in a git repository. This is ideal for something like a git-backed wiki.

From the README:

ALG automagically saves the history of a given text or string field. It sits over the top of an ActiveRecord model; after a value is committed to the database, the plugin writes the new value to a text file and commits it to a git repository. This way you get all the advantages of using Git as version-control.

Usage:

class Post < ActiveRecord::Base
  versioning(:title) do |version|
    version.repository = '/home/git/repositories/postal.git'
    version.message = lambda { |post| "Committed by #{post.author.name}" }
  end
end

To view the complete list of changes:

>> @post = Post.find 15
<Post:15>
>> @post.title
=> 'Freddy'
>> @post.history(:title)
=> ['Joe', 'Frank', 'Freddy']
>> @post.log
=> ['bfec2f69e270d2d02de4e8c7a4eb2bd0f132bdbb', '643deb45c12982dde75ba71657792a2dbdda83e6', 
'1ce6c7368219db7698f4acc3417e656510b4138d']
>> @post.revert_to '1ce6c7368219db7698f4acc3417e656510b4138d'
>> @post.title
=> 'Joe'

It uses the excellent Grit library, and doesn't actually have a checked-out repository. The latest version of your data is still stored in the database. You can actually clone this repo and view the changes; pushing back to it won't do anything useful.

RubyConf 2008 - Ruby on Rails Podcast

Posted 6 days back at Ruby on Rails Podcast

Conversations from RubyConf 2008. With Matt Aimonetti of Merb, Blake Mizerany of Sinatra, and Josh Peek of Rails.

Sponsor

Duncan Prints

Posted 6 days back at Mike Clark

I've long been a fan of James Duncan Davidson's photography. He's probably best known for his exceptional work as the official shooter at O'Reilly software conferences. You may not see him, but he's always there. Indeed, Duncan is the lens through which many of us have truly seen the emotion of a keynote speaker or the excitement of an attendee. But what I admire most about Duncan is his versatility. He can capture amazing photos in the worst possible indoor lighting conditions all day, and then snap a gorgeous nature scene on his road trip home. He makes it look so easy. If only he knew how much he's cost me in camera gear. :-)

Earlier this year, Duncan started selling some of his non-conference prints online. I had already fallen in love with a couple of the photos he chose, and I was thrilled by the prospect of owning a print copy. But I must admit, I've never ordered print photography online, and frankly I was hesitant to try it. Part of my hesitation was just the unknown process of going from a digital image I bought online to a print copy hanging on my wall. And I was also a bit worried that some degree of quality might be sacrificed along the way. After all, when you buy art, you're making an investment.

I'm extremely happy to report that I'm now the proud owner of two Duncan prints. The process was super easy and the quality is absolutely amazing! Rather than tell you how it all works, I thought I'd show you.

First, you go to the Zenfolio storefront and pick your favorite Duncan photos. This is art; you get to pick the photos that speak to you.

Duncan Prints

If you've followed Duncan's road trips over the last few years, you'll recognize many of these shots. There's a story behind every picture. Duncan has told some of the stories in his blog. I would encourage you to find the story behind the picture you like, perhaps the next time you see Duncan at a conference. Knowing the photographer and hearing his story makes the photo more valuable to you.

The second decision you need to make is the size of each print you picked:

Duncan Prints

The size you choose might be as simple as the size of your wall or the size of your budget. Remember that this is the print size, and does not account for framing. In general, a frame will add 3-4" to each side of a print. In my case, I used a tape measure to estimate which size fit on my office wall with plenty of room to breathe. I ended up selecting one 16" x 24" print and one 12" x 18" print (displayed below).

Once you've picked prints and sizes, the checkout process is super easy (and does not require an account):

Duncan Prints

A couple days later, your prints show up on your doorstep:

Duncan Prints

I ordered on a Tuesday and the prints arrived on Friday. For some reason I expected the shipping to be expensive, and was surprised that USPS Priority Mail was only $5.45. (FedEx Next Day was also available for $10.25.) Although you order the prints through Zenfolio, they use Mpix as their print and fulfillment partner:

Duncan Prints

This single box contained both of my prints in a rigid cardboard structure:

Duncan Prints

I was very impressed with the care with which the prints were packaged and shipped. Inside the box, each picture is lightly affixed to a cardboard backing and carefully wrapped in a plastic sleeve:

Duncan Prints

The prints themselves are absolutely gorgeous!

Duncan Prints

Seriously, Duncan is meticulous when it comes to print color correction. These images don't do it justice. My jaw dropped when I first saw this print, and I'm still amazed every day when I look at it.

Once you've received your prints, you'll want to get them framed and matted. Now, I'm certainly no pro at this. And thankfully you don't have to be either. I just left my prints unframed for a few days to enjoy their natural color. Then I took the prints to a local frame shop and asked the friendly gal: "If these were your prints, how would you frame and mat them?" She seemed to appreciate that I asked for her advice. And after taking in the prints for a few minutes, she drew my attention to the subtle colors. Then she recommended a few framing and matting combos that would complement the colors and frame each print nicely. Of course, I'd been thinking about ideas too, so we went through a couple iterations. I ended up framing both prints for around $200 (thanks in part to a sale at the frame shop). It was actually a lot of fun!

To give you some perspective on the sizes, here's the 16" x 24" print:

Duncan Prints

And here's the 12" x 18" print:

Duncan Prints

I haven't quite decided where to hang each one yet. When I placed the order I thought I knew where they would go, but I'm so happy with how they turned out that now I can't commit to just one place. :-)

I hope seeing these pictures fills in some of the unknowns you might have about ordering Duncan's prints online. He's really done a fantastic job making these prints easily accessible while at the same time staying true to the quality of his work. If you like Duncan's photography, this is a great way to support his work!

What Is Wrong With Ruby's Net::HTTP?

Posted 6 days back at InfoQ

Net::HTTP in Ruby 1.8.6, the current version, has serious performance problems caused by the way it is implemented. Luckily, Ruby 1.9's implementation performs much better. By Mirko Stocker

Microsoft Access .MDB to PostgreSQL db

Posted 6 days back at almost effortless

So recently, while working with an ancient, hastily written calendar application, it fell to me to pry a decade's worth of data loose from a .mdb file.

And while the original specifications for the project only called for a csv for each of the database's tables, scope began to creep as the client realized that he needed his data to be reformatted and, in some cases, recast. Due to the persnickety nature of this client and the horrifyingly random nature of his data (some dates were MM/DD/YY while others were a legit timestamp; some commas were escaped, others were not, etc.), I realized that I was going to have to implement an industrial strength solution: something that would scale. So rather than writing a series of one-off python scripts (using the totally kickass csv module), I decided to get the data into a Postgres database and then query this database as necessary.

First, I'll lay out (most of) my program and then I'll do a blow-by-blow, dwelling briefly on certain important or noteworthy parts.

Remember, what we're doing here is grabbing one table at a time and stuffing it into a Postgres database (and maybe doing a little string sanitizing and type casting as we go). Anything more automated might not work, given the totally messed-up nature of the data and of the .mdb format in general.

Here goes:

#!/usr/bin/env python

from cStringIO import StringIO

import csv, os, psycopg2, re, subprocess, sys, time

dbHost = "localhost"
dbUser = "toconnell"
dbPass = "XXXXXXXXXX"
dbName = "toconnell"

# This program is run from the CLI: arguments one and two are the .mdb file
# and the table in that database that we're trying to import
mdbFile = sys.argv[1]
mdbTable = sys.argv[2]

mdbDump = "/usr/bin/mdb-export"
# Dynamically name the table, depending on the day the import is run
tableName = "%s_%s" % (os.path.basename(mdbFile.replace(".","_").lower()),time.strftime("%Y_%m_%d"))

def dumpMDB(mdbFile):
    command = [mdbDump,mdbFile,mdbTable]
    p = subprocess.Popen(command,stdout=subprocess.PIPE)
    mdbData = [line for line in p.stdout.readlines()]

    # Grab the column names
    columns = mdbData.pop(0)

    # Now write a CSV to the buffer
    for line in mdbData:
        tmpFile.write(line)

    # Now use the csv module to make a list where each line is a list
    tmpFile.seek(0,0)
    importData = csv.reader(tmpFile)
    dataList = []
    dataList.extend(importData)
    tmpFile.close()

    return columns,dataList

def createTable(columns):
    # Take this string from the pretend CSV file, break it up and remake it as something like a query
    columnsList = []
    for item in columns.split(","):
        item = item.strip().lower()
        # Also do a little type casting in there (beats having to do it later)
        if item == "date":
            columnsList.append(item + " TIMESTAMP")
        else:
            columnsList.append(item + " TEXT")

    columns = ",".join(columnsList)
    print columns

    # Now connect and do table stuff
    conn = psycopg2.connect("dbname=%s user=%s host=%s password=%s" % (dbName,dbUser,dbHost,dbPass))
    cursor = conn.cursor()

    # Check for previous tables:
    cursor.execute("SELECT * FROM pg_tables WHERE tablename LIKE '%%%s%%'" % tableName)
    results = cursor.fetchone()

    if results is not None:
        # Drop the previous one:
        cursor.execute("DROP TABLE %s" % tableName)

    # Create a new one, whether or not an old one was just dropped
    # (otherwise a re-run would leave no table to insert into):
    cursor.execute("CREATE TABLE %s (%s)" % (tableName,columns))
    conn.commit()

def populateTable(columns,query):
    conn = psycopg2.connect("dbname=%s user=%s host=%s password=%s" % (dbName,dbUser,dbHost,dbPass))
    cursor = conn.cursor()
    cursor.execute("INSERT INTO %s(%s) VALUES(%s)" % (tableName,columns,query))
    conn.commit()

def sanitize(query):
    # You can do as many of these as necessary: I've included a single one as an example
    finalList = []
    for item in query:
        item = item.replace("'","\\'")
        finalList.append(item)
    return finalList

if __name__ == "__main__":
    # First, instantiate our pretend csv file
    tmpFile = StringIO()

    # Now get the data
    columns,dataList = dumpMDB(mdbFile)

    # Now reformat the "columns" string a little bit
    columnsList = [item.strip().lower().replace(" ","") for item in columns.split(",")]
    columns = ",".join(columnsList)

    # Now create a new table (delete the previous one)
    createTable(columns)

    # Finally, insert data, reporting progress on the CLI (better than doing
    # select count(*) every 15 seconds)
    total = len(dataList)
    n = 1
    for item in dataList:
        item = sanitize(item)
        formatQuery = "'" + "','".join(item) + "'"
        populateTable(columns,formatQuery)
        print "Inserting data %s/%s..." % (n,total)
        n += 1
    print "Done."

In my opinion, there are two noteworthy aspects of the above: the use of the csv module and the use of cStringIO to create a csv file in the buffer (instead of simply creating a file and deleting it).

In order to better explain those two aspects of the program, here's that function again, broken into more easily digested pieces and presented with an in-line, blow-by-blow commentary:

def dumpMDB(mdbFile):
    command = [mdbDump,mdbFile,mdbTable]
    p = subprocess.Popen(command,stdout=subprocess.PIPE)
    mdbData = [line for line in p.stdout.readlines()]

Nothing special here: I do some pretty standard subprocess syntax that executes mdb-export (a part of the mdbtools package on Debian) on my file and dumps a table in csv format, one line at a time, into a list. Normally the program would just dump them to stdout: I'm just grabbing them up with subprocess's Popen function.

    # Grab the column names
    columns = mdbData.pop(0)

Now I do a list.pop(0) on this list to get the first line (i.e. the column names) of what I just dumped with mdb-export; using the built-in pop function without an integer gets you the last item in your list (in 2.5, at least).
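A quick illustration of the difference between the two calls, as a minimal sketch. The list contents are made up and merely stand in for what mdb-export might print:

```python
# Hypothetical header-plus-rows list, standing in for the mdb-export output
lines = ["id,date,title\n", "1,01/02/03,Lunch\n", "2,04/05/06,Dinner\n"]

header = lines.pop(0)  # with an index of 0: removes and returns the first item
last = lines.pop()     # with no index: removes and returns the last item

print(repr(header))
print(repr(last))
print(lines)           # whatever is left after both pops
```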

    # Now write a CSV to the buffer
    for line in mdbData:
        tmpFile.write(line)

    # Now use the csv module to make a list where each line is a list
    tmpFile.seek(0,0)
    importData = csv.reader(tmpFile)
    dataList = []
    dataList.extend(importData)
    tmpFile.close()

    return columns,dataList

This isn't fancy code. I'm essentially creating a CSV file and reading each of its lines into a list. I do this instead of simply using the list I generated with the subprocess call because I'm nervous about splitting the strings that compose that list: as I mentioned above, there is a very real possibility that a given string will contain unescaped commas and other inappropriate characters within it and I don't even want to have to contemplate how to split those strings correctly.

What I want to do is trust the csv module to figure that out for me.
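To make the worry concrete, here is a small sketch comparing a naive split with the csv module on a single invented row (written for Python 3, where io.StringIO plays the role cStringIO plays above). Note the limits: the csv module copes with commas inside quoted fields, but it can't rescue a comma that is neither quoted nor escaped.

```python
import csv
import io

# Made-up row: the second field contains a comma inside quotes
line = '7,"Lunch, then a meeting",01/02/03\n'

# Naive approach: split on every comma, quoted or not
naive = line.strip().split(",")

# csv module: understands that the quoted field is a single value
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # the quoted field gets cut in two
print(parsed)  # the quoted field stays in one piece
```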

So, in order to make that happen, I need to give the csv module a CSV file it can read. Rather than writing a file to the filesystem (which no self-respecting sysadmin will do if he can help it), I need to make some cStringIO magic happen. And in order to make that buffer magic happen, I need to do three things:

  1. import cStringIO,
  2. instantiate the file-like object and
  3. do that weird seek to it.

The import (from cStringIO import StringIO) is fairly simple; if you've ever used cStringIO for anything, you've done this. The instantiation (tmpFile = StringIO()) is also straight out of the documentation.

But once I write all of the lines of my list to the cStringIO buffer, I've got to do

tmpFile.seek(0,0)

in order to then read from that file-like object with python's built-in csv module. The reason is less arcane than it looks: every write leaves the buffer's position at the end of the data, so a reader that starts from there finds nothing left to read. The seek(0,0) moves the position back to the very start of the buffer, and once you do the seek, you're ready to "read" your file-like object with the csv module.
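The effect of the seek is easy to see in isolation. A minimal sketch, using Python 3's io.StringIO (assumed here as the modern counterpart of cStringIO) and invented data: reading straight after the writes starts from the end of the buffer and yields no rows, while reading after a rewind yields everything.

```python
import csv
import io

buf = io.StringIO()
buf.write("id,title\n")
buf.write('1,"Lunch, then a meeting"\n')

# After the writes the position is at the end of the buffer,
# so the reader has nothing to iterate over
rows_without_seek = list(csv.reader(buf))

# Rewind to the start; seek(0) is the same as seek(0, 0)
buf.seek(0)
rows_with_seek = list(csv.reader(buf))

print(rows_without_seek)
print(rows_with_seek)
```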

From there, it's a simple matter of using the reader() function of the csv module to get your CSV data from your cStringIO object, manipulating it however you see fit and then looping over it and inserting it in your database.

And then never, ever having to work with (i.e. around) MS Access again.
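A footnote on the insert step: one alternative to escaping quotes by hand is to let the database driver bind the values itself. The sketch below is an assumption-laden stand-in, not the script above: it uses the standard library's sqlite3 with an in-memory database instead of Postgres, and the table, columns and rows are invented. With psycopg2 the idea is the same, except that the placeholders are written %s rather than ?.

```python
import sqlite3

# In-memory SQLite database, standing in for the Postgres one
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (title TEXT, date TEXT)")

# Invented rows, including a value with an apostrophe in it
rows = [["Bob's lunch", "01/02/03"], ["Dinner", "04/05/06"]]

# The driver binds each value; no manual quoting or escaping needed
conn.executemany("INSERT INTO events (title, date) VALUES (?, ?)", rows)
conn.commit()

stored = conn.execute("SELECT title, date FROM events ORDER BY rowid").fetchall()
print(stored)
```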

Implementing a simple logging IRC bot

Posted 6 days back at TechnicalPickles :: Blog posts

For Boston.rb, we've been using a public Campfire provided by thoughtbot as a backchannel during meetings. It's nice and all, except it doesn't provide proper logging. For example:

  • If you come into the channel, there's no way to see what had been previously said.
  • If you lose your connection at some point, like if you close your laptop and take it home, all the discussion you've seen goes away.
  • If you were to try to save the conversation by copying and pasting, you'd have to massage the copied text a bit to make it less whitespacy.

Little known is that there's actually a protocol out there dedicated to chatting across the series of tubes. It's called internet relay chat (aka IRC).

Now, IRC itself doesn't solve these problems. The advantage is that IRC is an open protocol and has been around a long time. As a result, there are many libraries out there for interacting with it.

I came across one such library, isaac. In a nutshell, it's a DSL akin to Sinatra, but for IRC bots instead of web applications.

Within 15 minutes of finding this library, I managed to whip up a simple bot which just sits in a channel, and logs it to a file. Check it:

require 'rubygems'
require 'isaac'

config do |c|
  c.username = "bostonrbot"
  c.realname = "bostonrb logger bot"
  c.nick    = "bostonrbot"
  c.server  = "irc.freenode.net"
  c.port    = 6667
end

on :connect do
  join "#boston.rb"
end

on :channel, /.*/ do
  open("#{channel}.log", "a") do |log|
    log.puts "#{nick}: #{message}"
  end

  puts "#{channel}: #{nick}: #{message}"
end

For our user group, I plan on moving our backchannel to IRC, just running this bot during our meetings, and then emailing the log out afterwards. I could automate more, but this is a really simple first pass.

