Episode #483 - July 25nd, 2014

Posted about 7 hours back at Ruby5

Rails Rumble, Debug Anything and Speeding up Rails

Listen to this episode on Ruby5

Sponsored by NewRelic

You should be using New Relic by now but do you regularly checkout their blog?

Rails Rumble

Rails Rumble 2014 will take place on the weekend of October 18th & 19th
Rails Rumble

How to Debug Anything

James Golick gave a talk at GoRuCo on how to debug anything
How to Debug Anything

3 Ways to Create Classes in Ruby

THUVA THARMA wrote a classy blog post on the many different ways to create a class
3 Ways to Create Classes in Ruby

Clone and Dup

Aaron Lasseigne just wrote a blog post Clone and Dup
Clone and Dup

Speedup Your Code by Aaron Patterson

Aaron Patterson from RedDotRuby 2014 on how he's been Speeding up rails
Speedup Your Code by Aaron Patterson


Thank you for listening to Ruby5. Be sure to tune in every Tuesday and Friday for the latest news in the Ruby and Rails community.

Phusion Passenger 4.0.48 released

Posted 1 day back at Phusion Corporate Blog

Phusion Passenger is a fast and robust web server and application server for Ruby, Python, Node.js and Meteor. Passenger takes a lot of complexity out of deploying web apps, and adds powerful enterprise-grade features that are useful in production. High-profile companies such as Apple, New York Times, AirBnB, Juniper, American Express, etc are already using it, as well as over 350.000 websites.

Phusion Passenger is under constant maintenance and development. Version 4.0.48 is a bugfix release.

Phusion Passenger also has an Enterprise version which comes with a wide array of additional features. By buying Phusion Passenger Enterprise you will directly sponsor the development of the open source version.

Recent changes

4.0.47 was a hotfix release for an Enterprise customer. The changes in 4.0.47 and 4.0.48 combined are as follows.

  • Fixed a race condition while determining what user an application should be executed as. This bug could lead to applications being run as the wrong user. Closes GH-1241.
  • [Standalone] Improved autodetection of Rails asset pipeline files. This prevents Standalone from incorrectly setting caching headers on non-asset pipeline files. Closes GH-1225.
  • Fixed compilation problems on CentOS 5. Thanks to J. Smith. Closes GH-1247.
  • Fixed compilation problems on OpenBSD.
  • Fixed compatibility with Ruby 1.8.5.

Installing or upgrading to 4.0.48

OS X OS X Debian Debian Ubuntu Ubuntu
Heroku Heroku Ruby gem Ruby gem Tarball Tarball


Fork us on Github!Phusion Passenger’s core is open source. Please fork or watch us on Github. :)

<iframe src="http://ghbtns.com/github-btn.html?user=phusion&amp;repo=passenger&amp;type=watch&amp;size=large&amp;count=true" allowtransparency="true" frameborder="0" scrolling="0" width="170" height="30"></iframe><iframe src="http://ghbtns.com/github-btn.html?user=phusion&amp;repo=passenger&amp;type=fork&amp;size=large&amp;count=true" allowtransparency="true" frameborder="0" scrolling="0" width="170" height="30"></iframe><iframe src="http://ghbtns.com/github-btn.html?user=phusion&amp;type=follow&amp;size=large&amp;count=true" allowtransparency="true" frameborder="0" scrolling="0" width="190" height="30"></iframe>

If you would like to stay up to date with Phusion news, please fill in your name and email address below and sign up for our newsletter. We won’t spam you, we promise.

An Explained psqlrc


Let’s walk through my short psqlrc(5) to see what I’ve set, and to inspire you to find your own configuration that fits into your workflow. Here is my complete psqlrc:

\set ON_ERROR_ROLLBACK interactive
\set HISTFILE ~/.psql/history- :DBNAME

\pset pager off
\pset null '(null)'

PostgreSQL’s shell, psql(1), can be configured using \set and \pset. \pset is for changing the output format — HTML, pager, field separator, and so on — while \set is for everything else.


The ON_ERROR_ROLLBACK settings affects how psql(1) handles errors. The default value is off.

When this setting is on, errors are effectively ignored at all times. So if you have this script, slint.sql:

CREATE TABLE members (id SERIAL, name TEXT);
INSERT INTO member (name) VALUES ('David Pajo');
INSERT INTO members (name) VALUES ('Brian McMahan');

And run it from the command line:

psql -f slint.sql

You would end up with a members table with Brian McMahan but without David Pajo.

When it is set to off, the default, then you get nothing: no members table and no Brian McMahan. It either all works or it doesn’t, just like a transaction should.

There is a third value: interactive. Under interactive, the above command in which statements are piped into psql(1) non-interactively is treated like off, but if you type them into the interactive prompt it is treated like on. This gives you a chance to fix things without starting over:

$ psql
bands=# BEGIN;
bands=# CREATE TABLE members (id SERIAL, name TEXT);
bands=# INSERT INTO member (name) VALUES ('David Pajo');
ERROR:  relation "member" does not exist
LINE 1: INSERT INTO member (name) VALUES ('David Pajo');
bands=# INSERT INTO members (name) VALUES ('David Pajo');
bands=# INSERT INTO members (name) VALUES ('Brian McMahan');
bands=# COMMIT;


Some people format their SQL with uppercase keywords; others go downcase. Some mix and match depending on their mood. psql(1) handles that!

Possibly the greatest feature of any shell is tab completion, and psql(1) doesn’t disappoint. However, there’s a question of which case it should use to complete keywords.

The straight-forward thing to do is to set it to lower or upper.

sel tab completes to SELECT

But even fancier are preserve-lower and preserve-upper, with preserve-upper as the default. These preserve whatever case you were using, falling back to lower (or upper). For example:

preserve the case but default to upper

There, up was completed to update and S was completed to SET, preserving the case as the user typed it; n was completed to name, preserving the case of the object in the database; and the space after order was completed to BY, favoring uppercase when the user has typed nothing.


Like any good shell, psql(1) will save the commands you have entered so you can run them again (it’s full Readline; try a ^R some time). By default it stores the history in ~/.psql_history, but we can do better than that.

To start, let’s introduce another psql(1) command: \echo

bands=# \echo hello
bands=# \echo :DBNAME 

The variable :DBNAME is automatically set to the name of the database and available to all psql(1) commands. There are other pre-set variables like :PORT, :HOST, :USER, :ENCODING, and so on, but we’re going to use :DBNAME to start.

It just so happens that psql(1) will concatenate strings for you, so if you want different history for each database (the queries against the desserts table won’t make sense in the zoology database, for example), you can set that up:

\set HISTFILE ~/.psql_history- :DBNAME

You can combine these as much as you please, such as:

\set HISTFILE ~/.psql_history- :USER - :HOST - :PORT - :DBNAME


The pager is the program that paginates text. The classic is more(1), and the improvement is less(1). Puns.

The default value for the pager setting is on which — unlike the name suggests — only uses the pager sometimes. A few lines are shown without a pager, but 25 or more lines invoke pagination. (Specifically, if the text would scroll off the screen, it invokes the pager.)

To always have a pager, use the value always. To never use the pager — useful inside a terminal multiplexer or terminal emulator with good scrolling — use the value off.

You can also change the pager itself by setting the PAGER environment variable. For example:

export PAGER="/usr/local/bin/gvim -f -R -"

This will use gvim(1) as your pager.


By default NULL values show as blank spaces. Also by default the empty string shows as a blank space.

bands=# INSERT INTO members (name) VALUES ('');
bands=# INSERT INTO members (name) VALUES (NULL);
bands=# SELECT * FROM members;
 id |     name      
  1 | David Pajo
  2 | Brian McMahan
  3 | 
  4 | 
(4 rows)

To better distinguish NULL values from empty strings, you can have psql(1) show anything you want instead:

bands=# \pset null '(null)'
Null display is "(null)".
bands=# SELECT * FROM members;
 id |     name      
  1 | David Pajo
  2 | Brian McMahan
  3 | 
  4 | (null)
(4 rows)

And more

You can find all of this and more in the psql(1) manpage or in the official PostgreSQL Web documentation. We have also written previously on this topic.

As you read the documentation we’d love to see your favorite settings as pull requests against the .psqlrc in our dotfiles.

Episode #482 - July 22nd, 2014

Posted 3 days back at Ruby5

Get your mind in the Gutter, agree that Programming is Not Math, be a RubyCritic, master Vim Plugins for Ruby, review 3 Ways to Create Classes in Ruby, and take a trip to RailsPacific.

Listen to this episode on Ruby5

Sponsored by CodeShip.io

Codeship is a hosted Continuous Delivery Service that just works.

Set up Continuous Integration in a few steps and automatically deploy when all your tests have passed. Integrate with GitHub and BitBucket and deploy to cloud services like Heroku and AWS, or your own servers.

Visit http://codeship.io/ruby5 and sign up for free. Use discount code RUBY5 for a 20% discount on any plan for 3 months.

Also check out the Codeship Blog!



Rah-jeev Kannav Sharma wrote to us to let us know about a gem called Gutter, a low-overheard monitoring web dashboard for GNU and Linux machines. It features live, on-demand monitoring of RAM, load, uptime, disk allocation, users, and many more stats.

Programming Is Not Math

Sarah Mei wrote a really interesting blog post last week called Programming Is Not Math. She talks about how most the time programming is in fact much more like a language skill. Yet, somehow most computer science degrees focus quite heavily on math while it’s very possible to not need it that much later on.
Programming Is Not Math


Guilherme Simões sent us a note about RubyCritic, a gem he built for his Master's thesis. He describes it as an opinionated version of the MetricFu gem which does static code analysis.

Vim Plugins for Ruby

Milos Dolobac sent us a note this week about a blog post he wrote called Vim Plugins for Ruby. According to him, these plugins are productivity boosters that every Ruby developer should know about.
Vim Plugins for Ruby

3 Ways to Create Classes in Ruby

If you’ve written almost anything in Ruby, you’ve probably come across a Ruby class, and as Thuva Tharma explains in his most recent blog post, there are actually three ways to create a class in Ruby. Each of these styles could come in handy if you know how to use them.
3 Ways to Create Classes in Ruby


We recently found out about RailsPacific -- the first ever Ruby on Rails Conference in Asia. It runs September 26-27, one week after RubyKaigi, in Taipei, Taiwan. The conference is divided into one day of speaking and one day of workshops, including workshops on refactoring, performance tuning, object-oriented design, and TDD with RSpec.

Sponsored by Top Ruby Jobs

Simon & Schuster is looking for a Ruby on Rails developer in New York, NY Adobe Systems is looking for a Senior Web Developer in San Jose & San Francisco, CA Underdog.io is looking for a Ruby on Rails Developer in New York, NY or remote and Cambridge Systematics is looking for a Ruby on Rails Engineer in Cambridge, MA
Top Ruby Jobs

Sponsored by Ruby5

Ruby5 is released Tuesday and Friday mornings. To stay informed about and active with this podcast, we encourage you to do one of the following:

Thank You for Listening to Ruby5

Spring Stroll

Posted 4 days back at Mike Clark

Spring Stroll

Just out for a leisurely stroll through the balsam root...

Ruby 2 Keyword Arguments


Ruby 2.0 introduced first-class support for keyword arguments:

def foo(bar: 'default')
  puts bar

foo # => 'default'
foo(bar: 'baz') # => 'baz'

In Ruby 1.9, we could do something similar using a single Hash parameter:

def foo(options = {})
  bar = options.fetch(:bar, 'default')
  puts bar

foo # => 'default'
foo(bar: 'baz') # => 'baz'

Ruby 2.0 blocks can also be defined with keyword arguments:

define_method(:foo) do |bar: 'default'|
  puts bar

foo # => 'default'
foo(bar: 'baz') # => 'baz'

Again, to achieve similar behavior in Ruby 1.9, the block would take an options hash, from which we would extract argument values.

Required keyword arguments

Unfortunately, Ruby 2.0 doesn’t have built-in support for required keyword arguments. Luckily, Ruby 2.1 introduced required keyword arguments, which are defined with a trailing colon:

def foo(bar:)
  puts bar

foo # => ArgumentError: missing keyword: bar
foo(bar: 'baz') # => 'baz'

If a required keyword argument is missing, Ruby will raise a useful ArgumentError that tells us which required argument we must include.

Keyword arguments vs options hash

With first-class keyword arguments in the language, we don’t have to write the boilerplate code to extract hash options. Unnecessary boilerplate code increases the opportunity for typos and bugs.

With keyword arguments defined in the method signature itself, we can immediately discover the names of the arguments without having to read the body of the method.

Note that the calling code is syntactically equal to calling a method with hash arguments, which makes for an easy transition from options hashes to keyword arguments.

Keyword arguments vs positional arguments

Assume we have a method with positional arguments:

def mysterious_total(subtotal, tax, discount)
  subtotal + tax - discount

mysterious_total(100, 10, 5) # => 105

This method does its job, but as a reader of the code using the mysterious_total method, I have no idea what those arguments mean without looking up the implementation of the method.

By using keyword arguments, we know what the arguments mean without looking up the implementation of the called method:

def obvious_total(subtotal:, tax:, discount:)
  subtotal + tax - discount

obvious_total(subtotal: 100, tax: 10, discount: 5) # => 105

Keyword arguments allow us to switch the order of the arguments, without affecting the behavior of the method:

obvious_total(subtotal: 100, discount: 5, tax: 10) # => 105

If we switch the order of the positional arguments, we are not going to get the same results, giving our customers more of a discount than they deserve:

mysterious_total(100, 5, 10) # => 95

Connascence and trade-offs

Connascence between two software components A and B means either 1) that you can postulate some change to A that would require B to be changed (or at least carefully checked) in order to preserve overall correctness, or 2) that you can postulate some change that would require both A and B to be changed together in order to preserve overall correctness. - Meilir Page-Jones, What Every Programmer Should Know About Object-Oriented Design

When one Ruby method has to know the correct order of another method’s positional arguments, we end up with connascence of position.

If we decide to change the order of the parameters to mysterious_total, we must change all callers of that method accordingly. Not only that, but our mental model of how to use this method must change as well, which isn’t as simple as a find/replace.

Like most things, keyword arguments have their trade-offs. Positional arguments offer a more succinct way to call a method. Usually, the code clarity and maintainability gained from keyword arguments outweigh the terseness offered by positional arguments. I would use positional arguments if I could easily guess their meanings based on the method’s name, but I find this rarely to be the case.

What’s Next?

Episode #481 - July 18th, 2014

Posted 7 days back at Ruby5

Take a peek into your app, think about accessibility, write polyglot web apps, learn Rails, say goodbye to 1.8.7 and 1.9.2 support

Listen to this episode on Ruby5

Sponsored by New Relic

New Relic is _the_ all-in-one web performance analytics product. It lets you manage and monitor web application performance, from the browser down to the line of code. With Real User Monitoring, New Relic users can see browser response times by geographical location of the user, or by browser type.
This episode is sponsored by New Relic


Take a peek into your Rails application!


Simple chrome extension to notify websites of your accessibility requirements.


Polyglot is a distributed web framework that allows programmers to create web applications in multiple programming languages.


GoRails is a series of screencasts and guides for all aspects of Ruby on Rails.

EOL for 1.8.7 and 1.9.2

Extended maintenance of Ruby versions 1.8.7 and 1.9.2 will end on July 31, 2014.
EOL for 1.8.7 and 1.9.2

Notes on 'Notes on "Counting Tree Nodes"'

Posted 8 days back at interblah.net - Home

Having now finished watching Tom’s episode of Peer to Peer, I finally got around to reading his Notes on “Counting Tree Nodes” supplementary blog post. There are a couple of ideas he presents that are so interesting that I wanted to highlight them again here.

If you haven’t seen the video, then I’d still strongly encourage you to read the blog post. While I can now see the inspiration for wanting to discuss these ideas1, the post really does stand on it’s own.

Notes on ‘Enumerators’

Here’s the relevant section of the blog post. Go read it now!

I’m not going to re-explain it here, so yes, really, go read it now.

What I found really interesting here was the idea of building new enumerators by re-combining existing enumerators. I’ll use a different example, one that is perhaps a bit too simplistic (there are more concise ways of doing this in Ruby), but hopefully it will illustrate the point.

Let’s imagine you have an Enumerator which enumerates the numbers from 1 up to 10:

> numbers = 1.upto(10)
=> #<Enumerator: 1:upto(10)>
> numbers.next
=> 1
> numbers.next
=> 2
> numbers.next
=> 3
> numbers.next
=> 9
> numbers.next
=> 10
> numbers.next
StopIteration: iteration reached an end

You can now use that to do all sorts of enumerable things like mapping, selecting, injecting and so on. But you can also build new enumerables using it. Say, for example, we now only want to iterate over the odd numbers between 1 and 10.

We can build a new Enumerator that re-uses our existing one:

> odd_numbers = Enumerator.new do |yielder|
    numbers.each do |number|
      yielder.yield number if number.odd?
=> #<Enumerator: #<Enumerator::Generator:0x007fc0b38de6b0>:each>

Let’s see it in action:

> odd_numbers.next
=> 1
> odd_numbers.next
=> 3
> odd_numbers.next
=> 5
> odd_numbers.next
=> 7
> odd_numbers.next
=> 9
> odd_numbers.next
StopIteration: iteration reached an end

So, that’s quite neat (albeit somewhat convoluted compared to 1.upto(10).select(&:odd)). To extend this further, let’s imagine that I hate the lucky number 7, so I also don’t want that be included. In fact, somewhat perversely, I want to stick it right in the face of superstition by replacing 7 with the unluckiest number, 13.

Yes, I know this is weird, but bear with me. If you have read Tom’s post (go read it), you’ll already know that this can also be achieved with a new enumerator:

> odd_numbers_that_arent_lucky = Enumerator.new do |yielder|
    odd_numbers.each do |odd_number|
      if number == 7
        yielder.yield 13
        yielder.yield number
=> #<Enumerator: #<Enumerator::Generator:0x007fc0b38de6b0>:each>
> odd_numbers.next
=> 1
> odd_numbers.next
=> 3
> odd_numbers.next
=> 5
> odd_numbers.next
=> 13
> odd_numbers.next
=> 9
> odd_numbers.next
StopIteration: iteration reached an end

In Tom’s post he shows how this works, and how you can further compose enumerators to to produce new enumerations with specific elements inserted at specific points, or elements removed, or even transformed, and so on.


A hidden history of enumerable transformations

What I find really interesting here is that somewhere in our odd_numbers enumerator, all the numbers still exist. We haven’t actually thrown anything away permanently; the numbers we don’t want just don’t appear while we are enumerating.

The enumerator odd_numbers_that_arent_lucky still contains (in a sense) all of the numbers between 1 and 10, and so in the tree composition example in Tom’s post, all the trees he creates with new nodes, or with nodes removed, still contain (in a sense) all those nodes.

It’s almost as if the history of the tree’s structure is encoded within the nesting of Enumerator instances, or as if those blocks passed to Enumerator.new act as a runnable description of the transformations to get from the original tree to the tree we have now, invoked each time any new tree’s children are enumerated over.

I think that’s pretty interesting.

Notes on ‘Catamorphisms’

In the section on Catamorphisms (go read it now!), Tom goes on to show that recognising similarities in some methods points at a further abstraction that can be made – the fold – which opens up new possibilities when working with different kinds of structures.

What’s interesting to me here isn’t anything about the code, but about the ability to recognise patterns and then exploit them. I am very jealous of Tom, because he’s not only very good at doing this, but also very good at explaining the ideas to others.

Academic vs pragmatic programming

This touches on the tension between the ‘academic’ and ‘pragmatic’ nature of working with software. This is something that comes up time and time again in our little sphere:

Now I’m not going to argue that anyone working in software development should have a degree in Computer Science. I’m pretty sympathetic with the idea that many “Computer Science” degrees don’t actually bear much of direct resemblance to the kinds of work that most software developers do2.

Ways to think

What I think university study provides, more than anything else, is exposure and training in ways to think that aren’t obvious or immediately accessible via our direct experience of the world. Many areas of study provide this, including those outside of what you might consider “science”. Learning a language can be learning a new way to think. Learning to interpret art, or poems, or history is learning a new way to think too.

Learning and internalising those ways to think give perspectives on problems that can yield insights and new approaches, and I propose that that, more than any other thing, is the hallmark of a good software developer.

Going back to the blog post which, as far I know, sparked the tweet storm about “programming and maths”, I’d like to highlight this section:

At most academic CS schools, the explicit intent is that students learn programming as a byproduct of learning CS. Programming itself is seen as rather pedestrian, a sort of exercise left to the reader.

For actual developer jobs, by contrast, the two main skills you need these days are programming and communication. So while CS still does have strong ties to math, the ties between CS and programming are more tenuous. You might be able to say that math skills are required for computer science success, but you can’t necessarily say that they’re required for developer success.

What a good computer science (or maths or any other logic-focussed) education should teach you are ways to think that are specific to computation, algorithms and data manipulation, which then

  • provide the perspective to recognise patterns in problems and programs that are not obvious, or even easily intuited, and might otherwise be missed.
  • provide experience applying techniques to formulate solutions to those problems, and refactorings of those programs.

Plus, it’s fun to achieve that kind of insight into a problem. It’s the “a-ha!” moment that flips confusion and doubt into satisfaction and certainty. And these insights are also interesting in and of themselves, in the very same way that, say, study of art history or Shakespeare can be.

So, to be crystal clear, I’m not saying that you need this perspective to be a great programmer. I’m really not. You can build great software that both delights users and works elegantly underneath without any formal training. That is definitely true.

Back to that quote:

the ties between CS and programming are more tenuous … you can’t necessarily say that they’re required for developer success.

All I’m saying is this: the insights and perspectives gained by studying computer science are both useful and interesting. They can help you recognise existing, well-understood problems, and apply robust, well-understood and powerful solutions.

That’s the relevance of computer science to the work we do every day, and it would be a shame to forget that.

  1. In the last 15 minutes or so of the video, the approach Tom uses to add a “child node” to a tree is interesting but there’s not a huge amount of time to explore some of the subtle benefits of that approach

  2. Which is, and let’s be honest, a lot of “Get a record out of a database with an ORM, turn it into some strings, save it back into the database”.

Baseimage-docker 0.9.12 released

Posted 8 days back at Phusion Corporate Blog

Baseimage-docker is a special Docker image that is configured for correct use within Docker containers. It is Ubuntu, plus modifications for Docker-friendliness. You can use it as a base for your own Docker images. Learn more at the Github repository and the website, which explain in detail what the problems are with the stock Ubuntu base image, and why you should use baseimage-docker.

Changes in this release

  • We now officially support nsenter as an alternative way to login to the container. With official support, we mean that we’ve provided extensive documentation on how to use nsenter, as well as related convenience tools. However, because nsenter has various issues, and for backward compatibility reasons, we still support SSH. Please refer to the README for details about nsenter, and what the pros and cons are compared to SSH.
    • The docker-bash tool has been modified to use nsenter instead of SSH.
    • What was previously the docker-bash tool, has now been renamed to docker-ssh. It now also works on a regular sh shell too, instead of bash specifically.
  • Added a workaround for Docker’s inability to modify /etc/hosts in the container (Docker bug 2267). Please refer to the README for details.
  • Fixed an issue with SSH X11 forwarding. Thanks to Anatoly Bubenkov. Closes GH-105.
  • The init system now prints its own log messages to stderr. Thanks to mephi42. Closes GH-106.

Using baseimage-docker

Please learn more at the README.

The post Baseimage-docker 0.9.12 released appeared first on Phusion Corporate Blog.

Tmux Only For Long-Running Processes


This post describes a minimal Tmux workflow, used only for long-running processes. It is intended to reduce the cognitive load imposed by administrative debris of open tabs, panes, or windows.

Set up Tmux for a Rails project

From within a full-screen shell (to hide window chrome, status bars, notifications, the system clock, and other distractions), create a Tmux session for a Rails project:

cd project-name

tat (short for “tmux attach”) is a command from thoughtbot/dotfiles that names the Tmux session after the project’s directory name. That naming convention will help us re-attach to the session later using the same tat command.

At this point, tat is the same thing as:

tmux new -s `basename $PWD`

Run the Rails app’s web and background processes with Foreman:

foreman start

The process manager is a long-running process. It is therefore a great candidate for Tmux. Run it inside Tmux, then forget it.

After only running one command inside Tmux, detach immediately:

<tmux-prefix> d

Ctrl+b is the default Tmux prefix. Many people change it to be Ctrl+a to match the API provided by GNU Screen, another popular terminal multiplexer.

Perform ad-hoc tasks

Back in a full-screen shell, we perform ad-hoc tasks such as:

vim .
git status
git add --patch
git commit --verbose

Those commands are targeted, “right now” actions. They are executed in a split second and focus us immediately on the task at hand.

Doing most of the work from inside Vim

A majority of our work is done from within a text editor, such as fast grepping in Vim:

\ string-i-am-searching-for

Or, running specs from Vim:

<Leader> s

In thoughtbot/dotfiles, <Leader> is <Space>.

Suspending the Vim process when necessary

To return control from Vim to the command line, suspend the process:


Run this command to see suspended processes for this shell session:


It will output something like:

[1]  + suspended  vim spec/models/user_spec.rb

This is when we might do some Git work:

git fetch origin
git rebase -i origin/master
git push --force origin <branch-name>
git log origin/master..<branch-name>
git diff --stat origin/master
git checkout master
git merge <branch-name> --ff-only
git push
git push origin --delete <branch-name>
git branch --delete <branch-name>

When we’re ready to edit in Vim again, we foreground the process:


Re-attach to the Tmux session quickly

When we need to restart the process manager or start a new long-running process, we re-attach to the Tmux session:


At this point, tat is the same thing as:

tmux attach -t `basename $PWD`

Compared to other Tmux workflows, this workflow does involve more attaching and detaching from Tmux sessions. That is why the tat shortcut is valuable.

Back inside Tmux, we can kill the Foreman process:


Or, we might want to open a long-running Rails console in order to maintain a history of queries:

<tmux-prefix> c
rails console

After poking around in the database, we might detach from Tmux again:

<tmux-prefix> d

Get things done

At that point, we might take a break, go home, or move on to another project.

The next time we sit (or stand!) at our desks, we start fresh by creating a branch, opening Vim, or doing whatever ad-hoc task is necessary in a clean slate, distraction-free environment.

Meanwhile, Tmux handles one responsibility for us: quietly manages long-running processes.

Solitary Unit Test

Posted 9 days back at Jay Fields Thoughts

Originally found in Working Effectively with Unit Tests

It’s common to unit test at the class level. The Foo class will have an associated FooTestsclass. Solitary Unit Tests follow two additional constraints:

  1. Never cross boundaries
  2. The Class Under Test should be the only concrete class found in a test.
Never cross boundaries is a fairly simple, yet controversial piece of advice. In 2004, Bill Caputo wrote about this advice, and defined a boundary as: ”...a database, a queue, another system...”. The advice is simple: accessing a database, network, or file system significantly increases the the time it takes to run a test. When the aggregate execution time impacts a developer’s decision to run the test suite, the effectiveness of the entire team is at risk. A test suite that isn’t run regularly is likely to have negative-ROI.

In the same entry, Bill also defines a boundary as: ”... or even an ordinary class if that class is ‘outside’ the area your [sic] trying to work with or are responsible for”. Bill’s recommendation is a good one, but I find it too vague. Bill’s statement fails to give concrete advice on where to draw the line. My second constraint is a concrete (and admittedly restrictive) version of Bill’s recommendation. The concept of constraining a unit test such that ‘the Class Under Test should be the only concrete class found in a test’ sounds extreme, but it’s actually not that drastic if you assume a few things.
  1. You’re using a framework that allows you to easily stub most concrete classes
  2. This constraint does not apply to any primitive or class that has a literal (e.g. int, Integer, String, etc)
  3. You’re using some type of automated refactoring tool.
There are pros and cons to this approach, both of which are examined in Working Effectively with Unit Tests.

Solitary Unit Test can be defined as:
Solitary Unit Testing is an activity by which methods of a class or functions of a namespace are tested to determine if they are fit for use. The tests used to determine if a class or namespace is functional should isolate the class or namespace under test by stubbing all collaboration with additional classes and namespaces.

Knowledge Base updates

Posted 9 days back at entp hoth blog - Home

Howdy everyone!

Today I would like to highlight two updates we deployed to the Knowledge Base in the past few weeks.


KB articles are now versioned. You can see changes between versions, restore versions, and see who made the changes. This should make it safer to update your articles regularly and allow you to recover from mistakes more easily :)

You can see the versions on the KB list:

Showing the number of versions in the KB listing

On the KB page:

Showing the number of versions on the KB page

And look at the history:

Showing the versions history


On the KB admin page, in the left sidebar, there is an option to export your whole KB as an HTML page:

Export your KB

We just improved this feature to add a Table of Contents, and fix all links between the different articles. This means that if you use the same anchor names in different articles, they will now work flawlessly in the exported file. If you moved or renamed articles but still have links to the old address, the export will take care of that as well.

This change will allow you to export your entire KB in one page, and print it as a PDF, and you have a complete manual for your application/service. Some of our customers do just that!

This last part is a bit experimental (the re-linking of everything), so if you experience any issue, just let us know.


The hard graft of Linked Data ETL

Posted 9 days back at RicRoberts :

Allowing a range of users to repeatably and reliably produce high quality Linked Data requires a new approach to Extract, Transform, Load (ETL), and we’re working on a solution called Grafter.

At Swirrl, our aims are simple: we’re helping regional and national governments publish the highest quality open data imaginable, whilst helping consumers access and use this data in rich and meaningful ways. This means visualisations, browsing, metadata, statistical rigour, data modelling, data integration and lots of design; and it’s why our customers choose to buy PublishMyData.

This is great, but unlocking the data from the “ununderstood dark matter of IT” and the dozens of other formats in use within the bowels of government is currently a highly skilled job.

Extract, Transform and Load

For us this job involves converting the data from its source form (which is almost always tabular in nature) to a linked data graph. Once this is done the data then needs to be loaded into our publishing platform. This whole life-cycle is known as “Extract, Transform and Load”, or ETL for short.

Recently thanks to our continuing growth, in terms of employees, customers and data volume, we’ve come to identify ETL as a barrier. And it’s a barrier to both our own and our customers’ ability to produce high quality, high volume data in a repeatable efficient manner.

ETL is not a new problem; it dates back to the code breaking work of Turing at Bletchley Park in the 1940’s. Though solutions to it were largely popularised by the likes of IBM with the birth of mainframe, batch processing, and database computing in the 1950s and 60s.

The Bletchley Park ETL Pipeline

Unfortunately when it comes to Linked Data, the ETL tools available tend to be immature, unusable, flawed in a critical dimension, or too inflexible to be useful for the work we encounter on a day to day basis.

Scripting Transformations

For lack of a better, flexible option we’ve solved all of our ETL needs until now by writing bespoke Ruby scripts. Though this is flexible and we manage a certain amount of code re-use, it can be problematic and costly to do.

The scripts typically end up doing data cleaning and data conversion: they are often quite large and take time to develop, and they can be awkward to maintain and document. So if they need to be re-run again with new data, another time consuming step is required to check exactly how the script works and what inputs it needs.

But even deeper problems can occur. Is the script robust to structural changes in the source data? Does it identify errors and report them properly? Does the script need to be modified to accommodate any new assumptions in the data? And, when it’s finally done, how can we be sure it actually did the right thing?

Typically once we’ve run the script, we’ll have a file of linked data in an appropriate RDF serialisation (usually n-triples). This file will then need to be loaded and indexed by the triple store; an operation made awkward by having to load and transfer large files over the network.

Answering all of these questions and validating the result is time consuming - it’s clear that a better way is needed. We realised it was time to start addressing some of these thorny issues for ourselves and our customers and so are busy developing a suite of software we’re calling Grafter. We’re grateful for support from two EU FP7 projects: DaPaas and OpenCube.

DaPaas Logo

OpenCube Logo

ETL Users

The kinds of transformation we encounter broadly fall into two camps: simple or complex. With an appropriate tool simple transformations could be built by the talented Excel users we meet working within local authorties and government. These represent perhaps 60-80% of the datasets we encounter.

The remaining datasets will sometimes require a significantly more complicated transformation; one that requires a Turing complete programming language to process.

There’s a tendency for many existing graphical ETL tools to become Turing complete, i.e. to introduce loops and conditional branches. But we feel this is a big mistake because the complexities of user interface workflows around Turing-complete systems are so great that they become almost useless.

Graphical Interfaces to Turing complete systems are unwieldy

Discriminating between user types

At the one extreme of the spectrum are the software developers, like ourselves, who need to fall back on a Turing complete language to tackle the bigger, thornier transformation problems.

At the other end of that spectrum are the talented Excel workers who produce detailed and rigorous stats for Local Authorities and government. These users need tools that let them graphically perform the bulk of the data conversions necessary to publish their data online - without having to be programmers and deal with the many problems that come with software development.

It’s also worth mentioning that even experienced developers will prefer a graphical tool for simple cases, as GUI’s can make some aspects of the problem significantly easier, especially by introducing rapid feedback and interfaces for data exploration.

Finally there is another class of users, who are less willing to get into low level data transformation, but are responsible for putting the data online, curating it and loading it into the system. These users should be able to use the transformations built by the other users, by simply providing the source data through a file upload.

We are hoping to target these three classes of user by building a suite of tools which are built on a common basis and which target the different types of users within the data publication industry.

Clear & Coherent Transformation Model

The mental model for designing transformations should be clear, flexible and encourage people to build transformations that are unsurprising.

The set of operations for expressing transformations should help users fall into the pit of success, encouraging users to express their transformations in a way that is both natural and internally consistent.


Many of the existing data cleaning and transformation tools we’ve seen aren’t efficient enough at processing the data for robust, commercial grade ETL. For example Open Refine, though a great tool for data cleaning, isn’t suited to ETL because its processing model requires files to be eagerly loaded into RAM rather than streamed or lazily evaluated. This makes it unsuitable for large datasets, which can easily consume all available memory.

Also, it’s common for tools to make multiple unnecessary passes over the data - for example, performing an additional pass for each operation in the pipeline. We believe most data can be processed in a single pass, and that a good ETL tool needs to be efficient in terms of memory and minimizing the work it does.

Removing unnecessary intermediate steps

Our old way of doing things (with Ruby scripts) would frequently export the data into an RDF serialisation (such as n-triples) which would then need to be imported separately to the database via HTTP.

Sometimes you might want to serialise the output data locally to disk, but often we’d like to wire transformations to go directly into our triple store; because it minimises the amount of manual steps required and means that data can be processed as a stream from source to database.


We believe that ETL has to be a process you can trust: partial failures are common in distributed systems, particularly when dealing with large volumes of data. So detecting failures, recovering from them and failing into well known recoverable error states is extremely important for large scale ETL pipelines.

Robustness requires facilities for developers and pipeline builders to easily incorporate error detection into their transformations. It makes sense to support this kind of validation and assertions on the source data, at arbitrary intermediate steps and on the final output.

A big part of being robust is reporting good errors at different levels of severity, and handling them appropriately. Sometimes you might just want to warn the user of an error; other times you might want to fail the whole import. Sometimes if pipelines are to be reused, error handling may need to be overridden from outside of the pipeline itself.

Likewise, being robust to structural changes in the source data is critically important. Sometimes a transformation might expect a spreadsheet to grow in particular dimensions, but not others. For example, one transformation might need to be tolerant of new columns being added to the end of the sheet, whilst another should throw an error if that ever happened.

Layered architecture with a component interface

We believe that having a suite of interoperable ETL tools that operate and build on each other in a cleanly architected manner is the right way to solve the ETL problem. There is never a one size fits all solution to ETL, so having a suite of layered APIs, DSLs, file formats, services and tools that let users change levels of abstraction is important to ensure both flexibility and reuse.

It’s also important to be able to specify transformations abstractly without too much concern over the underlying file format. (e.g. lots of formats are inherently tabular in nature, but have different serialisations) So you need an appropriate abstraction that lets you convert arbitrary file formats into a unit of operation on your pipeline.

Import services

Transformations themselves should be loadable into a generic import service, that will inspect the transformation for its required inputs and generate a web form that allows other users to import and process the raw data.

Once a spreadsheet’s structure has been decided (and a transformation built for sheets with that structure), import services become essential to gaining reuse out of transformations and lowering barriers to repeatable publication.


A lot of our users face similar data transformation challenges. It’s important that the transformation pipelines can be easily shared between users, so that rather than starting from scratch, a user can tweak a transformation that someone else has already built for a similar purpose.

Commercial Grade ETL

We see a need for a different kind of ETL tool and we’re currently working on delivering on this vision. We’re starting small and have a good understanding of the problems we’re tackling, and for whom we are solving them.

We already have the fundamentals of this approach up and running and we are using it to process hundreds of datasets far more efficiently than we have before.

The next step is to start wrapping Grafter (our solution) in a user interface that will make high performance repeatable linked data creation easier and quicker for experts and more accessible to non-experts.

Phusion Passenger 4.0.46 released

Posted 9 days back at Phusion Corporate Blog

Phusion Passenger is a fast and robust web server and application server for Ruby, Python, Node.js and Meteor. Passenger takes a lot of complexity out of deploying web apps, and adds powerful enterprise-grade features that are useful in production. High-profile companies such as Apple, New York Times, AirBnB, Juniper, American Express, etc are already using it, as well as over 350.000 websites.

Phusion Passenger is under constant maintenance and development. Version 4.0.46 is a bugfix release.

Phusion Passenger also has an Enterprise version which comes with a wide array of additional features. By buying Phusion Passenger Enterprise you will directly sponsor the development of the open source version.

Recent changes

Most notable changes:

  • Further improved Node.js and Socket.io compatibility.
  • Sticky session cookies have been made more reliable.
  • Fixed WebSocket upgrade issues on Firefox. Closes GH-1232.
  • Improved Python compatibility.
  • Logging of application spawning errors has been much improved. Full details
    about the error, such as environment variables, are saved to a private log file.
    In the past, these details were only viewable in the browser. This change also
    fixes a bug on Phusion Passenger Enterprise, where enabling Deployment Error
    Resistance causes error messages to get lost. Closes GH-1021 and GH-1175.
  • Passenger Standalone no longer, by default, loads shell startup files before
    loading the application. This is because Passenger Standalone is often invoked
    from the shell anyway. Indeed, loading shell startup files again can interfere
    with any environment variables already set in the invoking shell. You can
    still tell Passenger Standalone to load shell startup files by passing
    --load-shell-envvars. Passenger for Apache and Passenger for Nginx still
    load shell startup files by default.
  • If you are a Union Station customer, then
    Phusion Passenger will now also log application spawning errors to Union Station.
    This data isn’t shown in the Union Station interface yet, but it will be
    implemented in the future.

Minor changes:

  • The Python application loader now inserts the application root into sys.path.
    The fact that this was not done previously caused a lot of confusion amongst
    Python users, who wondered why their passenger_wsgi.py could not import any
    modules from the same directory.
  • Fixed a compatibility problem with Django, which could cause Django apps to
    freeze indefinitely. Closes GH-1215.
  • Fixed a regression in Node.js support. When a Node.js app is deployed on
    a HTTPS host, the X-Forwarded-Proto header wasn’t set in 4.0.45.
    Closes GH-1231.
  • Passenger Standalone now works properly when the HOME environment variable
    isn’t set. Closes GH-713.
  • Passenger Standalone’s package-runtime command has been removed. It has
    been broken for a while and has nowadays been obsolete by our automatic
    binary generation system.
    Closes GH-1133.
  • The passenger_startup_file option now also works on Python apps. Closes GH-1233.
  • Fixed compilation problems on OmniOS and OpenIndiana. Closes GH-1212.
  • Fixed compilation problems when Nginx is configured with OpenResty.
    Thanks to Yichun Zhang. Closes GH-1226.
  • Fixed Nginx HTTP POST failures on ARM platforms. Thanks to nocelic for the fix.
    Closes GH-1151.
  • Documentation contributions by Tim Bishop and Tugdual de Kerviler.
  • Minor Nginx bug fix by Feng Gu. Closes GH-1235.

Installing or upgrading to 4.0.46

OS X OS X Debian Debian Ubuntu Ubuntu
Heroku Heroku Ruby gem Ruby gem Tarball Tarball


Fork us on Github!Phusion Passenger’s core is open source. Please fork or watch us on Github. :)

<iframe src="http://ghbtns.com/github-btn.html?user=phusion&amp;repo=passenger&amp;type=watch&amp;size=large&amp;count=true" allowtransparency="true" frameborder="0" scrolling="0" width="170" height="30"></iframe><iframe src="http://ghbtns.com/github-btn.html?user=phusion&amp;repo=passenger&amp;type=fork&amp;size=large&amp;count=true" allowtransparency="true" frameborder="0" scrolling="0" width="170" height="30"></iframe><iframe src="http://ghbtns.com/github-btn.html?user=phusion&amp;type=follow&amp;size=large&amp;count=true" allowtransparency="true" frameborder="0" scrolling="0" width="190" height="30"></iframe>

If you would like to stay up to date with Phusion news, please fill in your name and email address below and sign up for our newsletter. We won’t spam you, we promise.

Let Your Code Speak For Itself


Let’s say you have some code whose intent is clear to you but you can see a world where someone else may be confused. Often after writing such code, you realize this and add some comments to clarify the intent.

We add code comments for other developers who will interact with whatever we wrote, because we are courteous and thoughtful teammates:

class HostsController < ApplicationController
  def show
    # make sure user belongs to host if name is internal
    # set current host in session

    if current_user.hosts.include?(host) && host.match(/intranet.example.com/)
      session[:current_host_id] = host.id
      raise 'something went horribly wrong oh nooooo'

The Telephone Game

Remember the telephone game? Messages passed through intermediaries can get garbled in transmission. Particularly unreliable, logic-free intermediaries like code comments.

On their face, comments should be super helpful - I mean, you’ve left helpful notes for the next person! Isn’t that a good thing?

Yes, although now we have duplication - the comment and the code itself both speak to what this bit of logic should do.

Comments are a code smell which means “something may be improved here and we should dig deeper to see if that’s true.” In this case, the smell is “hey something is probably more complicated than it needs to be.”

Later on, when someone moves the session-setting behavior somewhere else, they have to remember to move this comment or update it. As humans, this is easy to forget.

Instead, let’s use intention-revealing method names to encapsulate the behavior. We’ll also move this logic into private methods since we don’t want other classes calling these methods:

class HostsController < ApplicationController
  def show
    if user_belongs_to_host? && host_name_is_internal?
      raise 'something went horribly wrong oh nooooo'


    def user_belongs_to_host?

    def host_name_is_internal?

    def set_current_host_in_session
      session[:current_host_id] = host.id

Other Smelly Comments

  • TODOs, like # Remember to fix this terrible method
  • Commented out dead code. Just delete it - that’s what Git is for.
  • Comments which restate the method name in English.

Comments are one of the code smells we address in our Ruby Science ebook.

When Are Comments Useful?

This isn’t a hard and fast rule. Some comments are useful:

  • Class-level comments: Adding a comment at the top of a class to describe its responsibilities can be helpful.
  • Open-Source: Ruby Gems and other open-source libraries are good places to add more detail, because we can use tools such Yard to automatically generate documentation from the comments. Here’s an example in Paperclip. If you’re providing a library for others to use, lightly commenting the public interface is typically encouraged so that documentation for the library can be auto-generated. Here’s an example in Golang.