The problem with using fixtures in Rails

Posted 3 months back at interblah.net - Home

I’ve been trying to largely ignore the recent TDD discussion prompted by DHH’s RailsConf 2014 keynote. I think I can understand where he’s coming from, and my only real concern with not sharing his point of view is that it makes it less likely that Rails will be steered in a direction which makes TDD easier. But that’s OK, and my concern fades when I have opportunities to propose improvements.

I don’t even really mind what he’s said in his latest post about unit testing the Basecamp codebase. There are a lot of Rails applications – including ones I’ve written – where a four-minute test suite would’ve been a huge triumph.

I could make some pithy argument like:

[Image: Sorry, I couldn't resist]

… but let’s be honest, four minutes for a substantial and mature codebase is pretty excellent in the Rails world.

So that is actually pretty cool.

Using fixtures

A lot of that speed is no doubt because Basecamp is using fixtures: test data that is loaded once at the start of the test run, and then reset by wrapping each test in a transaction and rolling back before starting the next test.

This can be a benefit because the alternative – assuming that you want to get some data into the database before your test runs – is to insert all the data required for each test, which can potentially involve a large tree of related models. Doing this hundreds or thousands of times will definitely slow your test suite down.

(Note that for the purposes of my point below, I’m deliberately not considering the option of not hitting the database at all. In reality, that’s what I’d do, but let’s just imagine that it wasn’t an option for a second, yeah? OK, great.)

So, fixtures will probably make the whole test suite faster. Sounds good, right?

The problem with fixtures

I feel like this is glossing over the real problem with fixtures: unless you are using independent fixtures for each test, your shared fixtures have coupled your tests together. Since I’m pretty sure that nobody is actually using independent fixtures for every test, I am going to go out on a limb and just state:

Fixtures have coupled your tests together.

This isn’t a new insight. This is pain that I’ve felt acutely in the past, and was my primary motivation for leaving fixtures behind.

Say you use the same ‘user’ fixture between two tests in different parts of your test suite. Modifying that fixture to respond to a change in one test can now potentially cause the other test to fail, if the assumptions either test was making about its test data are no longer true (e.g. the user should not be an admin, the user should only have a short bio, and so on).
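To make the coupling concrete, here’s a hypothetical fixture shared by two tests in different parts of the suite (all names here are invented for illustration, not from any real codebase):

test/fixtures/users.yml

bob:
  name: Bob
  admin: false
  bio: A short bio

# somewhere in test/controllers/dashboard_controller_test.rb
test "non-admins cannot see the dashboard" do
  refute users(:bob).admin?  # silently assumes bob is not an admin
end

# somewhere in test/models/user_test.rb
test "short bios are never truncated" do
  assert_equal users(:bob).bio, users(:bob).truncated_bio  # silently assumes bob's bio is short
end

Flip bob’s admin flag to serve some new test and the first test breaks; pad out his bio and the second one does. Neither failure is visible from the file you were editing.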

If you use fixtures and share them between tests, you’re putting the burden of managing this coupling on yourself or the rest of your development team.

Going back to DHH’s post:

Why on earth would you run your entire test harness for every single line change in a particular model? If you have so little confidence in the locality of your changes, the tests are indeed telling you that the system has overly high coupling.

What fixtures do is introduce overly high coupling in the test suite itself. If you make any change to your fixtures, I do not think it’s possible to be confident that you haven’t broken a single test unless you run the whole suite again.

Fixtures separate the reason test data is like it is from the tests themselves, rather than keeping them close together.

I might be wrong

Now perhaps I have only been exposed to problematic fixtures, and there are techniques for reliably avoiding this coupling or at least managing it better. If that’s the case, then I’d really love to hear more about them.

Or, perhaps the pain of managing fixture coupling is objectively less than the pain of changing the way you write software to both avoid fixtures AND avoid slowing down the test suite by inserting data into the database thousands of times?

That’s certainly possible. I am skeptical though.

Docker-friendly Vagrant boxes 2014-04-30 released

Posted 3 months back at Phusion Corporate Blog

[Image: Vagrant logo]

We provide Vagrant base boxes that are based on Ubuntu 14.04 and 12.04, 64-bit. These boxes are specifically customized to work well with Docker. You can learn more at the website.

The changes in version 2014-04-30 are:

  • The Ubuntu 12.04 VirtualBox box in release 2014-02-22 was broken: the VirtualBox guest additions weren’t correctly installed because the kernel was incorrectly installed. This has now been fixed.
  • The Ubuntu 12.04 VMWare Fusion box now loads the VMWare Tools kernel modules during startup, so that Vagrant doesn’t have to wait so long at the “Waiting for HGFS kernel module” phase.
  • No changes in the Ubuntu 14.04 boxes.

Related resources: Github | Prebuilt boxes | Vagrant Cloud | Discussion forum | Twitter

Upgrade instructions for Vagrant >= 1.5 with Vagrant Cloud

Run:

vagrant box outdated
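The post stops at the check, but assuming the standard Vagrant 1.5 workflow, if the command reports a newer version you would then update the box and recreate the VM from it:

vagrant box update
vagrant destroy
vagrant up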

Upgrade instructions for Vagrant <= 1.4, or Vagrant >= 1.5 without Vagrant Cloud

Destroy your Vagrant VM:

vagrant destroy

Remove your existing base box:

# Vagrant >= 1.5
vagrant box remove phusion-open-ubuntu-12.04-amd64 --provider virtualbox
vagrant box remove phusion-open-ubuntu-12.04-amd64 --provider vmware_fusion

# Vagrant <= 1.4
vagrant box remove phusion-open-ubuntu-12.04-amd64 virtualbox
vagrant box remove phusion-open-ubuntu-12.04-amd64 vmware_fusion

Start your VM again. Vagrant will automatically download the latest version of the box.

vagrant up

Episode #460 - April 29th, 2014

Posted 3 months back at Ruby5

Configuring variants with Various, what to learn with What's Next, fixing Celluloid with RailsReloader, improving communication with Capistrano Team Notifications, integrating AngularJS with Rails, and getting more from Postgres with PG Power.

Listen to this episode on Ruby5

Sponsored by Pull Review

You want to ship it right instead of doing it again. But what could you clean while still meeting the deadline? PullReview reviews the Ruby code you just wrote and tells you what's wrong, why, and how to fix it - from style to security. Code, Spot, Fix, and Ship!
This episode is sponsored by Pull Review

Various

Various, by Zachary Friedman, is a gem that allows you to easily configure ActionPack::Variants for your Rails apps. You give it a mapping of user agent regular expressions, and it sets request.variant automatically based on whether the User Agent matches.
Various

What's Next

Matthieu Tanguay-Carel wrote to us about What’s Next, a new micro-site designed to help programmers figure out what to learn at various stages in the learning process. It's got links for beginners and experts, as well as interview questions and a ranking of GitHub repositories.
What's Next

RailsReloader

Brandon Hilkert wrote a blog post on how he used the Rails Reloader to fix an issue on his library, SuckerPunch. He describes diving into Rails’ source code, starting at the Reloader middleware. By following the comments found at the top of the Reloader class, he was able to find the solution.
RailsReloader

Capistrano Team Notifications

Alexander Balashov has created a gem which helps track Capistrano deploys via Space notifications and OSX Notification Center. The gem is called capistrano-team_notifications, and it allows everyone on your team to be notified any time a capistrano deploy occurs.
Capistrano Team Notifications

Rails & AngularJS

Sébastien Saunier wrote a blog post on how to integrate AngularJS services into an existing Rails app. He covers integrating Angular with Rails, setting up Karma and Jasmine for testing, and making sure they all play nice with the Rails asset pipeline.
Rails & AngularJS

PG Power

Stan Carver wrote to us about PG Power, an ActiveRecord extension that helps you get more from PostgreSQL, like creating and dropping schemas, managing comments, and adding foreign keys.
PG Power

TopRubyJobs

Red Digital Cinema is looking for a Sr. Ruby Software Engineer in Irvine, CA and Houston, TX. EquityMetrix is looking for an Application Developer in Dallas, TX and Science Exchange is looking for a Senior Ruby on Rails Engineer in Palo Alto, CA. If you’re looking for a top Ruby gig or top Ruby talent, head over to TopRubyJobs.com.
TopRubyJobs

Thank You for Listening to Ruby5

Ruby5 is released Tuesday and Friday mornings. To stay informed about and active with this podcast, we encourage you to do one of the following:

Thank You for Listening to Ruby5

Announcing the Software Apprenticeship Podcast

Posted 3 months back at Jake Scruggs

In the summer of 2004 I did an apprenticeship of sorts at a place called Object Mentor.  At the time “Uncle” Bob Martin and his son Micah Martin were in leadership positions at the company and I somehow convinced them to let me work for free over the summer in exchange for teaching me software development. It wasn’t a very structured program, nothing like what Micah would later put together for 8th Light, but I was a pretty motivated learner.  I also had the advantage of coming from a teaching background so I knew how to learn.

All this has been covered in daily detail, if you'd like to read more.

After ten years of software experience I’m becoming a mentor to an apprentice and documenting the experience via podcast.  Backstop Solutions has graciously allowed me to pay our apprentice (the same rate we pay interns) as he is doing real work on a daily basis in addition to outside learning experiences.  From 9-5 he will be working on production code with 100% supervision as he will always be pairing with an experienced developer.  It’s a six month program with 2 month check-ins. 

The apprentice, Jonathan Howden, knows that should he fail to meet expectations we may end our relationship at 2, 4, or 6 months.  This is a bit scary, but if Backstop is going to be taking a chance on an un-credentialed employee we, as a company, need to be able to mitigate the risk of such a person polluting our codebase.  It is therefore our responsibility to provide constant feedback to the apprentice so that he will know exactly what is needed to succeed.  So far he’s been doing great: he just finished his 3rd week of apprenticeship, so we just recorded our third podcast and will be on a weekly schedule. Assuming he survives the six month apprenticeship, Jonathan will be offered a full time job at a damn good starting salary.  Interested in such a job right now? Check out https://careers.backstopsolutions.com/

In the first episode, Jonathan talks quite a bit about Dev Bootcamp (DBC).  I’ve known, worked with, and read the book of one of the founders so it seemed natural to reach out to Dave Hoover and DBC to help Backstop find its first apprentice.  We asked their “fairy job mother” to spread the word that we were looking for apprentices and ten applied.  They were all given coding homework challenges which were evaluated code review style with the whole InvestorBridge team allowed to attend.  We judged three submissions good enough to warrant an in-person interview.  Jonathan made it through this gauntlet and was rewarded with a brand new, much longer, gauntlet.  Of learning.  Look, there's no way not to make this sound hokey as we're trying to do a good thing here.

On a weekly basis I hope to capture what an apprenticeship is from the inside and perhaps provide some value to proto-developers and prospective mentor companies who may be wondering what this “apprenticeship” business is all about.  Changing careers is, like it or not, the future.  I did it in 2004 and I hope Jonathan will too in 2014.

Software Apprenticeship Podcast:
iTunes Page: https://itunes.apple.com/us/podcast/software-apprenticeship-podcast/id868371146?mt=2
RSS feed: http://softwareapprenticeship.libsyn.com/rss


Styling a Middleman Blog with Bourbon, Neat, and Bitters

Posted 3 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

This is a walk-through for styling your own static blog from scratch. We will be using Middleman for our framework and Bourbon, Neat, and Bitters for our Sass libraries.

Middleman is a lightweight static site framework built using Ruby. It compiles Markdown into HTML and is easily deployed to S3, Heroku, or GitHub Pages.

Middleman can also host a blog nicely, like this one.

The following steps will include instructions for installation, setup, deployment, and adding some very basic styles.

Installation

First, we install the Middleman gem and the blog extension gem. We can then initialize the project:

$ gem install middleman
$ gem install middleman-blog
$ middleman init my_blog --template=blog --rack

Let’s start the server and take a look at what we’ve got:

$ cd my_blog
$ bundle exec middleman
$ open http://localhost:4567

You should see something like this in your browser:

Craigslist style

Middleman gives us a default post and just the default styles, which is a great place to start.

Configuration

We’ll be doing a lot of browser refreshing to see our progress, so let’s automate the process. We can use the built-in Livereload service to auto-refresh the page whenever we save. Let’s add the gem:

Gemfile

...
group :development do
  gem 'middleman-livereload'
end

And enable the service by uncommenting the configuration line:

config.rb

...
configure :development do
  activate :livereload
end

Livereload will begin working as soon as you run bundle and restart your server.

$ ctrl + c  # shuts down the Middleman server
$ bundle
$ bundle exec middleman

Prepare to Deploy

Our blog won’t run on Heroku as-is, but we only need to do a few things to change that.

First, we need to add some code to config.rb that will tell Middleman to put the files Heroku needs in a folder named tmp:

...
set :build_dir, 'tmp'
...

Next, we will create a file that tells Heroku how to build our source files. We will create a file named Rakefile in the root directory of our project and add the code below:

Rakefile

namespace :assets do
  task :precompile do
    sh "middleman build"
  end
end

Heroku also needs a config.ru in the project root. Ours serves the built site out of the tmp directory and returns a 404 for anything it can’t find:

config.ru

require 'rack/contrib/try_static'

use Rack::Deflater
use Rack::TryStatic,
  root: 'tmp',
  urls: %w[/],
  try: %w[.html index.html /index.html]

FIVE_MINUTES=300

run lambda { |env|
  [
    404,
    {
      'Content-Type'  => 'text/html',
      'Cache-Control' => "public, max-age=#{FIVE_MINUTES}"
    },
    ['File not found']
  ]
}

We’ll also need to include the rack-contrib gem in our Gemfile. Be sure to bundle and restart your server after this step.

Gemfile

...
gem 'rack-contrib'
...

The final step is initializing Git, which we will do next.

Initialize Git

To be able to track our changes and push our blog to Heroku, we need to initialize a Git repo.

$ git init
$ git add .
$ git commit -m 'initial commit'

To commit changes as we continue to work, we’ll run git add . to track new files and stage changes to files already being tracked. Then we can run git commit -m 'your commit message' and we’ll be able to push our latest changes to the remote.

It’s a good idea to commit changes to git at the end of each section of this post.

We’ll be using Heroku as a remote repository.

It would be beneficial to also set up a remote like GitHub for remote version tracking, but it’s not necessary.

We’ll just focus on Heroku for now. If you don’t already have a Heroku account, you’ll need to sign up for one before running heroku create.

$ heroku create
$ git push heroku master

And now, we can run heroku open in the terminal to open the page in our browser. We’ve just created a blog and pushed it live. Next is to add our own styles to customize the design.

Install Libraries

Our goal is to add a custom design to this blog, so let’s install our Sass toolkit and bundle:

Gemfile

gem 'bitters'
gem 'bourbon'
gem 'neat'

Bourbon is a library of vanilla Sass mixins, Neat gives us a responsive grid system, and Bitters sets default styles for Bourbon projects. These gems will make the assets available to our site through Middleman’s asset pipeline.

Since we’ve updated our Gemfile, we’ll need to bundle and restart our server again.

Bourbon and Neat are included by the gem, but Bitters requires an additional install in your stylesheets folder:

$ cd source/stylesheets
$ bitters install
Bitters files installed to /bitters

Next, we need to create a stylesheet manifest with the correct character encoding and import our design libraries:

source/stylesheets/all.css.scss

@charset "utf-8";

@import "bourbon";
@import "bitters/bitters";   /* Bitters needs to be imported before Neat */
@import "neat";

Include all of the stylesheets to be used in this site by adding the link to the manifest in the <head> section of layout.erb:

source/layout.erb

...
<head>
  <%= stylesheet_link_tag "all" %>
...

Let’s see how it looks now:

Bitters default style

Already, we see improvement. What did Bitters just set up for us?

  • Typography - uses Helvetica as the default font-family and sets the sizes for various header elements on a modular scale (e.g. <h1>’s and family)
  • Color - uses variables to systematize colors
  • Lists - strips all styles from lists, including bullets
  • Flash notice styles - very useful in Rails
  • Grid settings to accompany Neat
  • Some basic form styles

Get Stylish

We’ll need to customize a few things in Bitters to optimize our styles for a blog. First, let’s add our own Google font to layout.erb:

source/layout.erb

<head>
...
<link href='http://fonts.googleapis.com/css?family=Oxygen:400,300,700'
  rel='stylesheet' type='text/css'>
...
</head>

To make use of this font, all we need to do is change the $sans-serif variable in _variables.scss:

source/stylesheets/bitters/_variables.scss

...
$sans-serif: 'Oxygen', $helvetica;
...

By changing the $sans-serif variable, we’ve quickly and easily changed the font family globally.

Comfortable Reading

Let’s create a partial that will contain all of the layout related styles. We’ll need to import it in our manifest:

source/stylesheets/all.css.scss

...
@import "partials/layout";

Add the outer-container() mixin to the layout to center it in the viewport.

source/stylesheets/partials/_layout.scss

#main {
    @include outer-container;
}

For a good reading experience, we want to keep the length of the lines of text a comfortable length. If the lines are too long, the reader will have a hard time following the flow of the text.

Neat makes this easy to accomplish in as little as two steps. We’ll adjust the max-width property of the outer-container() mixin.

The first step will be to import _grid-settings.scss into Bitters. We can just uncomment that line in _bitters.scss:

source/stylesheets/bitters/_bitters.scss

...
@import "grid-settings";

The second step is to edit _grid-settings.scss. Uncomment the $max-width variable and change its value to em(700). This should give us a comfortable line-length for reading.

source/stylesheets/bitters/_grid-settings.scss

...
$max-width: em(700);
...

Let’s see what our blog looks like now that we’ve added a few styles of our own:

Custom grid-settings

We see that Bitters has successfully applied our chosen font and centered our content. Don’t worry about some of the content being misaligned. We’re about to fix that.

Modular Structure

Our readers need to be able to easily move around the site, so we’ll add some helpful links in a navigation and footer.

To keep our code modular, we will break up the navigation and footer into separate partials. It’s good practice to make a folder to keep your partials in.

We’ll edit a group of new files:

  source/partials/_nav.erb
  source/partials/_footer.erb

  source/stylesheets/partials/_nav.scss
  source/stylesheets/partials/_footer.scss

Now that we’ve added some structure to our code we can import these partials into layout.erb and our Sass partials into all.css.scss.

source/layout.erb

...
<div id="main" role="main">
  <%= partial "partials/nav" %>
  <%= yield %>
  <%= partial "partials/footer" %>
</div>
...

Improved Markup

We are ready to improve the overall layout of our index page. A few small adjustments to the markup will make it more semantic and easier to style.

Paste the code below into _nav.erb:

source/partials/_nav.erb

<nav>
  <ul>
    <li>
      <%= link_to "Blog Title", "index.html", :class => 'blog-title' %>
    </li>
  </ul>
</nav>

We will move some of the content from layout.erb into the footer. Paste the code below into _footer.erb and remove it from layout.erb:

source/partials/_footer.erb

<footer>
  <ul class="large-column">
    <li><h5 class="heading">Recent Articles</h5></li>
    <li>
      <ol>
        <% blog.articles[0...10].each do |article| %>
          <li>
            <%= link_to article.title, article %>
            <span><%= article.date.strftime('%b %e') %></span>
          </li>
        <% end %>
      </ol>
    </li>
  </ul>

  <ul class="small-column">
    <li><h5 class="heading">Tags</h5></li>
    <li>
      <ol>
        <% blog.tags.each do |tag, articles| %>
          <li><%= link_to "#{tag} (#{articles.size})", tag_path(tag) %></li>
        <% end %>
      </ol>
    </li>
</ul>
</footer>

Additionally, we’ll improve the markup in index.html.erb:

source/index.html.erb

...
<ul>
  <% page_articles.each_with_index do |article, i| %>
    <li>
      <h3><%= link_to article.title, article %></h3>
      <h5><%= article.date.strftime('%b %e') %></h5>
      <p><%= article.body %></p>
    </li>
  <% end %>
</ul>
...

Adding Custom Styles

As a finishing touch, we’ll add some custom styles to our navigation, footer, and layout. For consistency, we will also create Sass partials for the nav and the footer:

source/stylesheets/all.css.scss

...
@import "partials/nav";
@import "partials/footer";

source/stylesheets/partials/_nav.scss

nav {
  border-bottom: 1px solid $base-border-color;
  margin: em(30) 0;

  ul {
    display: inline-block;
    margin-bottom: em(10);
  }
}

.blog-title {
  font-size: 1.4em;
  font-weight: 700;
  letter-spacing: 4px;
  text-transform: uppercase;
}

source/stylesheets/partials/_footer.scss

footer {
  border-top: 1px solid $base-border-color;
  margin: 2em 0;
  padding-top: 2em;
}

These styles provide a pattern we’ll reuse throughout the blog, so we’ll place them in the _layout.scss stylesheet.

source/stylesheets/partials/_layout.scss

...
ol {
  font-size: 1em;
  font-weight: 500;

  li {
    margin: .5em 0;

    span {
      display: inline-block;

      &:before {
        content: '/';
        margin: 0 .3em;
      }
    }
  }
}
...

Using the span-columns() mixin, Neat will calculate the width of the element based on the number of columns you specify in the argument.

source/stylesheets/partials/_layout.scss

...
.large-column {
  @include span-columns(8 of 12);
}

.small-column {
  @include span-columns(4 of 12);
}

Now we have a basic template set up for our very own blog.

Final Results

All that’s left to do is make sure all of our changes have been committed to Git and then deploy these updates to Heroku.

$ git push heroku master

Completion

Now we have a good foundation upon which to publish our own blog.

We Are All Wrong About Software Design

Posted 3 months back at Luca Guidi - Home

We are all wrong. When it comes to talking about opinions, this is the way that things work. Everyone has her or his own beliefs, shaped by years of experience in the field, frustrating code, books, successes, etc. How can all these backgrounds fall into one unified theory? They just can’t.

You’ve always been told to pick the right tool for the job. But what’s the right tool? You decide it, according to your practical knowledge.

I love Ruby because it feels natural to me, but other developers hate this language. I prefer clean code; other people don’t care. I’m for RSpec and Capybara, others for Test::Unit. CoffeeScript vs plain JavaScript, ERb vs HAML, Postgres vs MySQL. Vim or Emacs? Mac or Linux? TDD or not, anyone?

With all these partitions, we’re not freeing people from dogmas, but just creating fans of an opposite opinion.

Relativity can be applied to software design as well. How many levels of indirection do I need to get a certain job done? Well, it depends. It depends on a myriad of good reasons, but mainly on your judgement. Which can be superior for you, and fallacious for somebody else.

We can discuss tradeoffs, but please stop using your successful product as the certification that you’re right about code.

I work at Litmus, a profitable company. If I put the following code in a template, would you find it reasonable just because of my employer?

<%
  require 'mysql2'

  client = Mysql2::Client.new({
    host: 'host',
    username: 'username',
    database: 'database'})

  rows = client.query(%{SELECT * FROM previews
    ORDER BY created_at DESC
    LIMIT 5})
%>

<ul>
<% rows.each do |row| %>
  <li><%= row.fetch('title') %></li>
<% end %>
</ul>

Hey, it works! Who needs all those fancy abstractions like controllers and ORMs. Who needs frameworks at all! Those constructs are for architecture astronauts. Get off my lawn! Look at me, I’m a pragmatist. I proved this by ruining the multi-million dollar software I work on.

This isn’t an argument, just nonsense.

Episode #459 - April 25th, 2014

Posted 3 months back at Ruby5

A web interface for letter_opener, getting right at your Mint.com data, observers in mongoid 4, an interview with DHH, parsing out social network user ids, and the Ruby Hero Awards all in this episode of the Ruby5!

Listen to this episode on Ruby5

Sponsored by New Relic

New Relic is _the_ all-in-one web performance analytics product. It lets you manage and monitor web application performance, from the browser down to the line of code. With Real User Monitoring, New Relic users can see browser response times by geographical location of the user, or by browser type.
This episode is sponsored by New Relic

letter_opener_web

Add a web interface to letter_opener in development with letter_opener_web.
letter_opener_web

mint-exporter

Export your Mint.com transactions with Ruby!
mint-exporter

mongoid-observers

Mongoid 4 comes with the removal of observers. Thanks to Chamnap Chhorn though, we won't have to rewrite our apps that depend on them!
mongoid-observers

Interview with DHH

Chat with DHH about nuance, where he stands with TDD today, increasing the bandwidth of conversations, and his thoughts on how F/OSS software is used by for-profit corporations.
Interview with DHH

ids_please

Please, please, please give me all your IDs! The ids_please gem can parse out user ids from all your favorite social network URLs.
ids_please

Ruby Heroes 2014

The 2014 Ruby Heroes have been announced!
Ruby Heroes 2014

Open Cube: Making publishing statistics easier

Posted 3 months back at RicRoberts :

Open Cube logo

We’ve recently started work on another European collaborative project called OpenCube. This project is all about making open statistical data easier to publish, link and reuse so it can reach its full potential. The technical approach concentrates on the RDF Data Cube vocabulary, which recently reached W3C recommendation status.

This project is very relevant to our work: much of the data that our customers publish using PublishMyData is statistical data and we are big users of the RDF Data Cube approach. We can already see the power of this approach: for example how it enables combining datasets together, as in our Stats Selector tool. But we also have first hand experience of the challenges our customers face in creating RDF Data Cube data and can see many further opportunities for enriching access to data in this form.

A real strength of OpenCube is the close involvement of a group of ‘use case partners’. These are organisations that have agreed to participate in a series of pilot studies, assisting us with setting the requirements of the project and evaluating the tools we create. The pilots will involve the UK Department for Communities and Local Government, the Government of Flanders, the Irish Central Statistics Office and a (yet to be confirmed) Swiss bank.

As part of the project we’ll be creating an OpenCube extension of PublishMyData. This extension will include lots of cool stuff including:

  • tools for transforming CSV files and Excel spreadsheets to RDF Data Cube datasets (without the need for programming)
  • tools for managing a collection of concept schemes
  • tools for selecting URIs from external reference data
  • quality checks that make sure any generated RDF meets the required standards

So, like the DaPaaS project, we’re excited about being part of this because it will help us make Linked Data easier to create and easier to exploit; bringing the benefits to a larger community of users.

Again, we’re lucky to be working alongside some great data people on this project. Take a look at the links below to find out more about who’s involved:

Is Ruby Manor X happening this year?

Posted 3 months back at interblah.net - Home

It’s been almost six years since the first Ruby Manor. Last year we ran Ruby Manor 4.

The observant among you will notice that four conferences into six years leaves some years where a Ruby Manor did not happen. However, it might not be obvious why this is the case.

I’ve already written at length here about why Ruby Manor is a conference, just not a traditional one. With each iteration of the event, we have tried to stay true to the manifesto while making careful changes to see if we can improve on the previous event.

As I’ve said before, the right way to think about Ruby Manor is as an experiment. We make hypotheses (e.g. ‘community feedback during the proposal process will result in more relevant presentations’ or ‘not spending money on t-shirts will not detract from the conference experience’), then we run the experiment and evaluate the results.

In most cases the experiment has been largely successful. I’m pretty sure most people who’ve come to one or more Ruby Manors have enjoyed the day, enjoyed the majority of the content, enjoyed not having to spend hundreds of pounds or dollars, not minded the lack of swag and so on. It’s probably true to say that we could run a basically similar event every year, and most people would be largely happy.

However – and here I speak mainly for myself, although I know that Murray and Tom feel largely similarly – I’m not desperately interested in putting bums-on-seats once per solar revolution, just for the sake of it.

I don’t know what the next experiment is.

Introducing Lotus::Model

Posted 3 months back at Luca Guidi - Home

Almost all the Ruby frameworks for the Model layer mix up business logic with database details. This kind of architecture leads to god classes, slow build times and to generally bad design. These problems are well known to legacy projects’ maintainers.

What if we assign these roles to smaller components that are able to collaborate together? Imagine how life changing it would be to work just with objects, without worrying about how to persist them. How easy and fast would testing them be? How small and well defined would your objects be?

Let me introduce Lotus::Model.

It brings a new level of decoupling that is a huge step in that direction. The framework constitutes a boundary that offers a convenient public API to execute queries and commands to persist and fetch entities.

Entities

An entity is the core of an application, where part of the domain logic is implemented. It’s a small, cohesive object that expresses coherent and meaningful behaviors.

It deals with one and only one responsibility that is pertinent to the domain of the application, without caring about details such as persistence or validations.

This simplicity of design allows you to focus on behaviors, or message passing if you will, which is the quintessence of Object Oriented Programming.

Consider this object:

class Article
  attr_accessor :id, :title, :text

  def initialize(attributes = {})
    @id, @title, @text =
      attributes.values_at(:id, :title, :text)
  end
end

It can be optionally expressed as:

require 'lotus/model'

class Article
  include Lotus::Entity
  self.attributes = :title, :text
end

Yes, optionally. Lotus::Model can work with pure objects, as long as they implement that small interface above.

But how does the framework know how to handle these objects?

Data Mapper

We use a data mapper for the job. It’s a persistence mapper that keeps entities unaware of schema details. The good news is that it’s database independent: it can work with SQL, document, and even key/value stores.

The role of a data mapper is to translate database columns into the corresponding attribute of an entity.

require 'lotus/model'

mapper = Lotus::Model::Mapper.new do
  collection :articles do
    entity Article

    attribute :id,    Integer
    attribute :title, String
    attribute :text,  String
  end
end

For simplicity’s sake, imagine that the mapper above is used with a SQL database. We use #collection to indicate the table that we want to map, and #entity to indicate the kind of object to persist. In the end, each #attribute call specifies which Ruby type we want to associate with a given column.

Repositories

Once we have in place all the entities and a mapping for them, we can use a repository to talk with a database.

A repository is an object that mediates between entities and the persistence layer. It offers a standardized API to query and execute commands on a database.

A repository is storage independent; all the queries and commands are delegated to the current adapter.

This architecture has several advantages:

  • An application depends on a standard API, instead of low level details (Dependency Inversion principle)
  • An application depends on a stable API that doesn’t change if the storage changes
  • You can postpone storage decisions
  • It confines persistence logic at a low level
  • Multiple data sources can easily coexist in an application

require 'lotus/model'

class ArticleRepository
  include Lotus::Repository
end

When a class includes Lotus::Repository it will expose CRUD methods such as .create, .update, .find, and a powerful private query API. This decision forces developers to define intention-revealing APIs, instead of leaking storage details outside of a repository.
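As a rough sketch of how that CRUD API reads in practice (assuming the Article entity and mapper from earlier, and an adapter already wired up; this is an illustration, not copied from the framework’s documentation):

article = Article.new(title: 'Introducing Lotus::Model', text: '...')

article = ArticleRepository.create(article) # persists the entity and returns it with an id assigned
ArticleRepository.find(article.id)          # => the same article, rebuilt through the mapper

article.title = 'A new title'
ArticleRepository.update(article)           # issues the corresponding command to the adapter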

Look at the following code:

ArticleRepository.
  where(author_id: 23).
  order(:published_at).
  limit(8)

This is an example of an implicit API: it means nothing in terms of the behavior of the domain model. It’s just a chain of method calls, and the caller has to be aware of the internal query mechanisms.

There is a better way to write it:

require 'lotus/model'

class ArticleRepository
  include Lotus::Repository

  def self.most_recent_by_author(author, limit = 8)
    query do
      where(author_id: author.id).
        order(:published_at)
    end.limit(limit)
  end
end

Look at how revealing the name of that method is. It encapsulates the implementation details in favor of a clear and testable API.

If we change the type of database, the callers of that method will be unaffected.

Adapters

As mentioned above, Lotus::Model is database agnostic. A repository forwards method calls to its current adapter. An adapter is a concrete implementation of persistence logic for a specific database. The framework is shipped with two adapters:

  • SqlAdapter
  • MemoryAdapter

An adapter can be associated with one or multiple repositories, and different repositories can have different data sources. For instance, an application can have an ArticleRepository that uses a SQL database and a TweetRepository that talks to a third-party JSON service.
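Wiring a repository to an adapter might look something like the sketch below. I’m assuming the constructor takes the mapper and a connection URI, as the early releases did; double-check the current API before relying on it:

require 'lotus/model'
require 'lotus/model/adapters/sql_adapter'

adapter = Lotus::Model::Adapters::SqlAdapter.new(mapper, 'postgres://localhost/bookshelf')
ArticleRepository.adapter = adapter # all queries and commands now run against this database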

Roadmap

For the next two months, I will focus on Lotus (the gem). The main goal is to make all the frameworks work together in a full stack app. This will require improving the existing libraries and empowering them with the missing features.

On June 23rd I will release Lotus.

To stay updated with the latest releases, to receive code examples, implementation details and announcements, please consider subscribing to the Lotus mailing list.


Episode #458 - April 22nd, 2014

Posted 3 months back at Ruby5

This week we cover Searchlight's 3.0, dumping code in models, this week in Rails commits, Whiny Validations, Bootstrap 3 support for Simple Form, and how to be an open source gardener.

Listen to this episode on Ruby5

Sponsored by Top Ruby Jobs

If you're looking for a top Ruby job or for top Ruby talent, then you should check out Top Ruby Jobs. Top Ruby Jobs is a website dedicated to the best jobs available in the Ruby community.
This episode is sponsored by Top Ruby Jobs

Searchlight 3.0 released

Nathan Long dropped us an email recently to let us know about the 3.0 release of the Searchlight library, which we originally covered about a year ago now. Searchlight gives you a low-magic way to build database searches. It allows you to encapsulate your search options in an object which is more testable and database agnostic. It also plays really well with Rails forms, so if you find yourself building an advanced search page check it out.
Searchlight 3.0 released

Don’t just dump code into your models

Jakub Arnold of Sensible.io has an interesting post where he takes you through the consequences of dumping coupled code that depends on the Mailchimp gem inside of an ActiveRecord model’s callback cycle. As Jakub explains, this dependency may cause exceptions when saving the model and even if you handle the exception, your call to Mailchimp may fail. He goes through several steps, eventually extracting the Mailchimp logic into a separate class, and to handle possible network issues, he uses a queue which will retry in case of failure. It’s a nice, short example of something that can definitely bite you if you casually dump it inside of a Rails model.
Don’t just dump code into your models
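The shape of the final refactoring is roughly the sketch below. The names are invented and Sidekiq stands in for “a queue which will retry in case of failure”; this is not Jakub’s actual code:

class User < ActiveRecord::Base
  after_commit :subscribe_to_newsletter, on: :create

  private

  def subscribe_to_newsletter
    # enqueue instead of calling Mailchimp inline, so a network failure
    # can't break the save and the job can be retried later
    NewsletterSignupWorker.perform_async(email)
  end
end

class NewsletterSignupWorker
  include Sidekiq::Worker # failed jobs are retried automatically

  def perform(email)
    MailchimpClient.subscribe(email) # hypothetical wrapper around the Mailchimp gem
  end
end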

This week in Rails

A few weeks ago Godfrey Chan started a new weekly newsletter called This week in Rails, where he covers the most interesting commits and pull requests on the Rails code base. In his most recent newsletter he mentioned a commit that introduced a new helper called “sanitize_sql_like” to help escape characters with special meaning in a LIKE clause, another that added a `to_io` method to `UploadedFile`, which allows you to directly manipulate an uploaded (temp) file, and a bunch of fixes, including getting rid of the wrapping divs on hidden form fields and some optimizations to the PostgreSQL adapter. If you enjoy living on the edge of Rails, this could be a great weekly newsletter to subscribe to.
This week in Rails
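For a feel of the new LIKE helper, here’s a hypothetical use. The method lives on ActiveRecord::Base but may be protected depending on your Rails version, hence the send; the escaping shown assumes the default backslash escape character:

sanitized = ActiveRecord::Base.send(:sanitize_sql_like, "20% _off_")
# => "20\\% \\_off\\_"
User.where("name LIKE ?", "%#{sanitized}%")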

Whiny Validation

Sometimes when you run a spec on a Rails app, it fails silently and it isn’t clear why. It could be a validation error, but the log may not even tell you that. This is the problem Brian Morearty set out to solve with his Whiny Validation gem, which adds a log message so when you debug you’ll know why your test failed. Admittedly, this is a simple gem, but often enough simple gems like this end up being merged into Rails when people realize they’re lightweight enough and make so much sense. What’s also great is that Brian wrote a blog post which walks us through the internals of developing the gem, how to use ActiveSupport notifications, and how to add your own Log Subscriber. It’s great to have the confidence to get in under the hood of these libraries when you need to, and simple examples can go a long way.
Whiny Validation

Bootstrap 3 Support for Simple Form

It’s been a while on the podcast since we’ve talked about Simple Form, which just this week received support for Bootstrap 3. Simple Form is the gem by plataformatec which gives your Rails app a fully featured DSL for creating form elements. It can be especially useful when you have complex forms which may be stored across multiple models. Aside from adding support for Bootstrap 3, the 3.1 RC1 version of Simple Form which was just released has a bunch of improvements, which you can read about in the changelog.
Bootstrap 3 Support for Simple Form

How to be an open source gardener

Steve Klabnik wrote a great piece called How to be an open source gardener in which he talks about his experience working on Rails issue triaging. Everyone who uses Rails — or any open source project really — should give this a read. Steve does a good job of showing that open source often involves hard, tedious, and not-so-creative work, and I really love the gardening analogy. For a beautiful project to keep growing healthily we need people to pull the weeds out, so please check out Richard Schneeman’s CodeTriage and pick an open source project you’d like to occasionally help with issue triaging.
How to be an open source gardener

Sponsored by Ruby5

Ruby5 is released Tuesday and Friday mornings. To stay informed about and active with this podcast, we encourage you to do one of the following:

Thank You for Listening to Ruby5

Sass's @content Directive Use Cases

Posted 3 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

Sass 3.2 added the @content directive, which allows us to pass a content block into a mixin.

@mixin apply-to-ie6-only {
  * html {
    @content
  }
}

@include apply-to-ie6-only {
  #logo {
    background-image: url(/logo.gif);
  }
}

Generates:

* html #logo {
  background-image: url(/logo.gif);
}

We can use it anywhere that declarations need to be wrapped in outer-scoped selectors, and many places where declarations are duplicated.

Media Queries

We can inline our media queries rather than maintaining separate, device-specific stylesheets, but writing queries over and over can get pretty unwieldy. We can simplify them by passing a content block into a mixin that wraps a query.

@mixin media($width) {
  @media only screen and (max-width: $width) {
    @content;
  }
}

@include media(320px) {
  background: red;
}

Generates:

@media only screen and (max-width: 320px) {
  background: red;
}

This becomes especially helpful for long, highly specific media queries, like the HiDPI mixin we use in Bourbon.

You can see our full @media mixin in Neat.

Keyframes

Keyframes are a good example of content duplication. Rather than rewriting the declarations for each vendor-specific selector, we can instead write a mixin to do it for us.

@mixin keyframes($name) {
  @-webkit-keyframes #{$name} {
    @content;
  }

  @-moz-keyframes #{$name} {
    @content;
  }

  @keyframes #{$name} {
    @content;
  }
}

@include keyframes(fadeIn) {
  from {
    opacity: 0%;
  }
  to {
    opacity: 100%;
  }
}

Generates:

@-webkit-keyframes fadeIn {
  from {
    opacity: 0%;
  }
  to {
    opacity: 100%;
  }
}

@-moz-keyframes fadeIn {
  from {
    opacity: 0%;
  }
  to {
    opacity: 100%;
  }
}

@keyframes fadeIn {
  from {
    opacity: 0%;
  }
  to {
    opacity: 100%;
  }
}

This is also used in Bourbon.

Context Specificity

I just picked up a project from Reda Lemeden, who wrote a pair of clever mixins to modify components for a given context.

Instead of creating many .component--modifiers or chaining modifying classes, we can better separate our concerns by defining a style's context specificity.

@mixin create-context($classes...) {
  @each $class in $classes {
    .#{$class} & {
      @content;
    }
  }
}

@mixin context--alternate-template {
  @include create-context(about, blog) {
    @content
  }
}

.header {
  height: 12em;
  background: red;

  @include context--alternate-template {
    background: green;
  }
}

Generates:

.header {
  height: 12em;
  background: red;
}

.about .header {
  background: green;
}

.blog .header {
  background: green;
}

Getting BEMy

Sass 3.3 adds the @at-root directive and improved &s. The former allows us to nest declarations in Sass, but compile them to the stylesheet's root. The latter appends any following text directly to the parent's selector.

These can be used with @content to simplify writing BEM syntax. Thanks to Scott Kellum for the original implementation.

@mixin element($name) {
  @at-root #{&}__#{$name} {
    @content;
  }
}

@mixin modifier($name) {
  @at-root #{&}--#{$name} {
    @content;
  }
}

.block {
  color: red;

  @include element(element) {
    color: green;

    @include modifier(modifier) {
      color: blue;
    }
  }
}

Generates:

.block {
  color: red;
}

.block__element {
  color: green;
}

.block__element--modifier {
  color: blue;
}

In Conclusion

@content is just one of many Sass directives that can empower us to remove duplication in our SCSS, and think more creatively about its organization and implementation. Learn more by reading the Sass directives documentation.

Uplink Latency of WiFi and 4G Networks

Posted 3 months back at igvita.com

The user opens your application on their device and triggers an action requiring that we fetch a remote resource: application invokes the appropriate platform API (e.g. XMLHttpRequest), the runtime serializes the request (e.g. translates it to a well-formed HTTP request) and passes the resulting byte buffer to the OS, which then fragments it into one or more TCP packets and finally passes the buffer to the link layer.

So far, so good, but what happens next? As you can guess, the answer depends on the properties of the current link layer in use on the device. Let's dig a bit deeper...

Transmitting over WiFi

If the user is on WiFi, then the link layer breaks up the data into multiple frames and (optimistically) begins transmitting data one frame at a time: it waits until the radio channel is "silent," transmits the WiFi frame, and then waits for an acknowledgement from the receiver before proceeding with transmission of the next frame. Yes, you've read that right, each frame requires a full roundtrip between the sender and receiver! 802.11n is the first standard to introduce "frame aggregation," which allows multiple frames to be sent in a single transmission.

Of course, not all transmissions will succeed on their first attempt. If two peers transmit at the same time and on the same channel then a collision will happen and both peers will have to retransmit data: both peers sleep for a random interval and then repeat the process. The WiFi access model is simple to understand and implement, but as you can guess, also doesn't provide any guarantees about the latency costs of the transmission. If the network is mostly idle, then transmission times are nice and low, but if the network is congested, then all bets are off.

In fact, don't be surprised to see 100ms+ delays just for the first hop between the WiFi sender and the access point - e.g. see the histogram above, showing 180ms+ first-hop latency tails on my own (home) WiFi network. That said, note that there is no "typical" or "average" uplink WiFi latency: the latency will depend on the conditions and load of the particular WiFi network. In short, expect high variability and long latency tails, with an occasional chance of network collapse if too many peers are competing for access.

If your WiFi access point is also your gateway then you can run a simple ping command to measure your first hop latency.
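For example, assuming your gateway answers ICMP and sits at the usual private address (adjust for your own network):

$ ping -c 10 192.168.1.1

The min/avg/max summary at the end approximates your first-hop latency distribution; run it while the network is busy to see the tails.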

Uplink scheduling on 4G networks

In order to make better use of the limited capacity of the shared radio channel and optimize energy use on the device, 4G/LTE standards take a much more hands-on approach to scheduling and resource assignment: the radio tower (eNodeB) notifies the device for when it should listen for inbound data, and also tells the device when it is allowed to transmit data. As you can imagine, this can incur a lot of coordination overhead (read, latency), but such is the cost of achieving higher channel and energy efficiency.

  1. The radio network has a dedicated Physical Uplink Control Channel (PUCCH) which is used by the device to notify the radio network that it wants to transmit data: each device has a periodic timeslot (typically on a 5, 10, or 20 ms interval) where it is allowed to send a Scheduling Request (SR) that consists of a single bit indicating that it needs uplink access.

  2. The SR request bit is received by the radio tower (eNodeB) but the SR request on its own is not sufficient to assign uplink resources as it doesn't tell the scheduler the amount of data that the device intends to transfer. So, the eNodeB responds with a small "uplink grant" that is just large enough to communicate the size of the pending buffer.

  3. Once the device receives its first uplink grant, it waits for its turn to transmit (up to ~5 ms), and sends a Buffer Status Report (BSR) indicating the amount of application data pending in its upload buffers. Finally, the eNodeB receives the BSR message, allocates the necessary uplink resources and sends back another uplink grant that will allow the device to drain its buffer.

What happens if additional data is added to the device buffer while the above process is underway? Simple, the device sends another BSR message and waits for a new uplink grant! If timed correctly, then the BSR requests and uplink grants can be pipelined with existing data transfers from the device, allowing us to minimize first-hop delays. On the other hand, once the device buffer is drained and then new data arrives, the entire process is repeated once over: SR, uplink grant, BSR, uplink grant, data transmission.

So, what does this all mean in practice? Let's do the math:

  • If the network is configured to use a 10 ms periodic interval for communicating SR messages then we would expect a ~5 ms average delay before the SR request is sent.
  • There are two full roundtrips between the device and the eNodeB to negotiate the uplink resource assignment to transmit pending application data. The latency incurred by these roundtrips will vary for each network, but as a rule of thumb each exchange is ~5 ms.

Add it all up (a ~5 ms average wait before the SR can be sent, plus two ~5 ms roundtrips to negotiate the grants, plus the data transmission itself), and we're looking at 20+ ms of delay between application data arriving at the (empty buffer) of the link layer on the device and the same data being available at the link layer of the eNodeB. From there the packet needs to traverse the carrier network, exit onto the public network, and get routed to your server.

The above uplink latency overhead is one reason why low latency applications, such as delivering voice, can be a big challenge over 4G networks. In fact, for voice specifically, there is ongoing work on Voice over LTE (VoLTE) which aims to address this problem. How? Well, one way is to provide a persistent uplink grant: transmit up to X bytes on a Y periodic interval. Believe it or not, today most 4G networks still fall back to old 3G infrastructure to transmit voice!

Optimizing for WiFi and 4G networks

As you can tell, both WiFi and 4G have their challenges. WiFi can deliver low latency first hop if the network is mostly idle: no coordination is required and the device can transmit whenever it senses that the radio channel is idle. On the other hand, WiFi is subject to high variability and long latency tails if the network has many peers competing for access - and most networks do.

By contrast, 4G networks require coordination between the device and the radio tower for each uplink transfer, which translates to higher minimum latency, but the upside is that 4G can rein in the latency tails, providing more predictable performance and reducing congestion.

So, how does all this impact application developers? First off, latency aside, and regardless of wireless technology, consider the energy costs of your network transfers! Periodic transfers incur high energy overhead due to the need to wake up the radio on each transmission. Second, the same periodic transfers also incur high uplink coordination overhead - 4G in particular. In short, don't trickle data. Aggregate your network requests and fire them in one batch: you will reduce energy costs and reduce latency by amortizing scheduling overhead.

Back to Basics: Writing SQL Queries

Posted 3 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

Almost all applications store data in one format or another somewhere. We developers spend a lot of time thinking about our data. In most cases we end up storing that data in a relational database. The advent of the ORM has made writing raw SQL much less common, but when we do, it's good to have an understanding of the fundamental elements of the language. Today we'll go over some basic queries and the syntax required to make them.

Setup

For the sake of simplicity we'll use SQLite3 for this blog post. It's important to note that none of the topics or commands we'll look at are specific to SQLite3. All the query examples we'll go over below are ISO 9075 compliant and will work in any of the major databases (PostgreSQL, MySQL, Oracle, etc.).

From the command line fire up SQLite3 and create a database named back_to_basics:

% sqlite3 back_to_basics

Now let's create three tables. We'll call them players, teams and players_teams:

SQLite3 version 3.7.12 2012-04-03 19:43:07
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> CREATE TABLE players (id INTEGER PRIMARY KEY ASC, name TEXT, seasons_played INTEGER);
sqlite> CREATE TABLE teams (id INTEGER PRIMARY KEY ASC, name TEXT);
sqlite> CREATE TABLE players_teams (player_id INTEGER, team_id INTEGER, won_championship BOOLEAN);
sqlite> .tables
players        players_teams  teams

Before we're done with the setup stage let's turn on explain mode. This will let us see our column names in query results:

sqlite> .explain on

Our Data

In order to write any queries we'll need some data, which means we need some players and teams. We'll use baseball since players tend to move around more in that sport.

Let's enter information on three baseball players and the first two teams they played for in their careers (just to keep the data set small).

Player           First Team       Second Team
Nolan Ryan       New York Mets    California Angels
Jim Sundberg     Texas Rangers    Milwaukee Brewers
Ivan Rodriguez   Texas Rangers    Florida Marlins

INSERT

To get our player data into the database we'll use the INSERT statement:

sqlite> INSERT INTO players (name, seasons_played) VALUES ('Nolan Ryan', 27);
sqlite> INSERT INTO players (name, seasons_played) VALUES ('Jim Sundberg', 16);
sqlite> INSERT INTO players (name, seasons_played) VALUES ('Ivan Rodriguez', 21);

SELECT

Now that we have data in our first table let's run a query and make sure it looks right. We'll use the SELECT statement to do this. For our first query we'll just ask the database to return all rows and columns from our players table. We'll use the * operator to do this:

sqlite> SELECT *
   ...> FROM players;
id    name           seasons_played
----  -------------  --------------
1     Nolan Ryan     27
2     Jim Sundberg   16
3     Ivan Rodriguez 21

In place of the * operator we can also give the SELECT statement a list of columns. This will limit the result set to just the columns we're interested in:

sqlite> SELECT name
   ...> FROM players;
name
----
Nolan Ryan
Jim Sundberg
Ivan Rodriguez

ORDER BY

We can also order our results how we'd like. This is done by using the ORDER BY clause:

sqlite> SELECT *
   ...> FROM players
   ...> ORDER BY seasons_played;
id    name           seasons_played
----  -------------  --------------
2     Jim Sundberg   16
3     Ivan Rodriguez 21
1     Nolan Ryan     27

Previously our results were always ordered by id. Because we used the ORDER BY clause we get results ordered by the seasons_played column.

You can also specify you would like to order results descending:

sqlite> SELECT *
   ...> FROM players
   ...> ORDER BY seasons_played DESC;
id    name           seasons_played
----  -------------  --------------
1     Nolan Ryan     27
3     Ivan Rodriguez 21
2     Jim Sundberg   16

A Little More Setup

Now that we know how to insert data, and query to ensure we entered it correctly, let's add data to our teams table:

sqlite> INSERT INTO teams (name) VALUES ('Texas Rangers');
sqlite> INSERT INTO teams (name) VALUES ('Florida Marlins');
sqlite> INSERT INTO teams (name) VALUES ('New York Mets');
sqlite> INSERT INTO teams (name) VALUES ('California Angels');
sqlite> INSERT INTO teams (name) VALUES ('Milwaukee Brewers');
sqlite> INSERT INTO teams (name) VALUES ('New York Yankees');

Junction Table

Now we need to connect our players and teams. We'll do this in our players_teams table. This is what is called a junction table. Junction tables are used to create many-to-many relationships in relational databases. They achieve this by combining common data from multiple tables. In our case we're going to include the id columns from our teams and players tables, thus allowing us to relate rows from one to the other.

We'll start with Nolan. His first two teams were the New York Mets and the California Angels. First we need to get our ids:

sqlite> SELECT *
   ...> FROM players;
id    name           seasons_played
----  -------------  --------------
1     Nolan Ryan     27
2     Jim Sundberg   16
3     Ivan Rodriguez 21

Nolan's id is 1. Let's find out what the Mets' and Angels' ids are:

sqlite> SELECT *
   ...> FROM teams;
id    name
----  -------------
1     Texas Rangers
2     Florida Marlins
3     New York Mets
4     California Angels
5     Milwaukee Brewers
6     New York Yankees

The Mets have an id of 3 and the Angels have an id of 4. Nolan won a World Series with the Mets. We now have enough information to write our INSERT statements:

sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (1, 4, 0);
sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (1, 3, 1);

In the INSERT statements above we associate the player with id 1 (Nolan) with the teams with ids 4 (Angels) and 3 (Mets). We provide a 0, which represents false, in the won_championship column for the Angels, and a 1, for true, for the Mets. As you can see, we create relationships between tables using the ids our database generates for each row. This is one of the cornerstones of the relational database: the id that uniquely identifies a row is called a primary key, and columns like player_id and team_id that refer to another table's primary key are called foreign keys.
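
Incidentally, looking ids up with a SELECT each time gets tedious. SQLite can report the id generated by your most recent INSERT on the current connection; run immediately after our first INSERT into players, it would have looked like this:

sqlite> SELECT last_insert_rowid();
last_insert_rowid()
-------------------
1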

Now we just need to finish up building our players_teams table.

Jim played for the Rangers and Brewers and didn't win a championship with either:

sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (2, 1, 0);
sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (2, 5, 0);

Ivan played for the Rangers and the Marlins and won a championship with the Marlins:

sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (3, 1, 0);
sqlite> INSERT INTO players_teams (player_id, team_id, won_championship) VALUES (3, 2, 1);

INNER JOIN

Now that we have some data to play with let's write some multi-table queries.

In order to do this we'll have to use joins. The most common join in our toolbox is the INNER JOIN. An INNER JOIN combines two tables based on a condition we provide; the only rows from either table that remain after the join are those that satisfy the condition.

Let's join our players table to our players_teams table on the condition that the players table's id column matches the players_teams table's player_id column, and take a look at the results:

sqlite> SELECT *
   ...> FROM players
   ...> INNER JOIN players_teams ON players.id = players_teams.player_id;
id    name           seasons_played  player_id  team_id  won_championship
----  -------------  --------------  ---------  -------  -------------
1     Nolan Ryan      27             1          4        0
1     Nolan Ryan      27             1          3        1
2     Jim Sundberg    16             2          1        0
2     Jim Sundberg    16             2          5        0
3     Ivan Rodriguez  21             3          1        0
3     Ivan Rodriguez  21             3          2        1

The condition we discussed above is what comes after the ON keyword.

Notice that each row from the players table appears twice. This is because an INNER JOIN produces one result row for every matching pair of rows. There are two rows in the players_teams table for every row in the players table (we're looking at each player's first two teams), so each players row shows up twice.

Earlier, when we inserted data into our teams table, we added a row for the New York Yankees, but none of our players had the Yankees among their first two teams.

Let's see what happens if we INNER JOIN the teams table and players_teams table:

sqlite> SELECT *
   ...> FROM teams
   ...> INNER JOIN players_teams ON teams.id = players_teams.team_id;
id    name               player_id  team_id  won_championship
----  -------------      ---------  -------  ----------------
4     California Angels  1          4        0
3     New York Mets      1          3        1
1     Texas Rangers      2          1        0
5     Milwaukee Brewers  2          5        0
1     Texas Rangers      3          1        0
2     Florida Marlins    3          2        1

Because we used an INNER JOIN, the only rows present are those that satisfy our condition. The New York Yankees row doesn't, so it isn't present in our result set.

We're not limited to joining just two tables either. Let's inner join all three of our tables together and see what that looks like:

sqlite> SELECT *
   ...> FROM players
   ...> INNER JOIN players_teams ON players.id = players_teams.player_id
   ...> INNER JOIN teams ON players_teams.team_id = teams.id;
id    name           seasons_played  player_id  team_id  won_championship  id  name
----  -------------  --------------  ---------  -------  ----------------  --  -------------
1     Nolan Ryan      27             1          4        0                 4   California Angels
1     Nolan Ryan      27             1          3        1                 3   New York Mets
2     Jim Sundberg    16             2          1        0                 1   Texas Rangers
2     Jim Sundberg    16             2          5        0                 5   Milwaukee Brewers
3     Ivan Rodriguez  21             3          1        0                 1   Texas Rangers
3     Ivan Rodriguez  21             3          2        1                 2   Florida Marlins

We can also take advantage of what we learned about the SELECT statement above to create more readable result sets. Let's look at just the players and the teams they played for:

sqlite> SELECT players.name, teams.name
   ...> FROM players
   ...> INNER JOIN players_teams ON players.id = players_teams.player_id
   ...> INNER JOIN teams ON players_teams.team_id = teams.id;
name            name
----            -------------
Nolan Ryan      California Angels
Nolan Ryan      New York Mets
Jim Sundberg    Texas Rangers
Jim Sundberg    Milwaukee Brewers
Ivan Rodriguez  Texas Rangers
Ivan Rodriguez  Florida Marlins
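
Both columns in that result set come out labeled name, which is ambiguous. We can give them clearer labels with the AS keyword; a sketch:

sqlite> SELECT players.name AS player, teams.name AS team
   ...> FROM players
   ...> INNER JOIN players_teams ON players.id = players_teams.player_id
   ...> INNER JOIN teams ON players_teams.team_id = teams.id;
player          team
------          -------------
Nolan Ryan      California Angels
Nolan Ryan      New York Mets
Jim Sundberg    Texas Rangers
Jim Sundberg    Milwaukee Brewers
Ivan Rodriguez  Texas Rangers
Ivan Rodriguez  Florida Marlins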

LEFT OUTER JOIN

A LEFT OUTER JOIN is very similar to an INNER JOIN, with one big exception: every row from the left table (the one named before the JOIN keyword) remains in the result set even when it doesn't match the condition. For those rows, the other table's columns are simply left empty (NULL).

Let's look again at the query from above where we joined players_teams to teams. As we recall, when we used an INNER JOIN on the two tables the Yankees were omitted from the result set. Let's see what happens when we LEFT OUTER JOIN the two tables:

sqlite> SELECT *
   ...> FROM teams
   ...> LEFT OUTER JOIN players_teams ON teams.id = players_teams.team_id;
id    name               player_id  team_id  won_championship
----  -------------      ---------  -------  ----------------
1     Texas Rangers      2          1        0
1     Texas Rangers      3          1        0
2     Florida Marlins    3          2        1
3     New York Mets      1          3        1
4     California Angels  1          4        0
5     Milwaukee Brewers  2          5        0
6     New York Yankees

The Yankees show up in our result set, but with no values in the columns associated with the players_teams table.
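
Those empty values are NULLs, and they give us a handy idiom: combined with the WHERE clause we'll cover below, a LEFT OUTER JOIN can find rows with no match at all. A sketch that finds teams none of our players played for:

sqlite> SELECT teams.name
   ...> FROM teams
   ...> LEFT OUTER JOIN players_teams ON teams.id = players_teams.team_id
   ...> WHERE players_teams.team_id IS NULL;
name
----
New York Yankees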

Like INNER JOIN, we can specify columns in the SELECT, we can LEFT OUTER JOIN multiple tables, and we can mix INNER JOINs and LEFT OUTER JOINs in the same query.

WHERE

The WHERE clause gives us the ability to specify a condition that is checked against every row. If a row does not meet the condition, it is filtered out of the result set.

Let's take a look at a list of all the players in our database that spent over 20 years in the league:

sqlite> SELECT *
   ...> FROM players
   ...> WHERE seasons_played > 20;
id    name           seasons_played
----  -------------  --------------
1     Nolan Ryan     27
3     Ivan Rodriguez 21

Our result set only includes the two players with over 20 years.

There are lots of different operators we can use in our WHERE clauses. Above we used the greater than operator. Let's take a look at a few more of our options.

We can look for rows that meet equality conditions:

sqlite> SELECT *
   ...> FROM players
   ...> WHERE seasons_played = 16;
id    name           seasons_played
----  -------------  --------------
2     Jim Sundberg   16

We can look for rows whose values fall between two bounds:

sqlite> SELECT *
   ...> FROM players
   ...> WHERE seasons_played BETWEEN 20 AND 22;
id    name           seasons_played
----  -------------  --------------
3     Ivan Rodriguez 21

We can look for rows that contain a value in a set we provide:

sqlite> SELECT *
   ...> FROM players
   ...> WHERE seasons_played IN (16, 27);
id    name           seasons_played
----  -------------  --------------
1     Nolan Ryan     27
2     Jim Sundberg   16
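
We can also match text patterns with the LIKE operator, where % stands for any run of characters; a sketch:

sqlite> SELECT *
   ...> FROM players
   ...> WHERE name LIKE 'Nolan%';
id    name           seasons_played
----  -------------  --------------
1     Nolan Ryan     27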

We can also use the WHERE clause on result sets that have been created by joining multiple tables. Let's look at a list of players who have won a championship:

sqlite> SELECT *
   ...> FROM players
   ...> INNER JOIN players_teams ON players.id = players_teams.player_id
   ...> WHERE won_championship = 1;
id    name           seasons_played  player_id  team_id  won_championship
----  -------------  --------------  ---------  -------  ----------------
1     Nolan Ryan      27             1          3        1
3     Ivan Rodriguez  21             3          2        1

We can also use the WHERE clause with columns from two different tables, combining the conditions with AND or OR:

sqlite> SELECT *
   ...> FROM players
   ...> INNER JOIN players_teams ON players.id = players_teams.player_id
   ...> WHERE players_teams.won_championship = 1
   ...> AND players.seasons_played > 21;
id    name           seasons_played  player_id  team_id  won_championship
----  -------------  --------------  ---------  -------  ----------------
1     Nolan Ryan      27             1          3        1

DELETE

Deleting from a table is done using the DELETE command. Above we made a mistake and inserted the New York Yankees into our teams table. None of the players we have in our database played for that team, so, like I wish we could in real life, we need to delete the Yankees.

First let's get the Yankees' id:

sqlite> SELECT *
   ...> FROM teams;
id    name
----  -------------
1     Texas Rangers
2     Florida Marlins
3     New York Mets
4     California Angels
5     Milwaukee Brewers
6     New York Yankees

We'll use their id, 6, to safely delete them from the database:

sqlite> DELETE FROM teams WHERE id = 6;
sqlite> SELECT * 
   ...> FROM teams;
id    name
----  -------------
1     Texas Rangers
2     Florida Marlins
3     New York Mets
4     California Angels
5     Milwaukee Brewers

I like that result set much better!
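
One caution about DELETE: leave off the WHERE clause and it removes every row in the table. If you're nervous about a delete, you can wrap it in a transaction and roll back if things look wrong; a sketch:

sqlite> BEGIN;
sqlite> DELETE FROM teams;
sqlite> SELECT COUNT(*) FROM teams;
COUNT(*)
--------
0
sqlite> ROLLBACK;
sqlite> SELECT COUNT(*) FROM teams;
COUNT(*)
--------
5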

GROUP BY

We can also group results together and aggregate values. Let's list our teams along with the total number of our players who spent one of their first two seasons there. This query requires us to GROUP BY team name and aggregate the players with the COUNT function: we'll count the players.name values in each group:

sqlite> SELECT teams.name, COUNT(players.name)
   ...> FROM teams
   ...> INNER JOIN players_teams ON teams.id = players_teams.team_id
   ...> INNER JOIN players ON players.id = players_teams.player_id
   ...> GROUP BY teams.name;
name               COUNT(players.name)
----               -----------------
California Angels  1
Florida Marlins    1
Milwaukee Brewers  1
New York Mets      1
Texas Rangers      2

Let's look at what the result set would look like without grouping (we'll order by team name to make the results more obvious):

sqlite> SELECT teams.name, players.name
   ...> FROM teams
   ...> INNER JOIN players_teams ON teams.id = players_teams.team_id
   ...> INNER JOIN players ON players.id = players_teams.player_id
   ...> ORDER BY teams.name;
name               name
----               -----------------
California Angels  Nolan Ryan
Florida Marlins    Ivan Rodriguez
Milwaukee Brewers  Jim Sundberg
New York Mets      Nolan Ryan
Texas Rangers      Jim Sundberg
Texas Rangers      Ivan Rodriguez

The Texas Rangers show up twice in the non-grouped result set.

One thing to note about the GROUP BY clause: every column in the SELECT statement must either be aggregated (as we're doing with the COUNT function) or appear in the GROUP BY clause (as teams.name does here).

There are many other aggregate functions we can use, and I encourage you to explore them; PostgreSQL's documentation, for example, includes a full list of the aggregate functions it supports.
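
For example, AVG works just like COUNT; a sketch of the average seasons_played of our players, grouped by team:

sqlite> SELECT teams.name, AVG(players.seasons_played)
   ...> FROM teams
   ...> INNER JOIN players_teams ON teams.id = players_teams.team_id
   ...> INNER JOIN players ON players.id = players_teams.player_id
   ...> GROUP BY teams.name;
name               AVG(players.seasons_played)
----               ---------------------------
California Angels  27.0
Florida Marlins    21.0
Milwaukee Brewers  16.0
New York Mets      27.0
Texas Rangers      18.5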

HAVING

The HAVING clause works like a WHERE clause, but on grouped result sets. Going back to our list of teams and player counts from above, we can use a HAVING clause to limit the result set to only the teams that had more than one of our players in their first two seasons, i.e. a COUNT(players.id) greater than one:

sqlite> SELECT teams.name, COUNT(players.id)
   ...> FROM teams
   ...> INNER JOIN players_teams ON teams.id = players_teams.team_id
   ...> INNER JOIN players ON players.id = players_teams.player_id
   ...> GROUP BY teams.name
   ...> HAVING COUNT(players.id) > 1;
name           COUNT(players.id)
----           -----------------
Texas Rangers  2

Our result set has now been limited to only the teams we're interested in.

Sub Queries

We can also embed queries in our queries to create more useful result sets. Let's use a sub query to get a list of players who have won a championship:

sqlite> SELECT *
   ...> FROM players
   ...> INNER JOIN (
   ...>   SELECT player_id, MAX(won_championship)
   ...>   FROM players_teams
   ...>   GROUP BY player_id) sub_query_players_teams ON players.id = sub_query_players_teams.player_id;
id    name           seasons_played  player_id  MAX(won_championship)
----  -------------  --------------  ---------  ---------------------
1     Nolan Ryan      27             1          1
2     Jim Sundberg    16             2          0
3     Ivan Rodriguez  21             3          1

In this case we're still doing an inner join, but instead of joining to an existing table, we're creating a new result set and joining to that. Our sub query consists of just a list of player ids and the MAX of each player's won_championship values: if a player has ever won a championship the max will be 1, otherwise it will be 0. Thanks to the sub query we get a tidy list of players and whether they won a championship (the last column in our result set). If we tried to do this without a sub query we would have to include all the columns from our players_teams table, which, as we saw in our initial INNER JOIN, would double each player in the result set. Worse, we couldn't say definitively whether a player had won or not; we would be forced to look at several different rows to deduce the answer.
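
Sub queries can appear in the WHERE clause too. If all we wanted was the champions themselves, we could combine the IN operator from earlier with a sub query; a sketch:

sqlite> SELECT *
   ...> FROM players
   ...> WHERE id IN (
   ...>   SELECT player_id
   ...>   FROM players_teams
   ...>   WHERE won_championship = 1);
id    name           seasons_played
----  -------------  --------------
1     Nolan Ryan     27
3     Ivan Rodriguez 21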

Episode #457 - April 18th, 2014

Posted 3 months back at Ruby5

URL parsing with Rippersnapper, awesome APIs with Pliny, thread-safe utilities from Charles Nutter, a revival of the invoicing gem, info about recursion and memoization, querying git with gitql, and refactoring bad controllers all in this episode of the Ruby5 podcast!

Listen to this episode on Ruby5

Sponsored by New Relic

New Relic is _the_ all-in-one web performance analytics product. It lets you manage and monitor web application performance, from the browser down to the line of code. With Real User Monitoring, New Relic users can see browser response times by geographical location of the user, or by browser type.
This episode is sponsored by New Relic

Rippersnapper

Working with URLs in your app has never been easier with Rippersnapper. Give it a URL and get back all your favorite parts!
Rippersnapper

Pliny

Base Sinatra app for writing excellent APIs in Ruby.
Pliny

thread_safe

A collection of utilities for thread-safe programming in Ruby.
thread_safe

Invoicing

Before you roll your own invoicing framework, have a look at the invoicing gem from Codemancers. It might be just what you need!
Invoicing

Recursion and Memoization

Ray Hightower on avoiding some recursion performance pitfalls in Ruby with memoization.
Recursion and Memoization

gitql

A git query language. Such git! So query!
gitql

Refactoring Controllers

Ever needed to refactor a bad controller? Justin Weiss walks you through his process in this new blog post.
Refactoring Controllers