UX & Front-End Development Bootcamp

Posted about 1 month back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

We’re proud to announce that we’re kicking off a new UX & Front-End Development bootcamp with Metis this Fall. Earlier this year, we helped launch the Ruby on Rails bootcamp with great success, and as a design-driven company, it’s a natural step to expand into product design.

Designers today are increasingly involved in every step of a product’s evolution. It’s now common for our role to include front-end development and product, visual, interaction, and user experience design. Things move quickly, products change rapidly, and having capable designers tightly integrated throughout the process is crucial. Our bootcamp at Metis is no different, and we’ve been developing our curriculum around this notion.

The first bootcamp will be September 22nd through November 21st (9 weeks) and take place in New York City. It will consist of 100% in-person instruction, Monday–Friday, 9–6. I’m grateful to be joined by Allison House as my co-instructor. She’s worked with companies such as Dropbox, Codecademy and Treehouse and brings an incredible amount of knowledge and skill to our team.

Our goal is for students to graduate with confidence in producing quality digital product design solutions, with capabilities ranging from prototyping to designing mobile and web applications to working with HTML and CSS. Students will be able to pursue a job as an entry-level product designer, user experience designer, web designer, or mobile designer.

Head over to the Metis website to find out more and apply.

Speed Up Tests by Selectively Avoiding Factory Girl

Posted about 1 month back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

I’ve talked about speeding up unit tests when using Factory Girl by relying on FactoryGirl.build_stubbed, but there’s another surefire way to speed up your test suite with Factory Girl.

Don’t use it.

Most Unit Tests Don’t Need Persisted Data

There are plenty of times when data needs to exist in the database to accurately test an application; most acceptance tests will require some amount of persisted data (created either via Factory Girl or through UI interactions). When unit testing most methods, however, Factory Girl (and even persisting data to the database) is unnecessary.

Let’s start with a couple of tests around a method we’ll need to define, User#age:

describe User do
  describe "#age" do
    it "calculates age given birthdate" do
      user = generate_user_born_on 366.days.ago

      expect(user.age).to eq 1
    end

    it "calculates age correctly by rounding age down to the appropriate integer" do
      user = generate_user_born_on 360.days.ago

      expect(user.age).to eq 0
    end

    def generate_user_born_on(date)
      FactoryGirl.create :user, birthdate: date
    end
  end
end

This seems like a harmless use of Factory Girl, and leads us to define User#age:

class User < ActiveRecord::Base
  def age
    ((Date.current - birthdate)/365.0).floor
  end
end

Running the specs:

$ rspec spec/models/user_spec.rb
..

Finished in 0.01199 seconds
2 examples, 0 failures

More than 100% Faster!

Looking at User#age, though, we don’t actually care about the database. Let’s swap FactoryGirl.create with User.new and re-run the spec.
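
The only change is in the helper, which now builds the user in memory instead of persisting it:

def generate_user_born_on(date)
  User.new(birthdate: date)
end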

$ rspec spec/models/user_spec.rb
..

Finished in 0.00489 seconds

Still a green suite, but more than 100% faster.

Associations Make a Test Suite Slower

Now, let’s imagine User grows and ends up having a Profile:

class User < ActiveRecord::Base
  has_one :profile

  def age
    ((Date.current - birthdate)/365.0).floor
  end
end

We update the factory, including the associated profile:

FactoryGirl.define do
  factory :user do
    profile
  end

  factory :profile
end

Let’s re-run the spec using Factory Girl:

$ rspec spec/models/user_spec.rb
..

Finished in 0.02278 seconds
2 examples, 0 failures

Whoa, it’s now taking twice as long as it was before, but absolutely zero tests changed, only the factories.

Let’s run it again, but this time using User.new:

$ rspec spec/models/user_spec.rb
..

Finished in 0.00474 seconds
2 examples, 0 failures

Whew, back to a reasonable amount of time, and we’re still green. What’s going on here?

Persisting Data is Slow

FactoryGirl.create creates two records in the database, a user and a profile. We know persistence is slow, but because Factory Girl is so easy to write and use, it hides that cost well. Even changing from FactoryGirl.create to FactoryGirl.build doesn’t help much:

$ rspec spec/models/user_spec.rb
..

Finished in 0.01963 seconds
2 examples, 0 failures

That’s because FactoryGirl.build still persists associations; so, every time we use Factory Girl to build a User, we’re still persisting a Profile.
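
If a test really needs Factory Girl but not the persisted association, one option (a sketch, not part of the factories above) is to override the association or reach for build_stubbed:

# skip the associated profile entirely
user = FactoryGirl.build(:user, profile: nil)

# or stub associations out instead of saving them
user = FactoryGirl.build_stubbed(:user)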

Writing to Disk Makes Things Worse

Sometimes, objects will write to disk during the object’s persistence lifecycle. A common example is processing a file attachment in an ActiveRecord callback through gems like Paperclip or Carrierwave, which may result in processing thousands of files unnecessarily. Imagine how much slower a test suite becomes when all of that data is being created.

It’s incredibly difficult to identify these bottlenecks because of the differences between FactoryGirl.build, FactoryGirl.create, and how associations are handled. By remembering to use FactoryGirl.build on an avatar factory, we may speed up some subset of tests, but if User has an avatar associated with it, even when calling FactoryGirl.build(:user), avatars still get created - meaning valuable time spent processing images and persisting likely unnecessary data.
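
As a hypothetical sketch (the avatar factory and its attachment are illustrative, not from the code above), that situation looks like this:

FactoryGirl.define do
  factory :user do
    avatar  # FactoryGirl.build(:user) still creates and persists this avatar
  end

  factory :avatar do
    image { File.new("spec/fixtures/avatar.png") }
  end
end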

How to Fix Things

User#age is a great example because it’s quite clear that there’s no interaction with the database. Many core domain objects will have methods like this, and I suggest avoiding Factory Girl entirely in these cases, if possible. Instead, instantiate the objects directly, with only the data necessary to test the method. In the example above, User#age relies on a single piece of data: birthdate. Since that’s the method being tested, there’s no need to instantiate a User with anything else. Explicitly defining the data the test uses also provides clarity to you and other developers.

When testing an object and collaborators, consider doubles like fakes or stubs.
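
For example, a stubbed collaborator keeps both Factory Girl and the database out of the picture entirely (SignUp and its mailer here are hypothetical, purely for illustration):

it "sends a welcome email after sign up" do
  mailer = double("mailer", welcome: true)
  user = User.new

  SignUp.new(user, mailer: mailer).run

  expect(mailer).to have_received(:welcome).with(user)
end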

My general advice, though, is to avoid Factory Girl as much as is reasonably possible. Not because it’s bad or unreliable software (Factory Girl is very reliable; we’ve used it successfully since 2008), but because its persistence mechanism calls #save! on the object, which will always take longer than not persisting data at all.

Using Clearance with RailsAdmin

Posted about 1 month back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

These days, I find myself implementing RailsAdmin on almost every consulting project that I’m on. While RailsAdmin is not a replacement for a custom or complex admin interface, it’s a great way to give non-technical stakeholders access to the data being created and updated in a Rails app. And it takes just a few minutes to set up!

Where there is an admin interface, there is also authentication. Most of the documentation out there covers how to integrate RailsAdmin with Devise. But setting up RailsAdmin with Clearance is easy, too!

Here is a step by step guide on how to set up RailsAdmin with Clearance authentication:

Step 1: Set up Clearance

# Gemfile
gem "clearance"

bundle install
rails generate clearance:install

Running the Clearance generator creates a migration. Clearance’s generated migration will either create the users table or add only the columns necessary for Clearance to an existing users table. Before we run the migration, let’s add a boolean column to indicate whether a user is an admin or not.

# db/migrate/20140808213224_create_users.rb
class CreateUsers < ActiveRecord::Migration
  def change
    create_table :users  do |t|
      t.timestamps null: false
      t.boolean :admin, null: false, default: false
      t.string :email, null: false
      t.string :encrypted_password, limit: 128, null: false
      t.string :confirmation_token, limit: 128
      t.string :remember_token, limit: 128, null: false
    end

    add_index :users, :email
    add_index :users, :remember_token
  end
end

Now we’re ready to run the migration:

rake db:migrate

Step 2: Write a test

We want to be sure that only admin users can see our admin dashboard, so we’ll start with some feature specs that test this behavior. See this overview of testing Rails applications for more detail on how thoughtbot tests Rails applications.

# spec/features/admin_dashboard_spec.rb
feature "Admin dashboard" do
  scenario "visitor is admin" do
    admin = create(:admin)

    visit rails_admin_path(as: admin)

    expect(page).to have_content("Site Administration")
  end

  scenario "visitor is not an admin user" do
    user = create(:user)

    visit rails_admin_path(as: user)

    expect(page).to have_content("You are not permitted to view this page")
  end
end

The tests above are using Clearance::BackDoor to sign the user in directly. This is one of Clearance’s super awesome tools that speeds up tests and makes writing feature specs a breeze.
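
The back door is Rack middleware that Clearance recommends enabling only in the test environment; a minimal setup looks something like this:

# config/environments/test.rb
Rails.application.configure do
  # lets feature specs sign in via `visit some_path(as: user)`
  config.middleware.use Clearance::BackDoor
end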

Before these tests will run properly, we need to set up admin and user factories:

# spec/factories.rb
FactoryGirl.define do
  factory :user do
    email "test@example.com"
    password "password"

    factory :admin do
      admin true
    end
  end
end

When we run our feature spec, our app does not recognize rails_admin_path, so it’s time to set up RailsAdmin.

Step 3: Set up RailsAdmin

Add the gem:

# Gemfile
gem "rails_admin"

bundle install

Tell RailsAdmin where we want it mounted (here, we’re choosing “/admin”):

# config/routes.rb
Rails.application.routes.draw do
  mount RailsAdmin::Engine => "/admin", as: "rails_admin"
end

When we run the tests again, we find that one of our tests is passing. Woot!

Unfortunately, the one that is failing is the spec that makes sure only admin users can view RailsAdmin. That is no bueno.

Time to configure RailsAdmin to redirect non-admin users (we’re assuming here that we have a root path defined in our routes.rb file):

# config/initializers/rails_admin.rb
RailsAdmin.config do |config|
  config.authorize_with do
    unless current_user.admin?
      redirect_to(
        main_app.root_path,
        alert: "You are not permitted to view this page"
      )
    end
  end

  config.current_user_method { current_user }
end

Now our tests both pass! But don’t celebrate too quickly. There is one final step we need to take care of: If we create an admin user in console, start our server, and log in as that admin user, we will see the following form at “/admin/user/new”:

[screenshot: the RailsAdmin new user form]
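
Creating that admin user in a Rails console is just a matter of setting the boolean column from our migration (the email and password below are illustrative):

# rails console
User.create!(
  email: "admin@example.com",
  password: "password",
  admin: true
)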

RailsAdmin assumes that because we have a password field, we will also have a password_confirmation field. If we try to fill these fields out and save a new user, we will get an error like this:

ActiveRecord::UnknownAttributeError in RailsAdmin::MainController#new
unknown attribute: password_confirmation

Clearance doesn’t have a password_confirmation field, so we are unable to create or update users in RailsAdmin out of the box. We can use the RailsAdmin DSL for configuring which fields to expose:

# config/initializers/rails_admin.rb
RailsAdmin.config do |config|

  ...

  config.model "User" do
    edit do
      field :admin
      field :email
      field :password
    end
  end
end

If we restart our server and re-load the admin dashboard, we’ll see that only the admin, email, and password fields are exposed and we can create a new user from within RailsAdmin.

We’re done! Now we can wow our teammates with the awesome admin dashboard we put together in just a few minutes.

Bonus: Add a sign out link to RailsAdmin

While we will probably include a sign out link in our main app, we’ve found that admin users frequently look for a sign out link from within RailsAdmin. Since RailsAdmin looks for Devise when deciding whether or not to show a sign out link, we need to provide a little workaround:

# lib/rails_admin_logout_link.rb
module RailsAdmin
  module ApplicationHelper
    def logout_path
      main_app.send(:sign_out_path) rescue false
    end
  end
end

class Devise
  def self.sign_out_via
    :delete
  end
end

After restarting our server, we see a bright red “log out” link in the upper right hand corner of RailsAdmin:

[screenshot: the RailsAdmin log out link]

Time-Series Database Design with InfluxDB

Posted about 1 month back at Ryan's Scraps

Here at Spreedly we’ve recently started using the time series database InfluxDB to store a variety of customer activity metrics. As with any special purpose database, using and designing for a time-series database is quite different from what you may be used to with structured (SQL) databases. I’d like to describe our experience designing our InfluxDB schema, the mistakes we made, and the conclusions we’ve come to based on those experiences.

The mark

Consider the following scenario, closely resembling Spreedly’s: You run a service that lets your customers transact against a variety of payment gateways. You charge for this service on two axes – by the number of gateways provisioned and by the number of credit cards stored. At any point in time you want to know how many of each a customer has on their account.

Initially we set up two series (InfluxDB’s term for a collection of measurements, organizationally similar to a SQL database table) to store the total number of each item per account:

  • gateway.account.sample
  • payment-method.account.sample

On some regular interval we’d collect the number of gateways and payment methods (credit cards) for each account and store them in the respective series. Each measurement looked like:

gateway.account.sample

{
  "time": 1400803300,
  "value": 2,
  "account_key": "abc123"
}

time is required by InfluxDB and is the epoch time of the measurement. value is the measurement’s value at that time, and account_key is an additional property of that measurement.
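
As a rough sketch of how a measurement like this might be written from Ruby (assuming the influxdb-ruby client gem; the database name is illustrative):

require "influxdb"

influxdb = InfluxDB::Client.new "metrics"

# one measurement in the gateway.account.sample series;
# InfluxDB assigns the time if we don't supply one
influxdb.write_point("gateway.account.sample",
  value: 2,
  account_key: "abc123"
)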

Simple enough. This approach felt good and we went to production with this schema. That’s when we learned our first lesson…

Time-scoped queries

The first app that used the data in InfluxDB was our customer Dashboard product. It displays all your transactions and a simple view of your current billing counts (number of gateways and number of stored payment methods). Dashboard simply queried for the most recent measurement from each series for the current account:


select value
  from gateway.account.sample
  where account_key = 'abc123'
  limit 1

Since results in InfluxDB are ordered most recent first by default, the limit 1 clause ensures only the most recent measurement is returned for that customer (account).

All was fine initially, but as our dataset grew into the hundreds of thousands of entries per series we noticed our queries were taking quite some time to complete - a constant 5s or so for every account. It turns out these queries were incurring a full table scan, hence the constant (poor) performance.

Avoid a full table scan by always time-scoping your queries

In InfluxDB, the non-time fields aren’t indexed, meaning any query that filters on them requires a full table scan (even if you’re only fetching a single result). The way to avoid a full table scan is to always time-scope your queries. Knowing this, we modified our queries to run against only the previous 2 days’ worth of data (enough time to capture the most recent input):


select value
  from gateway.account.sample
  where time > now() - 2d
    and account_key = 'abc123'
  limit 1

Adding the where time > now() - 2d clause ensures that the query operates against a manageable set of data and avoids a full table scan. This dropped our query times from 5s (and growing) down to a steady 100ms - 200ms. (Keep in mind this is a remote instance of InfluxDB, meaning the bulk of that is in connection setup and network latency.)

InfluxDB response time reduction using time-scoped queries. Y-axis truncated for maximum obfuscation.

Obviously your use-case may differ wildly from ours. If your data is collected at unknown intervals, or in real-time, you don’t have the luxury of limiting your queries to a known window of time. In these situations it is wise to think about how to segment your data into series for optimal performance.

Series granularity

How many series should you have? How much data should you store in each series? When should you break out queries into their own series? These are all common questions when designing your time-series schema and, unfortunately, there is no concrete right or wrong answer. However, there are some good rules of thumb to keep in mind when structuring your data.

Continuing from our previous example: We were now using time-scoped queries to get the total number of gateways and cards for each account. While we were seeing good performance, each query was operating against a single series that contained data for all accounts. The query’s account_key condition was responsible for filtering the data by account:


select value
  from gateway.account.sample
  where time > now() - 2d
    and account_key = 'abc123'
  limit 1

As even this already time-scoped set of data grows, querying against a non-indexed field will start to become an issue. Queries whose conditions eliminate a large percentage of the data within the series should be extracted out into their own series. E.g., in our case we have a query that gets a single account’s count of stored gateways to the exclusion of all the other accounts. This is an example of a query that filters out the majority of the data in a series and should be extracted so each account has its own series.

Series are cheap. Use them liberally to isolate highly conditional data access.

If you’re coming from a SQL-based mindset, the thought of creating one series per account might seem egregious. However, it’s perfectly acceptable in time-series land. So that’s what we did - we started writing data from each account into its own series (with each series’ name including the account key). Now, when querying for an account’s total number of stored gateways we do:


select value
  from account-abc123.gateway.sample
  where time > now() - 2d
    ...

Since you have to know the key in question to access the right series, this type of design is most common with primary (or other well-known) keys. But not only can series be segmented by key; segmenting by time period is also possible. While not useful in our specific situation, you can imagine segmenting data into monthly series, e.g., 201407.gateway.sample or some other period, depending on your access pattern.

Multi-purpose data

At this point your series are lean and efficient, each well-suited to a single type of query and data access. However, sometimes life isn’t that clean and you have one set of data that needs to be accessed in many different ways.

For instance, at Spreedly, we’d like to have a business-level set of metrics available that shows the total number of gateways and payment-methods across all customers. We could just dump summary-level data into a new series (not a terrible idea), but we’re already collecting this data on a customer-level. It’d be nice not to have to do two writes per measurement.

Use continuous queries to re-purpose broad series by access pattern

Fortunately, InfluxDB has a feature called continuous queries that lets you modify and isolate data from one series into one or more other dependent series. Continuous queries are useful when you want to “rollup” time-series data by time period (e.g., get the 99th percentile service times across 5, 10 and 15 minute periods) and also to isolate a subset of data for more efficient access. This latter application is perfect for our use-case.

To use continuous queries to support both summary and account-specific stats, we need to create the parent series that contains measurements for each account.

gateway.account.sample

{
  "time": 1400803300,
  "value": 2,
  "account_key": "abc123"
},
{
  "time": 1400803300,
  "value": 7,
  "account_key": "def456"
}

We can access this series directly to obtain the business-level stats we need across all customers:


select sum(value)
  from gateway.account.sample
  where time > now() - 1d

With continuous queries we can also use this parent series to spawn several “fanout” queries that isolate the data by account (replicating the account-specific series naming scheme from earlier):


select value
  from gateway.account.sample
  into account-[account_key].gateway.sample;

Notice the [account_key] interpolation syntax? This creates one series per account and stores the value field from each measurement into the new account-specific series (retaining the original measurement’s time):

account-abc123.gateway.sample

{
  "time": 1400803300,
  "value": 2
}
account-def456.gateway.sample

{
  "time": 1400803300,
  "value": 7
}

With this structure we:

  • Only write the data one time into the parent series gateway.account.sample
  • Can perform summary level queries against this parent series
  • Have access to the highly efficient, constantly updated, account-specific data series account-def456.gateway.sample, etc.

This is a great use of fanout continuous queries. Also available are regular continuous queries which operate by precomputing expensive group by queries. I’ll skip over them for now since we’re not yet using them at Spreedly, but I encourage you to look at them for your use cases.

Naming and structure

Series naming and packet structure are a tough topic due to personal preferences, differences in client languages, and highly varied access patterns. I’m not going to label the following as best practices; instead I’ll present what we’ve found at Spreedly and our motivations, and let you decide whether it makes sense for you to apply them.

  • Come up with a naming structure that conveys both the purpose of the series and the type of data contained within. At Spreedly it’s something like (and still evolving): [key].measured-item.[grouping].measurement-type. For instance, the series that contains the count of all gateways stored by account is gateway.account.sample. The account-specific version is: account-abc123.gateway.sample. The measurement-type component is highly influenced by the l2met logging conventions and deserves further discussion.

    • count series record, as an integer, the number of times something happened in a specific period of time. Counts can be summed with other counts in the same series to perform time-based aggregations (rollups). The number of requests or transactions per minute is an example of a count series.
    • sample series take a point-in-time measurement of some metric that supersedes all previous samples in the same series. Sum totals are a good example of this type of series, e.g., total revenue to date, or total number of payment methods. With each measurement in the series, previous measurements are no longer relevant, though they may still be used to track trends over time.
    • measure series are similar to count series except that instead of being a simple count of the number of times something happened, they can represent any unit of measure such as ms, Mb, etc. Measurements are mathematically operable and can be summed, percentiled, averaged, etc. CPU load and response times are examples of measure series.
  • Often there is a single value that represents the thing being measured, with the rest of the fields being meta-data or conditions. To facilitate re-usable client parsing we’ve found it nice to use the same field name across all series to represent the value of the measurement. Unsurprisingly, we chose value. All our series data contains a value field that contains the measurement value. This makes it easy to retrieve, which is especially useful in queries that select across multiple series or even merge results from multiple series into a single result set.

There’s a lot of subjectivity that goes into database design, independent of the storage paradigm. While SQL has been around for a while and has well-known patterns, alternative databases, including time-series databases, are a bit more of a wild west. I’m hoping that by sharing our experiences we can prevent some common mistakes, freeing you up to create all new ones of your own!

Many thanks to Paul and Todd and the rest of the InfluxDB team for their tireless guidance on the subject.

Episode #488 - August 12th, 2014

Posted about 1 month back at Ruby5

We talk about using the Facebook SDK with RubyMotion, Event Sourcing with Sandthorn, gems like rails_param and Groupdate, and time tracking with Hours.

Listen to this episode on Ruby5

Sponsored by CodeShip.io

Codeship is a hosted Continuous Delivery Service that just works.

Set up Continuous Integration in a few steps and automatically deploy when all your tests have passed. Integrate with GitHub and BitBucket and deploy to cloud services like Heroku and AWS, or your own servers.

Visit http://codeship.io/ruby5 and sign up for free. Use discount code RUBY5 for a 20% discount on any plan for 3 months.

Also check out the Codeship Blog!

CodeShip.io

Integrating the Facebook SDK with RubyMotion

Kamil Lelonek wrote a blog post about using the Facebook SDK with RubyMotion. He also goes into detail by creating a Facebook login app showing how it works.
Integrating the Facebook SDK with RubyMotion

rails_param

rails_param is a gem by Nicolas Blanco that brings parameter validation and type coercion into your Rails controllers.
rails_param

Sandthorn

Sandthorn is a Ruby library for saving an object's state as a series of events, a pattern that's known as Event Sourcing.
Sandthorn

Hours

Defacto Software open sourced “Hours”, a time tracking system written in Rails that makes it easy to track, categorize, and tag your time.
Hours

Groupdate

The Groupdate Ruby gem by Andrew Kane provides a simple interface to group temporal data. It supports grouping by day, week, hour of the day and more.
Groupdate

Top Ruby Jobs

ChallengePost is looking for a Senior Web Developer in New York, NY.
Top Ruby Jobs

Sponsored by Ruby5

Ruby5 is released Tuesday and Friday mornings. To stay informed about and active with this podcast, we encourage you to do one of the following:

Thank You for Listening to Ruby5

Buttons with Hold Events in Angular.js

Posted about 1 month back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

Creating an interaction with a simple button in Angular only requires adding the ngClick directive. However, sometimes an on-click style interaction isn’t sufficient. Let’s take a look at how we can have a button that performs an action as long as it’s pressed.

For the example, we’ll use two buttons which can be used to zoom a camera in and out. We want the camera to continue zooming until the button is released. The final effect will work like this:

Zooming in Martial Codex

Our template might look something like this:

<a href while-pressed="zoomOut()">
  <i class="fa fa-minus"></i>
</a>
<a href while-pressed="zoomIn()">
  <i class="fa fa-plus"></i>
</a>

We’re making a subtle assumption with this interface. By adding the parentheses, we imply that whilePressed will behave similarly to ngClick. The given value is an expression that will get evaluated continuously while the button is pressed, rather than us handing it a function object to call. In practice, we can use the '&' style of arguments in our directive to capture the expression. You can find more information about the different styles of scopes here.

whilePressed = ->
  restrict: "A"

  scope:
    whilePressed: '&'

Binding the Events

When defining more complex interactions such as this one, Angular’s built-in directives won’t give us the control we need. Instead, we’ll fall back to manual event binding on the element. For clarity, I tend to prefer separating the callback function from the event bindings. Since we’re manipulating the DOM, our code will go into a link function. Our initial link function will look like this:

link: (scope, elem, attrs) ->
  action = scope.whilePressed

  bindWhilePressed = ->
    elem.on("mousedown", beginAction)

  beginAction = (e) ->
    e.preventDefault()
    # Do stuff

  bindWhilePressed()

Inside of our action we’ll need to do two things:

  1. Start running the action
  2. Bind to mouseup to stop running the action.

For running the action, we’ll use Angular’s $interval service. $interval is a wrapper around JavaScript’s setInterval, but gives us a promise interface, better testability, and hooks into Angular’s digest cycle.

In addition to running the action continuously, we’ll also want to run it immediately to avoid a delay. We’ll run the action every 15 milliseconds, as this will roughly translate to once per browser frame.

+TICK_LENGTH = 15
+
-whilePressed = ->
+whilePressed = ($interval) ->
   restrict: "A"

   link:
     action = scope.whilePressed

@@ -23,7 +24,7 @@
     beginAction = (e) ->
       e.preventDefault()
+      action()
+      $interval(action, TICK_LENGTH)
+      bindEndAction()

In our beginAction function, we call bindEndAction to set up the events that stop running the action. We know that we’ll at least want to bind to mouseup on our button, but we have to decide how to handle users who move the mouse off of the button before releasing it. We can handle this by listening for mouseleave on the element, in addition to mouseup.

bindEndAction = ->
  elem.on('mouseup', endAction)
  elem.on('mouseleave', endAction)

In our endAction function, we’ll want to cancel the $interval for our action, and unbind the event listeners for mouseup and mouseleave.

unbindEndAction = ->
  elem.off('mouseup', endAction)
  elem.off('mouseleave', endAction)

endAction = ->
  $interval.cancel(intervalPromise)
  unbindEndAction()

We’ll also need to store the promise that $interval returned so that we can cancel it when the mouse is released.

 whilePressed = ($parse, $interval) ->
   link: (scope, elem, attrs) ->
     action = scope.whilePressed
+    intervalPromise = null

     bindWhilePressed = ->
       elem.on('mousedown', beginAction)
@@ -23,7 +24,7 @@
     beginAction = (e) ->
       e.preventDefault()
       action()
-      $interval(action, TICK_LENGTH)
+      intervalPromise = $interval(action, TICK_LENGTH)
       bindEndAction()

Cleaning Up

Generally I consider it a smell to have an isolated scope on any directive that isn’t an element. Each DOM element can only have one isolated scope, and attribute directives are generally meant to be composed. So let’s replace our scope with a manual use of $parse instead.

$parse takes in an expression, and will return a function that can be called with a scope and an optional hash of local variables. This means we can’t call action directly anymore, and instead need a wrapper function which will pass in the scope for us.

-whilePressed = ($interval) ->
-  scope:
-    whilePressed: "&"
-
+whilePressed = ($parse, $interval) ->
   link: (scope, elem, attrs) ->
-    action = scope.whilePressed
+    action = $parse(attrs.whilePressed)
     intervalPromise = null

     bindWhilePressed = ->
@@ -26,14 +23,17 @@ whilePressed = ($interval) ->

     beginAction = (e) ->
       e.preventDefault()
-      action()
-      intervalPromise = $interval(action, TICK_LENGTH)
+      tickAction()
+      intervalPromise = $interval(tickAction, TICK_LENGTH)
       bindEndAction()

     endAction = ->
       $interval.cancel(intervalPromise)
       unbindEndAction()

+    tickAction = ->
+      action(scope)

And that’s it. Our end result is a nicely decoupled Angular UI component that can easily be reused across applications. The final code looks like this:

TICK_LENGTH = 15

whilePressed = ($parse, $interval) ->
  restrict: "A"

  link: (scope, elem, attrs) ->
    action = $parse(attrs.whilePressed)
    intervalPromise = null

    bindWhilePressed = ->
      elem.on('mousedown', beginAction)

    bindEndAction = ->
      elem.on('mouseup', endAction)
      elem.on('mouseleave', endAction)

    unbindEndAction = ->
      elem.off('mouseup', endAction)
      elem.off('mouseleave', endAction)

    beginAction = (e) ->
      e.preventDefault()
      tickAction()
      intervalPromise = $interval(tickAction, TICK_LENGTH)
      bindEndAction()

    endAction = ->
      $interval.cancel(intervalPromise)
      unbindEndAction()

    tickAction = ->
      action(scope)

    bindWhilePressed()

Silver Searcher Tab Completion with Exuberant Ctags

Posted about 1 month back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

I’m a heavy Vim user and demand speedy navigation between files. I rely on Exuberant Ctags and tag navigation (usually Ctrl-]) to move quickly around the codebase.

There were times, however, when I wasn’t in Vim but wanted to use tags to access information quickly; most noticeably, time spent in my shell searching the codebase with ag.

As a zsh user, I was already aware of introducing tab completion by way of compdef and compadd:

_fn_completion() {
  if (( CURRENT == 2 )); then
    compadd foo bar baz
  fi
}

compdef _fn_completion fn

In this example, fn is the binary we want to add tab completion to, and we only attempt to complete after typing fn and then TAB. By checking CURRENT == 2, we’re verifying the position of the cursor as the second field in the command. This will complete with options foo, bar, and baz, and filter the options accordingly as you start typing and hit TAB again.

Now that we understand how to configure tab completion for commands, next up is determining how to extract useful information from the tags file. Here are the first few lines of the file from a project I worked on recently:

==      ../app/models/week.rb   /^  def ==(other)$/;"   f       class:Week
AccessToken     ../app/models/access_token.rb   /^class AccessToken < ActiveRecord::Base$/;"    c
AccessTokensController  ../app/controllers/access_tokens_controller.rb  /^class AccessTokensController < ApplicationController$/;"      c

The tokens we want to use for tab completion are the first set of characters per line, so we can use cut -f 1 path/to/tags to grab the first field. We then use grep -v to ignore autogenerated ctags metadata we don’t care about. With a bit of extra work (like writing stderr to /dev/null in the instance where the tags file doesn’t exist yet), the end result looks like this:

_ag() {
  if (( CURRENT == 2 )); then
    compadd $(cut -f 1 .git/tags tmp/tags 2>/dev/null | grep -v '!_TAG')
  fi
}

compdef _ag ag

With this in place, we can now ag a project and tab complete from the generated tags file. With ag AccTAB:

$ ag AccessToken
AccessToken             AccessTokensController

And the result:

[ ~/dev/thoughtbot/project master ] ✔ ag AccessToken
app/controllers/access_tokens_controller.rb
1:class AccessTokensController < ApplicationController
15:    @project = AccessToken.find_project(params[:id])

app/models/access_token.rb
1:class AccessToken < ActiveRecord::Base

db/migrate/20140416195446_create_access_tokens.rb
1:class CreateAccessTokens < ActiveRecord::Migration

db/migrate/20140718175701_add_index_on_access_tokens_project_id.rb
1:class AddIndexOnAccessTokensProjectId < ActiveRecord::Migration

spec/models/access_token_spec.rb
3:describe AccessToken, 'Associations' do
7:describe AccessToken, '.find_project' do
12:      result = AccessToken.find_project(access_token.to_param)
20:      expect { AccessToken.find_project('unknown') }.
26:describe AccessToken, '.generate' do
40:describe AccessToken, '#to_param' do
50:    expect(AccessToken.find(access_token.to_param)).to eq(access_token)

Voila! Tab completion with ag based on the tags file.

If you’re using thoughtbot’s dotfiles, you already have this behavior.

Automatic versioning in Xcode with git-describe

Posted about 1 month back at zargony.com

Do you manually set a new version number in Xcode every time you release a new version of your app? Or do you use some tool that updates the Info.plist in your project like agvtool or PlistBuddy? Either way, you probably know that it's a pain to keep track of the version number in the project.

I recently spent some time trying out various methods to automatically get the version number from git and put it into the app that Xcode builds. I found that most of them have drawbacks, but in the end I found a way that I finally like most. Here's how.

Getting a version number from git

Why should we choose a version number manually if we're using git to manage all source files anyway? Git is a source code management tool that keeps track of every change and is able to uniquely identify every snapshot of the source by its commit ids. The most obvious idea would be to simply use the commit ids as the version number of your software, but unfortunately (because of the distributed nature of git) commit ids are not very useful to the human reader: you can't tell at a glance which one is earlier and which is later.

But git has a very useful command called git describe that extracts a human-readable version number from the repository. If you check out a specific tagged revision of your code, git describe will print the tag's name. If you check out any other commit, it will walk back through the commit history to find the latest tag and print its name followed by the number of commits since that tag and the current commit id. This is incredibly useful for exactly describing the currently checked out version (hence the name of this command).

If you additionally use the --dirty option, git describe will append the string '-dirty' if your working directory isn't clean (i.e. you have uncommitted changes). Perfect!

So if you tag all releases of your app (which you should be doing already anyway), it's easy to automatically create a version number with git describe --dirty for any commit, even between releases (e.g. for betas).

Here are some examples of version numbers:

v1.0                   // the release version tagged 'v1.0'
v1.0-8-g1234567        // 8 commits after release v1.0, at commit id 1234567
v1.0-8-g1234567-dirty  // same as above but with unspecified local changes (dirty workdir)

Automatically set the version number in Xcode

You'll find several ideas on how to use automatically generated version numbers in Xcode projects if you search the net. However, most of them have drawbacks that I'd like to avoid. Some use a custom build phase to create a header file containing the version number. Besides the fact that this approach gets more complicated with Swift, it only allows you to display the version number in your app, but doesn't set it in the app's Info.plist. Most libraries like crash reporters or analytics will take the version number from Info.plist, so it's useful to have the correct version number in there.

So let's modify the Info.plist using PlistBuddy in a custom build phase. But we don't want to modify the source Info.plist, because that would change a checked-in file and lead to a dirty workdir. We need to modify the Info.plist inside the target build directory (after the ProcessInfoPlistFile build rule ran).

Instructions

  • Add a new run script build phase with the below script.
  • Make sure it runs late during building by moving it to the bottom.
  • Make sure that the list of input files and output files is empty and that "run script only when installing" is turned off.
# This script sets CFBundleVersion in the Info.plist of a target to the version
# as returned by 'git describe'.
# Info: http://zargony.com/2014/08/10/automatic-versioning-in-xcode-with-git-describe
set -e
VERSION=`git describe --dirty |sed -e "s/^[^0-9]*//"`
echo "Updating Info.plist version to: ${VERSION}"
/usr/libexec/PlistBuddy -c "Set :CFBundleVersion ${VERSION}" "${TARGET_BUILD_DIR}/${INFOPLIST_PATH}"
/usr/bin/plutil -convert ${INFOPLIST_OUTPUT_FORMAT}1 "${TARGET_BUILD_DIR}/${INFOPLIST_PATH}"

Thoughts

  • By keeping the list of output files empty, Xcode runs the script every time (otherwise it would detect an existing file and skip running the script even if changes were made and the version number may have changed)
  • Some sed magic strips any leading non-numbers from the version string so that you can use tags like release-1.0 or v1.5.
  • PlistBuddy converts the plist to XML, so we're running plutil at the end to convert it back to the desired output format (binary by default)
  • If you need more information than just the output of git describe, try the excellent "autorevision" script.

Episode #487 - August 8th, 2014

Posted about 1 month back at Ruby5

Beautiful API documentation, deprecating paths in Rails mailers, taking RubySteps, meeting Starboard, and the new Heroku Button

Listen to this episode on Ruby5

Sponsored by New Relic

New Relic is _the_ all-in-one web performance analytics product. It lets you manage and monitor web application performance, from the browser down to the line of code. With Real User Monitoring, New Relic users can see browser response times by geographical location of the user, or by browser type.
This episode is sponsored by New Relic

tripit/slate

Beautiful static documentation for your API
tripit/slate

Deprecating *_path in Mailers

"Email does not support relative links since there is no implicit host. Therefore all links inside of emails must be fully qualified URLs. All path helpers are now deprecated."
Deprecating *_path in Mailers

RubySteps

Daily coding practice via email and interactive lessons, every weekday.
RubySteps

Starboard

Starboard is a tool which creates Trello boards for tracking the various tasks necessary when onboarding, offboarding, or crossboarding employees.
Starboard

Heroku Button

One-click deployment of publicly-available applications on GitHub
Heroku Button

Ruby5 is released Tuesday and Friday mornings. To stay informed about and active with this podcast, we encourage you to do one of the following:

Thank You for Listening to Ruby5

Intent to Add

Posted about 1 month back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

The git add command runs blind, but can be controlled with more fine-grained precision using the --patch option. This works great for modified and deleted files, but untracked files do not show up.

$ echo "Hello, World!" > untracked
$ git status --short
?? untracked
$ git add --patch
No changes.

To remedy this, the --intent-to-add option can be used. According to git-add(1), --intent-to-add changes git add’s behavior to:

Record only the fact that the path will be added later. An entry for the path is placed in the index with no content. This is useful for, among other things, showing the unstaged content of such files with git diff and committing them with git commit -a.

What this means is that after running git add --intent-to-add, the specified untracked files will be added to the index, but without content. Now, when git add --patch is run, it will show a diff for each file registered with --intent-to-add, with every line as an addition. This gives you a chance to look through the file, line by line, before staging it. You can even decide not to stage specific lines by deleting them from the patch using the edit command.

$ echo "Hello, World!" > untracked
$ git status --short
?? untracked
$ git add --intent-to-add untracked
$ git status --short
AM untracked
$ git add --patch
diff --git a/untracked b/untracked
index e69de29..8ab686e 100644
--- a/untracked
+++ b/untracked
@@ -0,0 +1 @@
+Hello, World!
Stage this hunk [y,n,q,a,d,/,e,?]?

In my .gitconfig I alias add --all --intent-to-add to aa and add --patch to ap which means that for most commits, I type:

$ git aa
$ git ap

Or in gitsh:

& aa
& ap
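
A sketch of what those aliases look like in .gitconfig:

# ~/.gitconfig
[alias]
  aa = add --all --intent-to-add
  ap = add --patch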

Linked Development: Linked Data from CABI and DFID

Posted about 1 month back at RicRoberts :

In March this year we launched the beta version of Linked Development. It’s a linked open data site for CABI and the DFID, which provides data all about international development projects and research.

Linked Development site screenshot

This is a slightly unusual one for us: we’re taking it on after others have worked on it in the past. It’s currently in beta release and we’re hosting it on our PublishMyData service so users can easily get hold of the data they want in both human- and machine-readable formats. So it comes with most of the usual benefits we offer: thematic data browsing, a SPARQL endpoint and Linked Data APIs. And, because the data’s linked, each data point has a unique identifier so users can select and combine data from different data sources to get the exact information they’re after.

We’ve also rebuilt the site’s custom Research Documents API, which was offered by the alpha version of the site, to make it faster and more robust (it’s backward-compatible with the previous version).

Linked Development custom API screenshot

This site illustrates what’s possible for government organisations using linked data: it allows for collective ownership of data and data integration whilst aiming to improve audience reach and data availability. It’s great to see linked data being embraced by an increasing number of public bodies.

DNS to CDN to Origin

Posted about 1 month back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

Content Distribution Networks (CDNs) such as Amazon CloudFront and Fastly have the ability to “pull” content from their origin server during HTTP requests in order to cache it. They can also proxy POST, PUT, PATCH, DELETE, and OPTIONS HTTP requests, which means they can “front” our web application’s origin like this:

DNS -> CDN -> Origin

Swapping out the concepts for actual services we use, the architecture can look like this:

DNSimple -> CloudFront -> Heroku

Or like this:

DNSimple -> Fastly -> Heroku

Or many other combinations.

Without Origin Pull or an Asset Host

Let’s first examine what it looks like to serve static assets (CSS, JavaScript, font, and image files) from a Rails app without a CDN.

We could point our domain name to our Rails app running on Heroku using a CNAME record (apex domains in cloud environments have their own set of eccentricities):

www.thoughtbot.com -> thoughtbot-production.herokuapp.com

We’ll also need to set the following configuration:

# config/environments/{staging,production}.rb
config.serve_static_assets = true

In this setup, we’ll then see something like the following in our logs:

[screenshot: development log showing asset requests served by the Rails app (no asset host)]

That screenshot is from development mode but the same effect will occur in production:

  • all the application’s requests to static assets will go through the Heroku routing mesh,
  • get picked up by one of our web dynos,
  • passed to one of the Unicorn workers on the dyno,
  • then routed by Rails to the asset

This isn’t the best use of our Ruby processes. They should be reserved for handling real logic. Each process should have the fastest possible response time. Overall response time is affected by waiting for other processes to finish their work.

How Can We Solve This?

AssetSync is a popular approach that we have used in the past with success. We no longer use it because there’s no need to copy all files to S3 during deploy (rake assets:precompile). Copying files across the network is wasteful and slow, and gets slower as the codebase grows. S3 is also not a CDN, does not have edge servers, and therefore is slower than CDN options.

Asset Hosts that Support “Origin Pull”

A better alternative is to use services that “pull” the assets from the origin (Heroku) “Just In Time” the first time they are needed. Services we’ve used include CloudFront and Fastly. Fastly is our usual default due to its amazingly quick cache invalidation. Both have “origin pull” features that work well with Rails' asset pipeline.

Because of the asset pipeline, every asset in production has a hash added to its name. Whenever the file changes, the hash (and therefore the whole filename) changes, so the browser requests the latest version.

The first time a user requests an asset, it will look like this:

GET 123abc.cloudfront.net/application-ql4h2308y.css

A CloudFront cache miss “pulls from the origin” by making another GET request:

GET your-app-production.herokuapp.com/application-ql4h2308y.css

All future GET and HEAD requests to the CloudFront URL within the cache duration will be cached, with no second HTTP request to the origin:

GET 123abc.cloudfront.net/application-ql4h2308y.css

All HTTP requests using verbs other than GET and HEAD proxy through to the origin, which follows the Write-Through Mandatory portion of the HTTP specification.

Making it Work with Rails

We have standard configuration in our Rails apps that make this work:

# Gemfile
gem "coffee-rails"
gem "sass-rails"
gem "uglifier"

group :staging, :production do
  gem "rails_12factor"
end

# config/environments/{staging,production}.rb:
config.action_controller.asset_host = ENV["ASSET_HOST"] # will look like //123abc.cloudfront.net
config.assets.compile = false
config.assets.digest = true
config.assets.js_compressor = :uglifier
config.assets.version = ENV["ASSETS_VERSION"]
config.static_cache_control = "public, max-age=#{1.year.to_i}"

We don’t have to manually set config.serve_static_assets = true because the rails_12factor gem does it for us, in addition to handling any other current or future Heroku-related settings.

Fastly and other reverse proxy caches respect the Surrogate-Control standard. To get entire HTML pages cached in Fastly, we only need to include the Surrogate-Control header in the response. Fastly will cache the page for the duration we specify, protecting the origin from unnecessary requests and serving the HTML from Fastly’s edge servers.

Caching Entire HTML Pages (Why Use Memcache?)

While setting the asset host is a great start, a DNS to CDN to Origin architecture also lets us cache entire HTML pages. Here’s an example of caching entire HTML pages in Rails with High Voltage:

class PagesController < HighVoltage::PagesController
  before_filter :set_cache_headers

  private

  def set_cache_headers
    response.headers["Surrogate-Control"] = "max-age=#{1.day.to_i}"
  end
end

This will allow us to cache entire HTML pages in the CDN without using a Memcache add-on, which still goes through the Heroku router, then our app’s web processes, then Memcache. This architecture entirely protects the Rails app from HTTP requests that don’t require Ruby logic specific to our domain.

Rack Middleware

If we want to cache entire HTML pages site-wide, we might want to use Rack middleware. Here’s our typical config.ru for a Middleman app:

$:.unshift File.dirname(__FILE__)

require "rack/contrib/try_static"
require "lib/rack_surrogate_control"

ONE_WEEK = 604_800
FIVE_MINUTES = 300

use Rack::Deflater
use Rack::SurrogateControl
use Rack::TryStatic,
  root: "tmp",
  urls: %w[/],
  try: %w[.html index.html /index.html],
  header_rules: [
    [
      %w(css js png jpg woff),
      { "Cache-Control" => "public, max-age=#{ONE_WEEK}" }
    ],
    [
      %w(html), { "Cache-Control" => "public, max-age=#{FIVE_MINUTES}" }
    ]
  ]

run lambda { |env|
  [
    404,
    {
      "Content-Type"  => "text/html",
      "Cache-Control" => "public, max-age=#{FIVE_MINUTES}"
    },
    File.open("tmp/404.html", File::RDONLY)
  ]
}

We build the Middleman app at rake assets:precompile time during deploy to Heroku, as described in Styling a Middleman Blog with Bourbon, Neat, and Bitters. In production, we serve the app using Rack, so we are able to insert middleware to handle the Surrogate-Control header:

module Rack
  class SurrogateControl
    # Cache content in a reverse proxy cache (such as Fastly) for a year.
    # Use Surrogate-Control in response header so cache can be busted after
    # each deploy.
    ONE_YEAR = 31557600

    def initialize(app)
      @app = app
    end

    def call(env)
      status, headers, body = @app.call(env)
      headers["Surrogate-Control"] = "max-age=#{ONE_YEAR}"
      [status, headers, body]
    end
  end
end

CloudFront Setup

If we want to use CloudFront, we use the following settings:

  • “Download” CloudFront distribution
  • “Origin Domain Name” as www.thoughtbot.com (our app’s URL)
  • “Origin Protocol Policy” to “Match Viewer”
  • “Object Caching” to “Use Origin Cache Headers”
  • “Forward Query Strings” to “No (Improves Caching)”
  • “Distribution State” to “Enabled”

As a side benefit, in combination with CloudFront logging, we could replay HTTP requests on the Rails app if we had downtime at the origin for any reason, such as a Heroku platform issue.

Fastly Setup

If we use Fastly instead of CloudFront, there’s no “Origin Pull” configuration we need to do. It will work “out of the box” with our Rails configuration settings.

We often have a rake task in our Ruby apps fronted by Fastly like this:

# Rakefile
task :purge do
  api_key = ENV["FASTLY_KEY"]
  site_key = ENV["FASTLY_SITE_KEY"]
  `curl -X POST -H 'Fastly-Key: #{api_key}' https://api.fastly.com/service/#{site_key}/purge_all`
  puts 'Cache purged'
end

That turns our deployment process into:

git push production
heroku run rake purge --remote production

For more advanced caching and cache invalidation at an object level, see the fastly-rails gem.

Back to the Future

Fastly is really “Varnish as a Service”. Early in its history, Heroku used to include Varnish as a standard part of its “Bamboo” stack. When they decoupled the reverse proxy in their “Cedar” stack, we gained the flexibility of using different reverse proxy caches and CDNs fronting Heroku.

Love is Real

We have been using this stack in production for thoughtbot.com, robots.thoughtbot.com, playbook.thoughtbot.com, and many other apps for almost a year. It’s a stack in real use and is strong enough to consider as a good default architecture.

Give it a try on your next app!

Avoid AngularJS Dependency Annotation with Rails

Posted about 1 month back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

In AngularJS, it is a common practice to annotate injected dependencies for controllers, services, directives, etc.

For example:

angular.module('exampleApp', [])
  .controller('ItemsCtrl', ['$scope', '$http', function ($scope, $http) {
    $http.get('items/index.json').success(function(data) {
      $scope.items = data;
    });
  }]);

Notice how the annotation of the injected parameters causes duplication ($scope and $http each appear twice). It becomes the responsibility of the developer to ensure that the actual parameters and the annotation are always in sync. If they are not, problems will occur that cause head-scratching and lost time. As the list of parameters grows, it gets even harder to maintain.

The reason for needing to annotate the injected dependencies is documented in the AngularJS docs under “Dependency Annotation”:

To allow the minifiers to rename the function parameters and still be able to inject right services, the function needs to be annotated…

So, JavaScript minifiers will rename function parameters to something short and ambiguous (usually using just one letter). In that case, AngularJS does not know what service to inject since it tries to match dependencies to the parameter names.

Variable mangling and Rails

When using AngularJS with Rails, developers typically rely on the asset pipeline to handle the minification of JavaScript. The uglifier gem is the default and the most commonly used. With uglifier we are given an option to disable the mangling of variable names. JavaScript will still be minified – whitespace stripped out and code nicely compacted – but the variable and parameter names will remain the same.

To do this, in your Rails project disable the mangle setting for uglifier in the production (and staging) environment config, like so:

# config/environments/production.rb
ExampleApp::Application.configure do
  ...
  config.assets.js_compressor = Uglifier.new(mangle: false)
  ...
end

With that in place, you can write your AngularJS code without needing to annotate the injected dependencies. The previous code can now be written as:

angular.module('exampleApp', [])
  .controller('ItemsCtrl', function ($scope, $http) {
    $http.get('items/index.json').success(function(data) {
      $scope.items = data;
    });
  });

With this trick, there is no duplication of parameter names and no strange array notation.

The catch

Here are a couple of screenshots that show the difference in HTTP response size between mangled and non-mangled variable names, on a project of about 500 lines of production AngularJS code:

With full uglifier minification (variables mangled): [screenshot]

With variable name mangling disabled: [screenshot]

Disabling variable name mangling comes at a cost of about 200KB in this case. The size difference can be more significant on larger projects with a lot more JavaScript code. The dilemma is deciding whether the convenience gained during development outweighs the size cost. Keep in mind that HTTP response compression can help reduce the difference; benchmarking and comparing on a per-project basis is advised.


KB Ratings

Posted about 1 month back at entp hoth blog - Home

Howdy!

What is this new section that just appeared in the sidebar for KB articles?

Screenshot of the sidebar showing an 'Is this article helpful?' thumbs up/down section

Well, starting today, users can now rate your KB articles! If you are logged in as a regular user, you will see the rating widget, and if you are logged in as staff, you will see the actual rating:

Same section showing the actual rating

Click through and you will be able to see all ratings and comments for the article, as well as the version they are associated with, so that you can keep track of your progress when improving articles:

Or head over to Knowledge Base > Ratings to see all ratings for all articles.

I hope you enjoy the change, and let us know if you have any feedback ;)

Cheers!

Efficient JSON in Swift with Functional Concepts and Generics

Posted about 1 month back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

A few months ago Apple introduced a new programming language, Swift, that left us excited about the future of iOS and OS X development. People were jumping into Swift with Xcode Beta 1 immediately, and it didn’t take long to realize that parsing JSON, something almost every app does, was not going to be as easy as in Objective-C. Because Swift is statically typed, we can no longer haphazardly throw objects into typed variables and have the compiler trust us that they are actually the types we claim they are. Now the compiler does the checking, making sure we don’t accidentally cause runtime errors. This allows us to lean on the compiler to create bug-free code, but it means we have to do a bit more work to keep it happy. In this post, I discuss a method of parsing JSON APIs that uses functional concepts and Generics to produce readable and efficient code.

Request the User Model

The first thing we need is a way to parse the data we receive from a network request into JSON. In the past, we’ve used NSJSONSerialization.JSONObjectWithData(NSData, Int, &NSError), which gives us an optional JSON data type and a possible error if there were problems with the parsing. The JSON object data type in Objective-C is NSDictionary, which can hold any object in its values. With Swift, we have a new dictionary type that requires us to specify the types held within. JSON objects now map to Dictionary<String, AnyObject>. AnyObject is used because a JSON value could be a String, Double, Bool, Array, Dictionary or null. When we try to use the JSON to populate a model we’ve created, we’ll have to check that each value we pull out of the JSON dictionary matches the type of the corresponding model property. As an example, let’s look at a user model:

struct User {
  let id: Int
  let name: String
  let email: String
}

Now let’s take a look at what a request and response for the current user might look like:

func getUser(request: NSURLRequest, callback: (User) -> ()) {
  let task = NSURLSession.sharedSession().dataTaskWithRequest(request) { data, urlResponse, error in
    var jsonErrorOptional: NSError?
    let jsonOptional: AnyObject! = NSJSONSerialization.JSONObjectWithData(data, options: NSJSONReadingOptions(0), error: &jsonErrorOptional)

    if let json = jsonOptional as? Dictionary<String, AnyObject> {
      if let id = json["id"] as AnyObject? as? Int { // Currently in beta 5 there is a bug that forces us to cast to AnyObject? first
        if let name = json["name"] as AnyObject? as? String {
          if let email = json["email"] as AnyObject? as? String {
            let user = User(id: id, name: name, email: email)
            callback(user)
          }
        }
      }
    }
  }
  task.resume()
}

After a lot of if-let statements, we finally have our User object. You can imagine that a model with more properties will just get uglier and uglier. Also, we are not handling any errors, so if any of the steps fail, we have nothing. Finally, we would have to write this code for every model we want from the API, which would be a lot of code duplication.

Before we start to refactor, let’s define some typealiases to simplify the JSON types.

typealias JSON = AnyObject
typealias JSONDictionary = Dictionary<String, JSON>
typealias JSONArray = Array<JSON>

Refactoring: Add Error Handling

First, we will refactor our function to handle errors by introducing the first functional programming concept, the Either<A, B> type. This will let us return the user object when everything runs smoothly or an error when it doesn’t. We can implement an Either<A, B> type in Swift like this:

enum Either<A, B> {
  case Left(A)
  case Right(B)
}
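
Constructing values of this type looks like this (the error domain and the user values below are just hypothetical placeholders):

// An Either is always exactly one of its two cases.
let failure: Either<NSError, User> = .Left(NSError(domain: "ExampleErrorDomain", code: 1, userInfo: nil))
let success: Either<NSError, User> = .Right(User(id: 1, name: "Jane", email: "jane@example.com"))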

We can use Either<NSError, User> as the type we’ll pass to our callback so the caller can handle the successfully parsed User or the error.

func getUser(request: NSURLRequest, callback: (Either<NSError, User>) -> ()) {
  let task = NSURLSession.sharedSession().dataTaskWithRequest(request) { data, urlResponse, error in
    // if the response returned an error send it to the callback
    if let err = error {
      callback(.Left(err))
      return
    }

    var jsonErrorOptional: NSError?
    let jsonOptional: JSON! = NSJSONSerialization.JSONObjectWithData(data, options: NSJSONReadingOptions(0), error: &jsonErrorOptional)

    // if there was an error parsing the JSON send it back
    if let err = jsonErrorOptional {
      callback(.Left(err))
      return
    }

    if let json = jsonOptional as? JSONDictionary {
      if let id = json["id"] as AnyObject? as? Int {
        if let name = json["name"] as AnyObject? as? String {
          if let email = json["email"] as AnyObject? as? String {
            let user = User(id: id, name: name, email: email)
            callback(.Right(user))
            return
          }
        }
      }
    }

    // if we couldn't parse all the properties then send back an error
    callback(.Left(NSError()))
  }
  task.resume()
}

Now the function calling our getUser can switch on the Either and do something with the user or display the error.

getUser(request) { either in
  switch either {
  case let .Left(error):
    // display error message

  case let .Right(user):
    // do something with user
  }
}

We will simplify this a bit by assuming that the Left will always be an NSError. Instead, let’s use a different type, Result<A>, which will hold either the value we are looking for or an error. Its implementation might look like this:

enum Result<A> {
  case Error(NSError)
  case Value(A)
}

Replacing Either with Result will look like this:

func getUser(request: NSURLRequest, callback: (Result<User>) -> ()) {
  let task = NSURLSession.sharedSession().dataTaskWithRequest(request) { data, urlResponse, error in
    // if the response returned an error send it to the callback
    if let err = error {
      callback(.Error(err))
      return
    }

    var jsonErrorOptional: NSError?
    let jsonOptional: JSON! = NSJSONSerialization.JSONObjectWithData(data, options: NSJSONReadingOptions(0), error: &jsonErrorOptional)

    // if there was an error parsing the JSON send it back
    if let err = jsonErrorOptional {
      callback(.Error(err))
      return
    }

    if let json = jsonOptional as? JSONDictionary {
      if let id = json["id"] as AnyObject? as? Int {
        if let name = json["name"] as AnyObject? as? String {
          if let email = json["email"] as AnyObject? as? String {
            let user = User(id: id, name: name, email: email)
            callback(.Value(user))
            return
          }
        }
      }
    }

    // if we couldn't parse all the properties then send back an error
    callback(.Error(NSError()))
  }
  task.resume()
}

getUser(request) { result in
  switch result {
  case let .Error(error):
    // display error message

  case let .Value(user):
    // do something with user
  }
}

Not a big change but let’s keep going.

Refactoring: Eliminate Type Checking Tree

Next, we will get rid of the ugly JSON parsing by creating separate JSON parsers for each type. We only have a String, Int, and Dictionary in our object so we need three functions to parse those types.

func JSONString(object: JSON?) -> String? {
  return object as? String
}

func JSONInt(object: JSON?) -> Int? {
  return object as? Int
}

func JSONObject(object: JSON?) -> JSONDictionary? {
  return object as? JSONDictionary
}

Now the JSON parsing will look like this:

if let json = JSONObject(jsonOptional) {
  if let id = JSONInt(json["id"]) {
    if let name = JSONString(json["name"]) {
      if let email = JSONString(json["email"]) {
        let user = User(id: id, name: name, email: email)
      }
    }
  }
}

Using these functions, we still need a pile of nested if-let statements. The functional programming concepts Monads, Applicative Functors, and Currying will help us condense this parsing. First, let’s look at the Maybe Monad, which is similar to Swift optionals. Monads have a bind operator which, when used with optionals, lets us feed an optional into a function that takes a non-optional and returns an optional. If the first optional is .None, bind returns .None; otherwise it unwraps the optional and applies the function to it.

infix operator >>> { associativity left precedence 150 }

func >>><A, B>(a: A?, f: A -> B?) -> B? {
  if let x = a {
    return f(x)
  } else {
    return .None
  }
}
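
To see bind in isolation before applying it to JSON, here is a minimal sketch; halve and the values below are made up purely for illustration:

// Returns .None for odd numbers, so it can fail just like our JSON parsers.
func halve(n: Int) -> Int? {
  if n % 2 == 0 {
    return n / 2
  }
  return .None
}

let even: Int? = 10
let odd: Int? = 9
let missing: Int? = nil

let a = even >>> halve     // .Some(5)
let b = odd >>> halve      // .None because halve returned .None
let c = missing >>> halve  // .None because there was nothing to unwrap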

In other functional languages, >>= is used for bind; however, in Swift that operator is already taken for bitwise right-shift assignment, so we will use >>> instead. Applying this to the JSON parsing, we get:

if let json = jsonOptional >>> JSONObject {
  if let id = json["id"] >>> JSONInt {
    if let name = json["name"] >>> JSONString {
      if let email = json["email"] >>> JSONString {
        let user = User(id: id, name: name, email: email)
      }
    }
  }
}

Then we can remove the optional parameters from our parsers:

func JSONString(object: JSON) -> String? {
  return object as? String
}

func JSONInt(object: JSON) -> Int? {
  return object as? Int
}

func JSONObject(object: JSON) -> JSONDictionary? {
  return object as? JSONDictionary
}

Functors have an fmap operator for applying a function to a value wrapped in some context. Applicative Functors also have an apply operator for applying a wrapped function to a value wrapped in some context. The context here is an Optional wrapping our value. This means we can combine multiple optional values with a function that takes multiple non-optional values. If all of the values are .Some, we get a result wrapped in an optional; if any of them are .None, we get .None. We can define these operators in Swift like this:

infix operator <^> { associativity left } // Functor's fmap (usually <$>)
infix operator <*> { associativity left } // Applicative's apply

func <^><A, B>(f: A -> B?, a: A?) -> B? {
  if let x = a {
    return f(x)
  } else {
    return .None
  }
}

func <*><A, B>(f: (A -> B)?, a: A?) -> B? {
  if let x = a {
    if let fx = f {
      return fx(x)
    }
  }
  return .None
}
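
As a quick illustration of fmap on its own (double and the values below are hypothetical), applying a function to an optional keeps us inside the optional, and a .None input short-circuits:

// A simple function whose signature matches the A -> B? parameter of <^>.
func double(n: Int) -> Int? {
  return n * 2
}

let wrapped: Int? = 21
let empty: Int? = nil

let doubled = double <^> wrapped  // .Some(42)
let nothing = double <^> empty    // .None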

Before we put it all together, we need to manually curry User’s initializer by adding a curried create function, since Swift doesn’t support auto-currying. Currying means that if we give a function fewer arguments than it takes, it returns a function that takes the remaining arguments. Our User model now looks like this:

struct User {
  let id: Int
  let name: String
  let email: String

  static func create(id: Int)(name: String)(email: String) -> User {
    return User(id: id, name: name, email: email)
  }
}
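
Because create is curried, User.create <^> id gives back an optional function that is still waiting for a name, and each <*> feeds it one more wrapped argument. A minimal sketch with plain optionals (the values are made up) shows the short-circuiting before we bring JSON back in:

let id: Int? = 1
let name: String? = "Jane"
let email: String? = "jane@example.com"

let user = User.create <^> id <*> name <*> email
// .Some(User(id: 1, name: "Jane", email: "jane@example.com"))

let noUser = User.create <^> id <*> name <*> nil
// .None because one of the arguments was missing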

Putting it all together, our JSON parsing now looks like this:

if let json = jsonOptional >>> JSONObject {
  let user = User.create <^>
              json["id"]    >>> JSONInt    <*>
              json["name"]  >>> JSONString <*>
              json["email"] >>> JSONString
}

If any of our parsers return .None, then user will be .None. This looks much better, but we’re not done yet.

Now, our getUser function looks like this:

func getUser(request: NSURLRequest, callback: (Result<User>) -> ()) {
  let task = NSURLSession.sharedSession().dataTaskWithRequest(request) { data, urlResponse, error in
    // if the response returned an error send it to the callback
    if let err = error {
      callback(.Error(err))
      return
    }

    var jsonErrorOptional: NSError?
    let jsonOptional: JSON! = NSJSONSerialization.JSONObjectWithData(data, options: NSJSONReadingOptions(0), error: &jsonErrorOptional)

    // if there was an error parsing the JSON send it back
    if let err = jsonErrorOptional {
      callback(.Error(err))
      return
    }

    if let json = jsonOptional >>> JSONObject {
      let user = User.create <^>
                  json["id"]    >>> JSONInt    <*>
                  json["name"]  >>> JSONString <*>
                  json["email"] >>> JSONString
      if let u = user {
        callback(.Value(u))
        return
      }
    }

    // if we couldn't parse all the properties then send back an error
    callback(.Error(NSError()))
  }
  task.resume()
}

Refactoring: Remove Multiple Returns with Bind

Notice that we’re calling callback four times in the previous function. If we were to forget one of the return statements, we could introduce a bug. We can eliminate this potential bug and clean up the function by breaking it into three distinct parts: parse the response, parse the data into JSON, and parse the JSON into our User object. Each step takes one input and returns either the next step’s input or an error. This sounds like a perfect case for using bind with our Result type.

The parseResponse function will need both the data and the status code of the response. The iOS API gives us an NSURLResponse and keeps the data separate, so we will make a small struct to tie them together:

struct Response {
  let data: NSData
  let statusCode: Int = 500

  init(data: NSData, urlResponse: NSURLResponse) {
    self.data = data
    if let httpResponse = urlResponse as? NSHTTPURLResponse {
      statusCode = httpResponse.statusCode
    }
  }
}

Now we can pass our parseResponse function a Response and check the response for errors before handing back the data.

func parseResponse(response: Response) -> Result<NSData> {
  let successRange = 200..<300
  if !contains(successRange, response.statusCode) {
    return .Error(NSError()) // customize the error message to your liking
  }
  return .Value(response.data)
}

The next functions will require us to transform an optional into a Result type, so let’s make one quick abstraction before we move on.

func resultFromOptional<A>(optional: A?, error: NSError) -> Result<A> {
  if let a = optional {
    return .Value(a)
  } else {
    return .Error(error)
  }
}

Next up is our data to JSON function:

func decodeJSON(data: NSData) -> Result<JSON> {
  var jsonErrorOptional: NSError?
  let jsonOptional: JSON! = NSJSONSerialization.JSONObjectWithData(data, options: NSJSONReadingOptions(0), error: &jsonErrorOptional)
  return resultFromOptional(jsonOptional, NSError()) // use the error from NSJSONSerialization or a custom error message
}

Then, we add our JSON to model decoding on the model itself:

struct User {
  let id: Int
  let name: String
  let email: String

  static func create(id: Int)(name: String)(email: String) -> User {
    return User(id: id, name: name, email: email)
  }

  static func decode(json: JSON) -> Result<User> {
    let user = JSONObject(json) >>> { dict in
      User.create <^>
          dict["id"]    >>> JSONInt    <*>
          dict["name"]  >>> JSONString <*>
          dict["email"] >>> JSONString
    }
    return resultFromOptional(user, NSError()) // custom error message
  }
}

Before we combine it all, let’s extend bind, >>>, to also work with the Result type:

func >>><A, B>(a: Result<A>, f: A -> Result<B>) -> Result<B> {
  switch a {
  case let .Value(x):     return f(x)
  case let .Error(error): return .Error(error)
  }
}
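
The payoff is the same short-circuiting we had with optionals, but now the error travels along. A quick sketch (checkNotEmpty is a made-up helper, not part of the final code):

// Fails with an error when handed empty data.
func checkNotEmpty(data: NSData) -> Result<NSData> {
  if data.length > 0 {
    return .Value(data)
  }
  return .Error(NSError(domain: "ExampleErrorDomain", code: 1, userInfo: nil))
}

let chained = checkNotEmpty(NSData()) >>> decodeJSON
// .Error: the empty data fails the first step, so decodeJSON never runs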

And add a custom initializer to Result:

enum Result<A> {
  case Error(NSError)
  case Value(A)

  init(_ error: NSError?, _ value: A) {
    if let err = error {
      self = .Error(err)
    } else {
      self = .Value(value)
    }
  }
}

Now, we combine all these functions with the bind operator.

func getUser(request: NSURLRequest, callback: (Result<User>) -> ()) {
  let task = NSURLSession.sharedSession().dataTaskWithRequest(request) { data, urlResponse, error in
    let responseResult = Result(error, Response(data: data, urlResponse: urlResponse))
    let result = responseResult >>> parseResponse
                                >>> decodeJSON
                                >>> User.decode
    callback(result)
  }
  task.resume()
}

Wow, even writing this again, I’m excited about this result. You might think, “This is really cool. Can’t wait to use it!”, but we’re not done yet!

Refactoring: Type Agnostic using Generics

This is great, but we still have to write this function for every model we want to fetch. We can use Generics to abstract it away completely.

We introduce a Decodable protocol and tell our function that the type we want back must conform to that protocol. The protocol looks like this:

protocol Decodable {
  class func decode(json: JSON) -> Result<Self>
}

Now make User conform:

struct User: Decodable {
  let id: Int
  let name: String
  let email: String

  static func create(id: Int)(name: String)(email: String) -> User {
    return User(id: id, name: name, email: email)
  }

  static func decode(json: JSON) -> Result<User> {
    let user = User.create <^>
                json["id"]    >>> JSONInt    <*>
                json["name"]  >>> JSONString <*>
                json["email"] >>> JSONString
    return resultFromOptional(user, NSError()) // custom error message
  }
}

Our final performRequest function now looks like this:

func performRequest<A: Decodable>(request: NSURLRequest, callback: (Result<A>) -> ()) {
  let task = NSURLSession.sharedSession().dataTaskWithRequest(request) { data, urlResponse, error in
    let responseResult = Result(error, Response(data: data, urlResponse: urlResponse))
    let result = responseResult >>> parseResponse
                                >>> decodeJSON
                                >>> A.decode
    callback(result)
  }
  task.resume()
}
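
With that in place, fetching any Decodable model is a single call. Here is a usage sketch; the URL is a placeholder, and the explicit Result<User> annotation on the closure is what tells the compiler which decode to use:

let url = NSURL(string: "https://www.example.com/users/current")
let request = NSURLRequest(URL: url)

performRequest(request) { (result: Result<User>) in
  switch result {
  case let .Error(error):
    println("Request failed: \(error)")
  case let .Value(user):
    println("Fetched \(user.name) <\(user.email)>")
  }
}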

Further Learning

If you are curious about functional programming or any of the concepts discussed in this post, check out Haskell and specifically this post from the Learn You a Haskell book. Also, check out Pat Brisbin’s post about options parsing using the Applicative.