Paid Private Repos for Hound

Posted 15 days back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

In April, we announced Hound, a hosted service that comments on Ruby style guide violations in your GitHub pull requests. Since then, about 2,000 users have signed up for Hound. We have been hard at work across 127 commits making Hound faster and more reliable. Recently, Hound has been reviewing about 7,000 GitHub pull requests across about 300 GitHub repos each week.

What people are telling us

We have heard from users that their teams are quickly getting on board with Hound. By their nature, formatting issues are quick fixes. By its nature, Hound is in your face. So far, the combination seems effective for actually making positive change in the repo.

Some people are treating Hound as CI in the sense of catching the final 5% of style issues they may have missed locally. These folks may even have RuboCop running locally in their text editor.

Others rely more heavily on a “human memory + Hound” approach, preferring to configure plugins such as Syntastic to focus only on syntax issues, which keeps Vim running fast.

Another group includes style checkers / linters in their test suites. We hope to convince them that style violations should not fail the build, and that a system like Hound commenting during code review strikes the right balance between machine-based feedback and humans making final decisions.

Pricing

As we promised in the launch announcement, we are now charging for private repos. Private personal repos are $9 per month per repo. Private organization repos are $24 per month per repo.

There are no limits on number of users, pull requests, or Hound comments. Public repos are free.

We believe this pricing strategy is low cost compared to the benefits Hound provides. It is tedious, error-prone, and time-consuming for humans to do the menial work of style-checking. Saving even one developer 15-30 minutes each month during code review would make Hound worth paying for.

Based on feedback from users, though, we think we’re saving teams more time than 15-30 minutes each month. By keeping codebases formatted consistently, Hound lowers the cognitive overhead while we read code and lets us focus on higher-level concerns.

Enable Hound on your private repo

Enable Hound on a private repo by clicking the toggle button next to the repo name. When you submit the credit card form, you will be charged for the first month. We will then charge you once a month from the time it was enabled. When you disable a repo, we will stop charging you for it.

Existing private repos

As a thank you to early users, current Hound-enabled private repos will remain free until September 19, 2014. To keep using Hound after that date, please disable and re-enable Hound on your private repos. If you do nothing, you will not be charged and we will automatically disable Hound on uncharged private repos on September 19, 2014.

What’s next?

We are charging for Hound in order to run it sustainably, improve it, make it review other languages such as CoffeeScript, and maintain it. We hope you love it as much as we do.

Thank you for supporting Hound!

Scott's dog wearing a sombrero

Building Ralph with SVG

Posted 16 days back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

SVG (Scalable Vector Graphics) is a graphics format, similar to PNG or JPEG, that’s great for icons, charts and supporting high-DPI displays. SVGs are vector-based, so you won’t lose visual fidelity or see any pixelation, no matter what size your graphic is. Because SVG is written in XML, you can mark it up just like HTML and modify it with other languages like JavaScript or CSS.

Let’s build a robot

SVGs are typically made up of objects, like shapes and paths. Let’s start by building something simple like Ralph’s head. Here’s our full logo in SVG for reference:

Open your favorite text editor and create a new file. Start with an empty <svg> element and define its XML namespace:

<svg xmlns="http://www.w3.org/2000/svg">
</svg>

To start creating a graphic, we can add a simple rectangle by specifying a height, width and fill color.

<svg xmlns="http://www.w3.org/2000/svg">
  <rect height="60" width="100" fill="#B32317" />
</svg>

Save that file, open it in a web browser, and you should see something like this:

We can round the corners of the rectangle with rx and ry attributes. These attributes should always be the same if you want circular (rather than elliptical) corners.

...
<rect height="60" width="100" rx="10" ry="10" fill="#B32317" />
...

Just like the <rect> element, SVG has a <circle> element. We’ll use that for the eyes.

...
<circle cx="26" cy="26" r="10" fill="#000000"></circle>
<circle cx="74" cy="26" r="10" fill="#000000"></circle>
...

As you can see, circles and rectangles have different attributes. The cx and cy attributes are the x and y coordinates of the center of the circle, respectively. The coordinate system has the origin at the top/left with the x-axis pointing to the right and the y-axis pointing down. The r attribute is the radius of the circle.

To keep your code more organized, SVG has a <g> element, similar to HTML’s <div>, that you can use for grouping. These elements are a great place to add classes for styling or manipulating later.

While we’re at it, let’s change the fill attribute on the circles to #FFFFFF so they will show up on the red rectangle better.

<svg xmlns="http://www.w3.org/2000/svg">
  <g class="head">
    <rect height="60" width="100" rx="10" ry="10" fill="#B32317" />
    <g class="eyes">
      <circle cx="26" cy="26" r="10" fill="#FFFFFF"></circle>
      <circle cx="74" cy="26" r="10" fill="#FFFFFF"></circle>
    </g>
  </g>
</svg>

Starting to look more familiar…

Add a couple more circles, and we’ve got ourselves a robot head:

...
<circle cx="26" cy="26" r="7" fill="#B32317"></circle>
<circle cx="74" cy="26" r="7" fill="#B32317"></circle>
...

Working with SVG in the real world

Hand writing SVGs like this rarely makes sense. Exporting an SVG you’ve already made in a vector graphics application is much quicker. An example workflow using Sketch, an OS X-only vector graphics app:

  • Select the shape or group you want to export as SVG.

  • Click “Make Exportable” in the lower right corner.

  • Find “SVG” in the dropdown and click “Export”.

Although Sketch doesn’t give you insight into the code it produces, it exports relatively clean SVG files. Next, take the exported code and paste it into the page you’re working on. Here is an example:

Here’s a gist of that code, so you can see what it will look like when you’ve exported a more complex shape. Now that you can export SVGs from an application and have a basic understanding of how the attributes work, you can start to modify them with JavaScript or CSS.
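
For example, since the eyes are grouped under a class in the markup above, a few lines of CSS are enough to restyle them. This is only a sketch; the hover color is arbitrary:

.eyes circle {
  fill: #FFFFFF;
}

.head:hover .eyes circle {
  fill: #FFDD57; /* arbitrary highlight color on hover */
}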

Episode #489 - August 15th, 2014

Posted 19 days back at Ruby5

Dokkufy, Rails Helpers, JRuby, Xiki and DHH code review

Listen to this episode on Ruby5

Sponsored by NewRelic

NewRelic recently announced that they joined the Cloud Security Alliance to promote SaaS security
NewRelic

Your mini-Heroku with Dokku and Dokkufy

Cristiano Betta has released the Dokkufy gem
Your mini-Heroku with Dokku and Dokkufy

Rails Misapprehensions: Replace Helpers With View Models!

Nick Sutterer says Cells view models are better than Rails Helpers
Rails Misapprehensions: Replace Helpers With View Models!

JRuby: The Hard Parts

Charles Nutter has shared a slide deck entitled JRuby: The Hard Parts
JRuby: The Hard Parts

Xiki Kickstarter

Xiki is more than a shell; it's the simplest way to create interactive interfaces with text in and text out
Xiki Kickstarter

Basecamp: Search

Gregg Pollack and Carlos Souza have released a Feature Focus video in which they walk you through their implementation of Basecamp's search functionality.
Basecamp: Search

Write a Vim Plugin with TDD

Posted 19 days back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

My colleague Chris Toomey and I are writing a Vim plugin called vim-spec-runner that runs RSpec and JavaScript tests. We’re test-driving it and have learned a lot along the way.

Vimrunner

We’re using the excellent vimrunner Ruby gem. It provides a Ruby interface to run arbitrary commands in Vim by hooking into Vim’s client-server architecture. If Vim was compiled with +clientserver, you can launch a Vim process that acts as a command server, with a client that sends commands to it. Vimrunner launches a Vim server and then sends your commands to it.

We discovered that terminal Vim doesn’t work with vimrunner, but MacVim works perfectly. Vimrunner will pick up MacVim if it’s installed, so all you have to do is brew install macvim. If you’re using Linux/BSD, it might work out of the box, but we haven’t tried it.

Here’s how to use Vimrunner to sort a file:

VimRunner.start do |vim|
  vim.edit "file.txt"
  vim.command "%sort"
end

Vimrunner also comes with some neat RSpec helpers. You can look at vim-spec-runner’s full spec_helper, but here are the important bits:

require "vimrunner"
require "vimrunner/rspec"

ROOT = File.expand_path("../..", __FILE__)

Vimrunner::RSpec.configure do |config|
  config.reuse_server = true

  config.start_vim do
    vim = Vimrunner.start
    vim.add_plugin(File.join(ROOT, "plugin"), "spec-runner.vim")
    vim
  end
end

First, we require vimrunner and vimrunner/rspec, for RSpec support. Then we set the ROOT constant to the directory containing the plugin directory. We then configure Vimrunner’s RSpec integration:

  • config.reuse_server = true: Use the same Vim instance for every spec.
  • config.start_vim: This block is used whenever Vimrunner needs a new Vim instance. We’re using it to always add our plugin to the vim instance.

Customizing RSpec

We use RSpec’s custom matchers quite a bit in our specs. They reduce the amount of code we have to write and make our tests much more readable.

Here’s an example:

it "does not create a mapping if one already exists" do
  using_vim_without_plugin do |clean_vim|
    clean_vim.edit "my_spec.rb"
    clean_vim.command "nnoremap <Leader>x <Plug>RunCurrentSpecFile"
    load_plugin(clean_vim)

    expect(clean_vim).to have_no_normal_map_from("<Leader>a")
  end
end

using_vim_without_plugin and load_plugin are both plain Ruby methods defined in our main spec file. Here’s the have_no_normal_map_from matcher:

RSpec::Matchers.define :have_no_normal_map_from do |expected_keys|
  match do |vim_instance|
    mapping_output(vim_instance, expected_keys) == 'No mapping found'
  end

  failure_message_for_should do |vim_instance|
    "expected no map for '#{expected_keys}' but it maps to something"
  end

  def mapping_output(vim_instance, expected_keys)
    vim_instance.command "nmap #{expected_keys}"
  end
end

The outer block variable, expected_keys, is what we pass to the have_no_normal_map_from method, while the block variable for match, vim_instance, is what we pass to expect.

We define failure_message_for_should so that if there is a mapping, we get a useful, human-formatted error message.
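
For reference, the two helper methods might look roughly like this (a sketch, not the plugin’s exact code):

def load_plugin(vim)
  vim.add_plugin(File.join(ROOT, "plugin"), "spec-runner.vim")
end

def using_vim_without_plugin
  # Start a fresh Vim server that does not have vim-spec-runner loaded.
  clean_vim = Vimrunner.start
  yield clean_vim
ensure
  clean_vim.kill if clean_vim
end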

Travis CI

It took some work, but we got our plugin running on Travis CI. Vimrunner needs a Vim that was compiled with +clientserver, so we install the vim-gnome Ubuntu package. Vimrunner also needs an X server, so we use xvfb to start up a headless X environment. Here’s the result:

before_install:
  - "sudo apt-get update"
  - "sudo apt-get install vim-gnome"
  - "vim --version"
install: bundle
script: xvfb-run bundle exec rspec --format documentation

We use the --format documentation option to RSpec so we can see exactly which test failed on Travis. You can see our full .travis.yml file here and a sample test run here.

What’s next?

Try out vim-spec-runner! If you want to add tests to your own Vim plugin, check out the full spec file.

UX & Front-End Development Bootcamp

Posted 20 days back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

We’re proud to announce that we’re kicking off a new UX & Front-End Development bootcamp with Metis this Fall. Earlier this year, we helped launch the Ruby on Rails bootcamp with great success, and as a design-driven company, it’s a natural step for us to expand into product design.

Designers today are increasingly involved in every step of a product’s evolution. It’s now common for our role to include front-end development and product, visual, interaction, and user experience design. Things move quickly, products change rapidly, and having capable designers tightly integrated throughout the process is crucial. Our bootcamp at Metis is no different, and we’ve been developing our curriculum around this notion.

The first bootcamp will be September 22nd through November 21st (9 weeks) and take place in New York City. It will consist of 100% in-person instruction, Monday–Friday, 9–6. I’m grateful to be joined by Allison House as my co-instructor. She’s worked with companies such as Dropbox, Codecademy and Treehouse and brings an incredible amount of knowledge and skill to our team.

Our goal is for students to graduate with confidence in producing quality digital product design solutions, with capabilities ranging from prototyping, to designing mobile and web applications, and working with HTML and CSS. Students will be able to pursue a job as an entry-level product designer, user experience designer, web designer or mobile designer.

Head over to the Metis website to find out more and apply.

Speed Up Tests by Selectively Avoiding Factory Girl

Posted 20 days back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

I’ve talked about speeding up unit tests when using Factory Girl by relying on FactoryGirl.build_stubbed, but there’s another surefire way to speed up your test suite with Factory Girl.

Don’t use it.

Most Unit Tests Don’t Need Persisted Data

There are plenty of times when data needs to exist in the database to accurately test an application; most acceptance tests will require some amount of persisted data (created either via Factory Girl or through UI interactions). When unit-testing most methods, however, Factory Girl (and even persisting data to the database) is unnecessary.

Let’s start with a couple of tests around a method we’ll need to define, User#age:

describe User do
  describe "#age" do
    it "calculates age given birthdate" do
      user = generate_user_born_on 366.days.ago

      expect(user.age).to eq 1
    end

    it "calculates age correctly by rounding age down to the appropriate integer" do
      user = generate_user_born_on 360.days.ago

      expect(user.age).to eq 0
    end

    def generate_user_born_on(date)
      FactoryGirl.create :user, birthdate: date
    end
  end
end

This seems like a harmless use of Factory Girl, and leads us to define User#age:

class User < ActiveRecord::Base
  def age
    ((Date.current - birthdate)/365.0).floor
  end
end

Running the specs:

$ rspec spec/models/user_spec.rb
..

Finished in 0.01199 seconds
2 examples, 0 failures

More than 100% Faster!

Looking at User#age, though, we don’t actually care about the database. Let’s swap FactoryGirl.create with User.new and re-run the spec.
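
The only change is in the spec’s helper method, which now builds a user in memory instead of persisting one:

def generate_user_born_on(date)
  User.new birthdate: date
end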

$ rspec spec/models/user_spec.rb
..

Finished in 0.00489 seconds

Still a green suite, but more than 100% faster.

Associations Make a Test Suite Slower

Now, let’s imagine User grows and ends up having a Profile:

class User < ActiveRecord::Base
  has_one :profile

  def age
    ((Date.current - birthdate)/365.0).floor
  end
end

We update the factory, including the associated profile:

FactoryGirl.define do
  factory :user do
    profile
  end

  factory :profile
end

Let’s re-run the spec using Factory Girl:

$ rspec spec/models/user_spec.rb
..

Finished in 0.02278 seconds
2 examples, 0 failures

Whoa, it’s now taking twice as long as it was before, but absolutely zero tests changed, only the factories.

Let’s run it again, but this time using User.new:

$ rspec spec/models/user_spec.rb
..

Finished in 0.00474 seconds
2 examples, 0 failures

Whew, back to a reasonable amount of time, and we’re still green. What’s going on here?

Persisting Data is Slow

FactoryGirl.create creates two records in the database, a user and a profile. Persistence is slow, as we know, but because Factory Girl is so easy to write and use, it hides that cost well. Even changing from FactoryGirl.create to FactoryGirl.build doesn’t help much:

$ rspec spec/models/user_spec.rb
..

Finished in 0.01963 seconds
2 examples, 0 failures

That’s because FactoryGirl.build still saves associations by default; so every time we use Factory Girl to build a User, we’re still persisting a Profile.

Writing to Disk Makes Things Worse

Sometimes, objects will write to disk during their persistence lifecycle. A common example is processing a file attachment during an ActiveRecord callback through gems like Paperclip or Carrierwave, which may result in processing thousands of files unnecessarily. Imagine how much slower a test suite becomes when all of that data is being created.

It’s incredibly difficult to identify these bottlenecks because of the differences between FactoryGirl.build, FactoryGirl.create, and how associations are handled. By remembering to use FactoryGirl.build on an avatar factory, we may speed up some subset of tests, but if User has an avatar associated with it, even when calling FactoryGirl.build(:user), avatars still get created - meaning valuable time spent processing images and persisting likely unnecessary data.

How to Fix Things

User#age is a great example because it’s quite clear that there’s no interaction with the database. Many core domain objects will have methods like these, and I suggest avoiding Factory Girl entirely in those cases, if possible. Instead, instantiate the objects directly, with only the data necessary to test the method. In the example above, User#age relies on a single piece of data: birthdate. Since that’s the method being tested, there’s no need to instantiate a User with anything else. Explicitly defining the data the test uses also provides clarity to yourself and other developers.

When testing an object and collaborators, consider doubles like fakes or stubs.
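
For example, a collaborator can be handed a lightweight double instead of a real, persisted User (Billing here is a hypothetical class, just for illustration):

describe Billing do
  it "calculates a rate from the user's age" do
    user = double("User", age: 30)  # no database, no Factory Girl

    expect(Billing.new(user).rate).to eq 100
  end
end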

My general advice, though, is to avoid Factory Girl as much as is reasonably possible. Not because it’s bad or unreliable software (Factory Girl is very reliable; we’ve used it successfully since 2008), but because its inherent persistence mechanism is calling #save! on the object, which will always take longer than not persisting data.

Using Clearance with RailsAdmin

Posted 21 days back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

These days, I find myself implementing RailsAdmin on almost every consulting project that I’m on. While RailsAdmin is not a replacement for a custom or complex admin interface, it’s a great way to give non-technical stakeholders access to the data being created and updated in a Rails app. And it takes just a few minutes to set up!

Where there is an admin interface, there is also authentication. Most of the documentation out there covers how to integrate RailsAdmin with Devise. But setting up RailsAdmin with Clearance is easy, too!

Here is a step by step guide on how to set up RailsAdmin with Clearance authentication:

Step 1: Set up Clearance

# Gemfile
gem "clearance"

bundle install
rails generate clearance:install

Running the Clearance generator creates a migration. Clearance’s generated migration will either create the users table or add only the columns necessary for Clearance to an existing users table. Before we run the migration, let’s add a boolean column to indicate whether a user is an admin or not.

# db/migrate/20140808213224_create_users.rb
class CreateUsers < ActiveRecord::Migration
  def change
    create_table :users  do |t|
      t.timestamps null: false
      t.boolean :admin, null: false, default: false
      t.string :email, null: false
      t.string :encrypted_password, limit: 128, null: false
      t.string :confirmation_token, limit: 128
      t.string :remember_token, limit: 128, null: false
    end

    add_index :users, :email
    add_index :users, :remember_token
  end
end

Now we’re ready to run the migration:

rake db:migrate

Step 2: Write a test

We want to be sure that only admin users can see our admin dashboard, so we’ll start with some feature specs that test this behavior. See this overview of testing Rails applications for more detail on how thoughtbot tests Rails applications.

# spec/features/admin_dashboard_spec.rb
feature "Admin dashboard" do
  scenario "visitor is admin" do
    admin = create(:admin)

    visit rails_admin_path(as: admin)

    expect(page).to have_content("Site Administration")
  end

  scenario "visitor is not an admin user" do
    user = create(:user)

    visit rails_admin_path(as: user)

    expect(page).to have_content("You are not permitted to view this page")
  end
end

The tests above are using Clearance::BackDoor to sign the user in directly. This is one of Clearance’s super awesome tools that speeds up tests and makes writing feature specs a breeze.
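
If the back door isn’t already enabled, it’s switched on as Rack middleware in the test environment; something like this (check Clearance’s README for the exact setup):

# config/environments/test.rb
config.middleware.use Clearance::BackDoor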

Before these tests will run properly, we need to set up admin and user factories:

# spec/factories.rb
FactoryGirl.define do
  factory :user do
    email "test@example.com"
    password "password"

    factory :admin do
      admin true
    end
  end
end

When we run our feature spec, our app does not recognize rails_admin_path, so it’s time to set up RailsAdmin.

Step 3. Set up RailsAdmin

Add the gem:

# Gemfile
gem "rails_admin"

bundle install

Tell RailsAdmin where we want it mounted (here, we’re choosing “/admin”):

# config/routes.rb
Rails.application.routes.draw do
  mount RailsAdmin::Engine => "/admin", as: "rails_admin"
end

When we run the tests again, we find that one of our tests is passing. Woot!

Unfortunately, the one that is failing is the spec that makes sure only admin users can view RailsAdmin. That is no bueno.

Time to configure RailsAdmin to redirect non-admin users (we’re assuming here that we have a root path defined in our routes.rb file):

# config/initializers/rails_admin.rb
RailsAdmin.config do |config|
  config.authorize_with do
    unless current_user.admin?
      redirect_to(
        main_app.root_path,
        alert: "You are not permitted to view this page"
      )
    end
  end

  config.current_user_method { current_user }
end

Now our tests both pass! But don’t celebrate too quickly. There is one final step we need to take care of: If we create an admin user in console, start our server, and log in as that admin user, we will see the following form at “/admin/user/new”:

rails admin new user

RailsAdmin assumes that because we have a password field, we will also have a password_confirmation field. If we try to fill these fields out and save a new user, we will get an error like this:

ActiveRecord::UnknownAttributeError in RailsAdmin::MainController#new
unknown attribute: password_confirmation

Clearance doesn’t have a password_confirmation field, so we are unable to create or update users in RailsAdmin out of the box. We can use the RailsAdmin DSL for configuring which fields to expose:

# config/initializers/rails_admin.rb
RailsAdmin.config do |config|

  ...

  config.model "User" do
    edit do
      field :admin
      field :email
      field :password
    end
  end
end

If we restart our server and re-load the admin dashboard, we’ll see that only the admin, email, and password fields are exposed and we can create a new user from within RailsAdmin.

We’re done! Now we can wow our teammates with the awesome admin dashboard we put together in just a few minutes.

Bonus: Add a sign out link to RailsAdmin

While we will probably include a sign out link in our main app, we’ve found that admin users frequently look for a sign out link from within RailsAdmin. Since RailsAdmin looks for Devise when deciding whether or not to show a sign out link, we need to provide a little workaround:

# lib/rails_admin_logout_link.rb
RailsAdmin::ApplicationHelper

module RailsAdmin
  module ApplicationHelper
    def logout_path
      main_app.send(:sign_out_path) rescue false
    end
  end
end

class Devise
  def self.sign_out_via
    :delete
  end
end

After restarting our server, we see a bright red “log out” link in the upper right hand corner of RailsAdmin:

rails admin log out

Time-Series Database Design with InfluxDB

Posted 21 days back at Ryan's Scraps

Here at Spreedly we’ve recently started using the time series database InfluxDB to store a variety of customer activity metrics. As with any special purpose database, using and designing for a time-series database is quite different than what you may be used to with structured (SQL) databases. I’d like to describe our experience designing our InfluxDB schema, the mistakes we made, and the conclusions we’ve come to based on those experiences.

The mark

Consider the following scenario, closely resembling Spreedly’s: You run a service that lets your customers transact against a variety of payment gateways. You charge for this service on two axes – by the number of gateways provisioned and the number of credit cards stored. For any point in time you want to know how many of both each of your customers has for their account.

Initially we set up two series (InfluxDB’s term for a collection of measurements, organizationally similar to a SQL database table) to store the total number of each item per account:

  • gateway.account.sample
  • payment-method.account.sample

On some regular interval we’d collect the number of gateways and payment methods (credit cards) for each account and store it in the respective series. Each measurement looked like:

gateway.account.sample

{
  "time": 1400803300,
  "value": 2,
  "account_key": "abc123"
}

time is required by InfluxDB and is the epoch time of the measurement. The value field is the value of the measurement at that time, and account_key is an additional property of that measurement.

Simple enough. This approach felt good and we went to production with this schema. That’s when we learned our first lesson…

Time-scoped queries

The first app that used the data in InfluxDB was our customer Dashboard product. It displays all your transactions and a simple view of your current billing counts (number of gateways and number of stored payment methods). Dashboard simply queried for the most recent measurement from each series for the current account:


select value
  from gateway.account.sample
  where account_key = 'abc123'
  limit 1

Since results in InfluxDB are ordered most recent first by default, the limit 1 clause ensures only the most recent measurement is returned for that customer (account).

All was fine initially, but as our dataset grew into the hundreds of thousands of entries per series, we noticed our queries were taking quite some time to complete - a constant 5s or so for every account. It turns out that these queries were incurring a full table scan, hence the constant (poor) performance.

Avoid a full table scan by always time-scoping your queries

In InfluxDB, the non-time fields aren’t indexed, meaning any queries that filter based on them require a full table scan (even if you’re only fetching a single result). The way to avoid a full table scan is to always time-scope your queries. Knowing this we modified our queries to only query against the previous 2 days worth of data (enough time to capture the most recent input):


select value
  from gateway.account.sample
  where time > now() - 2d
    and account_key = 'abc123'
  limit 1

Adding the where time > now() - 2d clause ensures that the query operates against a manageable set of data and avoids a full table scan. This dropped our query times from 5s (and growing) down to a steady 100ms - 200ms. (Keep in mind this is a remote instance of InfluxDB, meaning the bulk of that is in connection setup and network latency.)

InfluxDB response time reduction using time-scoped queries. Y-axis truncated for maximum obfuscation.

Obviously your use-case may differ wildly from ours. If your data is collected at unknown intervals, or in real-time, you don’t have the luxury of limiting your queries to a known window of time. In these situations it is wise to think about how to segment your data into series for optimal performance.

Series granularity

How many series should you have? How much data should you store in each series? When should you break out queries into their own series? These are all common questions when designing your time-series schema and, unfortunately, there is no concrete right or wrong answer. However, there are some good rules of thumb to keep in mind when structuring your data.

Continuing from our previous example: We were now using time-scoped queries to get the total number of gateways and cards for each account. While we were seeing good performance, each query was operating against a single series that contained data for all accounts. The query’s account_key condition was responsible for filtering the data by account:


select value
  from gateway.account.sample
  where time > now() - 2d
    and account_key = 'abc123'
  limit 1

As even this already time-scoped set of data grows, querying against a non-indexed field will start to become an issue. Queries whose conditions eliminate a large percentage of the data within the series should be extracted out into their own series. E.g., in our case we have a query that gets a single account’s count of stored gateways to the exclusion of all the other accounts. This is an example of a query that filters out the majority of the data in a series and should be extracted so each account has its own series.

Series are cheap. Use them liberally to isolate highly conditional data access.

If you’re coming from a SQL-based mindset, the thought of creating one series per account might seem egregious. However, it’s perfectly acceptable in time-series land. So that’s what we did - we started writing data from each account into its own series (with each series’ name including the account key). Now, when querying for an account’s total number of stored gateways we do:


select value
  from account-abc123.gateway.sample
  where time > now() - 2d
    ...

Since you have to know the key in question to access the right series, this type of design is most common with primary (or other well-known) keys. But… series don’t have to be segmented only by key; segmenting by time period is also possible. While not useful in our specific situation, you can imagine segmenting data into monthly series, e.g., 201407.gateway.sample or some other period, depending on your access pattern.

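For illustration, a query against such a month-scoped series would look just like the account-key queries above (the series name here is hypothetical, following the naming suggested in the previous paragraph):

select value
  from 201407.gateway.sample
  where account_key = 'abc123'
  limit 1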

Multi-purpose data

At this point your series are lean and efficient, each well-suited to a single type of query and data access. However, sometimes life isn’t that clean and you have one set of data that needs to be accessed in many different ways.

For instance, at Spreedly, we’d like to have a business-level set of metrics available that shows the total number of gateways and payment-methods across all customers. We could just dump summary-level data into a new series (not a terrible idea), but we’re already collecting this data on a customer-level. It’d be nice not to have to do two writes per measurement.

Use continuous queries to re-purpose broad series by access pattern

Fortunately, InfluxDB has a feature called continuous queries that lets you modify and isolate data from one series into one or more other dependent series. Continuous queries are useful when you want to “rollup” time-series data by time period (e.g., get the 99th percentile service times across 5, 10 and 15 minute periods) and also to isolate a subset of data for more efficient access. This latter application is perfect for our use-case.

To use continuous queries to support both summary and account-specific stats we need to create the parent series that contain measurements for each account.

gateway.account.sample

{
  "time": 1400803300,
  "value": 2,
  "account_key": "abc123"
},
{
  "time": 1400803300,
  "value": 7,
  "account_key": "def456"
}

We can access this series directly to obtain the business-level stats we need across all customers:


select sum(value)
  from gateway.account.sample
  where time > now() - 1d

With continuous queries we can also use this parent series to spawn several “fanout” queries that isolate the data by account (replicating the account-specific series naming scheme from earlier):


select value
  from gateway.account.sample
  into account-[account_key].gateway.sample;

Notice the [account_key] interpolation syntax? This creates one series per account and stores the value field from each measurement into the new account-specific series (retaining the original measurement’s time):

account-abc123.gateway.sample

{
  "time": 1400803300,
  "value": 2
}
account-def456.gateway.sample

{
  "time": 1400803300,
  "value": 7
}

With this structure we:

  • Only write the data one time into the parent series gateway.account.sample
  • Can perform summary level queries against this parent series
  • Have access to the highly efficient, constantly updated, account-specific data series (account-def456.gateway.sample, etc.)

This is a great use of fanout continuous queries. Also available are regular continuous queries which operate by precomputing expensive group by queries. I’ll skip over them for now since we’re not yet using them at Spreedly, but I encourage you to look at them for your use cases.

Naming and structure

Series naming and packet structure is a tough topic due to personal preferences, differences in client languages and highly varied access patterns. I’m not going to label the following as best practices; instead, I’ll present what we’ve found at Spreedly and our motivations, and let you decide whether it makes sense for you to apply.

  • Come up with a naming structure that conveys both the purpose of the series and the type of data contained within. At Spreedly it’s something like (and still evolving): [key].measured-item.[grouping].measurement-type. For instance, the series that contains the count of all gateways stored by account is gateway.account.sample. The account-specific version is: account-abc123.gateway.sample. The measurement-type component is highly influenced by the l2met logging conventions and deserves further discussion.

    • count series record the number of times something happened in a specific period of time, as an integer. Counts can be summed with other counts in the same series to perform time-based aggregations (rollups). The number of requests or transactions per minute is an example of a count series.
    • sample series take a point-in-time measurement of some metric that supersedes all previous samples of the same series. Sum totals are a good example of this type of series, e.g., total revenue to date, or total number of payment methods. With each measurement in the series, previous measurements are no longer relevant, though they may still be used to track trends over time.
    • measure series are similar to count series except that instead of being a simple count of the number of times something happened, they can represent any unit of measure such as ms, MB, etc. Measurements are mathematically operable and can be summed, percentiled, averaged, etc. CPU load and response times are examples of measure series.
  • Often there is a single value that represents the thing being measured, with the rest of the fields being meta-data or conditions. To facilitate re-usable client parsing we’ve found it nice to use the same field name across all series to represent the value of the measurement. Unsurprisingly, we chose value. All our series data contains a value field that contains the measurement value. This makes it easy to retrieve, which is especially useful in queries that select across multiple series or even merge results from multiple series into a single result set.

There’s a lot of subjectivity that goes into database design, independent of the storage paradigm. While SQL has been around for a while and has well-known patterns, alternative databases, including time-series databases, are a bit more of a wild west. I’m hoping that by sharing our experiences we can prevent some common mistakes, freeing you up to create all new ones of your own!

Many thanks to Paul and Todd and the rest of InfluxDB for their tireless guidance on the subject.

Episode #488 - August 12th, 2014

Posted 22 days back at Ruby5

We talk about using the Facebook SDK with RubyMotion, Event Sourcing with Sandthorn, gems like rails_param and Groupdate, and time tracking with Hours.

Listen to this episode on Ruby5

Sponsored by CodeShip.io

Codeship is a hosted Continuous Delivery Service that just works.

Set up Continuous Integration in a few steps and automatically deploy when all your tests have passed. Integrate with GitHub and BitBucket and deploy to cloud services like Heroku and AWS, or your own servers.

Visit http://codeship.io/ruby5 and sign up for free. Use discount code RUBY5 for a 20% discount on any plan for 3 months.

Also check out the Codeship Blog!

CodeShip.io

Integrating the Facebook SDK with RubyMotion

Kamil Lelonek wrote a blog post about using the Facebook SDK with RubyMotion. He also goes into detail by creating a Facebook login app showing how it works.
Integrating the Facebook SDK with RubyMotion

rails_param

rails_param is a gem by Nicolas Blanco that brings parameter validation and type coercion into your Rails controllers.
rails_param

Sandthorn

Sandthorn is a Ruby library for saving an object's state as a series of events, a pattern that's known as Event Sourcing.
Sandthorn

Hours

Defacto Software open sourced “Hours”, a time tracking system written in Rails that makes it easy to track, categorize, and tag your time.
Hours

Groupdate

The Groupdate Ruby gem by Andrew Kane provides a simple interface to group temporal data. It supports grouping by day, week, hour of the day and more.
Groupdate

Top Ruby Jobs

ChallengePost is looking for a Senior Web Developer in New York, NY.
Top Ruby Jobs

Sponsored by Ruby5

Ruby5 is released Tuesday and Friday mornings. To stay informed about and active with this podcast, we encourage you to do one of the following:

Thank You for Listening to Ruby5

Buttons with Hold Events in Angular.js

Posted 22 days back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

Creating an interaction with a simple button in Angular only requires adding the ngClick directive. However, sometimes an on-click style interaction isn’t sufficient. Let’s take a look at how we can have a button which performs an action as long as it’s pressed.

For the example, we’ll use two buttons which can be used to zoom a camera in and out. We want the camera to continue zooming until the button is released. The final effect will work like this:

Zooming in Martial Codex

Our template might look something like this:

<a href while-pressed="zoomOut()">
  <i class="fa fa-minus"></i>
</a>
<a href while-pressed="zoomIn()">
  <i class="fa fa-plus"></i>
</a>

We’re making a subtle assumption with this interface. By adding the parentheses, we imply that whilePressed will behave similarly to ngClick. The given value is an expression that will be evaluated continuously while the button is pressed, rather than a function object we hand it to call. In practice, we can use the '&' style of arguments in our directive to capture the expression. You can find more information about the different styles of scopes here.

whilePressed = ->
  restrict: "A"

  scope:
    whilePressed: '&'

Binding the Events

When defining more complex interactions such as this one, Angular’s built-in directives won’t give us the control we need. Instead, we’ll fall back to manual event binding on the element. For clarity, I tend to prefer separating the callback function from the event bindings. Since we’re manipulating the DOM, our code will go into a link function. Our initial link function will look like this:

link: (scope, elem, attrs) ->
  action = scope.whilePressed

  bindWhilePressed = ->
    elem.on("mousedown", beginAction)

  beginAction = (e) ->
    e.preventDefault()
    # Do stuff

  bindWhilePressed()

Inside of our action we’ll need to do two things:

  1. Start running the action
  2. Bind to mouseup to stop running the action.

For running the action, we’ll use Angular’s $interval service. $interval is a wrapper around JavaScript’s setInterval, but gives us a promise interface, better testability, and hooks into Angular’s digest cycle.

In addition to running the action continuously, we’ll also want to run it immediately to avoid a delay. We’ll run the action every 15 milliseconds, as this will roughly translate to once per browser frame.

+TICK_LENGTH = 15
+
-whilePressed = ->
+whilePressed = ($interval) ->
   restrict: "A"

   link:
     action = scope.whilePressed

@@ -23,7 +24,7 @@
     beginAction = (e) ->
       e.preventDefault()
+      action()
+      $interval(action, TICK_LENGTH)
+      bindEndAction()

In our beginAction function, we call bindEndAction to set up the events that will stop running the action. We know that we’ll at least want to bind to mouseup on our button, but we have to decide how to handle users who move the mouse off of the button before releasing it. We can handle this by listening for mouseleave on the element, in addition to mouseup.

bindEndAction = ->
  elem.on('mouseup', endAction)
  elem.on('mouseleave', endAction)

In our endAction function, we’ll want to cancel the $interval for our action, and unbind the event listeners for mouseup and mouseleave.

unbindEndAction = ->
  elem.off('mouseup', endAction)
  elem.off('mouseleave', endAction)

endAction = ->
  $interval.cancel(intervalPromise)
  unbindEndAction()

We’ll also need to store the promise that $interval returned so that we can cancel it when the mouse is released.

 whilePressed = ($parse, $interval) ->
   link: (scope, elem, attrs) ->
     action = scope.whilePressed
+    intervalPromise = null

     bindWhilePressed = ->
       elem.on('mousedown', beginAction)
@@ -23,7 +24,7 @@
     beginAction = (e) ->
       e.preventDefault()
       action()
-      $interval(action, TICK_LENGTH)
+      intervalPromise = $interval(action, TICK_LENGTH)
       bindEndAction()

Cleaning Up

Generally I consider it a smell to have an isolated scope on any directive that isn’t an element. Each DOM element can only have one isolated scope, and attribute directives are generally meant to be composed. So let’s replace our scope with a manual use of $parse instead.

$parse takes in an expression, and will return a function that can be called with a scope and an optional hash of local variables. This means we can’t call action directly anymore, and instead need a wrapper function which will pass in the scope for us.

-whilePressed = ($interval) ->
-  scope:
-    whilePressed: "&"
-
+whilePressed = ($parse, $interval) ->
   link: (scope, elem, attrs) ->
-    action = scope.whilePressed
+    action = $parse(attrs.whilePressed)
     intervalPromise = null

     bindWhilePressed = ->
@@ -26,14 +23,17 @@ whilePressed = ($interval) ->

     beginAction = (e) ->
       e.preventDefault()
-      action()
-      intervalPromise = $interval(action, TICK_LENGTH)
+      tickAction()
+      intervalPromise = $interval(tickAction, TICK_LENGTH)
       bindEndAction()

     endAction = ->
       $interval.cancel(intervalPromise)
       unbindEndAction()

+    tickAction = ->
+      action(scope)

And that’s it. Our end result is a nicely decoupled Angular UI component that can easily be reused across applications. The final code looks like this.

TICK_LENGTH = 15

whilePressed = ($parse, $interval) ->
  restrict: "A"

  link: (scope, elem, attrs) ->
    action = $parse(attrs.whilePressed)
    intervalPromise = null

    bindWhilePressed = ->
      elem.on('mousedown', beginAction)

    bindEndAction = ->
      elem.on('mouseup', endAction)
      elem.on('mouseleave', endAction)

    unbindEndAction = ->
      elem.off('mouseup', endAction)
      elem.off('mouseleave', endAction)

    beginAction = (e) ->
      e.preventDefault()
      tickAction()
      intervalPromise = $interval(tickAction, TICK_LENGTH)
      bindEndAction()

    endAction = ->
      $interval.cancel(intervalPromise)
      unbindEndAction()

    tickAction = ->
      action(scope)

    bindWhilePressed()
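
One piece not shown above is registering the directive factory with an Angular module. Assuming an existing module named app (a hypothetical name), the wiring might look like this; the array syntax keeps the $parse and $interval injections minification-safe:

# Hypothetical module name; adjust to your application's module.
angular.module("app").directive "whilePressed", ["$parse", "$interval", whilePressed]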

Silver Searcher Tab Completion with Exuberant Ctags

Posted 23 days back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

I’m a heavy Vim user and demand speedy navigation between files. I rely on Exuberant Ctags and tag navigation (usually Ctrl-]) to move quickly around the codebase.

There were times, however, when I wasn’t in Vim but wanted to use tags to access information quickly; most noticeable was time spent in my shell, searching the codebase with ag.

As a zsh user, I was already aware of introducing tab completion by way of compdef and compadd:

_fn_completion() {
  if (( CURRENT == 2 )); then
    compadd foo bar baz
  fi
}

compdef _fn_completion fn

In this example, fn is the binary we want to add tab completion to, and we only attempt to complete after typing fn and then TAB. By checking CURRENT == 2, we’re verifying the position of the cursor as the second field in the command. This will complete with options foo, bar, and baz, and filter the options accordingly as you start typing and hit TAB again.

Now that we understand how to configure tab completion for commands, next up is determining how to extract useful information from the tags file. Here’s the first few lines of the file from a project I worked on recently:

==      ../app/models/week.rb   /^  def ==(other)$/;"   f       class:Week
AccessToken     ../app/models/access_token.rb   /^class AccessToken < ActiveRecord::Base$/;"    c
AccessTokensController  ../app/controllers/access_tokens_controller.rb  /^class AccessTokensController < ApplicationController$/;"      c

The tokens we want to use for tab completion are the first set of characters per line, so we can use cut -f 1 path/to/tags to grab the first field. We then use grep -v to ignore autogenerated ctags metadata we don’t care about. With a bit of extra work (like writing stderr to /dev/null in the instance where the tags file doesn’t exist yet), the end result looks like this:

_ag() {
  if (( CURRENT == 2 )); then
    compadd $(cut -f 1 .git/tags tmp/tags 2>/dev/null | grep -v '!_TAG')
  fi
}

compdef _ag ag

With this in place, we can now ag a project and tab complete from the generated tags file. Typing ag Acc and then TAB:

$ ag AccessToken
AccessToken             AccessTokensController

And the result:

[ ~/dev/thoughtbot/project master ] ✔ ag AccessToken
app/controllers/access_tokens_controller.rb
1:class AccessTokensController < ApplicationController
15:    @project = AccessToken.find_project(params[:id])

app/models/access_token.rb
1:class AccessToken < ActiveRecord::Base

db/migrate/20140416195446_create_access_tokens.rb
1:class CreateAccessTokens < ActiveRecord::Migration

db/migrate/20140718175701_add_index_on_access_tokens_project_id.rb
1:class AddIndexOnAccessTokensProjectId < ActiveRecord::Migration

spec/models/access_token_spec.rb
3:describe AccessToken, 'Associations' do
7:describe AccessToken, '.find_project' do
12:      result = AccessToken.find_project(access_token.to_param)
20:      expect { AccessToken.find_project('unknown') }.
26:describe AccessToken, '.generate' do
40:describe AccessToken, '#to_param' do
50:    expect(AccessToken.find(access_token.to_param)).to eq(access_token)

Voila! Tab completion with ag based on the tags file.

If you’re using thoughtbot’s dotfiles, you already have this behavior.

Automatic versioning in Xcode with git-describe

Posted 24 days back at zargony.com

Do you manually set a new version number in Xcode every time you release a new version of your app? Or do you use some tool that updates the Info.plist in your project like agvtool or PlistBuddy? Either way, you probably know that it's a pain to keep track of the version number in the project.

I recently spent some time trying out various methods to automatically get the version number from git and put it into the app that Xcode builds. I found that most of them have drawbacks, but in the end I found a way that I finally like most. Here's how.

Getting a version number from git

Why should we choose a version number manually if we're using git to manage all source files anyway? Git is a source code management tool that keeps track of every change and is able to uniquely identify every snapshot of the source by its commit ids. The most obvious idea would be to simply use the commit ids as the version number of your software, but unfortunately (because of the distributed nature of git) commit ids are not very useful to the human reader: you can't tell at once which one is earlier and which is later.

But git has a very useful command called git describe that extracts a human readable version number from the repository. If you check out a specific tagged revision of your code, git describe will print the tag's name. If you check out any other commit, it will go back through the commit history to find the latest tag and print its name followed by the number of commits since it and the current commit id. This is incredibly useful to exactly describe the currently checked out version (hence the name of this command).

If you additionally use the --dirty option, git describe will append the string '-dirty' if your working directory isn't clean (i.e. you have uncommitted changes). Perfect!

So if you tag all releases of your app (which you should be doing already anyway), it's easy to automatically create a version number with git describe --dirty for any commit, even between releases (e.g. for betas).

Here are some examples of version numbers:

v1.0                   // the release version tagged 'v1.0'
v1.0-8-g1234567        // 8 commits after release v1.0, at commit id 1234567
v1.0-8-g1234567-dirty  // same as above but with unspecified local changes (dirty workdir)

Automatically set the version number in Xcode

You'll find several ideas on how to use automatically generated version numbers in Xcode projects if you search the net. However, most of them have drawbacks that I'd like to avoid. Some ways use a custom build phase to create a header file containing the version number. Besides the fact that this approach gets more complicated with Swift, it only allows you to display the version number in your app, but doesn't set it in the app's Info.plist. Most libraries like crash reporters or analytics will take the version number from Info.plist, so it's useful to have the correct version number in there.

So let's modify the Info.plist using PlistBuddy in a custom build phase. But we don't want to modify the source Info.plist, because that would change a checked-in file and lead to a dirty workdir. We need to modify the Info.plist inside the target build directory (after the ProcessInfoPlistFile build rule ran).

Instructions

  • Add a new run script build phase with the below script.
  • Make sure it runs late during building by moving it to the bottom.
  • Make sure that the list of input files and output files is empty and that "run script only when installing" is turned off.

# This script sets CFBundleVersion in the Info.plist of a target to the version
# as returned by 'git describe'.
# Info: http://zargony.com/2014/08/10/automatic-versioning-in-xcode-with-git-describe
set -e
VERSION=`git describe --dirty |sed -e "s/^[^0-9]*//"`
echo "Updating Info.plist version to: ${VERSION}"
/usr/libexec/PlistBuddy -c "Set :CFBundleVersion ${VERSION}" "${TARGET_BUILD_DIR}/${INFOPLIST_PATH}"
/usr/bin/plutil -convert ${INFOPLIST_OUTPUT_FORMAT}1 "${TARGET_BUILD_DIR}/${INFOPLIST_PATH}"

Thoughts

  • By keeping the list of output files empty, we make Xcode run the script every time (otherwise it would detect an existing file and skip running the script, even if changes were made and the version number may have changed)
  • Some sed magic strips any leading non-numbers from the version string so that you can use tags like release-1.0 or v1.5 (see the example after this list).
  • PlistBuddy converts the plist to XML, so we're running plutil at the end to convert it back to the desired output format (binary by default)
  • If you need more information than just the output of git describe, try the excellent "autorevision" script.
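
For example, running the same sed expression by hand on a couple of git describe outputs:

$ echo "v1.0-8-g1234567" | sed -e "s/^[^0-9]*//"
1.0-8-g1234567
$ echo "release-1.0" | sed -e "s/^[^0-9]*//"
1.0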

Episode #487 - August 8th, 2014

Posted 24 days back at Ruby5

Beautiful API documentation, deprecating paths in Rails mailers, taking RubySteps, meeting Starboard, and the new Heroku Button

Listen to this episode on Ruby5

Sponsored by New Relic

New Relic is _the_ all-in-one web performance analytics product. It lets you manage and monitor web application performance, from the browser down to the line of code. With Real User Monitoring, New Relic users can see browser response times by geographical location of the user, or by browser type.
This episode is sponsored by New Relic

tripit/slate

Beautiful static documentation for your API
tripit/slate

Deprecating *_path in Mailers

"Email does not support relative links since there is no implicit host. Therefore all links inside of emails must be fully qualified URLs. All path helpers are now deprecated."
Deprecating *_path in Mailers

RubySteps

Daily coding practice via email and interactive lessons, every weekday.
RubySteps

Starboard

Starboard is a tool which creates Trello boards for tracking the various tasks necessary when onboarding, offboarding, or crossboarding employees.
Starboard

Heroku Button

One-click deployment of publicly-available applications on GitHub
Heroku Button

Ruby5 is released Tuesday and Friday mornings. To stay informed about and active with this podcast, we encourage you to do one of the following:

Thank You for Listening to Ruby5

Intent to Add

Posted 25 days back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

The git add command runs blind, but can be controlled with more fine-grained precision using the --patch option. This works great for modified and deleted files, but untracked files do not show up.

$ echo "Hello, World!" > untracked
$ git status --short
?? untracked
$ git add --patch
No changes.

To remedy this, the --intent-to-add option can be used. According to git-add(1), --intent-to-add changes git add’s behavior to:

Record only the fact that the path will be added later. An entry for the path is placed in the index with no content. This is useful for, among other things, showing the unstaged content of such files with git diff and committing them with git commit -a.

What this means is that after running git add --intent-to-add, the specified untracked files will be added to the index, but without content. Now, when git add --patch is run, it will show a diff for each of these files, with every line as an addition. This gives you a chance to look through the file, line by line, before staging it. You can even decide not to stage specific lines by deleting them from the patch using the edit command.

$ echo "Hello, World!" > untracked
$ git status --short
?? untracked
$ git add --intent-to-add untracked
$ git status --short
AM untracked
$ git add --patch
diff --git a/untracked b/untracked
index e69de29..8ab686e 100644
--- a/untracked
+++ b/untracked
@@ -0,0 +1 @@
+Hello, World!
Stage this hunk [y,n,q,a,d,/,e,?]?

In my .gitconfig I alias add --all --intent-to-add to aa and add --patch to ap, which means that for most commits, I type:

$ git aa
$ git ap

Or in gitsh:

& aa
& ap
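
For reference, those aliases look something like this in .gitconfig:

[alias]
  aa = add --all --intent-to-add
  ap = add --patch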

Linked Development: Linked Data from CABI and DFID

Posted 26 days back at RicRoberts :

In March this year we launched the beta version of Linked Development. It’s a linked open data site for CABI and the DFID, which provides data about international development projects and research.

Linked Development site screenshot

This is a slightly unusual one for us: we’re taking it on after others have worked on it in the past. It’s currently in beta release and we’re hosting it on our PublishMyData service, so users can easily get hold of the data they want in both human- and machine-readable formats. So it comes with most of the usual benefits we offer: thematic data browsing, a SPARQL endpoint and Linked Data APIs. And, because the data’s linked, each data point has a unique identifier so users can select and combine data from different data sources to get the exact information they’re after.

We’ve also rebuilt the site’s custom Research Documents API that was offered by the alpha version of the site, to make it faster and more robust (it’s backward-compatible with the previous version).

Linked Development custom API screenshot

This site illustrates what’s possible for government organisations using linked data: it allows for collective ownership of data and data integration whilst aiming to improve audience reach and data availability. It’s great to see linked data being embraced by an increasing number of public bodies.