A Guide to Core Data Concurrency

Posted 7 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

The iOS ethos of instantly responsive UI elements means putting as much work as possible on background threads and as little as possible on the main thread. In most cases an NSOperationQueue or GCD serves us fine, but getting concurrency to work in Core Data sometimes feels more like black magic than science. This post intends to demystify concurrency and offer two ways to go about it.

Setup 1: Private queue context and main queue context off of a single persistent store coordinator

In this setup we will make two NSManagedObjectContext instances, one with concurrency type NSMainQueueConcurrencyType and the other with NSPrivateQueueConcurrencyType. We will observe NSManagedObjectContextDidSaveNotification to propagate saves between them.

We add two methods to our TBCoreDataStore.h file:

+ (NSManagedObjectContext *)mainQueueContext;
+ (NSManagedObjectContext *)privateQueueContext;

In our implementation file we add two private properties and lazy load them:

@interface TBCoreDataStore ()

@property (strong, nonatomic) NSPersistentStoreCoordinator *persistentStoreCoordinator;
@property (strong, nonatomic) NSManagedObjectModel *managedObjectModel;

@property (strong, nonatomic) NSManagedObjectContext *mainQueueContext;
@property (strong, nonatomic) NSManagedObjectContext *privateQueueContext;

@end

#pragma mark - Singleton Access

+ (NSManagedObjectContext *)mainQueueContext
{
    return [[self defaultStore] mainQueueContext];
}

+ (NSManagedObjectContext *)privateQueueContext
{
    return [[self defaultStore] privateQueueContext];
}

#pragma mark - Getters

- (NSManagedObjectContext *)mainQueueContext
{
    if (!_mainQueueContext) {
        _mainQueueContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
        _mainQueueContext.persistentStoreCoordinator = self.persistentStoreCoordinator;
    }

    return _mainQueueContext;
}

- (NSManagedObjectContext *)privateQueueContext
{
    if (!_privateQueueContext) {
        _privateQueueContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
        _privateQueueContext.persistentStoreCoordinator = self.persistentStoreCoordinator;
    }

    return _privateQueueContext;
}

Next we override the initializer to add our observing:

- (id)init
{
    self = [super init];
    if (self) {
        [[NSNotificationCenter defaultCenter] addObserver:self selector:@selector(contextDidSavePrivateQueueContext:) name:NSManagedObjectContextDidSaveNotification object:[self privateQueueContext]];
        [[NSNotificationCenter defaultCenter] addObserver:self selector:@selector(contextDidSaveMainQueueContext:) name:NSManagedObjectContextDidSaveNotification object:[self mainQueueContext]];
    }
    return self;
}

- (void)dealloc
{
    [[NSNotificationCenter defaultCenter] removeObserver:self];
}

- (void)contextDidSavePrivateQueueContext:(NSNotification *)notification
{
    @synchronized(self) {
        [self.mainQueueContext performBlock:^{
            [self.mainQueueContext mergeChangesFromContextDidSaveNotification:notification];
        }];
    }
}

- (void)contextDidSaveMainQueueContext:(NSNotification *)notification
{
    @synchronized(self) {
        [self.privateQueueContext performBlock:^{
            [self.privateQueueContext mergeChangesFromContextDidSaveNotification:notification];
        }];
    }
}

Now we have a working private queue context and main queue context, each of which will be updated whenever the other is saved. Here is an example usage:

[[TBCoreDataStore privateQueueContext] performBlock:^{
    NSFetchRequest *fetchRequest = [NSFetchRequest fetchRequestWithEntityName:@"MyEntity"];
    NSArray *results = [[TBCoreDataStore privateQueueContext] executeFetchRequest:fetchRequest error:nil];
}];
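Saving works the same way. Here is a minimal sketch (MyEntity and its name attribute are hypothetical) that inserts and saves on the private queue context, relying on the did-save observer above to merge the change into the main queue context:

[[TBCoreDataStore privateQueueContext] performBlock:^{
    // Insert and save on the background context; the did-save observer
    // merges this change into the main queue context.
    MyEntity *entity = [NSEntityDescription insertNewObjectForEntityForName:@"MyEntity"
                                                     inManagedObjectContext:[TBCoreDataStore privateQueueContext]];
    entity.name = @"Example";

    NSError *error = nil;
    if (![[TBCoreDataStore privateQueueContext] save:&error]) {
        NSLog(@"Background save failed: %@", error); // handle appropriately
    }
}];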

One of the great advantages of this type of Core Data stack is that it allows us to make great use of NSFetchedResultsController. An example of this is parsing JSON from a web service into a Core Data object as a background operation, then using the fetched results controller to detect when that object has changed and updating the UI as a result.
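As a sketch of that pattern (again assuming a hypothetical MyEntity with a name attribute), the fetched results controller watches the main queue context, so background saves that merge in will fire its delegate callbacks:

NSFetchRequest *fetchRequest = [NSFetchRequest fetchRequestWithEntityName:@"MyEntity"];
fetchRequest.sortDescriptors = @[[NSSortDescriptor sortDescriptorWithKey:@"name" ascending:YES]];

NSFetchedResultsController *controller =
    [[NSFetchedResultsController alloc] initWithFetchRequest:fetchRequest
                                        managedObjectContext:[TBCoreDataStore mainQueueContext]
                                          sectionNameKeyPath:nil
                                                   cacheName:nil];
controller.delegate = self; // an object conforming to NSFetchedResultsControllerDelegate
[controller performFetch:NULL];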

Setup 2: The throwaway main queue context backed by a private queue context

In this setup we will have only one NSManagedObjectContext that stays with us for the lifetime of the app. This will be a private queue context from which we create child main queue contexts. This lets us spend as much time as possible in the background, creating a new main queue context only when we need to do UI work.

Starting from our base core data setup we add the following to TBCoreDataStore.h:

+ (NSManagedObjectContext *)newMainQueueContext;
+ (NSManagedObjectContext *)defaultPrivateQueueContext;

In our implementation file we add a single property and lazy load it:

@interface TBCoreDataStore ()

@property (strong, nonatomic) NSPersistentStoreCoordinator *persistentStoreCoordinator;
@property (strong, nonatomic) NSManagedObjectModel *managedObjectModel;

@property (strong, nonatomic) NSManagedObjectContext *defaultPrivateQueueContext;

@end

#pragma mark - Singleton Access

+ (NSManagedObjectContext *)newMainQueueContext
{
    NSManagedObjectContext *context = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
    context.parentContext = [self defaultPrivateQueueContext];
    
    return context;
}

+ (NSManagedObjectContext *)defaultPrivateQueueContext
{
    return [[self defaultStore] defaultPrivateQueueContext];
}

#pragma mark - Getters

- (NSManagedObjectContext *)defaultPrivateQueueContext
{
    if (!_defaultPrivateQueueContext) {
        _defaultPrivateQueueContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
        _defaultPrivateQueueContext.persistentStoreCoordinator = self.persistentStoreCoordinator;
    }

    return _defaultPrivateQueueContext;
}

Here we have no need to observe save notifications, as any save on a created main queue context will push its changes up to its parent, the defaultPrivateQueueContext. (To persist those changes to disk, the parent context must then be saved as well.) This approach is very robust and spends the least possible time on the main queue. The downside is that we cannot use NSFetchedResultsController out of the box, though we could cobble together our own version using the various notifications sent out by Core Data.
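As a minimal sketch, a typical save in this stack looks like this: the child save pushes changes into defaultPrivateQueueContext, and the second save writes them to the store.

NSManagedObjectContext *mainQueueContext = [TBCoreDataStore newMainQueueContext];
[mainQueueContext performBlock:^{
    // ... make changes to managed objects here ...

    NSError *error = nil;
    if ([mainQueueContext save:&error]) {
        // The changes now live in the parent; persist them to disk.
        [[TBCoreDataStore defaultPrivateQueueContext] performBlock:^{
            [[TBCoreDataStore defaultPrivateQueueContext] save:NULL];
        }];
    }
}];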

Let's say we have a really big database (20k+ objects) and we want to do a complex fetch. The best way to go about this is to first use the background queue to fetch the NSManagedObjectIDs, then hop onto the main queue and call -objectWithID: for each result. This is how we should always pass managed objects between threads.

[[TBCoreDataStore defaultPrivateQueueContext] performBlock:^{

    NSFetchRequest *fetchRequest = [NSFetchRequest fetchRequestWithEntityName:@"MyEntity"];
    fetchRequest.resultType = NSManagedObjectIDResultType;

    NSArray *managedObjectIDs = [[TBCoreDataStore defaultPrivateQueueContext] executeFetchRequest:fetchRequest error:nil];

    NSManagedObjectContext *mainQueueContext = [TBCoreDataStore newMainQueueContext];
    [mainQueueContext performBlock:^{

        for (NSManagedObjectID *managedObjectID in managedObjectIDs) {
            MyEntity *myEntity = [mainQueueContext objectWithID:managedObjectID];
            // Update UI with myEntity
        }
    }];
}];

In this scenario we need to update our UI with a bunch of MyEntity managed objects. For efficiency's sake we perform the costly fetch in the background and set the result type to NSManagedObjectIDResultType, which returns NSManagedObjectIDs. We then create a new mainQueueContext and get each managed object from the row cache by using [mainQueueContext objectWithID:managedObjectID]. These objects are then safe to use on the main thread.

If your fetch is not too intensive, you can just perform it on your new main queue context. If you want to use this stack, I recommend making this snippet:

NSManagedObjectContext *mainQueueContext = [TBCoreDataStore newMainQueueContext];
[mainQueueContext performBlock:^{
    <#code#>
}];

Caveats

When performing extremely intensive fetch operations (10+ seconds) on a background thread while simultaneously needing to perform operations on another thread, we will run into blocking, because both contexts share one persistent store coordinator. To prevent this, we should perform the long-running operation on an entirely new context linked to an entirely new persistent store coordinator. This ensures that the operation stays in the background.
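A minimal sketch of that isolation, assuming the store exposes its model and on-disk store URL (the storeURL accessor here is hypothetical):

NSManagedObjectModel *model = [[TBCoreDataStore defaultStore] managedObjectModel];
NSURL *storeURL = [[TBCoreDataStore defaultStore] storeURL]; // hypothetical accessor

NSPersistentStoreCoordinator *isolatedPSC =
    [[NSPersistentStoreCoordinator alloc] initWithManagedObjectModel:model];
[isolatedPSC addPersistentStoreWithType:NSSQLiteStoreType
                          configuration:nil
                                    URL:storeURL
                                options:nil
                                  error:NULL];

NSManagedObjectContext *isolatedContext =
    [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
isolatedContext.persistentStoreCoordinator = isolatedPSC;

[isolatedContext performBlock:^{
    // Run the very expensive fetch here; it no longer contends with
    // the contexts attached to the shared coordinator.
}];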

Useful utility methods

An extremely useful little one-liner is the ability to turn an NSManagedObjectID into a string. We can use this to store the ID in the user defaults.

@implementation NSManagedObjectID (TBExtras)

- (NSString *)stringRepresentation
{
    return [[self URIRepresentation] absoluteString];
}

@end

The flip side of this is then to get an NSManagedObjectID out of such a string. Add this method to your CoreDataStore:

+ (NSManagedObjectID *)managedObjectIDFromString:(NSString *)managedObjectIDString
{
    return [[[self defaultStore] persistentStoreCoordinator] managedObjectIDForURIRepresentation:[NSURL URLWithString:managedObjectIDString]];
}

With these two methods we have an easy way to build a cache on disk by using a plist. This is useful for saving a list of managed objects which need to be updated or maybe deleted between app launches.
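For example, a sketch of that round trip through NSUserDefaults (the key and myEntity are hypothetical; note that an object needs a permanent ID, obtainable via -obtainPermanentIDsForObjects:error:, before its URI can survive a relaunch):

// Storing:
NSString *objectIDString = [myEntity.objectID stringRepresentation];
[[NSUserDefaults standardUserDefaults] setObject:objectIDString
                                          forKey:@"lastOpenedObjectID"];

// On a later launch:
NSString *savedString = [[NSUserDefaults standardUserDefaults] stringForKey:@"lastOpenedObjectID"];
NSManagedObjectID *objectID = [TBCoreDataStore managedObjectIDFromString:savedString];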

Creating a managed object is a pain, so here is a little method which will make your life better:

@implementation NSManagedObject (TBAdditions)

+ (instancetype)createManagedObjectInContext:(NSManagedObjectContext *)context
{
    NSEntityDescription *entity = [NSEntityDescription entityForName:NSStringFromClass([self class]) inManagedObjectContext:context];
    return [[[self class] alloc] initWithEntity:entity insertIntoManagedObjectContext:context];
}

@end

Finally, while Apple does provide a method to get an NSManagedObject from an NSManagedObjectID, we often want to convert a whole array of IDs into objects. To do this we can use the following:

@implementation NSManagedObjectContext (TBAdditions)

- (NSArray *)objectsWithObjectIDs:(NSArray *)objectIDs
{
    if (!objectIDs || objectIDs.count == 0) {
        return nil;
    }
    __block NSMutableArray *objects = [[NSMutableArray alloc] initWithCapacity:objectIDs.count];

    [self performBlockAndWait:^{
        for (NSManagedObjectID *objectID in objectIDs) {
            if ([objectID isKindOfClass:[NSNull class]]) {
                continue;
            }

            [objects addObject:[self objectWithID:objectID]];
        }
    }];

    return objects.copy;
}

@end
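With this helper, the earlier background-fetch example can hand its array of IDs to a main queue context in one call; a sketch:

NSManagedObjectContext *mainQueueContext = [TBCoreDataStore newMainQueueContext];
NSArray *entities = [mainQueueContext objectsWithObjectIDs:managedObjectIDs];
// Update UI with entities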

What's next?

I've placed the two core data stacks on GitHub.

Mark and Gordon talked about this on Build Phase episode 18.

Custom Ember Computed Properties

Posted 7 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

EmberJS has a lot of features for helping you build a clean JavaScript interface. One of my favorites is the computed property; Ember can watch a property or set of properties and when any of those change, it recalculates a value that is currently displayed on a screen:

fullName: (->
  "#{@get('firstName)} #{@get('lastName')}"
).property('firstName', 'lastName')

Any time the firstName or lastName of the current object changes, the fullName property will also be updated.

In my last project I needed to calculate the sum of a few properties in an array. I started with a computed property:

sumOfCost: (->
  @reduce ((previousValue, element) ->
    currentValue = element.get('cost')
    if isNaN(currentValue)
      previousValue
    else
      previousValue + currentValue
  ), 0
).property('@each.cost')

This works fine but I need to use this same function for a number of different properties on this controller as well as others. As such, I extracted a helper function:

# math_helpers.js.coffee
class App.mathHelpers
  @sumArrayForProperty: (array, propertyName) ->
    array.reduce ((previousValue, element) ->
      currentValue = element.get(propertyName)
      if isNaN(currentValue)
        previousValue
      else
        previousValue + currentValue
    ), 0


# array_controller.js.coffee
sumOfCost: (->
  App.mathHelpers.sumArrayForProperty(@, 'cost')
).property('@each.cost')

This removes a lot of duplication, but I still have the cost property name in the helper call as well as in the property declaration. I also have the 'decoration' of setting up a computed property in general.

What I need is something that works like Ember.computed.alias('name') but allows me to transform the object instead of just aliasing a property:

# computed_properties.js.coffee
App.computed = {}

App.computed.sumByProperty = (propertyName) ->
  Ember.computed "@each.#{propertyName}", ->
    App.mathHelpers.sumArrayForProperty(@, propertyName)

# array_controller.js.coffee
sumOfCost: App.computed.sumByProperty('cost')

This allows me to easily define a 'sum' for a property without a lot of duplication. In this application I have a lot of similar functions for computing information in arrays. Having a single calculation function let me unit test it and feel confident that it would work on any other object. It also greatly simplifies the model or controller for anyone viewing the class for the first time.

Prevent Spoofing with Paperclip

Posted 7 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

Egor Homakov recently brought to my attention a slight problem with how Paperclip handles some content type validations. Namely, if an attacker puts an entire HTML page into the EXIF tag of a completely valid JPEG and names the file "gotcha.html", they could potentially trick users into an XSS vulnerability.

Now, this is kind of a convoluted means of attacking. It involves:

  • A server that's running Paperclip configured to not validate content types or filenames
  • A front-end HTTP server that will serve the assets with a content type based on their file name
  • The attacker must get the user to load the crafted image directly (injecting it in an img tag is not enough)

Even with this list of requirements, it's possible, and so we need to take it seriously.

Content Type Spoof Detection

To combat this, we've released Paperclip 4.0 (and then quickly released 4.1), which has a few new restrictions in order to improve out-of-the-box security. The change that handles this problem directly is an automatic validation that checks uploaded files for content type spoofing. That is, if you upload a JPEG and name it .html, it's not going to get through. This happens automatically during the upload process, and uses the file command in order to determine the actual content type of the file. If you don't have file already (for example, because you're on Windows), you can install the file command separately.
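The detection works because file inspects a file's contents rather than trusting its name. For example, given a hypothetical valid JPEG renamed to gotcha.html:

$ file -b --mime-type gotcha.html
image/jpeg

Since the detected type (image/jpeg) doesn't match what the .html extension implies, the upload is flagged as spoofed.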

Required Content Type or Filename Validations

Next, we're also turning on a new requirement: You must have a content type or filename validation, or you must explicitly opt-out of it.

class ActiveRecord::Base
  has_attached_file :avatar

  # Validate content type
  validates_attachment_content_type :avatar, :content_type => /\Aimage/

  # Validate filename
  validates_attachment_file_name :avatar, :matches => [/png\Z/, /jpe?g\Z/]

  # Explicitly do not validate
  do_not_validate_attachment_file_type :avatar
end

Note that older versions of Paperclip are susceptible to this attack if you don't have a content type validation. If you do have one, then you are protected against people crafting images to perform this type of attack.

The filename validation is new with 4.0.0. We know that some people don't store content types on their models but still need a way to validate uploads. Using the file name can help ensure you're only getting the kinds of files you expect, and all Paperclip attachments have a file name. This allows those users to upgrade without having to implement a possibly costly migration of content type data into their database.

Content Type Mapping

Immediately, some users reported problems with the spoof detection added in 4.0. To fix this, we released 4.1, which added an option called :content_type_mappings that allows you to specify extensions that cannot otherwise be mapped. For example:

Paperclip.options[:content_type_mappings] = {
  :pem => "text/plain"
}

This allows users to upload ".pem" files (public certificates for encryption), because file considers those files to be "text/plain". It tells Paperclip "I consider a .pem file that file calls 'text/plain' to be correct" and the upload will be accepted.

Handling console.log errors

Posted 7 months back at Web On Rails

https://twitter.com/bansalakhil/status/433950675613921280

Announcing Taco Tuesdays: A Product Design Talk Series at thoughtbot SF

Posted 7 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

We're excited to introduce a new talk series focused on product design, hosted by thoughtbot in San Francisco.

On February 25th at 6:30pm, we will host the first Taco Tuesdays event at thoughtbot San Francisco (85 2nd St, Suite 700, 94105).

Our first two speakers are Adam Morse, product designer at Salesforce, and Wells Riley, product designer at KickSend and Hack Design. They will give talks on the topic: "What is a design problem you recently encountered, and how did you approach it?"

Food (in the form of tacos) and beverages will be provided.

RSVP at Eventbrite for free to join us. Hope to see you there!

Function Currying in CoffeeScript

Posted 7 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

Have you had a function that takes two arguments, but you want to pass the second argument in later? Here's one possible example:

updateUsers = (db, users) ->
  _.map(users, (user) -> updateUser(db, user))

updateUser = (db, user) -> db.update("users", name: user.name)

This is messy, and extremely hard to parse mentally. However, we can make this a bit cleaner using function currying. Originally worked out by Haskell Curry, function currying is the act of taking a function that takes multiple arguments and replacing it with a chain of functions that each take a single argument.

Curried functions take advantage of closures to emulate multiple arguments. In some functional languages, such as Haskell or the lambda calculus, curried functions are the only way to pass multiple arguments to functions.

If we curry our updateUser function, our code becomes much more readable.

updateUsers = (db, users) ->
  _.map(users, updateUser(db))

updateUser = (db) -> (user) -> db.update("users", name: user.name)

The resulting JavaScript for our updateUser function will look like this:

var updateUser = function(db) {
  return function(user) {
    return db.update("users", { name: user.name });
  };
};

It's a simple trick, but a prime example of how CoffeeScript's syntax can make certain tasks much cleaner!

Episode #439 - February 11th, 2014

Posted 7 months back at Ruby5

In this episode we cover structuring Sinatra apps with Trevi, REST clients with ActiveRestClient, supporting 12-Factor apps with ENV_BANG, using Foreman to manage services, and a new DSL for creating objects with MooseX.

Listen to this episode on Ruby5

Sponsored by TopRubyJobs

If you're looking for a top Ruby job or for top Ruby talent, then you should check out Top Ruby Jobs. Top Ruby Jobs is a website dedicated to the best jobs available in the Ruby community.
This episode is sponsored by Top Ruby Jobs

Structuring Sinatra Apps with Trevi

Last week, Alex MacCaw posted an article on the Sourcing.io blog which focused on a very opinionated way to develop and structure Sinatra applications. He’s even released a companion gem called Trevi that bundles all of this knowledge up and helps you follow along.
Structuring Sinatra Apps with Trevi

ActiveRestClient

ActiveRestClient is a gem for accessing REST services in an ActiveRecord style. It aims to be a more flexible alternative to ActiveResource. It allows things like setting different endpoints for different REST actions and has additional features like built-in caching.
ActiveRestClient

ENV!

ENV! is a variant for supporting 12-Factor apps, similar to dotenv, but it provides a friendlier onboarding experience for a new application. Where dotenv just loads whatever is in your .env file into ENV, ENV! will fail loudly if required variables are undefined or missing, and it gives you the opportunity to provide helpful messages in that case.
ENV!

Using Foreman to Manage services

Maurício Linhares published an article last week detailing how to use Foreman to isolate and manage application development on OS X machines. He points out that while installing Postgres, for example, is a good thing, you don’t necessarily need it running all the time. The same is true for other application dependencies, like Redis.
Using Foreman to Manage services

MooseX

MooseX is a DSL that helps to make Object Oriented programming in Ruby easier, more consistent, and less tedious. The gem is maintained by Tiago Peczenyj and it's based on Perl's Moose and Moo, two very popular modules in the Perl community. With MooseX you can think more about what you want to do and less about the mechanics of OOP.
MooseX

RubyHeroes

The nominations are open for Ruby Heroes 2014. Head on over to rubyheroes.com, armed with the GitHub usernames of people who have made this past year a pleasure for you to be in the Ruby community.
RubyHeroes

Thank You for Listening to Ruby5

Ruby5 is released Tuesday and Friday mornings. To stay informed about and active with this podcast, we encourage you to do one of the following:

Thank You for Listening to Ruby5

brew leaves

Posted 7 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

No, it's not about tea. We're continuing our rundown of lesser-known Homebrew features with brew leaves. Let's check the brew man page:

leaves   Show installed formulae that are not dependencies of another installed formula.

Or, in more computer science-y terms, it shows you the leaves of the Homebrew dependency graph.

When to use it

brew leaves shows you programs that you can safely uninstall. If you want to clean house, just run brew leaves and happily uninstall:

$ brew leaves | wc -l
45
$ brew leaves
...
leiningen
...
pngcrush
...

We have 45 leaves. We haven't used leiningen in a while, and forgot pngcrush was even installed. Let's uninstall:

$ brew uninstall pngcrush leiningen
$ brew leaves | wc -l
43

We now have 2 fewer leaves. If pngcrush or leiningen were the only things that depended on a third package foo, then uninstalling those two packages would make foo a new leaf, since now nothing depends on foo.

Easily create a Brewfile

Brewfiles are an easy way to install frequently-used Homebrew packages on a new machine. We can easily create a Brewfile using brew leaves:

$ brew leaves | sed 's/^/install /' > Brewfile
$ wc -l Brewfile
42
$ head -3 Brewfile
install aspell
install bison
install colordiff

Now all 42 packages we depend on are neatly listed. One possible concern is that a package will be left out - for example, we use rbenv but it's not in the Brewfile. This is because we also have rbenv-gem-rehash installed, which depends on rbenv, making rbenv not a leaf. Since rbenv-gem-rehash depends on rbenv, installing it will also install rbenv. We're safe.
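If you're ever unsure why a package isn't showing up as a leaf, you can ask Homebrew which installed packages depend on it; for example:

$ brew uses --installed rbenv
rbenv-gem-rehash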

What's next?

You can learn how to start and stop background services in Homebrew. You can also take a deep dive into graph theory.

Announcing gitsh

Posted 7 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

gitsh is a new way to use Git: instead of running Git commands in a general purpose shell like zsh or bash, gitsh provides you with a dedicated shell just for your Git commands.

Many of the early Unix utilities, like dc, didn't take sub-commands the way Git and other modern programs do; instead, they launched a shell. For a program like Git, which has so many commands and options, interacting via a shell still makes a lot of sense, and so gitsh follows in this long Unix tradition.

Save yourself some typing

At its simplest, gitsh saves you from typing the word git over and over.

Git commands are very moreish: you almost never want just one. If you work with Git, this flurry of commands is probably very familiar to you:

$ git status
$ git add -p
$ git commit
$ git push

With gitsh this gets easier:

$ gitsh
gitsh@ status
gitsh@ add -p
gitsh@ commit
gitsh@ push
gitsh@ :exit
$

All of your Git aliases will work in gitsh too, so you can save yourself even more typing.

Deep integration

Now that we're in a dedicated Git shell, there's a lot more it can do than just save us a few keystrokes. gitsh is only concerned with Git, so it has all kinds of little ways to make using Git easier.

What's my status?

Of all the Git commands, I find myself using git status most often. If I'm about to commit, or push, or pull, it's a great way of quickly checking where things are up to.

In gitsh, if you hit return without entering a command, we assume you wanted a status, saving you even more typing and making it really easy to check the status after any command.

If you prefer the more taciturn output of git status -s, or find yourself using a completely different command with annoying regularity, you can always change gitsh's default command by setting the gitsh.defaultCommand variable using git config:

gitsh@ config --global gitsh.defaultCommand "status -s"

Tab completion and Git prompts

In gitsh you automatically get tab completion for commands, branch names, and paths, and the name and status of the current branch in your prompt. For example, if everything is committed and your working directory is clean the prompt is blue and ends with @, but if you have untracked files the prompt is red and ends with !.

It is possible to set up parts of this in bash or zsh, but it can be fiddly to get working, easily broken, and can interact strangely with aliases and third-party Git commands.

Git environment variables

Like most general purpose Unix shells, gitsh also provides environment variables. You can set a variable using the :set command, and read them using a $ prefix:

gitsh@ :set message "A commit message"
gitsh@ commit -m $message

If the variable name contains a dot, it will temporarily override one of your git config settings, until the end of your gitsh session. This is useful when pair programming:

gitsh@ :set user.name "George Brocklehurst & Mike Burns"
gitsh@ :set user.email support+george+mburns@thoughtbot.com
gitsh@ commit -m "We are pair programming!"

Convinced?

If you're on Mac OS X, you can install gitsh via Homebrew:

brew tap thoughtbot/formulae
brew install gitsh

If you're on Linux, there are install instructions in the gitsh README.

Don't forget to check out the man page:

man gitsh

And if you do find a bug, please report it on the gitsh GitHub repo.

How to Evaluate Your Rails JSON API for Performance Improvements

Posted 7 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

Let's say your company's product is a mobile app that gets its data from an internal JSON API. The API, built using Rails, is a few years old. Response objects are large, request latency is high, and your data indicates mobile users aren’t converting because of it.

It can be tempting to immediately dig into your code and look for N+1 queries to refactor. But if you have the time and bandwidth, try to view this as a great opportunity to take a step back and rethink the high-level requirements for your JSON API. Starting with a conversation about the desired functionality of each endpoint will help keep your team's efforts focused on delivering no more than is required by the client, as efficiently as possible.

Grab your team for a whiteboarding session and review your assumptions about the behavior of each API endpoint:

  • How is this endpoint currently being used by the client?
  • What information does the client require for display to the user?
  • What needs to be done on the server side before sending a response to the client?
  • How frequently does the response content change?
  • Why does the response content change?

With the big picture in mind, review your Rails code to identify opportunities for improving performance. In addition to those N+1 queries, keep an eye out for the following patterns:

The response object has properties the client doesn’t use

If you're using #as_json to serialize your ActiveRecord models, it's possible your application is returning more than the client needs. To address this, consider using ActiveModel Serializers instead of #as_json.
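As a sketch (the Post model and its attributes here are hypothetical), a serializer makes the response's shape explicit, so only the fields the client displays are returned:

# app/serializers/post_serializer.rb
class PostSerializer < ActiveModel::Serializer
  # Only the attributes the client actually uses
  attributes :id, :title, :published_at
end

# In the controller:
# render json: @post, serializer: PostSerializer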

The delivery of the response has unnecessary dependencies

Let's say your API has an endpoint the clients uses for reporting analytics events. Your controller might look something like this:

class AnalyticsEventsController < ApiController
  def create
    job = AnalyticsEventJob.new(params[:analytics_event])

    if job.enqueue
      head 201
    else
      head 422
    end
  end
end

Something to consider here is whether the client really needs to know if enqueueing the job is successful. If not, a simple improvement which preserves the existing interface might look something like this:

class AnalyticsEventsController < ApiController
  before_filter :ensure_valid_params, only: [:create]

  def create
    job = AnalyticsEventJob.new(analytics_event_params)
    job.enqueue
    head 201
  end

  private

  def ensure_valid_params
    unless analytics_event_params.valid?
      head 422
    end
  end

  def analytics_event_params
    @analytics_event_params ||= AnalyticsParametersObject.new(
      params[:analytics_event]
    )
  end
end

With these changes, the server will respond with a 422 only when the request parameters are invalid.

Static responses aren't being cached effectively

It's possible your Rails application is handling more requests than necessary. Data which is requested frequently by the client but changes infrequently – the current user, for example – presents an opportunity for HTTP caching. Think about using a CDN like Fastly to provide a caching layer.
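For instance, a minimal sketch (controller and route hypothetical) using Rails' conditional GET support; when the cached copy is still current, the server responds with 304 Not Modified and skips rendering entirely:

class CurrentUsersController < ApiController
  def show
    user = current_user

    # Sets ETag and Last-Modified headers; returns 304 when they match
    # the validators sent by the client (or the caching layer in front).
    if stale?(etag: user, last_modified: user.updated_at)
      render json: user
    end
  end
end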

What's next?

The next step after implementing optimizations for performance is to measure performance gains. You can use tools like JMeter or services like BlazeMeter and Blitz.io to perform load tests in your staging environment.

It's good to keep in mind that through the process of evaluating and improving your Rails application, your team may discover your API is out of date with the needs of the client. You may also see opportunities to move processes currently handled by your Rails application (e.g. persisting and reporting on analytics events) into separate services.

If an API redesign is in order and the idea of non-RESTful routing doesn't make you too uncomfortable, you can explore the possibility of adding an orchestration layer to your API.

Episode #438 - February 7th, 2014

Posted 7 months back at Ruby5

We learn about recursion, a list of deprecated stuff in Ruby, and the value of Rails worst practices.

Listen to this episode on Ruby5

Recursion

Dave Bock was recently on the Ruby Hangout, where he gave a great presentation on recursion for Ruby developers.
Recursion

7 Lines Every Gem's Rakefile Should Have

Ernie Miller published a post showing how to create a rake console task that loads IRB and requires your gem, giving you a console to play around with it.
7 Lines Every Gem's Rakefile Should Have

Token Based Authentication in Rails

Using authenticate_or_request_with_http_token for token-based API authentication.
Token Based Authentication in Rails

A List of Deprecated Stuff in Ruby

Bozhidar Batsov went through the code and built a list of deprecated stuff in Ruby.
A List of Deprecated Stuff in Ruby

The value of Rails worst practices

When interviewing potential Rails developers, Devin found that the quickest way to gauge the experience of a potential hire is to show them some shockingly bad Rails code and ask them what they see.
The value of Rails worst practices

Sponsored by NewRelic

Using their Real User Monitoring feature, they've once again culled the average browser speeds experienced by end users of nearly 3 million application instances and the data doesn’t lie.
NewRelic

Every line of code is always documented

Posted 7 months back at No Strings Attached

Every line of code comes with a hidden piece of documentation. It’s just not immediately visible.

Whoever wrote line 4 of the following code snippet decided to access the clientLeft property of a DOM node for some reason, but do nothing with the result. It’s pretty mysterious. Can you tell why they did it, or is it safe to change or remove that call in the future?

1 // ...
2 if (duration > 0) this.bind(endEvent, wrappedCallback)
3 
4 this.get(0).clientLeft
5 
6 this.css(cssValues)

If someone pasted you this code, like I did here, you probably won't be able to tell who wrote this line, what their reasoning was, or whether it's safe to change or remove it in the future. However, most of the time when working on a project you'll have access to its history via a version control system.

A project’s history is its most valuable documentation.

The mystery ends when we view the commit message which introduced this line:

$ git show $(git blame example.js -L 4,4 | awk '{print $1}')

Fix animate() for elements just added to DOM

Activating CSS transitions for an element just added to the DOM won’t work in either Webkit or Mozilla. To work around this, we used to defer setting CSS properties with setTimeout (see 272513b).

This solved the problem for Webkit, but not for latest versions of Firefox. Mozilla seems to need at least 15ms timeout, and even this value varies.

A better solution for both engines is to trigger “layout”. This is done here by reading clientLeft from an element. There are other properties and methods that trigger layout; see gent.ilcore.com/2011/03/how-not-to-trigger-layout-in-webkit

As it turns out, this line—more specifically, the change which introduced this line—is heavily documented with information of why it’s necessary, why did the previous approach (referred to by a commit SHA) not work, which browsers are affected, and a link for further reading.

Every other line in the project has documentation like this, going back to the first day the project was created. The quality of this documentation, however, relies heavily on how diligent the people involved were about writing good commit messages.

Effective spelunking of project’s history

git blame

I’ve already demonstrated how to use git blame from the command line above. When you don’t have access to the local git repository, you can also open the “Blame” view for any file on GitHub.

A very effective way of exploring a file’s history is with Vim and Fugitive:

  1. Use :Gblame in a buffer to open the blame view;
  2. Press P on a line of blame pane to re-blame at the parent of that commit, if you need to go deeper;
  3. Press o to open a split showing the commit currently selected in the blame pane.
  4. Use :Gbrowse in the commit split to open the commit in the GitHub web interface;
  5. Press C-o in the main buffer to close all other splits when you’re done exploring. Optionally, use :Gedit to reset the buffer to the most recent version in case you did any spelunking with P earlier.

git blame view in vim Fugitive

Find the pull request where a commit originated

With git blame you might have obtained a commit sha that introduced a change, but commit messages don’t always carry enough information or context to explain the rationale behind the change. However, if the team behind a project practices GitHub Flow, the context might be found in the pull request discussion:

$ hub log --merges --ancestry-path --oneline <SHA>..origin | tail
# ...
bc4712d Merge pull request #42 from sticky-sidebar
3f883f0 Merge branch 'master' into sticky-sidebar

Here, a single commit SHA was enough to discover that it originated in pull request #42.

The git pickaxe

Sometimes you’ll be trying to find something that is missing: for instance, a past call to a function that is no longer invoked from anywhere. The best way to find which commits have introduced or removed a certain keyword is with the ‘pickaxe’ argument to git log:

$ git log -S<string>

This way you can dig up commits that have, for example, removed calls to a specific function, or added a certain CSS selector.
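For example, to dig up every commit that added or removed the clientLeft call from the earlier snippet:

$ git log -S'clientLeft' --oneline example.js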

Being on the right side of history

Keep in mind that everything that you’re making today is going to enter the project’s history and stay there forever. To be nicer to other people who work with you (even if it’s a solo project, that includes yourself in 3 months), follow these ground rules when making commits:

  • Always write commit messages as if you are explaining the change to a colleague sitting next to you who has no idea of what’s going on. Per Thoughtbot’s tips for better commit messages:

    Answer the following questions:

    • Why is this change necessary?
    • How does it address the issue?
    • What side effects does this change have?
    • Consider including a link [to the discussion.]
  • Avoid unrelated changes in a single commit. You might have spotted a typo or did tiny code refactoring in the same file where you made some other changes, but resist the temptation to record them together with the main change unless they’re directly related.

  • Always be cleaning up your history before pushing. If the commits haven’t been shared yet, it’s safe to rebase the heck out of them. The following could have become permanent history of the Faraday project, but I squashed it down to only 2 commits and edited their messages to hide the fact I had troubles setting this up in the first place:

    messy git history before rebase

  • Corollary of avoiding unrelated changes: stick to a line-based coding style that allows you to append, edit or remove values from lists without changing adjacent lines. Some examples:

      var one = "foo"
        , two = "bar"
        , three = "baz"   // Comma-first style allows us to add or remove a
                          // new variable without touching other lines
    
      # Ruby:
      result = make_http_request(
        :method => 'POST',
        :url => api_url,
        :body => '...',   # Ruby allows us to leave a trailing comma, making it
                          # possible to add/remove params while not touching others
      )
    

    Why would you want to use such coding styles? Well, always think about the person who's going to git blame this. In the JavaScript example, if you were the one who added and committed the value "baz", you don't want your name to show up when somebody blames the line that added "bar", since the two variables might be unrelated to the change.

rcm, an rc file manager

Posted 7 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

We have built a suite of tools for managing your rc files.

The rcm suite of tools is for managing dotfiles directories. This is a directory containing all the .*rc files in your home directory (.zshrc, .vimrc, and so on). These files have gone by many names in history, such as "rc files" because they typically end in rc or "dotfiles" because they begin with a period. Creative, I know.

It's a unification of the existing shell scripts, make targets, rake tasks, GNU Bash constructions, and Python hacks that people copy and paste into their dotfiles repo, with a classical unix flair.

Here's a very quick example:

% lsrc
/home/mike/.zshrc:/home/mike/.dotfiles/zshrc
% rcup
linking /home/mike/.zshrc

This blog post demonstrates the features, but you may want to install rcm and run through the tutorial too.

Build on it

Once unified, we extended the suite with support for sharing rc files via host-specific files, tags, multiple dotfile directories, and hooks.

A little something for the sysadmins out there, host-specific files automate the configuration you need to do on each host. Maybe the computer jupiter needs a .gitconfig but the computer mars needs a .mailrc. You'd put the .gitconfig in host-jupiter/gitconfig, and the .mailrc in host-mars/mailrc, and our suite takes care of the rest.

The next step up from host-specific is tagging: tag the .mailrc file as mailx (tag-mailx/mailrc) and .gitconfig as git (tag-git/gitconfig), and install and uninstall the tags as needed:

% rcup -t mailx
% rcdn -t mailx

Tagging is great for teams sharing the same dotfiles repo, but we can do better. While some of us come to computers with a blank slate, others come with well over a decade of fine-tuned rc files. Let them combine dotfiles repos, preferring theirs:

% rcup -d personal-dotfiles -d thoughtbot-dotfiles

While automating things we noticed that some things require setup. For example, after linking the .vimrc you need to run :BundleInstall. This is why we added hooks, such as the one in the thoughtbot dotfiles as hooks/post-up:

#!/bin/sh

if [ ! -e $HOME/.vim/bundle/vundle ]; then
  git clone https://github.com/gmarik/vundle.git $HOME/.vim/bundle/vundle
fi
vim -u $HOME/.vimrc.bundles +BundleInstall +qa

Automate it

We make it easier to add something to your dotfiles, too. This is great for getting started, but it's also great for experimentation. For example, add your .cshrc to the openbsd tag:

% mkrc -t openbsd .cshrc

Or get fancy by adding a host-specific file to the dotfiles repo you share with your brunch friends:

% mkrc -o -d the-brunch-dotfiles .rcrc

Configure it

Given the power, we had to make an rc file for our rc files. Enter .rcrc.

The simplest things to configure are your tags and source directories:

TAGS="openbsd mailx gnupg"
DOTFILES_DIRS="~/.dotfiles /usr/local/share/global-dotfiles"

Some files should never be symlinks:

COPY_ALWAYS=weechat/*

And some files should be excluded:

EXCLUDES=global-dotfiles:python*

This means a normal rcup will do the right thing, without thinking hard about what you have configured, which machine you're on, or what has changed in your shared repos.

The .rcrc file is perfect as a host-specific file in your personal dotfiles repo:

% mkrc -o .rcrc

Read about it

Since this is a unix tool, we treat it like a unix tool. Read the full tutorial in the rcm(7) manpage, read about each individual tool (with examples) in the respective lsrc(1), rcup(1), rcdn(1), and mkrc(1) manpages, and the full configuration file is in the rcrc(5) manpage.

The traditional whatis command will jog your memory:

% whatis rcm
rcup (1)             - update and install dotfiles
rcdn (1)             - remove dotfiles
lsrc (1)             - show configuration files
mkrc (1)             - bless files into a dotfile
rcrc (5)             - configuration for rcm
rcm (7)              - dotfile management

Install anywhere

The rcm suite is written in POSIX sh, available out of the box on BSD, GNU, OS X, and many other systems. We do our best to keep it portable.

The source package can be installed using GNU autotools, as is typical for many projects:

% configure
% gmake
% gmake install

But it gets easier on Arch and Debian, which are supported by their native package managers. Check our installation instructions for the details on those.

We also support OS X using Homebrew from our new thoughtbot tap:

% brew tap thoughtbot/formulae
% brew install rcm

Watch that tap for other tools for command line champions.

Get started quickly

Instead of inventing something new, we decided to codify existing practices. If you have a dotfiles repo much like ours—one where all the normal files should be symlinked as dotted files in your home directory—you can get started immediately:

% lsrc -d ~/dotfiles
% rcup -v -d ~/dotfiles

If you have no dotfiles repo yet, you can get started instantly:

% mkrc .zshrc .vimrc

We also cover special cases in our tutorial.

Let's build this

Please share your feedback on GitHub. Together we can build the greatest rc file management suite.

Back to Basics: HTTP Requests in Rails Apps

Posted 8 months back at GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS - Home

The Hypertext Transfer Protocol specifies how a client machine requests information or actions from a server. Each such exchange is called a request, and requests are composed of several parts which I'll outline below.

The first line of an HTTP request is called the Request-Line. It contains the request method and the URI the request acts on, followed by the protocol version; the request header fields and an optional message body follow on subsequent lines.

Let's take a closer look at these four elements: the URI, the header fields, the message body, and the method.

URI

A URI or Uniform Resource Identifier is how objects are identified. Clients use URIs to tell the server what object to act on for a given request. In more general terms a URI is nothing more than a web address.

Request Header Fields

A client can communicate additional information to the server via a request's headers. In addition to communicating information about the client, e.g. what type of browser the request originated from, these values can modify how the server responds to the request. For example, the Content-Type header value is used by the Rails framework to decode a request's message body.

Message Body

The message body is used to send information from the client to the server about the entity the client wants to modify or create. The message can be communicated using several different encoding mechanisms, some of which I'll discuss below. It is important to note that not all of the requests I discuss below are allowed to have message bodies.

Method

The method, sometimes called the "verb" or "action", tells the server what the client wants it to do. There are many different methods available, but we're going to limit this post to the four that are most relevant to Rails developers.

  • GET - how a client machine tells a server that it wants information about the item identified by the URI. Because GET requests are all about asking for information, they are not permitted to have request bodies. You still have the URI query string available to you if you need to send data from the client to the server on a GET request.
  • POST - how a client tells a server to add an entity as a child of the object identified by the URI. The entity that the client expects the server to add is transmitted in the request body.
  • PATCH - how a client tells a server it wants to modify an object identified by the URI the request is sent to.
  • DELETE - as you might guess, how a client tells a server to remove an object identified by the URI the request is sent to.

Let's make some requests

cURL

cURL is a utility that makes it possible to send requests from the command line. We'll use cURL to make some requests to a test Rails app which responds with strings.

Routes:

BackToBasics::Application.routes.draw do
  match '/curl_example' => 'request_example#curl_get_example', via: :get
  match '/curl_example' => 'request_example#curl_post_example', via: :post
end

Controller:

class RequestExampleController < ActionController::Base
  def curl_get_example
    render text: 'Thanks for sending a GET request with cURL!'
  end

  def curl_post_example
    render text: "Thanks for sending a POST request with cURL! Payload: #{request.body.read}"
  end
end

First we'll make a GET request. We tell cURL to make a GET request (which is actually the default) with the -X option.

% curl -X GET http://localhost:3000/curl_example
Thanks for sending a GET request with cURL!

Rails server log:

Started GET "/curl_example" for 127.0.0.1 at 2013-06-21 14:38:22 -0700
Processing by RequestExampleController#curl_get_example as */*
  Rendered text template (0.0ms)
Completed 200 OK in 1ms (Views: 0.3ms | ActiveRecord: 0.0ms)

As you can see, our Rails app receives the GET request we sent from the terminal and then responds with the string we provided in the controller. The app uses the request's URI and Method to figure out which controller and action to call.

Next we'll make a POST request with a data payload. Again, we use -X to specify the method. We also use -d to specify the data to send in the payload.

% curl -X POST -d "backToBasics=for the win" http://localhost:3000/curl_example
Thanks for sending a POST request with cURL! Payload: backToBasics=for the win

Rails server log:

Started POST "/curl_example" for 127.0.0.1 at 2013-06-21 14:47:37 -0700
Processing by RequestExampleController#curl_post_example as */*
  Parameters: {"backToBasics"=>"for the win"}
  Rendered text template (0.0ms)
Completed 200 OK in 0ms (Views: 0.3ms | ActiveRecord: 0.0ms)

The Rails app receives our request. This time, in addition to the URL, it logs the data payload as a hash of parameters.

Web Browsers

We're all familiar with surfing the web. I'm sure it comes as no surprise that this experience is made up of a series of requests and responses. Let's take a look at what's happening when we type a URL into our browser's address bar and hit enter. We'll also look at what happens when we enter data into a form and submit it.

Below is the demo controller we'll be sending requests to for this example.

Routes:

BackToBasics::Application.routes.draw do
  root to: 'request_example#index'
  match '/request' => 'request_example#create', via: :post
  match '/request' => 'request_example#create', via: :get
end

Controller:

class RequestExampleController < ActionController::Base
  def index
  end

  def create
    render json: params
  end
end

Address Bar

Typing a URL into the address bar of a web browser sends a GET request to the URL specified. Sending a GET request in this fashion looks identical to sending a GET request through the terminal with cURL.

If we set the root path of our demo app to the index action of our dummy controller and navigate to localhost:3000 the browser will send the GET request. If we take a look at the Rails console we'll notice the output is almost identical to what we saw with cURL.

Started GET "/" for 127.0.0.1 at 2014-01-31 15:09:53 -0800
Processing by RequestExampleController#index as HTML
  Rendered request_example/index.html.erb (1.2ms)
Completed 200 OK in 26ms (Views: 25.5ms | ActiveRecord: 0.0ms)

Forms

A form is made up of several key parts (we'll look at several simple examples a bit later). In the opening form tag we have the action attribute. This attribute tells the form where to send the request. In addition to the action attribute we have the method attribute. This tells the form what type of request to send to the URI specified in the action attribute.

Request bodies are defined by a form's markup. In the form tag there is an attribute called enctype; this attribute tells the browser how to encode the form data. There are several different values this attribute can have. The default is application/x-www-form-urlencoded, which tells the browser to encode all of the values. If a form includes a file upload, an enctype of multipart/form-data should be used; this encodes none of the values. Finally, you can set the enctype to text/plain, which converts spaces but leaves all other characters unencoded.

Inside the form element we have input elements. These elements render as assorted input types on our website. Each input element in the form should have a name attribute, which tells the browser what to name the data from that input in the message body. The type attribute tells the browser in what format to communicate the data in the message body. There is more to enctype and input types than we can cover here.
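For instance, a file-upload form (with a hypothetical /upload endpoint) has to opt into multipart encoding so the file's bytes survive the trip:

<form action="/upload" method="POST" enctype="multipart/form-data">
  <input type="file" name="attachment">
  <input type="submit">
</form>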

GET

Let's take a look at how we could send a GET request with a form.

The simple form below is made up of one text field, and that text field has the name my_data. The final input is the submit button, which tells the form we actually want to send the request to the URI specified in the action attribute.

Let's send a request. Assume a user has navigated to a page and the following form has been rendered. In the text box named my_data the user has entered the string "back to basics" and clicked the submit button.

<form action="/request" method="GET">
  <input type="text" name="my_data">
  <input type="submit">
</form>

Rails server log:

Started GET "/request?my_data=back+to+basics" for 127.0.0.1 at 2013-06-21 14:44:25 -0700
Processing by RequestExampleController#create as HTML
  Parameters: {"my_data"=>"back to basics"}
Completed 200 OK in 0ms (Views: 0.2ms | ActiveRecord: 0.0ms)

There are several interesting things about this request. You'll notice that our Rails app received the request, and that the URL includes a query parameter called my_data. This is the result of our decision to use the GET method for this request. Because GET requests have no payloads, the data we collected with our form is added to the URI as a query string. In addition, you'll notice that our text input ends up under the name my_data in that query string: the value we specified with the name attribute of our text input.

We can open the Network tab in our developer tools (Firefox or Chrome) and see that the data was added to the query string. This is because the form's method is GET.

POST

POST works almost identically to GET, with the exception of the payload. Let's submit our form with the same text input and see what happens this time.

<form action="/request" method="POST">
  <input type="text" name="my_data">
  <input type="submit">
</form>

Rails server log:

Started POST "/request" for 127.0.0.1 at 2013-06-21 14:49:11 -0700
Processing by RequestExampleController#create as HTML
  Parameters: {"my_data"=>"back to basics"}
Completed 200 OK in 0ms (Views: 0.2ms | ActiveRecord: 0.0ms)

The first thing to notice is that our URL no longer contains the query parameter. It's also important to note that our parameters hash looks identical. Let's take a look at our network tab and see if we can learn anything about the request.

Examining the request we see that our payload includes what's called form data. Our input elements are converted to a request payload and sent to the server. Again the name of our text input element is used as the name associated with the user's text input.

XMLHttpRequest

It's also possible to send requests via JavaScript. There is nothing special about these requests from a mechanical perspective. They're just requests like the ones we've sent above.

Because these requests are sent with JavaScript, in order to see what happens we'll have to provide a function that deals with the response. We'll use a simple function that will write whatever response we get to the JavaScript console.

function callback () {
  console.log(this.responseText);
};

Ajax Form Data

When sending our Ajax requests we have several different options as to how we want to send the data. As we saw above there is a concept of form data. We can easily create a form data payload using only JavaScript.

var request = new XMLHttpRequest();
request.onload = callback;
request.open("post", "http://localhost:3000/request");
var formData = new FormData();
formData.append('my_data', 'back to basics');
request.send(formData);

Rails server log:

Started POST "/request" for 127.0.0.1 at 2013-06-21 14:53:03 -0700
Processing by RequestExampleController#create as */*
  Parameters: {"my_data"=>"back to basics"}
Completed 200 OK in 0ms (Views: 0.1ms | ActiveRecord: 0.0ms)

As is obvious from our log, this request was handled no differently than a "normal" form submission by our Rails application. There is no special magic about an Ajax request as far as our server is concerned.

Ajax JSON Data

Another option we have is to use JSON. In order to do this we need to slightly modify our request headers and tell our server that it needs to do something slightly different to parse our payload.

var request = new XMLHttpRequest();
request.onload = callback;
request.open("post", "http://localhost:3000/request");
request.setRequestHeader("Content-Type", "application/json");
request.send('{"my_data":"back to basics"}');

Rails server log:

Started POST "/request" for 127.0.0.1 at 2013-06-21 14:55:55 -0700
Processing by RequestExampleController#create as */*
  Parameters: {"my_data"=>"back to basics", "request_example"=>{"my_data"=>"back to basics"}}
Completed 200 OK in 0ms (Views: 0.2ms | ActiveRecord: 0.0ms)

As you can see, by simply modifying our request headers, our Rails app is able to appropriately parse our payload, and we end up with our my_data value available to our application.

Requests are one of the foundational elements of the internet as we know it. Understanding the individual elements of a request can make it much easier to debug issues with our Rails apps.

Episode #437 - February 4th, 2014

Posted 8 months back at Ruby5

Token Based Authentication, Recommundle, git_pretty_accept, PStore, Practicing Ruby, and RailsBricks 2, all in this episode of Ruby5!

Listen to this episode on Ruby5

Sponsored by Top Ruby Jobs

If you're looking for a top Ruby job or for top Ruby talent, then you should check out Top Ruby Jobs. Top Ruby Jobs is a website dedicated to the best jobs available in the Ruby community.
This episode is sponsored by Top Ruby Jobs

Token Based Authentication in Rails

This week our very own Carlos Souza wrote up a blog post about how to use Token Based Authentication in your Rails app.
Token Based Authentication in Rails

Recommundle

Chris Tonkinson released recommundle, a recommendation engine for Gemfiles. You upload your project's Gemfile and it recommends gems that it thinks you might be interested in checking out.
Recommundle

git_pretty_accept

George Mendoza released the git_pretty_accept gem this week, which automates his team's preferred method of accepting GitHub pull requests to keep their project history readable.
git_pretty_accept

Persisting data in Ruby with PStore

Rob Miller wrote up a blog post about how to persist data in Ruby in situations where using a database might seem like overkill.
Persisting data in Ruby with PStore

Practicing Ruby journal moves to open-access

This week Gregory Brown of Prawn fame announced that he's giving open access to 68 articles from the Practicing Ruby journal.
Practicing Ruby journal moves to open-access

RailsBricks 2

Nico Schuele dropped us an email to let us know about RailsBricks 2. This new version is 100% Ruby, no longer contains any Bash commands, and includes a test framework.
RailsBricks 2