Crazy Little Hacks

Some little hacks and random thoughts on what interests me at the moment in the area of computer science.

When to Use Cassandra’s OPP

First off, let’s brush up on what the token ring and the partitioners are, so that we are on the same page.

Token Ring

Cassandra’s data is distributed across a cluster in the form of a ring. This ring starts at position 0 and runs up to position 2^127. These two positions and all of the ones between them are known as tokens, and each node in the cluster is responsible for a certain range of tokens. Here is an example where each node is responsible for 25% of the token ring, borrowed from the Datastax documentation:

[Figure: a token ring shared by four nodes, each responsible for 25% of the tokens]
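To make the ownership idea concrete, here is a minimal Ruby sketch of a four-node ring where each node owns a quarter of the tokens. The node names and the owner_of helper are made up for illustration, and the rule that a node owns the range ending at its own token is a simplifying assumption on my part, not Cassandra’s actual code.

```ruby
# Illustrative model of a token ring; every name here is hypothetical.
RING_SIZE = 2**127

# Four evenly spaced nodes; each one owns the quarter of the ring that ends at its token.
NODES = {
  "node1" => 0,
  "node2" => RING_SIZE / 4,
  "node3" => RING_SIZE / 2,
  "node4" => 3 * RING_SIZE / 4
}

# Returns the name of the node responsible for a token: the first node whose
# token is >= the given one, wrapping around to the lowest node otherwise.
def owner_of(token)
  sorted = NODES.sort_by { |_, node_token| node_token }
  name, _ = sorted.find { |_, node_token| token <= node_token } || sorted.first
  name
end

puts owner_of(0)
puts owner_of(RING_SIZE / 4 + 1)
puts owner_of(RING_SIZE - 1)
```

The last call shows the wrap-around: a token past the highest node goes back to the node with the lowest token, which is what makes it a ring.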

Cassandra Partitioners

There are two basic types of partitioners that come out of the box with Cassandra: the Random Partitioner (RP) and the Order Preserving Partitioner (OPP). Their job is simply to decide which position on the token ring, and therefore which node, each row is assigned to. If you use RP, the MD5 hash of the row key is used to map the row onto the ring. If you use OPP, on the other hand, the actual value of the row key is truncated so that it fits in the ring’s range and the row is mapped accordingly.
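Here is a rough Ruby sketch of the two mappings. The random_token and ordered_token helpers are hypothetical names, and both are simplifications: folding the MD5 digest with a modulo and reading the first 16 bytes of the key as a number are my assumptions for illustration, not what Cassandra does internally.

```ruby
require "digest"

RING_SIZE = 2**127

# RP-style token: the MD5 digest of the row key, read as an integer and
# folded into the ring with a modulo (a simplification).
def random_token(row_key)
  Digest::MD5.hexdigest(row_key).to_i(16) % RING_SIZE
end

# OPP-style token (simplified): the first 16 bytes of the key read as one big
# integer, shifted down one bit so it fits below 2**127. Keys that sort next
# to each other end up with neighbouring tokens.
def ordered_token(row_key)
  bytes = row_key.bytes.first(16)       # truncate long keys
  bytes += [0] * (16 - bytes.size)      # right-pad short keys with zero bytes
  bytes.inject(0) { |acc, byte| (acc << 8) | byte } >> 1
end

%w[user:1001 user:1002 user:1003].each do |key|
  puts "#{key}  RP=#{random_token(key)}  OPP=#{ordered_token(key)}"
end
```

The OPP tokens come out in the same order as the keys, while the RP tokens have no visible relation to them.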

When to use OPP

Well, the blunt answer is never. Since OPP maps the row keys directly, it is very prone to creating hotspots (frequently accessed data, or simply a large share of the data, ending up on the same node), which may and probably will destroy all your hopes and dreams of scalability. And now you ask: but what if I want to perform range queries, is that impossible?
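A small Ruby experiment shows the hotspot effect. It places 1000 keys that share a common prefix on a ring split into four equal ranges, once with an MD5-based token and once with an order-preserving one. All helper names are hypothetical and both token functions are simplified models, so take it as a sketch of the idea rather than a measurement of Cassandra.

```ruby
require "digest"

RING_SIZE = 2**127
NODE_COUNT = 4

# RP-style token: MD5 of the row key folded into the ring (simplified).
def random_token(row_key)
  Digest::MD5.hexdigest(row_key).to_i(16) % RING_SIZE
end

# OPP-style token (simplified): first 16 bytes of the key as one big integer,
# shifted down one bit so it fits below 2**127.
def ordered_token(row_key)
  bytes = row_key.bytes.first(16)
  bytes += [0] * (16 - bytes.size)
  bytes.inject(0) { |acc, byte| (acc << 8) | byte } >> 1
end

# Counts how many keys land in each of four equally sized token ranges.
# The block turns a key into its token.
def keys_per_node(keys)
  counts = Array.new(NODE_COUNT, 0)
  keys.each { |key| counts[yield(key) * NODE_COUNT / RING_SIZE] += 1 }
  counts
end

keys = (1..1000).map { |i| format("user:%04d", i) }

rp_counts  = keys_per_node(keys) { |key| random_token(key) }
opp_counts = keys_per_node(keys) { |key| ordered_token(key) }

puts "RP:  #{rp_counts.inspect}"
puts "OPP: #{opp_counts.inspect}"
```

Because every key starts with user:, all the order-preserving tokens fall into the same range and a single node ends up holding everything, while the hashed tokens are spread over all four.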

The answer is that of course it is possible: just use RP and index your row keys. The strategy is to have a Column Family where each row is named after another Column Family and its columns are that Column Family’s row keys. Since RP only applies to row keys, the columns within a row stay ordered.

Your data model will look something like this:

[Figure: diagram of the index Column Family and the Column Family it indexes]

Your actual data will then be randomly distributed across the cluster, and if you need to perform a range query you just have to get the range from the index columns first and then multiget the data. It is true that this introduces one more step for each range query, but this is a much smaller price than the one you pay with OPP, since now your database will actually scale, which means you can add more machines if it does not perform as well as you wished.
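The two-step range query can be sketched with plain Ruby hashes standing in for the Column Families. The Users family, its contents and the keys_in_range and multiget helpers are all hypothetical; a real client library would offer its own calls for slicing columns and fetching several rows at once.

```ruby
# In-memory stand-in for the data Column Family, which RP would scatter
# across the cluster in no particular order.
users = {
  "carol" => { "email" => "carol@example.com" },
  "alice" => { "email" => "alice@example.com" },
  "dave"  => { "email" => "dave@example.com" },
  "bob"   => { "email" => "bob@example.com" }
}

# The index Column Family: one row per indexed Column Family, whose column
# names are that family's row keys. Column ordering is simulated by sorting.
index = { "Users" => users.keys.sort }

# Step 1: slice the index row to get the row keys in the requested range.
def keys_in_range(index, family, first, last)
  index.fetch(family).select { |key| key.between?(first, last) }
end

# Step 2: "multiget" the actual rows for those keys.
def multiget(rows, keys)
  keys.map { |key| [key, rows.fetch(key)] }
end

wanted = keys_in_range(index, "Users", "alice", "carol")
multiget(users, wanted).each { |key, row| puts "#{key}: #{row['email']}" }
```

The range is resolved entirely inside one ordered index row, and only then are the rows themselves fetched by key.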

DISCLAIMER: This post reflects my honest opinion, but it is nevertheless a personal opinion.

My First Cup of Coffee

Lately I’ve been doing and learning so much that I haven’t had the time to sit down and put those ideas into “paper”. Therefore, here is my first post (of hopefully many) concerning stuff I’ve been looking at recently.

Using JavaScript while doing web development nowadays is pretty much a no-brainer, and on top of that you’ll probably use some kind of framework such as jQuery due to browser compatibility, ease of coding or pure awesomeness. Still, even though these frameworks may add some great functionality, most of them will not extend the language to make it more beautiful and less cluttered with punctuation you do not actually need. This is where CoffeeScript comes in!

I’m not going to provide an extensive tutorial on CoffeeScript (there are far too many of those around) or list why it is great (watch Sam Stephenson’s talk for that). This is just a log of a code transformation from plain old jQuery to bright and shiny CoffeeScript.

So, enough with the talk and on to the code! Here is what I started with, an actual JavaScript file from an actual application:

$(document).ready(function($){
  $("th.ec-month-nav").live('click',function(event){
    var query_params = $(this).find("a").attr("href").split("?")[1];

    $('div#event-calendar').load("/calendar?"+query_params,function(){
      $('div.events').parent().addClass("has-events");
    });

    event.preventDefault();
  });
});

From my (not so great) knowledge of CS, the first things that stand out as removable are the semicolons, braces, parentheses and function declarations. The var keyword has to go as well, since CS does not accept it and scopes variables automatically. That results in the following:

$ ->
  $("th.ec-month-nav").live 'click', (event) ->
    query_params = $(this).find("a").attr("href").split("?")[1]

    $('div#event-calendar').load "/calendar?"+query_params, ->
      $('div.events').parent().addClass "has-events"

    event.preventDefault()

Hum… Looks pretty and it is code that actually works. But we are not quite finished yet, there are two more changes that can be made. First, we can move the query_params variable into the string, since CS supports Ruby-like string interpolation. Lastly, we can switch this to its CS counterpart @ and we are done.

$ ->
  $("th.ec-month-nav").live 'click', (event) ->
    query_params = $(@).find("a").attr("href").split("?")[1]

    $('div#event-calendar').load "/calendar?#{query_params}", ->
      $('div.events').parent().addClass "has-events"

    event.preventDefault()

This was my first time coding in CS, which means that there might be some other changes to be made in this example, but I am really happy with the way it looks. Hope I have the time to write anything else soon.

Until then, happy coding! ;)

Adding a Method to All Records

If there are some methods you wish all of your Active Record instances would have, the easiest way to add them is to monkey patch. Here is how to add a random method that retrieves a random record from a table:

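# Class-level helpers to be mixed into ActiveRecord::Base (Rails 2.x finder syntax).
module ActiveRecordBaseExtension
  # Returns one record picked at a random offset, or nil when the table is empty.
  def random
    find :first, :offset => (count * rand).to_i
  end
end

# extend (rather than include) makes random a class method on every model.
ActiveRecord::Base.extend ActiveRecordBaseExtension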

This code can be appended to lib/rails_extension.rb which should be required in the config/environment.rb file:

require 'lib/rails_extension.rb'
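If you are wondering why extending the base class is enough to reach every model, here is a Rails-free sketch of the same mechanism. Base, Post, Comment and RandomPick are hypothetical stand-ins, and the arrays play the role of tables, so this only illustrates how extend behaves in plain Ruby.

```ruby
# Plain-Ruby stand-ins so the example runs without Rails.
module RandomPick
  # Becomes a class method of whatever class extends this module.
  def random
    all[(all.size * rand).to_i]
  end
end

class Base
  # Each subclass overrides this with its own "table".
  def self.all
    []
  end
end

class Post < Base
  def self.all
    %w[first-post second-post third-post]
  end
end

class Comment < Base
  def self.all
    %w[nice thanks]
  end
end

# Extending the base class after the subclasses exist still reaches them,
# because class methods are looked up through the superclass chain.
Base.extend RandomPick

puts Post.random
puts Comment.random
```

Inside random, self is the class the method was called on, so Post.random picks from the posts and Comment.random from the comments.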

Creating Emails With Rails

By default, Rails sends emails with the MIME type text/plain, but sometimes you might wish to make your emails look better by adding links or images, and for that you need your mail to have the MIME type text/html.

The easiest way to do this would be to change the content type in the notifier.rb file, but if you do that, email readers that don’t support HTML will not show your email correctly.

So how do you get the best of both worlds with as little effort as possible? Fear not my friends, for RoR’s convention over configuration policy has come to the rescue yet again!

All you need to do is create two files in your notifier views, one named whatever_you_want.text.html.erb and another named whatever_you_want.text.plain.erb, and Rails’ ActionMailer will take care of the rest for you. Sweet!

PS: This is for Rails 2.x; in Rails 3.x the names would be whatever_you_want.text.erb and whatever_you_want.html.erb.
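One way to read the Rails 2.x file names is "action name, then MIME type, then template engine". The describe_template helper below is a hypothetical illustration of that reading, not how ActionMailer actually parses template names, and welcome is just an example action.

```ruby
# Hypothetical helper: splits a Rails 2.x style mailer template name into the
# action it belongs to and the MIME type of the part it would render.
def describe_template(filename)
  action, type, subtype, _engine = filename.split(".")
  { :action => action, :content_type => "#{type}/#{subtype}" }
end

%w[welcome.text.html.erb welcome.text.plain.erb].each do |name|
  info = describe_template(name)
  puts "#{name} -> #{info[:action]} as #{info[:content_type]}"
end
```

Both files belong to the same action, which is why they can end up as alternative parts of one email.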