Kulpreet aka Jungly

Urgh.. Why Are People Still Using ActiveRecord Observers?

Came across some source code where models are using observers.

That is kind of OK. If you like them, and they fulfill a need, it is all good.

But if all that the observers are doing is tracking updates to the models they observe, are they really needed?

Here’s what I saw recently:

class Profile < ActiveRecord::Base
end

class ProfileObserver < ActiveRecord::Observer
  observe :profile

  def after_update(profile)
    profile.update_column(:change_attribute, :to_a_value)
  end
end

I would make after_update a callback on Profile itself.

If someone really wants separation of concerns and to move callbacks out of the model, then maybe use Rails concerns!


JCrop and Paperclip

Messed about with JCrop, which is an awesome plugin for capturing user-cropped image uploads.

Was using it as a front end for user uploads of images with the excellent Rails plugin - Paperclip.

There were a couple of nasty gotchas that ate into my time.

Paperclip and setting active record attributes

Apparently, it is best to set the crop_* attributes before you set the attachment attribute.

I was expecting Paperclip to kick in when you save the record, or after validation. However, it happens just as you set the attachment attribute.

Found an explanation of why this happens in an issue on Paperclip.

So don’t do

User.create(params[:user].permit(:email, :password, :attachment))

Instead do

user = User.create(params[:user].permit(:email, :password))
user.attachment = params[:user][:attachment]
user.save

JCrop’s trueSize

Definitely need to use either boxWidth and boxHeight, or trueSize.

I used trueSize because we had already set the box height and width to match the design.

Wal-e for Managing Postgres WAL Backups



What I like:

  1. Smarts built in to find the last backup and use that number to delete older WAL segments.

  2. Really active community


What I don’t like:

  1. Python and all sorts of other dependencies. I don’t like my postgres.conf having dependencies on libraries that have to be installed using not-so-robust package managers like easy_install or pip. [I do a lot more with Ruby, so Python’s package managers still seem alien to me.]

  2. Keeping N backups is still a todo.

  3. Documentation on how to switch from an existing S3/WAL backup setup to wal-e is missing.

Switching from an existing setup

I used to do the following

# from the shell, start a base backup
psql my_database -c "SELECT pg_start_backup('some label');"

# in postgresql.conf
archive_command = '/var/lib/postgresql/8.4/main/s3test %f && s3cmd -c /home/ubuntu/.s3cfg put --acl-private %p s3://pg_archive/%f'

In the above, s3test is

#!/bin/bash
# exit 1 if WAL segment $1 is already archived on S3, 0 otherwise
xxx=`s3cmd -c /home/ubuntu/.s3cfg ls s3://pg_archive/$1`
res=`expr "$xxx" : ".*s3://pg_archive/$1$"`
if [ "$res" -gt 0 ]; then
    exit 1
else
    exit 0
fi

The switch is basically a hack, as I haven’t found much help on the best way to switch directories.

Should I change the archive_command, restart the server and then run the first backup-push?

Instead, I am running with the following steps

  1. Let the old archive_command (as shown above) run as it does.

  2. Run a wal-e backup-push. This will create a directory called basebackups_NNN under the s3-prefix path you specify.

  3. As soon as backup-push returns, restart the database to pick up the new archive_command.
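For reference, a rough sketch of the wal-e pieces involved in the steps above - the envdir path and the data directory are assumptions based on wal-e’s README, so adjust for your setup:

```
# postgresql.conf - hand each completed WAL segment to wal-e
archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'

# push a fresh base backup under the configured S3 prefix
envdir /etc/wal-e.d/env wal-e backup-push /var/lib/postgresql/8.4/main

# list base backups to confirm the push worked
envdir /etc/wal-e.d/env wal-e backup-list
```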

I’d really recommend finding out more about this first. But this is what I am doing for now.

There has to be a tool that is easier to set up and better documented. But I am using wal-e for now.

Dropbox API, a SCRUM Taskboard, and Offline Application Framework?

Offline Apps and Sharing Data

What about having an offline application that saves your data using offline storage, but syncs to a server so that the same ‘offline storage’ can be edited by two or more users?

The idea is simple.

  1. Load an app from a service that stores and serves files - S3? Dropbox? Anything you fancy. That is where you get the ‘app’ from.

  2. The app is essentially an HTML/JS page that allows you to create and edit data.

  3. The app saves your data in local offline storage - which seems to have a limit of 5MB. So of course this approach will only work for apps that keep data consumption under 5MB.

  4. Finally, on issuing a ‘sync’ command, the offline storage is saved back to a file server. It could be S3, Dropbox or a plain simple FTP server.

  5. Security will be an issue, as credentials for FTP or S3 will have to be saved in plain text in the application page.

Dropbox Apps to the rescue

I think the folks at Dropbox are really up to something.

While looking around at the current state of the art for this approach to building offline applications, I found that the Dropbox team has been busily building something that works marvelously well. They call these apps Dropbox Apps.

Using the Dropbox Core API from their Javascript client library, everything seems so simple and straightforward.

No offline storage

You even start to wonder if you need to use offline storage at all. Just update the Dropbox files and let the Dropbox daemon sync them up with your team.


I do think it’ll be cool to use a lock file so that multiple users can’t edit the app at the same time. If we really need concurrent edits, then we need to start thinking about merging, and probably using Git JS libs to do the syncing. All too complicated for people who are simply interested in, say, a SCRUM taskboard.

Having start editing -> edit -> save semantics will let us check for the lock file, and if two people do manage to get locks, then the one with the earlier version wins. The Dropbox Core API lets us get the versions and timestamps for files.
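A toy sketch of that conflict rule in plain Ruby (the names are invented; a real version would read the revision from the Dropbox Core API’s file metadata):

```ruby
# Each editor records the file revision it started editing from.
# If two locks exist, the editor who started from the earlier
# revision wins; the other has to re-load and retry.
Lock = Struct.new(:user, :revision)

def winner(locks)
  locks.min_by(&:revision).user
end

winner([Lock.new("alice", 7), Lock.new("bob", 9)])  # => "alice"
```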

Current State of the Art

A few people have tried this, and it seems to be quite an active area.

Unhosted have been working on providing just these kinds of ‘offline’ applications.

RemoteStorage lets anyone run a remote storage server, and provides client APIs to talk to it from JS applications.

Definitely worth keeping an eye on them.

Game Intelligence

In this post I argue for the need for an open source analytics engine for use in games and other gamified applications (or just regular web applications).

Why not just use Google Analytics (or any such analytics tool)?

  1. It only tracks front end events, like page loads and clicks, and does a fantastic job of handling country, browser etc. We need to be able to track back end events, and creating equivalent front end events for Google Analytics might be too much.

  2. It can’t really be used as a business intelligence tool. For example, we can’t answer the question, “How many users have reached level 10?”, or a more complex one: “What percentage of users have reached level 10 - split by the week number they joined our system?”

The latter is what is called Cohort Analysis, and Google Analytics and other ‘web analytics’ tools (for example, Piwik) don’t help us do cohort analysis.
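To make the cohort question concrete, here is a toy sketch in plain Ruby (the data and field names are invented): percentage of users who reached level 10, split by the ISO week they joined.

```ruby
require "date"

users = [
  { joined: Date.new(2013, 1, 7),  level: 12 },
  { joined: Date.new(2013, 1, 9),  level: 3  },
  { joined: Date.new(2013, 1, 15), level: 10 },
]

# Group users into cohorts by join week, then compute the percentage
# of each cohort that reached level 10.
report = users.group_by { |u| u[:joined].cweek }.map do |week, members|
  reached = members.count { |u| u[:level] >= 10 }
  [week, 100.0 * reached / members.size]
end.to_h

report  # => {2 => 50.0, 3 => 100.0}
```

The hard part a real tool has to solve is exactly this grouping, but done incrementally over millions of events rather than over an in-memory array.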

Open Source Cohort Analysis

So the next question is: what is on offer as a self-hosted, free, open source cohort analysis tool that we can use? I really can’t find anything useful out there. The closest I came was aarrr, a Ruby library backed by MongoDB to track cohorts.

It seems a lot of people build their own little tools for such analysis. Could it be because

  1. Describing cohorts and tracking activity along cohorts can result in a lot of data replication, so no one is releasing a framework, or

  2. Most of the analysis is carried out in a Business Intelligence ETL manner.

So Why Not an ETL Business Intelligence Tool?

  1. Not open source, and can be pretty expensive.

  2. Mostly written in Java, with a painful learning curve, especially the WYSIWYG query editors. The whole thing stinks of ‘enterprise’, I must say.

  3. ETL might not be flexible and dynamic enough. With databases like MongoDB, we don’t need to run cron jobs hogging the database to produce analytics results. Instead, we can set up a small db to collect events from the backend as users/players hit certain landmarks.

MongoDB and Analytics

A lot has been written on how MongoDB can be used to aggregate analytics data. The difference is very clear: no more ETL, just simple ‘pings’ to MongoDB to track analytics. Later, the aggregated results can be shown using any of the freely available graphing libraries - Google Charts or RGraph come to mind.
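A toy sketch of the ‘ping’ idea (names invented, with a plain Hash standing in for the collection): each backend event becomes one increment against a per-day counter, so the data is aggregated at write time instead of by a nightly ETL job. With the mongo gem, the hash update below would be an update_one call with a "$inc" update and upsert: true.

```ruby
# Record one occurrence of `event` on `day`, creating the counter
# if it does not exist yet - the same shape as a MongoDB $inc upsert.
def ping(counters, day, event)
  counters[[day, event]] = counters.fetch([day, event], 0) + 1
  counters
end

counters = {}
ping(counters, "2013-06-01", "level_10_reached")
ping(counters, "2013-06-01", "level_10_reached")
ping(counters, "2013-06-02", "level_10_reached")
counters
# => {["2013-06-01", "level_10_reached"] => 2,
#     ["2013-06-02", "level_10_reached"] => 1}
```

The dashboard side then just reads these pre-aggregated counters, which is why no cron job ever has to scan the raw events.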