Announcing: MF60 Ruby Gem

mf60 on the train

For the past several months I have been commuting by train to Zurich, and I enjoy hacking on my laptop during the ride. For a while, I was tethering through my iPhone, but it was slow and unreliable, especially in the longer stretches between towns. So I got myself an MF60 hotspot from Swisscom. It acts as a Wi-Fi router, supports several 3G/2G bands, and switches between them automatically and pretty quickly. The battery life is not great (I only get about 3-4 hours tops), but when it’s connected to the laptop via USB, it keeps going for a long time.

Managing it, however, is annoying. Sadly, the built-in web administrative interface is not a shining example of usability. It’s functional but clunky: you can do everything you would expect with an access point, such as managing the settings and viewing usage stats. That’s important, since my Swisscom plan has a monthly data limit and I need to keep track of my usage. But it is a pain to use, with its tiny links and nonsensical menu organization.

Another problem is that the network sometimes gets stuck (after a long tunnel, for example). This requires a reset, which means either powering it off and on (a 20-30 second ordeal) or going through the aforementioned crappy admin interface.

I took on the task of peeking through the HTML and JavaScript with Firebug, and wrote a set of scripts to help me do basic things from the command line: view stats, reconnect, check signal strength, etc. Last week, I refactored them and released the result as a gem.

$ sudo gem install mf60

More details are on the project’s GitHub page.

Quest for a Nicer Prompt

Fancy console prompts are fun, but it’s a trade-off. The problem is that once you put all this extra info in there, the prompt gets too long.

Fuuuuuu

The prompt should not get in the way. It should show as few characters as possible but be highly helpful, informative, and pleasant.

Home Sweet Home

the simplest prompt

Logged in as myself, on my local machine, in my home directory. It just reads at home - three characters and a cursor.

Once I’ve logged into a remote box as a different user, and switched away from my home directory, the same PS1 shows a more conventional prompt:

anatomy of the prompt

The path is underlined not just because it looks nice. In iTerm2, Command-clicking a pathname opens it in a Finder window, and I think the underline makes it look more like a clickable link.

A Shorter Path

Back to the length problem: I find it annoying when the path gets too long and commands start wrapping lines as I type. About 40 characters is the most I want to show for a path. There are two strategies to shorten paths:

1. First-class Website Roots

As a web developer, I spend most of my time inside web project root folders. Locally, that’s ~/Sites/sitename.com, but on remote servers it’s usually /var/www/vhosts/sitename.com. In these cases, the path is reduced to a web-like scheme prefix (//):

web projects

2. Ellipsis in the Middle

Long paths are truncated by trying two methods:

  • Split the path on the slashes, show the first three parts and the last two, and join them with “…”
  • Take the first 25 characters of the path and the last 15, and join them with “…”

Whichever version ends up shorter is displayed (a path that is already shorter than both is left alone).

The result is that paths are split up more naturally. Really long directory names end up chopped, whereas shorter paths make it through OK. It’s a good balance, and you can still figure out where you are.

truncate long paths

A ruby one-liner does this:

d=ENV["PWD"].gsub(%r{^#{ENV["HOME"]}},"~").gsub(%r{^~/Sites/},"//").gsub(%r{^/var/www/vhosts/},"//");p1=d[0,25]+"…"+d[-15,15].to_s;a=d.split("/");p2=(a.first(4)+["…"]+a.last(2)).join("/");puts((d.size > [p1.size, p2.size].min) ? ((p1.size < p2.size) ? p1 : p2) : d)
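Unpacked into a method, the same logic is easier to read and tweak. This is a sketch, not the gist’s code: the method name `shorten_path` and its `home` parameter are hypothetical, and it adds one small guard the one-liner lacks (the home directory is only replaced at a real path boundary, so /home/xy is not mistaken for /home/x).

```ruby
# Hypothetical Ruby method mirroring the prompt one-liner above.
def shorten_path(pwd, home)
  d = pwd
  # "~" for the home directory, only when followed by "/" or end of string
  d = d.sub(%r{\A#{Regexp.escape(home)}(?=/|\z)}, "~") unless home.to_s.empty?
  # "//" for local (~/Sites) and remote (/var/www/vhosts) web roots
  d = d.sub(%r{\A~/Sites/}, "//").sub(%r{\A/var/www/vhosts/}, "//")

  # Method 1: first 25 characters + "…" + last 15 characters
  # (d[-15, 15] is nil for paths shorter than 15 characters, hence to_s)
  by_chars = d[0, 25] + "…" + d[-15, 15].to_s

  # Method 2: leading parts + "…" + last two parts. For an absolute path the
  # first element of split is "", so first(4) keeps three named parts.
  parts = d.split("/")
  by_parts = (parts.first(4) + ["…"] + parts.last(2)).join("/")

  shorter = by_chars.size < by_parts.size ? by_chars : by_parts
  # Keep the full path unless truncating actually makes it shorter
  d.size > shorter.size ? shorter : d
end

# Example prompt usage: puts shorten_path(Dir.pwd, ENV["HOME"])
```

For example, /usr/local/lib/ruby/gems/1.8/gems/activerecord-2.0.2/lib/active_record (70 characters) comes out as /usr/local/lib/…/lib/active_record, because the slash-based version (34 characters) beats the character-based one (41).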

Git Branch and Status

Finally, the git branch is displayed in brackets for any directory under git control. If there are any uncommitted changes, a red asterisk is also helpfully displayed. It would be nice to truncate the branch name as well (branch names can get longish too) and maybe show ahead/behind status, but that’s not there yet.

shows current git branch and status
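The post’s prompt gets this from bash functions; as a sketch only, here is the same check in Ruby (the prompt already shells out to inline Ruby). The method names `git_segment` and `current_git_segment` are hypothetical; the two git commands are standard: `git rev-parse --abbrev-ref HEAD` prints the current branch name, and `git status --porcelain` prints nothing for a clean working tree.

```ruby
require "open3"

DIRTY_MARK = "\e[31m*\e[0m" # red asterisk (ANSI color codes)

# Formats the "[branch]" / "[branch*]" prompt segment from the raw output of
# `git rev-parse --abbrev-ref HEAD` and `git status --porcelain`.
def git_segment(branch_output, porcelain_output)
  branch = branch_output.strip
  return "" if branch.empty?
  # --porcelain prints one line per changed or untracked file, nothing if clean
  dirty = porcelain_output.strip.empty? ? "" : DIRTY_MARK
  "[#{branch}#{dirty}]"
end

# Asks git about the current directory; returns "" outside a repository
# or when git is not installed.
def current_git_segment
  branch, status = Open3.capture2e("git", "rev-parse", "--abbrev-ref", "HEAD")
  return "" unless status.success?
  porcelain, _status = Open3.capture2("git", "status", "--porcelain")
  git_segment(branch, porcelain)
rescue Errno::ENOENT
  ""
end
```

Untracked files also show up in `--porcelain` output, so they count as uncommitted here; `git status --porcelain --untracked-files=no` ignores them. When color codes go into PS1 itself, bash needs them wrapped in `\[` and `\]` so it can compute the prompt width correctly.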

Other Stuff

  • Sets the title bar for the terminal window (without the colors or git branch info)
  • The root user is always shown in red
  • Hosts on local subnets (192.*, 10.*) are shown in green
  • Many other colors and effects are defined for tweaking
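The rules in this list could be expressed along these lines. This is a Ruby sketch with hypothetical method names (the gist implements them in bash); the title-bar escape `\e]0;…\a` is the standard xterm sequence, which iTerm2 also understands.

```ruby
RED   = "\e[31m"
GREEN = "\e[32m"
RESET = "\e[0m"

# The root user is always red
def user_color(user)
  user == "root" ? RED : ""
end

# Hosts on local subnets (addresses starting with 192. or 10.) are green
def host_color(ip)
  ip.start_with?("192.", "10.") ? GREEN : ""
end

# Wraps text in a color, leaving it untouched when no color applies
def colorize(text, color)
  color.empty? ? text : "#{color}#{text}#{RESET}"
end

# xterm-style escape that sets the window title; pass plain text only,
# so the title carries no color codes or git info
def title_sequence(text)
  "\e]0;#{text}\a"
end
```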

What’s your PS1?

I’ve been slowly customizing my PS1 bash prompt over the last few years, trying to get something really pretty and usable. This is what I’m currently satisfied with: a fairly involved collection of bash functions mixed with some inline Ruby code. It looks chaotic, but it should be pretty easy to customize.

The script: https://gist.github.com/1494881

To get it set up, save the source locally:

curl -o ~/.ps1 https://raw.github.com/gist/1494881/b1a323956126a4346b803c09b691e1a1e013c7ff/nicer_ps1.sh

Then edit your .bashrc (or .bash_profile) and add these lines:

DEFAULT_USERNAME=jeremy    # change this to yours!
SERVERNAME=ninja           # only if you want to override the default hostname
source ~/.ps1

Now do this for all your shell accounts. Or hey, make a dotfiles repository and use that. Or use Puppet or something else!


Thanks to Ivan Jovanovic for sharing the git branch and bash functions approach with me, and to the dotfiles of Ryan Bates and Mathias Bynens for further inspiration.

* I prefer the Liberation Mono font for programming.

Dropbox and Seemingly Incredible Upload Speeds

Dropbox is a really cool service. I love being able to share files with friends and co-workers with such ease.

But recently I noticed something a bit surprising.

When I copied a large disk image to my Dropbox, it had finished uploading within a few seconds: “All Files up to date.” OK, I have a pretty good connection, but it seemed impossible that I could upload hundreds of megabytes in just a few seconds. I then downloaded that file on a different computer, and sure enough, the whole file came down intact.

How was Dropbox able to upload so damn quickly? I tested this again several times using various large files (music, movies, whatever). What I found was that certain large files uploaded almost immediately, while others took hours. Was it compression? Maybe delayed uploads?

Here’s a screencast showing this in action (without sound):

dropbox exposed from Jeremy Seitz on Vimeo.

From what I can tell, Dropbox calculates and remembers the unique fingerprints of files. So if user A uploads a big file, and later, user B uploads the SAME file, then Dropbox recognizes that. There’s no need to waste the bandwidth (and presumably, the disk space in Dropbox’s cloud) for that second upload.
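To make the idea concrete, here is a toy model of whole-file deduplication. It is only an illustration of the behavior described above, not Dropbox’s implementation: the `DedupStore` class is hypothetical, and SHA-256 is an assumed choice of fingerprint.

```ruby
require "digest"
require "set"

# Toy content-addressed store: each distinct file body is kept once,
# keyed by its SHA-256 fingerprint.
class DedupStore
  def initialize
    @blobs  = {}                                           # fingerprint => contents
    @owners = Hash.new { |hash, key| hash[key] = Set.new } # fingerprint => users
  end

  # A real client would send the fingerprint first and transfer the bytes
  # only if the server had never seen it. Returns :instant for a known
  # file, :uploaded when the bytes actually had to be sent.
  def upload(user, data)
    fingerprint = Digest::SHA256.hexdigest(data)
    known = @blobs.key?(fingerprint)
    @blobs[fingerprint] = data unless known
    @owners[fingerprint] << user
    known ? :instant : :uploaded
  end

  # The service can tell which users hold identical content
  def users_with(data)
    @owners.fetch(Digest::SHA256.hexdigest(data), Set.new).to_a
  end
end
```

The second user’s upload of the same bytes returns `:instant`, and `users_with` shows why that matters: whoever runs the store knows that both users have the same file, without ever reading it.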

This is brilliant, because I’m sure it saves them huge amounts of bandwidth and disk resources. However, in my testing, the files I uploaded were not shared with anyone, and they were not public. Dropbox was able to “magically” upload huge files that I had never put in my account before. I can only assume that another user had uploaded the same files to their Dropbox before I had.

Maybe everyone is cool with it, but I think the privacy implications are pretty significant. Dropbox claims that their employees can’t see the CONTENTS of your files. But apparently, they know that different users have the SAME FILES.

With that in mind, here are some hypothetical ways in which this could be exploited:

  • Suppose a movie industry lawyer uploads a pirated film to Dropbox. If the upload finished instantly, they could potentially prove that at least one user had that file. Perhaps this is enough to convince a court to force Dropbox to release information?

  • Suppose a recording studio wanted to make sure that a hot new album, ready for release, had not leaked to the Internet. They could put the files in their own Dropbox folder. If the files uploaded instantly, they would know there was a security problem at hand.

  • For a hacker, knowing when a file is NOT UNIQUE could be particularly interesting. Let’s suppose that company A has a file on Dropbox that contains encrypted data. Evil company B has some information about that file, but they don’t have the encryption key. Could they use this feature to crack it?

Maybe this sounds overly paranoid. Again, I like Dropbox, but I would not put sensitive or private data there. Clever and cool tech? Definitely. Scary? I think it is, a little.

UPDATE: According to the Wikipedia page for Dropbox, they use Delta Technology: “Files are split into chunks, the chunks are hashed and only those chunks that have never been uploaded before by any user are transmitted again. This makes uploading popular files very efficient and helps if only small portions of a large file has changed.” That certainly explains what I observed.
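The chunk-level scheme in that quote can be sketched the same way. Again this is a toy, not Dropbox’s code: the `ChunkStore` class is hypothetical, SHA-256 is an assumed hash, and the 4-byte chunk size is deliberately tiny so the example is easy to follow.

```ruby
require "digest"
require "set"

CHUNK_SIZE = 4 # bytes; illustrative only

# Split data into fixed-size chunks, hash each one, and transmit only
# chunks whose hash has never been seen before (from any user).
class ChunkStore
  def initialize
    @known = Set.new # hashes of every chunk stored so far
  end

  # Returns how many chunks actually had to be transmitted.
  def upload(data)
    chunks = (0...data.bytesize).step(CHUNK_SIZE).map { |i| data.byteslice(i, CHUNK_SIZE) }
    fresh = chunks.map { |chunk| Digest::SHA256.hexdigest(chunk) }
                  .uniq
                  .reject { |h| @known.include?(h) }
    @known.merge(fresh)
    fresh.size
  end
end
```

Changing only the last four bytes of a twelve-byte file means only one chunk is sent again, which is exactly the “small portions of a large file” case from the quote.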

Mocking GeoKit’s Geocoder

I’ve been using GeoKit a lot in Rails, most often for geocoding. Models like Points, Companies and Users usually have an address, city, state, postal code and country - which might need to be geocoded so we can place them on a map, or check distances.

It’s definitely bad to let geocoder calls happen in tests. Not only does it slow things down, but if for some reason your network is not so good, tests will fail unexpectedly. Here are the helpers I like to use when writing the tests. This is using Test::Unit and Mocha:

In use, it’s pretty straightforward. You either expect the geocoder to succeed (a certain number of times), fail, or not be used at all. That last point is important because it’s easy to forget and have a model that geocodes on every save (via a before_save callback, for instance). That will slow down site response time because of the network delays.
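The original helpers rely on Mocha expectations on the geocoder (Mocha’s `expects(...).times(n)` and `never`). As a self-contained illustration of the same three cases without GeoKit or Mocha, here is a hand-rolled stub; `FakeGeocoder`, the `expect_*` helpers and the `Company` model are all hypothetical, and the model takes its geocoder as a constructor argument purely to keep the sketch runnable.

```ruby
# Hypothetical stand-in for a geocoder, configured by the expect_* helpers.
class FakeGeocoder
  Result = Struct.new(:success, :lat, :lng)

  attr_reader :calls

  def initialize(mode, expected_calls)
    @mode = mode                     # :succeed, :fail or :forbid
    @expected_calls = expected_calls
    @calls = 0
  end

  def geocode(_address)
    @calls += 1
    raise "geocoder should not be called" if @mode == :forbid
    # Arbitrary canned coordinates; no network access happens here.
    @mode == :succeed ? Result.new(true, 47.37, 8.54) : Result.new(false, nil, nil)
  end

  # Call at the end of a test, much as Mocha checks its expectations.
  def verify!
    return true if @calls == @expected_calls
    raise "expected #{@expected_calls} geocode call(s), got #{@calls}"
  end
end

def expect_geocode_success(times = 1)
  FakeGeocoder.new(:succeed, times)
end

def expect_geocode_failure(times = 1)
  FakeGeocoder.new(:fail, times)
end

def expect_no_geocoding
  FakeGeocoder.new(:forbid, 0)
end

# Hypothetical model that geocodes on save only when the address changed,
# so repeated saves do not hit the network.
class Company
  attr_reader :lat, :lng

  def initialize(geocoder)
    @geocoder = geocoder
    @address = nil
    @address_changed = false
  end

  def address=(value)
    @address_changed ||= (value != @address)
    @address = value
  end

  def save
    if @address_changed
      result = @geocoder.geocode(@address)
      @lat, @lng = result.lat, result.lng if result.success
      @address_changed = false
    end
    true
  end
end
```

The `expect_no_geocoding` case is the one that catches the accidental geocode-on-every-save callback: saving twice with an unchanged address must not touch the geocoder.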

Finally, here are some generic examples of each of the test helper methods in use:

A Really Simple Fix for Memory Bloat

A common problem with Rails in a production environment is memory bloat. EngineYard recently posted about this very issue on their blog (and I suspect several of my support tickets were related!).

For example, suppose you have some code that loads tons of records, or uses an external library that needs a lot of RAM. After the task is finished, that memory can get garbage collected, but the Ruby VM doesn’t shrink. Anyone who’s needed to parse large XML files will know what I’m talking about.

This problem can be fixed by changing your code, so it doesn’t use too much memory at once. Or, you could run God or something similar to kill off greedy processes. But what if you just really DO need to use a lot of memory?

I recently wrote a rake task to resize a bunch of attachment_fu images. The model looked like this:

Then I had a rake task:

As you might imagine, when run, this task consumed all of the RAM on our server in no time. And when the downsample! method was called in production, bam, Mongrel bloat. Big FAIL.

The Solution

Then, it dawned upon me. I had been using the wonderful spawn plugin on projects to handle high-latency tasks in controllers, and to parallelize batch processing. It basically can thread or fork out a block of ruby code, and you can optionally wait for the job to finish. So I wrapped the code like this:

And there you have it. When I ran the rake task again, it never grew beyond 50 MB, even after processing thousands of large images! And bye-bye to Mongrel bloat.

Even though this specific example could be fixed in other ways (using MiniMagick for example), my point is that the same simple technique could be applied to any bloat-inducing code. In this case, it was just a couple lines of code.
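The spawn plugin’s own API isn’t shown here; as a minimal sketch of the same underlying idea, this uses plain `Process.fork` (Unix only) to run a memory-hungry block in a child process and pass its result back through a pipe. The helper name `in_child_process` is hypothetical.

```ruby
# Runs the block in a forked child and returns its (Marshal-able) result.
# Memory the child allocates goes back to the OS when the child exits,
# so the parent process does not stay bloated.
def in_child_process
  reader, writer = IO.pipe
  [reader, writer].each(&:binmode)
  pid = fork do
    reader.close
    writer.write(Marshal.dump(yield))
    writer.close
    exit!(0) # skip at_exit handlers inherited from the parent
  end
  writer.close
  payload = reader.read # read before waiting so a large result can't block the pipe
  reader.close
  _, status = Process.wait2(pid)
  raise "child process failed" unless status.success?
  Marshal.load(payload)
end
```

In a rake task, each heavy step could be wrapped as `in_child_process { photo.downsample!; nil }` (where `photo` is whatever record is being processed). Keep in mind that a forked child inherits the parent’s open database connections, which generally need to be re-established in the child.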

Spawning processes is not without issues, but spawn does a good job for simple things. There’s also workling, which is more flexible, and job-queuing solutions such as delayed_job and background-fu. These libraries are mainly designed to offload slow tasks, but as I discovered, they can be effective for memory-intensive tasks as well.

Automated Sphinx Install in Mac OS X Using Rake

One challenge for any team building a Rails project with Sphinx is keeping everyone up to date on the same version of searchd. We wanted to make sure it was installed the same way, with the same version, on everyone’s dev machine. And we all work remotely, of course :)

Our solution was a rake task that downloads, compiles and installs Sphinx on OS X Leopard. It assumes you have already installed the developer tools (Xcode, from Mac OS X Install Disc 2).

To get the compile to work, you need to install mysql5 via MacPorts so the correct libraries are available:

sudo port install mysql5

Alas, we were never able to get Sphinx to compile correctly against Leopard’s own MySQL libs. But never fear: installing MySQL via MacPorts will not affect the standard Apple mysqld server or client in any way.

lib/tasks/sphinx.rake:

require 'net/http'
require 'uri'

# Pinned Sphinx release so every developer builds the same searchd
SPHINX_SOURCE = 'http://www.sphinxsearch.com/downloads/sphinx-0.9.8-rc2.tar.gz' # r1234

namespace :sphinx do
  desc "Install Sphinx from source"
  task :install do
    build_dir = "#{ENV['HOME']}/tmp/sphinx"
    # Start from a clean build directory (rm_rf, mkdir_p and cd come from
    # FileUtils, which Rake mixes in)
    rm_rf build_dir
    mkdir_p build_dir
    puts "Downloading Sphinx indexer from #{SPHINX_SOURCE}"
    cd build_dir do
      uri = URI.parse(SPHINX_SOURCE)
      tarball = File.basename(uri.path)
      Net::HTTP.start(uri.host, uri.port) do |http|
        resp = http.get(uri.path)
        abort "Download failed: HTTP #{resp.code}" unless resp.is_a?(Net::HTTPSuccess)
        File.open(tarball, "wb") { |file| file.write(resp.body) }
      end
      system("tar -xzf #{tarball}") or abort "Could not unpack #{tarball}"
      # Build against the MacPorts mysql5 headers and libraries
      cd File.join(build_dir, File.basename(tarball, '.tar.gz')) do
        system("./configure --with-mysql-libs=/opt/local/lib/mysql5/mysql/ --with-mysql-includes=/opt/local/include/mysql5/mysql/") or abort "configure failed"
        system("make") or abort "make failed"
        puts "\nRunning 'sudo make install' - this will install Sphinx."
        system("sudo make install") or abort "make install failed"
      end
    end
    # Create the index directory and make it writable by the current user
    system("sudo mkdir -p /opt/local/var/db/sphinx")
    system("sudo chown -R `whoami` /opt/local/var/db/sphinx")
  end
end

The Worst Rails Code

I just came back from RailsConf 2008 in Portland. This year was great. There were a lot of exciting developments to talk about, like MagLev, SkyNet, mod_rails and Rails 2.1.

The talks seemed better this year as well. The one I was most looking forward to was from Obie Fernandez, who wrote The Rails Way, published last fall. I can easily say this is the best Rails book published to date (sorry, Pragmatic). It’s packed with useful information, best practices, and real-world code. Obie’s excellent writing style along with contributions from numerous Rails coders make it a great read too. My copy is already showing wear. And at 900+ pages, it’s like a phone book.

Obie’s talk was given to a packed room, despite being scheduled at 9am on Sunday morning. The title of the talk, “The Worst Rails Code You’ve Ever Seen (and how not to write it yourself)”, discouraged my friends from attending (“sounds depressing”, one said). During the first lightning round, we had seen some pretty bad code proudly presented (to which Ryan Davis publicly expressed his horror).

But the talk was worth getting up for. Through a series of real-world examples, Obie (and co-presenter Rein Henrichs) showed the audience just how bad Rails coding can get. Some of the code was truly appalling, like a 1200+ line app in a single controller (no, really). Other examples looked, well, kind of familiar. Having been involved in several Rails projects myself since 2005, I’ve seen (and written) my share of bad code.

The talk started out with a bit of an elitist air, with the presenters snobbishly laughing at common mistakes. But as the talk wore on, the tone changed and became more helpful. At one point Obie showed some pretty ugly code and then admitted that he had actually written it.

But what really struck me about this talk was how programmers (myself included) often are unaware of “best practices”, or simply don’t understand Rails and Ruby well enough.

Now some of you reading this may think it’s all pretty obvious, basic stuff. But we all write bad code sometimes, even the slickest meta-programmers I know do. Here’s what I took away from the talk:

Common Reasons People Write Bad Code

1) Folks just don’t write tests, and as a result, they are afraid to refactor (and therefore improve) their code. It’s painful and dangerous to refactor without tests.

2) They are too lazy to look up the correct way to do something, either because they don’t like to read and research, or because they assume it will take longer than just “figuring it out”.

3) They Google for a solution to a problem and come across bad or misleading examples on someone’s blog (or a DZone snippet). You can’t assume something is correct just because it’s posted on someone’s blog. In fact, you should be suspicious of posts without comments or attribution. There are a ton of bad snippets and pasties out there.

What Is Bad Code?

It’s so important to understand the Rails framework. Rails already provides us with many best practices and time savers. It’s silly to waste time re-inventing an already solved problem.

Many of the examples from the talk confirmed that people are struggling with parameters, hashes, and passing data between controller actions.

  • Ruby’s Hash class gives us merge and other useful methods, so there’s often no need to iterate over values or put in explicit assignments.
  • The Rails session object is the proper way to pass data between controller actions, and the flash object is there for displaying messages to the user (and let’s not forget flash.now for AJAX actions). Don’t use instance variables to pass messages to the view.
  • If you have validations on parameters in the controller, they should probably be in the model instead (validates_presence_of, etc).
  • Manipulating parameters outside the context of a model (or other container) feels like PHP. It’s much nicer to call MyModel.new(params[:my_model]) and then validate. Why muck around with things like @params[person][date][i1], etc.?
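As a small illustration of the first point, `Hash#merge` replaces a run of explicit assignments. In this sketch a plain symbol-keyed hash stands in for Rails’ `params`, and `DEFAULT_OPTIONS` and `search_options` are hypothetical names.

```ruby
# Defaults for a hypothetical search form; incoming values override them.
DEFAULT_OPTIONS = { page: 1, per_page: 20, order: "created_at DESC" }.freeze

# Instead of assigning keys one at a time:
#   options[:page] = params[:page] if params[:page]
#   options[:per_page] = params[:per_page] if params[:per_page]
# merge the supplied (non-nil) values over the defaults in one call.
def search_options(params)
  DEFAULT_OPTIONS.merge(params.reject { |_key, value| value.nil? })
end
```

`merge` returns a new hash, so the frozen defaults are never modified.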

Other less-obvious suggestions to help clean up bad code included:

  • Use attr_protected in models instead of worrying about how certain parameters might be set via the url or hacked forms
  • Avoid creating a big config class. Instead, use constants, and put them in the classes that need them.
  • application.rb should only be for actions; put utility methods somewhere else, like a module or the models
  • Consider using the presenter pattern to clean up code
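Here is a minimal plain-Ruby sketch of two of these suggestions: a constant kept in the class that uses it, and a presenter that holds display logic so neither the model nor the view has to. `Person` and `PersonPresenter` are hypothetical; in a Rails app `Person` would be an ActiveRecord model.

```ruby
require "date"

# Hypothetical model; the constant lives in the class that uses it
# rather than in a global config class.
class Person
  DATE_FORMAT = "%Y-%m-%d"

  attr_reader :first_name, :last_name, :born_on

  def initialize(first_name, last_name, born_on = nil)
    @first_name = first_name
    @last_name = last_name
    @born_on = born_on
  end
end

# Presenter: display logic for a Person, kept out of both the model and
# the view template.
class PersonPresenter
  def initialize(person)
    @person = person
  end

  def full_name
    [@person.first_name, @person.last_name].compact.join(" ")
  end

  def birthday
    @person.born_on ? @person.born_on.strftime(Person::DATE_FORMAT) : "unknown"
  end
end
```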

Where To Turn For Help

  • The Ruby Way, 2nd ed. (Hal Fulton) and The Rails Way. They are both excellent.
  • Refer to the Rails API and Ruby standard library documentation as a primary resource
  • When you are having trouble understanding how to use a plugin, gem, or Rails itself, don’t be afraid to look at the source code. It’s Ruby after all, and often surprisingly easy to understand.
  • Search (or post) on an appropriate IRC channel or forum
  • Study other people’s code. I learn a lot this way. Some of my favorites: Rick Olson, Ryan Davis, Eric Hodel and Evan Weaver.
  • Use script/console: this is often a really fast way to try a concept out, play with it and verify that it will really work.
  • Keep a library of your own code snippets. When you run across something useful, copy and paste it into a file locally that you can search through later.

Don’t Code Solo

Obie also strongly recommended pair programming. Working with other programmers is a really great way to learn quickly. He specifically suggested pairing with a senior coder (like, a Smalltalk guru). For most of us, that’s not a realistic option, unless you are lucky enough to have expert contacts who are willing to spend that kind of time with you. But I can certainly agree that it’s very helpful to work through code with another person.

To that end, I also found the talk about Remote Pair Programming interesting, especially since I spend a lot of time in Europe while working on projects in the US. I definitely plan to pair program more often now!

Geodistance Searching With Ultrasphinx

I’m happy to announce a patch for Ultrasphinx that enables the geographical distance searching built into the Sphinx full-text search engine.

Why

Through my company Somebox, I recently led a team that launched TravelSkoot.com for NBC Digital Innovation, which is a Google Maps mash-up that allows people to group travel destinations together (called “skoots”) along with comments, ratings, etc. The entire project was built in Rails in about four months, and is living at EngineYard.

One of the challenges was to come up with an efficient way to search a large number of points, along with other metadata and fulltext searching. Normally, we would use GeoKit for this kind of thing, but once you combine fulltext with lots of other filters, things get complicated (and slow). That’s where Sphinx really shines.

I knew that Sphinx had support for geodistance, but unfortunately none of the Rails plugins did at the time. UltraSphinx offers more features than any of the other Rails/Sphinx plugins, and is based on Pat Allan’s excellent Riddle Client, which already had the basics for geodistance baked in (and required only a tiny patch to make it work right). The rest of the changes were then made to the UltraSphinx plugin to make it usable.

How it Works

To set up UltraSphinx for geodistance searches, you need to declare your latitude and longitude columns in the model. Since Sphinx expects these to be stored in radians, you can use the :function_sql option of is_indexed to do the conversion:

class Point < ActiveRecord::Base
  is_indexed :fields => [
    :title,
    :description,
    {:field => "lat", :function_sql => 'RADIANS(?)'},
    {:field => "lng", :function_sql => 'RADIANS(?)'}
  ]
end

The search itself ends up looking like this:

@search = Ultrasphinx::Search.new(
  :query     => 'pizza',
  :sort_mode => 'extended',
  :sort_by   => 'distance asc',
  :filters   => {'distance' => 0..10000},
  :location  => {
    :units => 'degrees',
    :lat   => 40.343,
    :long  => -74.233
  }
)
@search.run

The actual distance is then available in your models (in meters):

@search.results.first.distance

You can also filter and sort results by distance and combine all the other features of UltraSphinx (faceting, weighting, etc).
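Sphinx computes the distance itself, so none of this is needed in the app. To get a feel for what the radians conversion and the meters result mean, here is a standalone haversine sketch. It is not Sphinx’s code; the Earth radius constant and the `distance_m` and `nearby` helpers are assumptions for illustration, and Sphinx’s own formula and radius may differ slightly.

```ruby
EARTH_RADIUS_M = 6_371_000.0 # mean Earth radius (assumed)

def to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Great-circle distance in meters between two points given in degrees,
# using the haversine formula.
def distance_m(lat1, lng1, lat2, lng2)
  phi1 = to_radians(lat1)
  phi2 = to_radians(lat2)
  dphi = to_radians(lat2 - lat1)
  dlambda = to_radians(lng2 - lng1)
  a = Math.sin(dphi / 2)**2 + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dlambda / 2)**2
  2 * EARTH_RADIUS_M * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a))
end

# Keep points within max_m and sort nearest first, mirroring the
# 'distance' filter and 'distance asc' sort in the search above.
def nearby(points, lat, lng, max_m)
  points.map { |p| [p, distance_m(lat, lng, p[:lat], p[:lng])] }
        .select { |_, d| d <= max_m }
        .sort_by { |_, d| d }
end
```

With this radius, one degree of latitude works out to roughly 111.2 km, so the 0..10000 filter in the search above keeps points within about 10 km of the anchor.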

Thanks

Open source rocks. Evan Weaver was very encouraging in helping this patch along, cleaning up the API, and guiding discussion. A lot of support came through Dr. Mark Lane, who helped guide me through the internals, rewrote the tests in his roll-up patches, and pushed me to finish. Also thanks to Michael Hill, Michael Burmann, and Jason Lee for testing and feedback.

Footnotes

In order to use the geodistance features of UltraSphinx, you need to be using version 1.10 or higher (which also requires at minimum Sphinx v1198). These geo features are still young, and not without some minor issues. Be sure to consult the forum for answers (or submit a patch!), or leave a comment here if you need help. And if you end up using this patch in your project, recommend me at WWR!