Dropbox and Seemingly Incredible Upload Speeds

Dec 19, 2009

Dropbox is a really cool service. I love being able to share files with friends and co-workers with such ease.

But recently I noticed something a bit surprising.

When I copied a large disk image to my Dropbox, within a few seconds, it had finished uploading: “All Files up to date.” Ok, I have a pretty good connection, but it seemed impossible that I could somehow upload hundreds of megabytes in just a few seconds. I then went to download that file from a different computer, and sure enough, the whole file came down, intact.

How was Dropbox able to upload so damn quickly? I tested this again several times using various large files (music, movies, whatever). What I found was that certain large files uploaded almost immediately, while others took hours. Was it compression? Maybe delayed uploads?

Here’s a screencast showing this in action (without sound):

dropbox exposed from Jeremy Seitz on Vimeo.

From what I can tell, Dropbox calculates and remembers the unique fingerprints of files. So if user A uploads a big file, and later, user B uploads the SAME file, then Dropbox recognizes that. There’s no need to waste the bandwidth (and presumably, the disk space in Dropbox’s cloud) for that second upload.

This is brilliant, because I’m sure it saves them huge amounts of bandwidth and disk resources. However, in my testing, the files I uploaded were not shared with anyone, and they were not public. Dropbox was able to “magically” upload huge files that I had never put in my account before. I can only assume that another user had uploaded the same files to their Dropbox before I had.

Maybe everyone is cool with it, but I think the privacy implications are pretty significant. Dropbox claims that their employees can’t see the CONTENTS of your files. But apparently, they know that different users have the SAME FILES.

With that in mind, here are some hypothetical ways in which this could be exploited:

- Suppose a movie industry lawyer uploads a pirated film to Dropbox. If the upload finished instantly, they could potentially prove that at least one user had that file. Perhaps this is enough to convince a court to force Dropbox to release information?

- Suppose a recording studio wanted to make sure that a hot new album, ready for release, had not been leaked to the Internet. So they put the files in their own Dropbox folder. If they uploaded right away, they would realize that there was a security problem at hand.

- For a hacker, knowing when a file is NOT UNIQUE could be particularly interesting. Let’s suppose that company A has a file on Dropbox that contains encrypted data. Evil company B has some information about that file, but they don’t have the encryption key. Could they use this feature to crack it?

Maybe this sounds overly paranoid. Again, I like Dropbox, but I would not put sensitive or private data there. Clever and cool tech? Definitely. Scary? I think it is, a little.

UPDATE: According to the Wikipedia page for Dropbox, they use Delta Technology: “Files are split into chunks, the chunks are hashed and only those chunks that have never been uploaded before by any user are transmitted again. This makes uploading popular files very efficient and helps if only small portions of a large file has changed.” That certainly explains what I observed.
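
To make the idea concrete, here is a minimal sketch of chunk-level deduplication in Ruby. It is only my guess at the general technique described in that quote, not Dropbox's actual implementation: the ChunkStore class, the SHA-256 fingerprint and the tiny 4-byte chunk size are all assumptions chosen for illustration.

```ruby
require 'digest'

# Hypothetical server-side store shared by ALL users: digest => chunk bytes.
# The class name, hash function and chunk size are illustrative assumptions.
class ChunkStore
  CHUNK_SIZE = 4 # bytes; absurdly small so the example is easy to follow

  def initialize
    @chunks = {}
  end

  # "Uploads" data and returns [manifest, bytes_sent].
  # Only chunks whose fingerprint has never been seen are transferred.
  def upload(data)
    manifest = []
    bytes_sent = 0
    offset = 0
    while offset < data.bytesize
      chunk = data.byteslice(offset, CHUNK_SIZE)
      digest = Digest::SHA256.hexdigest(chunk)
      unless @chunks.key?(digest)
        @chunks[digest] = chunk
        bytes_sent += chunk.bytesize
      end
      manifest << digest
      offset += CHUNK_SIZE
    end
    [manifest, bytes_sent]
  end

  # Rebuilds a file from its list of chunk fingerprints.
  def download(manifest)
    manifest.map { |digest| @chunks.fetch(digest) }.join
  end
end

store = ChunkStore.new
_, sent_by_a = store.upload("AAAABBBBCCCC") # user A: every chunk is new
_, sent_by_b = store.upload("AAAABBBBCCCC") # user B: same file, nothing to send
puts sent_by_a # 12
puts sent_by_b # 0
```

Note that user B's "instant" upload is exactly what leaks the information: the store has to know that someone already holds those chunks.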

Mocking GeoKit's Geocoder

Dec 01, 2009

I've been using GeoKit a lot in Rails, most often for geocoding. Models like Points, Companies and Users usually have an address, city, state, postal code and country - which might need to be geocoded so we can place them on a map, or check distances.

It's definitely bad to let geocoder calls happen in tests. Not only does it slow things down, but if for some reason your network is not so good, tests will fail unexpectedly. Here are the helpers I like to use when writing the tests. This is using Test::Unit and Mocha:

In use, it's pretty straightforward. You either expect the geocoder to succeed (a certain number of times), fail, or not be used at all. That last point is important because it's easy to forget and have a model that geocodes on every save (via a before_save callback, for instance). That will slow down site response time because of the network delays.

Finally, here are some generic examples of each of the test helper methods in use:
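
As a rough, self-contained sketch of those three expectations (succeed a given number of times, fail, never be called), here is a version that runs without any gems. Everything in it is an assumption for illustration: FakeGeocoder and Location stand in for the real geocoder classes and are not GeoKit's API, and the hand-rolled stubbing stands in for Mocha.

```ruby
# Hypothetical stand-ins; NOT GeoKit's API.
Location = Struct.new(:lat, :lng, :success)

class FakeGeocoder
  # The "real" implementation would hit the network, which tests must avoid.
  def self.geocode(address)
    raise "network call attempted for #{address.inspect}"
  end
end

# Swaps FakeGeocoder.geocode for the duration of the block and returns
# the number of times it was called. The original method is always restored.
def with_stubbed_geocoder(result)
  calls = 0
  original = FakeGeocoder.method(:geocode)
  FakeGeocoder.define_singleton_method(:geocode) do |_address|
    calls += 1
    result
  end
  yield
  calls
ensure
  FakeGeocoder.define_singleton_method(:geocode) { |address| original.call(address) }
end

# Geocoder must succeed exactly `times` times inside the block.
def expect_geocoder_success(times, lat: 47.37, lng: 8.54, &block)
  calls = with_stubbed_geocoder(Location.new(lat, lng, true), &block)
  raise "expected #{times} geocoder calls, got #{calls}" unless calls == times
end

# Geocoder answers, but reports failure.
def expect_geocoder_failure(&block)
  with_stubbed_geocoder(Location.new(nil, nil, false), &block)
end

# Geocoder must not be touched at all inside the block.
def expect_no_geocoding(&block)
  calls = with_stubbed_geocoder(Location.new(nil, nil, false), &block)
  raise "geocoder was called #{calls} time(s)" unless calls.zero?
end

# Usage: saving a record with a new address should geocode once;
# touching an unrelated attribute should not geocode at all.
expect_geocoder_success(1) { FakeGeocoder.geocode("Zurich, Switzerland") }
expect_no_geocoding { :update_some_other_attribute }
```

With Mocha the same intent is usually expressed through expects, times, returns and never on the geocoder class, which is shorter but needs the gem loaded.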

A really simple fix for memory bloat

Nov 25, 2009

A common problem with Rails in a production environment is memory bloat. EngineYard recently posted about this very issue on their blog (and I fear that several of my support tickets were maybe related!).

For example, suppose you have some code that loads tons of records, or uses an external library that needs a lot of RAM. After the task is finished, that memory can get garbage collected, but the Ruby VM doesn't shrink. Anyone who's needed to parse large XML files will know what I'm talking about.

This problem can be fixed by changing your code, so it doesn't use too much memory at once. Or, you could run God or something similar to kill off greedy processes. But what if you just really DO need to use a lot of memory?

I recently wrote a rake task to resize a bunch of attachment_fu images. The model looked like this:

Then I had a rake task:

As you might imagine, when run, this task consumed all of the RAM on our server in no time. And when the downsample! method was called in production, bam, Mongrel bloat. Big FAIL.

The Solution

Then, it dawned upon me. I had been using the wonderful spawn plugin on projects to handle high-latency tasks in controllers, and to parallelize batch processing. It basically can thread or fork out a block of ruby code, and you can optionally wait for the job to finish. So I wrapped the code like this:

And there you have it. When running the rake task again, it never grew beyond 50 MB in size, even after processing thousands of large images! And bye-bye to Mongrel bloat.
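
If you don't have the spawn plugin handy, the same trick can be sketched with Ruby's built-in Process.fork. This is a minimal sketch under my own assumptions, not the code from the project: in_child_process is a hypothetical helper name, and the string allocation is just a stand-in for the image resizing. It also assumes a platform where fork is available (so not Windows).

```ruby
# Runs the block in a forked child process and waits for it to finish.
# Whatever memory the block allocates belongs to the child, so it goes
# back to the OS when the child exits and the parent never grows.
# Returns true if the child exited cleanly.
def in_child_process
  pid = Process.fork do
    yield
    exit!(0) # leave immediately, skipping the parent's at_exit handlers
  end
  _, status = Process.wait2(pid)
  status.success?
end

# Usage: process records in batches, one short-lived child per batch.
(1..10).each_slice(5) do |batch|
  ok = in_child_process do
    batch.each { |_id| "x" * 1_000_000 } # stand-in for a memory-hungry resize
  end
  puts "batch #{batch.first}-#{batch.last} ok: #{ok}"
end
```

One caveat if you try this in a Rails process: the child inherits the parent's open connections, so database access inside the block needs care.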

Even though this specific example could be fixed in other ways (using MiniMagick for example), my point is that the same simple technique could be applied to any bloat-inducing code. In this case, it was just a couple lines of code.

Spawning processes is not without issues, but spawn does a good job for simple things. There's also workling, which is more flexible, and solutions that do job queuing, such as delayed_job and background-fu. These libraries are mainly designed to offload slow tasks, but as I discovered, they can be effective for memory-intensive tasks as well.

Peggy2 Halloween Sketch

Oct 31, 2009

I made a little Arduino sketch for my Peggy2.

Peggy2 Halloween from Jeremy Seitz on Vimeo.

Source code is here: http://github.com/somebox/peggy2-halloween

Happy Halloween!!!

Automated Sphinx Install in Mac OS X Using Rake

Sep 06, 2008

One challenge for any team building a Rails project with Sphinx: keeping everyone up to date, on the same version of searchd. We wanted to make sure it was installed the same way, same version, on everyone's dev machine. And we all work remotely of course :)

The solution for us was a rake task that downloads, compiles and installs Sphinx on OS X Leopard. We assume that you previously installed the developer tools (Mac OS X Install Disc 2, XCode).

To get the compile to work, you need to install mysql5 via MacPorts so you have the correct libs available.

$ sudo port install mysql5

Alas, we never were able to get Leopard's MySQL libs to compile with Sphinx correctly. But never fear, installing MySQL via MacPorts will not affect the standard Apple mysqld server or client in any way.

lib/tasks/sphinx.rake :

require 'net/http'
require 'uri'
require "#{RAILS_ROOT}/config/environment.rb"
SPHINX_SOURCE='http://www.sphinxsearch.com/downloads/sphinx-0.9.8-rc2.tar.gz' # r1234

namespace :sphinx do
  desc "Install Sphinx from source"
  task :install do
    # start from a clean build directory
    build_dir = "#{ENV['HOME']}/tmp/sphinx"
    system "rm -rf #{build_dir}"
    system "mkdir -p #{build_dir}"
    puts "Downloading Sphinx indexer from #{SPHINX_SOURCE}"
    cd build_dir do
      # fetch the tarball into the build directory
      uri = URI.parse(SPHINX_SOURCE)
      tarball = File.basename(uri.path)
      Net::HTTP.start(uri.host) do |http|
        resp = http.get(uri.path)
        open(tarball, "wb") do |file|
          file.write(resp.body)
        end
      end
      system("tar -xzf #{tarball}")
      # compile against the MySQL libs installed by MacPorts
      cd "#{build_dir}/#{tarball.gsub('.tar.gz','')}" do
        system("./configure --with-mysql-libs=/opt/local/lib/mysql5/mysql/ --with-mysql-includes=/opt/local/include/mysql5/mysql/")
        system("make")
        puts "\nRunning 'sudo make install' - this will install Sphinx."
        system("sudo make install")
      end
    end
    # make the index directory writable by the current user
    system("sudo mkdir -p /opt/local/var/db/sphinx")
    system("sudo chown -R `whoami` /opt/local/var/db/sphinx")
  end
end

The Worst Rails Code

Jun 07, 2008

I just came back from RailsConf 2008 in Portland. This year was great. There were a lot of exciting developments to talk about, like MagLev, SkyNet, mod_rails and Rails 2.1.

The talks seemed better this year as well. The one I was most looking forward to was from Obie Fernandez, who wrote The Rails Way, published last fall. I can easily say this is the best Rails book published to date (sorry, Pragmatic). It’s packed with useful information, best practices, and real-world code. Obie’s excellent writing style along with contributions from numerous Rails coders make it a great read too. My copy is already showing wear. And at 900+ pages, it’s like a phone book.

Obie’s talk was given to a packed room, despite being scheduled on Sunday morning at 9am. The title of the talk, “The Worst Rails Code You’ve Ever Seen (and how not to write it yourself)”, discouraged my friends from attending (“sounds depressing”, one said). During the first lightning round, we had seen some pretty bad code proudly presented (to which Ryan Davis publicly expressed his horror).

But the talk was worth getting up for. Through a series of real-world examples, Obie (and co-presenter Rein Henrichs) showed the audience just how bad Rails coding can get. Some of the code was truly appalling, like a 1200+ line app in a single controller (no, really). Other examples looked, well, kind of familiar. Having been involved in several Rails projects myself since 2005, I’ve seen (and written) my share of bad code.

DemocracyNow.com is a Webby Honoree

May 23, 2008

Wow, I just found out today that Democracy Now! was an honoree this year in the News, Political and Podcast categories. Congrats to everyone on the web dev team – it’s exciting to see our Rails effort up there ranked with such big names!

Democracy Now!

Geodistance Searching with Ultrasphinx

May 01, 2008

I’m happy to announce a patch for Ultrasphinx that enables access to the geographical distance searching in the Sphinx full-text search engine.

Syntactical Sugar

Apr 29, 2008

On recent Rails projects, I found myself clinging to a few useful helpers and additions. Here are a few.

hide_unless

Often in views, I find I want to hide a particular div or element, but only if certain conditions are met. An ajax call might later reveal it, or replace the contents of the div with something.

 def hide_unless(condition)
    condition ? '' : 'display:none'
 end

In use:

<div id="edit_pane" style="<%= hide_unless(@story) %>"></div>

present?

Rails gives us .blank?, but I hate writing things like:

  <% if !@stories.blank? %>
    ... etc
  <% end %>

So, I add this as an extension in my projects:

class Object
  def present?
    !blank?
  end
end

And obviously it works on anything: arrays, hashes, nil, etc.

  <% if session[:setting].present? %>
   etc...
  <% end %>

UPDATE (29-Jun-2008): DHH just committed this to Edge Rails. I have no proof that I had anything to do with it, but I’ll pretend :)

user_owns_it

A common task is to check if the current user owns a particular resource.

  def user_owns_it(asset)
    asset.respond_to?(:created_by) and asset.created_by and current_user and asset.created_by.id == current_user.id
  end

This allows easy checking in views:

<% if user_owns_it(@post) %>
   <%= link_to "Edit Your Post", edit_post_path(@post) %>
<% end %>

Please share if you have other interesting tidbits from your toolbox!

Fix for slow gem updates

Apr 18, 2008

Lately, rubygems seems to be slow when updating. I guess there are a lot more gems being released than ever before. As a result, running gem update is painful:

$ sudo gem update
Updating installed gems...
Updating metadata for 345 gems from http://gems.rubyforge.org
....................... 

Argh! Turns out that there’s a “bulk update” setting, and the default threshold is 1000. Metadata will be downloaded a gem at a time, even if there are 999 to get. Fortunately, it can be changed by passing the -B flag to gem commands, or you can put this in ~/.gemrc :

update: -B 10
install: -B 10

Now your gem updates will be much faster.

Update: zenspider notes: the latest (last 2 actually) version of rubygems has http keepalive so it should be much much much faster and the bulk update threshold setting shouldn’t be necessary.

Autotest Sounds with playlists!

Apr 09, 2008

Ken Collins put together an awesome update to the autotest sound plugin. His version supports a playlist directory, so you can easily cycle through different init, red and green sounds. His sounds are hilarious!

http://www.metaskills.net/2008/4/6/autotest-playlist-for-red-green-feedback

I’ve been using it all day :)

My .irbrc

Dec 23, 2007

Some collected recipes to make irb and script/console a bit nicer to use:

  • pretty printing (pp whatever)
  • enhanced tab completion
  • rails logging to console
  • saves command history between sessions
  • use readline extensions module

Tips were collected from posts by Dr. Nic, Toolman Tim and Dzone Snippets.

~/.irbrc

require 'pp'
require 'irb/completion'
ARGV.concat [ "--readline", "--prompt-mode", "simple" ]
IRB.conf[:AUTO_INDENT]=true

# load console_with_helpers if possible
script_console_running = ENV.include?('RAILS_ENV') && IRB.conf[:LOAD_MODULES] && IRB.conf[:LOAD_MODULES].include?('console_with_helpers')
rails_running = ENV.include?('RAILS_ENV') && !(IRB.conf[:LOAD_MODULES] && IRB.conf[:LOAD_MODULES].include?('console_with_helpers'))
irb_standalone_running = !script_console_running && !rails_running

# log Rails stuff to STDOUT
if script_console_running
  require 'logger'
  Object.const_set(:RAILS_DEFAULT_LOGGER, Logger.new(STDOUT))
end

# Keep command-line history between startups
require 'irb/ext/save-history'
IRB.conf[:SAVE_HISTORY] = 100
IRB.conf[:HISTORY_FILE] = "#{ENV['HOME']}/.irb-save-history" 

Update: my friend Joannou pointed out Utility Belt which looks pretty nice also.

Update 2 (a month later): Been using Utility Belt for a while and noticed some problems… it has a tendency to conflict with ActiveRecord validations and trigger bogus errors during callbacks. Also seems to destroy some of the init process with certain plugins like ActiveScaffold. Perhaps it’s too clever. I ended up rolling back to my old .irbrc – YMMV.

Autotest: Now, With Sound Effects!

Jul 28, 2007

Update April 9, 2008: Ken Collins has released a new version of the sound plugin with playlist support!

We’ve all been enjoying autotest, part of the ZenTest gem. If you’ve tricked out your kit, then you have plugins configured, so at minimum you’re red, green and growling. Now, things get really fun.

Watch a screencast of autotest running with sound effects

I’m stoked to announce the sound plugin for autotest. This simple chunk of code will fire off sounds for different events in autotest. I’ve provided a set of custom-made sounds, produced with my trusty Nord Modular synthesizer and fine-tuned for an optimal testing experience. You should be able to use these all day without annoying your neighbors too much.

Here’s what you need to do:

1. Install mpg321

in OS X:
$ sudo port install mpg321
for Linux:
$ sudo apt-get install mpg321

2. Download and extract the plugin

The starter sound fx are in the zip file. Extract it in your home directory; it will create ~/autotest/sound.

autotest-sound-1_2.zip (86k)

3. Configure your ~/.autotest file:

require '~/autotest/sound/sound.rb'
Autotest::Sound.sound_path = "~/autotest/sound/sound_fx/" 

Enjoy TDD with audio feedback!

I’ve been using this setup for several weeks now. I initially wrote it as a gag, but I have since found it to be incredibly useful. It’s nice to know your testing status via audio – you don’t have to switch windows or take your eyes off the code. I’ve even turned off Growl; I don’t need it any more. Audio makes testing more fun. :)

If there are any problems or feedback, please post a comment here.

UPDATE: Plugin instructions and zip file updated, now with Windows support. Thanks, John and Jamie.

UPDATE #2: Fixed bad path in instructions and doc fixes in zip file. (thanks, Matt)

Healthy Migrations

Jun 01, 2007

Continuing with the fixtures/test theme, I want to focus on the place where fixtures actually live – the database. Migrations are the blueprint; however, they often break and we don’t notice. You should always be able to do this:

$ rake db:migrate VERSION=0
$ rake db:migrate

I used to say “what does it matter? We’re never going back to migration 3, we’re on 156 now!” This kind of thinking showed how I didn’t understand the usefulness of migrations:

  • Setting up a new development system should not require a recent database snapshot.
  • Automated tests and build notifications are simpler when the migrations are clean.
  • cap rollback will save your life some day

Migrations are not just a historical record of your database design. They instead give you a way to build your database up from scratch, doing more than just creating a schema. You can seed data, create indexes, and make transformations.

When you first start a rails project, and everything is golden, migrations are easy. Eventually, you run into problems. It happens a lot, because we typically don’t test the entire migration sequence. For example:

A model changes somewhere, and breaks a dependent migration

Using models in migrations is a common way to seed the database, or manipulate things:

class CreateNewsSection < ActiveRecord::Migration
  def self.up
    Section.create(:name => 'news', :title=>"News")
  end

  def self.down
    Section.find_by_name('news').destroy
  end
end

If you delete or refactor the Section model later, this migration will likely break. The solution for this one is to define the model in the migration:

class CreateNewsSection < ActiveRecord::Migration
  class Section < ActiveRecord::Base; end
  def self.up
    Section.create(:name => 'news', :title=>"News")
  end
  ... etc.  

Someone on the team checks in a migration that has a bug

If the problem is trivial, they might be tempted to skip reporting it and just fix it in the database to keep things moving. Or, they may not even notice the problem, depending on when they updated. These issues can lead to inconsistencies, and tests that pass for one developer, but not another!

Developers only migrate up

Migrating down should work too. What if you need to roll back to fix something in production? Always write a sensible down method and test it. It does not have to perfectly reverse the database; it just needs to return it to a state that will enable the previous migration to run its down method. I’ve seen horrific migrations checked in like this:

  def self.down
    # no need to do this
  end

The team works from a production db snapshot based on a deployed site

This is bad, because it means the team is probably not using TDD, and is instead relying on browser interaction to develop the app. At minimum, they are blind to migration issues. Relying on an external database for development is an unwise dependency. It also complicates setup for testing.

Keep Migrations Working!

Each time you add a migration, or refactor a number of models, you should check that all the migrations are working. There are a number of solutions for doing this – the most obvious is to drop the dev db, migrate up from scratch, and see if it works.

Err the blog posted a task a while back. There’s also this often referenced snippet that works. And today, I noticed this post on Ryan’s Scraps—it looks like Rails itself now has a task to do this.

However, my favorite solution at the moment is sitting in patch #8389 (not committed at this date), which offers this bit of sugar:

# in config/environment.rb:
config.active_record.schema_format = :migrations

This setting would force rails to build the database schema from migrations, not from sql or db/schema.rb. This simple solution seems elegant, and I hope it gets committed to core!

So before you check in migrations, make sure you can run them up from scratch. And then, don’t forget to make sure your fixtures are still valid, too!

Validating Fixtures

May 23, 2007

Fixtures Are Painful

I just got back from RailsConf 2007, and it was a brain-expanding conference. I spent a lot of time talking to Rails coders about how they do testing and use fixtures. I saw some patterns emerge. Generally, people agree that fixtures are painful:

  • referencing ids in the fixtures is annoying and prone to error
  • the lack of a grouping mechanism makes selecting them harder
  • fixtures get out of date with model changes

I plan to do a few posts about fixtures in the coming weeks, to share some of what I have learned. For now, I’ll focus on validation.

Fixtures Break

The project I’m currently working on has a large number of fixtures (over 100 models). It’s become really hard to manage them all. Over time, some fixtures busted, and it became hard to diagnose random problems in tests.

We are fortunate to be working with zenspider, he’s helping us get our tests, dev process and performance in better shape. I complained a lot about invalid fixtures, and how I longed for a Rake task that could identify the broken ones. He offered up this:

namespace :db do
  namespace :fixtures do
    task :validate => :environment do
      name_map = Hash.new { |h,k| h[k] = k }

      Dir.chdir("app/models") do
        map = `grep set_table_name *.rb`.gsub(/[:\'\"]+|set_table_name/, '').split
        Hash[*map].each do |file, name|
          name_map[name] = file.sub(/\.rb$/, '')
        end
      end

      Dir["test/fixtures/*.yml"].each do |fixture|
        fixture = name_map[File.basename(fixture, ".yml")]

        begin
          klass = fixture.classify.constantize
          klass.find(:all).each do |thing|
            unless thing.valid? then
              puts "#{fixture}: id ##{thing.id} is invalid:" 
              thing.errors.full_messages.each do |msg|
                puts "   - #{msg}" 
              end
            end
          end
        rescue => e
          puts "#{fixture}: skipping: #{e.message}" 
        end
      end
    end # validate
  end # fixtures
end # db

Put it in your project Rakefile, and you can then run:

$ rake db:fixtures:validate

You will get back a list of all the fixtures that are not valid, with the validation messages.

But what about edge cases? What about “bad” form data? DHH has declared that most folks want all fixtures loaded at the start of the tests. The data in fixtures does not have to pass model validation before it is loaded into the db.

After discussing this with a number of folks, I’m starting to believe:

  • All fixtures should be valid at all times
  • Fixtures should provide stuff required by your app to run tests (your admin user, default options, etc)
  • Edge cases should be created in tests, not fixtures
  • Test invalid data by loading a good fixture and changing it
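
That last point can be sketched without Rails. In the snippet below, the User struct, its validation rule and the FIXTURES hash are all hypothetical stand-ins for a real model and its YAML fixture file; in a real test you would load the record with the usual fixture accessor instead.

```ruby
# Hypothetical stand-in for an ActiveRecord model with simple validations.
User = Struct.new(:login, :email) do
  def valid?
    !login.to_s.empty? && email.to_s.include?("@")
  end
end

# Stand-in for test/fixtures/users.yml: only VALID records live here.
FIXTURES = {
  :admin => { :login => "admin", :email => "admin@example.com" }
}

# Builds a fresh copy of a fixture record each time it is asked for.
def fixture(name)
  attrs = FIXTURES.fetch(name)
  User.new(attrs[:login], attrs[:email])
end

# The edge case is created inside the test, not stored as a broken fixture.
user = fixture(:admin)
puts user.valid?   # true
user.email = "not-an-address"
puts user.valid?   # false
```

The fixture data itself never changes, so a validation check across all fixtures stays green while the test still covers the bad-data path.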

The rake db:fixtures:validate task is helpful for keeping them valid. I plan to use it whenever I do a migration, just to ensure that nothing got broken.