A Really Simple Fix for Memory Bloat


A common problem with Rails in a production environment is memory bloat. EngineYard recently posted about this very issue on their blog (and I fear that several of my support tickets may have been related!).

For example, suppose you have some code that loads tons of records, or uses an external library that needs a lot of RAM. After the task is finished, Ruby can garbage collect those objects, but the process rarely hands the freed memory back to the operating system, so it stays about as big as it was at its peak. Anyone who’s needed to parse large XML files will know what I’m talking about.

This problem can be fixed by changing your code, so it doesn’t use too much memory at once. Or, you could run God or something similar to kill off greedy processes. But what if you just really DO need to use a lot of memory?

I recently wrote a rake task to resize a bunch of attachment_fu images. The model looked like this:

Then I had a rake task:

As you might imagine, when run, this task consumed all of the RAM on our server in no time. And when the downsample! method was called in production, bam, Mongrel bloat. Big FAIL.

The Solution

Then it dawned on me. I had been using the wonderful spawn plugin on projects to handle high-latency tasks in controllers, and to parallelize batch processing. It basically runs a block of Ruby code in a thread or a forked process, and you can optionally wait for the job to finish. So I wrapped the code like this:

And there you have it. When I ran the rake task again, it never grew beyond 50 MB in size, even after processing thousands of large images! And bye-bye to Mongrel bloat.
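If you want to see the idea without the plugin, here is a minimal sketch using plain Ruby's Process.fork. This is not the spawn plugin's API or my original rake task; in_child_process and process_batch are hypothetical names made up for illustration, and the string-building is just a stand-in for real image work. The point is only that whatever memory the block allocates belongs to the child, and goes back to the OS when the child exits.

```ruby
# Run a block in a forked child process and wait for it to finish.
# Memory allocated by the block lives in the child, so the parent
# process never grows. Needs a platform that supports fork (not Windows).
def in_child_process
  pid = Process.fork do
    ok = begin
      yield
      true
    rescue StandardError => e
      warn "child failed: #{e.message}"
      false
    end
    # exit! skips at_exit handlers inherited from the parent
    exit!(ok)
  end
  _, status = Process.wait2(pid)
  status.success?
end

# Hypothetical stand-in for the memory-hungry work (e.g. resizing images).
def process_batch(ids)
  ids.map { |_id| "x" * 100_000 }.size
end

ids = (1..80).to_a

# One short-lived child per batch of 20 ids; each child is reaped
# before the next one starts.
results = ids.each_slice(20).map do |batch|
  in_child_process { process_batch(batch) }
end
```

Each call returns true if the child exited cleanly, so the parent can tell which batches failed. In a real Rails app the child would also need its own database connection rather than sharing the parent's; as far as I understand, that is part of what spawn takes care of for you.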

Even though this specific example could be fixed in other ways (using MiniMagick for example), my point is that the same simple technique could be applied to any bloat-inducing code. In this case, it was just a couple lines of code.

Spawning processes is not without its issues, but spawn does a good job for simple things. There’s also workling, which is more flexible, and solutions that do job queuing, such as delayed_job and background-fu. These libraries are mainly designed to offload slow tasks, but as I discovered, they can be effective for memory-intensive tasks as well.