Ruby on Rails and Application Memory Consumption Patterns

Posted by Bosko Thu, 23 Mar 2006 01:40:00 GMT

First, a brief warning to my readers. This post is strictly technical in nature, so those of you who read my blog for non-technical (or less-technical) content, feel free to skip this one. If you are a developer, particularly a Ruby and/or Ruby on Rails developer, do read this; it might be useful to you or, better, save you hours of digging and analysis.

This post is about deploying Ruby on Rails web applications, or any long-lived Ruby applications, or, in fact, any long-lived applications, period. Specifically, it deals with how such applications consume memory, particularly on FreeBSD (but very likely on other UNIX-like operating systems too). It's also evidence of the great collaborative nature of open-source projects.

First, some background. I developed the peoplefeeds.com web application entirely using Ruby on Rails. It is currently deployed on two FreeBSD 5.4 servers. One is used almost exclusively for talking to other web services, such as del.icio.us or Flickr, via their respective APIs and, more generally, for crawling RSS feeds; the other is the web, database, and application server. The current setup is pretty typical: Apache with ProxyPass, lighttpd, and four Rails (FCGI) dispatchers.

In the very first version of the application, absolutely no caching was performed. This allowed me to focus on other, more important aspects of the application's design and to introduce caching later if necessary and (this one is very important) where necessary. A common problem when designing and implementing any software is premature optimization; it often adds complexity, and complexity is bad unless it is truly necessary.

One side effect of developing without caching, making observations, and only then implementing caching where necessary is that one gets to observe the effect of heavy code paths. Heavy code paths are those that tend to suck up resources (generally CPU, memory, bandwidth, etc.) and that may, and probably should, be cached. They're often not cached in the initial implementation because the author did not foresee that they would be heavy and therefore did not expect a problem.

There were a couple of such code paths in the earlier versions of peoplefeeds.com. I noticed, in particular, that running one specific action repeatedly caused the Rails dispatcher (a long-lived Ruby process) to grow in size, in both VM size and RSS (resident set size). This behavior was more or less expected, as the action in question fetched and iterated over a large database result set. What I did not expect was what followed the memory increase: nothing.

If I hammered the application with just a few requests to the heavy action, one of the dispatchers would grow from about 28M resident to about 36M resident. If I followed the heavy requests with a larger number of requests to light actions, the same dispatcher would often return to its initial 28M, in both VM size and RSS. So Ruby's Garbage Collector (GC) was certainly doing something, and I was probably doing something silly, like keeping a stale global reference around and leaking memory. However, if I hammered the application with more than a few heavy requests, the dispatcher would sometimes grow to as much as 60-65M resident before apparently "stabilizing" a little. And once it stopped growing, it never recovered to the initial 28M, in either VM size or RSS. This worried me and initially suggested that:

1. Either I had a leak, despite what I thought, or…

2. Ruby's GC was not very good.

(Neither of these was the actual problem in the end).

After some analysis of my own code with Rails' excellent 'console' script (it bootstraps your application's environment and lets you run code live), I was able to make a couple of optimizations (beyond the scope of this post). Although these slowed the rate at which the dispatcher's VM size and RSS increased, they did not change the overall observations reported above.

I then made absolutely sure that Ruby's GC was being invoked, explicitly calling GC.start from my own code early enough in case, for some reason, it needed more cycles. No go.

I was finally able to narrow down exactly which part of my application was causing memory consumption to grow and, because of this, became absolutely convinced that a stale reference and a lack of GC were not, in fact, the problem. I was looping over my large result set and, for each element, allocating a temporary string with the element inserted in the middle of it. Basically, I was iterating and calling a helper that returned a dynamically constructed string on each iteration. Reducing the size of the constructed string significantly reduced the dispatcher's memory consumption.
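A rough C analogue of that allocation pattern might look like the sketch below. It is only an illustration under stated assumptions: `wrap_item` and `render_rows` are hypothetical names, the `<li>` markup is made up, and the deferred batch of `free()` calls stands in for Ruby's mark-and-sweep GC, which reclaims unreferenced strings only when a collection runs rather than at the end of each iteration. The point is that every row costs one small, sub-page allocation, and all of them can be live at the same time.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the view helper: returns a newly allocated
 * string with `item` inserted between a fixed prefix and suffix, and
 * reports the number of bytes allocated through `alloc_len`. */
static char *wrap_item(const char *item, size_t *alloc_len)
{
    static const char prefix[] = "<li>";
    static const char suffix[] = "</li>";
    size_t len = strlen(prefix) + strlen(item) + strlen(suffix) + 1;
    char *s = malloc(len);
    if (s == NULL)
        return NULL;
    snprintf(s, len, "%s%s%s", prefix, item, suffix);
    *alloc_len = len;
    return s;
}

/* Mimics a garbage-collected runtime: each per-row temporary stays
 * allocated until a later "collection" frees them all at once.
 * Returns the total bytes requested for the temporaries, all of which
 * are live simultaneously just before the collection. */
size_t render_rows(const char *const *rows, size_t nrows)
{
    assert(rows != NULL || nrows == 0);
    char **garbage = malloc((nrows ? nrows : 1) * sizeof *garbage);
    if (garbage == NULL)
        return 0;

    size_t live = 0, count = 0;
    for (size_t i = 0; i < nrows; i++) {
        size_t len = 0;
        char *s = wrap_item(rows[i], &len);
        if (s == NULL)
            break;
        /* Stands in for a Ruby string the program no longer uses
         * but the GC has not yet reclaimed. */
        garbage[count++] = s;
        live += len;
    }

    for (size_t i = 0; i < count; i++) /* the "collection" */
        free(garbage[i]);
    free(garbage);
    return live;
}
```

With rows `{"a", "bc"}`, the two temporaries take 11 and 12 bytes, so `render_rows` returns 23. Shrinking the prefix or suffix shrinks that peak, which loosely mirrors the observation that a smaller constructed string meant a smaller dispatcher.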

But why was the memory not being freed once the action completed?

At this point, my Java-loving friends argued that I should be using a well-tested, well-rounded, well-engineered, and of course mature Java framework because the Ruby runtime environment just sucked. Not so.

I ended up peeking into #ruby-lang on irc.freenode.org and got the opportunity to talk to a few people there (including Eivind Eklund, fellow FreeBSD committer), as well as Yohanes Santoso. Yohanes pointed me at this, and from there a discussion and analysis ensued. Finally, I spoke with Poul-Henning Kamp (also from the FreeBSD Project, and author of the current userland malloc() implementation), who was a major help in shedding light on the situation. Below is the full gory explanation, for your benefit and reading enjoyment.

Using a slightly modified version of Yohanes’ leak.c code (see the above link beside his name), I was able to simulate the following scenario, completely in bare C code:

1. Loop and allocate a whole bunch of fairly large (but smaller than a page) buffers, and dirty them (write something to them);

2. Allocate a 1-byte buffer last, and dirty it;

3. At this point, the application has consumed ~225M of address space as well as RAM (~225M in both VM size and RSS). Now free all the buffers allocated in (1) above;

4. Observe that the application still has a VM size of ~225M and an RSS of ~225M, even though almost all the buffers (pages) have been freed; only the last page, occupied by the 1-byte buffer, is still in use;

5. If you free the 1-byte buffer, everything gets reclaimed immediately;

6. If you don't free the 1-byte buffer, subsequent allocations tend to dip into the space the process already owns, so neither VM size nor RSS grows much beyond this point unless, of course, you make another huge sweep of new allocations;

7. If you exert enough memory demand/pressure on the system, FreeBSD pages out all or part of the first application's consumed memory (VM size remains the same, but RSS shrinks, in some cases considerably). These pages never get paged back in unless a request is made for more memory (in FreeBSD, a couple of different things can actually happen when paging these back in; more on this below).
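Steps (1) through (3) can be sketched in C roughly as follows. This is not Yohanes' leak.c, which isn't reproduced here; the function name and the sizes are assumptions (3000-byte buffers to stay under an assumed 4 KB page, and 75,000 of them for roughly 225 MB). The experiment relied on the 1-byte buffer landing at the end of the heap, which no allocator guarantees, so whether VM size and RSS stay high afterwards has to be checked with ps or top rather than inferred from the code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed sizes: each buffer is "fairly large" but smaller than a 4 KB
 * page; 75,000 of them add up to roughly 225 MB. */
enum { DEMO_BUF_SIZE = 3000, DEMO_NBUFS = 75000 };

/* Steps (1) to (3): allocate and dirty `nbufs` buffers of `bufsize` bytes,
 * allocate and dirty a 1-byte buffer last, then free every buffer except
 * the 1-byte one. Returns that 1-byte "pin" (free it for step 5), or NULL
 * if any allocation failed, in which case everything has been freed. */
char *allocate_then_free_all_but_pin(size_t nbufs, size_t bufsize)
{
    assert(nbufs > 0 && bufsize > 0);
    char **bufs = malloc(nbufs * sizeof *bufs);
    if (bufs == NULL)
        return NULL;

    for (size_t i = 0; i < nbufs; i++) {
        bufs[i] = malloc(bufsize);
        if (bufs[i] == NULL) {
            while (i-- > 0)
                free(bufs[i]);
            free(bufs);
            return NULL;
        }
        memset(bufs[i], 'x', bufsize); /* dirty it so its pages are really touched */
    }

    char *pin = malloc(1);             /* step (2): allocated after everything else */
    if (pin != NULL)
        pin[0] = 'p';

    for (size_t i = 0; i < nbufs; i++) /* step (3): free all the large buffers */
        free(bufs[i]);
    free(bufs);
    return pin;
}
```

Calling `allocate_then_free_all_but_pin(DEMO_NBUFS, DEMO_BUF_SIZE)` and pausing before freeing the result is the point at which to observe step (4); calling `free()` on the returned pin is step (5).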

The conclusion here is that it's certainly not Ruby's fault that the dispatchers weren't shrinking in RSS and VM size. This is a normal side effect of how malloc() works: roughly speaking, the heap can only shrink from its end, so a single live allocation near the top (like the 1-byte buffer above) keeps all the freed pages below it mapped into the process. I'm told it's the same (or similar) in Linux. Ruby's GC is also likely not to blame. In fact, it is very likely that most dynamic-language (and GC-enabled) applications have a similar memory consumption pattern following large memory spikes. This somewhat contradicts what you may have been told about typical Ruby on Rails dispatcher memory consumption habits (if you see a "bubble up," you shouldn't always be concerned).

Finally, with respect to the additional note in point (7) above, there is a documented option for FreeBSD's malloc(3) called "H"; here is what the manual page says:

H Pass a hint to the kernel about pages unused by the allocation functions. This will help performance if the system is paging excessively. This option is off by default.

If you symlink /etc/malloc.conf to "H" (at least), the option will be turned on. When it is on, FreeBSD's malloc(3) tells the kernel which pages are really unused, so that if they get paged out and the process demands memory again, the kernel does not page those unused pages back in but merely substitutes new free pages for them. This avoids unnecessary IO, which might help if your web server is already tight on RAM. The tradeoff is more system call overhead, due to the increased number of system calls required to remap the pages; in some cases that price is worth paying.
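To make the hint concrete, here is a minimal sketch of what telling the kernel that dirty pages are unused looks like from C. As far as I know, the H option does this internally via madvise(2) with MADV_FREE, but treat that as an assumption; `dirty_then_hint_unused` is a hypothetical helper, not malloc's code. It maps anonymous memory directly, dirties it, and passes the hint, falling back to MADV_DONTNEED if MADV_FREE is unavailable or rejected (on Linux, MADV_DONTNEED discards a private anonymous mapping's contents immediately rather than lazily).

```c
#define _DEFAULT_SOURCE /* expose madvise() and MAP_ANON on glibc; harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map `npages` pages of anonymous memory, dirty them,
 * then tell the kernel their contents are no longer needed, so it may
 * reclaim them instead of writing them out to swap. Returns 0 on success. */
int dirty_then_hint_unused(size_t npages)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    assert(pagesize > 0 && npages > 0);
    size_t len = npages * (size_t)pagesize;

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, 'x', len); /* dirty every page */

    int rc = -1;
#ifdef MADV_FREE
    rc = madvise(p, len, MADV_FREE);     /* lazy: reclaimed only under memory pressure */
#endif
    if (rc != 0)
        rc = madvise(p, len, MADV_DONTNEED);

    if (munmap(p, len) != 0)
        rc = -1;
    return rc;
}
```

The sketch uses mmap rather than malloc because madvise requires a page-aligned address, which an anonymous mapping always provides.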