Speeding Up Data Access Times with memcached

Problem

In your deployment configuration, you have one or more servers with extra resources, in the form of RAM, that you'd like to leverage to speed up your application.

Solution

Install memcached, a distributed memory object caching system, for quick access to data such as session information or cached content. memcached can run on any server with excess RAM that you'd like to take advantage of. You run one or more instances of the memcached daemon and then set up a memcache client in your Rails application, which lets you access resources stored in a distributed cache over the network.

To set up memcached, install it on the desired servers. For example:

$ sudo apt-get install memcached

Next, you'll need to install the Ruby memcache client. Install memcache-client with:

$ sudo gem install memcache-client

With a server and the client installed, you can start the server and establish communication between it and your application. For initial testing, you can start the server-side memcached daemon with:

$ /usr/bin/memcached -vv

The -vv option tells memcached to run with verbose output, printing client commands and responses to the screen as they happen.

Once you have a server running, you need to configure memcache-client to know which servers it can connect to, as well as various other options. Rails will automatically load the memcache-client gem, if present, so you don't need to require it. Configure the client for use with your Rails application by adding the following lines (or something like them) to environment.rb:

config/environment.rb:

CACHE = MemCache.new :namespace   => 'memcache_recipe',
                     :c_threshold => 10_000,
                     :compression => true,
                     :debug       => false,
                     :readonly    => false,
                     :urlencode   => false
CACHE.servers = 'www.tupleshop.com:11211'
ActionController::Base.session_options[:expires] = 1800 # Auto-expire after 30 minutes
ActionController::Base.session_options[:cache] = CACHE

Now, from the console, you can test the basic operations of the CACHE object while watching the output of the daemon running on your server. For example:

$ ruby script/console
Loading development environment.
>> CACHE.put 'my_data', {:one => 111, :two => 222} 
=> true
>> CACHE.get 'my_data'
=> {:one=>111, :two=>222}
>> CACHE.delete 'my_data'
=> true
>> CACHE.get 'my_data'
=> nil

Now you can start taking advantage of the speed of accessing data directly from RAM. The following methods demonstrate a typical caching scenario:

class User < ActiveRecord::Base
  def self.find_by_username(username)
    user = CACHE.get "user:#{username}"
    unless user then
      user = super
      CACHE.put "user:#{username}", user
    end
    return user
  end

  def after_save
    CACHE.delete "user:#{username}"
  end
end

The find_by_username class method takes a username and checks whether a matching user record already exists in the cache. If it does, the record is stored in the local user variable. Otherwise, the method fetches the user record from the database via super, which invokes the noncaching version of find_by_username from ActiveRecord::Base. The result is put into the cache under the key "user:<username>", and the user record is returned; nil is returned if no user is found. The after_save callback ensures that data in the cache does not go stale: after a record is saved, Rails automatically invokes this method, which discards the outdated model from the cache.
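The lookup-then-invalidate flow described above can be tried without a running memcached server. The following is a minimal sketch rather than the recipe's own code: HashCache, FakeUserTable, DEMO_CACHE, USERS, cached_find_by_username, and expire_user are all hypothetical names, with a plain Ruby hash standing in for memcached and a lookup counter standing in for the database.

```ruby
# Hypothetical in-memory stand-in for the CACHE object (not memcache-client).
class HashCache
  def initialize
    @store = {}
  end

  def get(key)
    @store[key]
  end

  def put(key, value)
    @store[key] = value
    true
  end

  def delete(key)
    @store.delete(key)
    true
  end
end

# Hypothetical stand-in for the users table that counts how often it is read.
class FakeUserTable
  attr_reader :lookups

  def initialize(rows)
    @rows = rows
    @lookups = 0
  end

  def find_by_username(username)
    @lookups += 1
    @rows[username]
  end
end

DEMO_CACHE = HashCache.new
USERS = FakeUserTable.new('rorsini' => { :firstname => 'Rob', :lastname => 'Orsini' })

# Check the cache first; on a miss, read the table and cache the result.
def cached_find_by_username(username)
  key = "user:#{username}"
  user = DEMO_CACHE.get(key)
  unless user
    user = USERS.find_by_username(username)
    DEMO_CACHE.put(key, user)
  end
  user
end

# Discard the cached copy, mirroring what the after_save callback does.
def expire_user(username)
  DEMO_CACHE.delete("user:#{username}")
end

cached_find_by_username('rorsini') # miss: reads the table
cached_find_by_username('rorsini') # hit: served from the cache
expire_user('rorsini')
cached_find_by_username('rorsini') # miss again after invalidation
puts USERS.lookups                 # prints 2
```

Three calls result in only two table reads: the second call is served from the cache, and the third goes back to the table because the cached entry was expired first.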

Discussion

memcached is most commonly used to reduce database lookups in dynamic web applications. It's used on high-traffic web sites such as LiveJournal, Slashdot, Wikipedia, and others. If you are having performance problems, and you have the option of adding more RAM to your cluster or even to a single-server environment, you should experiment and decide whether memcached is worth the setup and administrative overhead.

Rails comes with memcache support integrated into the framework. For example, you can set up Rails to use memcache as your session store with the following configuration in environment.rb:

Rails::Initializer.run do |config|
  # ...
  config.action_controller.session_store = :mem_cache_store
  # ...
end

CACHE = MemCache.new :namespace => 'memcache_recipe', :readonly => false
CACHE.servers = 'www.tupleshop.com:11211'
ActionController::Base.session_options[:cache] = CACHE

The solution demonstrates how to set up customized access and storage routines within your application's model objects. If you call the solution's find_by_username method twice from the Rails console, you'll see results like this:

>> User.find_by_username('rorsini')
=> #<User:0x264d6a0 @attributes={"profile"=>"Author: Rails Cookbook", 
"username"=>"rorsini", "lastname"=>"Orsini", "firstname"=>"Rob", "id"=>"1"}>
>> User.find_by_username('rorsini')
=> #<User:0x2648420 @attributes={"profile"=>"Author: Rails Cookbook", 
"username"=>"rorsini", "id"=>"1", "firstname"=>"Rob", "lastname"=>"Orsini"}>

You get a User object each time, as expected. Watching your development logs shows what's happening with the database and memcache behind the scenes:

MemCache Get (0.017254) user:rorsini
 User Columns (0.148472) SHOW FIELDS FROM users
 User Load (0.011019) SELECT * FROM users WHERE (users.`username` = 'rorsini' ) LIMIT 1
MemCache Set (0.005070) user:rorsini
MemCache Get (0.008847) user:rorsini

As you can see, the first time find_by_username is called, a request is made to Active Record, and the database is hit. Every subsequent request for that user will be returned directly from memcache, taking significantly less time and resources.

When you're ready to test memcached in your deployment environment, you will want to run each memcached server with more specific options about network addressing and the amount of RAM that each server should allocate. The following command starts memcached as a daemon running under the root user, using 2 GB of memory, and listening on IP address 10.0.0.40, port 11211:

$ sudo /usr/bin/memcached -d -m 2048 -l 10.0.0.40 -p 11211

As you experiment to find the setup that gives you the best performance, you can decide how many servers you want to run and how much RAM each one will contribute. If you have more than one server, you configure Rails to use them all by passing an array to CACHE.servers. For example:

CACHE.servers = %w[r2.tupleshop.com:11211 c3po.tupleshop.com:11211]
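Note that %w splits its contents on whitespace only, so the entries in a server list must not be separated by commas; a comma would become part of the preceding address. A quick check in plain Ruby, using the host names from this recipe, illustrates the difference:

```ruby
# %w splits on whitespace only, so a stray comma stays attached to the word.
with_comma    = %w[r2.tupleshop.com:11211, c3po.tupleshop.com:11211]
without_comma = %w[r2.tupleshop.com:11211 c3po.tupleshop.com:11211]

p with_comma.first    # => "r2.tupleshop.com:11211,"
p without_comma.first # => "r2.tupleshop.com:11211"
```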

The best way to decide whether memcache (or any other performance strategy) is right for your application is to benchmark each option in a structured, even scientific manner. With solid data about what performs best, you can decide whether something like memcache is worth the extra administrative overhead.
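As a starting point for that kind of measurement, Ruby's standard Benchmark module can time the same lookup with and without a cache. The following is only a sketch under stated assumptions: slow_lookup, cached_lookup, LOOKUP_CACHE, and RUNS are hypothetical names, sleep stands in for a database query, and a plain hash stands in for memcached, so the absolute numbers say nothing about a real deployment.

```ruby
require 'benchmark'

# Hypothetical stand-in for a slow database query.
def slow_lookup(username)
  sleep 0.005
  { :username => username }
end

# A plain hash standing in for memcached (assumption for this sketch).
LOOKUP_CACHE = {}

# Same lookup, but served from the hash after the first call.
def cached_lookup(username)
  LOOKUP_CACHE[username] ||= slow_lookup(username)
end

RUNS = 20

uncached_time = Benchmark.realtime { RUNS.times { slow_lookup('rorsini') } }
cached_time   = Benchmark.realtime { RUNS.times { cached_lookup('rorsini') } }

puts format('uncached: %.4fs', uncached_time)
puts format('cached:   %.4fs', cached_time)
```

In a real test you would replace slow_lookup with the actual query and the hash with calls to your memcached servers, then run both variants under a realistic request mix before drawing conclusions.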

See Also