In the early 2000s, LiveJournal dominated the blogging world. While LiveJournal is known as a pioneer of online communities, many may not be aware that its creators are also responsible for one of the most important caching technologies currently powering the web: memcached (pronounced “mem-cache-dee”). Memcached is the caching engine behind Facebook, Twitter, and, a favorite at 10up, WordPress.com. Even though memcached is a stable and mature caching system, it has subtle nuances that can make it difficult to tame. Given that our work at 10up frequently involves development within memcached environments, we have become quite familiar with the ins and outs of the tool. In this article, I share some of my insights, cautions, and thoughts on developing in a memcached environment.
A Note on Memcached in WordPress
The official Memcached website offers the best definition for what memcached is:
Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.
By storing frequently accessed data in memory, as opposed to on disk or in the database, memcached offers faster access to data and puts less strain on the database. Out of the box, WordPress does not use memcached. In order to take advantage of memcached, you need to install the necessary dependencies on your server, as well as a caching add-on that supports memcached, such as the Memcached Object Cache drop-in. The Memcached Object Cache drop-in causes WordPress to load a memcached-compatible version of the WP_Object_Cache class instead of the built-in version of the class, which only offers runtime object caching. Once these elements are in place, WordPress will use memcached to store objects.
It is worth noting that the nomenclature in the memcached world can be very confusing. The “in-memory data storage daemon” is known as memcached. The term “memcache” (without a “d”) tends to refer to the original memcached PECL extension, which is the more widely deployed and used memcached library. To further confuse matters, Digg released a PECL extension called “memcached” in an effort to take advantage of more of the memcached features (e.g., multiget, multiset, and check and set). The WordPress Memcached Object Cache uses the memcache (no “d”) extension. Scott Taylor has recently begun work on a new Memcached Object Cache that uses the memcached (+d) PECL extension (note that this is VERY beta at the time of this writing). For a comparison of the two PECL extensions, please see Brian Moon’s article on the differences between the two extensions, as well as the PHP Client Comparison table on Google Code.
Assume That Cache Misses Will Occur
Memcached is not perfect, nor is it meant to be. In a highly distributed environment with numerous memcached instances, it is likely that, when querying memcached, the requested key will occasionally not be located. This can happen for a number of reasons, but is primarily due to one of the following: the value has expired, the value has been evicted, or the value cannot be found on the instance queried.
When adding or setting values in memcached, you can optionally set an expiration time. If that time has passed when you query for the key, the key will not be found. Interestingly, the key/value pair may still exist in the cache, but because the expiration time has passed, no value will be returned. The data may also have been evicted from the cache. Memcached is a Least Recently Used (LRU) cache, which attempts to fight “cache pollution” by removing objects from the cache that have not been used recently. This allows for more space in the cache for items that are “hotter,” or used more frequently. As such, when requesting an object from cache, it may not be found due to an eviction by the LRU algorithm. Finally, while memcached attempts to get the key from the memcached instance it was stored on, it does not always succeed (e.g., server outage, issue with determining which machine stored the key). In such situations, the value will not be located.
All that detail is to say that memcached environments are built to produce cache misses, and it is the job of the developer to prepare for that event. When getting data from memcached, it is important to verify that good data has been received and, in the event that it has not, to initiate a “fallback” plan. I briefly discussed some fallback plans during my WordCamp San Diego 2012 talk. Cache misses occur more frequently than one would expect, so it is important to plan for them.
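The basic check-then-fallback pattern looks something like the following. This is a minimal sketch using the standard WordPress `wp_cache_get()`/`wp_cache_set()` functions; the function name, the “top-posts” group, and the query arguments are hypothetical.

```php
function get_most_viewed_articles() {
	$articles = wp_cache_get( 'most-viewed', 'top-posts' );

	// wp_cache_get() returns false on a cache miss, so always
	// check the result before trusting it.
	if ( false === $articles ) {
		// Fallback plan: regenerate the data and re-prime the cache.
		$query = new WP_Query( array(
			'posts_per_page' => 10,
			'orderby'        => 'comment_count',
		) );

		$articles = $query->posts;

		// Cache for 5 minutes (expiration is a hypothetical choice).
		wp_cache_set( 'most-viewed', $articles, 'top-posts', 300 );
	}

	return $articles;
}
```

Note that the fallback here regenerates the data on the fly, which, as discussed later, can itself be dangerous under high concurrency.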
Groups Are Not a Memcached Concept
In WordPress, the wp_cache_* functions accept not only a “key” argument, but also an optional “group” argument. The value of the group argument is prepended to the key when the object is stored in memcached. For example, if you were to assign an object to cache with
wp_cache_add( 'most-viewed', $most_viewed_articles, 'top-posts' ); for a site with $blog_id of 4293, the actual key would be “4293:top-posts:most-viewed”. At best, the group argument provides namespacing for the key. It is not a core memcached feature and is not stored any differently in the cache based on the group value.
This can be misleading because it suggests that if a value is stored with a group, the whole group can be easily invalidated. Unfortunately, this is not the case (note that Scott Taylor has an excellent plugin, Johnny Cache, that can invalidate by group). My assumption is that this group argument was added to the WordPress memcached backend for compatibility with WordPress’
WP_Object_Cache class for runtime caching, which does support grouping cached values. All that said, you can leverage an “incrementor” as part of a key to invalidate large parts of the cache, but that is very different from invalidating by group.
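The incrementor technique can be sketched as follows: an integer stored in the cache is mixed into every key, and bumping that integer effectively invalidates every key built with the old value (the abandoned keys simply age out via the LRU). The function name and group here are hypothetical, not part of any WordPress or memcached API.

```php
function get_cache_incrementor() {
	$incrementor = wp_cache_get( 'incrementor', 'top-posts' );

	// Seed the incrementor on first use (or after eviction).
	if ( false === $incrementor ) {
		$incrementor = time();
		wp_cache_set( 'incrementor', $incrementor, 'top-posts' );
	}

	return $incrementor;
}

// Keys end up looking like "most-viewed-1345678901".
$key   = 'most-viewed-' . get_cache_incrementor();
$value = wp_cache_get( $key, 'top-posts' );

// To "invalidate" everything built with the incrementor, set a new
// value; old keys are never requested again.
wp_cache_set( 'incrementor', time(), 'top-posts' );
```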
Invalidation is Difficult
When developing an application that uses a cache (regardless of type or implementation), determining how the cache will be invalidated can be particularly tricky. Given that memcached does not offer flushing of groups, designing the invalidation routines for the caching layer presents some challenges. As such, the first thing I consider when working with caching data is how I will eventually invalidate that data. If I can determine what I will need to do to invalidate the data, it often simplifies my strategy for generating that data in the first place. When reviewing code and reading about caching, it seems that invalidation is often an afterthought. After a system for caching data is built, it can be difficult to figure out how to invalidate the data, and therefore it gets neglected.
I have found that asking myself two questions can help with this problem: 1) When should the data be generated? 2) When should the data be invalidated? By doing so, I tend to orient myself more to the question of how to “refresh” the data as opposed to how to “generate” the data in the first place. There is a subtle but meaningful difference in that statement. When I think of “refreshing” the data, I consider how I transition from one version of the data to the next version of the data. If I think about “generating” the data, I tend to only think about obtaining the data, and I neglect the importance of the shift between sets of data. This has recently led me to release a plugin, A Fresher Cache, that allows for easily calling functions that cause this data to be “freshened.” This tool is a major time saver when developing cached components of an application, and it encourages and rewards me for thinking through how to invalidate and refresh caches.
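In practice, “refreshing” often means rebuilding the cached data proactively on an event where the underlying data changes, so front-end requests rarely see a miss. As a hedged sketch: `refresh_most_viewed()` and `expensive_most_viewed_query()` are hypothetical names, while `add_action()`, `save_post`, and `wp_cache_set()` are standard WordPress.

```php
// Rebuild the cached data whenever a post is saved, rather than
// waiting for a front-end cache miss to trigger regeneration.
function refresh_most_viewed() {
	$articles = expensive_most_viewed_query(); // hypothetical

	// An expiration of 0 means "no expiration"; the data lives
	// until it is refreshed again or evicted.
	wp_cache_set( 'most-viewed', $articles, 'top-posts', 0 );
}
add_action( 'save_post', 'refresh_most_viewed' );
```

Wiring the refresh to an admin-side event like `save_post` also sidesteps some of the concurrency concerns discussed in the next section, since saves happen far less often than page views.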
Locking is Nearly Impossible
In the event that you are using a distributed memcached environment, it is likely because you are dealing with a high-traffic site. Inherent in working with high-traffic sites is dealing with race conditions. A race condition, or the “stampeding herd” problem, occurs when two different requests compete for a shared resource. This is typically more of an issue for developers dealing with multi-threaded applications, but the general principle applies for web developers dealing with large amounts of traffic. For example, imagine that you have an expensive query that generates a complex data table. This table is generated using external HTTP requests, as well as intensive database queries. The data, once generated, is cached in memcached; however, as mentioned before, you need to prepare for it magically being evicted from the cache. If the value is not in the cache, you may write your application to regenerate the data on the fly. The problem with this is that in high-concurrency environments, multiple requests that generate the same data may occur. This can increase the load on the server and cause performance issues for the site, or worse yet, bring it crashing down.
One technique for dealing with the stampeding herd is to use a lock. A lock signals to the application that the data is being generated and “locks” the application from trying to generate that data again with a future request. The memcached docs refer to this as the “ghetto locking” method for stopping the stampeding herd. The issue with high concurrency is that locks in a distributed memcached environment may not be set quickly enough for the application to be aware that the lock is set and prevent future requests for the data. In a single memcached server environment this is not an issue. When you scale to many memcached servers, it immediately becomes an issue because the requests may come in so quickly that the network of memcached machines is not aware of the lock.
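A cache-based lock is usually built on the add operation, since `wp_cache_add()` only succeeds when the key does not already exist: the first request to add the lock “wins” and regenerates the data, while everyone else takes a cheaper path. This is a sketch under the caveats above (it is not airtight under high concurrency); `expensive_most_viewed_query()` and the `most_viewed_backup` option are hypothetical.

```php
$data = wp_cache_get( 'most-viewed', 'top-posts' );

if ( false === $data ) {
	// wp_cache_add() returns false if the key already exists, so only
	// one request should win the lock. The 60-second expiration keeps
	// a crashed request from holding the lock forever.
	if ( wp_cache_add( 'most-viewed-lock', 1, 'top-posts', 60 ) ) {
		// We hold the lock: rebuild the data, then release.
		$data = expensive_most_viewed_query(); // hypothetical
		wp_cache_set( 'most-viewed', $data, 'top-posts', 300 );
		wp_cache_delete( 'most-viewed-lock', 'top-posts' );
	} else {
		// Someone else is rebuilding; serve a cheaper, possibly
		// stale backup copy instead of piling on.
		$data = get_option( 'most_viewed_backup' ); // hypothetical
	}
}
```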
The best defense against this issue is to attempt to generate data on scheduled events or admin events (i.e., situations where concurrency is unlikely) and to program fallbacks that are less intensive than the initial data request (I have argued for storing a backup copy of the data in some situations elsewhere). Another way to handle the issue is to create a lock that uses a non-memcached resource, such as a MySQL database. MySQL has support for locking, and if the lock is set and read from a single database server in a multi-database server environment, the lock should be stable. I would recommend attempting to develop the application without a need for locking first and only dealing with locking issues if absolutely necessary.
1 MB Object Size Limit
It is important to note that memcached objects are limited to 1 MB in size. Additionally, object keys are limited to 250 bytes. It becomes really important to understand these limits, especially when developing using the transient functions in WordPress. Since the transient functions use WordPress’s object cache if it has been enabled (which is where memcached plugs in), storing data or keys that exceed these limits can cause issues that are very difficult to debug. Transients that are stored in the database have an approximately 4 GB size limit and a 64-character key length limit (although 19 characters are reserved for the “_transient_timeout_” prefix, reducing the key length to 45 characters). As a result, I recommend always thinking about caching objects in WordPress with a 1 MB size limit and a 45-character key limit, regardless of whether the transient or wp_cache_* functions are being used. This should guarantee maximum compatibility amongst systems.
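One way to catch oversized values before they silently fail is to measure the serialized size before caching. This is a hedged sketch: `safe_cache_set()` and the limit constant are hypothetical helpers, while `maybe_serialize()` and `wp_cache_set()` are standard WordPress functions; the serialized length is only an approximation of what the object cache will actually store.

```php
// 1 MB, the conventional memcached value size limit.
define( 'HYPOTHETICAL_CACHE_SIZE_LIMIT', 1024 * 1024 );

function safe_cache_set( $key, $data, $group = '', $expire = 0 ) {
	// Serializing approximates the stored payload size.
	if ( strlen( maybe_serialize( $data ) ) >= HYPOTHETICAL_CACHE_SIZE_LIMIT ) {
		// Too large to store reliably; let the caller fall back.
		return false;
	}

	return wp_cache_set( $key, $data, $group, $expire );
}
```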
Memcached, in all its power and glory, can be a pest at times. I hope by explaining some of the issues I have encountered and some of the strategies to avoid these pitfalls, you can avoid some troubles in the near future!