Logging the efficiency of Apache's *_cache modules
Recently I had to set up several Apache web servers with an in-RAM cache for frequently accessed content, using Apache's mem_cache module. All of them were front-end servers for the application servers sitting behind them.
Even though in some cases the content expiration time was as low as 30 seconds, it worked great: it reduced concurrency for the most costly requests, gave a significant performance improvement and heavily reduced the number of requests reaching the application servers. The problem comes when you want to log cache hits and misses in order to analyze cache efficiency.
The *_cache modules don't expose any information to Apache through environment variables, nor do they provide any logging facility of their own.
Searching on Google, I even found people who had modified the mem_cache source code to add logging to it. In my opinion, however, that's not the way to go. If you are working with separate servers (separate Apache instances, not necessarily on different machines, although that's better), there is a better solution:
1. Enable mod_headers in the application server and add an HTTP header recording the "cost" of processing the request, that is, the time elapsed from when the server received the request until the response headers are sent on the wire (%D), together with the time the request was received (%t). mod_headers writes both values in microseconds, rendered as "D=<microseconds> t=<microseconds since the epoch>". Add the following directive:
Header set CacheMiss "%D %t"
2. In the front-end server, add another header (so mod_headers is needed there too) and create a dedicated log file for the cache performance data:
Header set CacheHit "%D %t"
LogFormat "%t \"%r\" %X \"%{CacheMiss}o\" \"%{CacheHit}o\" %>s" performance
CustomLog /var/log/apache2/performance.log performance
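The LogFormat directive defines a new log format, named "performance", whose entries contain:
- %t: the time the request was received;
- %r: the first line of the request (method, URL and protocol);
- %X: the connection status when the response was completed (X = aborted, + = may be kept alive, - = closed);
- %{CacheMiss}o: the value of the CacheMiss response header set by the application server;
- %{CacheHit}o: the value of the CacheHit response header set by the front-end server;
- %>s: the final HTTP status code.
The CustomLog directive then writes entries in that format to /var/log/apache2/performance.log.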
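Post-processing performance.log starts with splitting each line back into the fields of the "performance" format. Below is a minimal Python sketch, assuming the exact LogFormat shown above and Apache's default escaping of embedded quotes inside %r as \". The names PERFORMANCE_RE and parse_performance_line are hypothetical helpers, not part of any Apache tool.

```python
import re
from typing import Optional

# Assumed format of one performance.log line, produced by:
#   LogFormat "%t \"%r\" %X \"%{CacheMiss}o\" \"%{CacheHit}o\" %>s" performance
# %r escapes embedded quotes as \", hence the (?:[^"\\]|\\.)* pattern.
PERFORMANCE_RE = re.compile(
    r'^\[(?P<time>[^\]]+)\] '           # %t   -> [10/Oct/2010:13:55:36 +0200]
    r'"(?P<request>(?:[^"\\]|\\.)*)" '  # %r   -> "GET /news HTTP/1.1"
    r'(?P<conn>[X+-]) '                 # %X   -> X aborted, + keep-alive, - closed
    r'"(?P<cache_miss>[^"]*)" '         # CacheMiss header, "-" when absent
    r'"(?P<cache_hit>[^"]*)" '          # CacheHit header, "-" when absent
    r'(?P<status>\d{3})$'               # %>s  -> final status code
)


def parse_performance_line(line: str) -> Optional[dict]:
    """Split one performance.log line into named fields; None if it does not match."""
    m = PERFORMANCE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    fields = m.groupdict()
    fields["status"] = int(fields["status"])  # keep the status numeric
    return fields
```

The header columns are returned as raw strings on purpose, so the timing values can be interpreted separately.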
With this log, it is quite easy to analyze how many of the total requests are cache hits and misses, how long a cache hit takes to process compared with a cache miss, and so on. I highly recommend enabling this log some days or weeks before enabling the cache, so you can evaluate cache performance against historical data and better understand the impact of every configuration change. The method is also valid for measuring request processing cost on front-end and back-end servers, whether or not you use a cache.
Enjoy mem_cache and logging!
PS: I encourage you to read the Apache documentation for mod_log_config, mod_cache, mod_mem_cache and mod_headers. You may also need to look at mod_proxy and mod_proxy_http.
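Once the CacheMiss and CacheHit values of each request are at hand, hits can be separated from misses and their costs compared. The classification rule in this Python sketch is an assumption: it supposes mod_cache stores the back-end's CacheMiss header along with the cached response, so a hit replays the t= of the original back-end request, while on a miss the back-end's t= falls inside the front-end's [t, t + D] window. It also assumes the front-end sets CacheHit freshly on every response, including those served from the cache, and that both servers share a clock; skew_us is a hypothetical allowance for clock skew between machines. parse_timing and summarize are illustrative names.

```python
import re
from statistics import mean
from typing import Iterable, Optional, Tuple

# mod_headers renders "%D %t" as "D=<duration in µs> t=<request start, µs since epoch>".
TIMING_RE = re.compile(r'([Dt])=(\d+)')


def parse_timing(value: str) -> Optional[dict]:
    """Parse a CacheMiss/CacheHit header value; None for "-" (header absent)."""
    fields = {k: int(v) for k, v in TIMING_RE.findall(value)}
    return fields if "D" in fields and "t" in fields else None


def summarize(pairs: Iterable[Tuple[str, str]], skew_us: int = 0) -> dict:
    """pairs: (CacheMiss value, CacheHit value) for each logged request.

    Assumed rule: a miss when the back-end's t= lies inside the front-end's
    [t - skew_us, t + D + skew_us] window (the back-end handled this very
    request); otherwise the CacheMiss header was replayed from the cache.
    """
    hit_costs, miss_costs, unknown = [], [], 0
    for cache_miss, cache_hit in pairs:
        backend, frontend = parse_timing(cache_miss), parse_timing(cache_hit)
        if backend is None or frontend is None:
            unknown += 1  # e.g. request not proxied, or a header is missing
            continue
        start = frontend["t"] - skew_us
        end = frontend["t"] + frontend["D"] + skew_us
        if start <= backend["t"] <= end:
            miss_costs.append(frontend["D"])
        else:
            hit_costs.append(frontend["D"])
    classified = len(hit_costs) + len(miss_costs)
    return {
        "hits": len(hit_costs),
        "misses": len(miss_costs),
        "unknown": unknown,
        "hit_ratio": len(hit_costs) / classified if classified else 0.0,
        "avg_hit_us": mean(hit_costs) if hit_costs else None,
        "avg_miss_us": mean(miss_costs) if miss_costs else None,
    }
```

Feeding it the CacheMiss/CacheHit column pairs from performance.log yields the hit ratio plus avg_hit_us and avg_miss_us, the average front-end cost of a hit and of a miss in microseconds. Requests lacking either header are counted as unknown rather than guessed.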




6 comments:
Have you tried Cherokee, Josep?
I'd like to know your opinion of it.
http://www.cherokee-project.com/
cheers
Ludo
Ludo,
thanks for your comment. I didn't know about the Cherokee Project. However, what does Cherokee offer that nginx or lighttpd don't?
Thanks again.
Josep.
From what I've gathered, what Cherokee offers is great efficiency and performance thanks to its architecture. Cherokee is extremely modular; even the "file reader" is a plug-in, so it has less overhead than the others. The interview the podcast "el geek herrante" did with the project leader two weeks ago is very illustrative. That's why I wanted feedback from an expert like you ;-)
Hi Josep,
thanks for this interesting article. We have similar requirements at the moment, and I wonder whether simply logging the existing "Age" HTTP header field (\"%{Age}o\") would give the same result.
I mean, this header field already exists (at least in our Apache configuration) and already tells us whether the request was served from the cache. If it is not set in the response, or if it is "0", we know the request was served by the backend; if it is >0, then obviously the cache has served it.
What do you think?
Hi vdietmar,
thanks a lot for posting! The problem with the Age header (or with other HTTP headers) is that an application can set it, for example to control the content's time to live in client caches (i.e. browsers). Some application servers, depending on their configuration, may set it too.
On the other hand, I wanted performance data to see the real benefit of the cache, and the Age header doesn't give me that. With the method described here you know, with total confidence, not only the hit/miss ratio but also the cost of a hit and the cost of a miss, which is very important for judging whether the resources spent on the cache are worth it.
Thanks!
Hi Josep,
hmm, yes and no.
In our case it really is the Apache (frontend) server that sets the "Age" field in the response header when it took the response from its cache.
The (backend) Tomcat server does not use it. It just tells Apache what the maximum cache lifetime should be (max-age). And %D is already part of the logs.
So if we just want to know if the cache is working or not and how much request load was taken from the backend servers, this should work fine.
If one actually wanted to look more closely into cache efficiency (e.g. with regard to resource and cost calculations), then obviously your way gives us more information.
Dietmar