EhCache vs Memcached caching for JEE Applications
Caching is a building block of modern JEE applications. To handle any significant load you will need a 2nd level cache. Let's clear up a common confusion: a 2nd level cache is not the same as the cache implemented by databases. Those are 1st level caches, and they usually cache data blocks, not specific objects. A 2nd level cache is a cache implemented outside of the database.
In the JEE world caching is specified by the JCache JSR (JSR 107). The two major open source providers of caching are TreeCache from JBoss and EhCache. Both allow the creation of a distributed cache and provide built-in mechanisms to deal with dirty data. Both libraries also work with Hibernate as 2nd level cache providers.
I have had good experiences with both libraries as an architect. The overhead of both libraries is small, and they both implement JCache, which makes them interchangeable. Both EhCache and TreeCache run in the same JVM as the application server. Surprisingly, in the last few months I have encountered more and more JEE applications using Memcached instead of EhCache or TreeCache.
Memcached is a C program that acts as a 2nd level cache. It is the de facto standard for caching in the LAMP stack. Wikipedia and Facebook base their entire caching infrastructure on Memcached. Besides that, Memcached has APIs for PHP, Perl, Ruby and of course Java. Memcached can also work as a cache inside MySQL to keep tables in memory. The Java APIs are pure Java APIs. Memcached also has a distribution mechanism implemented via smart hashing of keys, performed by the client.
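To make the hashing idea concrete, here is a minimal sketch of how a client could map keys to servers with a consistent-hash ring. The class name `ServerRing`, the server addresses, and the use of CRC32 are assumptions made for illustration; this is not the scheme of any particular Memcached client library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical client-side ring: maps each key to one of several cache servers.
class ServerRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ServerRing(List<String> servers, int pointsPerServer) {
        for (String server : servers) {
            // Several points per server spread keys more evenly around the ring.
            for (int i = 0; i < pointsPerServer; i++) {
                ring.put(hash(server + "#" + i), server);
            }
        }
    }

    // Returns the server owning the first ring point at or after the key's hash,
    // wrapping around to the first point when the hash is past the last one.
    String serverFor(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        Long point = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(point);
    }

    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    public static void main(String[] args) {
        // Illustrative addresses only.
        ServerRing ring = new ServerRing(
                Arrays.asList("cache1:11211", "cache2:11211", "cache3:11211"), 100);
        for (String key : Arrays.asList("person:1", "person:2", "person:3")) {
            System.out.println(key + " -> " + ring.serverFor(key));
        }
    }
}
```

Because the mapping depends only on the key and the server list, every client configured with the same list picks the same server for a given key, without the servers having to coordinate.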
I can understand using Memcached in a JEE application to access Memcached infrastructure that many LAMP applications are using. But, to use Memcached as the first option for a JEE application just seems strange to me for the following reasons:
- The Memcached Java APIs are not JCache compliant, which means you can't change your mind about them later on without paying a major price.
- The application server needs to connect to the Memcached process via socket calls. This requires the serialization and transmission of objects between processes, which is avoided when the cache lives inside the application server.
- The JCache solutions just appear to be much faster with Java.
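The serialization point in the list above can be illustrated with a small sketch. It assumes standard Java serialization and a hypothetical `Person` class; it does not talk to Memcached, it only shows the extra work any out-of-process cache implies: the object becomes bytes on a put, and a get rebuilds a copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationRoundTrip {
    // Hypothetical cached value; it must be Serializable to leave the JVM.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
    }

    // What a put to an out-of-process cache has to do before sending anything.
    static byte[] toBytes(Object value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
        return buffer.toByteArray();
    }

    // What a get has to do after the bytes come back.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialization failed", e);
        }
    }

    public static void main(String[] args) {
        Person original = new Person("Ada", 36);
        byte[] wire = toBytes(original);
        Person copy = (Person) fromBytes(wire);
        System.out.println("bytes to transmit: " + wire.length);
        // An in-JVM cache can hand back the stored reference; a round trip cannot.
        System.out.println("same reference after round trip: " + (copy == original));
    }
}
```

A cache that holds object references in the same JVM can skip both steps when it returns a stored value, which is the saving the second bullet refers to.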
To see if my last point was valid I ran a small test. On a dual-core Linux box with 4 GB of memory, I created a small Java application that tested both options. The application is just a simple engine that puts and gets objects from the cache. The objects that it puts are instances of a simple class representing personal information.
Memcached was running on the same machine as the Java program doing the testing. I started by testing 10000 puts and 10000 gets. Then I moved to 20000 puts and 20000 gets. I kept going in the same way until I got to 100000 puts and gets. I measured the average time in milliseconds to do 10000 puts and 10000 gets from the cache. I compared the results of the Memcached test against running the same test for EhCache. The object I was placing in the cache was exactly the same in both cases, and the engine did not know what type of cache it was using because it worked through an abstraction. I used the spymemcached Java API for Memcached.
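The shape of such a test engine might look like the sketch below. This is not the original test code: the names `SimpleCache`, `InProcessCache` and `PersonalInfo` are hypothetical, and the only implementation included is a `HashMap` stand-in so that the example stays self-contained. A real run would plug in adapters that wrap EhCache and spymemcached behind the same interface.

```java
import java.util.HashMap;
import java.util.Map;

class CacheBenchmark {
    // The abstraction: the engine only ever sees this interface.
    interface SimpleCache {
        void put(String key, Object value);
        Object get(String key);
    }

    // Stand-in implementation; an assumption made to keep the sketch runnable.
    static class InProcessCache implements SimpleCache {
        private final Map<String, Object> store = new HashMap<>();
        @Override public void put(String key, Object value) { store.put(key, value); }
        @Override public Object get(String key) { return store.get(key); }
    }

    // Hypothetical value class representing personal information.
    static class PersonalInfo {
        final String name;
        final String email;
        final int age;
        PersonalInfo(String name, String email, int age) {
            this.name = name;
            this.email = email;
            this.age = age;
        }
    }

    // Runs `operations` puts, then `operations` gets, and returns
    // {putMs, getMs}, each normalized to milliseconds per 10000 operations.
    static double[] run(SimpleCache cache, int operations) {
        long start = System.nanoTime();
        for (int i = 0; i < operations; i++) {
            cache.put("person:" + i,
                    new PersonalInfo("Name " + i, "user" + i + "@example.com", 20 + i % 50));
        }
        long afterPuts = System.nanoTime();
        int hits = 0;
        for (int i = 0; i < operations; i++) {
            if (cache.get("person:" + i) != null) {
                hits++;
            }
        }
        long afterGets = System.nanoTime();
        if (hits != operations) {
            throw new IllegalStateException("expected every get to hit the cache");
        }
        // Converts total nanoseconds into milliseconds per 10000 operations.
        double scale = 10_000.0 / operations / 1_000_000.0;
        return new double[] { (afterPuts - start) * scale, (afterGets - afterPuts) * scale };
    }

    public static void main(String[] args) {
        // Same progression as described above: 10000, 20000, ... 100000.
        for (int n = 10_000; n <= 100_000; n += 10_000) {
            double[] avg = run(new InProcessCache(), n);
            System.out.printf("%d ops: put %.2f ms, get %.2f ms per 10000%n", n, avg[0], avg[1]);
        }
    }
}
```

Keeping the engine behind one interface is what lets the same loop drive either cache without the timing code knowing which one it is measuring.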
These are the results. The numbers represent the average time it takes to perform 10000 puts or 10000 gets in milliseconds.
| Operation | EhCache (ms) | Memcached (ms) |
| put | 31 | 545 |
| get | 16 | 2072 |
As we can see, EhCache is one order of magnitude faster for put operations (roughly 18 times) and two orders of magnitude faster for get operations (roughly 130 times).
I was planning to do a more complete test using various caching servers to test the distributed capabilities of each solution. But after these results I decided I had enough information. I believe that we should not use Memcached for a JEE application unless there is a necessity to interact with an existing Memcached cache.
Please email me if you want the source code of my tests.
no
Posted by: | July 14, 2008 at 06:20 AM
No? You mean that you don't agree with the results or that I should do more extensive testing?
Posted by: Hugo Troche | July 14, 2008 at 09:36 AM
You should have tried running both caches on a separate machine, so your app server would access them via TCP/IP.
I believe memcached was so slow in your example because it used serialization and ehcache did not. Serialization is a serious bottleneck; you should consider using externalization instead.
Posted by: googlebot | July 21, 2008 at 10:43 AM
Thank you for commenting. You are right, the equation is different when the cache is on a different machine than the application server.
The question then is: in a distributed cache environment, will Memcached be faster?
It would be an interesting test to try. On the other hand, the performance of the cache when the cached object is on the same box as the application server will be orders of magnitude better. So, if the application has two servers, 50% of the cache calls should be local. If the application has 3, 33% should be local, etc. That alone could be a big performance difference in the system. Also, EhCache does not distribute objects the way Memcached does, and that would have to be taken into account. Again, without actual testing this is just speculation.
Posted by: Hugo Troche | July 21, 2008 at 11:39 AM
This comparison is not really clean. Memcached was designed to increase performance in a multi-server environment, not for a single JVM. For sure the additional serialization will cost some additional performance, but...
...it is nothing in comparison with plain database access, for example. Try to improve your tests with 10000 DB selects from 2, 4, 8... different servers, with and without both caches...
and then please share your test results! ;)
Posted by: vpupkin | October 16, 2008 at 12:39 PM
Hi,
You are not comparing apples to apples here. EhCache resides in the same JVM as your app server; memcached does not. I would say that in a distributed test, memcached would beat ehcache. But how to put EhCache in a distributed environment is a mystery to me!
Posted by: Doug | October 28, 2008 at 08:38 PM
memcached is as much a *scaling* solution as it is a performance solution. This is like testing mapreduce using a single box.
Posted by: | December 30, 2008 at 04:50 PM