Skip to main content

Benchmarking Java Locks with Counters

These days I am analyzing some Java Flight Recordings from taken from WSO2 API Manager performance tests and I found out that main processing threads were in "BLOCKED" state in some situations.

The threads were mainly blocked due to "synchronized" methods in Java. Synchronizing the methods in a critical section of request processing causes bottlenecks and it has an impact on the throughput and overall latency.

Then I was thinking whether we could avoid synchronizing the whole method. The main problem with synchronized is that only one thread can run that critical section. When it comes to consumer/producer scenarios, we may need to give read access to data in some threads and write access to a thread to edit data exclusively. Java provides ReadWriteLock for these kinds of scenarios.

Java 8 provides another kind of lock named StampedLock. The StampedLock provides an alternative way to the standard ReadWriteLock and it also supports optimistic reads. I'm not going to compare the features and functionalities of the each lock type in this blog post. You may read the StampedLock Idioms by Dr. Heinz M. Kabutz.

I'm more interested in finding out which lock is faster when it is accessed by multiple threads. Let's write a benchmark!


The code for benchmarks


There is an article on "Java 8 StampedLocks vs. ReadWriteLocks and Synchronized" by Tal Weiss, who is the CEO of Takipi. In that article, there is a benchmark for Java locks with different counter implementations. I'm using that counters benchmark as the basis for my benchmark. 

I also found another fork of the same benchmark and it has added the Optimistic Stamped version and Fair mode of ReentrantReadWriteLock. I found out about that from the slides on "Java 8 - Stamped Lock" by Haim Yadid after I got my benchmark results.

I also looked at the article "Java Synchronization (Mutual Exclusion) Benchmark" by Baptiste Wicht.

I'm using the popular JMH library for my benchmark. The JMH has now become the standard way to write Java microbenchmarks. The benchmarks done by Tal Weiss do not use JMH.

See JMH Resources by Nitsan Wakart for an introduction to JMH and related links to get more information about JMH.

I used Thread Grouping feature in JMH and the Group states for benchmarking different counter implementations.

This is my first attempt in writing a proper microbenchmark. If there are any problems with the code, please let me know. When we talk about benchmarks, it's important to know that you should not expect the same results in a real life application and the code may behave differently in runtime.

There are 11 counter implementations. I also benchmarked the fair and non-fair modes of ReentrantLockReentrantReadWriteLock and Semaphore.

Class Diagram for Counter implementations

There are altogether 14 benchmark methods!

  1. Adder - Using LongAdder. This is introduced in Java 8.
  2. Atomic - Using AtomicLong
  3. Dirty - Not using any mechanism to control concurrent access
  4. Lock Fair Mode - Using ReentrantLock
  5. Lock Non-Fair Mode - Using ReentrantLock
  6. Read Write Lock Fair Mode - Using ReentrantReadWriteLock
  7. Read Write Lock Non-Fair Mode - Using ReentrantReadWriteLock
  8. Semaphore Fair Mode - Using Semaphore
  9. Semaphore Non-Fair Mode - Using Semaphore
  10. Stamped - Using  StampedLock
  11. Optimistic Stamped - Using  StampedLock with tryOptimisticRead(); If it fails, I used the read lock. There are no more attempts to tryOptimisticRead().
  12. Synchronized - Using synchronized block with an object
  13. Synchronized Method - Using synchronized keyword in methods
  14. Volatile  - Using volatile keyword for counter variable

The code is available at https://github.com/chrishantha/microbenchmarks/tree/v0.0.1-initial-counter-impl

Benchmark Results


As I mentioned, I used Thread Grouping feature in JMH and I ran the benchmarks for different thread group distributions. There were 10 iterations after 5 warm-up iterations. I measured only the "throughput". Measuring latency would be very difficult (as the minimum throughtput values were having around 6 digits)

The thread group distribution was passed by the "-tg" argument to the JMH and the first number was used for "get" (read) operations and the second number was used for "increment" (write) operations.

There are many combinations we can use to run the benchmark tests. I used 12 combinations for thread group distribution and those are specified in the benchmark script.

These 12 combinations include the scenarios tested by Tal Weiss and Baptiste Wicht.

The benchmark was run on my Thinkpad T530 laptop.

$ hwinfo --cpu --short
cpu:
Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 3394 MHz
Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 3333 MHz
Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 3305 MHz
Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 3333 MHz
$ free -m
total used free shared buff/cache available
Mem: 15866 4424 7761 129 3680 11204
Swap: 18185 0 18185

Note: I added the "Dirty" counter only to compare the results, but I omitted it from the benchmark as no one wants to keep a dirty counter in their code. :)

I have committed all results to the Github repository and I used gnuplot for the graphs.

It's very important to note that the graphs show the throughput for both reader and writer threads. If you need to look at individual reader and writer throughput, you can refer the results at https://github.com/chrishantha/microbenchmarks/tree/v0.0.1-initial-counter-impl/counters/results

Let's see the results!

1 Reader, 1 Writer

2 Readers, 2 Writers

4 Readers, 4 Writers

5 Readers, 5 Writers

10 Readers, 10 Writers

16 Readers, 16 Writers

64 Readers, 64 Writers

128 Readers, 128 Writers

1 Reader, 19 Writers

19 Readers, 1 Writer

4 Readers, 16 Writers

16 Readers, 4 Writers



Conclusion


Following are some conclusions we can make when looking at above results

  1. Optimistic Stamped counter has much better throughput when there is high contention.
  2. Fair modes of the locks are very slow.
  3. Adder counter has better throughput than Atomic counter when there are more writers.
  4. When there are less threads, the Synchronized and Synchronized Method counters has better throughput than using a Read Write Lock (in non-fair mode, which is the default)
  5. The Lock counter also has better throughput than Read Write Lock when there are less threads.

Adder, Atomic and Volatile counter examples do not show a way to provide mutual exclusion, but those are thread-safe ways to keep a count. You may refer benchmark results for other counters with Java locks if you want to have a mutual exclusion to some logic in your code.

In this benchmark, the read write lock has performed poorly. The reason could be that there are writers continuously trying to access the write lock. There may be situations that a write lock may be required less frequently and therefore this benchmark is probably not a good way to evaluate performance for read write locks.

Please make sure that you run the benchmarks for your scenarios before making a decision based on these results. Even my benchmarks give slightly different results for each run. So, it's not a good idea to rely entirely on benchmarks and you must test the performance of the overall application.


If there are any questions or comments on the results or regarding benchmark code, please let me know.

Comments

Popular posts from this blog

Finding how many processors

I wanted to find out the processor details in my laptop and I found out that there are several ways to check. For example, see The RedHat community discussion on  Figuring out CPUs and Sockets . In this blog post, I'm listing few commands to find out details about CPUs. I'm using Ubuntu in my Lenovo ThinkPad T530 laptop and following commands should be working any Linux system. Display information about CPU architecture $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 Thread(s) per core: 2 Core(s) per socket: 2 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 58 Model name: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz Stepping: 9 CPU MHz: 1199.988 CPU max MHz: 3600.0000 CPU min MHz: 1200.0000 BogoMIPS: 5787.1...

Java Mission Control & Java Flight Recorder

Last year, I got two opportunities to talk about Java Mission Control & Java Flight Recorder. I first talked about " Using Java Mission Control & Java Flight Recorder " as an internal tech talk at WSO2 . I must thank Srinath for giving me that opportunity. After that, Prabath also invited me to do a talk at Java Colombo Meetup . Prabath, Thank you for inviting me and giving me the opportunity to talk at the Java Colombo Meetup! I'm also very excited to see that Marcus Hirt , the Team Lead for Java Mission Control has mentioned about the Java Colombo Meetup in his blog post: " My Favourite JMC Quotes ". It's so nice to see "Sri Lanka" was mentioned in his blog post! :) From Marcus' Blog Here are the slides used at the meetup. Java Colombo Meetup: Java Mission Control & Java Flight Recorder from Isuru Perera Marcus Hirt's blog posts really helped me to understand JMC & JFR concepts and his tutorials were very helpful...

Flame Graphs with Java Flight Recordings

Flame Graphs Brendon D. Gregg , who is a computer performance analyst , has created  Flame Graphs to visualize stack traces in an interactive way. You must watch his talk at USENIX/LISA13 , titled Blazing Performance with Flame Graphs , which explains Flame Graphs in detail. There can be different types of flame graphs and I'm focusing on  CPU Flame Graphs  with Java in this blog post. Please look at the Flame Graphs Description  to understand the Flame Graph visualization. CPU Flame Graphs and Java Stack Traces As  Brendon  mentioned in his talk, understanding why CPUs are busy is very important when analyzing performance.  CPU Flame Graphs  is a good way to identify hot methods from sampled stack traces. In order to generate CPU Flame Graphs for Java Stack Traces , we need a way to get sample stack traces. Brendon has given examples to use jstack  and Google's lightweight-java-profiler . Please refer to his perl program on g...