One of the great advantages of microservices is that, when there is an issue, you already have a pretty good idea of where it is happening and which microservice is responsible for it. And if it is a performance issue, you have a manageable amount of code or libraries to investigate, rather than dealing with the monolith as a whole.
There are many performance measurement tools that ship with the JDK itself, such as JConsole, VisualVM, and HPROF. Most of them profile the application as a whole, and it takes some effort to get to class- or method-level hot spots. While evaluating the performance of one of our microservices, I came across a method using flame graphs, which I found very effective for finding out where the code spends its CPU time. This post is more of a how-to, and all credit goes to Brendan Gregg.
- A Linux machine with perf
- JDK — JDK8u60 and above
- FlameGraph visualizer
- An application to profile :)
I used an EC2 machine running RHEL 7 for this exercise. I have not tried it, but I expect a Vagrant or VirtualBox VM would also work. If the application is an API, you also need a load testing tool such as JMeter or wrk to generate traffic for it.
At a very high level, this is what needs to be done.
- Run the Java application on the machine with the -XX:+PreserveFramePointer JVM option
- Generate load for the application using a load testing tool
- Run the perf record command to capture a performance counter profile
- Run perf-map-agent to generate a map for JIT-compiled methods
- Generate stack trace output from the previously recorded data by running perf script
- Generate the flame graph
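The steps above can be sketched as a single shell function. This is only a rough outline under a few assumptions: the names step, profile_java and DRY_RUN are made up for illustration, perf-map-agent and FlameGraph are assumed to be cloned into the current directory (with perf-map-agent already built), and the application is assumed to be running under load already.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$1"
  else
    sh -c "$1"
  fi
}

# Hypothetical wrapper around the profiling steps.
# $1 = PID of a JVM started with -XX:+PreserveFramePointer
# $2 = profiling duration in seconds (defaults to 10)
# Assumes ./perf-map-agent (built) and ./FlameGraph exist.
profile_java() (
  pid="$1"
  seconds="${2:-10}"
  step "perf record -F 99 -p $pid -g -- sleep $seconds" &&
  step "perf-map-agent/bin/create-java-perf-map.sh $pid" &&
  step "perf script > out.perf" &&
  step "FlameGraph/stackcollapse-perf.pl out.perf > out.folded" &&
  step "FlameGraph/flamegraph.pl out.folded > graph.svg"
)
```

With DRY_RUN=1 set, the function only prints the commands it would run, which is a cheap way to review them first. Without it, a call such as profile_java 1234 10 would be expected to leave graph.svg in the current directory.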
I am using a RHEL machine and the commands in this post are based on it, but it should be easy to find equivalent commands for your OS. Let’s look at each of these steps in detail now.
As the flame graphs are generated from the output of Linux perf_events, the first step is to install it, which provides the perf CLI command. Command to install perf_events:
yum install perf
When an application is running, the JVM performs just-in-time (JIT) compilation of the byte code at runtime to optimize frequently used “hot” code. The byte code is converted to native code to improve performance, and this native code is stored in memory. When perf runs, it can see only the memory addresses of this native code, not the actual Java class or method. A tool like perf-map-agent connects to a running JVM process and exports a map file which perf can use to generate the stack trace with the actual Java method names.
To install perf-map-agent, follow the instructions in the source repo. It should be something like this:
git clone https://github.com/jvm-profiling-tools/perf-map-agent.git
yum install cmake
cd perf-map-agent && cmake . && make
Run the application
The next step is to run the application with the JVM option -XX:+PreserveFramePointer. Frame pointers are commonly used to provide information to debuggers about the call stack. With this option set, perf can construct more accurate stack traces by using information in the frame pointer about the currently executing method. This option requires JDK 8u60 or later.
java -XX:+PreserveFramePointer -jar app.jar
Keep the application running until both the performance profile (perf record) and the symbol table (perf-map-agent) are captured.
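Since everything later depends on this flag, it can be worth checking that the running JVM really has it enabled. Below is a small sketch. The function name has_preserve_frame_pointer is made up, and it assumes the flag text is piped in from something like jcmd <PID> VM.flags, which prints the flags of a running JVM.

```shell
# Hypothetical check: reads JVM flag text on stdin and succeeds only if
# -XX:+PreserveFramePointer appears in it (-F treats the pattern as a
# fixed string, so the "+" is matched literally).
has_preserve_frame_pointer() {
  grep -qF -e '-XX:+PreserveFramePointer'
}

# Possible usage (assumes jcmd from the JDK is on the PATH):
#   jcmd "$(pgrep java)" VM.flags | has_preserve_frame_pointer \
#     || echo "restart the JVM with -XX:+PreserveFramePointer" >&2
```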
Generate load for your application using any of the load testing tools or a different approach depending on the application.
Capture performance profile
When the application is running, start capturing the CPU profile using perf_events with the following command:
perf record -F 99 -p `pgrep java` -g -- sleep 10
- -F 99: sample at a frequency of 99 Hz
- -p: profile an existing process with this PID
- -g: record call graphs (stack traces)
- -- sleep 10: profile for ten seconds
Once the ten seconds of profiling are over, this command writes a file called perf.data to the current directory.
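The backtick form above assumes that exactly one Java process is running on the machine. If pgrep java matches several processes, the resulting command line will not be what you intended. One possible guard is sketched below; the function name only_pid is made up for illustration.

```shell
# Hypothetical guard: reads PIDs on stdin, prints the PID and succeeds
# only when exactly one was given.
only_pid() {
  # Word-split everything read from stdin into positional parameters.
  set -- $(cat)
  if [ "$#" -ne 1 ]; then
    echo "expected exactly one Java process, found $#" >&2
    return 1
  fi
  printf '%s\n' "$1"
}

# Possible usage:
#   pid=$(pgrep java | only_pid) && perf record -F 99 -p "$pid" -g -- sleep 10
```

When several JVMs are running, a narrower match such as pgrep -f app.jar may be enough to single out the right one.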
Assuming you have already built perf-map-agent, run the following command while the application is running to generate a map file of JVM symbols:
bin/create-java-perf-map.sh `pgrep java`
This command will create the file /tmp/perf-<PID>.map. The application can be stopped at this point; it is not needed for the subsequent steps.
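The map file is plain text, with one line per JIT-compiled symbol: a start address and a size, both in hexadecimal, followed by the symbol name. To make the idea concrete, here is a rough sketch of the kind of lookup perf does with it. The function name lookup_symbol is made up, and this is an illustration of the file format rather than how perf itself is implemented.

```shell
# Hypothetical lookup: print the symbol whose address range contains the
# given address. $1 = map file, $2 = address in hex (without 0x).
lookup_symbol() (
  map_file="$1"
  addr=$((0x$2))
  # Each line is "<start hex> <size hex> <symbol name>"; the name takes
  # the rest of the line, so it may contain spaces.
  while read -r start size name; do
    begin=$((0x$start))
    end=$((begin + 0x$size))
    if [ "$addr" -ge "$begin" ] && [ "$addr" -lt "$end" ]; then
      printf '%s\n' "$name"
      exit 0
    fi
  done < "$map_file"
  exit 1
)
```

Given the generated map file and an address taken from an unresolved frame, lookup_symbol would print the Java method that owns that address, or return a non-zero status if no entry covers it.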
Generate trace output
Now that we have the profile data and the symbol map, we can generate a detailed trace output of the profiled information. Run this command in the same directory as the perf.data file generated earlier:
perf script > out.perf
This command will look for the map file in /tmp and use it to generate the output. The symbol resolution will fail if the .map file is not present in /tmp.
Flame graph 🔥
Get the scripts to generate the flame graph from the source repo. Run the scripts by passing the trace output generated earlier.
git clone --depth 1 https://github.com/brendangregg/FlameGraph.git
./FlameGraph/stackcollapse-perf.pl out.perf > out.folded
./FlameGraph/flamegraph.pl out.folded > graph.svg
graph.svg is the flame graph and it can be opened in your favorite browser to explore.
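The intermediate out.folded file is also useful on its own. Each line holds one unique stack, with frames separated by semicolons, followed by a space and the number of samples in which that stack was seen. The sketch below (the function name top_leaf_frames is made up) sums the samples per leaf frame, that is, the method that was actually on the CPU, which corresponds to the top edges of the flame graph.

```shell
# Hypothetical summary: list the leaf frames with the most samples.
# $1 = folded stack file, $2 = number of rows to show (defaults to 10)
top_leaf_frames() {
  awk '
    {
      count = $NF                     # last field is the sample count
      stack = $0
      sub(/ [0-9]+$/, "", stack)      # strip the trailing count
      n = split(stack, frames, ";")   # frames[n] is the leaf frame
      self[frames[n]] += count
    }
    END { for (f in self) printf "%d %s\n", self[f], f }
  ' "$1" | sort -k1,1nr -k2 | head -n "${2:-10}"
}
```

Running top_leaf_frames out.folded 20 gives a quick text view of the hottest methods, which can be handy when you only have a terminal and no browser to open the SVG.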
Check out the reference articles and video linked below to get more information about flame graphs. Hope you found this useful.
CPU Flame Graphs
On this page I'll introduce and explain CPU flame graphs, list generic instructions for their creation, then discuss…