MySQL flame graph from Brendan Gregg’s article

Java performance profiling using flame graphs

One of the great advantages of microservices is that, when there is an issue, you already have a pretty good idea of where it is happening and which microservice is responsible for it. And if it is a performance issue, you have a manageable amount of code or libraries to investigate, rather than dealing with the monolith as a whole.

Several performance measurement tools ship with the JDK itself, such as JConsole, VisualVM, and HPROF. Most of them profile the application as a whole, and it takes some effort to get to class- or method-level hot spots. While evaluating the performance of one of our microservices, I came across an approach based on flame graphs, which I found very effective for seeing where the code spends its CPU time. This post is more of a how-to, and all credit goes to Brendan Gregg.


I used an EC2 machine running RHEL 7 for this exercise. I have not tried it, but I expect Vagrant or VirtualBox would also work. If the application is an API, you also need a load testing tool such as JMeter or wrk to generate traffic for it.


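At a very high level, this is what needs to be done: install perf_events, build perf-map-agent, run the application, generate load, capture a performance profile, export the JVM symbols, generate the trace output, and render the flame graph.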

The commands in this post are for RHEL, but it should be easy to find equivalent commands for your OS. Let’s look at each of these steps in detail now.

Install perf_events

Flame graphs are generated from the output of Linux perf_events, so the first step is to install it. The package provides the perf CLI command. To install it:

yum install perf
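Before moving on, it can help to confirm that perf and the other tools used later are actually on the PATH. The following is only a minimal sketch: check_tools is a hypothetical helper name, and the tool list in the usage comment is my assumption based on the steps in this post.

```shell
# check_tools is a hypothetical helper, not part of perf.
# It reports which of the given commands are missing from PATH
# and returns non-zero if any of them is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (tool list assumed from the steps in this post):
#   check_tools perf java git cmake && perf --version
```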

Build perf-map-agent

When an application is running, the JVM performs just-in-time (JIT) compilation of the bytecode to optimize frequently used “hot” code. The bytecode is converted to native code to improve performance, and this native code is stored in memory. When perf samples the process, it sees only the memory addresses of this native code, not the Java classes or methods they belong to. A tool like perf-map-agent connects to a running JVM process and exports a map file that perf can use to generate stack traces with the actual Java method names.
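The map file is plain text: each line holds a start address and a size, both in hexadecimal, followed by the symbol name. As a rough sketch of the kind of lookup this enables (resolve_addr and the sample entries are made up for illustration, and perf's own implementation is certainly more efficient than a linear scan):

```shell
# resolve_addr MAP_PATH HEXADDR
# Hypothetical helper: prints the symbol whose [start, start + size)
# range contains HEXADDR, or "[unknown]" if no entry matches.
resolve_addr() {
    local map_path="$1" addr start size name s e
    addr=$(( 0x$2 ))
    while read -r start size name; do
        [ -n "$size" ] || continue      # skip blank or malformed lines
        s=$(( 0x$start ))
        e=$(( s + 0x$size ))
        if [ "$addr" -ge "$s" ] && [ "$addr" -lt "$e" ]; then
            echo "$name"
            return 0
        fi
    done < "$map_path"
    echo "[unknown]"
    return 1
}

# Demo with made-up entries (not real JVM output).
demo_map=$(mktemp)
printf '%s\n' \
    '7f0000001000 200 com.example.Foo::bar' \
    '7f0000001200 80 com.example.Foo::baz' > "$demo_map"
resolve_addr "$demo_map" 7f0000001210   # prints com.example.Foo::baz
rm -f "$demo_map"
```

Without such a file, an address inside JIT-compiled code has nothing to be matched against, which is why the Java frames would otherwise show up without names.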

To build perf-map-agent, follow the instructions in the source repo. It should be something like this:

git clone
cd perf-map-agent
yum install cmake
cmake .
make

Run the application

The next step is to run the application with the JVM option -XX:+PreserveFramePointer. Frame pointers are commonly used to give debuggers information about the call stack. With this option set, perf can construct more accurate stack traces by using the frame pointer information about the currently executing method. This option requires JDK 8u60 or later.

java -XX:+PreserveFramePointer -jar app.jar

Keep the application running until both the performance profile (perf record) and the symbol table (perf-map-agent) have been captured.
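Since the option depends on the JDK version, a quick check of the version string can save a failed start. This is a sketch under assumptions: supports_preserve_fp is a hypothetical helper, and it only understands the two common version string shapes, 1.8.0_NN for Java 8 and N.x.y for Java 9 and later.

```shell
# supports_preserve_fp VERSION
# Hypothetical helper: succeeds if VERSION is 8u60 or later.
supports_preserve_fp() {
    local v="$1" update major
    case "$v" in
        1.8.0_*)
            update="${v#1.8.0_}"            # e.g. "292-b10"
            update="${update%%[!0-9]*}"     # keep leading digits: "292"
            [ "${update:-0}" -ge 60 ]
            ;;
        1.*)
            return 1                        # Java 7 and older, or 8 with no update
            ;;
        *)
            major="${v%%[!0-9]*}"           # "11.0.2" -> "11"
            [ "${major:-0}" -ge 9 ]
            ;;
    esac
}

# Usage: java -version prints to stderr, with the version in double quotes.
#   v=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
#   supports_preserve_fp "$v" || echo "JDK $v is too old" >&2
```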

Generate load

Generate load for your application with a load testing tool, or with whatever other approach suits the application.
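If no load testing tool is at hand, even a plain loop around curl will put some traffic on an HTTP API. The sketch below is deliberately simple: generate_load is a hypothetical helper, the URL in the usage comment is a placeholder, and a sequential loop like this produces far less load than JMeter or wrk would.

```shell
# generate_load URL [COUNT]
# Hypothetical helper: sends COUNT sequential GET requests to URL
# and discards the response bodies. Failed requests are ignored.
generate_load() {
    local url="$1" count="${2:-1000}" i=0
    while [ "$i" -lt "$count" ]; do
        curl -s -o /dev/null "$url" || true
        i=$((i + 1))
    done
}

# Usage (placeholder URL, adjust to your service):
#   generate_load "http://localhost:8080/" 5000 &
```

Running it in the background leaves the terminal free for the profiling commands that follow.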

Capture performance profile

While the application is running and under load, start capturing the CPU profile using perf_events with the following command:

perf record -F 99 -p `pgrep java` -g -- sleep 10

Once the profiling is completed after ten seconds, this command will generate a file called
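In the perf record command, -F 99 samples at 99 Hz, -p selects the process, -g records call graphs, and sleep 10 bounds the capture to ten seconds. One caveat is that the pgrep java substitution assumes exactly one Java process on the machine. A small guard, sketched here with a hypothetical helper named single_pid, makes that assumption explicit:

```shell
# single_pid: reads PIDs on stdin, one per line.
# Hypothetical helper: prints the PID and succeeds only if exactly
# one PID was given; otherwise it complains on stderr and fails.
single_pid() {
    local pids count
    pids=$(cat)
    count=$(printf '%s\n' "$pids" | awk 'NF {n++} END {print n + 0}')
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one PID, got $count" >&2
        return 1
    fi
    printf '%s\n' "$pids"
}

# Usage:
#   pid=$(pgrep java | single_pid) &&
#       perf record -F 99 -p "$pid" -g -- sleep 10
```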

Export symbols

Assuming you have already built perf-map-agent, run the following command while the application is running to generate a map file of JVM symbols:

bin/ `pgrep java`

This command creates the file /tmp/perf-<PID>.map. The application can be stopped at this point; it is not needed for the subsequent steps.
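Because the later steps depend on this file, it is worth confirming that it exists and is not empty before stopping the application. A minimal sketch, assuming the usual perf-<PID>.map naming under /tmp (check_map is a hypothetical helper):

```shell
# check_map MAP_PATH
# Hypothetical helper: fails if the map file is missing or empty,
# otherwise prints how many symbol entries it contains.
check_map() {
    local map="$1" n
    if [ ! -s "$map" ]; then
        echo "no symbols in $map" >&2
        return 1
    fi
    # A valid entry has at least three fields: start, size, name.
    n=$(awk 'NF >= 3 {n++} END {print n + 0}' "$map")
    echo "$n symbols in $map"
}

# Usage:
#   check_map "/tmp/perf-$(pgrep java).map"
```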

Generate trace output

Now that we have the profile data and the symbols map, we can generate a detailed trace output of the profiled data. Run this command in the same directory as the file generated earlier:

perf script > out.perf

This command looks for the map file in /tmp and uses it to resolve symbols in the output. If the .map file is not present in /tmp, the Java frames cannot be resolved to method names.
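One way to judge whether the symbols were picked up is to count how many lines in the trace are still unresolved. The sketch below assumes that perf marks such frames with the text [unknown], which is what I have typically seen. The helper name count_unknown_frames is hypothetical.

```shell
# count_unknown_frames TRACE_FILE
# Hypothetical helper: prints the number of lines in the trace
# that contain the marker "[unknown]".
count_unknown_frames() {
    awk '/\[unknown\]/ {n++} END {print n + 0}' "$1"
}

# Usage:
#   count_unknown_frames out.perf
```

A large count relative to the size of the file suggests the map file was missing or was captured for a different process.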

Flame graph 🔥

Get the scripts to generate the flame graph from the source repo. Run the scripts by passing the trace output generated earlier.

git clone --depth 1
./FlameGraph/ out.perf > out.folded
./FlameGraph/ out.folded > graph.svg

graph.svg is the flame graph and it can be opened in your favorite browser to explore.
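The intermediate out.folded file is useful on its own. Each line is one call stack, with frames separated by semicolons and the sample count at the end, so ordinary text tools can summarize it. As a sketch (top_leaf_frames is a hypothetical helper, not part of the FlameGraph scripts), this lists the functions that were most often on-CPU, that is, at the top of the stack:

```shell
# top_leaf_frames FOLDED_FILE [N]
# Hypothetical helper: sums sample counts by the last frame of each
# folded stack and prints the N largest as "COUNT FRAME".
top_leaf_frames() {
    local file="$1" n="${2:-10}"
    awk '{
        count = $NF                     # trailing sample count
        sub(/ [0-9]+$/, "")             # strip it, leaving the stack
        k = split($0, frames, ";")
        total[frames[k]] += count       # frames[k] is the leaf frame
    } END {
        for (f in total) print total[f], f
    }' "$file" | sort -k1,1nr -k2 | head -n "$n"
}

# Usage:
#   top_leaf_frames out.folded 20
```

This gives a flat view of the widest plateaus in the graph, which can be a handy starting point before exploring the SVG.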


Check out the reference articles and video linked below to get more information about flame graphs. Hope you found this useful.

