Tutorial :Recommended Open Source Profilers [closed]


I'm trying to find open source profilers rather than using one of the commercial profilers which I have to pay $$$ for. When I performed a search on SourceForge, I have come across these four C++ profilers that I thought were quite promising:

  1. Shiny: C++ Profiler
  2. Low Fat Profiler
  3. Luke Stackwalker
  4. FreeProfiler

I'm not sure which one of the profilers would be the best one to use in terms of learning about the performance of my program. It would be great to hear some suggestions.


You could try Windows Performance Toolkit. Completely free to use. This blog entry has an example of how to do sample-based profiling.



There's more than one way to do it.

Don't forget the no-profiler method.

Most profilers assume you need 1) high statistical precision of timing (lots of samples), and 2) low precision of problem identification (functions & call-graphs).

Those priorities can be reversed. I.e. the problem can be located to the precise machine address, while cost precision is a function of the number of samples.

Most real problems cost at least 10%, where high precision is not essential.

Example: If something is making your program take 2 times as long as it should, that means there is some code in it that costs 50%. If you take 10 samples of the call stack while it is being slow, the precise line(s) of code will be present on roughly 5 of them. The larger the program is, the more likely the problem is a function call somewhere mid-stack.

It's counter-intuiitive, I know.

NOTE: xPerf is nearly there, but not quite (as far as I can tell). It takes samples of the call stack and saves them - that's good. Here's what I think it needs:

  • It should only take samples when you want them. As it is, you have to filter out the irrelevant ones.

  • In the stack view it should show specific lines or addresses at which calls take place, not just whole functions. (Maybe it can do this, I couldn't tell from the blog.)

  • If you click to get the butterfly view, centered on a single call instruction, or leaf instruction, it should show you not the CPU fraction, but the fraction of stack samples containing that instruction. That would be a direct measure of the cost of that instruction, as a fraction of time. (Maybe it can do this, I couldn't tell.) So, for example, even if an instruction were a call to file-open or something else that idles the thread, it still costs wall clock time, and you need to know that.

NOTE: I just looked over Luke Stackwalker, and the same remarks apply. I think it is on the right track but needs UI work.

ADDED: Having looked over LukeStackwalker more carefully, I'm afraid it falls victim to the assumption that measuring functions is more important than locating statements. So on each sample of the call stack, it updates the function-level timing info, but all it does with the line-number info is keep track of min and max line numbers in each function, which, the more samples it takes, the farther apart those get. So it basically throws away the most important information - the line number information. The reason that is important is that if you decide to optimize a function, you need to know which lines in it need work, and those lines were on the stack samples (before they were discarded).

One might object that if the line number information were retained it would run out of storage quickly. Two answers. 1) There are only so many lines that show up on the samples, and they show up repeatedly. 2) Not so many samples are needed - the assumption that high statistical precision of measurement is necessary has always been assumed, but never justified.

I suspect other stack samplers, like xPerf, have similar issues.


It's not open source, but AMD CodeAnalyst is free. It also works on Intel CPUs despite the name. There are versions available for both Windows (with Visual Studio integration) and Linux.


From those who have listed, I have found Luke Stackwalker to work best - I liked its GUI, it was easy to get running.

Other similar is Very Sleepy - similar functionality, sampling seems more reliable, GUI perhaps a little bit harder to use (not that graphical).

After spending some more time with them, I have found one quite important drawback. While both try to sample at 1 ms resolution, in practice they do not achieve it because their sampling method (StackWalk64 of the attached process) is way too slow. For my application it takes something like 5-20 ms to get a callstack. Not only this makes your results imprecise, it also makes them skewed, as short callstacks are walked faster, therefore tend to get more hits.


We use LtProf and have been happy with it. Not open source, but only $$, not $$$ :-)

Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Next Post »