A Short Profiling Primer
In my previous post, we got interested in the performance of glogg. To analyse it in more detail, we need some tools, so here is a short introduction to a few common profiling tools. In a future post, we will come back to glogg and see how to get useful results from them.
Profiling is a software activity that consists of analysing the non-functional aspects of a piece of software, most commonly its execution time, though memory usage, for example, is another behaviour of frequent interest.
Measuring execution time can be as simple as using a stopwatch (which probably means your app really needs some attention to performance) or asking users how the app feels (snappy or laboured). This is perfectly fine and actually, it is the only thing that really counts.
A slightly more systematic approach is to have some automated tests (possibly unit tests, but more probably integration tests) that measure the time taken for a given action rather than testing the actual behaviour. This is what we did for glogg in the first post of the series.
Performance tests are pretty much mandatory if you are serious about performance. They greatly help drive the optimisation process by keeping you focused on the goals, and they provide quick feedback when making changes. They are also the only practical way to keep track of the evolution of performance as new versions are released.
But sometimes a closer examination of the factors affecting performance is necessary. This is the time to get help from profiling tools.
Profiling tools are basically the slightly higher-tech version of splattering your code with printf("%s %f\n", __func__, time_now());. They will tell you which bits of code run most often, which take the most time, by how much, and so on…
Given the infamous 80/20 rule (or 90/10 depending on who you ask) applies to software performance as much as to everything else, the tools will give us an invaluable guidance as to what should be optimised.
Profiling tools have existed for a long time, at least since the venerable UNIX prof back in the early 1980s. Most UNIX hackers will have heard of (and hopefully used) its famous successor gprof. This is still a very relevant tool today and was, until fairly recently, pretty much my only tool of choice.
Profiling tools can broadly be divided in two categories depending on how they work, though many tools are hybrid. A third category - probe based tools - also exists but won’t be discussed today.
The first category is instrumentation profilers. As the name suggests, they instrument the code to be run, which is basically akin to adding printfs at strategically chosen places in our code. Typically the entry and exit of every function are instrumented.
This is done either statically at compile time (by telling the compiler to do so), or dynamically, using some dynamic library loading trickery or even emulation of the underlying hardware, in which case the program does not need to be rebuilt.
Once the instrumentation is in place, the program is run as usual; the instrumentation stores the timing information somewhere, and the program is terminated (hopefully) in the usual way.
The tool is then run “off line” to process and display the collected information. This includes the count of calls for each function (possibly interesting), the time taken by each function (interesting), the proportion of time spent in each function (very interesting) and some more complex deductions such as the proportion of time spent in each function for each call path (extremely useful).
An excellent, modern example of an instrumentation profiler for UNIX-like systems is callgrind, part of the valgrind suite of tools.
Instrumentation profilers give us a very detailed picture of where the execution time goes, to the point where designing a UI that clearly presents the wealth of information available can be difficult.
They have a problem though: since they run non-trivial code every time a function is entered or exited, they can be slow, very slow (imagine what happens in a C++ program with tons of one-line accessor functions).
Of course this is slightly annoying, but much worse, because they significantly slow down processing during the analysis, they tend to skew the results for programs that rely heavily on I/O (most everyday “system” programs, really). Because processing sections will look much worse than they are, and I/O much less important than it is, these tools can provide erroneous information and must be used with care.
To work around this limitation, we have sampling profilers. They run the code unmodified, but a few hundred times per second they “sample” the CPU to find out which function is being run. If the interesting portion of the program runs for long enough, a statistical analysis of the samples will give us a reliable idea of the proportion of time spent in each function.
Configured with a reasonable sampling rate (say, under about 1000 samples per second on modern hardware), a sampling profiler will not noticeably slow down our program.
As a bonus, because they don’t rely on instrumentation, sampling profilers can also show the time spent in other programs and in the kernel. Especially interesting is the time spent in the kernel on behalf of the program analysed (e.g. “How long do we spend sending data to the network?”).
Being statistical by nature though, they have some limitations to be aware of, for example they can completely miss very short events, even if they happen often.
A very good sampling profiler (that can also use probes) is perf for Linux, which was originally written to analyse the Linux kernel and is part of its distribution. It can also analyse user-space applications or the whole system.
I hope this introduction helped to clear up what profilers are and how they work. In the next posts we will use callgrind and perf to try to find hot spots in glogg.