Mostly Software

Ramblings about what interests me

Optimising glogg

glogg has been around for quite a few years now (five to be exact) and the code that reads the file from disk and passes it up to the UI has remained mostly untouched since the very first commit. I have always known it was a bit of a clunky hack that would need some attention sometime but hey if it ain’t broke

One of glogg primary design goal was to be FAST, so even though since 2009 logs have moved to SSDs for many of us and a few cores have been added, maybe it’s time to wonder: how fast is it really?

Measuring performance

Probably the most important phase in every optimisation effort is measuring and doing so in an easy and repeatable way in order to drive optimisation, compare fixes and importantly get some well-deserved satisfaction when seeing numbers being slashed.

So I set up a few automatic performance tests as part of the google-tests driven test suite, excluding the UI:

Measure time taken to fully index a sample big log file (5M lines, about 500 MiB).
Read every line of the big log file sequentially.
Read every line and do basic string treatment before passing the line up to
the (simulated) UI. At the moment, this is only expanding tabs.
Read pages of the log in a “random” fashion, actually alternatively from the
front and from the end, we want this to be reproducible.
As before but does tab expansion.
Run a search with a simple regexp that matches every single line.
Run a search with a slightly more complicated regexp that matches roughly
half of the lines.
You can probably guess this one.
Browse every single line of the filtered view after having used the “some
matching” regexp.

The important thing here is to pick tests that are representative of what a user will typically do, but are still fully automated and reproducible.

Results pre-optimisation (glogg v1.0.2)

And here are the results on my old Core 2 Duo based Linux machine (the log file is on a SSD to de-emphasise IO throughput slightly):

simpleLoad 11.8s
sequentialRead 79s
sequentialReadExpanded 93s
randomPageRead 15.2s
randomPageReadExpanded 40.9s
allMatchingSearch 712s
someMatchingSearch 315s
noneMatchingSearch 11.6s
browsingSearchResults 122s

Intuitive interpretation

Those numbers, being absolute, don’t tell us much by themselves. Though casually comparing some similar tests might be enough to raise some eyebrows amongst experienced optimisers. For instance, the contribution of tab expansion is much bigger when reading by page (~60%) than when reading line by line (~15%), this points to a major inefficiency when reading by line.

Also, when searching, a matching line cost, on average, 60 times more than a non matching line, surely we want to see what glogg is doing when it finds a match!

So in the next post I will bring on the tools to look a bit more in the details.