glogg has been around for quite a few years now (five to be exact) and the code that reads the file from disk and passes it up to the UI has remained mostly untouched since the very first commit. I have always known it was a bit of a clunky hack that would need some attention at some point, but hey, if it ain’t broke…
One of glogg’s primary design goals was to be FAST, so even though since 2009 logs have moved to SSDs for many of us and a few cores have been added, maybe it’s time to ask: how fast is it really?
Probably the most important phase in any optimisation effort is measurement, done in an easy and repeatable way, in order to drive the optimisation, compare fixes and, importantly, get some well-deserved satisfaction when seeing numbers being slashed.
So I set up a few automated performance tests as part of the Google Test driven test suite, excluding the UI:
- Measure the time taken to fully index a sample big log file (5M lines, about 500 MiB).
- Read every line of the big log file sequentially.
- Read every line and do basic string treatment before passing the line up to the (simulated) UI. At the moment, this is only expanding tabs.
- Read pages of the log in a “random” fashion, actually alternately from the front and from the end, as we want this to be reproducible.
- As before, but with tab expansion.
- Run a search with a simple regexp that matches every single line.
- Run a search with a slightly more complicated regexp that matches roughly half of the lines.
- You can probably guess this one.
- Browse every single line of the filtered view after having used the “some matching” regexp.
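The shape of each timed test can be sketched like this (a minimal illustration, not glogg’s actual test code; the helper names `timeMs` and `countLines` are mine, and the in-memory buffer stands in for the 5M-line file on disk):

```cpp
#include <chrono>
#include <sstream>
#include <string>

// Time an arbitrary piece of work with a steady (monotonic) clock, so runs
// can be compared before and after an optimisation.
template <typename Work>
long long timeMs( Work&& work )
{
    auto start = std::chrono::steady_clock::now();
    work();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        end - start ).count();
}

// Stand-in for "read every line sequentially": walk a buffer line by line.
long long countLines( const std::string& buffer )
{
    std::istringstream stream( buffer );
    std::string line;
    long long count = 0;
    while ( std::getline( stream, line ) )
        ++count;
    return count;
}
```

Each real test then just records the elapsed time of the operation it exercises, which is what makes the comparisons below possible.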
The important thing here is to pick tests that are representative of what a user will typically do, but are still fully automated and reproducible.
Results pre-optimisation (glogg v1.0.2)
And here are the results on my old Core 2 Duo based Linux machine (the log file is on an SSD to de-emphasise IO throughput slightly):
Those numbers, being absolute, don’t tell us much by themselves, though casually comparing some similar tests might be enough to raise eyebrows amongst experienced optimisers. For instance, the contribution of tab expansion is much bigger when reading by page (~60%) than when reading line by line (~15%); this points to a major inefficiency when reading by line.
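For reference, the per-line tab expansion mentioned above could look something like this minimal sketch (not glogg’s actual implementation; the function name and the 8-column tab stop are assumptions, but it shows the per-character work whose cost we are comparing):

```cpp
#include <string>

// Expand each tab character to spaces, padding to the next tab stop
// (assumed here to be every 8 columns).
std::string expandTabs( const std::string& line,
                        std::string::size_type tabStop = 8 )
{
    std::string expanded;
    expanded.reserve( line.size() );
    for ( char c : line ) {
        if ( c == '\t' )
            // Pad with spaces up to the next multiple of tabStop.
            expanded.append( tabStop - ( expanded.size() % tabStop ), ' ' );
        else
            expanded.push_back( c );
    }
    return expanded;
}
```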
Also, when searching, a matching line costs, on average, 60 times more than a non-matching line; surely we want to see what glogg is doing when it finds a match!
So in the next post I will bring out the tools to look into the details a bit more.