Why timing before coding is a good idea
I'm working on a system which does some analysis of incoming messages, processing them according to the results of that analysis. One such piece of analysis is implemented as a bunch of set membership checks: we take some string from the input, and compare it to predefined sets. Those sets are often composed of fixed-length strings, often numeric.
All of this is implemented in Clojure on the JVM.
It occurred to me while making coffee this morning that Clojure's set implementation is probably not internally switching implementations (as do Cocoa's collection classes), and thus is unlikely to be taking advantage of regularities in the input. For example, if a set consists entirely of 11-character strings, it would intuitively seem that checking the string-ness and length of an input prior to the set membership check would be a good idea. Similarly, it might be possible to store the set itself differently depending on its contents (perhaps as an n-way tree, or with a Bloom pre-filter). For certain repetitive inputs, memoization would be useful. Etc.
In the course of writing this up as a TODO comment, I decided to do some measuring. (I've been in this game too long to just start coding.)
The setup: 500,000 ten-digit numeric strings (US phone numbers) dumped into a Clojure PersistentHashSet in a var. Two test collections: a ‘positive’ subset of 1,000 pseudo-randomly selected members of the test set, and a ‘negative’ set: the positive set with strings reversed (none of which are present in the test data).
Prior to the JIT doing its magic, testing the negative set for membership in the test data, one at a time (doseq), took about 4ms; the positive set took 35ms.
After the JIT finished its work, those times dropped to 1ms and 4ms. That equates to 1 million failing set membership tests per second.
I then wrote a little code to test length checking. I tested the negative set (including type hinting), simply (= 10 (count s)). 1,000 tests took 2.4ms after JIT. Switching to == had no appreciable effect.
It's more than twice as fast to test the set membership of 1,000 strings against a 500,000-element set than it is to make sure they're all of length 10. That doesn't seem right, but in a way any error in my approach is irrelevant: this is how I'd have approached optimization, so it's an accurate measure of how successful I would have been.
Testing the members of the negative set against the positive set (only 1,000 elements) takes about 0.7ms after JIT. (This non-linear speedup is to be expected.)
(Update, 2009-09-14: using (.length s) and (int 10) improves this substantially, as Chouser pointed out. Stripping out side-effects, the 1,000 string length comparisons take 0.184ms; the negative set test takes 0.281ms; and the positive takes 0.77ms. The string length check, then, is slightly faster, but if the length matches you still have the set test to perform. Probably not worth the complexity. I'm leaving the rest of this post as-is.)
In summary:
- Standard data structures are fast.
- Your preconceptions of which implementation approach is ‘naïve’ are probably wrong.
- If you haven't measured it, don't even think about optimizing.
- The HotSpot JIT is really good.
- Don't rule out a design because you can't imagine the implementation being fast. This approach (using sets) is essentially a direct translation of a theoretical approach, and it's crazy fast.
Posted at 2009-09-13 15:54:00 by Richard Newman • Link to Why timing before …
