Performance of find methods, comparison with linear search

Paul Harris paulharris at computer.org
Tue Nov 20 10:25:48 UTC 2007


An important metric for me is the # of comparisons actually made.  The
benchmark is useful, but it ignores this metric.  If the distance
calculation is non-trivial, then it makes a big difference on the
actual speed of the search.

I made some tiny changes to the code and came up with these numbers,
for $ ./test_kdtree 100000 10000

lap_find_nearest:  # comparisons: 23,381,315
lap_find_nearest_opt:  # comparisons: 28066644 (another run gave about
30,000,000)
lap_iterative_linear_find_nearest:  # comparisons: 1,000,000,000


This is a little surprising, the _opt compared more than the non-opt
version of the tree.
This is with my version of the code, so it follows both branches of
the tree if the dimension is equivalent (see comments in my code i
published a while ago, i think they are now in the repo), so maybe
that has something to do with it.



to count the linear comparisons, obviously just inc a counter in the
loop.  or multiply N by NPROBE.

to count kdtree, first #define KDTREE_CHECK_PERFORMANCE, and before the loop:
    KDTree::num_dist_calcs = 0;
and after the loop,
    std::cerr << "# comparisons: " << KDTree::num_dist_calcs << std::endl;



so kdtree is still superior to linear search for anything requiring
non-trivial distance calcs, but getting back to the original
benchmark, my guess would be the slowness is due to its use of
pointers thrashing the cache?   Maybe a scheme involving index tables
or at least allocing blocks of tree nodes would help.

At the end of the day, a linear "search" through a block of memory is
going to be highly optimised via the compiler, CPU and memory
look-ahead requests, and never be beaten by a scheme that jumps around
in memory, especially it still has to jump through 30% of the hoops.
We should certainly be keeping an eye on the general overhead of the
library, but should be more concerned about things like # of
comparisons (especially with the "optimised" version).

very interesting,
Paul



More information about the libkdtree-devel mailing list