Single dimension search

Paul Harris paulharris at computer.org
Mon Jan 21 01:53:00 UTC 2008


On 21/01/2008, Andrew Leung <anwleung at gmail.com> wrote:
> Thanks for the help Paul. I will look at this approach.
>
> Can you give me some idea of how poorly kd-trees will perform on single
> dimension queries, relative to k dimension queries? Will performance
> improve for 2 or 3 dimension queries?
>

well... set<> and sorted vectors can do a binary search over the data,
right?  eg set<> uses a tree, where each node has 2 children, left
goes to smaller children, right goes to larger children.   So you can
go through the tree branches and quickly find the point you are
looking for.

Well, kdtrees do something similar, except that at each level, its
comparing a different dimension.

So for level 1, left is smaller X, right is larger X.  that takes you
to level 2.
Level 2, left is smaller Y, right is larger Y.  that takes you to
level 3.  and so on.

so to search for a point along only 1 dimension,
Level 1 - follows one of the branches
Level 2 - follows BOTH branches
level 3 - follows both branches
level 4 (back to X) - follows one of the branches.


that is the old algorithm.  Sylvain's new algorithm uses a different
technique that appears to be much faster.


> The reason I am looking at using kd-trees is because I need to build an
> index over 6 or so dimensions, which will hold ~200,000 records. The
> index will be read-only after it is built. Queries over the index may
> query any combination of dimensions. Due to the highly skewed nature of
> the data, relational databases do not perform well enough. Are there
> better structures for this type of usage than kd-trees? Thanks.
>

this is interesting.  I suppose you could use kdtrees to search
through this dataset, but it means that at several levels it may have
to search in both directions...

However, I don't think you could use Sylvain's new algorithm...  it
works by starting from a nearby leaf in the tree and working back down
towards the trunk.   in your case, you would have many "nearby leaves"
spread across the whole tree edge.

anyway, this is an interesting application.  You can do it...

it sounds like you'd want to "visit" all of the nodes within the
range, so i'll focus on doing that.

See attached, the code will visit all the nodes that are within a
range in a single dimension.  I wrote it quite quickly so its not very
neatly laid out etc, but it appears to work.

You may just have to double-check the assumption on line 108.  And
extend it to try a heap of random numbers to check it really is
correct.

Use it with kdtree 0.6.2.  Newer kdtree may work, i didn't check.

If it proves useful to you, we should add it to the webpage or
documentation.  Please let me know how it goes.

see ya
Paul
-------------- next part --------------
A non-text attachment was scrubbed...
Name: visit_single_dimension.tar.gz
Type: application/x-gzip
Size: 1055 bytes
Desc: not available
Url : http://lists.alioth.debian.org/pipermail/libkdtree-devel/attachments/20080121/c5dddfad/attachment.bin 


More information about the libkdtree-devel mailing list