Single dimension search
Andrew Leung
anwleung at gmail.com
Mon Jan 21 21:39:37 UTC 2008
Hey Paul,
The code attached does basically what I need. Thanks a lot for the help.
I imagine I'll need to do some modifications to get the library to what
I need but this looks like it will put me on the right track.
If your interested, once I make the modifications I can post some
performance numbers to give anyone interested an idea of how the tree
will perform under various ranges. Thanks again.
Andrew
Paul Harris wrote:
> On 21/01/2008, Andrew Leung <anwleung at gmail.com> wrote:
>
>> Thanks for the help Paul. I will look at this approach.
>>
>> Can you give me some idea of how poorly kd-trees will perform on single
>> dimension queries, relative to k dimension queries? Will performance
>> improve for 2 or 3 dimension queries?
>>
>>
>
> well... set<> and sorted vectors can do a binary search over the data,
> right? eg set<> uses a tree, where each node has 2 children, left
> goes to smaller children, right goes to larger children. So you can
> go through the tree branches and quickly find the point you are
> looking for.
>
> Well, kdtrees do something similar, except that at each level, its
> comparing a different dimension.
>
> So for level 1, left is smaller X, right is larger X. that takes you
> to level 2.
> Level 2, left is smaller Y, right is larger Y. that takes you to
> level 3. and so on.
>
> so to search for a point along only 1 dimension,
> Level 1 - follows one of the branches
> Level 2 - follows BOTH branches
> level 3 - follows both branches
> level 4 (back to X) - follows one of the branches.
>
>
> that is the old algorithm. Sylvain's new algorithm uses a different
> technique that appears to be much faster.
>
>
>
>> The reason I am looking at using kd-trees is because I need to build an
>> index over 6 or so dimensions, which will hold ~200,000 records. The
>> index will be read-only after it is built. Queries over the index may
>> query any combination of dimensions. Due to the highly skewed nature of
>> the data, relational databases do not perform well enough. Are there
>> better structures for this type of usage than kd-trees? Thanks.
>>
>>
>
> this is interesting. I suppose you could use kdtrees to search
> through this dataset, but it means that at several levels it may have
> to search in both directions...
>
> However, I don't think you could use Sylvain's new algorithm... it
> works by starting from a nearby leaf in the tree and working back down
> towards the trunk. in your case, you would have many "nearby leaves"
> spread across the whole tree edge.
>
> anyway, this is an interesting application. You can do it...
>
> it sounds like you'd want to "visit" all of the nodes within the
> range, so i'll focus on doing that.
>
> See attached, the code will visit all the nodes that are within a
> range in a single dimension. I wrote it quite quickly so its not very
> neatly laid out etc, but it appears to work.
>
> You may just have to double-check the assumption on line 108. And
> extend it to try a heap of random numbers to check it really is
> correct.
>
> Use it with kdtree 0.6.2. Newer kdtree may work, i didn't check.
>
> If it proves useful to you, we should add it to the webpage or
> documentation. Please let me know how it goes.
>
> see ya
> Paul
>
More information about the libkdtree-devel
mailing list