Single dimension search

Mon Jan 21 21:39:37 UTC 2008

Hey Paul,

The code attached does basically what I need. Thanks a lot for the help. 
I imagine I'll need to do some modifications to get the library to what 
I need but this looks like it will put me on the right track.

If your interested, once I make the modifications I can post some 
performance numbers to give anyone interested an idea of how the tree 
will perform under various ranges. Thanks again.

Andrew

Paul Harris wrote:
> On 21/01/2008, Andrew Leung <anwleung at gmail.com> wrote:
>   
>> Thanks for the help Paul. I will look at this approach.
>>
>> Can you give me some idea of how poorly kd-trees will perform on single
>> dimension queries, relative to k dimension queries? Will performance
>> improve for 2 or 3 dimension queries?
>>
>>     
>
> well... set<> and sorted vectors can do a binary search over the data,
> right?  eg set<> uses a tree, where each node has 2 children, left
> goes to smaller children, right goes to larger children.   So you can
> go through the tree branches and quickly find the point you are
> looking for.
>
> Well, kdtrees do something similar, except that at each level, its
> comparing a different dimension.
>
> So for level 1, left is smaller X, right is larger X.  that takes you
> to level 2.
> Level 2, left is smaller Y, right is larger Y.  that takes you to
> level 3.  and so on.
>
> so to search for a point along only 1 dimension,
> Level 1 - follows one of the branches
> Level 2 - follows BOTH branches
> level 3 - follows both branches
> level 4 (back to X) - follows one of the branches.
>
>
> that is the old algorithm.  Sylvain's new algorithm uses a different
> technique that appears to be much faster.
>
>
>   
>> The reason I am looking at using kd-trees is because I need to build an
>> index over 6 or so dimensions, which will hold ~200,000 records. The
>> index will be read-only after it is built. Queries over the index may
>> query any combination of dimensions. Due to the highly skewed nature of
>> the data, relational databases do not perform well enough. Are there
>> better structures for this type of usage than kd-trees? Thanks.
>>
>>     
>
> this is interesting.  I suppose you could use kdtrees to search
> through this dataset, but it means that at several levels it may have
> to search in both directions...
>
> However, I don't think you could use Sylvain's new algorithm...  it
> works by starting from a nearby leaf in the tree and working back down
> towards the trunk.   in your case, you would have many "nearby leaves"
> spread across the whole tree edge.
>
> anyway, this is an interesting application.  You can do it...
>
> it sounds like you'd want to "visit" all of the nodes within the
> range, so i'll focus on doing that.
>
> See attached, the code will visit all the nodes that are within a
> range in a single dimension.  I wrote it quite quickly so its not very
> neatly laid out etc, but it appears to work.
>
> You may just have to double-check the assumption on line 108.  And
> extend it to try a heap of random numbers to check it really is
> correct.
>
> Use it with kdtree 0.6.2.  Newer kdtree may work, i didn't check.
>
> If it proves useful to you, we should add it to the webpage or
> documentation.  Please let me know how it goes.
>
> see ya
> Paul
>