Python bindings for libkdtree: FIX of find_within_range...

Michal J. Gajda mjgajda at gmail.com
Fri Mar 20 17:24:16 UTC 2009

Hi,

Indeed KD-tree data structure seems designed to use 'taxi-cab' metric instead of euclidean metric.
That's why I used the same distance type as COORD_T in my patch.
I work around this in my code by filtering points with euclidean distance afterwards.

On the other hand I would love to beta test code for other dimensions than 1f, 3f, and 6f. For now I've got surprising result of 3f projection being faster for 20k points than 6f one.
It may be because runtime grows as O(2^dim*log(n)). It may be that nobody has enough points to usefully exploit anything beyond 10D?
--
Best regards
Michal

-original message-
Subject: Re: Python bindings for libkdtree: FIX of find_within_range...
From: Paul Harris <paulharris at computer.org>
Date: 20-03-2009 15:15

Hrm.

It may be the same reason that find_within_range() would return 4 as well.

See email in archives, subject "Accuracy of libkdtree", 29 Jan 2009.

This will need to be confirmed, I'll put it on my list to try and do asap.
My theory is that it may only be checking in each dimension individually (a
Manhattan distance check), and not doing a final  Euclidean distance check.
That would be my initial guess.   I'm not sure what the logic would be
behind that, maybe just speed.  Anyway enough guess-work, next step is to
confirm.

If anyone wants to step up and figure out what is going on, and to have some
input into what should be done about it, be my guest :)

cheers,
Paul

2009/3/20 Willi Richert <w.richert at gmx.net>

> Hi,
>
> alright, I can do that.
>
> However, I just found out that the range measurement is somehow "illogic":
>
>    def test_count_within_range(self):
>        nn = KDTree_2Int()
>
>        for p in [(0,0), (1,0), (0,1), (1,1)]:
>
>         res = nn.count_within_range((0,0), 1.0)
>        self.assertEqual(3, res, "Counted %i points instead of %i"%(res, 3))
>
>         res = nn.count_within_range((0,0), 1.9)
>         self.assertEqual(4, res, "Counted %i points instead of %i"%(res,
> 4))
>
>
> libkdtree_new/python-bindings> python -m unittest
> py-kdtree_test.KDTree_2IntTestCase.test_count_within_range
> F
> ======================================================================
> FAIL: test_count_within_range (py-kdtree_test.KDTree_2IntTestCase)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>  File "py-kdtree_test.py", line 110, in test_count_within_range
>     self.assertEqual(3, res, "Counted %i points instead of %i"%(res, 3))
> AssertionError: Counted 4 points instead of 3
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.001s
>
>
> Any suggestion why that is the case? The point (1,1) should not be counted
> as its distance to (0,0) is sqrt(2)>1.0
>
> wr
>
> On Freitag 20 März 2009 11:56:16 Paul Harris wrote:
> > Hi,
> >
> > 2009/3/20 Willi Richert <w.richert at gmx.net>
> >
> > > One more thing: The current state of having one explicit template
> > > realization
> > > in the python/swig part annoys me a little bit. Some weeks ago there
> was
> > > a discussion on the ML about whether or not dynamically supporting the
> > > specification of the dimension and type. How is the status here, Paul?
> If
> > > you
> > > have decided against it, I will provide a python file that generates
> all
> > > the
> > > stuff for float, double and int for dim \in [1, .., 20]. That will ease
> > > the maintenance a lot.  In the other case the situation for the python
> > > bindings will be much smoother. But the performance will maybe suffer.
> > > Can you shed some light on this decision, Paul?
> >
> > Its still on the list, but I haven't had the time to do anything on
> kdtree
> > for a while now, so i'm just focusing pushing through bug fixes to ensure
> > that the results are correct.
> >
> > I would guess it'll be at least a month before I can do anything, unless
> I
> > manage to find a small random window of time where I can attack that
> issue.
> >
> > Would it be too much to ask if you write the python file that generates
> all
> > the variants, and then we can do the dynamic version later?  I don't know
> > how much work that would involve, if its not too much then I would prefer
> > if you set yourself up with a low-maintenance solution first.
> >
> > cheers,
> > Paul
>
>