Getting the latest sources

Sylvain Bougerel sylvain.bougerel.devel at gmail.com
Sun Nov 24 14:18:52 UTC 2013


Hi Sebastien and others,

I will retry with the new addresses.

The real reason why I wanted the latest sources was to do performance
comparisons between libkdtree++ and spatial (
https://sourceforge.net/projects/spatial/).

During the course of the performance comparisons, I insert 1000000 points
in 3 or 9 dimensions and watch how the 2 libraries behave, on insertion and
insertion with optimization.

This has lead me to the discovery of one bug very important bug and its
fix. Spatial originally (a year ago) suffered from the same issue, so I was
able to identify it quickly.

In the KDTree::_M_optimize, the body in the version I have is:

      template <typename _Iter>
        void
        _M_optimise(_Iter const& __A, _Iter const& __B,
                    size_type const __L)
      {
        if (__A == __B) return;
        _Node_compare_ compare(__L % __K, _M_acc, _M_cmp);
        _Iter __m = __A + (__B - __A) / 2;
        std::nth_element(__A, __m, __B, compare);
        this->insert(*__m);
        if (__m != __A) _M_optimise(__A, __m, __L+1);
        if (++__m != __B) _M_optimise(__m, __B, __L+1);
      }

But it should be:

      template <typename _Iter>
        void
        _M_optimise(_Iter const& __A, _Iter const& __B,
                    size_type const __L)
      {
        if (__A == __B) return;
        _Node_compare_ compare(__L % __K, _M_acc, _M_cmp);
        _Iter __m = __A + (__B - __A) / 2;
        std::partial_sort(__A, __m, __B, compare);
        this->insert(*__m);
        if (__m != __A) _M_optimise(__A, __m, __L+1);
        if (++__m != __B) _M_optimise(__m, __B, __L+1);
      }

(notice the change from std::nth_element to std::partial_sort)

Using nth_element here is a erroneous, because it does not guarantee that
everything in the sequence [__A, __m) is lower than _m nor that everything
in the sequence (__m, __B] is greater than __m. This is expected from the
algorithm because it operates on a subset where the point sorted along the
dimension __L % __K are strictly ordered around the pivot __m.

In practice nth_element does "some" sorting around the pivot __m and this
bug was not noticeable when inserting less than 10,000 elements. I started
to get different results between some random nearest neighbor queries at
10,000 elements. At 1,000,000 elements, efficient_insert_and_optimize
started to crash with Segmentation Faults on Linux.

Replacing nth_element by partial_sort fixes the issues described above,
because partial_sort enforces the guarantee needed here.

If the current maintainers agree with the proposed changes, I'd like to
request their good will to apply the changes on my behalf in the source. To
top it off, I think, a long time ago (5 years may be?), I may have been the
one proposing to replace a previous std::sort by std::nth_element (I
remember a conversation with Paul on this)... So I might even have
introduced the bug in the first place :(

According to my current experiments, at the moment libkdtree++ cannot be
assumed to maintain its invariant throughout the tree after a call to
_M_optimize(). If you are a user of optimization in libkdtree++ with large
data sets, then you have probably experienced this issue, if the code above
is the one you are using.

Best Regards,
Sylvain


On Sun, Nov 24, 2013 at 9:41 PM, Sebastian Ramacher <
sebastian+lists at ramacher.at> wrote:

> On 2013-11-24 21:20:25, Sylvain Bougerel wrote:
> > Thank you Sebastien, I will wait a bit then.
>
> It should be back up:
>
> https://lists.debian.org/debian-infrastructure-announce/2013/11/msg00002.html
>
> Regards
> --
> Sebastian Ramacher
>
> _______________________________________________
> libkdtree-devel mailing list
> libkdtree-devel at lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/libkdtree-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.alioth.debian.org/pipermail/libkdtree-devel/attachments/20131124/0e085939/attachment.html>


More information about the libkdtree-devel mailing list