Getting the latest sources

Sylvain Bougerel sylvain.bougerel.devel at gmail.com
Mon Nov 25 04:27:58 UTC 2013


Paul,

You are right. I need to revisit both the explanation and the understanding
of the crash.

The issue I am getting happens randomly and never happens with
partial_sort().

I will investigate more. I will keep you updated.

Regards,
Sylvain
On Nov 25, 2013 10:57 AM, "Paul" <elegant_dice at yahoo.com> wrote:

> Hi Sylvain,
>
> I just saw this email by chance, not sure why it didn't show up bigger in
> the inbox...
>
>
> http://www.cplusplus.com/reference/algorithm/nth_element/
>
> Quote:
> The other elements are left without any specific order, except that none
> of the elements preceding nth are greater than it, and none of the elements
> following it are less.
>
> So your comment regarding Guarantees cannot be correct.
> Please check your code again, I assume there is some other problem that is
> the source of the issue.
>
> cheers,
> Paul
>
>
>
>   On Sunday, 24 November 2013 10:19 PM, Sylvain Bougerel <
> sylvain.bougerel.devel at gmail.com> wrote:
>  Hi Sebastien and others,
>
> I will retry with the new addresses.
>
> The real reason why I wanted the latest sources was to do performance
> comparisons between libkdtree++ and spatial (
> https://sourceforge.net/projects/spatial/).
>
> During the course of the performance comparisons, I insert 1000000 points
> in 3 or 9 dimensions and watch how the 2 libraries behave, on insertion and
> insertion with optimization.
>
> This has lead me to the discovery of one bug very important bug and its
> fix. Spatial originally (a year ago) suffered from the same issue, so I was
> able to identify it quickly.
>
> In the KDTree::_M_optimize, the body in the version I have is:
>
>       template <typename _Iter>
>         void
>         _M_optimise(_Iter const& __A, _Iter const& __B,
>                     size_type const __L)
>       {
>         if (__A == __B) return;
>         _Node_compare_ compare(__L % __K, _M_acc, _M_cmp);
>         _Iter __m = __A + (__B - __A) / 2;
>         std::nth_element(__A, __m, __B, compare);
>         this->insert(*__m);
>         if (__m != __A) _M_optimise(__A, __m, __L+1);
>         if (++__m != __B) _M_optimise(__m, __B, __L+1);
>       }
>
> But it should be:
>
>       template <typename _Iter>
>         void
>         _M_optimise(_Iter const& __A, _Iter const& __B,
>                     size_type const __L)
>       {
>         if (__A == __B) return;
>         _Node_compare_ compare(__L % __K, _M_acc, _M_cmp);
>         _Iter __m = __A + (__B - __A) / 2;
>         std::partial_sort(__A, __m, __B, compare);
>         this->insert(*__m);
>         if (__m != __A) _M_optimise(__A, __m, __L+1);
>         if (++__m != __B) _M_optimise(__m, __B, __L+1);
>       }
>
> (notice the change from std::nth_element to std::partial_sort)
>
> Using nth_element here is a erroneous, because it does not guarantee that
> everything in the sequence [__A, __m) is lower than _m nor that everything
> in the sequence (__m, __B] is greater than __m. This is expected from the
> algorithm because it operates on a subset where the point sorted along the
> dimension __L % __K are strictly ordered around the pivot __m.
>
> In practice nth_element does "some" sorting around the pivot __m and this
> bug was not noticeable when inserting less than 10,000 elements. I started
> to get different results between some random nearest neighbor queries at
> 10,000 elements. At 1,000,000 elements, efficient_insert_and_optimize
> started to crash with Segmentation Faults on Linux.
>
> Replacing nth_element by partial_sort fixes the issues described above,
> because partial_sort enforces the guarantee needed here.
>
> If the current maintainers agree with the proposed changes, I'd like to
> request their good will to apply the changes on my behalf in the source. To
> top it off, I think, a long time ago (5 years may be?), I may have been the
> one proposing to replace a previous std::sort by std::nth_element (I
> remember a conversation with Paul on this)... So I might even have
> introduced the bug in the first place :(
>
> According to my current experiments, at the moment libkdtree++ cannot be
> assumed to maintain its invariant throughout the tree after a call to
> _M_optimize(). If you are a user of optimization in libkdtree++ with large
> data sets, then you have probably experienced this issue, if the code above
> is the one you are using.
>
> Best Regards,
> Sylvain
>
>
> On Sun, Nov 24, 2013 at 9:41 PM, Sebastian Ramacher <
> sebastian+lists at ramacher.at> wrote:
>
> On 2013-11-24 21:20:25, Sylvain Bougerel wrote:
> > Thank you Sebastien, I will wait a bit then.
>
> It should be back up:
>
> https://lists.debian.org/debian-infrastructure-announce/2013/11/msg00002.html
>
> Regards
> --
> Sebastian Ramacher
>
> _______________________________________________
> libkdtree-devel mailing list
> libkdtree-devel at lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/libkdtree-devel
>
>
>
> _______________________________________________
> libkdtree-devel mailing list
> libkdtree-devel at lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/libkdtree-devel
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.alioth.debian.org/pipermail/libkdtree-devel/attachments/20131125/2ab6b92f/attachment.html>


More information about the libkdtree-devel mailing list