Getting the latest sources

Paul elegant_dice at yahoo.com
Mon Nov 25 02:57:32 UTC 2013


Hi Sylvain,

I just saw this email by chance, not sure why it didn't show up bigger in the inbox...


http://www.cplusplus.com/reference/algorithm/nth_element/

Quote: 
The
 other elements are left without any specific order, except that none of
 the elements preceding nth are greater than it, and none of the 
elements following it are less.

So your comment regarding Guarantees cannot be correct.
Please check your code again, I assume there is some other problem that is the source of the issue.

cheers,
Paul





On Sunday, 24 November 2013 10:19 PM, Sylvain Bougerel <sylvain.bougerel.devel at gmail.com> wrote:
 
Hi Sebastien and others,

I will retry with the new addresses.

The real reason why I wanted the latest sources was to do performance comparisons between libkdtree++ and spatial (https://sourceforge.net/projects/spatial/).

During the course of the performance comparisons, I insert 1000000 points in 3 or 9 dimensions and watch how the 2 libraries behave, on insertion and insertion with optimization.

This has lead me to the discovery of one bug very important bug and its fix. Spatial originally (a year ago) suffered from the same issue, so I was able to identify it quickly.

In the KDTree::_M_optimize, the body in the version I have is:

      template <typename _Iter>
        void
        _M_optimise(_Iter const& __A, _Iter const& __B,
                    size_type const __L)
      {
        if (__A == __B) return;
        _Node_compare_ compare(__L % __K, _M_acc, _M_cmp);
        _Iter __m = __A + (__B - __A) / 2;
        std::nth_element(__A, __m, __B, compare);
        this->insert(*__m);
        if (__m != __A) _M_optimise(__A, __m, __L+1);
        if (++__m != __B) _M_optimise(__m, __B, __L+1);
      }

But it should be:

      template <typename _Iter>
        void
        _M_optimise(_Iter const& __A, _Iter const& __B,
                    size_type const __L)
      {
        if (__A == __B) return;
        _Node_compare_ compare(__L % __K, _M_acc, _M_cmp);
        _Iter __m = __A + (__B - __A) / 2;
        std::partial_sort(__A, __m, __B, compare);
        this->insert(*__m);
        if (__m != __A) _M_optimise(__A, __m, __L+1);
        if (++__m != __B) _M_optimise(__m, __B, __L+1);
      }

(notice the change from std::nth_element to std::partial_sort)

Using nth_element here is a erroneous, because it does not guarantee that everything in the sequence [__A, __m) is lower than _m nor that everything in the sequence (__m, __B] is greater than __m. This is expected from the algorithm because it operates on a subset where the point sorted along the dimension __L % __K are strictly ordered around the pivot __m.

In practice nth_element does "some" sorting around the pivot __m and this bug was not noticeable when inserting less than 10,000 elements. I started to get different results between some random nearest neighbor queries at 10,000 elements. At 1,000,000 elements, efficient_insert_and_optimize started to crash with Segmentation Faults on Linux.

Replacing nth_element by partial_sort fixes the issues described above, because partial_sort enforces the guarantee needed here.

If the current maintainers agree with the proposed changes, I'd like to request their good will to apply the changes on my behalf in the source. To top it off, I think, a long time ago (5 years may be?), I may have been the one proposing to replace a previous std::sort by std::nth_element (I remember a conversation with Paul on this)... So I might even have introduced the bug in the first place :(

According to my current experiments, at the moment libkdtree++ cannot be assumed to maintain its invariant throughout the tree after a call to _M_optimize(). If you are a user of optimization in libkdtree++ with large data sets, then you have probably experienced this issue, if the code above is the one you are using.

Best Regards,
Sylvain



On Sun, Nov 24, 2013 at 9:41 PM, Sebastian Ramacher <sebastian+lists at ramacher.at> wrote:

On 2013-11-24 21:20:25, Sylvain Bougerel wrote:
>> Thank you Sebastien, I will wait a bit then.
>
>It should be back up:
>https://lists.debian.org/debian-infrastructure-announce/2013/11/msg00002.html
>
>Regards
>--
>Sebastian Ramacher
>
>_______________________________________________
>libkdtree-devel mailing list
>libkdtree-devel at lists.alioth.debian.org
>http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/libkdtree-devel
>


_______________________________________________
libkdtree-devel mailing list
libkdtree-devel at lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/libkdtree-devel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.alioth.debian.org/pipermail/libkdtree-devel/attachments/20131124/183c3cab/attachment-0001.html>


More information about the libkdtree-devel mailing list