[med-svn] [Git][med-team/hnswlib][upstream] New upstream version 0.5.2

Nilesh Patra (@nilesh) gitlab at salsa.debian.org
Sun Aug 8 12:26:44 BST 2021



Nilesh Patra pushed to branch upstream at Debian Med / hnswlib


Commits:
8beff38d by Nilesh Patra at 2021-08-08T16:42:09+05:30
New upstream version 0.5.2
- - - - -


12 changed files:

- .travis.yml
- README.md
- examples/pyw_hnswlib.py
- hnswlib/hnswalg.h
- python_bindings/bindings.cpp
- python_bindings/tests/bindings_test.py
- python_bindings/tests/bindings_test_getdata.py
- python_bindings/tests/bindings_test_labels.py
- + python_bindings/tests/bindings_test_metadata.py
- python_bindings/tests/bindings_test_pickle.py
- python_bindings/tests/bindings_test_resize.py
- setup.py


Changes:

=====================================
.travis.yml
=====================================
@@ -9,7 +9,15 @@ jobs:
     - name: Linux Python 3.7
       os: linux
       python: 3.7
-    
+
+    - name: Linux Python 3.8
+      os: linux
+      python: 3.8
+
+    - name: Linux Python 3.9
+      os: linux
+      python: 3.9
+
     - name: Windows Python 3.6
       os: windows
       language: shell    # 'language: python' is an error on Travis CI Windows
@@ -28,6 +36,24 @@ jobs:
         - python --version
       env: PATH=/c/Python37:/c/Python37/Scripts:$PATH
 
+    - name: Windows Python 3.8
+      os: windows
+      language: shell    # 'language: python' is an error on Travis CI Windows
+      before_install:
+        - choco install python --version 3.8.0
+        - python -m pip install --upgrade pip
+        - python --version
+      env: PATH=/c/Python38:/c/Python38/Scripts:$PATH
+
+    - name: Windows Python 3.9
+      os: windows
+      language: shell    # 'language: python' is an error on Travis CI Windows
+      before_install:
+        - choco install python --version 3.9.0
+        - python -m pip install --upgrade pip
+        - python --version
+      env: PATH=/c/Python39:/c/Python39/Scripts:$PATH
+
 install:
   - |
     python -m pip install .


=====================================
README.md
=====================================
@@ -1,10 +1,11 @@
 # Hnswlib - fast approximate nearest neighbor search
-Header-only C++ HNSW implementation with python bindings. Paper's code for the HNSW 200M SIFT experiment
+Header-only C++ HNSW implementation with python bindings.
 
 **NEWS:**
 
+* **Hnswlib is now 0.5.2**. Bugfixes - thanks [@marekhanus](https://github.com/marekhanus) for fixing the missing arguments, adding support for python 3.8, 3.9 in Travis, improving python wrapper and fixing typos/code style; [@apoorv-sharma](https://github.com/apoorv-sharma) for fixing the bug in the insertion/deletion logic; [@shengjun1985](https://github.com/shengjun1985) for simplifying the memory reallocation logic; [@TakaakiFuruse](https://github.com/TakaakiFuruse) for improved description of `add_items`; [@psobot](https://github.com/psobot) for improving error handling; [@ShuAiii](https://github.com/ShuAiii) for reporting the bug in the python interface
 
-* **hnswlib is now 0.5.0. Added support for pickling indices, support for PEP-517 and PEP-518 building, small speedups, bug and documentation fixes. Many thanks to [@dbespalov](https://github.com/dbespalov), [@dyashuni](https://github.com/dyashuni), [@groodt](https://github.com/groodt),[@uestc-lfs](https://github.com/uestc-lfs), [@vinnitu](https://github.com/vinnitu), [@fabiencastan](https://github.com/fabiencastan), [@JinHai-CN](https://github.com/JinHai-CN), [@js1010](https://github.com/js1010)!**
+* **Hnswlib is now 0.5.0**. Added support for pickling indices, support for PEP-517 and PEP-518 building, small speedups, bug and documentation fixes. Many thanks to [@dbespalov](https://github.com/dbespalov), [@dyashuni](https://github.com/dyashuni), [@groodt](https://github.com/groodt), [@uestc-lfs](https://github.com/uestc-lfs), [@vinnitu](https://github.com/vinnitu), [@fabiencastan](https://github.com/fabiencastan), [@JinHai-CN](https://github.com/JinHai-CN), [@js1010](https://github.com/js1010)!
 
 * **Thanks to Apoorv Sharma [@apoorv-sharma](https://github.com/apoorv-sharma), hnswlib now supports true element updates (the interface remained the same, but performance/memory should not degrade as you update the element embeddings).**
 
@@ -41,18 +42,18 @@ For other spaces use the nmslib library https://github.com/nmslib/nmslib.
 * `hnswlib.Index(space, dim)` creates a non-initialized HNSW index in space `space` with integer dimension `dim`.
 
 `hnswlib.Index` methods:
-* `init_index(max_elements, ef_construction = 200, M = 16, random_seed = 100)` initializes the index from with no elements. 
+* `init_index(max_elements, M = 16, ef_construction = 200, random_seed = 100)` initializes the index with no elements.
    * `max_elements` defines the maximum number of elements that can be stored in the structure (can be increased/shrunk).
     * `ef_construction` defines a construction time/accuracy trade-off (see [ALGO_PARAMS.md](ALGO_PARAMS.md)).
    * `M` defines the maximum number of outgoing connections in the graph ([ALGO_PARAMS.md](ALGO_PARAMS.md)).
     
-* `add_items(data, data_labels, num_threads = -1)` - inserts the `data`(numpy array of vectors, shape:`N*dim`) into the structure. 
-    * `labels` is an optional N-size numpy array of integer labels for all elements in `data`.
+* `add_items(data, ids, num_threads = -1)` - inserts the `data` (numpy array of vectors, shape: `N*dim`) into the structure.
    * `num_threads` sets the number of cpu threads to use (-1 means use default).
-    * `data_labels` specifies the labels for the data. If index already has the elements with the same labels, their features will be updated. Note that update procedure is slower than insertion of a new element, but more memory- and query-efficient.
+    * `ids` is an optional N-size numpy array of integer labels for all elements in `data`.
+      - If the index already has elements with the same labels, their features will be updated. Note that the update procedure is slower than insertion of a new element, but more memory- and query-efficient.
     * Thread-safe with other `add_items` calls, but not with `knn_query`.
     
-* `mark_deleted(data_label)`  - marks the element as deleted, so it will be omitted from search results.
+* `mark_deleted(label)`  - marks the element as deleted, so it will be omitted from search results.
 
 * `resize_index(new_size)` - changes the maximum capacity of the index. Not thread safe with `add_items` and `knn_query`.
 
@@ -113,7 +114,7 @@ num_elements = 10000
 
 # Generating sample data
 data = np.float32(np.random.random((num_elements, dim)))
-data_labels = np.arange(num_elements)
+ids = np.arange(num_elements)
 
 # Declaring index
 p = hnswlib.Index(space = 'l2', dim = dim) # possible options are l2, cosine or ip
@@ -122,7 +123,7 @@ p = hnswlib.Index(space = 'l2', dim = dim) # possible options are l2, cosine or
 p.init_index(max_elements = num_elements, ef_construction = 200, M = 16)
 
 # Element insertion (can be called several times):
-p.add_items(data, data_labels)
+p.add_items(data, ids)
 
 # Controlling the recall by setting ef:
 p.set_ef(50) # ef should always be > k
@@ -295,4 +296,13 @@ To run test **with** updates (from `build` directory)
 
 ### References
 
-Malkov, Yu A., and D. A. Yashunin. "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs." TPAMI, preprint: https://arxiv.org/abs/1603.09320
+@article{malkov2018efficient,
+  title={Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs},
+  author={Malkov, Yu A and Yashunin, Dmitry A},
+  journal={IEEE transactions on pattern analysis and machine intelligence},
+  volume={42},
+  number={4},
+  pages={824--836},
+  year={2018},
+  publisher={IEEE}
+}
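
The README example above checks recall by querying elements for themselves. As a point of reference, the exact answer that `knn_query` approximates can be computed by brute force in numpy; this is a stand-alone sketch, not part of the library:

```python
import numpy as np

np.random.seed(42)

def brute_force_knn_l2(data, queries, k):
    """Exact k-NN under squared L2, the metric behind hnswlib's 'l2' space."""
    # Pairwise squared distances via ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2
    d2 = (np.sum(queries**2, axis=1)[:, None]
          - 2.0 * (queries @ data.T)
          + np.sum(data**2, axis=1)[None, :])
    labels = np.argsort(d2, axis=1)[:, :k]   # k smallest distances per query
    dists = np.take_along_axis(d2, labels, axis=1)
    return labels, dists

data = np.float32(np.random.random((100, 8)))
labels, dists = brute_force_knn_l2(data, data, k=1)
# Each point's exact nearest neighbor is itself, so "recall" here is 1.0.
recall = np.mean(labels.reshape(-1) == np.arange(len(data)))
```

This is O(N*M) in the number of data and query points, which is exactly the cost HNSW's graph search avoids at the price of approximate results.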


=====================================
examples/pyw_hnswlib.py
=====================================
@@ -11,8 +11,8 @@ class Index():
         self.dict_labels = {}
         self.cur_ind = 0
 
-    def init_index(self, max_elements, ef_construction = 200, M = 16):
-        self.index.init_index(max_elements = max_elements, ef_construction = ef_construction, M = M)
+    def init_index(self, max_elements, ef_construction=200, M=16):
+        self.index.init_index(max_elements=max_elements, ef_construction=ef_construction, M=M)
 
     def add_items(self, data, ids=None):
         if ids is not None:
@@ -55,8 +55,7 @@ class Index():
         labels_int, distances = self.index.knn_query(data=data, k=k)
         labels = []
         for li in labels_int:
-            line = []
-            for l in li:
-                line.append(self.dict_labels[l])
-            labels.append(line)
+            labels.append(
+                [self.dict_labels[l] for l in li]
+            )
         return labels, distances
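
The refactor above replaces a nested loop with a comprehension; the wrapper's label translation can be sketched stand-alone (the dict contents here are hypothetical):

```python
# The wrapper assigns consecutive internal integer labels and keeps a
# dict mapping them back to arbitrary user-supplied ids.
dict_labels = {0: 'apple', 1: 'banana', 2: 'cherry'}

# knn_query returns rows of internal integer labels, e.g. two queries, k=2:
labels_int = [[0, 2], [1, 0]]

# Translate each row back to user ids, as the refactored loop does:
labels = [[dict_labels[l] for l in li] for li in labels_int]
print(labels)  # [['apple', 'cherry'], ['banana', 'apple']]
```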


=====================================
hnswlib/hnswalg.h
=====================================
@@ -573,29 +573,23 @@ namespace hnswlib {
             visited_list_pool_ = new VisitedListPool(1, new_max_elements);
 
 
-
             element_levels_.resize(new_max_elements);
 
             std::vector<std::mutex>(new_max_elements).swap(link_list_locks_);
 
             // Reallocate base layer
-            char * data_level0_memory_new = (char *) malloc(new_max_elements * size_data_per_element_);
+            char * data_level0_memory_new = (char *) realloc(data_level0_memory_, new_max_elements * size_data_per_element_);
             if (data_level0_memory_new == nullptr)
                 throw std::runtime_error("Not enough memory: resizeIndex failed to allocate base layer");
-            memcpy(data_level0_memory_new, data_level0_memory_,cur_element_count * size_data_per_element_);
-            free(data_level0_memory_);
-            data_level0_memory_=data_level0_memory_new;
+            data_level0_memory_ = data_level0_memory_new;
 
             // Reallocate all other layers
-            char ** linkLists_new = (char **) malloc(sizeof(void *) * new_max_elements);
+            char ** linkLists_new = (char **) realloc(linkLists_, sizeof(void *) * new_max_elements);
             if (linkLists_new == nullptr)
                 throw std::runtime_error("Not enough memory: resizeIndex failed to allocate other layers");
-            memcpy(linkLists_new, linkLists_,cur_element_count * sizeof(void *));
-            free(linkLists_);
-            linkLists_=linkLists_new;
-
-            max_elements_=new_max_elements;
+            linkLists_ = linkLists_new;
 
+            max_elements_ = new_max_elements;
         }
 
         void saveIndex(const std::string &location) {
@@ -987,11 +981,15 @@ namespace hnswlib {
                 auto search = label_lookup_.find(label);
                 if (search != label_lookup_.end()) {
                     tableint existingInternalId = search->second;
-
                     templock_curr.unlock();
 
                     std::unique_lock <std::mutex> lock_el_update(link_list_update_locks_[(existingInternalId & (max_update_element_locks - 1))]);
+
+                    if (isMarkedDeleted(existingInternalId)) {
+                        unmarkDeletedInternal(existingInternalId);
+                    }
                     updatePoint(data_point, existingInternalId, 1.0);
+                    
                     return existingInternalId;
                 }
 


=====================================
python_bindings/bindings.cpp
=====================================
@@ -97,6 +97,8 @@ public:
     else if(space_name=="cosine") {
       l2space = new hnswlib::InnerProductSpace(dim);
       normalize=true;
+    } else {
+      throw new std::runtime_error("Space name must be one of l2, ip, or cosine.");
     }
     appr_alg = NULL;
     ep_added = true;
@@ -162,6 +164,7 @@ public:
       }
       appr_alg = new hnswlib::HierarchicalNSW<dist_t>(l2space, path_to_index, false, max_elements);
       cur_l = appr_alg->cur_element_count;
+      index_inited = true;
     }
 
     void normalize_vector(float *data, float *norm_array){
@@ -670,6 +673,8 @@ PYBIND11_PLUGIN(hnswlib) {
         .def("load_index", &Index<float>::loadIndex, py::arg("path_to_index"), py::arg("max_elements")=0)
         .def("mark_deleted", &Index<float>::markDeleted, py::arg("label"))
         .def("resize_index", &Index<float>::resizeIndex, py::arg("new_size"))
+        .def("get_max_elements", &Index<float>::getMaxElements)
+        .def("get_current_count", &Index<float>::getCurrentCount)
         .def_readonly("space", &Index<float>::space_name)
         .def_readonly("dim", &Index<float>::dim)
         .def_readwrite("num_threads", &Index<float>::num_threads_default)
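
The new `else` branch turns a silently ignored space name into an explicit error at construction time. In Python terms the guard behaves roughly like this sketch (illustrative names only, not the C++ types):

```python
def make_space(space_name, dim):
    # Rough rendering of the constructor guard added in bindings.cpp.
    if space_name == 'l2':
        return ('L2Space', dim)
    elif space_name in ('ip', 'cosine'):
        # cosine reuses inner product on normalized vectors
        return ('InnerProductSpace', dim)
    else:
        raise ValueError("Space name must be one of l2, ip, or cosine.")

make_space('l2', 16)            # accepted
try:
    make_space('euclidean', 16) # rejected with a clear message
except ValueError as e:
    print(e)  # Space name must be one of l2, ip, or cosine.
```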


=====================================
python_bindings/tests/bindings_test.py
=====================================
@@ -18,15 +18,15 @@ class RandomSelfTestCase(unittest.TestCase):
         # Declaring index
         p = hnswlib.Index(space='l2', dim=dim)  # possible options are l2, cosine or ip
 
-        # Initing index
+        # Initiating index
         # max_elements - the maximum number of elements, should be known beforehand
         #     (probably will be made optional in the future)
         #
         # ef_construction - controls index search speed/build speed tradeoff
         # M - is tightly connected with internal dimensionality of the data
-        #     stronlgy affects the memory consumption
+        #     strongly affects the memory consumption
 
-        p.init_index(max_elements = num_elements, ef_construction = 100, M = 16)
+        p.init_index(max_elements=num_elements, ef_construction=100, M=16)
 
         # Controlling the recall by setting ef:
         # higher ef leads to better accuracy, but slower search
@@ -51,7 +51,7 @@ class RandomSelfTestCase(unittest.TestCase):
         p.save_index(index_path)
         del p
 
-        # Reiniting, loading the index
+        # Re-initiating, loading the index
         p = hnswlib.Index(space='l2', dim=dim)  # you can change the sa
 
         print("\nLoading index from '%s'\n" % index_path)


=====================================
python_bindings/tests/bindings_test_getdata.py
=====================================
@@ -19,13 +19,13 @@ class RandomSelfTestCase(unittest.TestCase):
         # Declaring index
         p = hnswlib.Index(space='l2', dim=dim)  # possible options are l2, cosine or ip
 
-        # Initing index
+        # Initiating index
         # max_elements - the maximum number of elements, should be known beforehand
         #     (probably will be made optional in the future)
         #
         # ef_construction - controls index search speed/build speed tradeoff
         # M - is tightly connected with internal dimensionality of the data
-        #     stronlgy affects the memory consumption
+        #     strongly affects the memory consumption
 
         p.init_index(max_elements=num_elements, ef_construction=100, M=16)
 


=====================================
python_bindings/tests/bindings_test_labels.py
=====================================
@@ -21,13 +21,13 @@ class RandomSelfTestCase(unittest.TestCase):
             # Declaring index
             p = hnswlib.Index(space='l2', dim=dim)  # possible options are l2, cosine or ip
 
-            # Initing index
+            # Initiating index
             # max_elements - the maximum number of elements, should be known beforehand
             #     (probably will be made optional in the future)
             #
             # ef_construction - controls index search speed/build speed tradeoff
             # M - is tightly connected with internal dimensionality of the data
-            #     stronlgy affects the memory consumption
+            #     strongly affects the memory consumption
 
             p.init_index(max_elements=num_elements, ef_construction=100, M=16)
 
@@ -47,7 +47,7 @@ class RandomSelfTestCase(unittest.TestCase):
             # Query the elements for themselves and measure recall:
             labels, distances = p.knn_query(data1, k=1)
 
-            items=p.get_items(labels)
+            items = p.get_items(labels)
 
             # Check the recall:
             self.assertAlmostEqual(np.mean(labels.reshape(-1) == np.arange(len(data1))), 1.0, 3)
@@ -67,8 +67,8 @@ class RandomSelfTestCase(unittest.TestCase):
             print("Deleted")
 
             print("\n**** Mark delete test ****\n")
-            # Reiniting, loading the index
-            print("Reiniting")
+            # Re-initiating, loading the index
+            print("Re-initiating")
             p = hnswlib.Index(space='l2', dim=dim)
 
             print("\nLoading index from '%s'\n" % index_path)
@@ -80,17 +80,17 @@ class RandomSelfTestCase(unittest.TestCase):
 
             # Query the elements for themselves and measure recall:
             labels, distances = p.knn_query(data, k=1)
-            items=p.get_items(labels)
+            items = p.get_items(labels)
 
             # Check the recall:
             self.assertAlmostEqual(np.mean(labels.reshape(-1) == np.arange(len(data))), 1.0, 3)
 
             # Check that the returned element data is correct:
-            diff_with_gt_labels=np.mean(np.abs(data-items))
+            diff_with_gt_labels = np.mean(np.abs(data-items))
             self.assertAlmostEqual(diff_with_gt_labels, 0, delta=1e-4) # deleting index.
 
             # Checking that all labels are returned correctly:
-            sorted_labels=sorted(p.get_ids_list())
+            sorted_labels = sorted(p.get_ids_list())
             self.assertEqual(np.sum(~np.asarray(sorted_labels) == np.asarray(range(num_elements))), 0)
 
             # Delete data1


=====================================
python_bindings/tests/bindings_test_metadata.py
=====================================
@@ -0,0 +1,49 @@
+import unittest
+
+import numpy as np
+
+import hnswlib
+
+
+class RandomSelfTestCase(unittest.TestCase):
+    def testMetadata(self):
+
+        dim = 16
+        num_elements = 10000
+
+        # Generating sample data
+        data = np.float32(np.random.random((num_elements, dim)))
+
+        # Declaring index
+        p = hnswlib.Index(space='l2', dim=dim)  # possible options are l2, cosine or ip
+
+        # Initiating index
+        # max_elements - the maximum number of elements, should be known beforehand
+        #     (probably will be made optional in the future)
+        #
+        # ef_construction - controls index search speed/build speed tradeoff
+        # M - is tightly connected with internal dimensionality of the data
+        #     strongly affects the memory consumption
+
+        p.init_index(max_elements=num_elements, ef_construction=100, M=16)
+
+        # Controlling the recall by setting ef:
+        # higher ef leads to better accuracy, but slower search
+        p.set_ef(100)
+
+        p.set_num_threads(4)  # by default using all available cores
+
+        print("Adding all elements (%d)" % (len(data)))
+        p.add_items(data)
+
+        # test methods
+        self.assertEqual(p.get_max_elements(), num_elements)
+        self.assertEqual(p.get_current_count(), num_elements)
+
+        # test properties
+        self.assertEqual(p.space, 'l2')
+        self.assertEqual(p.dim, dim)
+        self.assertEqual(p.M, 16)
+        self.assertEqual(p.ef_construction, 100)
+        self.assertEqual(p.max_elements, num_elements)
+        self.assertEqual(p.element_count, num_elements)


=====================================
python_bindings/tests/bindings_test_pickle.py
=====================================
@@ -60,38 +60,38 @@ def test_space_main(self, space, dim):
 
     p.num_threads = self.num_threads  # by default using all available cores
 
-    p0 = pickle.loads(pickle.dumps(p)) ### pickle un-initialized Index
+    p0 = pickle.loads(pickle.dumps(p)) # pickle un-initialized Index
     p.init_index(max_elements=self.num_elements, ef_construction=self.ef_construction, M=self.M)
     p0.init_index(max_elements=self.num_elements, ef_construction=self.ef_construction, M=self.M)
 
     p.ef = self.ef
     p0.ef = self.ef
 
-    p1 = pickle.loads(pickle.dumps(p)) ### pickle Index before adding items
+    p1 = pickle.loads(pickle.dumps(p)) # pickle Index before adding items
 
-    ### add items to ann index p,p0,p1
+    # add items to ann index p,p0,p1
     p.add_items(data)
     p1.add_items(data)
     p0.add_items(data)
 
+    p2 = pickle.loads(pickle.dumps(p)) # pickle Index after adding items
+    p2=pickle.loads(pickle.dumps(p)) # pickle Index before adding items
 
     self.assertTrue(np.allclose(p.get_items(), p0.get_items()), "items for p and p0 must be same")
     self.assertTrue(np.allclose(p0.get_items(), p1.get_items()), "items for p0 and p1 must be same")
     self.assertTrue(np.allclose(p1.get_items(), p2.get_items()), "items for p1 and p2 must be same")
 
-    ### Test if returned distances are same
+    # Test if returned distances are same
     l, d = p.knn_query(test_data, k=self.k)
     l0, d0 = p0.knn_query(test_data, k=self.k)
     l1, d1 = p1.knn_query(test_data, k=self.k)
     l2, d2 = p2.knn_query(test_data, k=self.k)
 
-    self.assertLessEqual(np.sum(((d-d0)**2.)>1e-3), self.dists_err_thresh, msg=f"knn distances returned by p and p0 must match")
-    self.assertLessEqual(np.sum(((d0-d1)**2.)>1e-3), self.dists_err_thresh, msg=f"knn distances returned by p0 and p1 must match")
-    self.assertLessEqual(np.sum(((d1-d2)**2.)>1e-3), self.dists_err_thresh, msg=f"knn distances returned by p1 and p2 must match")
+    self.assertLessEqual(np.sum(((d-d0)**2.) > 1e-3), self.dists_err_thresh, msg=f"knn distances returned by p and p0 must match")
+    self.assertLessEqual(np.sum(((d0-d1)**2.) > 1e-3), self.dists_err_thresh, msg=f"knn distances returned by p0 and p1 must match")
+    self.assertLessEqual(np.sum(((d1-d2)**2.) > 1e-3), self.dists_err_thresh, msg=f"knn distances returned by p1 and p2 must match")
 
-    ### check if ann results match brute-force search
-    ###   allow for 2 labels to be missing from ann results
+    # check if ann results match brute-force search
+    #   allow for 2 labels to be missing from ann results
     check_ann_results(self, space, data, test_data, self.k, l, d,
                            err_thresh=self.label_err_thresh,
                            total_thresh=self.item_err_thresh,
@@ -102,19 +102,19 @@ def test_space_main(self, space, dim):
                            total_thresh=self.item_err_thresh,
                            dists_thresh=self.dists_err_thresh)
 
-    ### Check ef parameter value
+    # Check ef parameter value
     self.assertEqual(p.ef, self.ef, "incorrect value of p.ef")
     self.assertEqual(p0.ef, self.ef, "incorrect value of p0.ef")
     self.assertEqual(p2.ef, self.ef, "incorrect value of p2.ef")
     self.assertEqual(p1.ef, self.ef, "incorrect value of p1.ef")
 
-    ### Check M parameter value
+    # Check M parameter value
     self.assertEqual(p.M, self.M, "incorrect value of p.M")
     self.assertEqual(p0.M, self.M, "incorrect value of p0.M")
     self.assertEqual(p1.M, self.M, "incorrect value of p1.M")
     self.assertEqual(p2.M, self.M, "incorrect value of p2.M")
 
-    ### Check ef_construction parameter value
+    # Check ef_construction parameter value
     self.assertEqual(p.ef_construction, self.ef_construction, "incorrect value of p.ef_construction")
     self.assertEqual(p0.ef_construction, self.ef_construction, "incorrect value of p0.ef_construction")
     self.assertEqual(p1.ef_construction, self.ef_construction, "incorrect value of p1.ef_construction")
@@ -135,12 +135,12 @@ class PickleUnitTests(unittest.TestCase):
         self.num_threads = 4
         self.k = 25
 
-        self.label_err_thresh = 5  ### max number of missing labels allowed per test item
-        self.item_err_thresh = 5   ### max number of items allowed with incorrect labels
+        self.label_err_thresh = 5  # max number of missing labels allowed per test item
+        self.item_err_thresh = 5   # max number of items allowed with incorrect labels
 
-        self.dists_err_thresh = 50 ### for two matrices, d1 and d2, dists_err_thresh controls max
-                                 ### number of value pairs that are allowed to be different in d1 and d2
-                                 ### i.e., number of values that are (d1-d2)**2>1e-3
+        self.dists_err_thresh = 50 # for two matrices, d1 and d2, dists_err_thresh controls max
+                                 # number of value pairs that are allowed to be different in d1 and d2
+                                 # i.e., number of values that are (d1-d2)**2>1e-3
 
     def test_inner_product_space(self):
         test_space_main(self, 'ip', 48)
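
The invariant these tests enforce, that pickled copies keep their parameters and return the same items and distances, can be sketched with a plain picklable object (a hypothetical stand-in; real hnswlib pickling round-trips the whole graph):

```python
import pickle

class FakeIndex:
    # Toy stand-in showing only the parameter/data round-trip being tested.
    def __init__(self, space, dim, ef=10, M=16):
        self.space, self.dim, self.ef, self.M = space, dim, ef, M
        self.items = []

    def add_items(self, data):
        self.items.extend(data)

p = FakeIndex('ip', 48, ef=200, M=32)
p.add_items([[0.1] * 48])

p2 = pickle.loads(pickle.dumps(p))  # round-trip, as the tests above do
print(p2.ef, p2.M, len(p2.items))   # 200 32 1
```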


=====================================
python_bindings/tests/bindings_test_resize.py
=====================================
@@ -7,71 +7,71 @@ import hnswlib
 
 class RandomSelfTestCase(unittest.TestCase):
     def testRandomSelf(self):
-      for idx in range(16):
-        print("\n**** Index resize test ****\n")
+        for idx in range(16):
+            print("\n**** Index resize test ****\n")
 
-        np.random.seed(idx)
-        dim = 16
-        num_elements = 10000
+            np.random.seed(idx)
+            dim = 16
+            num_elements = 10000
 
-        # Generating sample data
-        data = np.float32(np.random.random((num_elements, dim)))
+            # Generating sample data
+            data = np.float32(np.random.random((num_elements, dim)))
 
-        # Declaring index
-        p = hnswlib.Index(space='l2', dim=dim)  # possible options are l2, cosine or ip
+            # Declaring index
+            p = hnswlib.Index(space='l2', dim=dim)  # possible options are l2, cosine or ip
 
-        # Initing index
-        # max_elements - the maximum number of elements, should be known beforehand
-        #     (probably will be made optional in the future)
-        #
-        # ef_construction - controls index search speed/build speed tradeoff
-        # M - is tightly connected with internal dimensionality of the data
-        #     stronlgy affects the memory consumption
+            # Initiating index
+            # max_elements - the maximum number of elements, should be known beforehand
+            #     (probably will be made optional in the future)
+            #
+            # ef_construction - controls index search speed/build speed tradeoff
+            # M - is tightly connected with internal dimensionality of the data
+            #     strongly affects the memory consumption
 
-        p.init_index(max_elements=num_elements//2, ef_construction=100, M=16)
+            p.init_index(max_elements=num_elements//2, ef_construction=100, M=16)
 
-        # Controlling the recall by setting ef:
-        # higher ef leads to better accuracy, but slower search
-        p.set_ef(20)
+            # Controlling the recall by setting ef:
+            # higher ef leads to better accuracy, but slower search
+            p.set_ef(20)
 
-        p.set_num_threads(idx%8)  # by default using all available cores
+            p.set_num_threads(idx % 8)  # by default using all available cores
 
-        # We split the data in two batches:
-        data1 = data[:num_elements // 2]
-        data2 = data[num_elements // 2:]
+            # We split the data in two batches:
+            data1 = data[:num_elements // 2]
+            data2 = data[num_elements // 2:]
 
-        print("Adding first batch of %d elements" % (len(data1)))
-        p.add_items(data1)
+            print("Adding first batch of %d elements" % (len(data1)))
+            p.add_items(data1)
 
-        # Query the elements for themselves and measure recall:
-        labels, distances = p.knn_query(data1, k=1)
+            # Query the elements for themselves and measure recall:
+            labels, distances = p.knn_query(data1, k=1)
 
-        items = p.get_items(list(range(len(data1))))
+            items = p.get_items(list(range(len(data1))))
 
-        # Check the recall:
-        self.assertAlmostEqual(np.mean(labels.reshape(-1) == np.arange(len(data1))), 1.0, 3)
+            # Check the recall:
+            self.assertAlmostEqual(np.mean(labels.reshape(-1) == np.arange(len(data1))), 1.0, 3)
 
-        # Check that the returned element data is correct:
-        diff_with_gt_labels = np.max(np.abs(data1-items))
-        self.assertAlmostEqual(diff_with_gt_labels, 0, delta=1e-4)
+            # Check that the returned element data is correct:
+            diff_with_gt_labels = np.max(np.abs(data1-items))
+            self.assertAlmostEqual(diff_with_gt_labels, 0, delta=1e-4)
 
-        print("Resizing the index")
-        p.resize_index(num_elements)
+            print("Resizing the index")
+            p.resize_index(num_elements)
 
-        print("Adding the second batch of %d elements" % (len(data2)))
-        p.add_items(data2)
+            print("Adding the second batch of %d elements" % (len(data2)))
+            p.add_items(data2)
 
-        # Query the elements for themselves and measure recall:
-        labels, distances = p.knn_query(data, k=1)
-        items=p.get_items(list(range(num_elements)))
+            # Query the elements for themselves and measure recall:
+            labels, distances = p.knn_query(data, k=1)
+            items = p.get_items(list(range(num_elements)))
 
-        # Check the recall:
-        self.assertAlmostEqual(np.mean(labels.reshape(-1) == np.arange(len(data))), 1.0, 3)
+            # Check the recall:
+            self.assertAlmostEqual(np.mean(labels.reshape(-1) == np.arange(len(data))), 1.0, 3)
 
-        # Check that the returned element data is correct:
-        diff_with_gt_labels=np.max(np.abs(data-items))
-        self.assertAlmostEqual(diff_with_gt_labels, 0, delta=1e-4)
+            # Check that the returned element data is correct:
+            diff_with_gt_labels = np.max(np.abs(data-items))
+            self.assertAlmostEqual(diff_with_gt_labels, 0, delta=1e-4)
 
-        # Checking that all labels are returned correcly:
-        sorted_labels=sorted(p.get_ids_list())
-        self.assertEqual(np.sum(~np.asarray(sorted_labels) == np.asarray(range(num_elements))), 0)
+            # Checking that all labels are returned correctly:
+            sorted_labels = sorted(p.get_ids_list())
+            self.assertEqual(np.sum(~np.asarray(sorted_labels) == np.asarray(range(num_elements))), 0)


=====================================
setup.py
=====================================
@@ -7,7 +7,7 @@ import setuptools
 from setuptools import Extension, setup
 from setuptools.command.build_ext import build_ext
 
-__version__ = '0.5.0'
+__version__ = '0.5.2'
 
 
 include_dirs = [



View it on GitLab: https://salsa.debian.org/med-team/hnswlib/-/commit/8beff38dcf4897e59e0275df2a79a4cdaf1db1d6




