[med-svn] [Git][med-team/hnswlib][master] 5 commits: New upstream version 0.5.0
Nilesh Patra (@nilesh)
gitlab at salsa.debian.org
Sun Aug 8 12:26:37 BST 2021
Nilesh Patra pushed to branch master at Debian Med / hnswlib
Commits:
8725d40f by Nilesh Patra at 2021-04-20T17:23:38+05:30
New upstream version 0.5.0
- - - - -
8beff38d by Nilesh Patra at 2021-08-08T16:42:09+05:30
New upstream version 0.5.2
- - - - -
be4bff5d by Nilesh Patra at 2021-08-08T16:42:10+05:30
Update upstream source from tag 'upstream/0.5.2'
Update to upstream version '0.5.2'
with Debian dir e720d68f7b1911c20e29de6d9e4a02c59433f796
- - - - -
848633a7 by Nilesh Patra at 2021-08-08T16:47:42+05:30
Refresh patches
- - - - -
e40c6360 by Nilesh Patra at 2021-08-08T16:56:01+05:30
Interim changelog entry
- - - - -
15 changed files:
- .travis.yml
- README.md
- debian/changelog
- debian/patches/cassert.patch
- debian/patches/noTwine.patch
- examples/pyw_hnswlib.py
- hnswlib/hnswalg.h
- python_bindings/bindings.cpp
- python_bindings/tests/bindings_test.py
- python_bindings/tests/bindings_test_getdata.py
- python_bindings/tests/bindings_test_labels.py
- + python_bindings/tests/bindings_test_metadata.py
- python_bindings/tests/bindings_test_pickle.py
- python_bindings/tests/bindings_test_resize.py
- setup.py
Changes:
=====================================
.travis.yml
=====================================
@@ -9,7 +9,15 @@ jobs:
- name: Linux Python 3.7
os: linux
python: 3.7
-
+
+ - name: Linux Python 3.8
+ os: linux
+ python: 3.8
+
+ - name: Linux Python 3.9
+ os: linux
+ python: 3.9
+
- name: Windows Python 3.6
os: windows
language: shell # 'language: python' is an error on Travis CI Windows
@@ -28,6 +36,24 @@ jobs:
- python --version
env: PATH=/c/Python37:/c/Python37/Scripts:$PATH
+ - name: Windows Python 3.8
+ os: windows
+ language: shell # 'language: python' is an error on Travis CI Windows
+ before_install:
+ - choco install python --version 3.8.0
+ - python -m pip install --upgrade pip
+ - python --version
+ env: PATH=/c/Python38:/c/Python38/Scripts:$PATH
+
+ - name: Windows Python 3.9
+ os: windows
+ language: shell # 'language: python' is an error on Travis CI Windows
+ before_install:
+ - choco install python --version 3.9.0
+ - python -m pip install --upgrade pip
+ - python --version
+ env: PATH=/c/Python39:/c/Python39/Scripts:$PATH
+
install:
- |
python -m pip install .
=====================================
README.md
=====================================
@@ -1,10 +1,11 @@
# Hnswlib - fast approximate nearest neighbor search
-Header-only C++ HNSW implementation with python bindings. Paper's code for the HNSW 200M SIFT experiment
+Header-only C++ HNSW implementation with python bindings.
**NEWS:**
* **Hnswlib is now 0.5.2**. Bugfixes - thanks [@marekhanus](https://github.com/marekhanus) for fixing the missing arguments, adding support for python 3.8, 3.9 in Travis, improving the python wrapper and fixing typos/code style; [@apoorv-sharma](https://github.com/apoorv-sharma) for fixing the bug in the insertion/deletion logic; [@shengjun1985](https://github.com/shengjun1985) for simplifying the memory reallocation logic; [@TakaakiFuruse](https://github.com/TakaakiFuruse) for improved description of `add_items`; [@psobot](https://github.com/psobot) for improving error handling; [@ShuAiii](https://github.com/ShuAiii) for reporting the bug in the python interface
-* **hnswlib is now 0.5.0. Added support for pickling indices, support for PEP-517 and PEP-518 building, small speedups, bug and documentation fixes. Many thanks to [@dbespalov](https://github.com/dbespalov), [@dyashuni](https://github.com/dyashuni), [@groodt](https://github.com/groodt),[@uestc-lfs](https://github.com/uestc-lfs), [@vinnitu](https://github.com/vinnitu), [@fabiencastan](https://github.com/fabiencastan), [@JinHai-CN](https://github.com/JinHai-CN), [@js1010](https://github.com/js1010)!**
+* **Hnswlib is now 0.5.0**. Added support for pickling indices (see the sketch after this list), support for PEP-517 and PEP-518 building, small speedups, bug and documentation fixes. Many thanks to [@dbespalov](https://github.com/dbespalov), [@dyashuni](https://github.com/dyashuni), [@groodt](https://github.com/groodt), [@uestc-lfs](https://github.com/uestc-lfs), [@vinnitu](https://github.com/vinnitu), [@fabiencastan](https://github.com/fabiencastan), [@JinHai-CN](https://github.com/JinHai-CN), [@js1010](https://github.com/js1010)!
* **Thanks to Apoorv Sharma [@apoorv-sharma](https://github.com/apoorv-sharma), hnswlib now supports true element updates (the interface remained the same, but the performance/memory should not degrade as you update the element embeddings).**
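For readers skimming the diff, a minimal sketch of the pickling support mentioned above, using only public calls that appear elsewhere in this mail (README example, bindings_test_pickle.py); sizes and parameters are illustrative:

import pickle

import numpy as np

import hnswlib

dim = 16
num_elements = 1000

# Build a small index as in the README example.
data = np.float32(np.random.random((num_elements, dim)))
p = hnswlib.Index(space='l2', dim=dim)
p.init_index(max_elements=num_elements, ef_construction=100, M=16)
p.add_items(data)

# Round-trip the whole index through pickle; the copy should answer
# queries identically to the original (this is what the pickle tests check).
p2 = pickle.loads(pickle.dumps(p))
labels, distances = p2.knn_query(data[:5], k=1)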
@@ -41,18 +42,18 @@ For other spaces use the nmslib library https://github.com/nmslib/nmslib.
* `hnswlib.Index(space, dim)` creates a non-initialized HNSW index in space `space` with integer dimension `dim`.
`hnswlib.Index` methods:
-* `init_index(max_elements, ef_construction = 200, M = 16, random_seed = 100)` initializes the index from with no elements.
+* `init_index(max_elements, M = 16, ef_construction = 200, random_seed = 100)` initializes the index with no elements.
* `max_elements` defines the maximum number of elements that can be stored in the structure (can be increased/shrunk).
* `ef_construction` defines a construction time/accuracy trade-off (see [ALGO_PARAMS.md](ALGO_PARAMS.md)).
* `M` defines the maximum number of outgoing connections in the graph ([ALGO_PARAMS.md](ALGO_PARAMS.md)).
-* `add_items(data, data_labels, num_threads = -1)` - inserts the `data`(numpy array of vectors, shape:`N*dim`) into the structure.
- * `labels` is an optional N-size numpy array of integer labels for all elements in `data`.
+* `add_items(data, ids, num_threads = -1)` - inserts the `data` (numpy array of vectors, shape: `N*dim`) into the structure.
* `num_threads` sets the number of cpu threads to use (-1 means use default).
- * `data_labels` specifies the labels for the data. If index already has the elements with the same labels, their features will be updated. Note that update procedure is slower than insertion of a new element, but more memory- and query-efficient.
+ * `ids` is an optional N-size numpy array of integer labels for all elements in `data`.
+ - If the index already has elements with the same labels, their features will be updated. Note that the update procedure is slower than inserting a new element, but more memory- and query-efficient.
* Thread-safe with other `add_items` calls, but not with `knn_query`.
-* `mark_deleted(data_label)` - marks the element as deleted, so it will be omitted from search results.
+* `mark_deleted(label)` - marks the element as deleted, so it will be omitted from search results.
* `resize_index(new_size)` - changes the maximum capacity of the index. Not thread-safe with `add_items` and `knn_query`; see the short sketch below.
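A two-line sketch of the maintenance calls just described, assuming the index `p` and `num_elements` from the README example below; the label 42 is a hypothetical element id:

p.mark_deleted(42)                # element with label 42 is omitted from future knn_query results
p.resize_index(2 * num_elements)  # grow capacity; not thread-safe with add_items/knn_query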
@@ -113,7 +114,7 @@ num_elements = 10000
# Generating sample data
data = np.float32(np.random.random((num_elements, dim)))
-data_labels = np.arange(num_elements)
+ids = np.arange(num_elements)
# Declaring index
p = hnswlib.Index(space = 'l2', dim = dim) # possible options are l2, cosine or ip
@@ -122,7 +123,7 @@ p = hnswlib.Index(space = 'l2', dim = dim) # possible options are l2, cosine or
p.init_index(max_elements = num_elements, ef_construction = 200, M = 16)
# Element insertion (can be called several times):
-p.add_items(data, data_labels)
+p.add_items(data, ids)
# Controlling the recall by setting ef:
p.set_ef(50) # ef should always be > k
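For context, the full README example (unchanged by this diff) goes on to query the index for the inserted items:

# Query dataset, k - number of closest elements (returns 2 numpy arrays)
labels, distances = p.knn_query(data, k = 1)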
@@ -295,4 +296,13 @@ To run test **with** updates (from `build` directory)
### References
-Malkov, Yu A., and D. A. Yashunin. "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs." TPAMI, preprint: https://arxiv.org/abs/1603.09320
+@article{malkov2018efficient,
+ title={Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs},
+ author={Malkov, Yu A and Yashunin, Dmitry A},
+ journal={IEEE transactions on pattern analysis and machine intelligence},
+ volume={42},
+ number={4},
+ pages={824--836},
+ year={2018},
+ publisher={IEEE}
+}
=====================================
debian/changelog
=====================================
@@ -1,12 +1,12 @@
-hnswlib (0.5.0-1) UNRELEASED; urgency=medium
+hnswlib (0.5.2-1) UNRELEASED; urgency=medium
* Team Upload.
* Fix watch URL
- * New upstream version 0.5.0
+ * New upstream version 0.5.2
* Declare compliance with policy 4.5.1
* Refresh and fix patches
- -- Nilesh Patra <nilesh@debian.org>  Tue, 20 Apr 2021 17:34:27 +0530
+ -- Nilesh Patra <nilesh@debian.org>  Sun, 08 Aug 2021 16:42:21 +0530
hnswlib (0.4.0-3) unstable; urgency=medium
=====================================
debian/patches/cassert.patch
=====================================
@@ -2,10 +2,8 @@ Author: Steffen Möller
Last-Update: 2020-09-06 14:39:12 +0200
Description: Add missing "#include <cassert>"
-Index: hnswlib-0.4.0/hnswlib/hnswalg.h
-===================================================================
---- hnswlib-0.4.0.orig/hnswlib/hnswalg.h
-+++ hnswlib-0.4.0/hnswlib/hnswalg.h
+--- a/hnswlib/hnswalg.h
++++ b/hnswlib/hnswalg.h
@@ -1,5 +1,6 @@
#pragma once
=====================================
debian/patches/noTwine.patch
=====================================
@@ -2,10 +2,8 @@ Author: Steffen Möller
Last-Update: 2020-09-06 14:39:12 +0200
Description: Prevent execution of upstream Makefile in python_bindings dir
-Index: hnswlib/python_bindings/Makefile
-===================================================================
---- hnswlib.orig/Makefile
-+++ hnswlib/Makefile
+--- a/Makefile
++++ b/Makefile
@@ -1,3 +1,7 @@
+all:
+ echo E: $(CURDIR)/Makefile should not be executed during Debian build
=====================================
examples/pyw_hnswlib.py
=====================================
@@ -11,8 +11,8 @@ class Index():
self.dict_labels = {}
self.cur_ind = 0
- def init_index(self, max_elements, ef_construction = 200, M = 16):
- self.index.init_index(max_elements = max_elements, ef_construction = ef_construction, M = M)
+ def init_index(self, max_elements, ef_construction=200, M=16):
+ self.index.init_index(max_elements=max_elements, ef_construction=ef_construction, M=M)
def add_items(self, data, ids=None):
if ids is not None:
@@ -55,8 +55,7 @@ class Index():
labels_int, distances = self.index.knn_query(data=data, k=k)
labels = []
for li in labels_int:
- line = []
- for l in li:
- line.append(self.dict_labels[l])
- labels.append(line)
+ labels.append(
+ [self.dict_labels[l] for l in li]
+ )
return labels, distances
=====================================
hnswlib/hnswalg.h
=====================================
@@ -573,29 +573,23 @@ namespace hnswlib {
visited_list_pool_ = new VisitedListPool(1, new_max_elements);
-
element_levels_.resize(new_max_elements);
std::vector<std::mutex>(new_max_elements).swap(link_list_locks_);
// Reallocate base layer
- char * data_level0_memory_new = (char *) malloc(new_max_elements * size_data_per_element_);
+ char * data_level0_memory_new = (char *) realloc(data_level0_memory_, new_max_elements * size_data_per_element_);
if (data_level0_memory_new == nullptr)
throw std::runtime_error("Not enough memory: resizeIndex failed to allocate base layer");
- memcpy(data_level0_memory_new, data_level0_memory_,cur_element_count * size_data_per_element_);
- free(data_level0_memory_);
- data_level0_memory_=data_level0_memory_new;
+ data_level0_memory_ = data_level0_memory_new;
// Reallocate all other layers
- char ** linkLists_new = (char **) malloc(sizeof(void *) * new_max_elements);
+ char ** linkLists_new = (char **) realloc(linkLists_, sizeof(void *) * new_max_elements);
if (linkLists_new == nullptr)
throw std::runtime_error("Not enough memory: resizeIndex failed to allocate other layers");
- memcpy(linkLists_new, linkLists_,cur_element_count * sizeof(void *));
- free(linkLists_);
- linkLists_=linkLists_new;
-
- max_elements_=new_max_elements;
+ linkLists_ = linkLists_new;
+ max_elements_ = new_max_elements;
}
void saveIndex(const std::string &location) {
@@ -987,11 +981,15 @@ namespace hnswlib {
auto search = label_lookup_.find(label);
if (search != label_lookup_.end()) {
tableint existingInternalId = search->second;
-
templock_curr.unlock();
std::unique_lock <std::mutex> lock_el_update(link_list_update_locks_[(existingInternalId & (max_update_element_locks - 1))]);
+
+ if (isMarkedDeleted(existingInternalId)) {
+ unmarkDeletedInternal(existingInternalId);
+ }
updatePoint(data_point, existingInternalId, 1.0);
+
return existingInternalId;
}
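The user-visible effect of this hunk: re-adding an element whose label was previously passed to `mark_deleted` now unmarks it and updates its vector, instead of leaving it hidden. A minimal Python sketch, assuming `numpy as np`, an index `p`, and a data array `data` as in the README example (label 0 is hypothetical):

p.mark_deleted(0)                       # label 0 disappears from search results
p.add_items(data[:1], np.asarray([0]))  # re-inserting label 0 unmarks it and updates its vector
labels, distances = p.knn_query(data[:1], k=1)  # label 0 can be returned again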
=====================================
python_bindings/bindings.cpp
=====================================
@@ -97,6 +97,8 @@ public:
else if(space_name=="cosine") {
l2space = new hnswlib::InnerProductSpace(dim);
normalize=true;
+ } else {
+ throw new std::runtime_error("Space name must be one of l2, ip, or cosine.");
}
appr_alg = NULL;
ep_added = true;
@@ -162,6 +164,7 @@ public:
}
appr_alg = new hnswlib::HierarchicalNSW<dist_t>(l2space, path_to_index, false, max_elements);
cur_l = appr_alg->cur_element_count;
+ index_inited = true;
}
void normalize_vector(float *data, float *norm_array){
@@ -670,6 +673,8 @@ PYBIND11_PLUGIN(hnswlib) {
.def("load_index", &Index<float>::loadIndex, py::arg("path_to_index"), py::arg("max_elements")=0)
.def("mark_deleted", &Index<float>::markDeleted, py::arg("label"))
.def("resize_index", &Index<float>::resizeIndex, py::arg("new_size"))
+ .def("get_max_elements", &Index<float>::getMaxElements)
+ .def("get_current_count", &Index<float>::getCurrentCount)
.def_readonly("space", &Index<float>::space_name)
.def_readonly("dim", &Index<float>::dim)
.def_readwrite("num_threads", &Index<float>::num_threads_default)
=====================================
python_bindings/tests/bindings_test.py
=====================================
@@ -18,15 +18,15 @@ class RandomSelfTestCase(unittest.TestCase):
# Declaring index
p = hnswlib.Index(space='l2', dim=dim) # possible options are l2, cosine or ip
- # Initing index
+ # Initiating index
# max_elements - the maximum number of elements, should be known beforehand
# (probably will be made optional in the future)
#
# ef_construction - controls index search speed/build speed tradeoff
# M - is tightly connected with internal dimensionality of the data
- # stronlgy affects the memory consumption
+ # strongly affects the memory consumption
- p.init_index(max_elements = num_elements, ef_construction = 100, M = 16)
+ p.init_index(max_elements=num_elements, ef_construction=100, M=16)
# Controlling the recall by setting ef:
# higher ef leads to better accuracy, but slower search
@@ -51,7 +51,7 @@ class RandomSelfTestCase(unittest.TestCase):
p.save_index(index_path)
del p
- # Reiniting, loading the index
+ # Re-initiating, loading the index
p = hnswlib.Index(space='l2', dim=dim) # you can change the space
print("\nLoading index from '%s'\n" % index_path)
=====================================
python_bindings/tests/bindings_test_getdata.py
=====================================
@@ -19,13 +19,13 @@ class RandomSelfTestCase(unittest.TestCase):
# Declaring index
p = hnswlib.Index(space='l2', dim=dim) # possible options are l2, cosine or ip
- # Initing index
+ # Initiating index
# max_elements - the maximum number of elements, should be known beforehand
# (probably will be made optional in the future)
#
# ef_construction - controls index search speed/build speed tradeoff
# M - is tightly connected with internal dimensionality of the data
- # stronlgy affects the memory consumption
+ # strongly affects the memory consumption
p.init_index(max_elements=num_elements, ef_construction=100, M=16)
=====================================
python_bindings/tests/bindings_test_labels.py
=====================================
@@ -21,13 +21,13 @@ class RandomSelfTestCase(unittest.TestCase):
# Declaring index
p = hnswlib.Index(space='l2', dim=dim) # possible options are l2, cosine or ip
- # Initing index
+ # Initiating index
# max_elements - the maximum number of elements, should be known beforehand
# (probably will be made optional in the future)
#
# ef_construction - controls index search speed/build speed tradeoff
# M - is tightly connected with internal dimensionality of the data
- # stronlgy affects the memory consumption
+ # strongly affects the memory consumption
p.init_index(max_elements=num_elements, ef_construction=100, M=16)
@@ -47,7 +47,7 @@ class RandomSelfTestCase(unittest.TestCase):
# Query the elements for themselves and measure recall:
labels, distances = p.knn_query(data1, k=1)
- items=p.get_items(labels)
+ items = p.get_items(labels)
# Check the recall:
self.assertAlmostEqual(np.mean(labels.reshape(-1) == np.arange(len(data1))), 1.0, 3)
@@ -67,8 +67,8 @@ class RandomSelfTestCase(unittest.TestCase):
print("Deleted")
print("\n**** Mark delete test ****\n")
- # Reiniting, loading the index
- print("Reiniting")
+ # Re-initiating, loading the index
+ print("Re-initiating")
p = hnswlib.Index(space='l2', dim=dim)
print("\nLoading index from '%s'\n" % index_path)
@@ -80,17 +80,17 @@ class RandomSelfTestCase(unittest.TestCase):
# Query the elements for themselves and measure recall:
labels, distances = p.knn_query(data, k=1)
- items=p.get_items(labels)
+ items = p.get_items(labels)
# Check the recall:
self.assertAlmostEqual(np.mean(labels.reshape(-1) == np.arange(len(data))), 1.0, 3)
# Check that the returned element data is correct:
- diff_with_gt_labels=np.mean(np.abs(data-items))
+ diff_with_gt_labels = np.mean(np.abs(data-items))
self.assertAlmostEqual(diff_with_gt_labels, 0, delta=1e-4) # deleting index.
# Checking that all labels are returned correctly:
- sorted_labels=sorted(p.get_ids_list())
+ sorted_labels = sorted(p.get_ids_list())
self.assertEqual(np.sum(~np.asarray(sorted_labels) == np.asarray(range(num_elements))), 0)
# Delete data1
=====================================
python_bindings/tests/bindings_test_metadata.py
=====================================
@@ -0,0 +1,49 @@
+import unittest
+
+import numpy as np
+
+import hnswlib
+
+
+class RandomSelfTestCase(unittest.TestCase):
+ def testMetadata(self):
+
+ dim = 16
+ num_elements = 10000
+
+ # Generating sample data
+ data = np.float32(np.random.random((num_elements, dim)))
+
+ # Declaring index
+ p = hnswlib.Index(space='l2', dim=dim) # possible options are l2, cosine or ip
+
+ # Initing index
+ # max_elements - the maximum number of elements, should be known beforehand
+ # (probably will be made optional in the future)
+ #
+ # ef_construction - controls index search speed/build speed tradeoff
+ # M - is tightly connected with internal dimensionality of the data
+ # strongly affects the memory consumption
+
+ p.init_index(max_elements=num_elements, ef_construction=100, M=16)
+
+ # Controlling the recall by setting ef:
+ # higher ef leads to better accuracy, but slower search
+ p.set_ef(100)
+
+ p.set_num_threads(4) # by default using all available cores
+
+ print("Adding all elements (%d)" % (len(data)))
+ p.add_items(data)
+
+ # test methods
+ self.assertEqual(p.get_max_elements(), num_elements)
+ self.assertEqual(p.get_current_count(), num_elements)
+
+ # test properties
+ self.assertEqual(p.space, 'l2')
+ self.assertEqual(p.dim, dim)
+ self.assertEqual(p.M, 16)
+ self.assertEqual(p.ef_construction, 100)
+ self.assertEqual(p.max_elements, num_elements)
+ self.assertEqual(p.element_count, num_elements)
=====================================
python_bindings/tests/bindings_test_pickle.py
=====================================
@@ -60,38 +60,38 @@ def test_space_main(self, space, dim):
p.num_threads = self.num_threads # by default using all available cores
- p0 = pickle.loads(pickle.dumps(p)) ### pickle un-initialized Index
+ p0 = pickle.loads(pickle.dumps(p)) # pickle un-initialized Index
p.init_index(max_elements=self.num_elements, ef_construction=self.ef_construction, M=self.M)
p0.init_index(max_elements=self.num_elements, ef_construction=self.ef_construction, M=self.M)
p.ef = self.ef
p0.ef = self.ef
- p1 = pickle.loads(pickle.dumps(p)) ### pickle Index before adding items
+ p1 = pickle.loads(pickle.dumps(p)) # pickle Index before adding items
- ### add items to ann index p,p0,p1
+ # add items to ann index p,p0,p1
p.add_items(data)
p1.add_items(data)
p0.add_items(data)
- p2=pickle.loads(pickle.dumps(p)) ### pickle Index before adding items
+ p2 = pickle.loads(pickle.dumps(p)) # pickle Index after adding items
self.assertTrue(np.allclose(p.get_items(), p0.get_items()), "items for p and p0 must be same")
self.assertTrue(np.allclose(p0.get_items(), p1.get_items()), "items for p0 and p1 must be same")
self.assertTrue(np.allclose(p1.get_items(), p2.get_items()), "items for p1 and p2 must be same")
- ### Test if returned distances are same
+ # Test if returned distances are same
l, d = p.knn_query(test_data, k=self.k)
l0, d0 = p0.knn_query(test_data, k=self.k)
l1, d1 = p1.knn_query(test_data, k=self.k)
l2, d2 = p2.knn_query(test_data, k=self.k)
- self.assertLessEqual(np.sum(((d-d0)**2.)>1e-3), self.dists_err_thresh, msg=f"knn distances returned by p and p0 must match")
- self.assertLessEqual(np.sum(((d0-d1)**2.)>1e-3), self.dists_err_thresh, msg=f"knn distances returned by p0 and p1 must match")
- self.assertLessEqual(np.sum(((d1-d2)**2.)>1e-3), self.dists_err_thresh, msg=f"knn distances returned by p1 and p2 must match")
+ self.assertLessEqual(np.sum(((d-d0)**2.) > 1e-3), self.dists_err_thresh, msg=f"knn distances returned by p and p0 must match")
+ self.assertLessEqual(np.sum(((d0-d1)**2.) > 1e-3), self.dists_err_thresh, msg=f"knn distances returned by p0 and p1 must match")
+ self.assertLessEqual(np.sum(((d1-d2)**2.) > 1e-3), self.dists_err_thresh, msg=f"knn distances returned by p1 and p2 must match")
- ### check if ann results match brute-force search
- ### allow for 2 labels to be missing from ann results
+ # check if ann results match brute-force search
+ # allow for 2 labels to be missing from ann results
check_ann_results(self, space, data, test_data, self.k, l, d,
err_thresh=self.label_err_thresh,
total_thresh=self.item_err_thresh,
@@ -102,19 +102,19 @@ def test_space_main(self, space, dim):
total_thresh=self.item_err_thresh,
dists_thresh=self.dists_err_thresh)
- ### Check ef parameter value
+ # Check ef parameter value
self.assertEqual(p.ef, self.ef, "incorrect value of p.ef")
self.assertEqual(p0.ef, self.ef, "incorrect value of p0.ef")
self.assertEqual(p2.ef, self.ef, "incorrect value of p2.ef")
self.assertEqual(p1.ef, self.ef, "incorrect value of p1.ef")
- ### Check M parameter value
+ # Check M parameter value
self.assertEqual(p.M, self.M, "incorrect value of p.M")
self.assertEqual(p0.M, self.M, "incorrect value of p0.M")
self.assertEqual(p1.M, self.M, "incorrect value of p1.M")
self.assertEqual(p2.M, self.M, "incorrect value of p2.M")
- ### Check ef_construction parameter value
+ # Check ef_construction parameter value
self.assertEqual(p.ef_construction, self.ef_construction, "incorrect value of p.ef_construction")
self.assertEqual(p0.ef_construction, self.ef_construction, "incorrect value of p0.ef_construction")
self.assertEqual(p1.ef_construction, self.ef_construction, "incorrect value of p1.ef_construction")
@@ -135,12 +135,12 @@ class PickleUnitTests(unittest.TestCase):
self.num_threads = 4
self.k = 25
- self.label_err_thresh = 5 ### max number of missing labels allowed per test item
- self.item_err_thresh = 5 ### max number of items allowed with incorrect labels
+ self.label_err_thresh = 5 # max number of missing labels allowed per test item
+ self.item_err_thresh = 5 # max number of items allowed with incorrect labels
- self.dists_err_thresh = 50 ### for two matrices, d1 and d2, dists_err_thresh controls max
- ### number of value pairs that are allowed to be different in d1 and d2
- ### i.e., number of values that are (d1-d2)**2>1e-3
+ self.dists_err_thresh = 50 # for two matrices, d1 and d2, dists_err_thresh controls max
+ # number of value pairs that are allowed to be different in d1 and d2
+ # i.e., number of values that are (d1-d2)**2>1e-3
def test_inner_product_space(self):
test_space_main(self, 'ip', 48)
=====================================
python_bindings/tests/bindings_test_resize.py
=====================================
@@ -7,71 +7,71 @@ import hnswlib
class RandomSelfTestCase(unittest.TestCase):
def testRandomSelf(self):
- for idx in range(16):
- print("\n**** Index resize test ****\n")
+ for idx in range(16):
+ print("\n**** Index resize test ****\n")
- np.random.seed(idx)
- dim = 16
- num_elements = 10000
+ np.random.seed(idx)
+ dim = 16
+ num_elements = 10000
- # Generating sample data
- data = np.float32(np.random.random((num_elements, dim)))
+ # Generating sample data
+ data = np.float32(np.random.random((num_elements, dim)))
- # Declaring index
- p = hnswlib.Index(space='l2', dim=dim) # possible options are l2, cosine or ip
+ # Declaring index
+ p = hnswlib.Index(space='l2', dim=dim) # possible options are l2, cosine or ip
- # Initing index
- # max_elements - the maximum number of elements, should be known beforehand
- # (probably will be made optional in the future)
- #
- # ef_construction - controls index search speed/build speed tradeoff
- # M - is tightly connected with internal dimensionality of the data
- # stronlgy affects the memory consumption
+ # Initiating index
+ # max_elements - the maximum number of elements, should be known beforehand
+ # (probably will be made optional in the future)
+ #
+ # ef_construction - controls index search speed/build speed tradeoff
+ # M - is tightly connected with internal dimensionality of the data
+ # strongly affects the memory consumption
- p.init_index(max_elements=num_elements//2, ef_construction=100, M=16)
+ p.init_index(max_elements=num_elements//2, ef_construction=100, M=16)
- # Controlling the recall by setting ef:
- # higher ef leads to better accuracy, but slower search
- p.set_ef(20)
+ # Controlling the recall by setting ef:
+ # higher ef leads to better accuracy, but slower search
+ p.set_ef(20)
- p.set_num_threads(idx%8) # by default using all available cores
+ p.set_num_threads(idx % 8) # by default using all available cores
- # We split the data in two batches:
- data1 = data[:num_elements // 2]
- data2 = data[num_elements // 2:]
+ # We split the data in two batches:
+ data1 = data[:num_elements // 2]
+ data2 = data[num_elements // 2:]
- print("Adding first batch of %d elements" % (len(data1)))
- p.add_items(data1)
+ print("Adding first batch of %d elements" % (len(data1)))
+ p.add_items(data1)
- # Query the elements for themselves and measure recall:
- labels, distances = p.knn_query(data1, k=1)
+ # Query the elements for themselves and measure recall:
+ labels, distances = p.knn_query(data1, k=1)
- items = p.get_items(list(range(len(data1))))
+ items = p.get_items(list(range(len(data1))))
- # Check the recall:
- self.assertAlmostEqual(np.mean(labels.reshape(-1) == np.arange(len(data1))), 1.0, 3)
+ # Check the recall:
+ self.assertAlmostEqual(np.mean(labels.reshape(-1) == np.arange(len(data1))), 1.0, 3)
- # Check that the returned element data is correct:
- diff_with_gt_labels = np.max(np.abs(data1-items))
- self.assertAlmostEqual(diff_with_gt_labels, 0, delta=1e-4)
+ # Check that the returned element data is correct:
+ diff_with_gt_labels = np.max(np.abs(data1-items))
+ self.assertAlmostEqual(diff_with_gt_labels, 0, delta=1e-4)
- print("Resizing the index")
- p.resize_index(num_elements)
+ print("Resizing the index")
+ p.resize_index(num_elements)
- print("Adding the second batch of %d elements" % (len(data2)))
- p.add_items(data2)
+ print("Adding the second batch of %d elements" % (len(data2)))
+ p.add_items(data2)
- # Query the elements for themselves and measure recall:
- labels, distances = p.knn_query(data, k=1)
- items=p.get_items(list(range(num_elements)))
+ # Query the elements for themselves and measure recall:
+ labels, distances = p.knn_query(data, k=1)
+ items=p.get_items(list(range(num_elements)))
- # Check the recall:
- self.assertAlmostEqual(np.mean(labels.reshape(-1) == np.arange(len(data))), 1.0, 3)
+ # Check the recall:
+ self.assertAlmostEqual(np.mean(labels.reshape(-1) == np.arange(len(data))), 1.0, 3)
- # Check that the returned element data is correct:
- diff_with_gt_labels=np.max(np.abs(data-items))
- self.assertAlmostEqual(diff_with_gt_labels, 0, delta=1e-4)
+ # Check that the returned element data is correct:
+ diff_with_gt_labels = np.max(np.abs(data-items))
+ self.assertAlmostEqual(diff_with_gt_labels, 0, delta=1e-4)
- # Checking that all labels are returned correcly:
- sorted_labels=sorted(p.get_ids_list())
- self.assertEqual(np.sum(~np.asarray(sorted_labels) == np.asarray(range(num_elements))), 0)
+ # Checking that all labels are returned correctly:
+ sorted_labels = sorted(p.get_ids_list())
+ self.assertEqual(np.sum(~np.asarray(sorted_labels) == np.asarray(range(num_elements))), 0)
=====================================
setup.py
=====================================
@@ -7,7 +7,7 @@ import setuptools
from setuptools import Extension, setup
from setuptools.command.build_ext import build_ext
-__version__ = '0.5.0'
+__version__ = '0.5.2'
include_dirs = [
View it on GitLab: https://salsa.debian.org/med-team/hnswlib/-/compare/a91efae2475c29662db504938b0421dbc63bb2a0...e40c6360504691b494eb2d4374278e9ed1f8effc
You're receiving this email because of your account on salsa.debian.org.