[med-svn] [Git][med-team/hnswlib][upstream] New upstream version 0.9.0

Étienne Mollier (@emollier) gitlab at salsa.debian.org
Mon Mar 30 20:42:58 BST 2026



Étienne Mollier pushed to branch upstream at Debian Med / hnswlib


Commits:
c51c552f by Étienne Mollier at 2026-03-30T21:27:46+02:00
New upstream version 0.9.0
- - - - -


6 changed files:

- .github/workflows/build.yml
- README.md
- hnswlib/bruteforce.h
- hnswlib/hnswalg.h
- python_bindings/bindings.cpp
- setup.py


Changes:

=====================================
.github/workflows/build.yml
=====================================
@@ -4,36 +4,48 @@ on: [push, pull_request]
 
 jobs:
   test_python:
-    runs-on: ${{matrix.os}}
+    runs-on: ${{ matrix.os }}
     strategy:
       matrix:
         os: [ubuntu-latest, windows-latest, macos-latest]
-        python-version: ["3.7", "3.8", "3.9", "3.10"]
+        python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
     steps:
-      - uses: actions/checkout@v3
-      - uses: actions/setup-python@v4
+      - uses: actions/checkout@v6
+      - uses: actions/setup-python@v6
         with:
           python-version: ${{ matrix.python-version }}
           
       - name: Build and install
         run: python -m pip install .
       
-      - name: Test
+      - name: Test examples
+        timeout-minutes: 15
+        run: |
+          # Run example files directly
+          for example in examples/python/example*.py; do
+            echo "Running example: $example"
+            python "$example"
+            echo "---------------------------------------"
+          done
+        shell: bash
+
+      - name: Test bindings
         timeout-minutes: 15
         run: |
-          python -m unittest discover -v --start-directory examples/python --pattern "example*.py"
+          # Run the unittest tests
           python -m unittest discover -v --start-directory tests/python --pattern "bindings_test*.py"
+        shell: bash
   
   test_cpp:
-    runs-on: ${{matrix.os}}
+    runs-on: ${{ matrix.os }}
     strategy:
       matrix:
         os: [ubuntu-latest, windows-latest, macos-latest]
     steps:
-      - uses: actions/checkout@v3
-      - uses: actions/setup-python@v4
+      - uses: actions/checkout@v6
+      - uses: actions/setup-python@v6
         with:
-          python-version: "3.10"
+          python-version: "3.11"
 
       - name: Build
         run: |


=====================================
README.md
=====================================
@@ -3,6 +3,14 @@ Header-only C++ HNSW implementation with python bindings, insertions and updates
 
 **NEWS:**
 
+**version 0.9.0**
+
+* Fixed incorrect results in bruteforce search with filter (#514) by [@lukaszsmolinski](https://github.com/lukaszsmolinski)
+* Fixed missing normalization check in BFIndex (#514) by [@lukaszsmolinski](https://github.com/lukaszsmolinski)
+* Throw an exception when fewer than k elements are available (#514) by [@lukaszsmolinski](https://github.com/lukaszsmolinski)
+* Remove unused variable (#531) by [@lulyon](https://github.com/lulyon)
+* Change cosine similarity to distance in README by [@yurymalkov](https://github.com/yurymalkov)
+
 **version 0.8.0** 
 
 * Multi-vector document search and epsilon search (for now, only in C++)
@@ -25,8 +33,8 @@ Full list of changes: https://github.com/nmslib/hnswlib/pull/523
 ### Highlights:
 1) Lightweight, header-only, no dependencies other than C++ 11
 2) Interfaces for C++, Python, external support for Java and R (https://github.com/jlmelville/rcpphnsw).
-3) Has full support for incremental index construction and updating the elements. Has support for element deletions 
-(by marking them in index). Index is picklable.
+3) Has full support for incremental index construction and updating the elements (thanks to the contribution by Apoorv Sharma). Has support for element deletions 
+(by marking them in index, later can be replaced with other elements). Python index is picklable.
 4) Can work with custom user defined distances (C++).
 5) Significantly less memory footprint and faster build time compared to current nmslib's implementation.
 
@@ -41,7 +49,7 @@ Description of the algorithm parameters can be found in [ALGO_PARAMS.md](ALGO_PA
 | -------------    |:---------------:| -----------------------:|
 |Squared L2        |'l2'             | d = sum((Ai-Bi)^2)      |
 |Inner product     |'ip'             | d = 1.0 - sum(Ai\*Bi)   |
-|Cosine similarity |'cosine'         | d = 1.0 - sum(Ai\*Bi) / sqrt(sum(Ai\*Ai) * sum(Bi\*Bi))|
+|Cosine distance   |'cosine'         | d = 1.0 - sum(Ai\*Bi) / sqrt(sum(Ai\*Ai) * sum(Bi\*Bi))|
 
 Note that inner product is not an actual metric. An element can be closer to some other element than to itself. That allows some speedup if you remove all elements that are not the closest to themselves from the index.
 

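For reference, the three distance formulas in the README table above can be written out directly. The sketch below is plain Python and independent of hnswlib; the function names are illustrative and not part of the library's API. It also reproduces the point of the README note: under the inner-product distance, a vector can be closer to another vector than to itself.

```python
import math

def l2_squared(a, b):
    # d = sum((Ai - Bi)^2)
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ip_distance(a, b):
    # d = 1.0 - sum(Ai * Bi)
    return 1.0 - sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # d = 1.0 - sum(Ai * Bi) / sqrt(sum(Ai * Ai) * sum(Bi * Bi))
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

a = [0.5, 0.0]
b = [2.0, 0.0]

# Inner product is not a metric: a is farther from itself than from b.
self_dist = ip_distance(a, a)    # 1.0 - 0.25
other_dist = ip_distance(a, b)   # 1.0 - 1.0
print(self_dist, other_dist)
```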

=====================================
hnswlib/bruteforce.h
=====================================
@@ -107,27 +107,17 @@ class BruteforceSearch : public AlgorithmInterface<dist_t> {
     searchKnn(const void *query_data, size_t k, BaseFilterFunctor* isIdAllowed = nullptr) const {
         assert(k <= cur_element_count);
         std::priority_queue<std::pair<dist_t, labeltype >> topResults;
-        if (cur_element_count == 0) return topResults;
-        for (int i = 0; i < k; i++) {
+        dist_t lastdist = std::numeric_limits<dist_t>::max();
+        for (int i = 0; i < cur_element_count; i++) {
             dist_t dist = fstdistfunc_(query_data, data_ + size_per_element_ * i, dist_func_param_);
-            labeltype label = *((labeltype*) (data_ + size_per_element_ * i + data_size_));
-            if ((!isIdAllowed) || (*isIdAllowed)(label)) {
-                topResults.emplace(dist, label);
-            }
-        }
-        dist_t lastdist = topResults.empty() ? std::numeric_limits<dist_t>::max() : topResults.top().first;
-        for (int i = k; i < cur_element_count; i++) {
-            dist_t dist = fstdistfunc_(query_data, data_ + size_per_element_ * i, dist_func_param_);
-            if (dist <= lastdist) {
+            if (dist <= lastdist || topResults.size() < k) {
                 labeltype label = *((labeltype *) (data_ + size_per_element_ * i + data_size_));
                 if ((!isIdAllowed) || (*isIdAllowed)(label)) {
                     topResults.emplace(dist, label);
-                }
-                if (topResults.size() > k)
-                    topResults.pop();
-
-                if (!topResults.empty()) {
-                    lastdist = topResults.top().first;
+                    if (topResults.size() > k)
+                        topResults.pop();
+                    if (!topResults.empty())
+                        lastdist = topResults.top().first;
                 }
             }
         }

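As read from the hunk above, the old code seeded the queue from the first k elements only, so when the filter rejected some of them the queue stayed short, and later candidates farther than the current worst were skipped anyway. The new single pass also admits a candidate whenever the queue holds fewer than k results. The following is a rough Python sketch of that logic, not the library's code: `brute_force_knn`, `items`, and `is_allowed` are hypothetical names, and `heapq` with negated distances stands in for the C++ max-heap.

```python
import heapq

def brute_force_knn(query, items, k, dist, is_allowed=None):
    """Single-pass filtered top-k scan over (label, vector) pairs."""
    heap = []                 # max-heap emulated by storing negated distances
    last = float("inf")       # distance of the current worst kept result
    for label, vec in items:
        d = dist(query, vec)
        # Admit the candidate if it beats the worst result OR the heap is not full.
        if d <= last or len(heap) < k:
            if is_allowed is None or is_allowed(label):
                heapq.heappush(heap, (-d, label))
                if len(heap) > k:
                    heapq.heappop(heap)   # drop the farthest result
                last = -heap[0][0]
    # Return (distance, label) pairs, nearest first.
    return sorted((-nd, label) for nd, label in heap)

def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Label 1 is the nearest point but is filtered out; label 2 is far away
# and must still be returned because only two labels are allowed.
items = [(0, [1.0]), (1, [0.0]), (2, [5.0])]
result = brute_force_knn([0.0], items, 2, l2_squared, lambda label: label != 1)
print(result)
```

With the old two-loop structure, this input would have kept only label 0, because label 2 is farther than the sole seeded result.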

=====================================
hnswlib/hnswalg.h
=====================================
@@ -684,7 +684,6 @@ class HierarchicalNSW : public AlgorithmInterface<dist_t> {
 
     void saveIndex(const std::string &location) {
         std::ofstream output(location, std::ios::binary);
-        std::streampos position;
 
         writeBinaryPOD(output, offsetLevel0_);
         writeBinaryPOD(output, max_elements_);


=====================================
python_bindings/bindings.cpp
=====================================
@@ -871,16 +871,39 @@ class BFIndex {
             CustomFilterFunctor idFilter(filter);
             CustomFilterFunctor* p_idFilter = filter ? &idFilter : nullptr;
 
-            ParallelFor(0, rows, num_threads, [&](size_t row, size_t threadId) {
-                std::priority_queue<std::pair<dist_t, hnswlib::labeltype >> result = alg->searchKnn(
-                    (void*)items.data(row), k, p_idFilter);
-                for (int i = k - 1; i >= 0; i--) {
-                    auto& result_tuple = result.top();
-                    data_numpy_d[row * k + i] = result_tuple.first;
-                    data_numpy_l[row * k + i] = result_tuple.second;
-                    result.pop();
-                }
-            });
+            if (!normalize) {
+                ParallelFor(0, rows, num_threads, [&](size_t row, size_t threadId) {
+                    std::priority_queue<std::pair<dist_t, hnswlib::labeltype >> result = alg->searchKnn(
+                        (void*)items.data(row), k, p_idFilter);
+                    if (result.size() != k)
+                        throw std::runtime_error(
+                            "Cannot return the results in a contiguous 2D array. There are not enough elements.");
+                    for (int i = k - 1; i >= 0; i--) {
+                        auto& result_tuple = result.top();
+                        data_numpy_d[row * k + i] = result_tuple.first;
+                        data_numpy_l[row * k + i] = result_tuple.second;
+                        result.pop();
+                    }
+                });
+            } else {
+                std::vector<float> norm_array(num_threads * features);
+                ParallelFor(0, rows, num_threads, [&](size_t row, size_t threadId) {
+                    size_t start_idx = threadId * dim;
+                    normalize_vector((float*)items.data(row), norm_array.data() + start_idx);
+
+                    std::priority_queue<std::pair<dist_t, hnswlib::labeltype >> result = alg->searchKnn(
+                        (void*)(norm_array.data() + start_idx), k, p_idFilter);
+                    if (result.size() != k)
+                        throw std::runtime_error(
+                            "Cannot return the results in a contiguous 2D array. There are not enough elements.");
+                    for (int i = k - 1; i >= 0; i--) {
+                        auto& result_tuple = result.top();
+                        data_numpy_d[row * k + i] = result_tuple.first;
+                        data_numpy_l[row * k + i] = result_tuple.second;
+                        result.pop();
+                    }
+                });
+            }
         }
 
         py::capsule free_when_done_l(data_numpy_l, [](void *f) {

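The hunk above makes two changes to each row of a `BFIndex` query: the query is normalized first when `normalize` is set, and a result with fewer than k entries raises an error instead of being read from an empty queue. A minimal Python sketch of that per-row flow follows. `normalize`, `knn_query_row`, and `fake_search` are hypothetical names for illustration, and the zero-norm handling is an assumption that is not taken from the source.

```python
import math

def normalize(vec):
    # Scale the vector to unit length (assumption: a zero vector is returned unchanged).
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]

def knn_query_row(query, search_fn, k, normalize_query):
    # Normalize the query only when the index was built with normalization.
    q = normalize(query) if normalize_query else list(query)
    results = search_fn(q, k)
    # A fixed-width (rows x k) output array cannot hold a short row.
    if len(results) != k:
        raise RuntimeError(
            "Cannot return the results in a contiguous 2D array. "
            "There are not enough elements.")
    return results

def fake_search(q, k):
    # Stand-in for the real search: an index holding only two elements.
    stored = [(0.1, 7), (0.4, 3)]
    return stored[:k]

row = knn_query_row([3.0, 4.0], fake_search, 2, normalize_query=True)

try:
    knn_query_row([3.0, 4.0], fake_search, 3, normalize_query=True)
    raised = False
except RuntimeError:
    raised = True
```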

=====================================
setup.py
=====================================
@@ -8,7 +8,7 @@ import setuptools
 from setuptools import Extension, setup
 from setuptools.command.build_ext import build_ext
 
-__version__ = '0.8.0'
+__version__ = '0.9.0'
 
 
 include_dirs = [



View it on GitLab: https://salsa.debian.org/med-team/hnswlib/-/commit/c51c552fad4de5b1d26bfa0ea6438479fb5b3ec3
