[med-svn] [Git][med-team/umap-learn][upstream] New upstream version 0.5.4+dfsg

Andreas Tille (@tille) gitlab at salsa.debian.org
Thu Nov 9 05:56:31 GMT 2023



Andreas Tille pushed to branch upstream at Debian Med / umap-learn


Commits:
f8f44a7d by Andreas Tille at 2023-11-08T21:31:39+01:00
New upstream version 0.5.4+dfsg
- - - - -


25 changed files:

- .gitignore
- + .idea/umap-nan.iml
- CONTRIBUTING.md
- README.rst
- azure-pipelines.yml
- docs_requirements.txt
- − requirements.txt
- setup.py
- umap/aligned_umap.py
- umap/distances.py
- umap/layouts.py
- umap/parametric_umap.py
- umap/plot.py
- umap/spectral.py
- umap/tests/test_aligned_umap.py
- umap/tests/test_composite_models.py
- + umap/tests/test_data_input.py
- + umap/tests/test_spectral.py
- + umap/tests/test_umap_get_feature_names_out.py
- umap/tests/test_umap_metrics.py
- umap/tests/test_umap_on_iris.py
- umap/tests/test_umap_ops.py
- umap/tests/test_umap_trustworthiness.py
- umap/umap_.py
- umap/utils.py


Changes:

=====================================
.gitignore
=====================================
@@ -21,4 +21,8 @@ venv
 *__pycache__
 
 # metadata from pip-installing repo
-umap_learn.egg-info
\ No newline at end of file
+umap_learn.egg-info
+
+# docs
+doc/auto_examples
+doc/_build
\ No newline at end of file


=====================================
.idea/umap-nan.iml
=====================================
@@ -0,0 +1,15 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<module type="PYTHON_MODULE" version="4">
+  <component name="NewModuleRootManager">
+    <content url="file://$MODULE_DIR$" />
+    <orderEntry type="jdk" jdkName="Python 3.10 (umap-nan)" jdkType="Python SDK" />
+    <orderEntry type="sourceFolder" forTests="false" />
+  </component>
+  <component name="PyDocumentationSettings">
+    <option name="format" value="NUMPY" />
+    <option name="myDocStringFormat" value="NumPy" />
+  </component>
+  <component name="TestRunnerService">
+    <option name="PROJECT_TEST_RUNNER" value="py.test" />
+  </component>
+</module>
\ No newline at end of file


=====================================
CONTRIBUTING.md
=====================================
@@ -24,6 +24,23 @@ especially appreciated. To contribute to the documentation please
 [fork the project](https://github.com/lmcinnes/umap/issues#fork-destination-box)
 into your own repository, make changes there, and then submit a pull request.
 
+### Building the Documentation Locally
+
+To build the docs locally, install the documentation tools requirements:
+
+```bash
+pip install -r docs_requirements.txt
+```
+
+Then run:
+
+```bash
+sphinx-build -b html doc doc/_build
+```
+
+This will build the documentation in HTML format. You will be able to find the output
+in the `doc/_build` folder.
+
 ## Code
 
 Code contributions are always welcome, from simple bug fixes, to new features. To


=====================================
README.rst
=====================================
@@ -72,10 +72,10 @@ Documentation is `available via Read the Docs <https://umap-learn.readthedocs.io
 
 **New: this package now also provides support for densMAP.** The densMAP algorithm augments UMAP
 to preserve local density information in addition to the topological structure of the data.
-Details of this method are described in the following `paper <https://doi.org/10.1101/2020.05.12.077776>`_:
+Details of this method are described in the following `paper <https://doi.org/10.1038/s41587-020-00801-7>`_:
 
-Narayan, A, Berger, B, Cho, H, *Density-Preserving Data Visualization Unveils
-Dynamic Patterns of Single-Cell Transcriptomic Variability*, bioRxiv, 2020
+Narayan, A, Berger, B, Cho, H, *Assessing Single-Cell Transcriptomic Variability
+through Density-Preserving Data Visualization*, Nature Biotechnology, 2021
 
 ----------
 Installing
@@ -94,21 +94,17 @@ Requirements:
 * scikit-learn
 * numba
 * tqdm
+* `pynndescent <https://github.com/lmcinnes/pynndescent>`_
 
 Recommended packages:
 
-* `pynndescent <https://github.com/lmcinnes/pynndescent>`_
 * For plotting
    * matplotlib
    * datashader
    * holoviews
-* for Parametric UMAP  
+* for Parametric UMAP
    * tensorflow > 2.0.0
 
-
-Installing pynndescent can significantly increase performance, and in later versions
-it will become a hard dependency.
-
 **Install Options**
 
 Conda install, via the excellent work of the conda-forge team:
@@ -135,8 +131,8 @@ If you wish to use the plotting functionality you can use
 to install all the plotting dependencies.
 
 If you wish to use Parametric UMAP, you need to install Tensorflow, which can be
-installed either using the instructions at https://www.tensorflow.org/install 
-(reccomended) or using 
+installed either using the instructions at https://www.tensorflow.org/install
+(reccomended) or using
 
 .. code:: bash
 
@@ -163,23 +159,17 @@ For a manual install get this package:
     rm master.zip
     cd umap-master
 
-Install the requirements
-
-.. code:: bash
-
-    sudo pip install -r requirements.txt
-
-or
+Optionally, install the requirements through Conda:
 
 .. code:: bash
 
     conda install scikit-learn numba
 
-Install the package
+Then install the package
 
 .. code:: bash
 
-    python setup.py install
+    python -m pip install -e .
 
 ---------------
 How to use UMAP
@@ -297,7 +287,7 @@ For a problem such as the 784-dimensional MNIST digits dataset with
 compared with around 45 minutes for scikit-learn's t-SNE implementation).
 Despite this runtime efficiency, UMAP still produces high quality embeddings.
 
-The obligatory MNIST digits dataset, embedded in 42 
+The obligatory MNIST digits dataset, embedded in 42
 seconds (with pynndescent installed and after numba jit warmup)
 using a 3.1 GHz Intel Core i7 processor (n_neighbors=10, min_dist=0.001):
 
@@ -466,12 +456,12 @@ Additionally, if you use the densMAP algorithm in your work please cite the foll
 
     @article {NBC2020,
         author = {Narayan, Ashwin and Berger, Bonnie and Cho, Hyunghoon},
-        title = {Density-Preserving Data Visualization Unveils Dynamic Patterns of Single-Cell Transcriptomic Variability},
-        journal = {bioRxiv},
-        year = {2020},
-        doi = {10.1101/2020.05.12.077776},
-        publisher = {Cold Spring Harbor Laboratory},
-        URL = {https://www.biorxiv.org/content/early/2020/05/14/2020.05.12.077776},
+        title = {Assessing Single-Cell Transcriptomic Variability through Density-Preserving Data Visualization},
+        journal = {Nature Biotechnology},
+        year = {2021},
+        doi = {10.1038/s41587-020-00801-7},
+        publisher = {Springer Nature},
+        URL = {https://doi.org/10.1038/s41587-020-00801-7},
         eprint = {https://www.biorxiv.org/content/early/2020/05/14/2020.05.12.077776.full.pdf},
     }
 
@@ -479,7 +469,7 @@ If you use the Parametric UMAP algorithm in your work please cite the following
 
 .. code:: bibtex
 
-    @article {NBC2020,
+    @article {SMG2020,
         author = {Sainburg, Tim and McInnes, Leland and Gentner, Timothy Q.},
         title = {Parametric UMAP: learning embeddings with deep neural networks for representation and semi-supervised learning},
         journal = {ArXiv e-prints},


=====================================
azure-pipelines.yml
=====================================
@@ -1,182 +1,150 @@
-# Python package
-# Create and test a Python package on multiple Python versions.
-# Add steps that analyze code, save the dist with the build record, publish to a PyPI-compatible index, and more:
-# https://docs.microsoft.com/azure/devops/pipelines/languages/python
-
+# Trigger a build when there is a push to the main branch or a tag starts with release-
 trigger:
-- master
-
-jobs:
-  - job: Linux
-    pool:
-      vmImage: 'ubuntu-latest'
-    strategy:
-      matrix:
-        Python36:
-          python.version: '3.6'
-        Python37:
-          python.version: '3.7'
-        Python38:
-          python.version: '3.8'
-        Python39:
-          python.version: '3.9'
-
-    steps:
-      - task: UsePythonVersion@0
-        inputs:
-          versionSpec: '$(python.version)'
-        displayName: 'Use Python $(python.version)'
-
-      - script: |
-          python -m pip install --upgrade pip
-          pip install -r requirements.txt
-        displayName: 'Install dependencies'
-
-      - script: |
-          pip install -e .
-          pip install .[plot]
-          pip install .[parametric_umap]
-        displayName: 'Install package'
-
-      - script: |
-          pip install pytest pytest-benchmark
-          pytest --show-capture=no -v --disable-warnings --junitxml=pytest.xml
-        displayName: 'Run tests'
-
-      - task: PublishTestResults@2
-        inputs:
-          testResultsFiles: 'pytest.xml'
-          testRunTitle: '$(Agent.OS) - $(Build.BuildNumber)[$(Agent.JobName)] - Python $(python.version)'
-        condition: succeededOrFailed()
-
-  - job: MacOS
-    pool:
-      vmImage: 'macOS-latest'
-    strategy:
-      matrix:
-        Python37:
-          python.version: '3.7'
-        Python38:
-          python.version: '3.8'
-        Python39:
-          python.version: '3.9'
-
-    steps:
-      - task: UsePythonVersion@0
-        inputs:
-          versionSpec: '$(python.version)'
-        displayName: 'Use Python $(python.version)'
-
-      - script: |
-          python -m pip install --upgrade pip
-          pip install -r requirements.txt
-        displayName: 'Install dependencies'
-
-      - script: |
-          pip install -e .
-          pip install .[plot]
-          pip install .[parametric_umap]
-        displayName: 'Install package'
-
-      - script: |
-          pip install pytest pytest-benchmark
-          pytest --show-capture=no -v --disable-warnings --junitxml=pytest.xml
-        displayName: 'Run tests'
-
-      - task: PublishTestResults@2
-        inputs:
-          testResultsFiles: 'pytest.xml'
-          testRunTitle: '$(Agent.OS) - $(Build.BuildNumber)[$(Agent.JobName)] - Python $(python.version)'
-        condition: succeededOrFailed()
-
-  - job: Windows
-    pool:
-      vmImage: 'windows-latest'
-    strategy:
-      matrix:
-        Python36:
-          python.version: '3.6'
-        Python37:
-          python.version: '3.7'
-        Python38:
-          python.version: '3.8'
-        Python39:
-          python.version: '3.9'
-
-    steps:
-      - task: UsePythonVersion@0
-        inputs:
-          versionSpec: '$(python.version)'
-        displayName: 'Use Python $(python.version)'
-
-      - script: |
-          python -m pip install --upgrade pip
-          pip install -r requirements.txt
-        displayName: 'Install dependencies'
-
-      - script: |
-          pip install -e .
-          pip install .[plot]
-          pip install .[parametric_umap]
-        displayName: 'Install package'
-
-      - script: |
-          pip install pytest pytest-benchmark
-          pytest --show-capture=no -v --disable-warnings --junitxml=pytest.xml
-        displayName: 'Run tests'
-
-      - task: PublishTestResults@2
-        inputs:
-          testResultsFiles: 'pytest.xml'
-          testRunTitle: '$(Agent.OS) - $(Build.BuildNumber)[$(Agent.JobName)] - Python $(python.version)'
-        condition: succeededOrFailed()
-
-  - job: Coverage
-    pool:
-      vmImage: 'ubuntu-latest'
-    strategy:
-      matrix:
-        Python38:
-          python.version: '3.8'
-
-    steps:
-      - task: UsePythonVersion@0
-        inputs:
-          versionSpec: '$(python.version)'
-        displayName: 'Use Python $(python.version)'
-
-      - script: |
-          python -m pip install --upgrade pip
-          pip install -r requirements.txt
-        displayName: 'Install dependencies'
-
-      - script: |
-          pip install -e .
-          pip install .[plot]
-          pip install .[parametric_umap]
-          pip install pytest pytest-benchmark
-          pip install pytest-cov
-          pip install coveralls
-        displayName: 'Install package'
-
-      - script: |
-          export NUMBA_DISABLE_JIT=1
-          pytest umap/tests --show-capture=no -v --disable-warnings --junitxml=pytest.xml --cov=umap/ --cov-report=xml --cov-report=html
-        displayName: 'Run tests'
-
-      - script: |
-          export COVERALLS_REPO_TOKEN=$(COVERALLS_REPO_TOKEN)
-          coveralls
-        displayName: 'Publish to coveralls'
-
-      - task: PublishTestResults@2
-        inputs:
-          testResultsFiles: 'coverage.xml'
-          testRunTitle: '$(Agent.OS) - $(Build.BuildNumber)[$(Agent.JobName)] - Python $(python.version)'
-        condition: succeededOrFailed()
-
-      - task: PublishCodeCoverageResults@1
-        inputs:
-          codeCoverageTool: Cobertura
-          summaryFileLocation: '$(System.DefaultWorkingDirectory)/**/coverage.xml'
-          reportDirectory: '$(System.DefaultWorkingDirectory)/**/htmlcov'
+  branches:
+    include:
+    - master
+  tags:
+    include:
+    - '*'
+
+# Trigger a build when there is a pull request to the main branch
+# Ignore PRs that are just updating the docs
+pr:
+  branches:
+    include:
+    - master
+    exclude:
+    - doc/*
+    - README.rst
+
+variables:
+  triggeredByPullRequest: $[eq(variables['Build.Reason'], 'PullRequest')]
+
+stages:
+  - stage: RunAllTests
+    displayName: Run test suite
+    jobs:
+      - job: run_platform_tests
+        strategy:
+          matrix:
+            mac_py37:
+              imageName: 'macOS-latest'
+              python.version: '3.7'
+            linux_py37:
+              imageName: 'ubuntu-latest'
+              python.version: '3.7'
+            windows_py37:
+              imageName: 'windows-latest'
+              python.version: '3.7'
+            mac_py38:
+              imageName: 'macOS-latest'
+              python.version: '3.8'
+            linux_py38:
+              imageName: 'ubuntu-latest'
+              python.version: '3.8'
+            windows_py38:
+              imageName: 'windows-latest'
+              python.version: '3.8'
+            mac_py39:
+              imageName: 'macOS-latest'
+              python.version: '3.9'
+            linux_py39:
+              imageName: 'ubuntu-latest'
+              python.version: '3.9'
+            windows_py39:
+              imageName: 'windows-latest'
+              python.version: '3.9'
+            mac_py310:
+              imageName: 'macOS-latest'
+              python.version: '3.10'
+            linux_py310:
+              imageName: 'ubuntu-latest'
+              python.version: '3.10'
+            windows_py310:
+              imageName: 'windows-latest'
+              python.version: '3.10'
+
+        pool:
+          vmImage: $(imageName)
+
+        steps:
+        - task: UsePythonVersion@0
+          inputs:
+            versionSpec: '$(python.version)'
+          displayName: 'Use Python $(python.version)'
+
+        - script: |
+            python -m pip install --upgrade pip
+            pip install -e .
+            pip install .[plot]
+            pip install .[parametric_umap]
+            pip install pytest  pytest-azurepipelines pytest-cov pytest-benchmark coveralls
+          displayName: 'Install package'
+
+        - script: |
+            export NUMBA_DISABLE_JIT=1
+            pytest umap/tests --show-capture=no -v --disable-warnings --junitxml=junit/test-results.xml --cov=umap/ --cov-report=xml --cov-report=html
+          displayName: 'Run tests'
+
+        - bash: |
+            coveralls
+          displayName: 'Publish to coveralls'
+          condition: and(succeeded(), eq(variables.triggeredByPullRequest, false)) # Don't run this for PRs because they can't access pipeline secrets
+          env:
+            COVERALLS_REPO_TOKEN: $(COVERALLS_TOKEN)
+
+        - task: PublishTestResults@2
+          inputs:
+            testResultsFiles: '$(System.DefaultWorkingDirectory)/**/coverage.xml'
+            testRunTitle: '$(Agent.OS) - $(Build.BuildNumber)[$(Agent.JobName)] - Python $(python.version)'
+          condition: succeededOrFailed()
+
+  - stage: BuildPublishArtifact
+    dependsOn: RunAllTests
+    condition: and(succeeded(), startsWith(variables['Build.SourceBranch'], 'refs/tags/'), eq(variables.triggeredByPullRequest, false))
+    jobs:
+      - job: BuildArtifacts
+        displayName: Build source dists and wheels    
+        pool:
+          vmImage: 'ubuntu-latest'
+        steps:
+        - task: UsePythonVersion@0
+          inputs:
+            versionSpec: '3.10'
+          displayName: 'Use Python 3.10'
+
+        - script: |
+            python -m pip install --upgrade pip
+            pip install wheel
+            pip install -e .
+          displayName: 'Install package locally'
+        
+        - bash: |
+            python setup.py sdist bdist_wheel
+            ls -l dist/
+          displayName: 'Build package'
+
+        - bash: |
+            export PACKAGE_VERSION="$(python setup.py --version)"
+            echo "Package Version: ${PACKAGE_VERSION}"
+            echo "##vso[task.setvariable variable=packageVersionFormatted;]${PACKAGE_VERSION}"
+          displayName: 'Get package version'
+
+        - script: |
+            echo "Version in git tag $(Build.SourceBranchName) does not match version derived from setup.py $(packageVersionFormatted)"
+            exit 1
+          displayName: Raise error if version doesnt match tag
+          condition: and(succeeded(), ne(variables['Build.SourceBranchName'], variables['packageVersionFormatted']))
+
+        - task: DownloadSecureFile@1
+          name: PYPIRC_CONFIG
+          displayName: 'Download pypirc'
+          inputs:
+            secureFile: 'pypirc'  
+
+        - script: |
+            pip install twine
+            twine upload --repository testpypi --config-file $(PYPIRC_CONFIG.secureFilePath) dist/*
+          displayName: 'Upload to PyPI'
+          condition: and(succeeded(), eq(variables['Build.SourceBranchName'], variables['packageVersionFormatted']))
+


=====================================
docs_requirements.txt
=====================================
@@ -1,5 +1,6 @@
-sphinx
+sphinx>=1.8
 sphinx_gallery
 matplotlib
 pillow
-sphinx_rtd_theme
\ No newline at end of file
+sphinx_rtd_theme
+numpydoc
\ No newline at end of file


=====================================
requirements.txt deleted
=====================================
@@ -1,7 +0,0 @@
-numpy>=1.17
-scipy>=1.3.1
-scikit-learn>=0.22
-numba>=0.51.2
-pynndescent>=0.5
-tbb>=2019.0
-tqdm


=====================================
setup.py
=====================================
@@ -1,3 +1,4 @@
+import platform
 from setuptools import setup
 
 
@@ -15,7 +16,7 @@ def readme():
 
 configuration = {
     "name": "umap-learn",
-    "version": "0.5.3",
+    "version": "0.5.4",
     "description": "Uniform Manifold Approximation and Projection",
     "long_description": readme(),
     "long_description_content_type": "text/x-rst",
@@ -45,12 +46,12 @@ configuration = {
     "packages": ["umap"],
     "install_requires": [
         "numpy >= 1.17",
+        "scipy >= 1.3.1",
         "scikit-learn >= 0.22",
-        "scipy >= 1.0",
-        "numba >= 0.49",
+        "numba >= 0.51.2",
         "pynndescent >= 0.5",
         "tqdm",
-    ],
+    ] + (["tbb >= 2019.0"] if platform.machine().lower().startswith("x86") else []),
     "extras_require": {
         "plot": [
             "pandas",

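For context on the dependency change above: the new `install_requires` pins `scipy >= 1.3.1` and `numba >= 0.51.2`, and only adds `tbb` on x86 hosts. A small sketch of that platform guard (illustrative only; `platform.machine()` typically returns strings such as "x86_64" or "arm64"):

```python
import platform

base = ["numpy >= 1.17", "scipy >= 1.3.1", "scikit-learn >= 0.22",
        "numba >= 0.51.2", "pynndescent >= 0.5", "tqdm"]

# Mirror the guard in setup.py: append tbb only when the machine string
# starts with "x86" (e.g. "x86_64"); ARM hosts skip the dependency.
machine = platform.machine().lower()
install_requires = base + (["tbb >= 2019.0"] if machine.startswith("x86") else [])

print(machine, "tbb >= 2019.0" in install_requires)
```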

=====================================
umap/aligned_umap.py
=====================================
@@ -1,7 +1,7 @@
 import numpy as np
 import numba
 from sklearn.base import BaseEstimator
-from sklearn.utils import check_random_state, check_array
+from sklearn.utils import check_array
 
 from umap.sparse import arr_intersect as intersect1d
 from umap.sparse import arr_union as union1d
@@ -16,7 +16,7 @@ INT32_MAX = np.iinfo(np.int32).max - 1
 @numba.njit(parallel=True)
 def in1d(arr, test_set):
     test_set = set(test_set)
-    result = np.empty(arr.shape[0], dtype=np.uint8)
+    result = np.empty(arr.shape[0], dtype=np.bool_)
     for i in numba.prange(arr.shape[0]):
         if arr[i] in test_set:
             result[i] = True
@@ -63,7 +63,7 @@ def expand_relations(relation_dicts, window_size=3):
                 mapping = np.arange(max_n_samples)
                 for k in range(j + 1):
                     mapping = np.array(
-                        [relation_dicts[i + j].get(n, -1) for n in mapping]
+                        [relation_dicts[i + k].get(n, -1) for n in mapping]
                     )
                 result[i, result_index] = mapping
 
@@ -73,9 +73,9 @@ def expand_relations(relation_dicts, window_size=3):
                 result[i, result_index] = np.full(max_n_samples, -1, dtype=np.int32)
             else:
                 mapping = np.arange(max_n_samples)
-                for k in range(np.abs(j) + 1):
+                for k in range(0, j - 1, -1):
                     mapping = np.array(
-                        [reverse_relation_dicts[i + j - 1].get(n, -1) for n in mapping]
+                        [reverse_relation_dicts[i + k - 1].get(n, -1) for n in mapping]
                     )
                 result[i, result_index] = mapping
 
@@ -103,7 +103,8 @@ def build_neighborhood_similarities(graphs_indptr, graphs_indices, relations):
                 raw_base_graph_indices = base_graph_indices[
                     base_graph_indptr[k] : base_graph_indptr[k + 1]
                 ].copy()
-                base_indices = relations[i, j][raw_base_graph_indices]
+                base_indices = relations[i, j][raw_base_graph_indices[
+                    raw_base_graph_indices < relations.shape[2]]]
                 base_indices = base_indices[base_indices >= 0]
                 comparison_indices = comparison_graph_indices[
                     comparison_graph_indptr[comparison_index] : comparison_graph_indptr[
@@ -165,8 +166,14 @@ PARAM_NAMES = (
 def set_aligned_params(new_params, existing_params, n_models, param_names=PARAM_NAMES):
     for param in param_names:
         if param in new_params:
-            if type(existing_params[param]) in (list, tuple, np.ndarray):
-                existing_params[param] = existing_params[param] + (new_params[param],)
+            if isinstance(existing_params[param], list):
+                existing_params[param].append(new_params[param])
+            elif isinstance(existing_params[param], tuple):
+                existing_params[param] = existing_params[param] + \
+                    (new_params[param],)
+            elif isinstance(existing_params[param], np.ndarray):
+                existing_params[param] = np.append(existing_params[param],
+                                                   new_params[param])
             else:
                 if new_params[param] != existing_params[param]:
                     existing_params[param] = (existing_params[param],) * n_models + (
@@ -307,6 +314,11 @@ class AlignedUMAP(BaseEstimator):
 
         self.n_models_ = len(X)
 
+        if self.n_epochs is None:
+            self.n_epochs = 200
+
+        n_epochs = self.n_epochs
+
         self.mappers_ = [
             UMAP(
                 n_neighbors=get_nth_item_or_val(self.n_neighbors, n),
@@ -314,6 +326,7 @@ class AlignedUMAP(BaseEstimator):
                 n_epochs=get_nth_item_or_val(self.n_epochs, n),
                 repulsion_strength=get_nth_item_or_val(self.repulsion_strength, n),
                 learning_rate=get_nth_item_or_val(self.learning_rate, n),
+                init=self.init,
                 spread=get_nth_item_or_val(self.spread, n),
                 negative_sample_rate=get_nth_item_or_val(self.negative_sample_rate, n),
                 local_connectivity=get_nth_item_or_val(self.local_connectivity, n),
@@ -339,11 +352,6 @@ class AlignedUMAP(BaseEstimator):
             for n in range(self.n_models_)
         ]
 
-        if self.n_epochs is None:
-            n_epochs = 200
-        else:
-            n_epochs = self.n_epochs
-
         window_size = fit_params.get("window_size", self.alignment_window_size)
         relations = expand_relations(self.dict_relations_, window_size)
 
@@ -441,10 +449,20 @@ class AlignedUMAP(BaseEstimator):
             )
 
         new_dict_relations = fit_params["relations"]
+        assert isinstance(new_dict_relations, dict)
+
         X = check_array(X)
 
         self.__dict__ = set_aligned_params(fit_params, self.__dict__, self.n_models_)
-        self.n_models_ += 1
+
+        # We need n_components to be constant or this won't work
+        if type(self.n_components) in (list, tuple, np.ndarray):
+            raise ValueError("n_components must be a single integer, and cannot vary")
+
+        if self.n_epochs is None:
+            self.n_epochs = 200
+
+        n_epochs = self.n_epochs
 
         new_mapper = UMAP(
             n_neighbors=get_nth_item_or_val(self.n_neighbors, self.n_models_),
@@ -454,6 +472,7 @@ class AlignedUMAP(BaseEstimator):
                 self.repulsion_strength, self.n_models_
             ),
             learning_rate=get_nth_item_or_val(self.learning_rate, self.n_models_),
+	    init=self.init,
             spread=get_nth_item_or_val(self.spread, self.n_models_),
             negative_sample_rate=get_nth_item_or_val(
                 self.negative_sample_rate, self.n_models_
@@ -464,19 +483,30 @@ class AlignedUMAP(BaseEstimator):
             set_op_mix_ratio=get_nth_item_or_val(self.set_op_mix_ratio, self.n_models_),
             unique=get_nth_item_or_val(self.unique, self.n_models_),
             n_components=self.n_components,
+            metric=self.metric,
+            metric_kwds=self.metric_kwds,
+            low_memory=self.low_memory,
             random_state=self.random_state,
+            angular_rp_forest=self.angular_rp_forest,
+            transform_queue_size=self.transform_queue_size,
+            target_n_neighbors=self.target_n_neighbors,
+            target_metric=self.target_metric,
+            target_metric_kwds=self.target_metric_kwds,
+            target_weight=self.target_weight,
             transform_seed=self.transform_seed,
+            force_approximation_algorithm=self.force_approximation_algorithm,
+            verbose=self.verbose,
+            a=self.a,
+            b=self.b,
         ).fit(X, y)
 
+        self.n_models_ += 1
         self.mappers_ += [new_mapper]
 
-        # TODO: We can likely make this more efficient and not recompute each time
-        self.dict_relations_ += [invert_dict(new_dict_relations)]
+        self.dict_relations_ += [new_dict_relations]
 
-        if self.n_epochs is None:
-            n_epochs = 200
-        else:
-            n_epochs = self.n_epochs
+        window_size = fit_params.get("window_size", self.alignment_window_size)
+        new_relations = expand_relations(self.dict_relations_, window_size)
 
         indptr_list = numba.typed.List.empty_list(numba.types.int32[::1])
         indices_list = numba.typed.List.empty_list(numba.types.int32[::1])
@@ -498,15 +528,17 @@ class AlignedUMAP(BaseEstimator):
                     np.full(mapper.embedding_.shape[0], n_epochs + 1, dtype=np.float64)
                 )
 
-        new_relations = expand_relations(self.dict_relations_)
         new_regularisation_weights = build_neighborhood_similarities(
             indptr_list,
             indices_list,
             new_relations,
         )
 
+        # TODO: We can likely make this more efficient and not recompute each time
+        inv_dict_relations = invert_dict(new_dict_relations)
+
         new_embedding = init_from_existing(
-            self.embeddings_[-1], new_mapper.graph_, new_dict_relations
+            self.embeddings_[-1], new_mapper.graph_, inv_dict_relations
         )
 
         self.embeddings_.append(new_embedding)

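The `update()` path above now asserts that `relations` is a single dict (mapping row indices of the previous slice to row indices of the new data), forwards more constructor parameters to the new mapper, and recomputes the expanded relations before initialising the new embedding. A minimal usage sketch on toy data (hypothetical sizes and parameters; assumes the public `AlignedUMAP` fit/update API):

```python
import numpy as np
from umap import AlignedUMAP

rng = np.random.default_rng(42)
slices = [rng.normal(size=(100, 8)) for _ in range(2)]
new_slice = rng.normal(size=(100, 8))

# relations[i] maps row indices of slice i to row indices of slice i + 1.
relations = [{j: j for j in range(100)}]

mapper = AlignedUMAP(n_neighbors=10).fit(slices, relations=relations)

# update() takes the new data plus a single dict relating the previous
# slice to the new one (the commit asserts it is a dict, not a list).
mapper.update(new_slice, relations={j: j for j in range(100)})
print(len(mapper.embeddings_))  # one embedding per slice
```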

=====================================
umap/distances.py
=====================================
@@ -1060,7 +1060,7 @@ def get_discrete_params(data, metric):
         return {}
 
 
-@numba.jit()
+@numba.njit()
 def categorical_distance(x, y):
     if x == y:
         return 0.0
@@ -1068,7 +1068,7 @@ def categorical_distance(x, y):
         return 1.0
 
 
-@numba.jit()
+@numba.njit()
 def hierarchical_categorical_distance(x, y, cat_hierarchy=[{}]):
     n_levels = float(len(cat_hierarchy))
     for level, cats in enumerate(cat_hierarchy):
@@ -1083,7 +1083,7 @@ def ordinal_distance(x, y, support_size=1.0):
     return abs(x - y) / support_size
 
 
-@numba.jit()
+@numba.njit()
 def count_distance(x, y, poisson_lambda=1.0, normalisation=1.0):
     lo = int(min(x, y))
     hi = int(max(x, y))
@@ -1283,7 +1283,7 @@ def chunked_parallel_special_metric(X, Y=None, metric=hellinger, chunk_size=16):
     return result
 
 
-def pairwise_special_metric(X, Y=None, metric="hellinger", kwds=None):
+def pairwise_special_metric(X, Y=None, metric="hellinger", kwds=None, force_all_finite=True):
     if callable(metric):
         if kwds is not None:
             kwd_vals = tuple(kwds.values())
@@ -1294,7 +1294,7 @@ def pairwise_special_metric(X, Y=None, metric="hellinger", kwds=None):
         def _partial_metric(_X, _Y=None):
             return metric(_X, _Y, *kwd_vals)
 
-        return pairwise_distances(X, Y, metric=_partial_metric)
+        return pairwise_distances(X, Y, metric=_partial_metric, force_all_finite=force_all_finite)
     else:
         special_metric_func = named_distances[metric]
     return parallel_special_metric(X, Y, metric=special_metric_func)

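`pairwise_special_metric` now forwards `force_all_finite` to scikit-learn's `pairwise_distances` in the callable-metric branch, so a user-supplied metric can be evaluated on inputs containing NaN. A small sketch of what that keyword enables (the NaN-aware metric below is a hypothetical example, not part of the package):

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def nan_euclidean_like(x, y):
    # Hypothetical metric that ignores coordinates where either value is NaN.
    mask = ~(np.isnan(x) | np.isnan(y))
    return np.sqrt(np.sum((x[mask] - y[mask]) ** 2))

X = np.array([[0.0, 1.0, np.nan],
              [1.0, np.nan, 2.0]])

# force_all_finite=False is what the new keyword forwards; without it,
# sklearn's input validation rejects the NaNs before the metric is called.
D = pairwise_distances(X, metric=nan_euclidean_like, force_all_finite=False)
print(D)
```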

=====================================
umap/layouts.py
=====================================
@@ -35,6 +35,7 @@ def clip(val):
         "result": numba.types.float32,
         "diff": numba.types.float32,
         "dim": numba.types.intp,
+        "i": numba.types.intp,
     },
 )
 def rdist(x, y):
@@ -176,7 +177,7 @@ def _optimize_layout_euclidean_single_epoch(
                     if grad_coeff > 0.0:
                         grad_d = clip(grad_coeff * (current[d] - other[d]))
                     else:
-                        grad_d = 4.0
+                        grad_d = 0
                     current[d] += grad_d * alpha
 
             epoch_of_next_negative_sample[i] += (
@@ -256,8 +257,12 @@ def optimize_layout_euclidean(
         The indices of the heads of 1-simplices with non-zero membership.
     tail: array of shape (n_1_simplices)
         The indices of the tails of 1-simplices with non-zero membership.
-    n_epochs: int
-        The number of training epochs to use in optimization.
+    n_epochs: int, or list of int
+        The number of training epochs to use in optimization, or a list of
+        epochs at which to save the embedding. In case of a list, the optimization
+        will use the maximum number of epochs in the list, and will return a list
+        of embedding in the order of increasing epoch, regardless of the order in
+        the epoch list.
     n_vertices: int
         The number of vertices (0-simplices) in the dataset.
     epochs_per_sample: array of shape (n_1_simplices)
@@ -332,6 +337,12 @@ def optimize_layout_euclidean(
         dens_phi_sum = np.zeros(1, dtype=np.float32)
         dens_re_sum = np.zeros(1, dtype=np.float32)
 
+    epochs_list = None
+    embedding_list = []
+    if isinstance(n_epochs, list):
+        epochs_list = n_epochs
+        n_epochs = max(epochs_list)
+
     if "disable" not in tqdm_kwds:
         tqdm_kwds["disable"] = not verbose
 
@@ -398,7 +409,17 @@ def optimize_layout_euclidean(
 
         alpha = initial_alpha * (1.0 - (float(n) / float(n_epochs)))
 
-    return head_embedding
+        if verbose and n % int(n_epochs / 10) == 0:
+            print("\tcompleted ", n, " / ", n_epochs, "epochs")
+
+        if epochs_list is not None and n in epochs_list:
+            embedding_list.append(head_embedding.copy())
+
+    # Add the last embedding to the list as well
+    if epochs_list is not None:
+        embedding_list.append(head_embedding.copy())
+
+    return head_embedding if epochs_list is None else embedding_list
 
 
 def _optimize_layout_generic_single_epoch(
@@ -906,7 +927,7 @@ def _optimize_layout_aligned_euclidean_single_epoch(
                             if n_embeddings > neighbor_m >= 0 != offset:
                                 identified_index = relations[m, offset + window_size, k]
                                 if identified_index >= 0:
-                                    grad_d -= clip(
+                                    other_grad_d -= clip(
                                         (lambda_ * np.exp(-(np.abs(offset) - 1)))
                                         * regularisation_weights[
                                             m, offset + window_size, k
@@ -952,7 +973,7 @@ def _optimize_layout_aligned_euclidean_single_epoch(
                         if grad_coeff > 0.0:
                             grad_d = clip(grad_coeff * (current[d] - other[d]))
                         else:
-                            grad_d = 4.0
+                            grad_d = 0.0
 
                         for offset in range(-window_size, window_size):
                             neighbor_m = m + offset

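The change to `optimize_layout_euclidean` above lets `n_epochs` be a list of epochs at which to snapshot the embedding, returning a list of embeddings instead of a single array. A minimal sketch of just that bookkeeping, with the actual optimization step stubbed out (illustrative, not the library code):

```python
import numpy as np

def run_with_snapshots(head_embedding, n_epochs, do_one_epoch):
    """Mimic the list-of-epochs behaviour added above: optimize for
    max(n_epochs) epochs and keep copies at each requested epoch."""
    epochs_list = None
    embedding_list = []
    if isinstance(n_epochs, list):
        epochs_list = n_epochs
        n_epochs = max(epochs_list)

    for n in range(n_epochs):
        do_one_epoch(head_embedding, n)  # stand-in for one optimization epoch
        if epochs_list is not None and n in epochs_list:
            embedding_list.append(head_embedding.copy())

    # the final embedding is always appended as well
    if epochs_list is not None:
        embedding_list.append(head_embedding.copy())

    return head_embedding if epochs_list is None else embedding_list

def fake_epoch(embedding, n):
    embedding += 0.01  # stand-in for one optimization epoch

emb = np.zeros((5, 2))
snapshots = run_with_snapshots(emb, [10, 50, 100], fake_epoch)
print(len(snapshots))  # 3: snapshots at epochs 10 and 50, plus the final state
```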

=====================================
umap/parametric_umap.py
=====================================
@@ -211,7 +211,7 @@ class ParametricUMAP(UMAP):
                 )
             # prepare X for training the network
             self._X = X
-            # geneate the graph on precomputed distances
+            # generate the graph on precomputed distances
             return super().fit_transform(precomputed_distances, y)
         else:
             return super().fit_transform(X, y)
@@ -240,7 +240,7 @@ class ParametricUMAP(UMAP):
             return super().transform(X)
 
     def inverse_transform(self, X):
-        """ "Transform X in the existing embedded space back into the input
+        """ Transform X in the existing embedded space back into the input
         data space and return that transformed output.
         Parameters
         ----------
@@ -289,7 +289,7 @@ class ParametricUMAP(UMAP):
                 outputs["reconstruction"] = embedding_to_recon
 
         else:
-            # this is the sham input (its just a 0) to make keras think there is input data
+            # this is the sham input (it's just a 0) to make keras think there is input data
             batch_sample = tf.keras.layers.Input(
                 shape=(1), dtype=tf.int32, name="batch_sample"
             )
@@ -545,7 +545,7 @@ def get_graph_elements(graph_, n_epochs):
     weight np.array
         edge weight
     n_vertices int
-        number of verticies in graph
+        number of vertices in graph
     """
     ### should we remove redundancies () here??
     # graph_ = remove_redundant_edges(graph_)
@@ -633,9 +633,9 @@ def init_embedding_from_graph(
     return embedding
 
 
-def convert_distance_to_probability(distances, a=1.0, b=1.0):
+def convert_distance_to_log_probability(distances, a=1.0, b=1.0):
     """
-     convert distance representation into probability,
+     convert distance representation into log probability,
         as a function of a, b params
 
     Parameters
@@ -650,13 +650,13 @@ def convert_distance_to_probability(distances, a=1.0, b=1.0):
     Returns
     -------
     float
-        probability in embedding space
+        log probability in embedding space
     """
-    return 1.0 / (1.0 + a * distances ** (2 * b))
+    return -tf.math.log1p(a * distances ** (2 * b))
 
 
 def compute_cross_entropy(
-    probabilities_graph, probabilities_distance, EPS=1e-4, repulsion_strength=1.0
+    probabilities_graph, log_probabilities_distance, EPS=1e-4, repulsion_strength=1.0
 ):
     """
     Compute cross entropy between low and high probability
@@ -665,10 +665,10 @@ def compute_cross_entropy(
     ----------
     probabilities_graph : array
         high dimensional probabilities
-    probabilities_distance : array
-        low dimensional probabilities
+    log_probabilities_distance : array
+        low dimensional log probabilities
     EPS : float, optional
-        offset to to ensure log is taken of a positive number, by default 1e-4
+        offset to ensure log is taken of a positive number, by default 1e-4
     repulsion_strength : float, optional
         strength of repulsion between negative samples, by default 1.0
 
@@ -677,22 +677,25 @@ def compute_cross_entropy(
     attraction_term: tf.float32
         attraction term for cross entropy loss
     repellant_term: tf.float32
-        repellant term for cross entropy loss
+        repellent term for cross entropy loss
     cross_entropy: tf.float32
         cross entropy umap loss
 
     """
     # cross entropy
-    attraction_term = -probabilities_graph * tf.math.log(
-        tf.clip_by_value(probabilities_distance, EPS, 1.0)
+    attraction_term = -probabilities_graph * tf.math.log_sigmoid(
+        log_probabilities_distance
     )
+    # use numerically stable repellent term
+    # Shi et al. 2022 (https://arxiv.org/abs/2111.08851)
+    # log(1 - sigmoid(logits)) = log(sigmoid(logits)) - logits
     repellant_term = (
         -(1.0 - probabilities_graph)
-        * tf.math.log(tf.clip_by_value(1.0 - probabilities_distance, EPS, 1.0))
+        * (tf.math.log_sigmoid(log_probabilities_distance) - log_probabilities_distance)
         * repulsion_strength
     )
 
-    # balance the expected losses between atrraction and repel
+    # balance the expected losses between attraction and repel
     CE = attraction_term + repellant_term
     return attraction_term, repellant_term, CE
 
@@ -707,7 +710,7 @@ def umap_loss(
     repulsion_strength=1.0,
 ):
     """
-    Generate a keras-ccompatible loss function for UMAP loss
+    Generate a keras-compatible loss function for UMAP loss
 
     Parameters
     ----------
@@ -717,12 +720,12 @@ def umap_loss(
         number of negative samples per positive samples to train on
     _a : float
         distance parameter in embedding space
-    _b : float float
+    _b : float
         distance parameter in embedding space
     edge_weights : array
         weights of all edges from sparse UMAP graph
     parametric_embedding : bool
-        whether the embeddding is parametric or nonparametric
+        whether the embedding is parametric or nonparametric
     repulsion_strength : float, optional
         strength of repulsion vs attraction for cross-entropy, by default 1.0
 
@@ -759,8 +762,8 @@ def umap_loss(
             axis=0,
         )
 
-        # convert probabilities to distances
-        probabilities_distance = convert_distance_to_probability(
+        # convert distances to probabilities
+        log_probabilities_distance = convert_distance_to_log_probability(
             distance_embedding, _a, _b
         )
 
@@ -772,7 +775,7 @@ def umap_loss(
         # compute cross entropy
         (attraction_loss, repellant_loss, ce_loss) = compute_cross_entropy(
             probabilities_graph,
-            probabilities_distance,
+            log_probabilities_distance,
             repulsion_strength=repulsion_strength,
         )
 

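The rewritten loss works in log space: `convert_distance_to_log_probability` returns `-log1p(a * d**(2*b))`, and the repellent term uses the identity quoted in the comment above, log(1 - sigmoid(z)) = log_sigmoid(z) - z, to stay finite when sigmoid(z) saturates. A quick NumPy check of that identity (illustrative only; the package itself uses `tf.math.log_sigmoid`):

```python
import numpy as np

def log_sigmoid(z):
    # numerically stable log(sigmoid(z)) = -log(1 + exp(-z))
    return -np.logaddexp(0.0, -z)

z = np.array([-30.0, -1.0, 0.0, 1.0, 40.0])

# naive form: log(1 - sigmoid(z)); collapses to log(0) once sigmoid(z)
# rounds to 1 (emits a divide-by-zero warning at z = 40)
lhs = np.log(1.0 - 1.0 / (1.0 + np.exp(-z)))
# stable form used by the new repellent term: log_sigmoid(z) - z
rhs = log_sigmoid(z) - z

print(np.allclose(lhs[:-1], rhs[:-1]))  # True where the naive form is still finite
print(lhs[-1], rhs[-1])                 # -inf vs. approximately -40.0
```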

=====================================
umap/plot.py
=====================================
@@ -63,11 +63,11 @@ darkpurple_cmap = matplotlib.colors.LinearSegmentedColormap.from_list(
     "darkpurple", colorcet.linear_bmw_5_95_c89
 )
 
-plt.register_cmap("fire", fire_cmap)
-plt.register_cmap("darkblue", darkblue_cmap)
-plt.register_cmap("darkgreen", darkgreen_cmap)
-plt.register_cmap("darkred", darkred_cmap)
-plt.register_cmap("darkpurple", darkpurple_cmap)
+plt.colormaps.register(fire_cmap, name="fire")
+plt.colormaps.register(darkblue_cmap, name="darkblue")
+plt.colormaps.register(darkgreen_cmap, name="darkgreen")
+plt.colormaps.register(darkred_cmap, name="darkred")
+plt.colormaps.register(darkpurple_cmap, name="darkpurple")
 
 
 def _to_hex(arr):
@@ -200,13 +200,14 @@ def _nhood_search(umap_object, nhood_size):
     return indices, dists
 
 
-@numba.jit()
+@numba.jit(nopython=False)
 def _nhood_compare(indices_left, indices_right):
     """Compute Jaccard index of two neighborhoods"""
     result = np.empty(indices_left.shape[0])
 
     for i in range(indices_left.shape[0]):
-        intersection_size = np.intersect1d(indices_left[i], indices_right[i]).shape[0]
+        intersection_size = np.intersect1d(indices_left[i], indices_right[i], 
+                                           assume_unique=True).shape[0]
         union_size = np.unique(np.hstack([indices_left[i], indices_right[i]])).shape[0]
         result[i] = float(intersection_size) / float(union_size)
 
@@ -467,7 +468,7 @@ def show(plot_to_show):
     """
     if isinstance(plot_to_show, plt.Axes):
         show_static()
-    elif isinstance(plot_to_show, bpl.Figure):
+    elif isinstance(plot_to_show, bpl.figure):
         show_interactive(plot_to_show)
     elif isinstance(plot_to_show, hv.core.spaces.DynamicMap):
         show_interactive(hv.render(plot_to_show), backend="bokeh")
@@ -479,6 +480,7 @@ def show(plot_to_show):
 
 def points(
     umap_object,
+    points=None,
     labels=None,
     values=None,
     theme=None,
@@ -498,7 +500,7 @@ def points(
     to further control and tailor the plotting, you need only
     pass in the trained/fit umap model to get results. This plot
     utility will attempt to do the hard work of avoiding
-    overplotting issues, and make it easy to automatically
+    over-plotting issues, and make it easy to automatically
     colour points by a categorical labelling or numeric values.
 
     This method is intended to be used within a Jupyter
@@ -509,6 +511,12 @@ def points(
     umap_object: trained UMAP object
         A trained UMAP object that has a 2D embedding.
 
+    points: array, shape (n_samples, dim) (optional, default None)
+        An array of points to be plotted. Usually this is None
+        and so the original embedding points of the umap_object
+        are used. However points can be passed explicitly instead
+        which is useful for points manually transformed.
+
     labels: array, shape (n_samples,) (optional, default None)
         An array of labels (assumed integer or categorical),
         one for each data sample.
@@ -624,7 +632,8 @@ def points(
         if not 0.0 <= alpha <= 1.0:
             raise ValueError("Alpha must be between 0 and 1 inclusive")
 
-    points = _get_embedding(umap_object)
+    if points is None:
+        points = _get_embedding(umap_object)
 
     if subset_points is not None:
         if len(subset_points) != points.shape[0]:
@@ -738,7 +747,7 @@ def connectivity(
     to further control and tailor the plotting, you need only
     pass in the trained/fit umap model to get results. This plot
     utility will attempt to do the hard work of avoiding
-    overplotting issues and provide options for plotting the
+    over-plotting issues and provide options for plotting the
     points as well as using edge bundling for graph visualization.
 
     Parameters
@@ -840,7 +849,7 @@ def connectivity(
     -------
     result: matplotlib axis
         The result is a matplotlib axis with the relevant plot displayed.
-        If you are using a notbooks and have ``%matplotlib inline`` set
+        If you are using a notebook and have ``%matplotlib inline`` set
         then this will simply display inline.
     """
     if theme is not None:
@@ -1001,7 +1010,7 @@ def diagnostic(
         as that which provides ``local_variance_threshold``
         or more of the ``variance_explained_ratio``.
 
-    ax: matlotlib axis (optional, default None)
+    ax: matplotlib axis (optional, default None)
         A matplotlib axis to plot to, or, if None, a new
         axis will be created and returned.
 
@@ -1019,7 +1028,7 @@ def diagnostic(
     -------
     result: matplotlib axis
         The result is a matplotlib axis with the relevant plot displayed.
-        If you are using a notbooks and have ``%matplotlib inline`` set
+        If you are using a notebook and have ``%matplotlib inline`` set
         then this will simply display inline.
     """
 
@@ -1212,6 +1221,7 @@ def interactive(
     labels=None,
     values=None,
     hover_data=None,
+    tools=None,
     theme=None,
     cmap="Blues",
     color_key=None,
@@ -1229,7 +1239,7 @@ def interactive(
     """Create an interactive bokeh plot of a UMAP embedding.
     While static plots are useful, sometimes a plot that
     supports interactive zooming, and hover tooltips for
-    individual points is much more desireable. This function
+    individual points is much more desirable. This function
     provides a simple interface for creating such plots. The
     result is a bokeh plot that will be displayed in a notebook.
 
@@ -1266,6 +1276,14 @@ def interactive(
         for each data point. Column names will be used for
         identifying information within the tooltip.
 
+    tools: List (optional, default None),
+        Defines the tools to be configured for interactive plots.
+        The list can be mixed type of string and tools objects defined by
+        Bokeh like HoverTool. Default tool list Bokeh uses is
+        ["pan","wheel_zoom","box_zoom","save","reset","help",].
+        When tools are specified, and includes hovertool, automatic tooltip
+        based on hover_data is not created.
+
     theme: string (optional, default None)
         A color theme to use for plotting. A small set of
         predefined themes are provided which have relatively
@@ -1420,15 +1438,19 @@ def interactive(
             hover_data = hover_data[subset_points]
 
     if points.shape[0] <= width * height // 10:
-
+        tooltips = None
+        tooltip_needed = True
         if hover_data is not None:
             tooltip_dict = {}
             for col_name in hover_data:
                 data[col_name] = hover_data[col_name]
                 tooltip_dict[col_name] = "@{" + col_name + "}"
             tooltips = list(tooltip_dict.items())
-        else:
-            tooltips = None
+
+            for _tool in tools:
+                if _tool.__class__.__name__ == "HoverTool":
+                    tooltip_needed = False
+                    break
 
         if alpha is not None:
             data["alpha"] = alpha
@@ -1441,7 +1463,8 @@ def interactive(
         plot = bpl.figure(
             width=width,
             height=height,
-            tooltips=tooltips,
+            tooltips=None if not tooltip_needed else tooltips,
+            tools=tools if tools is not None else "pan,wheel_zoom,box_zoom,save,reset,help",
             background_fill_color=background,
         )
         plot.circle(
@@ -1518,11 +1541,11 @@ def interactive(
         if hover_data is not None:
             warn(
                 "Too many points for hover data -- tooltips will not"
-                "be displayed. Sorry; try subssampling your data."
+                "be displayed. Sorry; try subsampling your data."
             )
         if interactive_text_search:
             warn(
-                "Too many points for text search." "Sorry; try subssampling your data."
+                "Too many points for text search." "Sorry; try subsampling your data."
             )
         if alpha is not None:
             warn("Alpha parameter will not be applied on holoviews plots")
@@ -1579,7 +1602,7 @@ def nearest_neighbour_distribution(umap_object, bins=25, ax=None):
     bins: int (optional, default 25)
         Number of bins to put the points into
 
-    ax: matlotlib axis (optional, default None)
+    ax: matplotlib axis (optional, default None)
         A matplotlib axis to plot to, or, if None, a new
         axis will be created and returned.
 

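`umap.plot.points` gains an optional `points=` argument so externally transformed coordinates can be plotted instead of the embedding stored on the fitted model, and `interactive` gains a `tools=` argument for custom Bokeh tools. A short usage sketch for the former (hypothetical data; assumes the `umap-learn[plot]` extras are installed):

```python
import numpy as np
import umap
import umap.plot

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 10))
labels = (data[:, 0] > 0).astype(int)

mapper = umap.UMAP(n_neighbors=15).fit(data)

# Default behaviour: plot the embedding stored on the fitted model.
umap.plot.points(mapper, labels=labels)

# New in this version: plot manually transformed points instead, e.g.
# coordinates produced by mapper.transform() on a subset or held-out data.
held_out = mapper.transform(data[:100])
umap.plot.points(mapper, points=held_out, labels=labels[:100])
```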

=====================================
umap/spectral.py
=====================================
@@ -1,10 +1,12 @@
+import warnings
+
 from warnings import warn
 
 import numpy as np
 
 import scipy.sparse
 import scipy.sparse.csgraph
-
+from sklearn.decomposition import TruncatedSVD
 from sklearn.manifold import SpectralEmbedding
 from sklearn.metrics import pairwise_distances
 from sklearn.metrics.pairwise import _VALID_METRICS as SKLEARN_PAIRWISE_VALID_METRICS
@@ -130,7 +132,7 @@ def component_layout(
                     component_centroids, metric=metric, **metric_kwds
                 )
 
-    affinity_matrix = np.exp(-(distance_matrix ** 2))
+    affinity_matrix = np.exp(-(distance_matrix**2))
 
     component_embedding = SpectralEmbedding(
         n_components=dim, affinity="precomputed", random_state=random_state
@@ -149,9 +151,12 @@ def multi_component_layout(
     random_state,
     metric="euclidean",
     metric_kwds={},
+    init="random",
+    tol=0.0,
+    maxiter=0
 ):
     """Specialised layout algorithm for dealing with graphs with many connected components.
-    This will first fid relative positions for the components by spectrally embedding
+    This will first find relative positions for the components by spectrally embedding
     their centroids, then spectrally embed each individual connected component positioning
     them according to the centroid embeddings. This provides a decent embedding of each
     component while placing the components in good relative positions to one another.
@@ -163,7 +168,7 @@ def multi_component_layout(
         connected component of the graph.
 
     graph: sparse matrix
-        The adjacency matrix of the graph to be emebdded.
+        The adjacency matrix of the graph to be embedded.
 
     n_components: int
         The number of distinct components to be layed out.
@@ -181,6 +186,19 @@ def multi_component_layout(
     metric_kwds: dict (optional, default {})
         Keyword arguments to be passed to the metric function.
 
+    init: string, either "random" or "tsvd"
+        Indicates to initialize the eigensolver. Use "random" (the default) to
+        use uniformly distributed random initialization; use "tsvd" to warm-start the
+        eigensolver with singular vectors of the Laplacian associated to the largest
+        singular values. This latter option also forces usage of the LOBPCG eigensolver;
+        with the former, ARPACK's solver ``eigsh`` will be used for smaller Laplacians.
+
+    tol: float, default chosen by implementation
+        Stopping tolerance for the numerical algorithm computing the embedding.
+
+    maxiter: int, default chosen by implementation
+        Number of iterations the numerical algorithm will go through at most as it
+        attempts to compute the embedding.
 
     Returns
     -------
@@ -221,62 +239,39 @@ def multi_component_layout(
                 )
                 + meta_embedding[label]
             )
-            continue
-
-        diag_data = np.asarray(component_graph.sum(axis=0))
-        # standard Laplacian
-        # D = scipy.sparse.spdiags(diag_data, 0, graph.shape[0], graph.shape[0])
-        # L = D - graph
-        # Normalized Laplacian
-        I = scipy.sparse.identity(component_graph.shape[0], dtype=np.float64)
-        D = scipy.sparse.spdiags(
-            1.0 / np.sqrt(diag_data),
-            0,
-            component_graph.shape[0],
-            component_graph.shape[0],
-        )
-        L = I - D * component_graph * D
-
-        k = dim + 1
-        num_lanczos_vectors = max(2 * k + 1, int(np.sqrt(component_graph.shape[0])))
-        try:
-            eigenvalues, eigenvectors = scipy.sparse.linalg.eigsh(
-                L,
-                k,
-                which="SM",
-                ncv=num_lanczos_vectors,
-                tol=1e-4,
-                v0=np.ones(L.shape[0]),
-                maxiter=graph.shape[0] * 5,
+        else:
+            component_embedding = _spectral_layout(
+                data=None,
+                graph=component_graph,
+                dim=dim,
+                random_state=random_state,
+                metric=metric,
+                metric_kwds=metric_kwds,
+                init=init,
+                tol=tol,
+                maxiter=maxiter
             )
-            order = np.argsort(eigenvalues)[1:k]
-            component_embedding = eigenvectors[:, order]
             expansion = data_range / np.max(np.abs(component_embedding))
             component_embedding *= expansion
             result[component_labels == label] = (
                 component_embedding + meta_embedding[label]
             )
-        except scipy.sparse.linalg.ArpackError:
-            warn(
-                "WARNING: spectral initialisation failed! The eigenvector solver\n"
-                "failed. This is likely due to too small an eigengap. Consider\n"
-                "adding some noise or jitter to your data.\n\n"
-                "Falling back to random initialisation!"
-            )
-            result[component_labels == label] = (
-                random_state.uniform(
-                    low=-data_range,
-                    high=data_range,
-                    size=(component_graph.shape[0], dim),
-                )
-                + meta_embedding[label]
-            )
 
     return result
 
 
-def spectral_layout(data, graph, dim, random_state, metric="euclidean", metric_kwds={}):
-    """Given a graph compute the spectral embedding of the graph. This is
+def spectral_layout(
+    data,
+    graph,
+    dim,
+    random_state,
+    metric="euclidean",
+    metric_kwds={},
+    tol=0.0,
+    maxiter=0
+):
+    """
+    Given a graph compute the spectral embedding of the graph. This is
     simply the eigenvectors of the laplacian of the graph. Here we use the
     normalized laplacian.
 
@@ -294,6 +289,172 @@ def spectral_layout(data, graph, dim, random_state, metric="euclidean", metric_k
     random_state: numpy RandomState or equivalent
         A state capable being used as a numpy random state.
 
+    tol: float, default chosen by implementation
+        Stopping tolerance for the numerical algorithm computing the embedding.
+
+    maxiter: int, default chosen by implementation
+        Number of iterations the numerical algorithm will go through at most as it
+        attempts to compute the embedding.
+
+    Returns
+    -------
+    embedding: array of shape (n_vertices, dim)
+        The spectral embedding of the graph.
+    """
+    return _spectral_layout(
+        data=data,
+        graph=graph,
+        dim=dim,
+        random_state=random_state,
+        metric=metric,
+        metric_kwds=metric_kwds,
+        init="random",
+        tol=tol,
+        maxiter=maxiter
+    )
+
+
+def tswspectral_layout(
+    data,
+    graph,
+    dim,
+    random_state,
+    metric="euclidean",
+    metric_kwds={},
+    method=None,
+    tol=0.0,
+    maxiter=0
+):
+    """Given a graph, compute the spectral embedding of the graph. This is
+    simply the eigenvectors of the Laplacian of the graph. Here we use the
+    normalized laplacian and a truncated SVD-based guess of the
+    eigenvectors to "warm" up the eigensolver. This function should
+    give results of similar accuracy to the spectral_layout function, but
+    may converge more quickly for graph Laplacians that cause
+    spectral_layout to take an excessive amount of time to complete.
+
+    Parameters
+    ----------
+    data: array of shape (n_samples, n_features)
+        The source data
+
+    graph: sparse matrix
+        The (weighted) adjacency matrix of the graph as a sparse matrix.
+
+    dim: int
+        The dimension of the space into which to embed.
+
+    random_state: numpy RandomState or equivalent
+        A state capable being used as a numpy random state.
+
+    metric: string or callable (optional, default 'euclidean')
+        The metric used to measure distances among the source data points.
+        Used only if the multiple connected components are found in the
+        graph.
+
+    metric_kwds: dict (optional, default {})
+        Keyword arguments to be passed to the metric function.
+        If metric is 'precomputed', 'linkage' keyword can be used to specify
+        'average', 'complete', or 'single' linkage. Default is 'average'.
+        Used only if the multiple connected components are found in the
+        graph.
+
+    method: str (optional, default None, values either 'eigsh' or 'lobpcg')
+        Name of the eigenvalue computation method to use to compute the spectral
+        embedding. If left to None (or empty string), as by default, the method is
+        chosen from the number of vectors in play: larger vector collections are
+        handled with lobpcg, smaller collections with eigsh. Method names correspond
+        to SciPy routines in scipy.sparse.linalg.
+
+    tol: float, default chosen by implementation
+        Stopping tolerance for the numerical algorithm computing the embedding.
+
+    maxiter: int, default chosen by implementation
+        Maximum number of iterations the numerical algorithm will run while
+        attempting to compute the embedding.
+
+    Returns
+    -------
+    embedding: array of shape (n_vertices, dim)
+        The spectral embedding of the graph.
+    """
+    return _spectral_layout(
+        data=data,
+        graph=graph,
+        dim=dim,
+        random_state=random_state,
+        metric=metric,
+        metric_kwds=metric_kwds,
+        init="tsvd",
+        method=method,
+        tol=tol,
+        maxiter=maxiter
+    )
+
+
+def _spectral_layout(
+    data,
+    graph,
+    dim,
+    random_state,
+    metric="euclidean",
+    metric_kwds={},
+    init="random",
+    method=None,
+    tol=0.0,
+    maxiter=0
+):
+    """General implementation of the spectral embedding of the graph, derived as
+    a subset of the eigenvectors of the normalized Laplacian of the graph. The numerical
+    method for computing the eigendecomposition is chosen through heuristics.
+
+    Parameters
+    ----------
+    data: array of shape (n_samples, n_features)
+        The source data
+
+    graph: sparse matrix
+        The (weighted) adjacency matrix of the graph as a sparse matrix.
+
+    dim: int
+        The dimension of the space into which to embed.
+
+    random_state: numpy RandomState or equivalent
+        A state capable of being used as a numpy random state.
+
+    metric: string or callable (optional, default 'euclidean')
+        The metric used to measure distances among the source data points.
+        Used only if multiple connected components are found in the
+        graph.
+
+    metric_kwds: dict (optional, default {})
+        Keyword arguments to be passed to the metric function.
+        If metric is 'precomputed', 'linkage' keyword can be used to specify
+        'average', 'complete', or 'single' linkage. Default is 'average'.
+        Used only if multiple connected components are found in the
+        graph.
+
+    init: string, either "random" or "tsvd"
+        Indicates how to initialize the eigensolver. Use "random" (the default) to
+        use uniformly distributed random initialization; use "tsvd" to warm-start the
+        eigensolver with singular vectors of the Laplacian associated with the largest
+        singular values. This latter option also forces usage of the LOBPCG eigensolver;
+        with the former, ARPACK's solver ``eigsh`` will be used for smaller Laplacians.
+
+    method: string -- either "eigsh" or "lobpcg" -- or None
+        Name of the eigenvalue computation method to use to compute the spectral
+        embedding. If left as None (the default) or an empty string, the method is
+        chosen based on the number of vectors in play: larger vector collections are
+        handled with lobpcg, smaller collections with eigsh. Method names correspond
+        to SciPy routines in scipy.sparse.linalg.
+
+    tol: float, default chosen by implementation
+        Stopping tolerance for the numerical algorithm computing the embedding.
+
+    maxiter: int, default chosen by implementation
+        Maximum number of iterations the numerical algorithm will run while
+        attempting to compute the embedding.
+
     Returns
     -------
     embedding: array of shape (n_vertices, dim)
@@ -314,41 +475,82 @@ def spectral_layout(data, graph, dim, random_state, metric="euclidean", metric_k
             metric_kwds=metric_kwds,
         )
 
-    diag_data = np.asarray(graph.sum(axis=0))
+    sqrt_deg = np.sqrt(np.asarray(graph.sum(axis=0)).squeeze())
     # standard Laplacian
     # D = scipy.sparse.spdiags(diag_data, 0, graph.shape[0], graph.shape[0])
     # L = D - graph
     # Normalized Laplacian
     I = scipy.sparse.identity(graph.shape[0], dtype=np.float64)
     D = scipy.sparse.spdiags(
-        1.0 / np.sqrt(diag_data), 0, graph.shape[0], graph.shape[0]
+        1.0 / sqrt_deg, 0, graph.shape[0], graph.shape[0]
     )
     L = I - D * graph * D
+    if not scipy.sparse.issparse(L):
+        L = np.asarray(L)
 
     k = dim + 1
     num_lanczos_vectors = max(2 * k + 1, int(np.sqrt(graph.shape[0])))
+    gen = (
+        random_state
+        if isinstance(random_state, (np.random.Generator, np.random.RandomState))
+        else np.random.default_rng(seed=random_state)
+    )
+    if not method:
+        method = "eigsh" if L.shape[0] < 2000000 else "lobpcg"
+
     try:
-        if L.shape[0] < 2000000:
+        if init == "random":
+            X = gen.normal(size=(L.shape[0], k))
+        elif init == "tsvd":
+            X = TruncatedSVD(
+                n_components=k,
+                random_state=random_state,
+                # algorithm="arpack"
+            ).fit_transform(L)
+        else:
+            raise ValueError(
+                "The init parameter must be either 'random' or 'tsvd': "
+                f"{init} is invalid."
+            )
+        # For such a normalized Laplacian, the first eigenvector is always
+        # proportional to sqrt(degrees). We thus replace the first t-SVD guess
+        # with the exact value.
+        X[:, 0] = sqrt_deg / np.linalg.norm(sqrt_deg)
+
+        if method == "eigsh":
             eigenvalues, eigenvectors = scipy.sparse.linalg.eigsh(
                 L,
                 k,
                 which="SM",
                 ncv=num_lanczos_vectors,
-                tol=1e-4,
+                tol=tol or 1e-4,
                 v0=np.ones(L.shape[0]),
-                maxiter=graph.shape[0] * 5,
+                maxiter=maxiter or graph.shape[0] * 5,
             )
+        elif method == "lobpcg":
+            with warnings.catch_warnings():
+                warnings.filterwarnings(
+                    category=UserWarning,
+                    message=r"(?ms).*not reaching the requested tolerance",
+                    action="error"
+                )
+                eigenvalues, eigenvectors = scipy.sparse.linalg.lobpcg(
+                    L,
+                    np.asarray(X),
+                    largest=False,
+                    tol=tol or 1e-4,
+                    maxiter=maxiter or 5 * graph.shape[0]
+                )
         else:
-            eigenvalues, eigenvectors = scipy.sparse.linalg.lobpcg(
-                L, random_state.normal(size=(L.shape[0], k)), largest=False, tol=1e-8
-            )
+            raise ValueError("Method should either be None, 'eigsh' or 'lobpcg'")
+
         order = np.argsort(eigenvalues)[1:k]
         return eigenvectors[:, order]
-    except scipy.sparse.linalg.ArpackError:
+    except (scipy.sparse.linalg.ArpackError, UserWarning):
         warn(
-            "WARNING: spectral initialisation failed! The eigenvector solver\n"
+            "Spectral initialisation failed! The eigenvector solver\n"
             "failed. This is likely due to too small an eigengap. Consider\n"
             "adding some noise or jitter to your data.\n\n"
             "Falling back to random initialisation!"
         )
-        return random_state.uniform(low=-10.0, high=10.0, size=(graph.shape[0], dim))
+        return gen.uniform(low=-10.0, high=10.0, size=(graph.shape[0], dim))
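
The new ``tswspectral_layout`` helper is importable directly from ``umap.spectral``. A minimal usage sketch (illustrative only; the synthetic affinity matrix and seed are assumptions, mirroring the call pattern of the new test_spectral.py tests added further below):

```python
import numpy as np

from umap.spectral import spectral_layout, tswspectral_layout

# small, dense, symmetric, non-negative affinity matrix (illustrative)
rng = np.random.default_rng(42)
n = 100
graph = rng.standard_normal((n, n)) ** 2
graph = graph.T * graph

# both calls return an array of shape (n, 2); the tsvd-warmed variant also
# accepts the tol/maxiter/method knobs documented above
emb_spec = spectral_layout(None, graph, 2, random_state=42)
emb_tsw = tswspectral_layout(None, graph, 2, random_state=42, tol=1e-8)
```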


=====================================
umap/tests/test_aligned_umap.py
=====================================
@@ -1,3 +1,4 @@
+import pytest
 from umap import AlignedUMAP
 from sklearn.metrics import pairwise_distances
 from sklearn.cluster import KMeans
@@ -51,3 +52,31 @@ def test_aligned_update(aligned_iris, aligned_iris_relations):
         embd_dmat = pairwise_distances(small_aligned_model.embeddings_[i])
         embd_nn = np.argsort(embd_dmat, axis=1)[:, :10]
         assert nn_accuracy(true_nn, embd_nn) >= 0.45
+
+
+def test_aligned_update_params(aligned_iris, aligned_iris_relations):
+    data, target = aligned_iris
+    n_neighbors = [15, 15, 15, 15, 15]
+    small_aligned_model = AlignedUMAP(n_neighbors=n_neighbors[:3])
+    small_aligned_model.fit(data[:3], relations=aligned_iris_relations[:2])
+    small_aligned_model.update(data[3],
+                               relations=aligned_iris_relations[2],
+                               n_neighbors=n_neighbors[3])
+    for i, slice in enumerate(data[:4]):
+        data_dmat = pairwise_distances(slice)
+        true_nn = np.argsort(data_dmat, axis=1)[:, :10]
+        embd_dmat = pairwise_distances(small_aligned_model.embeddings_[i])
+        embd_nn = np.argsort(embd_dmat, axis=1)[:, :10]
+        assert nn_accuracy(true_nn, embd_nn) >= 0.45
+
+@pytest.mark.skip(reason="Temporarily disable")
+def test_aligned_update_array_error(aligned_iris, aligned_iris_relations):
+    data, target = aligned_iris
+    n_neighbors = [15, 15, 15, 15, 15]
+    small_aligned_model = AlignedUMAP(n_neighbors=n_neighbors[:3])
+    small_aligned_model.fit(data[:3], relations=aligned_iris_relations[:2])
+
+    with pytest.raises(ValueError):
+        small_aligned_model.update(data[3:],
+                                   relations=aligned_iris_relations[2:],
+                                   n_neighbors=n_neighbors[3:])


=====================================
umap/tests/test_composite_models.py
=====================================
@@ -25,12 +25,12 @@ def test_composite_trustworthiness(nn_data, iris_model):
     model3 = model1 * model2
     trust = trustworthiness(data, model3.embedding_, n_neighbors=10)
     assert (
-        trust >= 0.82
+        trust >= 0.80
     ), "Insufficiently trustworthy embedding for" "nn dataset: {}".format(trust)
     model4 = model1 + model2
     trust = trustworthiness(data, model4.embedding_, n_neighbors=10)
     assert (
-        trust >= 0.82
+        trust >= 0.80
     ), "Insufficiently trustworthy embedding for" "nn dataset: {}".format(trust)
 
     with pytest.raises(ValueError):


=====================================
umap/tests/test_data_input.py
=====================================
@@ -0,0 +1,85 @@
+import numpy as np
+import pytest as pytest
+from numba import njit
+from umap import UMAP
+
+
+@pytest.fixture(scope="session")
+def all_finite_data():
+    return np.arange(100.0).reshape(25, 4)
+
+
+@pytest.fixture(scope="session")
+def inverse_data():
+    return np.arange(50).reshape(25, 2)
+
+
+@njit
+def nan_dist(a: np.ndarray, b: np.ndarray):
+    a[0] = np.nan
+    a[1] = np.inf
+    return 0, a
+
+
+def test_check_input_data(all_finite_data, inverse_data):
+    """
+    Data input to UMAP gets checked for validity.
+    This test checks whether data input is rejected or accepted
+    according to the "force_all_finite" keyword as used by
+    sklearn.
+
+    Parameters
+    ----------
+    all_finite_data
+    inverse_data
+    -------
+
+    """
+    inf_data = all_finite_data.copy()
+    inf_data[0] = np.inf
+    nan_data = all_finite_data.copy()
+    nan_data[0] = np.nan
+    inf_nan_data = all_finite_data.copy()
+    inf_nan_data[0] = np.nan
+    inf_nan_data[1] = np.inf
+
+    # wrapper to call each data handling function of UMAP in a convenient way
+    def call_umap_functions(data, force_all_finite):
+        u = UMAP(metric=nan_dist)
+        if force_all_finite is None:
+            u.fit_transform(data)
+            u.fit(data)
+            u.transform(data)
+            u.update(data)
+            u.inverse_transform(inverse_data)
+        else:
+            u.fit_transform(data, force_all_finite=force_all_finite)
+            u.fit(data, force_all_finite=force_all_finite)
+            u.transform(data, force_all_finite=force_all_finite)
+            u.update(data, force_all_finite=force_all_finite)
+            u.inverse_transform(inverse_data)
+
+    # Check whether correct data input is accepted
+    call_umap_functions(all_finite_data, None)
+    call_umap_functions(all_finite_data, True)
+
+    call_umap_functions(nan_data, 'allow-nan')
+    call_umap_functions(all_finite_data, 'allow-nan')
+
+    call_umap_functions(inf_data, False)
+    call_umap_functions(inf_nan_data, False)
+    call_umap_functions(nan_data, False)
+    call_umap_functions(all_finite_data, False)
+
+    # Check whether illegal data raises a ValueError
+    with pytest.raises(ValueError):
+        call_umap_functions(nan_data, None)
+        call_umap_functions(inf_data, None)
+        call_umap_functions(inf_nan_data, None)
+
+        call_umap_functions(nan_data, True)
+        call_umap_functions(inf_data, True)
+        call_umap_functions(inf_nan_data, True)
+
+        call_umap_functions(inf_data, 'allow-nan')
+        call_umap_functions(inf_nan_data, 'allow-nan')
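
The behaviour exercised above can be reproduced outside the test suite roughly as follows; the NaN-tolerant metric below is an assumption for illustration, not something this changeset ships:

```python
import numba
import numpy as np

from umap import UMAP


@numba.njit()
def nan_euclidean(a, b):
    # illustrative metric that simply skips coordinates containing NaN
    d = 0.0
    for i in range(a.shape[0]):
        if not (np.isnan(a[i]) or np.isnan(b[i])):
            d += (a[i] - b[i]) ** 2
    return np.sqrt(d)


data = np.random.RandomState(0).normal(size=(200, 8)).astype(np.float32)
data[0, 0] = np.nan  # deliberately inject a missing value

# with the default force_all_finite=True this would raise a ValueError;
# 'allow-nan' lets the NaN through to the (NaN-aware) metric
embedding = UMAP(metric=nan_euclidean, n_neighbors=10).fit_transform(
    data, force_all_finite="allow-nan"
)
```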


=====================================
umap/tests/test_spectral.py
=====================================
@@ -0,0 +1,51 @@
+from umap.spectral import spectral_layout, tswspectral_layout
+
+import numpy as np
+import pytest
+from scipy.version import full_version as scipy_full_version_
+from warnings import catch_warnings
+
+
+scipy_full_version = tuple(int(n) for n in scipy_full_version_.split("."))
+
+
+@pytest.mark.skipif(
+    scipy_full_version < (1, 10),
+    reason="SciPy installing with Python 3.7 does not converge under same circumstances"
+)
+def test_tsw_spectral_init(iris):
+    # create an arbitrary (dense) random affinity matrix
+    seed = 42
+    rng = np.random.default_rng(seed=seed)
+    # matrix must be of sufficient size or lobpcg will refuse to work on it
+    n = 20
+    graph = rng.standard_normal(n * n).reshape((n, n)) ** 2
+    graph = graph.T * graph
+
+    spec = spectral_layout(None, graph, 2, random_state=seed ** 2)
+    tsw_spec = tswspectral_layout(None, graph, 2, random_state=seed ** 2, tol=1e-8)
+
+    # Make sure the two methods produce similar embeddings.
+    rmsd = np.mean(np.sum((spec - tsw_spec) ** 2, axis=1))
+    assert (
+        rmsd < 1e-6
+    ), "tsvd-warmed spectral init insufficiently close to standard spectral init"
+
+
+@pytest.mark.skipif(
+    scipy_full_version < (1, 10),
+    reason="SciPy installing with Py 3.7 does not warn reliably on convergence failure"
+)
+def test_ensure_fallback_to_random_on_spectral_failure():
+    dim = 1000
+    k = 10
+    assert k >= 10
+    assert dim // 10 > k
+    y = np.eye(dim, k=1)
+    u = np.random.random((dim, dim // 10))
+    graph = y + y.T + u @ u.T
+    with pytest.warns(
+        UserWarning,
+        match="Spectral initialisation failed!"
+    ):
+        tswspectral_layout(u, graph, k, random_state=42, maxiter=2, method="lobpcg")


=====================================
umap/tests/test_umap_get_feature_names_out.py
=====================================
@@ -0,0 +1,53 @@
+import numpy as np
+from sklearn.datasets import make_classification
+from sklearn.pipeline import Pipeline, FeatureUnion
+
+from ..umap_ import UMAP
+
+
+def test_get_feature_names_out_passthrough():
+    umap = UMAP()
+    # get_feature_names_out should return the same names if features are passed in directly.
+    example_passthrough = ['feature1', 'feature2']
+    passthrough_result = umap.get_feature_names_out(feature_names_out=example_passthrough)
+    assert example_passthrough == passthrough_result
+
+
+def test_get_feature_names_out_default():
+    umap = UMAP()
+    # get_feature_names_out should generate feature names in a certain format if no names are passed.
+    default_result = umap.get_feature_names_out()
+    expected_default_result = ["umap_component_1", "umap_component_2"]
+    assert default_result == expected_default_result
+
+
+def test_get_feature_names_out_multicomponent():
+    # The output length should be equal to the number of components UMAP generates.
+    umap10 = UMAP(n_components=10)
+    result_umap10 = umap10.get_feature_names_out()
+    expected_umap10_result = [f"umap_component_{i+1}" for i in range(10)]
+    assert len(result_umap10) == 10
+    assert result_umap10 == expected_umap10_result
+
+
+def test_get_feature_names_out_featureunion():
+    X, _ = make_classification(n_samples=10)
+    pipeline = Pipeline(
+        [
+            (
+                "umap_pipeline",
+                FeatureUnion(
+                    [
+                        ("umap1", UMAP()),
+                        ("umap2", UMAP(n_components=3)),
+                    ]
+                ),
+            )
+        ]
+    )
+
+    pipeline.fit(X)
+    feature_names = pipeline.get_feature_names_out()
+    expected_feature_names = np.array(["umap1__umap_component_1", "umap1__umap_component_2", "umap2__umap_component_1",
+                                       "umap2__umap_component_2", "umap2__umap_component_3"])
+    np.testing.assert_array_equal(feature_names, expected_feature_names)


=====================================
umap/tests/test_umap_metrics.py
=====================================
@@ -5,10 +5,12 @@ import umap.sparse as spdist
 
 from sklearn.metrics import pairwise_distances
 from sklearn.neighbors import BallTree
-from scipy.version import full_version as scipy_full_version
+from scipy.version import full_version as scipy_full_version_
 import pytest
 
 
+scipy_full_version = tuple(int(n) for n in scipy_full_version_.split("."))
+
 
 # ===================================================
 #  Metrics Test cases
@@ -122,7 +124,7 @@ def sparse_spatial_check(metric, sparse_spatial_data):
     assert (
         metric in spdist.sparse_named_distances
     ), f"{metric} not supported for sparse data"
-    dist_matrix = pairwise_distances(sparse_spatial_data.todense(), metric=metric)
+    dist_matrix = pairwise_distances(np.asarray(sparse_spatial_data.todense()), metric=metric)
 
     if metric in ("braycurtis", "dice", "sokalsneath", "yule"):
         dist_matrix[np.where(~np.isfinite(dist_matrix))] = 0.0
@@ -141,7 +143,7 @@ def sparse_binary_check(metric, sparse_binary_data):
     assert (
         metric in spdist.sparse_named_distances
     ), f"{metric} not supported for sparse data"
-    dist_matrix = pairwise_distances(sparse_binary_data.todense(), metric=metric)
+    dist_matrix = pairwise_distances(np.asarray(sparse_binary_data.todense()), metric=metric)
     if metric in ("jaccard", "dice", "sokalsneath", "yule"):
         dist_matrix[np.where(~np.isfinite(dist_matrix))] = 0.0
 
@@ -212,6 +214,9 @@ def test_dice(binary_data, binary_distances):
     binary_check("dice", binary_data, binary_distances)
 
 
+@pytest.mark.skipif(
+    scipy_full_version >= (1, 9), reason="deprecated in SciPy 1.9, removed in 1.11"
+)
 def test_kulsinski(binary_data, binary_distances):
     binary_check("kulsinski", binary_data, binary_distances)
 
@@ -294,6 +299,9 @@ def test_sparse_dice(sparse_binary_data):
     sparse_binary_check("dice", sparse_binary_data)
 
 
+@pytest.mark.skipif(
+    scipy_full_version >= (1, 9), reason="deprecated in SciPy 1.9, removed in 1.11"
+)
 def test_sparse_kulsinski(sparse_binary_data):
     sparse_binary_check("kulsinski", sparse_binary_data)
 
@@ -336,7 +344,7 @@ def test_seuclidean(spatial_data):
     )
 
 @pytest.mark.skipif(
-    scipy_full_version < "1.8", reason="incorrect function in scipy<1.8"
+    scipy_full_version < (1, 8), reason="incorrect function in scipy<1.8"
 )
 def test_weighted_minkowski(spatial_data):
     v = np.abs(np.random.randn(spatial_data.shape[1]))
@@ -493,7 +501,7 @@ def test_grad_metrics_match_metrics(spatial_data, spatial_distances):
         err_msg="Distances don't match " "for metric seuclidean",
     )
 
-    if scipy_full_version >= "1.8":
+    if scipy_full_version >= (1, 8):
         # Weighted minkowski
         dist_matrix = pairwise_distances(spatial_data, metric="minkowski", w=v, p=3)
         test_matrix = np.array(


=====================================
umap/tests/test_umap_on_iris.py
=====================================
@@ -1,10 +1,13 @@
 from umap import UMAP
+from umap.umap_ import nearest_neighbors
 from scipy import sparse
 import numpy as np
 from sklearn.cluster import KMeans
 from sklearn.metrics import adjusted_rand_score
 from sklearn.neighbors import KDTree
 from scipy.spatial.distance import cdist, pdist, squareform
+import pytest
+import warnings
 
 try:
     # works for sklearn>=0.22
@@ -70,7 +73,7 @@ def test_umap_trustworthiness_on_sphere_iris(
         iris.data, projected_embedding, n_neighbors=10, metric="cosine"
     )
     assert (
-        trust >= 0.70
+        trust >= 0.65
     ), "Insufficiently trustworthy spherical embedding for iris dataset: {}".format(
         trust
     )
@@ -214,3 +217,69 @@ def test_umap_inverse_transform_on_iris(iris, iris_model):
             highd_centroid, k=10, return_distance=False
         )
         assert np.intersect1d(near_points, highd_near_points[0]).shape[0] >= 3
+
+
+def test_precomputed_knn_on_iris(iris, iris_selection, iris_subset_model):
+    # this to compare two similarity graphs which should be nearly the same
+    def rms(a, b):
+        return np.sqrt(np.mean(np.square(a - b)))
+
+    data = iris.data[iris_selection]
+    new_data = iris.data[~iris_selection]
+
+    knn = nearest_neighbors(
+        data,
+        n_neighbors=10,
+        metric="euclidean",
+        metric_kwds=None,
+        angular=False,
+        random_state=42,
+    )
+
+    # repeated UMAP arguments we don't want to mis-specify
+    umap_args = dict(
+        n_neighbors=iris_subset_model.n_neighbors,
+        random_state=iris_subset_model.random_state,
+        n_jobs=1,
+        min_dist=iris_subset_model.min_dist,
+    )
+
+    # force_approximation_algorithm parameter is ignored when a precomputed knn is used
+    fitter_with_precomputed_knn = UMAP(
+        **umap_args,
+        precomputed_knn=knn,
+        force_approximation_algorithm=False,
+    ).fit(data)
+
+    # embeddings and similarity graph are NOT the same due to choices of nearest
+    # neighbor in non-exact case: similarity graph is most stable for comparing output
+    # threshold for similarity in graph empirically chosen by comparing the iris subset
+    # model with force_approximation_algorithm=True and different random seeds
+    assert rms(fitter_with_precomputed_knn.graph_, iris_subset_model.graph_) < 0.005
+
+    with pytest.warns(Warning, match="transforming new data") as record:
+        fitter_ignoring_force_approx = UMAP(
+            **umap_args,
+            precomputed_knn=(knn[0], knn[1]),
+        ).fit(data)
+        assert len(record) >= 1
+    np.testing.assert_array_equal(
+        fitter_ignoring_force_approx.embedding_, fitter_with_precomputed_knn.embedding_
+    )
+
+    # #848 (continued): if you don't have a search index, attempting to transform
+    # will raise an error
+    with pytest.raises(NotImplementedError, match="search index"):
+        _ = fitter_ignoring_force_approx.transform(new_data)
+
+    # force_approximation_algorithm parameter is ignored
+    with pytest.warns(Warning, match="transforming new data") as record:
+        fitter_ignoring_force_approx_True = UMAP(
+            **umap_args,
+            precomputed_knn=(knn[0], knn[1]),
+            force_approximation_algorithm=True,
+        ).fit(data)
+        assert len(record) >= 1
+    np.testing.assert_array_equal(
+        fitter_ignoring_force_approx_True.embedding_, fitter_ignoring_force_approx.embedding_
+    )
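
The two-tuple form of ``precomputed_knn`` used by this test can be reproduced on its own roughly as below; the dataset and parameter values are illustrative assumptions:

```python
from sklearn.datasets import load_iris

from umap import UMAP
from umap.umap_ import nearest_neighbors

data = load_iris().data

# compute the k-nearest-neighbour structure once and reuse it across fits
knn = nearest_neighbors(
    data,
    n_neighbors=15,
    metric="euclidean",
    metric_kwds=None,
    angular=False,
    random_state=42,
)

# passing only (indices, dists) now warns instead of raising, but the fitted
# model has no search index, so transform() on unseen data raises
# NotImplementedError
model = UMAP(
    n_neighbors=15,
    random_state=42,
    n_jobs=1,
    precomputed_knn=(knn[0], knn[1]),
).fit(data)
```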


=====================================
umap/tests/test_umap_ops.py
=====================================
@@ -47,7 +47,7 @@ def test_blobs_cluster():
 # Multi-components Layout
 def test_multi_component_layout():
     data, labels = make_blobs(
-        100, 2, centers=5, cluster_std=0.5, center_box=[-20, 20], random_state=42
+        100, 2, centers=5, cluster_std=0.5, center_box=(-20, 20), random_state=42
     )
 
     true_centroids = np.empty((labels.max() + 1, data.shape[1]), dtype=np.float64)
@@ -74,7 +74,7 @@ def test_multi_component_layout():
 # Multi-components Layout
 def test_multi_component_layout_precomputed():
     data, labels = make_blobs(
-        100, 2, centers=5, cluster_std=0.5, center_box=[-20, 20], random_state=42
+        100, 2, centers=5, cluster_std=0.5, center_box=(-20, 20), random_state=42
     )
     dmat = pairwise_distances(data)
 
@@ -259,7 +259,7 @@ def test_umap_update_large(
 
     error = np.sum(np.abs((new_model.graph_ - comparison_graph).data))
 
-    assert error < 3.0 # Higher error tolerance based on approx nearest neighbors
+    assert error < 3.0  # Higher error tolerance based on approx nearest neighbors
 
 
 # -----------------
@@ -292,7 +292,7 @@ def test_component_layout_options(nn_data):
         n_components,
         component_labels,
         2,
-        np.random,
+        None,
         metric="precomputed",
         metric_kwds={"linkage": "single"},
     )
@@ -301,7 +301,7 @@ def test_component_layout_options(nn_data):
         n_components,
         component_labels,
         2,
-        np.random,
+        None,
         metric="precomputed",
         metric_kwds={"linkage": "average"},
     )
@@ -310,7 +310,7 @@ def test_component_layout_options(nn_data):
         n_components,
         component_labels,
         2,
-        np.random,
+        None,
         metric="precomputed",
         metric_kwds={"linkage": "complete"},
     )


=====================================
umap/tests/test_umap_trustworthiness.py
=====================================
@@ -23,7 +23,7 @@ def test_umap_sparse_trustworthiness(sparse_test_data):
     embedding = UMAP(n_neighbors=10, n_epochs=100).fit_transform(sparse_test_data[:100])
     trust = trustworthiness(sparse_test_data[:100].toarray(), embedding, n_neighbors=10)
     assert (
-        trust >= 0.85
+        trust >= 0.88
     ), "Insufficiently trustworthy embedding for sparse test dataset: {}".format(trust)
 
 
@@ -49,7 +49,7 @@ def test_umap_trustworthiness_random_init(nn_data):
     ).fit_transform(data)
     trust = trustworthiness(data, embedding, n_neighbors=10)
     assert (
-        trust >= 0.75
+        trust >= 0.8
     ), "Insufficiently trustworthy embedding for" "nn dataset: {}".format(trust)
 
 


=====================================
umap/umap_.py
=====================================
@@ -14,6 +14,7 @@ from sklearn.utils.validation import check_is_fitted
 from sklearn.metrics import pairwise_distances
 from sklearn.preprocessing import normalize
 from sklearn.neighbors import KDTree
+from sklearn.decomposition import PCA, TruncatedSVD
 
 try:
     import joblib
@@ -37,7 +38,7 @@ from umap.utils import (
     csr_unique,
     fast_knn_indices,
 )
-from umap.spectral import spectral_layout
+from umap.spectral import spectral_layout, tswspectral_layout
 from umap.layouts import (
     optimize_layout_euclidean,
     optimize_layout_generic,
@@ -157,7 +158,7 @@ def smooth_knn_dist(distances, k, n_iter=64, local_connectivity=1.0, bandwidth=1
     Parameters
     ----------
     distances: array of shape (n_samples, n_neighbors)
-        Distances to nearest neighbors for each samples. Each row should be a
+        Distances to nearest neighbors for each sample. Each row should be a
         sorted list of distances to a given samples nearest neighbors.
 
     k: float
@@ -302,7 +303,7 @@ def nearest_neighbors(
         The distances to the ``n_neighbors`` closest points in the dataset.
 
     rp_forest: list of trees
-        The random projection forest used for searching (if used, None otherwise)
+        The random projection forest used for searching (if used, None otherwise).
     """
     if verbose:
         print(ts(), "Finding Nearest Neighbors")
@@ -384,10 +385,10 @@ def compute_membership_strengths(
         The local connectivity adjustment.
 
     return_dists: bool (optional, default False)
-        Whether to return the pairwise distance associated with each edge
+        Whether to return the pairwise distance associated with each edge.
 
     bipartite: bool (optional, default False)
-        Does the nearest neighbour set represent a bipartite graph?  That is are the
+        Does the nearest neighbour set represent a bipartite graph? That is, are the
         nearest neighbour indices from the same point set as the row indices?
 
     Returns
@@ -480,30 +481,31 @@ def fuzzy_simplicial_set(
         returns a float can be provided. For performance purposes it is
         required that this be a numba jit'd function. Valid string metrics
         include:
-            * euclidean (or l2)
-            * manhattan (or l1)
-            * cityblock
-            * braycurtis
-            * canberra
-            * chebyshev
-            * correlation
-            * cosine
-            * dice
-            * hamming
-            * jaccard
-            * kulsinski
-            * ll_dirichlet
-            * mahalanobis
-            * matching
-            * minkowski
-            * rogerstanimoto
-            * russellrao
-            * seuclidean
-            * sokalmichener
-            * sokalsneath
-            * sqeuclidean
-            * yule
-            * wminkowski
+
+        * euclidean (or l2)
+        * manhattan (or l1)
+        * cityblock
+        * braycurtis
+        * canberra
+        * chebyshev
+        * correlation
+        * cosine
+        * dice
+        * hamming
+        * jaccard
+        * kulsinski
+        * ll_dirichlet
+        * mahalanobis
+        * matching
+        * minkowski
+        * rogerstanimoto
+        * russellrao
+        * seuclidean
+        * sokalmichener
+        * sokalsneath
+        * sqeuclidean
+        * yule
+        * wminkowski
 
         Metrics that take arguments (such as minkowski, mahalanobis etc.)
         can have arguments passed via the metric_kwds dictionary. At this
@@ -657,7 +659,7 @@ def fast_intersection(rows, cols, values, target, unknown_dist=1.0, far_dist=5.0
     return
 
 
-@numba.jit()
+@numba.njit()
 def fast_metric_intersection(
     rows, cols, values, discrete_space, metric, metric_args, scale
 ):
@@ -905,7 +907,7 @@ def make_epochs_per_sample(weights, n_epochs):
     Parameters
     ----------
     weights: array of shape (n_1_simplices)
-        The weights ofhow much we wish to sample each 1-simplex.
+        The weights of how much we wish to sample each 1-simplex.
 
     n_epochs: int
         The total number of epochs we want to train for.
@@ -916,10 +918,20 @@ def make_epochs_per_sample(weights, n_epochs):
     """
     result = -1.0 * np.ones(weights.shape[0], dtype=np.float64)
     n_samples = n_epochs * (weights / weights.max())
-    result[n_samples > 0] = float(n_epochs) / n_samples[n_samples > 0]
+    result[n_samples > 0] = float(n_epochs) / np.float64(n_samples[n_samples > 0])
     return result
 
 
+# scale coords so that the largest absolute coordinate becomes max_coord, then add
+# normally distributed noise with standard deviation `noise`
+def noisy_scale_coords(coords, random_state, max_coord=10.0, noise=0.0001):
+    expansion = max_coord / np.abs(coords).max()
+    coords = (coords * expansion).astype(np.float32)
+    return coords + random_state.normal(scale=noise, size=coords.shape).astype(
+        np.float32
+    )
+
+
 def simplicial_set_embedding(
     data,
     graph,
@@ -980,16 +992,21 @@ def simplicial_set_embedding(
         in greater repulsive force being applied, greater optimization
         cost, but slightly more accuracy.
 
-    n_epochs: int (optional, default 0)
+    n_epochs: int (optional, default 0), or list of int
         The number of training epochs to be used in optimizing the
         low dimensional embedding. Larger values result in more accurate
         embeddings. If 0 is specified a value will be selected based on
         the size of the input dataset (200 for large datasets, 500 for small).
+        If a list of int is specified, then the intermediate embeddings at the
+        different epochs specified in that list are returned in
+        ``aux_data["embedding_list"]``.
 
     init: string
         How to initialize the low dimensional embedding. Options are:
+
             * 'spectral': use a spectral embedding of the fuzzy 1-skeleton
             * 'random': assign initial embedding positions at random.
+            * 'pca': use the first n_components from PCA applied to the input data.
             * A numpy array of initial embedding positions.
 
     random_state: numpy RandomState or equivalent
@@ -1062,8 +1079,11 @@ def simplicial_set_embedding(
     if n_epochs is None:
         n_epochs = default_epochs
 
-    if n_epochs > 10:
-        graph.data[graph.data < (graph.data.max() / float(n_epochs))] = 0.0
+    # If n_epochs is a list, get the maximum epoch to reach
+    n_epochs_max = max(n_epochs) if isinstance(n_epochs, list) else n_epochs
+
+    if n_epochs_max > 10:
+        graph.data[graph.data < (graph.data.max() / float(n_epochs_max))] = 0.0
     else:
         graph.data[graph.data < (graph.data.max() / float(default_epochs))] = 0.0
 
@@ -1073,9 +1093,30 @@ def simplicial_set_embedding(
         embedding = random_state.uniform(
             low=-10.0, high=10.0, size=(graph.shape[0], n_components)
         ).astype(np.float32)
+    elif isinstance(init, str) and init == "pca":
+        if scipy.sparse.issparse(data):
+            pca = TruncatedSVD(n_components=n_components, random_state=random_state)
+        else:
+            pca = PCA(n_components=n_components, random_state=random_state)
+        embedding = pca.fit_transform(data).astype(np.float32)
+        embedding = noisy_scale_coords(
+            embedding, random_state, max_coord=10, noise=0.0001
+        )
     elif isinstance(init, str) and init == "spectral":
+        embedding = spectral_layout(
+            data,
+            graph,
+            n_components,
+            random_state,
+            metric=metric,
+            metric_kwds=metric_kwds,
+        )
         # We add a little noise to avoid local minima for optimization to come
-        initialisation = spectral_layout(
+        embedding = noisy_scale_coords(
+            embedding, random_state, max_coord=10, noise=0.0001
+        )
+    elif isinstance(init, str) and init == "tswspectral":
+        embedding = tswspectral_layout(
             data,
             graph,
             n_components,
@@ -1083,13 +1124,8 @@ def simplicial_set_embedding(
             metric=metric,
             metric_kwds=metric_kwds,
         )
-        expansion = 10.0 / np.abs(initialisation).max()
-        embedding = (initialisation * expansion).astype(
-            np.float32
-        ) + random_state.normal(
-            scale=0.0001, size=[graph.shape[0], n_components]
-        ).astype(
-            np.float32
+        embedding = noisy_scale_coords(
+            embedding, random_state, max_coord=10, noise=0.0001
         )
     else:
         init_data = np.array(init)
@@ -1104,7 +1140,7 @@ def simplicial_set_embedding(
             else:
                 embedding = init_data
 
-    epochs_per_sample = make_epochs_per_sample(graph.data, n_epochs)
+    epochs_per_sample = make_epochs_per_sample(graph.data, n_epochs_max)
 
     head = graph.row
     tail = graph.col
@@ -1195,6 +1231,11 @@ def simplicial_set_embedding(
             tqdm_kwds=tqdm_kwds,
             move_other=True,
         )
+
+    if isinstance(embedding, list):
+        aux_data["embedding_list"] = embedding
+        embedding = embedding[-1].copy()
+
     if output_dens:
         if verbose:
             print(ts() + " Computing embedding densities")
@@ -1288,7 +1329,7 @@ def init_transform(indices, weights, embedding):
 
 def init_graph_transform(graph, embedding):
     """Given a bipartite graph representing the 1-simplices and strengths between the
-     new points and the original data set along with an embedding of the original points
+    new points and the original data set along with an embedding of the original points
     initialize the positions of new points relative to the strengths (of their neighbors in the source data).
 
     If a point is in our original data set it embeds at the original points coordinates.
@@ -1298,7 +1339,7 @@ def init_graph_transform(graph, embedding):
     Parameters
     ----------
     graph: csr_matrix (n_new_samples, n_samples)
-        A matrix indicating the the 1-simplices and their associated strengths.  These strengths should
+        A matrix indicating the 1-simplices and their associated strengths.  These strengths should
         be values between zero and one and not normalized.  One indicating that the new point was identical
         to one of our original points.
 
@@ -1313,23 +1354,19 @@ def init_graph_transform(graph, embedding):
     result = np.zeros((graph.shape[0], embedding.shape[1]), dtype=np.float32)
 
     for row_index in range(graph.shape[0]):
-        num_neighbours = len(graph[row_index].indices)
-        if num_neighbours == 0:
+        graph_row = graph[row_index]
+        if graph_row.nnz == 0:
             result[row_index] = np.nan
             continue
-        row_sum = np.sum(graph[row_index])
-        for col_index in graph[row_index].indices:
-            if graph[row_index, col_index] == 1:
+        row_sum = graph_row.sum()
+        for graph_value, col_index in zip(graph_row.data, graph_row.indices):
+            if graph_value == 1:
                 result[row_index, :] = embedding[col_index, :]
                 break
-            for d in range(embedding.shape[1]):
-                result[row_index, d] += (
-                    graph[row_index, col_index] / row_sum * embedding[col_index, d]
-                )
+            result[row_index] += graph_value / row_sum * embedding[col_index]
 
     return result
 
-
 @numba.njit()
 def init_update(current_init, n_original_samples, indices):
     for i in range(n_original_samples, indices.shape[0]):
@@ -1390,29 +1427,31 @@ class UMAP(BaseEstimator):
         returns a float can be provided. For performance purposes it is
         required that this be a numba jit'd function. Valid string metrics
         include:
-            * euclidean
-            * manhattan
-            * chebyshev
-            * minkowski
-            * canberra
-            * braycurtis
-            * mahalanobis
-            * wminkowski
-            * seuclidean
-            * cosine
-            * correlation
-            * haversine
-            * hamming
-            * jaccard
-            * dice
-            * russelrao
-            * kulsinski
-            * ll_dirichlet
-            * hellinger
-            * rogerstanimoto
-            * sokalmichener
-            * sokalsneath
-            * yule
+
+        * euclidean
+        * manhattan
+        * chebyshev
+        * minkowski
+        * canberra
+        * braycurtis
+        * mahalanobis
+        * wminkowski
+        * seuclidean
+        * cosine
+        * correlation
+        * haversine
+        * hamming
+        * jaccard
+        * dice
+        * russelrao
+        * kulsinski
+        * ll_dirichlet
+        * hellinger
+        * rogerstanimoto
+        * sokalmichener
+        * sokalsneath
+        * yule
+
         Metrics that take arguments (such as minkowski, mahalanobis etc.)
         can have arguments passed via the metric_kwds dictionary. At this
         time care must be taken and dictionary elements must be ordered
@@ -1429,8 +1468,16 @@ class UMAP(BaseEstimator):
 
     init: string (optional, default 'spectral')
         How to initialize the low dimensional embedding. Options are:
+
             * 'spectral': use a spectral embedding of the fuzzy 1-skeleton
             * 'random': assign initial embedding positions at random.
+            * 'pca': use the first n_components from PCA applied to the
+                input data.
+            * 'tswspectral': use a spectral embedding of the fuzzy
+                1-skeleton, using a truncated singular value decomposition to
+                "warm" up the eigensolver. This is intended as an alternative
+                to the 'spectral' method, if that takes an excessively long
+                time to complete initialization (or fails to complete).
             * A numpy array of initial embedding positions.
 
     min_dist: float (optional, default 0.1)
@@ -1478,7 +1525,7 @@ class UMAP(BaseEstimator):
         cost, but slightly more accuracy.
 
     transform_queue_size: float (optional, default 4.0)
+        For transform operations (embedding new points using a trained model)
+        For transform operations (embedding new points using a trained model
         this will control how aggressively to search for nearest neighbors.
         Larger values will result in slower performance but more accurate
         nearest neighbor evaluation.
@@ -1505,12 +1552,12 @@ class UMAP(BaseEstimator):
     angular_rp_forest: bool (optional, default False)
         Whether to use an angular random projection forest to initialise
         the approximate nearest neighbor search. This can be faster, but is
-        mostly on useful for metric that use an angular style distance such
+        mostly only useful for a metric that uses an angular style distance such
         as cosine, correlation etc. In the case of those metrics angular forests
         will be chosen automatically.
 
     target_n_neighbors: int (optional, default -1)
-        The number of nearest neighbors to use to construct the target simplcial
+        The number of nearest neighbors to use to construct the target simplicial
         set. If set to -1 use the ``n_neighbors`` value.
 
     target_metric: string or callable (optional, default 'categorical')
@@ -1544,7 +1591,7 @@ class UMAP(BaseEstimator):
 
     unique: bool (optional, default False)
         Controls if the rows of your data should be uniqued before being
-        embedded.  If you have more duplicates than you have n_neighbour
+        embedded.  If you have more duplicates than you have ``n_neighbors``
         you can have the identical data points lying in different regions of
         your space.  It also violates the definition of a metric.
         To map from internal structures back to your data use the variable
@@ -1596,7 +1643,15 @@ class UMAP(BaseEstimator):
         neighbors in the precomputed_knn must be greater or equal to the
         n_neighbors parameter. This should be a tuple containing the output
         of the nearest_neighbors() function or attributes from a previously fit
-        UMAP object; (knn_indices, knn_dists,knn_search_index).
+        UMAP object; (knn_indices, knn_dists, knn_search_index). If you wish to use
+        k-nearest neighbors data calculated by another package then provide a tuple of
+        the form (knn_indices, knn_dists). The contents of the tuple should be two numpy
+        arrays of shape (N, n_neighbors) where N is the number of items in the
+        input data. The first array should be the integer indices of the nearest
+        neighbors, and the second array should be the corresponding distances. The
+        nearest neighbor of each item should be itself, e.g. the nearest neighbor of
+        item 0 should be 0, the nearest neighbor of item 1 is 1 and so on. Please note
+        that you will *not* be able to transform new data in this case.
     """
 
     def __init__(
@@ -1698,10 +1753,15 @@ class UMAP(BaseEstimator):
         if not isinstance(self.init, str) and not isinstance(self.init, np.ndarray):
             raise ValueError("init must be a string or ndarray")
         if isinstance(self.init, str) and self.init not in (
+            "pca",
             "spectral",
             "random",
+            "tswspectral",
         ):
-            raise ValueError('string init values must be "spectral" or "random"')
+            raise ValueError(
+                'string init values must be one of: "pca", "tswspectral",'
+                ' "spectral" or "random"'
+            )
         if (
             isinstance(self.init, np.ndarray)
             and self.init.shape[1] != self.n_components
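
In user code the two new initialisation modes are selected through the existing ``init`` parameter; a minimal sketch, with illustrative data and seeds:

```python
import numpy as np

from umap import UMAP

data = np.random.RandomState(0).normal(size=(500, 20))

# PCA-based initialisation of the low-dimensional coordinates
emb_pca = UMAP(init="pca", random_state=42, n_jobs=1).fit_transform(data)

# spectral initialisation warmed up with a truncated SVD guess
emb_tsw = UMAP(init="tswspectral", random_state=42, n_jobs=1).fit_transform(data)
```
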
@@ -1730,10 +1790,27 @@ class UMAP(BaseEstimator):
                 raise ValueError("n_components must be an int")
         if self.n_components < 1:
             raise ValueError("n_components must be greater than 0")
-        if self.n_epochs is not None and (
+        self.n_epochs_list = None
+        if (
+            isinstance(self.n_epochs, list)
+            or isinstance(self.n_epochs, tuple)
+            or isinstance(self.n_epochs, np.ndarray)
+        ):
+            if not issubclass(
+                np.array(self.n_epochs).dtype.type, np.integer
+            ) or not np.all(np.array(self.n_epochs) >= 0):
+                raise ValueError(
+                    "n_epochs must be a nonnegative integer "
+                    "or a list of nonnegative integers"
+                )
+            self.n_epochs_list = list(self.n_epochs)
+        elif self.n_epochs is not None and (
             self.n_epochs < 0 or not isinstance(self.n_epochs, int)
         ):
-            raise ValueError("n_epochs must be a nonnegative integer")
+            raise ValueError(
+                "n_epochs must be a nonnegative integer "
+                "or a list of nonnegative integers"
+            )
         if self.metric_kwds is None:
             self._metric_kwds = {}
         else:
@@ -1862,6 +1939,8 @@ class UMAP(BaseEstimator):
 
         if self.n_jobs < -1 or self.n_jobs == 0:
             raise ValueError("n_jobs must be a postive integer, or -1 (for all cores)")
+        if self.n_jobs != 1 and self.random_state is not None:
+            warn(f"n_jobs value {self.n_jobs} overridden to 1 by setting random_state. Use no seed for parallelism.") 
 
         if self.dens_lambda < 0.0:
             raise ValueError("dens_lambda cannot be negative")
@@ -1926,10 +2005,12 @@ class UMAP(BaseEstimator):
                     "precomputed_knn[0] and precomputed_knn[1]"
                     " must be numpy arrays of the same size."
                 )
+            # #848: warn but proceed if no search index is present
             if not isinstance(self.knn_search_index, NNDescent):
-                raise ValueError(
-                    "precomputed_knn[2] (knn_search_index)"
-                    " must be an NNDescent object."
+                warn(
+                    "precomputed_knn[2] (knn_search_index) "
+                    "is not an NNDescent object: transforming new data with transform "
+                    "will be unavailable."
                 )
             if self.knn_dists.shape[1] < self.n_neighbors:
                 warn(
@@ -1953,11 +2034,9 @@ class UMAP(BaseEstimator):
                 self.knn_dists.shape[0] < 4096
                 and not self.force_approximation_algorithm
             ):
-                warn(
-                    "precomputed_knn is meant for large datasets. Since your"
-                    " data is small, precomputed_knn will be ignored and the"
-                    " k-nn will be computed normally."
-                )
+                # force_approximation_algorithm is irrelevant for pre-computed knn
+                # always set it to True which keeps downstream code paths working
+                self.force_approximation_algorithm = True
             elif self.knn_dists.shape[1] > self.n_neighbors:
                 # if k for precomputed_knn larger than n_neighbors we simply prune it
                 self.knn_indices = self.knn_indices[:, : self.n_neighbors]
@@ -2246,7 +2325,7 @@ class UMAP(BaseEstimator):
 
         return result
 
-    def fit(self, X, y=None):
+    def fit(self, X, y=None, force_all_finite=True):
         """Fit X into an embedded space.
 
         Optionally use y for supervised dimension reduction.
@@ -2264,9 +2343,15 @@ class UMAP(BaseEstimator):
             handled is determined by parameters UMAP was instantiated with.
             The relevant attributes are ``target_metric`` and
             ``target_metric_kwds``.
+
+        force_all_finite : bool or 'allow-nan' (optional, default True)
+            Whether to raise an error on np.inf, np.nan, pd.NA in the array.
+            The possibilities are:
+                - True: force all values of the array to be finite.
+                - False: accept np.inf, np.nan, pd.NA in the array.
+                - 'allow-nan': accept only np.nan and pd.NA values in the array;
+                  values cannot be infinite.
         """
 
-        X = check_array(X, dtype=np.float32, accept_sparse="csr", order="C")
+        X = check_array(X, dtype=np.float32, accept_sparse="csr", order="C", force_all_finite=force_all_finite)
         self._raw_data = X
 
         # Handle all the optional arguments, setting default
@@ -2277,7 +2362,7 @@ class UMAP(BaseEstimator):
             self._b = self.b
 
         if isinstance(self.init, np.ndarray):
-            init = check_array(self.init, dtype=np.float32, accept_sparse=False)
+            init = check_array(self.init, dtype=np.float32, accept_sparse=False, force_all_finite=force_all_finite)
         else:
             init = self.init
 
@@ -2285,7 +2370,11 @@ class UMAP(BaseEstimator):
 
         self.knn_indices = self.precomputed_knn[0]
         self.knn_dists = self.precomputed_knn[1]
-        self.knn_search_index = self.precomputed_knn[2]
+        # #848: allow precomputed knn to not have a search index
+        if len(self.precomputed_knn) == 2:
+            self.knn_search_index = None
+        else:
+            self.knn_search_index = self.precomputed_knn[2]
 
         self._validate_parameters()
 
@@ -2375,9 +2464,9 @@ class UMAP(BaseEstimator):
                 raise ValueError("Non-zero distances from samples to themselves!")
             if self.knn_dists is None:
                 self._knn_indices = np.zeros(
-                    (X.shape[0], self.n_neighbors), dtype=np.int
+                    (X.shape[0], self.n_neighbors), dtype=int
                 )
-                self._knn_dists = np.zeros(self._knn_indices.shape, dtype=np.float)
+                self._knn_dists = np.zeros(self._knn_indices.shape, dtype=float)
                 for row_id in range(X.shape[0]):
                     # Find KNNs row-by-row
                     row_data = X[row_id].data
@@ -2449,18 +2538,21 @@ class UMAP(BaseEstimator):
                             X[index].toarray(),
                             metric=_m,
                             kwds=self._metric_kwds,
+                            force_all_finite=force_all_finite
                         )
                     else:
                         dmat = dist.pairwise_special_metric(
                             X[index],
                             metric=self._input_distance_func,
                             kwds=self._metric_kwds,
+                            force_all_finite=force_all_finite
                         )
                 else:
                     dmat = dist.pairwise_special_metric(
                         X[index],
                         metric=self._input_distance_func,
                         kwds=self._metric_kwds,
+                        force_all_finite=force_all_finite
                     )
             # set any values greater than disconnection_distance to be np.inf.
             # This will have no effect when _disconnection_distance is not set since it defaults to np.inf.
@@ -2486,7 +2578,7 @@ class UMAP(BaseEstimator):
                 self.verbose,
                 self.densmap or self.output_dens,
             )
-            # Report the number of vertices with degree 0 in our our umap.graph_
+            # Report the number of vertices with degree 0 in our umap.graph_
             # This ensures that they were properly disconnected.
             vertices_disconnected = np.sum(
                 np.array(self.graph_.sum(axis=1)).flatten() == 0
@@ -2555,7 +2647,7 @@ class UMAP(BaseEstimator):
                 self.verbose,
                 self.densmap or self.output_dens,
             )
-            # Report the number of vertices with degree 0 in our our umap.graph_
+            # Report the number of vertices with degree 0 in our umap.graph_
             # This ensures that they were properly disconnected.
             vertices_disconnected = np.sum(
                 np.array(self.graph_.sum(axis=1)).flatten() == 0
@@ -2581,7 +2673,7 @@ class UMAP(BaseEstimator):
             if self.target_metric == "string":
                 y_ = y[index]
             else:
-                y_ = check_array(y, ensure_2d=False)[index]
+                y_ = check_array(y, ensure_2d=False, force_all_finite=force_all_finite)[index]
             if self.target_metric == "categorical":
                 if self.target_weight < 1.0:
                     far_dist = 2.5 * (1.0 / (1.0 - self.target_weight))
@@ -2631,6 +2723,7 @@ class UMAP(BaseEstimator):
                             y_,
                             metric=self.target_metric,
                             kwds=self._target_metric_kwds,
+                            force_all_finite=force_all_finite
                         )
 
                     (target_graph, target_sigmas, target_rhos,) = fuzzy_simplicial_set(
@@ -2681,12 +2774,28 @@ class UMAP(BaseEstimator):
             print(ts(), "Construct embedding")
 
         if self.transform_mode == "embedding":
+            epochs = (
+                self.n_epochs_list if self.n_epochs_list is not None else self.n_epochs
+            )
             self.embedding_, aux_data = self._fit_embed_data(
                 self._raw_data[index],
-                self.n_epochs,
+                epochs,
                 init,
                 random_state,  # JH why raw data?
             )
+
+            if self.n_epochs_list is not None:
+                if "embedding_list" not in aux_data:
+                    raise KeyError(
+                        "No list of embedding were found in 'aux_data'. "
+                        "It is likely the layout optimization function "
+                        "doesn't support the list of int for 'n_epochs'."
+                    )
+                else:
+                    self.embedding_list_ = [
+                        e[inverse] for e in aux_data["embedding_list"]
+                    ]
+
             # Assign any points that are fully disconnected from our manifold(s) to have embedding
             # coordinates of np.nan.  These will be filtered by our plotting functions automatically.
             # They also prevent users from being deceived a distance query to one of these points.
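
Putting the pieces together, intermediate embeddings can be requested by passing a list of epochs; a minimal sketch, assuming the layout optimizer in use supports epoch lists (otherwise ``fit`` raises the ``KeyError`` shown above):

```python
import numpy as np

from umap import UMAP

data = np.random.RandomState(0).normal(size=(500, 10))

model = UMAP(n_epochs=[50, 100, 200], random_state=42, n_jobs=1).fit(data)

snapshots = model.embedding_list_  # one (n_samples, n_components) array per listed epoch
final = model.embedding_           # corresponds to the last requested epoch
```
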
@@ -2739,7 +2848,7 @@ class UMAP(BaseEstimator):
             tqdm_kwds=self.tqdm_kwds,
         )
 
-    def fit_transform(self, X, y=None):
+    def fit_transform(self, X, y=None, force_all_finite=True):
         """Fit X into an embedded space and return that transformed
         output.
 
@@ -2755,6 +2864,12 @@ class UMAP(BaseEstimator):
             The relevant attributes are ``target_metric`` and
             ``target_metric_kwds``.
 
+        force_all_finite : bool or 'allow-nan' (optional, default True)
+            Whether to raise an error on np.inf, np.nan, pd.NA in the array.
+            The possibilities are:
+                - True: force all values of the array to be finite.
+                - False: accept np.inf, np.nan, pd.NA in the array.
+                - 'allow-nan': accept only np.nan and pd.NA values in the array;
+                  values cannot be infinite.
+
         Returns
         -------
         X_new : array, shape (n_samples, n_components)
@@ -2769,7 +2884,7 @@ class UMAP(BaseEstimator):
         r_emb: array, shape (n_samples)
             Local radii of data points in the embedding (log-transformed).
         """
-        self.fit(X, y)
+        self.fit(X, y, force_all_finite)
         if self.transform_mode == "embedding":
             if self.output_dens:
                 return self.embedding_, self.rad_orig_, self.rad_emb_
@@ -2784,7 +2899,7 @@ class UMAP(BaseEstimator):
                 )
             )
 
-    def transform(self, X):
+    def transform(self, X, force_all_finite=True):
         """Transform X into the existing embedded space and return that
         transformed output.
 
@@ -2793,6 +2908,12 @@ class UMAP(BaseEstimator):
         X : array, shape (n_samples, n_features)
             New data to be transformed.
 
+        force_all_finite : bool or 'allow-nan' (optional, default True)
+            Whether to raise an error on np.inf, np.nan, pd.NA in the array.
+            The possibilities are:
+                - True: force all values of the array to be finite.
+                - False: accept np.inf, np.nan, pd.NA in the array.
+                - 'allow-nan': accept only np.nan and pd.NA values in the array;
+                  values cannot be infinite.
+
         Returns
         -------
         X_new : array, shape (n_samples, n_components)
@@ -2804,7 +2925,7 @@ class UMAP(BaseEstimator):
                 "Transform unavailable when model was fit with only a single data sample."
             )
         # If we just have the original input then short circuit things
-        X = check_array(X, dtype=np.float32, accept_sparse="csr", order="C")
+        X = check_array(X, dtype=np.float32, accept_sparse="csr", order="C", force_all_finite=force_all_finite)
         x_hash = joblib.hash(X)
         if x_hash == self._input_hash:
             if self.transform_mode == "embedding":
@@ -2822,6 +2943,14 @@ class UMAP(BaseEstimator):
                 "Transforming data into an existing embedding not supported for densMAP."
             )
 
+        # #848: knn_search_index is allowed to be None when the model will not be used
+        # to transform new data, so if the attribute exists, check that it is not None
+        if hasattr(self, "_knn_search_index") and self._knn_search_index is None:
+            raise NotImplementedError(
+                "No search index available: transforming data"
+                " into an existing embedding is not supported"
+            )
+
         # X = check_array(X, dtype=np.float32, order="C", accept_sparse="csr")
         random_state = check_random_state(self.transform_seed)
         rng_state = random_state.randint(INT32_MIN, INT32_MAX, 3).astype(np.int64)
@@ -2871,6 +3000,7 @@ class UMAP(BaseEstimator):
                             self._raw_data.toarray(),
                             metric=_m,
                             kwds=self._metric_kwds,
+                            force_all_finite=force_all_finite
                         )
                     else:
                         dmat = dist.pairwise_special_metric(
@@ -2878,6 +3008,7 @@ class UMAP(BaseEstimator):
                             self._raw_data,
                             metric=self._input_distance_func,
                             kwds=self._metric_kwds,
+                            force_all_finite=force_all_finite
                         )
                 else:
                     dmat = dist.pairwise_special_metric(
@@ -2885,6 +3016,7 @@ class UMAP(BaseEstimator):
                         self._raw_data,
                         metric=self._input_distance_func,
                         kwds=self._metric_kwds,
+                        force_all_finite=force_all_finite
                     )
             indices = np.argpartition(dmat, self._n_neighbors)[:, : self._n_neighbors]
             dmat_shortened = submatrix(dmat, indices, self._n_neighbors)
@@ -3004,7 +3136,6 @@ class UMAP(BaseEstimator):
         ----------
         X : array, shape (n_samples, n_components)
             New points to be inverse transformed.
-
         Returns
         -------
         X_new : array, shape (n_samples, n_features)
@@ -3164,8 +3295,8 @@ class UMAP(BaseEstimator):
 
         return inv_transformed_points
 
-    def update(self, X):
-        X = check_array(X, dtype=np.float32, accept_sparse="csr", order="C")
+    def update(self, X, force_all_finite=True):
+        X = check_array(X, dtype=np.float32, accept_sparse="csr", order="C", force_all_finite=force_all_finite)
         random_state = check_random_state(self.transform_seed)
         rng_state = random_state.randint(INT32_MIN, INT32_MAX, 3).astype(np.int64)
 
@@ -3203,18 +3334,21 @@ class UMAP(BaseEstimator):
                                 self._raw_data.toarray(),
                                 metric=_m,
                                 kwds=self._metric_kwds,
+                                force_all_finite=force_all_finite
                             )
                         else:
                             dmat = dist.pairwise_special_metric(
                                 self._raw_data,
                                 metric=self._input_distance_func,
                                 kwds=self._metric_kwds,
+                                force_all_finite=force_all_finite
                             )
                     else:
                         dmat = dist.pairwise_special_metric(
                             self._raw_data,
                             metric=self._input_distance_func,
                             kwds=self._metric_kwds,
+                            force_all_finite=force_all_finite
                         )
                 self.graph_, self._sigmas, self._rhos = fuzzy_simplicial_set(
                     dmat,
@@ -3381,6 +3515,17 @@ class UMAP(BaseEstimator):
             self.rad_orig_ = aux_data["rad_orig"]
             self.rad_emb_ = aux_data["rad_emb"]
 
+    def get_feature_names_out(self, feature_names_out=None):
+        """
+        Defines descriptive names for each output of the (fitted) estimator.
+        :param feature_names_out: Optional passthrough for feature names.
+        By default, feature names will be generated automatically.
+        :return: List of descriptive names for each output variable from the fitted estimator.
+        """
+        if feature_names_out is None:
+            feature_names_out = [f"umap_component_{i+1}" for i in range(self.n_components)]
+        return feature_names_out
+
     def __repr__(self):
         from sklearn.utils._pprint import _EstimatorPrettyPrinter
         import re
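
Taken together, the umap_.py changes above thread a `force_all_finite` flag through
fit/fit_transform/transform/update and add a scikit-learn-style
`get_feature_names_out`. A hedged usage sketch follows; the flag values come from
the docstrings above, while the assumption that the chosen metric yields useful
distances for rows containing NaN is not something this diff guarantees:

    import numpy as np
    import umap

    X = np.random.rand(150, 10).astype(np.float32)
    X[0, 3] = np.nan  # deliberately introduce a missing value

    reducer = umap.UMAP(n_components=2)

    # 'allow-nan' lets NaN (but not inf) pass the input validation added above;
    # how NaN is treated downstream depends on the chosen metric.
    embedding = reducer.fit_transform(X, force_all_finite="allow-nan")

    # New data can be projected with the same relaxed validation
    X_new = np.random.rand(10, 10).astype(np.float32)
    projected = reducer.transform(X_new, force_all_finite="allow-nan")

    print(embedding.shape, projected.shape)   # (150, 2) (10, 2)
    print(reducer.get_feature_names_out())    # ['umap_component_1', 'umap_component_2']
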


=====================================
umap/utils.py
=====================================
@@ -145,7 +145,7 @@ def csr_unique(matrix, return_index=True, return_inverse=True, return_counts=Tru
     return_index = bool, optional
         If true, return the row indices of 'matrix'
     return_inverse: bool, optional
-        If true, return the the indices of the unique array that can be
+        If true, return the indices of the unique array that can be
            used to reconstruct 'matrix'.
     return_counts = bool, optional
         If true, returns the number of times each unique item appears in 'matrix'
@@ -156,7 +156,7 @@ def csr_unique(matrix, return_index=True, return_inverse=True, return_counts=Tru
     unique_matrix[inverse]
     """
     lil_matrix = matrix.tolil()
-    rows = [x + y for x, y in zip(lil_matrix.rows, lil_matrix.data)]
+    rows = np.asarray([tuple(x + y) for x, y in zip(lil_matrix.rows, lil_matrix.data)], dtype=object)
     return_values = return_counts + return_inverse + return_index
     return np.unique(
         rows,
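
The csr_unique change above swaps the plain list of per-row (indices + data) lists
for an object-dtype array of tuples, so that np.unique can treat each row as a
single comparable element. A small standalone sketch of that idea (the toy matrix
is illustrative; its rows are given different numbers of nonzeros so the resulting
array stays one-dimensional):

    import numpy as np
    from scipy.sparse import csr_matrix

    matrix = csr_matrix(np.array([
        [1.0, 0.0, 0.0],
        [1.0, 0.0, 0.0],
        [0.0, 2.0, 3.0],
    ]))
    lil_matrix = matrix.tolil()

    # Each row becomes a hashable, comparable tuple of (column indices + values)
    rows = np.asarray(
        [tuple(x + y) for x, y in zip(lil_matrix.rows, lil_matrix.data)], dtype=object
    )
    unique_rows, index, inverse, counts = np.unique(
        rows, return_index=True, return_inverse=True, return_counts=True
    )
    print(index, inverse, counts)  # [0 2] [0 0 1] [2 1]
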



View it on GitLab: https://salsa.debian.org/med-team/umap-learn/-/commit/f8f44a7df1a56fecf901df495f274f06b861cd4d
