[med-svn] [Git][med-team/q2-sample-classifier][master] 6 commits: New upstream version 2022.2.0

Wed Jul 20 15:46:55 BST 2022


Andreas Tille pushed to branch master at Debian Med / q2-sample-classifier


Commits:
b16d3b7d by Andreas Tille at 2022-07-20T16:39:46+02:00
New upstream version 2022.2.0
- - - - -
0b7fc103 by Andreas Tille at 2022-07-20T16:39:46+02:00
routine-update: New upstream version

- - - - -
81341a3b by Andreas Tille at 2022-07-20T16:39:47+02:00
Update upstream source from tag 'upstream/2022.2.0'

Update to upstream version '2022.2.0'
with Debian dir 14f79d343db0c79a1fbd8355f92017732dfd16a0
- - - - -
8f068cc0 by Andreas Tille at 2022-07-20T16:39:47+02:00
routine-update: Standards-Version: 4.6.1

- - - - -
da0a62e7 by Andreas Tille at 2022-07-20T16:42:15+02:00
Bump versioned (Build-)Depends of q2-* packages to 2022.2.0

- - - - -
ffdb8a20 by Andreas Tille at 2022-07-20T16:46:27+02:00
Lots of failures in autopkgtest

- - - - -


22 changed files:

- LICENSE
- ci/recipe/meta.yaml
- debian/changelog
- debian/control
- q2_sample_classifier/__init__.py
- q2_sample_classifier/_format.py
- q2_sample_classifier/_transformer.py
- q2_sample_classifier/_type.py
- q2_sample_classifier/_version.py
- q2_sample_classifier/classify.py
- q2_sample_classifier/plugin_setup.py
- q2_sample_classifier/tests/__init__.py
- q2_sample_classifier/tests/test_actions.py
- q2_sample_classifier/tests/test_base_class.py
- q2_sample_classifier/tests/test_classifier.py
- q2_sample_classifier/tests/test_estimators.py
- q2_sample_classifier/tests/test_types_formats_transformers.py
- q2_sample_classifier/tests/test_utilities.py
- q2_sample_classifier/tests/test_visualization.py
- q2_sample_classifier/utilities.py
- q2_sample_classifier/visuals.py
- setup.py


Changes:

=====================================
LICENSE
=====================================
@@ -1,6 +1,6 @@
 BSD 3-Clause License
 
-Copyright (c) 2017-2021, QIIME 2 development team.
+Copyright (c) 2017-2022, QIIME 2 development team.
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without


=====================================
ci/recipe/meta.yaml
=====================================
@@ -22,7 +22,7 @@ requirements:
     - scipy
     - numpy
     - joblib
-    - scikit-learn >=0.22.1
+    - scikit-learn {{ scikit_learn }}
     - scikit-bio
     - seaborn >=0.8
     - fastcluster


=====================================
debian/changelog
=====================================
@@ -1,3 +1,13 @@
+q2-sample-classifier (2022.2.0-1) UNRELEASED; urgency=medium
+
+  * Team upload.
+  * New upstream version
+  * Standards-Version: 4.6.1 (routine-update)
+  * Bump versioned (Build-)Depends of q2-* packages to 2022.2.0
+  TODO: Lots of failures in autopkgtest
+
+ -- Andreas Tille <tille at debian.org>  Wed, 20 Jul 2022 16:39:46 +0200
+
 q2-sample-classifier (2021.8.0-1) unstable; urgency=medium
 
   * Team upload.


=====================================
debian/control
=====================================
@@ -6,11 +6,11 @@ Section: science
 Priority: optional
 Build-Depends: debhelper-compat (= 13),
                dh-python,
-               qiime (>= 2021.8.0),
+               qiime (>= 2022.2.0),
                python3-all,
                python3-setuptools,
                python3-pytest <!nocheck>
-Standards-Version: 4.6.0
+Standards-Version: 4.6.1
 Vcs-Browser: https://salsa.debian.org/med-team/q2-sample-classifier
 Vcs-Git: https://salsa.debian.org/med-team/q2-sample-classifier.git
 Homepage: https://qiime2.org
@@ -21,10 +21,10 @@ Architecture: all
 Depends: ${shlibs:Depends},
          ${misc:Depends},
          ${python3:Depends},
-         qiime (>= 2021.8.0),
+         qiime (>= 2022.2.0),
          python3-distutils,
-         q2-types (>= 2021.8.0),
-         q2-feature-table (>= 2021.8.0)
+         q2-types (>= 2022.2.0),
+         q2-feature-table (>= 2022.2.0)
 Description: QIIME 2 plugin for machine learning prediction of sample data
  QIIME 2 is a powerful, extensible, and decentralized microbiome analysis
  package with a focus on data and analysis transparency. QIIME 2 enables


=====================================
q2_sample_classifier/__init__.py
=====================================
@@ -1,5 +1,5 @@
 # ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
 #
 # Distributed under the terms of the Modified BSD License.
 #


=====================================
q2_sample_classifier/_format.py
=====================================
@@ -1,5 +1,5 @@
 # ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
 #
 # Distributed under the terms of the Modified BSD License.
 #


=====================================
q2_sample_classifier/_transformer.py
=====================================
@@ -1,5 +1,5 @@
 # ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
 #
 # Distributed under the terms of the Modified BSD License.
 #


=====================================
q2_sample_classifier/_type.py
=====================================
@@ -1,5 +1,5 @@
 # ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
 #
 # Distributed under the terms of the Modified BSD License.
 #


=====================================
q2_sample_classifier/_version.py
=====================================
@@ -23,9 +23,9 @@ def get_keywords():
     # setup.py/versioneer.py will grep for the variable names, so they must
     # each be defined on a line of their own. _version.py will just call
     # get_keywords().
-    git_refnames = " (tag: 2021.8.0)"
-    git_full = "916eb0799fa2c95a04766b44c22335c6a097ce13"
-    git_date = "2021-09-09 18:35:33 +0000"
+    git_refnames = " (tag: 2022.2.0)"
+    git_full = "5056003cb20259dfc4dd046c9043f0d89740bc95"
+    git_date = "2022-02-18 18:51:19 +0000"
     keywords = {"refnames": git_refnames, "full": git_full, "date": git_date}
     return keywords
 


=====================================
q2_sample_classifier/classify.py
=====================================
@@ -1,16 +1,17 @@
 # ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
 #
 # Distributed under the terms of the Modified BSD License.
 #
 # The full license is in the file LICENSE, distributed with this software.
 # ----------------------------------------------------------------------------
 
-import collections
-
+import numpy as np
 from sklearn.ensemble import IsolationForest
 from sklearn.metrics import mean_squared_error, accuracy_score
 from sklearn.feature_extraction import DictVectorizer
+from sklearn.model_selection import KFold
+from sklearn.neighbors import KNeighborsClassifier
 from sklearn.pipeline import Pipeline
 
 import qiime2
@@ -102,42 +103,66 @@ def metatable(ctx,
     return metatab
 
 
-def classify_samples_from_dist(ctx, distance_matrix, metadata, k=1,
-                               palette=defaults['palette']):
-    ''' Returns knn classifier results from a distance matrix.'''
-    distance_matrix = distance_matrix.view(skbio.DistanceMatrix)
-    predictions = []
-    metadata_series = metadata.to_series()
-    for i, row in enumerate(distance_matrix):
-        dists = []
-        categories = []
-        for j, dist in enumerate(row):
-            if j == i:
-                continue  # exclude self
-            dists.append(dist)
-            categories.append(metadata_series[distance_matrix.ids[j]])
-
-        # k-long series of (category: dist) ordered small -> large
-        nn_categories = pd.Series(dists, index=categories).nsmallest(k)
-        counter = collections.Counter(nn_categories.index)
-        max_counts = max(counter.values())
-        # in order of closeness, pick a category that is or shares
-        # max_counts
-        for category in nn_categories.index:
-            if counter[category] == max_counts:
-                predictions.append(category)
-                break
-
-    predictions = pd.Series(predictions, index=distance_matrix.ids)
-    predictions.index.name = 'SampleID'
-    pred = qiime2.Artifact.import_data(
-        'SampleData[ClassifierPredictions]', predictions)
+def _fit_predict_knn_cv(
+        x: pd.DataFrame, y: pd.Series, k: int, cv: int,
+        random_state: int, n_jobs: int
+) -> (pd.Series, pd.Series):
+    kf = KFold(n_splits=cv, shuffle=True, random_state=random_state)
+
+    # train and test with CV
+    predictions, pred_ids, truth = [], [], []
+    for train_index, test_index in kf.split(x):
+        x_train, x_test = x.iloc[train_index, train_index], \
+                          x.iloc[test_index, train_index]
+        y_train, y_test = y[train_index], y[test_index]
+
+        knn = KNeighborsClassifier(
+            n_neighbors=k, metric='precomputed', n_jobs=n_jobs
+        )
+        knn.fit(x_train, y_train)
+
+        # gather predictions for the confusion matrix
+        predictions.append(knn.predict(x_test))
+        pred_ids.extend(x_test.index.tolist())
+        truth.append(y_test)
+
+    predictions = pd.Series(
+        np.concatenate(predictions).ravel(),
+        index=pd.Index(pred_ids, name='SampleID')
+    )
+    truth = pd.concat(truth)
+    truth.index.name = 'SampleID'
+
+    return predictions, truth
+
+
+def classify_samples_from_dist(
+        ctx, distance_matrix, metadata, k=1, cv=defaults['cv'],
+        random_state=None, n_jobs=defaults['n_jobs'],
+        palette=defaults['palette']
+):
+    """ Trains and evaluates a KNN classifier from a distance matrix
+        using cross-validation."""
+    distance_matrix = distance_matrix \
+        .view(skbio.DistanceMatrix) \
+        .to_data_frame()
+    # reorder (required for splitting into train/test)
+    metadata_ser = metadata.to_series()[distance_matrix.index]
+
+    predictions, truth = _fit_predict_knn_cv(
+        distance_matrix, metadata_ser, k, cv, random_state, n_jobs
+    )
+    predictions = qiime2.Artifact.import_data(
+        'SampleData[ClassifierPredictions]', predictions
+    )
+    truth = qiime2.CategoricalMetadataColumn(truth)
 
     confusion = ctx.get_action('sample_classifier', 'confusion_matrix')
     accuracy_results, = confusion(
-        pred, metadata, missing_samples='ignore', palette=palette)
+        predictions, truth, missing_samples='ignore', palette=palette
+    )
 
-    return pred, accuracy_results
+    return predictions, accuracy_results
 
 
 def classify_samples(ctx,
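
The classify.py hunk above replaces the hand-written nearest-neighbour loop with scikit-learn's KNeighborsClassifier (metric='precomputed') evaluated under KFold cross-validation. The standalone sketch below illustrates that slicing pattern outside QIIME 2. It is not the upstream implementation: the function name knn_cv_from_distances and its arguments are hypothetical, and unlike the diff it uses explicit positional indexing (.iloc) and NumPy arrays throughout.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier


def knn_cv_from_distances(dm, labels, k=1, cv=3, random_state=None):
    """Cross-validated KNN predictions from a square distance matrix.

    dm:     square DataFrame of pairwise distances (index == columns).
    labels: Series of class labels indexed by the same sample IDs.
    (Hypothetical helper for illustration only.)
    """
    labels = labels.loc[dm.index]  # align labels to the matrix order
    kf = KFold(n_splits=cv, shuffle=True, random_state=random_state)

    preds, ids = [], []
    for train_idx, test_idx in kf.split(dm):
        # With a precomputed metric, fit() expects train-by-train
        # distances and predict() expects test-by-train distances.
        d_train = dm.iloc[train_idx, train_idx].to_numpy()
        d_test = dm.iloc[test_idx, train_idx].to_numpy()

        knn = KNeighborsClassifier(n_neighbors=k, metric='precomputed')
        knn.fit(d_train, labels.iloc[train_idx].to_numpy())

        preds.append(knn.predict(d_test))
        ids.extend(dm.index[test_idx])

    # One prediction per sample, made while that sample was held out.
    return pd.Series(np.concatenate(preds),
                     index=pd.Index(ids, name='SampleID'))
```

Each sample is predicted exactly once, by a model that never saw it during fitting, which is why the updated tests pass cv and random_state explicitly.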


=====================================
q2_sample_classifier/plugin_setup.py
=====================================
@@ -1,5 +1,5 @@
 # ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
 #
 # Distributed under the terms of the Modified BSD License.
 #
@@ -255,6 +255,9 @@ plugin.pipelines.register_function(
     parameters={
         'metadata': MetadataColumn[Categorical],
         'k': Int,
+        'cv': parameters['cv']['cv'],
+        'random_state': parameters['base']['random_state'],
+        'n_jobs': parameters['base']['n_jobs'],
         'palette': Str % Choices(_custom_palettes().keys()),
     },
     outputs=[
@@ -265,6 +268,9 @@ plugin.pipelines.register_function(
     parameter_descriptions={
         'metadata': 'Categorical metadata column to use as prediction target.',
         'k': 'Number of nearest neighbors',
+        'cv': parameter_descriptions['cv']['cv'],
+        'random_state': parameter_descriptions['base']['random_state'],
+        'n_jobs': parameter_descriptions['base']['n_jobs'],
         'palette': 'The color palette to use for plotting.',
     },
     output_descriptions={


=====================================
q2_sample_classifier/tests/__init__.py
=====================================
@@ -1,5 +1,5 @@
 # ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
 #
 # Distributed under the terms of the Modified BSD License.
 #


=====================================
q2_sample_classifier/tests/test_actions.py
=====================================
@@ -1,5 +1,5 @@
 # ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
 #
 # Distributed under the terms of the Modified BSD License.
 #


=====================================
q2_sample_classifier/tests/test_base_class.py
=====================================
@@ -1,5 +1,5 @@
 # ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
 #
 # Distributed under the terms of the Modified BSD License.
 #


=====================================
q2_sample_classifier/tests/test_classifier.py
=====================================
@@ -1,5 +1,5 @@
 # ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
 #
 # Distributed under the terms of the Modified BSD License.
 #
@@ -8,6 +8,7 @@
 from warnings import filterwarnings
 import pandas as pd
 import numpy as np
+import skbio
 from sklearn.ensemble import RandomForestClassifier
 from sklearn.feature_selection import RFECV
 import pandas.testing as pdt
@@ -96,6 +97,12 @@ class TestBinaryClassification(SampleClassifierTestPluginBase):
             sample_ids=[c for c in 'abcdef'])
         self.tab = qiime2.Artifact.import_data('FeatureTable[Frequency]', tab)
 
+        dist = skbio.DistanceMatrix.from_iterable(
+            iterable=[1, 16, 2, 1, 16, 17],
+            metric=lambda x, y: abs(y-x), keys=[c for c in 'abcdef']
+        )
+        self.dist = qiime2.Artifact.import_data('DistanceMatrix', dist)
+
     # we will make sure predictions are correct, but no need to validate
     # other outputs, which are tested elsewhere.
     def test_classify_samples_binary(self):
@@ -115,6 +122,16 @@ class TestBinaryClassification(SampleClassifierTestPluginBase):
                         index=pd.Index([i for i in 'aebdcf'], name='id'))
         pdt.assert_series_equal(exp, res[0].view(pd.Series))
 
+    def test_classify_samples_dist_binary(self):
+        res = sample_classifier.actions.classify_samples_from_dist(
+            distance_matrix=self.dist, metadata=self.md, k=2, cv=3,
+            n_jobs=1, random_state=123)
+        exp = pd.Series([c for c in 'abaaaa'], name='0',
+                        index=pd.Index([i for i in 'abcdef'], name='id'))
+        pdt.assert_series_equal(
+            exp.sort_index(), res[0].view(pd.Series).sort_index()
+        )
+
 
 class TestROC(SampleClassifierTestPluginBase):
     def setUp(self):
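
The new fixture in the hunk above builds its distance matrix from six scalar values with an absolute-difference metric. As a rough illustration of what that matrix contains, the sketch below reproduces the same pairwise distances with NumPy and pandas alone. It does not use skbio, and the variable names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Same values and sample IDs as the test fixture in the diff.
values = np.array([1, 16, 2, 1, 16, 17])
sample_ids = list('abcdef')

# Pairwise |y - x| for every pair of samples, as a labelled square table.
pairwise = pd.DataFrame(
    np.abs(values[:, None] - values[None, :]),
    index=sample_ids, columns=sample_ids,
)
```

The table is symmetric with a zero diagonal, and the samples fall into two tight groups (a, c, d and b, e, f) separated by distances of at least 14.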


=====================================
q2_sample_classifier/tests/test_estimators.py
=====================================
@@ -1,5 +1,5 @@
 # ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
 #
 # Distributed under the terms of the Modified BSD License.
 #
@@ -162,7 +162,8 @@ class EstimatorsTests(SampleClassifierTestPluginBase):
 
         # -- test -- #
         res = sample_classifier.actions.classify_samples_from_dist(
-            distance_matrix=dm, metadata=metadata, k=1)
+            distance_matrix=dm, metadata=metadata, k=1, cv=3, random_state=123
+        )
         pred = res[0].view(pd.Series).sort_values()
         expected = pd.Series(('fat', 'skinny', 'fat', 'skinny'),
                              index=['f1', 's1', 'f2', 's2'])
@@ -192,7 +193,8 @@ class EstimatorsTests(SampleClassifierTestPluginBase):
 
         # -- test -- #
         res = sample_classifier.actions.classify_samples_from_dist(
-            distance_matrix=dm, metadata=metadata, k=1)
+            distance_matrix=dm, metadata=metadata, k=1, cv=3, random_state=123
+        )
         pred = res[0].view(pd.Series)
         expected = pd.Series(('skinny', 'skinny', 'skinny', 'skinny'),
                              index=sample_ids)
@@ -222,7 +224,8 @@ class EstimatorsTests(SampleClassifierTestPluginBase):
 
         # -- test -- #
         res = sample_classifier.actions.classify_samples_from_dist(
-            distance_matrix=dm, metadata=metadata, k=2)
+            distance_matrix=dm, metadata=metadata, k=2, cv=3, random_state=123
+        )
         pred = res[0].view(pd.Series)
         expected = pd.Series(('skinny', 'fat', 'fat', 'skinny'),
                              index=sample_ids)


=====================================
q2_sample_classifier/tests/test_types_formats_transformers.py
=====================================
@@ -1,5 +1,5 @@
 # ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
 #
 # Distributed under the terms of the Modified BSD License.
 #


=====================================
q2_sample_classifier/tests/test_utilities.py
=====================================
@@ -1,5 +1,5 @@
 # ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
 #
 # Distributed under the terms of the Modified BSD License.
 #


=====================================
q2_sample_classifier/tests/test_visualization.py
=====================================
@@ -1,5 +1,5 @@
 # ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
 #
 # Distributed under the terms of the Modified BSD License.
 #


=====================================
q2_sample_classifier/utilities.py
=====================================
@@ -1,5 +1,5 @@
 # ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
 #
 # Distributed under the terms of the Modified BSD License.
 #


=====================================
q2_sample_classifier/visuals.py
=====================================
@@ -1,5 +1,5 @@
 # ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
 #
 # Distributed under the terms of the Modified BSD License.
 #


=====================================
setup.py
=====================================
@@ -1,5 +1,5 @@
 # ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
 #
 # Distributed under the terms of the Modified BSD License.
 #



View it on GitLab: https://salsa.debian.org/med-team/q2-sample-classifier/-/compare/81735ab5f8203afaa83f00822c5cc84b7d69dc4b...ffdb8a209a7f9e939e7466a63a3eb412569a94e4


