[med-svn] [Git][med-team/q2-sample-classifier][master] 6 commits: New upstream version 2022.2.0
Andreas Tille (@tille)
gitlab at salsa.debian.org
Wed Jul 20 15:46:55 BST 2022
Andreas Tille pushed to branch master at Debian Med / q2-sample-classifier
Commits:
b16d3b7d by Andreas Tille at 2022-07-20T16:39:46+02:00
New upstream version 2022.2.0
- - - - -
0b7fc103 by Andreas Tille at 2022-07-20T16:39:46+02:00
routine-update: New upstream version
- - - - -
81341a3b by Andreas Tille at 2022-07-20T16:39:47+02:00
Update upstream source from tag 'upstream/2022.2.0'
Update to upstream version '2022.2.0'
with Debian dir 14f79d343db0c79a1fbd8355f92017732dfd16a0
- - - - -
8f068cc0 by Andreas Tille at 2022-07-20T16:39:47+02:00
routine-update: Standards-Version: 4.6.1
- - - - -
da0a62e7 by Andreas Tille at 2022-07-20T16:42:15+02:00
Bump versioned (Build-)Depends of q2-* packages to 2022.2.0
- - - - -
ffdb8a20 by Andreas Tille at 2022-07-20T16:46:27+02:00
Lots of failures in autopkgtest
- - - - -
22 changed files:
- LICENSE
- ci/recipe/meta.yaml
- debian/changelog
- debian/control
- q2_sample_classifier/__init__.py
- q2_sample_classifier/_format.py
- q2_sample_classifier/_transformer.py
- q2_sample_classifier/_type.py
- q2_sample_classifier/_version.py
- q2_sample_classifier/classify.py
- q2_sample_classifier/plugin_setup.py
- q2_sample_classifier/tests/__init__.py
- q2_sample_classifier/tests/test_actions.py
- q2_sample_classifier/tests/test_base_class.py
- q2_sample_classifier/tests/test_classifier.py
- q2_sample_classifier/tests/test_estimators.py
- q2_sample_classifier/tests/test_types_formats_transformers.py
- q2_sample_classifier/tests/test_utilities.py
- q2_sample_classifier/tests/test_visualization.py
- q2_sample_classifier/utilities.py
- q2_sample_classifier/visuals.py
- setup.py
Changes:
=====================================
LICENSE
=====================================
@@ -1,6 +1,6 @@
BSD 3-Clause License
-Copyright (c) 2017-2021, QIIME 2 development team.
+Copyright (c) 2017-2022, QIIME 2 development team.
All rights reserved.
Redistribution and use in source and binary forms, with or without
=====================================
ci/recipe/meta.yaml
=====================================
@@ -22,7 +22,7 @@ requirements:
- scipy
- numpy
- joblib
- - scikit-learn >=0.22.1
+ - scikit-learn {{ scikit_learn }}
- scikit-bio
- seaborn >=0.8
- fastcluster
=====================================
debian/changelog
=====================================
@@ -1,3 +1,13 @@
+q2-sample-classifier (2022.2.0-1) UNRELEASED; urgency=medium
+
+  * Team upload.
+  * New upstream version
+  * Standards-Version: 4.6.1 (routine-update)
+  * Bump versioned (Build-)Depends of q2-* packages to 2022.2.0
+  TODO: Lots of failures in autopkgtest
+
+ -- Andreas Tille <tille at debian.org>  Wed, 20 Jul 2022 16:39:46 +0200
+
 q2-sample-classifier (2021.8.0-1) unstable; urgency=medium
   * Team upload.
=====================================
debian/control
=====================================
@@ -6,11 +6,11 @@ Section: science
Priority: optional
Build-Depends: debhelper-compat (= 13),
dh-python,
- qiime (>= 2021.8.0),
+ qiime (>= 2022.2.0),
python3-all,
python3-setuptools,
python3-pytest <!nocheck>
-Standards-Version: 4.6.0
+Standards-Version: 4.6.1
Vcs-Browser: https://salsa.debian.org/med-team/q2-sample-classifier
Vcs-Git: https://salsa.debian.org/med-team/q2-sample-classifier.git
Homepage: https://qiime2.org
@@ -21,10 +21,10 @@ Architecture: all
Depends: ${shlibs:Depends},
${misc:Depends},
${python3:Depends},
- qiime (>= 2021.8.0),
+ qiime (>= 2022.2.0),
python3-distutils,
- q2-types (>= 2021.8.0),
- q2-feature-table (>= 2021.8.0)
+ q2-types (>= 2022.2.0),
+ q2-feature-table (>= 2022.2.0)
Description: QIIME 2 plugin for machine learning prediction of sample data
QIIME 2 is a powerful, extensible, and decentralized microbiome analysis
package with a focus on data and analysis transparency. QIIME 2 enables
=====================================
q2_sample_classifier/__init__.py
=====================================
@@ -1,5 +1,5 @@
# ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
#
# Distributed under the terms of the Modified BSD License.
#
=====================================
q2_sample_classifier/_format.py
=====================================
@@ -1,5 +1,5 @@
# ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
#
# Distributed under the terms of the Modified BSD License.
#
=====================================
q2_sample_classifier/_transformer.py
=====================================
@@ -1,5 +1,5 @@
# ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
#
# Distributed under the terms of the Modified BSD License.
#
=====================================
q2_sample_classifier/_type.py
=====================================
@@ -1,5 +1,5 @@
# ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
#
# Distributed under the terms of the Modified BSD License.
#
=====================================
q2_sample_classifier/_version.py
=====================================
@@ -23,9 +23,9 @@ def get_keywords():
     # setup.py/versioneer.py will grep for the variable names, so they must
     # each be defined on a line of their own. _version.py will just call
     # get_keywords().
-    git_refnames = " (tag: 2021.8.0)"
-    git_full = "916eb0799fa2c95a04766b44c22335c6a097ce13"
-    git_date = "2021-09-09 18:35:33 +0000"
+    git_refnames = " (tag: 2022.2.0)"
+    git_full = "5056003cb20259dfc4dd046c9043f0d89740bc95"
+    git_date = "2022-02-18 18:51:19 +0000"
     keywords = {"refnames": git_refnames, "full": git_full, "date": git_date}
     return keywords
=====================================
q2_sample_classifier/classify.py
=====================================
@@ -1,16 +1,17 @@
# ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
#
# Distributed under the terms of the Modified BSD License.
#
# The full license is in the file LICENSE, distributed with this software.
# ----------------------------------------------------------------------------
-import collections
-
+import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import mean_squared_error, accuracy_score
from sklearn.feature_extraction import DictVectorizer
+from sklearn.model_selection import KFold
+from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
import qiime2
@@ -102,42 +103,66 @@ def metatable(ctx,
     return metatab
-def classify_samples_from_dist(ctx, distance_matrix, metadata, k=1,
-                               palette=defaults['palette']):
-    ''' Returns knn classifier results from a distance matrix.'''
-    distance_matrix = distance_matrix.view(skbio.DistanceMatrix)
-    predictions = []
-    metadata_series = metadata.to_series()
-    for i, row in enumerate(distance_matrix):
-        dists = []
-        categories = []
-        for j, dist in enumerate(row):
-            if j == i:
-                continue  # exclude self
-            dists.append(dist)
-            categories.append(metadata_series[distance_matrix.ids[j]])
-
-        # k-long series of (category: dist) ordered small -> large
-        nn_categories = pd.Series(dists, index=categories).nsmallest(k)
-        counter = collections.Counter(nn_categories.index)
-        max_counts = max(counter.values())
-        # in order of closeness, pick a category that is or shares
-        # max_counts
-        for category in nn_categories.index:
-            if counter[category] == max_counts:
-                predictions.append(category)
-                break
-
-    predictions = pd.Series(predictions, index=distance_matrix.ids)
-    predictions.index.name = 'SampleID'
-    pred = qiime2.Artifact.import_data(
-        'SampleData[ClassifierPredictions]', predictions)
+def _fit_predict_knn_cv(
+        x: pd.DataFrame, y: pd.Series, k: int, cv: int,
+        random_state: int, n_jobs: int
+) -> (pd.Series, pd.Series):
+    kf = KFold(n_splits=cv, shuffle=True, random_state=random_state)
+
+    # train and test with CV
+    predictions, pred_ids, truth = [], [], []
+    for train_index, test_index in kf.split(x):
+        x_train, x_test = x.iloc[train_index, train_index], \
+            x.iloc[test_index, train_index]
+        y_train, y_test = y[train_index], y[test_index]
+
+        knn = KNeighborsClassifier(
+            n_neighbors=k, metric='precomputed', n_jobs=n_jobs
+        )
+        knn.fit(x_train, y_train)
+
+        # gather predictions for the confusion matrix
+        predictions.append(knn.predict(x_test))
+        pred_ids.extend(x_test.index.tolist())
+        truth.append(y_test)
+
+    predictions = pd.Series(
+        np.concatenate(predictions).ravel(),
+        index=pd.Index(pred_ids, name='SampleID')
+    )
+    truth = pd.concat(truth)
+    truth.index.name = 'SampleID'
+
+    return predictions, truth
+
+
+def classify_samples_from_dist(
+        ctx, distance_matrix, metadata, k=1, cv=defaults['cv'],
+        random_state=None, n_jobs=defaults['n_jobs'],
+        palette=defaults['palette']
+):
+    """ Trains and evaluates a KNN classifier from a distance matrix
+    using cross-validation."""
+    distance_matrix = distance_matrix \
+        .view(skbio.DistanceMatrix) \
+        .to_data_frame()
+    # reorder (required for splitting into train/test)
+    metadata_ser = metadata.to_series()[distance_matrix.index]
+
+    predictions, truth = _fit_predict_knn_cv(
+        distance_matrix, metadata_ser, k, cv, random_state, n_jobs
+    )
+    predictions = qiime2.Artifact.import_data(
+        'SampleData[ClassifierPredictions]', predictions
+    )
+    truth = qiime2.CategoricalMetadataColumn(truth)
     confusion = ctx.get_action('sample_classifier', 'confusion_matrix')
     accuracy_results, = confusion(
-        pred, metadata, missing_samples='ignore', palette=palette)
+        predictions, truth, missing_samples='ignore', palette=palette
+    )
-    return pred, accuracy_results
+    return predictions, accuracy_results
 def classify_samples(ctx,
=====================================
q2_sample_classifier/plugin_setup.py
=====================================
@@ -1,5 +1,5 @@
# ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
#
# Distributed under the terms of the Modified BSD License.
#
@@ -255,6 +255,9 @@ plugin.pipelines.register_function(
parameters={
'metadata': MetadataColumn[Categorical],
'k': Int,
+ 'cv': parameters['cv']['cv'],
+ 'random_state': parameters['base']['random_state'],
+ 'n_jobs': parameters['base']['n_jobs'],
'palette': Str % Choices(_custom_palettes().keys()),
},
outputs=[
@@ -265,6 +268,9 @@ plugin.pipelines.register_function(
parameter_descriptions={
'metadata': 'Categorical metadata column to use as prediction target.',
'k': 'Number of nearest neighbors',
+ 'cv': parameter_descriptions['cv']['cv'],
+ 'random_state': parameter_descriptions['base']['random_state'],
+ 'n_jobs': parameter_descriptions['base']['n_jobs'],
'palette': 'The color palette to use for plotting.',
},
output_descriptions={
=====================================
q2_sample_classifier/tests/__init__.py
=====================================
@@ -1,5 +1,5 @@
# ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
#
# Distributed under the terms of the Modified BSD License.
#
=====================================
q2_sample_classifier/tests/test_actions.py
=====================================
@@ -1,5 +1,5 @@
# ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
#
# Distributed under the terms of the Modified BSD License.
#
=====================================
q2_sample_classifier/tests/test_base_class.py
=====================================
@@ -1,5 +1,5 @@
# ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
#
# Distributed under the terms of the Modified BSD License.
#
=====================================
q2_sample_classifier/tests/test_classifier.py
=====================================
@@ -1,5 +1,5 @@
# ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
#
# Distributed under the terms of the Modified BSD License.
#
@@ -8,6 +8,7 @@
from warnings import filterwarnings
import pandas as pd
import numpy as np
+import skbio
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
import pandas.testing as pdt
@@ -96,6 +97,12 @@ class TestBinaryClassification(SampleClassifierTestPluginBase):
sample_ids=[c for c in 'abcdef'])
self.tab = qiime2.Artifact.import_data('FeatureTable[Frequency]', tab)
+ dist = skbio.DistanceMatrix.from_iterable(
+ iterable=[1, 16, 2, 1, 16, 17],
+ metric=lambda x, y: abs(y-x), keys=[c for c in 'abcdef']
+ )
+ self.dist = qiime2.Artifact.import_data('DistanceMatrix', dist)
+
# we will make sure predictions are correct, but no need to validate
# other outputs, which are tested elsewhere.
def test_classify_samples_binary(self):
@@ -115,6 +122,16 @@ class TestBinaryClassification(SampleClassifierTestPluginBase):
index=pd.Index([i for i in 'aebdcf'], name='id'))
pdt.assert_series_equal(exp, res[0].view(pd.Series))
+ def test_classify_samples_dist_binary(self):
+ res = sample_classifier.actions.classify_samples_from_dist(
+ distance_matrix=self.dist, metadata=self.md, k=2, cv=3,
+ n_jobs=1, random_state=123)
+ exp = pd.Series([c for c in 'abaaaa'], name='0',
+ index=pd.Index([i for i in 'abcdef'], name='id'))
+ pdt.assert_series_equal(
+ exp.sort_index(), res[0].view(pd.Series).sort_index()
+ )
+
class TestROC(SampleClassifierTestPluginBase):
def setUp(self):
=====================================
q2_sample_classifier/tests/test_estimators.py
=====================================
@@ -1,5 +1,5 @@
# ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
#
# Distributed under the terms of the Modified BSD License.
#
@@ -162,7 +162,8 @@ class EstimatorsTests(SampleClassifierTestPluginBase):
# -- test -- #
res = sample_classifier.actions.classify_samples_from_dist(
- distance_matrix=dm, metadata=metadata, k=1)
+ distance_matrix=dm, metadata=metadata, k=1, cv=3, random_state=123
+ )
pred = res[0].view(pd.Series).sort_values()
expected = pd.Series(('fat', 'skinny', 'fat', 'skinny'),
index=['f1', 's1', 'f2', 's2'])
@@ -192,7 +193,8 @@ class EstimatorsTests(SampleClassifierTestPluginBase):
# -- test -- #
res = sample_classifier.actions.classify_samples_from_dist(
- distance_matrix=dm, metadata=metadata, k=1)
+ distance_matrix=dm, metadata=metadata, k=1, cv=3, random_state=123
+ )
pred = res[0].view(pd.Series)
expected = pd.Series(('skinny', 'skinny', 'skinny', 'skinny'),
index=sample_ids)
@@ -222,7 +224,8 @@ class EstimatorsTests(SampleClassifierTestPluginBase):
# -- test -- #
res = sample_classifier.actions.classify_samples_from_dist(
- distance_matrix=dm, metadata=metadata, k=2)
+ distance_matrix=dm, metadata=metadata, k=2, cv=3, random_state=123
+ )
pred = res[0].view(pd.Series)
expected = pd.Series(('skinny', 'fat', 'fat', 'skinny'),
index=sample_ids)
=====================================
q2_sample_classifier/tests/test_types_formats_transformers.py
=====================================
@@ -1,5 +1,5 @@
# ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
#
# Distributed under the terms of the Modified BSD License.
#
=====================================
q2_sample_classifier/tests/test_utilities.py
=====================================
@@ -1,5 +1,5 @@
# ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
#
# Distributed under the terms of the Modified BSD License.
#
=====================================
q2_sample_classifier/tests/test_visualization.py
=====================================
@@ -1,5 +1,5 @@
# ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
#
# Distributed under the terms of the Modified BSD License.
#
=====================================
q2_sample_classifier/utilities.py
=====================================
@@ -1,5 +1,5 @@
# ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
#
# Distributed under the terms of the Modified BSD License.
#
=====================================
q2_sample_classifier/visuals.py
=====================================
@@ -1,5 +1,5 @@
# ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
#
# Distributed under the terms of the Modified BSD License.
#
=====================================
setup.py
=====================================
@@ -1,5 +1,5 @@
# ----------------------------------------------------------------------------
-# Copyright (c) 2017-2021, QIIME 2 development team.
+# Copyright (c) 2017-2022, QIIME 2 development team.
#
# Distributed under the terms of the Modified BSD License.
#
View it on GitLab: https://salsa.debian.org/med-team/q2-sample-classifier/-/compare/81735ab5f8203afaa83f00822c5cc84b7d69dc4b...ffdb8a209a7f9e939e7466a63a3eb412569a94e4
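
Note on the classify.py change in the diff above: the new upstream helper replaces the hand-written nearest-neighbour loop with scikit-learn's KNeighborsClassifier (metric='precomputed') evaluated inside a shuffled KFold split. The snippet below is only a minimal standalone sketch of that pattern, not the plugin's code. The function name knn_cv_from_distances and the toy data are made up for illustration, and it assumes numpy, pandas and scikit-learn are installed. The key detail is the slicing: the classifier is fitted on train-vs-train distances and predicts from test-vs-train distances.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier


def knn_cv_from_distances(dist, labels, k=1, cv=3, random_state=0):
    """Cross-validated KNN predictions from a square distance DataFrame.

    Hypothetical helper for illustration; `dist` and `labels` must share
    the same index order.
    """
    kf = KFold(n_splits=cv, shuffle=True, random_state=random_state)
    preds, ids = [], []
    for train_idx, test_idx in kf.split(dist):
        # Fit on the (n_train, n_train) block of train-vs-train distances.
        d_train = dist.iloc[train_idx, train_idx]
        # Predict from the (n_test, n_train) block: rows are test samples,
        # columns are the training samples the model was fitted on.
        d_test = dist.iloc[test_idx, train_idx]
        knn = KNeighborsClassifier(n_neighbors=k, metric="precomputed")
        knn.fit(d_train.to_numpy(), labels.iloc[train_idx].to_numpy())
        preds.extend(knn.predict(d_test.to_numpy()))
        ids.extend(d_test.index)
    return pd.Series(preds, index=pd.Index(ids, name="SampleID"))


# Toy data (invented): two well-separated groups of points on a line.
positions = pd.Series([0.0, 1.0, 2.0, 10.0, 11.0, 12.0], index=list("abcdef"))
values = positions.to_numpy()
dist = pd.DataFrame(
    np.abs(values[:, None] - values[None, :]),
    index=positions.index, columns=positions.index,
)
labels = pd.Series(["x", "x", "x", "y", "y", "y"], index=positions.index)

predictions = knn_cv_from_distances(dist, labels, k=1, cv=3, random_state=0)
print(predictions.sort_index())
```

Because every sample is predicted only while it sits in a held-out fold, the result order follows the folds rather than the input, which is presumably why the updated upstream tests sort by index before comparing.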