[med-svn] [Git][med-team/pangolin][master] 6 commits: New upstream version 4.3.1
Andreas Tille (@tille)
gitlab at salsa.debian.org
Tue Aug 22 15:10:06 BST 2023
Andreas Tille pushed to branch master at Debian Med / pangolin
Commits:
7ad853f2 by Andreas Tille at 2023-08-22T15:59:38+02:00
New upstream version 4.3.1
- - - - -
4dad9f2c by Andreas Tille at 2023-08-22T15:59:38+02:00
routine-update: New upstream version
- - - - -
6d80685e by Andreas Tille at 2023-08-22T15:59:39+02:00
Update upstream source from tag 'upstream/4.3.1'
Update to upstream version '4.3.1'
with Debian dir 72420f3d2e1424985b8856c1d3f03428d09fa82e
- - - - -
b948c87f by Andreas Tille at 2023-08-22T15:59:53+02:00
Set upstream metadata fields: Repository.
Changes-By: lintian-brush
Fixes: lintian: upstream-metadata-missing-repository
See-also: https://lintian.debian.org/tags/upstream-metadata-missing-repository.html
- - - - -
3cda43d7 by Andreas Tille at 2023-08-22T16:04:01+02:00
Cleanup changelog, try to go without pangolin-data
- - - - -
5a7f9175 by Andreas Tille at 2023-08-22T16:09:37+02:00
Add TODO2 about pangolin-data
- - - - -
12 changed files:
- .github/workflows/pangolin.yml
- .github/workflows/pangolin_macos.yml
- debian/changelog
- debian/control
- debian/upstream/metadata
- environment.yml
- pangolin/__init__.py
- pangolin/command.py
- pangolin/data/data_compatibility.csv
- pangolin/scripts/usher.smk
- pangolin/utils/report_collation.py
- setup.py
Changes:
=====================================
.github/workflows/pangolin.yml
=====================================
@@ -18,6 +18,7 @@ jobs:
environment-file: environment.yml
activate-environment: pangolin
channels: conda-forge,bioconda,defaults
+ conda-version: "23.5.0"
mamba-version: "*"
- name: Install pangolin
run: pip install -e .
=====================================
.github/workflows/pangolin_macos.yml
=====================================
@@ -18,6 +18,7 @@ jobs:
environment-file: environment.yml
activate-environment: pangolin
channels: conda-forge,bioconda,defaults
+ conda-version: "23.5.0"
mamba-version: "*"
- name: Install pangolin
run: pip install -e .
=====================================
debian/changelog
=====================================
@@ -1,11 +1,12 @@
-pangolin (4.1.3-1) UNRELEASED; urgency=medium
+pangolin (4.3.1-1) UNRELEASED; urgency=medium
* Initial release (Closes: #975920)
TODO: python3-lineages (https://github.com/cov-lineages/lineages)
-> just try running `pangolin --help` to see the problem
- Also
- https://github.com/cov-lineages/pangoLEARN
- seems to be needed
+ TODO2: Seems we need to update pangolin-data manually since its
+ frequently changing data and ftpmaster is asking for the
+ "source" (and at least the joblib files are binary without
+ source)
- -- Andreas Tille <tille at debian.org> Fri, 23 Dec 2022 16:19:00 +0100
+ -- Andreas Tille <tille at debian.org> Tue, 22 Aug 2023 15:59:38 +0200
=====================================
debian/control
=====================================
@@ -22,7 +22,7 @@ Depends: ${python3:Depends},
snakemake,
python3-sklearn,
python3-pangolearn,
- pangolin-data,
+# pangolin-data,
scorpio,
constellations
Description: Phylogenetic Assignment of Named Global Outbreak LINeages
=====================================
debian/upstream/metadata
=====================================
@@ -3,4 +3,5 @@ Bug-Submit: https://github.com/cov-lineages/pangolin/issues/new
Registry:
- Name: conda:bioconda
Entry: pangolin
+Repository: https://github.com/cov-lineages/pangolin.git
Repository-Browse: https://github.com/cov-lineages/pangolin
=====================================
environment.yml
=====================================
@@ -8,7 +8,7 @@ dependencies:
- minimap2>=2.16
- pip=19.3.1
- python>=3.7
- - snakemake-minimal<=6.8.0
+ - snakemake-minimal=7.24.0
- gofasta
- ucsc-fatovcf>=426
- usher>=0.5.4
=====================================
pangolin/__init__.py
=====================================
@@ -1,5 +1,5 @@
_program = "pangolin"
-__version__ = "4.1.3"
+__version__ = "4.3.1"
__all__ = ["pangolearn",
=====================================
pangolin/command.py
=====================================
@@ -62,9 +62,9 @@ def main(sysargs = sys.argv[1:]):
a_group = parser.add_argument_group('Analysis options')
a_group.add_argument('--analysis-mode', action="store",help="""Pangolin includes multiple analysis engines: UShER and pangoLEARN.
-Scorpio is used in conjunction with UShER/ pangoLEARN to curate variant of concern (VOC)-related lineage calls.
+Scorpio is used in conjunction with pangoLEARN to curate variant of concern (VOC)-related lineage calls.
UShER is the default and is selected using option "usher" or "accurate".
-pangoLEARN can alternatively be selected using "pangolearn" or "fast".
+pangoLEARN has been depreciated, but older models can be run using "pangolearn" or "fast" with "--datadir" provided.
Finally, it is possible to skip the UShER/ pangoLEARN step by selecting "scorpio" mode, but in this case only VOC-related lineages will be assigned.
""")
@@ -80,7 +80,7 @@ Finally, it is possible to skip the UShER/ pangoLEARN step by selecting "scorpio
d_group.add_argument("--update-data", action='store_true',dest="update_data", default=False, help="Automatically updates to latest release of constellations and pangolin-data, including the pangoLEARN model, UShER tree file and alias file (also pangolin-assignment if it has been installed using --add-assignment-cache), then exits.")
d_group.add_argument('--add-assignment-cache', action='store_true', dest="add_assignment_cache", default=False, help="Install the pangolin-assignment repository for use with --use-assignment-cache. This makes updates slower and makes pangolin slower for small numbers of input sequences but much faster for large numbers of input sequences.")
d_group.add_argument('--use-assignment-cache', action='store_true', dest="use_assignment_cache", default=False, help="Use assignment cache from optional pangolin-assignment repository. NOTE: the repository must be installed by --add-assignment-cache before using --use-assignment-cache.")
- d_group.add_argument('-d', '--datadir', action='store',dest="datadir",help="Data directory minimally containing the pangoLEARN model, header files and UShER tree. Default: Installed pangolin-data package.")
+ d_group.add_argument('-d', '--datadir', action='store',dest="datadir",help="Data directory minimally containing the pangoLEARN model and header files or UShER tree. Default: Installed pangolin-data package.")
d_group.add_argument('--use-old-datadir', action='store_true', default=False, help="Use the data from data directory even if older than data installed via Python packages. Default: False")
d_group.add_argument('--usher-tree', action='store', dest='usher_protobuf', help="UShER Mutation Annotated Tree protobuf file to use instead of default from pangolin-data repository or --datadir.")
d_group.add_argument('--assignment-cache', action='store', dest='assignment_cache', help="Cached precomputed assignment file to use instead of default from pangolin-assignment repository. Does not require installation of pangolin-assignment.")
@@ -104,9 +104,16 @@ Finally, it is possible to skip the UShER/ pangoLEARN step by selecting "scorpio
config = setup_config_dict(cwd)
data_checks.check_install(config)
set_up_verbosity(config)
+ config[KEY_ANALYSIS_MODE] = set_up_analysis_mode(args.analysis_mode, config[KEY_ANALYSIS_MODE])
if args.usher:
sys.stderr.write(cyan(f"--usher is a pangolin v3 option and is deprecated in pangolin v4. UShER is now the default analysis mode. Use --analysis-mode to explicitly set mode.\n"))
+ if config[KEY_ANALYSIS_MODE] == "pangolearn" or config[KEY_ANALYSIS_MODE] == "fast":
+ if args.datadir:
+ args.use_old_datadir = True
+ else:
+ sys.stderr.write(cyan(f"pangoLEARN is deprecated in pangolin v4.3. UShER is now the only updated analysis mode. Use --datadir to provide an older pangoLEARN model.\n"))
+ config[KEY_ANALYSIS_MODE] = "usher"
setup_data(args.datadir,config[KEY_ANALYSIS_MODE], config, args.use_old_datadir)
@@ -142,9 +149,6 @@ Finally, it is possible to skip the UShER/ pangoLEARN step by selecting "scorpio
if args.expanded_lineage:
print(green(f"****\nAdding expanded lineage column to output.\n****"))
config[KEY_EXPANDED_LINEAGE] = True
-
- # Parsing analysis mode flags to return one of 'usher' or 'pangolearn'
- config[KEY_ANALYSIS_MODE] = set_up_analysis_mode(args.analysis_mode, config[KEY_ANALYSIS_MODE])
snakefile = get_snakefile(thisdir,config[KEY_ANALYSIS_MODE])
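Note on the hunk above: the new fallback for the deprecated pangoLEARN mode can be read as a small decision function. The sketch below is only an illustration of that reading, not code from pangolin; the function name `resolve_analysis_mode` and the shortened warning text are hypothetical, and only the branch structure is taken from the diff.

```python
import sys


def resolve_analysis_mode(mode, datadir, use_old_datadir=False):
    """Hypothetical helper mirroring the branch added to main() in the diff.

    Returns the (mode, use_old_datadir) pair that would be passed on to
    the data setup step.
    """
    if mode in ("pangolearn", "fast"):
        if datadir:
            # A user-supplied data directory is accepted even when it is
            # older than the installed data, as in the diff.
            use_old_datadir = True
        else:
            # No older model was provided, so warn and fall back to UShER.
            sys.stderr.write(
                "pangoLEARN is deprecated; falling back to UShER. "
                "Use --datadir to provide an older pangoLEARN model.\n"
            )
            mode = "usher"
    return mode, use_old_datadir


if __name__ == "__main__":
    print(resolve_analysis_mode("pangolearn", None))
    print(resolve_analysis_mode("pangolearn", "/path/to/old/data"))
```

Under this reading, requesting pangoLEARN without `--datadir` no longer fails; it silently becomes a UShER run apart from the warning on stderr.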
=====================================
pangolin/data/data_compatibility.csv
=====================================
@@ -1,4 +1,12 @@
data_source,version,min_pangolin_version,min_scorpio_version
+pangolin-data,1.21,4.3,
+pangolin-data,1.20,4.3,
+pangolin-data,1.19,4,
+pangolin-data,1.18.1.1,4,
+pangolin-data,1.18.1,4,
+pangolin-data,1.18,4,
+pangolin-data,1.17,4,
+pangolin-data,1.16,4,
pangolin-data,1.15.1,4,
pangolin-data,1.14,4,
pangolin-data,1.13,4,
@@ -10,6 +18,14 @@ pangolin-data,1.6,4,
pangolin-data,1.3,4,
pangolin-data,1.2.133,4,
pangolin-data,1.2.127,4,
+pangolin-assignment,1.21,4.3,
+pangolin-assignment,1.20,4.3,
+pangolin-assignment,1.19,4,
+pangolin-assignment,1.18.1.1,4,
+pangolin-assignment,1.18.1,4,
+pangolin-assignment,1.18,4,
+pangolin-assignment,1.17,4,
+pangolin-assignment,1.16,4,
pangolin-assignment,1.15.1,4,
pangolin-assignment,1.14,4,
pangolin-assignment,1.13,4,
@@ -20,6 +36,7 @@ pangolin-assignment,1.8,4,
pangolin-assignment,1.6,4,
pangolin-assignment,1.3,4,
pangolin-assignment,1.2.133,4,
+constellations,0.1.12,,0.3.17
constellations,0.1.10,,0.3.17
constellations,0.1.9,,0.3.17
constellations,0.1.8,,0.3.17
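Note on the table above: each row pairs a data release with the minimum pangolin (or scorpio) version it requires, so pangolin-data 1.20 and 1.21 need at least pangolin 4.3. How pangolin itself consumes this file is not shown in the diff; the sketch below is one assumed way to query such a table, with the hypothetical helpers `parse_version` and `is_compatible` and a three-row sample copied from the hunk.

```python
import csv
import io

# Three rows copied from the hunk above, used as sample input.
SAMPLE = """data_source,version,min_pangolin_version,min_scorpio_version
pangolin-data,1.21,4.3,
pangolin-data,1.19,4,
constellations,0.1.12,,0.3.17
"""


def parse_version(text):
    """Turn a dotted version such as '4.3.1' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def is_compatible(table, data_source, data_version, pangolin_version):
    """Return True/False for a listed release, or None if it is not listed.

    An empty min_pangolin_version is treated as 'no constraint'; that
    interpretation is an assumption, not something the diff states.
    """
    for row in csv.DictReader(io.StringIO(table)):
        if row["data_source"] == data_source and row["version"] == data_version:
            minimum = row["min_pangolin_version"]
            if not minimum:
                return True
            return parse_version(pangolin_version) >= parse_version(minimum)
    return None


if __name__ == "__main__":
    print(is_compatible(SAMPLE, "pangolin-data", "1.21", "4.1.3"))
    print(is_compatible(SAMPLE, "pangolin-data", "1.21", "4.3.1"))
```

Comparing integer tuples rather than strings keeps "4.10" above "4.3", which a plain string comparison would get wrong.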
=====================================
pangolin/scripts/usher.smk
=====================================
@@ -94,9 +94,20 @@ rule usher_inference:
if [ -s {input.fasta:q} ]; then
cat {input.reference:q} > {params.ref_fa:q}
echo >> {params.ref_fa:q}
+ usher=usher
+ threads={workflow.cores}
+ if usher-sampled --help >& /dev/null; then
+ usher="usher-sampled --optimization_radius 0"
+ else
+ echo ""
+ echo "*** usher-sampled is not installed -- please upgrade usher to at least v0.6.1 ***"
+ echo "*** If you used conda to install usher, run 'conda update --no-pin usher' ***"
+ echo "*** Alternatively if mamba is installed, run 'mamba update --no-pin usher' ***"
+ echo ""
+ fi
cat {input.fasta:q} >> {params.ref_fa:q}
faToVcf -includeNoAltN {params.ref_fa:q} {params.vcf:q}
- usher -n -D -i {input.usher_protobuf:q} -v {params.vcf:q} -T {workflow.cores} -d '{config[tempdir]}' &> {log}
+ $usher -n -D -i {input.usher_protobuf:q} -v {params.vcf:q} -T $threads -d '{config[tempdir]}' &> {log}
else
rm -f {output.txt:q}
touch {output.txt:q}
=====================================
pangolin/utils/report_collation.py
=====================================
@@ -25,26 +25,16 @@ def usher_parsing(usher_result,output_report):
histo_list = [ i for i in histogram.split(",") if i ]
conflict = 0.0
if len(histo_list) > 1:
- max_count = 0
- max_lineage = ""
selected_count = 0
total = 0
for lin_counts in histo_list:
m = re.match('([A-Z0-9.]+)\(([0-9]+)/([0-9]+)\)', lin_counts)
if m:
lin, place_count, total = [m.group(1), int(m.group(2)), int(m.group(3))]
- if place_count > max_count:
- max_count = place_count
- max_lineage = lin
if lin == lineage:
selected_count = place_count
- if selected_count < max_count:
- # The selected placement was not in the lineage with the plurality
- # of placements; go with the plurality.
- lineage = max_lineage
- conflict = (total - max_count) / total
- elif total > 0:
- conflict = (total - selected_count) / total
+ break
+ conflict = (total - selected_count) / total
histogram_note = "Usher placements: " + " ".join(histo_list)
else:
lineage = lineage_histogram
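Note on the hunk above: after this change the lineage reported by UShER is kept as-is rather than replaced by the plurality lineage, and the conflict score is the fraction of placements outside that lineage. The sketch below is a standalone reading of the new logic, not the upstream function: the name `placement_conflict` is hypothetical, the indentation of `break` is assumed because the diff as archived has lost its leading whitespace, the guard against a zero total is an addition of this sketch, and the sample histogram string is invented to fit the regular expression in the diff.

```python
import re

# Same pattern as in the diff: LINEAGE(placements/total).
PLACEMENT = re.compile(r"([A-Z0-9.]+)\(([0-9]+)/([0-9]+)\)")


def placement_conflict(lineage, histogram):
    """Fraction of placements that fall outside the selected lineage."""
    histo_list = [item for item in histogram.split(",") if item]
    selected_count = 0
    total = 0
    for lin_counts in histo_list:
        m = PLACEMENT.match(lin_counts)
        if m:
            lin = m.group(1)
            place_count = int(m.group(2))
            total = int(m.group(3))
            if lin == lineage:
                selected_count = place_count
                break
    if total == 0:
        # Guard added in this sketch only: avoid dividing by zero when
        # no entry of the histogram matched the pattern.
        return 0.0
    return (total - selected_count) / total


if __name__ == "__main__":
    print(placement_conflict("B.1.1.7", "B.1.1.7(3/4),B.1(1/4)"))
```

With the invented histogram, three of four placements agree with B.1.1.7, so the conflict is 1/4; asking about B.1 instead gives 3/4.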
=====================================
setup.py
=====================================
@@ -19,7 +19,7 @@ setup(name='pangolin',
'pandas>=1.0.1',
"wheel>=0.34",
'joblib>=0.11',
- 'tabulate==0.8.10',
+ # 'tabulate==0.8.10',
'scikit-learn>=0.23.1',
"PuLP>=2"
],
View it on GitLab: https://salsa.debian.org/med-team/pangolin/-/compare/5c1bfaa9fffbdc81eb3b0b490a73dadab72888cf...5a7f9175ae0184348ccd48c094ddbf45028358cf