[med-svn] [Git][med-team/pangolin][master] 6 commits: New upstream version 4.3.1

Tue Aug 22 15:10:06 BST 2023


Andreas Tille pushed to branch master at Debian Med / pangolin


Commits:
7ad853f2 by Andreas Tille at 2023-08-22T15:59:38+02:00
New upstream version 4.3.1
- - - - -
4dad9f2c by Andreas Tille at 2023-08-22T15:59:38+02:00
routine-update: New upstream version

- - - - -
6d80685e by Andreas Tille at 2023-08-22T15:59:39+02:00
Update upstream source from tag 'upstream/4.3.1'

Update to upstream version '4.3.1'
with Debian dir 72420f3d2e1424985b8856c1d3f03428d09fa82e
- - - - -
b948c87f by Andreas Tille at 2023-08-22T15:59:53+02:00
Set upstream metadata fields: Repository.

Changes-By: lintian-brush
Fixes: lintian: upstream-metadata-missing-repository
See-also: https://lintian.debian.org/tags/upstream-metadata-missing-repository.html

- - - - -
3cda43d7 by Andreas Tille at 2023-08-22T16:04:01+02:00
Cleanup changelog, try to go without pangolin-data

- - - - -
5a7f9175 by Andreas Tille at 2023-08-22T16:09:37+02:00
Add TODO2 about pangolin-data

- - - - -


12 changed files:

- .github/workflows/pangolin.yml
- .github/workflows/pangolin_macos.yml
- debian/changelog
- debian/control
- debian/upstream/metadata
- environment.yml
- pangolin/__init__.py
- pangolin/command.py
- pangolin/data/data_compatibility.csv
- pangolin/scripts/usher.smk
- pangolin/utils/report_collation.py
- setup.py


Changes:

=====================================
.github/workflows/pangolin.yml
=====================================
@@ -18,6 +18,7 @@ jobs:
           environment-file: environment.yml
           activate-environment: pangolin
           channels: conda-forge,bioconda,defaults
+          conda-version: "23.5.0"
           mamba-version: "*"
       - name: Install pangolin
         run: pip install -e .


=====================================
.github/workflows/pangolin_macos.yml
=====================================
@@ -18,6 +18,7 @@ jobs:
           environment-file: environment.yml
           activate-environment: pangolin
           channels: conda-forge,bioconda,defaults
+          conda-version: "23.5.0"
           mamba-version: "*"
       - name: Install pangolin
         run: pip install -e .


=====================================
debian/changelog
=====================================
@@ -1,11 +1,12 @@
-pangolin (4.1.3-1) UNRELEASED; urgency=medium
+pangolin (4.3.1-1) UNRELEASED; urgency=medium
 
   * Initial release (Closes: #975920)
 
   TODO: python3-lineages (https://github.com/cov-lineages/lineages)
     -> just try running `pangolin --help` to see the problem
-    Also
-        https://github.com/cov-lineages/pangoLEARN
-    seems to be needed
+  TODO2: Seems we need to update pangolin-data manually since its
+         frequently changing data and ftpmaster is asking for the
+         "source" (and at least the joblib files are binary without
+         source)
 
- -- Andreas Tille <tille at debian.org>  Fri, 23 Dec 2022 16:19:00 +0100
+ -- Andreas Tille <tille at debian.org>  Tue, 22 Aug 2023 15:59:38 +0200


=====================================
debian/control
=====================================
@@ -22,7 +22,7 @@ Depends: ${python3:Depends},
          snakemake,
          python3-sklearn,
          python3-pangolearn,
-         pangolin-data,
+#         pangolin-data,
          scorpio,
          constellations
 Description: Phylogenetic Assignment of Named Global Outbreak LINeages


=====================================
debian/upstream/metadata
=====================================
@@ -3,4 +3,5 @@ Bug-Submit: https://github.com/cov-lineages/pangolin/issues/new
 Registry:
  - Name: conda:bioconda
    Entry: pangolin
+Repository: https://github.com/cov-lineages/pangolin.git
 Repository-Browse: https://github.com/cov-lineages/pangolin


=====================================
environment.yml
=====================================
@@ -8,7 +8,7 @@ dependencies:
   - minimap2>=2.16
   - pip=19.3.1
   - python>=3.7
-  - snakemake-minimal<=6.8.0
+  - snakemake-minimal=7.24.0
   - gofasta
   - ucsc-fatovcf>=426
   - usher>=0.5.4


=====================================
pangolin/__init__.py
=====================================
@@ -1,5 +1,5 @@
 _program = "pangolin"
-__version__ = "4.1.3"
+__version__ = "4.3.1"
 
 
 __all__ = ["pangolearn",


=====================================
pangolin/command.py
=====================================
@@ -62,9 +62,9 @@ def main(sysargs = sys.argv[1:]):
 
     a_group = parser.add_argument_group('Analysis options')
     a_group.add_argument('--analysis-mode', action="store",help="""Pangolin includes multiple analysis engines: UShER and pangoLEARN.
-Scorpio is used in conjunction with UShER/ pangoLEARN to curate variant of concern (VOC)-related lineage calls.
+Scorpio is used in conjunction with pangoLEARN to curate variant of concern (VOC)-related lineage calls.
 UShER is the default and is selected using option "usher" or "accurate".
-pangoLEARN can alternatively be selected using "pangolearn" or "fast".
+pangoLEARN has been depreciated, but older models can be run using "pangolearn" or "fast" with "--datadir" provided.
 Finally, it is possible to skip the UShER/ pangoLEARN step by selecting "scorpio" mode, but in this case only VOC-related lineages will be assigned. 
 """)
     
@@ -80,7 +80,7 @@ Finally, it is possible to skip the UShER/ pangoLEARN step by selecting "scorpio
     d_group.add_argument("--update-data", action='store_true',dest="update_data", default=False, help="Automatically updates to latest release of constellations and pangolin-data, including the pangoLEARN model, UShER tree file and alias file (also pangolin-assignment if it has been installed using --add-assignment-cache), then exits.")
     d_group.add_argument('--add-assignment-cache', action='store_true', dest="add_assignment_cache", default=False, help="Install the pangolin-assignment repository for use with --use-assignment-cache.  This makes updates slower and makes pangolin slower for small numbers of input sequences but much faster for large numbers of input sequences.")
     d_group.add_argument('--use-assignment-cache', action='store_true', dest="use_assignment_cache", default=False, help="Use assignment cache from optional pangolin-assignment repository. NOTE: the repository must be installed by --add-assignment-cache before using --use-assignment-cache.")
-    d_group.add_argument('-d', '--datadir', action='store',dest="datadir",help="Data directory minimally containing the pangoLEARN model, header files and UShER tree. Default: Installed pangolin-data package.")
+    d_group.add_argument('-d', '--datadir', action='store',dest="datadir",help="Data directory minimally containing the pangoLEARN model and header files or UShER tree. Default: Installed pangolin-data package.")
     d_group.add_argument('--use-old-datadir', action='store_true', default=False, help="Use the data from data directory even if older than data installed via Python packages. Default: False")
     d_group.add_argument('--usher-tree', action='store', dest='usher_protobuf', help="UShER Mutation Annotated Tree protobuf file to use instead of default from pangolin-data repository or --datadir.")
     d_group.add_argument('--assignment-cache', action='store', dest='assignment_cache', help="Cached precomputed assignment file to use instead of default from pangolin-assignment repository.  Does not require installation of pangolin-assignment.")
@@ -104,9 +104,16 @@ Finally, it is possible to skip the UShER/ pangoLEARN step by selecting "scorpio
     config = setup_config_dict(cwd)
     data_checks.check_install(config)
     set_up_verbosity(config)
+    config[KEY_ANALYSIS_MODE] = set_up_analysis_mode(args.analysis_mode, config[KEY_ANALYSIS_MODE])
 
     if args.usher:
         sys.stderr.write(cyan(f"--usher is a pangolin v3 option and is deprecated in pangolin v4.  UShER is now the default analysis mode.  Use --analysis-mode to explicitly set mode.\n"))
+    if config[KEY_ANALYSIS_MODE] == "pangolearn" or config[KEY_ANALYSIS_MODE] == "fast":
+        if args.datadir:
+            args.use_old_datadir = True
+        else:
+            sys.stderr.write(cyan(f"pangoLEARN is deprecated in pangolin v4.3.  UShER is now the only updated analysis mode.  Use --datadir to provide an older pangoLEARN model.\n"))
+            config[KEY_ANALYSIS_MODE] = "usher"
 
     setup_data(args.datadir,config[KEY_ANALYSIS_MODE], config, args.use_old_datadir)
 
@@ -142,9 +149,6 @@ Finally, it is possible to skip the UShER/ pangoLEARN step by selecting "scorpio
     if args.expanded_lineage:
         print(green(f"****\nAdding expanded lineage column to output.\n****"))
         config[KEY_EXPANDED_LINEAGE] = True
-        
-    # Parsing analysis mode flags to return one of 'usher' or 'pangolearn'
-    config[KEY_ANALYSIS_MODE] = set_up_analysis_mode(args.analysis_mode, config[KEY_ANALYSIS_MODE])
 
     snakefile = get_snakefile(thisdir,config[KEY_ANALYSIS_MODE])
 

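The command.py hunks above move the analysis-mode lookup ahead of setup_data() and add a fallback for pangoLEARN. A minimal sketch of that decision follows, assuming the mode string has already been normalised. The function name resolve_analysis_mode and its return tuple are hypothetical and are not part of pangolin.

```python
def resolve_analysis_mode(mode, datadir, use_old_datadir=False):
    """Mirror the fallback added in the diff (hypothetical helper).

    Returns (mode, use_old_datadir, warning), where warning is None
    when nothing needs to be reported to the user.
    """
    warning = None
    if mode in ("pangolearn", "fast"):
        if datadir:
            # An explicit data directory is taken as a request to run an
            # older pangoLEARN model, so the "data too old" check is waived.
            use_old_datadir = True
        else:
            # Without a data directory there is no model to run: fall back.
            warning = ("pangoLEARN is deprecated in pangolin v4.3. "
                       "Use --datadir to provide an older pangoLEARN model.")
            mode = "usher"
    return mode, use_old_datadir, warning


if __name__ == "__main__":
    print(resolve_analysis_mode("pangolearn", None))
    print(resolve_analysis_mode("pangolearn", "/path/to/old/data"))
```

The directory path in the usage lines is a placeholder only.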

=====================================
pangolin/data/data_compatibility.csv
=====================================
@@ -1,4 +1,12 @@
 data_source,version,min_pangolin_version,min_scorpio_version
+pangolin-data,1.21,4.3,
+pangolin-data,1.20,4.3,
+pangolin-data,1.19,4,
+pangolin-data,1.18.1.1,4,
+pangolin-data,1.18.1,4,
+pangolin-data,1.18,4,
+pangolin-data,1.17,4,
+pangolin-data,1.16,4,
 pangolin-data,1.15.1,4,
 pangolin-data,1.14,4,
 pangolin-data,1.13,4,
@@ -10,6 +18,14 @@ pangolin-data,1.6,4,
 pangolin-data,1.3,4,
 pangolin-data,1.2.133,4,
 pangolin-data,1.2.127,4,
+pangolin-assignment,1.21,4.3,
+pangolin-assignment,1.20,4.3,
+pangolin-assignment,1.19,4,
+pangolin-assignment,1.18.1.1,4,
+pangolin-assignment,1.18.1,4,
+pangolin-assignment,1.18,4,
+pangolin-assignment,1.17,4,
+pangolin-assignment,1.16,4,
 pangolin-assignment,1.15.1,4,
 pangolin-assignment,1.14,4,
 pangolin-assignment,1.13,4,
@@ -20,6 +36,7 @@ pangolin-assignment,1.8,4,
 pangolin-assignment,1.6,4,
 pangolin-assignment,1.3,4,
 pangolin-assignment,1.2.133,4,
+constellations,0.1.12,,0.3.17
 constellations,0.1.10,,0.3.17
 constellations,0.1.9,,0.3.17
 constellations,0.1.8,,0.3.17

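The rows added above pair each data release with a minimum pangolin version (4.3 for pangolin-data 1.20 and later). How pangolin itself evaluates this table is not shown in this diff. The sketch below is one possible reading, with hypothetical helper names and a plain numeric comparison of dotted versions as an assumption.

```python
import csv
import io

# A few rows copied from the table in the diff, for illustration only.
SAMPLE = """data_source,version,min_pangolin_version,min_scorpio_version
pangolin-data,1.21,4.3,
pangolin-data,1.19,4,
constellations,0.1.12,,0.3.17
"""


def version_tuple(text):
    """Turn '4.3.1' into (4, 3, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def is_compatible(rows, data_source, data_version, pangolin_version):
    """Check a data release against the running pangolin version.

    Assumption of this sketch: an unlisted release, or an empty
    min_pangolin_version field, imposes no constraint.
    """
    for row in rows:
        if row["data_source"] == data_source and row["version"] == data_version:
            required = row["min_pangolin_version"]
            if not required:
                return True
            return version_tuple(pangolin_version) >= version_tuple(required)
    return True


rows = list(csv.DictReader(io.StringIO(SAMPLE)))

if __name__ == "__main__":
    print(is_compatible(rows, "pangolin-data", "1.21", "4.1.3"))
    print(is_compatible(rows, "pangolin-data", "1.21", "4.3.1"))
```

Under these assumptions the previously packaged 4.1.3 would not satisfy the 4.3 minimum of pangolin-data 1.21, while 4.3.1 would.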

=====================================
pangolin/scripts/usher.smk
=====================================
@@ -94,9 +94,20 @@ rule usher_inference:
         if [ -s {input.fasta:q} ]; then
             cat {input.reference:q} > {params.ref_fa:q}
             echo >> {params.ref_fa:q}
+            usher=usher
+            threads={workflow.cores}
+            if usher-sampled --help >& /dev/null; then
+                usher="usher-sampled --optimization_radius 0"
+            else
+                echo ""
+                echo "*** usher-sampled is not installed -- please upgrade usher to at least v0.6.1 ***"
+                echo "*** If you used conda to install usher, run 'conda update --no-pin usher'     ***"
+                echo "*** Alternatively if mamba is installed, run 'mamba update --no-pin usher'    ***"
+                echo ""
+            fi
             cat {input.fasta:q} >> {params.ref_fa:q}
             faToVcf -includeNoAltN {params.ref_fa:q} {params.vcf:q}
-            usher -n -D -i {input.usher_protobuf:q} -v {params.vcf:q} -T {workflow.cores} -d '{config[tempdir]}' &> {log}
+            $usher -n -D -i {input.usher_protobuf:q} -v {params.vcf:q} -T $threads -d '{config[tempdir]}' &> {log}
         else
             rm -f {output.txt:q}
             touch {output.txt:q}

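The shell fragment above prefers usher-sampled (with --optimization_radius 0) and falls back to plain usher when it is missing. A rough Python equivalent of that selection is sketched here. Note two assumptions: it looks the program up on PATH with shutil.which rather than running "usher-sampled --help" as the rule does, and the function name choose_usher_command is hypothetical.

```python
import shutil


def choose_usher_command(which=shutil.which):
    """Return the argv prefix for the placement step.

    The `which` parameter is injectable so the choice can be exercised
    without either program being installed.
    """
    if which("usher-sampled"):
        # Same extra option the Snakemake rule passes to usher-sampled.
        return ["usher-sampled", "--optimization_radius", "0"]
    # Fallback used by the rule when usher-sampled is unavailable.
    return ["usher"]


if __name__ == "__main__":
    print(choose_usher_command())
```

The remaining arguments (-n -D -i ... -v ... -T ... -d ...) would be appended to the returned list unchanged, as in the rule.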

=====================================
pangolin/utils/report_collation.py
=====================================
@@ -25,26 +25,16 @@ def usher_parsing(usher_result,output_report):
                     histo_list = [ i for i in histogram.split(",") if i ]
                     conflict = 0.0
                     if len(histo_list) > 1:
-                        max_count = 0
-                        max_lineage = ""
                         selected_count = 0
                         total = 0
                         for lin_counts in histo_list:
                             m = re.match('([A-Z0-9.]+)\(([0-9]+)/([0-9]+)\)', lin_counts)
                             if m:
                                 lin, place_count, total = [m.group(1), int(m.group(2)), int(m.group(3))]
-                                if place_count > max_count:
-                                    max_count = place_count
-                                    max_lineage = lin
                                 if lin == lineage:
                                     selected_count = place_count
-                        if selected_count < max_count:
-                            # The selected placement was not in the lineage with the plurality
-                            # of placements; go with the plurality.
-                            lineage = max_lineage
-                            conflict = (total - max_count) / total
-                        elif total > 0:
-                            conflict = (total - selected_count) / total
+                                    break
+                        conflict = (total - selected_count) / total
                     histogram_note = "Usher placements: " + " ".join(histo_list)
                 else:
                     lineage = lineage_histogram

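The hunk above removes the plurality override, so the conflict score is now computed against the lineage UShER selected rather than the lineage with the most placements. A standalone sketch of that calculation follows, assuming histogram entries of the form LINEAGE(count/total) separated by commas, as the regular expression in the diff implies. The function name placement_conflict is hypothetical, and the guard for total == 0 is an addition of this sketch, not part of the new upstream code.

```python
import re

# Same pattern as in the diff, written as a raw string.
_ENTRY = re.compile(r"([A-Z0-9.]+)\(([0-9]+)/([0-9]+)\)")


def placement_conflict(lineage, histogram):
    """Fraction of placements that disagree with the selected lineage."""
    entries = [e for e in histogram.split(",") if e]
    if len(entries) <= 1:
        # A single placement lineage means there is nothing to conflict with.
        return 0.0
    selected_count = 0
    total = 0
    for entry in entries:
        m = _ENTRY.match(entry)
        if not m:
            continue
        lin, place_count, total = m.group(1), int(m.group(2)), int(m.group(3))
        if lin == lineage:
            selected_count = place_count
            break
    if total == 0:
        # Assumption: no parsable entry is treated as "no conflict".
        return 0.0
    return (total - selected_count) / total


if __name__ == "__main__":
    print(placement_conflict("B.1", "B.1(3/4),B.1.1(1/4)"))
```

With the sample histogram, three of four placements agree with B.1, so the conflict is 0.25. If B.1.1 had been the selected lineage it would be 0.75, where the removed code would instead have switched the call to B.1.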

=====================================
setup.py
=====================================
@@ -19,7 +19,7 @@ setup(name='pangolin',
             'pandas>=1.0.1',
             "wheel>=0.34",
             'joblib>=0.11',
-            'tabulate==0.8.10',
+            # 'tabulate==0.8.10',
             'scikit-learn>=0.23.1',
             "PuLP>=2"
         ],



View it on GitLab: https://salsa.debian.org/med-team/pangolin/-/compare/5c1bfaa9fffbdc81eb3b0b490a73dadab72888cf...5a7f9175ae0184348ccd48c094ddbf45028358cf
