[med-svn] [Git][med-team/tnseq-transit][upstream] New upstream version 3.3.10

Andreas Tille (@tille) gitlab at salsa.debian.org
Wed Dec 4 10:27:26 GMT 2024



Andreas Tille pushed to branch upstream at Debian Med / tnseq-transit


Commits:
9bba894b by Andreas Tille at 2024-12-04T10:54:53+01:00
New upstream version 3.3.10
- - - - -


23 changed files:

- CHANGELOG.md
- MANIFEST.in
- README.md
- setup.py
- src/pytpp/tpp_gui.py
- src/pytpp/tpp_tools.py
- + src/pytransit/DejaVuSans.ttf
- src/pytransit/__init__.py
- src/pytransit/analysis/CGI.py
- src/pytransit/analysis/pathway_enrichment.py
- − src/pytransit/data/CGI/temp_cdr.txt
- − src/pytransit/data/CGI/temp_frac_abund.txt
- − src/pytransit/data/gene_ontology.1_2.3-11-18.obo
- src/pytransit/doc/source/CGI.rst
- src/pytransit/doc/source/_images/RVBD3645_lmplot.png
- src/pytransit/doc/source/_images/ndh_lmplot.png
- src/pytransit/doc/source/_images/thiL_lmplot.png
- src/pytransit/doc/source/method_pathway_enrichment.rst
- src/pytransit/doc/source/tpp.rst
- src/pytransit/doc/source/transit_install.rst
- src/pytransit/draw_trash.py
- src/pytransit/tnseq_tools.py
- tests/test_tpp.py


Changes:

=====================================
CHANGELOG.md
=====================================
@@ -2,6 +2,53 @@
 All notable changes to this project will be documented in this file.
 
 
+## Version 3.3.10 (2024-12-02)
+#### Transit:
+
+Minor changes:
+  - fixed setup (pip install) problems caused by updates to Python3.12
+  - added comments about installing Transit1 in Python3.12 to install documentation
+
+
+## Version 3.3.9 (2024-11-17)
+#### Transit:
+
+Minor changes:
+  - updated doc on pathway enrichment to refer to pathways.html for association files
+
+
+## Version 3.3.8 (2024-10-26)
+#### Transit:
+
+Minor changes:
+ - added flags to pathway_enrichment, such as:
+   -focusLFC pos|neg : to restrict pathway analysis of significant genes to only those with positive or negative LFCs
+   -minLFC : to specify a minimum magnitude for LFCs (e.g. '-minLFC 1' means analyze genes with at least a 2-fold change, up or down)
+   -qval : to change the threshold for significance from the default value of 0.05
+   -topk : to analyze the top K genes ranked by significance (Qval), regardless of cutoff
+
+
+## Version 3.3.7 (2024-10-16)
+#### Transit:
+
+Minor changes:
+ - updated manifest to include font file in PyPi package
+
+
+## Version 3.3.6 (2024-10-14)
+#### Transit:
+
+Minor changes:
+ - fixed a bug in TrackView that was caused by deprecated function in recent version of Python Image Library (PIL v10.0)
+
+
+## Version 3.3.5 (2024-09-14)
+#### Transit:
+
+Minor changes:
+ - allow wig file pathnames to have spaces in combined_wig and metadata files (e.g. for ANOVA and ZINB)
+ - change default alg for BWA from 'mem' back to 'aln' (see documentation on TPP)
+
 ## Version 3.3.4 (2024-02-16)
 #### Transit:
 


=====================================
MANIFEST.in
=====================================
@@ -2,6 +2,7 @@ include MANIFEST.in
 include README.md
 include LICENSE.md
 include VERSION
+include src/pytransit/DejaVuSans.ttf
 recursive-include src/pytransit/data *
 recursive-include src/pytransit/genomes *
 recursive-include src/pytransit/doc/build/html *.html


=====================================
README.md
=====================================
@@ -13,7 +13,7 @@ It provides an easy to use graphical interface and access to three different ana
 
 TRANSIT Home page: http://saclab.tamu.edu/essentiality/transit/index.html
 
-TRANSIT Documentation: https://transit.readthedocs.io/en/latest/transit_overview.html
+TRANSIT Documentation: https://transit.readthedocs.io/en/stable/transit_overview.html
 
 [Changelog](https://github.com/mad-lab/transit/blob/master/CHANGELOG.md)
 
@@ -49,7 +49,7 @@ For any questions or comments, please contact Dr. Thomas Ioerger, ioerger at cs.tam
 For full instructions on how to install and run TRANSIT (and the optional pre-processor, TPP), please see the documentation included in this distribution ("src/pytransit/doc" folder) or visit the following web page:
 
 
-https://transit.readthedocs.io/en/latest/
+https://transit.readthedocs.io/en/stable/
 
 
 ## Datasets


=====================================
setup.py
=====================================
@@ -195,7 +195,7 @@ setup(
     # https://packaging.python.org/en/latest/requirements.html
     # 'pypubsub<4.0' and 'wxPython' are needed for GUI only, but go ahead and install them
     # the reason for restriction on pypubsub is that version>=4.0 does not work with python2 - I can probably get rid of this restriction, since everybody must be using python3 by now
-    install_requires=['setuptools', 'numpy~=1.16', 'scipy~=1.2', 'matplotlib~=3.0', 'pillow', 'scikit-learn', 'statsmodels~=0.9', 'pypubsub', 'wxPython'],
+    install_requires=['wheel','setuptools', 'numpy~=1.16', 'scipy~=1.2', 'matplotlib~=3.0', 'pillow', 'scikit-learn', 'statsmodels~=0.9', 'pypubsub', 'wxPython'],
 
     #dependency_links = [
     #	"git+https://github.com/wxWidgets/wxPython.git#egg=wxPython"
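
Several entries in `install_requires` use the PEP 440 "compatible release" operator `~=`. For example, `numpy~=1.16` is shorthand for `>=1.16, ==1.*`, so it admits any later 1.x release but excludes 2.x. The sketch below illustrates that rule. It assumes plain numeric versions only (no pre-release, post-release or local tags), and `parse_version` and `is_compatible` are hypothetical helpers, not part of Transit or setuptools:

```python
def parse_version(text):
    """Turn a plain numeric version like '1.26.4' into a tuple of ints (assumes digits and dots only)."""
    return tuple(int(part) for part in text.split("."))


def is_compatible(candidate, spec):
    """Approximate PEP 440 '~=' semantics for plain numeric versions.

    '~=1.16' means '>=1.16, ==1.*': the candidate must be at least the spec,
    and must share the spec's release prefix (the spec minus its last component).
    """
    cand = parse_version(candidate)
    base = parse_version(spec)
    if len(base) < 2:
        raise ValueError("'~=' needs at least two release components, e.g. '1.16'")
    prefix = base[:-1]
    return cand >= base and cand[:len(prefix)] == prefix


if __name__ == "__main__":
    for version in ["1.15.4", "1.16", "1.26.4", "2.0.0"]:
        print(version, is_compatible(version, "1.16"))
```

Real installers rely on the `packaging` library for this, which also handles version normalisation and pre-releases.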


=====================================
src/pytpp/tpp_gui.py
=====================================
@@ -246,7 +246,7 @@ The Mme1 protocol generally assumes reads do NOT include the primer prefix, and
 
             self.bwa_alg = wx.ComboBox(panel,choices=["use algorithm 'aln'", "use algorithm 'mem'"],size=(200,30))
             if vars.bwa_alg=='aln': self.bwa_alg.SetSelection(0)
-            else: self.bwa_alg.SetSelection(1) # default
+            else: self.bwa_alg.SetSelection(1) 
             sizer0.Add(self.bwa_alg, proportion=2, flag=wx.EXPAND|wx.ALL, border=5) ## 
             self.bwa_alg.Bind(wx.EVT_COMBOBOX, self.OnBwaAlgSelection, id=self.bwa_alg.GetId())
             sizer0.Add(TPPIcon(panel, wx.ID_ANY, bmp, "'mem' is considered to do a better job at mapping reads, but 'aln' is available as an alternative."), flag=wx.CENTER, border=0)


=====================================
src/pytpp/tpp_tools.py
=====================================
@@ -273,9 +273,8 @@ def extract_staggered(infile,outfile,vars):
   output = open(outfile,"w")
   output_failed = open(outfile+'_failed_trim',"w")                          # [WM] [add]
   if vars.window_size!=-1: message("Looking for start of Tn prefix with P,Q = %d,%d (origin = %d, window size = %d)" % (P,Q,origin,vars.window_size)) # [RJ] Outputting P,Q values and origin/window size
-  else: message("Looking for start of Tn prefix within P,Q = [%d,%d]" % (P,Q))
+  else: message("Looking for start of Tn prefix within positions [%d,%d]" % (P,Q))
   tot = 0
-  #print(infile)
   if vars.barseq_catalog_out!=None:
     barcodes_file = vars.base+".barseq" # I could define this in vars
     catalog = open(barcodes_file,"w")
@@ -283,7 +282,6 @@ def extract_staggered(infile,outfile,vars):
     barseq2 = "CGTACGCTGCAGGTCGACGGCCGG"
     barseq1len,barseq2len = len(barseq1),len(barseq2)
   for line in open(infile):
-    #print(line)
     line = line.rstrip()
     if not line: continue
     if line[0]=='>': header = line; continue
@@ -1339,7 +1337,7 @@ def initialize_globals(vars, args=[], kwargs={}):
     vars.window_size = -1
     vars.primer_start_window = 0,20
     vars.window = None
-    vars.bwa_alg = "mem"
+    vars.bwa_alg = "aln" # changing from mem back to aln because of /dev/shm error on Windows machines [TRI,9/14/24]
     
     # Update defaults
     protocol = kwargs.get("protocol", "").lower()
@@ -1463,10 +1461,10 @@ def show_help():
   print('    -maxreads <INT>')
   print('    -mismatches <INT>  # when searching for constant regions in reads 1 and 2; default is 1')
   print('    -flags "<STRING>"  # args to pass to BWA')
-  print('    -bwa-alg [aln|mem]  # Default: mem. Algorithm to use for mapping reads with bwa' )
-  print('    -primer-start-window INT,INT # position in read to search for start of primer; default is [0,20]')
+  print('    -bwa-alg [aln|mem]  # Algorithm to use for mapping reads with bwa; default is \'aln\'' )
+  print('    -primer-start-window INT,INT # position in read to search for start of primer; default is: [0,20]')
   print('    -window-size INT   # automatic method to set window')
-  print('    -barseq_catalog_in|-barseq_catalog_out <file>')
+  #print('    -barseq_catalog_in|-barseq_catalog_out <file>')
   print('    -replicon-ids <comma_separated_list_of_names> # if multiple replicons/genomes/contigs/sequences were provided in -ref, give them names.')
   print('                                                  # Enter \'auto\' for autogenerated ids.')
 

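The reworded TPP message says the start of the Tn prefix is searched for "within positions [P,Q]" of each read, and the help text gives the `-primer-start-window` default as [0,20]. The following is a minimal illustration of such a windowed search using exact matching. It is not TPP's `extract_staggered` code, and `find_prefix_start` and the example sequences are hypothetical:

```python
def find_prefix_start(read, prefix, P, Q):
    """Return the 0-based start of `prefix` in `read` if it begins within [P, Q], else -1.

    str.find(sub, start, end) only reports matches lying entirely inside read[start:end],
    so the end bound is extended by len(prefix) to allow a match that starts exactly at Q.
    Exact matching only: no mismatches are tolerated in this sketch.
    """
    return read.find(prefix, P, Q + len(prefix))


if __name__ == "__main__":
    # Hypothetical read with a hypothetical prefix starting at position 2.
    print(find_prefix_start("GGTAACAGTTTT", "TAACAG", 0, 20))
```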

=====================================
src/pytransit/DejaVuSans.ttf
=====================================
Binary files /dev/null and b/src/pytransit/DejaVuSans.ttf differ


=====================================
src/pytransit/__init__.py
=====================================
@@ -2,6 +2,6 @@
 __all__ = ["transit_tools", "tnseq_tools", "norm_tools", "stat_tools"]
 
 
-__version__ = "v3.3.4"
+__version__ = "v3.3.10"
 prefix = "[TRANSIT]"
 

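`__version__` is stored as a string with a leading `v`, so comparing two such strings directly is lexicographic: `"v3.3.10" < "v3.3.4"` evaluates to `True` because the character `'1'` sorts before `'4'`. A small sketch of a numeric comparison, assuming the `vMAJOR.MINOR.PATCH` shape used here (`version_key` is a hypothetical helper, not part of pytransit):

```python
def version_key(version):
    """Convert a string like 'v3.3.10' into the tuple (3, 3, 10) for numeric comparison."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


if __name__ == "__main__":
    old, new = "v3.3.4", "v3.3.10"
    print(new < old)                            # lexicographic string comparison
    print(version_key(new) > version_key(old))  # numeric tuple comparison
```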

=====================================
src/pytransit/analysis/CGI.py
=====================================
@@ -100,7 +100,7 @@ class CGI_Method(base.SingleConditionMethod):
         return """usage (6 sub-commands):
     python3 ../src/transit.py CGI extract_counts <fastq file> <ids file> > <counts file>
     python3 ../src/transit.py CGI create_combined_counts <comma seperated headers> <counts file 1> <counts file 2> ... <counts file n> > <combined counts file>
-    python3 ../src/transit.py CGI extract_abund <combined counts file> <metadata file> <control condition> <sgRNA strength file> <uninduced ATC file> <drug> <days>  >  <fractional abundundance file>
+    python3 ../src/transit.py CGI extract_abund <combined counts file> <metadata file> <control condition> <sgRNA efficiency file> <uninduced ATC file> <drug> <days>  >  <fractional abundundance file>
     python3 ../src/transit.py CGI run_model <fractional abundundance file>  >  <CRISPRi DR results file>
     python3 ../src/transit.py CGI visualize <fractional abundance> <gene> <output figure location>
     note: redirect output from stdout to output files as shown above"""
@@ -149,11 +149,11 @@ class CGI_Method(base.SingleConditionMethod):
             combined_counts_file = args[0]
             metadata_file = args[1]
             control_condition=args[2]
-            sgRNA_strength_file = args[3]
+            sgRNA_efficiency_file = args[3]
             no_dep_abund = args[4]
             drug = args[5]
             days = args[6]
-            self.extract_abund(combined_counts_file,metadata_file,control_condition,sgRNA_strength_file,no_dep_abund,drug,days)
+            self.extract_abund(combined_counts_file,metadata_file,control_condition,sgRNA_efficiency_file,no_dep_abund,drug,days)
         elif cmd == "run_model":
             if len(args)<1: 
                 print("You have provided incorrect number of args")
@@ -223,6 +223,7 @@ class CGI_Method(base.SingleConditionMethod):
         for id in IDs:
             vals = [id,counts.get(id,0)]
             print('\t'.join([str(x) for x in vals]))
+            print("\n")
        
 
     def create_combined_counts(self,headers, counts_list):
@@ -242,7 +243,7 @@ class CGI_Method(base.SingleConditionMethod):
         print(combined_df_text)
 
 
-    def extract_abund(self,combined_counts_file,metadata_file,control_condition,sgRNA_strength_file,no_dep_abund,drug,days,PC=1e-8):  
+    def extract_abund(self,combined_counts_file,metadata_file,control_condition,sgRNA_efficiency_file,no_dep_abund,drug,days,PC=1e-8):  
         import pandas as pd
         
         metadata = pd.read_csv(metadata_file, sep="\t")
@@ -274,12 +275,12 @@ class CGI_Method(base.SingleConditionMethod):
         elif(len(combined_counts_df.columns)<len(metadata)):
             sys.stderr.write("WARNING: Not all of the samples from the metadata based on this criteron have a column in the combined counts file")
       
-        sgRNA_strength = pd.read_csv(sgRNA_strength_file,sep="\t", index_col=0)
-        sgRNA_strength = sgRNA_strength.iloc[:,-1:]
-        sgRNA_strength.columns = ["sgRNA strength"]
-        sgRNA_strength["sgRNA"] = sgRNA_strength.index
-        sgRNA_strength["sgRNA"]=sgRNA_strength["sgRNA"].str.split("_v", expand=True)[0]
-        sgRNA_strength.set_index("sgRNA",inplace=True)
+        sgRNA_efficiency = pd.read_csv(sgRNA_efficiency_file,sep="\t", index_col=0)
+        sgRNA_efficiency = sgRNA_efficiency.iloc[:,-1:]
+        sgRNA_efficiency.columns = ["sgRNA efficiency"]
+        sgRNA_efficiency["sgRNA"] = sgRNA_efficiency.index
+        sgRNA_efficiency["sgRNA"]=sgRNA_efficiency["sgRNA"].str.split("_v", expand=True)[0]
+        sgRNA_efficiency.set_index("sgRNA",inplace=True)
 
         no_dep_df = pd.read_csv(no_dep_abund, sep="\t", index_col=0, header=None)
         no_dep_df = no_dep_df.iloc[:,-1:]
@@ -289,12 +290,12 @@ class CGI_Method(base.SingleConditionMethod):
         no_dep_df["sgRNA"]=no_dep_df["sgRNA"].str.split("_v", expand=True)[0]
         no_dep_df.set_index("sgRNA",inplace=True)
 
-        abund_df = pd.concat([sgRNA_strength, no_dep_df,combined_counts_df], axis=1)
+        abund_df = pd.concat([sgRNA_efficiency, no_dep_df,combined_counts_df], axis=1)
         abund_df= abund_df[~(abund_df.index.str.contains("Negative") | abund_df.index.str.contains("Empty"))]
         sys.stderr.write("Disregarding Empty or Negative sgRNAs\n")
-        sys.stderr.write("%d sgRNAs are all of the following files : sgRNA strength metadata, uninduced ATC counts file, combined counts file\n"%len(abund_df))
+        sys.stderr.write("%d sgRNAs are all of the following files : sgRNA efficiency metadata, uninduced ATC counts file, combined counts file\n"%len(abund_df))
 
-        headers = ["sgRNA strength","uninduced ATC values"]
+        headers = ["sgRNA efficiency","uninduced ATC values"]
         for i,col in enumerate(column_names):
             abund_df[col] = abund_df[col]/abund_df[col].sum()
             abund_df[col] = (abund_df[col]+PC)/(abund_df["uninduced ATC values"]+PC)
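The loop at the end of this hunk computes relative abundances in two steps for each sample column: counts are first divided by the column total, then divided by the sgRNA's "uninduced ATC values", with the pseudocount `PC` (default `1e-8`) added to numerator and denominator. The pure-Python sketch below repeats that arithmetic for one column. It assumes the uninduced values are already on the scale the code expects (how they are prepared is not shown in this hunk), and `relative_abundance` is a hypothetical name:

```python
def relative_abundance(counts, uninduced, PC=1e-8):
    """Mirror the per-column arithmetic in extract_abund for a single sample.

    counts:    raw sgRNA counts for one sample column
    uninduced: the matching 'uninduced ATC values' for the same sgRNAs, in the same order
    Step 1 converts counts to fractions of the column total; step 2 divides each fraction
    by the uninduced value, with PC keeping the ratio defined when a value is zero.
    """
    total = sum(counts)
    fractions = [c / total for c in counts]
    return [(f + PC) / (u + PC) for f, u in zip(fractions, uninduced)]


if __name__ == "__main__":
    print(relative_abundance([10, 30, 60], [0.2, 0.3, 0.5], PC=0))
```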
@@ -308,7 +309,7 @@ class CGI_Method(base.SingleConditionMethod):
         abund_df = abund_df.drop(columns=["orf-gene","remaining"])
         abund_df = abund_df.dropna()
         
-        abund_df.insert(0, "sgRNA strength", abund_df.pop("sgRNA strength"))
+        abund_df.insert(0, "sgRNA efficiency", abund_df.pop("sgRNA efficiency"))
         abund_df.insert(0, "uninduced ATC values", abund_df.pop("uninduced ATC values"))
         abund_df.insert(0, 'gene', abund_df.pop('gene'))
         abund_df.insert(0, 'orf', abund_df.pop('orf'))
@@ -339,7 +340,7 @@ class CGI_Method(base.SingleConditionMethod):
             orf = gene_df["orf"].iloc[0]
             gene_df = gene_df.drop(columns=["orf","gene","uninduced ATC values"])
 
-            melted_df = gene_df.melt(id_vars=["sgRNA","sgRNA strength"],var_name="conc",value_name="abund")
+            melted_df = gene_df.melt(id_vars=["sgRNA","sgRNA efficiency"],var_name="conc",value_name="abund")
             melted_df["conc"] = melted_df["conc"].str.split("_", expand=True)[0].astype(float)
             min_conc = min(melted_df[melted_df["conc"]>0]["conc"])
             melted_df.loc[melted_df["conc"]==0,"conc"] = min_conc/2
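In `run_model`, each concentration is taken from the text before the first `_` in the abundance column name, and the zero-drug control is reassigned half of the smallest positive concentration, presumably so that a logarithm of concentration stays finite. The sketch below reproduces that step on a plain list. The column names are hypothetical, and the base-10 log is used purely for illustration:

```python
import math


def parse_concentrations(column_names):
    """Read the concentration before the first '_' of each column name as a float,
    then replace 0 with half of the smallest positive concentration."""
    concs = [float(name.split("_")[0]) for name in column_names]
    min_conc = min(c for c in concs if c > 0)
    return [min_conc / 2 if c == 0 else c for c in concs]


if __name__ == "__main__":
    cols = ["0_DMSO_rep1", "0.0625_RIF_rep1", "0.125_RIF_rep1", "0.25_RIF_rep1"]
    concs = parse_concentrations(cols)
    print(concs)
    print([round(math.log10(c), 3) for c in concs])
```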
@@ -363,12 +364,12 @@ class CGI_Method(base.SingleConditionMethod):
             drug_output.append([orf,gene,len(gene_df)]+coeffs.values.tolist()+pvals.values.tolist())
             sys.stderr.flush()
 
-        drug_out_df = pd.DataFrame(drug_output, columns=["Orf","Gene","Nobs", "intercept","coefficient sgRNA_strength","coefficient concentration dependence","pval intercept","pval sgRNA_strength","pval concentration dependence"])
+        drug_out_df = pd.DataFrame(drug_output, columns=["Orf","Gene","Nobs", "intercept","coefficient sgRNA_efficiency","coefficient concentration dependence","pval intercept","pval sgRNA_efficiency","pval concentration dependence"])
         drug_out_df["intercept"] = round(drug_out_df["intercept"],6)
-        drug_out_df["coefficient sgRNA_strength"] = round(drug_out_df["coefficient sgRNA_strength"],6)
+        drug_out_df["coefficient sgRNA_efficiency"] = round(drug_out_df["coefficient sgRNA_efficiency"],6)
         drug_out_df["coefficient concentration dependence"] = round(drug_out_df["coefficient concentration dependence"],6)
         drug_out_df["pval intercept"] = round(drug_out_df["pval intercept"],6)
-        drug_out_df["pval sgRNA_strength"] = round(drug_out_df["pval sgRNA_strength"],6)
+        drug_out_df["pval sgRNA_efficiency"] = round(drug_out_df["pval sgRNA_efficiency"],6)
         drug_out_df["pval concentration dependence"] = round(drug_out_df["pval concentration dependence"],6)
 
 
@@ -395,6 +396,9 @@ class CGI_Method(base.SingleConditionMethod):
     
         drug_out_df  = drug_out_df.replace(r'\s+',np.nan,regex=True).replace('',np.nan)
         drug_out_txt = drug_out_df.to_csv(sep="\t", index=False)
+        print("# Total Significant Gene Interactions : ", n)
+        print("# Significant Gene Depletions : ", depl_n)
+        print("# Significant Gene Enrichments : ", enrich_n)
         print(drug_out_txt)
 
     def visualize(self,fractional_abundances_file, gene, fig_location):
@@ -435,7 +439,7 @@ class CGI_Method(base.SingleConditionMethod):
             X_in = sm.add_constant(X, has_constant='add')
             results = sm.OLS(Y,X_in).fit()
             all_slopes.append(results.params[1])
-            data["sgRNA strength"] = [row["sgRNA strength"]] * len(data)
+            data["sgRNA efficiency"] = [row["sgRNA efficiency"]] * len(data)
             data["slope"] = [results.params[1]] * len(data)
             df_list.append(data)
 
@@ -445,10 +449,10 @@ class CGI_Method(base.SingleConditionMethod):
         cmap =  mpl.colors.LinearSegmentedColormap.from_list("", ["#8ecae6","#219ebc","#023047","#ffb703","#fb8500"], N=len(abund_df))
         palette = [mpl.colors.rgb2hex(cmap(i)) for i in range(cmap.N)]
         #print("-----------", bo_palette.as_hex())
-        g = sns.lmplot(data=plot_df, x='Log (Concentration)', y='Log (Relative Abundance)', hue="sgRNA strength", palette=palette, legend=False,ci=None, scatter=False, line_kws={"lw":0.75})
+        g = sns.lmplot(data=plot_df, x='Log (Concentration)', y='Log (Relative Abundance)', hue="sgRNA efficiency", palette=palette, legend=False,ci=None, scatter=False, line_kws={"lw":0.75})
 
-        sm1 = mpl.cm.ScalarMappable(norm=mpl.colors.Normalize(vmin=plot_df['sgRNA strength'].min(), vmax=0, clip=False), cmap=cmap)
-        g.figure.colorbar(sm1, shrink=0.8, aspect=50, label="sgRNA strength")
+        sm1 = mpl.cm.ScalarMappable(norm=mpl.colors.Normalize(vmin=plot_df['sgRNA efficiency'].min(), vmax=0, clip=False), cmap=cmap)
+        g.figure.colorbar(sm1, shrink=0.8, aspect=50, label="sgRNA efficiency")
         g.set(ylim=(-2.5, 1.0))
         plt.gca().set_title(gene+"\n"+condition, wrap=True)
         plt.tight_layout()


=====================================
src/pytransit/analysis/pathway_enrichment.py
=====================================
@@ -79,7 +79,8 @@ class PathwayGUI(base.AnalysisGUI):
 
 class PathwayMethod(base.AnalysisMethod):
 
-  def __init__(self,resamplingFile,associationsFile,pathwaysFile,outputFile,method,PC=0,Nperm=10000,p=0,ranking="SLPV",Pval_col=-2,Qval_col=-1,LFC_col=6): # default cols are for resampling files
+  def __init__(self,resamplingFile,associationsFile,pathwaysFile,outputFile,method,PC=0,Nperm=10000,p=0,ranking="SLPV",Pval_col=-2,Qval_col=-1,LFC_col=6, 
+               focusLFC="all", minLFC=0, qvalCutoff=1, topk=-1): # default cols are for resampling files
     base.AnalysisMethod.__init__(self, short_name, long_name, short_desc, long_desc, open(outputFile,"w"), None) # no annotation file
     self.resamplingFile = resamplingFile
     self.associationsFile = associationsFile
@@ -91,6 +92,10 @@ class PathwayMethod(base.AnalysisMethod):
     self.Qval_col = Qval_col
     self.LFC_col = LFC_col
     self.PC = PC # for FET
+    self.focusLFC = focusLFC # for FET
+    self.minLFC = minLFC # for FET
+    self.qvalCutoff = qvalCutoff # for FET
+    self.topk = topk # for FET
     self.Nperm = Nperm # for GSEA
     self.p = p # for GSEA
     self.ranking = ranking # for GSEA
@@ -111,6 +116,11 @@ class PathwayMethod(base.AnalysisMethod):
     Qval_col = int(kwargs.get("Qval_col","-1"))
     LFC_col = int(kwargs.get("LFC_col","6"))
     PC = int(kwargs.get("PC","2")) # for FET
+    focusLFC = kwargs.get("focusLFC", "all")# for FET
+    minLFC = float(kwargs.get("minLFC", "0"))# for FET
+    #Don't forget to add to the return line or the init function header
+    topk = int(kwargs.get("topk", "-1"))# for FET
+    qvalCutoff = float(kwargs.get("qval", "1"))# for FET
     Nperm = int(kwargs.get("Nperm", "10000")) # for GSEA
     p = float(kwargs.get("p","0")) # for GSEA
     ranking = kwargs.get("ranking","SLPV") # for GSEA
@@ -120,7 +130,13 @@ class PathwayMethod(base.AnalysisMethod):
       print(self.usage_string()); 
       sys.exit(0)
 
-    return self(resamplingFile,associations,pathways,output,method,PC=PC,Nperm=Nperm,p=p,ranking=ranking,Pval_col=Pval_col,Qval_col=Qval_col,LFC_col=LFC_col)
+    if focusLFC not in "pos neg all".split(): 
+      print("error: focusLFC value %s not recognized" % focusLFC)
+      print(self.usage_string()); 
+      sys.exit(0)
+
+    return self(resamplingFile,associations,pathways,output,method,PC=PC,Nperm=Nperm,p=p,ranking=ranking,Pval_col=Pval_col,Qval_col=Qval_col,LFC_col=LFC_col, 
+                focusLFC=focusLFC, minLFC=minLFC, qvalCutoff=qvalCutoff, topk=topk)
 
   @classmethod
   def usage_string(self):
@@ -131,12 +147,16 @@ Optional parameters:
  -Pval_col <int>    : indicate column with *raw* P-values (starting with 0; can also be negative, i.e. -1 means last col) (used for sorting) (default: -2)
  -Qval_col <int>    : indicate column with *adjusted* P-values (starting with 0; can also be negative, i.e. -1 means last col) (used for significant cutoff) (default: -1)
  for GSEA...
- -ranking SLPV|LFC  : SLPV is signed-log-p-value (default); LFC is log2-fold-change from resampling 
- -LFC_col <int>     : indicate column with log2FC (starting with 0; can also be negative, i.e. -1 means last col) (used for ranking genes by SLPV or LFC) (default: 6)
- -p <float>         : exponent to use in calculating enrichment score; recommend trying 0 or 1 (as in Subramaniam et al, 2005)
- -Nperm <int>       : number of permutations to simulate for null distribution to determine p-value (default=10000)
+   -ranking SLPV|LFC  : SLPV is signed-log-p-value (default); LFC is log2-fold-change from resampling 
+   -LFC_col <int>     : indicate column with log2FC (starting with 0; can also be negative, i.e. -1 means last col) (used for ranking genes by SLPV or LFC) (default: 6)
+   -p <float>         : exponent to use in calculating enrichment score; recommend trying 0 or 1 (as in Subramaniam et al, 2005)
+   -Nperm <int>       : number of permutations to simulate for null distribution to determine p-value (default=10000)
  for FET...
- -PC <int>          :  pseudo-counts to use in calculating p-value based on hypergeometric distribution (default=2)
+   -focusLFC pos|neg  :  filter the output to focus on results with positive (pos) or negative (neg) LFCs (default: "all", no filtering)
+   -minLFC <float>    :  filter the output to include only genes that have a magnitude of LFC greater than the specified value (default: 0) (e.g. '-minLFC 1' means analyze only genes with 2-fold change or greater)
+   -qval <float>      :  filter the output to include only genes that have Qval less than to the value specified (default: 0.05)
+   -topk <int>        :  calculate enrichment among top k genes ranked by significance (Qval) regardless of cutoff (can combine with -focusLFC)
+   -PC <int>          :  pseudo-counts to use in calculating p-value based on hypergeometric distribution (default=2)
 """ % (sys.argv[0])
 
   def Run(self):
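The `-PC` option above refers to a p-value "based on hypergeometric distribution", and `fisher_exact_test` returns `hypergeom.sf(k,M,n,N)`. In SciPy's parameterisation, `M` is the population size, `n` the number of success states in it, `N` the number of draws, and `sf(k)` is the strict upper tail P(X > k). The sketch below recomputes that tail with `math.comb`. How Transit maps genes, hits and pseudo-counts onto `k`, `M`, `n` and `N` is not shown here, so the function is only a stand-in for the SciPy call; `hypergeom_sf` is a hypothetical name:

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X > k) for X ~ Hypergeometric(M total, n successes, N draws).

    Argument order matches scipy.stats.hypergeom.sf(k, M, n, N).
    Terms with N - x > M - n vanish because math.comb returns 0 when r > n.
    """
    upper = sum(comb(n, x) * comb(M - n, N - x)
                for x in range(max(k + 1, 0), min(n, N) + 1))
    return upper / comb(M, N)


if __name__ == "__main__":
    # Population of 10, 4 successes, 3 draws: probability of drawing at least 2 successes.
    print(hypergeom_sf(1, 10, 4, 3))
```

Because `sf` is strict, P(X >= k) is obtained by evaluating it at `k - 1`.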
@@ -344,6 +364,20 @@ Optional parameters:
     return hypergeom.sf(k,M,n,N)
 
   def fisher_exact_test(self):
+
+      #DONE: add -qval cut-off flag and the topk flag from heatmap.py
+      # Keep in mind how should these flags should interact with each other (eg. focusLFC should have prioirty, then topk subsets the subset)
+
+      #DONE: add LFC cutoff flag (ignore genes that have too small of LFC, eg MIN absolute value or min magnitude)
+      # should default to zero, eg. -minLFC 1 [meaning magnitude(abs) >= 1]
+
+      #TODO: learn how add these as checkboxes in GUI [for transit 2] 
+
+
+      # Hiearchy of flags:     
+      #  
+      #  START: qval (OPTIONAL) / topk [Mutually Exclusive, topk changes filter of qval] -> focusLFC -> minLFC [END]
+      #
     genes,hits,headers = self.read_resampling_file(self.resamplingFile) # use self.Qval_col to determine hits
     associations = self.read_associations(self.associationsFile)
     pathways = self.read_pathways(self.pathwaysFile)
@@ -353,13 +387,55 @@ Optional parameters:
     # do all associations have a definition in pathways?
     # how many pathways have >1 gene? (out of total?) what is max?
 
+    focus_genes = genes
+
+    # Filter by only returning the top k genes (by q-value)
+    if self.topk != -1:
+      # should this account if there are mulitple genes with qval == 0 ??
+
+        k_list = [(w[0], w[-1]) for w in focus_genes] # get a list of tuples, where it's just (orf, q-value)
+        k_list = sorted(k_list, key=lambda tup: tup[1]) # sort
+        k_list = k_list[:self.topk] # get top k genes
+
+        k_list = [k[0] for k in k_list] # remove the q-values, getting just the orfs
+
+        focus_genes = list(filter(lambda w: w[0] in k_list, focus_genes)) # then get all data points that are in our top-k subset
+        hits = list(set([w[0] for w in focus_genes]) & set(hits)) 
+
+    # Q-value filtering
+    if self.qvalCutoff != 1 and self.topk == -1:# don't run the qvalCutoff filter if it's the default value and if topk is default (not being used)
+      focus_genes = list(filter(lambda w: float(w[self.Qval_col]) <= self.qvalCutoff, focus_genes))
+      hits = list(set([w[0] for w in focus_genes]) & set(hits)) 
+
+    # Sign-based log-fold-change filtering
+    if self.focusLFC == "pos":
+      focus_genes = list(filter(lambda w: float(w[self.LFC_col]) > 0, focus_genes))
+      hits = list(set([w[0] for w in focus_genes]) & set(hits)) # filter the hits to only include positive LFCs by doing an intersection between the newly filtered orfs and the hits (that include all LFCs)
+                                                          # by turning both lists into sets and intersecting (&) them, seemed to be the fastest way without adding too much more to MEM space
+    elif self.focusLFC == "neg":
+      focus_genes = list(filter(lambda w: float(w[self.LFC_col]) < 0, focus_genes))
+      hits = list(set([w[0] for w in focus_genes]) & set(hits))
+
+    # Minimum log-fold change filtering
+    if self.minLFC != 0: # don't run the minLFC filter if it's the default value
+      focus_genes = list(filter(lambda w: abs(float(w[self.LFC_col])) >= self.minLFC, focus_genes)) # we only want to keep values that are greater or equal to than the flag value
+                                                                                                   # This is done intentionally after the focusLFC filter and uses its results
+      hits = list(set([w[0] for w in focus_genes]) & set(hits))
+
     genes_with_associations = 0
-    for gene in genes: 
+    for gene in focus_genes: # uses the fitlered subset
       orf = gene[0]
       if orf in associations: genes_with_associations += 1
-    self.write("# method=FET, PC=%s" % self.PC)
-    self.write("# genes with associations=%s out of %s total" % (genes_with_associations,len(genes)))
-    self.write("# significant genes (qval<0.05): %s" % (len(hits)))
+    self.write("# method=FET, PC=%s, focusLFC=%s, minLFC=%s, qval=%s, topk=%s" % (self.PC, self.focusLFC, self.minLFC, self.qvalCutoff, self.topk))
+
+    # Added a subsetted-total to the print-out because it's confusing to see the length of the entire gene-set when the associations are only over the filtered subset
+    if self.focusLFC != "all" or self.minLFC != 0 or self.qvalCutoff != 1: #only do the subset printout when one flag isn't default
+      self.write("# genes with associations=%s out of %s total, %s out of %s subsetted-total" % (genes_with_associations,len(genes), genes_with_associations,len(focus_genes)))
+      self.write("# significant genes (qval<%s): %s" % (self.qvalCutoff, len(hits)))
+    else:
+      self.write("# genes with associations=%s out of %s total" % (genes_with_associations,len(genes)))
+      self.write("# significant genes (qval<0.05): %s" % (len(hits)))
+
 
     terms = list(pathways.keys())
     terms.sort()
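The new FET filters are applied in a fixed order: `-topk` (if set) or otherwise `-qval`, then `-focusLFC`, then `-minLFC`, with the significant hits narrowed to the genes that survive. The sketch below condenses that order into one function. Because each step only shrinks the gene list, intersecting the hits once at the end gives the same set as intersecting after every step. Unlike the hunk, this sketch converts Q-values with `float()` before sorting for `-topk` and keeps the hits in their original order rather than returning an unordered set. `filter_for_fet` and the compact example rows are hypothetical:

```python
def filter_for_fet(genes, hits, qval_col=-1, lfc_col=6, topk=-1, qval_cutoff=1.0,
                   focus_lfc="all", min_lfc=0.0):
    """Apply topk-or-qval, then LFC sign, then LFC magnitude filtering.

    genes: rows from a resampling-style file, ORF id in column 0
    hits:  ORF ids already judged significant
    Returns (focus_genes, hits restricted to focus_genes).
    """
    focus = list(genes)
    if topk != -1:
        # keep the k rows with the smallest Q-values; qval cutoff is skipped in this case
        ranked = sorted(focus, key=lambda row: float(row[qval_col]))
        keep = {row[0] for row in ranked[:topk]}
        focus = [row for row in focus if row[0] in keep]
    elif qval_cutoff != 1:
        focus = [row for row in focus if float(row[qval_col]) <= qval_cutoff]
    if focus_lfc == "pos":
        focus = [row for row in focus if float(row[lfc_col]) > 0]
    elif focus_lfc == "neg":
        focus = [row for row in focus if float(row[lfc_col]) < 0]
    if min_lfc != 0:
        focus = [row for row in focus if abs(float(row[lfc_col])) >= min_lfc]
    kept = {row[0] for row in focus}
    return focus, [orf for orf in hits if orf in kept]


if __name__ == "__main__":
    # Hypothetical compact rows: [orf, LFC, Qval]
    genes = [["Rv0001", "2.5", "0.001"], ["Rv0002", "-1.2", "0.01"],
             ["Rv0003", "0.4", "0.03"], ["Rv0004", "-3.0", "0.2"]]
    hits = ["Rv0001", "Rv0002", "Rv0003"]
    print(filter_for_fet(genes, hits, qval_col=2, lfc_col=1, focus_lfc="pos", min_lfc=1))
```

With `-minLFC 1`, only genes whose log2 fold change has magnitude at least 1 (a 2-fold change or more, up or down) remain, matching the changelog description.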


=====================================
src/pytransit/data/CGI/temp_cdr.txt deleted
=====================================
The diff for this file was not included because it is too large.

=====================================
src/pytransit/data/CGI/temp_frac_abund.txt deleted
=====================================
The diff for this file was not included because it is too large.

=====================================
src/pytransit/data/gene_ontology.1_2.3-11-18.obo deleted
=====================================
The diff for this file was not included because it is too large.

=====================================
src/pytransit/doc/source/CGI.rst
=====================================
@@ -54,7 +54,7 @@ This is a fairly fast process. It takes at most a minute for the combination of
 
 ::
 
-    > python3 ../src/transit.py CGI extract_abund <combined counts file> <counts metadata file> <control condition> <sgRNA strengths file> <uninduced ATC file> <drug> <days>  >  <fractional abundance file>
+    > python3 ../src/transit.py CGI extract_abund <combined counts file> <counts metadata file> <control condition> <sgRNA efficiencies file> <uninduced ATC file> <drug> <days>  >  <fractional abundance file>
 
 * counts metadata file (USER created):
 
@@ -71,7 +71,7 @@ This is a fairly fast process. It takes at most a minute for the combination of
 
 * control condition: The condition to to be considered the control for these set of experiments, as specificed in the "drug" column of the metadata file; typically an atc-induced (+ ATC) with 0 drug concentration condition.
 
-* sgRNA strengths file: A file that contains metadata for each sgRNA in the combined counts file, where the first column must be sgRNA id (as seen in the combined counts file) and the last column must be the strength measurement of the sgRNAs (in publication of this method, sgRNA strength is measurement as extrapolated LFCs calculated through a passaging experiment).
+* sgRNA efficiencies file: A file that contains metadata for each sgRNA in the combined counts file, where the first column must be sgRNA id (as seen in the combined counts file) and the last column must be the efficiency measurement of the sgRNAs (in publication of this method, sgRNA efficiency is measurement as extrapolated LFCs calculated through a passaging experiment).
 
 * uninduced ATC file: A two column file of sgRNAs and their counts in uninduced ATC (no ATC) with 0 drug concentration 
 
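The sgRNA efficiencies file described above is read in `extract_abund` with `pd.read_csv(..., sep="\t", index_col=0)`, so its first row is treated as a header. Only the last column is kept, and each sgRNA id is truncated at the first `_v`. The stdlib sketch below mirrors that parsing for illustration. The ids, column names and values in the example are hypothetical, and `read_sgrna_efficiencies` is not a Transit function:

```python
import csv
import io


def read_sgrna_efficiencies(handle):
    """Map sgRNA id -> efficiency from a tab-separated file.

    First row is a header; first column is the sgRNA id; last column is the efficiency.
    Any '_v...' suffix on the id is dropped, mirroring str.split("_v")[0] in extract_abund.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    efficiencies = {}
    for row in reader:
        if not row:
            continue
        sgrna_id = row[0].split("_v")[0]
        efficiencies[sgrna_id] = float(row[-1])
    return efficiencies


if __name__ == "__main__":
    example = io.StringIO("id\tgene\tefficiency\n"
                          "Rv0001_dnaA_1_v1\tdnaA\t-2.3\n"
                          "Rv0002_dnaN_4_v2\tdnaN\t-0.7\n")
    print(read_sgrna_efficiencies(example))
```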
@@ -100,13 +100,13 @@ The output columns in this file are:
 
 * intercept - the resulting intercept of the overall fitted regression
 
-* coefficient sgRNA_strength - coefficient of the amount sgRNA strength contributes to the decrease in abundance
+* coefficient sgRNA_efficiency - coefficient of the amount sgRNA efficiency contributes to the decrease in abundance
 
 * coefficient concentration dependence - coefficient of the amount concentration contributes to the decrease in abundance
 
 * pval intercept - the wald test based p-value of the intercept
 
-* pval sgRNA_strength - the wald test based p-value of the coefficient sgRNA_strength
+* pval sgRNA_efficiency - the wald test based p-value of the coefficient sgRNA_efficiency
 
 * pval concentration dependence - the wald test based p-value of the coefficient of concentration dependence
 
@@ -133,13 +133,13 @@ This process is fairly quick, taking less than a minute to run. This figure visu
 * fractional abundance file : Fractional abundance file as created in Step 2. 
 
     .. warning::
-        This visualization assumes the columns are in increasing order of concentration, with the first three abundance columns (after the column "sgRNA strength"), as the control. This order depends on the order of columns during the creation of the combined counts file in Step 1.
+        This visualization assumes the columns are in increasing order of concentration, with the first three abundance columns (after the column "sgRNA efficiency"), as the control. This order depends on the order of columns during the creation of the combined counts file in Step 1.
 
 * gene : select a gene to visualize. Use orf or gene name
 * output plot location : The location where to save the generated plot.
 
 .. note::
-    If comparing plots from different genes, note the scale of sgRNA strength shown in the plots.
+    If comparing plots from different genes, note the scale of sgRNA efficiency shown in the plots.
 
 
 Tutorial
@@ -212,7 +212,7 @@ The resulting file will have 13 columns, where the first column is sgRNA ids and
 
     > python3 ../../../transit.py CGI extract_abund RIF_D1_combined_counts.txt samples_metadata.txt DMSO sgRNA_info.txt uninduced_ATC_counts.txt RIF 1  >  RIF_D1_frac_abund.txt
 
-The result of this command should be a file with a set of comments at the top, detailing the libraries used (DMSO and RIF). There should be a total of 17 columns, the last 12 of which are the calculated abundances, the first is the sgRNA ids followed by the orf/gene the sgRNA is targeting, uninduced ATC values, and sgRNA strength. 
+The result of this command should be a file with a set of comments at the top detailing the libraries used (DMSO and RIF). There should be a total of 17 columns, the last 12 of which are the calculated abundances; the first column is the sgRNA id, followed by the orf/gene the sgRNA targets, the uninduced ATC values, and the sgRNA efficiency.
 
 **Step 3: Run the CRISPRi-DR model**
 ::
@@ -226,11 +226,11 @@ There should be a total of 184 significant gene interactions, where 111 are sign
 
 **Visualize Specific Genes**
 
-Here are a few samples of the interactions visualized at the sgRNA level for this experiment. Note the difference in sgRNA strength scales shown.
+Here are a few samples of the interactions visualized at the sgRNA level for this experiment. Note the difference in sgRNA efficiency scales shown.
 
 *Significantly depleted gene : RVBD3645*
 
-*RVBD3645* is one of the significantly depleted genes in this experiment. In this plot, notice how most of the slopes are negative but the amount of depletion varies, where the more blue slopes (higher sgRNA strength) are steeper than orange sgRNA slopes (lower sgRNA strength)
+*RVBD3645* is one of the significantly depleted genes in this experiment. In this plot, notice how most of the slopes are negative but the amount of depletion varies: the bluer slopes (higher sgRNA efficiency) are steeper than the orange slopes (lower sgRNA efficiency).
 
 .. image:: _images/RVBD3645_lmplot.png
   :width: 400
@@ -242,7 +242,7 @@ Here are a few samples of the interactions visualized at the sgRNA level for thi
 
 *Significantly enriched gene : ndh*
 
-*ndh* is one of the signifincantly enriched genes in this experiment. In its plot, notice how sgRNAs of high strength (blue green ones) show a strong upwards trend but those will lower strength (the orange ones) do not. In fact there a few sgRNAs that show almost no change in fractional abundace as concentration increases.
+*ndh* is one of the significantly enriched genes in this experiment. In its plot, notice how sgRNAs of high efficiency (the blue-green ones) show a strong upward trend but those with lower efficiency (the orange ones) do not. In fact, there are a few sgRNAs that show almost no change in fractional abundance as concentration increases.
 
 .. image:: _images/ndh_lmplot.png
   :width: 400
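The CRISPRi-DR output columns listed above (intercept, coefficient sgRNA_efficiency, coefficient concentration dependence, and their Wald-test p-values) come from a per-gene linear regression. The following is a minimal, stdlib-only sketch of such a fit, not Transit's implementation. It assumes the response is a log-scale fractional abundance and that the two predictors are sgRNA efficiency and log concentration; the exact transformations Transit applies are not shown in this diff, and the names `fit_with_wald` and `invert` are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_with_wald(y, efficiency, log_conc):
    """OLS fit of y = b0 + b1*efficiency + b2*log_conc (assumed model).

    Returns (coefficients, p-values), each ordered as
    [intercept, sgRNA efficiency, concentration dependence].
    P-values use a two-sided Wald z-test (normal approximation).
    """
    X = [[1.0, e, c] for e, c in zip(efficiency, log_conc)]
    n, p = len(X), 3
    if n <= p:
        raise ValueError("need more observations than coefficients")
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(r * r for r in resid) / (n - p)  # residual variance
    pvals = []
    for i in range(p):
        se = math.sqrt(sigma2 * xtx_inv[i][i])  # standard error of coefficient i
        z = beta[i] / se
        pvals.append(math.erfc(abs(z) / math.sqrt(2.0)))  # two-sided normal tail
    return beta, pvals
```

A production fit would more likely use `statsmodels` (already a Transit dependency), whose OLS results expose coefficients and p-values directly; the hand-rolled version only makes the Wald statistic, coefficient divided by its standard error, explicit.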


=====================================
src/pytransit/doc/source/_images/RVBD3645_lmplot.png
=====================================
Binary files a/src/pytransit/doc/source/_images/RVBD3645_lmplot.png and b/src/pytransit/doc/source/_images/RVBD3645_lmplot.png differ


=====================================
src/pytransit/doc/source/_images/ndh_lmplot.png
=====================================
Binary files a/src/pytransit/doc/source/_images/ndh_lmplot.png and b/src/pytransit/doc/source/_images/ndh_lmplot.png differ


=====================================
src/pytransit/doc/source/_images/thiL_lmplot.png
=====================================
Binary files a/src/pytransit/doc/source/_images/thiL_lmplot.png and b/src/pytransit/doc/source/_images/thiL_lmplot.png differ


=====================================
src/pytransit/doc/source/method_pathway_enrichment.rst
=====================================
@@ -21,13 +21,15 @@ and evaluated for overlap with functional categories of genes.
 The GSEA methods use the whole list of genes, ranked in order of statistical significance
 (without requiring a cutoff), to calculate enrichment.
 
-Three systems of categories are provided for (but you can add your own):
+Four systems of categories are provided for (but you can add your own):
 the Sanger functional categories of genes determined in the
 original annotation of the H37Rv genome (`Cole et al, 1998 <https://www.ncbi.nlm.nih.gov/pubmed/9634230>`_,
 with subsequent updates),
-COG categories (`Clusters of Orthologous Genes <https://www.ncbi.nlm.nih.gov/pubmed/25428365>`_) and
-also GO terms (Gene Ontology).  The supporting files for *M. tuberculosis*
-H37Rv are in the src/pytransit/data/ directory.
+COG categories (`Clusters of Orthologous Genes <https://www.ncbi.nlm.nih.gov/pubmed/25428365>`_),
+also GO terms (Gene Ontology), and `KEGG <https://www.genome.jp/kegg/>`_.
+Pre-formatted annotation files for *M. tuberculosis* H37Rv and several other mycobacteria can be found in
+`pathways.html <https://orca1.tamu.edu/essentiality/transit/pathways.html>`_.
+
 
 For other organisms, it might be possible to download COG categories from
 `http://www.ncbi.nlm.nih.gov/COG/ <http://www.ncbi.nlm.nih.gov/COG/>`_
@@ -38,7 +40,7 @@ the *associations* file format described below. (The *pathways* files for COG ca
 in the Transit data directory should still work, because they just encode pathways names for all terms/ids.)
 
 At present, pathway enrichment analysis is only implemented as a command-line function,
-and is not available in the Transit GUI.
+and is not available in the Transit GUI (for Transit1).
 
 
 Usage
@@ -46,7 +48,7 @@ Usage
 
 ::
 
-  > python3 ../../transit.py pathway_enrichment <resampling_file> <associations> <pathways> <output_file> [-M <FET|GSEA|GO>] [-PC <int>] [-ranking SLPV|LFC] [-p <float>] [-Nperm <int>] [-Pval_col <int>] [-Qval_col <int>]  [-LFC_col <int>]
+  > python3 ../../transit.py pathway_enrichment <resampling_file> <associations> <pathways> <output_file> [-M <FET|GSEA|GO>] [optional parameters...]
 
   Optional parameters:
      -M FET|GSEA|ONT:     method to use, FET for Fisher's Exact Test (default), GSEA for Gene Set Enrichment Analysis (Subramanian et al, 2005), or ONT for Ontologizer (Grossmann et al, 2007)
@@ -58,6 +60,10 @@ Usage
      -p <float>         : exponent to use in calculating enrichment score; recommend trying 0 or 1 (as in Subramanian et al, 2005)
      -Nperm <int>       : number of permutations to simulate for null distribution to determine p-value (default=10000)
  for FET...
+     -focusLFC pos|neg  :  filter the output to focus on results with positive (pos) or negative (neg) LFCs (default: "all", no filtering)
+     -minLFC <float>    :  filter the output to include only genes that have a magnitude of LFC greater than the specified value (default: 0) (e.g. '-minLFC 1' means analyze only genes with 2-fold change or greater)
+     -qval <float>      :  filter the output to include only genes that have Qval less than the value specified (default: 0.05)
+     -topk <int>        :  calculate enrichment among top k genes ranked by significance (Qval) regardless of cutoff (can combine with -focusLFC)
      -PC <int>          :  pseudo-counts to use in calculating p-value based on hypergeometric distribution (default=2)
 
 |
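The FET flags above decide which genes count as "significant" before the overlap with each pathway is tested. Here is a minimal sketch of that selection followed by a hypergeometric upper-tail p-value. It is not Transit's code: the `(gene, lfc, qval)` row layout and the function names are hypothetical, it assumes `-minLFC` also applies when `-topk` is used, and it does not model the `-PC` pseudocounts.

```python
import math


def select_genes(rows, qval=0.05, min_lfc=0.0, focus_lfc="all", topk=None):
    """Pick the 'significant' gene set for a Fisher's exact / hypergeometric test.

    rows: iterable of (gene, lfc, qval) tuples (hypothetical layout).
    focus_lfc: "pos", "neg" or "all"; min_lfc: minimum |LFC|;
    topk: if given, take the k genes with the smallest Qval, ignoring the cutoff.
    """
    kept = []
    for gene, lfc, q in rows:
        if focus_lfc == "pos" and lfc <= 0:
            continue
        if focus_lfc == "neg" and lfc >= 0:
            continue
        if abs(lfc) < min_lfc:
            continue
        kept.append((q, gene))
    kept.sort()  # most significant (smallest Qval) first
    if topk is not None:
        return [gene for _, gene in kept[:topk]]
    return [gene for q, gene in kept if q < qval]


def hypergeom_sf(k, total, in_pathway, selected):
    """P(X >= k) for X ~ Hypergeometric(total genes, in_pathway, selected)."""
    upper = min(in_pathway, selected)
    hits = sum(math.comb(in_pathway, i) * math.comb(total - in_pathway, selected - i)
               for i in range(k, upper + 1))
    return hits / math.comb(total, selected)
```

For a pathway, `k` is the number of selected genes that belong to it. A small p-value means more pathway members were selected than expected by chance. `math.comb` requires Python 3.8 or later.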
@@ -114,7 +120,12 @@ Parameters
 
     Additional flags for FET:
 
-    - **-PC <int>**: Pseudocounts used in calculating the enrichment score and p-value by hypergeometic distribution. Default: PC=2.
+    - **-focusLFC pos|neg**  : filter the output to focus on genes with positive (pos) or negative (neg) LFCs (default: "all", no filtering)
+    - **-minLFC <float>**    : filter the output to include only genes that have |LFC| (magnitude of log2-fold change) >= the specified value (default: 0; e.g. '-minLFC 1' means restriction to genes with 2-fold change or greater)
+    - **-qval <float>**      : set Q-value cutoff (analyze genes with Qval<cutoff)  (default: 0.05)
+    - **-topk <int>**        : analyze enrichment in top K genes sorted by significance (Qval), regardless of Qval cutoff (can combine with -focusLFC)
+    - **-PC <int>**          : Pseudocounts used in calculating the enrichment score and p-value by hypergeometric distribution. Default: PC=2.
+
 
   **-M GSEA**
     Gene Set Enrichment Analysis. GSEA assesses the significance of a pathway by looking at how the members fall in the ranking of all genes.  The genes are first ranked by significance from resampling.  Specifically, they are sorted by signed-log-p-value, SLPV=sign(LFC)*(log(pval)), which puts them in order so that the most significant genes with negative LFC are at the top, the most significant with positive LFC are at the bottom, and insignificant genes fall in the middle.  Roughly, GSEA computes the mean rank of pathway members, and evaluates significance based on a simulated null distribution.  p-values are again adjusted at the end by BH.
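The SLPV ordering described for GSEA can be checked with a few lines. This is a minimal sketch under one assumption: since log(pval) is negative, sorting SLPV in *descending* order is what places significant negative-LFC genes first, as the description states. The helper names and the `(gene, lfc, pval)` layout are hypothetical. The base of the logarithm does not change the ranking.

```python
import math


def slpv(lfc, pval, floor=1e-300):
    """Signed log p-value, SLPV = sign(LFC) * log(pval)."""
    sign = (lfc > 0) - (lfc < 0)  # +1, 0 or -1
    return sign * math.log(max(pval, floor))  # floor avoids log(0)


def rank_for_gsea(results):
    """results: list of (gene, lfc, pval).

    Descending SLPV puts significant negative-LFC genes first, insignificant
    genes in the middle, and significant positive-LFC genes last.
    """
    ordered = sorted(results, key=lambda r: slpv(r[1], r[2]), reverse=True)
    return [gene for gene, _, _ in ordered]
```

For example, a gene with LFC = -2 and p = 1e-5 gets SLPV of about +11.5 and lands at the top, while the same p-value with LFC = +2 gives about -11.5 and lands at the bottom.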
@@ -161,13 +172,15 @@ Parameters
   2. the relative abundance of genes with this GO term compared to those with a parent GO term   in the whole genome
 
 
-Auxilliary Pathway Files in Transit Data Directory
---------------------------------------------------
+Pathway Association Files
+-------------------------
 
 ::
 
-These files for pathway analysis are distributed in the Transit data directory
-(e.g. transit/src/pytransit/data/).
+Pathway association files for several mycobacterial species (*M. tuberculosis,
+M. smegmatis, M. abscessus*, etc.) can be downloaded from our 
+`pathways.html <https://orca1.tamu.edu/essentiality/transit/pathways.html>`_ web page.
+The pathway annotations include COG, KEGG, Sanger, and GO terms.
 
 Note: The "Sanger" roles are custom pathway associations for
 *M. tuberculosis* defined in the original Nature paper on
@@ -179,34 +192,11 @@ Uniprot, or geneontology.org) and COG roles (from
 https://ftp.ncbi.nih.gov/pub/COG/COG2020/data/, `(Galperin et al, 2021)
 <https://academic.oup.com/nar/article/49/D1/D274/5964069>`_ ).
 
-Pathway association files for *M. smegmatis* mc2 155 are also provided in the table below.
-
-
-+----------+----------+--------------------+--------------------------------------+------------------------------------+
-| system   | num roles| applicable methods | associations of genes with roles     | pathway definitions/role names     |
-+==========+==========+====================+======================================+====================================+
-| COG      | 25       | FET*, GSEA         | H37Rv_COG_roles.dat;                 | COG_roles.dat                      |
-|          |          |                    | smeg_COG_roles.dat                   |                                    |
-+----------+----------+--------------------+--------------------------------------+------------------------------------+
-| Sanger   | 153      | FET*, GSEA*        | H37Rv_sanger_roles.dat               | sanger_roles.dat                   |
-+----------+----------+--------------------+--------------------------------------+------------------------------------+
-| GO       | 2545     | ONT*               | H37Rv_GO_terms.txt;                  | gene_ontology.1_2.3-11-18.obo      |
-|          |          |                    | smeg_GO_terms.txt                    |                                    |
-+----------+----------+--------------------+--------------------------------------+------------------------------------+
-|          |          | FET, GSEA          | H37Rv_GO_terms.txt;                  | GO_term_names.dat                  |
-|          |          |                    | smeg_GO_terms.txt                    |                                    |
-+----------+----------+--------------------+--------------------------------------+------------------------------------+
-
-'\*' means *recommended* combination of method with system of functional categories
-
-
-Current Recommendations
------------------------
 
 Here are the recommended combinations of pathway methods to use for different systems of functional categories:
 
  * For COG, use '-M FET'
- * For Sanger roles, try both FET and GSEA
+ * For KEGG and Sanger pathways, try both FET and GSEA
  * For GO terms, use '-M ONT'
 
 


=====================================
src/pytransit/doc/source/tpp.rst
=====================================
@@ -207,10 +207,9 @@ The main fields to fill out in the GUI are...
   using the Sassetti protocol, i.e. near the beginning of reads, with some small
   random shifts)
 
--  **BWA executable** - you'll have to find the path to where the executable is installed
+- **BWA executable** - you'll have to find the path to where the executable is installed
 
-- **BWA algorithm** - there are 2 options: 'aln' and 'mem'.  'aln' was originally used in Transit,
-  but the default has now been switched to 'mem', which should be able to map more reads
+- **BWA algorithm** - there are 2 options: 'aln' and 'mem'.  The default is 'aln'.  'mem' can map a few more reads (increasing NZmean by <5%), but it is not faster, and it does not work on Windows machines ('mem' requires shared memory, and Windows gives a '/dev/shm' error message)
 
 - **BWA flags** - if you want to pass through options to BWA
 
@@ -270,7 +269,8 @@ The input arguments and file types are as follows:
 +======================+==================================================+======================================================+
 | -bwa                 | path executable                                  |                                                      |
 +----------------------+--------------------------------------------------+------------------------------------------------------+
-| -bwa-alg             | 'mem' (default) or 'aln' (the old way)           |                                                      |
+| -bwa-alg             | 'aln' (default) or 'mem'                         | 'mem' might increase NZmean slightly (<5%),          |
+|                      |                                                  | but doesn't work on Windows machines                 |
 +----------------------+--------------------------------------------------+------------------------------------------------------+
 | -flag                | parameters to pass to BWA                        |                                                      |
 +----------------------+--------------------------------------------------+------------------------------------------------------+
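The two `-bwa-alg` choices correspond to two different BWA workflows. The sketch below shows the standard single-end command sequences: `bwa aln` writes a `.sai` file to stdout, which `bwa samse` converts to SAM, while `bwa mem` writes SAM directly. It is a hypothetical helper, not TPP's internal code. TPP's actual flags, paired-end handling, and file naming are not shown in this diff.

```python
import subprocess


def bwa_steps(bwa, ref, reads1, out_prefix, alg="aln"):
    """Return (argv, stdout_path) pairs for a single-end BWA run.

    stdout_path is None when the command writes its own output files.
    """
    steps = [([bwa, "index", ref], None)]  # both algorithms need an indexed reference
    if alg == "aln":
        sai = out_prefix + ".sai"
        steps.append(([bwa, "aln", ref, reads1], sai))
        steps.append(([bwa, "samse", ref, sai, reads1], out_prefix + ".sam"))
    elif alg == "mem":
        steps.append(([bwa, "mem", ref, reads1], out_prefix + ".sam"))
    else:
        raise ValueError("alg must be 'aln' or 'mem', got %r" % alg)
    return steps


def run_steps(steps):
    """Run each step in order, redirecting stdout to a file where required."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "wb") as fh:
                subprocess.run(argv, stdout=fh, check=True)
```

Building the command lists separately from running them keeps the choice between 'aln' and 'mem' testable without a BWA installation.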


=====================================
src/pytransit/doc/source/transit_install.rst
=====================================
@@ -33,60 +33,72 @@ If you encounter problems, please :ref:`contact us <developers>` or head to the
 Requirements
 ------------
 
-TRANSIT runs on both python2.7 and python3. But the dependencies vary slightly.
-
 Starting with release v3.0, TRANSIT now requires python3. 
 
+Releases of TRANSIT prior to 3.0 run on both python2.7 and python3, but the dependencies vary slightly.
+
 To use TRANSIT with python2, use a TRANSIT release prior to 3.0 (e.g. v2.5.2)
 
+.. NOTE::
+  **Note about Python3.12**: The update to Python3.12 made changes in the setuptools package that 
+  disrupted the installation of Transit as well as other python packages (including some Transit
+  dependencies, like wxPython).
+  
+  We are working on fixing these problems caused by 3.12.  In the meantime,
+  if you have difficulty installing Transit with Python3.12, we recommend trying it with Python3.11, with
+  which Transit should work fine.
 
-Python 2.7:
+|
+
+
+Python 3:
 -----------
 
 The following libraries/modules are required to run TRANSIT:
 
-+ `Python 2.7 <http://www.python.org>`_
-+ `Numpy <http://www.numpy.org/>`_ (tested on 1.15.0)
++ `Python 3+ <http://www.python.org>`_
++ `Numpy <http://www.numpy.org/>`_ (tested on 1.16.0)
 + `Statsmodels <https://pypi.org/project/statsmodels/>`_ (tested on 0.9.0)
-+ `Scipy <http://www.scipy.org/>`_ (tested on 1.1)
-+ `matplotlib <http://matplotlib.org/users/installing.html>`_ (tested on 2.2)
-+ `Pillow 5.0 <https://github.com/python-pillow/Pillow>`_
++ `Scipy <http://www.scipy.org/>`_ (tested on 1.2)
++ `matplotlib <http://matplotlib.org/users/installing.html>`_ (tested on 3.0)
++ `Pillow 6.0 <https://github.com/python-pillow/Pillow>`_
 + `wxpython 4+ <http://www.wxpython.org/>`_
-+ `PyPubSub 3.3 <https://pypi.org/project/PyPubSub/>`_ (Version 4.0 does not support python2 `Github Issue <https://github.com/schollii/pypubsub/issues/9>`_)
++ `PyPubSub 4+ <https://pypi.org/project/PyPubSub/>`_ (tested on 4.0.3)
 
 All of these dependencies can be installed using the following command.
 
 ::
 
-   pip install numpy scipy pillow "pypubsub<4.0" "matplotlib<3.0" statsmodels wxPython
+   pip3 install numpy scipy pillow pypubsub matplotlib statsmodels wxPython
 
 Pip and Python are usually preinstalled in most modern operating systems.
 
 |
 
-Python 3:
+Python 2.7:
 -----------
 
 The following libraries/modules are required to run TRANSIT:
 
-+ `Python 3+ <http://www.python.org>`_
-+ `Numpy <http://www.numpy.org/>`_ (tested on 1.16.0)
++ `Python 2.7 <http://www.python.org>`_
++ `Numpy <http://www.numpy.org/>`_ (tested on 1.15.0)
 + `Statsmodels <https://pypi.org/project/statsmodels/>`_ (tested on 0.9.0)
-+ `Scipy <http://www.scipy.org/>`_ (tested on 1.2)
-+ `matplotlib <http://matplotlib.org/users/installing.html>`_ (tested on 3.0)
-+ `Pillow 6.0 <https://github.com/python-pillow/Pillow>`_
++ `Scipy <http://www.scipy.org/>`_ (tested on 1.1)
++ `matplotlib <http://matplotlib.org/users/installing.html>`_ (tested on 2.2)
++ `Pillow 5.0 <https://github.com/python-pillow/Pillow>`_
 + `wxpython 4+ <http://www.wxpython.org/>`_
-+ `PyPubSub 4+ <https://pypi.org/project/PyPubSub/>`_ (tested on 4.0.3)
++ `PyPubSub 3.3 <https://pypi.org/project/PyPubSub/>`_ (Version 4.0 does not support python2 `Github Issue <https://github.com/schollii/pypubsub/issues/9>`_)
 
 All of these dependencies can be installed using the following command.
 
 ::
 
-   pip3 install numpy scipy pillow pypubsub matplotlib statsmodels wxPython
+   pip install numpy scipy pillow "pypubsub<4.0" "matplotlib<3.0" statsmodels wxPython
 
 Pip and Python are usually preinstalled in most modern operating systems.
 
 |
+
 .. _install-zinb:
 
 Additional Requirements: R (statistical analysis package)
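Before installing, you can check the interpreter version and the Python 3 dependencies listed above with a short script. This is a minimal sketch with a hypothetical module list: note that several import names differ from the pip package names (Pillow is imported as `PIL`, wxPython as `wx`, PyPubSub as `pubsub`). The Python 3.11 suggestion mirrors the note about Python 3.12.

```python
import importlib.util
import sys

# Import names differ from pip package names for some dependencies:
# Pillow -> PIL, wxPython -> wx, PyPubSub -> pubsub.
REQUIRED = ("numpy", "scipy", "statsmodels", "matplotlib", "PIL", "wx", "pubsub")


def missing_modules(names=REQUIRED):
    """Return the names that cannot be found on the current interpreter's path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment():
    """Warn about Python 3.12+ and report any missing dependencies."""
    if sys.version_info >= (3, 12):
        print("Python %d.%d detected; if installing Transit fails, "
              "try Python 3.11 as the install notes suggest." % sys.version_info[:2])
    missing = missing_modules()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    return missing


if __name__ == "__main__":
    check_environment()
```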


=====================================
src/pytransit/draw_trash.py
=====================================
@@ -20,7 +20,7 @@
 
 import pytransit.view_trash as view_trash
 from math import *
-import os
+import os,sys
 import platform
 import numpy
 from PIL import Image, ImageDraw, ImageFont
@@ -34,33 +34,39 @@ def normalize(X, old_min, old_max, new_min, new_max):
         return (((X - old_min) * new_range) / old_range) + new_min
 
 
-linuxFonts = []
-linuxFonts.append("/usr/share/fonts/truetype/ttf-dejavu/DejaVuSans-Bold.ttf")
-linuxFonts.append("/usr/share/fonts/dejavu-lgc/DejaVuLGCSerifCondensed-Bold.ttf")
-linuxFonts.append("/usr/share/fonts/dejavu-lgc/DejaVuLGCSansCondensed-Bold.ttf")
-
-winFonts = []
-winFonts.append("consolab.ttf")
-winFonts.append("courb.ttf")
-winFonts.append("arial.ttf")
-fontsize = 16
-
-font = ImageFont.load_default()
-if platform.system() == "Linux":
-    for fontpath in linuxFonts:
-        if os.path.isfile(fontpath):
-            font = ImageFont.truetype(fontpath, fontsize)
-            break
-
-elif platform.system() == "Windows":
-    for fontpath in winFonts:
-        try:
-            font = ImageFont.truetype(fontpath, fontsize)
-            break    
-        except:
-            pass
-
-
+script_directory = os.path.dirname(os.path.realpath(__file__))
+font = ImageFont.truetype("%s/DejaVuSans.ttf" % script_directory,size=12) # should be in transit/src/pytransit/
+
+# linuxFonts = []
+# linuxFonts.append("DejaVuSans.ttf")
+# #linuxFonts.append("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf")
+# #linuxFonts.append("/usr/share/fonts/truetype/ttf-dejavu/DejaVuSans-Bold.ttf")
+# #linuxFonts.append("/usr/share/fonts/dejavu-lgc/DejaVuLGCSerifCondensed-Bold.ttf")
+# #linuxFonts.append("/usr/share/fonts/dejavu-lgc/DejaVuLGCSansCondensed-Bold.ttf")
+# 
+# winFonts = []
+# winFonts.append("consolab.ttf")
+# winFonts.append("courb.ttf")
+# winFonts.append("arial.ttf")
+# fontsize = 16
+# 
+# font = ImageFont.load_default()
+# 
+# # what about Macs?
+# if platform.system() == "Linux":
+#     for fontpath in linuxFonts:
+#         if os.path.isfile(fontpath):
+#             font = ImageFont.truetype(fontpath, fontsize)
+#             break
+# elif platform.system() == "Windows":
+#     for fontpath in winFonts:
+#         try:
+#             font = ImageFont.truetype(fontpath, fontsize)
+#             break    
+#         except:
+#             pass
+ 
+ 
 def draw_reads(draw, reads, ta_sites, start_x=0, start_y=0, width=400, height=100, start=0, end=500, min_read=0, max_read=500, lwd=2):
 
 
@@ -97,19 +103,41 @@ def draw_ta_sites(draw, ta_sites, start_x=0, start_y=0, width=200, height=0, sta
         draw.line([(TApos, start_y+0), (TApos, start_y + height)], width=lwd, fill="black")
 
 
+# replaced draw.textsize(), which was deprecated in Pillow 9.2 and removed in v10 (TRI, 10/12/24)
+
+def newtextsize(text, font):
+  # https://github.com/python-pillow/Pillow/issues/7322
+  left, top, right, bottom = font.getbbox(text)
+  width = right - left
+  height = bottom - top
+  return (width, height)
+
+#    # https://levelup.gitconnected.com/how-to-properly-calculate-text-size-in-pil-images-17a2cc6f51fd
+#    # https://stackoverflow.com/a/46220683/9263761
+#    ascent, descent = font.getmetrics()
+#    text_width = font.getmask(text_string).getbbox()[2]
+#    text_height = font.getmask(text_string).getbbox()[3] + descent
+#    return (text_width, text_height)
+
+#    im = Image.new(mode="P", size=(0, 0))
+#    draw = ImageDraw.Draw(im)
+#    _, _, width, height = draw.textbbox((0, 0), text=text, font=font)
+#    return width, height
+
+#    return font.getsize(text) # this works in PIL 8.4 but might not work in 10.0; https://github.com/python-pillow/Pillow/issues/7322
 
 
 def draw_scale(draw, start_x, start_y, height, max_read):
 
     #print("scale", start_x, start_y, height)
     MIDREAD = int(max_read/2.0)
-    top_text_w, top_text_h = draw.textsize(str(max_read), font=font)
+    top_text_w, top_text_h = newtextsize(str(max_read), font=font)
     draw.text((start_x, start_y), str(max_read), font=font, fill="black")
   
 
     draw.text((start_x, start_y + height/2.0), str(MIDREAD), font=font, fill="black")
 
-    bottom_text_w, bottom_text_h = draw.textsize(str(MIDREAD), font=font)
+    bottom_text_w, bottom_text_h = newtextsize(str(MIDREAD), font=font)
     draw.text((start_x+bottom_text_w-(top_text_w/2.0), start_y+height), "0", font=font, fill="black")
 
 
@@ -118,7 +146,7 @@ def draw_scale(draw, start_x, start_y, height, max_read):
 def draw_features(draw, GENES, orf2data, start, end, start_x, start_y, width, height): 
 
     padding_h = 3
-    text_w, text_h = draw.textsize("RV0001", font=font)
+    text_w, text_h = newtextsize("RV0001", font=font)
     gene_h = height - text_h
 
     triangle_size = 10
@@ -162,7 +190,7 @@ def draw_features(draw, GENES, orf2data, start, end, start_x, start_y, width, he
 
         if name == "-": name = gene
         if not name.startswith("non-coding"):
-            name_text_w, name_text_h = draw.textsize(name, font=font)
+            name_text_w, name_text_h = newtextsize(name, font=font)
             if abs(norm_start-norm_end) >= name_text_w:
                 draw.text(( norm_start + (abs(norm_start-norm_end) - name_text_w)/2.0 , start_y+gene_h+text_h), name, font=font, fill="black")
 
@@ -180,7 +208,7 @@ def draw_features(draw, GENES, orf2data, start, end, start_x, start_y, width, he
 def draw_genes(draw, GENES, orf2data, start, end, start_x, start_y, width, height, doTriangle=True):
 
     padding_h = 3
-    text_w, text_h = draw.textsize("RV0001", font=font)        
+    text_w, text_h = newtextsize("RV0001", font=font)        
     gene_h = height - text_h
 
 
@@ -223,7 +251,7 @@ def draw_genes(draw, GENES, orf2data, start, end, start_x, start_y, width, heigh
 
         if name == "-": name = gene
         if not name.startswith("non-coding"):
-            name_text_w, name_text_h = draw.textsize(name, font=font)
+            name_text_w, name_text_h = newtextsize(name, font=font)
             if abs(norm_start-norm_end) >= name_text_w:
                 draw.text(( norm_start + (abs(norm_start-norm_end) - name_text_w)/2.0 , start_y+gene_h+text_h), name, font=font, fill="black")
 
@@ -296,11 +324,11 @@ def draw_canvas(fulldata, position, hash, orf2data, feature_hashes, feature_data
     #print("Labels:")
     max_label_w = 0
     for L in labels:
-        label_text_w, label_text_h = temp_draw.textsize(L, font=font)
+        label_text_w, label_text_h = newtextsize(L, font=font)
         max_label_w = max(label_text_w, max_label_w)
         #print(L)
 
-    scale_text_w, scale_text_h = temp_draw.textsize(str(max(max_reads)), font=font)
+    scale_text_w, scale_text_h = newtextsize(str(max(max_reads)), font=font)
     
 
 
@@ -338,7 +366,7 @@ def draw_canvas(fulldata, position, hash, orf2data, feature_hashes, feature_data
     half = 100*0.5
     start_x += 5
     for j in range(len(fulldata)):
-        temp_label_text_w, temp_label_text_h = temp_draw.textsize(labels[j], font=font)
+        temp_label_text_w, temp_label_text_h = newtextsize(labels[j], font=font)
         label_text_x = (start_x/2.0) - (temp_label_text_w/2.0)
         start_y+=read_h+padding_h
         #draw.text((10, start_y - half), labels[j], font=font, fill="black")
@@ -353,14 +381,14 @@ def draw_canvas(fulldata, position, hash, orf2data, feature_hashes, feature_data
     #start_x+=5
 
     #TA sites
-    temp_label_text_w, temp_label_text_h = temp_draw.textsize('TA Sites', font=font)
+    temp_label_text_w, temp_label_text_h = newtextsize('TA Sites', font=font)
     label_text_x = (start_x/2.0) - (temp_label_text_w/2.0)
     #draw.text((30, start_y),'TA Sites', font=font, fill="black")
     draw.text((label_text_x, start_y),'TA Sites', font=font, fill="black")
     draw_ta_sites(draw, TA_SITES, start_x, start_y, read_w, ta_h, start, end)
 
     #Genes
-    temp_label_text_w, temp_label_text_h = temp_draw.textsize('Genes', font=font)
+    temp_label_text_w, temp_label_text_h = newtextsize('Genes', font=font)
     label_text_x = (start_x/2.0) - (temp_label_text_w/2.0)
     start_y += 50
     #draw.text((30, start_y+10),'Genes', font=font, fill="black")
@@ -372,7 +400,7 @@ def draw_canvas(fulldata, position, hash, orf2data, feature_hashes, feature_data
     #Features:
     for f in range(len(FEATURES)):
         start_y += gene_h + padding_h + 25
-        temp_label_text_w, temp_label_text_h = temp_draw.textsize('Feature-%d' % (f+1), font=font)
+        temp_label_text_w, temp_label_text_h = newtextsize('Feature-%d' % (f+1), font=font)
         label_text_x = (start_x/2.0) - (temp_label_text_w/2.0)
         draw.text((label_text_x, start_y+10),'Feature-%d' % (f+1), font=font, fill="black")
         width = read_w
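The new font code in `draw_trash.py` calls `ImageFont.truetype()` on the bundled `DejaVuSans.ttf` unconditionally, so a missing file would raise an error at import time. A possible hardening, sketched here as an assumption rather than as Transit's code, is to search for the file and fall back to Pillow's built-in font. The helper names are hypothetical. `bbox_to_size` performs the same arithmetic as `newtextsize`. Pillow is imported inside `load_font` only so that the path and size helpers can be exercised with the standard library alone.

```python
import os

FONT_NAME = "DejaVuSans.ttf"


def find_font(search_dirs, name=FONT_NAME):
    """Return the first existing path to `name` under search_dirs, or None."""
    for directory in search_dirs:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None


def bbox_to_size(bbox):
    """Convert a (left, top, right, bottom) box into (width, height)."""
    left, top, right, bottom = bbox
    return (right - left, bottom - top)


def load_font(path, size=12):
    """Load a TrueType font, or Pillow's built-in default if no path was found."""
    from PIL import ImageFont  # imported here so the helpers above need only the stdlib
    if path is None:
        return ImageFont.load_default()
    return ImageFont.truetype(path, size=size)
```

In the module this would be used as `font = load_font(find_font([os.path.dirname(os.path.realpath(__file__))]))`, and text sizes would come from `bbox_to_size(font.getbbox(text))`. Recent Pillow releases provide `getbbox` on both TrueType and default fonts.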


=====================================
src/pytransit/tnseq_tools.py
=====================================
@@ -98,9 +98,9 @@ def read_samples_metadata(metadata_file, covarsToRead = [], interactionsToRead =
                 for i, c in enumerate(lines[0].split())
                 if c.lower() == h.lower()]
 
-        for line in lines[1:]:
+        for line in lines[1:]: # skip header row
             if line[0]=='#': continue
-            vals = line.split()
+            vals = line.rstrip().split('\t') # allow spaces in filenames
             [condition, wfile] = vals[headIndexes[0]], vals[headIndexes[1]]
             conditionsByFile[wfile] = condition
             orderingMetadata['condition'].append(condition)
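The `tnseq_tools.py` change splits metadata rows on tabs so that filenames may contain spaces. The minimal sketch below applies the same idea and adds two small safeguards. It skips blank lines, since `line[0]` raises `IndexError` on an empty string. It also strips only the line ending instead of calling `rstrip()`, which would also remove trailing tabs and drop empty final fields. The column names `Condition` and `Filename` and the function name are assumptions for illustration.

```python
def read_condition_by_file(lines, cond_col="Condition", file_col="Filename"):
    """Map each wig filename to its condition from tab-separated metadata lines.

    Column lookup is case-insensitive; fields are split on tabs only,
    so filenames may contain spaces.
    """
    header = [h.strip().lower() for h in lines[0].rstrip("\r\n").split("\t")]
    ci = header.index(cond_col.lower())
    fi = header.index(file_col.lower())
    conditions_by_file = {}
    for line in lines[1:]:  # skip header row
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comments
        vals = line.rstrip("\r\n").split("\t")  # keeps trailing empty fields
        conditions_by_file[vals[fi]] = vals[ci]
    return conditions_by_file
```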


=====================================
tests/test_tpp.py
=====================================
@@ -122,33 +122,35 @@ def verify_stats(stats_file, expected):
 
 class TestTPP(TransitTestCase):
 
+    # since I changed the default BWA mode from 'mem' back to 'aln', specify 'mem' so the output matches the expected stats shown above (TRI, 12/2/24)
+
     @unittest.skipUnless(len(bwa_path) > 0, "requires BWA")
     def test_tpp_noflag_primer(self):
-        (args, kwargs) = cleanargs(["-bwa", bwa_path, "-ref", h37fna, "-reads1", reads1, "-output", tpp_output_base, "-protocol", "sassetti"])
+        (args, kwargs) = cleanargs(["-bwa", bwa_path, "-ref", h37fna, "-reads1", reads1, "-output", tpp_output_base, "-protocol", "sassetti" , "-bwa-alg", "mem"])
         tppMain(*args, **kwargs)
         self.assertTrue(verify_stats("{0}.tn_stats".format(tpp_output_base), NOFLAG_PRIMER))
 
     @unittest.skipUnless(len(bwa_path) > 0, "requires BWA")
     def test_tpp_flag_primer(self):
-        (args, kwargs) = cleanargs(["-bwa", bwa_path, "-ref", h37fna, "-reads1", reads1, "-output", tpp_output_base, "-himar1", "-flags", "-k 1"])
+        (args, kwargs) = cleanargs(["-bwa", bwa_path, "-ref", h37fna, "-reads1", reads1, "-output", tpp_output_base, "-himar1", "-flags", "-k 1" , "-bwa-alg", "mem"])
         tppMain(*args, **kwargs)
         self.assertTrue(verify_stats("{0}.tn_stats".format(tpp_output_base), FLAG_PRIMER))
 
     @unittest.skipUnless(len(bwa_path) > 0, "requires BWA")
     def test_tpp_protocol_mme1(self):
-        (args, kwargs) = cleanargs(["-bwa", bwa_path, "-ref", h37fna, "-reads1", reads1, "-output", tpp_output_base, "-protocol", "Mme1"])
+        (args, kwargs) = cleanargs(["-bwa", bwa_path, "-ref", h37fna, "-reads1", reads1, "-output", tpp_output_base, "-protocol", "Mme1", "-bwa-alg", "mem"])
         tppMain(*args, **kwargs)
         self.assertTrue(verify_stats("{0}.tn_stats".format(tpp_output_base), MME1_PROTOCOL))
 
     @unittest.skipUnless(len(bwa_path) > 0, "requires BWA")
     def test_tpp_multicontig_empty_prefix(self):
-        (args, kwargs) = cleanargs(["-bwa", bwa_path, "-ref", test_multicontig, "-reads1", test_multicontig_reads1, "reads2", test_multicontig_reads2, "-output", tpp_output_base, "-replicon-ids", "a,b,c", "-maxreads", "10000", "-primer", ""])
+        (args, kwargs) = cleanargs(["-bwa", bwa_path, "-ref", test_multicontig, "-reads1", test_multicontig_reads1, "reads2", test_multicontig_reads2, "-output", tpp_output_base, "-replicon-ids", "a,b,c", "-maxreads", "10000", "-primer", "" , "-bwa-alg", "mem"])
         tppMain(*args, **kwargs)
         self.assertTrue(verify_stats("{0}.tn_stats".format(tpp_output_base), MULTICONTIG))
 
     @unittest.skipUnless(len(bwa_path) > 0, "requires BWA")
     def test_tpp_multicontig_auto_replicon_ids(self):
-        (args, kwargs) = cleanargs(["-bwa", bwa_path, "-ref", test_multicontig, "-reads1", test_multicontig_reads1, "reads2", test_multicontig_reads2, "-output", tpp_output_base, "-replicon-ids", "auto", "-maxreads", "10000", "-primer", ""])
+        (args, kwargs) = cleanargs(["-bwa", bwa_path, "-ref", test_multicontig, "-reads1", test_multicontig_reads1, "reads2", test_multicontig_reads2, "-output", tpp_output_base, "-replicon-ids", "auto", "-maxreads", "10000", "-primer", "" , "-bwa-alg", "mem"])
         tppMain(*args, **kwargs)
         self.assertTrue(verify_stats("{0}.tn_stats".format(tpp_output_base), MULTICONTIG_AUTO_IDS))
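Each of the five tests above now appends `"-bwa-alg", "mem"` by hand. One way to keep them in sync, sketched here with a hypothetical helper that is not part of the test suite, is to pin the algorithm in a single function and wrap each argument list with it before calling `cleanargs`.

```python
def pin_bwa_alg(args, alg="mem"):
    """Return a copy of a TPP argument list with -bwa-alg set to `alg`."""
    pinned = list(args)  # never mutate the caller's list
    if "-bwa-alg" in pinned:
        i = pinned.index("-bwa-alg")
        if i + 1 < len(pinned):
            pinned[i + 1] = alg  # overwrite an existing value
        else:
            pinned.append(alg)  # flag was present without a value
    else:
        pinned += ["-bwa-alg", alg]
    return pinned
```

A test would then call `cleanargs(pin_bwa_alg([...]))`, so a future change of the default only needs one edit.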
 



View it on GitLab: https://salsa.debian.org/med-team/tnseq-transit/-/commit/9bba894b6e12a1544bad417435d2b0cd72cb2117
