[med-svn] [Git][med-team/tnseq-transit][master] 5 commits: routine-update: New upstream version
Étienne Mollier (@emollier)
gitlab at salsa.debian.org
Sun Aug 13 10:11:51 BST 2023
Étienne Mollier pushed to branch master at Debian Med / tnseq-transit
Commits:
aef1ec00 by Étienne Mollier at 2023-08-13T10:59:12+02:00
routine-update: New upstream version
- - - - -
b687e54c by Étienne Mollier at 2023-08-13T10:59:17+02:00
New upstream version 3.3.0
- - - - -
27b77698 by Étienne Mollier at 2023-08-13T11:02:29+02:00
Update upstream source from tag 'upstream/3.3.0'
Update to upstream version '3.3.0'
with Debian dir ae07aa8646af34dce9f398de795ab14703709dd3
- - - - -
db8135af by Étienne Mollier at 2023-08-13T11:02:29+02:00
routine-update: Standards-Version: 4.6.2
- - - - -
898cc600 by Étienne Mollier at 2023-08-13T11:08:43+02:00
routine-update: Ready to upload to unstable
- - - - -
23 changed files:
- .gitignore
- CHANGELOG.md
- debian/changelog
- debian/control
- src/pytpp/tpp_gui.py
- src/pytpp/tpp_tools.py
- src/pytransit/__init__.py
- + src/pytransit/analysis/CGI.py
- src/pytransit/analysis/__init__.py
- src/pytransit/analysis/zinb.py
- + src/pytransit/data.ignore/macrophages.comwig
- + src/pytransit/data.ignore/macrophages_metadata.txt
- + src/pytransit/data/CGI/IDs.H37Rv.CRISPRi.lib.txt
- + src/pytransit/data/CGI/counts_metadata.txt
- + src/pytransit/data/CGI/sgRNA_metadata.txt
- + src/pytransit/data/CGI/uninduced_ATC_counts.txt
- + src/pytransit/doc/source/CGI.rst
- + src/pytransit/doc/source/_images/CGI_workflow.png
- + src/pytransit/doc/source/_images/RVBD3645_lmplot.png
- + src/pytransit/doc/source/_images/ndh_lmplot.png
- + src/pytransit/doc/source/_images/thiL_lmplot.png
- src/pytransit/doc/source/index.rst
- src/pytransit/doc/source/transit_install.rst
Changes:
=====================================
.gitignore
=====================================
@@ -1,4 +1,14 @@
-# # Mac
+#
+# standard
+#
+**/*.do_not_sync
+**/*.do_not_sync.*
+**/*.ignore
+**/*.ignore.*
+**/*.secret
+**/*.secret.*
+
+# Mac
.DS_STORE
.DS_Store
.env
=====================================
CHANGELOG.md
=====================================
@@ -2,6 +2,23 @@
All notable changes to this project will be documented in this file.
+
+## Version 3.3.0 2023-08-03
+#### Transit:
+
+Major changes:
+ - added CRISPRi-DR method for identifying chemical-genetic interactions (CGI) in CRISPRi libraries
+
+
+## Version 3.2.8 2023-07-22
+#### TPP:
+
+Small bug fixes:
+ - fixed uncompression of gzipped fastq files
+ - fixed error condition caused by recent versions of wxPython that require 'proportion' arg in sizers in TPP GUI to be int (not float)
+ - added link in documentation to make users aware of Transit2
+
+
## Version 3.2.7 2022-09-22
#### TRANSIT:
=====================================
debian/changelog
=====================================
@@ -1,3 +1,10 @@
+tnseq-transit (3.3.0-1) unstable; urgency=medium
+
+ * New upstream version
+ * Standards-Version: 4.6.2 (routine-update)
+
+ -- Étienne Mollier <emollier at debian.org> Sun, 13 Aug 2023 11:03:02 +0200
+
tnseq-transit (3.2.7-1) unstable; urgency=medium
[ Nilesh Patra ]
=====================================
debian/control
=====================================
@@ -16,7 +16,7 @@ Build-Depends: debhelper-compat (= 13),
python3-statsmodels,
python3-pubsub,
bwa
-Standards-Version: 4.6.1
+Standards-Version: 4.6.2
Vcs-Browser: https://salsa.debian.org/med-team/tnseq-transit
Vcs-Git: https://salsa.debian.org/med-team/tnseq-transit.git
Homepage: http://saclab.tamu.edu/essentiality/transit/
=====================================
src/pytpp/tpp_gui.py
=====================================
@@ -110,7 +110,7 @@ if hasWx:
label_replicon_ids = wx.StaticText(panel, label='ID names for each replicon: \n(if genome has multiple contigs)',size=(340,-1))
sizer_replicon_ids.Add(label_replicon_ids,0,wx.ALIGN_CENTER_VERTICAL,0)
self.replicon_ids = wx.TextCtrl(panel,value=vars.replicon_ids,size=(400,30))
- sizer_replicon_ids.Add(self.replicon_ids, proportion=1.0, flag=wx.EXPAND|wx.ALL, border=5)
+ sizer_replicon_ids.Add(self.replicon_ids, proportion=1, flag=wx.EXPAND|wx.ALL, border=5)
sizer_replicon_ids.Add(TPPIcon(panel, wx.ID_ANY, bmp, "Specify names of each contig within the reference genome separated by commas (if using wig_gb_to_csv.py you must use the contig names in the Genbank file). Only required if there are multiple contigs; can leave blank if there is just one sequence.\nEnter 'auto' for autogenerated ids."), flag=wx.CENTER, border=0)
sizer_replicon_ids.Add((10, 1), 0, wx.EXPAND)
sizer.Add(sizer_replicon_ids,0,wx.EXPAND,0)
@@ -142,7 +142,7 @@ if hasWx:
label5 = wx.StaticText(panel, label='Prefix to use for output filenames (REQUIRED):',size=(340,-1))
sizer5.Add(label5,0,wx.ALIGN_CENTER_VERTICAL,0)
self.base = wx.TextCtrl(panel,value=vars.base,size=(400,30))
- sizer5.Add(self.base, proportion=1.0, flag=wx.EXPAND|wx.ALL, border=5)
+ sizer5.Add(self.base, proportion=1, flag=wx.EXPAND|wx.ALL, border=5)
sizer5.Add(TPPIcon(panel, wx.ID_ANY, bmp, "Select a prefix that will be used when writing output files"), flag=wx.CENTER, border=0)
sizer5.Add((10, 1), 0, wx.EXPAND)
sizer.Add(sizer5,0,wx.EXPAND,0)
@@ -247,7 +247,7 @@ The Mme1 protocol generally assumes reads do NOT include the primer prefix, and
self.bwa_alg = wx.ComboBox(panel,choices=["use algorithm 'aln'", "use algorithm 'mem'"],size=(200,30))
if vars.bwa_alg=='aln': self.bwa_alg.SetSelection(0)
else: self.bwa_alg.SetSelection(1) # default
- sizer0.Add(self.bwa_alg, proportion=0.5, flag=wx.EXPAND|wx.ALL, border=5) ##
+ sizer0.Add(self.bwa_alg, proportion=2, flag=wx.EXPAND|wx.ALL, border=5) ##
self.bwa_alg.Bind(wx.EVT_COMBOBOX, self.OnBwaAlgSelection, id=self.bwa_alg.GetId())
sizer0.Add(TPPIcon(panel, wx.ID_ANY, bmp, "'mem' is considered to do a better job at mapping reads, but 'aln' is available as an alternative."), flag=wx.CENTER, border=0)
sizer0.Add((10, 1), 0, wx.EXPAND)
@@ -271,7 +271,7 @@ The Mme1 protocol generally assumes reads do NOT include the primer prefix, and
self.barseq_select = wx.ComboBox(panel,choices=['this is not a Barseq dataset','read catalog file','write catalog file'],size=(200,30)) ## # does a BoxSizer use wx.HORIZONTAL, not wx.EXPAND?
self.barseq_select.SetSelection(0)
- sizer9.Add(self.barseq_select, proportion=0.5, flag=wx.EXPAND|wx.ALL, border=5) ##
+ sizer9.Add(self.barseq_select, proportion=2, flag=wx.EXPAND|wx.ALL, border=5) ##
self.picker9 = wx.lib.filebrowsebutton.FileBrowseButton(panel, id=wx.ID_ANY, dialogTitle='Please select the Barseq catalog filename', fileMode=wx.FD_OPEN, size=(400,30), startDirectory=os.path.dirname(vars.fq2), initialValue="", labelText='', ) # no need for this: changeCallback=self.OnChanged9 ; initialValue set below ; no file mask
sizer9.Add(self.picker9, proportion=1, flag=wx.EXPAND|wx.ALL, border=5)
=====================================
src/pytpp/tpp_tools.py
=====================================
@@ -686,7 +686,7 @@ def driver(vars):
except ValueError as err:
message("")
- message("%s" % " ".join(err.args))
+ message(err.args)
message("Exiting.")
sys.exit()
@@ -700,11 +700,15 @@ def driver(vars):
message("Done.")
+# gunzip <fastq>.gz file, written to <fastq>
+
def uncompress(filename):
- outfil = open(filename[0:-3], "w+")
+ newfname = filename[0:-3]
+ print("uncompressing %s" % filename)
+ outfil = open(newfname, "wb+")
for line in gzip.open(filename):
outfil.write(line)
- return filename[0:-3]
+ return newfname
def copy_fasta(infile,outfile,maxreads=-1):
a = open(infile)
@@ -724,8 +728,15 @@ def copy_fasta(infile,outfile,maxreads=-1):
def extract_reads(vars):
message("extracting reads...")
+ if vars.fq1.endswith('.gz'):
+ vars.fq1 = uncompress(vars.fq1)
+
+ if vars.fq2.endswith('.gz'):
+ vars.fq2 = uncompress(vars.fq2)
+
flag = ['','']
for idx, name in enumerate([vars.fq1, vars.fq2]):
+ print(name)
if idx==1 and vars.single_end==True: continue
fil = open(name)
for line in fil:
@@ -736,12 +747,6 @@ def extract_reads(vars):
break
fil.close()
- if vars.fq1.endswith('.gz'):
- vars.fq1 = uncompress(vars.fq1)
-
- if vars.fq2.endswith('.gz'):
- vars.fq2 = uncompress(vars.fq2)
-
if(flag[0] == 'FASTQ'):
message("fastq2reads: %s -> %s" % (vars.fq1,vars.reads1))
fastq2reads(vars.fq1,vars.reads1,vars.maxreads)
=====================================
src/pytransit/__init__.py
=====================================
@@ -2,6 +2,6 @@
__all__ = ["transit_tools", "tnseq_tools", "norm_tools", "stat_tools"]
-__version__ = "v3.2.7"
+__version__ = "v3.3.0"
prefix = "[TRANSIT]"
=====================================
src/pytransit/analysis/CGI.py
=====================================
@@ -0,0 +1,439 @@
+import sys
+
+try:
+ import wx
+ WX_VERSION = int(wx.version()[0])
+ hasWx = True
+
+except Exception as e:
+ hasWx = False
+ WX_VERSION = 0
+
+if hasWx:
+ import wx.xrc
+ from wx.lib.buttons import GenBitmapTextButton
+ from pubsub import pub
+ import wx.adv
+
+import os
+import time
+import math
+import random
+import numpy
+import scipy.stats
+import datetime
+import pandas
+
+from pytransit.analysis import base
+import pytransit.transit_tools as transit_tools
+import pytransit.tnseq_tools as tnseq_tools
+import pytransit.norm_tools as norm_tools
+import pytransit.stat_tools as stat_tools
+
+hasR = False
+try:
+ import rpy2.robjects
+ hasR = True
+except Exception as e:
+ hasR = False
+
+if hasR:
+ from rpy2.robjects import r, DataFrame, globalenv, IntVector, FloatVector, StrVector, packages as rpackages
+
+############# Description ##################
+
+short_name = "CGI"
+long_name = "Chemical Genetic Analysis"
+short_desc = "CGI Analysis of CRISPRi libraries"
+long_desc = "CGI Analysis of CRISPRi libraries"
+transposons = []
+
+columns = ["Position","Reads","Genes"] # ???
+
+############# Analysis Method ##############
+
+class CGI(base.TransitAnalysis):
+ def __init__(self):
+ base.TransitAnalysis.__init__(self, short_name, long_name, short_desc, long_desc, transposons, CGI_Method, CGI_GUI, [])
+
+################## FILE ###################
+
+# there is no output file that could be loaded into the GUI
+
+#class HeatmapFile(base.TransitFile):
+#
+# def __init__(self):
+# base.TransitFile.__init__(self, "#CombinedWig", columns)
+#
+# def getHeader(self, path):
+# text = """This is file contains mean counts for each gene. Nzmean is mean accross non-zero sites."""
+# return text
+
+################# GUI ##################
+
+# right now, CGI is just intended for the command-line; TRI
+
+class CGI_GUI(base.AnalysisGUI):
+
+ def __init__(self):
+ base.AnalysisGUI.__init__(self)
+
+########## METHOD #######################
+
+class CGI_Method(base.SingleConditionMethod):
+ def __init__(self):
+ ctrldata=None # initializers for superclass
+ annotation_path=""
+ output_file=""
+ replicates="Sum"
+ normalization="nonorm"
+ LOESS=False
+ ignoreCodon=True
+ NTerminus=0.0
+ CTerminus=0.0
+ wxobj=None
+ # this initialization seems pointless for CGI, but must do this for base class...
+ base.SingleConditionMethod.__init__(self, short_name, long_name, short_desc, long_desc, ctrldata, annotation_path, output_file, replicates=replicates, normalization=normalization, LOESS=LOESS, NTerminus=NTerminus, CTerminus=CTerminus, wxobj=wxobj)
+
+ @classmethod
+ def usage_string(self):
+ return """usage (6 sub-commands):
+ python3 ../src/transit.py CGI extract_counts <fastq file> <ids file> > <counts file>
+ python3 ../src/transit.py CGI create_combined_counts <comma seperated headers> <counts file 1> <counts file 2> ... <counts file n> > <combined counts file>
+ python3 ../src/transit.py CGI extract_abund <combined counts file> <metadata file> <control condition> <sgRNA strength file> <uninduced ATC file> <drug> <days> > <fractional abundundance file>
+ python3 ../src/transit.py CGI run_model <fractional abundundance file> > <CRISPRi DR results file>
+ python3 ../src/transit.py CGI visualize <fractional abundance> <gene> <output figure location>
+ note: redirect output from stdout to output files as shown above"""
+
+
+ @classmethod
+ def fromargs(self, rawargs):
+ if not hasR:
+ print("Error: R and rpy2 (~= 3.0) required to run heatmap.")
+ print("After installing R, you can install rpy2 using the command \"pip install 'rpy2~=3.0'\"")
+ sys.exit(0)
+
+ (args, kwargs) = transit_tools.cleanargs(rawargs)
+ if len(args)<1: print(self.usage_string())
+ self.cmd = args[0]
+ self.args = args[1:]
+ self.kwargs = kwargs
+
+ return self()
+
+ def Run(self):
+ cmd,args,kwargs = self.cmd,self.args,self.kwargs
+
+ if cmd=="extract_counts":
+ if len(args)<2: print(self.usage_string())
+ fastq_file = args[0]
+ ids_file = args[1]
+ self.extract_counts(fastq_file, ids_file)
+
+ elif cmd=="create_combined_counts":
+ if len(args)<2: print(self.usage_string())
+ headers = args[0].split(",")
+ counts_file_list = args[1:]
+ self.create_combined_counts(headers,counts_file_list)
+
+ elif cmd=="extract_abund":
+ if len(args)<7: print(self.usage_string())
+ combined_counts_file = args[0]
+ metadata_file = args[1]
+ control_condition=args[2]
+ sgRNA_strength_file = args[3]
+ no_dep_abund = args[4]
+ drug = args[5]
+ days = args[6]
+ self.extract_abund(combined_counts_file,metadata_file,control_condition,sgRNA_strength_file,no_dep_abund,drug,days)
+ elif cmd == "run_model":
+ if len(args)<1: print(self.usage_string())
+ ifile_path = args[0] #example frac_abund_RIF_D5.txt
+ self.run_model(ifile_path)
+ elif cmd == "visualize":
+ if len(args)<3: print(self.usage_string())
+ frac_abund_file= args[0]
+ gene = args[1]
+ fig_location = args[2]
+ self.visualize(frac_abund_file, gene, fig_location)
+
+ else: print(self.usage_string())
+
+ def reverse_complement(self, seq):
+ complement = {'A':'T','T':'A','C':'G','G':'C'}
+ s = list(seq)
+ s.reverse()
+ for i in range(len(s)):
+ s[i] = complement.get(s[i],s[i]) # if unknown, leave as it, e.g > or !
+ s = ''.join(s)
+ return s
+
+ def extract_counts(self, fastq_file, ids_file):
+ IDs = []
+ barcodemap = {} # hash from barcode to full ids
+ for line in open(ids_file):
+ w = line.rstrip().split('\t')
+ id = w[0]
+ v = id.split('_')
+ if len(v)<3: continue
+ barcode = v[2]
+ IDs.append(id)
+ # reverse-complement of barcodes appears in reads, so hash them that way
+ barcodemap[self.reverse_complement(barcode)] = id
+
+ counts = {}
+
+ #A,B = "AGCTTCTTTCGAGTACAAAAAC","TCCCAGATTATATCTATCACTGA"
+ A,B = "GTACAAAAAC","TCCCAGATTA"
+ lenA = len(A)
+ cnt,nreads,recognized = 0,0,0
+ for line in open(fastq_file):
+ cnt += 1
+ if cnt%4==2:
+ nreads += 1
+ if (nreads%1000000==0): sys.stderr.write("reads=%s, recognized barcodes=%s (%0.1f%%)\n" % (nreads,recognized,100.*recognized/float(nreads)))
+ seq = line.rstrip()
+ a = seq.find(A)
+ if a==-1: continue
+ b = seq.find(B)
+ if b==-1: continue
+ sz = b-(a+lenA)
+ if sz<10 or sz>30: continue
+ barcode = seq[a+lenA:b] # these are reverse-complements, but rc(barcodes) stored in hash too
+ if barcode not in barcodemap: continue
+ id = barcodemap[barcode]
+ if id not in counts: counts[id] = 0
+ counts[id] += 1
+ recognized += 1
+
+ for id in IDs:
+ vals = [id,counts.get(id,0)]
+ print('\t'.join([str(x) for x in vals]))
+
+
+ def create_combined_counts(self,headers, counts_list):
+ import pandas as pd
+ df_list =[]
+ for f in counts_list:
+ sys.stderr.write("Adding in file # %s \n"%f)
+ counts_df = pd.read_csv(f, sep="\t")
+ counts_df["sgRNA"]=counts_df[counts_df.columns[0]].str.split("_v", expand=True)[0]
+ counts_df = counts_df.drop(columns=[counts_df.columns[0]])
+ counts_df.set_index("sgRNA",inplace=True)
+ df_list.append(counts_df)
+ combined_df = pd.concat(df_list, axis=1)
+ combined_df.columns = headers
+ combined_df_text = combined_df.to_csv(sep="\t")
+ sys.stderr.write("Number of sgRNAs in combined counts file (present in all counts files): %d \n"%len(combined_df))
+ print(combined_df_text)
+
+
+ def extract_abund(self,combined_counts_file,metadata_file,control_condition,sgRNA_strength_file,no_dep_abund,drug,days,PC=1e-8):
+ import pandas as pd
+
+ metadata = pd.read_csv(metadata_file, sep="\t")
+ metadata = metadata[((metadata["drug"]==drug) | (metadata["drug"]==control_condition)) & (metadata["days_predepletion"]==int(days))]
+ if(len(metadata)==0):
+ sys.stderr.write("This combination of conditions does not exist in your metadata file. Please select one that does")
+ sys.exit(0)
+ elif (drug not in metadata["drug"].values.tolist()):
+ sys.stderr.write("%s is not found in your metadata. Add the drug's information in the metadata file or select a different drug"%drug)
+ sys.exit(0)
+ elif (int(days) not in metadata["days_predepletion"].values.tolist()):
+ sys.stderr.write("%d is not found in your metadata days of predepletion column. Add the day's information in the metadata file or select a different day"%days)
+ sys.exit(0)
+ elif (control_condition not in metadata["drug"].values.tolist()):
+ sys.stderr.write("%s is not found in your metadata. Add the corresponding information in the metadata file or select a different control"%control_condition)
+ sys.exit(0)
+ metadata = metadata.sort_values(by=["conc_xMIC"])
+ column_names = metadata["column_name"].values.tolist()
+ concs_list = metadata["conc_xMIC"].values.tolist()
+
+ print("# Condition Tested : "+drug+" D"+days)
+ headers = []
+ combined_counts_df = pd.read_csv(combined_counts_file,sep="\t", index_col=0)
+ combined_counts_df = combined_counts_df[column_names]
+
+ if(len(combined_counts_df.columns)==0):
+ sys.stderr.write("The samples assocaited with the selected drugs do not exist in your combined counts file. Please select one that does and check your metadata file has corresponding column names")
+ sys.exit(0)
+ elif(len(combined_counts_df.columns)<len(metadata)):
+ sys.stderr.write("WARNING: Not all of the samples from the metadata based on this criteron have a column in the combined counts file")
+
+ sgRNA_strength = pd.read_csv(sgRNA_strength_file,sep="\t", index_col=0)
+ sgRNA_strength = sgRNA_strength.iloc[:,-1:]
+ sgRNA_strength.columns = ["sgRNA strength"]
+ sgRNA_strength["sgRNA"] = sgRNA_strength.index
+ sgRNA_strength["sgRNA"]=sgRNA_strength["sgRNA"].str.split("_v", expand=True)[0]
+ sgRNA_strength.set_index("sgRNA",inplace=True)
+
+ no_dep_df = pd.read_csv(no_dep_abund, sep="\t", index_col=0, header=None)
+ no_dep_df = no_dep_df.iloc[:,-1:]
+ no_dep_df.columns = ["uninduced ATC values"]
+ no_dep_df["uninduced ATC values"] = no_dep_df["uninduced ATC values"]/ no_dep_df["uninduced ATC values"].sum()
+ no_dep_df["sgRNA"] = no_dep_df.index
+ no_dep_df["sgRNA"]=no_dep_df["sgRNA"].str.split("_v", expand=True)[0]
+ no_dep_df.set_index("sgRNA",inplace=True)
+
+ abund_df = pd.concat([sgRNA_strength, no_dep_df,combined_counts_df], axis=1)
+ abund_df= abund_df[~(abund_df.index.str.contains("Negative") | abund_df.index.str.contains("Empty"))]
+ sys.stderr.write("Disregarding Empty or Negative sgRNAs\n")
+ sys.stderr.write("%d sgRNAs are all of the following files : sgRNA strength metadata, uninduced ATC counts file, combined counts file\n"%len(abund_df))
+
+ headers = ["sgRNA strength","uninduced ATC values"]
+ for i,col in enumerate(column_names):
+ abund_df[col] = abund_df[col]/abund_df[col].sum()
+ abund_df[col] = (abund_df[col]+PC)/(abund_df["uninduced ATC values"]+PC)
+ headers.append(str(concs_list[i])+"_"+str(i))
+ print("# "+str(concs_list[i])+" conc_xMIC"+" - "+col)
+
+ abund_df.columns = headers
+ abund_df["sgRNA"] = abund_df.index.values.tolist()
+ abund_df[["orf-gene","remaining"]] = abund_df["sgRNA"].str.split('_',n=1,expand=True)
+ abund_df[["orf","gene"]]= abund_df["orf-gene"].str.split(':',expand=True)
+ abund_df = abund_df.drop(columns=["orf-gene","remaining","sgRNA"])
+ abund_df = abund_df.dropna()
+
+ abund_df.insert(0, "sgRNA strength", abund_df.pop("sgRNA strength"))
+ abund_df.insert(0, "uninduced ATC values", abund_df.pop("uninduced ATC values"))
+ abund_df.insert(0, 'gene', abund_df.pop('gene'))
+ abund_df.insert(0, 'orf', abund_df.pop('orf'))
+
+ abund_df_text = abund_df.to_csv(sep="\t")
+ print(abund_df_text)
+
+
+ #####################################################
+
+ # derived from logsigmoidfit.R
+ # see heatmap.py for example of how to put data in a pandas.DataFrame and call an R function like make_heatmapFunc()
+
+ def run_model(self, frac_abund_file):
+ import pandas as pd
+ import numpy as np
+ from mne.stats import fdr_correction
+ import statsmodels.api as sm
+
+ frac_abund_df = pd.read_csv(frac_abund_file, sep="\t",comment='#')
+
+ drug_output = []
+ for i,gene in enumerate(set(frac_abund_df["gene"])):
+ #print(i,gene)
+ sys.stderr.write("Analyzing Gene # %d \n"%i)
+ gene_df = frac_abund_df[frac_abund_df["gene"]==gene]
+ orf = gene_df["orf"].iloc[0]
+ gene_df = gene_df.drop(columns=["orf","gene","uninduced ATC values"])
+
+ melted_df = gene_df.melt(id_vars=["sgRNA","sgRNA strength"],var_name="conc",value_name="abund")
+ melted_df["conc"] = melted_df["conc"].str.split("_", expand=True)[0].astype(float)
+ min_conc = min(melted_df[melted_df["conc"]>0]["conc"])
+ melted_df.loc[melted_df["conc"]==0,"conc"] = min_conc/2
+ melted_df["abund"] = [0.01+(1-0.01)*(1-np.exp(-2*float(i)))/(1+np.exp(-2*float(i))) for i in melted_df["abund"]]
+ melted_df["logsig abund"] = [np.nan if (1-x)== 0 else np.log10(float(x)/(1-float(x))) for x in melted_df["abund"]]
+ melted_df["log conc"] = [np.log2(float(x)) for x in melted_df["conc"]]
+
+
+ melted_df = melted_df.dropna()
+ if len(melted_df.index)<2:
+ drug_output.append([orf,gene,len(gene_df)]+[np.nan]*6)
+ continue
+
+ Y = melted_df["logsig abund"]
+ X = melted_df.drop(columns=["abund", "logsig abund", "sgRNA", "conc"])
+ X = sm.add_constant(X)
+ model = sm.OLS(Y,X)
+ results = model.fit()
+ coeffs = results.params
+ pvals = results.pvalues
+ drug_output.append([orf,gene,len(gene_df)]+coeffs.values.tolist()+pvals.values.tolist())
+ sys.stderr.flush()
+
+ drug_out_df = pd.DataFrame(drug_output, columns=["Orf","Gene","Nobs", "intercept","ceofficient sgRNA_strength","ceofficient concentration dependence","pval intercept","pval pred_logFC","pval concentration dependence"])
+
+ mask = np.isfinite(drug_out_df["pval concentration dependence"])
+ pval_corrected = np.full(drug_out_df["pval concentration dependence"].shape, np.nan)
+ pval_corrected[mask] = fdr_correction(drug_out_df["pval concentration dependence"][mask])[1]
+ drug_out_df["qval concentration dependence"] = pval_corrected
+ drug_out_df = drug_out_df.replace(np.nan,1)
+
+ drug_out_df["Z"] = (drug_out_df["ceofficient concentration dependence"] - drug_out_df["ceofficient concentration dependence"].mean())/drug_out_df["ceofficient concentration dependence"].std()
+ drug_out_df["Siginificant Interactions"] = [0] * len(drug_out_df)
+ drug_out_df.loc[(drug_out_df["qval concentration dependence"]<0.05) & (drug_out_df["Z"]<-2),"Siginificant Interactions"]=-1
+ drug_out_df.loc[(drug_out_df["qval concentration dependence"]<0.05) & (drug_out_df["Z"]>2),"Siginificant Interactions"]=1
+ drug_out_df.insert(0, "Siginificant Interactions", drug_out_df.pop("Siginificant Interactions"))
+
+ n = len(drug_out_df[drug_out_df["Siginificant Interactions"]!=0])
+ depl_n = len(drug_out_df[drug_out_df["Siginificant Interactions"]== -1])
+ enrich_n = len(drug_out_df[drug_out_df["Siginificant Interactions"]==1])
+ sys.stderr.write("%d Total Siginificant Gene Interactions\n"%n)
+ sys.stderr.write("%d Siginificant Gene Depletions\n"%depl_n)
+ sys.stderr.write("%d Siginificant Gene Enrichments\n"%enrich_n)
+
+ drug_out_df = drug_out_df.replace(r'\s+',np.nan,regex=True).replace('',np.nan)
+ drug_out_txt = drug_out_df.to_csv(sep="\t", index=False)
+ print(drug_out_txt)
+
+ def visualize(self,fractional_abundances_file, gene, fig_location):
+ import pandas as pd
+ import seaborn as sns
+ import matplotlib.pyplot as plt
+ import matplotlib as mpl
+ import numpy as np
+ import statsmodels.api as sm
+
+ abund_df = pd.read_csv(fractional_abundances_file,sep="\t", comment="#")
+ with open(fractional_abundances_file) as f:
+ first_line = f.readline()
+ condition = first_line.split(" : ")[1]
+
+ abund_df = abund_df[(abund_df["gene"]==gene)| (abund_df["orf"]==gene)]
+ if len(abund_df)==0:
+ sys.stderr.write("Gene not found : %d \n"%idx)
+ sys.exit(0)
+ abund_df = abund_df.reset_index(drop=True)
+ all_slopes = []
+
+ df_list = []
+ for idx,row in abund_df.iterrows():
+ sys.stderr.write("Fitting sgRNA # : %d \n"%idx)
+ raw_Y= row[5:].values
+ Y = [max(0.01,x) for x in raw_Y]
+ Y = [np.log10(x) for x in Y]
+
+ X = abund_df.columns[5:]
+ X = [float(i.split("_")[0]) for i in X]
+ min_conc = min([i for i in X if i>0])
+ X = [min_conc/2 if i==0 else i for i in X ]
+ X = [np.log2(float(x)) for x in X]
+
+ data = pd.DataFrame({"Log (Concentration)":X, "Log (Relative Abundance)":Y})
+ X = pd.DataFrame({"log concentration":X})
+ X_in = sm.add_constant(X, has_constant='add')
+ results = sm.OLS(Y,X_in).fit()
+ all_slopes.append(results.params[1])
+ data["sgRNA strength"] = [row["sgRNA strength"]] * len(data)
+ data["slope"] = [results.params[1]] * len(data)
+ df_list.append(data)
+
+
+ plot_df = pd.concat(df_list)
+ plt.figure()
+ cmap = mpl.colors.LinearSegmentedColormap.from_list("", ["#8ecae6","#219ebc","#023047","#ffb703","#fb8500"], N=len(abund_df))
+ palette = [mpl.colors.rgb2hex(cmap(i)) for i in range(cmap.N)]
+ #print("-----------", bo_palette.as_hex())
+ g = sns.lmplot(data=plot_df, x='Log (Concentration)', y='Log (Relative Abundance)', hue="sgRNA strength", palette=palette, legend=False,ci=None, scatter=False, line_kws={"lw":0.75})
+
+ sm1 = mpl.cm.ScalarMappable(norm=mpl.colors.Normalize(vmin=plot_df['sgRNA strength'].min(), vmax=0, clip=False), cmap=cmap)
+ g.figure.colorbar(sm1, shrink=0.8, aspect=50, label="sgRNA strength")
+ g.set(ylim=(-2.5, 1.0))
+ plt.gca().set_title(gene+"\n"+condition, wrap=True)
+ plt.tight_layout()
+ plt.savefig(fig_location)
+################################
+
+if __name__ == "__main__":
+
+ G = CGI.fromargs(sys.argv[1:])
+ G.Run()
+
+
=====================================
src/pytransit/analysis/__init__.py
=====================================
@@ -25,6 +25,7 @@ from pytransit.analysis import tnseq_stats
from pytransit.analysis import corrplot
from pytransit.analysis import heatmap
from pytransit.analysis import ttnfitness
+from pytransit.analysis import CGI
methods = {}
methods["example"] = example.ExampleAnalysis()
@@ -48,6 +49,7 @@ methods["tnseq_stats"]=tnseq_stats.TnseqStats()
methods["corrplot"]=corrplot.Corrplot()
methods["heatmap"]=heatmap.Heatmap()
methods["ttnfitness"]=ttnfitness.TTNFitnessAnalysis()
+methods["CGI"]=CGI.CGI()
# EXPORT METHODS
from pytransit.analysis import norm
=====================================
src/pytransit/analysis/zinb.py
=====================================
@@ -267,15 +267,20 @@ class ZinbMethod(base.MultiConditionMethod):
]
def def_r_zinb_signif(self):
- r('''
- zinb_signif = function(df,
+ r("""
+ zinb_signif = function(
+ df,
zinbMod1,
zinbMod0,
nbMod1,
- nbMod0, DEBUG = F) {
+ nbMod0,
+ DEBUG = F
+ ) {
+ print("Starting ZINB in R")
suppressMessages(require(pscl))
suppressMessages(require(MASS))
melted = df
+ #print(head(melted))
# filter out genes that have low saturation across all conditions, since pscl sometimes does not fit params well (resulting in large negative intercepts and high std errors)
NZpercs = aggregate(melted$cnt,by=list(melted$cond),FUN=function(x) { sum(x>0)/length(x) })
@@ -288,7 +293,7 @@ class ZinbMethod(base.MultiConditionMethod):
for (i in 1:length(sums[,1])) {
subset = melted[melted$cond==sums[i,1],]
newvec = subset[1,]
- newvec$cnt = 1 # note: NZmean and NZperc are copied from last dataset in condition
+ newvec$cnt = 1 # note: non_zero_mean and NZperc are copied from last dataset in condition
#newvec$cnt = as.integer(mean(subset$cnt))+1 # add the mean for each condition as a pseudocount
melted = rbind(melted,newvec) }
}
@@ -349,9 +354,10 @@ class ZinbMethod(base.MultiConditionMethod):
# this gives same answer, but I would need to extract the Pvalue...
#require(lmtest)
#print(lrtest(mod1,mod0))
+ print("Finished ZINB in R")
return (c(pval, status))
}
- ''')
+ """)
return globalenv['zinb_signif']
@@ -453,6 +459,12 @@ class ZinbMethod(base.MultiConditionMethod):
melted = DataFrame(df_args)
# r_args = [IntVector(readCounts), StrVector(condition), melted, map(lambda x: StrVector(x), covars), FloatVector(NZmean), FloatVector(logitZPerc)] + [True]
debugFlag = True if DEBUG or GENE else False
+ #print(f'''melted =''', str(melted))
+ print("zinbMod1", str(zinbMod1))
+ print("zinbMod0", str(zinbMod0))
+ print("nbMod1", str(nbMod1))
+ print("nbMod0", str(nbMod0))
+ print("debugFlag", str(debugFlag))
pval, msg = r_zinb_signif(melted, zinbMod1, zinbMod0, nbMod1, nbMod0, debugFlag)
status.append(msg)
pvals.append(float(pval))
@@ -525,8 +537,18 @@ class ZinbMethod(base.MultiConditionMethod):
condition_name = self.condition
# if a covar is not found, this crashes; check for it?
# read it first with no condition specified, to get original Condition names
- conditionsByFile1, covariatesByFileList1, interactionsByFileList1, orderingMetadata1 = tnseq_tools.read_samples_metadata(self.metadata, self.covars, self.interactions) # without specifiying condition
- conditionsByFile, covariatesByFileList, interactionsByFileList, orderingMetadata = tnseq_tools.read_samples_metadata(self.metadata, self.covars, self.interactions, condition_name=condition_name)
+ # (
+ # conditionsByFile1, # use this when the exclude condition arg
+ # covariatesByFileList1,
+ # interactionsByFileList1,
+ # orderingMetadata1
+ # ) = tnseq_tools.read_samples_metadata(self.metadata, self.covars, self.interactions) # without specifiying condition
+ (
+ conditionsByFile,
+ covariatesByFileList,
+ interactionsByFileList,
+ orderingMetadata
+ ) = tnseq_tools.read_samples_metadata(self.metadata, self.covars, self.interactions, condition_name=condition_name)
## [Condition] in the order of files in combined wig
conditions = self.wigs_to_conditions(
@@ -575,6 +597,8 @@ class ZinbMethod(base.MultiConditionMethod):
pairs = []
print("\nsamples in cross-product:")
any_empty = self.expandVar([],vars,varsByFileList,vars2vals,set(samples_used))
+ print(f'''varsByFileList = {varsByFileList}''')
+ print(f'''vars2vals = {vars2vals}''')
if any_empty: print("warning: ZINB requires samples in all combinations of conditions; the fact that one is empty could result in Model Errors")
genes = tnseq_tools.read_genes(self.annotation_path)
=====================================
src/pytransit/data.ignore/macrophages.comwig
=====================================
The diff for this file was not included because it is too large.
=====================================
src/pytransit/data.ignore/macrophages_metadata.txt
=====================================
@@ -0,0 +1,30 @@
+Id Condition Filename
+Input-1 Input /pacific/home/ioerger/TRASH/macrophages/Input-1_raw_templates.wig
+Input-2 Input /pacific/home/ioerger/TRASH/macrophages/Input-2_raw_templates.wig
+Input-3 Input /pacific/home/ioerger/TRASH/macrophages/Input-3_raw_templates.wig
+Input-4 Input /pacific/home/ioerger/TRASH/macrophages/Input-4_raw_templates.wig
+Untreated1 Untreated /pacific/home/ioerger/TRASH/macrophages/Untreated1_raw_templates.wig
+Untreated2 Untreated /pacific/home/ioerger/TRASH/macrophages/Untreated2_raw_templates.wig
+Untreated3 Untreated /pacific/home/ioerger/TRASH/macrophages/Untreated3_raw_templates.wig
+Tcell1 Tcell /pacific/home/ioerger/TRASH/macrophages/Tcell1_raw_templates.wig
+Tcell2 Tcell /pacific/home/ioerger/TRASH/macrophages/Tcell2_raw_templates.wig
+Tcell3 Tcell /pacific/home/ioerger/TRASH/macrophages/Tcell3_raw_templates.wig
+Tcell4 Tcell /pacific/home/ioerger/TRASH/macrophages/Tcell4_raw_templates.wig
+Rapamycin1 Rapa /pacific/home/ioerger/TRASH/macrophages/Rapamycin1_raw_templates.wig
+Rapamycin2 Rapa /pacific/home/ioerger/TRASH/macrophages/Rapamycin2_raw_templates.wig
+Rapamycin3 Rapa /pacific/home/ioerger/TRASH/macrophages/Rapamycin3_raw_templates.wig
+IL1-1 IL1 /pacific/home/ioerger/TRASH/macrophages/IL1-1_raw_templates.wig
+IL1-2 IL1 /pacific/home/ioerger/TRASH/macrophages/IL1-2_raw_templates.wig
+IL1-3 IL1 /pacific/home/ioerger/TRASH/macrophages/IL1-3_raw_templates.wig
+IFNg1 IFNg /pacific/home/ioerger/TRASH/macrophages/IFNg1_raw_templates.wig
+IFNg2 IFNg /pacific/home/ioerger/TRASH/macrophages/IFNg2_raw_templates.wig
+IFNg3 IFNg /pacific/home/ioerger/TRASH/macrophages/IFNg3_raw_templates.wig
+GMCSF1 GMCSF /pacific/home/ioerger/TRASH/macrophages/GMCSF1_raw_templates.wig
+GMCSF2 GMCSF /pacific/home/ioerger/TRASH/macrophages/GMCSF2_raw_templates.wig
+GMCSF3 GMCSF /pacific/home/ioerger/TRASH/macrophages/GMCSF3_raw_templates.wig
+IFNg-GMCSF1 IFNg-GMCSF /pacific/home/ioerger/TRASH/macrophages/IFNg-GMCSF1_raw_templates.wig
+IFNg-GMCSF2 IFNg-GMCSF /pacific/home/ioerger/TRASH/macrophages/IFNg-GMCSF2_raw_templates.wig
+IFNg-GMCSF3 IFNg-GMCSF /pacific/home/ioerger/TRASH/macrophages/IFNg-GMCSF3_raw_templates.wig
+IFNab1 IFNab /pacific/home/ioerger/TRASH/macrophages/IFNab1_raw_templates.wig
+IFNab2 IFNab /pacific/home/ioerger/TRASH/macrophages/IFNab2_raw_templates.wig
+IFNab3 IFNab /pacific/home/ioerger/TRASH/macrophages/IFNab3_raw_templates.wig
=====================================
src/pytransit/data/CGI/IDs.H37Rv.CRISPRi.lib.txt
=====================================
The diff for this file was not included because it is too large.
=====================================
src/pytransit/data/CGI/counts_metadata.txt
=====================================
@@ -0,0 +1,13 @@
+column_name drug conc_xMIC days_predepletion
+DMSO_D1_rep1 DMSO 0 1
+DMSO_D1_rep2 DMSO 0 1
+DMSO_D1_rep3 DMSO 0 1
+RIF_D1_Low_rep1 RIF 0.0625 1
+RIF_D1_Low_rep2 RIF 0.0625 1
+RIF_D1_Low_rep3 RIF 0.0625 1
+RIF_D1_Med_rep1 RIF 0.125 1
+RIF_D1_Med_rep2 RIF 0.125 1
+RIF_D1_Med_rep3 RIF 0.125 1
+RIF_D1_High_rep1 RIF 0.25 1
+RIF_D1_High_rep2 RIF 0.25 1
+RIF_D1_High_rep3 RIF 0.25 1
\ No newline at end of file
=====================================
src/pytransit/data/CGI/sgRNA_metadata.txt
=====================================
The diff for this file was not included because it is too large.
=====================================
src/pytransit/data/CGI/uninduced_ATC_counts.txt
=====================================
The diff for this file was not included because it is too large.
=====================================
src/pytransit/doc/source/CGI.rst
=====================================
@@ -0,0 +1,247 @@
+.. rst-class:: transit_clionly
+
+.. _cgi:
+
+
+CRISPRi-DR
+==========
+CRISPRi-DR is designed to analyze CRISPRi libraries from CGI experiments and identify significant CGIs ie genes that affect sensitivity to the drug when depleted.
+[REF: TBA]
+
+
+Workflow
+--------
+Starting with fastq files, barcode counts are extracted. The user creates their own metadata file, for the counts. Fractional abundances are and used to run the CRISPRi-DR model. The output of this model is a file that lists genes with their statistacal parameters and significance. Genes with significant interactions are those with qval of condetration dependence < 0.05 and \|Z score of concentration dpendence|>2 on the slope coefficient. However, genes can be ranked by depletion by sorting the coefficient of concentration dependence in ascending order
+
+
+.. image:: _images/CGI_workflow.png
+ :width: 1000
+ :alt: Alternative text
+
+**Preprocessing: Fastq to Count Files**
+
+This is a longer process, taking a few minutes each. However, the number of reads processed is printed to the console to indicate progress.
+
+::
+
+ > python3 ../src/transit.py CGI extract_counts <fastq_file> <ids_file> > <counts_file>
+
+* ids_file : List of sgRNAs used in the experiment, where each row is one sgRNA id.
+ * For H37Rv experiments, the ids file is available in : ``transit/src/pytransit/data/CGI/IDs.H37Rv.CRISPRi.lib.txt``
+
+
+**Step 1: Combine Individual Counts File to a Combined Counts File**
+This is a fairly fast process. It takes at most a minute for the combination of 12 files with 2 columns (sgRNA id and counts) to one large file of 13 columns (first column sgRNA id and remaining columns are counts from the files entered).
+
+::
+
+ > python3 ../src/transit.py CGI combined_counts <comma seperated headers> <counts file 1> <counts file 2> ... <counts_file n> > <combined counts file>
+
+* counts files : sgRNA ids as their first column, and can have any number of columns.
+* comma-separated headers: the column names of the combined counts file
+ .. note::
+ the comma-seperated headers must be in order of the columns in the count files(s) provided
+
+
+**Step 2: Extract Fractional Abundances**
+
+ This is a relatively quick process, taking less than a minute. This step is to turn the barcodes counts into relative normalized abundances. Counts are normalized within samples and calculated relative to the abundances in the uninduced ATC file, essentially fractions. The first few lines of the output file contains information about the counts files processed.
+
+::
+
+ > python3 ../src/transit.py CGI extract_abund <combined counts file> <counts metadata file> <control condition> <sgRNA strengths file> <uninduced ATC file> <drug> <days> > <fractional abundance file>
+
+* counts metadata file (USER created):
+
+ * The columns expected in this file: column_name, drug, conc_xMIC, days_predepletion
+
+ * column_name: the corresponding header name(s) in the combined counts file
+ * conc_xMIC is the concentration of the drug the sample is treated with
+ .. warning::
+ conc_xMIC must be a numerical value, ie. 0.5 and not a categorical value such as "low" or "high"
+ * Equal number of replicates for all concentrations are not nessessary
+ * see [Li, S et al. 2022, PMID: 35637331] for explanation of days_predepletion
+
+ * Example metadata: ``transit/src/pytransit/data/CGI/counts_metadata.txt``
+
+* control condition: The condition to to be considered the control for these set of experiments, as specificed in the "drug" column of the metadata file; typically an atc-induced (+ ATC) with 0 drug concentration condition.
+
+* sgRNA strengths file: A file that contains metadata for each sgRNA in the combined counts file, where the first column must be sgRNA id (as seen in the combined counts file) and the last column must be the strength measurement of the sgRNAs (in publication of this method, sgRNA strength is measurement as extrapolated LFCs calculated through a passaging experiment).
+
+* uninduced ATC file: A two column file of sgRNAs and their counts in uninduced ATC (no ATC) with 0 drug concentration
+
+* drug : Name of the drug in the "drug" column of the metadata file passed in to be fit in the model
+
+* days: Sampled from predepletion day as listed in the "days_predepletion" column of the metadata file to be used in the analysis
+
+
+**Step 3: Run the CRISPRi-DR model**
+
+This is a relatively quick process, taking at most 3 minutes for a dataset of ~90,000 sgRNAs . This step fits the CRISPRi-DR model (statistical analysis of concentration dependence for each gene) to each gene in the file and prints each output to the <CRISPRi-DR results file> in a tab seperated file.
+::
+
+ > python3 ../src/transit.py CGI run_model <fractional abundance file> > <CRISPRi-DR results file>
+
+* Siginificant interacting genes are those with adjusted P-val (Q-val) < 0.05 and \|Z slope\| > 2, these are indicated by a "-1" for depleted and "1" for enriched in in the "Significant Interactions" column
+
+.. note::
+ When the file is sorted on the slope of concentration dependence, the user can rank the genes based on amount of depletion.
+
+
+**Visualize Specific Genes**
+
+This process is fairly quick, taking less than a minute to run. This figure visualizes the amount of depletion in a gene at the sgRNA level. If control concentration provided is 0, the lowest value on the x-axis in the plot refers to this concentration (due to taking log concentration, 0 concentration is treated as a teo fold lower than the lowest concentration.) The slope of relative abundance (fraction of abundance of counts in ATC induced vs. ATC uninduced) versus log2(concentration) for each sgRNA is calculated and plotted, colored by sgRNA strenght based on a blue-orange gradient (as seen here):
+
+.. image:: _images/RVBD3645_lmplot.png
+ :width: 400
+ :alt: Alternative text
+
+::
+
+ > python3 ../src/transit.py CGI visualize <fractional abundance file> <gene> <output plot location>
+
+* fractional abundance file : Fractional abundance file as created in Step 2.
+
+ .. warning::
+ This visualization assumes the columns are in increasing order of concentration, with the first three abundance columns (after the column "sgRNA strength"), as the control. This order depends on the order of columns during the creation of the combined counts file in Step 1.
+
+* gene : select a gene to visualize. Use orf or gene name
+* output plot location : The location where to save the generated plot.
+
+.. note::
+ If comparing plots from different genes, note the scale of sgRNA strength shown in the plots.
+
+
+Tutorial
+-------
+**Data : Obtain FastQ files from NCBI using the following run numbers**
+
+Fetch and process the following fastq files from NCBI using the SRA toolkit and place them in the ``transit/src/pytransit/data/CGI`` directory :
+
+* FastQ files for the 3 replicates of control samples in this experiment. They are in a ATC-induced 0 drug concentration DMSO library with 1 day predepletion
+
+ * SRR14827863 -> extracts SRR14827863_1.fastq
+ * SRR14827862 -> extracts SRR14827862_1.fastq
+ * SRR14827799 -> extracts SRR14827799_1.fastq
+
+* FastQ files for 3 replicates of high concentration RIF in a 1 day pre-depletion library
+
+ * SRR14827727 -> extracts SRR14827727_1.fastq
+ * SRR14827861 -> extracts SRR14827861_1.fastq
+ * SRR14827850 -> extracts SRR14827850_1.fastq
+* FastQ files for 3 replicates of medium concentration RIF in a 1 day pre-depletion library
+
+ * SRR14827760 -> extracts SRR14827760_1.fastq
+ * SRR14827749 -> extracts SRR14827749_1.fastq
+ * SRR14827738 -> extracts SRR14827738_1.fastq
+
+* FastQ files for 3 replicates of low concentration RIF in a 1 day pre-depletion library
+
+ * SRR14827769 -> extracts SRR14827769_1.fastq
+ * SRR14827614 -> extracts SRR14827614_1.fastq
+ * SRR14827870 -> extracts SRR14827870_1.fastq
+
+This tutorial shows commands relative to this directory. Other files in the ``transit/src/pytransit/data/CGI`` directory are:
+
+* counts_metadata.txt - describes the samples
+* sgRNA_metadata.txt - contains extrapolated LFCs for each sgRNA
+* uninduced_ATC_counts.txt - counts for uninduced ATC (no induction of target depletion) library
+* IDs.H37Rv.CRISPRi.lib.txt - ids of the sgRNAs that target the genes in H37Rv used in these experiments
+
+
+
+**Preprocessing: Fastq to Count Files**
+
+Create file of barcode counts from fastq files. Each fastq files reflect one replicate of a drug concentration, thus each will be converted into a file with two columns, sgNRA id and barcode counts
+
+::
+
+ > python3 ../../../transit.py CGI extract_counts RIF_fastq_files/SRR14827863_1.fastq IDs.H37Rv.CRISPRi.lib.txt > DMSO_D1_rep1.counts
+ > python3 ../../../transit.py CGI extract_counts RIF_fastq_files/SRR14827862_1.fastq IDs.H37Rv.CRISPRi.lib.txt > DMSO_D1_rep2.counts
+ > python3 ../../../transit.py CGI extract_counts RIF_fastq_files/SRR14827799_1.fastq IDs.H37Rv.CRISPRi.lib.txt > DMSO_D1_rep3.counts
+
+ > python3 ../../../transit.py CGI extract_counts RIF_fastq_files/SRR14827769_1.fastq IDs.H37Rv.CRISPRi.lib.txt > RIF_D1_Low_rep1.counts
+ > python3 ../../../transit.py CGI extract_counts RIF_fastq_files/SRR14827614_1.fastq IDs.H37Rv.CRISPRi.lib.txt > RIF_D1_Low_rep2.counts
+ > python3 ../../../transit.py CGI extract_counts RIF_fastq_files/SRR14827870_1.fastq IDs.H37Rv.CRISPRi.lib.txt > RIF_D1_Low_rep3.counts
+
+ > python3 ../../../transit.py CGI extract_counts RIF_fastq_files/SRR14827760_1.fastq IDs.H37Rv.CRISPRi.lib.txt > RIF_D1_Med_rep1.counts
+ > python3 ../../../transit.py CGI extract_counts RIF_fastq_files/SRR14827749_1.fastq IDs.H37Rv.CRISPRi.lib.txt > RIF_D1_Med_rep2.counts
+ > python3 ../../../transit.py CGI extract_counts RIF_fastq_files/SRR14827738_1.fastq IDs.H37Rv.CRISPRi.lib.txt > RIF_D1_Med_rep3.counts
+
+ > python3 ../../../transit.py CGI extract_counts RIF_fastq_files/SRR14827727_1.fastq IDs.H37Rv.CRISPRi.lib.txt > RIF_D1_High_rep1.counts
+ > python3 ../../../transit.py CGI extract_counts RIF_fastq_files/SRR14827861_1.fastq IDs.H37Rv.CRISPRi.lib.txt > RIF_D1_High_rep2.counts
+ > python3 ../../../transit.py CGI extract_counts RIF_fastq_files/SRR14827850_1.fastq IDs.H37Rv.CRISPRi.lib.txt > RIF_D1_High_rep3.counts
+
+
+
+**Step 1: Combine Counts Files to a Combined Counts File**
+
+Combine the 12 seperate counts files into one combined counts file. Here we put the control samples first (DMSO) and then the drug-treated libraries (RIF) in increasing concentration
+
+::
+
+ > python3 ../../../transit.py CGI create_combined_counts DMSO_D1_rep1,DMSO_D1_rep2,DMSO_D1_rep3,RIF_D1_Low_rep1,RIF_D1_Low_rep2,RIF_D1_Low_rep3,RIF_D1_Med_rep1,RIF_D1_Med_rep2,RIF_D1_Med_rep3,RIF_D1_High_rep1,RIF_D1_High_rep2,RIF_D1_High_rep3 DMSO_D1_rep1.counts DMSO_D1_rep2.counts DMSO_D1_rep3.counts RIF_D1_Low_rep1.counts RIF_D1_Low_rep2.counts RIF_D1_Low_rep3.counts RIF_D1_Med_rep1.counts RIF_D1_Med_rep2.counts RIF_D1_Med_rep3.counts RIF_D1_High_rep1.counts RIF_D1_High_rep2.counts RIF_D1_High_rep3.counts > RIF_D1_combined_counts.txt
+
+The resulting file will have 13 columns, where the first column is sgRNA ids and the remaining are the counts for three replicates each for DMSO, RIF D1 Low Concentration, RIF D1 Med Concentration and RIF D1 High Concentration, respectively.
+
+**Step 2: Extract Fractional Abundances**
+
+.. note::
+ As a part of this step, the *user must also generate a metadata file.* , ie. ``counts_metadata.txt``. Note the values in the conc_xMIC column is actual values (0.0625, 0.125, 0.25) and not categorical values ("low", "medium", "high") as seen in the counts file names.
+
+::
+
+ > python3 ../../../transit.py CGI extract_abund RIF_D1_combined_counts.txt counts_metadata.txt DMSO sgRNA_metadata.txt uninduced_ATC_counts.txt RIF 1 > RIF_D1_frac_abund.txt
+
+The result of this command should be a file with a set of comments at the top, detailing the libraries used (DMSO and RIF). There should be a total of 17 columns, the last 12 of which are the calculated abundances, the first is the sgRNA ids followed by the orf/gene the sgRNA is targeting, uninduced ATC values, and sgRNA strength.
+
+**Step 3: Run the CRISPRi-DR model**
+::
+
+ > python3 ../../../transit.py CGI run_model RIF_D1_frac_abund.txt > RIF_D1_CRISPRi-DR_results.txt
+
+There should be a total of 184 significant gene interactions, where 111 are significant depletions and 73 are significantly enriched.
+
+.. note::
+ When the file is sorted on the slope of concentration dependence, the user can rank the genes based on amount of depletion.
+
+**Visualize Specific Genes**
+
+Here are a few samples of the interactions visualized at the sgRNA level for this experiment. Note the difference in sgRNA strength scales shown.
+
+*Significantly depleted gene : RVBD3645*
+
+*RVBD3645* is one of the significantly depleted genes in this experiment. In this plot, notice how most of the slopes are negative but the amount of depletion varies, where the more blue slopes (higher sgRNA strength) are steeper than orange sgRNA slopes (lower sgRNA strength)
+
+.. image:: _images/RVBD3645_lmplot.png
+ :width: 400
+ :alt: Alternative text
+
+::
+
+ > python3 ../../../transit.py CGI visualize RIF_D1_frac_abund.txt RVBD3645 ./RVBD3645_lmplot.png
+
+*Significantly enriched gene : ndh*
+
+*ndh* is one of the signifincantly enriched genes in this experiment. In its plot, notice how sgRNAs of high strength (blue green ones) show a strong upwards trend but those will lower strength (the orange ones) do not. In fact there a few sgRNAs that show almost no change in fractional abundace as concentration increases.
+
+.. image:: _images/ndh_lmplot.png
+ :width: 400
+ :alt: Alternative text
+
+::
+
+ > python3 ../../../transit.py CGI visualize RIF_D1_frac_abund.txt ndh ./ndh_lmplot.png #enriched
+
+*Non-interacting gene : thiL*
+
+*thiL* is an example on an non-interacting gene. It was found to be neither signifinicantly enriched nor depleted. Notice how in its plot, most of the slopes are fairly flat. As seen in the plots of *RVBD3645* and *ndh*, the bluer slopes show greater depletion than the orange slopes, but there is no overall trend present
+
+.. image:: _images/thiL_lmplot.png
+ :width: 400
+ :alt: Alternative text
+
+
+::
+
+ > python3 ../../../transit.py CGI visualize RIF_D1_frac_abund.txt thiL ./thiL_lmplot.png
=====================================
src/pytransit/doc/source/_images/CGI_workflow.png
=====================================
Binary files /dev/null and b/src/pytransit/doc/source/_images/CGI_workflow.png differ
=====================================
src/pytransit/doc/source/_images/RVBD3645_lmplot.png
=====================================
Binary files /dev/null and b/src/pytransit/doc/source/_images/RVBD3645_lmplot.png differ
=====================================
src/pytransit/doc/source/_images/ndh_lmplot.png
=====================================
Binary files /dev/null and b/src/pytransit/doc/source/_images/ndh_lmplot.png differ
=====================================
src/pytransit/doc/source/_images/thiL_lmplot.png
=====================================
Binary files /dev/null and b/src/pytransit/doc/source/_images/thiL_lmplot.png differ
=====================================
src/pytransit/doc/source/index.rst
=====================================
@@ -16,6 +16,15 @@ This page contains the documentation for TRANSIT. Below are a few
quick links to some of the most important sections of the
documentation, followed by a brief overview of TRANSIT's features.
+.. NOTE::
+
+ This documentation is for the original version of Transit. We
+ recently released a new version of Transit called `TRANSIT2
+ <https://transit2.readthedocs.io/en/latest/>`_ (re-implementation
+ from scratch), which has most of the same TnSeq analytical methods,
+ but it has an enhanced GUI. Nonetheless, this original version of
+ TRANSIT is still being maintained for backwards-compatibility.
+
Quick Links
~~~~~~~~~~~
.. _quick-link:
@@ -70,10 +79,16 @@ TRANSIT offers a variety of features including:
transit_install
transit_running
transit_features
+
+.. toctree::
+ :maxdepth: 3
+ :caption: [** NEW **] CRISRPi ANALYSIS METHODS
+
+ CGI
.. toctree::
:maxdepth: 3
- :caption: ANALYSIS METHODS
+ :caption: TnSeq ANALYSIS METHODS
method_gumbel
method_griffin
=====================================
src/pytransit/doc/source/transit_install.rst
=====================================
@@ -35,6 +35,11 @@ Requirements
TRANSIT runs on both python2.7 and python3. But the dependencies vary slightly.
+Starting with release v3.0, TRANSIT now requires python3.
+
+To use TRANSIT with python2, use a TRANSIT release prior to 3.0 (e.g. v2.5.2)
+
+
Python 2.7:
-----------
View it on GitLab: https://salsa.debian.org/med-team/tnseq-transit/-/compare/e7fcdb595572f7ea598ca27f80632375a9e0790d...898cc600b276228ce95ff463aa822e85dffdcaa1
--
View it on GitLab: https://salsa.debian.org/med-team/tnseq-transit/-/compare/e7fcdb595572f7ea598ca27f80632375a9e0790d...898cc600b276228ce95ff463aa822e85dffdcaa1
You're receiving this email because of your account on salsa.debian.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://alioth-lists.debian.net/pipermail/debian-med-commit/attachments/20230813/7892b388/attachment-0001.htm>
More information about the debian-med-commit
mailing list