[med-svn] [Git][med-team/rna-star][upstream] New upstream version 2.7.5c+dfsg

Steffen Möller gitlab at salsa.debian.org
Fri Aug 21 20:52:05 BST 2020



Steffen Möller pushed to branch upstream at Debian Med / rna-star


Commits:
63ea0026 by Steffen Moeller at 2020-08-21T21:46:35+02:00
New upstream version 2.7.5c+dfsg
- - - - -


24 changed files:

- CHANGES.md
- README.md
- doc/STARmanual.pdf
- extras/doc-latex/STARmanual.tex
- extras/doc-latex/parametersDefault.tex
- extras/docker/Dockerfile
- source/ErrorWarning.cpp
- source/GTF.cpp
- source/Genome_genomeGenerate.cpp
- source/IncludeDefine.h
- source/Makefile
- source/Parameters.cpp
- source/ParametersSolo.cpp
- source/Parameters_openReadsFiles.cpp
- source/Parameters_readSAMheader.cpp
- source/ReadAlignChunk_processChunks.cpp
- source/ReadAlign_alignBAM.cpp
- source/STAR.cpp
- source/SoloFeature_collapseUMI.cpp
- source/Transcriptome_classifyAlign.cpp
- source/VERSION
- source/genomeScanFastaFiles.cpp
- source/parametersDefault
- source/readLoad.cpp


Changes:

=====================================
CHANGES.md
=====================================
@@ -1,5 +1,26 @@
+STAR 2.7.5c --- 2020/08/16
+==========================
+Bug-fix release.
+----------------
+
+* Issue #988: proceed reading from GTF after a warning that exon end is past chromosome end.
+* Issue #978: fixed corrupted transcriptInfo.tab in genome generation for cases where GTF file contains extra chromosomes not present in FASTA files.
+* Issue #945: otuput GX/GN for --soloFeatures GeneFull .
+* Implemented removal of control characters from the ends of input read lines, for compatibility with files pre-processed on Windows.
+
+STAR 2.7.5b --- 2020/08/01
+==========================
+Bug-fix release.
+----------------
+
+* Issue #558: Fixed a bug that can cause a seg-fault in STARsolo run with paired-end reads that have protruding ends.
+* Issue #952: Increased the maximum allowed length of the SAM tags in the input SAM files.
+* Issue #955: fixed seg-fault-causing bug for --soloFeatures SJ option.
+* Issue #963: When reading GTF file, skip any exons that extend past the end of the chromosome, and give a warning.
+* Issue #965: output genome sizes with and without padding into Log.out.
 * Docker build: switched to debian:stable-slim in the Dockerfile.
 * --soloType CB_samTagOut now allows output of (uncorrected) UMI sequences and quality scores with SAM tags UR and UY.
+* Throw an error if FIFO file cannot be created on non-Linux partitions.
 
 STAR 2.7.5a 2020/06/16
 ======================


=====================================
README.md
=====================================
@@ -1,7 +1,7 @@
 STAR 2.7
 ========
 Spliced Transcripts Alignment to a Reference
-© Alexander Dobin, 2009-2019
+© Alexander Dobin, 2009-2020
 https://www.ncbi.nlm.nih.gov/pubmed/23104886
 
 AUTHOR/SUPPORT
@@ -35,9 +35,9 @@ Download the latest [release from](https://github.com/alexdobin/STAR/releases) a
 
 ```bash
 # Get latest STAR source from releases
-wget https://github.com/alexdobin/STAR/archive/2.7.5a.tar.gz
-tar -xzf 2.7.5a.tar.gz
-cd STAR-2.7.5a
+wget https://github.com/alexdobin/STAR/archive/2.7.5c.tar.gz
+tar -xzf 2.7.5c.tar.gz
+cd STAR-2.7.5c
 
 # Alternatively, get STAR source using git
 git clone https://github.com/alexdobin/STAR.git


=====================================
doc/STARmanual.pdf
=====================================
Binary files a/doc/STARmanual.pdf and b/doc/STARmanual.pdf differ


=====================================
extras/doc-latex/STARmanual.tex
=====================================
@@ -34,7 +34,7 @@
 
 \newcommand{\sechyperref}[1]{\hyperref[#1]{Section \ref{#1}. \nameref{#1}}}
 
-\title{STAR manual 2.7.5a}
+\title{STAR manual 2.7.5c}
 \author{Alexander Dobin\\
 dobin at cshl.edu}
 \maketitle
@@ -292,28 +292,83 @@ The \opt{outSAMmultNmax} parameter limits the number of output alignments (SAM l
 
 
 \subsubsection{SAM attributes.}
-The SAM attributes can be specified by the user using \opt{outSAMattributes} \optvr{A1 A2 A3 ...} option which accept a list of 2-character SAM attributes. The implemented attributes are: \optv{NH HI NM MD AS nM jM jI XS}. By default, STAR outputs \optv{NH HI AS nM} attributes.
-\begin{itemize}
+The SAM attributes can be specified by the user using \opt{outSAMattributes} \optvr{A1 A2 A3 ...} option which accept a list of 2-character SAM attributes. The attributes can be listed in any order, and will be recorded in that order in the SAM file. By default, STAR outputs \optv{NH HI AS nM} attributes.
+
+\begin{itemize}[itemsep=1pt]
+\item[]	
+\textbf{Presets:}
+%
+\item[]
+\optv{None} : No SAM attributes
+%
 \item[]
-\optv{NH HI NM MD} : have standard meaning as defined in the SAM format specifications.
+\optv{Standard} : NH HI AS nM
+%
+\item[]
+\optv{All} : NH HI AS nM NM MD jM jI MC ch 	
+
+\textbf{Alignment:}
+%
 \item[]
-\optv{AS} : id the local alignment score (paired for paired-end reads). 
+\optv{NH} : number of loci the reads maps to: $=1$ for unique mappers, $>1$ for multimappers. Standard SAM tag.
+%
 \item[]
-\optv{nM} : is the number of mismatches per (paired) alignment, not to be confused with \optv{NM}, which is the number of mismatches in each mate..
+\optv{HI}  : multiple alignment index, starts with --outSAMattrIHstart ($=1$ by default). Standard SAM tag.
+%
+\item[]
+\optv{AS} : local alignment score, $+1/-1$ for matches/mismateches, score* penalties for indels and gaps. For PE reads, total score for two mates. Stadnard SAM tag.
+%
+\item[]
+\optv{NM} : edit distance to the reference (number of mismatched + inserted + deleted bases) for each mate. Standard SAM tag.
+%
+\item[]
+\optv{nM} : number of mismatches per (paired) alignment, not to be confused with \optv{NM}, which is the number of mismatches+indels in each mate..
+%
 \item[]
 \optv{jM:B:c,M1,M2,...} : intron motifs for all junctions (i.e. N in CIGAR): 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT. If splice junctions database is used, and a junction is annotated, 20 is added to its motif value.
+%
+\item[]
+\optv{MD} : string encoding mismatched and deleted reference bases (see standard SAM specifications). Standard SAM tag.
+%
 \item[]
 \optv{jI:B:I,Start1,End1,Start2,End2,...} : Start and End of introns for all junctions (1-based).
+%
 \item[]
 \optv{jM jI} : attributes require samtools 0.1.18 or later, and were reported to be incompatible with some downstream tools such as Cufflinks.
+
+\textbf{Variation:}
+%
+\item[]
+\optv{vA} : variant allele.
+%
+\item[]
+\optv{vG} : genomic coordinate of the variant overlapped by the read.
+%
+\item[]
+\optv{vW} : WASP filtering tag, see detailed description in Section \ref{section:WASP}. Requires \opt{waspOutputMode} \optv{SAMtag}.
+
+\textbf{STARsolo:}
+%
 \item[]
-\optv{vA} : variant allele
+\optv{CR CY UR UY} : sequences and quality scores of cell barcodes and UMIs for the solo* demultiplexing, not error corrected.
+%
 \item[]
-\optv{vG} : genomic coordiante of the variant overlapped by the read
+\optv{GX GN} : gene ID and name.
+%
 \item[]
-\optv{vW} : WASP filtering tag, see detailed description in Section \ref{section:WASP}==. Requires --waspOutputMode SAMtag
+\optv{CB UB} : error-corrected cell barcodes and UMIs for solo* demultiplexing. Requires \opt{outSAMtype} \optv{BAM SortedByCoordinate}.
+%
 \item[]
-\optv{CR CY UR UY} : STARsolo: sequences and quality scores of cell barcodes and UMIs for the solo* demultiplexing, not error corrected
+\optv{sM} : assessment of CB and UMI.
+%
+\item[]
+\optv{sS} : sequence of the entire barcode (CB,UMI,adapter...).
+%
+\item[]
+\optv{sQ} : quality of the entire barcode.
+
+
+\textbf{Unmapped reads:}
 \item[]
 \optv{uT} : for unmapped reads, reason for not mapping:
            \begin{itemize}[noitemsep,topsep=-3pt]


=====================================
extras/doc-latex/parametersDefault.tex
=====================================
@@ -291,35 +291,47 @@
 \end{optOptTable}
 \optName{outSAMattributes}
   \optValue{Standard}
-  \optLine{string: a string of desired SAM attributes, in the order desired for the output SAM} 
+  \optLine{string: a string of desired SAM attributes, in the order desired for the output SAM. Tags can be listed in any combination/order.} 
+  \optLine{***Presets:} 
 \begin{optOptTable}
-  \optOpt{NH HI AS nM NM MD jM jI XS MC ch}   \optOptLine{any combination in any order}
   \optOpt{None}   \optOptLine{no attributes}
   \optOpt{Standard}   \optOptLine{NH HI AS nM}
   \optOpt{All}   \optOptLine{NH HI AS nM NM MD jM jI MC ch}
 \end{optOptTable}
-  \optLine{variation:} 
+  \optLine{***Alignment:} 
+\begin{optOptTable}
+  \optOpt{NH}   \optOptLine{number of loci the reads maps to: =1 for unique mappers, {\textgreater}1 for multimappers. Standard SAM tag.}
+  \optOpt{HI}   \optOptLine{multiple alignment index, starts with --outSAMattrIHstart (=1 by default). Standard SAM tag.}
+  \optOpt{AS}   \optOptLine{local alignment score, +1/-1 for matches/mismateches, score* penalties for indels and gaps. For PE reads, total score for two mates. Stadnard SAM tag.}
+  \optOpt{nM}   \optOptLine{number of mismatches. For PE reads, sum over two mates.}
+  \optOpt{NM}   \optOptLine{edit distance to the reference (number of mismatched + inserted + deleted bases) for each mate. Standard SAM tag.}
+  \optOpt{MD}   \optOptLine{string encoding mismatched and deleted reference bases (see standard SAM specifications). Standard SAM tag.}
+  \optOpt{jM}   \optOptLine{intron motifs for all junctions (i.e. N in CIGAR): 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT. If splice junctions database is used, and a junction is annotated, 20 is added to its motif value.}
+  \optOpt{jI}   \optOptLine{start and end of introns for all junctions (1-based).}
+  \optOpt{XS}   \optOptLine{alignment strand according to --outSAMstrandField.}
+  \optOpt{MC}   \optOptLine{mate's CIGAR string. Standard SAM tag.}
+  \optOpt{ch}   \optOptLine{marks all segment of all chimeric alingments for --chimOutType WithinBAM output.}
+\end{optOptTable}
+  \optLine{***Variation:} 
 \begin{optOptTable}
   \optOpt{vA}   \optOptLine{variant allele}
-  \optOpt{ha}   \optOptLine{haplotype (1/2) when mapping to the diploid genome. Requires genome generated with --genomeTransformType Diploid}
-  \optOpt{vG}   \optOptLine{genomic coordinate of the variant overlapped by the read}
+  \optOpt{vG}   \optOptLine{genomic coordinate of the variant overlapped by the read.}
   \optOpt{vW}   \optOptLine{1 - alignment passes WASP filtering; 2,3,4,5,6,7 - alignment does not pass WASP filtering. Requires --waspOutputMode SAMtag.}
 \end{optOptTable}
-  \optLine{STARsolo:} 
+  \optLine{***STARsolo:} 
 \begin{optOptTable}
-  \optOpt{CR CY UR UY}   \optOptLine{sequences and quality scores of cell barcodes and UMIs for the solo* demultiplexing}
+  \optOpt{CR CY UR UY}   \optOptLine{sequences and quality scores of cell barcodes and UMIs for the solo* demultiplexing.}
+  \optOpt{GX GN}   \optOptLine{gene ID and gene name.}
   \optOpt{CB UB}   \optOptLine{error-corrected cell barcodes and UMIs for solo* demultiplexing. Requires --outSAMtype BAM SortedByCoordinate.}
-  \optOpt{sM}   \optOptLine{assessment of CB and UMI}
-\end{optOptTable}
-  \optLine{sS          ... sequence of the entire barcode (CB,UMI,adapter...)} 
-\begin{optOptTable}
-  \optOpt{sQ}   \optOptLine{quality of the entire barcode}
-  \optOpt{GX GN}   \optOptLine{gene ID and gene name}
+  \optOpt{sM}   \optOptLine{assessment of CB and UMI.}
+  \optOpt{sS}   \optOptLine{sequence of the entire barcode (CB,UMI,adapter).}
+  \optOpt{sQ}   \optOptLine{quality of the entire barcode.}
 \end{optOptTable}
-  \optLine{Unsupported/undocumented:} 
+  \optLine{***Unsupported/undocumented:} 
 \begin{optOptTable}
-  \optOpt{rB}   \optOptLine{alignment block read/genomic coordinates}
-  \optOpt{vR}   \optOptLine{read coordinate of the variant}
+  \optOpt{ha}   \optOptLine{haplotype (1/2) when mapping to the diploid genome. Requires genome generated with --genomeTransformType Diploid .}
+  \optOpt{rB}   \optOptLine{alignment block read/genomic coordinates.}
+  \optOpt{vR}   \optOptLine{read coordinate of the variant.}
 \end{optOptTable}
 \optName{outSAMattrIHstart}
   \optValue{1}


=====================================
extras/docker/Dockerfile
=====================================
@@ -2,7 +2,7 @@ FROM debian:stable-slim
 
 MAINTAINER dobin at cshl.edu
 
-ARG STAR_VERSION=2.7.5a
+ARG STAR_VERSION=2.7.5c
 
 ENV PACKAGES gcc g++ make wget zlib1g-dev unzip
 


=====================================
source/ErrorWarning.cpp
=====================================
@@ -5,10 +5,12 @@
 #include "TimeFunctions.h"
 #include "GlobalVariables.h"
 
-void exitWithError(string messageOut, ostream &streamOut1, ostream &streamOut2, int errorInt, Parameters &P) {
+void exitWithError(string messageOut, ostream &streamOut1, ostream &streamOut2, int errorInt, Parameters &P)
+{
+    if (P.runThreadN>1) 
+        pthread_mutex_lock(&g_threadChunks.mutexError);
     time_t timeCurrent;
     time( &timeCurrent);
-    if (P.runThreadN>1) pthread_mutex_lock(&g_threadChunks.mutexError);
     if (streamOut1.good()) {
         streamOut1 << "\n" << messageOut << endl << timeMonthDayTime(timeCurrent) <<" ...... FATAL ERROR, exiting\n"  <<flush;
     };
@@ -20,14 +22,16 @@ void exitWithError(string messageOut, ostream &streamOut1, ostream &streamOut2,
     exit(errorInt);
 };
 
-void warningMessage(string messageOut, ostream &streamOut1, ostream &streamOut2, Parameters &P) {
-    time_t timeCurrent;
-    time( &timeCurrent);
-    if (P.runThreadN>1) pthread_mutex_lock(&g_threadChunks.mutexError);
+void warningMessage(string messageOut, ostream &streamOut1, ostream &streamOut2, Parameters &P)
+{
+    if (P.runThreadN>1) 
+        pthread_mutex_lock(&g_threadChunks.mutexError);
     if (streamOut1.good()) {
-        streamOut1 << timeMonthDayTime(timeCurrent) <<"\tWARNING: " << messageOut <<endl;
+        streamOut1 << "!!!!! WARNING: " << messageOut <<endl;
     };
     if (streamOut2.good()) {
-        streamOut2 << timeMonthDayTime(timeCurrent) << "\tWARNING: " << messageOut <<endl;
+        streamOut2 << "!!!!! WARNING: " << messageOut <<endl;
     };
-};
\ No newline at end of file
+    if (P.runThreadN>1) 
+        pthread_mutex_unlock(&g_threadChunks.mutexError);    
+};


=====================================
source/GTF.cpp
=====================================
@@ -71,7 +71,7 @@ GTF::GTF(Genome &genome, Parameters &P, const string &dirOut, SjdbClass &sjdbLoc
             if (genome.pGe.sjdbGTFchrPrefix!="-") chr1=genome.pGe.sjdbGTFchrPrefix + chr1;
 
             if (genome.chrNameIndex.count(chr1)==0) {//chr not in Genome
-                P.inOut->logMain << "WARNING: while processing pGe.sjdbGTFfile=" << genome.pGe.sjdbGTFfile <<": chromosome '"<<chr1<<"' not found in Genome fasta files for line:\n";
+                P.inOut->logMain << "WARNING: while processing sjdbGTFfile=" << genome.pGe.sjdbGTFfile <<": chromosome '"<<chr1<<"' not found in Genome fasta files for line:\n";
                 P.inOut->logMain << oneLine <<"\n"<<flush;
                 continue; //do not process exons/transcripts on missing chromosomes
             };
@@ -79,6 +79,12 @@ GTF::GTF(Genome &genome, Parameters &P, const string &dirOut, SjdbClass &sjdbLoc
             uint64 ex1,ex2;
             char str1;
             oneLineStream >> ex1 >> ex2 >> ddd2 >> str1 >> ddd2; //read all fields except the last
+            if ( ex2 > genome.chrLength[genome.chrNameIndex[chr1]] ) {
+            	warningMessage("while processing sjdbGTFfile=" + genome.pGe.sjdbGTFfile + ", line:\n" + oneLine + "\n exon end = " + to_string(ex2) +  \
+            			       " is larger than the chromosome " + chr1 + " length = " + to_string(genome.chrLength[genome.chrNameIndex[chr1]] ) + " , will skip this exon\n", \
+							   std::cerr, P.inOut->logMain, P);
+            	continue;
+            };
             
             string oneLine1;
             getline(oneLineStream, oneLine1);//get the last field     
@@ -152,7 +158,11 @@ GTF::GTF(Genome &genome, Parameters &P, const string &dirOut, SjdbClass &sjdbLoc
     if (exonN==0) {
         ostringstream errOut;
         errOut << "Fatal INPUT FILE error, no valid ""exon"" lines in the GTF file: " << genome.pGe.sjdbGTFfile <<"\n";
-        errOut << "Solution: check the formatting of the GTF file. Most likely cause is the difference in chromosome naming between GTF and FASTA file.\n";
+        errOut << "Solution: check the formatting of the GTF file. One likely cause is the difference in chromosome naming between GTF and FASTA file.\n";
         exitWithError(errOut.str(),std::cerr, P.inOut->logMain, EXIT_CODE_INPUT_FILES, P);
     };
+    
+    exonLoci.resize(exonN); //previous exonLoci.size() was an estimate, need to reszie to the actual value
+    
+    return;
 };


=====================================
source/Genome_genomeGenerate.cpp
=====================================
@@ -142,7 +142,14 @@ void Genome::genomeGenerate() {
     nGenome = genomeScanFastaFiles(P,NULL,false,*this);//first scan the fasta file to find all the sizes
     genomeSequenceAllocate(nGenome, nG1alloc, G, G1);
     genomeScanFastaFiles(P,G,true,*this);    //load the genome sequence
-    
+
+    uint64 nGenomeTrue=0;
+    for (auto &cl : chrLength)
+    	nGenomeTrue += cl;
+
+    P.inOut->logMain <<"Genome sequence total length = " << nGenomeTrue << "\n";
+    P.inOut->logMain <<"Genome size with padding = "<< nGenome <<"\n";
+
     //consensusSequence(); //replace with consensus allele 
         
     SjdbClass sjdbLoci; //will be filled in transcriptGeneSJ below
@@ -157,10 +164,10 @@ void Genome::genomeGenerate() {
     
     sjdbLoadFromFiles(P,sjdbLoci);//this will not be transformed. TODO prevent this parameter combination
 
-    if (pGe.gSAindexNbases > log2(nGenome)/2-1) {
+    if (pGe.gSAindexNbases > log2(nGenomeTrue)/2-1) {
         ostringstream warnOut; 
-        warnOut << "--genomeSAindexNbases " << pGe.gSAindexNbases << " is too large for the genome size=" << nGenome;
-        warnOut << ", which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases " << int(log2(nGenome)/2-1);
+        warnOut << "--genomeSAindexNbases " << pGe.gSAindexNbases << " is too large for the genome size=" << nGenomeTrue;
+        warnOut << ", which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases " << int(log2(nGenomeTrue)/2-1);
         warningMessage(warnOut.str(),P.inOut->logMain,std::cerr,P);
     };    
     
@@ -183,7 +190,7 @@ void Genome::genomeGenerate() {
     // GstrandBit
     GstrandBit = (char) (uint) floor(log(nGenome+P.limitSjdbInsertNsj*sjdbLength)/log(2))+1; //GstrandBit uses P.limitSjdbInsertNsj even if no insertion requested, in case it will be requested at the mapping stage
     if (GstrandBit<32) GstrandBit=32; //TODO: should not this be 31? Need to test for small genomes. TODO: use simple access function for SA
-    P.inOut->logMain <<"Estimated genome size="<<nGenome+P.limitSjdbInsertNsj*sjdbLength<<" = "<<nGenome<<" + "<<P.limitSjdbInsertNsj*sjdbLength<<"\n";
+    P.inOut->logMain <<"Estimated genome size with padding and SJs: total=genome+SJ="<<nGenome+P.limitSjdbInsertNsj*sjdbLength<<" = "<<nGenome<<" + "<<P.limitSjdbInsertNsj*sjdbLength<<"\n";
     P.inOut->logMain << "GstrandBit=" << int(GstrandBit) <<"\n";
     GstrandMask = ~(1LLU<<GstrandBit);
     SA.defineBits(GstrandBit+1,nSA);


=====================================
source/IncludeDefine.h
=====================================
@@ -117,12 +117,13 @@ typedef uint8_t uint8;
 #define BAM_CIGAR_X 8
 
 
+
+#define BAM_ATTR_MaxSize 10000
+
 #if defined COMPILE_FOR_LONG_READS
     #define MAX_N_EXONS 1000
-    #define BAM_ATTR_MaxSize 10000
 #else
     #define MAX_N_EXONS 20
-    #define BAM_ATTR_MaxSize 1000
 #endif
 
 //input reads
@@ -151,6 +152,7 @@ typedef uint8_t uint8;
 #define EXIT_CODE_FILE_OPEN 109
 #define EXIT_CODE_FILE_WRITE 110
 #define EXIT_CODE_INCONSISTENT_DATA 111
+#define EXIT_CODE_FIFO 112
 
 //cleaned-end
 


=====================================
source/Makefile
=====================================
@@ -27,7 +27,7 @@ CFLAGS ?= -pipe -Wall -Wextra -O3
 # Unconditionally set essential flags and optimization options
 CXXFLAGS_common := -std=c++11 -fopenmp $(COMPTIMEPLACE)
 CXXFLAGS_main := -O3 $(CXXFLAGS_common)
-CXXFLAGS_gdb := -O -g3 $(CXXFLAGS_common)
+CXXFLAGS_gdb := -O0 -g3 $(CXXFLAGS_common)
 
 ##########################################################################################################
 OBJECTS = soloInputFeatureUMI.o SoloFeature_countSmartSeq.o SoloFeature_redistributeReadsByCB.o \


=====================================
source/Parameters.cpp
=====================================
@@ -1339,6 +1339,7 @@ void Parameters::inputParameters (int argInN, char* argIn[]) {//input parameters
     };
 
     alignEndsProtrude.nBasesMax=stoi(alignEndsProtrude.in.at(0),nullptr);
+    alignEndsProtrude.concordantPair=false;
     if (alignEndsProtrude.nBasesMax>0) {//allow ends protrusion
         if (alignEndsProtrude.in.at(1)=="ConcordantPair") {
             alignEndsProtrude.concordantPair=true;


=====================================
source/ParametersSolo.cpp
=====================================
@@ -137,13 +137,21 @@ void ParametersSolo::initialize(Parameters *pPin)
         exitWithError(errOut.str(),std::cerr, pP->inOut->logMain, EXIT_CODE_PARAMETER, *pP);
     };
 
-    if (featureYes[SoloFeatureTypes::Gene]) {
-        pP->quant.gene.yes=true;
-        pP->quant.yes = true;
+    if (featureYes[SoloFeatureTypes::Velocyto] && type==SoloTypes::SmartSeq) {
+        string errOut = "EXITING because of fatal PARAMETERS error: --soloFeatures Velocyto is presently not compatible with --soloType SmartSeq .\n";
+        errOut       += "SOLUTION: re-run without --soloFeatures Velocyto .";
+        exitWithError(errOut, std::cerr, pP->inOut->logMain, EXIT_CODE_PARAMETER, *pP);
     };
-    if (featureYes[SoloFeatureTypes::GeneFull]) {
-        pP->quant.geneFull.yes=true;
-        pP->quant.yes = true;
+    
+    if (type != SoloTypes::CB_samTagOut) {//gene quantification is needed
+        if (featureYes[SoloFeatureTypes::Gene]) {
+            pP->quant.gene.yes=true;
+            pP->quant.yes = true;
+        };
+        if (featureYes[SoloFeatureTypes::GeneFull]) {
+            pP->quant.geneFull.yes=true;
+            pP->quant.yes = true;
+        };
     };
     
     ////////////////////////////////////////////umiDedup


=====================================
source/Parameters_openReadsFiles.cpp
=====================================
@@ -27,7 +27,13 @@ void Parameters::openReadsFiles()
             sysCom << outFileTmp <<"tmp.fifo.read"<<imate+1;
             readFilesInTmp.push_back(sysCom.str());
             remove(readFilesInTmp.at(imate).c_str());
-            mkfifo(readFilesInTmp.at(imate).c_str(), S_IRUSR | S_IWUSR );
+            if (mkfifo(readFilesInTmp.at(imate).c_str(), S_IRUSR | S_IWUSR ) != 0) {
+                exitWithError("Exiting because of *FATAL ERROR*: could not create FIFO file " + readFilesInTmp.at(imate) + "\n"
+                            + "SOLUTION: check the if run directory supports FIFO files.\n"
+                            + "If run partition does not support FIFO (e.g. Windows partitions FAT, NTFS), "
+                            + "re-run on a Linux partition, or point --outTmpDir to a Linux partition.\n"
+                            , std::cerr, inOut->logMain, EXIT_CODE_FIFO, *this);
+            };
 
             inOut->logMain << "\n   Input read files for mate "<< imate+1 <<" :\n";
 


=====================================
source/Parameters_readSAMheader.cpp
=====================================
@@ -18,7 +18,13 @@ void Parameters::readSAMheader(const string readFilesCommandString, const vector
 
     string tmpFifo=outFileTmp+"tmp.fifo.header";
     remove(tmpFifo.c_str());
-    mkfifo(tmpFifo.c_str(), S_IRUSR | S_IWUSR );
+    if (mkfifo(tmpFifo.c_str(), S_IRUSR | S_IWUSR ) != 0) {
+        exitWithError("Exiting because of *FATAL ERROR*: could not create FIFO file " + tmpFifo + "\n"
+                      + "SOLUTION: check the if run directory supports FIFO files.\n"
+                      + "If run partition does not support FIFO (e.g. Windows partitions FAT, NTFS), "
+                      + "re-run on a Linux partition, or point --outTmpDir to a Linux partition.\n"
+                      , std::cerr, inOut->logMain, EXIT_CODE_FIFO, *this);
+    };
 
     ifstream tmpFifoIn;
     for (uint32 ii=0; ii<readFilesNames.size(); ii++) {


=====================================
source/ReadAlignChunk_processChunks.cpp
=====================================
@@ -5,6 +5,12 @@
 #include "SequenceFuns.h"
 #include "GlobalVariables.h"
 
+inline void removeStringEndControl(string &str)
+{//removes control character (including space) from the end of the string
+    if (int(str.back())<33)
+        str.pop_back();
+};
+
 void ReadAlignChunk::processChunks() {//read-map-write chunks
     noReadsLeft=false; //true if there no more reads left in the file
     bool newFile=false; //new file marker in the input stream
@@ -79,6 +85,7 @@ void ReadAlignChunk::processChunks() {//read-map-write chunks
                     if (P.outFilterBySJoutStage!=2) {//not the 2nd stage of the 2-stage mapping, read ID from the 1st read
                         string readID;
                         P.inOut->readIn[0] >> readID;
+                        removeStringEndControl(readID);
                         if (P.outSAMreadIDnumber) {
                             readID="@"+to_string(P.iReadAll);
                         };
@@ -87,9 +94,11 @@ void ReadAlignChunk::processChunks() {//read-map-write chunks
                         if (P.inOut->readIn[0].peek()!='\n') {//2nd field exists
                             string field2;
                             P.inOut->readIn[0] >> field2;
-                            if (field2.length()>=3 && field2.at(1)==':' && field2.at(2)=='Y' && field2.at(3)==':' )
+                            if (field2.length()>=3 && field2[1]==':' && field2[2]=='Y' && field2[3]==':' )
                                 passFilterIllumina='Y';
                         };
+                        
+                        //add extra information to readID line
                         readID += ' '+ to_string(P.iReadAll)+' '+passFilterIllumina+' '+to_string(P.readFilesIndex);
 
                         //ignore the rest of the read name for both mates
@@ -99,6 +108,7 @@ void ReadAlignChunk::processChunks() {//read-map-write chunks
                         if (P.pSolo.barcodeReadYes) {//record barcode sequence
                             string seq1;
                             getline(P.inOut->readIn[P.pSolo.barcodeRead],seq1);
+                            removeStringEndControl(seq1);
                             if (seq1.size() != P.pSolo.bL) {
                             	if (P.pSolo.bL > 0) {
 									ostringstream errOut;
@@ -114,6 +124,7 @@ void ReadAlignChunk::processChunks() {//read-map-write chunks
                             readID += ' ' + seq1;
                             P.inOut->readIn[P.pSolo.barcodeRead].ignore(DEF_readNameSeqLengthMax,'\n');//skip to the end of 3rd ("+") line
                             getline(P.inOut->readIn[P.pSolo.barcodeRead],seq1); //read qualities
+                            removeStringEndControl(seq1);
                             readID += ' ' + seq1;
                             g_statsAll.qualHistCalc(1, seq1.c_str()+P.pSolo.barcodeStart, P.pSolo.barcodeEnd==0 ? seq1.size() : P.pSolo.barcodeEnd-P.pSolo.barcodeStart+1);
                         };
@@ -127,12 +138,19 @@ void ReadAlignChunk::processChunks() {//read-map-write chunks
                     //copy 3 (4 for stage 2) lines: sequence, dummy, quality
                     for (uint imate=0; imate<P.readNmates; imate++) {
                         for (uint iline=(P.outFilterBySJoutStage==2 ? 0:1);iline<4;iline++) {
+                            uint64 origChunkStart=chunkInSizeBytesTotal[imate];
+                            
                             P.inOut->readIn[imate].getline(chunkIn[imate] + chunkInSizeBytesTotal[imate], DEF_readNameSeqLengthMax+1 );
-                            chunkInSizeBytesTotal[imate] += P.inOut->readIn[imate].gcount();
+                            chunkInSizeBytesTotal[imate] += P.inOut->readIn[imate].gcount(); //this includes the final \n
+                            
+                            if ( int(chunkIn[imate][chunkInSizeBytesTotal[imate]-2]) < 33 ) {//remove control char at the end if present
+                                chunkInSizeBytesTotal[imate]--;
+                            };
+                            
                             chunkIn[imate][chunkInSizeBytesTotal[imate]-1]='\n';
                             
                             if (iline==3 && P.outFilterBySJoutStage!=2) {
-                                g_statsAll.qualHistCalc(imate, chunkIn[imate] + chunkInSizeBytesTotal[imate] - P.inOut->readIn[imate].gcount(), P.inOut->readIn[imate].gcount());
+                                g_statsAll.qualHistCalc(imate, chunkIn[imate] + origChunkStart, chunkInSizeBytesTotal[imate] - origChunkStart);
                             };
                         };
                     };
@@ -151,18 +169,20 @@ void ReadAlignChunk::processChunks() {//read-map-write chunks
                             P.inOut->readIn[imate].ignore(DEF_readNameSeqLengthMax,'\n');
 
                             chunkInSizeBytesTotal[imate] += sprintf(chunkIn[imate] + chunkInSizeBytesTotal[imate], " %llu %c %i \n", P.iReadAll, 'N', P.readFilesIndex);
-
-
                         };
-//                         else {//2nd stage of 2-stage mapping
-//                         read index and file index are already recorded with the read name, simply copy it
-//                         P.inOut->readIn[imate].getline(chunkIn[imate] + chunkInSizeBytesTotal[imate], DEF_readNameSeqLengthMax+1 );
-//                         };
+                        
+                        //read multi-line fasta
                         nextChar=P.inOut->readIn[imate].peek();
-                        while (nextChar!='@' && nextChar!='>' && nextChar!=' ' && nextChar!='\n' && P.inOut->readIn[imate].good()) {//read multi-line fasta
+                        while (nextChar!='@' && nextChar!='>' && nextChar!=' ' && nextChar!='\n' && P.inOut->readIn[imate].good()) {
                             P.inOut->readIn[imate].getline(chunkIn[imate] + chunkInSizeBytesTotal[imate], DEF_readSeqLengthMax + 1 );
-                            if (P.inOut->readIn[imate].gcount()<2) break; //no more input
-                            chunkInSizeBytesTotal[imate] += P.inOut->readIn[imate].gcount()-1;
+                            if (P.inOut->readIn[imate].gcount()<2) 
+                                break; //no more input
+                                
+                            chunkInSizeBytesTotal[imate] += P.inOut->readIn[imate].gcount()-1; //-1 because \n was counted, bu wee need to remove it
+                            if ( int(chunkIn[imate][chunkInSizeBytesTotal[imate]-1]) < 33 ) {//remove control char at the end if present
+                                chunkInSizeBytesTotal[imate]--;
+                            };
+                            
                             nextChar=P.inOut->readIn[imate].peek();
                         };
                         chunkIn[imate][chunkInSizeBytesTotal[imate]]='\n';


=====================================
source/ReadAlign_alignBAM.cpp
=====================================
@@ -374,13 +374,25 @@ int ReadAlign::alignBAM(Transcript const &trOut, uint nTrOut, uint iTrOut, uint
                     case ATTR_UY:
                         attrN+=bamAttrArrayWrite(soloRead->readBar->umiQual,"UY",attrOutArray+attrN);
                         break;
+
                     case ATTR_GX:
-                        if (readAnnot.geneConcordant.size()==1 && trOut.alignGenes.size()==1) //only output if read maps to a single gene, and this alignment maps to one gene
-                            attrN+=bamAttrArrayWrite(chunkTr->geID[*trOut.alignGenes.begin()],"GX",attrOutArray+attrN);
+                        if ( P.quant.gene.yes ) {
+                            if (readAnnot.geneConcordant.size()==1 && trOut.alignGenes.size()==1) //only output if read maps to a single gene, and this alignment maps to one gene
+                                attrN+=bamAttrArrayWrite(chunkTr->geID[*trOut.alignGenes.begin()],"GX",attrOutArray+attrN);
+                        } else if ( P.quant.geneFull.yes ) {
+                            if ( readAnnot.geneFull.size()==1 ) //only output if read maps to a single gene
+                                attrN+=bamAttrArrayWrite(chunkTr->geID[*readAnnot.geneFull.begin()],"GX",attrOutArray+attrN);
+                        };
                         break;
+                            
                     case ATTR_GN:
-                        if (readAnnot.geneConcordant.size()==1 && trOut.alignGenes.size()==1) //only output if read maps to a single gene, and this alignment maps to one gene
-                            attrN+=bamAttrArrayWrite(chunkTr->geName[*trOut.alignGenes.begin()],"GN",attrOutArray+attrN);
+                        if ( P.quant.gene.yes ) {                        
+                            if (readAnnot.geneConcordant.size()==1 && trOut.alignGenes.size()==1) //only output if read maps to a single gene, and this alignment maps to one gene
+                                attrN+=bamAttrArrayWrite(chunkTr->geName[*trOut.alignGenes.begin()],"GN",attrOutArray+attrN);
+                        } else if ( P.quant.geneFull.yes ) {
+                            if ( readAnnot.geneFull.size()==1 ) //only output if read maps to a single gene
+                                attrN+=bamAttrArrayWrite(chunkTr->geName[*readAnnot.geneFull.begin()],"GN",attrOutArray+attrN);
+                        };                           
                         break;                        
                         
                     case ATTR_sM:


=====================================
source/STAR.cpp
=====================================
@@ -49,8 +49,7 @@ void usage(int usageType) {
                    parametersDefault_len);
     };
     exit(0);
-}
-
+};
 
 int main(int argInN, char* argIn[]) {
     // If no argument is given, or the first argument is either '-h' or '--help', run usage()


=====================================
source/SoloFeature_collapseUMI.cpp
=====================================
@@ -89,7 +89,7 @@ uint32 graphNumberOfConnectedComponents(uint32 N, vector<array<uint32,2>> V, vec
     
     uint32 nConnComp=0;
     for (uint32 ii=0; ii<N; ii++) {
-        if (nodeEdges[ii].size()==0) {//this node is not connected, no need to check. Save time beacuse this happens often
+        if (nodeEdges[ii].size()==0) {//this node is not connected, no need to check. Save time because this happens often
             ++nConnComp;
         } else if (nodeColor[ii]==(uint32)-1) {//node not visited
             ++nConnComp;


=====================================
source/Transcriptome_classifyAlign.cpp
=====================================
@@ -22,11 +22,13 @@ int alignToTranscript(Transcript &aG, uint64 trS1, uint16 exN1, uint32 *exSE1, u
     //distTTS=trLen[tr1]-(gthaG.exons[iab][EX_G]-trS1-exSE1[2*ex1]+exLenCum1[ex1]+aG.exons[iab][EX_L]);
     //if trStr1==2: TTS=trLen[tr1]-TTS, TSS=trLen[tr1]-TSS
     
-    for (uint32 iab=0, ex1=0, bS=0, bE=0, eE=0, enS=0;
-                iab<aG.nExons; iab++) {//scan through all blocks of the align
+    for (uint32 iab=0, ex1=0, bS=0, bE=0, eE=0, enS=0;  iab<aG.nExons;  iab++) {//scan through all blocks of the align
 
         uint64 bEprev=bE;
 
+        if (aG.exons[iab][EX_G] < trS1) //this can happen for PE reads, when the 2nd mate protrudes to the left of the first
+            return -1;
+    
         bS=(uint32) (aG.exons[iab][EX_G]-trS1);//block start
         bE=bS+aG.exons[iab][EX_L]-1;//block end
 


=====================================
source/VERSION
=====================================
@@ -1 +1 @@
-#define STAR_VERSION "2.7.5a_2020-06-29"
+#define STAR_VERSION "2.7.5c"


=====================================
source/genomeScanFastaFiles.cpp
=====================================
@@ -44,7 +44,8 @@ uint genomeScanFastaFiles (Parameters &P, char* G, bool flagRun, Genome &mapGen)
                     mapGen.chrName.push_back(chrName1);
                 };
 
-                if (!flagRun && mapGen.chrStart.size()>0) mapGen.chrLength.push_back(N-mapGen.chrStart.at(mapGen.chrStart.size()-1)); //true length of the chr
+                if (!flagRun && mapGen.chrStart.size()>0) mapGen.chrLength.push_back(N-mapGen.chrStart.back()); //true length of the chr
+
 
                 if (N>0) {//pad the chromosomes to bins boudnaries
                     N = ( (N+1)/mapGen.genomeChrBinNbases+1 )*mapGen.genomeChrBinNbases;
@@ -52,7 +53,9 @@ uint genomeScanFastaFiles (Parameters &P, char* G, bool flagRun, Genome &mapGen)
 
                 if (!flagRun) {
                     mapGen.chrStart.push_back(N);
-                    P.inOut->logMain << mapGen.pGe.gFastaFiles.at(ii)<<" : chr # " << mapGen.chrStart.size()-1 << "  \""<<mapGen.chrName.at(mapGen.chrStart.size()-1)<<"\" chrStart: "<<N<<"\n"<<flush;
+                    P.inOut->logMain << mapGen.pGe.gFastaFiles.at(ii)<<" : chr # " << mapGen.chrStart.size()-1   \
+                    		         << "  \""<< mapGen.chrName.back() <<"\""       \
+									 << " chrStart: "<<N<<"\n"<<flush;
                 };
             } else {//char lines
                 if (flagRun) {
@@ -70,15 +73,17 @@ uint genomeScanFastaFiles (Parameters &P, char* G, bool flagRun, Genome &mapGen)
     };
 
     if (!flagRun)
-        mapGen.chrLength.push_back(N-mapGen.chrStart.at(mapGen.chrStart.size()-1)); //true length of the last chr
+        mapGen.chrLength.push_back(N-mapGen.chrStart.back()); //true length of the last chr
 
     N = ( (N+1)/mapGen.genomeChrBinNbases+1)*mapGen.genomeChrBinNbases;
 
     if (!flagRun) {
         mapGen.nChrReal=mapGen.chrStart.size();
         mapGen.chrStart.push_back(N); //last chromosome end+1
+        P.inOut->logMain << "Chromosome sequence lengths: \n";
         for (uint ii=0;ii<mapGen.nChrReal;ii++) {
             mapGen.chrNameIndex[mapGen.chrName[ii]]=ii;
+            P.inOut->logMain <<  mapGen.chrName[ii] <<"\t"<< mapGen.chrLength[ii] << "\n";
         };
     };
 


=====================================
source/parametersDefault
=====================================
@@ -269,26 +269,38 @@ outSAMstrandField               None
                                 intronMotif ... strand derived from the intron motif. This option changes the output alignments: reads with inconsistent and/or non-canonical introns are filtered out.
 
 outSAMattributes                Standard
-    string: a string of desired SAM attributes, in the order desired for the output SAM
-                                NH HI AS nM NM MD jM jI XS MC ch ... any combination in any order
+    string: a string of desired SAM attributes, in the order desired for the output SAM. Tags can be listed in any combination/order.
+                                ***Presets:
                                 None        ... no attributes
                                 Standard    ... NH HI AS nM
-                                All         ... NH HI AS nM NM MD jM jI MC ch
-                                variation:
+                                All         ... NH HI AS nM NM MD jM jI MC ch                                                                    
+                                ***Alignment:
+                                NH          ... number of loci the reads maps to: =1 for unique mappers, >1 for multimappers. Standard SAM tag.
+                                HI          ... multiple alignment index, starts with --outSAMattrIHstart (=1 by default). Standard SAM tag.
+                                AS          ... local alignment score, +1/-1 for matches/mismateches, score* penalties for indels and gaps. For PE reads, total score for two mates. Stadnard SAM tag.
+                                nM          ... number of mismatches. For PE reads, sum over two mates.
+                                NM          ... edit distance to the reference (number of mismatched + inserted + deleted bases) for each mate. Standard SAM tag.
+                                MD          ... string encoding mismatched and deleted reference bases (see standard SAM specifications). Standard SAM tag.
+                                jM          ... intron motifs for all junctions (i.e. N in CIGAR): 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT. If splice junctions database is used, and a junction is annotated, 20 is added to its motif value.
+                                jI          ... start and end of introns for all junctions (1-based).
+                                XS          ... alignment strand according to --outSAMstrandField.
+                                MC          ... mate's CIGAR string. Standard SAM tag.
+                                ch          ... marks all segment of all chimeric alingments for --chimOutType WithinBAM output.
+                                ***Variation:
                                 vA          ... variant allele
-                                ha          ... haplotype (1/2) when mapping to the diploid genome. Requires genome generated with --genomeTransformType Diploid
-                                vG          ... genomic coordinate of the variant overlapped by the read
+                                vG          ... genomic coordinate of the variant overlapped by the read.
                                 vW          ... 1 - alignment passes WASP filtering; 2,3,4,5,6,7 - alignment does not pass WASP filtering. Requires --waspOutputMode SAMtag.
-                                STARsolo:
-                                CR CY UR UY ... sequences and quality scores of cell barcodes and UMIs for the solo* demultiplexing
+                                ***STARsolo:
+                                CR CY UR UY ... sequences and quality scores of cell barcodes and UMIs for the solo* demultiplexing.
+                                GX GN       ... gene ID and gene name.                                
                                 CB UB       ... error-corrected cell barcodes and UMIs for solo* demultiplexing. Requires --outSAMtype BAM SortedByCoordinate.
-                                sM          ... assessment of CB and UMI
-                                sS          ... sequence of the entire barcode (CB,UMI,adapter...)
-                                sQ          ... quality of the entire barcode
-                                GX GN       ... gene ID and gene name
-                                Unsupported/undocumented:
-                                rB          ... alignment block read/genomic coordinates
-                                vR          ... read coordinate of the variant
+                                sM          ... assessment of CB and UMI..
+                                sS          ... sequence of the entire barcode (CB,UMI,adapter).
+                                sQ          ... quality of the entire barcode.
+                                ***Unsupported/undocumented:
+                                ha          ... haplotype (1/2) when mapping to the diploid genome. Requires genome generated with --genomeTransformType Diploid .                               
+                                rB          ... alignment block read/genomic coordinates.
+                                vR          ... read coordinate of the variant.
 
 outSAMattrIHstart               1
     int>=0:                     start value for the IH attribute. 0 may be required by some downstream software, such as Cufflinks or StringTie..


=====================================
source/readLoad.cpp
=====================================
@@ -84,7 +84,7 @@ int readLoad(istream& readInStream, Parameters& P, uint iMate, uint& Lread, uint
     } else {
         Lread=0;
     };
-    convertNucleotidesToNumbersRemoveControls(Seq+P.clip5pNbases[iMate],SeqNum,Lread);
+    convertNucleotidesToNumbers(Seq+P.clip5pNbases[iMate],SeqNum,Lread);
 
     //clip the adapter
     if (P.clip3pAdapterSeq.at(iMate).length()>0) {



View it on GitLab: https://salsa.debian.org/med-team/rna-star/-/commit/63ea00262b1c169acb719ad4e01b005ef625d0e5

-- 
View it on GitLab: https://salsa.debian.org/med-team/rna-star/-/commit/63ea00262b1c169acb719ad4e01b005ef625d0e5
You're receiving this email because of your account on salsa.debian.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://alioth-lists.debian.net/pipermail/debian-med-commit/attachments/20200821/a84cd0f9/attachment-0001.html>


More information about the debian-med-commit mailing list