[med-svn] [rambo-k] 03/09: Imported Upstream version 1.21+dfsg

Andreas Tille tille at debian.org
Sun Nov 27 20:59:07 UTC 2016


This is an automated email from the git hooks/post-receive script.

tille pushed a commit to branch master
in repository rambo-k.

commit 91a77ca9466f7715f815e1a10a6ad69a579ddd78
Author: Andreas Tille <tille at debian.org>
Date:   Mon Nov 21 15:15:01 2016 +0100

    Imported Upstream version 1.21+dfsg
---
 .gitignore                                         |   7 +
 License.txt                                        |   0
 RAMBOK.py                                          |   0
 ReadClassifier_eclipse/.classpath                  |   0
 ReadClassifier_eclipse/.project                    |   0
 ReadClassifier_eclipse/classifier.log              |   1 -
 .../src/org/rki/readclassifier/Classifier.java     |   0
 .../org/rki/readclassifier/ClassifierThread.java   |   0
 .../src/org/rki/readclassifier/DistClassifier.java |   0
 .../org/rki/readclassifier/KmerDistribution.java   |   0
 .../src/org/rki/readclassifier/MMClassifier.java   |   0
 .../src/org/rki/readclassifier/Read.java           |   0
 .../src/org/rki/readclassifier/ReadClassifier.java |   0
 .../src/org/rki/readclassifier/ReadPair.java       |   0
 .../src/org/rki/readclassifier/Reader.java         |   0
 .../src/org/rki/readclassifier/Writer.java         |   0
 ReadTrainer_eclipse/.classpath                     |   0
 ReadTrainer_eclipse/.project                       |   0
 .../src/org/rki/readtrainer/Trainer.java           |   0
 .../src/org/rki/readtrainer/TrainerThread.java     |   0
 Readme.pdf                                         | Bin 0 -> 1321321 bytes
 Readme.txt                                         |   0
 ReadmeLatex_src/Parameter_estimation.png           | Bin 0 -> 748490 bytes
 ReadmeLatex_src/Readme.tex                         | 208 +++++++++++++++++++++
 ReadmeLatex_src/heatmap_batadeno.jpg               | Bin 0 -> 48436 bytes
 ReadmeLatex_src/heatmap_full.png                   | Bin 0 -> 23765 bytes
 ReadmeLatex_src/parameter_estimation_batadeno.png  | Bin 0 -> 780165 bytes
 plot.py                                            |   0
 readme.md                                          | 122 ++++++++++++
 simulate_reads.py                                  |   0
 30 files changed, 337 insertions(+), 1 deletion(-)

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..37a1344
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,7 @@
+*.aux
+*.bbl
+*.blg
+*.log
+*.toc
+*.pyc
+
diff --git a/License.txt b/License.txt
old mode 100644
new mode 100755
diff --git a/RAMBOK.py b/RAMBOK.py
old mode 100644
new mode 100755
diff --git a/ReadClassifier_eclipse/.classpath b/ReadClassifier_eclipse/.classpath
old mode 100644
new mode 100755
diff --git a/ReadClassifier_eclipse/.project b/ReadClassifier_eclipse/.project
old mode 100644
new mode 100755
diff --git a/ReadClassifier_eclipse/classifier.log b/ReadClassifier_eclipse/classifier.log
deleted file mode 100644
index d42d177..0000000
--- a/ReadClassifier_eclipse/classifier.log
+++ /dev/null
@@ -1 +0,0 @@
-INFO - Found kmer size: 4
diff --git a/ReadClassifier_eclipse/src/org/rki/readclassifier/Classifier.java b/ReadClassifier_eclipse/src/org/rki/readclassifier/Classifier.java
old mode 100644
new mode 100755
diff --git a/ReadClassifier_eclipse/src/org/rki/readclassifier/ClassifierThread.java b/ReadClassifier_eclipse/src/org/rki/readclassifier/ClassifierThread.java
old mode 100644
new mode 100755
diff --git a/ReadClassifier_eclipse/src/org/rki/readclassifier/DistClassifier.java b/ReadClassifier_eclipse/src/org/rki/readclassifier/DistClassifier.java
old mode 100644
new mode 100755
diff --git a/ReadClassifier_eclipse/src/org/rki/readclassifier/KmerDistribution.java b/ReadClassifier_eclipse/src/org/rki/readclassifier/KmerDistribution.java
old mode 100644
new mode 100755
diff --git a/ReadClassifier_eclipse/src/org/rki/readclassifier/MMClassifier.java b/ReadClassifier_eclipse/src/org/rki/readclassifier/MMClassifier.java
old mode 100644
new mode 100755
diff --git a/ReadClassifier_eclipse/src/org/rki/readclassifier/Read.java b/ReadClassifier_eclipse/src/org/rki/readclassifier/Read.java
old mode 100644
new mode 100755
diff --git a/ReadClassifier_eclipse/src/org/rki/readclassifier/ReadClassifier.java b/ReadClassifier_eclipse/src/org/rki/readclassifier/ReadClassifier.java
old mode 100644
new mode 100755
diff --git a/ReadClassifier_eclipse/src/org/rki/readclassifier/ReadPair.java b/ReadClassifier_eclipse/src/org/rki/readclassifier/ReadPair.java
old mode 100644
new mode 100755
diff --git a/ReadClassifier_eclipse/src/org/rki/readclassifier/Reader.java b/ReadClassifier_eclipse/src/org/rki/readclassifier/Reader.java
old mode 100644
new mode 100755
diff --git a/ReadClassifier_eclipse/src/org/rki/readclassifier/Writer.java b/ReadClassifier_eclipse/src/org/rki/readclassifier/Writer.java
old mode 100644
new mode 100755
diff --git a/ReadTrainer_eclipse/.classpath b/ReadTrainer_eclipse/.classpath
old mode 100644
new mode 100755
diff --git a/ReadTrainer_eclipse/.project b/ReadTrainer_eclipse/.project
old mode 100644
new mode 100755
diff --git a/ReadTrainer_eclipse/src/org/rki/readtrainer/Trainer.java b/ReadTrainer_eclipse/src/org/rki/readtrainer/Trainer.java
old mode 100644
new mode 100755
diff --git a/ReadTrainer_eclipse/src/org/rki/readtrainer/TrainerThread.java b/ReadTrainer_eclipse/src/org/rki/readtrainer/TrainerThread.java
old mode 100644
new mode 100755
diff --git a/Readme.pdf b/Readme.pdf
new file mode 100755
index 0000000..523034b
Binary files /dev/null and b/Readme.pdf differ
diff --git a/Readme.txt b/Readme.txt
old mode 100644
new mode 100755
diff --git a/ReadmeLatex_src/Parameter_estimation.png b/ReadmeLatex_src/Parameter_estimation.png
new file mode 100755
index 0000000..4635258
Binary files /dev/null and b/ReadmeLatex_src/Parameter_estimation.png differ
diff --git a/ReadmeLatex_src/Readme.tex b/ReadmeLatex_src/Readme.tex
new file mode 100755
index 0000000..6b13b51
--- /dev/null
+++ b/ReadmeLatex_src/Readme.tex
@@ -0,0 +1,208 @@
+\documentclass{article}
+\usepackage{varwidth}
+\usepackage{fancyhdr} % Required for custom headers
+\usepackage{lastpage} % Required to determine the last page for the footer
+\usepackage{extramarks} % Required for headers and footers
+\usepackage{graphicx} % Required to insert images
+\usepackage{color}
+% Margins
+\topmargin=-0.45in
+\evensidemargin=0in
+\oddsidemargin=0in
+\textwidth=6.5in
+\textheight=9.0in
+\headsep=0.25in 
+%\definecolor{light-gray}{gray}{0.9}
+%\definecolor{dark-gray}{gray}{0.05}
+\definecolor{dark-gray}{gray}{0.9}
+\definecolor{light-gray}{gray}{0.05}
+
+\newcommand\MyCBox[1]{%
+  \colorbox{dark-gray}{\begin{varwidth}{\dimexpr\linewidth-2\fboxsep}#1\end{varwidth}}}
+
+\linespread{1.1} % Line spacing
+
+\setlength\parindent{0pt} % Removes all indentation from paragraphs
+
+%----------------------------------------------------------------------------------------
+%	DOCUMENT STRUCTURE COMMANDS
+%	Skip this unless you know what you're doing
+%----------------------------------------------------------------------------------------
+
+% Header and footer for when a page split occurs within a problem environment
+\newcommand{\enterProblemHeader}[1]{
+\nobreak\extramarks{#1}{#1 continued on next page\ldots}\nobreak
+\nobreak\extramarks{#1 (continued)}{#1 continued on next page\ldots}\nobreak
+}
+
+% Header and footer for when a page split occurs between problem environments
+\newcommand{\exitProblemHeader}[1]{
+\nobreak\extramarks{#1 (continued)}{#1 continued on next page\ldots}\nobreak
+\nobreak\extramarks{#1}{}\nobreak
+}
+
+\setcounter{secnumdepth}{0} % Removes default section numbers
+\newcounter{homeworkProblemCounter} % Creates a counter to keep track of the number of problems
+
+\newcommand{\homeworkProblemName}{}
+\newenvironment{homeworkProblem}[1][Problem \arabic{homeworkProblemCounter}]{ % Makes a new environment called homeworkProblem which takes 1 argument (custom name) but the default is "Problem #"
+\stepcounter{homeworkProblemCounter} % Increase counter for number of problems
+\renewcommand{\homeworkProblemName}{#1} % Assign \homeworkProblemName the name of the problem
+\section{\homeworkProblemName} % Make a section in the document with the custom problem count
+\enterProblemHeader{\homeworkProblemName} % Header and footer within the environment
+}{
+\exitProblemHeader{\homeworkProblemName} % Header and footer after the environment
+}
+
+\newcommand{\problemAnswer}[1]{ % Defines the problem answer command with the content as the only argument
+\noindent\framebox[\columnwidth][c]{\begin{minipage}{0.98\columnwidth}#1\end{minipage}} % Makes the box around the problem answer and puts the content inside
+}
+
+\newcommand{\homeworkSectionName}{}
+\newenvironment{homeworkSection}[1]{ % New environment for sections within homework problems, takes 1 argument - the name of the section
+\renewcommand{\homeworkSectionName}{#1} % Assign \homeworkSectionName to the name of the section from the environment argument
+\subsection{\homeworkSectionName} % Make a subsection with the custom name of the subsection
+\enterProblemHeader{\homeworkProblemName\ [\homeworkSectionName]} % Header and footer within the environment
+}{
+\enterProblemHeader{\homeworkProblemName} % Header and footer after the environment
+}
+   
+\begin{document}
+
+
+\tableofcontents
+
+\pagebreak
+
+\section{INTRODUCTION}
+
+RAMBO-K is a reference-based tool for rapid and sensitive extraction of one organism’s reads from a mixed dataset. It is based on a Markov chain implementation, which uses genomic characteristics of each reference to assign reads to the associated set.
+
+
+\section{SYSTEM REQUIREMENTS}
+
+RAMBO-K is implemented in Python and Java. Thus, it requires Java SE 7 as well as Python 2.7 or higher (including modules numpy and matplotlib). 
+
+\section{LICENSE}
+
+RAMBO-K - Read Assignment Based On K-mers\\
+Copyright \copyright  2015  Simon H. Tausch\\
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU Lesser General Public License, version 3
+as published by the Free Software Foundation.\\
+
+This program is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU Lesser General Public License for more details.\\
+
+You should have received a copy of the GNU Lesser General Public
+License along with this program.  If not, see
+http://www.gnu.org/licenses/. 
+
+\section{RUNNING RAMBO-K}
+
+\subsection{References and names}
+To use RAMBO-K you need to specify two reference sets (using parameters $-R$ and $-r$). References must be provided as FASTA files and may contain one or more sequences corresponding to the desired species. If you lack an exact reference, a multi-FASTA file containing sequences from a set of species related as closely as possible will also work.
+You can enter two names for the reference sets (using $-N$ corresponding to $-R$ and $-n$ corresponding to $-r$), which will be used for naming of the result files and labelling the plots.
+
+\subsection{Unassigned reads}
+The unassigned reads from the original dataset can be provided as fastq or fasta files (if fasta, the filetype option must be set to $-f$ fasta), either as single- or paired-end sets using parameter \textit{-1} (and \textit{-2} for paired reads). If paired-end information is available, using it is strongly recommended.
+
+\subsection{File structure}
+RAMBO-K will create a folder for every new set of species (named \$name1\_\$name2/). In that folder, the assigned reads will be saved as \$name*.fastq (or .fasta if input format is specified as fasta). All files are named according to the entered options.\\
+There will also be a folder $\$name1\_\$name2/temp/$ containing temporary files. This can be used to rerun RAMBO-K without repeating all calculations. The temp folder can be deleted after a run using the $-d$ parameter, but as temporary data are rather small and save a lot of time when re-running RAMBO-K, this is not recommended. \\
+In the $\$name1\_\$name2/graphics/$ folder you will find some plots which are helpful for determining your final parameters (see Cutoffs).
+If you want to write your results to a folder other than your working directory, use the parameter $-o$.
+
+\subsection{K-mer sizes}
+The size of $k$ has a crucial influence on assignment quality. It is recommended to run RAMBO-K over a range of $k$ values first to find a suitable value. Generally, a low $k$ ($<$5) works best if the reference is rather distant, while higher values ($>$8) can be used if a close reference is known. Best results are usually achieved with $k$ between 4 and 12. 
+You can enter $k$ as a range (e.g. $-k$ 4:8), a list (e.g. $-k$ 4,8,12) or a single integer (e.g. $-k$ 8).
+For a detailed discussion on the choice of $k$ in different cases see figures \ref{fig:1} and \ref{fig:2} in the appendix.
+
+\subsection{Cutoffs}
+After the first run of RAMBO-K, you will find a series of plots provided in the graphics folder. All plots are named according to the entered options. The score\_histogram\_*.png plot shows the theoretical distribution of scores for each species. Ideally, the curves for the two species to be separated should overlap as little as possible. Theoretical specificity and sensitivity are shown in ROCplot\_*.png. To check the correlation of the theoretical distributions to your re [...]
+Cutoffs are provided either using parameter $-c$ to assign all reads with scores lower than the given number, or parameter $-C$ to assign all reads scoring higher than the given number.\\
+For negative cutoffs, use m instead of - (e.g. m1 = -1).
+
+\subsection{Number of reads used for simulation}
+The $-a$ parameter allows you to vary the number of reads used to simulate data and draw the plots. Especially for very large reference genomes, a higher number is recommended to yield more precise results. Also, the plots will look smoother using more data. However, using higher numbers of reads for this step may slow down the precalculation phase.
+	
+\subsection{Threading}
+RAMBO-K is implemented to use multiple threads with near-linear speedup until hard drive access speed becomes the limiting factor. The number of threads is specified by parameter $-t$. By default, multithreading is disabled. 
+
+\subsection{Hints}
+In case of low quality data, quality trimming will significantly improve the results of RAMBO-K. \\
+For a short help and explanation of all parameters type ./RAMBOK -h .
+	
+\pagebreak
+\section{EXAMPLE}
+
+
+Divide a set of viral and human reads from a paired end data set in fastq format:
+
+1) Run RAMBO-K to evaluate optimal parameters (optional):\\
+
+\MyCBox{
+\color{light-gray}\tt \small ./RAMBOK -r human\_reference\_sequences.fasta -n H.sapiens -R viral\_reference\_sequences.fasta \\ 
+-N virus -1 unassigned\_reads.1.fastq -2 unassigned\_reads.2.fastq -t 4 -k 4:8}
+\\
+\\
+\\
+2) Now take a look at the plots to choose ideal values of $k$ and $c$. Run RAMBO-K again specifying $k$ and $c$. Do not change any of the other options except for the number of threads. In this example, the best discrimination would be obtained using $k$ = 6 and setting the cutoff to 0. To assign viral reads while dismissing the human background, run: \\
+
+\MyCBox{\color{light-gray}\tt \small ./RAMBOK -r human\_reference\_sequences.fasta -n H.sapiens -R viral\_reference\_sequences.fasta \\
+-N virus -1 unassigned\_reads.1.fastq -2 unassigned\_reads.2.fastq -t 4 -k 6 -c 0}
+\\
+\\
+\\
+Results will be saved in virus\_cutoff\_0\_k\_6\_1.fastq and virus\_cutoff\_0\_k\_6\_2.fastq in your working directory or, if specified, the output folder.\\
+Or, to assign the human reads while dismissing viral reads, run:\\
+
+\MyCBox{\color{light-gray}\tt \small ./RAMBOK -r human\_reference\_sequences.fasta -n H.sapiens -R viral\_reference\_sequences.fasta \\ 
+-N virus -1 unassigned\_reads.1.fastq -2 unassigned\_reads.2.fastq -t 4 -k 6 -C 0}
+\\
+\\
+\\
+Results will be saved in H.sapiens\_cutoff\_0\_k\_6\_1.fastq and H.sapiens\_cutoff\_0\_k\_6\_2.fastq in your working directory or, if specified, the output folder. \\
+
+Or, to divide a set of viral and human reads from a single end data set in fasta format:\\
+1) Run RAMBO-K to evaluate optimal parameters (optional):\\
+
+\MyCBox{\color{light-gray}\tt \small ./RAMBOK -r human\_reference\_sequences.fasta -n H.sapiens -R viral\_reference\_sequences.fasta \\ 
+-N virus -1 unassigned\_reads.1.fasta -f fasta -t 4 -k 4:8 }
+\\
+\\
+\\
+2) Now take a look at the plots to choose ideal values of $k$ and $c$. Run RAMBO-K again specifying $k$ and $c$. Do not change any of the other options except for the number of threads. In this example, the best discrimination would be obtained using $k$ = 10 and setting the cutoff to -10. To assign viral reads while dismissing the human background, run: \\
+
+\MyCBox{\color{light-gray}\tt \small  ./RAMBOK -r human\_reference\_sequences.fasta -n H.sapiens -R viral\_reference\_sequences.fasta \\ -N virus -1 unassigned\_reads.1.fasta -f fasta -t 4 -k 10 -c m10}
+\\
+\\
+\\
+Results will be saved in virus\_cutoff\_m10\_k\_10.fasta in your working directory or, if specified, the output folder.\\
+Or, to assign the human reads while dismissing viral reads, run:\\
+
+\MyCBox{\color{light-gray}\tt \small ./RAMBOK -r human\_reference\_sequences.fasta -n H.sapiens -R viral\_reference\_sequences.fasta \\ 
+-N virus -1 unassigned\_reads.1.fasta -f fasta -t 4 -k 10 -C m10}
+\\
+\\
+\\	
+Results will be saved in H.sapiens\_cutoff\_m10\_k\_10.fasta in your working directory or, if specified, the output folder. 	
+\pagebreak
+\begin{figure}[h]
+\centering
+\includegraphics[height=20cm]{Parameter_estimation.png}
+\caption{ROC-plots (right) and fitted histograms (left) of mixed reads of pox and human background. Theoretical read separation (as represented by the ROC-plots) and fitting of the score distributions (represented by the fitted histograms) improve with increasing $k$. An optimal choice of values would here be $k=8$ and $c=0$.}
+\label{fig:1}
+\end{figure}
+
+\begin{figure}[h]
+\centering
+\includegraphics[height=20cm]{parameter_estimation_batadeno.png}
+\caption{ROC-plots (right) and fitted histograms (left) of mixed reads of bat adenovirus and bat background. Training was executed using bat sequences and more distant canine adenovirus sequences. Theoretical read separation (as represented by the ROC-plots) improves with increasing $k$, but fitting of the score distributions (represented by the fitted histograms) diminishes. This is due to the fact that no k-mers of high length are identical in the canine adenovirus reference and the re [...]
+\label{fig:2}
+\end{figure}
+
+\end{document}
diff --git a/ReadmeLatex_src/heatmap_batadeno.jpg b/ReadmeLatex_src/heatmap_batadeno.jpg
new file mode 100755
index 0000000..2e40a02
Binary files /dev/null and b/ReadmeLatex_src/heatmap_batadeno.jpg differ
diff --git a/ReadmeLatex_src/heatmap_full.png b/ReadmeLatex_src/heatmap_full.png
new file mode 100755
index 0000000..06bd203
Binary files /dev/null and b/ReadmeLatex_src/heatmap_full.png differ
diff --git a/ReadmeLatex_src/parameter_estimation_batadeno.png b/ReadmeLatex_src/parameter_estimation_batadeno.png
new file mode 100755
index 0000000..962639c
Binary files /dev/null and b/ReadmeLatex_src/parameter_estimation_batadeno.png differ
diff --git a/plot.py b/plot.py
old mode 100644
new mode 100755
diff --git a/readme.md b/readme.md
new file mode 100644
index 0000000..f36baee
--- /dev/null
+++ b/readme.md
@@ -0,0 +1,122 @@
+INTRODUCTION
+============
+
+RAMBO-K is a reference-based tool for rapid and sensitive extraction of one organism's reads from a mixed dataset. It is based on a Markov chain implementation, which uses genomic characteristics of each reference to assign reads to the associated set.
+
+
+SYSTEM REQUIREMENTS
+===================
+
+RAMBO-K is implemented in Python and Java. Thus, it requires Java SE 7 as well as Python 2.7 or higher (including modules numpy and matplotlib). 
+
+
+LICENSE
+=======
+
+Copyright (C) 2015  Simon H. Tausch
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU Lesser General Public License, version 3
+as published by the Free Software Foundation.
+
+This program is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU Lesser General Public License for more details.
+
+You should have received a copy of the GNU Lesser General Public
+License along with this program.  If not, see
+<http://www.gnu.org/licenses/>. 
+
+
+RUNNING RAMBO-K
+===============
+
+References and names
+--------------------
+
+To use RAMBO-K you need to specify two reference sets (using parameters -R and -r). References must be provided as FASTA files and may contain one or more sequences corresponding to the desired species. If you lack an exact reference, a multi-FASTA file containing sequences from a set of species related as closely as possible will also work.
+You can enter two names for the reference sets (using -N corresponding to -R and -n corresponding to -r), which will be used for naming of the result files and labelling the plots.
+
+Unassigned reads
+----------------
+
+The unassigned reads from the original dataset can be provided as fastq or fasta files (if fasta, the filetype option must be set to -f fasta), either as single- or paired-end sets using parameter -1 (and -2 for paired reads). If paired-end information is available, using it is strongly recommended.
+
+File structure
+--------------
+
+RAMBO-K will create a folder for every new set of species (named $name1_$name2/). In that folder, the assigned reads will be saved as $name*.fastq (or .fasta if input format is specified as fasta). All files are named according to the entered options.
+There will also be a folder $name1_$name2/temp/ containing temporary files. This can be used to rerun RAMBO-K without repeating all calculations. The temp folder can be deleted after a run using the -d parameter, but as temporary data are rather small and save a lot of time when re-running RAMBO-K, this is not recommended. 
+In the $name1_$name2/graphics/ folder you will find some plots which are helpful for determining your final parameters (see Cutoffs).
+If you want to write your results to a folder other than your working directory, use the parameter -o.
+
+K-mer sizes
+-----------
+
+The size of k has a crucial influence on assignment quality. It is recommended to run RAMBO-K over a range of k values first to find a suitable value. Generally, a low k (<5) works best if the reference is rather distant, while higher values (>8) can be used if a close reference is known. Best results are usually achieved with k between 4 and 12. 
+You can enter k as a range (e.g. -k 4:8), a list (e.g. -k 4,8,12) or a single integer (e.g. -k 8). For a more detailed discussion on the choice of k, see the provided file Readme.pdf.
+
+Cutoffs
+-------
+
+After the first run of RAMBO-K, you will find a series of plots provided in the graphics folder. All plots are named according to the entered options. The score_histogram_*.png plot shows the theoretical distribution of scores for each species. Ideally, the curves for the two species to be separated should overlap as little as possible. Theoretical specificity and sensitivity are shown in ROCplot_*.png. To check the correlation of the theoretical distributions to your real  [...]
+Cutoffs are provided either using parameter -c to assign all reads with scores lower than the given number, or parameter -C to assign all reads scoring higher than the given number.
+For negative cutoffs, use m instead of - (e.g. m1 = -1).
+
+Number of reads used for simulation
+-----------------------------------
+
+The -a parameter allows you to vary the number of reads used to simulate data and draw the plots. Especially for very large reference genomes, a higher number is recommended to yield more precise results. Also, the plots will look smoother using more data. However, using higher numbers of reads for this step may slow down the precalculation phase.
+
+Threading
+---------
+
+RAMBO-K is implemented to use multiple threads with near-linear speedup until hard drive access speed becomes the limiting factor. The number of threads is specified by parameter -t. By default, multithreading is disabled. 
+
+Hint
+----
+
+In case of low quality data, quality trimming will significantly improve the results of RAMBO-K. 
+For a short help and explanation of all parameters type ./RAMBOK -h .
+	
+	
+EXAMPLE
+=======
+
+Divide a set of viral and human reads from a paired end data set in fastq format:
+
+1) Run RAMBO-K to evaluate optimal parameters (optional):
+
+	./RAMBOK -r human_reference_sequences.fasta -n H.sapiens -R viral_reference_sequences.fasta -N virus -1 unassigned_reads.1.fastq -2 unassigned_reads.2.fastq -t 4 -k 4:8 
+
+2) Now take a look at the plots to choose ideal values of k and c. Run RAMBO-K again specifying k and c. Do not change any of the other options except for the number of threads. In this example, the best discrimination would be obtained using k = 6 and setting the cutoff to 0. To assign viral reads while dismissing the human background, run: 
+
+	./RAMBOK -r human_reference_sequences.fasta -n H.sapiens -R viral_reference_sequences.fasta -N virus -1 unassigned_reads.1.fastq -2 unassigned_reads.2.fastq -t 4 -k 6 -c 0
+
+Results will be saved in virus_cutoff=0_k=6_1.fastq and virus_cutoff=0_k=6_2.fastq in your working directory or, if specified, the output folder. 	
+Or, to assign the human reads while dismissing viral reads, run:
+
+	./RAMBOK -r human_reference_sequences.fasta -n H.sapiens -R viral_reference_sequences.fasta -N virus -1 unassigned_reads.1.fastq -2 unassigned_reads.2.fastq -t 4 -k 6 -C 0
+	
+Results will be saved in H.sapiens_cutoff=0_k=6_1.fastq and H.sapiens_cutoff=0_k=6_2.fastq in your working directory or, if specified, the output folder. 
+
+	
+Or, to divide a set of viral and human reads from a single end data set in fasta format:
+
+1) Run RAMBO-K to evaluate optimal parameters (optional):
+
+	./RAMBOK -r human_reference_sequences.fasta -n H.sapiens -R viral_reference_sequences.fasta -N virus -1 unassigned_reads.1.fasta -f fasta -t 4 -k 4:8 
+
+2) Now take a look at the plots to choose ideal values of k and c. Run RAMBO-K again specifying k and c. Do not change any of the other options except for the number of threads. In this example, the best discrimination would be obtained using k = 10 and setting the cutoff to -10. To assign viral reads while dismissing the human background, run: 
+
+	./RAMBOK -r human_reference_sequences.fasta -n H.sapiens -R viral_reference_sequences.fasta -N virus -1 unassigned_reads.1.fasta -f fasta -t 4 -k 10 -c m10
+
+	
+Results will be saved in virus_cutoff=m10_k=10.fasta in your working directory or, if specified, the output folder. 	
+Or, to assign the human reads while dismissing viral reads, run:
+
+	./RAMBOK -r human_reference_sequences.fasta -n H.sapiens -R viral_reference_sequences.fasta -N virus -1 unassigned_reads.1.fasta -f fasta -t 4 -k 10 -C m10
+	
+Results will be saved in H.sapiens_cutoff=m10_k=10.fasta in your working directory or, if specified, the output folder. 	
+
diff --git a/simulate_reads.py b/simulate_reads.py
old mode 100644
new mode 100755
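
Note on the option syntax documented in readme.md above: the -k value may be
a range (4:8), a list (4,8,12) or a single integer (8), and negative cutoffs
are written with a leading "m" (m1 = -1). The following is a minimal Python
sketch of how such values could be decoded. It is not taken from RAMBOK.py;
the function names parse_k and parse_cutoff are hypothetical, and the sketch
assumes that a range includes both endpoints and that cutoffs may be
fractional, neither of which the readme states.

```python
def parse_k(spec):
    """Expand a -k style value into a list of k-mer sizes.

    Accepts "4:8" (range), "4,8,12" (list) or "8" (single integer).
    Assumption: a range includes both of its endpoints.
    """
    if ":" in spec:
        low, high = (int(part) for part in spec.split(":"))
        if low > high:
            raise ValueError("range start must not exceed range end: %r" % spec)
        return list(range(low, high + 1))
    # A single integer is simply a one-element list.
    return [int(part) for part in spec.split(",")]


def parse_cutoff(text):
    """Decode a cutoff in the readme's notation, where "m" replaces "-".

    Assumption: cutoffs may be fractional, so the result is a float.
    """
    if text.startswith("m"):
        return -float(text[1:])
    return float(text)


if __name__ == "__main__":
    for spec in ("4:8", "4,8,12", "8"):
        print(spec, parse_k(spec))
    for text in ("0", "m10"):
        print(text, parse_cutoff(text))
```

Writing the minus sign as "m" presumably keeps a value such as -10 from being
mistaken for another command-line option; the sketch only reverses that
substitution.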

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/rambo-k.git


