[med-svn] [Git][med-team/fastqc][upstream] New upstream version 0.11.9+dfsg
Steffen Möller
gitlab at salsa.debian.org
Sun Jan 12 21:48:22 GMT 2020
Steffen Möller pushed to branch upstream at Debian Med / fastqc
Commits:
d1c9f051 by Steffen Moeller at 2020-01-12T22:40:39+01:00
New upstream version 0.11.9+dfsg
- - - - -
16 changed files:
- Help/3 Analysis Modules/12 Per Tile Sequence Quality.html
- INSTALL.txt
- RELEASE_NOTES.txt
- fastqc
- uk/ac/babraham/FastQC/Analysis/AnalysisRunner.java
- uk/ac/babraham/FastQC/Analysis/OfflineRunner.java
- uk/ac/babraham/FastQC/FastQCApplication.java
- uk/ac/babraham/FastQC/Graphs/LineGraph.java
- uk/ac/babraham/FastQC/Modules/AdapterContent.java
- uk/ac/babraham/FastQC/Modules/BasicStats.java
- uk/ac/babraham/FastQC/Modules/DuplicationLevel.java
- uk/ac/babraham/FastQC/Modules/PerTileQualityScores.java
- uk/ac/babraham/FastQC/Modules/SequenceLengthDistribution.java
- uk/ac/babraham/FastQC/Results/ResultsPanel.java
- uk/ac/babraham/FastQC/Sequence/Fast5File.java
- uk/ac/babraham/FastQC/Utilities/NanoporeBasename.java
Changes:
=====================================
Help/3 Analysis Modules/12 Per Tile Sequence Quality.html
=====================================
@@ -45,7 +45,7 @@ tiles.
<h2>Failure</h2>
<p>
-This module will issue a warning if any tile shows a mean Phred
+This module will raise an error if any tile shows a mean Phred
score more than 5 less than the mean for that base across all
tiles.
</p>
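The failure rule in the help text above can be sketched as follows. This is a minimal illustration and not the code in PerTileQualityScores.java: the class name TileQualityCheck, the method raisesError and the array layout (one row per tile, one column per base position) are assumptions made for the example.

```java
// Hypothetical sketch of the per-tile failure rule; not the FastQC implementation.
final class TileQualityCheck {

    // From the help text: fail if a tile's mean Phred score is more than
    // 5 below the mean for that base across all tiles.
    static final double ERROR_THRESHOLD = 5.0;

    /**
     * Assumed layout: tileMeans[t][b] is the mean Phred score of tile t at base b.
     * Returns true if any tile falls more than ERROR_THRESHOLD below the
     * across-tile mean at any base position.
     */
    static boolean raisesError(double[][] tileMeans) {
        if (tileMeans.length == 0) return false;
        int bases = tileMeans[0].length;
        for (int b = 0; b < bases; b++) {
            // Mean for this base position across all tiles
            double sum = 0;
            for (double[] tile : tileMeans) sum += tile[b];
            double meanAcrossTiles = sum / tileMeans.length;

            // Compare every tile against that mean
            for (double[] tile : tileMeans) {
                if (meanAcrossTiles - tile[b] > ERROR_THRESHOLD) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        double[][] even = { {30, 31}, {29, 30}, {31, 32} };
        double[][] oneBadTile = { {30, 31}, {30, 31}, {30, 10} };
        System.out.println("even tiles fail: " + raisesError(even));
        System.out.println("one bad tile fails: " + raisesError(oneBadTile));
    }
}
```

In the second data set the last tile sits 14 below the across-tile mean of 24 at the second base, so the check trips.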
=====================================
INSTALL.txt
=====================================
@@ -1,147 +1,152 @@
-Installing FastQC
--------------------
-FastQC is a java application. In order to run it needs your system to have a suitable
-Java Runtime Environment (JRE) installed. Before you try to run FastQC you should therefore
-ensure that you have a suitable JRE. There are a number of different JREs available
-however the ones we have tested are the v1.6-v1.8 JREs from Oracle. These are available
-for a number of different platforms.
-
-Windows/Linux: Go to java.com - click on Free Java Download - DON'T click the large red button
-but choose the smaller link to "See all java downloads". Find your operating system and select
-the appropriate offline installer. If you are using a 64bit operating system (and nearly
-everyone is these days), then make sure you select the 64bit version of the the installer.
-
-OSX: On newer versions of OSX you need to install the Java Development Kit. The normal Java
-runtime environment IS NOT enough. To get this go to java.com, click "Free java download",
-then IGNORE the big red button, and select "See all java downloads", on the next screen select
-"Looking for the JDK?" from the left hand menu and select the link to "JDK downloads" in the
-first paragraph. You can then click the "Download" button underneath JDK in the page you are
-taken to. Sorry this is such a pain!
-
-
-If you're not sure whether you have java installed then you can test this from a command
-prompt. To get a command prompt try:
-
-Windows: Select Start > Run, and type 'cmd' (no quotes) in the box which appears, press OK
-
-MaxOSX: Run Applications > Utilities > Terminal
-
-Linux: From your applications menu look for an application called 'Terminal' or 'Konsole'.
-Either of these will give you a usable shell.
-
-At the command prompt type 'java -version' and press enter. You should see something like:
-
-java version "1.8.0_60"
-Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
-Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
-
-If you get an error then you don't have java installed. If the version listed on the first
-line is less than 1.6 then you might have problems running FastQC.
-
-Actually installing FastQC is as simple as unzipping the zip file it comes in into a
-suitable location. That's it. Once unzipped it's ready to go.
-
-Running FastQC
---------------
-
-You can run FastQC in one of two modes, either as an interactive graphical application
-in which you can dynamically load FastQ files and view their results.
-
-Alternatively you can run FastQC in a non-interactive mode where you specify the files
-you want to process on the command line and FastQC will generate an HTML report for
-each file without launching a user interface. This would allow FastQC to be run as
-part of an analysis pipeline.
-
-
-Running FastQC Interactively
-----------------------------
-Windows: Simply double click on the run_fastqc bat file. If you want to make a pretty
-shortcut then we've included an icon file in the top level directory so you don't have
-to use the generic bat file icon.
-
-MacOSX: There is an application bundle for MacOSX which you can use to install and run
-FastQC. Just drag the application from the disk image to your Applications folder (or
-wherever you want to install the program).
-
-Linux: We have included a wrapper script, called 'fastqc' which is the easiest way to
-start the program. The wrapper is in the top level of the FastQC installation. You
-may need to make this file executable:
-
-chmod 755 fastqc
-
-..but once you have done that you can run it directly
-
-./fastqc
-
-..or place a link in /usr/local/bin to be able to run the program from any location:
-
-sudo ln -s /path/to/FastQC/fastqc /usr/local/bin/fastqc
-
-
-Running FastQC as part of a pipeline
-------------------------------------
-To run FastQC non-interactively you should use the fastqc wrapper script to launch
-the program. You will probably want to use the zipped install file on every platform
-(even OSX).
-
-To run non-interactively you simply have to specify a list of files to process
-on the commandline
-
-fastqc somefile.txt someotherfile.txt
-
-You can specify as many files to process in a single run as you like. If you don't
-specify any files to process the program will try to open the interactive application
-which may result in an error if you're running in a non-graphical environment.
-
-There are a few extra options you can specify when running non-interactively. Full
-details of these can be found by running
-
-fastqc --help
-
-By default, in non-interactive mode FastQC will create an HTML report with embedded
-graphs, but also a zip file containing individual graph files and additional data files
-containing the raw data from which plots were drawn. The zip file will not be extracted
-by default but you can enable this by adding:
-
---extract
-
-To the launch command.
-
-If you want to save your reports in a folder other than the folder which contained
-your original FastQ files then you can specify an alternative location by setting a
---outdir value:
-
---outdir=/some/other/dir/
-
-If you want to run fastqc on a stream of data to be read from standard input then you
-can do this by specifing 'stdin' as the name of the file to be processed and then
-streaming uncompressed fastq format data to the program. For example:
-
-zcat *fastq.gz | fastqc stdin
-
-If you want the results from a streamed analysis sent to a file with a name other than
-stdin then you can add a colon and put the file name you want, for example:
-
-zcat *fastq.gz | fastqc stdin:my_results
-
-..would write results to my_result.html and my_results.zip.
-
-
-Customising the report output
------------------------------
-
-If you want to run FastQC as part of a sequencing pipeline you may wish to change the
-formatting of the report to add in your own branding or to include extra information.
-
-In the Templates directory you will find a file called 'header_template.html' which
-you can edit to change the look of the report. This file contains all of the header for
-the report file, including the CSS section and you can alter this however you see fit.
-
-Whilst you can make whatever changes you like you should probably leave in place the
-<div> structure of the html template since later code will expect to close the main div
-which is left open at the end of the header. There is no facility to change the code in
-the main body of the report or the footer (although you can of course change the styling).
-
-The text tags @@FILENAME@@ and @@DATE@@ are placeholders which are filled in when the
-report it created. You can use these placeholders in other parts of the header if you
-wish.
+Installing FastQC
+-------------------
+
+OSX
+---
+FastQC is distributed as a DMG image file. Download the image from the project page
+and double click it to open it. You should see the FastQC application appear in a
+Finder window. Drag the application from there to wherever you want to install it
+on your machine. Once you've copied the application double click it to open it.
+
+FastQC is not a signed application therefore it may initially be blocked by the
+Gatekeeper application. To avoid this open FastQC by right clicking on the app
+and selecting open. This may prompt you to allow it to open. If it is still
+blocked go to System Preferences > Security and Privacy and you should see an option
+to allow the application to open. You only need to do this once and the preference
+should be remembered by OSX.
+
+Windows and Linux
+-----------------
+FastQC is a java application. In order to run it needs your system to have a suitable
+Java Runtime Environment (JRE) installed. Before you try to run FastQC you should
+therefore ensure that you have a suitable JRE. There are a number of different JREs
+available however the ones we have tested are the latest Oracle runtime environments
+and those from the adoptOpenJDK project (https://adoptopenjdk.net/). You need to
+download and install a suitable 64-bit JRE and make sure that the java application
+is in your path (most installers will take care of this for you).
+
+On linux most distributions will have java installed already so you might not need to
+do anything. If java isn't installed then you can add it by doing:
+
+Ubuntu / Mint: sudo apt install default-jre
+
+CentOS / Redhat: sudo yum install java-1.8.0-openjdk
+
+You can check whether java is installed by opening the 'cmd' program on windows, or
+any shell on linux and typing:
+
+java -version
+
+You should see something like:
+
+>java -version
+openjdk version "11.0.2" 2019-01-15
+OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.2+9)
+OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.2+9, mixed mode)
+
+
+Actually installing FastQC is as simple as unzipping the zip file it comes in into a
+suitable location. That's it. Once unzipped it's ready to go.
+
+Running FastQC
+--------------
+
+You can run FastQC in one of two modes, either as an interactive graphical application
+in which you can dynamically load FastQ files and view their results.
+
+Alternatively you can run FastQC in a non-interactive mode where you specify the files
+you want to process on the command line and FastQC will generate an HTML report for
+each file without launching a user interface. This would allow FastQC to be run as
+part of an analysis pipeline.
+
+
+Running FastQC Interactively
+----------------------------
+Windows: Simply double click on the run_fastqc bat file. If you want to make a pretty
+shortcut then we've included an icon file in the top level directory so you don't have
+to use the generic bat file icon.
+
+MacOSX: Double click on the FastQC application icon.
+
+Linux: We have included a wrapper script, called 'fastqc' which is the easiest way to
+start the program. The wrapper is in the top level of the FastQC installation. You
+may need to make this file executable:
+
+chmod 755 fastqc
+
+..but once you have done that you can run it directly
+
+./fastqc
+
+..or place a link in /usr/local/bin to be able to run the program from any location:
+
+sudo ln -s /path/to/FastQC/fastqc /usr/local/bin/fastqc
+
+
+Running FastQC as part of a pipeline
+------------------------------------
+To run FastQC non-interactively you should use the fastqc wrapper script to launch
+the program. You will probably want to use the zipped install file on every platform
+(even OSX).
+
+To run non-interactively you simply have to specify a list of files to process
+on the commandline
+
+fastqc somefile.txt someotherfile.txt
+
+You can specify as many files to process in a single run as you like. If you don't
+specify any files to process the program will try to open the interactive application
+which may result in an error if you're running in a non-graphical environment.
+
+There are a few extra options you can specify when running non-interactively. Full
+details of these can be found by running
+
+fastqc --help
+
+By default, in non-interactive mode FastQC will create an HTML report with embedded
+graphs, but also a zip file containing individual graph files and additional data files
+containing the raw data from which plots were drawn. The zip file will not be extracted
+by default but you can enable this by adding:
+
+--extract
+
+To the launch command.
+
+If you want to save your reports in a folder other than the folder which contained
+your original FastQ files then you can specify an alternative location by setting a
+--outdir value:
+
+--outdir=/some/other/dir/
+
+If you want to run fastqc on a stream of data to be read from standard input then you
+can do this by specifying 'stdin' as the name of the file to be processed and then
+streaming uncompressed fastq format data to the program. For example:
+
+zcat *fastq.gz | fastqc stdin
+
+If you want the results from a streamed analysis sent to a file with a name other than
+stdin then you can add a colon and put the file name you want, for example:
+
+zcat *fastq.gz | fastqc stdin:my_results
+
+..would write results to my_results.html and my_results.zip.
+
+
+Customising the report output
+-----------------------------
+
+If you want to run FastQC as part of a sequencing pipeline you may wish to change the
+formatting of the report to add in your own branding or to include extra information.
+
+In the Templates directory you will find a file called 'header_template.html' which
+you can edit to change the look of the report. This file contains all of the header for
+the report file, including the CSS section and you can alter this however you see fit.
+
+Whilst you can make whatever changes you like you should probably leave in place the
+<div> structure of the html template since later code will expect to close the main div
+which is left open at the end of the header. There is no facility to change the code in
+the main body of the report or the footer (although you can of course change the styling).
+
+The text tags @@FILENAME@@ and @@DATE@@ are placeholders which are filled in when the
+report is created. You can use these placeholders in other parts of the header if you
+wish.
=====================================
RELEASE_NOTES.txt
=====================================
@@ -1,3 +1,22 @@
+RELEASE NOTES FOR FastQC v0.11.9
+--------------------------------
+
+This is a bugfix release which resolves some issues with the program;
+
+- We removed the native look and feel from the linux application since
+ it's horribly broken
+
+- Fixed a hang if a run terminated from an out-of-memory error
+
+- Fixed a corner case where adapters could occasionally be double-counted
+
+- Updated the fast5 parser to account for the newer format multi-read
+ oxford nanopore fast5 files
+
+- Fixed problems if analysing a completely blank file
+
+
+
RELEASE NOTES FOR FastQC v0.11.8
--------------------------------
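The out-of-memory item in the v0.11.9 notes corresponds to a new catch clause in OfflineRunner.java further down this diff. A likely reading is that OutOfMemoryError is a java.lang.Error, so the existing catch (Exception e) never saw it and the remaining-files counter was never decremented. The sketch below illustrates that distinction only. The class OomExitSketch and the method processAll are hypothetical, and the method returns the exit code instead of calling System.exit so that it can be exercised safely.

```java
// Hypothetical sketch; mirrors the exit codes used in OfflineRunner (1 and 2)
// but is not the FastQC implementation.
final class OomExitSketch {

    /**
     * Runs each job in turn and reports the exit code the caller should use:
     * 0 if everything succeeded, 1 if any job threw an Exception,
     * 2 as soon as a job runs out of memory.
     */
    static int processAll(Runnable[] jobs) {
        boolean somethingFailed = false;
        for (Runnable job : jobs) {
            try {
                job.run();
            }
            catch (OutOfMemoryError e) {
                // An Error is not an Exception, so it needs its own clause.
                // Stop immediately instead of waiting on work that will never finish.
                return 2;
            }
            catch (Exception e) {
                // Ordinary failures are recorded and the remaining jobs still run.
                somethingFailed = true;
            }
        }
        return somethingFailed ? 1 : 0;
    }

    public static void main(String[] args) {
        Runnable ok = () -> {};
        Runnable oom = () -> { throw new OutOfMemoryError("simulated"); };
        System.out.println("exit code: " + processAll(new Runnable[] { ok, oom }));
    }
}
```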
=====================================
fastqc
=====================================
@@ -45,6 +45,29 @@ else {
$ENV{CLASSPATH} = "$RealBin$delimiter$RealBin/sam-1.103.jar$delimiter$RealBin/jbzip2-0.9.jar$delimiter$RealBin/cisd-jhdf5.jar";
}
+
+# We need to find the java interpreter. We'll start from the assumption that this
+# is included in the path.
+
+my $java_bin = "java";
+
+# We might have bundled a jre with the installation. If that's the case then we'll
+# use the interpreter which is bundled in preference to the system one.
+
+# Windows first
+if (-e "$RealBin/jre/bin/java.exe") {
+ $java_bin = "$RealBin/jre/bin/java.exe";
+}
+# Linux
+elsif (-e "$RealBin/jre/bin/java") {
+ $java_bin = "$RealBin/jre/bin/java";
+}
+# OSX
+elsif (-e "$RealBin/jre/Contents/Home/bin/java") {
+ $java_bin = "$RealBin/jre/Contents/Home/bin/java";
+}
+
+
my @java_args;
my @files;
@@ -79,7 +102,6 @@ my $nano;
my $nofilter;
my $kmer_size;
my $temp_directory;
-my $java_bin = 'java';
my $min_length;
my $result = GetOptions('version' => \$version,
@@ -234,6 +256,8 @@ if ($format) {
}
if ($java_bin ne 'java') {
+
+ warn "Java is $java_bin\n";
# $java_bin =~ s/\\/\//g;
unless (-e $java_bin) {
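The wrapper itself is Perl, but the lookup order it adds is easy to restate. The Java sketch below is an illustration only: the class JavaLocator and the method resolve are hypothetical names, and the install directory stands in for $RealBin. It checks the same three bundled locations in the same order and falls back to the plain "java" command on the path.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of the bundled-JRE lookup; the real logic lives in the Perl wrapper.
final class JavaLocator {

    // Candidate locations relative to the install directory, in the order
    // the wrapper tests them: Windows, Linux, OSX.
    private static final String[] BUNDLED_CANDIDATES = {
        "jre/bin/java.exe",
        "jre/bin/java",
        "jre/Contents/Home/bin/java"
    };

    /** Returns the first bundled interpreter that exists, else "java". */
    static String resolve(Path installDir) {
        for (String candidate : BUNDLED_CANDIDATES) {
            Path p = installDir.resolve(candidate);
            if (Files.exists(p)) {
                return p.toString();
            }
        }
        // Nothing bundled: assume the interpreter is on the path
        return "java";
    }

    public static void main(String[] args) {
        Path installDir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Java is " + resolve(installDir));
    }
}
```

Preferring the bundled interpreter means an installation that ships its own JRE behaves the same regardless of what the system provides.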
=====================================
uk/ac/babraham/FastQC/Analysis/AnalysisRunner.java
=====================================
@@ -1,115 +1,128 @@
-/**
- * Copyright Copyright 2010-17 Simon Andrews
- *
- * This file is part of FastQC.
- *
- * FastQC is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 3 of the License, or
- * (at your option) any later version.
- *
- * FastQC is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with FastQC; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
- */
-package uk.ac.babraham.FastQC.Analysis;
-
-import java.util.ArrayList;
-import java.util.Iterator;
-import java.util.List;
-
-import uk.ac.babraham.FastQC.Modules.QCModule;
-import uk.ac.babraham.FastQC.Sequence.Sequence;
-import uk.ac.babraham.FastQC.Sequence.SequenceFile;
-import uk.ac.babraham.FastQC.Sequence.SequenceFormatException;
-
-public class AnalysisRunner implements Runnable {
-
- private SequenceFile file;
- private QCModule [] modules;
- private List<AnalysisListener> listeners = new ArrayList<AnalysisListener>();
- private int percentComplete = 0;
-
- public AnalysisRunner (SequenceFile file) {
- this.file = file;
- }
-
- public void addAnalysisListener (AnalysisListener l) {
- if (l != null && !listeners.contains(l)) {
- listeners.add(l);
- }
- }
-
- public void removeAnalysisListener (AnalysisListener l) {
- if (l != null && listeners.contains(l)) {
- listeners.remove(l);
- }
- }
-
-
- public void startAnalysis (QCModule [] modules) {
- this.modules = modules;
- for (int i=0;i<modules.length;i++) {
- modules[i].reset();
- }
- AnalysisQueue.getInstance().addToQueue(this);
- }
-
- public void run() {
-
- Iterator<AnalysisListener> i = listeners.iterator();
- while (i.hasNext()) {
- i.next().analysisStarted(file);
- }
-
-
- int seqCount = 0;
- while (file.hasNext()) {
- ++seqCount;
- Sequence seq;
- try {
- seq = file.next();
- }
- catch (SequenceFormatException e) {
- i = listeners.iterator();
- while (i.hasNext()) {
- i.next().analysisExceptionReceived(file,e);
- }
- return;
- }
-
- for (int m=0;m<modules.length;m++) {
- if (seq.isFiltered() && modules[m].ignoreFilteredSequences()) continue;
- modules[m].processSequence(seq);
- }
-
- if (seqCount % 1000 == 0) {
- if (file.getPercentComplete() >= percentComplete+5) {
-
- percentComplete = (((int)file.getPercentComplete())/5)*5;
-
- i = listeners.iterator();
- while (i.hasNext()) {
- i.next().analysisUpdated(file,seqCount,percentComplete);
- }
- try {
- Thread.sleep(10);
- }
- catch (InterruptedException e) {}
- }
- }
- }
-
- i = listeners.iterator();
- while (i.hasNext()) {
- i.next().analysisComplete(file,modules);
- }
-
- }
-
-}
+/**
+ * Copyright Copyright 2010-17 Simon Andrews
+ *
+ * This file is part of FastQC.
+ *
+ * FastQC is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * FastQC is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with FastQC; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+package uk.ac.babraham.FastQC.Analysis;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+import uk.ac.babraham.FastQC.Modules.BasicStats;
+import uk.ac.babraham.FastQC.Modules.QCModule;
+import uk.ac.babraham.FastQC.Sequence.Sequence;
+import uk.ac.babraham.FastQC.Sequence.SequenceFile;
+import uk.ac.babraham.FastQC.Sequence.SequenceFormatException;
+
+public class AnalysisRunner implements Runnable {
+
+ private SequenceFile file;
+ private QCModule [] modules;
+ private List<AnalysisListener> listeners = new ArrayList<AnalysisListener>();
+ private int percentComplete = 0;
+
+ public AnalysisRunner (SequenceFile file) {
+ this.file = file;
+ }
+
+ public void addAnalysisListener (AnalysisListener l) {
+ if (l != null && !listeners.contains(l)) {
+ listeners.add(l);
+ }
+ }
+
+ public void removeAnalysisListener (AnalysisListener l) {
+ if (l != null && listeners.contains(l)) {
+ listeners.remove(l);
+ }
+ }
+
+
+ public void startAnalysis (QCModule [] modules) {
+ this.modules = modules;
+ for (int i=0;i<modules.length;i++) {
+ modules[i].reset();
+ }
+ AnalysisQueue.getInstance().addToQueue(this);
+ }
+
+ public void run() {
+
+ Iterator<AnalysisListener> i = listeners.iterator();
+ while (i.hasNext()) {
+ i.next().analysisStarted(file);
+ }
+
+
+ int seqCount = 0;
+ while (file.hasNext()) {
+ ++seqCount;
+ Sequence seq;
+ try {
+ seq = file.next();
+ }
+ catch (SequenceFormatException e) {
+ i = listeners.iterator();
+ while (i.hasNext()) {
+ i.next().analysisExceptionReceived(file,e);
+ }
+ return;
+ }
+
+ for (int m=0;m<modules.length;m++) {
+ if (seq.isFiltered() && modules[m].ignoreFilteredSequences()) continue;
+ modules[m].processSequence(seq);
+ }
+
+ if (seqCount % 1000 == 0) {
+ if (file.getPercentComplete() >= percentComplete+5) {
+
+ percentComplete = (((int)file.getPercentComplete())/5)*5;
+
+ i = listeners.iterator();
+ while (i.hasNext()) {
+ i.next().analysisUpdated(file,seqCount,percentComplete);
+ }
+ try {
+ Thread.sleep(10);
+ }
+ catch (InterruptedException e) {}
+ }
+ }
+ }
+
+ // We need to account for there potentially being no sequences
+ // in the file. In this case the BasicStats module never gets
+ // the file name so we need to explicitly pass it.
+
+ if (seqCount == 0) {
+ for (int m=0; m<modules.length; m++) {
+ if (modules[m] instanceof BasicStats) {
+ ((BasicStats)modules[m]).setFileName(file.name());
+ }
+ }
+ }
+
+ i = listeners.iterator();
+ while (i.hasNext()) {
+ i.next().analysisComplete(file,modules);
+ }
+
+ }
+
+}
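The empty-file handling added to AnalysisRunner.run() can be illustrated in isolation. The sketch below is a simplified stand-in, not the FastQC classes: EmptyFileSketch, Module and Stats are hypothetical, sequences are plain strings, and Stats learns the file name from the first sequence it sees as a stand-in for how BasicStats is described in the comment above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for AnalysisRunner, QCModule and BasicStats.
final class EmptyFileSketch {

    interface Module {
        void processSequence(String sourceFile, String seq);
    }

    static final class Stats implements Module {
        private String fileName;   // normally learned while processing sequences
        private int count = 0;

        public void processSequence(String sourceFile, String seq) {
            if (fileName == null) fileName = sourceFile;
            count++;
        }

        void setFileName(String name) { this.fileName = name; }

        String summary() { return fileName + ": " + count + " sequences"; }
    }

    static void run(String fileName, List<String> sequences, Module[] modules) {
        int seqCount = 0;
        for (String seq : sequences) {
            ++seqCount;
            for (Module m : modules) m.processSequence(fileName, seq);
        }

        // With no sequences the stats module never saw the file name,
        // so pass it explicitly before reporting.
        if (seqCount == 0) {
            for (Module m : modules) {
                if (m instanceof Stats) ((Stats) m).setFileName(fileName);
            }
        }
    }

    public static void main(String[] args) {
        Stats stats = new Stats();
        run("empty.fastq", new ArrayList<String>(), new Module[] { stats });
        System.out.println(stats.summary());
    }
}
```

Without the seqCount == 0 branch the summary for an empty input would carry a null file name.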
=====================================
uk/ac/babraham/FastQC/Analysis/OfflineRunner.java
=====================================
@@ -1,214 +1,221 @@
-/**
- * Copyright Copyright 2010-17 Simon Andrews
- *
- * This file is part of FastQC.
- *
- * FastQC is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 3 of the License, or
- * (at your option) any later version.
- *
- * FastQC is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with FastQC; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
- */
-package uk.ac.babraham.FastQC.Analysis;
-
-import java.io.File;
-import java.io.IOException;
-import java.util.Vector;
-import java.util.concurrent.atomic.AtomicInteger;
-
-import uk.ac.babraham.FastQC.FastQCConfig;
-import uk.ac.babraham.FastQC.Modules.ModuleFactory;
-import uk.ac.babraham.FastQC.Modules.QCModule;
-import uk.ac.babraham.FastQC.Report.HTMLReportArchive;
-import uk.ac.babraham.FastQC.Sequence.SequenceFactory;
-import uk.ac.babraham.FastQC.Sequence.SequenceFile;
-import uk.ac.babraham.FastQC.Utilities.CasavaBasename;
-import uk.ac.babraham.FastQC.Utilities.NanoporeBasename;
-
-public class OfflineRunner implements AnalysisListener {
-
- private AtomicInteger filesRemaining;
- private boolean showUpdates = true;
-
- public OfflineRunner (String [] filenames) {
-
- // See if we need to show updates
- showUpdates = !FastQCConfig.getInstance().quiet;
-
- Vector<File> files = new Vector<File>();
-
- // We make a special case if they supply a single filename
- // which is stdin. In this case we'll take data piped to us
- // rather than trying to read the actual file. We'll also
- // skip the existence check.
-
- if (filenames.length == 1 && filenames[0].startsWith("stdin")) {
- files.add(new File(filenames[0]));
- }
- else {
- for (int f=0;f<filenames.length;f++) {
- File file = new File(filenames[f]);
-
- if (!file.exists() || ! file.canRead()) {
- System.err.println("Skipping '"+filenames[f]+"' which didn't exist, or couldn't be read");
- continue;
- }
-
- if (FastQCConfig.getInstance().nano && file.isDirectory()) {
- File [] fast5files = file.listFiles();
- for (int i=0;i<fast5files.length;i++) {
- if (fast5files[i].getName().endsWith(".fast5")) {
- files.add(fast5files[i]);
- }
- }
-
- // In newer nanopore software instances the fast5 files are
- // put into subdirectories of the main one specified so we
- // also need to look into those as well.
- for (int i=0;i<fast5files.length;i++) {
- if (fast5files[i].isDirectory()) {
- File [] subFast5files = fast5files[i].listFiles();
-
- for (int j=0;j<subFast5files.length;i++) {
- if (subFast5files[j].getName().endsWith(".fast5")) {
- files.add(subFast5files[j]);
- }
- }
-
- }
- }
-
- }
- else {
- files.add(file);
- }
- }
- }
-
-
- File [][] fileGroups;
-
- // See if we need to group together files from a casava group
- if (FastQCConfig.getInstance().casava) {
- fileGroups = CasavaBasename.getCasavaGroups(files.toArray(new File[0]));
- }
- else if (FastQCConfig.getInstance().nano) {
- fileGroups = NanoporeBasename.getNanoporeGroups(files.toArray(new File[0]));
- }
- else {
- fileGroups = new File [files.size()][1];
- for (int f=0;f<files.size();f++) {
- fileGroups[f][0] = files.elementAt(f);
- }
- }
-
-
- filesRemaining = new AtomicInteger(fileGroups.length);
-
- boolean somethingFailed = false;
-
- for (int i=0;i<fileGroups.length;i++) {
-
- try {
- processFile(fileGroups[i]);
- }
- catch (Exception e) {
- System.err.println("Failed to process "+fileGroups[i][0]);
- e.printStackTrace();
- filesRemaining.decrementAndGet();
- somethingFailed = true;
- }
- }
-
- // We need to hold this class open as otherwise the main method
- // exits when it's finished.
- while (filesRemaining.intValue() > 0) {
- try {
- Thread.sleep(1000);
- }
- catch (InterruptedException e) {}
- }
- if (somethingFailed) {
- System.exit(1);
- }
- System.exit(0);
-
- }
-
- public void processFile (File [] files) throws Exception {
- for (int f=0;f<files.length;f++) {
- if (!files[f].getName().startsWith("stdin") && !files[f].exists()) {
- throw new IOException(files[f].getName()+" doesn't exist");
- }
- }
- SequenceFile sequenceFile = SequenceFactory.getSequenceFile(files);
-
- AnalysisRunner runner = new AnalysisRunner(sequenceFile);
- runner.addAnalysisListener(this);
-
- QCModule [] module_list = ModuleFactory.getStandardModuleList();
-
- runner.startAnalysis(module_list);
-
- }
-
- public void analysisComplete(SequenceFile file, QCModule[] results) {
- File reportFile;
-
- if (showUpdates) System.out.println("Analysis complete for "+file.name());
-
-
- if (FastQCConfig.getInstance().output_dir != null) {
- String fileName = file.getFile().getName().replaceAll("stdin:","").replaceAll("\\.gz$","").replaceAll("\\.bz2$","").replaceAll("\\.txt$","").replaceAll("\\.fastq$", "").replaceAll("\\.fq$", "").replaceAll("\\.csfastq$", "").replaceAll("\\.sam$", "").replaceAll("\\.bam$", "")+"_fastqc.html";
- reportFile = new File(FastQCConfig.getInstance().output_dir+"/"+fileName);
- }
- else {
- reportFile = new File(file.getFile().getAbsolutePath().replaceAll("stdin:","").replaceAll("\\.gz$","").replaceAll("\\.bz2$","").replaceAll("\\.txt$","").replaceAll("\\.fastq$", "").replaceAll("\\.fq$", "").replaceAll("\\.csfastq$", "").replaceAll("\\.sam$", "").replaceAll("\\.bam$", "")+"_fastqc.html");
- }
-
- try {
- new HTMLReportArchive(file, results, reportFile);
- }
- catch (Exception e) {
- analysisExceptionReceived(file, e);
- return;
- }
- filesRemaining.decrementAndGet();
-
- }
-
- public void analysisUpdated(SequenceFile file, int sequencesProcessed, int percentComplete) {
-
- if (percentComplete % 5 == 0) {
- if (percentComplete == 105) {
- if (showUpdates) System.err.println("It seems our guess for the total number of records wasn't very good. Sorry about that.");
- }
- if (percentComplete > 100) {
- if (showUpdates) System.err.println("Still going at "+percentComplete+"% complete for "+file.name());
- }
- else {
- if (showUpdates) System.err.println("Approx "+percentComplete+"% complete for "+file.name());
- }
- }
- }
-
- public void analysisExceptionReceived(SequenceFile file, Exception e) {
- System.err.println("Failed to process file "+file.name());
- e.printStackTrace();
- filesRemaining.decrementAndGet();
- }
-
- public void analysisStarted(SequenceFile file) {
- if (showUpdates) System.err.println("Started analysis of "+file.name());
-
- }
-
-}
+/**
+ * Copyright Copyright 2010-17 Simon Andrews
+ *
+ * This file is part of FastQC.
+ *
+ * FastQC is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * FastQC is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with FastQC; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+package uk.ac.babraham.FastQC.Analysis;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Vector;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import uk.ac.babraham.FastQC.FastQCConfig;
+import uk.ac.babraham.FastQC.Modules.ModuleFactory;
+import uk.ac.babraham.FastQC.Modules.QCModule;
+import uk.ac.babraham.FastQC.Report.HTMLReportArchive;
+import uk.ac.babraham.FastQC.Sequence.SequenceFactory;
+import uk.ac.babraham.FastQC.Sequence.SequenceFile;
+import uk.ac.babraham.FastQC.Utilities.CasavaBasename;
+import uk.ac.babraham.FastQC.Utilities.NanoporeBasename;
+
+public class OfflineRunner implements AnalysisListener {
+
+ private AtomicInteger filesRemaining;
+ private boolean showUpdates = true;
+
+ public OfflineRunner (String [] filenames) {
+
+ // See if we need to show updates
+ showUpdates = !FastQCConfig.getInstance().quiet;
+
+ Vector<File> files = new Vector<File>();
+
+ // We make a special case if they supply a single filename
+ // which is stdin. In this case we'll take data piped to us
+ // rather than trying to read the actual file. We'll also
+ // skip the existence check.
+
+ if (filenames.length == 1 && filenames[0].startsWith("stdin")) {
+ files.add(new File(filenames[0]));
+ }
+ else {
+ for (int f=0;f<filenames.length;f++) {
+ File file = new File(filenames[f]);
+
+ if (!file.exists() || ! file.canRead()) {
+ System.err.println("Skipping '"+filenames[f]+"' which didn't exist, or couldn't be read");
+ continue;
+ }
+
+ if (FastQCConfig.getInstance().nano && file.isDirectory()) {
+ File [] fast5files = file.listFiles();
+ for (int i=0;i<fast5files.length;i++) {
+ if (fast5files[i].getName().endsWith(".fast5")) {
+ files.add(fast5files[i]);
+ }
+ }
+
+ // In newer nanopore software instances the fast5 files are
+ // put into subdirectories of the main one specified so we
+ // also need to look into those as well.
+ for (int i=0;i<fast5files.length;i++) {
+ if (fast5files[i].isDirectory()) {
+ File [] subFast5files = fast5files[i].listFiles();
+
+ for (int j=0;j<subFast5files.length;j++) {
+ if (subFast5files[j].getName().endsWith(".fast5")) {
+ files.add(subFast5files[j]);
+ }
+ }
+
+ }
+ }
+
+ }
+ else {
+ files.add(file);
+ }
+ }
+ }
+
+
+ File [][] fileGroups;
+
+ // See if we need to group together files from a casava group
+ if (FastQCConfig.getInstance().casava) {
+ fileGroups = CasavaBasename.getCasavaGroups(files.toArray(new File[0]));
+ }
+ else if (FastQCConfig.getInstance().nano) {
+ fileGroups = NanoporeBasename.getNanoporeGroups(files.toArray(new File[0]));
+ }
+ else {
+ fileGroups = new File [files.size()][1];
+ for (int f=0;f<files.size();f++) {
+ fileGroups[f][0] = files.elementAt(f);
+ }
+ }
+
+
+ filesRemaining = new AtomicInteger(fileGroups.length);
+
+ boolean somethingFailed = false;
+
+ for (int i=0;i<fileGroups.length;i++) {
+
+ try {
+ processFile(fileGroups[i]);
+ }
+ catch (OutOfMemoryError e) {
+ System.err.println("Ran out of memory for "+fileGroups[i][0]);
+ e.printStackTrace();
+ System.exit(2);
+ }
+ catch (Exception e) {
+ System.err.println("Failed to process "+fileGroups[i][0]);
+ e.printStackTrace();
+ filesRemaining.decrementAndGet();
+ somethingFailed = true;
+ }
+
+
+ }
+
+ // We need to hold this class open as otherwise the main method
+ // exits when it's finished.
+ while (filesRemaining.intValue() > 0) {
+ try {
+ Thread.sleep(1000);
+ }
+ catch (InterruptedException e) {}
+ }
+ if (somethingFailed) {
+ System.exit(1);
+ }
+ System.exit(0);
+
+ }
+
+ public void processFile (File [] files) throws Exception {
+ for (int f=0;f<files.length;f++) {
+ if (!files[f].getName().startsWith("stdin") && !files[f].exists()) {
+ throw new IOException(files[f].getName()+" doesn't exist");
+ }
+ }
+ SequenceFile sequenceFile = SequenceFactory.getSequenceFile(files);
+
+ AnalysisRunner runner = new AnalysisRunner(sequenceFile);
+ runner.addAnalysisListener(this);
+
+ QCModule [] module_list = ModuleFactory.getStandardModuleList();
+
+ runner.startAnalysis(module_list);
+
+ }
+
+ public void analysisComplete(SequenceFile file, QCModule[] results) {
+ File reportFile;
+
+ if (showUpdates) System.out.println("Analysis complete for "+file.name());
+
+
+ if (FastQCConfig.getInstance().output_dir != null) {
+ String fileName = file.getFile().getName().replaceAll("stdin:","").replaceAll("\\.gz$","").replaceAll("\\.bz2$","").replaceAll("\\.txt$","").replaceAll("\\.fastq$", "").replaceAll("\\.fq$", "").replaceAll("\\.csfastq$", "").replaceAll("\\.sam$", "").replaceAll("\\.bam$", "")+"_fastqc.html";
+ reportFile = new File(FastQCConfig.getInstance().output_dir+"/"+fileName);
+ }
+ else {
+ reportFile = new File(file.getFile().getAbsolutePath().replaceAll("stdin:","").replaceAll("\\.gz$","").replaceAll("\\.bz2$","").replaceAll("\\.txt$","").replaceAll("\\.fastq$", "").replaceAll("\\.fq$", "").replaceAll("\\.csfastq$", "").replaceAll("\\.sam$", "").replaceAll("\\.bam$", "")+"_fastqc.html");
+ }
+
+ try {
+ new HTMLReportArchive(file, results, reportFile);
+ }
+ catch (Exception e) {
+ analysisExceptionReceived(file, e);
+ return;
+ }
+ filesRemaining.decrementAndGet();
+
+ }
+
+ public void analysisUpdated(SequenceFile file, int sequencesProcessed, int percentComplete) {
+
+ if (percentComplete % 5 == 0) {
+ if (percentComplete == 105) {
+ if (showUpdates) System.err.println("It seems our guess for the total number of records wasn't very good. Sorry about that.");
+ }
+ if (percentComplete > 100) {
+ if (showUpdates) System.err.println("Still going at "+percentComplete+"% complete for "+file.name());
+ }
+ else {
+ if (showUpdates) System.err.println("Approx "+percentComplete+"% complete for "+file.name());
+ }
+ }
+ }
+
+ public void analysisExceptionReceived(SequenceFile file, Exception e) {
+ System.err.println("Failed to process file "+file.name());
+ e.printStackTrace();
+ filesRemaining.decrementAndGet();
+ }
+
+ public void analysisStarted(SequenceFile file) {
+ if (showUpdates) System.err.println("Started analysis of "+file.name());
+
+ }
+
+}
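The report naming in analysisComplete above can be read as a fixed chain of suffix removals followed by "_fastqc.html". A minimal standalone sketch, assuming the same patterns applied in the same order; the class and method names (ReportNameSketch, reportName) are hypothetical and not part of FastQC:

```java
class ReportNameSketch {

    // Hypothetical helper: applies the same replaceAll patterns, in the
    // same order, as the chained calls in OfflineRunner.analysisComplete.
    static String reportName(String fileName) {
        String[] patterns = {
            "stdin:", "\\.gz$", "\\.bz2$", "\\.txt$", "\\.fastq$",
            "\\.fq$", "\\.csfastq$", "\\.sam$", "\\.bam$"
        };
        String base = fileName;
        for (String pattern : patterns) {
            base = base.replaceAll(pattern, "");
        }
        return base + "_fastqc.html";
    }

    public static void main(String[] args) {
        System.out.println(reportName("stdin:sample.fastq.gz"));
        System.out.println(reportName("reads.fq"));
    }
}
```

Because each pattern is tried once in a fixed order, the result depends on suffix order: "x.txt.gz" reduces to "x", while "x.gz.txt" only loses ".txt" and keeps ".gz".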
=====================================
uk/ac/babraham/FastQC/FastQCApplication.java
=====================================
@@ -54,7 +54,7 @@ import uk.ac.babraham.FastQC.Utilities.NanoporeBasename;
public class FastQCApplication extends JFrame {
- public static final String VERSION = "0.11.8";
+ public static final String VERSION = "0.11.9";
private JTabbedPane fileTabs;
private WelcomePanel welcomePanel;
@@ -318,8 +318,14 @@ public class FastQCApplication extends JFrame {
}
else {
+ // Recent java themes for linux are just horribly broken with missing
+ // bits of UI. We're therefore not going to set a native look if
+ // we're on linux. See seqmonk bug #95 for details.
+
try {
- UIManager.setLookAndFeel(UIManager.getSystemLookAndFeelClassName());
+ if (! System.getProperty("os.name").toLowerCase().contains("linux")) {
+ UIManager.setLookAndFeel(UIManager.getSystemLookAndFeelClassName());
+ }
} catch (Exception e) {}
=====================================
uk/ac/babraham/FastQC/Graphs/LineGraph.java
=====================================
@@ -139,7 +139,7 @@ public class LineGraph extends JPanel {
// Now draw the data points
- int baseWidth = (getWidth()-(xOffset+10))/data[0].length;
+ int baseWidth = (getWidth()-(xOffset+10))/Math.max(data[0].length,1); // Math.max is there in case we have no data (no sequences)
if (baseWidth<1) baseWidth=1;
// System.out.println("Base Width is "+baseWidth);
@@ -183,8 +183,9 @@ public class LineGraph extends JPanel {
for (int d=0;d<data.length;d++) {
g.setColor(COLOURS[d % COLOURS.length]);
-
- lastY = getY(data[d][0]);
+
+ if (data[d].length > 0)
+ lastY = getY(data[d][0]);
for (int i=1;i<data[d].length;i++) {
int thisY = getY(data[d][i]);
g.drawLine((baseWidth/2)+xOffset+(baseWidth*(i-1)), lastY, (baseWidth/2)+xOffset+(baseWidth*i), thisY);
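The LineGraph change above guards the width calculation against an empty data series, where the old code would divide by zero. A small sketch of that arithmetic in isolation, assuming integer pixel widths as in the hunk; BaseWidthSketch and its parameter names are hypothetical:

```java
class BaseWidthSketch {

    // Hypothetical helper: same arithmetic as the patched line, with the
    // "at least 1 pixel" clamp that follows it in LineGraph.
    static int baseWidth(int panelWidth, int xOffset, int points) {
        // Math.max avoids an ArithmeticException when there are no points.
        int width = (panelWidth - (xOffset + 10)) / Math.max(points, 1);
        return Math.max(width, 1);
    }

    public static void main(String[] args) {
        System.out.println(baseWidth(800, 50, 0));
        System.out.println(baseWidth(800, 50, 100));
    }
}
```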
=====================================
uk/ac/babraham/FastQC/Modules/AdapterContent.java
=====================================
@@ -146,10 +146,13 @@ public class AdapterContent extends AbstractQCModule {
// than we've seen before, but also that the last position we could find a hit
// is a positive position.
+ // If the sequence is longer than it was then we need to expand the storage in
+ // all of the adapter objects to account for this.
+
if (sequence.getSequence().length() > longestSequence && sequence.getSequence().length() - longestAdapter > 0) {
longestSequence = sequence.getSequence().length();
for (int a=0;a<adapters.length;a++) {
- adapters[a].expandLengthTo(longestSequence-longestAdapter);
+ adapters[a].expandLengthTo((longestSequence-longestAdapter)+1);
}
}
@@ -311,9 +314,10 @@ public class AdapterContent extends AbstractQCModule {
public void incrementCount (int position) {
- if (position >= positions.length) {
- expandLengthTo(position+1);
- }
+ // Don't ever check or expand the storage within this
+ // function as it ends up double counting previously
+ // incremented positions. Rely on the upstream code
+ // having done the expansion correctly already.
++positions[position];
=====================================
uk/ac/babraham/FastQC/Modules/BasicStats.java
=====================================
@@ -83,12 +83,16 @@ public class BasicStats extends AbstractQCModule {
public String name() {
return "Basic Statistics";
}
+
+ public void setFileName (String name) {
+ this.name = name;
+
+ this.name = this.name.replaceFirst("stdin:", "");
+ }
public void processSequence(Sequence sequence) {
- if (name == null) name = sequence.file().name();
-
- name = name.replaceFirst("stdin:", "");
+ if (name == null) setFileName(sequence.file().name());
// If this is a filtered sequence we simply count it and move on.
if (sequence.isFiltered()) {
=====================================
uk/ac/babraham/FastQC/Modules/DuplicationLevel.java
=====================================
@@ -151,6 +151,7 @@ public class DuplicationLevel extends AbstractQCModule {
percentDifferentSeqs = (dedupTotal/rawTotal)*100;
+ if (rawTotal == 0) percentDifferentSeqs = 100;
}
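The DuplicationLevel change above covers the case where no sequences were seen. The sketch below assumes the totals are floating point (the hunk does not show their declarations), in which case 0/0 produces NaN rather than an exception; DuplicationPercentSketch and percentDifferent are hypothetical names:

```java
class DuplicationPercentSketch {

    // Hypothetical helper: with no raw sequences there is nothing to
    // deduplicate, so report 100% as the patched code does.
    static double percentDifferent(double dedupTotal, double rawTotal) {
        if (rawTotal == 0) {
            return 100;
        }
        return (dedupTotal / rawTotal) * 100;
    }

    public static void main(String[] args) {
        System.out.println(percentDifferent(0, 0));
        System.out.println(percentDifferent(50, 200));
    }
}
```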
=====================================
uk/ac/babraham/FastQC/Modules/PerTileQualityScores.java
=====================================
@@ -1,367 +1,367 @@
-/**
- * Copyright Copyright 2010-17 Simon Andrews
- *
- * This file is part of FastQC.
- *
- * FastQC is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 3 of the License, or
- * (at your option) any later version.
- *
- * FastQC is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with FastQC; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
- */
-package uk.ac.babraham.FastQC.Modules;
-
-import java.io.IOException;
-import java.util.Arrays;
-import java.util.HashMap;
-import java.util.Iterator;
-
-import javax.swing.JPanel;
-import javax.xml.stream.XMLStreamException;
-
-import uk.ac.babraham.FastQC.Graphs.BaseGroup;
-import uk.ac.babraham.FastQC.Graphs.TileGraph;
-import uk.ac.babraham.FastQC.Report.HTMLReportArchive;
-import uk.ac.babraham.FastQC.Sequence.Sequence;
-import uk.ac.babraham.FastQC.Sequence.QualityEncoding.PhredEncoding;
-import uk.ac.babraham.FastQC.Utilities.QualityCount;
-
-public class PerTileQualityScores extends AbstractQCModule {
-
-
- public HashMap<Integer, QualityCount []> perTileQualityCounts = new HashMap<Integer, QualityCount[]>();
- private int currentLength = 0;
- private double [][] means = null;
- private String [] xLabels;
- private int [] tiles;
- private int high = 0;
- PhredEncoding encodingScheme;
- private boolean calculated = false;
-
- private long totalCount = 0;
-
- private int splitPosition = -1;
-
- private double maxDeviation = 0;
-
- private boolean ignoreInReport = false;
-
- public JPanel getResultsPanel() {
-
- if (!calculated) getPercentages();
-
- return new TileGraph(xLabels, tiles, means);
-
- }
-
- public boolean ignoreFilteredSequences() {
- return true;
- }
-
- public boolean ignoreInReport () {
- if (ignoreInReport || ModuleConfig.getParam("tile", "ignore") > 0 || currentLength == 0) {
- return true;
- }
- return false;
- }
-
- private synchronized void getPercentages () {
-
- char [] range = calculateOffsets();
- encodingScheme = PhredEncoding.getFastQEncodingOffset(range[0]);
- high = range[1] - encodingScheme.offset();
- if (high < 35) {
- high = 35;
- }
-
- BaseGroup [] groups = BaseGroup.makeBaseGroups(currentLength);
-
- Integer [] tileNumbers = perTileQualityCounts.keySet().toArray(new Integer[0]);
-
- Arrays.sort(tileNumbers);
-
- tiles = new int[tileNumbers.length];
- for (int i=0;i<tiles.length;i++) {
- tiles[i] = tileNumbers[i];
- }
-
- means = new double[tileNumbers.length][groups.length];
- xLabels = new String[groups.length];
-
- for (int t=0;t<tileNumbers.length;t++){
- for (int i=0;i<groups.length;i++) {
- if (t==0)
- xLabels[i] = groups[i].toString();
-
- int minBase = groups[i].lowerCount();
- int maxBase = groups[i].upperCount();
- means[t][i] = getMean(tileNumbers[t],minBase,maxBase,encodingScheme.offset());
- }
- }
-
- // Now we normalise across each column to see if there are any tiles with unusually
- // high or low quality.
-
- double maxDeviation = 0;
-
- double [] averageQualitiesPerGroup = new double[groups.length];
-
- for (int t=0;t<tileNumbers.length;t++) {
- for (int i=0;i<groups.length;i++) {
- averageQualitiesPerGroup[i] += means[t][i];
- }
- }
-
- for (int i=0;i<averageQualitiesPerGroup.length;i++) {
- averageQualitiesPerGroup[i] /= tileNumbers.length;
- }
-
- for (int i=0;i<groups.length;i++) {
- for (int t=0;t<tileNumbers.length;t++) {
- means[t][i] -= averageQualitiesPerGroup[i];
- if (Math.abs(means[t][i])> maxDeviation) {
- maxDeviation = Math.abs(means[t][i]);
- }
- }
- }
-
- this.maxDeviation = maxDeviation;
-
- calculated = true;
-
- }
-
- private char [] calculateOffsets () {
- // Works out from the set of chars what is the most
- // likely encoding scale for this file.
-
- char minChar = 0;
- char maxChar = 0;
-
- // Use the data from the first tile
- QualityCount [] qualityCounts = perTileQualityCounts.get(perTileQualityCounts.keySet().toArray()[0]);
-
- for (int q=0;q<qualityCounts.length;q++) {
- if (q == 0) {
- minChar = qualityCounts[q].getMinChar();
- maxChar = qualityCounts[q].getMaxChar();
- }
- else {
- if (qualityCounts[q].getMinChar() < minChar) {
- minChar = qualityCounts[q].getMinChar();
- }
- if (qualityCounts[q].getMaxChar() > maxChar) {
- maxChar = qualityCounts[q].getMaxChar();
- }
- }
- }
-
- return new char[] {minChar,maxChar};
- }
-
- public void processSequence(Sequence sequence) {
-
- // Check if we can skip counting because the module is being ignored anyway
- if (totalCount == 0) {
- if (ModuleConfig.getParam("tile", "ignore") > 0) {
- ignoreInReport = true;
- }
- }
-
-
- // Don't waste time calculating this if we're not going to use it anyway
- if (ignoreInReport) return;
-
- calculated = false;
-
- // Try to find the tile id. This can come in one of two forms:
- // @HWI-1KL136:211:D1LGAACXX:1:1101:18518:48851 3:N:0:ATGTCA
- // ^
- // @HWUSI-EAS493_0001:2:1:1000:16900#0/1
- // ^
-
- // These would appear at sections 2 or 3 of an array split on :
-
- // This module does quite a lot of work and ends up being the limiting
- // step when calculating. We'll therefore take only a sample of the
- // sequences to try to get a representative selection.
-
- ++totalCount;
- if (totalCount % 10 != 0) return;
-
- // First try to split the id by :
- int tile = 0;
-
- String [] splitID = sequence.getID().split(":");
-
-
- // If there are 7 or more fields then it's a 1.8+ file
- try {
-
-
- if (splitPosition >=0) {
- // We've found a split position before so let's try to use it again
-
-
- if (splitID.length <= splitPosition) {
- // There isn't enough data in this header to split the way we did before
- throw new NumberFormatException("Can't extract a number - not enough data");
- }
-
- tile = Integer.parseInt(splitID[splitPosition]);
- }
-
- else if (splitID.length>=7) {
- splitPosition = 4;
- tile = Integer.parseInt(splitID[4]);
- }
- else if (splitID.length >=5) {
- splitPosition = 2;
- // We can try the older format
- tile = Integer.parseInt(splitID[2]);
- }
- else {
- // We're not going to get a tile out of this
- ignoreInReport = true;
- return;
- }
-
-
- }
- catch (NumberFormatException nfe) {
- // This doesn't conform
- ignoreInReport = true;
- return;
- }
-
- char [] qual = sequence.getQualityString().toCharArray();
- if (currentLength < qual.length) {
-
- Iterator<Integer> tiles = perTileQualityCounts.keySet().iterator();
- while (tiles.hasNext()) {
- int thisTile = tiles.next();
-
- QualityCount [] qualityCounts = perTileQualityCounts.get(thisTile);
- QualityCount [] qualityCountsNew = new QualityCount[qual.length];
-
- for (int i=0;i<qualityCounts.length;i++) {
- qualityCountsNew[i] = qualityCounts[i];
- }
- for (int i=qualityCounts.length;i<qualityCountsNew.length;i++) {
- qualityCountsNew[i] = new QualityCount();
- }
- perTileQualityCounts.put(thisTile, qualityCountsNew);
- }
-
- currentLength = qual.length;
-
- }
-
- if (! perTileQualityCounts.containsKey(tile)) {
-
- if (perTileQualityCounts.size() > 1000) {
- // There are too many tiles, so we're probably parsing this wrong.
- // Let's give up
- System.err.println("Too many tiles (>1000) so giving up trying to do per-tile qualities since we're probably parsing the file wrongly");
- ignoreInReport = true;
- perTileQualityCounts.clear();
- return;
- }
-
- QualityCount [] qualityCounts = new QualityCount[currentLength];
- for (int i=0;i<currentLength;i++) {
- qualityCounts[i] = new QualityCount();
- }
-
- perTileQualityCounts.put(tile, qualityCounts);
- }
-
- QualityCount [] qualityCounts = perTileQualityCounts.get(tile);
-
- for (int i=0;i<qual.length;i++) {
- qualityCounts[i].addValue(qual[i]);
- }
-
- }
-
- public void reset () {
- totalCount = 0;
- perTileQualityCounts = new HashMap<Integer, QualityCount[]>();
- }
-
- public String description() {
- return "Shows the perl tile Quality scores of all bases at a given position in a sequencing run";
- }
-
- public String name() {
- return "Per tile sequence quality";
- }
-
- public boolean raisesError() {
- if (!calculated) getPercentages();
-
- if (maxDeviation > ModuleConfig.getParam("tile", "error")) return true;
- return false;
- }
-
- public boolean raisesWarning() {
- if (!calculated) getPercentages();
-
- if (maxDeviation > ModuleConfig.getParam("tile", "warn")) return true;
- return false;
- }
-
- public void makeReport(HTMLReportArchive report) throws IOException,XMLStreamException {
- if (!calculated) getPercentages();
-
- writeDefaultImage(report, "per_tile_quality.png", "Per base quality graph", Math.max(800, xLabels.length*15), 600);
-
- StringBuffer sb = report.dataDocument();
- sb.append("#Tile\tBase\tMean\n");
-
- for (int t=0;t<tiles.length;t++) {
- for (int i=0;i<means[t].length;i++) {
-
- sb.append(tiles[t]);
- sb.append("\t");
-
- sb.append(xLabels[i]);
- sb.append("\t");
-
- sb.append(means[t][i]);
-
- sb.append("\n");
- }
- }
- }
-
- private double getMean (int tile, int minbp, int maxbp, int offset) {
- int count = 0;
- double total = 0;
-
- QualityCount [] qualityCounts = perTileQualityCounts.get(tile);
-
- for (int i=minbp-1;i<maxbp;i++) {
- if (qualityCounts[i].getTotalCount() > 0) {
- count++;
- total += qualityCounts[i].getMean(offset);
- }
- }
-
- if (count > 0) {
- return total/count;
- }
- return 0;
-
- }
-
-
-}
+/**
+ * Copyright Copyright 2010-17 Simon Andrews
+ *
+ * This file is part of FastQC.
+ *
+ * FastQC is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * FastQC is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with FastQC; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+package uk.ac.babraham.FastQC.Modules;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.Iterator;
+
+import javax.swing.JPanel;
+import javax.xml.stream.XMLStreamException;
+
+import uk.ac.babraham.FastQC.Graphs.BaseGroup;
+import uk.ac.babraham.FastQC.Graphs.TileGraph;
+import uk.ac.babraham.FastQC.Report.HTMLReportArchive;
+import uk.ac.babraham.FastQC.Sequence.Sequence;
+import uk.ac.babraham.FastQC.Sequence.QualityEncoding.PhredEncoding;
+import uk.ac.babraham.FastQC.Utilities.QualityCount;
+
+public class PerTileQualityScores extends AbstractQCModule {
+
+
+ public HashMap<Integer, QualityCount []> perTileQualityCounts = new HashMap<Integer, QualityCount[]>();
+ private int currentLength = 0;
+ private double [][] means = null;
+ private String [] xLabels;
+ private int [] tiles;
+ private int high = 0;
+ PhredEncoding encodingScheme;
+ private boolean calculated = false;
+
+ private long totalCount = 0;
+
+ private int splitPosition = -1;
+
+ private double maxDeviation = 0;
+
+ private boolean ignoreInReport = false;
+
+ public JPanel getResultsPanel() {
+
+ if (!calculated) getPercentages();
+
+ return new TileGraph(xLabels, tiles, means);
+
+ }
+
+ public boolean ignoreFilteredSequences() {
+ return true;
+ }
+
+ public boolean ignoreInReport () {
+ if (ignoreInReport || ModuleConfig.getParam("tile", "ignore") > 0 || currentLength == 0) {
+ return true;
+ }
+ return false;
+ }
+
+ private synchronized void getPercentages () {
+
+ char [] range = calculateOffsets();
+ encodingScheme = PhredEncoding.getFastQEncodingOffset(range[0]);
+ high = range[1] - encodingScheme.offset();
+ if (high < 35) {
+ high = 35;
+ }
+
+ BaseGroup [] groups = BaseGroup.makeBaseGroups(currentLength);
+
+ Integer [] tileNumbers = perTileQualityCounts.keySet().toArray(new Integer[0]);
+
+ Arrays.sort(tileNumbers);
+
+ tiles = new int[tileNumbers.length];
+ for (int i=0;i<tiles.length;i++) {
+ tiles[i] = tileNumbers[i];
+ }
+
+ means = new double[tileNumbers.length][groups.length];
+ xLabels = new String[groups.length];
+
+ for (int t=0;t<tileNumbers.length;t++){
+ for (int i=0;i<groups.length;i++) {
+ if (t==0)
+ xLabels[i] = groups[i].toString();
+
+ int minBase = groups[i].lowerCount();
+ int maxBase = groups[i].upperCount();
+ means[t][i] = getMean(tileNumbers[t],minBase,maxBase,encodingScheme.offset());
+ }
+ }
+
+ // Now we normalise across each column to see if there are any tiles with unusually
+ // high or low quality.
+
+ double maxDeviation = 0;
+
+ double [] averageQualitiesPerGroup = new double[groups.length];
+
+ for (int t=0;t<tileNumbers.length;t++) {
+ for (int i=0;i<groups.length;i++) {
+ averageQualitiesPerGroup[i] += means[t][i];
+ }
+ }
+
+ for (int i=0;i<averageQualitiesPerGroup.length;i++) {
+ averageQualitiesPerGroup[i] /= tileNumbers.length;
+ }
+
+ for (int i=0;i<groups.length;i++) {
+ for (int t=0;t<tileNumbers.length;t++) {
+ means[t][i] -= averageQualitiesPerGroup[i];
+ if (Math.abs(means[t][i])> maxDeviation) {
+ maxDeviation = Math.abs(means[t][i]);
+ }
+ }
+ }
+
+ this.maxDeviation = maxDeviation;
+
+ calculated = true;
+
+ }
+
+ private char [] calculateOffsets () {
+ // Works out from the set of chars what is the most
+ // likely encoding scale for this file.
+
+ char minChar = 0;
+ char maxChar = 0;
+
+ // Use the data from the first tile
+ QualityCount [] qualityCounts = perTileQualityCounts.get(perTileQualityCounts.keySet().toArray()[0]);
+
+ for (int q=0;q<qualityCounts.length;q++) {
+ if (q == 0) {
+ minChar = qualityCounts[q].getMinChar();
+ maxChar = qualityCounts[q].getMaxChar();
+ }
+ else {
+ if (qualityCounts[q].getMinChar() < minChar) {
+ minChar = qualityCounts[q].getMinChar();
+ }
+ if (qualityCounts[q].getMaxChar() > maxChar) {
+ maxChar = qualityCounts[q].getMaxChar();
+ }
+ }
+ }
+
+ return new char[] {minChar,maxChar};
+ }
+
+ public void processSequence(Sequence sequence) {
+
+ // Check if we can skip counting because the module is being ignored anyway
+ if (totalCount == 0) {
+ if (ModuleConfig.getParam("tile", "ignore") > 0) {
+ ignoreInReport = true;
+ }
+ }
+
+
+ // Don't waste time calculating this if we're not going to use it anyway
+ if (ignoreInReport) return;
+
+ calculated = false;
+
+ // Try to find the tile id. This can come in one of two forms:
+ // @HWI-1KL136:211:D1LGAACXX:1:1101:18518:48851 3:N:0:ATGTCA
+ // ^
+ // @HWUSI-EAS493_0001:2:1:1000:16900#0/1
+ // ^
+
+ // After splitting on : the tile is at index 4 (7 or more fields) or index 2 (5 or 6 fields)
+
+ // This module does quite a lot of work and ends up being the limiting
+ // step when calculating. We'll therefore take only a sample of the
+ // sequences to try to get a representative selection.
+
+ ++totalCount;
+ if (totalCount % 10 != 0) return;
+
+ // First try to split the id by :
+ int tile = 0;
+
+ String [] splitID = sequence.getID().split(":");
+
+
+ // If there are 7 or more fields then it's a 1.8+ file
+ try {
+
+
+ if (splitPosition >=0) {
+ // We've found a split position before so let's try to use it again
+
+
+ if (splitID.length <= splitPosition) {
+ // There isn't enough data in this header to split the way we did before
+ throw new NumberFormatException("Can't extract a number - not enough data");
+ }
+
+ tile = Integer.parseInt(splitID[splitPosition]);
+ }
+
+ else if (splitID.length>=7) {
+ splitPosition = 4;
+ tile = Integer.parseInt(splitID[4]);
+ }
+ else if (splitID.length >=5) {
+ splitPosition = 2;
+ // We can try the older format
+ tile = Integer.parseInt(splitID[2]);
+ }
+ else {
+ // We're not going to get a tile out of this
+ ignoreInReport = true;
+ return;
+ }
+
+
+ }
+ catch (NumberFormatException nfe) {
+ // This doesn't conform
+ ignoreInReport = true;
+ return;
+ }
+
+ char [] qual = sequence.getQualityString().toCharArray();
+ if (currentLength < qual.length) {
+
+ Iterator<Integer> tiles = perTileQualityCounts.keySet().iterator();
+ while (tiles.hasNext()) {
+ int thisTile = tiles.next();
+
+ QualityCount [] qualityCounts = perTileQualityCounts.get(thisTile);
+ QualityCount [] qualityCountsNew = new QualityCount[qual.length];
+
+ for (int i=0;i<qualityCounts.length;i++) {
+ qualityCountsNew[i] = qualityCounts[i];
+ }
+ for (int i=qualityCounts.length;i<qualityCountsNew.length;i++) {
+ qualityCountsNew[i] = new QualityCount();
+ }
+ perTileQualityCounts.put(thisTile, qualityCountsNew);
+ }
+
+ currentLength = qual.length;
+
+ }
+
+ if (! perTileQualityCounts.containsKey(tile)) {
+
+ if (perTileQualityCounts.size() > 1000) {
+ // There are too many tiles, so we're probably parsing this wrong.
+ // Let's give up
+ System.err.println("Too many tiles (>1000) so giving up trying to do per-tile qualities since we're probably parsing the file wrongly");
+ ignoreInReport = true;
+ perTileQualityCounts.clear();
+ return;
+ }
+
+ QualityCount [] qualityCounts = new QualityCount[currentLength];
+ for (int i=0;i<currentLength;i++) {
+ qualityCounts[i] = new QualityCount();
+ }
+
+ perTileQualityCounts.put(tile, qualityCounts);
+ }
+
+ QualityCount [] qualityCounts = perTileQualityCounts.get(tile);
+
+ for (int i=0;i<qual.length;i++) {
+ qualityCounts[i].addValue(qual[i]);
+ }
+
+ }
+
+ public void reset () {
+ totalCount = 0;
+ perTileQualityCounts = new HashMap<Integer, QualityCount[]>();
+ }
+
+ public String description() {
+ return "Shows the per tile quality scores of all bases at a given position in a sequencing run";
+ }
+
+ public String name() {
+ return "Per tile sequence quality";
+ }
+
+ public boolean raisesError() {
+ if (!calculated) getPercentages();
+
+ if (maxDeviation > ModuleConfig.getParam("tile", "error")) return true;
+ return false;
+ }
+
+ public boolean raisesWarning() {
+ if (!calculated) getPercentages();
+
+ if (maxDeviation > ModuleConfig.getParam("tile", "warn")) return true;
+ return false;
+ }
+
+ public void makeReport(HTMLReportArchive report) throws IOException,XMLStreamException {
+ if (!calculated) getPercentages();
+
+ writeDefaultImage(report, "per_tile_quality.png", "Per tile quality graph", Math.max(800, xLabels.length*15), 600);
+
+ StringBuffer sb = report.dataDocument();
+ sb.append("#Tile\tBase\tMean\n");
+
+ for (int t=0;t<tiles.length;t++) {
+ for (int i=0;i<means[t].length;i++) {
+
+ sb.append(tiles[t]);
+ sb.append("\t");
+
+ sb.append(xLabels[i]);
+ sb.append("\t");
+
+ sb.append(means[t][i]);
+
+ sb.append("\n");
+ }
+ }
+ }
+
+ private double getMean (int tile, int minbp, int maxbp, int offset) {
+ int count = 0;
+ double total = 0;
+
+ QualityCount [] qualityCounts = perTileQualityCounts.get(tile);
+
+ for (int i=minbp-1;i<maxbp;i++) {
+ if (qualityCounts[i].getTotalCount() > 0) {
+ count++;
+ total += qualityCounts[i].getMean(offset);
+ }
+ }
+
+ if (count > 0) {
+ return total/count;
+ }
+ return 0;
+
+ }
+
+
+}
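The tile lookup in processSequence above can be illustrated with a stateless sketch that applies the same field-count rule to a single read ID. It is a simplification: the real module remembers splitPosition after the first read and disables itself for the whole file on a parse failure, which this sketch does not model. TileIdSketch and tileOf are hypothetical names:

```java
import java.util.OptionalInt;

class TileIdSketch {

    // Hypothetical helper: 7 or more colon-separated fields means the
    // tile is at index 4; 5 or 6 fields means the older layout with the
    // tile at index 2; anything else has no recognisable tile.
    static OptionalInt tileOf(String id) {
        String[] fields = id.split(":");
        int position;
        if (fields.length >= 7) {
            position = 4;
        } else if (fields.length >= 5) {
            position = 2;
        } else {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(fields[position]));
        } catch (NumberFormatException e) {
            // The field at the expected position is not a number.
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tileOf("@HWI-1KL136:211:D1LGAACXX:1:1101:18518:48851 3:N:0:ATGTCA"));
        System.out.println(tileOf("@HWUSI-EAS493_0001:2:1:1000:16900#0/1"));
    }
}
```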
=====================================
uk/ac/babraham/FastQC/Modules/SequenceLengthDistribution.java
=====================================
@@ -68,7 +68,12 @@ public class SequenceLengthDistribution extends AbstractQCModule {
}
maxLen = i;
}
- }
+ }
+
+ // We can get a -1 value for min if there aren't any valid sequences
+ // at all.
+
+ if (minLen < 0) minLen = 0;
// We put one extra category either side of the actual size
if (minLen>0) minLen--;
@@ -194,9 +199,12 @@ public class SequenceLengthDistribution extends AbstractQCModule {
return false;
}
-
- if (lengthCounts[0] > 0) {
- return true;
+ // We might not have any sequences so only check if we do
+ if (lengthCounts.length > 0) {
+ // Empty sequences get us an error
+ if (lengthCounts[0] > 0) {
+ return true;
+ }
}
return false;
}
=====================================
uk/ac/babraham/FastQC/Results/ResultsPanel.java
=====================================
@@ -124,7 +124,7 @@ public class ResultsPanel extends JPanel implements ListSelectionListener, Analy
panels = new JPanel[modules.length];
for (int m=0;m<modules.length;m++) {
- System.err.println("Getting panel for "+modules[m].name()+" with "+modules[m].description());
+// System.err.println("Getting panel for "+modules[m].name()+" with "+modules[m].description());
panels[m] = modules[m].getResultsPanel();
}
=====================================
uk/ac/babraham/FastQC/Sequence/Fast5File.java
=====================================
@@ -1,107 +1,142 @@
-/**
- * Copyright Copyright 2010-17 Simon Andrews
- *
- * This file is part of FastQC.
- *
- * FastQC is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 3 of the License, or
- * (at your option) any later version.
- *
- * FastQC is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with FastQC; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
- */
-package uk.ac.babraham.FastQC.Sequence;
-
-import java.io.File;
-import java.io.IOException;
-
-import ch.systemsx.cisd.hdf5.HDF5Factory;
-import ch.systemsx.cisd.hdf5.IHDF5SimpleReader;
-
-public class Fast5File implements SequenceFile {
-
- private Sequence nextSequence = null;
- private File file;
-
- private String name;
-
- protected Fast5File(File file) throws SequenceFormatException, IOException {
- this.file = file;
- name = file.getName();
-
- IHDF5SimpleReader reader = HDF5Factory.openForReading(file);
-
- String [] rdfPaths = new String [] {
- "Analyses/Basecall_2D_000/BaseCalled_template/Fastq",
- "Analyses/Basecall_2D_000/BaseCalled_2D/Fastq",
- "Analyses/Basecall_1D_000/BaseCalled_template/Fastq",
- "Analyses/Basecall_1D_000/BaseCalled_1D/Fastq"
- };
-
- boolean foundPath = false;
- for (int r=0;r<rdfPaths.length;r++) {
-
- if (reader.exists(rdfPaths[r])) {
-
- foundPath = true;
- String fastq = reader.readString(rdfPaths[r]);
-
- String [] sections = fastq.split("\\n");
-
- if (sections.length != 4) {
- throw new SequenceFormatException("Didn't get 4 sections from "+fastq);
- }
-
- nextSequence = new Sequence(this, sections[1].toUpperCase(),sections[3], sections[0]);
- break;
- }
- }
-
- reader.close();
-
- if (!foundPath) {
- throw new SequenceFormatException("No valid fastq paths found in "+file);
- }
-
- }
-
- public String name() {
- return name;
- }
-
- public int getPercentComplete() {
- if (! hasNext()) return 100;
-
- return 0;
- }
-
- public boolean isColorspace() {
- return false;
- }
-
- public boolean hasNext() {
- return nextSequence != null;
- }
-
- public Sequence next() throws SequenceFormatException {
- Sequence seq = nextSequence;
- nextSequence = null;
- return seq;
- }
-
- public void remove() {
- // No action here
- }
-
- public File getFile() {
- return file;
- }
-
-}
+/**
+ * Copyright Copyright 2010-17 Simon Andrews
+ *
+ * This file is part of FastQC.
+ *
+ * FastQC is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * FastQC is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with FastQC; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+package uk.ac.babraham.FastQC.Sequence;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import ch.systemsx.cisd.hdf5.HDF5Factory;
+import ch.systemsx.cisd.hdf5.IHDF5SimpleReader;
+
+public class Fast5File implements SequenceFile {
+
+ private Sequence nextSequence = null;
+ private File file;
+ private String name;
+ private IHDF5SimpleReader reader;
+ private String [] readPaths = new String[] {""};
+
+ private int readPathsIndexPosition = 0;
+
+ private String [] rdfPaths = new String [] {
+ "Analyses/Basecall_2D_000/BaseCalled_template/Fastq",
+ "Analyses/Basecall_2D_000/BaseCalled_2D/Fastq",
+ "Analyses/Basecall_1D_000/BaseCalled_template/Fastq",
+ "Analyses/Basecall_1D_000/BaseCalled_1D/Fastq"
+ };
+
+
+
+ protected Fast5File(File file) throws SequenceFormatException, IOException {
+ this.file = file;
+ name = file.getName();
+
+ reader = HDF5Factory.openForReading(file);
+
+ // These files have changed structure over time. Originally they contained
+ // a single read per file where the base of the hierarchy was the read
+ // itself.
+ //
+ // Later the files moved to having multiple reads per file. Now there is
+ // an additional top level folder per read with the sub-structure being the
+ // same as it used to be for the individual reads.
+ //
+ // We need to account for both of these structures.
+
+
+ // See if we can see a bunch of paths starting with "read_" at the top of the
+ // hierarchy. If we can then we substitute the read paths for these.
+
+ List<String> topLevelFolders = reader.getGroupMembers("/");
+
+ List<String> readFolders = new ArrayList<String>();
+
+ for (String folder : topLevelFolders) {
+ System.err.println("Looking at "+folder);
+
+ if (folder.startsWith("read_")) {
+ readFolders.add(folder+"/");
+ }
+ }
+
+ if (readFolders.size() > 0) {
+ // We have read folders so we'll replace the default readPaths with
+ // the list we made
+
+ readPaths = readFolders.toArray(new String[0]);
+ }
+
+ }
+
+ public String name() {
+ return name;
+ }
+
+ public int getPercentComplete() {
+ return (readPathsIndexPosition*100) / readPaths.length;
+ }
+
+ public boolean isColorspace() {
+ return false;
+ }
+
+ public boolean hasNext() {
+ return readPathsIndexPosition < readPaths.length;
+ }
+
+ public Sequence next() throws SequenceFormatException {
+
+ for (int r=0;r<rdfPaths.length;r++) {
+
+ if (reader.exists(readPaths[readPathsIndexPosition]+rdfPaths[r])) {
+
+ String fastq = reader.readString(readPaths[readPathsIndexPosition]+rdfPaths[r]);
+
+ String [] sections = fastq.split("\\n");
+
+ if (sections.length != 4) {
+ throw new SequenceFormatException("Didn't get 4 sections from "+fastq);
+ }
+
+ Sequence seq = new Sequence(this, sections[1].toUpperCase(),sections[3], sections[0]);
+ ++readPathsIndexPosition;
+
+ if(readPathsIndexPosition >= readPaths.length) {
+ reader.close();
+ }
+
+ return(seq);
+ }
+ }
+
+ throw new SequenceFormatException("No valid fastq paths found in "+file);
+ }
+
+ public void remove() {
+ // No action here
+ }
+
+ public File getFile() {
+ return file;
+ }
+
+}
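
The reworked Fast5File above handles two layouts: older single-read files, where the Analyses/... datasets sit at the root, and newer multi-read files, where each read has its own top-level "read_" group. The following is a minimal sketch of that path resolution only, not the FastQC implementation. It assumes a plain List of group names in place of reader.getGroupMembers("/") and a Set of dataset paths in place of reader.exists(...), so it runs without the HDF5 library. The class and method names (Fast5PathSketch, readPrefixes, firstFastqPath) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of FastQC.
final class Fast5PathSketch {

    // Same candidate dataset locations as rdfPaths in the diff above.
    static final String[] RDF_PATHS = {
        "Analyses/Basecall_2D_000/BaseCalled_template/Fastq",
        "Analyses/Basecall_2D_000/BaseCalled_2D/Fastq",
        "Analyses/Basecall_1D_000/BaseCalled_template/Fastq",
        "Analyses/Basecall_1D_000/BaseCalled_1D/Fastq"
    };

    // Multi-read files: one prefix per top-level "read_" group.
    // Single-read files: a single empty prefix, so the paths are used as-is.
    static List<String> readPrefixes(List<String> topLevelGroups) {
        List<String> prefixes = new ArrayList<String>();
        for (String group : topLevelGroups) {
            if (group.startsWith("read_")) {
                prefixes.add(group + "/");
            }
        }
        if (prefixes.isEmpty()) {
            prefixes.add("");
        }
        return prefixes;
    }

    // Returns the first candidate path present for this prefix, or null if
    // none exists. existingPaths stands in for reader.exists(...) (assumption).
    static String firstFastqPath(String prefix, Set<String> existingPaths) {
        for (String rdfPath : RDF_PATHS) {
            if (existingPaths.contains(prefix + rdfPath)) {
                return prefix + rdfPath;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up group names and dataset paths, for demonstration only.
        List<String> groups = Arrays.asList("read_001", "read_002");
        Set<String> existing = new HashSet<String>(Arrays.asList(
            "read_001/" + RDF_PATHS[2],
            "read_002/" + RDF_PATHS[0]));
        for (String prefix : readPrefixes(groups)) {
            System.out.println(prefix + " -> " + firstFastqPath(prefix, existing));
        }
    }
}
```

Using an empty prefix for the single-read case lets one lookup routine serve both layouts, which appears to be the idea behind the default readPaths value of {""} in the diff.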
=====================================
uk/ac/babraham/FastQC/Utilities/NanoporeBasename.java
=====================================
@@ -1,103 +1,106 @@
-/**
- * Copyright Copyright 2011-17 Simon Andrews
- *
- * This file is part of FastQC.
- *
- * FastQC is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 3 of the License, or
- * (at your option) any later version.
- *
- * FastQC is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with FastQC; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-package uk.ac.babraham.FastQC.Utilities;
-
-import java.io.File;
-import java.util.Hashtable;
-import java.util.Vector;
-
-public class NanoporeBasename {
-
- /**
- * This method finds the core name from an ONT fast5 file. It strips off the
- * part which indicates that this file is one of a set and returns the base name with
- * this part removed.
- *
- * If the filename does not conform to standard CASAVA naming then a NameFormatException
- * is thrown.
- *
- * @param originalName
- * @return
- * @throws NameFormatException
- */
-
- public static String getNanoporeBasename (String originalName) throws NameFormatException {
-
- // Files from nanopores look like: Computer_Samplename_number_chXXX_fileXXX_strand.fast5
- // We need to reduce this to Computer_Samplename_number
-
- String [] subNames = originalName.split("_");
-
- if (subNames.length < 5) {
- throw new NameFormatException();
- }
-
- String basename = subNames[0]+"_"+subNames[1]+"_"+subNames[2];
-
- System.err.println("Basename is "+basename);
-
- return basename;
-
- }
-
- public static File [][] getNanoporeGroups (File [] files) {
- Hashtable<String, Vector<File>> fileBases = new Hashtable<String, Vector<File>>();
-
- for (int f=0;f<files.length;f++) {
-
-
- if (files[f].getName().contains("muxscan")) continue; // Control files not containing real data.
-
- // If a file forms part of a nanopore group then put it into that
- // group.
- try {
- String baseName = NanoporeBasename.getNanoporeBasename(files[f].getName());
- if (! fileBases.containsKey(baseName)) {
- fileBases.put(baseName,new Vector<File>());
- }
- fileBases.get(baseName).add(files[f]);
-
- }
-
- // If the file name doesn't appear to be part of a nanopore group
- // then add it as a singleton
- catch (NameFormatException nfe) {
-
- System.err.println("File '"+files[f].getName()+"' didn't look like part of a CASAVA group");
- Vector<File> newVector = new Vector<File>();
- newVector.add(files[f]);
- fileBases.put(files[f].getName(), newVector);
- }
-
- }
-
- String [] baseNames = fileBases.keySet().toArray(new String [0]);
-
- File [][] fileGroups = new File[baseNames.length][];
-
- for (int i=0;i<baseNames.length;i++) {
- fileGroups[i] = fileBases.get(baseNames[i]).toArray(new File[0]);
- }
-
- return fileGroups;
- }
-
-}
+/**
+ * Copyright Copyright 2011-17 Simon Andrews
+ *
+ * This file is part of FastQC.
+ *
+ * FastQC is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * FastQC is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with FastQC; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+package uk.ac.babraham.FastQC.Utilities;
+
+import java.io.File;
+import java.util.Hashtable;
+import java.util.Vector;
+
+public class NanoporeBasename {
+
+ /**
+ * This method finds the core name from an ONT fast5 file. It strips off the
+ * part which indicates that this file is one of a set and returns the base name with
+ * this part removed.
+ *
+ * If the filename does not conform to the expected nanopore naming then a NameFormatException
+ * is thrown.
+ *
+ * @param originalName
+ * @return
+ * @throws NameFormatException
+ */
+
+ public static String getNanoporeBasename (String originalName) throws NameFormatException {
+
+ // Files from nanopores look like: Computer_Samplename_number_chXXX_fileXXX_strand.fast5
+ // We need to reduce this to Computer_Samplename_number
+
+ // Some more recent files have names which are just: Computer_Samplename_number.fast5 so
+ // we need to account for those too.
+
+ String [] subNames = originalName.replaceAll("\\.fast5$", "").split("_");
+
+ if (subNames.length < 3) {
+ throw new NameFormatException();
+ }
+
+ String basename = subNames[0]+"_"+subNames[1]+"_"+subNames[2];
+
+ System.err.println("Basename is "+basename);
+
+ return basename;
+
+ }
+
+ public static File [][] getNanoporeGroups (File [] files) {
+ Hashtable<String, Vector<File>> fileBases = new Hashtable<String, Vector<File>>();
+
+ for (int f=0;f<files.length;f++) {
+
+
+ if (files[f].getName().contains("muxscan")) continue; // Control files not containing real data.
+
+ // If a file forms part of a nanopore group then put it into that
+ // group.
+ try {
+ String baseName = NanoporeBasename.getNanoporeBasename(files[f].getName());
+ if (! fileBases.containsKey(baseName)) {
+ fileBases.put(baseName,new Vector<File>());
+ }
+ fileBases.get(baseName).add(files[f]);
+
+ }
+
+ // If the file name doesn't appear to be part of a nanopore group
+ // then add it as a singleton
+ catch (NameFormatException nfe) {
+
+ System.err.println("File '"+files[f].getName()+"' didn't look like part of a CASAVA group");
+ Vector<File> newVector = new Vector<File>();
+ newVector.add(files[f]);
+ fileBases.put(files[f].getName(), newVector);
+ }
+
+ }
+
+ String [] baseNames = fileBases.keySet().toArray(new String [0]);
+
+ File [][] fileGroups = new File[baseNames.length][];
+
+ for (int i=0;i<baseNames.length;i++) {
+ fileGroups[i] = fileBases.get(baseNames[i]).toArray(new File[0]);
+ }
+
+ return fileGroups;
+ }
+
+}
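
The NanoporeBasename change above accepts both the long form (Computer_Samplename_number_chXXX_fileXXX_strand.fast5) and the shorter Computer_Samplename_number.fast5 form. Below is a rough, self-contained sketch of that grouping on plain file-name strings. It is an illustration under assumptions, not the FastQC code: the class name NanoporeNameSketch and its methods are hypothetical, it returns null where FastQC throws NameFormatException, and it escapes the dot in the suffix pattern so that only a literal ".fast5" ending is stripped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of FastQC.
final class NanoporeNameSketch {

    // Reduces a fast5 file name to its first three underscore-separated parts.
    // Returns null when the name has fewer than three parts.
    static String basename(String fileName) {
        String[] parts = fileName.replaceAll("\\.fast5$", "").split("_");
        if (parts.length < 3) {
            return null;
        }
        return parts[0] + "_" + parts[1] + "_" + parts[2];
    }

    // Groups file names by basename. Names containing "muxscan" are skipped,
    // and names without a recognisable basename become singleton groups
    // keyed by the full file name.
    static Map<String, List<String>> group(List<String> fileNames) {
        Map<String, List<String>> groups = new LinkedHashMap<String, List<String>>();
        for (String fileName : fileNames) {
            if (fileName.contains("muxscan")) {
                continue;
            }
            String key = basename(fileName);
            if (key == null) {
                key = fileName;
            }
            List<String> members = groups.get(key);
            if (members == null) {
                members = new ArrayList<String>();
                groups.put(key, members);
            }
            members.add(fileName);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up file names, for demonstration only.
        List<String> names = new ArrayList<String>();
        names.add("PC1_sampleA_1234_ch100_file5_strand.fast5");
        names.add("PC1_sampleA_1234.fast5");
        names.add("odd.fast5");
        System.out.println(group(names));
    }
}
```

A LinkedHashMap is used here so the groups come out in first-seen order; the Hashtable in the diff gives no ordering guarantee.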
View it on GitLab: https://salsa.debian.org/med-team/fastqc/commit/d1c9f051d1e3face771b3d47b0900d1b928c4381