[med-svn] [Git][med-team/chromhmm][master] 8 commits: routine-update: New upstream version
Dylan Aïssi
gitlab at salsa.debian.org
Wed Jul 15 09:19:13 BST 2020
Dylan Aïssi pushed to branch master at Debian Med / chromhmm
Commits:
c894f401 by Dylan Aïssi at 2020-07-15T10:17:38+02:00
routine-update: New upstream version
- - - - -
bd1ced35 by Dylan Aïssi at 2020-07-15T10:17:40+02:00
New upstream version 1.21+dfsg
- - - - -
29538eb6 by Dylan Aïssi at 2020-07-15T10:18:01+02:00
Update upstream source from tag 'upstream/1.21+dfsg'
Update to upstream version '1.21+dfsg'
with Debian dir c6717a6e18c5b85c486f6a392e1783b066103c1f
- - - - -
ce49ed9b by Dylan Aïssi at 2020-07-15T10:18:01+02:00
routine-update: Standards-Version: 4.5.0
- - - - -
c6f9d069 by Dylan Aïssi at 2020-07-15T10:18:01+02:00
routine-update: debhelper-compat 13
- - - - -
3fee6057 by Dylan Aïssi at 2020-07-15T10:18:04+02:00
routine-update: Add salsa-ci file
- - - - -
ccff31a1 by Dylan Aïssi at 2020-07-15T10:18:04+02:00
routine-update: Rules-Requires-Root: no
- - - - -
d7d2a150 by Dylan Aïssi at 2020-07-15T10:18:49+02:00
update upstream changelog
- - - - -
9 changed files:
- README.md
- debian/changelog
- debian/control
- + debian/salsa-ci.yml
- debian/versionlog.txt
- edu/mit/compbio/ChromHMM/BrowserOutput.java
- edu/mit/compbio/ChromHMM/ChromHMM.java
- edu/mit/compbio/ChromHMM/Preprocessing.java
- edu/mit/compbio/ChromHMM/StateAnalysis.java
Changes:
=====================================
README.md
=====================================
@@ -1,2 +1,2 @@
-See http://compbio.mit.edu/ChromHMM/ or http://www.biolchem.ucla.edu/labs/ernst/ChromHMM/ for more information on ChromHMM.
+See http://compbio.mit.edu/ChromHMM/ or https://ernstlab.biolchem.ucla.edu/ChromHMM/ for more information on ChromHMM.
========
=====================================
debian/changelog
=====================================
@@ -1,3 +1,13 @@
+chromhmm (1.21+dfsg-1) UNRELEASED; urgency=medium
+
+ * New upstream version
+ * Standards-Version: 4.5.0 (routine-update)
+ * debhelper-compat 13 (routine-update)
+ * Add salsa-ci file (routine-update)
+ * Rules-Requires-Root: no (routine-update)
+
+ -- Dylan Aïssi <daissi at debian.org> Wed, 15 Jul 2020 10:17:38 +0200
+
chromhmm (1.20+dfsg-1) unstable; urgency=medium
* New upstream release.
=====================================
debian/control
=====================================
@@ -3,16 +3,17 @@ Maintainer: Debian Med Packaging Team <debian-med-packaging at lists.alioth.debian.
Uploaders: Dylan Aïssi <daissi at debian.org>
Section: science
Priority: optional
-Build-Depends: debhelper-compat (= 12),
+Build-Depends: debhelper-compat (= 13),
javahelper,
libhtsjdk-java,
libbatik-java,
libjheatchart-java
Build-Depends-Indep: default-jdk
-Standards-Version: 4.4.1
+Standards-Version: 4.5.0
Vcs-Browser: https://salsa.debian.org/med-team/chromhmm
Vcs-Git: https://salsa.debian.org/med-team/chromhmm.git
Homepage: http://compbio.mit.edu/ChromHMM/
+Rules-Requires-Root: no
Package: chromhmm
Architecture: all
=====================================
debian/salsa-ci.yml
=====================================
@@ -0,0 +1,4 @@
+---
+include:
+ - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/salsa-ci.yml
+ - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/pipeline-jobs.yml
=====================================
debian/versionlog.txt
=====================================
@@ -1,151 +1,168 @@
-12/9/2019 ChromHMM 1.20
-*Added a '-noautoopen' option that prevents ChromHMM from trying to automatically open a web browser with the summary page of results.
-*Fixed a bug that prevented the '-printbystatebyline' flag from being recogonized in MakeSegmentation unless the '-printposterior' flag was also present
-*Leading and trailing white space are now trimmed from entries when reading a cellmarkfiletable
-*Small code optimization when loading data
-*Added a more informative error message if the number of columns in the headers differ across binarized data
-*Added an API call to get the max state at a specified position for data stored in a 2D array
-
-6/25/2019 ChromHMM 1.19
-* Added a '-labels' option to OverlapEnrichment and NeighborhoodEnrichment that allows
-them to be applied to bed files where the fourth column are state labels that don't correspond to state
-numbers or IDs. If the fourth column has a state ID or state number before a '_' and then followed by a label,
-the states will state be ordered by the state ID or number, otherwise the state ordering in the output may differ from the original state ordering.
-* Clarified in the documentation that MakeSegmentation expects the columns of the binarized data
-to be in the same order as the columns in the model file which is by default the order of the columns in the binarized data
-used to learn the model. Added error checking enforcing the column names in the model file agree with the binarized data.
-Also added the '-reordercolsmodelfile' option to Reorder which causes the columns in the model file to be reordered
-
-12/26/2018 ChromHMM 1.18
-*Added the option '-splitrows' to BinarizeBam, BinarizeBed, and BinarizeSignal which enables splitting
-the binarized data across multiple files per chromosome. Splitting files can be desired to improve scalability
-in specific large scale applications. If this option is present the maximum number of rows per file is by default 5000,
-but this can be changed with the added '-j numsplitbins' option.
-*Added the option '-splitrows' to LearnModel and MakeSegmentation. If the binarized data was generated with the '-splitrows' option
-then this flag needs to be present so the segmentation file produces properly named chromosomes with correct coordinates
-*Added the option '-i splitindex' to BinarizeBam and BinarizeBed for conducting row spliting in a more parallelized manner when
-binarizing based on peak data with the '-peaks' option. See the manual for additional information
-*Added the command MergeBinary which allows merging binary files for different mark subsets split across different
-subdirectories. The command also supports row splitting binary files even if no merging was done with the '-splitrows' option.
-*Added the option '-r bedfilein bedfileout' in Reorder which enables directly relabeling the states in a segmentation or browser
-file after a specifying a reordering, without the need to run MakeSegmentation
-*Added the option '-holdroworder' in the LearnModel command which does not reorder the states of the model
-*Added the option '-scalebeta' to use an alternative numerical procedure to estimate the backward variables, beta, to avoid
-overflow observed in specialized settings
-*Added a more informative error message in ConvertGeneTable if the chromosome length is not found
-*Modified ChromHMM so that in cases where there is no data for a chromosome in one cell type, but there is in another
-to not produce a segmentation for the chromosome in the cell type with no data. Previously ChromHMM was not consistent
-in whether it would still produce output for a chromosome with no data in one cell type if there was data for the
-chromosome in another cell type. Previously ChromHMM had inconsistent behavior if in one input cell type there was data
-
-8/2/2018 ChromHMM 1.17
-*This version fixes a bug introduced in ChromHMM 1.16 that causes ChromHMM to produce anotations only for one chromosome
-
-7/29/2018 ChromHMM 1.16
-*Added the command ConvertGeneTable which converts a gene table from the UCSC genome browser table format into gene annotations found in the COORDS and ANCHORFILES directory
-*Added the '-gzip' flag to BinarizeBam, BinarizeBed, BinarizeSignal, ConvertGeneTable, MakeSegmentation, MakeBrowserFiles, and LearnModel
-which enables outputting segmentation files from the command in a zipped format
-*Added the options '-u coorddir' and '-v anchorfiledir' to LearnModel and ConvertGeneTable which allow specifying the directory of COORDS and ANCHORFILES which defaults to the
-directory where the ChromHMM.jar file is.
-*Removed duplicate entries in the exon annotation files. This does not effect the enrichments with the default settings.
-
-
-4/25/2018 ChromHMM 1.15
-*Added danRer11 to the included assemblies
-*Added the '-many' flag to LearnModel, MakeSegmentation, and EvalSubset which is more numerically stable when having many input features, i.e. hundreds of features, at the cost
-of additional runtime
-*Added the '-pseudo' flag to LearnModel. If this flag is present, pseudo counts of 1 are used in computing the model parameters to smooth away from zero values.
-These pseudo counts can provide numerical stability in the situation when the -n numseq is specified in training and some feature has very few present occurrences.
-*Added the '-lowmem' flag option also to EvalSubset which uses less memory by only loading one chromosome in at a time though with potentially additional runtime.
-*Added the '-paired' flag to BinarizeBam. If this option is present then reads in the BAM file are treated as pairs, and each pair is counted once with bin assignment is based on shifting half the insert size. If this option is present then the –n shift, –center, and –peaks options cannot be used.
-*Added a more informative error message for the situation in which the chromosome naming in the chromosome length file is
-inconsistent with the Bam/Bed files when binarizing data
-*Added a more informative error message for the situation in which the chromosome names in the segmentation files are not
-consistent with an external annotation when computing enrichments
-*Added additional range changing when computing enrichments for situations in which coordinates of the external annotations are off the chromosome.
-Such coordinate positions that are off the chromosome are ignored instead of an exception being thrown.
-*Fixed a bug that caused exceptions to be thrown if in LearnModel the '-n numseq' option was used without the '-lowmem' flag
-*Minor internal changes to the code including some that could give slight performance improvements.
-*Added details in the user manual on the computation of the fold enrichment calculation done in OverlapEnrichment and NeighborhoodEnrichment
-
-11/2/2017 ChromHMM 1.14
-*Added '-noimage' option to LearnModel, OverlapEnrichment, NeighborhoodEnrichment, CompareModels, Reorder, EvalSubset to surpress printing of image files.
-*Added annotations and chromosome length file for the ce11 assembly
-*Added a check if a beta value exceeds Double.MAX_VALUE then it is set to Double.MAX_VALUE to improve numerical stability
-*Updated the printing of posterior values to always be printed in Locale.ENGLISH to ensure they can still be read back in by ChromHMM if the default Locale is non-compatible
-*Fixed a bug in which the default initalization procedure would throw an exception if a chromosome was only one bin long
-*Fixed a bug in which the bin(s) with the maximum control value genomewide was not being binarized correctly.
-*Updated handling of situation in which control files were provided for some, but not all marks in a cell type. Previously if only one unique file was provided for a cell type it was used for all other marks in the cell type. Now a uniform control is assume. Previously if there was two or more unique control files for a cell type, then a mark without a control was not being binarized correctly because of the above fixed bug.
-*Slight change with update to how initial values of the initial parameters are set when not using all chromosomes for initialization
-
-11/2/2017 ChromHMM 1.13 (GitHub only release)
-*Added '-lowmem' flag to LearnModel, OverlapEnrichment, NeighborhoodEnrichment, and MakeSegmentation to have ChromHMM only load one chromosome file into memory at time thus reducing maximum memory usage at a potential of additional runtime
-*Added '-n numseq' flag to LearnModel. If this flag is present and the ‘-p’ flag is present then on each iteration of training only numseq chromosome files are randomly selected to be used for training. In such cases the ‘-d’ flag should be set to a negative number so model learning does not terminate prematurely since negative changes in the log-likelihood are expected since different chromosomes are used on each iteration. Also only numseq files are considered in the initial model initialization under the default ‘information’ mode. If the ‘-n’ flag is specified without the ‘-p’ flag a subset of chromosomes will still be used for initialization, but all chromosomes will still be used on all iterations of training.
-
-
-4/3/2016 ChromHMM 1.12 (4/15/2016 updated hg38 and rn6 CpGIsland files)
-*Fixed a numerical instability issue that could cause
-NA in the models when including missing data (encoded by a '2' in the input) in special cases
-*Fixed a bug in Reorder in its handling of the situation when adding labels
-at the same time as reordering the states. Now it consistently expects
-both the prefix and state number of the new states.
-* CpGIsland coordinate files for hg38 and rn6 added in Version 1.11 were not in bed format causing an exception to be thrown when computing enrichments
-with these. These files were fixed on 4/15/2016.
-
-7/27/2015 ChromHMM 1.11
-*Added a BinarizeBam command that allows binarization of aligned reads in bam files instead of bed files. This uses the HTSJDK software to implement this feature.
-*Added annotations for assemblies rn6, hg38, dm6, danRer10, and ce10, and updated annotation files for the other assemblies.
-*Added the option in BinarizeBed/BinarizeBed/BinarizeSignal to put a binarzation threshold directly on the signal level through a -g option
-*Added support for gzip files for the commands EvalSubset and CompareModels, so now all ChromHMM commands support both text and gzip format of files
-*Fixed a bug in the Reorder command which did not update the ordering prefix character, and instead maintained the orignal prefix.
-If the states are reorder based on a user provided ordering they will now have a 'U' prefix, a 'T' prefix for transition based ordering,
-and a 'E' prefix for an emission based ordering.
-*Fixed a bug which caused the Reorder command to throw exceptions when parsing elim* model files generated from the StatePruning command
-*Now ignores Hidden files when considering a set of files in a directory
-*Previously it was undocumented what happens if the same cell-mark combination appears multiple times in the cell-mark-file table.
-It was and remains the policy that for the target signal reads are combined for each entry in the combination. For control data previously
-reads for each entry was also combined, while in this version the policy is changed so each unique entry is only counted once.
-*Improved floating point stability when running ChromHMM with hundreds of features. Now if the emission probability for all states at a position
-is less than 10^-300, then each state is associated with an emission probability of 10^-300, to prevent all states from getting 0 probability
-causing instability.
-*Fixed a bug which led to not giving working links to the model files from the generated webpage when using the '-i outfileID' option.
-Also in these cases the webpage is now named based on webpage_NUMSTATES_outfileID.html.
-*Gives more informative error message in places.
-*Renamed the included file Lamina.hg18.bed.gz to laminB1lads.hg18.bed.gz for consistency with the hg19 naming.
-*Minor internal changes to the code that could give slight performance improvements.
-*Now include the source code for the Heatmap (org.tc33.jheatchart.HeatChart.java) in the zipfile download which was previously modified
-from its original state.
-*Updated the license from GPL 2.0 to 3.0.
-
-7/28/2013 ChromHMM 1.10
-*Added the option for LearnModel train in parallel using multiprocessors with the '-p' option.
-The value option specifies the maximum number of processors ChromHMM should try use or if 0 the maximum is set to the number of processors available.
-*Added annotations for assemblies mm10, rn5, danRer7 and updated the annotations for the other assemblies.
-*Fixed a minor bug that prevented printing of control signal without requesting print of regular signal in BinarizeBed.
-*Updated StatePruning to output models with a 1-based numbering instead of a 0-based numbering.
-
-11/4/2012 ChromHMM 1.06
-*Added the EvalSubset command
-
-10/16/2012 ChromHMM 1.05
-*Fixed inconsistencies in whether the label file labelmappingfile used or did not use
-the state ordering letter prefix. Now the state ordering letter prefix is consistently required.
-
-10/14/2012 ChromHMM 1.04
-*Fixed a bug with treatment of missing data sets and control data was being used that caused the data to not be binarized.
-*Also fixed a bug that caused any overlap coordinates past the end of segmentations not to be handled correctly.
-
-5/27/2012 ChromHMM 1.03
-*Added the ability to specify descriptive state labels or mnemonics in OverlapEnrichment, NeighborhoodEnrichment, and
-Reorder.
-*Fixed a bug that caused OverlapEnrichment to throw an exception if there was a chromosome included in the segmentation
-without any coordinates in the file being overlapped.
-
-3/12/2012 ChromHMM 1.02
-*minor fix so that state colors remain consistent if a concatenated model is learned across multiple cell types but not every state is
-assigned to a location in every cell type
-
-2/8/2012 ChromHMM 1.01
-*bug fixed with four column cell-mark table
-
-2/1/2012 ChromHMM 1.00 released
+7/5/2020 ChromHMM 1.21
+*Consistent capitalization of 'S'State' in output files
+*More informative error message when trying to use a lifted over
+segmentation with the default parameters of OverlapEnrichment that
+the '-b 1' parameter should be used
+*Added error message if using '-signal' option and no '_signal' named
+files are found
+*Updated ChromHMM to consider the options '-printstatesbyline' and '-printstatebyline'
+interchangeable and likewise for the '-readstatesbyline' and '-readstatebyline' options
+*Fixed bug so '-color' option in CompareModels is recognized
+*Removed an extra space in the description line of browser files
+*Now handles spaces in label names when using the '-labels' option in OverlapEnrichment or NeighborhoodEnrichment
+*Fixed a bug that if running MergeBinary and not every mark is available in every cell type,
+ChromHMM only prints warning message, and it doesn't throw an exception as before
+*Added a '-lowmem' option to MakeBrowserFiles which uses less memory to create the files.
+This less memory option is also used when applying LearnModel with the '-lowmem' flag
+
+12/9/2019 ChromHMM 1.20
+*Added a '-noautoopen' option that prevents ChromHMM from trying to automatically open a web browser with the summary page of results.
+*Fixed a bug that prevented the '-printbystatebyline' flag from being recognized in MakeSegmentation unless the '-printposterior' flag was also present
+*Leading and trailing white space are now trimmed from entries when reading a cellmarkfiletable
+*Small code optimization when loading data
+*Added a more informative error message if the number of columns in the headers differ across binarized data
+*Added an API call to get the max state at a specified position for data stored in a 2D array
+
+6/25/2019 ChromHMM 1.19
+* Added a '-labels' option to OverlapEnrichment and NeighborhoodEnrichment that allows
+them to be applied to bed files where the fourth column are state labels that don't correspond to state
+numbers or IDs. If the fourth column has a state ID or state number before a '_' and then followed by a label,
+the states will state be ordered by the state ID or number, otherwise the state ordering in the output may differ from the original state ordering.
+* Clarified in the documentation that MakeSegmentation expects the columns of the binarized data
+to be in the same order as the columns in the model file which is by default the order of the columns in the binarized data
+used to learn the model. Added error checking enforcing the column names in the model file agree with the binarized data.
+Also added the '-reordercolsmodelfile' option to Reorder which causes the columns in the model file to be reordered
+
+12/26/2018 ChromHMM 1.18
+*Added the option '-splitrows' to BinarizeBam, BinarizeBed, and BinarizeSignal which enables splitting
+the binarized data across multiple files per chromosome. Splitting files can be desired to improve scalability
+in specific large scale applications. If this option is present the maximum number of rows per file is by default 5000,
+but this can be changed with the added '-j numsplitbins' option.
+*Added the option '-splitrows' to LearnModel and MakeSegmentation. If the binarized data was generated with the '-splitrows' option
+then this flag needs to be present so the segmentation file produces properly named chromosomes with correct coordinates
+*Added the option '-i splitindex' to BinarizeBam and BinarizeBed for conducting row spliting in a more parallelized manner when
+binarizing based on peak data with the '-peaks' option. See the manual for additional information
+*Added the command MergeBinary which allows merging binary files for different mark subsets split across different
+subdirectories. The command also supports row splitting binary files even if no merging was done with the '-splitrows' option.
+*Added the option '-r bedfilein bedfileout' in Reorder which enables directly relabeling the states in a segmentation or browser
+file after a specifying a reordering, without the need to run MakeSegmentation
+*Added the option '-holdroworder' in the LearnModel command which does not reorder the states of the model
+*Added the option '-scalebeta' to use an alternative numerical procedure to estimate the backward variables, beta, to avoid
+overflow observed in specialized settings
+*Added a more informative error message in ConvertGeneTable if the chromosome length is not found
+*Modified ChromHMM so that in cases where there is no data for a chromosome in one cell type, but there is in another
+to not produce a segmentation for the chromosome in the cell type with no data. Previously ChromHMM was not consistent
+in whether it would still produce output for a chromosome with no data in one cell type if there was data for the
+chromosome in another cell type. Previously ChromHMM had inconsistent behavior if in one input cell type there was data
+
+8/2/2018 ChromHMM 1.17
+*This version fixes a bug introduced in ChromHMM 1.16 that causes ChromHMM to produce anotations only for one chromosome
+
+7/29/2018 ChromHMM 1.16
+*Added the command ConvertGeneTable which converts a gene table from the UCSC genome browser table format into gene annotations found in the COORDS and ANCHORFILES directory
+*Added the '-gzip' flag to BinarizeBam, BinarizeBed, BinarizeSignal, ConvertGeneTable, MakeSegmentation, MakeBrowserFiles, and LearnModel
+which enables outputting segmentation files from the command in a zipped format
+*Added the options '-u coorddir' and '-v anchorfiledir' to LearnModel and ConvertGeneTable which allow specifying the directory of COORDS and ANCHORFILES which defaults to the
+directory where the ChromHMM.jar file is.
+*Removed duplicate entries in the exon annotation files. This does not effect the enrichments with the default settings.
+
+
+4/25/2018 ChromHMM 1.15
+*Added danRer11 to the included assemblies
+*Added the '-many' flag to LearnModel, MakeSegmentation, and EvalSubset which is more numerically stable when having many input features, i.e. hundreds of features, at the cost
+of additional runtime
+*Added the '-pseudo' flag to LearnModel. If this flag is present, pseudo counts of 1 are used in computing the model parameters to smooth away from zero values.
+These pseudo counts can provide numerical stability in the situation when the -n numseq is specified in training and some feature has very few present occurrences.
+*Added the '-lowmem' flag option also to EvalSubset which uses less memory by only loading one chromosome in at a time though with potentially additional runtime.
+*Added the '-paired' flag to BinarizeBam. If this option is present then reads in the BAM file are treated as pairs, and each pair is counted once with bin assignment is based on shifting half the insert size. If this option is present then the –n shift, –center, and –peaks options cannot be used.
+*Added a more informative error message for the situation in which the chromosome naming in the chromosome length file is
+inconsistent with the Bam/Bed files when binarizing data
+*Added a more informative error message for the situation in which the chromosome names in the segmentation files are not
+consistent with an external annotation when computing enrichments
+*Added additional range changing when computing enrichments for situations in which coordinates of the external annotations are off the chromosome.
+Such coordinate positions that are off the chromosome are ignored instead of an exception being thrown.
+*Fixed a bug that caused exceptions to be thrown if in LearnModel the '-n numseq' option was used without the '-lowmem' flag
+*Minor internal changes to the code including some that could give slight performance improvements.
+*Added details in the user manual on the computation of the fold enrichment calculation done in OverlapEnrichment and NeighborhoodEnrichment
+
+11/2/2017 ChromHMM 1.14
+*Added '-noimage' option to LearnModel, OverlapEnrichment, NeighborhoodEnrichment, CompareModels, Reorder, EvalSubset to surpress printing of image files.
+*Added annotations and chromosome length file for the ce11 assembly
+*Added a check if a beta value exceeds Double.MAX_VALUE then it is set to Double.MAX_VALUE to improve numerical stability
+*Updated the printing of posterior values to always be printed in Locale.ENGLISH to ensure they can still be read back in by ChromHMM if the default Locale is non-compatible
+*Fixed a bug in which the default initalization procedure would throw an exception if a chromosome was only one bin long
+*Fixed a bug in which the bin(s) with the maximum control value genomewide was not being binarized correctly.
+*Updated handling of situation in which control files were provided for some, but not all marks in a cell type. Previously if only one unique file was provided for a cell type it was used for all other marks in the cell type. Now a uniform control is assume. Previously if there was two or more unique control files for a cell type, then a mark without a control was not being binarized correctly because of the above fixed bug.
+*Slight change with update to how initial values of the initial parameters are set when not using all chromosomes for initialization
+
+11/2/2017 ChromHMM 1.13 (GitHub only release)
+*Added '-lowmem' flag to LearnModel, OverlapEnrichment, NeighborhoodEnrichment, and MakeSegmentation to have ChromHMM only load one chromosome file into memory at time thus reducing maximum memory usage at a potential of additional runtime
+*Added '-n numseq' flag to LearnModel. If this flag is present and the ‘-p’ flag is present then on each iteration of training only numseq chromosome files are randomly selected to be used for training. In such cases the ‘-d’ flag should be set to a negative number so model learning does not terminate prematurely since negative changes in the log-likelihood are expected since different chromosomes are used on each iteration. Also only numseq files are considered in the initial model initialization under the default ‘information’ mode. If the ‘-n’ flag is specified without the ‘-p’ flag a subset of chromosomes will still be used for initialization, but all chromosomes will still be used on all iterations of training..
+
+
+4/3/2016 ChromHMM 1.12 (4/15/2016 updated hg38 and rn6 CpGIsland files)
+*Fixed a numerical instability issue that could cause
+NA in the models when including missing data (encoded by a '2' in the input) in special cases
+*Fixed a bug in Reorder in its handling of the situation when adding labels
+at the same time as reordering the states. Now it consistently expects
+both the prefix and state number of the new states.
+* CpGIsland coordinate files for hg38 and rn6 added in Version 1.11 were not in bed format causing an exception to be thrown when computing enrichments
+with these. These files were fixed on 4/15/2016.
+
+7/27/2015 ChromHMM 1.11
+*Added a BinarizeBam command that allows binarization of aligned reads in bam files instead of bed files. This uses the HTSJDK software to implement this feature.
+*Added annotations for assemblies rn6, hg38, dm6, danRer10, and ce10, and updated annotation files for the other assemblies.
+*Added the option in BinarizeBed/BinarizeBed/BinarizeSignal to put a binarzation threshold directly on the signal level through a -g option
+*Added support for gzip files for the commands EvalSubset and CompareModels, so now all ChromHMM commands support both text and gzip format of files
+*Fixed a bug in the Reorder command which did not update the ordering prefix character, and instead maintained the orignal prefix.
+If the states are reorder based on a user provided ordering they will now have a 'U' prefix, a 'T' prefix for transition based ordering,
+and a 'E' prefix for an emission based ordering.
+*Fixed a bug which caused the Reorder command to throw exceptions when parsing elim* model files generated from the StatePruning command
+*Now ignores Hidden files when considering a set of files in a directory
+*Previously it was undocumented what happens if the same cell-mark combination appears multiple times in the cell-mark-file table.
+It was and remains the policy that for the target signal reads are combined for each entry in the combination. For control data previously
+reads for each entry was also combined, while in this version the policy is changed so each unique entry is only counted once.
+*Improved floating point stability when running ChromHMM with hundreds of features. Now if the emission probability for all states at a position
+is less than 10^-300, then each state is associated with an emission probability of 10^-300, to prevent all states from getting 0 probability
+causing instability.
+*Fixed a bug which led to not giving working links to the model files from the generated webpage when using the '-i outfileID' option.
+Also in these cases the webpage is now named based on webpage_NUMSTATES_outfileID.html.
+*Gives more informative error message in places.
+*Renamed the included file Lamina.hg18.bed.gz to laminB1lads.hg18.bed.gz for consistency with the hg19 naming.
+*Minor internal changes to the code that could give slight performance improvements.
+*Now include the source code for the Heatmap (org.tc33.jheatchart.HeatChart.java) in the zipfile download which was previously modified
+from its original state.
+*Updated the license from GPL 2.0 to 3.0.
+
+7/28/2013 ChromHMM 1.10
+*Added the option for LearnModel train in parallel using multiprocessors with the '-p' option.
+The value option specifies the maximum number of processors ChromHMM should try use or if 0 the maximum is set to the number of processors available.
+*Added annotations for assemblies mm10, rn5, danRer7 and updated the annotations for the other assemblies.
+*Fixed a minor bug that prevented printing of control signal without requesting print of regular signal in BinarizeBed.
+*Updated StatePruning to output models with a 1-based numbering instead of a 0-based numbering.
+
+11/4/2012 ChromHMM 1.06
+*Added the EvalSubset command
+
+10/16/2012 ChromHMM 1.05
+*Fixed inconsistencies in whether the label file labelmappingfile used or did not use
+the state ordering letter prefix. Now the state ordering letter prefix is consistently required.
+
+10/14/2012 ChromHMM 1.04
+*Fixed a bug with treatment of missing data sets and control data was being used that caused the data to not be binarized.
+*Also fixed a bug that caused any overlap coordinates past the end of segmentations not to be handled correctly.
+
+5/27/2012 ChromHMM 1.03
+*Added the ability to specify descriptive state labels or mnemonics in OverlapEnrichment, NeighborhoodEnrichment, and
+Reorder.
+*Fixed a bug that caused OverlapEnrichment to throw an exception if there was a chromosome included in the segmentation
+without any coordinates in the file being overlapped.
+
+3/12/2012 ChromHMM 1.02
+*minor fix so that state colors remain consistent if a concatenated model is learned across multiple cell types but not every state is
+assigned to a location in every cell type
+
+2/8/2012 ChromHMM 1.01
+*bug fixed with four column cell-mark table
+
+2/1/2012 ChromHMM 1.00 released
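
The version log above describes a floating-point safeguard: when the emission probability of every state at a position falls below 10^-300, each state is assigned 10^-300 instead. A minimal sketch of that rule follows. The class and method names are hypothetical and this is not the upstream implementation; only the threshold value comes from the log entry.

```java
import java.util.Arrays;

class EmissionFloor {
    // Threshold taken from the version log entry; everything else is illustrative.
    static final double FLOOR = 1e-300;

    /**
     * If every state's emission probability at a position is below FLOOR,
     * replaces all of them with FLOOR. Otherwise the array is left untouched.
     */
    static void applyFloor(double[] emissionByState) {
        for (double p : emissionByState) {
            if (p >= FLOOR) {
                return; // at least one state is at or above the floor; nothing to do
            }
        }
        Arrays.fill(emissionByState, FLOOR);
    }

    public static void main(String[] args) {
        double[] underflowed = {0.0, 1e-310, 0.0};
        applyFloor(underflowed);
        System.out.println(Arrays.toString(underflowed));
    }
}
```

Assigning the same floor to all states keeps a later normalization step from dividing by zero while leaving the relative weighting to the transition probabilities.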
=====================================
edu/mit/compbio/ChromHMM/BrowserOutput.java
=====================================
@@ -385,7 +385,7 @@ public class BrowserOutput
String szID = szFullID.substring(1); //this removes ordering type
if (bfirst)
{
- String szout = "track name=\""+szsegmentationname+"\" description=\" "+szsegmentationname+" ("+ChromHMM.convertCharOrderToStringOrder(szFullID.charAt(0))
+ String szout = "track name=\""+szsegmentationname+"\" description=\""+szsegmentationname+" ("+ChromHMM.convertCharOrderToStringOrder(szFullID.charAt(0))
+" ordered)"+"\" visibility=1 itemRgb=\"On\""+"\n";
byte[] btformat = szout.getBytes();
pwzip.write(btformat,0,btformat.length);
@@ -433,7 +433,7 @@ public class BrowserOutput
String szID = szFullID.substring(1); //this removes ordering type
if (bfirst)
{
- pw.println("track name=\""+szsegmentationname+"\" description=\" "+szsegmentationname+" ("+ChromHMM.convertCharOrderToStringOrder(szFullID.charAt(0))
+ pw.println("track name=\""+szsegmentationname+"\" description=\""+szsegmentationname+" ("+ChromHMM.convertCharOrderToStringOrder(szFullID.charAt(0))
+" ordered)"+"\" visibility=1 itemRgb=\"On\"");
bfirst = false;
}
@@ -559,7 +559,7 @@ public class BrowserOutput
if (bgzip)
{
GZIPOutputStream pwzip = new GZIPOutputStream(new FileOutputStream(szoutputfileprefix+ChromHMM.SZBROWSEREXPANDEDEXTENSION+".bed.gz"));
- String szout = "track name=\"Expanded_"+szsegmentationname+"\" description=\" "+szsegmentationname+" ("+ChromHMM.convertCharOrderToStringOrder(szLabelFull.charAt(0))
+ String szout = "track name=\"Expanded_"+szsegmentationname+"\" description=\""+szsegmentationname+" ("+ChromHMM.convertCharOrderToStringOrder(szLabelFull.charAt(0))
+" ordered)"+"\" visibility=2 itemRgb=\"On\""+"\n";
byte[] btformat = szout.getBytes();
pwzip.write(btformat,0,btformat.length);
@@ -641,7 +641,7 @@ public class BrowserOutput
else
{
PrintWriter pw = new PrintWriter(new FileWriter(szoutputfileprefix+ChromHMM.SZBROWSEREXPANDEDEXTENSION+".bed"));
- pw.println("track name=\"Expanded_"+szsegmentationname+"\" description=\" "+szsegmentationname+" ("+ChromHMM.convertCharOrderToStringOrder(szLabelFull.charAt(0))
+ pw.println("track name=\"Expanded_"+szsegmentationname+"\" description=\""+szsegmentationname+" ("+ChromHMM.convertCharOrderToStringOrder(szLabelFull.charAt(0))
+" ordered)"+"\" visibility=2 itemRgb=\"On\"");
int nbrowserend = (int) (((Integer)hmchromMax.get(szChroms[0])).intValue()*.001)+1;
pw.println("browser position "+szChroms[0]+":1-"+nbrowserend);
@@ -699,5 +699,308 @@ public class BrowserOutput
pw.close();
}
}
+
+
+
+
+ //////////////////////////////////////////////////////////////////////////////////////////////////
+ /**
+ * Makes a single track browser view of the segmentation represented in szsegmentfile
+ * szcolormapping is a two or three column text file which maps state ID to R,G,B color triples and optionally a state label
+ * Name of segmentation in the browser file is given by szsegmentationame
+ * Output is a file named szoutputfileprefix_browserdense.bed of segmentation viewable with one state per row
+ */
+ public void makebrowserexpandedLowMem() throws IOException
+ {
+ if (bgzip)
+ {
+ System.out.println("Writing to file "+szoutputfileprefix+ChromHMM.SZBROWSEREXPANDEDEXTENSION+".bed.gz");
+ }
+ else
+ {
+ System.out.println("Writing to file "+szoutputfileprefix+ChromHMM.SZBROWSEREXPANDEDEXTENSION+".bed");
+ }
+
+ String szLine;
+
+ BufferedReader brsegment = Util.getBufferedReader(szsegmentfile);
+
+ //stores set of chromosomes and labels
+ HashSet hschroms = new HashSet();
+ HashSet hslabels = new HashSet();
+
+ //stores for each chromosome the maximum coordinate
+ HashMap hmchromMax = new HashMap();
+
+
+ //maps a label without the prefix back to the full label
+ HashMap hmlabelToFull = new HashMap();
+
+ String szLabelFull=null;
+ while ((szLine = brsegment.readLine())!=null)
+ {
+ StringTokenizer st = new StringTokenizer(szLine,"\t");
+ String szchrom = st.nextToken();
+ int nbegin = Integer.parseInt(st.nextToken());
+ int nend = Integer.parseInt(st.nextToken());
+ szLabelFull = st.nextToken();
+ String szLabel = szLabelFull.substring(1);
+
+ hmlabelToFull.put(szLabel, szLabelFull);
+
+ hschroms.add(szchrom);
+ hslabels.add(szLabel);
+ //ArrayList alRecs = (ArrayList) hmcoords.get(szchrom+"\t"+szLabel);
+ //if (alRecs ==null)
+ //{
+ //creating first entry for chromsome and coordinate
+ // alRecs = new ArrayList();
+ // hmcoords.put(szchrom+"\t"+szLabel,alRecs);
+ //}
+ //alRecs.add(new BeginEndRec(nbegin,nend));
+
+ //potentially updating maximum coordinate for chromosome
+ Object obj = ((Integer) hmchromMax.get(szchrom));
+ if (obj != null)
+ {
+ int nval = ((Integer) obj).intValue();
+
+ if (nend > nval)
+ {
+ hmchromMax.put(szchrom,Integer.valueOf(nend));
+ }
+ }
+ else
+ {
+ hmchromMax.put(szchrom,Integer.valueOf(nend));
+ }
+ }
+ brsegment.close();
+
+
+ //gets all the state labels and sorts them
+ String[] szLabels = new String[hslabels.size()];
+ Iterator itrLabels = hslabels.iterator();
+ int nindex = 0;
+ while (itrLabels.hasNext())
+ {
+ szLabels[nindex] = (String) itrLabels.next();
+ nindex++;
+ }
+ Arrays.sort(szLabels, new LabelCompare());
+
+ //gets all the chromsomes and sorts them
+ String[] szChroms = new String[hschroms.size()];
+ Iterator itrChroms = hschroms.iterator();
+ nindex = 0;
+ while (itrChroms.hasNext())
+ {
+ szChroms[nindex] = (String) itrChroms.next();
+ nindex++;
+ }
+ Arrays.sort(szChroms);
+
+ if (bgzip)
+ {
+ GZIPOutputStream pwzip = new GZIPOutputStream(new FileOutputStream(szoutputfileprefix+ChromHMM.SZBROWSEREXPANDEDEXTENSION+".bed.gz"));
+ String szout = "track name=\"Expanded_"+szsegmentationname+"\" description=\""+szsegmentationname+" ("+ChromHMM.convertCharOrderToStringOrder(szLabelFull.charAt(0))
+ +" ordered)"+"\" visibility=2 itemRgb=\"On\""+"\n";
+ byte[] btformat = szout.getBytes();
+ pwzip.write(btformat,0,btformat.length);
+
+ //pw.println("track name=\"Expanded_"+szsegmentationname+"\" description=\" "+szsegmentationname+" ("+ChromHMM.convertCharOrderToStringOrder(szLabelFull.charAt(0))
+ // +" ordered)"+"\" visibility=2 itemRgb=\"On\"");
+ int nbrowserend = (int) (((Integer)hmchromMax.get(szChroms[0])).intValue()*.001)+1;
+
+ szout = "browser position "+szChroms[0]+":1-"+nbrowserend+"\n";
+ btformat = szout.getBytes();
+ pwzip.write(btformat,0,btformat.length);
+ //pwzip.println("browser position "+szChroms[0]+":1-"+nbrowserend);
+
+ for (int nchrom = 0; nchrom < szChroms.length; nchrom++)
+ {
+ HashMap hmcoords = new HashMap();
+ String szcurrchrom = szChroms[nchrom];
+
+ //stores the set of interval coordinates for each chromosome and label
+ brsegment = Util.getBufferedReader(szsegmentfile);
+ while ((szLine = brsegment.readLine())!=null)
+ {
+ StringTokenizer st = new StringTokenizer(szLine,"\t");
+ String szchrom = st.nextToken();
+ if (szchrom.equals(szcurrchrom))
+ {
+ int nbegin = Integer.parseInt(st.nextToken());
+ int nend = Integer.parseInt(st.nextToken());
+ szLabelFull = st.nextToken();
+ String szLabel = szLabelFull.substring(1);
+
+ ArrayList alRecs = (ArrayList) hmcoords.get(szLabel);
+ if (alRecs ==null)
+ {
+ //creating first entry for chromsome and coordinate
+ alRecs = new ArrayList();
+ hmcoords.put(szLabel,alRecs);
+ }
+ alRecs.add(new BeginEndRec(nbegin,nend));
+ }
+ }
+ brsegment.close();
+
+
+ //UCSC browser seems to reverse the ordering of browser track files
+ for (int nlabel = szLabels.length-1; nlabel >=0; nlabel--)
+ {
+ String szcolor = (String) hmcolor.get(""+szLabels[nlabel]);
+ //omits those segment labels not observed at all on chromosome
+
+ ArrayList alRecs = (ArrayList) hmcoords.get(szLabels[nlabel]);
+ if (alRecs == null) continue;
+
+ int nmax = ((Integer) hmchromMax.get(szChroms[nchrom])).intValue();
+
+ //this forces browser to display segment until the end of the chromosome
+ alRecs.add(new BeginEndRec(nmax-1,nmax));
+
+ int nsize = alRecs.size();
+ int nmin = ((BeginEndRec) alRecs.get(0)).nbegin;
+ int nfinalend = nmax;
+
+ String szoutlabel;
+ String szsuffix;
+ if ((szsuffix = (String) hmlabelExtend.get((String) hmlabelToFull.get(szLabels[nlabel])))!=null)
+ {
+ szoutlabel = szLabels[nlabel]+"_"+szsuffix;
+ }
+ else
+ {
+ szoutlabel = szLabels[nlabel];
+ }
+
+ StringBuffer sbout = new StringBuffer();
+ sbout.append(szcurrchrom+"\t"+0+"\t"+nfinalend+"\t"+szoutlabel+"\t0\t.\t"+nmin+"\t"+nfinalend+"\t"+szcolor+"\t"+(nsize+1)+"\t");
+ //pw.print(szChroms[nchrom]+"\t"+0+"\t"+nfinalend+"\t"+szoutlabel+"\t0\t.\t"+nmin+"\t"+nfinalend+"\t"+szcolor+"\t"+(nsize+1)+"\t");
+ //pw.print(0); //forcing the display to start at the beginning of the chromosome
+ sbout.append(0);
+ for (int ni = 0; ni < nsize; ni++)
+ {
+ BeginEndRec theBeginEndRec = (BeginEndRec) alRecs.get(ni);
+ int ndiff = theBeginEndRec.nend - theBeginEndRec.nbegin;
+ sbout.append(",");
+ sbout.append(ndiff);
+ //pw.print(",");
+ //pw.print(ndiff);
+ }
+ sbout.append("\t");
+ sbout.append(0);
+ //pw.print("\t");
+ //pw.print(0);
+ for (int ni = 0; ni < nsize; ni++)
+ {
+ int nloc = ((BeginEndRec) alRecs.get(ni)).nbegin;
+ sbout.append(",");
+ sbout.append(nloc);
+ //pw.print(",");
+ //pw.print(nloc);
+ }
+ sbout.append("\n");
+ // pw.println();
+ btformat = sbout.toString().getBytes();
+ pwzip.write(btformat,0,btformat.length);
+ }
+ }
+ pwzip.finish();
+ pwzip.close();
+ }
+ else
+ {
+ PrintWriter pw = new PrintWriter(new FileWriter(szoutputfileprefix+ChromHMM.SZBROWSEREXPANDEDEXTENSION+".bed"));
+ pw.println("track name=\"Expanded_"+szsegmentationname+"\" description=\""+szsegmentationname+" ("+ChromHMM.convertCharOrderToStringOrder(szLabelFull.charAt(0))
+ +" ordered)"+"\" visibility=2 itemRgb=\"On\"");
+ int nbrowserend = (int) (((Integer)hmchromMax.get(szChroms[0])).intValue()*.001)+1;
+ pw.println("browser position "+szChroms[0]+":1-"+nbrowserend);
+
+ for (int nchrom = 0; nchrom < szChroms.length; nchrom++)
+ {
+ String szcurrchrom = szChroms[nchrom];
+ HashMap hmcoords = new HashMap();
+
+ //stores the set of interval coordinates for each chromosome and label
+ brsegment = Util.getBufferedReader(szsegmentfile);
+ while ((szLine = brsegment.readLine())!=null)
+ {
+ StringTokenizer st = new StringTokenizer(szLine,"\t");
+ String szchrom = st.nextToken();
+ if (szchrom.equals(szcurrchrom))
+ {
+ int nbegin = Integer.parseInt(st.nextToken());
+ int nend = Integer.parseInt(st.nextToken());
+ szLabelFull = st.nextToken();
+ String szLabel = szLabelFull.substring(1);
+
+ ArrayList alRecs = (ArrayList) hmcoords.get(szLabel);
+ if (alRecs ==null)
+ {
+ //creating first entry for chromsome and coordinate
+ alRecs = new ArrayList();
+ hmcoords.put(szLabel,alRecs);
+ }
+ alRecs.add(new BeginEndRec(nbegin,nend));
+ }
+ }
+ brsegment.close();
+
+ //UCSC browser seems to reverse the ordering of browser track files
+ for (int nlabel = szLabels.length-1; nlabel >=0; nlabel--)
+ {
+ String szcolor = (String) hmcolor.get(""+szLabels[nlabel]);
+
+ //omits those segment labels not observed at all on chromosome
+ ArrayList alRecs = (ArrayList) hmcoords.get(szLabels[nlabel]);
+ if (alRecs == null) continue;
+
+ int nmax = ((Integer) hmchromMax.get(szChroms[nchrom])).intValue();
+
+ //this forces browser to display segment until the end of the chromosome
+ alRecs.add(new BeginEndRec(nmax-1,nmax));
+
+ int nsize = alRecs.size();
+ int nmin = ((BeginEndRec) alRecs.get(0)).nbegin;
+ int nfinalend = nmax;
+
+ String szoutlabel;
+ String szsuffix;
+ if ((szsuffix = (String) hmlabelExtend.get((String) hmlabelToFull.get(szLabels[nlabel])))!=null)
+ {
+ szoutlabel = szLabels[nlabel]+"_"+szsuffix;
+ }
+ else
+ {
+ szoutlabel = szLabels[nlabel];
+ }
+
+ pw.print(szcurrchrom+"\t"+0+"\t"+nfinalend+"\t"+szoutlabel+"\t0\t.\t"+nmin+"\t"+nfinalend+"\t"+szcolor+"\t"+(nsize+1)+"\t");
+ pw.print(0); //forcing the display to start at the beginning of the chromosome
+ for (int ni = 0; ni < nsize; ni++)
+ {
+ BeginEndRec theBeginEndRec = (BeginEndRec) alRecs.get(ni);
+ int ndiff = theBeginEndRec.nend - theBeginEndRec.nbegin;
+ pw.print(",");
+ pw.print(ndiff);
+ }
+ pw.print("\t");
+ pw.print(0);
+ for (int ni = 0; ni < nsize; ni++)
+ {
+ int nloc = ((BeginEndRec) alRecs.get(ni)).nbegin;
+ pw.print(",");
+ pw.print(nloc);
+ }
+ pw.println();
+ }
+ }
+ pw.close();
+ }
+ }
}
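
The makebrowserexpandedLowMem method added above trades time for memory: a first pass over the segment file records only the chromosome names, labels and maximum coordinates, and the file is then re-read once per chromosome so that only one chromosome's intervals are held at a time. A simplified sketch of that two-pass pattern follows. It is an illustration under assumptions, not the committed code: the class and method names are hypothetical, the input is a tab-delimited string standing in for the segment file, and the ordering-prefix handling and BED output are omitted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LowMemSegmentReader {
    /** Pass 1: keep only the maximum end coordinate seen for each chromosome. */
    static Map<String, Integer> chromMax(String segments) {
        Map<String, Integer> max = new TreeMap<>(); // TreeMap keeps chromosomes sorted
        try (BufferedReader br = new BufferedReader(new StringReader(segments))) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] f = line.split("\t");
                max.merge(f[0], Integer.parseInt(f[2]), Math::max);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return max;
    }

    /** Pass 2: re-read the input, keeping {begin, end} pairs for one chromosome, grouped by label. */
    static Map<String, List<int[]>> intervalsFor(String segments, String chrom) {
        Map<String, List<int[]>> byLabel = new TreeMap<>();
        try (BufferedReader br = new BufferedReader(new StringReader(segments))) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] f = line.split("\t");
                if (!f[0].equals(chrom)) {
                    continue; // intervals of other chromosomes are never stored
                }
                byLabel.computeIfAbsent(f[3], k -> new ArrayList<>())
                       .add(new int[] {Integer.parseInt(f[1]), Integer.parseInt(f[2])});
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return byLabel;
    }

    public static void main(String[] args) {
        String seg = "chr1\t0\t200\tE1\nchr2\t0\t400\tE2\nchr1\t200\t600\tE2\n";
        for (String chrom : chromMax(seg).keySet()) {
            System.out.println(chrom + " labels: " + intervalsFor(seg, chrom).keySet());
        }
    }
}
```

Peak memory is bounded by the largest single chromosome rather than the whole segmentation, at the cost of reading the input once per chromosome.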
=====================================
edu/mit/compbio/ChromHMM/ChromHMM.java
=====================================
@@ -1591,7 +1591,7 @@ public class ChromHMM
System.out.println("Writing to file "+szfile);
}
- pw.print("state ("+szorder+" order)");
+ pw.print("State ("+szorder+" order)");
for (int ni = 0; ni < datasets.length; ni++)
{
pw.print("\t"+datasets[colordering[ni]]);
@@ -1644,7 +1644,7 @@ public class ChromHMM
System.out.println("Writing to file "+szfile);
}
- pw.print("state (from\\to) ("+szorder+" order)");
+ pw.print("State (from\\to) ("+szorder+" order)");
for (int ni = 0; ni < numstates; ni++)
{
pw.print("\t"+(ni+1));
@@ -12848,7 +12848,7 @@ public class ChromHMM
if (szcommand.equalsIgnoreCase("Version"))
{
- System.out.println("This is Version 1.20 of ChromHMM (c) Copyright 2008-2012 Massachusetts Institute of Technology");
+ System.out.println("This is Version 1.21 of ChromHMM (c) Copyright 2008-2012 Massachusetts Institute of Technology");
}
else if ((szcommand.equals("BinarizeBam"))||(szcommand.equalsIgnoreCase("BinarizeBed")))
{
@@ -13308,39 +13308,47 @@ public class ChromHMM
boolean bprintimage = true;
int nargindex = 1;
- if (args.length == 5)
+
+ if (args.length <= 2)
{
bok = false;
- try
- {
- if (args[nargindex].equals("-color"))
- {
- String szcolor = args[++nargindex];
- StringTokenizer stcolor = new StringTokenizer(szcolor,",");
- if (stcolor.countTokens()==3)
- {
- nr = Integer.parseInt(stcolor.nextToken());
- ng = Integer.parseInt(stcolor.nextToken());
- nb = Integer.parseInt(stcolor.nextToken());
- }
- else
- {
- bok = false;
- }
- }
- else if (args[nargindex].equals("-noimage"))
- {
- bprintimage = false;
- }
- else
- {
- bok = false;
- }
- }
- catch (NumberFormatException ex)
- {
- bok = false;
- }
+ }
+ else
+ {
+ try
+ {
+ while (nargindex < args.length-3)
+ {
+ if (args[nargindex].equals("-color"))
+ {
+ String szcolor = args[++nargindex];
+ StringTokenizer stcolor = new StringTokenizer(szcolor,",");
+ if (stcolor.countTokens()==3)
+ {
+ nr = Integer.parseInt(stcolor.nextToken());
+ ng = Integer.parseInt(stcolor.nextToken());
+ nb = Integer.parseInt(stcolor.nextToken());
+ }
+ else
+ {
+ bok = false;
+ }
+ }
+ else if (args[nargindex].equals("-noimage"))
+ {
+ bprintimage = false;
+ }
+ else
+ {
+ bok = false;
+ }
+ nargindex++;
+ }
+ }
+ catch (NumberFormatException ex)
+ {
+ bok = false;
+ }
}
if (nargindex != args.length-3)
@@ -13476,7 +13484,7 @@ public class ChromHMM
{
breadposterior = true;
}
- else if (args[nargindex].equals("-readstatesbyline"))
+ else if ((args[nargindex].equals("-readstatesbyline"))||(args[nargindex].equals("-readstatebyline")))
{
breadstatebyline = true;
}
@@ -13602,7 +13610,7 @@ public class ChromHMM
{
bprintposterior = true;
}
- else if (args[nargindex].equals("-printstatesbyline"))
+ else if ((args[nargindex].equals("-printstatesbyline"))||(args[nargindex].equals("-printstatebyline")))
{
bprintstatebyline = true;
}
@@ -13694,7 +13702,7 @@ public class ChromHMM
if (!bok)
{
System.out.println("usage: MakeSegmentation [-b binsize][-f inputfilelist][-gzip][-i outfileID][-l chromosomelengthfile][-lowmem][-many][-nobed]"+
- "[-printposterior][-printstatesbyline][-scalebeta][-splitrows]"+
+ "[-printposterior][-printstatebyline][-scalebeta][-splitrows]"+
" modelfile inputdir outputdir");
}
}
@@ -13703,7 +13711,7 @@ public class ChromHMM
String szcolormapping = null;
String szlabelmapping = null;
boolean bgzip = false;
-
+ boolean blowmem = false;
int nargindex = 1;
int numstates = -1;
@@ -13722,6 +13730,10 @@ public class ChromHMM
//the -l is for backwards compatibility
szlabelmapping = args[++nargindex];
}
+ else if (args[nargindex].equals("-lowmem"))
+ {
+ blowmem = true;
+ }
else if (args[nargindex].equals("-n"))
{
numstates = Integer.parseInt(args[++nargindex]);
@@ -13742,7 +13754,14 @@ public class ChromHMM
BrowserOutput theBrowserOutput = new BrowserOutput(szsegmentfile,szcolormapping,szlabelmapping,
szsegmentationname, szoutputfileprefix,numstates,bgzip);
theBrowserOutput.makebrowserdense();
- theBrowserOutput.makebrowserexpanded();
+ if (blowmem)
+ {
+ theBrowserOutput.makebrowserexpandedLowMem();
+ }
+ else
+ {
+ theBrowserOutput.makebrowserexpanded();
+ }
}
else
{
@@ -13751,7 +13770,7 @@ public class ChromHMM
if (!bok)
{
- System.out.println("usage: MakeBrowserFiles [-c colormappingfile][-gzip][-m labelmappingfile][-n numstates] segmentfile segmentationname outputfileprefix");
+ System.out.println("usage: MakeBrowserFiles [-c colormappingfile][-gzip][-lowmem][-m labelmappingfile][-n numstates] segmentfile segmentationname outputfileprefix");
}
}
else if (szcommand.equalsIgnoreCase("OverlapEnrichment"))
@@ -14323,7 +14342,7 @@ public class ChromHMM
{
bnoorderrows = true;
}
- else if (args[nargindex].equals("-printstatebyline"))
+ else if ((args[nargindex].equals("-printstatebyline"))||(args[nargindex].equals("-printstatesbyline")))
{
bprintstatebyline = true;
}
@@ -14559,7 +14578,14 @@ public class ChromHMM
szprefix,szoutputdir+"/"+szprefix,numstates,bgzip);
theBrowserOutput.makebrowserdense();
- theBrowserOutput.makebrowserexpanded();
+ if (blowmem)
+ {
+ theBrowserOutput.makebrowserexpandedLowMem();
+ }
+ else
+ {
+ theBrowserOutput.makebrowserexpanded();
+ }
if (bgzip)
{
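
The hunk at -13308 above replaces a fixed argument-count test with a loop that consumes optional flags until exactly three positional arguments remain, then checks that the index landed on the first positional argument. A minimal sketch of that loop shape follows. The class, the field names and the default colour are assumptions made for illustration. Only the flag names -color and -noimage and the "args.length-3" bound come from the diff.

```java
class OptionLoop {
    /** Parsed result; all field names are hypothetical. */
    static final class Parsed {
        int r = 0, g = 0, b = 255;   // default colour is an assumption of this sketch
        boolean printImage = true;
        boolean ok = true;
        int firstPositional;
    }

    static Parsed parse(String[] args) {
        Parsed p = new Parsed();
        int i = 1; // args[0] is the command name
        try {
            // Everything before the last three arguments is treated as an option.
            while (i < args.length - 3) {
                if (args[i].equals("-color")) {
                    String[] rgb = args[++i].split(","); // option value is the next argument
                    if (rgb.length == 3) {
                        p.r = Integer.parseInt(rgb[0]);
                        p.g = Integer.parseInt(rgb[1]);
                        p.b = Integer.parseInt(rgb[2]);
                    } else {
                        p.ok = false;
                    }
                } else if (args[i].equals("-noimage")) {
                    p.printImage = false;
                } else {
                    p.ok = false; // unknown option
                }
                i++;
            }
        } catch (NumberFormatException ex) {
            p.ok = false;
        }
        // Exactly three positional arguments must remain.
        if (i != args.length - 3) {
            p.ok = false;
        }
        p.firstPositional = i;
        return p;
    }

    public static void main(String[] args) {
        Parsed p = parse(new String[] {"Cmd", "-noimage", "-color", "10,20,30", "a", "b", "c"});
        System.out.println(p.ok + " " + p.printImage + " " + p.r + "," + p.g + "," + p.b);
    }
}
```

With this shape, options can be combined in any order, which the earlier fixed-length check did not allow.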
=====================================
edu/mit/compbio/ChromHMM/Preprocessing.java
=====================================
@@ -2487,6 +2487,12 @@ public class Preprocessing
nfilecount++;
}
}
+
+ if (nfilecount == 0)
+ {
+ throw new IllegalArgumentException("No _signal files were found in directory "+szbinneddataDIR);
+ }
+
String[] signalchromfiles = new String[nfilecount];
int nfileindex = 0;
for (int nfile = 0; nfile < allfiles.length; nfile++)
@@ -3110,6 +3116,12 @@ public class Preprocessing
nfilecount++;
}
}
+
+ if (nfilecount == 0)
+ {
+ throw new IllegalArgumentException("No _signal files were found in directory "+szbinneddataDIR);
+ }
+
String[] signalchromfiles = new String[nfilecount];
int nfileindex = 0;
for (int nfile = 0; nfile < allfiles.length; nfile++)
@@ -3979,7 +3991,10 @@ public class Preprocessing
for (int ndir = 0; ndir < hmbrA.length; ndir++)
{
BufferedReader br = (BufferedReader) hmbrA[ndir].get(szcurrfile);
- br.close();
+ if (br != null)
+ {
+ br.close();
+ }
}
}
=====================================
edu/mit/compbio/ChromHMM/StateAnalysis.java
=====================================
@@ -685,13 +685,23 @@ public class StateAnalysis
BufferedReader brinputsegment = Util.getBufferedReader(szinputsegment);
while ((szLine = brinputsegment.readLine())!=null)
{
- StringTokenizer st = new StringTokenizer(szLine,"\t ");
+ StringTokenizer st;
+ if (bstringlabels)
+ {
+ st = new StringTokenizer(szLine,"\t");
+ }
+ else
+ {
+ st = new StringTokenizer(szLine,"\t ");
+ }
+
String szchrom = st.nextToken();
int nbegincoord = Integer.parseInt(st.nextToken());
int nendcoord = Integer.parseInt(st.nextToken());
if (nbegincoord % nbinsize != 0)
{
- throw new IllegalArgumentException("Binsize of "+nbinsize+" does not agree with input segment "+szLine);
+ throw new IllegalArgumentException("Binsize of "+nbinsize+" does not agree with coordinates in input segment "+szLine+". -b binsize should match parameter value to LearnModel or "+
+ "MakeSegmentation used to produce segmentation. If segmentation is derived from a lift over from another assembly, then the '-b 1' option should be used");
}
int nbegin = nbegincoord/nbinsize;
int nend = (nendcoord-1)/nbinsize;
@@ -1289,13 +1299,23 @@ public class StateAnalysis
BufferedReader brinputsegment = Util.getBufferedReader(szinputsegment);
while ((szLine = brinputsegment.readLine())!=null)
{
- StringTokenizer st = new StringTokenizer(szLine,"\t ");
+ StringTokenizer st;
+ if (bstringlabels)
+ {
+ st = new StringTokenizer(szLine,"\t");
+ }
+ else
+ {
+ st = new StringTokenizer(szLine,"\t ");
+ }
+
String szchrom = st.nextToken();
int nbegincoord = Integer.parseInt(st.nextToken());
int nendcoord = Integer.parseInt(st.nextToken());
if (nbegincoord % nbinsize != 0)
{
- throw new IllegalArgumentException("Binsize of "+nbinsize+" does not agree with input segment "+szLine);
+ throw new IllegalArgumentException("Binsize of "+nbinsize+" does not agree with coordinates in input segment "+szLine+". -b binsize should match parameter value to LearnModel or "+
+ "MakeSegmentation used to produce segmentation. If segmentation is derived from a lift over from another assembly, then the '-b 1' option should be used");
}
//int nbegin = nbegincoord/nbinsize;
int nend = (nendcoord-1)/nbinsize;
@@ -1431,7 +1451,16 @@ public class StateAnalysis
brinputsegment = Util.getBufferedReader(szinputsegment);
while ((szLine = brinputsegment.readLine())!=null)
{
- StringTokenizer st = new StringTokenizer(szLine,"\t ");
+ StringTokenizer st;
+ if (bstringlabels)
+ {
+ st = new StringTokenizer(szLine,"\t");
+ }
+ else
+ {
+ st = new StringTokenizer(szLine,"\t ");
+ }
+
String szchrom = st.nextToken();
if (!szchrom.equals(szchromwant))
continue;
@@ -1862,7 +1891,7 @@ public class StateAnalysis
System.out.println("Writing to file "+szoutfile+".txt");
PrintWriter pw = new PrintWriter(new FileWriter(szoutfile+".txt"));
- pw.print("state ("+szstateorder+" order)\tGenome %");
+ pw.print("State ("+szstateorder+" order)\tGenome %");
for (int nfile = 0; nfile < files.length; nfile++)
{
pw.print("\t"+files[nfile]);
@@ -2209,7 +2238,15 @@ public class StateAnalysis
//this loops reads in the segmentation
while ((szLine = brinputsegment.readLine())!=null)
{
- StringTokenizer st = new StringTokenizer(szLine,"\t ");
+ StringTokenizer st;
+ if (bstringlabels)
+ {
+ st = new StringTokenizer(szLine,"\t");
+ }
+ else
+ {
+ st = new StringTokenizer(szLine,"\t ");
+ }
String szchrom = st.nextToken();
//assumes segments are in standard bed format which to get to
//0-based inclusive requires substract 1 from the end
@@ -2356,8 +2393,16 @@ public class StateAnalysis
while ((szLine = brinputsegment.readLine())!=null)
{
//int numlines = alsegments.size();
+ StringTokenizer st;
+ if (bstringlabels)
+ {
+ st = new StringTokenizer(szLine,"\t");
+ }
+ else
+ {
+ st = new StringTokenizer(szLine,"\t ");
+ }
- StringTokenizer st = new StringTokenizer(szLine,"\t ");
String szchrom = st.nextToken();
if (!szchromwant.equals(szchrom))
continue;
@@ -2591,7 +2636,15 @@ public class StateAnalysis
//this loops reads in the segmentation
while ((szLine = brinputsegment.readLine())!=null)
{
- StringTokenizer st = new StringTokenizer(szLine,"\t ");
+ StringTokenizer st;
+ if (bstringlabels)
+ {
+ st = new StringTokenizer(szLine,"\t");
+ }
+ else
+ {
+ st = new StringTokenizer(szLine,"\t ");
+ }
String szchrom = st.nextToken();
//assumes segments are in standard bed format which to get to
//0-based inclusive requires substract 1 from the end
@@ -3785,7 +3838,7 @@ public class StateAnalysis
String[] collabels = new String[theRecEmissionFileCompareA.length];
- pwcompare.print("state");
+ pwcompare.print("State");
for (ncol = 0; ncol < theRecEmissionFileCompareA.length; ncol++)
{
collabels[ncol] = ""+theRecEmissionFileCompareA[ncol].numstates;
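
The StateAnalysis.java hunks above switch the StringTokenizer delimiter set from tab-or-space to tab-only when bstringlabels is set. The diff does not state the motivation. A plausible reading is that string labels may contain spaces, which the old delimiter set would split into several fields. The sketch below shows the difference. The class name, the method name and the sample line are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class SegmentTokenizer {
    /** Splits a segment line on tabs only when labels are strings, else on tabs or spaces. */
    static List<String> fields(String line, boolean stringLabels) {
        StringTokenizer st = new StringTokenizer(line, stringLabels ? "\t" : "\t ");
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "chr1\t0\t200\tE1 Active TSS"; // hypothetical label containing spaces
        System.out.println(fields(line, true));
        System.out.println(fields(line, false));
    }
}
```

With the tab-only delimiter the label stays in a single fourth field. With the tab-or-space delimiter it is broken into three separate tokens.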
View it on GitLab: https://salsa.debian.org/med-team/chromhmm/-/compare/4f27a760b5e9926702ef62d3ed7132e4f4877a90...d7d2a1503316ccf02e9fc73c00e0d312fece2ab1