[Debian-med-packaging] [j.johnson at imb.uq.edu.au: Re: Installation of binary tools inside MEME]

Thu Feb 14 18:08:08 UTC 2013

Hi Andreas,

Yes, I did start looking at Meme but quickly realised it was a lot more
work than I thought to do a proper job on it.  I think all I wanted to
do in the first instance was to get an updated glam2 binary package
based upon the improved glam2 source within the meme code.  I guess this
is now the definitive glam2 as the original standalone source hasn't
been updated since 2008.

Armed with the list below, it should be fairly straightforward to make a
passably neat meme package, with the binaries living in /usr/lib/meme
and a little wrapper to set the path so that the "user" programs can see
the "utility" programs.  It looks from the SVN logs like you are working
on this right now?  Or did you want me to have a crack?
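
For what it's worth, here is a rough sketch of the sort of wrapper I mean, in Python. It assumes everything lives in /usr/lib/meme and that the wrapper is installed in /usr/bin under the name of the tool it fronts. The names MEME_LIBDIR, build_env and target_for are made up for illustration and are not taken from any existing package.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: run a MEME tool with /usr/lib/meme on PATH."""
import os
import sys

# Assumed private directory holding both "user" and "utility" binaries.
MEME_LIBDIR = "/usr/lib/meme"


def build_env(environ, libdir=MEME_LIBDIR):
    """Return a copy of environ with libdir at the front of PATH (added once)."""
    env = dict(environ)
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if libdir not in parts:
        parts.insert(0, libdir)
    env["PATH"] = os.pathsep.join(parts)
    return env


def target_for(argv0, libdir=MEME_LIBDIR):
    """Map the name the wrapper was invoked as to the real binary in libdir."""
    return os.path.join(libdir, os.path.basename(argv0))


def main(argv):
    target = target_for(argv[0])
    # Replace this process with the real program; does not return on success.
    os.execve(target, [target] + argv[1:], build_env(os.environ))
```

An installed copy would end with a call to main(sys.argv). One copy (or symlink) per "user" program in /usr/bin would then be enough, since the target is derived from the name the wrapper was called as.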

Cheers,

TIM

On Thu, 2013-02-14 at 08:28 +0100, Andreas Tille wrote:
> Hi Tim,
> 
> since you are, at least in my perception, the main driver behind meme,
> could you try a first answer to this mail on the Debian Med mailing list?
> 
> My motivation is to do as much as possible in preparation for Kiel so
> that we will be able to do the last polishing / testing there.
> 
> See you
> 
>      Andreas.
> 
> ----- Forwarded message from James Johnson <j.johnson at imb.uq.edu.au> -----
> 
> Date: Thu, 14 Feb 2013 14:41:28 +1000
> From: James Johnson <j.johnson at imb.uq.edu.au>
> To: Andreas Tille <andreas at fam-tille.de>
> CC: MEME Support <meme at nbcr.net>,
> 	Debian Med Packaging Team <debian-med-packaging at lists.alioth.debian.org>,
> 	"H. Soon Gweon" <soonio at gmail.com>,
> 	Faheem Mitha <faheem at faheem.info>
> Subject: Re: Installation of binary tools inside MEME
> 
> Hi Andreas,
> 
> On 13/02/13 16:36, Andreas Tille wrote:
> > Hi James,
> > 
> > On Wed, Feb 13, 2013 at 10:18:58AM +1000, James Johnson wrote:
> >> ...
> >>> Any hint would be welcome.
> >> I'll work on a list of what each of the programs is used for and get
> >> back to you. Even discounting the fossils there are probably quite a
> >> few scripts and programs only relevant to installing the MEME Suite
> >> on a webserver. For example a local user would almost never want to
> >> use the update_db script as it's a very clumsy way to get sequence
> >> data for specific tasks.
> > This sounds great - so we stay in idle mode until we hear some news from
> > your side.
> > 
> > Thanks for your quick and helpful response
> I've attached an annotated list of the things that the MEME Suite
> currently installs to the bin directory in our main development branch
> (there may be minor differences to the current distribution). There are
> quite a few things that shouldn't be in there, like Python libraries
> (there are 4) and a few Perl libraries (there are 2). There are 6
> programs which have no good reason to be there anymore, annotated as
> "Fossil".
> 
> Aside from that there are programs which have been obsoleted because
> there's a newer, better version (see most of the mhmm related things)
> and a few scripts which are only used by us developers. There are also a
> few scripts that would only be useful to someone running a webserver.
> 
> After that the decisions get a lot harder. There are programs which
> are generally only called by other programs in the suite, however if
> they're not in the bin directory they won't be found. I'm not sure how
> you're going to work around the restriction on installing only
> programs "a user should execute" with those ones (like meme.bin called
> by the meme script or mast2txt called by mast). There are also quite a
> lot of programs that might conceivably be useful to someone somewhere,
> so I'm not sure how you decide on those.
> 
> ~James
> 
> > 
> >        Andreas.
> > 
> 
> 
> alphtype                        	Webserver         	determines if an alphabet string is DNA or PROTEIN, I will probably reimplement as part of a Perl module
> ama                             	Useful            	calculates average/maximum motif score for sequences, first step in gomo analysis
> ama-qvalues                     	Useful            	calculates q-values for ama output
> ame                             	Useful            	calculates motif enrichment in sequences
> beadstring                      	Obsolete          	Obsoleted by MCAST
> beeml2meme                      	Useful            	converts motifs to MEME format from BEEML format
> cat_max                         	Fossil            	...        
> centrimo                        	Useful            	calculates areas of localised motif enrichment
> ceqlogo                         	Rarely Useful     	generates a single motif logo, it's not very user friendly and mostly called through the c interface
> changetoweb                     	Fossil            	...
> chen2meme                       	Useful            	converts motifs to MEME format from Chen format
> clustalw2fasta                  	Possibly Useful   	converts sequences in clustalw format to fasta format
> clustalw2phylip                 	Possibly Useful   	converts sequences in clustalw format to phylip format
> clustalw-io                     	Not Useful        	allows testing of the clustalw parser
> compare_dates                   	Fossil            	another script used by the "download" script
> compute-prior-dist              	Possibly Useful   	computes the distribution of priors in a MEME PSP file
> compute-uniform-priors          	Possibly Useful   	computes a uniform prior psp file equal to the mean of all input priors in another psp file (missing doc)
> create-priors                   	Useful            	allows running MEME in discriminative mode by creating a position specific prior file
> download                        	Fossil            	old code for downloading sequence databases
> draw-mhmm                       	Possibly Useful   	produces a Graphviz representation of a MHMM model
> dreme                           	Useful            	Discover short DNA motifs
> dust                            	Useful            	filters low complexity regions from sequences
> fasta-center                    	Possibly Useful   	filters a set of sequences to only leave the central region
> fasta-dinucleotide-shuffle      	Useful            	shuffles a sequence while maintaining di-nucleotide frequencies
> fasta-dinucleotide-shuffle.py   	Python Library    	I'll get this moved into the libs directory like with the Perl modules.
> fasta-fetch                     	Possibly Useful   	Seems to use an index generated by fasta-make-index to fetch sequences out of a fasta file
> fasta-get-markov                	Useful            	generates a Markov model of letter frequencies used as backgrounds by many MEME Suite programs
> fasta-hamming-enrich            	Possibly Useful   	compute the Hamming distance from a word to each sequence in two sets, apply Fisher Exact test
> fasta-hamming-enrich.py         	Python Library    	...
> fasta-io                        	Not Useful        	just allows testing of the fasta file reading
> fasta-make-index                	Possibly Useful   	makes an index of a fasta file
> fasta-most                      	Useful            	finds the length of sequence that occurs most, used by MEME-ChIP
> fasta-shuffle-letters           	Possibly Useful   	shuffles letters of a sequence, though fasta-dinucleotide-shuffle is better
> fasta-subsample                 	Useful            	selects a subset of the sequences
> fasta-unique-names              	Possibly Useful   	makes sequence names unique, replaces U+0001 with space
> fimo                            	Useful            	searches for motif sites
> fisher_exact                    	Possibly Useful   	computes the result of the Fisher Exact test with the given numbers
> fitevd                          	Possibly Useful   	fits an extreme value distribution to a set of score-length pairs.
> gendb                           	Possibly Useful   	generates a synthetic fasta database from a background model
> get_db_csv                      	Webserver         	queries an online sequence repository for its databases and creates a csv file for update_db to use
> getsize                         	Useful            	measures statistics about a fasta file
> glam2                           	Useful            	Discover gapped motifs
> glam2format                     	Possibly Useful   	converts GLAM2 output to FASTA (with gaps) or MSF
> glam2html                       	Useful            	converts GLAM2 output to HTML, called by glam2 but not often by users
> glam2mask                       	Useful            	used with GLAM2 to mask out found motifs and find weaker ones
> glam2psfm                       	Useful            	convert GLAM2 output to a MEME motif
> glam2scan                       	Useful            	scans a sequence with a GLAM2 motif
> glam2scan2html                  	Useful            	converts GLAM2SCAN output to HTML, called by glam2scan but not often by users
> gomo                            	Useful            	finds enriched GO terms associated with high ranking genes
> gomo_highlight                  	Useful            	post processes gomo XML output to include further information which makes the HTML better
> hart2meme-bkg                   	Possibly Useful   	Convert a Hartemink background to a MEME background
> hartemink2psp                   	Possibly Useful   	Convert a Hartemink PSP file into a MEME PSP file
> hypergeometric.py               	Python Library    	...
> iupac2meme                      	Useful            	Make a MEME motif from an IUPAC string
> jaspar2meme                     	Useful            	Convert motifs from JASPAR to MEME format
> llr                             	Possibly Useful   	Compute the probability distribution for the log-likelihood ratio (LLR) of N letters.
> mast                            	Useful            	Find sequences which best match a group of motifs
> mast2txt                        	Useful            	Convert mast XML output to mast text output. Called by mast
> mcast                           	Useful            	Find matches to a motif hidden markov model
> meme                            	Useful            	Discover motifs, this script calls meme.bin handling the details of parallelization
> meme2images                     	Useful            	Create motif logos for all motifs in a MEME motif file
> meme2meme                       	Useful            	Combine multiple MEME motif files into 1
> meme.bin                        	Useful            	Discover motifs, this is typically called by the meme script
> meme-chip                       	Useful            	Discover motifs, look for enriched motifs, calls MEME, DREME, CentriMo, TOMTOM, eventually FIMO and SpaMo
> meme-get-motif                  	Possibly Useful   	Extract motifs from a MEME text file.
> meme-rename                     	Possibly Useful   	Renames all the output HTML files from MEME-ChIP so they can be kept in one folder and emailed easily
> meme-xml-html                   	Possibly Useful   	Does an XML transformation to convert XML output to HTML output, not actually specific to MEME
> metameme                        	Fossil            	Used to handle web jobs for metameme
> mhmm                            	Obsolete          	Given MEME motif, write a motif-based HMM, obsoleted by MCAST
> mhmm2html                       	Obsolete          	Convert MHMM output to HTML, obsoleted by MCAST
> mhmme                           	Obsolete          	obsoleted by MCAST
> mhmm-io                         	Not Useful        	allows testing of reading/writing MHMM models
> mhmms                           	Obsolete          	obsoleted by MCAST
> mhmmscan                        	Obsolete          	obsoleted by MCAST
> motiph                          	Possibly Useful   	part of a published paper so we want to keep it around
> nmica2meme                      	Useful            	converts motifs from NMICA format to MEME format
> oldmeme2meme                    	Fossil            	converts really really old MEME files into only really old MEME files...
> plotgen                         	Webserver         	used to generate usage plots from the logs
> pmp_bf                          	Possibly Useful   	calculates the statistical power of a phylogenetic motif model
> priority2meme                   	Useful            	converts motifs in priority format to MEME format
> prior_utils.pl                  	Perl Library      	...
> psp-gen                         	Useful            	calculates PSP files for MEME
> purge                           	Possibly Useful   	filters sequences to remove repeats
> qvalue                          	Possibly Useful   	computes q-values from a list of p-values
> ramen                           	Useful (bugs?)    	integration was never tested so it may have major bugs, does regression analysis of motif enrichment
> ranksum_test                    	Possibly Useful   	calculates the rank-sum test
> read_fasta_file.pl              	Perl Library      	...
> readseq                         	Useful            	converts sequence formats
> reconcile-tree-alignment        	Possibly Useful   	identify the intersection of the sets of sequence IDs and leaf labels
> reduce-alignment                	Possibly Useful   	Extract specified columns from a multiple alignment.
> remove-alignment-gaps           	Possibly Useful   	Remove from an alignment all columns that correspond to a gap in a specified species. 
> rna2meme                        	Useful            	convert an RNA sequence to its binding motif in MEME format
> scpd2meme                       	Useful            	convert motifs in SCPD format to MEME format
> sd                              	Possibly Useful   	calculates mean and standard deviation of a list of numbers
> sequence.py                     	Python Library    	...
> shadow                          	Not Useful        	related to the motiph program but never went anywhere, Perform phylogenetic shadowing
> spamo                           	Useful            	motif spacing enrichment analysis
> taipale2meme                    	Useful            	converts motifs from Taipale format to MEME format
> tamo2meme                       	Useful            	converts motifs from TAMO format to MEME format
> tomtom                          	Useful            	comparison of DNA motifs
> transfac2meme                   	Useful            	converts motifs from Transfac matrices to MEME format
> tree                            	Obsolete          	obsoleted by MCAST
> uniprobe2meme                   	Useful            	convert motifs from Uniprobe format to MEME format
> update_db                       	Webserver         	download sequences listed on the page get_db_list.cgi if their timestamp is newer
> update_meme_tests               	Not Useful        	updates the MEME and MAST smoke tests (they must be run first), shouldn't be needed by end users
> xsltproc_lite                   	Rarely Useful     	used to generate the documentation from XML, the next version will not need it
> 
> 
> ----- End forwarded message -----
> 
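
As a starting point for sorting the list above into "ship it", "move to a library directory" and "leave out", something like the following Python sketch could work. The POLICY mapping is purely my assumption for illustration (nothing in this thread has decided where each category goes), and parse_line and triage are made-up helper names.

```python
from collections import defaultdict

# Hypothetical policy: which annotation ends up where. Not agreed anywhere yet.
POLICY = {
    "useful": "install",
    "possibly useful": "install",
    "rarely useful": "install",
    "python library": "library",
    "perl library": "library",
    "webserver": "skip",
    "obsolete": "skip",
    "fossil": "skip",
    "not useful": "skip",
}


def parse_line(line):
    """Split a 'name <TAB> category <TAB> description' row; None if not a row."""
    fields = line.lstrip("> ").rstrip().split("\t", 2)
    if len(fields) < 3:
        return None
    name, category, description = (f.strip() for f in fields)
    # Drop qualifiers such as "(bugs?)" and normalise the case.
    category = category.split("(")[0].strip().lower()
    return name, category, description


def triage(lines):
    """Group program names by destination; unknown categories go to 'review'."""
    groups = defaultdict(list)
    for line in lines:
        row = parse_line(line)
        if row is None:
            continue
        name, category, _ = row
        groups[POLICY.get(category, "review")].append(name)
    return dict(groups)
```

Feeding it the quoted list line by line gives one list of names per destination, which could then be turned into install rules by hand.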

-- 
If you can't find an apposite quote for your sig, just make one up.
     - Anon