[Debichem-devel] New packaging project being started
Filippo Rusconi
lopippo at debian.org
Wed Nov 21 13:06:16 UTC 2012
Greetings, Fellow Debianists,
I would like to introduce a number of new packages that are being
prepared for mass spectrometry-related software. But first I would
like to share with you some thoughts on Free Software and mass
spectrometry.
// Small essay
I would like to mention that during a presentation I gave at the
``Debian for Scientific Facilities Days'' at the ESRF [0], I stated
the wish that I had to package for Debian as much Free Software as
possible in the field of mass spectrometry for biology (slides [1]).
[0] http://www.esrf.eu/events/conferences/debian-for-scientific-facilities-days-1/
[1] http://www.esrf.eu/events/conferences/debian-for-scientific-facilities-days-1/debian-talks/rusconi.pdf
Mass spectrometry for biomolecules became widely popular only some
ten to fifteen years ago, while biochemistry and genetics have been
popular for ages (that is, many decades). This is one of the most
compelling reasons why mass spectrometrists have had --for the last
fifteen years-- an almost total ignorance of Free Software: when mass
spec became popular, almost *all* of the platforms for driving
spectrometers were already running MS-Windows (MS-W95, typically, at
that time!). This is in sharp contrast with DNA sequencers, protein
sequencers, UV spectrometers and software to align DNA sequences or
search the gene banks (and so on and on...), which had been driven by
UNIX platforms (specifically Sun Microsystems or VAX-VMS ones) for
years before that.
Interestingly, for some years now, I have been finding more and more
people doing development for biological mass spectrometry/proteomics
on GNU/Linux workstations (Ubuntu, mainly). Generally, ``free beer''
and flexibility are the most compelling reasons for that choice, but
nonetheless ``Free Beer'' awareness has increased a lot these last
years. I therefore think that we now have a good time window to propel
``Freedom awareness'' for mass spectrometry.
There are a number of serious problems at hand here, and not only
technical ones: the data generated by spectrometers come in huge
volumes and tend to be in a closed format [either immediately after
data acquisition or even after the data have been stored in a
laboratory information management system (aka LIMS)]. Pushing Free
Software will necessarily push alternative ways of doing mass spec
data processing, thus increasing awareness that alternative ways of
doing analysis/mining/storage may replace the closed-source
``integrated solutions'' sold by vendors willing to lock in customers.
This is a huge concern and may be loosely related to the problem of
big masses of data (xxx-omes: the whole set of data characterizing the
cell contents for a given class of biomolecules: proteomes, glycomes,
lipidomes) not being open.
The packaging effort should, in my opinion, aim at a full set of
software that provides, on a single platform, the following
categories of programs:
* data readers and converters (arguably the first step to free the
mass data by converting from proprietary formats into standard
formats, see below);
* data management/analysis workflow-enabling software, that is,
typically libraries with which to construct mass spectrometry
software to solve specific needs (libopenms, proteowizard) or full
sets of programs that can be used in a chained way
(TOPP software in OpenMS) so as to pipeline data from/to binaries
that accomplish useful tasks (deisotoping, centroiding...);
* data visualization software (typically mass spectrum/chromatogram
viewers, like mmass);
* experiment simulation software with data analysis software
(massxpert).
I thus concentrated my first efforts on the two huge packages
libopenms and libpwiz, which are powerful and efficient pieces of
software. Two fellow developers joined me to package another oft-used
program (tandem) and a small package to read mass data in a specific
format. Our efforts should not stop here :-)
// End of small essay.
One of these packages (libpwiz) is still undergoing revision (ongoing
negotiations with upstream) since it appeared that some files in the
source tree are in conflict with the main license of the project.
From their respective d/control files:
libpwiz (aka proteowizard)
Description: library to perform proteomics data analyses (devel files)
The libpwiz library from the ProteoWizard project provides a modular
and extensible set of open-source, cross-platform tools and
libraries. The tools perform proteomics data analyses; the libraries
enable rapid tool creation by providing a robust, pluggable
development framework that simplifies and unifies data file access,
and performs standard chemistry and LCMS dataset computations.
.
The primary goal of ProteoWizard is to eliminate the existing
barriers to proteomic software development so that researchers can
focus on the development of new analytic approaches, rather than
having to dedicate significant resources to mundane (if important)
tasks, like reading data files.
libopenms (aka OpenMS/TOPP)
Description: library for LC/MS data management and analysis - runtime
OpenMS is a library for LC/MS data management and analysis. OpenMS
offers an infrastructure for the development of mass
spectrometry-related software and powerful 2D and 3D visualization
solutions.
.
OpenMS offers analyses for various quantitation protocols, including
label-free quantitation, SILAC, iTRAQ, SRM, SWATH…
.
It provides built-in algorithms for de-novo identification and
database search, as well as adapters to other state-of-the art tools
like X!Tandem, Mascot, OMSSA…
.
OpenMS supports the Proteomics Standard Initiative (PSI) formats for
MS data and supports easy integration of tools into workflow engines
like Knime, Galaxy, WS-Pgrade, and TOPPAS via the TOPPtools concept
and a unified parameter handling.
python-mzml (aka pymzML)
Description: mzML mass spectrometric data parsing
python-mzml is an extension to Python that offers:
- easy access to mass spectrometry (MS) data that allows
the rapid development of tools;
- a very fast parser for mzML data, the standard in
mass spectrometry data format;
- a set of functions to compare or handle spectra.
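As a rough illustration of what parsing mzML involves, here is a
minimal sketch that uses only the Python standard library. It is not
pymzML's API. It assumes a simplified, namespace-free fragment (real
mzML files declare an XML namespace and carry much more metadata) and
relies on the PSI-MS controlled-vocabulary accessions for array type,
precision and compression. The helper names (encode_array,
decode_binary_array, read_spectra) are hypothetical.

```python
import base64
import struct
import xml.etree.ElementTree as ET
import zlib

# PSI-MS controlled-vocabulary accessions used on mzML binary arrays.
MZ_ARRAY = "MS:1000514"
FLOAT32 = "MS:1000521"
ZLIB = "MS:1000574"


def encode_array(values):
    """Pack floats as little-endian 64-bit doubles, then base64 (uncompressed)."""
    raw = struct.pack("<%dd" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_binary_array(element):
    """Return (array kind, list of floats) for one <binaryDataArray> element."""
    accessions = {cv.get("accession") for cv in element.findall("cvParam")}
    raw = base64.b64decode(element.findtext("binary") or "")
    if ZLIB in accessions:
        raw = zlib.decompress(raw)
    # Anything not flagged as 32-bit is treated as 64-bit in this sketch.
    code, size = ("f", 4) if FLOAT32 in accessions else ("d", 8)
    values = list(struct.unpack("<%d%s" % (len(raw) // size, code), raw))
    # Anything not flagged as an m/z array is treated as intensities.
    kind = "mz" if MZ_ARRAY in accessions else "intensity"
    return kind, values


def read_spectra(xml_text):
    """Yield (spectrum id, {'mz': [...], 'intensity': [...]}) pairs."""
    root = ET.fromstring(xml_text)
    for spectrum in root.iter("spectrum"):
        arrays = dict(decode_binary_array(b)
                      for b in spectrum.iter("binaryDataArray"))
        yield spectrum.get("id"), arrays


# A hand-made, namespace-free fragment shaped like an mzML spectrum list.
SAMPLE = """<spectrumList>
  <spectrum id="scan=1">
    <binaryDataArrayList>
      <binaryDataArray>
        <cvParam accession="MS:1000523" name="64-bit float"/>
        <cvParam accession="MS:1000576" name="no compression"/>
        <cvParam accession="MS:1000514" name="m/z array"/>
        <binary>{mz}</binary>
      </binaryDataArray>
      <binaryDataArray>
        <cvParam accession="MS:1000523" name="64-bit float"/>
        <cvParam accession="MS:1000576" name="no compression"/>
        <cvParam accession="MS:1000515" name="intensity array"/>
        <binary>{intensity}</binary>
      </binaryDataArray>
    </binaryDataArrayList>
  </spectrum>
</spectrumList>""".format(mz=encode_array([100.5, 200.25, 300.125]),
                          intensity=encode_array([10.0, 20.0, 5.0]))

if __name__ == "__main__":
    for spectrum_id, arrays in read_spectra(SAMPLE):
        print(spectrum_id, list(zip(arrays["mz"], arrays["intensity"])))
```

A dedicated parser such as pymzML handles the namespace, indexing and
the many optional parts of the format that this sketch ignores.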
The first two projects are *big* in the field and are known to be very
difficult even just to build. I'll certainly need your help from time
to time.
As I detailed in the talk I gave at the Debian for Scientific
Facilities Days at the ESRF (Grenoble), I would like to engage in an
effort to package as much software as possible in the field of mass
spectrometry for biology.
tandem-mass (aka X!Tandem)
Description: mass spectrometry software for protein identification
X! Tandem can match tandem mass spectra with peptide sequences, in a
process that is commonly used to perform protein identification.
.
This software has a very simple, unsophisticated application
programming interface (API): it simply takes an XML file of
instructions on its command line, and outputs the results into an XML
file, which has been specified in the input XML file. The output file
format is described at
http://www.thegpm.org/docs/X_series_output_form.pdf
.
Unlike some earlier generation search engines, all of the X! Series
search engines calculate statistical confidence (expectation values)
for all of the individual spectrum-to-sequence assignments. They also
reassemble all of the peptide assignments in a data set onto the
known protein sequences and assign the statistical confidence that
this assembly and alignment is non-random. The formula for which can
be found here. Therefore, separate assembly and statistical analysis
software, e.g. PeptideProphet and ProteinProphet, do not need to be
used.
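To make the "XML file of instructions" mentioned above more concrete,
here is a minimal sketch that generates such a file with the Python
standard library. The <bioml>/<note> layout and the label strings are
written from memory of the X! Tandem documentation and should be
checked against upstream before use; the function name and the default
file names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_tandem_input(spectrum_path, output_path, taxon,
                       defaults_path="default_input.xml",
                       taxonomy_path="taxonomy.xml"):
    """Return the text of a minimal X! Tandem-style instruction file.

    Assumption: each parameter is a <note type="input" label="..."> element
    inside a <bioml> root, with the labels listed below.
    """
    notes = [
        ("list path, default parameters", defaults_path),
        ("list path, taxonomy information", taxonomy_path),
        ("protein, taxon", taxon),
        ("spectrum, path", spectrum_path),
        ("output, path", output_path),
    ]
    root = ET.Element("bioml")
    for label, value in notes:
        note = ET.SubElement(root, "note", {"type": "input", "label": label})
        note.text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # The result would be saved to a file and given to the search engine
    # as its only command-line argument.
    print(build_tandem_input("spectra.mgf", "results.xml", "yeast"))
```

The output path is part of the input file itself, which matches the
description above: the program is given one file name and finds
everything else inside it.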
r-cran-readbrukerflexdata (aka readBrukerFlexData)
Description: GNU R package to read Bruker Daltonics *flex format files
The readBrukerFlexData package reads data files acquired by MALDI-TOF MS on
Bruker Daltonics machines of the *flex series.
I've been overseeing the packaging work on the last two packages by,
respectively, Olivier Langella and Sebastian Gibb.
We still have details to fix, but the work is going on fine.
Many thanks to the Debichem administrators who kindly answered their
requests about setting up git access accounts.
Thank you for listening,
Cheers,
Filippo
--
Filippo Rusconi, PhD - public crypto key C78F687C @ pgp.mit.edu
Researcher at CNRS and Debian Developer <lopippo at debian.org>
Author of ``massXpert'' at http://www.massxpert.org