[med-svn] [python-csb] 02/04: Imported Upstream version 1.2.2+dfsg
Andreas Tille
tille at debian.org
Fri Dec 27 23:19:22 UTC 2013
This is an automated email from the git hooks/post-receive script.
tille pushed a commit to branch master
in repository python-csb.
commit 1c84ae5b2aab5eb613c4661ce2261589d0865a91
Author: Andreas Tille <tille at debian.org>
Date: Fri Dec 27 23:57:18 2013 +0100
Imported Upstream version 1.2.2+dfsg
---
PKG-INFO | 364 ++++----
csb/__init__.py | 542 ++++++------
csb/apps/csfrag.py | 3 +-
csb/bio/fragments/__init__.py | 586 ++-----------
csb/bio/fragments/hhsites.py | 152 ----
csb/bio/fragments/isites.py | 1122 ------------------------
csb/bio/io/__init__.py | 1 -
csb/bio/io/isites.py | 326 -------
csb/statistics/maxent.py | 4 +-
csb/statistics/pdf/__init__.py | 62 +-
csb/statistics/pdf/parameterized.py | 346 ++++++++
csb/statistics/rand.py | 2 +-
csb/statistics/samplers/mc/singlechain.py | 69 +-
csb/statistics/scalemixture.py | 7 +-
csb/test/__init__.py | 180 ++--
csb/test/cases/bio/fragments/__init__.py | 17 +-
csb/test/cases/bio/io/isites/__init__.py | 38 -
csb/test/cases/statistics/pdf/parameterized.py | 323 +++++++
csb/test/cases/statistics/samplers/__init__.py | 190 +++-
19 files changed, 1574 insertions(+), 2760 deletions(-)
diff --git a/PKG-INFO b/PKG-INFO
index 8aca75a..a69d881 100644
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,182 +1,182 @@
-Metadata-Version: 1.1
-Name: csb
-Version: 1.2.1
-Summary: Computational Structural Biology Toolbox
-Home-page: http://csb.codeplex.com
-Author: Michael Habeck et al.
-Author-email: ivan.kalev at gmail.com
-License: MIT
-Description: Computational Structural Biology Toolbox
- ========================================
-
- CSB is a python library and application framework, which can be used
- to solve problems in the field of structural bioinformatics. If
- you are a bioinformatician, software engineer or a researcher working
- in this field, chances are you may find something useful here. Our
- package consists of a few major components:
-
- 1. Core class library -- object-oriented, granular, with an emphasis
- on design and clean interfaces.
-
- 2. Application framework -- console applications ("protocols"),
- which consume objects from the core library in order to build
- something executable (and hopefully useful).
-
- 3. Test framework -- ensures that the library *actually* works.
-
-
- Compatibility
- -------------
-
- In short: CSB requires python 2.6 or higher.
-
- CSB is being developed on Linux, under python 2.7. However, compatibility
- is a design goal and the package works on any platform, on any modern python
- interpreter since version 2.6 (that includes python 3 support out of
- the box). If you found any issues on a platform/interpreter different from
- our development environment, please let us know.
-
-
- Installation
- ------------
-
- Full installation instructions can be found in the INSTALL file packaged with
- this release, as well as on the project's web site:
-
- http://csb.codeplex.com/documentation
-
- Here we provide only a brief summary of the installation procedure.
- First, make sure all required dependencies are installed:
-
- 1. numpy, scipy -- required
- 2. matplotlib and wxPython -- optional, needed only if you want to use csb.io.plots
- 3. unittest2, argparse -- needed only if you are running python 2.6
-
- Next, install CSB::
-
- $ python setup.py install
-
- CSB has just been installed at the following location::
-
- $ python -c "import csb, os; print(os.path.abspath(os.path.join(csb.__path__[0], '..')))"
-
- Let us call this path *$LIB*.
-
-
- Testing
- -------
-
- Running the CSB test suite may be useful if you made any modifications to
- the source code, or if you simply want to check if your installation works.
-
- All CSB tests are executed with the csb.test.Console. A typical way to run
- the console is::
-
- $ python $LIB/csb/test/app.py "csb.test.cases.*"
-
- or just::
-
- $ python $LIB/csb/test/app.py
-
- For help try::
-
- $ python $LIB/csb/test/app.py -h
-
- For more details on our test framework, including guidelines for writing
- unit tests, please refer to the API documentation, package "csb.test".
-
-
- Running CSB Applications
- ------------------------
-
- CSB is bundled with a number of executable console csb.apps. Each app
- provides a standard command line interface. To run any app, try::
-
- $ python $LIB/csb/apps/appname.py --help
-
- where *appname* is the name of the application. For more details on
- our app framework, including guidelines for writing new applications,
- please refer to the API documentation, package "csb.apps".
-
-
- Documentation
- -------------
-
- The project's web site at `CodePlex <http://csb.codeplex.com>`_ contains
- online documentation and samples. Be sure to check
-
- http://csb.codeplex.com/documentation
-
- Detailed API documentation can be found in the "docs/api" directory in the
- distribution package (docs/api/index.html). Many packages contain
- introductory module level documentation and samples/tutorials. These are also
- available in the HTML docs, but a quick way to access them is by using
- the built-in python help system. For example, for a general introduction
- see the module documentation of the root package::
-
- $ python -c "import csb; help(csb)"
-
- If you are interested in a specific package, such as csb.bio.sequence,
- try::
-
- $ python -c "import csb.bio.sequence; help(csb.bio.sequence)"
-
-
- Contact
- -------
-
- CSB is developed by Michael Habeck's Computational Structural Biology
- `research group <http://www.eb.tuebingen.mpg.de/research/research-groups/michael-habeck.html>`_.
-
- For complete source code, contribution, support or bug reports please visit
- our web site at CodePlex:
-
- `csb.codeplex.com <http://csb.codeplex.com>`_
-
-
- License
- -------
-
- CSB is open source and distributed under OSI-approved MIT license.
- ::
-
- Copyright (c) 2012 Michael Habeck
-
- Permission is hereby granted, free of charge, to any person obtaining
- a copy of this software and associated documentation files (the
- "Software"), to deal in the Software without restriction, including
- without limitation the rights to use, copy, modify, merge, publish,
- distribute, sublicense, and/or sell copies of the Software, and to
- permit persons to whom the Software is furnished to do so, subject to
- the following conditions:
-
- The above copyright notice and this permission notice shall be
- included in all copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
- IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
- CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
- TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
- SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-
-
-Platform: UNKNOWN
-Classifier: Development Status :: 5 - Production/Stable
-Classifier: Intended Audience :: Developers
-Classifier: Intended Audience :: Science/Research
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Operating System :: OS Independent
-Classifier: Programming Language :: Python
-Classifier: Programming Language :: Python :: 2.6
-Classifier: Programming Language :: Python :: 2.7
-Classifier: Programming Language :: Python :: 3.1
-Classifier: Programming Language :: Python :: 3.2
-Classifier: Topic :: Scientific/Engineering
-Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
-Classifier: Topic :: Scientific/Engineering :: Mathematics
-Classifier: Topic :: Scientific/Engineering :: Physics
-Classifier: Topic :: Software Development :: Libraries
-Requires: numpy
-Requires: scipy
+Metadata-Version: 1.1
+Name: csb
+Version: 1.2.2
+Summary: Computational Structural Biology Toolbox
+Home-page: http://csb.codeplex.com
+Author: Michael Habeck et al.
+Author-email: ivan.kalev at gmail.com
+License: MIT
+Description: Computational Structural Biology Toolbox
+ ========================================
+
+ CSB is a python library and application framework, which can be used
+ to solve problems in the field of structural bioinformatics. If
+ you are a bioinformatician, software engineer or a researcher working
+ in this field, chances are you may find something useful here. Our
+ package consists of a few major components:
+
+ 1. Core class library -- object-oriented, granular, with an emphasis
+ on design and clean interfaces.
+
+ 2. Application framework -- console applications ("protocols"),
+ which consume objects from the core library in order to build
+ something executable (and hopefully useful).
+
+ 3. Test framework -- ensures that the library *actually* works.
+
+
+ Compatibility
+ -------------
+
+ In short: CSB requires python 2.6 or higher.
+
+ CSB is being developed on Linux, under python 2.7. However, compatibility
+ is a design goal and the package works on any platform, on any modern python
+ interpreter since version 2.6 (that includes python 3 support out of
+ the box). If you found any issues on a platform/interpreter different from
+ our development environment, please let us know.
+
+
+ Installation
+ ------------
+
+ Full installation instructions can be found in the INSTALL file packaged with
+ this release, as well as on the project's web site:
+
+ http://csb.codeplex.com/documentation
+
+ Here we provide only a brief summary of the installation procedure.
+ First, make sure all required dependencies are installed:
+
+ 1. numpy, scipy -- required
+ 2. matplotlib and wxPython -- optional, needed only if you want to use csb.io.plots
+ 3. unittest2, argparse -- needed only if you are running python 2.6
+
+ Next, install CSB::
+
+ $ python setup.py install
+
+ CSB has just been installed at the following location::
+
+ $ python -c "import csb, os; print(os.path.abspath(os.path.join(csb.__path__[0], '..')))"
+
+ Let us call this path *$LIB*.
+
+
+ Testing
+ -------
+
+ Running the CSB test suite may be useful if you made any modifications to
+ the source code, or if you simply want to check if your installation works.
+
+ All CSB tests are executed with the csb.test.Console. A typical way to run
+ the console is::
+
+ $ python $LIB/csb/test/app.py "csb.test.cases.*"
+
+ or just::
+
+ $ python $LIB/csb/test/app.py
+
+ For help try::
+
+ $ python $LIB/csb/test/app.py -h
+
+ For more details on our test framework, including guidelines for writing
+ unit tests, please refer to the API documentation, package "csb.test".
+
+
+ Running CSB Applications
+ ------------------------
+
+ CSB is bundled with a number of executable console csb.apps. Each app
+ provides a standard command line interface. To run any app, try::
+
+ $ python $LIB/csb/apps/appname.py --help
+
+ where *appname* is the name of the application. For more details on
+ our app framework, including guidelines for writing new applications,
+ please refer to the API documentation, package "csb.apps".
+
+
+ Documentation
+ -------------
+
+ The project's web site at `CodePlex <http://csb.codeplex.com>`_ contains
+ online documentation and samples. Be sure to check
+
+ http://csb.codeplex.com/documentation
+
+ Detailed API documentation can be found in the "docs/api" directory in the
+ distribution package (docs/api/index.html). Many packages contain
+ introductory module level documentation and samples/tutorials. These are also
+ available in the HTML docs, but a quick way to access them is by using
+ the built-in python help system. For example, for a general introduction
+ see the module documentation of the root package::
+
+ $ python -c "import csb; help(csb)"
+
+ If you are interested in a specific package, such as csb.bio.sequence,
+ try::
+
+ $ python -c "import csb.bio.sequence; help(csb.bio.sequence)"
+
+
+ Contact
+ -------
+
+ CSB is developed by Michael Habeck's Computational Structural Biology
+ `research group <http://www.eb.tuebingen.mpg.de/research/research-groups/michael-habeck.html>`_.
+
+ For complete source code, contribution, support or bug reports please visit
+ our web site at CodePlex:
+
+ `csb.codeplex.com <http://csb.codeplex.com>`_
+
+
+ License
+ -------
+
+ CSB is open source and distributed under OSI-approved MIT license.
+ ::
+
+ Copyright (c) 2012 Michael Habeck
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+
+Platform: UNKNOWN
+Classifier: Development Status :: 5 - Production/Stable
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python
+Classifier: Programming Language :: Python :: 2.6
+Classifier: Programming Language :: Python :: 2.7
+Classifier: Programming Language :: Python :: 3.1
+Classifier: Programming Language :: Python :: 3.2
+Classifier: Topic :: Scientific/Engineering
+Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
+Classifier: Topic :: Scientific/Engineering :: Mathematics
+Classifier: Topic :: Scientific/Engineering :: Physics
+Classifier: Topic :: Software Development :: Libraries
+Requires: numpy
+Requires: scipy
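The PKG-INFO file changed above declares `Metadata-Version: 1.1`, an RFC 822-style "Key: value" header format, so fields such as Name and Version can be read with the standard library's email parser. A minimal sketch (the inlined PKG-INFO text is abbreviated from the file shown above):

```python
from email import message_from_string

# Abbreviated copy of the PKG-INFO headers from the diff above.
# Metadata-Version 1.1 uses RFC 822-style headers, which the stdlib
# email parser reads directly.
PKG_INFO = """\
Metadata-Version: 1.1
Name: csb
Version: 1.2.2
Summary: Computational Structural Biology Toolbox
License: MIT
"""

meta = message_from_string(PKG_INFO)
print(meta["Name"], meta["Version"])   # csb 1.2.2
```

Repeatable fields such as `Classifier:` can be collected with `meta.get_all("Classifier")` rather than item access, which returns only the first occurrence.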
diff --git a/csb/__init__.py b/csb/__init__.py
index ff4c846..ff38330 100644
--- a/csb/__init__.py
+++ b/csb/__init__.py
@@ -1,271 +1,271 @@
-"""
-CSB is a high-level, object-oriented library used to solve problems in the
-field of Computational Structural Biology.
-
-
-Introduction
-============
-
-The library is composed of a set of highly branched python packages
-(namespaces). Some of the packages are meant to be directly used by
-the clients (core library), while others are utility modules and take
-part in the development of the library:
-
- 1. Core class library -- object-oriented, granular, with an emphasis
- on design and clean interfaces. A Sequence is not a string, and a
- Structure is not a dict or list. Naming conventions matter.
-
- 2. Application framework -- executable console applications
- ("protocols"), which consume objects from the core library.
- The framework ensures that each CSB application is also reusable
- and can be instantiated as a regular python object without any
- ugly side effects (sys.exit() and friends). See L{csb.apps} for more
- details.
-
- 3. Test framework -- built on top of the standard unittest as a thin
- wrapping layer. Provides some sugar like transparent management of
- test data files, and modular test execution. L{csb.test} will give
- you all the details.
-
-The core library is roughly composed of:
-
- - bioinformatics API: L{csb.bio}, which includes stuff like
- L{csb.bio.io}, L{csb.bio.structure}, L{csb.bio.sequence},
- L{csb.bio.hmm}
-
- - statistics API: L{csb.statistics}, L{csb.numeric}
-
- - utilities - L{csb.io}, L{csb.core}
-
-
-Getting started
-===============
-
-Perhaps one of the most frequently used parts of the library is the
-L{csb.bio.structure} module, which provides the L{Structure}, L{Chain},
-L{Residue} and L{Atom} objects. You could easily build a L{Structure}
-from scratch, but a far more common scenario is parsing a structure from
-a PDB file using one of the L{AbstractStructureParser}s. All bio IO
-objects, including the StructureParser factory, are defined in
-L{csb.bio.io} and sub-packages:
-
- >>> from csb.bio.io.wwpdb import StructureParser
- >>> p = StructureParser("/some/file/pdb1x80.ent")
- >>> s = p.parse_structure()
- >>> print(s)
- <Structure: 1x80, 2 chains>
-
-The last statement will return a L{csb.bio.structure.Structure} instance,
-which is a composite hierarchical object:
-
- >>> for chain_id in s.chains:
- chain = s.chains[chain_id]
- for residue in chain.residues:
- for atom_id in residue.atoms:
- atom = residue.atoms[atom_id]
- print(atom.vector)
-
-Some of the inner objects in this hierarchy behave just like dictionaries
-(but are not):
-
- >>> s.chains['A'] # access chain A by ID
- <Chain A: Protein>
- >>> s['A'] # the same
- <Chain A: Protein>
-
-Others behave like collections:
-
- >>> chain.residues[10] # 1-based access to the residues in the chain
- <ProteinResidue [10]: PRO 10>
- >>> chain[10] # 0-based, list-like access
- <ProteinResidue [11]: GLY 11>
-
-But all entities are iterable because they inherit the C{items} iterator
-from L{AbstractEntity}. The above loop can be shortened:
-
- >>> for chain in s.items:
- for residue in chain.items:
- for atom in residue.items:
- print(atom.vector)
-
-or even more:
-
- >>> from csb.bio.structure import Atom
- >>> for atom in s.components(klass=Atom):
- print(atom.vector)
-
-You may also be interested in extracting a sub-chain from this structure:
-
- >>> s.chains['B'].subregion(3, 20) # from positions 3 to 20, inclusive
- <Chain B: Protein>
-
-or modifying it in some way, for example, in order to append a new residue,
-try:
-
- >>> from csb.bio.structure import ProteinResidue
- >>> from csb.bio.sequence import ProteinAlphabet
- >>> residue = ProteinResidue(401, ProteinAlphabet.ALA)
- >>> s.chains['A'].residues.append(residue)
-
-Finally, you would probably want to save your structure back to a PDB file:
-
- >>> s.to_pdb('/some/file/name.pdb')
-
-
-Where to go from here
-=====================
-
-If you want to dive into statistics, you could peek inside L{csb.statistics}
-and its sub-packages. For example, L{csb.statistics.pdf} contains a collection
-of L{probability density objects<csb.statistics.pdf.AbstractDensity>},
-like L{Gaussian<csb.statistics.pdf.Normal>} or L{Gamma<csb.statistics.pdf.Gamma>}.
-
-But chances are you would first like to try reading some files, so you could
-start exploring L{csb.bio.io} right now. As we have already seen,
-L{csb.bio.io.wwpdb} provides PDB L{Structure<csb.bio.structure.Structure>}
-parsers, for example L{csb.bio.io.wwpdb.RegularStructureParser} and
-L{csb.bio.io.wwpdb.LegacyStructureParser}.
-
-L{csb.bio.io.fasta} is all about reading FASTA
-L{Sequence<csb.bio.sequence.AbstractSequence>}s and
-L{SequenceAlignment<csb.bio.sequence.AbstractAlignment>}s. Be sure to check out
-L{csb.bio.io.fasta.SequenceParser}, L{csb.bio.io.fasta.SequenceAlignmentReader}
-and L{csb.bio.io.fasta.StructureAlignmentFactory}.
-
-If you are working with HHpred (L{ProfileHMM<csb.bio.hmm.ProfileHMM>}s,
-L{HHpredHit<csb.bio.hmm.HHpredHit>}s), then L{csb.bio.io.hhpred} is for you.
-This package provides L{csb.bio.io.hhpred.HHProfileParser} and
-L{csb.bio.io.hhpred.HHOutputParser}, which are used to read *.hhm and *.hhr
-files.
-
-Finally, if you want to make some nice plots with matplotlib, you may like the
-clean object-oriented interface of our L{Chart<csb.io.plots.Chart>}. See
-L{csb.io.plots} and maybe also L{csb.io.tsv} to get started.
-
-
-Development
-===========
-
-When contributing code to CSB, please take into account the following:
-
- 1. New features or bug fixes should always be accompanied by test cases.
- Also, always run the complete test suite before committing. For more
- details on this topic, see L{csb.test}.
-
- 2. The source code of CSB must be cross-platform and cross-interpreter
- compatible. L{csb.core} and L{csb.io} will give you all necessary
- details on how to use the CSB compatibility layer.
-
-
-License
-=======
-
-CSB is open source and distributed under OSI-approved MIT license::
-
- Copyright (c) 2012 Michael Habeck
-
- Permission is hereby granted, free of charge, to any person obtaining
- a copy of this software and associated documentation files (the
- "Software"), to deal in the Software without restriction, including
- without limitation the rights to use, copy, modify, merge, publish,
- distribute, sublicense, and/or sell copies of the Software, and to
- permit persons to whom the Software is furnished to do so, subject to
- the following conditions:
-
- The above copyright notice and this permission notice shall be
- included in all copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
- IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
- CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
- TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
- SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-
-"""
-
-__version__ = '1.2.1.603'
-
-
-class Version(object):
- """
- CSB version number.
- """
-
- def __init__(self):
-
- version = __version__.split('.')
-
- if not len(version) in (3, 4):
- raise ValueError(version)
-
- self._package = __name__
-
- self._major = version[0]
- self._minor = version[1]
- self._micro = version[2]
- self._revision = None
-
- if len(version) == 4:
- self._revision = version[3]
-
- def __str__(self):
- return self.short
-
- def __repr__(self):
- return '{0.package} {0.full}'.format(self)
-
- @property
- def major(self):
- """
- Major version (huge, incompatible changes)
- @rtype: int
- """
- return int(self._major)
-
- @property
- def minor(self):
- """
- Minor version (significant, but compatible changes)
- @rtype: int
- """
- return int(self._minor)
-
- @property
- def micro(self):
- """
- Micro version (bug fixes and small enhancements)
- @rtype: int
- """
- return int(self._micro)
-
- @property
- def revision(self):
- """
- Build number (exact repository revision number)
- @rtype: int
- """
- try:
- return int(self._revision)
- except:
- return self._revision
-
- @property
- def short(self):
- """
- Canonical three-part version number.
- """
- return '{0.major}.{0.minor}.{0.micro}'.format(self)
-
- @property
- def full(self):
- """
- Full version, including the repository revision number.
- """
- return '{0.major}.{0.minor}.{0.micro}.{0.revision}'.format(self)
-
- @property
- def package(self):
- return self._package
-
+"""
+CSB is a high-level, object-oriented library used to solve problems in the
+field of Computational Structural Biology.
+
+
+Introduction
+============
+
+The library is composed of a set of highly branched python packages
+(namespaces). Some of the packages are meant to be directly used by
+the clients (core library), while others are utility modules and take
+part in the development of the library:
+
+ 1. Core class library -- object-oriented, granular, with an emphasis
+ on design and clean interfaces. A Sequence is not a string, and a
+ Structure is not a dict or list. Naming conventions matter.
+
+ 2. Application framework -- executable console applications
+ ("protocols"), which consume objects from the core library.
+ The framework ensures that each CSB application is also reusable
+ and can be instantiated as a regular python object without any
+ ugly side effects (sys.exit() and friends). See L{csb.apps} for more
+ details.
+
+ 3. Test framework -- built on top of the standard unittest as a thin
+ wrapping layer. Provides some sugar like transparent management of
+ test data files, and modular test execution. L{csb.test} will give
+ you all the details.
+
+The core library is roughly composed of:
+
+ - bioinformatics API: L{csb.bio}, which includes stuff like
+ L{csb.bio.io}, L{csb.bio.structure}, L{csb.bio.sequence},
+ L{csb.bio.hmm}
+
+ - statistics API: L{csb.statistics}, L{csb.numeric}
+
+ - utilities - L{csb.io}, L{csb.core}
+
+
+Getting started
+===============
+
+Perhaps one of the most frequently used parts of the library is the
+L{csb.bio.structure} module, which provides the L{Structure}, L{Chain},
+L{Residue} and L{Atom} objects. You could easily build a L{Structure}
+from scratch, but a far more common scenario is parsing a structure from
+a PDB file using one of the L{AbstractStructureParser}s. All bio IO
+objects, including the StructureParser factory, are defined in
+L{csb.bio.io} and sub-packages:
+
+ >>> from csb.bio.io.wwpdb import StructureParser
+ >>> p = StructureParser("/some/file/pdb1x80.ent")
+ >>> s = p.parse_structure()
+ >>> print(s)
+ <Structure: 1x80, 2 chains>
+
+The last statement will return a L{csb.bio.structure.Structure} instance,
+which is a composite hierarchical object:
+
+ >>> for chain_id in s.chains:
+ chain = s.chains[chain_id]
+ for residue in chain.residues:
+ for atom_id in residue.atoms:
+ atom = residue.atoms[atom_id]
+ print(atom.vector)
+
+Some of the inner objects in this hierarchy behave just like dictionaries
+(but are not):
+
+ >>> s.chains['A'] # access chain A by ID
+ <Chain A: Protein>
+ >>> s['A'] # the same
+ <Chain A: Protein>
+
+Others behave like collections:
+
+ >>> chain.residues[10] # 1-based access to the residues in the chain
+ <ProteinResidue [10]: PRO 10>
+ >>> chain[10] # 0-based, list-like access
+ <ProteinResidue [11]: GLY 11>
+
+But all entities are iterable because they inherit the C{items} iterator
+from L{AbstractEntity}. The above loop can be shortened:
+
+ >>> for chain in s.items:
+ for residue in chain.items:
+ for atom in residue.items:
+ print(atom.vector)
+
+or even more:
+
+ >>> from csb.bio.structure import Atom
+ >>> for atom in s.components(klass=Atom):
+ print(atom.vector)
+
+You may also be interested in extracting a sub-chain from this structure:
+
+ >>> s.chains['B'].subregion(3, 20) # from positions 3 to 20, inclusive
+ <Chain B: Protein>
+
+or modifying it in some way, for example, in order to append a new residue,
+try:
+
+ >>> from csb.bio.structure import ProteinResidue
+ >>> from csb.bio.sequence import ProteinAlphabet
+ >>> residue = ProteinResidue(401, ProteinAlphabet.ALA)
+ >>> s.chains['A'].residues.append(residue)
+
+Finally, you would probably want to save your structure back to a PDB file:
+
+ >>> s.to_pdb('/some/file/name.pdb')
+
+
+Where to go from here
+=====================
+
+If you want to dive into statistics, you could peek inside L{csb.statistics}
+and its sub-packages. For example, L{csb.statistics.pdf} contains a collection
+of L{probability density objects<csb.statistics.pdf.AbstractDensity>},
+like L{Gaussian<csb.statistics.pdf.Normal>} or L{Gamma<csb.statistics.pdf.Gamma>}.
+
+But chances are you would first like to try reading some files, so you could
+start exploring L{csb.bio.io} right now. As we have already seen,
+L{csb.bio.io.wwpdb} provides PDB L{Structure<csb.bio.structure.Structure>}
+parsers, for example L{csb.bio.io.wwpdb.RegularStructureParser} and
+L{csb.bio.io.wwpdb.LegacyStructureParser}.
+
+L{csb.bio.io.fasta} is all about reading FASTA
+L{Sequence<csb.bio.sequence.AbstractSequence>}s and
+L{SequenceAlignment<csb.bio.sequence.AbstractAlignment>}s. Be sure to check out
+L{csb.bio.io.fasta.SequenceParser}, L{csb.bio.io.fasta.SequenceAlignmentReader}
+and L{csb.bio.io.fasta.StructureAlignmentFactory}.
+
+If you are working with HHpred (L{ProfileHMM<csb.bio.hmm.ProfileHMM>}s,
+L{HHpredHit<csb.bio.hmm.HHpredHit>}s), then L{csb.bio.io.hhpred} is for you.
+This package provides L{csb.bio.io.hhpred.HHProfileParser} and
+L{csb.bio.io.hhpred.HHOutputParser}, which are used to read *.hhm and *.hhr
+files.
+
+Finally, if you want to make some nice plots with matplotlib, you may like the
+clean object-oriented interface of our L{Chart<csb.io.plots.Chart>}. See
+L{csb.io.plots} and maybe also L{csb.io.tsv} to get started.
+
+
+Development
+===========
+
+When contributing code to CSB, please take into account the following:
+
+ 1. New features or bug fixes should always be accompanied by test cases.
+ Also, always run the complete test suite before committing. For more
+ details on this topic, see L{csb.test}.
+
+ 2. The source code of CSB must be cross-platform and cross-interpreter
+ compatible. L{csb.core} and L{csb.io} will give you all necessary
+ details on how to use the CSB compatibility layer.
+
+
+License
+=======
+
+CSB is open source and distributed under OSI-approved MIT license::
+
+ Copyright (c) 2012 Michael Habeck
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+"""
+
+__version__ = '1.2.2.614'
+
+
+class Version(object):
+ """
+ CSB version number.
+ """
+
+ def __init__(self):
+
+ version = __version__.split('.')
+
+ if not len(version) in (3, 4):
+ raise ValueError(version)
+
+ self._package = __name__
+
+ self._major = version[0]
+ self._minor = version[1]
+ self._micro = version[2]
+ self._revision = None
+
+ if len(version) == 4:
+ self._revision = version[3]
+
+ def __str__(self):
+ return self.short
+
+ def __repr__(self):
+ return '{0.package} {0.full}'.format(self)
+
+ @property
+ def major(self):
+ """
+ Major version (huge, incompatible changes)
+ @rtype: int
+ """
+ return int(self._major)
+
+ @property
+ def minor(self):
+ """
+ Minor version (significant, but compatible changes)
+ @rtype: int
+ """
+ return int(self._minor)
+
+ @property
+ def micro(self):
+ """
+ Micro version (bug fixes and small enhancements)
+ @rtype: int
+ """
+ return int(self._micro)
+
+ @property
+ def revision(self):
+ """
+ Build number (exact repository revision number)
+ @rtype: int
+ """
+ try:
+ return int(self._revision)
+ except:
+ return self._revision
+
+ @property
+ def short(self):
+ """
+ Canonical three-part version number.
+ """
+ return '{0.major}.{0.minor}.{0.micro}'.format(self)
+
+ @property
+ def full(self):
+ """
+ Full version, including the repository revision number.
+ """
+ return '{0.major}.{0.minor}.{0.micro}.{0.revision}'.format(self)
+
+ @property
+ def package(self):
+ return self._package
+
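The `Version.__init__` shown in the hunk above splits `__version__` on dots into three or four fields, the fourth being the optional repository revision. The same logic can be sketched as a standalone function (`parse_version` is a hypothetical helper for illustration, not part of csb; it is simplified in that it assumes all fields are numeric, whereas the real `revision` property falls back to the raw string):

```python
def parse_version(version_string):
    """Split a CSB-style version string into (major, minor, micro, revision).

    Mirrors the parsing in csb.Version.__init__: three or four
    dot-separated fields, with the fourth (repository revision) optional.
    """
    parts = version_string.split('.')
    if len(parts) not in (3, 4):
        raise ValueError(version_string)

    major, minor, micro = (int(p) for p in parts[:3])
    # The revision is only present in full version strings like '1.2.2.614'.
    revision = int(parts[3]) if len(parts) == 4 else None
    return major, minor, micro, revision

print(parse_version('1.2.2.614'))   # (1, 2, 2, 614)
```

This matches the bump recorded in this commit, where `__version__` moves from `'1.2.1.603'` to `'1.2.2.614'`: the micro version and the repository revision both advance.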
diff --git a/csb/apps/csfrag.py b/csb/apps/csfrag.py
index bd5985d..fd37458 100644
--- a/csb/apps/csfrag.py
+++ b/csb/apps/csfrag.py
@@ -536,7 +536,6 @@ def _task(helper, subject, qs, qe, qcs, scs):
if __name__ == '__main__':
-
- args = "cs.py -v 1 -f -t 12 -d /home/ivan/Desktop/cstest/db -s /home/ivan/Desktop/cstest/t.str -o /home/ivan/Desktop/cstest /home/ivan/Desktop/cstest/t.fa".split()
+
AppRunner().run()
\ No newline at end of file
diff --git a/csb/bio/fragments/__init__.py b/csb/bio/fragments/__init__.py
index 68735c3..4498fa2 100644
--- a/csb/bio/fragments/__init__.py
+++ b/csb/bio/fragments/__init__.py
@@ -5,9 +5,6 @@ This package contains the nuts and bolts of HHfrag. Everything here revolves
around the L{Target} class, which describes a protein structure prediction
target. One typically assigns fragments (L{Assignment}s) to the target and then
builds a fragment library with L{RosettaFragsetFactory}.
-
- @note: Internal or legacy objects are intentionally left undocumented.
- This typically indicates experimental code.
"""
import os
@@ -20,15 +17,15 @@ import csb.bio.structure
import csb.bio.sequence
from csb.bio.structure import SecondaryStructure
+from csb.bio.io.wwpdb import FileSystemStructureProvider, StructureParser
class FragmentTypes(object):
+ HHfrag = 'TH'
+ Rosetta = 'NN'
ISites = 'IS'
- HMMFragments = 'HH'
- HHThread = 'TH'
- HHfrag = HHThread
- Rosetta = 'NN'
+ StaticHH = 'HH'
class Metrics(object):
@@ -103,26 +100,10 @@ class FragmentMatch(object):
@property
def end(self):
- raise NotImplementedError()
-
-class PredictionContainer(object):
-
- def __init__(self, target, isites_prediction, hmm_prediction, combined_prediction):
-
- self.target = target
-
- self.isites = isites_prediction
- self.hmm = hmm_prediction
- self.combined = combined_prediction
-
-class Prediction(object):
-
- def __init__(self, alignment, coordinates):
-
- self.alignment = alignment
- self.coordinates = coordinates
+ raise NotImplementedError()
+
-class TorsionAnglesPredictor(object):
+class TorsionAnglePredictor(object):
"""
Fragment-based phi/psi angles predictor.
@@ -410,7 +391,7 @@ class Target(csb.core.AbstractNIContainer):
@type residues: iterable of L{csb.bio.structure.ProteinResidue}s
"""
- def __init__(self, id, length, residues, overlap=None, segments=None, factory=AssignmentFactory()):
+ def __init__(self, id, length, residues, overlap=None, factory=AssignmentFactory()):
self._id = id
self._accession = id[:-1]
@@ -426,10 +407,6 @@ class Target(csb.core.AbstractNIContainer):
self._residues = csb.core.ReadOnlyCollectionContainer(items=resi,
type=TargetResidue, start_index=1)
- if segments is not None:
- segments = dict([(s.start, s) for s in segments])
- self._segments = csb.core.ReadOnlyDictionaryContainer(items=segments)
-
@staticmethod
def from_sequence(id, sequence):
"""
@@ -511,10 +488,6 @@ class Target(csb.core.AbstractNIContainer):
@property
def residues(self):
return self._residues
-
- @property
- def segments(self):
- return self._segments
def assign(self, fragment):
"""
@@ -531,12 +504,6 @@ class Target(csb.core.AbstractNIContainer):
for rank in range(fragment.qstart, fragment.qend + 1):
ai = ResidueAssignmentInfo(fragment, rank)
self._residues[rank].assign(ai)
-
- if fragment.segment is not None:
- try:
- self._segments[fragment.segment].assign(fragment)
- except KeyError:
- raise ValueError("Undefined segment starting at {0}".format(fragment.segment))
def assignall(self, fragments):
"""
@@ -574,13 +541,9 @@ class Target(csb.core.AbstractNIContainer):
@return: a deep copy of the target
@rtype: L{Target}
"""
-
- segments = [self.segments[start] for start in self.segments]
- segments = [TargetSegment(s.start, s.end, s.count) for s in segments]
-
- target = self._factory.target(self.id, self.length, [r.native for r in self.residues],
- overlap=self._overlap, segments=segments)
-
+ residues = [r.native for r in self.residues]
+ target = self._factory.target(self.id, self.length, residues,
+ overlap=self._overlap)
return target
class ChemShiftTarget(Target):
@@ -588,7 +551,7 @@ class ChemShiftTarget(Target):
def __init__(self, id, length, residues, overlap=None):
super(ChemShiftTarget, self).__init__(id, length, residues, overlap=overlap,
- segments=None, factory=ChemShiftAssignmentFactory())
+ factory=ChemShiftAssignmentFactory())
def assign(self, fragment):
@@ -602,7 +565,8 @@ class ChemShiftTarget(Target):
self._residues[rank].assign(ai)
def clone(self):
- return self._factory.target(self.id, self.length, [r.native for r in self.residues],
+ return self._factory.target(self.id, self.length,
+ [r.native for r in self.residues],
overlap=self._overlap)
class TargetResidue(object):
@@ -633,25 +597,6 @@ class TargetResidue(object):
def assign(self, assignment_info):
self._assignments._append_item(assignment_info)
-
- def verybest(self):
- """
- @return: the fragment with the lowest RMSD at this position in the L{Target}
- @rtype: L{Assignment}
- """
-
- best = None
-
- for ai in self.assignments:
- a = ai.fragment
- if a.length < FragmentCluster.MIN_LENGTH:
- continue
- if best is None or a.rmsd < best.rmsd:
- best = a
- elif a.rmsd == best.rmsd and a.length > best.length:
- best = a
-
- return best
def filter(self, method=Metrics.RMSD, threshold=1.5, extend=False):
"""
@@ -685,6 +630,25 @@ class TargetResidue(object):
except (ClusterExhaustedError, ClusterDivergingError):
return None
+
+ def closest(self):
+ """
+ @return: the fragment with the lowest RMSD at this position in the L{Target}
+ @rtype: L{Assignment}
+ """
+
+ best = None
+
+ for ai in self.assignments:
+ a = ai.fragment
+ if a.length < FragmentCluster.MIN_LENGTH:
+ continue
+ if best is None or a.rmsd < best.rmsd:
+ best = a
+ elif a.rmsd == best.rmsd and a.length > best.length:
+ best = a
+
+ return best
def longest(self):
"""
@@ -719,7 +683,7 @@ class TargetResidue(object):
class ChemShiftTargetResidue(TargetResidue):
- def verybest(self):
+ def closest(self):
best = None
@@ -735,230 +699,7 @@ class ChemShiftTargetResidue(TargetResidue):
best = a
return best
-
-class TargetSegment(object):
-
- def __init__(self, start, end, count):
-
- self._start = start
- self._end = end
- self._count = count
-
- self._assignments = csb.core.ReadOnlyCollectionContainer(type=Assignment)
-
- @property
- def count(self):
- return self._count
-
- @property
- def start(self):
- return self._start
-
- @property
- def end(self):
- return self._end
- @property
- def length(self):
- return (self._end - self._start + 1)
-
- @property
- def assignments(self):
- return self._assignments
-
- def assign(self, fragment):
- if fragment.segment != self.start:
- raise ValueError('Segment origin mismatch: {0} vs {1}'.format(fragment.segment, self.start))
- else:
- self._assignments._append_item(fragment)
-
- def verybest(self):
-
- best = None
-
- for a in self.assignments:
- if a.length < FragmentCluster.MIN_LENGTH:
- continue
- if best is None or a.rmsd < best.rmsd:
- best = a
- elif a.rmsd == best.rmsd and a.length > best.length:
- best = a
-
- return best
-
- def best(self, method=Metrics.RMSD):
-
- try:
- cluster = FragmentCluster(self.assignments, threshold=1.5,
- connectedness=0.5, method=method)
- centroid = cluster.shrink(minitems=1)
- return centroid
-
- except ClusterExhaustedError:
- return None
- finally:
- del cluster
-
- def longest(self):
-
- best = None
-
- for q in self.assignments:
- if best is None or (q.length > best.length):
- best = q
-
- return best
-
- def pairwise_rmsd(self, min_overlap=5):
-
- rmsds = []
-
- for q in self.assignments:
- for s in self.assignments:
- if q is not s:
- r = q.rmsd_to(s, min_overlap)
- if r is not None:
- rmsds.append(r)
- else:
- assert q.rmsd_to(s, 1) < 0.01
-
- return rmsds
-
- def pairwise_mda(self, min_overlap=5):
-
- mdas = []
-
- for q in self.assignments:
- for s in self.assignments:
- if q is not s:
- m = q.mda_to(s, min_overlap)
- if m is not None:
- mdas.append(m)
- return mdas
-
- def pairwise_sa_rmsd(self, profiles='.', min_overlap=5):
-
- from csb.bio.hmm import RELATIVE_SA
- from csb.bio.io.hhpred import ScoreUnits, HHProfileParser
-
- def convert_sa(sa):
- return numpy.array([ RELATIVE_SA[i] for i in sa ])
-
- sources = {}
- scores = []
-
- for q in self.assignments:
- for s in self.assignments:
-
- if s.source_id not in sources:
- hmm = HHProfileParser(os.path.join(profiles, s.source_id + '.hhm')).parse()
- sources[s.source_id] = hmm.dssp_solvent
-
- if q is not s:
-
- common = q.overlap(s)
- if len(common) >= min_overlap:
-
- qsa = q.solvent_at(sources[q.source_id], min(common), max(common))
- ssa = s.solvent_at(sources[s.source_id], min(common), max(common))
-
- if '-' in qsa + ssa:
- continue
-
- qsa = convert_sa(qsa)
- ssa = convert_sa(ssa)
- assert len(qsa) == len(ssa)
- sa_rmsd = numpy.sqrt(numpy.sum((qsa - ssa) ** 2) / float(len(qsa)))
-
- scores.append(sa_rmsd)
- return scores
-
- def pairwise_scores(self, profiles='.', min_overlap=5):
-
- from csb.bio.hmm import BACKGROUND
- back = numpy.sqrt(numpy.array(BACKGROUND))
-
- sources = {}
- scores = []
-
- for q in self.assignments:
- for s in self.assignments:
-
- if s.source_id not in sources:
- # hmm = HHProfileParser(os.path.join(hmm_path, s.source_id + '.hhm')).parse(ScoreUnits.Probability)
- sources[s.source_id] = csb.io.Pickle.load(open(os.path.join(profiles, s.source_id + '.pkl'), 'rb'))
-
- if q is not s:
-
- common = q.overlap(s)
- if len(common) >= min_overlap:
-
- qprof = q.profile_at(sources[q.source_id], min(common), max(common))
- sprof = s.profile_at(sources[s.source_id], min(common), max(common))
-
- #score = qhmm.emission_similarity(shmm)
- assert len(qprof) == len(sprof)
- dots = [ numpy.dot(qprof[i] / back, sprof[i] / back) for i in range(len(qprof)) ]
- score = numpy.log(numpy.prod(dots))
- if score is not None:
- scores.append(score)
- return scores
-
- def _entropy(self, data, binsize):
-
- binsize = float(binsize)
- bins = numpy.ceil(numpy.array(data) / binsize)
-
- hist = dict.fromkeys(bins, 0)
- for bin in bins:
- hist[bin] += (1.0 / len(bins))
-
- freq = numpy.array(hist.values())
- return - numpy.sum(freq * numpy.log(freq))
-
- def rmsd_entropy(self, binsize=0.1):
-
- rmsds = self.pairwise_rmsd()
- return self._entropy(rmsds, binsize)
-
- def score_entropy(self, profiles='.', binsize=1):
-
- scores = self.pairwise_scores(profiles)
- return self._entropy(scores, binsize)
-
- def rmsd_consistency(self, threshold=1.5):
-
- rmsds = self.pairwise_rmsd()
-
- if len(rmsds) < 1:
- return None
-
- return sum([1 for i in rmsds if i <= threshold]) / float(len(rmsds))
-
- def sa_rmsd_consistency(self, threshold=0.4, profiles='.'):
-
- sa_rmsds = self.pairwise_sa_rmsd(profiles=profiles)
-
- if len(sa_rmsds) < 1:
- return None
-
- return sum([1 for i in sa_rmsds if i <= threshold]) / float(len(sa_rmsds))
-
- def true_positives(self, threshold=1.5):
-
- if self.assignments.length < 1:
- return None
-
- return sum([1 for i in self.assignments if i.rmsd <= threshold]) / float(self.assignments.length)
-
- def confidence(self):
-
- cons = self.rmsd_consistency()
-
- if cons is None:
- return 0
- else:
- return numpy.log10(self.count) * cons
class ResidueAssignmentInfo(object):
@@ -1002,7 +743,7 @@ class Assignment(FragmentMatch):
"""
def __init__(self, source, start, end, qstart, qend, id=None, probability=None, rmsd=None,
- tm_score=None, score=None, neff=None, segment=None, internal_id=None):
+ tm_score=None, score=None, neff=None, internal_id=None):
assert source.has_torsion
sub = source.subregion(start, end, clone=True)
@@ -1024,7 +765,6 @@ class Assignment(FragmentMatch):
self._neff = neff
self._ss = None
- self._segment_start = segment
self.internal_id = internal_id
if id is None:
@@ -1088,11 +828,7 @@ class Assignment(FragmentMatch):
@property
def neff(self):
- return self._neff
-
- @property
- def segment(self):
- return self._segment_start
+ return self._neff
@property
def secondary_structure(self):
@@ -1109,7 +845,7 @@ class Assignment(FragmentMatch):
def transform(self, rotation, translation):
"""
- Apply rotation/translation to fragment's coordinates in place.
+ Apply rotation/translation to fragment's coordinates in place
"""
for ca in self.backbone:
@@ -1124,7 +860,7 @@ class Assignment(FragmentMatch):
def anchored_around(self, rank):
"""
- @return: True if the fragment is centered around position=C{rank}.
+ @return: True if the fragment is centered around position=C{rank}
@rtype: bool
"""
@@ -1149,10 +885,9 @@ class Assignment(FragmentMatch):
def torsion_at(self, qstart, qend):
"""
- @return: the torsion angles of the fragment at the specified subregion.
+ @return: torsion angles of the fragment at the specified subregion
@rtype: list
"""
-
self._check_range(qstart, qend)
relstart = qstart - self.qstart
@@ -1160,44 +895,18 @@ class Assignment(FragmentMatch):
return self.torsion[relstart : relend]
- def solvent_at(self, sa_string, qstart, qend):
-
- self._check_range(qstart, qend)
-
- relstart = qstart - self.qstart
- relend = qend - self.qstart + 1
-
- return sa_string[relstart : relend]
-
def sec_structure_at(self, qstart, qend):
-
+ """
+ @return: secondary structure of the fragment at the specified subregion
+ @rtype: list
+ """
+
self._check_range(qstart, qend)
start = qstart - self.qstart + 1
end = qend - self.qstart + 1
return self.secondary_structure.scan(start, end, loose=True, cut=True)
-
- def profile_at(self, source, qstart, qend):
-
- self._check_range(qstart, qend)
-
- start = qstart - self.qstart + self.start
- end = qend - self.qstart + self.start
-
- if hasattr(source, 'subregion'):
- return source.subregion(start, end)
- else:
- return source[start - 1 : end]
-
- def chain_at(self, source, qstart, qend):
-
- self._check_range(qstart, qend)
-
- start = qstart - self.qstart + self.start
- end = qend - self.qstart + self.start
-
- return source.subregion(start, end)
-
+
def overlap(self, other):
"""
@type other: L{Assignment}
@@ -1240,6 +949,17 @@ class Assignment(FragmentMatch):
return None
def nrmsd_to(self, other, min_overlap=5):
+ """
+ @return: the normalized CA RMSD between C{self} and C{other}.
+
+ @param other: another fragment
+ @type other: L{Assignment}
+ @param min_overlap: require at least that number of overlapping residues
+ (return None if not satisfied)
+ @type min_overlap: int
+
+ @rtype: float
+ """
common = self.overlap(other)
@@ -1256,7 +976,18 @@ class Assignment(FragmentMatch):
return None
def mda_to(self, other, min_overlap=5):
-
+ """
+ @return: the MDA (maximum deviation in torsion angles) between
+ C{self} and C{other}.
+
+ @param other: another fragment
+ @type other: L{Assignment}
+ @param min_overlap: require at least that number of overlapping residues
+ (return None if not satisfied)
+ @type min_overlap: int
+
+ @rtype: float
+ """
common = self.overlap(other)
if len(common) >= min_overlap:
@@ -1274,43 +1005,7 @@ class Assignment(FragmentMatch):
return max(maxphi, maxpsi)
return None
-
- def to_rosetta(self, source, qstart=None, qend=None, weight=None):
- """
- @deprecated: this method will be deleted soon. Use
- L{csb.bio.fragments.rosetta.OutputBuilder} instead.
- """
- stream = csb.io.MemoryStream()
-
- if weight is None:
- weight = self.probability
- if not qstart:
- qstart = self.qstart
- if not qend:
- qend = self.qend
-
- source.compute_torsion()
- chain = self.chain_at(source, qstart, qend)
-
- for i, r in enumerate(chain.residues):
-
- acc = self.source_id[:4]
- ch = self.source_id[4].upper()
-
- start = qstart - self.qstart + self.start + i
- aa = r.type
- ss = 'L'
- phi, psi, omega = 0, 0, 0
- if r.torsion.phi:
- phi = r.torsion.phi
- if r.torsion.psi:
- psi = r.torsion.psi
- if r.torsion.omega:
- omega = r.torsion.omega
-
- stream.write(' {0:4} {1:1} {2:>5} {3!s:1} {4!s:1} {5:>8.3f} {6:>8.3f} {7:>8.3f} {8:>8.3f}\n'.format(acc, ch, start, aa, ss, phi, psi, omega, weight))
-
- return stream.getvalue()
+
class ChemShiftAssignment(Assignment):
@@ -1322,7 +1017,7 @@ class ChemShiftAssignment(Assignment):
super(ChemShiftAssignment, self).__init__(
source, start, end, qstart, qend, id=None, probability=1.0,
- rmsd=rmsd, tm_score=None, score=score, neff=None, segment=None, internal_id=None)
+ rmsd=rmsd, tm_score=None, score=score, neff=None, internal_id=None)
@property
def window(self):
@@ -1717,126 +1412,6 @@ class ClusterRep(object):
centroid = self._centroid
self._centroid = self._alternative
self._alternative = centroid
-
- def to_rosetta(self, source):
- """
- @deprecated: this method is obsolete and will be deleted soon
- """
- return self.centroid.to_rosetta(source, weight=self.confidence)
-
-class AdaptedAssignment(object):
-
- @staticmethod
- def with_overhangs(center, start, end, overhang=1):
-
- if center.centroid.qstart <= (start - overhang):
- start -= overhang
- elif center.centroid.qstart < start:
- start = center.centroid.qstart
-
- if center.centroid.qend >= (end + overhang):
- end += overhang
- elif center.centroid.qend > end:
- end = center.centroid.end
-
- return AdaptedAssignment(center, start, end)
-
- def __init__(self, center, qstart, qend):
-
- if qstart < center.centroid.qstart:
- raise ValueError(qstart)
- if qend > center.centroid.qend:
- raise ValueError(qend)
-
- self._qstart = qstart
- self._qend = qend
- self._center = center
-
- @property
- def fragment(self):
- return self._center.centroid
-
- @property
- def center(self):
- return self._center
-
- @property
- def confidence(self):
- return self._center.confidence
-
- @property
- def qstart(self):
- return self._qstart
-
- @property
- def qend(self):
- return self._qend
-
- @property
- def backbone(self):
- return self.fragment.backbone_at(self.qstart, self.qend)
-
- def chain(self, source):
- return self.fragment.chain_at(source, self.qstart, self.qend)
-
- def to_rosetta(self, source):
- return self.fragment.to_rosetta(source, self.qstart, self.qend, self.confidence)
-
-class SmoothFragmentMap(csb.core.AbstractContainer):
-
- def __init__(self, length, centroids):
-
- if not length > 0:
- raise ValueError(length)
-
- self._length = int(length)
- self._slots = set(range(1, self._length + 1))
- self._map = {}
-
- centers = list(centroids)
- centers.sort(key=lambda i: i.confidence, reverse=True)
-
- for c in centers:
- self.assign(c)
-
- @property
- def _children(self):
- return self._map
-
- def assign(self, center):
-
- for r in range(center.centroid.qstart, center.centroid.qend + 1):
- if r in self._slots:
- self._map[r] = center
- self._slots.remove(r)
-
- def patches(self):
-
- center = None
- start = None
- end = None
-
- for r in range(1, self._length + 1):
-
- if center is None:
- if r in self._map:
- center = self._map[r]
- start = end = r
- else:
- center = None
- start = end = None
- else:
- if r in self._map:
- if self._map[r] is center:
- end = r
- else:
- yield AdaptedAssignment(center, start, end)
- center = self._map[r]
- start = end = r
- else:
- yield AdaptedAssignment(center, start, end)
- center = None
- start = end = None
class ResidueEventInfo(object):
@@ -2063,12 +1638,10 @@ class BenchmarkAdapter(object):
def __init__(self, pdb_paths, connection_string=None, factory=AssignmentFactory()):
- self._pdb = pdb_paths
self._connection = None
- from csb.bio.io.wwpdb import find, StructureParser
self._parser = StructureParser
- self._find = find
+ self._pdb = FileSystemStructureProvider(pdb_paths)
self._factory = factory
try:
@@ -2137,21 +1710,12 @@ class BenchmarkAdapter(object):
db.cursor.callproc('reporting."GetCentroids"', (benchmark_id,))
return db.cursor.fetchall()
- def target_segments(self, target_id):
-
- with BenchmarkAdapter.Connection() as db:
-
- db.cursor.callproc('reporting."GetTargetSegments"', (target_id,))
- data = db.cursor.fetchall()
-
- return [ TargetSegment(row['Start'], row['End'], row['Count']) for row in data ]
-
def structure(self, accession, chain=None):
- pdbfile = self._find(accession, self._pdb)
+ pdbfile = self._pdb.find(accession)
if not pdbfile and chain:
- pdbfile = self._find(accession + chain, self._pdb)
+ pdbfile = self._pdb.find(accession + chain)
if not pdbfile:
raise IOError('{0} not found here: {1}'.format(accession, self._pdb))
@@ -2170,8 +1734,7 @@ class BenchmarkAdapter(object):
overlap = float(row["MaxOverlap"]) / (length or 1.)
native = self.structure(id[:4], id[4]).chains[id[4]]
- segments = self.target_segments(target_id)
- target = self._factory.target(id, length, native.residues, overlap, segments)
+ target = self._factory.target(id, length, native.residues, overlap)
source = None
@@ -2206,7 +1769,6 @@ class BenchmarkAdapter(object):
neff=row['Neff'],
rmsd=row['RMSD'],
tm_score=row['TMScore'],
- segment=row['SegmentStart'],
internal_id=row['InternalID'])
target.assign(fragment)
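The fragments module above repeatedly reasons about the query-range overlap of two assignments (`Assignment.overlap`) and about RMSD-based consistency scores (as in the removed `TargetSegment.rmsd_consistency`). A toy sketch of both ideas, with illustrative names rather than the csb API:

```python
# Sketch of two recurring computations in csb.bio.fragments:
# (1) the set of query ranks covered by both of two fragments, and
# (2) the fraction of pairwise RMSDs below a consistency threshold.

def overlap(qstart1, qend1, qstart2, qend2):
    """Query ranks (1-based, inclusive) covered by both fragments."""
    return set(range(qstart1, qend1 + 1)) & set(range(qstart2, qend2 + 1))

def rmsd_consistency(rmsds, threshold=1.5):
    """Fraction of pairwise RMSDs at or below the threshold."""
    if not rmsds:
        return None
    return sum(1 for r in rmsds if r <= threshold) / float(len(rmsds))

print(sorted(overlap(5, 12, 10, 20)))     # [10, 11, 12]
print(rmsd_consistency([0.8, 1.2, 2.5]))  # 2 of 3 pairs consistent
```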
diff --git a/csb/bio/fragments/hhsites.py b/csb/bio/fragments/hhsites.py
deleted file mode 100644
index da034f5..0000000
--- a/csb/bio/fragments/hhsites.py
+++ /dev/null
@@ -1,152 +0,0 @@
-"""
-HMM fragments derived from HHpred HMM profiles.
-
- @deprecated: This module may be removed in the future.
-"""
-import re
-import csb.bio.fragments
-import csb.bio.hmm
-import csb.bio.structure as structure
-import csb.bio.fragments.isites as isites
-
-from csb.bio.hmm import ProfileHMMSegment, ProfileHMMRegion
-
-class HMMFragment(ProfileHMMSegment):
- """
- Describes a HMM segment which can be slid over a subject profile.
- See the documentation of the base class for the signature of the
- constructor.
- """
-
- def slide_over(self, other):
- """
- Find instances of self by sliding it over other and computing
- emission vector similarity and RMSD.
-
- @param other: the subject HMM
- @type other: L{ProfileHMM}
-
- @return: a list of L{isites.ProfileMatch}-es
- @rtype: list
- """
- return self._slide_over(self, other)
-
- def _slide_over(self, this, that):
-
- query_residues = this.chain()
- window = this.layers.length
- matches = []
-
- i = that.layers.start_index
- while True:
- if that.layers.last_index - i + 1 < window:
- break
-
- score, tm, tm_out, rmsd_ca = None, None, None, None
- start = i
- end = i + window - 1
-
- subject = ProfileHMMRegion(that, start, end)
- subject_residues = structure.Chain(that.id[4].upper(), residues=that.residues[start : start + window])
-
- score = this.emission_similarity(subject)
-
- try:
- rmsd_ca = query_residues.rmsd(subject_residues)
- except structure.Broken3DStructureError:
- rmsd_ca = None
-
- if score is not None and (rmsd_ca is not None):
- match = isites.ProfileMatch(that.id, start, score, rmsd_ca, None, tm, tm_out)
- matches.append(match)
-
- i += 1
-
- return matches
-
-class HMMFragmentWrapper(HMMFragment):
- """
- Describes a HMM segment which can be slid over a subject profile.
- Wraps an existing L{ProfileHMMSegment} to decorate it with a slide_over
- method.
-
- @param hmm_segment: the HMM segment to wrap as a fragment
- @type hmm_segment: L{ProfileHMMSegment}
- """
-
- def __init__(self, hmm_segment):
-
- self._segment = hmm_segment
-
- def slide_over(self, other):
-
- return self._slide_over(self._segment, other)
-
-class HMMLibraryWrapper(isites.Library):
- """
- Library wrapper for H-Sites fragments.
-
- @param fragments: a list of L{HMMFragment}s
- @type list:
- """
-
- def __init__(self, fragments=None):
-
- self._ondemand = False
-
- self._fragments = []
- self._byid = {}
- self._byrep = {}
- self.name = 'HSites'
-
- if fragments is not None:
- self.fragments = fragments
-
- @property
- def fragments(self):
- return self._fragments
- @fragments.setter
- def fragments(self, fragments):
- self._fragments = []
- self._byid = {}
- self._byrep = {}
- for fragment in fragments:
- self._fragments.append(fragment)
- self._byid[fragment.isite] = fragment
- self._byrep[fragment.id] = fragment
-
- @property
- def clusters(self):
- return self.fragments
- @clusters.setter
- def clusters(self, clusters):
- self.fragments = clusters
-
-class HMMFragmentMatch(csb.bio.fragments.FragmentMatch):
-
- def __init__(self, fragment, qstart, qend, probability, rmsd, tm_score, qlength):
-
- if not isinstance(fragment, csb.bio.hmm.ProfileHMMSegment):
- raise TypeError(fragment)
-
- source_info = re.split('[\-\.]', fragment.id)
- self._source = source_info[0]
- self._start = int(source_info[1]) + fragment.source_start - 1
- self._end = int(source_info[1]) + fragment.source_end - 1
-
- assert (self._end - self._start + 1) == fragment.layers.length == (qend - qstart + 1)
-
- super(HMMFragmentMatch, self).__init__(fragment.id, qstart, qend, probability,
- rmsd, tm_score, qlength)
-
- @property
- def source_id(self):
- return self._source
-
- @property
- def start(self):
- return self._start
-
- @property
- def end(self):
- return self._end
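The removed `HMMFragment.slide_over` above walks a fixed-length query window across a subject profile and scores every offset. A self-contained toy version of that sliding-window loop (with a dummy dot-product score standing in for emission similarity):

```python
# Toy sketch of the sliding-window matching in HMMFragment.slide_over:
# a query of length `window` is slid over a longer subject, producing
# one (start, score) pair per offset.

def slide_over(query, subject, score):
    """Score every full-length window of subject against query."""
    window = len(query)
    matches = []
    for start in range(len(subject) - window + 1):
        matches.append((start, score(query, subject[start:start + window])))
    return matches

def dot_score(a, b):
    """Dummy similarity: elementwise product sum."""
    return sum(x * y for x, y in zip(a, b))

hits = slide_over([1, 2], [0, 1, 2, 1], dot_score)
print(hits)  # [(0, 2), (1, 5), (2, 4)]
```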
diff --git a/csb/bio/fragments/isites.py b/csb/bio/fragments/isites.py
deleted file mode 100644
index 7875e46..0000000
--- a/csb/bio/fragments/isites.py
+++ /dev/null
@@ -1,1122 +0,0 @@
-"""
-I-Sites fragment library APIs.
-
- @warning: This module is no longer developed and is provided as is.
-"""
-
-import sys
-import collections
-import operator
-import math
-
-import csb.io
-import csb.core
-import csb.bio.fragments
-import csb.bio.structure as structure
-import csb.bio.sequence
-import csb.bio.hmm as hmm
-
-
-class FragmentMatching(csb.core.enum):
- """
- Enumeration of fragment assignment methods
- """
- Sequence=0; Bystroff=1; CrossEntropy=2; Soeding=3; Baker=4
-
-class InternalISitesError(ValueError):
- pass
-class InvalidParadigmError(InternalISitesError):
- pass
-
-class ChainSequence(object):
- """
- Represents a single PDB chain sequence entry with torsion angles
- """
-
- def __init__(self):
- self.id = None
- self.accession = None
- self.chain = None
- self.type = None
- self.header = None
- self.sequence = None
- self.torsion = structure.TorsionAnglesCollection()
-
- @property
- def has_torsion(self):
- if hasattr(self, 'torsion'):
- return len(self.torsion) > 0
- else:
- return False
-
- @property
- def entry_id(self):
- id = self.accession
- if self.id is not None:
- id += self.id
- return id
-
- @property
- def length(self):
- if self.sequence is None:
- return 0
- return len(self.sequence)
-
-class ProteinProfile(object):
- """
- Describes an I-Sites protein profile/PSSM.
-
- @param background: background amino acid frequencies
- @type background: dict
- @param matrix: initialization array with target frequencies
- @type matrix: dict
- @param alpha: fixed "pseudocount" number for this cluster/profile
- @type alpha: float
- """
-
- BackgroundFreqs = {
- 'A': 0.077049999999999993, 'C': 0.016140000000000002, 'D': 0.058470000000000001,
- 'E': 0.065640000000000004, 'F': 0.041090000000000002, 'G': 0.072179999999999994,
- 'H': 0.023630000000000002, 'I': 0.058400000000000001, 'L': 0.089560000000000001,
- 'K': 0.061460000000000001, 'M': 0.021680000000000001, 'N': 0.045400000000000003,
- 'P': 0.045179999999999998, 'Q': 0.037080000000000002, 'R': 0.049790000000000001,
- 'S': 0.061870000000000001, 'T': 0.055660000000000001, 'V': 0.069790000000000005,
- 'W': 0.014319999999999999, 'Y': 0.035610000000000003,
-
- 'B': 0.051935000000000002, 'Z': 0.051360000000000003, 'X': 1.000000000000000000,
-
- 'O': sys.float_info.min, 'U': sys.float_info.min
- }
-
- def __init__(self, background=BackgroundFreqs, matrix=None, alpha=0.2):
-
- self._matrix = [ ]
- self._pssm = [ ]
- self._maxscore = 0
- self.background = background
- self.alpha = float(alpha)
-
- if matrix:
- for col in matrix:
- self.add_column(**col)
-
-
- def add_column(self, **target_frequencies):
- """
- Append a new column to the profile.
-
- Keyword arguments are used to pass the individual amino acid target frequencies,
- e.g. A=0.12, B=0.0, etc. If an amino acid is omitted, its probability will be
- set automatically to 0.
-
- @param target_frequencies: amino acid frequencies in that column
- @type target_frequencies: dict
-
- @return: the index of the new column
- @rtype: int
-
- @raise KeyError: if the target frequencies dict contains an unknown amino acid
- with respect to C{profile.background}
- """
- if not set(target_frequencies.keys()).issubset(self.background):
- raise KeyError('Unrecognized residue name(s)')
-
- profcol = { }
- pssmcol = { }
-
- for aa in self.background:
-
- assert self.background[aa] > 0
-
- if aa in target_frequencies:
- assert target_frequencies[aa] >= 0
- profcol[aa] = target_frequencies[aa]
- else:
- profcol[aa] = 0.0
-
- Qa = self.background[aa]
- Pja = profcol[aa]
- alpha = self.alpha
-
- pssmcol[aa] = math.log( (Pja + alpha * Qa) / (Qa * (1 + alpha)) )
-
- self._matrix.append(profcol)
- self._pssm.append(pssmcol)
- self._maxscore += max(pssmcol.values())
-
- return len(self._matrix) - 1
-
- @property
- def length(self):
- return len(self._matrix)
-
- @property
- def matrix(self):
- return self._matrix
-
- @property
- def pssm(self):
- return self._pssm
-
- @property
- def max_score(self):
- return self._maxscore
-
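The log-odds transform in `ProteinProfile.add_column` above smooths each target frequency Pja with alpha "pseudocounts" of the background Qa before taking the log ratio. A one-function sketch of that formula (standalone, not the csb API):

```python
import math

# Sketch of the per-residue PSSM score computed in add_column:
#   score = log((Pja + alpha * Qa) / (Qa * (1 + alpha)))

def pssm_score(p_target, q_background, alpha=0.2):
    """Pseudocount-smoothed log-odds of target vs. background frequency."""
    return math.log((p_target + alpha * q_background)
                    / (q_background * (1 + alpha)))

# A residue observed at exactly its background rate scores 0 ...
print(round(pssm_score(0.05, 0.05), 6))  # 0.0
# ... enrichment scores positive, depletion negative.
print(pssm_score(0.30, 0.05) > 0)        # True
print(pssm_score(0.00, 0.05) < 0)        # True
```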
-class ISitesFragmentMatch(csb.bio.fragments.FragmentMatch):
-
- def __init__(self, fragment, qstart, qend, probability, rmsd, tm_score, qlength):
-
- self._source = fragment.representative.accession + fragment.representative.chain
- self._start = fragment.representative.structure.residues[1].rank
- self._end = fragment.representative.structure.residues[-1].rank
- assert (self._end - self._start + 1) == fragment.length == (qend - qstart + 1)
-
- super(ISitesFragmentMatch, self).__init__(fragment.id, qstart, qend, probability,
- rmsd, tm_score, qlength)
-
- @property
- def source_id(self):
- return self._source
-
- @property
- def start(self):
- return self._start
-
- @property
- def end(self):
- return self._end
-
-class ProfileMatch(object):
- """
- Describes a profile-query match
-
- @param id: id of the match
- @type id: str
- @param start: start position of the match
- @type start: int
- @param score: score
- @type score: float
- @param rmsd: RMSD between the query and the representative structure of the
- fragment within the matching region
- @type rmsd: float
- @param rmsd_dih: circular (torsion) RMSD between the query and the representative
- structure of the fragment within the matching region
- @type rmsd_dih: float
- @param tm_score: TM-Score between the query and the representative
- structure of the fragment within the matching region
- @type tm_score: float
- @param tm_scoreout: TM-Score, but calculated with local structural alignment
- @type tm_scoreout: float
- """
-
- def __init__(self, id, start, score, rmsd=None, rmsd_dih=None, tm_score=None, tm_scoreout=None):
- self.id = id
- self.start = start
- self.score = score
- self.rmsd = rmsd
- self.rmsd_dih = rmsd_dih
- self.tm_score = tm_score
- self.tm_scoreout = tm_scoreout
-
- def __repr__(self):
- return "<ProfileMatch: {0.id} at {0.start}, s={0.score}, rms={0.rmsd}, tm={0.tm_score},{0.tm_scoreout} >".format(self)
-
- def __lt__(self, other):
- return self.score < other.score
-
- def confidence(self, b0, b1, b2, b3=1.0):
- """
- Convert the raw profile score to probability.
-
- @param b0: C{cluster.linearfit[0]}
- @type b0: float
- @param b1: C{cluster.linearfit[1]}
- @type b1: float
- @param b2: C{cluster.linearfit[2]}
- @type b2: float
- @param b3: C{cluster.linearfit[3]}
- @type b3: float
-
- @return: probability for the hit being a true positive
- @rtype: float
- """
-
- if self.score is None:
- return 0
- return ((1 / math.pi) * math.atan(b0 + b1 * self.score) + b2) * b3
-
-class ProfileMatches(object):
- """
- A collection of profile-database hits.
-
- @param size: target maximum size of the hitlist
- @type size: int
- @param cutoff: inclusion score cutoff
- @type cutoff: float
- @param rmsd_cutoff: inclusion RMSD cutoff
- @type rmsd_cutoff: float
-
- @raise ValueError: on invalid hitlist C{size}
- """
-
- def __init__(self, size=20, cutoff=0.5, rmsd_cutoff=0.5):
- if not size > 0:
- raise ValueError('The maximum size of the matches table must be a positive number.')
-
- self._matches = [ ]
- self.list_size = int(size)
- self.cutoff = float(cutoff)
- self.rmsd_cutoff = float(rmsd_cutoff)
-
- self._sorted = True
-
- def _sort(self):
- if not self._sorted:
- self._matches.sort(key=operator.attrgetter('score'), reverse=True)
- self._sorted = True
-
- def add(self, id, start, score, rmsd=None):
- """
- Add a new match to the collection of hits if its score satisfies the cutoff.
-
- @param id: id of the match
- @type id: str
- @param start: start position of the match
- @type start: int
- @param score: score
- @type score: float
- @param rmsd: RMSD between the query and the representative structure of the
- fragment within the matching region
- @type rmsd: float
-
- @raise ValueError: on invalid start position
- """
- if not start > 0:
- raise ValueError('Match start position must be greater than zero.')
- score = float(score)
- start = int(start)
-
- self._sorted = False
-
- if len(self._matches) < self.list_size:
- if score >= self.cutoff: #or rmsd <= self.rmsd_cutoff:
- match = ProfileMatch(id, start, score, rmsd)
- self._matches.append(match)
- # sort is useless if not all slots are full, leave it for the first time the data is accessed through self.hits
- else:
- self._sort() # make sure the weakest hit is at the end before comparing
- worst_match = self._matches[-1]
- if score > worst_match.score: #or rmsd < worst_match.rmsd:
- self._matches.pop()
- match = ProfileMatch(id, start, score, rmsd)
- self._matches.append(match)
- self._sort() # keep the list sorted while all hit slots are occupied
-
- @property
- def length(self):
- return len(self._matches)
-
- @property
- def hits(self):
- if self.length > 0:
- self._sort()
- return self._matches
- else:
- return None
-
- @property
- def best_match(self):
- if self.length > 0:
- return self.hits[0]
- else:
- return None
-
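ProfileMatches keeps a bounded hitlist by sorting and popping the weakest entry. The same top-k bookkeeping can be done in O(log k) per insertion with a min-heap; a minimal sketch (the `TopK` class is a hypothetical illustration, not part of csb):

```python
import heapq

class TopK:
    """Keep the k highest-scoring items using a min-heap (weakest at the root)."""

    def __init__(self, k):
        self.k = k
        self._heap = []  # list of (score, item) pairs

    def add(self, score, item):
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (score, item))
        elif score > self._heap[0][0]:
            # the new score beats the current weakest hit: replace it
            heapq.heapreplace(self._heap, (score, item))

    def hits(self):
        # highest score first, like ProfileMatches.hits
        return [item for _, item in sorted(self._heap, key=lambda p: p[0], reverse=True)]
```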
-class Hitlist(object):
- """
- I-Site instance hitlist.
-
- @param query: the query fragment
- @type query: L{Cluster}
- @param hits: a list of L{ProfileMatch}es
- @type hits: list
- """
-
- def __init__(self, query, hits):
-
- self.query = query
- self.hits = hits
-
- def __getitem__(self, i):
- return self.hits[i]
-
- def __iter__(self):
- return iter(self.hits)
-
- def best_hit(self):
- """
- @return: the best hit in the list in terms of profile score
- @rtype: L{ProfileMatch}
- """
- best = None
- for hit in self.hits:
- if best is None:
- best = hit
- else:
- if best.score < hit.score or (best.score == hit.score and best.rmsd > hit.rmsd):
- best = hit
- return best
-
- def top(self, n):
- """
- @param n: number of top hits
- @type n: int
-
- @return: top C{n} hits in terms of profile score
- @rtype: list of L{ProfileMatch}
- """
-
- hits = list(self.hits)
- hits.sort()
- return hits[-n:]
-
-class Library(object):
- """
- Representation of an I-Sites peptide fragment library.
- Provides dictionary-like access to the underlying L{Cluster} objects by
- either C{cluster.id} or C{cluster.representative.accession}.
- """
-
- def __init__(self):
-
- self.name = 'ISITES'
- self.version = None
- self.centroids = None
- self.documentation = None
-
- self._clusters = None
- self._ondemand = False
- self._byid = None
- self._byrep = None
-
- self.__iscfiles = []
-
- def __getitem__(self, item):
- if self._ondemand:
- raise AttributeError("ID/rep-based access to clusters is not available in delay-parsed libraries.")
-
- if isinstance(item, csb.core.string):
- return self._byrep[item]
- else:
- return self._byid[item]
-
- def __contains__(self, key):
- try:
- self[key]
- return True
- except KeyError:
- return False
-
- @property
- def clusters(self):
- return self._clusters
-
- @clusters.setter
- def clusters(self, clusters):
- if clusters is not None:
- if not isinstance(clusters, collections.Iterable):
- clusters = [clusters]
-# if not isinstance(clusters, collections.Iterator):
-# for c in clusters:
-# assert isinstance(c, Cluster), "The I-Sites Library is constructed of Cluster objects."
-
- self._clusters = clusters
- if isinstance(clusters, collections.Iterator):
- self._ondemand = True
- else:
- self._ondemand = False
- self._byid = { }
- self._byrep = { }
-
- for c in clusters:
- self._byid[c.id] = c
-
- accession = c.representative.accession[:4]
-
- if accession not in self._byrep:
- self._byrep[accession] = [ ]
- self._byrep[accession].append(c)
-
- def compact(self):
- """
- Compact the library by excluding unimportant information:
-
- - documentation
- - clusters: file, program, covariance
-
- @raise AttributeError: on attempt to compact a delay-parsed library
- """
- if self._ondemand:
- raise AttributeError("Delay-parsed libraries cannot be compacted.")
-
- self.documentation = None
- for c in self.clusters:
- c.file = None
- c.program = None
- c.covariance = None
-
- def serialize(self, file):
- """
- Serialize the library to a file.
-
- @param file: output file name
- @type file: str
-
- @raise AttributeError: on attempt to serialize a delay-parsed library
- """
- if self._ondemand:
- raise AttributeError("Delay-parsed libraries cannot be serialized.")
-
- with open(file, 'wb') as output:
- csb.io.Pickle.dump(self, output, protocol=csb.io.Pickle.HIGHEST_PROTOCOL)
-
- @staticmethod
- def deserialize(file):
- """
- Load I-sites from a serialization file.
-
- @param file: source file name
- @type file: str
-
- @return: the deserialized library object
- @rtype: L{Library}
- """
-
- with open(file, 'rb') as source:
- library = csb.io.Pickle.load(source)
-
- return library
-
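Library.serialize/deserialize are a thin wrapper around pickle with the highest available protocol. The round-trip, sketched with the plain pickle module and a temporary file (hypothetical `roundtrip` helper, shown only to illustrate the mechanism):

```python
import os
import pickle
import tempfile

def roundtrip(obj):
    """Dump an object to disk and load it back, as Library.serialize/deserialize do."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, 'wb') as out:
            pickle.dump(obj, out, protocol=pickle.HIGHEST_PROTOCOL)
        with open(path, 'rb') as source:
            return pickle.load(source)
    finally:
        os.remove(path)
```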
-class Cluster(object):
- """
- Object representation of an I-Sites library peptide fragment.
- """
-
- def __init__(self):
-
- self.id = None
- self.motiflen = None
- self.profilelen = None
- self.overhang = None
- self.file = None
- self.program = None
- self.mda = None
- self.dme = None
- self.pseudocount = 0.2
- self.linearfit = None
- self.representative = None
- self.covarweight = None
- self.profile = ProteinProfile(ProteinProfile.BackgroundFreqs, alpha=self.pseudocount)
- self.covariance = None
-
- def __lt__(self, other):
- return self.id < other.id
-
- @property
- def isite(self):
- return self.id
-
- @property
- def length(self):
- return self.profile.length
-
- @property
- def has_structure(self):
- return bool(self.representative.structure)
-
- def serialize(self, file):
- """
- Serialize the cluster to a file.
-
- @param file: output file name
- @type file: str
- """
-
- with open(file, 'wb') as output:
- csb.io.Pickle.dump(self, output, protocol=csb.io.Pickle.HIGHEST_PROTOCOL)
-
- @staticmethod
- def deserialize(file):
- """
- Load a cluster from a serialized object.
-
- @param file: source file name
- @type file: str
-
- @return: deserialized fragment object
- @rtype: L{Cluster}
- """
-
- with open(file, 'rb') as source:
- cluster = csb.io.Pickle.load(source)
-
- return cluster
-
- def attach_structure(self, pdb_file):
- """
- Parse the paradigm's pdb_file and attach structural data to
- C{cluster.representative} (C{source} and C{structure} attributes).
-
- @param pdb_file: source PDB file containing the structure of the paradigm
- @type pdb_file: str
-
- @raise InternalISitesError: on parse and/or structure assignment errors
- @raise InvalidParadigmError: when the torsion angles in the cluster do not
- match the angles computed from the PDB file
- """
- from csb.bio.io import StructureParser
- rep = self.representative
- s = StructureParser(pdb_file).parse_structure()
-
- if not rep.chain:
- if s.chains.length > 1:
- raise InternalISitesError('Cluster {0} does not define any chain, but multiple choices exist.'.format(self.id))
- else:
- rep.chain = s.chains.keys()[0]
- if rep.chain not in s.chains:
- raise InternalISitesError('Structure {0} does not contain chain {1}'.format(rep.accession, rep.chain))
-
- s.chains[rep.chain].compute_torsion()
- start_residue = s.chains[rep.chain].find(rep.start)
-
- current_rank = start_residue.rank
- for i, rep_angles in enumerate(rep.angles, start=rep.angles.start_index):
-
- current_residue = s.chains[rep.chain].residues[current_rank]
- torsion_delta = 0
-
- if current_residue.torsion.phi:
- torsion_delta += abs(current_residue.torsion.phi - rep_angles.phi)
- if current_residue.torsion.psi:
- torsion_delta += abs(current_residue.torsion.psi - rep_angles.psi)
-
- assert (i + rep.start - 1) == int(current_residue.id)
-
- if torsion_delta > 1:
- raise InvalidParadigmError('delta_phi + delta_psi for '
- '{3.accession} {3.chain}[{4}] is {0}: phi: {1.phi} vs {2.phi}, psi: {1.psi} vs {2.psi}'.format(
- torsion_delta, current_residue.torsion, rep_angles, rep, current_rank))
- current_rank += 1
-
- start = start_residue.rank - self.overhang
- stop = start + self.profile.length - 1
-
- if start < s.chains[rep.chain].residues.start_index or stop > s.chains[rep.chain].residues.last_index:
- raise InternalISitesError('With the overhangs included, fragment '
- '{0} exceeds the boundaries of the chain: {1}..{2} vs {3.start_index}..{3.last_index}'.format(
- self.id, start, stop, s.chains[rep.chain].residues))
-
- rep.structure = s.chains[rep.chain].subregion(start, stop)
-
- assert rep.structure.length == self.profile.length
-
- backbone = set(['N', 'CA', 'C'])
- for residue in rep.structure.residues:
- if not (residue.has_structure and set(residue.atoms).issuperset(backbone)):
- rep.structure = None
- raise InternalISitesError('Fragment {0} is built on discontinuous structure: {1} {4} at {2}-{3}'.format(
- self.id, rep.accession, start, stop, residue))
-
- rep.source = ChainSequence()
- rep.source.sequence = s.chains[rep.chain].sequence
- rep.source.accession = s.accession
- rep.source.id = rep.chain
- rep.source.type = s.chains[rep.chain].type
- rep.source.torsion = s.chains[rep.chain].torsion
-
- def sequence_similarity(self, sequence, start=0):
- """
- Compute the log-odds score of C{sequence} matching C{self}'s profile.
-
- @param sequence: subject sequence
- @type sequence: str
- @param start: do the comparison starting with this offset in the sequence
- @type start: int
-
- @return: log-odds score
- @rtype: float
- """
- score = 0
- window = self.profile.length
-
- sequence = sequence[start : start + window]
- assert len(sequence) == self.profile.length
-
- for column, aa in enumerate(sequence):
- aa = aa.upper()
- if aa in self.profile.pssm[column]:
- score += self.profile.pssm[column][aa]
- else:
- raise ValueError('Unknown residue {0}'.format(aa))
-
- return score
-
- def profile_similarity(self, profile, start=0):
- """
- Compute the similarity score of C{profile} matching C{self}'s profile (see
- Bystroff 2001, 2008).
-
- @param profile: subject profile, built with alpha=0.5
- @type profile: L{ProteinProfile}
- @param start: do the comparison starting with this offset in the subject
- @type start: int
-
- @return: similarity score calculated with Bystroff's metric
- @rtype: float
-
- @raise ValueError: on mismatch in the amino acid content at a given column
- in both profiles; or when the subject has not been created
- with alpha=0.5
- """
- if profile.alpha != 0.5:
- raise ValueError('The target profile must be built with alpha=0.5')
-
- score = 0
- window = self.profile.length
-
- for column in range(0, window):
-
- if set(self.profile.pssm[column]) != set(profile.pssm[column + start]):
- raise ValueError('Both profiles must contain identical amino acids at a given column')
-
- for aa in self.profile.pssm[column]:
- score += self.profile.pssm[column][aa] * profile.pssm[column + start][aa]
-
- return score
-
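The Bystroff metric above is a column-wise dot product of two log-odds profiles. A self-contained sketch over plain dict-based PSSMs (the list-of-dicts layout is an assumption for illustration; the real `ProteinProfile.pssm` may differ):

```python
def profile_similarity(query_pssm, subject_pssm, start=0):
    """Sum over columns of the dot product between two log-odds profiles.

    Each profile is a list of {amino_acid: log_odds} dicts; the subject is
    read over a window of len(query_pssm) columns beginning at `start`.
    """
    score = 0.0
    for column, q_col in enumerate(query_pssm):
        s_col = subject_pssm[column + start]
        if set(q_col) != set(s_col):
            raise ValueError('Both profiles must contain identical amino acids at a given column')
        score += sum(q_col[aa] * s_col[aa] for aa in q_col)
    return score
```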
- def cross_entropy(self, profile, start=0):
- """
- Compute the similarity score of C{profile} matching C{self}'s profile using
- the cross entropy method.
-
- @param profile: subject profile
- @type profile: L{ProteinProfile}
- @param start: do the comparison starting with this offset in the subject
- @type start: int
-
- @return: the cross entropy between the profiles
- @rtype: float
- """
- score = 0
- window = self.profile.length
-
- for column in range(0, window):
- for aa in self.profile.pssm[column]:
- score += self.profile.pssm[column][aa] * math.exp( profile.pssm[column + start][aa] )
-
- return score
-
- def sum_of_odds(self, profile, start=0):
- """
- Compute the similarity score of C{profile} matching C{self}'s profile using
- the log-sum-of-odds method (Soeding).
-
- @param profile: subject profile
- @type profile: L{ProteinProfile}
- @param start: do the comparison starting with this offset in the subject
- @type start: int
-
- @return: the log-sum-of-odds similarity between the profiles
- @rtype: float
- """
- score = 1
- window = self.profile.length
-
- for column in range(0, window):
-
- dot_product = 0
-
- for aa in self.profile.pssm[column]:
- q_emission = self.profile.matrix[column][aa]
- s_emission = profile.matrix[column + start][aa]
- dot_product += ((q_emission * s_emission) / self.profile.background[aa])
-
- score *= dot_product
-
- return math.log(score or sys.float_info.min)
-
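The `math.log(score or sys.float_info.min)` idiom at the end of sum_of_odds guards against log(0) when every column's dot product underflows: `or` substitutes the smallest positive normalized float, so the method returns a large negative score instead of raising. Isolated as a helper (hypothetical name):

```python
import math
import sys

def safe_log(x):
    """log(x), falling back to the smallest positive float when x == 0."""
    return math.log(x or sys.float_info.min)
```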
- def city_block(self, profile, start=0):
- """
- Compute the similarity score of C{profile} matching C{self}'s profile using
- the exponential city block metric (see Baker 1995).
-
- @param profile: subject profile
- @type profile: L{ProteinProfile}
- @param start: do the comparison starting with this offset in the subject
- @type start: int
-
- @return: the city block similarity between the profiles
- @rtype: float
- """
-
- score = 0
- window = self.profile.length
-
- for column in range(0, window):
- for aa in self.profile.pssm[column]:
- diff = math.exp( -self.profile.matrix[column][aa] ) - math.exp( -profile.matrix[column + start][aa] )
- score += abs(diff)
-
- return score
-
- def fragment_comparer(self, mode=FragmentMatching.Bystroff):
- """
- Return a callable fragment comparer that implements the specified
- comparison mode.
-
- @param mode: specifies the desired comparison method to be implemented
- by the callable - a member of the L{FragmentMatching} enum
- @type mode: L{csb.core.EnumItem}
-
- @return: the proper comparison function which implements fragment comparison
- with the C{mode} method
- @rtype: function
-
- @raise ValueError: on invalid C{mode} specified
- """
- if mode == FragmentMatching.Bystroff:
- return self.profile_similarity
- elif mode == FragmentMatching.CrossEntropy:
- return self.cross_entropy
- elif mode == FragmentMatching.Soeding:
- return self.sum_of_odds
- elif mode == FragmentMatching.Sequence:
- return self.sequence_similarity
- elif mode == FragmentMatching.Baker:
- return self.city_block
- else:
- raise ValueError(mode)
-
- def slide_over(self, target, mode=FragmentMatching.Bystroff):
- """
- Find profile and structural matches by sliding the fragment over a specified
- C{target} chain.
-
- @param target: subject chain to be scanned for fragment instances
- @type target: L{structure.Chain}
- @param mode: fragment instance searching mode - a L{FragmentMatching} member
- @type mode: L{csb.core.EnumItem}
-
- @return: a list of L{ProfileMatch}es
- @rtype: list
- """
-
- compare = self.fragment_comparer(mode)
-
- if hasattr(target, 'entry_id'):
- entry_id = target.entry_id
- else:
- entry_id = target.id
-
- window = self.profile.length
- rmsd_window = window - 2 * self.overhang
- matches = [ ]
- query_residues = structure.Chain(self.representative.chain,
- residues=self.representative.structure.residues[self.overhang + 1: -self.overhang])
- target_sequence = target.sequence # caching these values for efficiency
-
- for i in range(0, len(target_sequence) - window + 1):
-
- start = i + 1
- rmsd_start = start + self.overhang
- score, tm, tm_out, rmsd_ca, rmsd_dih = 0, None, None, None, None
-
- if mode == FragmentMatching.Sequence:
- score = compare(target_sequence, start=i)
- else:
- score = compare(target.profile, start=i)
-
- subject_residues = structure.Chain(target.id, residues=target.residues[rmsd_start : rmsd_start + rmsd_window])
-
- try:
- rmsd_ca = query_residues.rmsd(subject_residues)
- except structure.Broken3DStructureError:
- rmsd_ca = None
-
- if score is not None and (rmsd_ca is not None or rmsd_dih is not None):
- match = ProfileMatch(entry_id, start, score, rmsd_ca, rmsd_dih, tm, tm_out)
- matches.append(match)
-
- return matches
-
- def slide_over_fast(self, target, mode=FragmentMatching.Bystroff):
- """
- Find profile matches only (do not compute RMSD) by sliding the fragment
- over a specified C{target} chain.
-
- @param target: subject chain to be scanned for fragment instances
- @type target: L{structure.Chain}
- @param mode: fragment instance searching mode - a L{FragmentMatching} member
- @type mode: L{csb.core.EnumItem}
-
- @return: a list of L{ProfileMatch}es
- @rtype: list
- """
- if hasattr(target, 'entry_id'):
- entry_id = target.entry_id
- else:
- entry_id = target.id
-
- compare = self.fragment_comparer(mode)
-
- window = self.profile.length
- matches = [ ]
-
- for i in range(0, target.profile.length - window + 1):
-
- start = i + 1
- score = None
-
- if mode == FragmentMatching.Sequence:
- score = compare(target.sequence, start=i)
- else:
- score = compare(target.profile, start=i)
-
- if score is not None:
- match = ProfileMatch(entry_id, start, score)
- matches.append(match)
-
- return matches
-
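Both slide_over variants share one scan pattern: place a window of profile length at every offset of the target and record a 1-based start plus a score. Stripped to its skeleton (hypothetical `slide_scores` helper, not csb API):

```python
def slide_scores(compare, query_len, target_len):
    """Yield (start, score) for each placement of the query window on the target.

    `compare(offset)` scores the query at a 0-based `offset`; starts are
    reported 1-based, mirroring slide_over_fast.
    """
    for i in range(target_len - query_len + 1):
        yield i + 1, compare(i)
```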
- def to_hmm(self):
- """
- Convert this fragment into a profile HMM segment. Each column in the
- fragment's profile becomes a Match state with equivalent emissions.
- However, all Match states are connected with a fixed transition
- probability of 100%.
-
- @return: an HMM wrapper around the cluster's profile
- @rtype: L{hmm.ProfileHMM}
- """
-
- factory = hmm.StateFactory()
-
- p = hmm.ProfileHMM(hmm.ScoreUnits.Probability)
- p.family = 'is' + str(self.id)
- p.name = 'is' + str(self.id)
- p.id = str(self.id)
- p.pseudocounts = False
- p.version = '1.6'
-
- background = dict([ (aa, self.profile.background[aa])
- for aa in 'ACDEFGHIKLMNPQRSTVWY' ])
-
- for i, row in enumerate(self.profile.matrix, start=1):
-
- row = dict([ (aa, row[aa]) for aa in 'ACDEFGHIKLMNPQRSTVWY' ])
- residue = self.representative.structure.residues[i]
-
- match = factory.create_match(row, background)
- match.rank = i
-
- layer = hmm.HMMLayer(i, residue)
- layer.append(match)
-
- p.layers.append(layer)
-
- for layer in p.layers:
-
- match = layer[hmm.States.Match]
-
- if layer.rank < p.layers.last_index:
- next_match = p.layers[layer.rank + 1][hmm.States.Match]
- tran = hmm.Transition(match, next_match, 1.0)
- match.transitions.append(tran)
-
- last_tran = hmm.Transition(p.layers[-1][hmm.States.Match], p.end, 1.0)
- p.layers[-1][hmm.States.Match].transitions.append(last_tran)
-
- start_tran = hmm.Transition(p.start, p.layers[1][hmm.States.Match], 1.0)
- p.start.transitions.append(start_tran)
-
- p.length.matches = p.layers.length
- p.length.layers = p.layers.length
- p.effective_matches = 10
-
- rep_sequence = ''.join([ str(l.residue.type) for l in p.layers ])
- seq = csb.bio.sequence.Sequence('dummy', '>dummy', rep_sequence,
- type=csb.bio.sequence.SequenceTypes.Protein)
- p.alignment = csb.bio.sequence.A3MAlignment([seq])
-
- return p
-
- @staticmethod
- def from_hmm(source, id, start=None, end=None, rep_accession=None,
- rep_chain=None, alpha=0.2):
- """
- Build a fragment from a Profile HMM.
-
- @param source: the HMM to convert
- @type source: L{hmm.ProfileHMM}, L{hmm.ProfileHMMSegment}
- @param id: ID of the new fragment
- @type id: int
- @param start: start position (optional)
- @type start: int
- @param end: end position (optional)
- @type end: int
- @param rep_accession: accession number for I{cluster.representative}
- @type rep_accession: str
- @param rep_chain: chain ID for I{cluster.representative}
- @type rep_chain: str
- @param alpha: alpha pseudocount for I{cluster.profile.pssm}
- @type alpha: float
-
- @return: an I-Sites cluster wrapper
- @rtype: L{Cluster}
-
- @raise TypeError: when the HMM's structural data is missing or
- interrupted
- """
-
- if not source.has_structure:
- raise TypeError('This HMM has no structural data assigned and '
- 'cannot be converted.')
-
- if start is None:
- start = source.layers.start_index
- if end is None:
- end = source.layers.last_index
-
- score_units = source.score_units
- source.convert_scores(hmm.ScoreUnits.Probability)
-
- background = dict(ProteinProfile.BackgroundFreqs)
- for aa in source.layers[1][hmm.States.Match].emission:
- background[str(aa)] = source.layers[1][hmm.States.Match].background[aa]
-
- cluster = Cluster()
- cluster.id = id
- cluster.overhang = 0
- cluster.profile = ProteinProfile(background, alpha=alpha)
-
- residues = []
- for layer in source.layers:
-
- residues.append(csb.core.deepcopy(layer.residue))
-
- if start <= layer.rank <= end:
- if not layer.residue.has_structure:
- raise TypeError(
- "The HMM's structure is interrupted at layer {0}.".format(
- layer.rank))
-
- emission = {}
-
- for aa in layer[hmm.States.Match].emission:
- emission[str(aa)] = layer[hmm.States.Match].emission[aa]
-
- cluster.profile.add_column(**dict(emission))
-
- cluster.profilelen = cluster.profile.length
- cluster.motiflen = cluster.profile.length - 2 * cluster.overhang
-
- if rep_accession is None:
- rep_accession = source.id[1:5].lower()
- if rep_chain is None:
- rep_chain = source.id[5].upper()
-
- chain = structure.Chain(rep_chain, structure.SequenceTypes.Protein,
- None, residues, rep_accession)
- chain.compute_torsion()
-
- src = ChainSequence()
- src.sequence = chain.sequence
- src.accession = chain.accession
- src.id = chain.id
- src.type = chain.type
- src.torsion = chain.torsion
-
- cluster.representative = RepStructureFragment(chain.accession,
- chain.id, start)
- cluster.representative.source = src
- cluster.representative.structure = chain.subregion(start, end)
- cluster.representative.angles = cluster.representative.structure.torsion
-
- assert cluster.representative.angles.length == cluster.motiflen
- assert cluster.profile.length == cluster.representative.structure.length
- assert cluster.representative.angles.length == (cluster.profile.length -
- 2 * cluster.overhang)
-
- source.convert_scores(score_units)
-
- return cluster
-
-class RepStructureFragment(object):
- """
- Container class which describes an I-Sites paradigm structure for a cluster.
-
- @param accession: paradigm's accession number
- @type accession: str
- @param chain: chain ID
- @type chain: str
- @param start: the start position in the paradigm's chain where the fragment
- instance is located (including the overhangs). In the original
- library this is a PDB sequence number, not a SEQRES rank. The
- residue rank can be retrieved from the structure instead:
-
- >>> cluster.representative.structure.find(start).rank
- residue rank
-
- @type start: int
- @param angles: torsion angles of the paradigm, in degrees
- @type angles: L{structure.TorsionAnglesCollection}
- @param source: the entire paradigm chain with pre-computed torsion angles
- @type source: L{ChainSequence}
- @param struct: a segment of the paradigm's L{structure.Chain} which
- contains the I-Site. Residue ranks and IDs are preserved
- as they were in the original chain (no shifting)
- @type struct: L{structure.Chain}
- """
-
- def __init__(self, accession, chain, start, angles=None, source=None, struct=None):
-
- self.accession = accession
- self.chain = chain
- self.start = start
- self._source = None
- self._angles = structure.TorsionAnglesCollection()
- self._structure = None
-
-
- if angles is not None:
- self.angles = angles
- if source is not None:
- self.source = source
- if struct is not None:
- self.structure = struct
-
- @property
- def angles(self):
- return self._angles
- @angles.setter
- def angles(self, angles):
- self._angles.update(angles)
-
- @property
- def source(self):
- return self._source
- @source.setter
- def source(self, chain):
- if type(chain) not in (structure.Chain, ChainSequence):
- raise TypeError("The source property must be a Chain or ChainSequence instance with torsion property pre-calculated.")
- self._source = chain
-
- @property
- def structure(self):
- return self._structure
- @structure.setter
- def structure(self, chain_fragment):
- if not isinstance(chain_fragment, structure.Chain):
- raise TypeError("The structure property must be a Chain instance.")
- self._structure = chain_fragment
diff --git a/csb/bio/io/__init__.py b/csb/bio/io/__init__.py
index b372085..713482a 100644
--- a/csb/bio/io/__init__.py
+++ b/csb/bio/io/__init__.py
@@ -7,7 +7,6 @@ from csb.bio.io.hhpred import HHpredOutputParser, HHpredProfileParser
from csb.bio.io.clans import ClansParser, ClansFileWriter
from csb.bio.io.wwpdb import StructureParser, AsyncStructureParser, PDBHeaderParser
from csb.bio.io.fasta import SequenceParser, PDBSequenceParser
-from csb.bio.io.isites import ISitesParser
from csb.bio.io.dssp import DSSPParser, StrideParser
__all__ = ['HHOutputParser', 'HHProfileParser', 'ClansParser',
diff --git a/csb/bio/io/isites.py b/csb/bio/io/isites.py
deleted file mode 100644
index 84bcd79..0000000
--- a/csb/bio/io/isites.py
+++ /dev/null
@@ -1,326 +0,0 @@
-"""
-I-Sites fragment library parser.
-
-@deprecated: legacy module.
-"""
-
-import os
-import re
-import csb.io
-import csb.bio.structure as structure
-import csb.bio.fragments.isites as isites
-
-class Tags(object):
- """
- Enumeration of I-Sites flat file tags.
- """
-
- LIBRARY = 'LIBRARY'
- REMARK = 'REMARK'
- CENTROID = 'CENTROID'
- CLUSTER = 'CLUSTER'
- FROM = 'FROM'
- CREATEDBY = 'CREATEDBY'
- MDACUT = 'MDACUT'
- DMECUT = 'DMECUT'
- PSEUDOCOUNT = 'PSEUDOCOUNT'
- LINEARFIT = 'LINEARFIT'
- COVARWEIGHT = 'COVARWEIGHT'
- PARADIGM = 'PARADIGM'
- ANGLES = 'ANGLES'
- PROFILE = 'PROFILE'
- COVARTENSOR = 'COVARTENSOR'
- END = 'END'
-
-class ISitesParser(object):
- """
- Implements an I-Sites fragment library parser v.5.1+ (2008).
-
- @param flatfile: input *.isl I-Sites library file name
- @type flatfile: str
- @param express: if True, speeds up the parser by ignoring the covariance tensors
- @type express: bool
-
- @raise IOError: when the source file cannot be found
- """
-
- def __init__(self, flatfile, express=False):
- if not os.path.exists(flatfile):
- raise IOError("Could not read file {0}".format(flatfile))
-
- self._flatfile = flatfile
- self.express = bool(express)
- self._streams = [ ]
- self._library = None
-
- def __enter__(self):
- return self
-
- def __exit__(self, exc_type, exc_value, traceback):
- for s in self._streams:
- try:
- s.close()
- except Exception:
- pass
-
- def __del__(self):
- for s in self._streams:
- try:
- s.close()
- except Exception:
- pass
-
- def _newstream(self):
-
- s = open(self._flatfile, mode='r')
- self._streams.append(s)
- return s
-
- @property
- def library(self):
- """
- Return the general properties of the library.
- Library clusters are iterable, but read efficiently only on demand.
- """
- return self.parse()
-
- @property
- def clusters(self):
- """
- Efficient iterator over all L{Cluster} objects.
- """
- reader = csb.io.EntryReader(self._newstream(), Tags.CLUSTER, Tags.END)
-
- for entry in reader.entries():
- yield self.parse_entry(entry)
-
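The clusters generator leans on csb.io.EntryReader to chop the flat file into CLUSTER..END records. The grouping itself is simple to sketch (hypothetical `entries` helper, shown only for illustration; the real reader lives in csb.io):

```python
def entries(lines, start_tag='CLUSTER', end_tag='END'):
    """Group a tagged flat file into start_tag..end_tag records, inclusive."""
    entry = None
    for line in lines:
        if line.startswith(start_tag):
            entry = [line]          # a new record begins
        elif entry is not None:
            entry.append(line)
            if line.startswith(end_tag):
                yield '\n'.join(entry)
                entry = None        # material outside records is ignored
```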
- def parseall(self):
- """
- Parse the whole library to end.
-
- @return: object representation of the library with all clusters pre-parsed
- @rtype: L{Library}
- """
- library = self.parse()
- library.clusters = list(self.clusters)
- return library
-
- def parse(self):
- """
- Parse I-sites library common/general properties. Clusters are not parsed,
- but can be fetched on demand while iterating over C{library.clusters}.
-
- @return: object representation of the library with a
- bound clusters generator
- @rtype: L{Library}
- """
-
- library = isites.Library()
- library.centroids = [ ]
- library.documentation = ''
- library.clusters = self.clusters
-
- stream = self._newstream()
- done = False
-
- while not done:
-
- line = stream.readline()
- if not line:
- done = True
- break
-
- if line.startswith(Tags.REMARK) and line[len(Tags.REMARK):].strip() in ('===== start of library =====', '====== start of clusters ======='):
- done = True
- break
-
- elif line.startswith(Tags.LIBRARY):
- fields = line.split()[1:]
- if len(fields) > 1:
- library.name, library.version = fields[0], fields[1]
- else:
- library.version = fields[0]
-
- elif line.startswith(Tags.CENTROID):
- fields = line.split()
- index = int(fields[1])
- values = fields[2].split(',')
-
- matrow = { }
-
- for cn, aa in enumerate('ACDEFGHIKLMNPQRSTVWY'):
- matrow[aa] = float(values[cn])
-
- if index > len(library.centroids) - 1: # This is because in the current version CENTROIDS appears twice by mistake
- assert index == len(library.centroids), "Centroid offset indices are consecutive numbers, starting from 0."
- library.centroids.append(matrow)
-
- elif line.startswith(Tags.REMARK):
- library.documentation += line[len(Tags.REMARK)+1:]
-
- stream.close()
-
- return library
-
- def parse_entry(self, entry):
- """
- Parse a single I-Sites entry.
-
- @return: object representation of the entry
- @rtype: L{Cluster}
- """
- cluster = isites.Cluster()
- lines = iter(entry.splitlines())
-
- done = False
- in_profile = False
- in_tensor = False
-
- while not done:
- try:
- line = next(lines)
- except StopIteration:
- done = True
- break
-
- if line.startswith(Tags.CLUSTER):
- fields = line.split()[1:]
- cluster.id, cluster.motiflen, cluster.profilelen, cluster.overhang = map(int, fields)
-
- elif line.startswith(Tags.FROM):
- cluster.file = line.split()[1]
-
- elif line.startswith(Tags.CREATEDBY):
- cluster.program = line[len(Tags.CREATEDBY)+1:].strip()
-
- elif line.startswith(Tags.MDACUT):
- field = line.split()[1]
- cluster.mda = float(field)
-
- elif line.startswith(Tags.DMECUT):
- field = line.split()[1]
- cluster.dme = float(field)
-
- elif line.startswith(Tags.PSEUDOCOUNT):
- field = line.split()[1]
- cluster.pseudocount = float(field)
- assert cluster.pseudocount > 0
-
- elif line.startswith(Tags.LINEARFIT):
- fields = line.split()[1:]
- cluster.linearfit = tuple(map(float, fields))
-
- elif line.startswith(Tags.COVARWEIGHT):
- field = line.split()[1]
- cluster.covarweight = float(field)
-
- elif line.startswith(Tags.PARADIGM):
- fields = line.split()[1:]
- cluster.representative = isites.RepStructureFragment(fields[0], fields[1], int(fields[2]))
- if fields[1] == '_':
- cluster.representative.chain = ''
-
- elif line.startswith(Tags.ANGLES):
- rn = -1
- while True:
- try:
- subline = next(lines)
- except StopIteration:
- break
- if subline.startswith(Tags.PROFILE):
- in_profile = True
- break
- elif subline.startswith(Tags.END):
- break
-
- rn += 1
- fields = subline.split()
- angles = tuple(map(float, fields[1:]))
-
- torsion = structure.TorsionAngles(angles[0], angles[1], angles[2], units=structure.AngleUnits.Degrees)
- j = cluster.representative.angles.append(torsion)
-
- assert rn == j-1 == int(fields[0]), "Angle offsets in a cluster are consecutive numbers, starting at 0."
-
- elif line.startswith(Tags.PROFILE):
- in_profile = True
-
- elif in_profile:
- cluster.profile = isites.ProteinProfile(isites.ProteinProfile.BackgroundFreqs, alpha=cluster.pseudocount)
- rn = -1
- subline = line
-
- while True:
- if subline.startswith(Tags.CREATEDBY) or subline.startswith(Tags.END):
- in_profile = False
- break
- elif subline.startswith(Tags.COVARTENSOR):
- in_tensor = True
- in_profile = False
- break
-
- rn += 1
- fields = subline.split()
-
- assert rn == int(fields[0]), "ProteinProfile rows in a cluster are consecutive numbers," \
- + " starting from 0 (cluster {0}, profile row {1}/{2}).".format(cluster.id, rn, fields[0])
- column = { }
- for cn, aa in enumerate('ACDEFGHIKLMNPQRSTVWY'):
- column[aa] = float(fields[cn+1])
- cluster.profile.add_column(**column)
-
- try:
- subline = next(lines)
- except StopIteration:
- in_profile = False
- break
-
- assert cluster.profilelen == cluster.profile.length
-
- elif line.startswith(Tags.COVARTENSOR):
- in_tensor = True
-
- elif in_tensor:
- if self.express:
- break
-
- motiflen = cluster.motiflen
- # cluster.covariance = [ [ [] ]*motiflen for ii in range(0, motiflen) ]
- cluster.covariance = [ ]
- for mi in range(0, motiflen):
- cluster.covariance.append([])
- for mj in range(0, motiflen): #@UnusedVariable
- cluster.covariance[mi].append([])
-
- rn = -1
- i = j = -1
- subline = line
- dimline = re.compile('^[0-9]+\s+[0-9]+\s*$')
-
- while True:
- if subline.startswith(Tags.END):
- in_tensor = False
- break
-
- rn += 1
- fields = subline.split()
-
- if re.match(dimline, subline):
- istr, jstr = subline.split()
- i, j = int(istr) - 1, int(jstr) - 1
- assert 0 <= i < motiflen and 0 <= j < motiflen, "Covariance is a [motiflen x motiflen] matrix."
- else:
- values = list(map(float, subline.split()))
- cluster.covariance[i][j].append(values)
-
- try:
- subline = next(lines)
- except StopIteration:
- in_tensor = False
- break
-
- elif line.startswith(Tags.END):
- done = True
- break
-
- return cluster
diff --git a/csb/statistics/maxent.py b/csb/statistics/maxent.py
index afa6620..545f38f 100644
--- a/csb/statistics/maxent.py
+++ b/csb/statistics/maxent.py
@@ -5,10 +5,10 @@ Reference: Rowicka and Otwinowski 2004
import numpy
-from csb.statistics.pdf import AbstractDensity
+from csb.statistics.pdf import BaseDensity
-class MaxentModel(AbstractDensity):
+class MaxentModel(BaseDensity):
"""
Fourier expansion of a biangular log-probability density
"""
diff --git a/csb/statistics/pdf/__init__.py b/csb/statistics/pdf/__init__.py
index 9d256e7..1a90c07 100644
--- a/csb/statistics/pdf/__init__.py
+++ b/csb/statistics/pdf/__init__.py
@@ -265,11 +265,12 @@ class GumbelMaxMomentsEstimator(AbstractEstimator):
return GumbelMaximum(mu, beta)
-
+
class AbstractDensity(object):
"""
Defines the interface and common operations for all probability density
- functions.
+ functions. This is a generic class which can operate on parameters of
+ any type (e.g. simple floats or custom parameter objects).
Subclasses must complete the implementation by implementing the
L{AbstractDensity.log_prob} method. Subclasses could also consider--but
@@ -281,8 +282,8 @@ class AbstractDensity(object):
discouraged.
"""
- __metaclass__ = ABCMeta
-
+ __metaclass__ = ABCMeta
+
def __init__(self):
@@ -300,17 +301,18 @@ class AbstractDensity(object):
def __setitem__(self, param, value):
- if param in self._params:
- if csb.core.iterable(value):
- value = array(value)
- else:
- value = float(value)
-
+ if param in self._params:
self._validate(param, value)
- self._params[param] = value
+ self._set(param, value)
else:
raise ParameterNotFoundError(param)
+ def _set(self, param, value):
+ """
+ Update the C{value} of C{param}.
+ """
+ self._params[param] = value
+
@property
def estimator(self):
return self._estimator
@@ -418,7 +420,27 @@ class AbstractDensity(object):
except ParameterValueError as e:
raise EstimationFailureError(e.param, e.value)
-class Laplace(AbstractDensity):
+
+class BaseDensity(AbstractDensity):
+ """
+ Base abstract class for PDFs that operate on simple float
+ or array-of-float parameters. Setting a parameter of any other
+ type raises a TypeError.
+ """
+
+ def _set(self, param, value):
+
+ try:
+ if csb.core.iterable(value):
+ value = array(value)
+ else:
+ value = float(value)
+ except ValueError:
+ raise TypeError(value)
+
+ super(BaseDensity, self)._set(param, value)
+
+class Laplace(BaseDensity):
def __init__(self, mu=0, b=1):
@@ -463,7 +485,7 @@ class Laplace(AbstractDensity):
return numpy.random.laplace(loc, scale, size)
-class Normal(AbstractDensity):
+class Normal(BaseDensity):
def __init__(self, mu=0, sigma=1):
@@ -503,7 +525,7 @@ class Normal(AbstractDensity):
return numpy.random.normal(mu, sigma, size)
-class InverseGaussian(AbstractDensity):
+class InverseGaussian(BaseDensity):
def __init__(self, mu=1, shape=1):
@@ -563,7 +585,7 @@ class InverseGaussian(AbstractDensity):
return m * X + (1 - m) * mu ** 2 / X
-class GeneralizedNormal(AbstractDensity):
+class GeneralizedNormal(BaseDensity):
def __init__(self, mu=0, alpha=1, beta=1):
@@ -610,7 +632,7 @@ class GeneralizedNormal(AbstractDensity):
return log(beta / (2.0 * alpha)) - gammaln(1. / beta) - power(fabs(x - mu) / alpha, beta)
-class GeneralizedInverseGaussian(AbstractDensity):
+class GeneralizedInverseGaussian(BaseDensity):
def __init__(self, a=1, b=1, p=1):
super(GeneralizedInverseGaussian, self).__init__()
@@ -691,7 +713,7 @@ class GeneralizedInverseGaussian(AbstractDensity):
return numpy.array(rvs)
-class Gamma(AbstractDensity):
+class Gamma(BaseDensity):
def __init__(self, alpha=1, beta=1):
super(Gamma, self).__init__()
@@ -731,7 +753,7 @@ class Gamma(AbstractDensity):
def random(self, size=None):
return numpy.random.gamma(self['alpha'], 1 / self['beta'], size)
-class InverseGamma(AbstractDensity):
+class InverseGamma(BaseDensity):
def __init__(self, alpha=1, beta=1):
super(InverseGamma, self).__init__()
@@ -825,7 +847,7 @@ class MultivariateGaussian(Normal):
return MultivariateGaussian((mu, Sigma))
-class Dirichlet(AbstractDensity):
+class Dirichlet(BaseDensity):
def __init__(self, alpha):
super(Dirichlet, self).__init__()
@@ -853,7 +875,7 @@ class Dirichlet(AbstractDensity):
return numpy.random.mtrand.dirichlet(self.alpha, size)
-class GumbelMinimum(AbstractDensity):
+class GumbelMinimum(BaseDensity):
def __init__(self, mu=0, beta=1):
super(GumbelMinimum, self).__init__()
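The hunks above move the float/array coercion out of C{AbstractDensity.__setitem__} into an overridable C{_set} hook, with the new C{BaseDensity} supplying the float-only behavior. A minimal, self-contained sketch of that pattern (the class names below are hypothetical stand-ins, not the csb API; the real code uses csb.core.iterable, which this sketch simplifies to an isinstance check):

```python
import numpy

class Density(object):
    """Sketch of the generic base: __setitem__ validates the name,
    then delegates storage to the _set() hook."""

    def __init__(self):
        self._params = {}

    def _register(self, name):
        self._params[name] = None

    def __setitem__(self, param, value):
        if param not in self._params:
            raise KeyError(param)
        self._set(param, value)

    def _set(self, param, value):
        # generic version: store the value as-is, whatever its type
        self._params[param] = value

class FloatDensity(Density):
    """Sketch of the float-only subclass: coerce to float or
    float array, raise TypeError for anything else."""

    def _set(self, param, value):
        try:
            if isinstance(value, (list, tuple, numpy.ndarray)):
                value = numpy.array(value, dtype=float)
            else:
                value = float(value)
        except (ValueError, TypeError):
            raise TypeError(value)
        super(FloatDensity, self)._set(param, value)

d = FloatDensity()
d._register('mu')
d['mu'] = [1, 2]     # coerced to array([1.0, 2.0])
```

Subclasses with richer parameter types (such as the new parameterized module below in this commit) only need to override C{_set}; the name check in C{__setitem__} stays shared.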
diff --git a/csb/statistics/pdf/parameterized.py b/csb/statistics/pdf/parameterized.py
new file mode 100644
index 0000000..951ba97
--- /dev/null
+++ b/csb/statistics/pdf/parameterized.py
@@ -0,0 +1,346 @@
+"""
+Probability density functions with support for shared and computed parameters.
+
+This module extends the functionality of L{csb.statistics.pdf} with a specialized
+and more sophisticated L{AbstractDensity} -- the L{ParameterizedDensity}, which
+works with L{AbstractParameter} objects rather than simple floats.
+
+Each L{AbstractParameter} holds two properties - L{AbstractParameter.name} and
+L{AbstractParameter.value}:
+
+ >>> class Sigma(AbstractParameter):
+ >>> def _validate(self, value):
+ >>> return float(value)
+ >>> def _compute(self, base_value):
+ >>> return 1.0 / base_value ** 0.5
+ >>>
+ >>> sigma = Sigma(3)
+ >>> sigma.name, sigma.value
+ ('sigma', 3.0)
+
+L{AbstractParameter}s holding a single float value are indistinguishable from
+the simple float parameters used in L{csb.statistics.pdf.BaseDensity}.
+However, each L{AbstractParameter} supports the concept of binding:
+
+ >>> sigma.is_virtual
+ False
+ >>> precision = Precision(1)
+ >>> sigma.bind_to(precision)
+ >>> sigma.is_virtual
+ True
+ >>> precision.set(2) # triggers an implicit, lazy update in sigma
+ >>> sigma.set(1)
+ ParameterizationError: Virtual parameters can't be updated explicitly
+
+The instance of Sigma is now a virtual parameter which receives automatic updates
+from another base parameter using a predefined rule (L{AbstractParameter._compute}).
+This is a lazy, on-demand process. As soon as Sigma's computed value is
+requested (via C{sigma.value}), Sigma will query the parameter it depends on
+(Precision), which in turn will get recomputed first based on its own base, etc.
+Thus, the L{AbstractParameter} model supports a parameter dependency chain with
+linear or tree-like topologies::
+
+ sigma -- y
+ /
+ precision -- sigma2 -- x
+
+In this graph precision is a base (non-virtual) parameter and sigma, sigma2, x, and y
+are all virtual (computed). Binding precision to another parameter will immediately
+turn it into a virtual one. However, cycles are not allowed (e.g. precision can't
+be bound to sigma2 or x) and each virtual parameter must have exactly one base.
+
+Computed parameters can then be used to implement custom PDFs with dependent
+parameters within one PDF or spanning multiple PDFs.
+"""
+
+import csb.statistics.pdf as pdf
+
+from abc import abstractmethod
+
+
+class ParameterizationError(ValueError):
+ pass
+
+class ParameterValueError(pdf.ParameterValueError):
+ pass
+
+
+class ParameterizedDensity(pdf.AbstractDensity):
+ """
+ Base abstract class for PDFs that operate on simple or computed
+ (chained) parameters. Setting a parameter that is not an
+ L{AbstractParameter} raises a TypeError.
+ """
+
+ def _set(self, param, value):
+
+ if not isinstance(value, AbstractParameter):
+ raise TypeError(value)
+
+ super(ParameterizedDensity, self)._set(param, value)
+
+
+class AbstractParameter(object):
+ """
+ Abstract parameterization, which can exist independently or be coupled
+ to other parameters upon request. Virtual/coupled/derived parameters cannot
+ be overwritten explicitly, but their values will get recomputed once their
+ corresponding base parameters get updated. This is a lazy process - parameter
+ recalculation happens only when an out of date parameter is requested. This
+ triggers a real-time cascaded update which affects all parameters from the
+ nearest consistent base down to the current inconsistent node.
+
+ Implementing subclasses must override L{AbstractParameter._validate}
+ and virtual parameters should additionally override L{AbstractParameter._compute}.
+
+ @param value: initial value (defaults to None / AbstractParameter.NULL)
+ @type value: object
+ @param name: name of parameter (this is the name of the class by default)
+ @type name: str
+ @param base: optional base parameter to compute this instance from
+ @type base: L{AbstractParameter}
+ """
+
+ NULL = None
+
+ def __init__(self, value=NULL, name=None, base=None):
+
+ self._derivatives = set()
+ self._base = None
+ self._consistent = True
+
+ if name is None:
+ name = self.__class__.__name__.lower()
+
+ self._name = str(name)
+ self._value = AbstractParameter.NULL
+
+ self._update(value)
+
+ if base is not None:
+ self.bind_to(base)
+
+ @property
+ def name(self):
+ """
+ Parameter name
+ """
+ return self._name
+
+ @property
+ def value(self):
+ """
+ Parameter value (guaranteed to be up to date)
+ """
+ self._ensure_consistency()
+ return self._value
+
+ @property
+ def is_virtual(self):
+ """
+ True if this parameter is virtual (computed)
+ """
+ return self._base is not None
+
+ def set(self, value):
+ """
+ Update the value of this parameter. This is not possible for
+ virtual parameters.
+
+ @param value: new value
+ @type value: object
+
+ @raise ParameterizationError: if this is a virtual parameter
+ @raise ParameterValueError: on invalid value
+ """
+ if self.is_virtual:
+ raise ParameterizationError(
+ "Virtual parameters can't be updated explicitly")
+
+ self._update(value)
+
+ self._invalidate()
+ self._consistent = True
+
+ def bind_to(self, parameter):
+ """
+ Bind the current parameter to a base parameter. This converts
+ the current parameter to a virtual one, whose value will get
+ implicitly updated to be consistent with its base.
+
+ Note that virtual parameters must have exactly one base; computing a
+ parameter from multiple bases is not allowed. Cycles are also not
+ allowed; the topology must always stay a tree with a non-virtual
+ parameter at the root.
+
+ @param parameter: base parameter to compute this instance from
+ @type parameter: L{AbstractParameter}
+
+ @raise ParameterizationError: if this parameter is already virtual
+ @raise ParameterizationError: on attempt to create a circular dependency
+
+ """
+
+ if not isinstance(parameter, AbstractParameter):
+ raise TypeError(parameter)
+
+ if parameter.find_base_parameter() is self:
+ raise ParameterizationError("Circular dependency detected")
+
+ if self.is_virtual:
+ msg = "Parameter {0.name} is already bound to {1.name}"
+ raise ParameterizationError(msg.format(self, self._base))
+
+ self._set_base(parameter)
+ self._base._add_derived(self)
+
+ self._invalidate()
+
+ def _set_base(self, parameter):
+ self._base = parameter
+
+ def _add_derived(self, parameter):
+ self._derivatives.add(parameter)
+
+ def _invalidate(self):
+ """
+ Mark self and its virtual children as inconsistent
+ """
+ for p in self._derivatives:
+ p._invalidate()
+
+ self._consistent = False
+
+ def _update(self, value):
+ """
+ Overwrite the current value of the parameter. This triggers
+ an abstract (custom) validation hook, but has no side effects
+ (i.e. it doesn't propagate!)
+ """
+ sanitized = self._validate(value)
+ self._value = sanitized
+
+ @abstractmethod
+ def _validate(self, value):
+ """
+ Validate and sanitize the specified value before assignment.
+ @return: sanitized value
+
+ @raise ParameterValueError: on invalid value
+ """
+ return value
+
+ def _compute(self, base_value):
+ """
+ Compute a new value for the current parameter given the value
+ of a base parameter (assuming self.is_virtual). By default this returns
+ the value of the base parameter (i.e. self just inherits the value
+ of its base untouched).
+ """
+ return base_value
+
+ def _ensure_consistency(self):
+ """
+ Make sure that the current value is up to date. If it isn't,
+ trigger a real-time cascaded update following the path from the
+ nearest consistent base down to self. Also mark all nodes consistent
+ in the course of doing this update.
+ """
+ if not self._consistent:
+ path = self._nearest_consistent_base()
+
+ for parameter in reversed(path):
+ parameter._recompute(consistent=True)
+
+ def _recompute(self, consistent=True):
+ """
+ If self is virtual, force the current parameter to recompute itself from
+ its immediate base. This operation has no side effects and does not
+ propagate.
+ """
+ if self.is_virtual:
+ value = self._compute(self._base._value)
+ self._update(value)
+
+ if consistent:
+ self._consistent = True
+
+ def _recompute_derivatives(self):
+ """
+ Recompute all derived parameters starting from self and mark
+ them consistent.
+ """
+ self._recompute(consistent=True)
+
+ for p in self._derivatives:
+ p._recompute_derivatives()
+
+ def _nearest_consistent_base(self):
+ """
+ Compute and return the path from self to the nearest consistent
+ base parameter.
+
+ @return: path, leaf-to-root
+ @rtype: list of L{AbstractParameter}
+ """
+ root = self
+ path = [self]
+
+ while not root._consistent:
+ root = root._base
+ path.append(root)
+
+ return path
+
+ def find_base_parameter(self):
+ """
+ Find and return the non-virtual base parameter that is the root
+ of the current hierarchy. If self is not virtual, return self.
+
+ @return: base parameter
+ @rtype: L{AbstractParameter}
+ """
+ root = self
+
+ while root.is_virtual:
+ root = root._base
+
+ return root
+
+
+class Parameter(AbstractParameter):
+ """
+ Default parameter implementation which accepts float values only.
+ """
+
+ def __init__(self, value=0.0, name=None, base=None):
+ super(Parameter, self).__init__(value, name, base)
+
+ def _validate(self, value):
+
+ try:
+ return float(value)
+ except (ValueError, TypeError):
+ raise ParameterValueError(self.name, value)
+
+
+class NonVirtualParameter(Parameter):
+ """
+ A float L{Parameter} that is explicitly non-computed and cannot be
+ bound to another L{Parameter}.
+ """
+
+ def bind_to(self, parameter):
+ raise ParameterizationError(
+ "This parameter is explicitly non-computed")
+
+ @property
+ def is_virtual(self):
+ return False
+
+
+
+
+
+
+
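The binding and lazy-update semantics of the new L{AbstractParameter} can be condensed into a short standalone sketch (simplified: no path search along the chain, and a plain ValueError instead of ParameterizationError, which is itself a ValueError subclass):

```python
class Param(object):
    """Condensed sketch of AbstractParameter's binding and lazy updates."""

    def __init__(self, value=0.0, name=None):
        self.name = name or type(self).__name__.lower()
        self._value = float(value)
        self._base = None
        self._derived = set()
        self._consistent = True

    @property
    def is_virtual(self):
        return self._base is not None

    def set(self, value):
        if self.is_virtual:
            raise ValueError("virtual parameters can't be updated explicitly")
        self._value = float(value)
        self._invalidate()          # mark all derived parameters stale
        self._consistent = True

    def bind_to(self, base):
        self._base = base
        base._derived.add(self)
        self._invalidate()

    def _invalidate(self):
        for p in self._derived:
            p._invalidate()
        self._consistent = False

    def _compute(self, base_value):
        return base_value           # default: inherit the base value

    @property
    def value(self):
        # lazy: recompute from the base only when actually requested
        if not self._consistent:
            if self.is_virtual:
                self._value = self._compute(self._base.value)
            self._consistent = True
        return self._value

class Sigma(Param):
    def _compute(self, base_value):
        return 1.0 / base_value ** 0.5

precision = Param(1.0, name="precision")
sigma = Sigma(0.0)
sigma.bind_to(precision)

precision.set(4.0)     # sigma is now stale, but not recomputed yet
print(sigma.value)     # recomputed on demand: 1 / sqrt(4) = 0.5
```

The full class additionally walks the chain back to the nearest consistent base (C{_nearest_consistent_base}) so that arbitrarily deep trees stay consistent; the recursive C{self._base.value} call above plays that role in the sketch.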
diff --git a/csb/statistics/rand.py b/csb/statistics/rand.py
index bd35132..d0425ba 100644
--- a/csb/statistics/rand.py
+++ b/csb/statistics/rand.py
@@ -85,7 +85,7 @@ def sample_dirichlet(alpha, n_samples=1):
Sample points from a dirichlet distribution with parameter alpha.
@param alpha: alpha parameter of a dirichlet distribution
- @type alpha:
+ @type alpha: array
"""
from numpy import array, sum, transpose, ones
from numpy.random import gamma
diff --git a/csb/statistics/samplers/mc/singlechain.py b/csb/statistics/samplers/mc/singlechain.py
index 77a56f6..8f60791 100644
--- a/csb/statistics/samplers/mc/singlechain.py
+++ b/csb/statistics/samplers/mc/singlechain.py
@@ -171,17 +171,21 @@ class AbstractSingleChainMC(AbstractMC):
@property
def energy(self):
"""
- Log-likelihood of the current state.
+ Negative log-likelihood of the current state.
@rtype: float
"""
- return self._pdf.log_prob(self.state.position)
+ return -self._pdf.log_prob(self.state.position)
@property
def acceptance_rate(self):
"""
Acceptance rate.
"""
- return float(self._accepted) / float(self._nmoves)
+
+ if self._nmoves > 0:
+ return float(self._accepted) / float(self._nmoves)
+ else:
+ return 0.0
@property
def last_move_accepted(self):
@@ -241,6 +245,16 @@ class HMCSampler(AbstractSingleChainMC):
self.timestep = timestep
self._nsteps = None
self.nsteps = nsteps
+ self._mass_matrix = None
+ self._momentum_covariance_matrix = None
+ self._integrator = integrator
+ self._gradient = gradient
+
+ self._setup_matrices(mass_matrix)
+
+ self._propagator = self._propagator_factory()
+
+ def _setup_matrices(self, mass_matrix):
self._d = len(self.state.position)
@@ -250,12 +264,18 @@ class HMCSampler(AbstractSingleChainMC):
self._momentum_covariance_matrix = self._temperature * self.mass_matrix
- self._integrator = integrator
- self._gradient = gradient
+ def _propagator_factory(self):
+ """
+ Factory which produces a L{MDPropagator} object initialized with
+ the MD parameters given in __init__().
- self._propagator = MDPropagator(self._gradient, self._timestep,
- mass_matrix=self._mass_matrix,
- integrator=self._integrator)
+ @return: L{MDPropagator} instance
+ @rtype: L{MDPropagator}
+ """
+
+ return MDPropagator(self._gradient, self._timestep,
+ mass_matrix=self._mass_matrix,
+ integrator=self._integrator)
def _propose(self):
@@ -265,16 +285,29 @@ class HMCSampler(AbstractSingleChainMC):
return SimpleProposalCommunicator(current_state, proposal_state)
+ def _hamiltonian(self, state):
+ """
+ Evaluates the Hamiltonian consisting of the negative log-probability
+ and a quadratic kinetic term.
+
+ @param state: State on which the Hamiltonian should be evaluated
+ @type state: L{State}
+
+ @return: Value of the Hamiltonian (total energy)
+ @rtype: float
+ """
+
+ V = lambda q: -self._pdf.log_prob(q)
+ T = lambda p: 0.5 * numpy.dot(p.T, numpy.dot(self.mass_matrix.inverse, p))
+
+ return T(state.momentum) + V(state.position)
+
def _calc_pacc(self, proposal_communicator):
- current_state = proposal_communicator.current_state
- proposal_state = proposal_communicator.proposal_state
+ cs = proposal_communicator.current_state
+ ps = proposal_communicator.proposal_state
- E = lambda x: -self._pdf.log_prob(x)
- K = lambda x: 0.5 * numpy.dot(x.T, numpy.dot(self.mass_matrix.inverse, x))
-
- pacc = csb.numeric.exp((-K(proposal_state.momentum) - E(proposal_state.position)
- + K(current_state.momentum) + E(current_state.position)) /
+ pacc = csb.numeric.exp(-(self._hamiltonian(ps) - self._hamiltonian(cs)) /
self.temperature)
if self.state.momentum is None:
@@ -348,7 +381,8 @@ class RWMCSampler(AbstractSingleChainMC):
self.stepsize = stepsize
if proposal_density == None:
self._proposal_density = lambda x, s: x.position + \
- s * numpy.random.uniform(size=x.position.shape, low=-1., high=1.)
+ s * numpy.random.uniform(size=x.position.shape,
+ low=-1., high=1.)
else:
self._proposal_density = proposal_density
@@ -364,10 +398,11 @@ class RWMCSampler(AbstractSingleChainMC):
current_state = proposal_communicator.current_state
proposal_state = proposal_communicator.proposal_state
- E = lambda x:-self._pdf.log_prob(x)
+ E = lambda x: -self._pdf.log_prob(x)
pacc = csb.numeric.exp((-(E(proposal_state.position) - E(current_state.position))) /
self.temperature)
+
return pacc
@property
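The refactored C{_calc_pacc} above computes the acceptance probability from the Hamiltonian difference, with H(q, p) = T(p) + V(q) and V = -log p(q). A minimal numpy sketch of that computation (illustrative function names, not the csb API):

```python
import numpy

def hamiltonian(q, p, log_prob, inverse_mass):
    """Total energy: quadratic kinetic term plus negative log-probability."""
    T = 0.5 * numpy.dot(p, numpy.dot(inverse_mass, p))
    V = -log_prob(q)
    return T + V

def acceptance(current, proposal, log_prob, inverse_mass, temperature=1.0):
    """Metropolis acceptance: exp(-(H(proposal) - H(current)) / T), capped at 1."""
    q0, p0 = current
    q1, p1 = proposal
    dH = (hamiltonian(q1, p1, log_prob, inverse_mass)
          - hamiltonian(q0, p0, log_prob, inverse_mass))
    return min(1.0, float(numpy.exp(-dH / temperature)))

# standard normal target with a unit mass matrix
log_prob = lambda q: -0.5 * numpy.dot(q, q)
I = numpy.eye(1)

pacc = acceptance((numpy.zeros(1), numpy.zeros(1)),
                  (numpy.ones(1), numpy.zeros(1)),
                  log_prob, I)
```

Factoring the energy into a single C{hamiltonian} helper, as the diff does with C{_hamiltonian}, avoids the duplicated kinetic/potential lambdas of the old C{_calc_pacc}.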
diff --git a/csb/statistics/scalemixture.py b/csb/statistics/scalemixture.py
index 60c54df..f8db8f5 100644
--- a/csb/statistics/scalemixture.py
+++ b/csb/statistics/scalemixture.py
@@ -8,13 +8,14 @@ import csb.core
import csb.statistics.ars
import csb.statistics.rand
-from csb.core import typedproperty, validatedproperty
+from csb.core import typedproperty
from abc import abstractmethod, ABCMeta
from csb.numeric import log, exp, approx_psi, inv_psi, d_approx_psi
from scipy.special import psi, kve
from csb.statistics import harmonic_mean, geometric_mean
-from csb.statistics.pdf import AbstractEstimator, AbstractDensity, Gamma, InverseGamma, NullEstimator
+from csb.statistics.pdf import AbstractEstimator, AbstractDensity, BaseDensity
+from csb.statistics.pdf import Gamma, InverseGamma, NullEstimator
def inv_digamma_minus_log(y, tol=1e-10, n_iter=100):
@@ -81,7 +82,7 @@ class ScaleMixturePrior(object):
self._scale_estimator = NullEstimator()
-class ScaleMixture(AbstractDensity):
+class ScaleMixture(BaseDensity):
"""
Robust probabilistic superposition and comparison of protein structures
Martin Mechelke and Michael Habeck
diff --git a/csb/test/__init__.py b/csb/test/__init__.py
index c9a48e2..a822aa4 100644
--- a/csb/test/__init__.py
+++ b/csb/test/__init__.py
@@ -159,16 +159,14 @@ decorators you would need in order to write tests for CSB.
is missing some features, that's why csb.test will take care of
replacing it with unittest2 instead.
"""
-from __future__ import print_function
-
import os
import sys
import imp
import types
import time
-import getopt
import tempfile
import traceback
+import argparse
import csb.io
import csb.core
@@ -202,16 +200,14 @@ class Config(object):
"""
@cvar: path to the default test data directory: <install dir>/csb/test/data
"""
+ GENERATED_DATA = DATA
+ """
+ @cvar: path to the default data directory for generated test files
+ """
TEMP = os.path.abspath(tempfile.gettempdir())
"""
@cvar: path to the default system's temp directory
"""
-
- def __init__(self):
-
- self.__config = Config
- self.__data = Config.DATA
- self.__temp = Config.TEMP
@staticmethod
def setDefaultDataRoot(path):
@@ -225,6 +221,19 @@ class Config(object):
raise IOError('Path not found: {0}'.format(path))
Config.DATA = os.path.abspath(path)
+
+ @staticmethod
+ def setDefaultGeneratedDataRoot(path):
+ """
+ Override the default L{Config.GENERATED_DATA} with a new data root directory.
+
+ @param path: full directory path
+ @type path: str
+ """
+ if not os.path.isdir(path):
+ raise IOError('Path not found: {0}'.format(path))
+
+ Config.GENERATED_DATA = os.path.abspath(path)
@property
def data(self):
@@ -232,19 +241,28 @@ class Config(object):
Test data directory
@rtype: str
"""
- return self.__data
-
+ return Config.DATA
+
+ @property
+ def generated_data(self):
+ """
+ Test data directory for generated files
+ @rtype: str
+ """
+ return Config.GENERATED_DATA
+
@property
def temp(self):
"""
Test temp directory
@rtype: str
"""
- return self.__temp
+ return Config.TEMP
def getTestFile(self, fileName, subDir=''):
"""
- Search for C{fileName} in the L{Config.DATA} directory.
+ Search for C{fileName} in the L{Config.DATA} directory. If not found,
+ try also L{Config.GENERATED_DATA} (if different).
@param fileName: the name of a test file to retrieve
@type fileName: str
@@ -255,20 +273,27 @@ class Config(object):
@rtype: str
@raise IOError: if no such file is found
- """
- file = os.path.join(self.data, subDir, fileName)
- if not os.path.isfile(file):
- raise IOError('Test file not found: {0}'.format(file))
- return file
+ """
+ for data in [self.data, self.generated_data]:
+ file = os.path.join(data, subDir, fileName)
+
+ if os.path.isfile(file):
+ return file
+
+ raise IOError('Test file not found: {0}'.format(fileName))
def getPickle(self, fileName, subDir=''):
"""
- Same as C{self.getTestFile}, but try to unpickle the data in the file.
+ Same as C{self.getTestFile}, but try to unpickle the file
+ and return the unpickled object. Pickles are usually stored in
+ L{Config.GENERATED_DATA}.
@param fileName: the name of a test file to retrieve
@type fileName: str
@param subDir: scan a sub-directory of L{Config.DATA}
- @type subDir: str
+ @type subDir: str
+
+ @rtype: object
"""
file = self.getTestFile(fileName, subDir)
return csb.io.Pickle.load(open(file, 'rb'))
@@ -281,7 +306,9 @@ class Config(object):
@param fileName: the name of a test file to retrieve
@type fileName: str
@param subDir: scan a sub-directory of L{Config.DATA}
- @type subDir: str
+ @type subDir: str
+
+ @rtype: str
"""
with open(self.getTestFile(fileName, subDir)) as f:
return f.read()
@@ -327,13 +354,13 @@ class Config(object):
ensemble = Ensemble()
ensemble.models.append(model1)
ensemble.models.append(model2)
- Pickle.dump(ensemble, open(os.path.join(self.data, '1nz9.full.pickle'), 'wb'))
+ Pickle.dump(ensemble, open(os.path.join(self.generated_data, '1nz9.full.pickle'), 'wb'))
mse = model1.chains['A'].find(164)
mse.label = 'MSE'
mse.atoms['SD']._element = ChemElements.Se
mse.atoms['SD']._full_name = 'SE '
- Pickle.dump(model1, open(os.path.join(self.data, '1nz9.model1.pickle'), 'wb'))
+ Pickle.dump(model1, open(os.path.join(self.generated_data, '1nz9.model1.pickle'), 'wb'))
class Case(unittest.TestCase):
"""
@@ -752,39 +779,17 @@ class Console(object):
@type verbosity: int
@param update: if True, refresh all pickles in csb/test/data
@type update: bool
+ @param generated_data: where to cache generated test files (directory)
+ @type generated_data: str
"""
BUILDERS = {'unit': UnitTestBuilder, 'functional': FunctionalTestBuilder,
'custom': CustomTestBuilder, 'any': AnyTestBuilder,
'regression': RegressionTestBuilder}
- USAGE = r"""
-CSB Test Runner Console. Usage:
-
- python {0.program} [-u] [-t type] [-v verbosity] namespace(s)
-
-Options:
- namespace(s) A list of CSB test dotted namespaces, from which to
- load tests. '__main__' and '.' are interpreted as the
- current module. If a namespace ends with an asterisk
- '.*', all sub-packages will be scanned as well.
-
- Examples:
- "csb.test.cases.bio.*"
- "csb.test.cases.bio.io" "csb.test.cases.bio.utils"
- "."
-
- -t type Type of tests to load from each namespace. Possible
- values are:
- {0.builders}
-
- -v verbosity Verbosity level passed to unittest.TextTestRunner.
-
- -u update-files Force update of the test pickles in csb/test/data.
- """
def __init__(self, namespace=('__main__',), builder=AnyTestBuilder, verbosity=1,
- update=False, argv=None):
+ update=False, generated_data=Config.GENERATED_DATA, argv=None):
if not argv:
argv = sys.argv
@@ -793,12 +798,14 @@ Options:
self._builder = None
self._verbosity = 1
self._update = False
+ self._gendata = str(generated_data)
self._program = os.path.basename(argv[0])
self.namespace = namespace
self.builder = builder
self.verbosity = verbosity
self.update = update
+ self.generated_data = generated_data
self.parseArguments(argv[1:])
self.run()
@@ -841,9 +848,18 @@ Options:
@update.setter
def update(self, value):
self._update = bool(value)
+
+ @property
+ def generated_data(self):
+ return self._gendata
+ @generated_data.setter
+ def generated_data(self, value):
+ self._gendata = os.path.abspath(value)
def run(self):
+ Config.setDefaultGeneratedDataRoot(self.generated_data)
+
if self.update:
Config().updateDataFiles()
else:
@@ -853,44 +869,42 @@ Options:
suite = builder.loadMultipleTests(self.namespace)
runner = unittest.TextTestRunner(verbosity=self.verbosity)
- runner.run(suite)
-
- def exit(self, message=None, code=0, usage=True):
-
- if message:
- print(message)
- if usage:
- print(Console.USAGE.format(self))
-
- sys.exit(code)
+ runner.run(suite)
def parseArguments(self, argv):
- try:
-
- options, args = getopt.getopt(argv, 'hut:v:', ['help', 'update-files', 'type=', 'verbosity='])
-
- for option, value in options:
- if option in('-h', '--help'):
- self.exit(message=None, code=0)
- if option in('-t', '--type'):
- try:
- self.builder = Console.BUILDERS[value]
- except KeyError:
- self.exit(message='E: Invalid test type "{0}".'.format(value), code=2)
- if option in('-v', '--verbosity'):
- try:
- self.verbosity = int(value)
- except ValueError:
- self.exit(message='E: Verbosity must be an integer.', code=3)
- if option in('-u', '--update-files'):
- self.update = True
-
- if len(args) > 0:
- self.namespace = list(args)
-
- except getopt.GetoptError as oe:
- self.exit(message='E: ' + str(oe), code=1)
+ parser = argparse.ArgumentParser(prog=self.program, description="CSB Test Runner Console.")
+
+ parser.add_argument("-t", "--type", type=str, default="any", choices=list(Console.BUILDERS),
+ help="Type of tests to load from each namespace (default=any)")
+ parser.add_argument("-v", "--verbosity", type=int, default=1,
+ help="Verbosity level passed to unittest.TextTestRunner (default=1).")
+ parser.add_argument("-u", "--update-files", default=False, action="store_true",
+ help="Force update of the test pickles in " + Config.GENERATED_DATA)
+ parser.add_argument("-g", "--generated-resources", type=str, default=Config.GENERATED_DATA,
+ help="Generate, store and load additional test resources in this directory"
+ " (default=" + Config.GENERATED_DATA + ")")
+
+ parser.add_argument("namespaces", nargs='*',
+ help="""An optional list of CSB test dotted namespaces, from which to
+ load tests. '__main__' and '.' are interpreted as the
+ current module. If a namespace ends with an asterisk
+ '.*', all sub-packages will be scanned as well.
+
+ Examples:
+ "csb.test.cases.bio.*"
+ "csb.test.cases.bio.io" "csb.test.cases.bio.utils"
+ ".")""")
+
+ args = parser.parse_args(argv)
+
+ self.builder = Console.BUILDERS[args.type]
+ self.verbosity = args.verbosity
+ self.update = args.update_files
+ self.generated_data = args.generated_resources
+
+ if args.namespaces:
+ self.namespace = args.namespaces
if __name__ == '__main__':
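The getopt-to-argparse migration above replaces the hand-rolled USAGE string and exit() helper with declarative options. A minimal sketch of the same pattern (hypothetical program name and builder set, parsed against a fixed argv for illustration):

```python
import argparse

# stand-in for Console.BUILDERS; only the keys matter for choices=
BUILDERS = {'any': object, 'unit': object, 'functional': object}

parser = argparse.ArgumentParser(prog="runner", description="Test runner sketch.")
parser.add_argument("-t", "--type", default="any", choices=list(BUILDERS),
                    help="type of tests to load (default=any)")
parser.add_argument("-v", "--verbosity", type=int, default=1,
                    help="verbosity passed to unittest.TextTestRunner")
parser.add_argument("-u", "--update-files", action="store_true",
                    help="force update of generated test pickles")
parser.add_argument("namespaces", nargs='*',
                    help="dotted test namespaces; a '.*' suffix scans sub-packages")

args = parser.parse_args(["-t", "unit", "-v", "2", "csb.test.cases.bio.*"])
```

argparse generates the usage/help text and error exits automatically, which is why the old C{USAGE} constant and C{Console.exit} could be deleted wholesale in this hunk.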
diff --git a/csb/test/cases/bio/fragments/__init__.py b/csb/test/cases/bio/fragments/__init__.py
index 3c4dacd..093aae6 100644
--- a/csb/test/cases/bio/fragments/__init__.py
+++ b/csb/test/cases/bio/fragments/__init__.py
@@ -2,7 +2,7 @@ import random
import csb.test as test
import csb.test.cases.bio.hmm
-from csb.bio.fragments import Target, Assignment, TorsionAnglesPredictor
+from csb.bio.fragments import Target, Assignment, TorsionAnglePredictor
from csb.bio.fragments import FragmentCluster, ClusterNode, ClusterDivergingError, ClusterExhaustedError
from csb.bio.fragments import RosettaFragsetFactory
from csb.bio.fragments.rosetta import RosettaFragment, RosettaFragmentMap
@@ -148,8 +148,8 @@ class TestTargetResidue(test.Case):
def testAssign(self):
pass # implicitly covered
- def testVeryBest(self):
- b = self.residue.verybest()
+ def testClosest(self):
+ b = self.residue.closest()
self.assertEqual(b, self.residue.assignments[0].fragment)
def testLongest(self):
@@ -167,14 +167,14 @@ class TestTargetResidue(test.Case):
@test.unit
-class TestTorsionAnglesPredictor(test.Case):
+class TestTorsionAnglePredictor(test.Case):
def setUp(self):
- super(TestTorsionAnglesPredictor, self).setUp()
+ super(TestTorsionAnglePredictor, self).setUp()
self.target = SampleTarget.get()
- self.predictor = TorsionAnglesPredictor(self.target)
+ self.predictor = TorsionAnglePredictor(self.target)
def testTarget(self):
self.assertEqual(self.predictor.target, self.target)
@@ -227,7 +227,7 @@ class TestTorsionAnglesPredictor(test.Case):
# these fragments come from 12..19
target.assign(Assignment(source, 2 + 10, 9 + 10, 2, 9, 't', 1, 1))
- predictor = TorsionAnglesPredictor(target)
+ predictor = TorsionAnglePredictor(target)
pred = predictor.flat_torsion_map()
self.assertEqual(len(pred), target.length)
@@ -565,6 +565,5 @@ class TestRosettaFragmentMap(test.Case):
if __name__ == '__main__':
- TestFragmentCluster.execute()
- #test.Console()
+ test.Console()
\ No newline at end of file
diff --git a/csb/test/cases/bio/io/isites/__init__.py b/csb/test/cases/bio/io/isites/__init__.py
deleted file mode 100644
index 07bd825..0000000
--- a/csb/test/cases/bio/io/isites/__init__.py
+++ /dev/null
@@ -1,38 +0,0 @@
-import csb.test as test
-from csb.bio.io.isites import ISitesParser
-
-
- at test.functional
-class TestISitesParser(test.Case):
-
- def setUp(self):
-
- super(TestISitesParser, self).setUp()
-
- self.file = self.config.getTestFile('ISL5.1.isl')
- self.parser = ISitesParser(self.file)
-
- def testParseAll(self):
-
- lib = self.parser.parseall()
-
- self.assertEqual(len(lib.clusters), 145)
- self.assertEqual(lib[11007], lib['7pcy'][0])
-
- c = lib[11007]
- self.assertEqual(c.motiflen, 11)
- self.assertEqual(c.profilelen, 15)
- self.assertEqual(c.overhang, 2)
- self.assertEqual(c.mda, 96.00)
-
- self.assertEqual(c.representative.accession, '7pcy')
- self.assertEqual(c.representative.angles.length, c.motiflen)
- self.assertEqual(c.representative.angles[1].phi, -149.42)
- self.assertEqual(c.representative.angles[1].psi, 151.65)
- self.assertEqual(c.representative.angles[1].omega, 176.37)
-
-
-
-if __name__ == '__main__':
-
- test.Console()
\ No newline at end of file
diff --git a/csb/test/cases/statistics/pdf/parameterized.py b/csb/test/cases/statistics/pdf/parameterized.py
new file mode 100644
index 0000000..68f9b4a
--- /dev/null
+++ b/csb/test/cases/statistics/pdf/parameterized.py
@@ -0,0 +1,323 @@
+import numpy
+import csb.test as test
+
+from csb.numeric import log
+
+from csb.statistics.pdf.parameterized import ParameterizedDensity
+from csb.statistics.pdf.parameterized import ParameterValueError, ParameterizationError
+from csb.statistics.pdf.parameterized import AbstractParameter, Parameter, NonVirtualParameter
+
+
+
+class Location(NonVirtualParameter):
+
+ def _validate(self, value):
+ return float(value)
+
+class Scale(Parameter):
+
+ def _validate(self, value):
+ return float(value)
+
+ def _compute(self, base_value):
+
+ if base_value == 0.0:
+ return numpy.inf
+ else:
+ return 1.0 / base_value ** 0.5
+
+ def bind_to(self, base):
+
+ if base.name != "precision":
+ raise ValueError(base)
+
+ super(Scale, self).bind_to(base)
+
+class DoubleScale(Parameter):
+
+ def _validate(self, value):
+ return float(value)
+
+ def _compute(self, base_value):
+ return base_value * 2.0
+
+class Precision(Parameter):
+
+ def _validate(self, value):
+
+ if value < 0:
+ raise ParameterValueError(self.name, value)
+ return float(value)
+
+
+class FancyGaussian(ParameterizedDensity):
+
+ def __init__(self, mu=0, precision=1):
+
+ super(FancyGaussian, self).__init__()
+
+ self._register('mu')
+ self._register('sigma')
+ self._register('precision')
+
+ loc = Location(mu)
+ prec = Precision(precision)
+ sigma = Scale(0)
+ sigma.bind_to(prec)
+
+ self.set_params(mu=loc, sigma=sigma, precision=prec)
+
+ @property
+ def mu(self):
+ return self['mu'].value
+
+ @property
+ def sigma(self):
+ return self['sigma'].value
+
+ @property
+ def precision(self):
+ return self['precision'].value
+
+ def log_prob(self, x):
+
+ mu = self.mu
+ sigma = self.sigma
+
+ return log(1.0 / numpy.sqrt(2 * numpy.pi * sigma ** 2)) - (x - mu) ** 2 / (2 * sigma ** 2)
+
+
+@test.unit
+class TestAbstractGenericParameter(test.Case):
+ """
+ Use AbstractParameter as a generic class which accepts values
+ of any type.
+ """
+
+ def setUp(self):
+
+ class Value(object):
+ pass
+
+ class Param(AbstractParameter):
+
+ def _validate(self, value):
+ if not isinstance(value, Value):
+ raise TypeError(value)
+ return value
+
+ self.value = Value()
+ self.param = Param(self.value)
+
+ def testValue(self):
+ self.assertIs(self.param.value, self.value)
+
+ def testSet(self):
+ self.assertRaises(TypeError, self.param.set, 3)
+
+
+@test.unit
+class TestParameter(test.Case):
+ """
+ This is the main test case with complete coverage for AbstractParameter's
+ methods and behavior. Covers also Parameter.
+
+ computed -- leaf
+ /
+ base -- computed2
+ \
+ computed3
+ """
+
+ def setUp(self):
+ self.base = Precision(1.2)
+ self.computed = Scale(100, base=self.base)
+ self.computed2 = Scale(200, base=self.base)
+ self.computed3 = Scale(300, base=self.base)
+ self.leaf = DoubleScale(400, base=self.computed)
+
+ def testConstructor(self):
+ # make sure newly constructed parameters are left in a consistent state
+ # to avoid unnecessary consistency updates
+ self.assertTrue(self.base._consistent)
+ self.assertTrue(Scale(1)._consistent)
+
+ def testName(self):
+
+ self.assertEqual(self.base.name, "precision")
+ self.assertEqual(self.computed.name, "scale")
+ self.assertEqual(Scale(name="TesT").name, "TesT")
+
+ def testValue(self):
+
+ self.assertEqual(self.base.value, 1.2)
+ self.assertEqual(self.computed.value, 1.0 / numpy.sqrt(self.base.value))
+ self.assertEqual(self.computed2.value, 1.0 / numpy.sqrt(self.base.value))
+ self.assertEqual(self.leaf.value, self.computed.value * 2)
+
+ # turn self.base into a virtual parameter
+ self.base.bind_to(Precision(12.2))
+ self.assertEqual(self.base.value, 12.2)
+
+ def testIsVirtual(self):
+
+ self.assertFalse(self.base.is_virtual)
+ self.assertTrue(self.computed.is_virtual)
+
+ self.base.bind_to(Precision(12.2))
+ self.assertTrue(self.base.is_virtual)
+
+ def testSet(self):
+
+ base_initial_value = self.base._value
+
+ # recompute all derivatives from the initial value of base
+ self.assertEqual(self.computed._value, 100)
+ self.leaf._ensure_consistency()
+ self.computed2._ensure_consistency()
+ self.computed3._ensure_consistency()
+
+ # set self.base - it should remain consistent because it is not computed
+ self.assertTrue(self.base._consistent)
+ self.base.set(2.2)
+ self.assertTrue(self.base._consistent)
+ self.assertEqual(self.base.value, 2.2)
+
+ # self.computed and self.leaf should be inconsistent now that their base is updated
+ self.assertFalse(self.computed._consistent)
+ self.assertFalse(self.leaf._consistent)
+ self.assertEqual(self.computed._value, 1.0 / numpy.sqrt(base_initial_value))
+ self.assertEqual(self.leaf._value, 2.0 / numpy.sqrt(base_initial_value))
+ # retrieving self.computed's value should trigger updates up to self.computed
+ recomputed = self.computed.value
+ self.assertTrue(self.computed._consistent)
+ self.assertEqual(recomputed, 1.0 / numpy.sqrt(self.base._value))
+ # self.leaf is still inconsistent
+ self.assertFalse(self.leaf._consistent)
+ self.assertEqual(self.leaf._value, 2.0 / numpy.sqrt(base_initial_value))
+ self.assertIs(self.leaf._nearest_consistent_base()[-1], self.computed)
+ # until we request its value
+ recomputed = self.leaf.value
+ self.assertTrue(self.leaf._consistent)
+ self.assertEqual(recomputed, 2.0 / numpy.sqrt(self.base._value))
+ self.assertEqual(recomputed, 2.0 * self.computed._value)
+ # make sure the other two branches are still inconsistent
+ initial_value = 1.0 / numpy.sqrt(base_initial_value)
+ self.assertEqual(self.computed2._value, initial_value)
+ self.assertEqual(self.computed3._value, initial_value)
+ # until they get used
+ recomputed = self.computed2.value
+ self.assertTrue(self.computed2._consistent)
+ self.assertEqual(recomputed, 1.0 / numpy.sqrt(self.base._value))
+
+ # attempt to set self.computed - not allowed
+ self.assertRaises(ParameterizationError, self.computed.set, 2)
+
+ # attempt to set a negative Precision
+ self.assertRaises(ParameterValueError, self.base.set, -2)
+ # attempt to assign a non-float - not allowed in the Parameter specialization
+ self.assertRaises(ParameterValueError, Parameter().set, object())
+
+ def testBindTo(self):
+
+ # can't bind self.base to itself
+ self.assertRaises(ParameterizationError, self.base.bind_to, self.base)
+ # deeper circular dependency
+ self.assertRaises(ParameterizationError, self.base.bind_to, self.computed)
+
+ # self.base is not virtual and therefore must be consistent
+ self.assertTrue(self.base._consistent)
+
+ # make it virtual - should get inconsistent now
+ self.base.bind_to(Precision(12.2))
+ self.assertFalse(self.base._consistent)
+ self.assertTrue(self.base.is_virtual)
+
+ # retrieving its value should trigger the consistency cascade
+ self.assertEqual(self.base.value, 12.2)
+ self.assertTrue(self.base._consistent)
+
+ def testFindBaseParameter(self):
+
+ self.assertIs(self.base.find_base_parameter(), self.base)
+ self.assertIs(self.computed.find_base_parameter(), self.base)
+
+
+@test.unit
+class TestNonVirtualParameter(test.Case):
+ """
+ Make sure explicit NonVirtualParameter-s are updatable and
+ refuse binding requests
+ """
+
+ def setUp(self):
+ self.param = Location()
+
+ def testConstructor(self):
+ base = Parameter()
+ self.assertRaises(ParameterizationError, lambda: Location(base=base))
+
+ def testIsVirtual(self):
+ self.assertFalse(self.param.is_virtual)
+
+ def testBindTo(self):
+ base = Parameter()
+ self.assertRaises(ParameterizationError, self.param.bind_to, base)
+
+ def testSet(self):
+ self.param.set(22)
+ self.assertEqual(self.param.value, 22)
+
+
+@test.unit
+class TestParameterizedDensity(test.Case):
+
+ def setUp(self):
+ self.pdf = FancyGaussian(2, 5)
+
+ def testConstructor(self):
+
+ class Density(ParameterizedDensity):
+
+ def __init__(self, p):
+ super(Density, self).__init__()
+ self._register('p')
+ self.set_params(p=p)
+
+ def log_prob(self, x):
+ return x
+
+ self.assertRaises(TypeError, Density, 4)
+
+ def testProperties(self):
+
+ self.assertEqual(self.pdf.mu, 2)
+ self.assertEqual(self.pdf.precision, 5)
+ self.assertAlmostEqual(self.pdf.sigma, 0.4472, places=3)
+
+ def testParameterChaining(self):
+
+ self.assertEqual(self.pdf.precision, 5)
+ self.assertAlmostEqual(self.pdf.sigma, 0.4472, places=3)
+
+ self.pdf['precision'].set(2)
+
+ self.assertEqual(self.pdf.precision, 2)
+ self.assertAlmostEqual(self.pdf.sigma, 0.7071, places=3)
+
+ def testAssignment(self):
+
+ self.pdf['sigma'] = Scale(55)
+ self.assertEqual(self.pdf.sigma, 55)
+ self.assertEqual(self.pdf['sigma'].name, 'scale')
+
+ def assign(i):
+ self.pdf['sigma'] = i
+
+ self.assertRaises(TypeError, assign, 55)
+
+
+
+if __name__ == '__main__':
+ test.Console()
+
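The new test file above exercises csb's parameterized-density machinery: computed parameters bound to a base parameter, lazy invalidation on `set`, and recomputation on value access. As a rough standalone illustration of that pattern only, here is a minimal sketch; all names (`Param`, `ScaleParam`) are hypothetical and this is not the csb implementation:

```python
# Minimal sketch of the lazy "computed parameter" pattern the tests above
# exercise. Illustrative only -- not the csb API.
import math


class Param(object):
    """A parameter that may be bound to a base parameter and derive its
    value from it lazily (recomputed on first access after the base changes)."""

    def __init__(self, value=0.0, base=None):
        self._value = float(value)
        self._base = None
        self._consistent = True
        self._derivatives = []
        if base is not None:
            self.bind_to(base)

    def _compute(self, base_value):
        # identity by default; subclasses override
        return base_value

    def bind_to(self, base):
        self._base = base
        base._derivatives.append(self)
        self._invalidate()

    def _invalidate(self):
        # mark this parameter and everything derived from it stale
        self._consistent = False
        for d in self._derivatives:
            d._invalidate()

    def set(self, value):
        if self._base is not None:
            raise RuntimeError("computed parameters cannot be set directly")
        self._value = float(value)
        for d in self._derivatives:
            d._invalidate()

    @property
    def value(self):
        if not self._consistent:
            self._value = self._compute(self._base.value)
            self._consistent = True
        return self._value


class ScaleParam(Param):
    # sigma = 1 / sqrt(precision), mirroring the Scale class above
    def _compute(self, base_value):
        return 1.0 / math.sqrt(base_value)


precision = Param(4.0)
sigma = ScaleParam(base=precision)
assert sigma.value == 0.5       # 1 / sqrt(4)
precision.set(16.0)             # invalidates sigma lazily
assert sigma.value == 0.25      # recomputed on access
```

The point of the design is that updating one base parameter is O(1); dependent values pay the recomputation cost only when (and if) they are next read, which is what `testSet` above verifies via the `_consistent` flags.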
diff --git a/csb/test/cases/statistics/samplers/__init__.py b/csb/test/cases/statistics/samplers/__init__.py
index 219648d..598862b 100644
--- a/csb/test/cases/statistics/samplers/__init__.py
+++ b/csb/test/cases/statistics/samplers/__init__.py
@@ -1,8 +1,9 @@
import numpy as np
+
import csb.test as test
-import csb
+import csb.numeric
-from csb.statistics.pdf import Normal, AbstractDensity
+from csb.statistics.pdf import Normal, BaseDensity
from csb.numeric.integrators import AbstractGradient, VelocityVerlet, LeapFrog, FastLeapFrog
from csb.numeric import InvertibleMatrix
@@ -17,14 +18,15 @@ from csb.statistics.samplers.mc.multichain import HMCStepRENS, HMCStepRENSSwapPa
from csb.statistics.samplers.mc.multichain import AbstractSwapCommunicator, AbstractExchangeMC
from csb.statistics.samplers.mc.multichain import AbstractSwapParameterInfo, ReplicaHistory
from csb.statistics.samplers.mc.singlechain import HMCSampler, RWMCSampler, AbstractNCMCSampler
-from csb.statistics.samplers.mc.singlechain import AbstractSingleChainMC
+from csb.statistics.samplers.mc.singlechain import AbstractSingleChainMC, SimpleProposalCommunicator
from csb.statistics.samplers.mc.propagators import RWMCPropagator, HMCPropagator, MDPropagator
from csb.statistics.samplers.mc.propagators import AbstractNCMCPropagator, AbstractPropagator
from csb.statistics.samplers.mc.neqsteppropagator import ReducedHamiltonian, HamiltonianSysInfo
from csb.statistics.samplers.mc.neqsteppropagator import PlainMDPropagation, PlainMDPropagationParam
from csb.statistics.samplers.mc.neqsteppropagator import AbstractMDPropagation, HMCPropagation
from csb.statistics.samplers.mc.neqsteppropagator import Protocol, Step, AbstractPerturbation
-from csb.statistics.samplers.mc.neqsteppropagator import ReducedHamiltonianPerturbation, AbstractPropagation
+from csb.statistics.samplers.mc.neqsteppropagator import ReducedHamiltonianPerturbation
+from csb.statistics.samplers.mc.neqsteppropagator import AbstractPropagation
from csb.statistics.samplers.mc.neqsteppropagator import NonequilibriumStepPropagator
from csb.statistics.samplers.mc.neqsteppropagator import NonequilibriumTrajectory
from csb.statistics.samplers.mc.neqsteppropagator import HMCPropagationParam
@@ -37,7 +39,7 @@ class SamplePDF(Normal):
def grad(self, x, t):
return x / (self.sigma ** 2)
-class MultimodalPDF(AbstractDensity):
+class MultimodalPDF(BaseDensity):
def log_prob(self, x):
return sum(-2.5 * np.cos(2.5 * x) - 0.04 * x ** 2)
@@ -45,7 +47,7 @@ class MultimodalPDF(AbstractDensity):
def grad(self, x, t):
return -6.25 * np.sin(2.5 * x) + 0.08 * x
-class Multimodal2DPDF(AbstractDensity):
+class Multimodal2DPDF(BaseDensity):
k = 0.5
@@ -62,6 +64,7 @@ class Multimodal2DPDF(AbstractDensity):
return np.array([(-6.25 * np.sin(2.5 * x[0]) + 0.08 * x[0]) * self._E2(x),
self._E1(x) * self.k * x[1]])
+
@test.functional
class TestMCPropagators(test.Case):
@@ -160,6 +163,7 @@ class TestMCPropagators(test.Case):
traj = gen.generate(init_state, self.nits, return_trajectory=True)
self.checkResult(traj)
+
@test.functional
class TestMultichain(test.Case):
@@ -295,17 +299,17 @@ class MockSampler(AbstractSingleChainMC):
def __init__(self, pdf, state, temperature=1.0):
- self._state = state
- self._pdf = pdf
- self._temperature = temperature
-
+ super(MockSampler, self).__init__(pdf, state, temperature)
+
def _propose(self):
+
+ pcom = SimpleProposalCommunicator(self._state, State(self._state.position * 2.0))
- pass
+ return pcom
- def _calc_pacc(self):
+ def _calc_pacc(self, proposal_communicator):
- pass
+ return 0.42
class MockedAbstractExchangeMC(AbstractExchangeMC):
@@ -432,12 +436,6 @@ class MockPropagator(AbstractPropagator):
final_state = State(init_state.position * 2, init_state.momentum * 2)
return Trajectory([init_state, final_state])
-
-class MDPropagationMocked(AbstractMDPropagation):
-
- def _propagator_factory(self):
-
- return MockMDPropagator()
class PlainMDPropagationMocked(PlainMDPropagation):
@@ -911,6 +909,160 @@ class TestReplicaHistory(test.Case):
assert(self._assertIdenticalProjTrajs(samples, swap_interval, first_swap))
+@test.unit
+class TestAbstractSingleChainMC(test.Case):
+
+ def setUp(self):
+
+ self._sampler = MockSampler(pdf=SamplePDF(), state=State(np.array([1.0])))
+
+ def testAcceptProposal(self):
+
+ proposal_state = State(np.array([1.234]))
+ res = self._sampler._accept_proposal(proposal_state)
+
+ assert(self._sampler.state == proposal_state)
+
+ def testUpdateStatistics(self):
+
+ nmoves_old = self._sampler._nmoves
+ accepted_old = self._sampler._accepted
+
+ self._sampler._update_statistics(True)
+
+ assert(self._sampler._nmoves == nmoves_old + 1)
+ assert(self._sampler._accepted == accepted_old + 1)
+
+ nmoves_old = self._sampler._nmoves
+ accepted_old = self._sampler._accepted
+
+ self._sampler._update_statistics(False)
+
+ assert(self._sampler._nmoves == nmoves_old + 1)
+ assert(self._sampler._accepted == accepted_old)
+
+ def testEnergy(self):
+
+ E = -self._sampler._pdf.log_prob(self._sampler.state.position)
+
+ assert(E == self._sampler.energy)
+
+ def testAcceptanceRate(self):
+
+ self._sampler._nmoves = 6
+ self._sampler._accepted = 3
+
+ assert(self._sampler.acceptance_rate == 0.5)
+
+ self._sampler._nmoves = 0
+ self._sampler._accepted = 0
+
+ assert(self._sampler.acceptance_rate == 0.0)
+
+ def testLastMoveAccepted(self):
+
+ self._sampler._last_move_accepted = False
+
+ np.random.seed(5)
+
+ self._sampler.sample()
+
+ assert(self._sampler.last_move_accepted == True)
+
+ def testTemperature(self):
+
+ assert(self._sampler.temperature == self._sampler._temperature)
+
+ def testSample(self):
+
+ np.random.seed(5)
+
+ ipos = np.array([2.0])
+ self._sampler.state = State(ipos)
+
+ res = self._sampler.sample()
+
+ assert(res.position == ipos * 2.0)
+ assert(res.momentum == None)
+
+ assert(self._sampler.state.position == ipos * 2.0)
+ assert(self._sampler.state.momentum == None)
+
+
+class MockedHMCSampler(HMCSampler):
+
+ def _propagator_factory(self):
+
+ return MockPropagator()
+
+ # def _propose(self):
+
+ # print np.random.normal(size=2)
+
+
+
+@test.unit
+class TestHMCSampler(test.Case):
+
+ def setUp(self):
+
+ self._mass_matrix = InvertibleMatrix(np.array([[2.0, 0.0], [1.0, 3.0]]))
+ self._initstate = State(np.array([1.0, 2.0]))
+ self._sampler = MockedHMCSampler(pdf=SamplePDF(), state=self._initstate.clone(),
+ gradient=SamplePDF().grad,
+ timestep=0.3, nsteps=25, mass_matrix=self._mass_matrix)
+
+ def testPropose(self):
+
+ np.random.seed(5)
+ initmom = np.random.multivariate_normal(mean=np.zeros(len(self._initstate.position)),
+ cov=self._mass_matrix)
+
+ self._sampler.state = self._initstate.clone()
+
+ np.random.seed(5)
+ res = self._sampler._propose()
+
+ assert(np.all(res.current_state.position == self._initstate.position))
+ assert(np.all(res.current_state.momentum == initmom))
+
+ assert(np.all(res.proposal_state.position == self._initstate.position * 2.0))
+ assert(np.all(res.proposal_state.momentum == initmom * 2))
+
+ def testHamiltonian(self):
+
+ state = State(np.array([1.0, 2.0]), np.array([2.0, 1.0]))
+
+ assert(self._sampler._hamiltonian(state) == 3.5 - np.log(1.0 / (2.0 * np.pi)))
+
+ def testCalcPacc(self):
+
+ istate = State(np.array([1.0, 2.0]), np.array([2.0, 1.0]))
+ fstate = State(np.array([1.0, 2.0]) * 2.0, np.array([2.0, 1.0]) * 2.0)
+ pcom_fstate = fstate.clone()
+ pcom = SimpleProposalCommunicator(istate.clone(), pcom_fstate)
+ self._sampler.state.momentum = None
+
+ pacc = self._sampler._calc_pacc(pcom)
+
+ dH = self._sampler._hamiltonian(fstate) - self._sampler._hamiltonian(istate)
+
+ assert(csb.numeric.exp(-dH / self._sampler.temperature) == pacc)
+ assert(pcom_fstate.momentum == None)
+
+
+ pcom_fstate = fstate.clone()
+ pcom = SimpleProposalCommunicator(istate.clone(), pcom_fstate)
+ self._sampler.state.momentum = np.array([1.0, 4.0])
+
+ pacc = self._sampler._calc_pacc(pcom)
+
+ dH = self._sampler._hamiltonian(fstate) - self._sampler._hamiltonian(istate)
+
+ assert(csb.numeric.exp(-dH / self._sampler.temperature) == pacc)
+ assert(np.all(pcom_fstate.momentum == np.array([1.0, 4.0])))
+
+
if __name__ == '__main__':
test.Console()
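The `testCalcPacc` case above checks that the sampler's acceptance probability equals `exp(-dH / T)`, where `dH` is the difference in Hamiltonians (potential plus kinetic energy) between proposal and current state. A self-contained sketch of that standard Metropolis rule for HMC follows; the function names are illustrative, not the csb API:

```python
# Sketch of the HMC Metropolis acceptance rule verified above:
# p_acc = min(1, exp(-(H(proposal) - H(initial)) / T)). Illustrative only.
import math


def hamiltonian(position, momentum, log_prob):
    potential = -log_prob(position)                   # U(q) = -log p(q)
    kinetic = 0.5 * sum(p * p for p in momentum)      # K(p) = |p|^2 / 2
    return potential + kinetic


def acceptance_probability(initial, proposal, log_prob, temperature=1.0):
    dH = (hamiltonian(*proposal, log_prob=log_prob)
          - hamiltonian(*initial, log_prob=log_prob))
    return min(1.0, math.exp(-dH / temperature))


# standard normal target: log p(q) = -|q|^2 / 2 (up to a constant)
logp = lambda q: -0.5 * sum(x * x for x in q)

initial = ([1.0], [0.5])
proposal = ([0.0], [0.5])   # moves toward the mode, so dH < 0
assert acceptance_probability(initial, proposal, logp) == 1.0
```

Note that the mocked test above asserts the raw `exp(-dH / T)` value rather than the clamped `min(1, ...)` form, since the clamping is irrelevant for the specific states it constructs.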
--