[med-svn] [profit] 08/11: New upstream version 3.1
Andreas Tille
tille at debian.org
Mon Dec 18 19:57:28 UTC 2017
This is an automated email from the git hooks/post-receive script.
tille pushed a commit to branch master
in repository profit.
commit 96785571da3bb189199365b9842253299fc5252d
Author: Andreas Tille <tille at debian.org>
Date: Mon Dec 18 20:55:31 2017 +0100
New upstream version 3.1
---
00Read.Me | 247 ++
debian/copyright => COPYING.DOC | 51 +-
DOS.txt | 233 ++
HISTORY | 254 ++
INSTALL | 220 ++
ProFit.help | 867 ++++
debian/README.Debian | 12 -
debian/README.source | 7 -
debian/bin/profit | 5 -
debian/changelog | 6 -
debian/compat | 1 -
debian/control | 46 -
debian/docs | 4 -
debian/patches/AllowReadline.patch | 15 -
debian/patches/AllowRedundantCleans.patch | 13 -
debian/patches/series | 2 -
debian/profit-3d.dirs | 2 -
debian/profit-3d.install | 1 -
debian/profit-data.dirs | 1 -
debian/profit-data.install | 2 -
debian/profit.links | 1 -
debian/rules | 18 -
debian/source/format | 1 -
debian/watch | 7 -
doc/Makefile | 25 +
doc/ProFit.css | 52 +
doc/ProFit.pdf | Bin 0 -> 194298 bytes
doc/ProFit.tex | 1626 ++++++++
mdm78.mat | 26 +
src/Makefile | 62 +
src/Makefile_dos | 68 +
src/NWAlign.c | 2344 +++++++++++
src/NWAlign.p | 33 +
src/ProFit.h | 343 ++
src/bioplib/00READ.ME | 11 +
src/bioplib/ApMatPDB.c | 96 +
src/bioplib/AtomNameMatch.c | 265 ++
src/bioplib/COPYING.DOC | 170 +
src/bioplib/CopyPDB.c | 96 +
src/bioplib/CreateRotMat.c | 128 +
src/bioplib/DupePDB.c | 120 +
src/bioplib/FindNextResidue.c | 122 +
src/bioplib/FindZonePDB.c | 237 ++
src/bioplib/FreeStringList.c | 117 +
src/bioplib/GetPDBChainLabels.c | 135 +
src/bioplib/GetWord.c | 261 ++
src/bioplib/IndexPDB.c | 108 +
src/bioplib/KillLeadSpaces.c | 102 +
src/bioplib/LegalAtomSpec.c | 100 +
src/bioplib/MatMult33_33.c | 98 +
src/bioplib/MatMult3_33.c | 91 +
src/bioplib/MathType.h | 74 +
src/bioplib/OpenFile.c | 158 +
src/bioplib/PDB2Seq.c | 251 ++
src/bioplib/ParseRes.c | 233 ++
src/bioplib/ReadPDB.c | 1074 +++++
src/bioplib/ReadPIR.c | 445 +++
src/bioplib/SelAtPDB.c | 186 +
src/bioplib/StoreString.c | 160 +
src/bioplib/StringToUpper.c | 108 +
src/bioplib/SysDefs.h | 80 +
src/bioplib/TranslatePDB.c | 86 +
src/bioplib/WholePDB.c | 335 ++
src/bioplib/WindIO.c | 352 ++
src/bioplib/WindIO.h | 60 +
src/bioplib/WritePDB.c | 174 +
src/bioplib/aalist.c | 432 ++
src/bioplib/aalist.h | 87 +
src/bioplib/align.c | 1327 +++++++
src/bioplib/angle.c | 148 +
src/bioplib/angle.h | 80 +
src/bioplib/array.h | 73 +
src/bioplib/array2.c | 154 +
src/bioplib/chindex.c | 110 +
src/bioplib/countchar.c | 108 +
src/bioplib/fit.c | 373 ++
src/bioplib/fit.h | 58 +
src/bioplib/fsscanf.c | 344 ++
src/bioplib/fsscanf.h | 67 +
src/bioplib/general.h | 111 +
src/bioplib/help.c | 323 ++
src/bioplib/help.h | 51 +
src/bioplib/macros.h | 483 +++
src/bioplib/matrix.h | 72 +
src/bioplib/openorpipe.c | 147 +
src/bioplib/padterm.c | 103 +
src/bioplib/parse.c | 467 +++
src/bioplib/parse.h | 116 +
src/bioplib/pdb.h | 360 ++
src/bioplib/port.h | 99 +
src/bioplib/seq.h | 122 +
src/bioplib/throne.c | 198 +
src/bioplib/upstrncmp.c | 109 +
src/fitting.c | 3359 ++++++++++++++++
src/fitting.p | 66 +
src/main.c | 6159 +++++++++++++++++++++++++++++
src/main.p | 123 +
src/protos.h | 87 +
src/todo.c | 95 +
src/todo.p | 2 +
100 files changed, 28459 insertions(+), 182 deletions(-)
diff --git a/00Read.Me b/00Read.Me
new file mode 100644
index 0000000..592c03c
--- /dev/null
+++ b/00Read.Me
@@ -0,0 +1,247 @@
+
+ ProFit V3.1
+ ===========
+
+**********************************************************************
+* NOTE: This document is rather out of date - it does not describe *
+* all the current features of ProFit. You should look at the *
+* ProFit manual at http://www.bioinf.org.uk/programs/profit/doc/ *
+**********************************************************************
+
+ (c) Dr. Andrew C.R. Martin, SciTech Software, 1992-2009
+ Some features added and bugs fixed while working
+ for University College London.
+ Some features added and bugs fixed while working
+ for University of Reading under consultancy to
+ Inpharmatica, Ltd.
+ Features in V3.0 (c) Dr. Craig T. Porter, UCL, 2008-2009
+ funded by BBSRC
+
+ The ultimate protein least squares fitting program!
+
+
+ To access the ProFit download page, point your browser at
+ http://www.bioinf.org.uk/software/
+
+ ProFit (pronounced Pro-Fit, not profit!) is designed to be the
+ultimate least squares fitting program and is written to be as easily
+portable between systems as possible. It performs the basic function
+of fitting one protein structure to another, but allows as much
+flexibility as possible in this procedure. Thus one can specify
+subsets of atoms to be considered, specify zones to be fitted by
+number, sequence, or by sequence alignment.
+
+ The program will output an RMS deviation and optionally the fitted
+coordinates. RMS deviations may also be calculated without actually
+performing a fit. Zones for calculating the RMS can be different from
+those used for fitting.
+
+ The interface is command driven, but may offer a graphical
+environment at a later date.
+
+ This document gives an overview of the features and command
+language, but isn't regularly kept up-to-date to reflect new features.
+A comprehensive manual describing all the commands is in the
+'doc' subdirectory of the full distribution.
+
+ Features:
+ 1. Portability
+ 2. Specify atom subsets
+ 3. Specify zones:
+ a) Numerically
+ b) By sequence
+ c) By auto sequence alignment
+ 4. Output RMS over
+ a) Fitted region
+ b) Any other region
+ 5. Output fitted coordinates
+ 6. Help facility
+
+1. Portability
+--------------
+The code is written in highly portable ANSI-C. A future feature may
+be a Tcl/Tk-based graphical interface, but this will be separate
+from the main program. Earlier versions of the code supported
+windowing, but this has been dropped from the release version in
+favour of a separate GUI layer which will call ProFit with a
+control script.
+
+2. Atom Subsets
+---------------
+These may be specified for both fitting and RMS calculation.
+e.g. ATOMS N,CA,C,O
+
+3a. Numeric zones
+-----------------
+These are specified as follows:
+ ZONE * Fits all residues in both structures, clearing
+ other zone specifications.
+ ZONE *:1-100 Fits all in Reference with 1-100 in Mobile
+ ZONE 10-20 Fits 10-20 in Reference with 10-20 in Mobile
+ ZONE 10-20:30-40 Fits 10-20 in Reference with 30-40 in Mobile
+ ZONE -10:50-59 Fits 1-10 in Reference with 50-59 in Mobile
+
+If you have multiple chains, the chain name may precede
+the residue number (e.g. ZONE L24-L24); if unspecified, the first
+chain is assumed. If the PDB file has insertions, the zone may include
+an insert code by placing it after the residue number (e.g. L25B or 50C).
+
+Optionally, the chain name may be separated from the residue number by a
+full stop. Using the full stop also makes the statement case-sensitive.
+In practice, the full stop separator is used with either numeric or
+lowercase chain names.
+
+
+3b. Sequence zone specification
+-------------------------------
+Typing a sequence will search for this sequence in both structures and
+fit these regions.
+
+ ZONE CAR:VNS Fits first occurrence of CAR in Reference with
+ first occurrence of VNS in Mobile
+ ZONE CAR,10:VNS,10 Fits 10 residues from first occurrence of CAR
+ in Reference with first occurrence of VNS in
+ Mobile
+ ZONE CAR,5/2 Fits 5 residues from second occurrence of CAR
+ in both structures
+ ZONE 24-34:EIR,11 Fits 24-34 in Reference with 11 residues from
+ first occurrence of EIR in Mobile
+
+3c. Auto sequence alignment
+---------------------------
+Needleman & Wunsch sequence alignment is provided to perform
+fitting on sequence aligned residues. The keyword ALIGN sets up zones
+from the matched parts of the sequence alignment. Alternatively, one
+may read a sequence alignment from a file using the READALIGNMENT
+keyword.
+
+4a. RMS over fitted region
+--------------------------
+The program will report the RMS deviation over the fitted region once
+fitting has been performed with the command FIT. The RATOMS keyword
+has the same syntax as the ATOMS keyword and will cause the specified
+atoms to be included in the RMS calculation.
+
+4b. RMS over other regions
+--------------------------
+The keyword RZONE has the same syntax as the keyword ZONE, but causes
+the RMS to be calculated over the specified residue zone.
+
+5. Output fitted coordinates
+----------------------------
+WRITE <file>
+causes the fitted coordinates to be written to file.
+
+6. Help facility
+----------------
+A comprehensive VAX style help facility is provided. Type HELP
+once in ProFit to enter the help facility.
+
+Running ProFit
+==============
+profit [-h] [reference.pdb mobile.pdb]
+
+Optionally you may specify the reference and mobile PDB files on
+the command line. Otherwise you may use the REFERENCE and MOBILE
+commands once in the program.
+
+The -h flag causes the program to read HETATM records from the
+PDB files. The default is to ignore them. This may be changed
+using the (NO)HETATOMS commands.
+
+Keywords
+========
+
+Basic commands
+--------------
+REFERENCE file.pdb Specify the reference structure
+MOBILE file.pdb Specify the structure to be fitted
+FIT Causes the structures to be fitted on the
+ current residue/atom selection. The RMS
+ deviation will be reported over the fitted
+ selection.
+
+Specifying atoms
+----------------
+ATOMS atm[,atm]... Specify atoms to be considered in fitting.
+BVALUE cutoff [REF|MOB] Ignore atoms (in fitting and RMS calculation)
+ if B-value greater than specified cutoff.
+ Optional REF or MOB causes calculation to
+ be restricted to the specified structure.
+ A negative cutoff may be used to ignore
+ atoms with B-values less than the absolute
+ (i.e. unsigned) value of the cutoff.
+
+Specifying zones
+----------------
+ZONE CLEAR|((*|(X...[,n][/m])|(j[-k]))[:(*|(X...[,n][/m])|(j[-k]))])
+ Specify the zone to be fitted either
+ numerically, or as a sequence. Repeating the
+ specification after a colon refers to the
+ mobile structure.
+ X represents a residue type or ? wildcard.
+ n represents a number of residues.
+ m represents the m'th occurrence of the sequence.
+ j and k represent residue numbers.
+DELZONE CLEAR|((*|(X...[,n][/m])|(j[-k]))[:(*|(X...[,n][/m])|(j[-k]))])
+ Delete fitting zone. This command uses the same
+ syntax as ZONE.
+SETCENTRE CLEAR|(*|j[:j]) Specifies a single residue as the centre of
+ fitting.
+NUMBER (RESIDUE|SEQUENTIAL) Specify zones based on residue numbering or
+ sequential numbering
+ALIGN Perform N/W alignment. Set the fitting zones
+ based on the sequence aligned residues.
+GAPPEN penalty [extend_penalty] Specify gap penalty and extension penalty
+ for ALIGN command (default: 10 and 2 resp.)
+READALIGNMENT file.pir Read a PIR alignment file and set zones from
+ that.
+
+Calculating RMSD over different atoms/zones
+-------------------------------------------
+RATOMS atm[,atm]... Specify atoms to be considered in RMSd calc.
+ Then reports the RMSd.
+RZONE CLEAR|((*|(X...[,n][/m])|(j[-k]))[:(*|(X...[,n][/m])|(j[-k]))])
+ Specify zone to be considered in RMSd calc.
+ Then reports the RMSd.
+DELRZONE CLEAR|((*|(X...[,n][/m])|(j[-k]))[:(*|(X...[,n][/m])|(j[-k]))])
+ Remove zone for RMSd calculation.
+
+Obtaining output
+----------------
+WRITE file.pdb Writes the fitted mobile coordinates
+MATRIX Displays the rotation and translation matrices
+ for fitting.
+STATUS Shows information on the current selections.
+RMS Redisplays the RMS calculated over the current
+ zones.
+RESIDUE [filename] Display a by-residue RMS over current RATOMS
+ and RZONES (ATOMS and ZONES if RATOMS and RZONES
+ are not specified)
+NFITTED Report number of atom pairs fitted
+
+Reading non-protein atoms
+-------------------------
+HETATOMS Read HETATOMS (default if started with -h)
+NOHETATOMS Do not read HETATOMS (default unless started with
+ -h)
+
+Weighting the fit
+-----------------
+WEIGHT Weight the fitting by the B-value
+BWEIGHT Weight the fitting by the inverse of the B-value
+NOWEIGHT No weighting (default)
+
+Miscellaneous
+-------------
+HELP [keywd] Gives help on keywd if specified; otherwise
+ lists valid keywords.
+IGNOREMISSING Ignore missing atoms during fitting
+NOIGNOREMISSING Restore default of generating an error if there are
+ any missing atoms
+QUIT Exits the program.
+
+
+Not yet implemented:
+====================
+GRAPHIC Start graphical alignment
diff --git a/debian/copyright b/COPYING.DOC
similarity index 63%
rename from debian/copyright
rename to COPYING.DOC
index 80cbd7b..1960f4a 100644
--- a/debian/copyright
+++ b/COPYING.DOC
@@ -1,73 +1,48 @@
-Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
-Upstream-Name: profit
-Source: http://www.bioinf.org.uk/software/swreg.html
-Files: *
-Copyright: <years> <put author's name and email here>
- <years> <likewise for another author>
-License: non-distributable
+
ProFit
- .
+
Protein Least Squares Fitting
- .
- .
- .
- .
+
+
+
+
ProFit is copyright Dr. Andrew C.R. Martin, SciTech Software and
UCL 1992-2009
Dr. Craig Porter, UCL 2008-2009
Modifications and enhancements were also made while working at
The University of Reading under consultancy to Inpharmatica, Ltd.
Development of V2.6-V3.1 was funded by the BBSRC
- .
+
This program is not in the public domain.
- .
+
It may not be copied or made available to third parties, but may be
freely used by anyone who has obtained it directly from the author's
Web Site. Please register your name and email address on the web site
when you download the software.
- .
+
The code may not be made available on other FTP sites without express
permission from the author.
- .
+
The code may be modified as required, but any modifications must be
documented so that the person responsible can be identified. If
someone else breaks this code, the author doesn't want to be blamed
for code that does not work! You may not distribute any
modifications, but are encouraged to send them to the author so
that they may be incorporated into future versions of the code.
- .
+
The code may not be sold commercially, but the program may be used for
commercial purposes.
- .
+
IN NO EVENT SHALL THE AUTHOR OR ANY INSTITUTION IN WHICH HE IS WORKING
(INCLUDING, BUT NOT LIMITED TO, UNIVERSITY COLLEGE LONDON) BE LIABLE
TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL
DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE
AND ITS DOCUMENTATION, EVEN IF THE AUTHOR HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
- .
+
THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS"
BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT,
UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
-
-Files: debian/*
-Copyright: 2013 Steffen Moeller <steffen_moeller at gmx.de>
-License: GPL-2+
- This package is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation; either version 2 of the License, or
- (at your option) any later version.
- .
- This package is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
- .
- You should have received a copy of the GNU General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>
- .
- On Debian systems, the complete text of the GNU General
- Public License version 2 can be found in "/usr/share/common-licenses/GPL-2".
diff --git a/DOS.txt b/DOS.txt
new file mode 100644
index 0000000..0d30f1a
--- /dev/null
+++ b/DOS.txt
@@ -0,0 +1,233 @@
+This file contains a makefile for building ProFit under DOS using the
+GNU C-compiler
+
+This was kindly supplied by Wolfgang Schechinger
+(diabetes at lrz.uni-muenchen.de). I have neither tried this, nor can in
+any way support it, correct any problems, etc. Please address any
+problems or queries to Wolfgang.
+
+I have tidied up the text a little, but otherwise the rest of this
+file is attributable to Wolfgang.
+
+
+-------------------- START OF WOLFGANG'S FILE ----------------------
+
+Some notes:
+
+The command.com environment size in the config.sys file was set to
+2048 bytes.
+
+The major problems that occurred during the setup of the makefile were:
+
++ DOS command lines have a length limit. Thus I decided to make the
+compiler produce all object files with very short names into one
+directory.
+
++ First, the linker couldn't find some mathematical functions until I
+added the parameter -lm as a linking option.
+
+Further details you will find as remarks in the makefiles.
+
+
+Since I had to add some modifications to the gognu.bat file that must
+be executed to set some environment variables to make the gcc work, I
+will send you this file too.
+
+The gcc compiler may be obtained via anonymous ftp from
+ftp.uni-regensburg.de. It's located in the directory
+/pub/freeware/software/dos/sprachen.386/
+
+[*** Added ACRM:
+You may also obtain the software from:
+ftp://micros.hensa.ac.uk/micros/ibmpc/f/f461/
+According to the FAQ in this directory, the minimum set of files required is:
+djeoeXXX.zip, djdevXXX.zip, gccXXXbn.zip, gasXXbn.zip, and bnuXXbn.zip
+***]
+
+I'm sure the work I did is quite a dirty hack: This is my first
+experience with a c compiler and the c language, so don't expect
+too much.
+
+If you should want to give away or sell executables produced with gcc,
+please follow the licence notes from gnu-c. (You then should give
+away the intermediate profit, not profit.exe. profit may be run using
+the runtime module go32.exe by entering go32 profit at the command
+line)
+
+Here comes the makefile. It must be placed in the ProFit/SRC directory.
+
+
+
+(beginning of file make.bat)
+-------------
+
+@echo off
+goto start
+:: let's jump over the following section
+*************************************************************************
+This Batchfile was written at April 12 1996 by
+
+Wolfgang Schechinger
+Doktorshofstrasse 12
+92313 Berg-Hausheim
+Germany
+
+
+Phone: +49 (89) 30 79 31 24
+Fax: 30 81 73 3
+EMail: DIABETES at LRZ.UNI-MUENCHEN.DE
+
+Standard general disclaimer applies: no warranty for anything !
+I would be glad if anyone who is using this makefile would send me a picture
+postcard from her/his hometown!
+This code is dedicated to the public domain, but respect the licence from
+ProFit and the compiler
+
+**************************************************************************
+
+ Usage: of this file (MAKE.BAT)
+
+ After your system is set up for the gcc compiler, just type MAKE to
+ generate the 386 executable PROFIT.EXE
+ If you desire another filename for the executable than profit you may
+ specify a filename at the command line, e.g.: MAKE PROFIT_2
+
+
+ Enter make /h for obtaining help on command line options
+
+****************************************************************************
+
+:start
+:: now we will begin
+set cc=gcc
+:: the compiler we will use is gcc
+set flags= -c -O
+:: set the compiler flags to compile only and optimize
+
+set outfile=profit
+::the default executable filename will be profit
+
+if +%1+==+/?+ goto usage
+if +%1+==+?+ goto usage
+if +%1+==+/h+ goto usage
+if +%1+==+h+ goto usage
+goto start2
+:usage
+
+echo usage: make [outfile] [-detail]
+echo options MUST start with a "-"
+echo you may specify an alternate output filename [outfile]
+echo if you want to see details during compilation, specify "-detail"
+goto stop
+
+:start2
+if +%1+==++ goto start3
+::no arguments ?
+if +%2+==+-detail+ set flags2= -v
+if +%2+==+-detail+ set outfile=%1
+if +%1+==+-detail+ set flags2= -v
+if not +%1+==+-detail+ set outfile=%1
+:: allow the user to define an alternate name for the executable
+::
+
+:start3
+:: step 1: generate object code from BIOPLIB files
+
+echo generating object code
+echo - be patient, this may take a while -
+
+%cc% %flags% main.c -o 1.o %flags2%
+%cc% %flags% todo.c -o 2.o %flags2%
+%cc% %flags% nwalign.c -o 3.o %flags2%
+%cc% %flags% fitting.c -o 4.o %flags2%
+cd bioplib
+%cc% %flags% apmatpdb.c -o ..\5.o %flags2%
+%cc% %flags% calcpdb.c -o ..\6.o %flags2%
+%cc% %flags% modpdb.c -o ..\7.o %flags2%
+%cc% %flags% pdb2seq.c -o ..\8.o %flags2%
+%cc% %flags% pdblist.c -o ..\9.o %flags2%
+%cc% %flags% parseres.c -o ..\a.o %flags2%
+%cc% %flags% readpdb.c -o ..\b.o %flags2%
+%cc% %flags% readpir.c -o ..\c.o %flags2%
+%cc% %flags% windio.c -o ..\d.o %flags2%
+%cc% %flags% writepdb.c -o ..\e.o %flags2%
+%cc% %flags% align.c -o ..\f.o %flags2%
+%cc% %flags% angle.c -o ..\g.o %flags2%
+%cc% %flags% array.c -o ..\h.o %flags2%
+%cc% %flags% fit.c -o ..\i.o %flags2%
+%cc% %flags% fsscanf.c -o ..\k.o %flags2%
+%cc% %flags% general.c -o ..\l.o %flags2%
+%cc% %flags% help.c -o ..\m.o %flags2%
+%cc% %flags% matrix.c -o ..\n.o %flags2%
+%cc% %flags% parse.c -o ..\o.o %flags2%
+%cc% %flags% throne.c -o ..\p.o %flags2%
+%cc% %flags% zonepdb.c -o ..\q.o %flags2%
+cd ..
+:: go back to mainfile directory
+
+:: step 2: generate object code from main files
+
+::now the .o files will be linked
+echo linking object code
+%cc% -o %outfile% 1.o 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o a.o b.o c.o d.o e.o f.o g.o h.o i.o k.o l.o m.o n.o o.o p.o q.o -lm -v
+
+:: the renaming/numbering of all .o-files is due to command line length
+:: limitations. -lm is necessary because the gcc-linker must find the
+:: /LIB/libm.a math library
+:: the executable outputfile is named profit
+
+echo stripping the executable
+strip %outfile%
+::remove some surplus stuff
+echo generating the DOS executable
+call aout2exe %outfile%
+:: generate an .exe file
+
+::do some cleanup
+echo doing some cleanup
+set flags=
+set flags2=
+set outfile=
+::free the environment
+del *.o
+::remove all .o files (they are not needed anymore)
+:end
+:stop
+echo finished!
+echo to run the executable you will need the file go32.exe
+
+
+
+--------------------------
+(end of make.bat)
+
+Here comes my modification of gognu.bat (I replaced some "/"s by "\"s)
+
+beginning of file gognu.bat
+--------------------------
+@echo off
+echo environment variables prepared for gnu-cpp
+rem Ulrich Windl 18-Apr-91
+rem Minor modifications by Wolfgang Schechinger
+
+
+subst s: i:\compiler\gnu-cpp
+set gnucdrive=s:
+rem if "%1"=="" goto info
+set GCCINC=%gnucdrive%\INCLUDE
+set GCCLIB=%gnucdrive%\LIB
+set GCCBIN=%gnucdrive%\BIN
+set GCCTMP=%gnucdrive%\.
+set GO32=ansi driver %gnucdrive%/drivers/vga.grd gw 800 gh 600 tw 80 th 25
+set bison_simple=%GCCLIB%/bison.simple
+set bison_hairy=%GCCLIB%/bison.hairy
+set flex_skeleton=%GCCLIB%/flex.skeleton
+set oldpath=%path%
+set path=%oldpath%;%gnucdrive%\bin
+goto ende
+:info
+echo Aufruf: GOGNU Gnu-Wurzelverzeichnis
+:ende
+
+--------------------------
+end of file gognu.bat
diff --git a/HISTORY b/HISTORY
new file mode 100644
index 0000000..38647b6
--- /dev/null
+++ b/HISTORY
@@ -0,0 +1,254 @@
+ V0.1 25.09.92 Original
+ V0.2 02.10.92 Added CURSES support
+ V0.3 07.10.92 Added Amiga windows and paging support
+ V0.4 09.10.92 Added N&W alignment support & fixed bug in multi-zones
+ V0.5 08.10.93 Various tidying for Unix & changed for booklib
+ V0.6 05.01.94 Modified MDMFILE for Unix getenv()
+ More tidying
+ Reads HELPDIR and DATADIR environment variables under
+ V0.7 24.11.94 Uses ReadPDBAtoms()
+ The DATAENV environment variable is now handled by code
+ in bioplib/align.c/ReadMDM; Checks the return from
+ ReadMDM(). Fixed bug in multi-zone align.
+ Some amendments to ProFit.help to explain the use
+ of chain names in ZONE specifications.
+ V0.8 17.07.95 Fixed bugs in multi-chain fitting. Chain name must
+ come *before* residue number. Removed windowing code.
+ V1.0 19.07.95 Allows inserts in structures including insert names
+ in zones. Allows HETATM records to be included.
+ HETATMs are not read by default. However, by putting
+ a -h on the command line, or giving the HETATOM
+ command you may get ProFit to read HETATM
+ records. The extraction of sequence now ignores
+ HETATM records, so the problem with the ALIGN
+ command when there were HETATMs should have gone
+ away. There is now a RESIDUE command which gives a
+ by-residue RMS over the current RATOMS and RZONE
+ (ATOMS/ZONE if RATOMS/RZONE unspecified). If you
+ are reading commands into ProFit from a file rather
+ than from the keyboard, you no longer get loads of
+ ProFit> prompts appearing in your output which
+ tidies things up a bit. An end of file also ends the
+ ProFit program, so you no longer *need* to put QUIT
+ in the command file.
+ V1.1 20.07.95 Added WEIGHT/NOWEIGHT commands
+ Added translation to MATRIX output
+ V1.1a 21.07.95 Fixed core dump when trying to write unfitted file
+ V1.2 22.07.95 Added GAPPEN command
+ 25.07.95 Added printing of chain labels in Status
+ Bioplib's three->one translation now handles nucleic
+ acids.
+ RESIDUE command takes optional filename parameter to
+ direct output to a file.
+ V1.3 31.07.95 Fixed bug in ZONE command when end of zone=end of chain
+ which was not the last chain.
+ Prints number of residues in zones when a mismatch
+ occurs.
+ Potential out-of-bounds memory access fixed in weights
+ array.
+ V1.4 14.08.95 Fixed printing by residue which was not printing the
+ last residue.
+ V1.5 21.08.95 Fixed bug in creating zones from NW alignment when last
+ zone != end of chain. Also fixed bug in Bioplib/align.c
+ which was messing up the end of an alignment.
+ 24.08.95 Fixed a bug in Bioplib/align.c which caused potential
+ core dumps when an amino acid type not in the MDM matrix
+ was found. This should never cause a problem in ProFit
+ since these junk characters are never generated.
+--------
+ 27.09.95 Updated profit.hlp which still said insert codes are
+ not allowed in ZONEs
+ V1.5a 02.10.95 Added centre of geometry data to output of the MATRIX
+ command and updated profit.hlp with this information.
+ V1.5b 15.11.95 Prints score normalised by shorter sequence length
+ as well as matrix maximum score.
+ Updated profit.hlp which said numeric zones could be
+ specified as a single residue.
+ V1.6 20.11.95 Added READALIGNMENT command.
+--------
+ V1.6a 21.11.95 Fixed a couple of warnings under gcc
+ V1.6b 22.11.95 Modified code in SetNWZones() such that deletions at
+ the same position in both sequences don't cause a
+ problem. This was only a problem if an alignment was
+ read using READALIGNMENT which had a deletion in both
+ sequences.
+ V1.6c 13.12.95 Fix 1.6b wasn't working if a double deletion occurred
+ in the middle of a long run of deletions
+ V1.6d 24.01.96 Fixed bug in STATUS command when atom names contain
+ spaces. Modifications to docs.
+ Error reporting improved: no atoms read from PDB files
+ distinguished from no memory.
+ Fixed bioplib bug in converting an all HETATM file to
+ sequence.
+ V1.6e 31.05.96 Added BVALUE command
+ V1.6f 13.06.96 Added BWEIGHT command
+ V1.6g 18.06.96 Internal changes only --- replaced FindZone() by
+ FindZonePDB() in Bioplib
+ V1.7 23.07.96 Some tidying of comments
+ Fixed potential out-of-bounds array errors in atom
+ specifications
+ Added atom wildcard specifications
+ Various internal tidying and reorganisation of code
+ Bugs and potential crashes fixed:
+ a) When setting zones from a PIR alignment file, used
+ to fail if there were deletions in both sequences.
+ b) STATUS command didn't work properly if atom names
+ contained spaces.
+ c) Used to crash if you tried to use ALIGN on a file
+ which contained no amino acids.
+ d) Used to crash if you specified atom names of more
+ than 7 characters or more than 50 atom names
+ Error reporting improved: no atoms read from PDB files
+ distinguished from no memory.
+ New features:
+ a) BVALUE command: Allows a cutoff to be applied to
+ B-values; atom pairs where the atoms from either
+ structure has a B-value above this value are ignored
+ b) BWEIGHT command: Weights the fitting by the inverse
+ of the B-value such that mobile atoms are less
+ heavily weighted.
+ c) Atom name specifications may now include wildcards
+--------
+ V1.7a 06.11.96 Added -ve values for BVALUE
+ Added IGNORE/NOIGNORE
+ V1.7b 11.11.96 Added REF and MOB options to BVALUE keyword to allow
+ only one to be examined rather than an average
+ V1.7c 18.11.96 Added IGNOREMISSING option
+ V1.7d 20.12.96 Added NFITTED command
+ Fixed a bug in bioplib/ReadPDB() which would cause
+ completely blank lines in a PDB file to produce data
+ from the previous line.
+ V1.7e 27.06.97 Added ability to output from RESIDUE and WRITE to
+ a pipe.
+ V1.7f 03.07.97 Added break into CreateFitArrays() to fix core dump
+ on bad multiple-occupancy PDB files. If the 'same'
+ atoms don't appear immediately next to each other
+ (they should), then all are read by ReadPDB() and
+ this routine was then comparing all with all.
+ Also warns of multiple occupancies on reading file.
+ Changed to use CD1 for ILE rather than CD.
+ V1.7g 06.05.98 Complete rewrite of NWAlign/SetNWZones(). The new
+ version is much simpler; it also fixes a bug which
+ occurred following 1-residue zones.
+ V1.8 07.05.98 Release of V1.7g
+ 1. New features:
+ a) The BVALUE command now takes a negative value
+ to reverse the sense of the cutoff and can
+ also be restricted to reference or mobile
+ structures rather than using an average
+ b) New option to ignore missing atoms
+ c) Added NFITTED command to show number of fitted atoms
+ d) RESIDUE and WRITE commands can now output to a pipe
+ 2. Bug fixes:
+ a) Completely blank lines in a PDB file caused
+ the previous line to be repeated
+ b) Multiple-occupancy PDB files where the 'same'
+ atoms didn't appear next to each other in the
+ file caused a core dump
+ c) ILE C-delta correctly written as CD1 rather than CD
+ d) Zone setting from sequence alignment had a
+ bug with single residue zones
+ 3. Internal changes:
+ a) Complete rewrite of the zone setting based on
+ sequence alignment.
+--------
+ V2.0 01.03.01 Added support for multiple structure fitting and
+ iterative zone updating. Output atom names now exactly
+ match input names. Can write coordinates with centre
+ of geometry of fitted region at the origin. Added QUIET
+ option. Added LIMIT command to restrict region read from
+ an alignment to create fitting zones.
+ (not publicly released)
+ V2.1 28.03.01 Added cutoff parameter for ITERATE and added CENTRE
+ command
+ (not publicly released)
+ V2.2 20.12.01 Fixed some Bioplib problems related to raw atom names
+ and multiple occupancies
+ 1. New features:
+ a) Now supports fitting of multiple
+ structures. Each structure is fitted to the
+ reference and the average coordinates are
+ calculated. All structures are fitted to the
+ average and the process iterates to convergence.
+ b) Now supports iterative zone updating. After
+ an initial fit, atom pairs more distant than
+ a cutoff are discarded and atom pairs closer
+ than the cutoff are pulled in. The fitting
+ iterates to convergence
+ c) Coordinates can be written with centre of
+ geometry of fitted region at the origin
+ instead of translated to the reference set
+ d) Added QUIET option to switch off warning and
+ informational messages
+ e) Added LIMIT command to restrict the region
+ read from an alignment to create fitting zones
+ 2. Internal changes:
+ a) Code now writes the atom names exactly as
+ they were read instead of cleaning them up.
+ b) Some fairly major rewrites to support the
+ multiple structure fitting.
+ V2.3 01.12.04 Fixed a bug in reading sequence alignment with multiple
+ structures. Was core-dumping if there were double
+ deletions
+ V2.4 03.06.05 Fixed some more Bioplib problems related to raw atom
+ names and multiple occupancies
+ V2.3 and 2.4 are bug fix releases.
+ Bugs fixed were:
+ 1. handling of multiple occupancy atoms with new
+ Bioplib code for writing atom names back
+ properly after reading them
+ 2. a core dump on reading alignment files with
+ 'double deletions' (i.e. a deletion in both
+ sequences at the same position - when an
+ alignment is grabbed from a multiple alignment file)
+ V2.5 08.06.05 Fixed more related bugs
+ V2.5.1 10.06.05 Fixed bug in Bioplib related to getting the sequence from
+ a CA-only chain
+ V2.5.2 14.10.05 Enhanced Bioplib/ReadPDB() to cope with corrupt multiple
+ occupancies such as 1zeh/B13
+ V2.5.3 06.07.06 Fixed bug in correcting ILE-CD to ILE-CD1 - was writing
+ an extra space.
+ V2.5.3.1 Fixed a distribution bug - a file was missing from the
+ V2.5.3 distribution
+ V2.5.4 29.06.07 Prototypes in bioplib functions are now conditional
+ for MAC OSX to allow the code to compile
+ V2.6 16.06.08 Can centre the fitting on the CofG of a
+ user-specified residue rather than on the fitting
+ zones (SETCENTRE)
+ Can write the whole of a PDB file instead of just
+ the ATOM/HETATM records (HEADER)
+ Can delete zones (DELZONE/DELRZONE)
+ Can run scripts (SCRIPT)
+ Can specify zones using the B-value column (BZONE)
+ Can print distances between equivalenced atoms (PAIRDIST)
+ Can ignore atom pairs where distance is greater
+ than a cutoff in calculating RMSD (DISTCUTOFF)
+ Can read lower occupancy atoms (OCCRANK)
+ RESIDUE command now indicates atoms either wholly
+ or partly outside any cutoff specified by DISTCUTOFF
+ Handles lower-case chain names
+ Checks for overlapping zones
+ Checks sequences read with READALIGNMENT
+ V3.0 28.01.09 Multiple structure fitting now gives RMSD to first mobile
+ structure by default. (SETREF)
+ Calculation of the averaged reference structure for
+ multiple structure fitting is now weighted by the number of
+ mobile structures. (WTAVERAGE)
+ Can fit multiple structures in order of RMSD. (ORDERFIT)
+ Can run an all vs all comparison of mobile structures.
+ (ALLVSALL)
+ Can match symmetrical atoms. (SYMMATM)
+ Can perform iterative fitting on structures with more than
+ one chain.
+ Can read PIR alignments with more than one chain. (READALIGN)
+ Can align structures with more than one chain. (ALIGN)
+ Can output fitting zones as an alignment. (PRINTALIGN)
+ Added support for GNU Readline library.
+ V3.1 02.04.09 Windows support
+ Some fixes for NOFIT (clear structures on reading new
+ structure).
+ Added temporary fix to remove padding of input atom names.
+ This fixes problems with specifying 3-letter atom names,
+ but still needs some further investigation.
+
+--------
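Editorial aside (not part of the upstream HISTORY file): the "double deletion" crash fixed in V2.3 concerns alignment columns in which both sequences carry a gap. The sketch below shows one way to derive equivalent zones from a pairwise alignment while skipping such columns. It is a minimal illustration, not ProFit's C implementation; the function name `zones_from_alignment` and the tuple layout are assumptions, and residues are numbered sequentially from 1.

```python
def zones_from_alignment(ref, mob, gap="-"):
    """Return a list of (ref_start, ref_end, mob_start, mob_end) zones.

    Hypothetical helper, not taken from ProFit. Residues are numbered
    sequentially from 1 within each sequence.
    """
    if len(ref) != len(mob):
        raise ValueError("aligned sequences must have equal length")

    zones = []
    ref_pos = mob_pos = 0      # residues consumed so far in each sequence
    current = None             # zone being extended, or None

    for r, m in zip(ref, mob):
        ref_has = r != gap
        mob_has = m != gap
        if ref_has:
            ref_pos += 1
        if mob_has:
            mob_pos += 1

        if ref_has and mob_has:
            # Aligned pair: start a new zone or extend the open one.
            if current is None:
                current = [ref_pos, ref_pos, mob_pos, mob_pos]
            else:
                current[1] = ref_pos
                current[3] = mob_pos
        elif not ref_has and not mob_has:
            # Double deletion: no residue in either sequence, so the
            # column is skipped and does not break the open zone.
            continue
        else:
            # Gap in exactly one sequence: close the open zone, if any.
            if current is not None:
                zones.append(tuple(current))
                current = None

    if current is not None:
        zones.append(tuple(current))
    return zones
```

A column with a gap in only one sequence ends the current zone, whereas a double-deletion column is ignored, which is the case that arises when a pair of sequences is extracted from a multiple alignment file.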
diff --git a/INSTALL b/INSTALL
new file mode 100644
index 0000000..5520f84
--- /dev/null
+++ b/INSTALL
@@ -0,0 +1,220 @@
+ ProFit V3.1
+ ===========
+
+ (c) Dr. Andrew C.R. Martin, SciTech Software, UCL, 1992-2009
+ (c) Dr. Craig Porter, UCL, 2008-2009
+
+
+To install ProFit, unpack the tar file. This will create a ProFit
+directory with a src subdirectory.
+
+Compiling under UNIX-like operating systems
+-------------------------------------------
+
+After unpacking the tar file, go into the src subdirectory of the
+ProFit directory. It is possible to edit the Makefile to allow the
+(optional) support for the XMAS library, the GNU Readline library or
+decompression of gzipped PDB files.
+
+Type:
+ make
+to create an executable file called 'profit'.
+
+Under recent Linux installations, GNU Readline support should just
+work. Uncomment the two lines in the Makefile:
+ READLINE = -DREADLINE_SUPPORT
+ READLINELIB = -lreadline -lcurses
+and compile as normal. You may need to install the readline development
+libraries first, which is done with a command like:
+ yum install readline-devel (RPM-based systems)
+ apt-get install libreadline5-dev (Debian-based systems)
+
+On other systems, you will need to obtain and install readline. See:
+http://directory.fsf.org/project/readline/
+
+If you need to install GNU readline manually, some notes are appended
+at the end of this document.
+
+
+Installing under UNIX-like operating systems
+--------------------------------------------
+
+Move the profit executable to somewhere in your path (e.g. ~/bin/ or
+/usr/local/bin)
+
+You should now create the environment variables HELPDIR and
+DATADIR. These should both point to the top ProFit directory where the
+files ProFit.help and mdm78.mat are stored. e.g.
+
+(csh) setenv HELPDIR /home/andrew/ProFitV3.1
+ setenv DATADIR /home/andrew/ProFitV3.1
+(sh) export HELPDIR=/home/andrew/ProFitV3.1
+ export DATADIR=/home/andrew/ProFitV3.1
+
+Alternatively, you may wish to store these files elsewhere, or have
+all help files and data files in a single directory.
+
+Under VAX/VMS-like operating systems, these should be ASSIGNs. e.g.
+ ASSIGN $A:[ANDREW.PROFIT] DATADIR
+ ASSIGN $A:[ANDREW.PROFIT] HELPDIR
+
+Compiling under DOS/Windows operating systems
+---------------------------------------------
+
+ProFit V3.1 compiles under Windows using the open source mingw
+compiler. See the mingw web site for details:
+http://www.mingw.org/wiki/HOWTO_Install_the_MinGW_GCC_Compiler_Suite
+It should also compile cleanly using commercial compilers such as the
+Microsoft, Intel or Borland compilers (though this has not been tested).
+
+To compile with mingw, first open a DOS shell and ensure that the
+mingw binary directory from your installation of mingw is in your
+path. For example:
+ PATH=%PATH%;C:\Qt\2009.01\mingw\bin
+Now change to the ProFit source directory:
+ cd ProFitVx.y\src
+Now run make by doing:
+ mingw32-make -f Makefile_dos
+That will create the executable profit.exe
+
+Installing under DOS/Windows operating systems
+----------------------------------------------
+
+If you only wish to run ProFit from the DOS prompt command line, or
+you are using Windows 95/98/ME, you can edit C:\autoexec.bat and
+add the lines
+ PATH=%PATH%;C:\My Documents\ProFitV3.1\src
+ SET HELPDIR=C:\My Documents\ProFitV3.1
+ SET DATADIR=C:\My Documents\ProFitV3.1
+(Note that no double inverted commas or escaping are required for spaces in
+directory names.)
+
+This will put the profit executable in your path and set the two
+environment variables. Of course you can move the files anywhere you
+want and modify the above commands as required.
+
+If you are using Windows NT/2000/XP or later, you must set environment
+variables as follows:
+ 1. Open Control Panel
+ 2. Click the System icon
+ 3. Go to the Advanced pane
+ 4. Click the "Environment Variables" button
+ 5. Select the "new" button to create a new environment variable
+ 6. Enter the variable name and value in the appropriate boxes,
+ creating HELPDIR and DATADIR as above.
+ 7. Edit the "PATH" variable, such that the directory in which you
+ have saved profit.exe is added to your path (or move profit.exe
+ to a directory already in your path).
+
+Alternatively, if you only plan to run ProFit by double-clicking its
+icon, simply ensure that
+ profit.exe
+ mdm78.mat
+ ProFit.help
+are all in the same directory. Double clicking the ProFit icon will
+then find the required files automatically.
+
+Running ProFit
+--------------
+
+You are now ready to run ProFit. You may specify the PDB files on the
+command line and may use a -h flag to include HETATM records. If you
+do not specify the files, you may read them in with the REFERENCE and
+MOBILE commands. The HETATOM and NOHETATOM commands toggle the reading
+of HETATM records.
+
+Once in ProFit, type HELP for further information or read the
+documentation!
+
+
+WARNING
+-------
+
+There is a known (but rarely seen) bug with ProFit where a fitted
+structure may be fitted 180 degrees away from its optimum fit. This
+only seems to affect fitting of identical structures. This appears to
+result from a saddle point in the RMS surface resulting in apparent
+convergence. While the effort to correct the bug is ongoing, we have
+taken steps to mitigate its effect.
+
+Compiling with GCC with optimization on (-O3) seems to hide the
+bug. Alternatively, editing the Makefile and uncommenting the line
+ ROTATEREFIT = -DROTATE_REFIT
+will result in ProFit rotating a fitted structure (42 degrees, Z axis),
+refitting the structure then selecting the better fit.
+
+
+Installing the GNU Readline library
+-----------------------------------
+
+As noted above, most Linux installations will have the readline
+library installed already and all you need to do is uncomment the two
+lines in the Makefile:
+ READLINE = -DREADLINE_SUPPORT
+ READLINELIB = -lreadline -lcurses
+
+On recent versions of Linux, if this doesn't work, then you may have
+to install the readline development libraries with a command like:
+ yum install readline-devel (RPM-based systems)
+ apt-get install libreadline5-dev (Debian-based systems)
+
+If this doesn't work, or you are using another Unix system then
+proceed as follows:
+
+Download the latest version of GNU readline from
+http://cnswww.cns.cwru.edu/php/chet/readline/rltop.html
+At the time of writing, this is readline-6.0.tar.gz
+
+Unpack the gzipped tar file under /tmp
+ cd /tmp
+ tar -zxvf readline-6.0.tar.gz
+Change to the directory this creates and run configure:
+ cd readline-6.0
+ ./configure
+If you do not have write access to the /usr/local/ hierarchy,
+then you can install the files somewhere else:
+ ./configure --prefix=/home/my-user-name/packages
+Now build the readline library
+ make
+and install (become superuser first if installing under /usr/local)
+ make install
+
+Now, ensure that the directory where the library has been installed
+(/usr/local/lib/ by default) is in the search path. You can do this
+by setting the environment variable LD_LIBRARY_PATH
+(csh) setenv LD_LIBRARY_PATH /usr/local/lib
+(sh) export LD_LIBRARY_PATH=/usr/local/lib
+
+Alternatively, if you have root access, you can edit the file
+/etc/ld.so.conf to add the directory in which the library has been
+installed. Under recent Linux installations, there is another
+alternative which is to create a file /etc/ld.so.conf.d/readline.conf
+just containing a single line with the directory where the library has
+been installed. In either case, you must now (as root) type the command:
+ /sbin/ldconfig
+
+Now, modify the Makefile, such that this directory is in the linker's
+library path. Change:
+ READLINELIB = -lreadline -lcurses
+to:
+ READLINELIB = -L/usr/local/lib -lreadline -lcurses
+
+Now build with 'make' as usual, but ensure that LD_LIBRARY_PATH is
+set whenever you want to run the program. Alternatively, install
+with
+ ./configure --prefix=/usr/lib
+to install in the main system directories and then it will be in the
+default search path. This isn't recommended unless you know what you
+are doing!
+
+You can also link the readline library statically to ensure
+portability to Linux machines having different versions
+of the readline library installed. In this case you will not need the
+LD_LIBRARY_PATH or changes to /etc/ld.so.conf. To do this, edit the
+Makefile and change:
+ READLINELIB = -lreadline -lcurses
+to
+ READLINELIB = /usr/lib/libreadline.a -lcurses
+(changing '/usr/lib/' as required to point to wherever libreadline has
+been installed).
+
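Editorial aside (not part of the upstream INSTALL file): the HELPDIR and DATADIR setup described in the INSTALL text can be checked before starting ProFit. The sketch below assumes only what the INSTALL text states, namely that HELPDIR should point at the directory holding ProFit.help and DATADIR at the directory holding mdm78.mat. The function name `check_profit_env` is hypothetical and the script is not shipped with ProFit.

```python
import os

# Environment variable -> file that INSTALL says must live in that directory.
REQUIRED = {"HELPDIR": "ProFit.help", "DATADIR": "mdm78.mat"}


def check_profit_env(env=None):
    """Return a list of problems found; an empty list means the setup looks OK.

    Hypothetical helper for illustration; 'env' defaults to os.environ.
    """
    env = os.environ if env is None else env
    problems = []
    for var, filename in REQUIRED.items():
        directory = env.get(var)
        if not directory:
            problems.append(f"{var} is not set")
        elif not os.path.isfile(os.path.join(directory, filename)):
            problems.append(f"{filename} not found in {var}={directory}")
    return problems


if __name__ == "__main__":
    for problem in check_profit_env():
        print(problem)
```

This check applies to the environment-variable setup only. When profit.exe, mdm78.mat and ProFit.help sit in the same directory and ProFit is started by double-clicking under Windows, the INSTALL text says the files are found automatically.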
diff --git a/ProFit.help b/ProFit.help
new file mode 100644
index 0000000..10eb325
--- /dev/null
+++ b/ProFit.help
@@ -0,0 +1,867 @@
+#$
+
+
+ $ command
+
+ Any command preceded by a $ is passed to the operating system.
+ This is useful for directory listings, typing or editing files,
+ etc.
+#REFERENCE
+
+
+ REFERENCE filename
+
+ Reads a reference PDB structure. Reading a structure will cause
+ any zones or atoms specified for RMS calculation to be reset.
+ By default, HETATM records will be ignored. (Use the HETATOM
+ command to change this, or start ProFit with -h.)
+#MOBILE
+
+
+ MOBILE filename
+
+ Reads a mobile PDB structure. This will be fitted to the Reference
+ structure. Reading a structure will cause any zones or atoms
+ specified for RMS calculation to be reset.
+ By default, HETATM records will be ignored. (Use the HETATOM
+ command to change this, or start ProFit with -h.)
+#FIT
+
+
+ FIT
+
+ Performs the actual fitting. Returns the RMS deviation over the
+ atoms included in the fit. Any zones or atoms specified for RMS
+ calculation will be reset to those specified for fitting. Should
+ RMS deviations be required over different areas of structure or
+ groups of atoms, the RZONE and RATOMS commands should be used to
+ specify the appropriate atoms. The RMS over these atoms will be
+ displayed immediately.
+#ATOMS
+
+
+ ATOMS atm[,atm]...
+
+ Specifies the atom subset to fit. * fits all atoms. A ~ or ^ may
+ be used to invert the selection. If specified, it must be placed
+ at the start of the atom list.
+
+ Wildcards are also allowed. A % or a ? may be used to match a
+ single letter at any point in the specification while a * may be
+ used to match all remaining characters (thus C* is allowed,
+ but *G is not). The special characters may be escaped by
+ preceding them with a \.
+
+ The PDB atom name field is 4 characters wide followed by a space.
+ The first two characters are the right-justified element type, so
+ for normal protein and DNA atoms they consist of a space followed by an N,
+ C, O, S or P. Thus the atom name field for a C-alpha contains
+ " CA ". HETATMs such as calcium will contain the two characters
+ CA in the first two fields. i.e. "CA ". When you specify an atom
+ type it is matched against the atom name field from the SECOND
+ CHARACTER ONWARDS, unless you precede it with a <. Thus to match
+ a C-alpha you use "CA", but to match Calcium, you use "<CA".
+
+ If atom names contain spaces (e.g. in heme groups) the whole atom
+ specification must be enclosed in double inverted commas.
+
+ Examples:
+ ATOMS CA Fit only C-alphas
+ ATOMS <CA Fit only calcium atoms
+ ATOMS <CA,CA Fit calciums and C-alphas
+ ATOMS N,CA,C,O Fit N, C-alpha, C and O
+ ATOMS * Fit all atoms
+ ATOMS ~N,CA,C,O Fit all atoms except N, C-alpha, C and O
+ ATOMS C* Fit all carbon atoms
+ ATOMS ?G* Match all atoms at the gamma position
+ ATOMS C4\* Match the C4* atoms in DNA
+ ATOMS "N A,N B,N C" Fit atoms with names containing spaces
+#ZONE
+
+
+ ZONE CLEAR|((*|(X...[,n][/m])|(j-k))[:(*|(X...[,n][/m])|(j-k))])
+ where X... is an amino acid sequence
+ n is a number of residues
+ m is the occurrence number
+ j and k are residue specifications of the form
+ [chain][.]renum[insert]
+ Optional items are in square brackets
+ Alternatives are marked by a | and grouped in parentheses
+
+ Specifies residue zones to be fitted. Each zone is added to those
+ currently active. To clear all zones (i.e. fit all residues), the
+ ZONE CLEAR command may be given. ZONE * has the same effect.
+
+ In Residue Number mode, the residue numbers may be preceded by a
+ chain name. In Sequential Number mode, chain names will be ignored.
+
+ Optionally, the chain name may be separated from the residue number
+ by a full stop. Using the full stop also makes the statement case-
+ sensitive. In practice, the full stop separator is used with either
+ numeric or lowercase chain names.
+
+ Insertion codes may follow a residue number in Residue Number
+ mode.
+
+ Negative residue numbers must be escaped with a \
+
+ Note that sequence-specified zones automatically imply sequential
+ numbering.
+
+ Examples:
+ ZONE 24-34 Fits 24-34 in Reference with 24-34 in Mobile
+ ZONE H5-H10 Fits 5-10 in chain H of Reference with the
+ same region in Mobile
+ ZONE H* Fits just the H chains
+ ZONE 24-34:25-35 Fits 24-34 in Reference with 25-35 in Mobile
+ ZONE L25A-L30 Fits residues 25A-30 in the L chain
+ ZONE -10:50-59 Fits up to 10 in Reference with 50-59 in Mobile
+ ZONE \-4-10:\-1-13 Fits residues -4 to 10 with -1 to 13 in Mobile
+ ZONE *:1-100 Fits all in Reference with 1-100 in Mobile
+ ZONE CAR:VNS Fits first occurrence of CAR in Reference with
+ first occurrence of VNS in Mobile
+ ZONE CAR,10:VNS,10 Fits 10 residues from first occurrence of CAR
+ in Reference with first occurrence of VNS in
+ Mobile
+ ZONE CAR,5/2 Fits 5 residues from second occurrence of CAR
+ in both structures
+ ZONE 24-34:EIR,11 Fits 24-34 in Reference with 11 residues from
+ first occurrence of EIR in Mobile
+ ZONE b.1-b.60:4.* Fits 1-60 of chain b in Reference with all of
+ chain 4 in Mobile.
+#ALIGN_OLD
+
+
+ ALIGN
+
+ Performs Needleman and Wunsch sequence alignment on the sequences
+ of the two structures and displays the alignment. Any fitting zones
+ are automatically cleared and replaced with zones derived from the
+ equivalent regions in the alignment.
+
+ It will normally be necessary to use the ATOMS command to specify
+ that only backbone or C-alpha atoms are included in the fitting
+ calculations.
+#READALIGNMENT
+
+
+ READALIGNMENT filename
+
+ Reads an alignment in PIR sequence file format and sets zones based
+ on that alignment. Any previously defined fitting zones are
+ automatically cleared first.
+
+ The file format is standard PIR with the two sequences represented
+ by separate entries i.e. each must have a header of the form:
+
+ >P1;xxxxx
+ title text .......
+
+ If the PIR file contains multiple chains, it will be rejected. The
+ alignment is specified by introducing dash (-) signs into the
+ sequence.
+
+ The first sequence will be assumed to be that of the reference
+ structure and the second is that of the mobile structure. Any other
+ sequences in the file are ignored.
+
+ It will normally be necessary to use the ATOMS command to specify
+ that only backbone or C-alpha atoms are included in the fitting
+ calculations.
+#RATOMS
+
+
+ RATOMS atm[,atm]...
+
+ Specifies atoms over which to calculate the RMS. Fitting must
+ already have been performed. Any RZONE specifications which have
+ been given will be considered in the calculation. The RMS over
+ the specified atoms is displayed. * includes all atoms. A ~
+ or ^ may be used to invert the selection. If specified, it must
+ be placed at the start of the atom list.
+
+ Wildcards are also allowed. A % or a ? may be used to match a
+ single letter at any point in the specification while a * may be
+ used to match all remaining characters (thus C* is allowed,
+ but *G is not). The special characters may be escaped by
+ preceding them with a \.
+
+ The PDB atom name field is 4 characters wide followed by a space.
+ The first two characters are the right-justified element type, so
+ for normal protein and DNA atoms they consist of a space followed by an N,
+ C, O, S or P. Thus the atom name field for a C-alpha contains
+ " CA ". HETATMs such as calcium will contain the two characters
+ CA in the first two fields. i.e. "CA ". When you specify an atom
+ type it is matched against the atom name field from the SECOND
+ CHARACTER ONWARDS, unless you precede it with a <. Thus to match
+ a C-alpha you use "CA", but to match Calcium, you use "<CA".
+
+ If atom names contain spaces (e.g. in heme groups) the whole atom
+ specification must be enclosed in double inverted commas.
+
+ Examples:
+ RATOMS CA RMS over C-alphas
+ RATOMS <CA RMS over calcium atoms
+ RATOMS <CA,CA RMS over calciums and C-alphas
+ RATOMS N,CA,C,O RMS over N, C-alpha, C and O
+ RATOMS * RMS over all atoms
+ RATOMS ~N,CA,C,O RMS over all atoms except N, C-alpha, C and O
+ RATOMS C* RMS over all carbon atoms
+ RATOMS ?G* RMS over all atoms at the gamma position
+ RATOMS C4\* RMS over the C4* atoms in DNA
+ RATOMS "N A,N B,N C" RMS over atoms with names containing spaces
+#RZONE
+
+
+ RZONE CLEAR|((*|(X...[,n][/m])|(j-k))[:(*|(X...[,n][/m])|(j-k))])
+ where X... is an amino acid sequence
+ n is a number of residues
+ m is the occurrence number
+ j and k are residue specifications of the form
+ [chain][.]renum[insert]
+ Optional items are in square brackets
+ Alternatives are marked by a | and grouped in parentheses
+
+ Specifies residue zones over which to calculate the RMS. Each zone
+ is added to those currently active. The RZONE CLEAR command will
+ cancel any RMS calculation zones that have been specified causing
+ the RMS to be calculated over the same zones used for fitting.
+ RZONE * has the same effect. Any RATOMS specifications which have
+ been given will be considered in the calculation.
+
+ In Residue Number mode, the residue numbers may be preceded by a
+ chain name. In Sequential Number mode, chain names will be ignored.
+
+ Optionally, the chain name may be separated from the residue number
+ by a full stop. Using the full stop also makes the statement case-
+ sensitive. In practice, the full stop separator is used with either
+ numeric or lowercase chain names.
+
+ Insertion codes may follow a residue number in Residue Number
+ mode.
+
+ Negative residue numbers must be escaped with a \
+
+ Note that sequence-specified zones automatically imply sequential
+ numbering.
+
+ Examples:
+ RZONE 24-34 RMS of 24-34 in Reference with 24-34 in Mobile
+ RZONE L25A-L30 RMS of residues 25A-30 in the L chain
+ RZONE 24-34:25-35 RMS of 24-34 in Reference with 25-35 in Mobile
+ RZONE -10:50-59 RMS of up to 10 in Reference with 50-59 in Mobile
+ RZONE \-4-10:\-1-13 RMS of -4 to 10 with -1 to 13 in Mobile
+ RZONE *:1-100 RMS of all in Reference with 1-100 in Mobile
+ RZONE H* RMS over just the H chain
+ RZONE CAR:VNS RMS of first occurrence of CAR in Reference with
+ first occurrence of VNS in Mobile
+ RZONE CAR,10:VNS,10 RMS of 10 residues from first occurrence of CAR
+ in Reference with first occurrence of VNS in
+ Mobile
+ RZONE CAR,5/2 RMS of 5 residues from second occurrence of CAR
+ in both structures
+ RZONE 24-34:EIR,11 RMS of 24-34 in Reference with 11 residues from
+ first occurrence of EIR in Mobile
+ RZONE b.1-b.60:4.* RMS of 1-60 of chain b in Reference with all of
+ chain 4 in Mobile.
+#RMS
+
+
+ RMS
+
+ Recalculate the RMS deviation over the zones and atoms currently
+ defined with RZONE and RATOMS.
+#WRITE
+
+
+ WRITE [REFerence] filename
+
+ Writes the fitted structure to a PDB file. If the filename begins
+ with a pipe character (|), the coordinates are piped into the
+ specified program.
+
+ If the REFERENCE parameter is given then the reference set will be
+ written. This is only useful if the CENTRE command has been used!
+#MATRIX
+
+
+ MATRIX
+
+ Displays the centres of geometry, rotation matrix and translation
+ vector. The translation vector is the vector between the centres
+ of geometry. Thus to superimpose the mobile structure on the
+ reference structure using these data, you should translate the
+ mobile to the origin, apply the rotation matrix, translate back to
+ the original centre of geometry and finally apply the translation
+ vector.
+#STATUS
+
+
+ STATUS [filename]
+
+ Reports current program status.
+
+ If the optional filename parameter is given, output is directed to
+ the specified file. If the file cannot be opened or a filename is
+ not specified, output appears on the screen. If the filename begins
+ with a pipe character (|), the results are piped into the
+ specified program.
+#QUIT
+
+
+ QUIT
+
+ Exits from the program.
+#NUMBER
+
+
+ NUMBER (RESIDUE|SEQUENTIAL)
+
+ Specifies whether numeric zones are based on residue numbers in the
+ PDB file or on sequential numbering (running through all chains).
+
+ In Residue Number mode, the residue numbers may be preceded by a
+ chain name. In Sequential Number mode, chain names will be ignored.
+#RESIDUE
+
+
+ RESIDUE [filename]
+
+ Gives a by-residue RMS on the currently specified RATOMS (or ATOMS
+ if RATOMS has not been specified) over the currently specified
+ RZONE (or ZONE if RZONE has not been specified).
+
+ If a distance cutoff is set (using DISTCUTOFF) then residues fully
+ outside the distance cutoff are flagged with "**" and residues
+ partially outside distance cutoff are flagged with "*".
+
+ If the optional filename parameter is given, output is directed to
+ the specified file. If the file cannot be opened or a filename is
+ not specified, output appears on the screen. If the filename begins
+ with a pipe character (|), the results are piped into the
+ specified program.
+#HETATOMS
+
+
+ HETATOMS
+
+ Read HETATM records with subsequent MOBILE and REFERENCE commands.
+ This is the default if the -h flag has been given on starting
+ ProFit.
+#NOHETATOMS
+
+
+ NOHETATOMS
+
+ Do not read HETATM records with subsequent MOBILE and REFERENCE
+ commands. This is the default unless the -h flag has been given
+ on starting ProFit.
+#WEIGHT
+
+
+ WEIGHT
+
+ Weight the fitting by the mean of the B-values in the equivalent
+ atoms. Normally, you wouldn't use this with real B-values, but
+ with some other weight parameter (e.g. SSAP scores).
+#BWEIGHT
+
+
+ BWEIGHT
+
+ Weight the fitting by the inverse of the mean of the B-values in
+ the equivalent atoms. This is useful for genuine weighting by
+ B-values (i.e. mobile atoms will be less heavily weighted).
+#NOWEIGHT
+
+
+ NOWEIGHT
+
+ Normal, non-weighted fitting.
+#GAPPEN
+
+
+ GAPPEN val [val]
+
+ Allows you to specify an integer gap penalty and gap extension penalty
+ for the sequence alignment performed by the ALIGN command.
+ The default values for the gap penalty and gap extension penalty are
+ 10 and 2 respectively.
+#BVALUE
+
+
+ BVALUE cutoff [REF|MOB]
+
+ Allows you to specify a B-value cutoff. Any atoms with B-values
+ greater than this value will be ignored completely in both the
+ fitting and RMS calculations. The B-value may not be higher than
+ this value in either the reference or the mobile structure.
+ For example, if you specify 10, then atoms with B-values greater
+ than 10 will be ignored.
+
+ If the cutoff you supply is negative, any atoms with B-values
+ less than the absolute value you specify will be ignored.
+ For example, if you specify -10, then atoms with B-values less
+ than 10 will be ignored.
+
+ The optional REF or MOB parameter restricts checking of B-values
+ to the specified structure.
+
+ Specify BVALUE * to allow all BValues
+#IGNOREMISSING
+
+
+ IGNOREMISSING
+
+ By default, atom mismatches during fitting caused by atoms
+ missing in one of the two structures cause the program to report
+ an error. This command causes the program to issue a warning,
+ but to proceed with the fitting, ignoring the mismatched atom(s).
+ (See NOIGNOREMISSING.)
+#NOIGNOREMISSING
+
+
+ NOIGNOREMISSING
+
+ Restores the default behaviour of reporting an error and stopping
+ the fitting procedure if atom mismatches occur during fitting.
+ (See IGNOREMISSING.)
+#NFITTED
+
+
+ NFITTED
+
+ Reports the number of atom pairs fitted in the last fitting
+ operation. Note that this is not the number of residues fitted
+ unless you are only fitting on CA atoms.
+#ITERATE
+
+
+ ITERATE [limit | OFF]
+
+ Iterates the fitting zones during future FIT commands.
+
+ Note that this immediately does an "ATOMS CA" since iteration of
+ zones is only performed on C-alpha atoms. See notes below if you
+ want to calculate an RMSD over other atoms.
+
+ After the initial fit on the specified zones, the zones are updated
+ such that residue pairs with C-alpha atoms within 'limit' Angstroms
+ (default 3.0A) are included and those more distant are excluded.
+ The optimum set of equivalences is obtained using a dynamic
+ programming method.
+
+ After updating the zones, the structures are refitted and the
+ procedure iterates to convergence (typically 3 or 4 cycles). The
+ RMSD on C-alpha atoms is shown after each cycle unless the QUIET
+ command is given.
+
+ As stated above, the ITERATE command implies ATOMS CA. Having
+ fitted on C-alpha atoms, you can of course display the RMSD over
+ other atom sets in the usual way using the RATOMS command (e.g.
+ RATOMS N,CA,C,O will display the backbone RMSD).
+
+ Should you wish to refit on another atom set using the iterated
+ zones, simply use ITERATE OFF to switch off iteration, select the
+ atom set required using the ATOMS command and use FIT to refit the
+ structures in the usual way. For example, to fit on backbone atoms:
+
+ ITERATE OFF
+ ATOMS N,CA,C,O
+ FIT
+#MULTI
+
+
+ MULTI filename
+
+ Fits multiple structures. 'filename' contains a list of structure
+ files to be fitted.
+
+ MULTI causes a set of structure files to be read in. The filenames
+ are given within the file specified in the command. The first
+ structure file is used as a reference set for the first fitting
+ stage but the coordinates are averaged after each fitting stage
+ to derive an averaged template used for subsequent fitting.
+
+ i.e. Given N files to fit, file 2 is fitted to file 1 and an
+ averaged structure, A, is calculated, file 3 is then fitted to A
+ and a new average, A' is calculated. This continues until all N
+ structures have been fitted. The whole procedure iterates until
+ convergence (typically 3 or 4 cycles).
+
+ The resulting fitted files are written with the MWRITE command.
+ Note that there is no "reference" set in the sense used for
+ 2-structure fitting; fitted versions of all N files will be written
+ since the reference set is actually an averaged template.
+
+ Progress and RMSDs are reported at each iteration unless the QUIET
+ command is used.
+#QUIET
+
+
+ QUIET [OFF]
+
+ Switches off all warning and informational messages.
+#MWRITE
+
+
+ MWRITE [ext]
+
+ Writes multiple fitting results.
+
+ The filenames are the same as the input file but with the extension
+ replaced by that specified in the command. If no extension is
+ specified then '.fit' will be used. If the input file contained no
+ extension, then the extension specified will be appended to the
+ filename.
+
+ Note that since only the extension is changed when writing back the
+ fitted files, you must have permission to write to the directory
+ from which the original files were read.
+#LIMIT
+
+
+ LIMIT (start stop | OFF)
+
+ When obtaining fit zones from a sequence alignment, using READALIGNMENT,
+ limit the zones to use residues between the specified positions in the
+ alignment.
+
+ For example, if the alignment were:
+
+ 1 2 3
+ 123456789012345678901234567890123
+ ASAHSTGEHNM--PLELLGHISLAM---NPRTY
+ ---HSTADHNLRTPLEVLG--SLAMEDRQPRTY
+
+ the zones would normally be taken from the following positions
+ in the alignment:
+ 4-11, 14-19, 22-25, 29-33
+
+ By using the command:
+ LIMIT 20 28
+ only the zone from 22-25 would be included.
+
+ This is particularly useful in conjunction with the ITERATE command
+ and fitting of multiple structures.
+#CENTRE
+
+
+ CENTRE [OFF]
+
+ Causes coordinates to be written centred about the centre of
+ geometry of the fitted region rather than translated back to the
+ reference set's location. If only fitting 2 structures, the
+ WRITE REFERENCE command is required to write the reference set
+ in the same coordinate frame. (With multiple structure, MWRITE
+ will write the reference set in any case.)
+#CENTER
+
+
+ CENTER [OFF]
+
+ See CENTRE
+#DELZONE
+
+
+ DELZONE ALL|((*|(X...[,n][/m])|(j-k))[:(*|(X...[,n][/m])|(j-k))])
+ where X... is an amino acid sequence
+ n is a number of residues
+ m is the occurrence number
+ j and k are residue specifications of the form
+ [chain][.]renum[insert]
+ Optional items are in square brackets
+ Alternatives are marked by a | and grouped in parentheses
+
+ Specifies zones to be deleted from the user-defined list of fit zones.
+ The DELZONE command uses the same syntax as the ZONE command. The command
+ matches the specified zone with a zone in the user-defined list of fitting
+ zones and deletes the matching zone from the list. Entering either
+ DELZONE ALL or DELZONE * will delete all user-defined zones.
+#DELRZONE
+
+
+ DELRZONE ALL|((*|(X...[,n][/m])|(j-k))[:(*|(X...[,n][/m])|(j-k))])
+ where X... is an amino acid sequence
+ n is a number of residues
+ m is the occurrence number
+ j and k are residue specifications of the form
+ [chain][.]renum[insert]
+ Optional items are in square brackets
+ Alternatives are marked by a | and grouped in parentheses
+
+ Specifies zones to be deleted from the list of user-defined zones
+ for calculating the RMSd. The DELRZONE command uses the same syntax
+ as the ZONE command. The command matches the specified zone with a
+ zone in the user-defined list of RMSd calculation zones and deletes
+ the matching zone from the list. Unlike the RZONE command, entering
+ either DELRZONE ALL or DELRZONE * will delete all user-defined RMSd
+ calculation zones rather than returning to the default condition where
+ the RMSd calculation zones are set to the user-defined fitting zones.
+#BZONE
+
+
+ BZONE
+
+ Sets fitting zones based on markers in the temperature factor (B value)
+ column. Zones are marked using positive whole numbers while zeros are
+ ignored. Multiple zones can be marked using additional numbers.
+
+ Assignment of zones is carried out in two possible ways. If only the
+ reference structure is marked then the marked section will be added as a
+ fitting zone in both the reference and mobile structure. If both the
+ reference and the mobile structure are marked then fitting zones are
+ assigned by scanning through and setting zones for corresponding
+ continuous stretches of flagged residues in either the reference or mobile
+ structures.
+#SETCENTRE
+
+
+ SETCENTRE CLEAR|(*|i[:j])
+
+ where i and j are residue specifications of the form [chain]resnum[insert]
+ Optional items are in square brackets
+ Alternatives are marked by a | and grouped in parentheses
+
+ Specifies a single residue as the centre of fitting. Entering
+ SETCENTRE CLEAR or SETCENTRE * will clear the centre residue.
+#SETCENTER
+
+
+ SETCENTER
+
+ See SETCENTRE
+#SCRIPT
+
+
+ SCRIPT filename
+
+ Executes a script file. When a script file is run, messages
+ indicating the start and end of the script are sent to stdout if quiet
+ mode is off. A comment marker (#) at the beginning of a line will echo
+ the line to stdout, a useful method for annotating output.
+
+ It is possible to run a script from within a script using the
+ SCRIPT command. ProFit tracks the number of open/nested scripts and
+ will allow up to 1000 nested scripts to be open. The assumption is
+ that if over a thousand scripts are open then ProFit has been sent
+ into an infinite loop (for instance by having a script call itself).
+#DISTCUTOFF
+
+
+ DISTCUTOFF [cutoff|ON|OFF]
+
+ Specifies a distance cutoff for ignoring atom pairs outside a
+ specified distance. Entering "DISTCUTOFF ON" or "DISTCUTOFF OFF" will
+ turn the distance cutoff on or off. Entering "DISTCUTOFF 2.5" will set
+ the value of the distance cutoff to 2.5 Angstroms and turn the
+ distance cutoff on. A warning is displayed if the distance cutoff is
+ set to zero and turned on.
+#PAIRDIST
+
+
+ PAIRDIST
+
+ Prints the pairwise distances between equivalent atom pairs in the
+ reference and mobile structure. If the distance cutoff is set then
+ residues outside the distance cutoff are flagged with "*".
+#HEADER
+
+
+ HEADER [ON | OFF]
+
+ Include PDB header and trailer records when writing structures.
+ By default, only the coordinate section of a file is output when a
+ structure is written.
+#OCCRANK
+
+
+ OCCRANK n
+
+ Sets ProFit to read the nth ranked highest occupancy atom position for
+ alternative atom positions.
+
+ For structure files containing partial occupancies, lower occupancy atoms
+ can be read by setting the occupancy rank parameter to read
+ alternative atom positions.
+
+ By default, OCCRANK is set to 1 and reads the highest ranked atom position,
+ a setting of 2 will read the second most occupied position and a setting of
+ 3 will read the third most occupied position, etc.
+##
+
+
+ # comment
+
+ Any comment preceded by the comment marker, #, is echoed to stdout.
+ This is useful for including comments from scripts.
+#ALLVSALL
+
+
+ ALLVSALL [filename]
+
+ Performs an all versus all comparison of the mobile structures when using
+ MULTI. Results are presented as tab-delimited text suitable for loading
+ into a spreadsheet.
+
+ If the optional filename parameter is given, output is directed to
+ the specified file. If the file cannot be opened or a filename is
+ not specified, output appears on the screen. If the filename begins
+ with a pipe character (|), the results are piped into the
+ specified program.
+
+ ALLVSALL outputs the RMSD for each pair of mobile structures, followed by
+ a tab-delimited grid of all the results suitable for importing into a
+ spreadsheet. As the output grid fills the screen when comparing large numbers
+ of structures, the tab-delimited section of the output can be turned off
+ using the QUIET command.
+
+ ALLVSALL calls the TRIMZONES command.
+#SETREF
+
+
+ SETREF [n]
+ where n is the number of the mobile structure.
+
+ Sets the reference structure to the nth mobile structure when using MULTI.
+
+ If no structure number is given then the reference is automatically set
+ by performing an all versus all comparison of the mobile structures then
+ selecting the structure with the lowest overall RMSD to the other mobile
+ structures.
+#TRIMZONES
+
+
+ TRIMZONES
+
+ This command is used primarily with fitting zones derived using ALIGN.
+ With pairwise alignments, the lengths of the aligned regions may vary and
+ there may be gaps in the alignments from one structure to another. The
+ TRIMZONES command trims the ends of the aligned zones and adds gaps
+ allowing for a like versus like comparison by using fitting zones that are
+ common to all the structures.
+
+ TRIMZONES is automatically called by the ALLVSALL and SETREF commands.
+ This command is only used with MULTI.
+#ORDERFIT
+
+
+ ORDERFIT
+
+ Performs a fit of all mobile structures to the reference structure. The
+ most similar structures are fitted first.
+
+#PRINTALIGN
+
+
+ PRINTALIGN [FASTA|PIR] [filename]
+
+ Prints current fitting zones as a sequence alignment.
+
+ The default output is a (user-friendly) pairwise alignment with the
+ reference and mobile sequences printed as pairs of 60-character wide
+ lines. The optional FASTA or PIR parameter sets the printout to
+ (machine-friendly) FASTA or PIR formatting.
+
+ If the first character of the (optional) filename is a pipe character,
+ then the results will be piped into the specified program.
+
+ ProFit can read PIR-formatted files using the READALIGN command.
+#ALIGN
+
+
+ ALIGN [[WHOLE|*]|[zone [APPEND]]]
+ where zone = a standard zone definition.
+
+ Performs Needleman and Wunsch sequence alignment on the sequences
+ and displays the alignment. Any fitting zones are automatically
+ cleared and replaced with zones derived from the equivalent regions
+ in the alignment.
+
+ It will normally be necessary to use the ATOMS command to specify
+ that only backbone or C-alpha atoms are included in the fitting
+ calculations.
+
+ There is a choice of three alignment options:
+
+ 1. A default chain-by-chain alignment:
+ Chain A in the mobile is aligned with chain A in the reference,
+ chain B with chain B, etc...
+ If the number of chains doesn't match then a warning is issued.
+
+ 2. The WHOLE option gives a whole sequence alignment.
+ The whole sequence (regardless of chain ID) is aligned. Breaks
+ between chains are introduced at the appropriate positions.
+
+ 3. Align Zones
+ Allows the user to define a zone to be aligned.
+ Also, it is possible to append new zones onto the end of the zone
+ list rather than overwrite the current zone list when using this
+ option.
+
+ For example, one could use the following commands:
+
+ align a*:b*
+ align b*:a* append
+
+ to align chain A with chain B and then B with A.
+
+ When doing multiple fitting, it is not possible to define regions
+ using the colon notation to define regions on both the reference
+ and mobile structures. (This is the same restriction as the ZONE
+ command.)
+#WTAVERAGE
+
+
+ WTAVERAGE [ON|OFF]
+
+ Sets the weighting system for the averaged reference structure to the
+ default weighting system where the change in the coordinates of the
+ reference structure is inversely proportional to the number of mobile
+ structures. The weighted averaging scheme was introduced to lower the
+ effect that outlying structures have on the averaged reference.
+
+ The alternative weighting scheme sets the coordinates of the
+ reference structure to the average of the reference and the mobile
+ structures. This was the scheme used by ProFit prior to version 3.0.
+#MULTREF
+
+
+ MULTREF [OFF]
+
+ Sets RMSd calculations to give values to the averaged reference rather
+ than the first mobile structure.
+#NOFIT
+
+
+ NOFIT
+
+ Sets the fitted flag in ProFit, allowing the user to calculate the RMSD
+ on a structure without fitting.
+#SYMMATOMS
+
+
+ SYMMATOMS [[OFF|ON|ALL]|xxx [OFF|ON]]
+
+ where xxx is a three-letter amino acid code.
+
+ Enables the auto-matching of symmetrical atoms (e.g. CD1 - CD2 and
+ CE1 - CE2 of tyrosine) in ProFit.
+
+ SYMMATOMS matches charged oxygens and nitrogens on arginine, aspartate
+ and glutamate residues and the delta and epsilon carbons of phenylalanine
+ and tyrosine residues.
+
+ It is also possible to match the nitrogen and oxygen atoms of the amide
+ sidechains of asparagine and glutamine residues and the prochiral methyl
+ groups of valine and leucine.
+
+ Typing SYMMATOMS will display the pairs of atoms currently matched by
+ ProFit. Typing SYMMATOMS ON or SYMMATOMS OFF will turn symmetrical atom
+ matching on or off.
+
+ Individual residue types, for example ASP, can be turned on or off by
+ typing SYMMATOMS ASP ON or SYMMATOMS ASP OFF, respectively. Alternatively,
+ SYMMATOMS ALL ON will turn all atom pairs on.
+
+ By default, the matching of symmetrical atoms is turned off.
diff --git a/debian/README.Debian b/debian/README.Debian
deleted file mode 100644
index 60f469c..0000000
--- a/debian/README.Debian
+++ /dev/null
@@ -1,12 +0,0 @@
-profit for Debian
------------------
-
-/usr/bin/profit sadly clashes with the emboss package, for a very
-different funtion.
-
-The license of ProFit does not allow a redistribution.
-
-The documentation is not built, since it demanded too much of LaTeX to be installed,
-also for a conversion to HTML. It ships with the original source code.
-
- -- Steffen Moeller <moeller at debian.org> Wed, 05 Jun 2013 17:38:14 +0200
diff --git a/debian/README.source b/debian/README.source
deleted file mode 100644
index 686bb84..0000000
--- a/debian/README.source
+++ /dev/null
@@ -1,7 +0,0 @@
-profit for Debian
------------------
-
-No source should be redistributed.
-
-
-
diff --git a/debian/bin/profit b/debian/bin/profit
deleted file mode 100755
index 0148339..0000000
--- a/debian/bin/profit
+++ /dev/null
@@ -1,5 +0,0 @@
-#!/bin/sh
-
-export HELPDIR=/usr/share/profit/
-export DATADIR=/usr/share/profit/
-/usr/lib/profit/profit $*
diff --git a/debian/changelog b/debian/changelog
deleted file mode 100644
index b0ba23d..0000000
--- a/debian/changelog
+++ /dev/null
@@ -1,6 +0,0 @@
-profit (3.1-1) UNRELEASED; urgency=low
-
- * Initial packaging that would (closes: #525428) if package would be redistributable
- * The license does not allow a redistribution.
-
- -- Steffen Moeller <moeller at debian.org> Wed, 05 Jun 2013 17:38:14 +0200
diff --git a/debian/compat b/debian/compat
deleted file mode 100644
index f599e28..0000000
--- a/debian/compat
+++ /dev/null
@@ -1 +0,0 @@
-10
diff --git a/debian/control b/debian/control
deleted file mode 100644
index 1b8f2e1..0000000
--- a/debian/control
+++ /dev/null
@@ -1,46 +0,0 @@
-Source: profit
-Maintainer: Steffen Moeller <moeller at debian.org>
-Section: non-free/science
-XS-Autobuild: no
-Priority: optional
-Build-Depends: debhelper (>= 10),
- libreadline-dev
-Standards-Version: 3.9.8
-Vcs-Browser: http://anonscm.debian.org/viewvc/debian-med/trunk/packages/profit/trunk/
-Vcs-Svn: svn://anonscm.debian.org/debian-med/trunk/packages/profit/trunk/
-Homepage: http://www.bioinf.org.uk/software/profit/
-
-Package: profit-3d
-Architecture: any
-Depends: ${shlibs:Depends},
- ${misc:Depends},
- profit-data
-Description: Protein structure alignment
- ProFit is designed to be the ultimate protein least squares fitting
- program. It has many features including flexible specification of
- fitting zones and atoms, calculation of RMS over different zones or
- atoms, RMS-by-residue calculation, on-line help facility, etc.
- .
- The binary is named profit-3D because of a name clash with an
- EMBOSS tool.
-
-Package: profit
-Architecture: any
-Depends: ${shlibs:Depends},
- ${misc:Depends},
- profit-3d
-Conflicts: emboss
-Description: Protein structure alignment
- ProFit is designed to be the ultimate protein least squares fitting
- program. It has many features including flexible specification of
- fitting zones and atoms, calculation of RMS over different zones or
- atoms, RMS-by-residue calculation, on-line help facility, etc.
- .
- A symbolic link is provided to have the binary name back to how
- it is historically correct.
-
-Package: profit-data
-Architecture: all
-Recommends: profit
-Description: Help and data files for Profit
- Nothing to be redistributed.
diff --git a/debian/docs b/debian/docs
deleted file mode 100644
index 70d0f4c..0000000
--- a/debian/docs
+++ /dev/null
@@ -1,4 +0,0 @@
-DOS.txt
-HISTORY
-INSTALL
-00Read.Me
diff --git a/debian/patches/AllowReadline.patch b/debian/patches/AllowReadline.patch
deleted file mode 100644
index 02b9e80..0000000
--- a/debian/patches/AllowReadline.patch
+++ /dev/null
@@ -1,15 +0,0 @@
-Index: profit-3.1/src/Makefile
-===================================================================
---- profit-3.1.orig/src/Makefile
-+++ profit-3.1/src/Makefile
-@@ -6,8 +6,8 @@
- # To allow use of the GNU Readline library, uncomment the following two
- # lines - You may need to install the GNU readline development libraries
- # first!
--#READLINE = -DREADLINE_SUPPORT
--#READLINELIB = -lreadline -lcurses
-+READLINE = -DREADLINE_SUPPORT
-+READLINELIB = -lreadline -lcurses
-
- # Uncomment if you want to use the rotate and refit code for avoiding
- # local minimum problem. If compiled with gcc -O3 this problem seems to
diff --git a/debian/patches/AllowRedundantCleans.patch b/debian/patches/AllowRedundantCleans.patch
deleted file mode 100644
index d2ccd9f..0000000
--- a/debian/patches/AllowRedundantCleans.patch
+++ /dev/null
@@ -1,13 +0,0 @@
-Index: profit-3.1/src/Makefile
-===================================================================
---- profit-3.1.orig/src/Makefile
-+++ profit-3.1/src/Makefile
-@@ -54,7 +54,7 @@
- $(CC) $(COPT) $(READLINE) $(ROTATEREFIT) -o $@ -c $<
-
- clean :
-- /bin/rm $(OFILES) $(LFILES)
-+ /bin/rm -f $(OFILES) $(LFILES)
-
- protos : $(PFILES)
-
diff --git a/debian/patches/series b/debian/patches/series
deleted file mode 100644
index 57acc99..0000000
--- a/debian/patches/series
+++ /dev/null
@@ -1,2 +0,0 @@
-AllowReadline.patch
-AllowRedundantCleans.patch
diff --git a/debian/profit-3d.dirs b/debian/profit-3d.dirs
deleted file mode 100644
index ab91ea3..0000000
--- a/debian/profit-3d.dirs
+++ /dev/null
@@ -1,2 +0,0 @@
-usr/lib/profit
-usr/bin
diff --git a/debian/profit-3d.install b/debian/profit-3d.install
deleted file mode 100644
index 68f0f43..0000000
--- a/debian/profit-3d.install
+++ /dev/null
@@ -1 +0,0 @@
-src/profit usr/lib/profit/
diff --git a/debian/profit-data.dirs b/debian/profit-data.dirs
deleted file mode 100644
index 358de10..0000000
--- a/debian/profit-data.dirs
+++ /dev/null
@@ -1 +0,0 @@
-usr/share/profit
diff --git a/debian/profit-data.install b/debian/profit-data.install
deleted file mode 100644
index b7b3ef6..0000000
--- a/debian/profit-data.install
+++ /dev/null
@@ -1,2 +0,0 @@
-mdm78.mat usr/share/profit/
-ProFit.help usr/share/profit/
diff --git a/debian/profit.links b/debian/profit.links
deleted file mode 100644
index eae6ab6..0000000
--- a/debian/profit.links
+++ /dev/null
@@ -1 +0,0 @@
-usr/bin/profit-3D usr/bin/profit
diff --git a/debian/rules b/debian/rules
deleted file mode 100755
index 9b0a1a2..0000000
--- a/debian/rules
+++ /dev/null
@@ -1,18 +0,0 @@
-#!/usr/bin/make -f
-
-# Uncomment this to turn on verbose mode.
-#export DH_VERBOSE=1
-
-%:
- dh $@
-
-override_dh_auto_build:
- $(MAKE) -C src
-
-override_dh_auto_clean:
- $(MAKE) -C src clean
-
-override_dh_install:
- dh_install
- cp $(CURDIR)/src/profit $(CURDIR)/debian/profit-3d/usr/bin/profit-3D
-
diff --git a/debian/source/format b/debian/source/format
deleted file mode 100644
index 163aaf8..0000000
--- a/debian/source/format
+++ /dev/null
@@ -1 +0,0 @@
-3.0 (quilt)
diff --git a/debian/watch b/debian/watch
deleted file mode 100644
index 537278d..0000000
--- a/debian/watch
+++ /dev/null
@@ -1,7 +0,0 @@
-version=4
-
-opts=dversionmangle=s/.*/0.No-Track/ \
- https://people.debian.org/~eriberto/ FakeWatchNoUpstreamTrackingForThisPackage-(\d\S+)\.gz
-
-# Download page:
-# http://www.bioinf.org.uk/cgi-bin/AndrewMartin/swreg/swreg.pl
\ No newline at end of file
diff --git a/doc/Makefile b/doc/Makefile
new file mode 100644
index 0000000..c4d8923
--- /dev/null
+++ b/doc/Makefile
@@ -0,0 +1,25 @@
+DEST = /acrm/www/html/software/profit/
+all : ProFit.ps ProFit.pdf .web
+
+
+ProFit.ps : ProFit.dvi
+ dvips -o $@ $<
+
+ProFit.dvi : ProFit.tex
+ latex ProFit
+ latex ProFit
+
+ProFit.pdf : ProFit.tex ProFit.dvi
+ pdflatex ProFit
+
+.web : ProFit.pdf
+ latex2html -local_icons -split 4 ProFit
+ touch .web
+
+install :
+ \cp -R ProFit/* $(DEST)/doc
+ \cp ProFit.css $(DEST)/doc
+# \cp ProFit.pdf $(DEST)
+
+clean :
+ \rm -rf ProFit.aux ProFit.dvi ProFit.log ProFit.ps ProFit
diff --git a/doc/ProFit.css b/doc/ProFit.css
new file mode 100644
index 0000000..c2026fa
--- /dev/null
+++ b/doc/ProFit.css
@@ -0,0 +1,52 @@
+/* Century Schoolbook font is very similar to Computer Modern Math: cmmi */
+.MATH { font-family: "Century Schoolbook", serif; }
+.MATH I { font-family: "Century Schoolbook", serif; font-style: italic }
+.BOLDMATH { font-family: "Century Schoolbook", serif; font-weight: bold }
+
+/* implement both fixed-size and relative sizes */
+SMALL.XTINY { font-size : xx-small }
+SMALL.TINY { font-size : x-small }
+SMALL.SCRIPTSIZE { font-size : smaller }
+SMALL.FOOTNOTESIZE { font-size : small }
+SMALL.SMALL { }
+BIG.LARGE { }
+BIG.XLARGE { font-size : large }
+BIG.XXLARGE { font-size : x-large }
+BIG.HUGE { font-size : larger }
+BIG.XHUGE { font-size : xx-large }
+
+
+/* mathematics styles */
+DIV.displaymath { } /* math displays */
+TD.eqno { } /* equation-number cells */
+
+
+/* document-specific styles come next */
+/*** Basic style elements ***/
+h1 { margin: 0em;
+ border: none;
+ background: #333366;
+ color: white;
+ font: bold 26px Helvetica, Arial, sans-serif;
+ padding: 0.25em;
+ }
+h2 { font: bold 24px Helvetica, Arial, sans-serif;
+ padding: 4px 4px 4px 4px;
+ background: #666699;
+ }
+h3 { font: bold italic 20px Helvetica, Arial, sans-serif;
+ color: #666699;
+ }
+h4 { font: bold italic 14px Helvetica, Arial, sans-serif;
+ color: #666699;
+ }
+p, td, th, li { font: 14px Helvetica, Arial, sans-serif;}
+html { background: #ccccff;
+ margin: 0px 0px 0px 0px;
+ padding: 0px 10px 10px 10px;
+ }
+body { background: #c8c8c8;
+ margin: 12px 0px 0px 0px;
+ padding: 20px 20px 10px 20px;
+ border: 2px solid #333366;
+ }
diff --git a/doc/ProFit.pdf b/doc/ProFit.pdf
new file mode 100644
index 0000000..ad37244
Binary files /dev/null and b/doc/ProFit.pdf differ
diff --git a/doc/ProFit.tex b/doc/ProFit.tex
new file mode 100644
index 0000000..f8fd1b3
--- /dev/null
+++ b/doc/ProFit.tex
@@ -0,0 +1,1626 @@
+\documentclass{article}
+\usepackage{a4}
+
+\newcommand{\pf}{\mbox{\bfseries ProFit}}
+\title{ProFit Version 3.1}
+\author{Dr.\ Andrew C.R.\ Martin, Dr.\ Craig T.\ Porter\\ University College London}
+\date{Document First Written: 25th July, 1996 (University College %
+London)\\%
+Updated while at University of Reading\\%
+Last updated: 17th February, 2009}
+
+\begin{document}
+\maketitle
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Introduction and Methodology}
+
+\pf\ (pronounced Pro-Fit, not profit!) is designed to be the ultimate
+program for performing least squares fits of two or more protein
+structures. It performs a very simple and basic function, but allows
+as much flexibility as possible in performing this procedure. Thus one
+can specify subsets of atoms to be considered and specify zones to be
+fitted by number, by sequence, or by sequence alignment.
+
+Early versions of \pf\ did not try to address the question of sorting out
+equivalent
+atoms for you beyond doing a sequence alignment. There are other
+programs such as SSAP and GAFIT which address that problem. You must
+specify which residues and atoms you consider to be equivalent
+although the program supports internal sequence alignment to set the
+zones automatically.
+
+As of \pf\ V2.0, iterative updating of fitting zones is now
+supported. Thus you may give a sequence alignment or just a small fragment
+to initiate the
+fitting process (a minimum of 3 amino acids). Fitting is performed on
+this region and then all residue pairs within 3\AA\ are included in
+the fitting zones and the fitting is repeated. This iterates until the
+C$\alpha$ RMSd converges to within 0.01\AA. This is particularly
+useful in conjunction with the initial zone specification based on
+sequence alignment. Convergence typically takes 3--4 cycles.
+
+\pf\ V2.0 also introduced multiple structure fitting. The first
+structure file is used as a reference set for the first fitting stage
+but the coordinates are averaged after each stage to derive a template
+used for subsequent fitting. i.e.\ Given $N$ files to fit, file 2 is
+fitted to file 1 and an averaged structure, $A$, is calculated, file 3
+is then fitted to $A$ and a new average, $A'$, is calculated. This
+continues until all $N$ structures have been fitted. The whole
+procedure iterates until convergence (typically 3 or 4 cycles).
+
+The program will output an RMS deviation and optionally the fitted
+coordinates. RMS deviations over alternate zones and atoms may also
+be calculated without performing a new fit. Thus the zones for
+calculating the RMS deviation can be different from those used for
+fitting.
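+
+For reference, the RMS deviation is conventionally defined as shown
+below. This is the standard textbook definition, included as an
+illustration; the symbols $n$, $\mathbf{r}_i$ and $\mathbf{m}_i$ are
+notation assumed for this sketch and are not taken from the \pf\
+source code. For $n$ equivalent atom pairs with reference coordinates
+$\mathbf{r}_i$ and fitted mobile coordinates $\mathbf{m}_i$:
+\[
+\mathrm{RMSd} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}
+                \left|\mathbf{r}_i - \mathbf{m}_i\right|^2}
+\]
+Changing the zones or atoms used for the RMS calculation changes
+which pairs enter the sum, but not the fitted coordinates.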
+
+While optimised for proteins, non-protein structures may also be
+fitted if they are stored in the standard Protein Databank (PDB) format.
+
+\pf\ is written to be as easily portable between systems as possible
+and uses a command-driven interface.
+
+\pf\ uses the McLachlan fitting algorithm, essentially a steepest
+descents minimisation, as described in McLachlan, A.D. (1982)
+\emph{Rapid Comparison of Protein Structures}, \emph{Acta Cryst.}\
+{\bfseries A38}, 871--873. This part of the code is based on an
+implementation by Dr.\ Mike Sutcliffe.
+
+
+In summary, \pf\ has the following features:
+
+\begin{enumerate}
+\setlength{\parsep}{0pt}
+\setlength{\parskip}{0pt}
+\setlength{\itemsep}{0pt}
+\item Portability between different operating systems
+\item Ability to specify atom subsets
+\item Ability to specify zones:
+\begin{itemize}
+ \setlength{\parsep}{0pt}
+ \setlength{\parskip}{0pt}
+ \setlength{\itemsep}{0pt}
+ \item Numerically
+ \item By sequence
+ \item By auto sequence alignment
+ \item By iterative updating and optimization
+\end{itemize}
+\item Output RMS deviation over:
+\begin{itemize}
+ \setlength{\parsep}{0pt}
+ \setlength{\parskip}{0pt}
+ \setlength{\itemsep}{0pt}
+ \item Fitted region
+ \item Any other region
+ \item Any other atom set
+\end{itemize}
+\item Optionally output fitted coordinates in PDB format
+\item Integrated help facility
+\item Fitting zones derived from sequence alignment
+\item Iterative updating of fitting zones
+\item Multiple structure fitting
+\end{enumerate}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Version numbering}
+
+From V2.6, version numbering in \pf\ adopted the following scheme. A
+version of \pf\ has a version number of the form V$a.b.c.d$ where $c$
+and $d$ are optionally present.
+
+$a$ is the major version number. A change in this
+represents a significant rewrite of \pf\ and/or the addition of a
+major new feature set.
+
+$b$ is the minor version number. A change in $b$ represents a new
+feature added to \pf . It may also indicate a fix to a major
+show-stopper bug.
+
+$c$ is a bug-fix number. It indicates that a bug in the \pf\ code has
+been fixed compared with the previous release.
+
+$d$ is also a bug-fix number, but indicates that bugs have been fixed
+in the Bioplib libraries used by \pf\ and not in \pf\
+itself. Alternatively, it may indicate a distribution bug (i.e.\
+missing files for a distribution or a simple documentation change).
+
+In V2.5.$x$, $b$ was used to represent a new feature \emph{or} a bug
+fix while $c$ was used to indicate a bug-fix in Bioplib ($d$ is now
+used for that purpose).
+
+In earlier releases, the scheme was used much more loosely with $b$
+being used both for new features and bug fixes in \pf\ or in Bioplib.
+Lettered versions (e.g.\ V1.7g) were sometimes used for bug-fixes and
+sometimes for internal non-released versions. Lettered versions are
+now used exclusively for internal non-released versions.
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Starting the program}
+
+The program is started from the command line by typing the command:
+\begin{verbatim}
+ profit
+\end{verbatim}
+Once the program is started, you may read in structures to be
+fitted. Alternatively, the PDB files may be specified on the command
+line:
+\begin{verbatim}
+ profit reference.pdb mobile.pdb
+\end{verbatim}
+By default, \pf\ does not read HETATM records from the PDB file. This
+may be changed from the command line by using the -h flag:
+\begin{verbatim}
+ profit -h
+ profit -h reference.pdb mobile.pdb
+\end{verbatim}
+Alternatively, once in the program you may give the \verb1HETATOMS1
+command before reading in the structures (see Section~\ref{sec:read}).
+
+\pf\ can run a script file, a text file of \pf\ commands, from the command
+line using the -f flag:
+\begin{verbatim}
+ profit -f myscriptfile.txt
+ profit -f myscriptfile.txt -h reference.pdb mobile.pdb
+\end{verbatim}
+Once in the program you may use the command \verb|SCRIPT myscriptfile.txt|
+to run a script file (see Section~\ref{sec:script}).
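+
+As an illustration, a script file might contain something like the
+following. This is only a sketch: the file names are placeholders and
+the commands are those described elsewhere in this document.
+\begin{verbatim}
+   # Fit on backbone atoms only
+   REFERENCE reference.pdb
+   MOBILE mobile.pdb
+   ATOMS N,CA,C,O
+   FIT
+\end{verbatim}
+The first line starts with the comment marker, so it is echoed to the
+output, which is a convenient way of annotating the results.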
+
+If compiled with XMAS\footnote{XMAS is an XML-like file format
+developed at Inpharmatica, Ltd.\ which is designed for leaf-heavy data
+such as protein structure data} file support, the \verb1-x1 flag may
+be used to specify that the files named on the command line are XMAS
+files instead of PDB files. Note that the program currently will only
+write PDB format files.
+\begin{verbatim}
+ profit -x reference.xmas mobile.xmas
+\end{verbatim}
+
+If compiled with GUNZIP support
+then the program can read gzipped PDB files. This will only work on
+unix-like platforms and assumes that the \verb|gunzip| program is in
+your path. Note that the uncompressed files will remain in \verb|/tmp|
+with a name like \verb|readpdb_12345| where 12345 is a process
+number. You will need to delete these regularly!
+
+
+Once in the program, you issue commands by typing at the
+keyboard. These commands may always be abbreviated to the minimum
+non-ambiguous string. The program is mostly case insensitive; you may
+mix upper and lower case at will, though uppercase will be used
+throughout this documentation. The only times that \pf\ will be case sensitive are when dealing with file names or lowercase chain identifiers (see Section~\ref{sec:zone}).
+
+Support for the GNU Readline library\footnote{The GNU Readline library is
+available from http://directory.fsf.org/project/readline/}
+was introduced in \pf\ V3.0 allowing you to edit the command line and to
+recall previous commands. Again, this is a compile time option.
+
+You exit from the program by typing \verb1QUIT1.
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Reading Structures}
+
+\label{sec:read}
+\pf\ reads files in PDB format. If compiled with XMAS support, then
+XMAS files may also be read, but only PDB files may be written. It
+uses the concept of a \emph{reference} structure and a \emph{mobile}
+structure. The reference structure remains static in space and the
+mobile structure is fitted onto it. When the files are specified on
+the command line, the reference structure is specified first, the
+mobile structure second.
+
+Once in the program, you may read the reference structure using the
+\verb1REFERENCE1 command and the mobile structure using the
+\verb1MOBILE1 command. Using these commands causes the equivalent
+current structure to be deleted from the program's memory
+first. However, any zone and atom specifications (see
+Sections~\ref{sec:atoms} and~\ref{sec:zone}) are not deleted. For
+example, you can read p3hfl.pdb as a new reference structure using the
+command:
+\begin{verbatim}
+ REFERENCE p3hfl.pdb
+\end{verbatim}
+and read p3hfm.pdb as a new mobile structure with:
+\begin{verbatim}
+ MOBILE p3hfm.pdb
+\end{verbatim}
+
+If compiled with XMAS support, then the XMAS format is specified by
+placing the keyword \verb1XMAS1 after the commands \verb1REFERENCE1 or
+\verb1MOBILE1:
+\begin{verbatim}
+ REFERENCE XMAS p3hfl.pdb
+ MOBILE XMAS p3hfm.pdb
+\end{verbatim}
+
+When you read a structure containing insertions, you will receive a
+warning message to this effect. This dates from when the program was
+unable to handle residue specifications containing insertion codes,
+but is still useful to draw your attention to the fact that they are
+present.
+
+Note that atoms with coordinates of 9999.00, 9999.00, 9999.00 will
+be ignored during all calculations allowing atoms with undefined
+coordinates to be handled.
+
+When fitting multiple structures (new in \pf\ V2.0), you use the
+\verb1MULTI1 command to read in the structures. See
+Section~\ref{sec:multi}.
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Getting Help}
+
+To get help within the program, simply type \verb1HELP1 and you will
+be presented with a list of commands which the help facility knows
+about. The \verb1ProFit>1 prompt will also change to \verb1Help>1. You
+may then type the name of a command to get help on that
+command. Typing \verb1HELP1 once in the help facility will repeat the
+list of available help topics. Like the main command interface, the
+help facility will accept upper or lower case and you may abbreviate
+commands. If your abbreviation is ambiguous (i.e.\ more than one
+command starts with the letters you have specified), help will be
+supplied on all the commands which match\footnote{The one exception to
+this is if the letters you supply are an abbreviation of {\tt HELP},
+when the list of help topics will be shown again.}.
+
+If the help text is longer than 21 lines, you will see a prompt saying
+\begin{verbatim}
+ More...
+\end{verbatim}
+in which case you should hit the Return (or `Enter') key to get the
+next page of help.
+
+Once at the \verb1Help>1 prompt, you should simply hit the Return (or
+`Enter') key to get back to the main \verb1ProFit>1 prompt.
+
+If you know the topic on which you need help, you may type the name of
+the command after the \verb1HELP1 keyword at the main \verb1ProFit>1
+prompt. After the help message is printed, you will be returned
+directly to the \verb1ProFit>1 prompt. For example, if you want help
+on the \verb1ZONE1 command, you may type:
+\begin{verbatim}
+ HELP ZONE
+\end{verbatim}
+
+Allied to help is the \verb1STATUS1 command. This tells you the
+current status of the program: what structures are loaded, fitting
+zones, atoms and the like.
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Fitting Structures}
+
+Having read in a reference and a mobile structure, you actually fit
+them by giving the \verb1FIT1 command. When you do this, you will get
+a message like:
+\begin{verbatim}
+ Fitting structures...
+ RMS: 0.366
+\end{verbatim}
+However, this will only work if the two structures are of identical
+composition i.e.\ if the sequences are the same and the same atoms are
+present in both. If there are any mismatches, the first such mismatch
+will be reported and the RMS deviation will not be calculated.
+
+Since you will frequently need to fit non-identical structures, you
+may use the \verb1ZONE1 and \verb1ATOMS1 commands to specify which
+residues should be considered equivalent and which atoms should be
+considered in the calculation.
+
+If you are using zone or atom specifications, the RMS deviations will
+be displayed over the atoms and zones specified in those commands.
+
+Normally the fitting procedure will not be completed if there are
+any mismatched atoms or atoms missing from one of the two structures.
+The program issues an error message about atoms missing in the mobile
+structure which are found in the reference structure. The
+\verb1IGNOREMISSING1 command causes the program to issue a warning
+instead of an error and the fitting proceeds ignoring the mismatched
+atoms. The default behaviour is restored by using the
+\verb1NOIGNOREMISSING1 command.
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Specifying Atom Subsets}
+
+\label{sec:atoms}
+The \verb1ATOMS1 command is used to specify a subset of atoms to be
+used in the calculations. It has the syntax:
+\begin{verbatim}
+ ATOMS atm[,atm]...
+\end{verbatim}
+i.e.\ you specify the \verb1ATOMS1 keyword followed by one or more
+atom names separated by commas. A \verb1*1 may be used to specify all
+atoms and a \verb1~1 or \verb1^1 may be placed at the beginning of the
+specification to invert the selection. For example, to fit only
+C$\alpha$ atoms:
+\begin{verbatim}
+ ATOMS CA
+\end{verbatim}
+to fit N, C$\alpha$, C and O atoms:
+\begin{verbatim}
+ ATOMS N,CA,C,O
+\end{verbatim}
+to fit sidechains only (i.e.\ everything except N, C$\alpha$, C and O atoms):
+\begin{verbatim}
+ ATOMS ^N,CA,C,O
+\end{verbatim}
+to return to fitting all atoms:
+\begin{verbatim}
+ ATOMS *
+\end{verbatim}
+
+The PDB atom name field is 4 characters wide and is followed by a
+space. The first two characters hold the right-justified element type,
+so for normal protein and DNA atoms the field begins with a space
+followed by N, C, O, S or P. Thus the atom name field for a C$\alpha$
+contains `~CA~'. HETATMs such as calcium have the two characters CA in
+the first two columns, i.e.\ `CA~~'. When you specify an atom type it is
+matched against the atom name field from the \emph{second character onwards},
+unless you precede it with a \verb1<1. Thus to match a C$\alpha$ you use
+\verb1CA1, but to match calcium, you use \verb1<CA1. For example, as
+stated above, to match C$\alpha$ atoms:
+\begin{verbatim}
+ ATOMS CA
+\end{verbatim}
+while to match calcium atoms:
+\begin{verbatim}
+ ATOMS <CA
+\end{verbatim}
+and to match both C$\alpha$ and calcium:
+\begin{verbatim}
+ ATOMS <CA,CA
+\end{verbatim}
+
+
+Wildcards are also allowed. A \verb1%1 or a \verb1?1 may be used to
+match a single letter at any point in the specification while a
+\verb1*1 may be used to match all remaining characters (thus \verb1C*1
+is allowed, but \verb1*G1 is not). These special characters may be
+escaped by preceding them with a \verb1\1. For example to fit all
+carbons:
+\begin{verbatim}
+ ATOMS C*
+\end{verbatim}
+or to match all atoms at the $\gamma$ position:
+\begin{verbatim}
+ ATOMS ?G*
+\end{verbatim}
+and to match the \verb1C4*1 atoms in DNA:
+\begin{verbatim}
+ ATOMS C4\*
+\end{verbatim}
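+
+As an illustration only, the matching rules described above can be
+sketched in Python. This is not taken from the \pf\ source: the function
+names \verb|atom_matches| and \verb|atom_selected| are hypothetical, and
+the choice to ignore trailing spaces in the atom name field is an
+assumption made for the example.
+

```python
def atom_matches(spec, atom_name):
    """Return True if one atom specification matches a 4-character PDB atom name."""
    if spec.startswith("<"):
        # A leading '<' anchors the match at the first column of the field.
        pattern, target = spec[1:], atom_name
    else:
        # Otherwise matching starts at the second character of the field.
        pattern, target = spec, atom_name[1:]
    target = target.rstrip()  # assumption: trailing padding is ignored
    i = j = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escaped character: the next character must match literally.
            if j >= len(target) or target[j] != pattern[i + 1]:
                return False
            i += 2
            j += 1
        elif ch == "*":
            # '*' matches all remaining characters.
            return True
        else:
            # '%' and '?' match any single character; others match literally.
            if j >= len(target):
                return False
            if ch not in "%?" and target[j] != ch:
                return False
            i += 1
            j += 1
    return j == len(target)


def atom_selected(spec_list, atom_name):
    """Apply a comma-separated ATOMS list, honouring a leading '^' or '~'."""
    negate = spec_list[:1] in ("^", "~")
    specs = spec_list[1:] if negate else spec_list
    hit = any(atom_matches(s, atom_name) for s in specs.split(","))
    return hit != negate
```

+
+With these definitions, the specification \verb|CA| matches the field
+`~CA~' but not `CA~~', while \verb|<CA| does the opposite. Quoted
+specifications containing spaces are not handled by this sketch.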
+
+If atom names contain spaces (e.g.\ in heme groups) the whole atom
+specification must be enclosed in double inverted commas:
+\begin{verbatim}
+ ATOMS "N A,N B,N C"
+\end{verbatim}
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Specifying Zones}
+
+\label{sec:zone}
+The \verb1ZONE1 command is used to specify zones in the two structures
+which are considered equivalent. The complete syntax for the command
+is:
+
+\begin{verbatim}
+ ZONE CLEAR|((*|(X...[,n][/m])|(j-k))[:(*|(X...[,n][/m])|(j-k))])
+\end{verbatim}
+where \verb1X...1 is an amino acid sequence, \verb1n1 is a number of
+residues, \verb1m1 is the occurrence number, \verb1j1 and \verb1k1 are
+residue specifications of the form \emph{[chain][.]resnum[insert]}. Items
+in square brackets are optional and alternatives are marked by a
+\verb1|1 and grouped in parentheses.
+
+\verb1ZONE1 commands are cumulative. Thus each zone you specify is
+added to those currently active. To clear all zones (i.e.\ fit all
+residues), the \verb1ZONE CLEAR1 or \verb1ZONE *1 command may be
+given. To clear a single zone, the \verb1DELZONE1 command can be used
+(see the end of this section).
+
+When a new zone is added, a warning message is displayed if the new zone
+overlaps an existing zone. Overlapping zones will be flagged with \verb1*1
+when using the \verb1STATUS1 command.
+
+Although it appears complex, the syntax is actually very simple and
+consists of two identical sections separated by a colon (:). The left
+half is applied to the reference structure and the right half to the
+mobile structure. In its simplest form, the right hand half of the
+expression is absent and the specification is applied to both
+reference and mobile structures. For example:
+\begin{verbatim}
+ ZONE 24-34
+\end{verbatim}
+will set the zone to include residues 24--34 in both structures. If you
+wanted to fit 24--34 in the reference structure with 25--35 in the
+mobile structure, this simply becomes:
+\begin{verbatim}
+ ZONE 24-34:25-35
+\end{verbatim}
+Single residues can be specified using the same syntax:
+\begin{verbatim}
+ ZONE 44-44:55-55
+\end{verbatim}
+
+You may also specify chain names and insertion codes. The chain name is placed
+before the residue number and the insertion code afterwards. For example:
+\begin{verbatim}
+ ZONE L25A-L30
+\end{verbatim}
+fits residues 25A--30 in the L chain of both structures. Optionally,
+the chain name may be separated from the residue number using a full
+stop. For example:
+\begin{verbatim}
+ ZONE L.25A-L.30
+\end{verbatim}
+Using the full stop also makes the statement case-sensitive. In practice, the
+full stop separator is used with numeric chain names to separate the
+chain name from the residue number and with lowercase chain names.
+\begin{verbatim}
+ ZONE 1.25-1.30
+ ZONE b.1-b.60:A.1-A.60
+\end{verbatim}
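+
+The residue specification \emph{[chain][.]resnum[insert]} can be
+illustrated with a small Python parser. This is a sketch and not the
+\pf\ implementation: the names \verb|parse_residue| and
+\verb|parse_range| are hypothetical, and folding letters to upper case
+when no full stop is given is an assumed reading of the case-sensitivity
+rule above. Open-ended ranges and escaped negative numbers are not
+handled.
+

```python
import re

# chain, '.', residue number, optional insertion code (case preserved)
DOTTED = re.compile(r"^(?P<chain>[^.]*)\.(?P<num>\d+)(?P<ins>[A-Za-z]?)$")
# optional single-letter chain, residue number, optional insertion code
PLAIN = re.compile(r"^(?P<chain>[A-Za-z]?)(?P<num>\d+)(?P<ins>[A-Za-z]?)$")


def parse_residue(spec):
    """Split a residue specification into (chain, number, insertion code)."""
    m = DOTTED.match(spec)
    if m:
        # With a full stop the chain name is taken exactly as typed.
        return m.group("chain"), int(m.group("num")), m.group("ins")
    m = PLAIN.match(spec)
    if m is None:
        raise ValueError("cannot parse residue specification: " + spec)
    # Assumption: without a full stop, letters are treated case-insensitively.
    return m.group("chain").upper(), int(m.group("num")), m.group("ins").upper()


def parse_range(zone):
    """Parse a simple 'j-k' range into two residue specifications."""
    start, end = zone.split("-", 1)
    return parse_residue(start), parse_residue(end)
```

+
+The full stop is what allows a numeric chain name such as \verb|1| in
+\verb|1.25| to be told apart from the residue number.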
+
+Simple wildcards may also be used. For example
+\begin{verbatim}
+ ZONE H*:B*
+\end{verbatim}
+fits the reference H chain with the mobile B chain,
+\begin{verbatim}
+ ZONE -10:50-59
+\end{verbatim}
+fits from the first residue to residue 10 in the reference structure
+with 50--59 in the mobile structure.
+\begin{verbatim}
+ ZONE *:1-100
+\end{verbatim}
+fits all residues in the reference structure with 1--100 in the mobile
+structure.
+
+If the structure file contains negatively numbered residues and you
+are using residue numbering, you can escape the minus sign in the
+residue number using a backslash:
+\begin{verbatim}
+ ZONE \-4-10:\-1-13
+\end{verbatim}
+will fit residues $-4$ to 10 in one structure with $-1$ to 13 in the
+other.
+
+Alternatively, you may specify the zones to be fitted by giving a
+sequence fragment. Together with that fragment, you may specify the
+number of residues to consider starting at that point. If the fragment
+occurs more than once in the sequence you may specify which occurrence
+you wish to consider. For example:
+\begin{verbatim}
+ ZONE CAR:VNS
+\end{verbatim}
+fits the first occurrence of CAR in the reference set with the first
+occurrence of VNS in the mobile set;
+\begin{verbatim}
+ ZONE CAR,10:VNS,10
+\end{verbatim}
+fits 10 residues starting at the first occurrence of CAR in the
+reference set with 10 residues from the first occurrence of VNS in
+the mobile set;
+\begin{verbatim}
+ ZONE CAR,5/2
+\end{verbatim}
+fits 5 residues from the second occurrence of CAR in both structures;
+\begin{verbatim}
+ ZONE 24-34:EIR,11
+\end{verbatim}
+fits 24--34 in the reference set with 11 residues starting at the
+first occurrence of EIR in the mobile set.
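+
+The sequence-fragment form of a zone can be illustrated as follows. This
+is a minimal Python sketch, not code from \pf: the function name
+\verb|fragment_zone| is hypothetical, the result is given as 1-based
+sequential positions, and it is assumed that omitting the residue count
+selects just the fragment itself and that occurrences may overlap.
+

```python
def fragment_zone(sequence, fragment, length=None, occurrence=1):
    """Return (first, last) 1-based positions for a fragment-based zone."""
    start = -1
    for _ in range(occurrence):
        # Search onwards from just after the previous hit.
        start = sequence.find(fragment, start + 1)
        if start < 0:
            raise ValueError("fragment occurrence not found")
    # Assumption: with no count given, the zone covers the fragment only.
    n = len(fragment) if length is None else length
    if start + n > len(sequence):
        raise ValueError("zone runs past the end of the sequence")
    return start + 1, start + n
```

+
+For a sequence \verb|AACARGGCARTT|, the equivalent of \verb|CAR,5/2|
+selects positions 8 to 12.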
+
+By default, \pf\ works in `Residue Number' mode, i.e.\ the numbers
+used in zone commands are the numbers seen in the PDB file. The
+alternative mode is `Sequential' mode where residues are numbered
+sequentially throughout the structure (including throughout multiple
+chains). Any chain names appearing in zone specifications will be
+ignored in Sequential mode. To switch mode, you use the {\tt NUMBER
+SEQUENTIAL} or {\tt NUMBER RESIDUE} commands.
+
+The \verb1DELZONE1 command specifies zones to be deleted from the user-defined
+list of fit zones. \verb1DELZONE1 uses the same syntax as the \verb1ZONE1
+command. The command matches the specified zone with a zone in the user-defined
+list of fitting zones and deletes the matching zone from the list. Entering
+either \verb1DELZONE ALL1 or \verb1DELZONE *1 will delete all user-defined
+zones.
+
+\subsection{Sequence Alignment}
+%------------------------------
+Another way of specifying zones is to let the program do it. \pf\
+allows you to perform a simple Needleman and
+Wunsch sequence alignment and to apply zones automatically derived
+from that sequence alignment. This is done by issuing the \verb1ALIGN1
+command. The sequence alignment is displayed, any currently active
+fitting zones are cleared and replaced by zones derived from the
+alignment. Additional zones may also be specified in the usual way.
+
+As of Version 3.0, \pf\ offers a choice of three alignment options:
+
+\begin{enumerate}
+\setlength{\parsep}{0pt}
+\setlength{\parskip}{0pt}
+\setlength{\itemsep}{0pt}
+
+\item The default alignment option is a chain-by-chain alignment where the
+first chain in the mobile is aligned with the first chain in the reference, the
+second chain in the mobile is aligned with the second chain in the reference,
+and so on. If the number of chains does not match then a warning is issued.
+
+\item The \verb1ALIGN1 \verb1WHOLE1 command gives a whole sequence alignment.
+The whole sequence (regardless of chain ID) is aligned. If the fitting zones
+assigned in this manner extend over more than one chain the zones are split
+into smaller zones at the breaks between chains.
+This may be useful if a sequence has been split into fragments.
+
+\item If a zone definition is supplied to the \verb1ALIGN1 command then \pf\
+will perform an alignment over the defined region to assign fitting zones.
+(See Section~\ref{sec:zone} for the syntax for defining zones.)
+
+It is also possible to append new zones onto the end of the zone list (rather
+than overwriting the current zone list) by adding \verb1APPEND1 after the zone
+definition.
+
+For example, one could use the following commands:
+\begin{verbatim}
+ ALIGN A*:B*
+ ALIGN B*:A* APPEND
+\end{verbatim}
+to align chain A with chain B and then B with A.
+This is useful when chains appear in different orders in the PDB files.
+
+When doing multiple fitting, it is not possible to use the colon notation
+to define regions on both the reference and mobile structures. This is
+the same restriction as the \verb1ZONE1 command
+(see Section~\ref{subsec:multi_zone}).
+\end{enumerate}
+
+Clearly, it will normally be necessary to use the \verb1ATOMS1 command
+to specify that only backbone or C$\alpha$ atoms are included in the
+fitting. The \verb1TRIMZONES1 command can also be used when doing multiple
+structure fitting to ensure that the fitting zones are identical for all
+mobile structures. (See Section~\ref{subsec:multi_zone}.)
+
+The \verb1GAPPEN1 command allows you to specify an integer gap penalty
+and gap extension penalty for the sequence alignment performed by the
+\verb1ALIGN1 command. The default values for the gap penalty and gap
+extension penalty are 10 and 2 respectively.
+
+\subsection{Reading an Alignment}
+%--------------------------------
+
+If you have an alignment performed outside \pf\ you may use this to
+specify the equivalent zones. Any previously defined fitting zones are
+automatically cleared first. As of \pf\ V3.0, the \verb1READALIGN1 command
+can be used with structures having more than one chain.
+
+The alignment should be a file in PIR format using \verb1-1 characters to
+align the sequences. The two sequences are represented by separate
+entries, i.e.\ each must have a header of the form:
+\begin{verbatim}
+ >P1;xxxxxx
+ title text .......
+\end{verbatim}
+
+When reading an alignment file for aligning a reference structure with a
+single mobile structure, the first sequence will be assumed to be that of
+the reference structure and the second is that of the mobile structure. Any
+other sequences in the file are ignored. Chain breaks in a sequence are
+indicated with a \verb1*1.
+\begin{verbatim}
+ >P1;REFSEQ
+ Reference Sequence - first.pdb
+ WILLIAM*H-ARTNELL-*
+
+ >P1;M_0001
+ Mobile Sequence - second.pdb
+ --PATRI-K*TR--GHTN*
+\end{verbatim}
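+
+To make the file layout concrete, the following Python sketch reads
+entries of the form shown above and splits each sequence into chains at
+the \verb1*1 characters. It is an illustration under stated assumptions
+rather than the parser used by \pf: the function name
+\verb|parse_pir| and the dictionary keys are hypothetical, and the line
+after each header is always treated as the title.
+

```python
def parse_pir(text):
    """Parse PIR-style entries into dicts with code, title and chains."""
    entries = []
    current = None
    expect_title = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            # A header of the form >P1;code starts a new entry.
            current = {"code": line.split(";", 1)[-1], "title": "", "sequence": ""}
            entries.append(current)
            expect_title = True
        elif current is None:
            continue  # ignore anything before the first header
        elif expect_title:
            current["title"] = line
            expect_title = False
        else:
            current["sequence"] += line
    for entry in entries:
        # '*' marks chain breaks and the end of the sequence.
        chains = entry["sequence"].split("*")
        if chains and chains[-1] == "":
            chains.pop()
        entry["chains"] = chains
    return entries
```

+
+Applied to the example above, the first entry yields the two chains
+\verb|WILLIAM| and \verb|H-ARTNELL-|.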
+
+The \verb1READALIGNMENT1 command is also used to read in PIR files
+containing a multiple sequence alignment.
+When performing a multiple structure fit, the first sequence
+\emph{must appear twice} in the sequence alignment file. This is
+because it is used as both the initial reference and first mobile set:
+\begin{verbatim}
+ >P1;REFSEQ
+ Reference Sequence - first.pdb
+ ----WILLIAM*H-ARTNELL-*
+
+ >P1;M_0001
+ Mobile Sequence - first.pdb
+ ----WILLIAM*H-ARTNELL-*
+
+ >P1;M_0002
+ Mobile Sequence - second.pdb
+ ------PATRI-K*TR--GHTN*
+
+ >P1;M_0003
+ Mobile Sequence - third.pdb
+ PERTWEE---------------*
+\end{verbatim}
+
+Note that a bug in using the \verb1READALIGNMENT1 command with multiple
+structure fitting was fixed in V2.3. (The bug caused the program
+to crash if a deletion appeared in the same place in two or
+more of the sequences.)
+
+\subsection{Limiting Zones Read From an Alignment}
+%-------------------------------------------------
+
+When obtaining fit zones from a sequence alignment, either from
+\verb1ALIGN1 or from \verb1READALIGNMENT1, it can be useful to limit
+the zones of residues used. Normally all aligned residue pairs will be
+used.
+
+For example, if the alignment were:
+\begin{verbatim}
+ 1 2 3
+ 123456789012345678901234567890123
+ ASAHSTGEHNM--PLELLGHISLAM---NPRTY
+ ---HSTADHNLRTPLEVLG--SLAMEDRQPRTY
+\end{verbatim}
+the zones would normally be taken from the following positions
+in the alignment: 4--11, 14--19, 22--25 and 29--33.
+
+By using the command:
+\begin{verbatim}
+ LIMIT 20 28
+\end{verbatim}
+only the zone from 22--25 would be included.
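+
+The example can be reproduced with a short Python sketch that collects
+runs of alignment positions where neither sequence has a gap. The
+function name \verb|zones_from_alignment| is hypothetical, positions are
+alignment columns as in the example, and treating the two limits as
+inclusive is an assumption.
+

```python
def zones_from_alignment(ref, mob, limit=None):
    """Return (start, end) runs of alignment columns with no gap in either row."""
    zones = []
    start = None
    length = min(len(ref), len(mob))
    for pos in range(1, length + 1):
        a, b = ref[pos - 1], mob[pos - 1]
        # Assumption: both limit values are inclusive.
        inside = limit is None or limit[0] <= pos <= limit[1]
        if a != "-" and b != "-" and inside:
            if start is None:
                start = pos
        elif start is not None:
            zones.append((start, pos - 1))
            start = None
    if start is not None:
        zones.append((start, length))
    return zones
```

+
+Called on the two sequences above, this gives the four zones listed, and
+with a limit of 20 to 28 it gives only the zone 22 to 25.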
+
+This is particularly useful in conjunction with the \verb1ITERATE1 command
+(Section~\ref{subsec:iterate}) and when fitting multiple structures
+(Section~\ref{sec:multi}).
+
+The \verb|LIMIT OFF| command restores the default behaviour of
+deriving the zones from the whole alignment.
+
+
+
+
+\subsection{Iterative Updating of the Fitting Zones}
+%---------------------------------------------------
+
+\label{subsec:iterate}
+
+The \verb1ITERATE1 command switches on the iterative updating of
+fitted zones during subsequent \verb1FIT1 commands. The \verb1ITERATE1
+command may be followed by an optional parameter to specify the cutoff
+used to include or exclude pairs from the zones. (\verb1ITERATE OFF1
+is used to switch it off again.)
+
+Note that this immediately does an \verb1ATOMS CA1 since iteration of
+zones is only performed on C$\alpha$ atoms. The program gives an
+informational message to this effect. See notes below if you want to
+calculate an RMSd over other atoms.
+
+After the initial fit on the specified zones, the zones are updated
+such that residue pairs with C$\alpha$ atoms within a specified cutoff
+(default 3.0\AA) are included and those more distant are excluded. The
+optimum set of equivalences is obtained using a dynamic programming
+method.
+
+After updating the zones, the structures are refitted and the
+procedure iterates to convergence of $<0.01$\AA\ (typically 3 or 4
+cycles). The RMSd on C$\alpha$ atoms is shown after each cycle unless
+the \verb1QUIET1 command is given before running \verb1ITERATE1.
+
+You may specify a minimal initial zone of say 3 amino acids on which
+to fit first. The zone iteration will expand the zones until as many
+residues as possible can be equivalenced. Alternatively, this option
+is particularly useful in conjunction with the \verb1ALIGN1
+command. Using \verb1ALIGN1 followed by \verb1ITERATE1 gives a
+particularly convenient method of fitting two arbitrary structures.
+
+As stated above, the \verb1ITERATE1 command implies
+\verb1ATOMS CA1. Having fitted on C$\alpha$ atoms, you can of course
+display the RMSd over other atom sets in the usual way using the
+\verb1RATOMS1 command (e.g.\ \verb1RATOMS N,CA,C,O1 will display the
+backbone RMSd).
+
+Should you wish to refit on another atom set using the iterated zones,
+simply use \verb1ITERATE OFF1 to switch off iteration, select the atom
+set required using the \verb1ATOMS1 command and use \verb1FIT1 to
+refit the structures in the usual way. For example, to fit on backbone
+atoms:
+
+\begin{verbatim}
+ ITERATE OFF
+ ATOMS N,CA,C,O
+ FIT
+\end{verbatim}
+
+
+\subsection{Fitting Zones Based on the Temperature Factor Column}
+%-----------------------------------------------------------------
+
+\label{sec:bzone}
+Note that this use of the B-value column is not compatible with the commands described in Section~\ref{sec:modfit}.
+
+It is possible to define zones by flagging residues in the temperature
+factor column of the PDB file using the \verb1BZONE1 command. Zones are marked
+using positive whole numbers while zeros are ignored. Multiple zones can be
+marked using additional numbers.
+So, residues with the B-factor set to 1 will be fitted with one another,
+residues with the B-factor set to 2 will be fitted with one another, etc.
+
+Assignment of zones is carried out in two ways:
+
+If only the reference
+structure is marked then the same set of residue numbers will be added as a
+fitting zone in both the reference and mobile structure.
+
+If both the reference and the mobile
+structure are marked then fitting zones are assigned by scanning through and
+setting zones for corresponding continuous stretches of flagged residues in
+either the reference or mobile structures.
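+
+For the simpler case where only the reference structure is marked, the
+scan for flagged residues can be sketched in Python as below. This is an
+illustration, not the \pf\ code: the function name \verb|bzone_runs| is
+hypothetical and the input is assumed to be one (residue number, flag)
+pair per residue in file order.
+

```python
def bzone_runs(flags):
    """Group consecutive residues sharing a non-zero flag into (label, first, last)."""
    runs = []
    current = None  # [label, first residue, last residue]
    for resnum, flag in flags:
        label = int(flag)
        if current and label == current[0] and label > 0:
            # Same non-zero label as the previous residue: extend the run.
            current[2] = resnum
        else:
            if current and current[0] > 0:
                runs.append(tuple(current))
            current = [label, resnum, resnum]
    if current and current[0] > 0:
        runs.append(tuple(current))
    return runs
```

+
+Each run would then be applied as a zone with the same residue numbers
+in the reference and mobile structures.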
+
+
+\subsection{Centre of Fitting}
+%-----------------------------
+
+\label{sec:setcentre}
+The default method for fitting is to centre the fit around the centre of
+geometry of the fit atoms. Alternatively, fitting can be centred around
+the centre of geometry of a residue
+specified by the \verb1SETCENTRE1 (or \verb1SETCENTER1) command.
+
+
+\begin{verbatim}
+ SETCENTRE CLEAR|(*|i[:j])
+\end{verbatim}
+where i and j are residue specifications of the form \emph{[chain][.]resnum[insert]}.
+Items in square brackets are optional and alternatives are marked by a
+\verb1|1 and grouped in parentheses.
+
+The command:
+\begin{verbatim}
+ SETCENTRE 24:35
+\end{verbatim}
+will centre the fit around residue 24 of the reference structure and residue 35 of the mobile structure. The mobile residue number can be omitted. For example:
+\begin{verbatim}
+ SETCENTRE 33
+\end{verbatim}
+will centre the fit around residue 33 of the reference structure and residue 33 of the mobile structure.
+
+Entering \verb1SETCENTRE CLEAR1 or \verb1SETCENTRE *1 will clear the centre
+residue.
+
+\subsection{Distance Cutoff for RMSd Calculations}
+%-------------------------------------------------
+
+\label{sec:distcutoff}
+The \verb1DISTCUTOFF1 command specifies a distance cutoff for ignoring atom
+pairs outside a specified distance when calculating RMSd.
+
+\begin{verbatim}
+ DISTCUTOFF [cutoff|ON|OFF]
+\end{verbatim}
+
+Entering \verb1DISTCUTOFF ON1 or \verb1DISTCUTOFF OFF1 will turn the
+distance cutoff on or off. Entering \verb1DISTCUTOFF 2.51 will set the
+value of the distance cutoff to 2.5\AA\ and turn the distance cutoff on.
+A warning is displayed if the distance cutoff is set to zero and turned
+on. Note that the cutoff is only applied to the final calculation of
+the RMSd and not to the fitting.
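+
+The effect of the cutoff on the reported value can be illustrated with
+a small Python sketch. The function name \verb|rmsd| and its arguments
+are hypothetical, the coordinates are assumed to be already fitted and
+paired, and treating a pair exactly at the cutoff as included is an
+assumption.
+

```python
import math


def rmsd(ref, mob, cutoff=None):
    """RMS deviation over paired (x, y, z) coordinates, optionally with a cutoff."""
    squares = []
    for (x1, y1, z1), (x2, y2, z2) in zip(ref, mob):
        d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        # Pairs further apart than the cutoff are left out of the sum.
        if cutoff is None or d2 <= cutoff ** 2:
            squares.append(d2)
    if not squares:
        raise ValueError("no atom pairs within the cutoff")
    return math.sqrt(sum(squares) / len(squares))
```

+
+A single distant pair can dominate the value without a cutoff, which is
+why the two results may differ substantially.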
+
+
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Multiple Structure Fitting}
+
+\label{sec:multi}
+
+The \verb1MULTI1 command allows a multiple set of structures to be
+read in for fitting. The filename specified for \verb1MULTI1 is a
+`file of files' i.e.\ it contains a list of filenames which will be
+read.
+
+\verb1MULTI1 is used in place of \verb1REFERENCE1 and \verb1MOBILE1 to
+read in a set of structure files. The first structure file is used as
+a reference set for the first fitting stage, but the coordinates are
+averaged after each fitting stage to derive an averaged template used
+for subsequent fitting.
+
+That is, given $N$ files to fit, file 2 is fitted to file 1 and an
+averaged structure, $A$, is calculated; file 3 is then fitted to $A$
+and a new average, $A'$, is calculated. This continues until all $N$
+structures have been fitted. The whole procedure iterates until
+convergence (typically 3 or 4 cycles).
+
+\pf\ V3.0 changes the default method of calculating the average template.
+As each new mobile structure is added, the degree of change in the averaged
+structure is inversely proportional to the total number of mobile structures.
+Consequently, outlying structures should have less effect on the averaged
+reference structure.
+
+Normally, the coordinates of the first structure in the \verb1MULTI1 list
+are taken as the starting point for the averaged reference structure. It is
+possible, however, to select another mobile structure as the initial reference
+structure using the \verb1SETREF1 command.
+For example, \verb1SETREF 31 will use the third mobile structure as the
+reference structure.
+If no structure number is
+specified, then the \verb1SETREF1 command carries out an all vs.\ all
+comparison and the coordinates
+of the mobile structure with the least overall RMSd to all the other
+mobile structures are selected as the initial reference structure.
+
+Multiple structures can be fitted with either the \verb1FIT1 or
+\verb1ORDERFIT1 command. The \verb1ORDERFIT1 command (new in V3.0) will
+perform the multiple structure fit in a similar manner to the \verb1FIT1
+command but fitting the most similar structures first. As the averaged
+template is updated with each new structure fitted, the order of fitting
+has a (small) influence on the template. The \verb1ORDERFIT1 command (possibly
+along with the \verb1SETREF1 command) can provide a standardized fitting scheme.
+
+
+Progress and RMSds are reported at each iteration unless the \verb1QUIET1
+command is used.
+
+By default, RMSDs, pairwise distances and transformation matrices are given in
+relation to the first mobile structure. The \verb1MULTREF1 command will set
+\pf\ to give results in relation to the averaged reference structure rather
+than the first mobile structure (\verb1MULTREF1 \verb1OFF1 restores the
+default behaviour).
+
+The resulting fitted files are written with the \verb1MWRITE1 command.
+Note that there is no ``reference'' set in the sense used for normal
+2-structure fitting; fitted versions of all $N$ files will be written
+since the reference set is actually an averaged template used purely as a
+guide for fitting.
+
+The averaged template can be written to a file using the \verb1WRITE1
+\verb1REF1 command. However, as it is a simple numerical average of the
+Cartesian coordinates, the reference structure generated by \pf\ should be
+treated with caution as a representation of an actual geometry or
+conformation accessible to the structure.
+
+When the \verb1MWRITE1 command is used, the output filenames are the
+same as the input files, but with the extension replaced by that
+specified in the \verb1MWRITE1 command. If no extension is specified,
+then `.fit' will be used. If the input structure files contained no
+extension, then the extension specified will be appended to the
+filenames.
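+
+The naming rule can be summarised in a few lines of Python. This is a
+sketch only: the function name \verb|mwrite_name| is hypothetical, and
+accepting the extension with or without its leading full stop is an
+assumption.
+

```python
import os


def mwrite_name(path, ext=".fit"):
    """Return the output filename for a fitted structure read from 'path'."""
    if not ext.startswith("."):
        ext = "." + ext  # assumption: the full stop may be omitted
    # splitext returns an empty extension if the input file has none,
    # so the new extension is then simply appended.
    root, _old = os.path.splitext(path)
    return root + ext
```

+
+Because only the extension changes, the output lands in the same
+directory as the input file.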
+
+Note that since only the extension is changed when writing back the
+fitted files, you must have permission to write to the directory from
+which the original files were read.
+
+Multiple-structure fitting is particularly effective in combination
+with the \verb1ITERATE1 command (see Section~\ref{subsec:iterate})
+which refines the fitting zones iteratively. This can lead to
+extremely good multiple structure fits.
+
+Note that multiple structure fitting and zone iteration can be very
+slow as these have been added to the earlier pair-wise fitting
+engine. An increase in speed needs a complete re-design of the code.
+
+
+\subsection{Specifying Zones With Multiple Structure Fitting}
+%------------------------------------------------------------
+
+\label{subsec:multi_zone}
+
+Currently, the \verb1ZONE1 command may only be used with multiple structure
+fitting when the same zone specification may be applied to every
+structure, i.e.\ you cannot specify a zone for each structure
+separating the zones with a colon (:).
+
+Thus, the following are legal zones:
+\begin{verbatim}
+ ZONE 20-30
+ ZONE C,3
+\end{verbatim}
+while the following are not:
+\begin{verbatim}
+ ZONE 24-34:25-35
+ ZONE CAR:VNS
+ ZONE 24-34:EIR,11
+\end{verbatim}
+
+For normal use, it is recommended that the \verb1ALIGN1, \verb1TRIMZONES1 and
+\verb1READALIGNMENT1 commands (possibly in conjunction with the
+\verb1LIMIT1 command) are used for specifying zones when fitting
+multiple structures.
+
+
+As of \pf\ V3.0, the \verb1TRIMZONES1 command can be used in conjunction with
+the \verb1ALIGN1 command. The \verb1ALIGN1 command performs a pairwise
+alignment for each of the mobile structures with the reference.
+Although fitting each mobile
+using an individualized set of zones offers the best fitting for each mobile
+to the reference, there may be times when a like vs. like comparison is
+required. If the number of residues used for fitting varies, RMS
+deviation cannot be directly compared between structures.
+
+To allow for a like vs. like comparison, the \verb1TRIMZONES1 command resets
+the fitting zones for each mobile structure to include only fitting residues
+that are common to all the mobile structures. Thus, by ensuring that the
+fitting zones are the same for each mobile, the \verb1TRIMZONES1 command
+allows for a like vs. like comparison.
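+
+The idea of keeping only residues common to all mobile structures can be
+sketched in Python as a set intersection. This is a simplification and
+not the \pf\ implementation: the function name \verb|trim_zones| is
+hypothetical, and each zone is represented only by its range of
+reference positions, whereas real zones pair reference and mobile
+residues.
+

```python
def trim_zones(zones_per_mobile):
    """Intersect per-mobile lists of (start, end) zones on reference positions."""
    common = None
    for zones in zones_per_mobile:
        covered = {r for start, end in zones for r in range(start, end + 1)}
        common = covered if common is None else common & covered
    trimmed = []
    for r in sorted(common or ()):
        if trimmed and trimmed[-1][1] == r - 1:
            # Extend the current run of consecutive positions.
            trimmed[-1] = (trimmed[-1][0], r)
        else:
            trimmed.append((r, r))
    return trimmed
```

+
+After trimming, every mobile structure is fitted over the same number of
+residues, so the RMS deviations become directly comparable.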
+
+When using \verb1READALIGNMENT1 with
+multiple structures, the first sequence \emph{must appear twice} in
+the alignment file. This is because it is used as both the first
+reference and mobile set.
+
+Note that a bug in using the \verb1READALIGNMENT1 command with multiple
+structure fitting was fixed in V2.3. (The bug caused the program
+to crash if a deletion appeared in the same place in two or
+more of the sequences.)
+
+\subsection{All Versus All Comparisons}
+%--------------------------------------
+
+\label{subsec:multi_allvall}
+
+As of \pf\ V3.0, it is possible to perform an all versus all comparison of
+the mobile structures when fitting multiple structures. The \verb1ALLVSALL1
+command requires that the fitting zones set are identical for all mobile
+structures and automatically resets the fitting zones using the
+\verb1TRIMZONES1 command.
+
+Results are presented as tab-delimited text suitable for loading into a
+spreadsheet. If the optional filename parameter is given, output is
+directed to the specified file. If a filename is not specified, or the file
+cannot be opened, output appears on the screen. If the filename begins with
+a pipe character ($\mid$), the results are piped into the specified program.
+This is particularly useful with the \verb1more1 (or \verb1less1) Unix
+command.
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Calculating the RMSd Over Other Zones and Atoms}
+Having fitted the structures using the \verb1ZONE1 and \verb1ATOMS1
+commands to specify which residues and atoms should be included in the
+fitting, the RMS deviation may then be calculated over a different
+region of the structure and/or a different atom set.
+
+This is achieved using the \verb1RZONE1 and \verb1RATOMS1
+commands. The syntax of these commands is identical to that of the
+\verb1ZONE1 and \verb1ATOMS1 commands described in
+Sections~\ref{sec:atoms} and~\ref{sec:zone}.
+
+As each \verb1RZONE1 or \verb1RATOMS1 command is given, the RMS
+deviation is reported over the new set of zones or over the new atom
+set. Don't forget the \verb1RZONE1 commands are cumulative, like the
+\verb1ZONE1 commands. Note that the \verb1RZONE *1 or \verb1RZONE1
+\verb1CLEAR1 behaves slightly differently from \verb1ZONE *1 or
+\verb1ZONE1 \verb1CLEAR1 since it resets the zones to be the same as
+those specified for the fitting using \verb1ZONE1, \verb1ALIGN1 or
+\verb1READALIGNMENT1 commands.
+
+The \verb1DELRZONE1 command specifies zones to be deleted from the list of
+user-defined zones for calculating the RMSd. The \verb1DELRZONE1 command uses
+the same syntax as the \verb1DELZONE1 command. The command matches the
+specified zone with a zone in the user-defined list of RMSd calculation zones
+and deletes the matching zone from the list. Unlike the \verb1RZONE1 command,
+entering either \verb1DELRZONE ALL1 or \verb1DELRZONE *1 will delete all
+user-defined RMSd calculation zones rather than returning to the default
+condition where the RMSd calculation zones are set to the user-defined fitting
+zones. This avoids the somewhat counterintuitive situation where deleting
+the last RMSd zone restores all RMSd zones. If no RMSd calculation zones are
+defined then \pf\ will calculate the RMSd over all residues.
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Obtaining Output}
+
+\subsection{The Fitted Structure}
+%--------------------------------
+The fitted mobile structure may be written to a file in PDB format
+using the \verb1WRITE1 command:
+\begin{verbatim}
+ WRITE fitted.pdb
+\end{verbatim}
+If the first character of the filename is a pipe character ($\mid$),
+then the results will be piped into the specified program. For
+example:
+\begin{verbatim}
+ WRITE |less
+\end{verbatim}
+will cause the coordinates to be displayed on the screen using the
+\verb1less1 pager program.
+
+The reference set may also be written:
+\begin{verbatim}
+ WRITE REFERENCE ref_fitted.pdb
+\end{verbatim}
+(only the three letters `REF' of the REFERENCE parameter are
+required). This is only useful if the \verb1CENTRE1 command has been used
+(see below).
+
+\subsection{Centering the Coordinates}
+%-------------------------------------
+By default, the mobile structure is moved to the coordinate frame of
+the reference set. If the \verb1CENTRE1 (or \verb1CENTER1) command is
+given then the centre of geometry of the fitted coordinates will be
+located at the origin.
+
+If a residue has been set as the centre of fitting using \verb1SETCENTRE1
+(see Section~\ref{sec:setcentre}) then that residue will be moved to the
+origin when the \verb1CENTRE1 command is used.
+
+If only two structures are fitted then the \verb1WRITE REFERENCE1
+command must be used to write the reference set in the origin-centred
+coordinate frame. If multiple structures are fitted and written using
+\verb1MWRITE1 then the reference set will be written automatically.
+
+
+
+
+
+\subsection{Details of the Fitting}
+%----------------------------------
+
+More details about the fitting may be obtained by using the
+\verb1MATRIX1 command. This displays the centres of geometry, the
+rotation matrix and the translation vector which is the vector between
+the centres of geometry. Thus to superimpose the mobile structure onto
+the reference structure using these data, you should translate the
+mobile set to the origin, apply the rotation matrix, translate back to
+the original centre of geometry and finally apply the translation
+vector.
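+
+The sequence of operations just described can be written out as a short
+Python sketch. The function name \verb|apply_fit| is hypothetical, and
+two conventions are assumed rather than taken from \pf: the rotation
+matrix multiplies a column vector, and the translation vector points
+from the mobile centre of geometry to the reference centre of geometry.
+

```python
def apply_fit(coords, mobile_centre, rotation, translation):
    """Apply centre, rotate, restore and translate steps to (x, y, z) points."""
    fitted = []
    for point in coords:
        # 1. Translate the mobile set to the origin.
        shifted = [point[k] - mobile_centre[k] for k in range(3)]
        # 2. Apply the rotation matrix (assumed convention: r = R v).
        rotated = [
            sum(rotation[i][k] * shifted[k] for k in range(3)) for i in range(3)
        ]
        # 3. Translate back to the original centre of geometry and
        # 4. apply the translation vector between the centres.
        fitted.append(
            tuple(rotated[i] + mobile_centre[i] + translation[i] for i in range(3))
        )
    return fitted
```

+
+If the result does not superimpose on the reference, the matrix
+convention assumed here is the first thing to check, for example by
+trying the transposed matrix.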
+
+Note that the rotation matrix is not orthogonal and cannot therefore
+be used to extract Euler angles. This is a result of the fitting
+method used.
+
+The \verb1NFITTED1 command displays the number of atom pairs which
+were fitted in the last fitting operation. Note that this will not
+be the number of residues fitted unless you are only fitting one
+atom type per residue (typically C$\alpha$ atoms).
+
+
+\subsection{By-residue RMS Deviation}
+%------------------------------------
+
+The \verb1RESIDUE1 command is used to obtain a by-residue RMS
+deviation on the currently specified RMS atoms in the currently
+specified RMS zone. If no \verb1RATOMS1 and \verb1RZONE1 commands have
+been used, the atoms and zones used for the fitting will be used.
+
+The \verb1RESIDUE1 command may be followed by an optional filename
+parameter in which case output is directed to the specified file. If
+the file cannot be opened or a filename is not specified, output
+appears on the screen. If the first character of the filename is a
+pipe character ($\mid$), then the results will be piped into the
+specified program. For example:
+\begin{verbatim}
+ RESIDUE |less
+\end{verbatim}
+will cause the results to be displayed on the screen using the
+\verb1less1 pager program.
+
+If the distance cutoff is set then residues fully outside the distance cutoff
+are flagged with \verb1**1 and residues partially outside the distance cutoff
+are flagged with \verb1*1 (see Section~\ref{sec:distcutoff}).
+
+
+The related command, \verb1PAIRDIST1, prints the pairwise distances between
+equivalent atom pairs in the reference and mobile structures. \verb1PAIRDIST1
+has the same syntax as \verb1RESIDUE1. If the distance cutoff is set then
+residues outside the distance cutoff are flagged with \verb1*1.
+
+\subsection{Outputting Fit Zones As An Alignment}
+%-----------------------------------------------
+
+As of \pf\ V3.0, it is possible to output the equivalenced regions found by
+iterative fitting as an alignment using the \verb1PRINTALIGN1 command. The
+default output is a (user-friendly) pairwise alignment with the reference and
+mobile sequences printed as pairs of 60-character wide lines. The optional
+\verb1FASTA1 and \verb1PIR1 parameters set the printout to (machine-friendly) FASTA or PIR formatting
+for the chain names and sequences.
+
+Alignments can be exported to a text file by giving the \verb1PRINTALIGN1 command a filename parameter. \pf\ V3.0 can also read PIR-formatted files for assigning zones.
+
+
+{\bfseries Note:} For a set of fit zones to be converted into an alignment,
+the fitting zones must occur sequentially along the protein chain.
+Additionally, the fit zones cannot overlap.
+In other words, to obtain a sequence alignment the fitting zones must be in
+sequence.
+
+
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Modifying the Fit}
+
+\label{sec:modfit}
+The commands described in this section make use of the temperature factor
+column as a `flag' and are therefore not compatible with the \verb1BZONE1
+command (Section~\ref{sec:bzone}).
+
+Normally, no weighting is applied during the fitting i.e.\ all atoms
+are weighted equally. The \verb1WEIGHT1 command causes the fitting to
+be weighted by the mean of the B-values in the equivalent
+atoms. Normally, you wouldn't use this with real B-values, but with
+some other weight parameter (e.g.\ SSAP scores).
+
+The \verb1BWEIGHT1 command weights the fitting by the inverse of the
+mean of the B-values in the equivalent atoms. This is useful for
+genuine weighting by B-values (i.e.\ atoms with higher B-values, which
+are more mobile, will be less heavily weighted).
+
+The \verb1NOWEIGHT1 command switches off weighting.
+
+Atoms can also be removed from consideration in the fitting and RMS
+deviation calculations using temperature factors as a cutoff. The
+\verb1BVALUE1 command allows you to specify a B-value cutoff and any
+atoms with B-values greater than this value will be \emph{ignored
+completely} in both the fitting and RMS deviation calculations. The
+B-value may not be higher than this value in either the reference set
+or the mobile set. For example, if you specify $10$, then atoms with
+B-values greater than $10$ will be ignored.
+
+A negative value for \verb1BVALUE1 reverses the test: any atoms with
+B-values less than the absolute value you specify will be
+ignored. For example, if you specify $-10$, then atoms with B-values
+less than $10$ will be ignored.
+
+The value may be followed by an optional \verb1REF1 or \verb1MOB1
+parameter which restricts checking of B-values to the specified
+structure.
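+
+As an illustrative sketch using only the syntax described above (the
+cutoff of $10$ is an arbitrary example value, not a recommendation):
+\begin{verbatim}
+   BVALUE 10
+   BVALUE -10
+   BVALUE 10 REF
+\end{verbatim}
+The first form ignores atoms whose B-value exceeds $10$ in either
+structure, the second ignores atoms with B-values below $10$, and the
+third applies the upper cutoff to the reference structure only.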
+
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Script Files}
+
+\label{sec:script}
+
+While it is possible to run a script from the Unix command line using a redirection operator ($<$) or pipe ($\mid$), there are occasions when this is problematic, such as when running \pf\ from within another application. In such cases a script file can instead be run with the \verb1-f1 command line flag.
+
+For example, a script file can be run using a command line flag:
+\begin{verbatim}
+ profit -f myscriptfile.txt -h reference.pdb mobile.pdb
+\end{verbatim}
+By using the redirection operator:
+\begin{verbatim}
+ profit -h reference.pdb mobile.pdb < myscriptfile.txt
+\end{verbatim}
+Or by piping input from another program:
+\begin{verbatim}
+ cat myscriptfile.txt | profit -h reference.pdb mobile.pdb
+\end{verbatim}
+All three options produce identical outputs.
+
+It is also possible to run a script from within \pf\ using the SCRIPT command:
+\begin{verbatim}
+ SCRIPT myscriptfile.txt
+\end{verbatim}
+
+When a script file is run, messages indicating the start and end of the script are sent to stdout, if quiet mode is off. A comment marker (\verb1#1) at the beginning of a line will echo the line to stdout, a useful method for annotating an output file when running non-interactively.
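+
+A minimal sketch of an annotated script is shown below. The file
+contents are hypothetical and use only commands listed in the Command
+Summary; the choice of C$\alpha$ atoms (\verb1CA1) and the output
+filename \verb1fitted.pdb1 are assumptions made for illustration:
+\begin{verbatim}
+# Fit on C-alpha atoms and report the RMS deviation
+ATOMS CA
+FIT
+RMS
+WRITE fitted.pdb
+\end{verbatim}
+With quiet mode off, the comment line is echoed to stdout ahead of the
+fitting results.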
+
+Finally, it is possible to run a script from within a script using the \verb1SCRIPT1 command. \pf\ tracks the number of open/nested scripts and will allow up to 1000 nested scripts to be open. The assumption is that if over a thousand scripts are open then \pf\ has been sent into an infinite loop (for instance by having a script call itself).
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Miscellaneous Commands}
+
+The \verb1RMS1 command may be used to reprint the RMS deviation over
+the currently defined set of RMS zones and RMS atoms.
+
+
+If you simply wish to calculate the RMSd between two or more structures
+without actually fitting them, define the fitting regions in the normal way
+and then type the \verb1NOFIT1 command instead of the \verb1FIT1 command.
+This sets up \pf\ to perform RMSd calculations without fitting the structures.
+The \verb1RMS1 command can then be used to print the RMS deviation.
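+
+As a sketch of this workflow (the \verb1CA1 atom selection is an
+illustrative assumption; any atom and zone specification could be used):
+\begin{verbatim}
+   ATOMS CA
+   NOFIT
+   RMS
+\end{verbatim}
+Here the structures are compared in the coordinate frames in which
+they were read, since no superposition is performed.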
+
+
+As of \pf\ V3.0 it is possible to match symmetrical atoms automatically in
+amino acid sidechains (e.g.\ CD1 - CD2 and CE1 - CE2 of tyrosine) using the
+\verb1SYMMATOMS1 command. \verb1SYMMATOMS1 matches the charged oxygens and
+nitrogens on arginine, aspartate and glutamate residues and the delta and
+epsilon carbons of phenylalanine and tyrosine residues. It is also possible
+to match the nitrogen and oxygen atoms of the amide
+sidechains of asparagine and glutamine residues and the prochiral methyl
+groups of valine and leucine. Typing \verb1SYMMATOMS1 will display the pairs
+of atoms currently matched by \pf. Typing \verb1SYMMATOMS ON1 or
+\verb1SYMMATOMS OFF1 will turn symmetrical atom matching on or off.
+Individual residue types, for example ASP, can be turned on or off by
+typing \verb1SYMMATOMS ASP ON1 or \verb1SYMMATOMS ASP OFF1, respectively.
+Alternatively, \verb1SYMMATOMS ALL ON1 will turn all atom pairs on. By
+default, the matching of symmetrical atoms is turned off.
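+
+For instance, the following sequence (a sketch assembled from the
+forms described above) enables matching, excludes aspartate, and then
+lists the atom pairs currently matched:
+\begin{verbatim}
+   SYMMATOMS ON
+   SYMMATOMS ASP OFF
+   SYMMATOMS
+\end{verbatim}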
+
+
+Any operating system command may be run from within \pf\ by
+preceding it with a \verb1$1. The string following the \verb1$1 is
+passed to the operating system exactly as given and is useful for
+obtaining directory listings, typing, editing or copying files.
+
+
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Command Summary}
+
+\begin{description}
+
+\item[\$ \emph{command}] Passes command to the operating system.
+
+\item[\# \emph{comment}] Echoes comment to stdout.
+
+\item[ALIGN [[WHOLE$\mid$*\mbox{]}$\mid$[\emph{zonespec} [APPEND\mbox{]}\mbox{]}\mbox{]}]
+Performs Needleman and Wunsch sequence alignment on the
+sequences of the two structures and derives zones from the equivalent
+regions in the alignment. For multiple structure fitting, \verb1ALIGN1
+performs a pairwise alignment of the reference sequence with each mobile
+sequence.
+
+ It will normally be necessary to use the ATOMS command to specify
+ that only backbone or C-alpha atoms are included in the fitting
+ calculations.
+
+\item[ALLVSALL [\emph{filename}\mbox{]}] Performs an all versus all
+comparison of the mobile structures when fitting multiple structures.
+Results are presented as tab-delimited text suitable for loading into
+a spreadsheet.
+
+If the optional filename parameter is given, output is directed to
+the specified file. If the file cannot be opened or a filename is
+not specified, output appears on the screen. If the filename begins
+with a pipe character ($\mid$), the results are piped into the
+specified program.
+
+\item[ATOMS \emph{atm[,atm]\ldots}] Specifies the atom subset to fit.
+
+\item[BVALUE \emph{cutoff} [ REF$\mid$MOB\mbox{]}] Specify a B-value
+cutoff. Any atoms with B-values greater than this value will be
+ignored completely. A negative cutoff specifies that atoms with
+B-values less than the absolute cutoff should be ignored. The optional
+\verb1REF1 or \verb1MOB1 parameter restricts B-value checking to
+the specified structure.
+
+\item[BWEIGHT] Weight the fitting by the inverse of the mean of the
+B-values in the equivalent atoms.
+
+\item[BZONE] Sets fitting zones based on markers in the temperature factor (B-value) column.
+
+\item[CENTER [ OFF \mbox{]}] See \verb1CENTRE1.
+
+\item[CENTRE [ OFF \mbox{]}] Cause the coordinates to be written (using
+the \verb1WRITE1 or \verb1MWRITE1 commands), with the centre of
+geometry located at the origin instead of in the same coordinate frame
+as the reference set.
+
+\item[DELRZONE \emph{zonespec}] Removes a zone specification from the list of
+zones considered in RMS deviation calculation. \verb1DELRZONE *1 or
+\verb1DELRZONE1 \verb1ALL1 deletes all RMS deviation calculation zones.
+
+\item[DELZONE \emph{zonespec}] Removes a zone specification from the list of
+zones considered in fitting. \verb1DELZONE *1 or \verb1DELZONE1 \verb1ALL1
+removes all zone specifications.
+
+\item[DISTCUTOFF [\emph{cutoff} $\mid$ ON $\mid$ OFF\mbox{]}]
+Specifies a distance cutoff for RMSd calculations.
+
+\item[FIT] Performs the actual fitting. Returns the RMS deviation over
+the atoms included in the fit.
+
+\item[GAPPEN \emph{val} [\emph{val}\mbox{]}] Specifies an integer gap
+penalty and a gap extension penalty for the sequence alignment performed
+by the \verb1ALIGN1 command. The default values for the gap penalty and gap
+extension penalty are 10 and 2 respectively.
+
+\item[HEADER [ON $\mid$ OFF\mbox{]}] Include PDB header and trailer records when writing structures. By default, only the coordinate section of a file is output when a structure is written.
+
+\item[HETATOMS] Read HETATM records with subsequent \verb1MOBILE1 and
+\verb1REFERENCE1 commands.
+
+\item[IGNOREMISSING] Ignore any atom mismatches and proceed with the
+fitting. Such atoms are listed as warnings.
+
+\item[ITERATE [ (\emph{limit} $\mid$ OFF) \mbox{]}] Switches on (or
+off) iterative updating of the zones for fitting. The \verb1ITERATE1
+command may be followed by an optional distance cutoff (default:
+3.0\AA) or by the keyword `OFF' to switch off iterative zone
+calculation.
+
+\item[LIMIT (\emph{pos1 pos2} $\mid$ OFF)] Limits the range in an
+alignment (from \verb1READALIGNMENT1) used to derive zones.
+\verb|LIMIT OFF| restores the default behaviour.
+
+\item[MATRIX] Displays the centres of geometry, rotation matrix and
+translation vector.
+
+\item[MOBILE [ XMAS \mbox{]} \emph{\bfseries filename}] Reads a mobile PDB
+structure. If compiled with XMAS support, then the XMAS keyword
+specifies that the input is in XMAS format.
+
+\item[MULTI \emph{filename}] Reads a file of files containing a list
+of structures for multiple fitting.
+
+\item[MULTREF [OFF\mbox{]}]
+ Sets RMSd calculations to be reported relative to the averaged reference
+ rather than the first mobile structure.
+
+
+\item[MWRITE [ \emph{ext}\mbox{]}] Write the results of multiple structure
+fitting. The structures are written back using the same filenames with
+which they were read, but with the extension changed to that
+specified. If no extension is given, then `.fit' is used. Note
+therefore, that you must have write permission to the directory from
+which the input files were read.
+
+\item[NFITTED] Reports the number of atom pairs fitted.
+
+\item[NOFIT] Sets the fitted flag in \pf, allowing the user to calculate
+the RMSD on a structure without fitting.
+
+\item[NOHETATOMS] Do not read HETATM records with subsequent
+\verb1MOBILE1 and \verb1REFERENCE1 commands.
+
+\item[NOIGNOREMISSING] Restore the default behaviour of issuing an
+error message for any atom mismatches and halting the fitting
+procedure.
+
+\item[NOWEIGHT] Normal, non-weighted fitting.
+
+\item[NUMBER (RESIDUE$\mid$SEQUENTIAL)] Specifies whether zones are
+based on residue numbers in the PDB file or on sequential numbering
+(running through all chains).
+
+\item[OCCRANK \emph{n}] Sets ProFit to read the \emph{n}th ranked highest
+occupancy atom position for alternative atom positions.
+
+For structure files containing partial occupancies, lower occupancy atoms
+can be read by setting the occupancy rank parameter to read alternative
+atom positions.
+
+By default, OCCRANK is set to 1 and reads the highest ranked atom position,
+a setting of 2 will read the second most occupied position and a setting of 3
+will read the third most occupied position, etc.
+
+\item[ORDERFIT]
+ Performs a fit of all mobile structures to the reference structure. The
+ most similar structures are fitted first.
+
+
+\item[PAIRDIST [ \emph{filename}\mbox{]}] Prints the pairwise distances between equivalent atom pairs.
+If the first character of the (optional) filename is a pipe character
+($\mid$), then the results will be piped into the specified
+program. For example:
+\begin{verbatim}
+ PAIRDIST |less
+\end{verbatim}
+will cause the results to be displayed on the screen using the
+\verb1less1 pager program.
+
+\item[PRINTALIGN [FASTA$\mid$PIR\mbox{]} [ \emph{filename}\mbox{]}]
+ Prints current fitting zones as a sequence alignment.
+ The default output is a (user-friendly) pairwise alignment with the
+ reference and mobile sequences printed as pairs of 60-character wide
+ lines.
+ The optional \verb1FASTA1 and \verb1PIR1 parameters set the printout
+ to FASTA or PIR formatting.
+ \pf\ can read PIR-formatted files using the \verb1READALIGN1 command.
+
+\item[QUIET [ OFF \mbox{]}] Switches on (or off) quiet mode. In quiet
+mode, warning messages are suppressed and progress of iterative zone
+updating and multiple structure fitting is not reported.
+
+\item[QUIT] Exits from the program.
+
+\item[RATOMS \emph{atm[,atm]\ldots}] Specifies atoms over which to
+calculate the RMS deviation. Fitting must already have been performed.
+
+\item[READALIGNMENT \emph{filename}] Reads an alignment in PIR
+sequence file format and sets zones based on that alignment. Note
+that when used with multiple structures, the first sequence \emph{must
+appear twice} in the alignment file. This is because it is used as
+both the first reference and mobile set.
+
+\item[REFERENCE [ XMAS \mbox{]} \emph{\bfseries filename}] Reads a
+reference PDB structure. If compiled with XMAS support, then the XMAS
+keyword specifies that the input is in XMAS format.
+
+\item[RESIDUE [ \emph{filename}\mbox{]}] Gives a by-residue RMS deviation.
+If the first character of the (optional) filename is a pipe character
+($\mid$), then the results will be piped into the specified
+program. For example:
+\begin{verbatim}
+ RESIDUE |less
+\end{verbatim}
+will cause the results to be displayed on the screen using the
+\verb1less1 pager program.
+
+\item[RMS] Recalculate the RMS deviation over the zones and atoms
+currently defined with \verb1RZONE1 and \verb1RATOMS1.
+
+\item[RZONE \emph{zonespec}] Adds a zone specification to the list of
+zones considered in RMS deviation calculation. \verb1RZONE *1 or
+\verb1RZONE1 \verb1CLEAR1 resets the zones for RMSD calculation to be
+the same as that specified with the \verb1ZONE1 command.
+
+\item[SCRIPT \emph{filename}] Executes a script file.
+
+\item[SETCENTER \emph{residue}] See \verb1SETCENTRE1.
+
+\item[SETCENTRE \emph{residue}] Specifies a single residue as the centre of fitting. Entering SETCENTRE CLEAR or SETCENTRE * will clear the centre residue.
+
+ \item[SETREF [\emph{n}\mbox{]}]
+ Sets the reference structure to the \emph{n}th mobile structure when fitting multiple structures.
+
+ If no structure number is given then the reference is automatically set
+ by performing an all versus all comparison of the mobile structures then
+ selecting the structure with the lowest overall RMSD to the other mobile
+ structures.
+
+\item[STATUS [ \emph{filename}\mbox{]}] Reports current program status.
+ If the optional filename parameter is given, output is directed to
+ the specified file. If the file cannot be opened or a filename is
+ not specified, output appears on the screen. If the filename begins
+ with a pipe character ($\mid$), the results are piped into the
+ specified program.
+
+\item[SYMMATOMS [[OFF$\mid$ON$\mid$ALL\mbox{]}$\mid$\emph{xxx} [OFF$\mid$ON\mbox{]}\mbox{]}]
+
+ where \emph{xxx} is a three-letter amino acid code.
+
+ Enables the auto-matching of symmetrical atoms (e.g.\ CD1 - CD2 and
+ CE1 - CE2 of tyrosine) in \pf.
+
+ \verb1SYMMATOMS1 matches charged oxygens and nitrogens on arginine,
+ aspartate and glutamate residues and the delta and epsilon carbons of
+ phenylalanine and tyrosine residues.
+
+ It is also possible to match the nitrogen and oxygen atoms of the amide
+ sidechains of asparagine and glutamine residues and the prochiral methyl
+ groups of valine and leucine.
+
+ Typing \verb1SYMMATOMS1 will display the pairs of atoms currently matched
+ by \pf. Typing \verb1SYMMATOMS1 \verb1ON1 or \verb1SYMMATOMS1 \verb1OFF1
+ will turn symmetrical atom matching on or off.
+
+ Individual residue types, for example ASP, can be turned on or off by
+ typing \verb1SYMMATOMS1 \verb1ASP1 \verb1ON1 or \verb1SYMMATOMS1 \verb1ASP1
+ \verb1OFF1, respectively. Alternatively, \verb1SYMMATOMS1 \verb1ALL1
+ \verb1ON1 will turn all atom pairs on.
+
+ By default, the matching of symmetrical atoms is turned off.
+
+
+\item[TRIMZONES]
+ This command is used primarily with fitting zones derived using \verb1ALIGN1.
+ With pairwise alignments, the lengths of the aligned regions may vary and
+ there may be gaps in the alignments from one structure to another. The
+ \verb1TRIMZONES1 command trims the ends of the aligned zones and adds gaps
+ allowing for a like versus like comparison by using fitting zones that are
+ common to all the structures.
+
+ \verb1TRIMZONES1 is automatically called by the \verb1ALLVSALL1 and
+ \verb1SETREF1 commands.
+ This command is only used with multiple structures.
+
+\item[WEIGHT] Weight the fitting by the mean of the B-values in the
+equivalent atoms.
+
+\item[WRITE [ REFerence \mbox{]} \emph{\bfseries filename}] Writes the fitted structure
+to a PDB file. If the first character of the filename is a pipe
+character ($\mid$), then the results will be piped into the specified
+program. For example:
+\begin{verbatim}
+ WRITE |less
+\end{verbatim}
+will cause the coordinates to be displayed on the screen using the
+\verb1less1 pager program.
+
+If the \verb1REFERENCE1 keyword is given (only the letters `REF' are
+required), then the reference set will be written. This is used in
+conjunction with the \verb1CENTRE1 command.
+
+\item[WTAVERAGE [ ON$\mid$OFF \mbox{]}]
+
+ Sets the weighting system for the averaged reference structure to the
+ default weighting system where the change in the coordinates of the
+ reference structure is inversely proportional to the number of mobile
+ structures. The weighted averaging scheme was introduced to lower the
+ effect that outlying structures have on the averaged reference.
+ (Default: ON)
+
+ The alternative weighting scheme sets the coordinates of the
+ reference structure to the average of the reference and the mobile
+ structures. This was the scheme used by ProFit prior to version 3.0.
+ (\verb1WTAVERAGE OFF1)
+
+\item[ZONE \emph{zonespec}] Adds a zone specification to the list of
+zones considered in fitting. \verb1ZONE *1 or \verb1ZONE1 \verb1CLEAR1
+removes all zone specifications.
+\end{description}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Copyright}
+
+Please note that the program is called \pf\ --- not PROFIT, Profit or
+profit; this attempts to avoid confusion with the threading program
+known as PROFIT. \pf\ was written first and released to the public
+around the same time.
+
+\pf\ is pronounced as it is written, i.e.\ `pro' (as in `protein') then
+`fit' (not `profit' as in `make lots of money')!
+
+\pf\ was initially written by Dr Andrew C.R. Martin while self-employed and
+trading as {\sffamily SciTech Software}.
+Enhancements have been made since at UCL and at the University of Reading.
+Addition of iteration and multiple fitting was sponsored by Inpharmatica, Ltd.
+Enhancements in V2.6 and V3.0 were made possible by a Tools and Resources
+grant from the BBSRC.
+
+
+
+\begin{quotation}
+This program is not in the public domain.
+
+It may not be copied or made available to third parties, but may be
+freely used by non-profit-making organisations and commercial
+companies who have obtained it directly from the author or by FTP or
+HTTP from the author's web sites.
+
+If you did not register the program via the web site, you are
+requested to send EMail to the author to say that you are using this
+code so that you may be informed of future updates.
+
+The code may not be made available on other FTP or Web sites without
+express permission from the author.
+
+The code may be modified as required, but any modifications must be
+documented so that the person responsible can be identified. If
+someone else breaks this code, the author doesn't want to be blamed
+for code that does not work! You may not distribute any modifications,
+but are encouraged to send them to the author so that they may be
+incorporated into future versions of the code.
+
+While the compiled \pf\ program may be used by commercial companies,
+it may not be sold commercially or included as part of a commercial
+product. The source code or any derivative works may not be sold
+commercially or used for commercial purposes outside of \pf\ without
+prior permission from the author.
+\end{quotation}
+
+\vspace{2em}
+
+While this software is provided ``as is'' and free of charge, I
+do appreciate hearing from people who use it and find it useful. An
+EMail or a postcard would be nice.
+
+If you find \pf\ useful, please tell your colleagues about it. Please
+\emph{do not} pass copies of \pf\ on to them directly; ask them to
+obtain it \emph{via} my World Wide Web page
+(\verb1http://www.bioinf.org.uk/software/profit/1).
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{How do I Reference \pf?}
+
+No paper has been published describing \pf\ itself since it is simply
+a convenient program (I hope) to let you use a standard fitting
+algorithm; consequently, it is a little difficult to reference. The
+exact wording is up to you and dependent on the context, but I suggest
+something similar to:
+
+\begin{quotation}
+Fitting was performed using the McLachlan algorithm (McLachlan, A.D.,
+1982 ``Rapid Comparison of Protein Structures'', Acta Cryst A38,
+871-873) as implemented in the program ProFit (Martin, A.C.R. and
+Porter, C.T.,
+\verb1http://www.bioinf.org.uk/software/profit/1)
+\end{quotation}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Acknowledgements}
+
+Inpharmatica Ltd. are acknowledged for funding development of V2.0 of ProFit.
+The BBSRC are acknowledged for funding development of V2.6 and V3.0 of ProFit.
+
+\end{document}
diff --git a/mdm78.mat b/mdm78.mat
new file mode 100644
index 0000000..d1a4ef7
--- /dev/null
+++ b/mdm78.mat
@@ -0,0 +1,26 @@
+ 2 -2 0 0 -2 0 0 1 -1 -1 -2 -1 -1 -4 1 1 1 -6 -3 0 0 0 0 0 0
+ -2 6 0 -1 -4 1 -1 -3 2 -2 -3 3 0 -4 0 0 -1 2 -4 -2 -1 0 0 0 0
+ 0 0 2 2 -4 1 1 0 2 -2 -3 1 -2 -4 -1 1 0 -4 -2 -2 2 1 0 0 0
+ 0 -1 2 4 -5 2 3 1 1 -2 -4 0 -3 -6 -1 0 0 -7 -4 -2 3 3 0 0 0
+ -2 -4 -4 -5 12 -5 -5 -3 -3 -2 -6 -5 -5 -4 -3 0 -2 -8 0 -2 -4 -5 0 0 0
+ 0 1 1 2 -5 4 2 -1 3 -2 -2 1 -1 -5 0 -1 -1 -5 -4 -2 1 3 0 0 0
+ 0 -1 1 3 -5 2 4 0 1 -2 -3 0 -2 -5 -1 0 0 -7 -4 -2 2 3 0 0 0
+ 1 -3 0 1 -3 -1 0 5 -2 -3 -4 -2 -3 -5 -1 1 0 -7 -5 -1 0 -1 0 0 0
+ -1 2 2 1 -3 3 1 -2 6 -2 -2 0 -2 -2 0 -1 -1 -3 0 -2 1 2 0 0 0
+ -1 -2 -2 -2 -2 -2 -2 -3 -2 5 2 -2 2 1 -2 -1 0 -5 -1 4 -2 -2 0 0 0
+ -2 -3 -3 -4 -6 -2 -3 -4 -2 2 6 -3 4 2 -3 -3 -2 -2 -1 2 -3 -3 0 0 0
+ -1 3 1 0 -5 1 0 -2 0 -2 -3 5 0 -5 -1 0 0 -3 -4 -2 1 0 0 0 0
+ -1 0 -2 -3 -5 -1 -2 -3 -2 2 4 0 6 0 -2 -2 -1 -4 -2 2 -2 -2 0 0 0
+ -4 -4 -4 -6 -4 -5 -5 -5 -2 1 2 -5 0 9 -5 -3 -3 0 7 -1 -5 -5 0 0 0
+ 1 0 -1 -1 -3 0 -1 -1 0 -2 -3 -1 -2 -5 6 1 0 -6 -5 -1 -1 0 0 0 0
+ 1 0 1 0 0 -1 0 1 -1 -1 -3 0 -2 -3 1 2 1 -2 -3 -1 0 0 0 0 0
+ 1 -1 0 0 -2 -1 0 0 -1 0 -2 0 -1 -3 0 1 3 -5 -3 0 0 -1 0 0 0
+ -6 2 -4 -7 -8 -5 -7 -7 -3 -5 -2 -3 -4 0 -6 -2 -5 17 0 -6 -5 -6 0 0 0
+ -3 -4 -2 -4 0 -4 -4 -5 0 -1 -1 -4 -2 7 -5 -3 -3 0 10 -2 -3 -4 0 0 0
+ 0 -2 -2 -2 -2 -2 -2 -1 -2 4 2 -2 2 -1 -1 -1 0 -6 -2 4 -2 -2 0 0 0
+ 0 -1 2 3 -4 1 2 0 1 -2 -3 1 -2 -5 -1 0 0 -5 -3 -2 2 2 0 0 0
+ 0 0 1 3 -5 3 3 -1 2 -2 -3 0 -2 -5 0 0 -1 -6 -4 -2 2 3 0 0 0
+ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
+ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
+ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
+ A R N D C Q E G H I L K M F P S T W Y V B Z X - ?
diff --git a/src/Makefile b/src/Makefile
new file mode 100644
index 0000000..53f1f2d
--- /dev/null
+++ b/src/Makefile
@@ -0,0 +1,62 @@
+# You are advised to change this to provide warning messages for your
+# compiler - the sample line is for GCC
+CC = cc -O3
+#CC = cc -O3 -ansi -Wall -pedantic
+
+# To allow use of the GNU Readline library, uncomment the following two
+# lines - You may need to install the GNU readline development libraries
+# first!
+#READLINE = -DREADLINE_SUPPORT
+#READLINELIB = -lreadline -lcurses
+
+# Uncomment if you want to use the rotate and refit code to avoid the
+# local minimum problem. If compiled with gcc -O3 this problem seems to
+# go away in any case and only affects fitting of identical structures.
+# A fix will soon be provided in fit.c instead
+#ROTATEREFIT = -DROTATE_REFIT
+
+# To allow decompression of gzipped PDB files on the fly, change the
+# COPT line to
+# COPT = -DGUNZIP_SUPPORT
+COPT =
+
+# Uncomment these and fix the paths if you have the XMAS library and
+# want XMAS support
+# XMAS = -DUSE_XMAS -I/home/amartin/inpharmatica/cvs/software/include
+# XMASLIB = -L/home/amartin/inpharmatica/cvs/software/lib -lxmas -lacrm
+
+LINK1 =
+LINK2 =
+ANSI = ansi -p
+
+OFILES = main.o todo.o fitting.o NWAlign.o
+PFILES = main.p todo.p fitting.p NWAlign.p
+IFILES = ProFit.h protos.h
+LFILES = bioplib/WholePDB.o bioplib/WritePDB.o bioplib/parse.o \
+bioplib/PDB2Seq.o bioplib/WindIO.o bioplib/help.o \
+bioplib/fsscanf.o bioplib/ApMatPDB.o bioplib/align.o \
+bioplib/fit.o bioplib/throne.o bioplib/angle.o bioplib/array2.o \
+bioplib/ParseRes.o bioplib/OpenFile.o bioplib/padterm.o \
+bioplib/ReadPIR.o bioplib/openorpipe.o bioplib/GetWord.o \
+bioplib/SelAtPDB.o bioplib/IndexPDB.o bioplib/CopyPDB.o \
+bioplib/FindZonePDB.o bioplib/countchar.o bioplib/KillLeadSpaces.o \
+bioplib/MatMult3_33.o bioplib/StringToUpper.o bioplib/upstrncmp.o \
+bioplib/LegalAtomSpec.o bioplib/GetPDBChainLabels.o bioplib/DupePDB.o \
+bioplib/TranslatePDB.o bioplib/AtomNameMatch.o bioplib/chindex.o \
+bioplib/FindNextResidue.o bioplib/StoreString.o bioplib/FreeStringList.o \
+bioplib/MatMult33_33.o bioplib/ReadPDB.o bioplib/aalist.o \
+bioplib/CreateRotMat.o
+
+profit : $(OFILES) $(LFILES)
+ $(CC) -o profit $(OFILES) $(LFILES) $(XMAS) -lm $(LINK2) $(XMASLIB) $(READLINELIB)
+
+.c.o : $(IFILES)
+ $(CC) $(COPT) $(READLINE) $(ROTATEREFIT) -o $@ -c $<
+
+clean :
+ /bin/rm $(OFILES) $(LFILES)
+
+protos : $(PFILES)
+
+.c.p :
+ $(ANSI) $< $@
diff --git a/src/Makefile_dos b/src/Makefile_dos
new file mode 100644
index 0000000..011520e
--- /dev/null
+++ b/src/Makefile_dos
@@ -0,0 +1,68 @@
+# Ensure that the mingw bin dir from your installation of Qt and mingw
+# is in your path. For example:
+# PATH = %PATH%;c:\Qt\2009.01\mingw\bin
+# Then run make by doing:
+# mingw32-make
+#
+CC = gcc -O3 -g -Wall -fmessage-length=0
+#
+# Uncomment these and fix the paths if you have the XMAS library and
+# want XMAS support
+# XMAS = -DUSE_XMAS -I/home/amartin/inpharmatica/cvs/software/include
+# XMASLIB = -L/home/amartin/inpharmatica/cvs/software/lib -lxmas -lacrm
+
+#
+# To allow use of the GNU Readline library, uncomment the following two
+# lines.
+#READLINE = -DREADLINE_SUPPORT
+#READLINELIB = -lreadline
+
+# Uncomment if you want to use the rotate and refit code to avoid the
+# local minimum problem. If compiled with gcc -O3 this problem seems to
+# go away in any case and only affects fitting of identical structures.
+# A fix will soon be provided in fit.c instead
+# ROTATEREFIT = -DROTATE_REFIT
+
+#
+# To allow decompression of gzipped PDB files on the fly, change the
+# COPT line to
+#COPT = -DGUNZIP_SUPPORT -g
+# COPT =
+#
+LINK1 =
+LINK2 =
+ANSI = ansi -p
+
+OFILES = main.o todo.o fitting.o NWAlign.o
+PFILES = main.p todo.p fitting.p NWAlign.p
+IFILES = ProFit.h protos.h
+LFILES = bioplib/WholePDB.o bioplib/WritePDB.o bioplib/parse.o \
+bioplib/PDB2Seq.o bioplib/WindIO.o bioplib/help.o \
+bioplib/fsscanf.o bioplib/ApMatPDB.o bioplib/align.o \
+bioplib/fit.o bioplib/throne.o bioplib/angle.o bioplib/array2.o \
+bioplib/ParseRes.o bioplib/OpenFile.o bioplib/padterm.o \
+bioplib/ReadPIR.o bioplib/openorpipe.o bioplib/GetWord.o \
+bioplib/SelAtPDB.o bioplib/IndexPDB.o bioplib/CopyPDB.o \
+bioplib/FindZonePDB.o bioplib/countchar.o bioplib/KillLeadSpaces.o \
+bioplib/MatMult3_33.o bioplib/StringToUpper.o bioplib/upstrncmp.o \
+bioplib/LegalAtomSpec.o bioplib/GetPDBChainLabels.o bioplib/DupePDB.o \
+bioplib/TranslatePDB.o bioplib/AtomNameMatch.o bioplib/chindex.o \
+bioplib/FindNextResidue.o bioplib/StoreString.o bioplib/FreeStringList.o \
+bioplib/MatMult33_33.o bioplib/ReadPDB.o bioplib/CreateRotMat.o \
+bioplib/aalist.o
+
+profit.exe : $(OFILES) $(LFILES)
+ $(CC) -g -o profit $(OFILES) $(LFILES) $(XMAS) -lm $(LINK2) $(XMASLIB) $(READLINELIB)
+
+.c.o : $(IFILES)
+ $(CC) $(COPT) $(READLINE) $(ROTATEREFIT) -o $@ -c $<
+
+clean :
+ /bin/rm $(OFILES) $(LFILES)
+
+protos : $(PFILES)
+
+.c.p :
+ $(ANSI) $< $@
+
+
diff --git a/src/NWAlign.c b/src/NWAlign.c
new file mode 100644
index 0000000..af547ef
--- /dev/null
+++ b/src/NWAlign.c
@@ -0,0 +1,2344 @@
+/*************************************************************************
+
+ Program: ProFit
+ File: NWAlign.c
+
+ Version: V3.1
+ Date: 31.03.09
+ Function: Protein Fitting program.
+
+ Copyright: SciTech Software 1992-2009
+ Author: Dr. Andrew C. R. Martin
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain.
+
+ It may not be copied or made available to third parties, but may be
+ freely used by non-profit-making organisations who have obtained it
+ directly from the author or by FTP.
+
+ You are requested to send EMail to the author to say that you are
+ using this code so that you may be informed of future updates.
+
+ The code may not be made available on other FTP sites without express
+ permission from the author.
+
+ The code may be modified as required, but any modifications must be
+ documented so that the person responsible can be identified. If
+ someone else breaks this code, the author doesn't want to be blamed
+ for code that does not work! You may not distribute any
+ modifications, but are encouraged to send them to the author so
+ that they may be incorporated into future versions of the code.
+
+ Such modifications become the property of Dr. Andrew C.R. Martin and
+ SciTech Software though their origin will be acknowledged.
+
+ The code may not be sold commercially or used for commercial purposes
+ without prior permission from the author.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V0.1 25.09.92 Original
+ V0.5 08.10.93 Various tidying for Unix & changed for booklib
+ V0.6 05.01.94 Modified MDMFILE for Unix getenv()
+ V0.7 24.11.94 The DATAENV environment variable is now handled by code
+ in bioplib/align.c/ReadMDM; Checks the return from this
+ Fixed bug in multi-zone align
+ V0.8 17.07.95 Replaced screen() stuff with printf()
+ Only allowed on single chains
+ V1.0 18.07.95 Insert codes now work.
+ First official release (at last!).
+ V1.1 20.07.95 Skipped
+ V1.2 22.07.95 Added GAPPEN command making gap penalty global variable
+ V1.3 31.07.95 Skipped
+ V1.4 14.08.95 Skipped
+ V1.5 21.08.95 Fixed bug in mapping alignment to zones. Also bug in
+ Bioplib align() routine
+ V1.5b 15.11.95 Now also prints a score normalised by the length of the
+ shorter sequence.
+ V1.6 20.11.95 Added ReadAlignment() code
+ V1.6b 22.11.95 Added check in SetNWZones() for a deletion in both
+ sequences
+ V1.6c 13.12.95 The check added in 1.6b wasn't working. Fixed!
+ V1.6g 18.06.96 Changed MODE_* to ZONE_MODE_*
+ V1.7 23.07.96 Skipped
+ V1.7g 06.05.98 Rewrite of SetNWZones()
+ V1.8 07.05.98 Skipped for release
+ V2.0 01.03.01 Additions for iterative zone updating
+ V2.1 28.03.01 Parameter for ITERATE and added CENTRE command
+ V2.2 20.12.01 Skipped for release
+ V2.3 01.12.04 Fixed bugs in removing double deletions from alignment
+ with multiple structures
+ V2.4 03.06.05 Skipped for release
+ V2.5 07.06.05 Skipped for release
+ V2.6 23.04.08 Added VerifySequence(). By: CTP
+ V3.0 06.11.08 Added multi-chain alignment and ability to print fitted
+ zones as an alignment.
+ V3.0 25.11.08 Changed output format for printing fitted zones as
+ alignment.
+ V3.0 13.01.09 Included output of fitting zones as PIR alignment.
+ V3.1 31.03.09 Skipped for release.
+
+*************************************************************************/
+/* Includes
+*/
+#include "ProFit.h"
+
+/************************************************************************/
+/*>NWAlign(int strucnum)
+ ---------------------
+ 28.09.92 Framework
+ 09.10.92 Original
+ 05.01.94 Modified to get data directory name from environment variable
+ under Unix
+ 17.07.95 Replaced screen() with printf()
+ Check only one chain in each structure.
+ 22.07.95 Made gap penalty a variable
+ 15.11.95 Also prints a score normalised by the length of the shorter
+ sequence.
+ 01.02.01 Added strucnum parameter
+ 16.07.08 Changed alignment function from align() to affinealign()
+ to allow for inclusion of gap extension penalty, gGapPenExt.
+ By: CTP
+*/
+void NWAlign(int strucnum)
+{
+ static int FirstCall = TRUE;
+ int ref_len,
+ mob_len,
+ align_len,
+ offset,
+ score,
+ i, j,
+ ai, aj;
+ char *ref_align = NULL,
+ *mob_align = NULL;
+
+ printf(" Performing N&W alignment...\n");
+
+ if(FirstCall)
+ {
+ if(!ReadMDM(MDMFILE))
+ {
+ printf(" Error==> Unable to read mutation data matrix\n");
+ return;
+ }
+
+ FirstCall = FALSE;
+ }
+
+ /* Make checks that structures read */
+ if(gRefSeq==NULL || gMobSeq[strucnum]==NULL)
+ {
+ printf(" Error==> Structures have not been read!\n");
+ return;
+ }
+
+ /* Check for numbers of chains */
+ if((countchar(gRefSeq,'*') > 0) ||
+ (countchar(gMobSeq[strucnum],'*') > 0))
+ {
+ printf(" Error==> Structures must have only one chain for \
+alignment\n");
+ return;
+ }
+
+ /* Find sequence lengths */
+ ref_len = strlen(gRefSeq);
+ mob_len = strlen(gMobSeq[strucnum]);
+
+ /* Allocate memory for alignment sequences */
+ if((ref_align = (char *)malloc((ref_len+mob_len)*sizeof(char)))==NULL)
+ {
+ printf(" Warning==> No memory for alignment!\n");
+ return;
+ }
+ if((mob_align = (char *)malloc((ref_len+mob_len)*sizeof(char)))==NULL)
+ {
+ printf(" Warning==> No memory for alignment!\n");
+ free(ref_align);
+ return;
+ }
+
+ /* Perform the alignment */
+ /* Alignment function modified 16.07.08 */
+ /*
+ score = align(gRefSeq, ref_len, gMobSeq[strucnum], mob_len, FALSE,
+ FALSE, gGapPen, ref_align, mob_align, &align_len);
+ */
+
+ score = affinealign(gRefSeq, ref_len, gMobSeq[strucnum], mob_len,
+ FALSE, FALSE, gGapPen, gGapPenExt,
+ ref_align, mob_align, &align_len);
+
+
+ if(!score)
+ {
+ printf(" Error==> Unable to perform alignment!\n");
+ free(ref_align);
+ free(mob_align);
+ return;
+ }
+
+ /* Display the fitted sequences */
+ offset = 0;
+ printf(" ");
+ for(i=0,ai=0,aj=0; ai<align_len; ai++) /* Prints ref sequence */
+ {
+ char buffer[8];
+
+ if(++i>60) /* If printed 60 chars, print equiv section of mob seq*/
+ {
+ i=1;
+ printf("\n ");
+ for(j=offset; j<60+offset; j++)
+ {
+ sprintf(buffer,"%c",mob_align[j]);
+ printf("%s",buffer);
+ }
+ printf("\n\n ");
+ offset += 60;
+ }
+ printf("%c",ref_align[ai]);
+ }
+ printf("\n ");
+
+ for(j=offset; j<align_len; j++) /* Print remains of mob seq */
+ {
+ printf("%c",mob_align[j]);
+ }
+ printf("\n\n ");
+
+ printf("Score: %d Normalised score: %.2f\n",
+ score,
+ (REAL)score/(REAL)(MIN(ref_len,mob_len)));
+
+ /* Clear any current fitting zones */
+ SetFitZone("CLEAR", strucnum);
+
+ /* Now set zones based on alignment */
+ SetNWZones(ref_align, mob_align, align_len, NULL, NULL, strucnum);
+
+ /* Free allocated memory */
+ free(ref_align);
+ free(mob_align);
+
+ return;
+}
+
+
+
+/************************************************************************/
+/*>void ReadAlignment(char *alnfile)
+ ---------------------------------
+ Read the first two sequences out of an alignment file in PIR format
+ and set up zones based on the alignment.
+
+ 20.11.95 Original By: ACRM
+ 01.02.01 Modified to cope with multiple structures.
+ 01.12.04 Fixed problem with multiple structures - didn't actually
+ handle removing double deletions properly since the
+ sequences were modified in-place
+ 23.04.08 added check of read sequences against stored
+ sequences. By: CTP
+ 03.12.08 Adapted to read multi-chain sequences.
+ 13.01.09 Tidied code.
+
+*/
+void ReadAlignment(char *alnfile)
+{
+ FILE *fp;
+ char *seqa[MAXCHAIN],
+ *seqb[MAXCHAIN],
+ *tseqa = NULL,
+ *ref_string = NULL,
+ *mob_string = NULL;
+ BOOL punct, error;
+ int i,
+ nchain,
+ strucnum = 0,
+ chainlength = 0;
+
+ /* Open the PIR alignment file for reading */
+ if((fp=fopen(alnfile,"r"))==NULL)
+ {
+ printf(" Error==> Unable to read alignment file (%s)\n", alnfile);
+ return;
+ }
+
+ /* Read the first sequence from the file */
+ nchain = ReadPIR(fp, TRUE, seqa, MAXCHAIN, NULL, &punct, &error);
+
+ /* No sequence found */
+ if(nchain == 0)
+ {
+ printf(" Error==> No sequence read from alignment file.\n");
+ fclose(fp);
+ return;
+ }
+
+ /* Make reference string including whole sequence */
+ for(i=0;i<nchain;i++) chainlength += strlen(seqa[i]);
+ ref_string = (char *)malloc((nchain + chainlength)*sizeof(char));
+ strcpy(ref_string,seqa[0]);
+ free(seqa[0]);
+ for(i=1;i<nchain;i++)
+ {
+ strcat(ref_string,"*");
+ strcat(ref_string,seqa[i]);
+ free(seqa[i]);
+ }
+
+ /* Verify file sequence against Reference sequence */
+ if(!VerifySequence(ref_string, gRefSeq))
+ {
+ printf(" Error==> Reference sequence doesn't match.\n");
+ free(ref_string);
+ fclose(fp);
+ return;
+ }
+
+ /* Eliminate chain breaks */
+ ChainBreakToGap(ref_string);
+
+
+ /* 01.12.04 Allocate a buffer to store a working copy */
+ if((tseqa=(char *)malloc((1+strlen(ref_string))*sizeof(char)))==NULL)
+ {
+ printf(" Error==> No memory for working copy of sequence.\n");
+ free(ref_string);
+ fclose(fp);
+ return;
+ }
+
+ /* Read the second sequence from the file */
+ while((nchain = ReadPIR(fp, TRUE, seqb, MAXCHAIN, NULL,
+ &punct, &error)))
+ {
+ /* Check sequence found */
+ if(nchain == 0)
+ {
+ printf(" Error==> No sequence read from alignment file.\n");
+ fclose(fp);
+ return;
+ }
+
+
+ /* Make mobile string including whole sequence */
+ chainlength = 0;
+ for(i=0;i<nchain;i++) chainlength += strlen(seqb[i]);
+ mob_string = (char *)malloc((nchain + chainlength)*sizeof(char));
+ strcpy(mob_string,seqb[0]);
+ free(seqb[0]);
+ for(i=1;i<nchain;i++)
+ {
+ strcat(mob_string,"*");
+ strcat(mob_string,seqb[i]);
+ free(seqb[i]);
+ }
+
+ /* 23.04.08 Verify file sequence against Mobile sequence */
+ /*if(!VerifySequence(seqb[0], gMobSeq[strucnum]))*/
+ if(!VerifySequence(mob_string, gMobSeq[strucnum]))
+ {
+ printf(" Error==> Mobile sequence: %d doesn't match.\n",
+ strucnum+1);
+ free(mob_string);
+ free(tseqa);
+ free(ref_string);
+ fclose(fp);
+ return;
+ }
+
+ /* Eliminate chain breaks */
+ ChainBreakToGap(mob_string);
+
+ /* 01.12.04 Make working copy of sequence A */
+ /*strcpy(tseqa,seqa[0]);*/
+ strcpy(tseqa,ref_string);
+
+ /* Clear any current fitting zones */
+ SetFitZone("CLEAR", strucnum);
+
+ /* Remove any deletions which appear in both sequences */
+ /* 01.12.04 Changed to tseqa rather than seqa[0] */
+ /*if(!RemoveDoubleDeletions(tseqa, seqb[0]))*/
+ if(!RemoveDoubleDeletions(tseqa, mob_string))
+ {
+ printf(" Warning==> No memory to remove double deletions.\n");
+ printf(" Will try to remove them as we go...\n");
+ }
+
+
+ /* Now set zones based on alignment */
+ /* 01.12.04 Changed to tseqa rather than seqa[0] */
+ /*
+ SetNWZones(tseqa, seqb[0], MIN(strlen(tseqa), strlen(seqb[0])),
+ NULL, NULL, strucnum);
+ */
+
+ SetNWZones(tseqa, mob_string, MIN(strlen(tseqa),
+ strlen(mob_string)),
+ NULL, NULL, strucnum);
+
+ /*free(seqb[0]);*/
+ free(mob_string);
+
+ if(++strucnum > gMultiCount)
+ {
+ printf(" Warning==> Alignment file contains more sequences than there\n");
+ printf(" are structures.\n");
+ break;
+ }
+ }
+
+ if(strucnum < gMultiCount)
+ {
+ printf(" Warning==> Insufficient sequences in alignment file.\n");
+ printf(" Fitting may fail!\n");
+ }
+
+ /* Free allocated memory and close file */
+ free(tseqa);
+ /*free(seqa[0]);*/
+ free(ref_string);
+ fclose(fp);
+
+ /* Convert to sequential numbering with breaks between chains */
+ ConvertAllZones(ZONE_MODE_RESNUM);
+ ConvertAllZones(ZONE_MODE_SEQUENTIAL);
+}
+
+
+/************************************************************************/
+/*>BOOL RemoveDoubleDeletions(char *seqa, char *seqb)
+ --------------------------------------------------
+ Remove deletions which appear in both sequences when reading an
+ alignment file. This often occurs when the two sequences have come
+ from part of a multiple alignment.
+
+ 13.12.95 Original By: ACRM
+ 26.11.04 Fixed to allocate both arrays to longer length
+*/
+BOOL RemoveDoubleDeletions(char *seqa, char *seqb)
+{
+ char *copya = NULL,
+ *copyb = NULL;
+ int i, j,
+ lena,
+ lenb,
+ maxlen;
+
+ lena = strlen(seqa);
+ lenb = strlen(seqb);
+ maxlen = MAX(lena, lenb);
+
+ /* Create temporary storage for the sequences */
+ copya = (char *)malloc((maxlen+1) * sizeof(char));
+ copyb = (char *)malloc((maxlen+1) * sizeof(char));
+ if(copya==NULL || copyb==NULL)
+ {
+ if(copya!=NULL) free(copya);
+ if(copyb!=NULL) free(copyb);
+ return(FALSE);
+ }
+
+ /* Copy in the sequences skipping any double deletions */
+ for(i=0, j=0; i<MAX(lena, lenb); i++)
+ {
+ if((seqa[i] != '-') || (seqb[i] != '-'))
+ {
+ if(i<lena)
+ copya[j] = seqa[i];
+ if(i<lenb)
+ copyb[j] = seqb[i];
+ j++;
+ }
+ }
+ copya[j] = copyb[j] = '\0';
+
+ /* Copy back into the original strings */
+ strcpy(seqa, copya);
+ strcpy(seqb, copyb);
+
+ /* Free up the temporary storage */
+ free(copya);
+ free(copyb);
+
+ return(TRUE);
+}
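RemoveDoubleDeletions() above walks both sequences up to the longer of the two lengths. The following is a minimal sketch of the same idea that works in place and stops at the shorter length, so it never reads past either terminator. The name strip_double_gaps() is hypothetical and not part of ProFit, and truncating both strings to the common length is an assumption of this sketch rather than the behaviour of the original routine.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of ProFit: removes alignment columns in
   which BOTH sequences hold a gap ('-'). Works in place and visits only
   the columns present in both strings. Returns the new common length.
*/
size_t strip_double_gaps(char *seqa, char *seqb)
{
   size_t lena, lenb, len, i, j = 0;

   assert(seqa != NULL && seqb != NULL);
   lena = strlen(seqa);
   lenb = strlen(seqb);
   len  = (lena < lenb) ? lena : lenb;

   for(i=0; i<len; i++)
   {
      /* Keep the column unless it is a gap in both sequences */
      if(seqa[i] != '-' || seqb[i] != '-')
      {
         seqa[j] = seqa[i];   /* j <= i, so writes never overtake reads */
         seqb[j] = seqb[i];
         j++;
      }
   }
   seqa[j] = seqb[j] = '\0';

   return j;
}
```

Because the write index never exceeds the read index, no temporary buffers are needed and the routine cannot fail for lack of memory.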
+
+/************************************************************************/
+/*>SetNWZones(char *ref_align, char *mob_align, int align_len,
+ PDB **RefIndex, PDB **MobIndex, int strucnum)
+ -----------------------------------------------------------
+ Searches through the N&W sequence alignment and creates fitting zones
+ from the equivalent regions.
+
+ 09.10.92 Original
+ 24.11.94 Fixed bug causing it to lose first zone in multi-zone match
+ 17.07.95 Replaced screen() with printf()
+ 18.07.95 Added initialisation of inserts in zones
+ 21.08.95 Fixed bug in additional non-existent zone being added when
+ last zone not at end of chain
+ 22.11.95 Added check on deletion in both sequences
+ 13.12.95 Wasn't doing this check when stepping through a block of
+ deletions. Fixed.
+ 18.06.96 Changed MODE_* to ZONE_MODE_*
+
+ 06.05.98 Completely rewritten! New version is 27% shorter, MUCH
+ simpler and fixes a bug which occurred when a zone had only
+ one residue.
+ 15.01.01 Simplified even further by making each residue an individual
+ zone. This is not as elegant, but makes the implementation
+ of distance checking much easier. If RefIndex and MobIndex
+ are NULL, it behaves as before. If not then the distance
+ between an atom pair is checked before adding the residue
+ pair to the zone. Finally calls MergeZones() to merge
+ adjacent zones.
+ 01.02.01 Added strucnum parameter
+ 20.02.01 Added check on gLimit[]
+*/
+void SetNWZones(char *ref_align, char *mob_align, int align_len,
+ PDB **RefIndex, PDB **MobIndex, int strucnum)
+{
+ int i,
+ start,
+ stop,
+ ref_resnum = 0,
+ mob_resnum = 0;
+ ZONE *z;
+
+ if(gZoneList[strucnum])
+ {
+ FREELIST(gZoneList[strucnum], ZONE);
+ gZoneList[strucnum] = NULL;
+ }
+
+ /* Get the start and stop of the region we are going to look at from
+ gLimit[] if it has been specified. Otherwise just use the whole
+ alignment length
+ */
+ start = ((gLimit[0] < 1)||(gLimit[1] < 1)) ?
+ 0 : (gLimit[0] - 1);
+ stop = ((gLimit[0] < 1)||(gLimit[1] < 1)) ?
+ align_len : (gLimit[1]);
+
+ if(start > align_len)
+ start = align_len-1;
+ if(stop > align_len)
+ stop = align_len;
+
+ for(i=0; i<start; i++)
+ {
+ /* Find offsets for first zone */
+ if(ref_align[i] != '-') ref_resnum++;
+ if(mob_align[i] != '-') mob_resnum++;
+ }
+
+ for(i=start; i<stop; i++)
+ {
+ /* Find the residue number in each structure */
+ if(ref_align[i] != '-') ref_resnum++;
+ if(mob_align[i] != '-') mob_resnum++;
+
+ if((ref_align[i] != '-') && (mob_align[i] != '-'))
+ {
+ if(((RefIndex==NULL) && (MobIndex==NULL)) ||
+ (DISTSQ(RefIndex[ref_resnum-1], MobIndex[mob_resnum-1]) <=
+ gMaxEquivDistSq))
+ {
+ /* Allocate and store the zone */
+ if(gZoneList[strucnum])
+ {
+ /* Move to end of zone list */
+ z=gZoneList[strucnum];
+ LAST(z);
+ ALLOCNEXT(z,ZONE);
+ }
+ else
+ {
+ INIT(gZoneList[strucnum],ZONE);
+ z = gZoneList[strucnum];
+ }
+ if(z==NULL)
+ {
+ printf(" Error==> No memory for N&W fitting zones!\n");
+ return;
+ }
+
+ z->chain1 = ' ';
+ z->start1 = ref_resnum;
+ z->startinsert1 = ' ';
+ z->stop1 = ref_resnum;
+ z->stopinsert1 = ' ';
+ z->chain2 = ' ';
+ z->start2 = mob_resnum;
+ z->startinsert2 = ' ';
+ z->stop2 = mob_resnum;
+ z->stopinsert2 = ' ';
+ z->mode = ZONE_MODE_SEQUENTIAL;
+ }
+ }
+ }
+
+ MergeZones(strucnum);
+
+ /* Set fitting flags */
+ gFitted = FALSE;
+ gUserFitZone = TRUE;
+}
+
+
+/************************************************************************/
+/*>void MergeZones(int strucnum)
+ -----------------------------
+ Merges zones describing sequentially numbered adjacent amino acids
+
+ 15.01.01 Original By: ACRM
+ 01.02.01 Added strucnum parameter
+*/
+void MergeZones(int strucnum)
+{
+ ZONE *z = NULL,
+ *zn = NULL;
+ BOOL converged = TRUE;
+
+ if(gZoneList[strucnum])
+ {
+ do
+ {
+ /* Assume we have converged */
+ converged = TRUE;
+ for(z=gZoneList[strucnum]; z!=NULL; NEXT(z))
+ {
+ zn = z->next;
+ if(zn)
+ {
+ /* If both zones are in sequential mode */
+ if((z->mode == ZONE_MODE_SEQUENTIAL) &&
+ (zn->mode == ZONE_MODE_SEQUENTIAL))
+ {
+ /* See if the two zones are sequential */
+ if((zn->start1 == (z->stop1 + 1)) &&
+ (zn->start2 == (z->stop2 + 1)))
+ {
+ z->stop1 = zn->stop1;
+ z->stop2 = zn->stop2;
+ z->next = zn->next;
+ free(zn);
+ converged = FALSE;
+ }
+ }
+ }
+ }
+ } while(!converged);
+ }
+}
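MergeZones() above repeats its scan until no further merge happens. As an illustration only, the same result can be reached in a single pass if the current node is compared with its new neighbour again after each merge. The SPAN type and the names new_span() and merge_spans() are assumptions made for this sketch; they stand in for ProFit's ZONE list and its macros, and the zone mode check is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for ProFit's ZONE; the field names mirror the
   original but the type itself is an assumption of this sketch.
*/
typedef struct span
{
   int start1, stop1, start2, stop2;
   struct span *next;
} SPAN;

/* Create a single-residue span pairing ref residue r with mob residue m */
SPAN *new_span(int r, int m, SPAN *next)
{
   SPAN *s = malloc(sizeof *s);
   assert(s != NULL);   /* sketch only: real code should report failure */
   s->start1 = s->stop1 = r;
   s->start2 = s->stop2 = m;
   s->next   = next;
   return s;
}

/* Merge neighbouring spans that are contiguous in both structures.
   After a merge the same node is tested against its new neighbour,
   so one pass over the list is enough.
*/
void merge_spans(SPAN *head)
{
   SPAN *z = head;

   while(z != NULL && z->next != NULL)
   {
      SPAN *zn = z->next;

      if((zn->start1 == z->stop1 + 1) && (zn->start2 == z->stop2 + 1))
      {
         z->stop1 = zn->stop1;
         z->stop2 = zn->stop2;
         z->next  = zn->next;
         free(zn);
      }
      else
      {
         z = zn;
      }
   }
}
```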
+
+
+
+/************************************************************************/
+/*>BOOL VerifySequence(char *seqa, char *seqb)
+ -------------------------------------------
+ Compare sequence A to sequence B ignoring deletions in sequence.
+
+ 23.04.08 Original By: CTP
+*/
+BOOL VerifySequence(char *seqa, char *seqb)
+{
+ int i=0;
+ int j=0;
+
+ /* Return if no sequences */
+ if(!seqa || !seqb)
+ return(FALSE);
+
+ /* Verify */
+ for(i=0;i<strlen(seqa);i++)
+ {
+ /* Skip '-' in Sequence A */
+ if(seqa[i] == '-')
+ continue;
+
+ /* Skip '-' in Sequence B */
+ while(j<strlen(seqb) && seqb[j] == '-')
+ j++;
+
+ /* Return if hit end of Sequence B */
+ if(j==strlen(seqb))
+ return(FALSE);
+
+ /* DEBUG */
+ /*
+ printf("%c -- %c \n",seqa[i], seqb[j]);
+ */
+
+ /* Compare Residues */
+ if(seqa[i] != seqb[j])
+ return(FALSE);
+
+ j++;
+ }
+
+ /* All tests completed */
+ return(TRUE);
+}
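VerifySequence() above calls strlen() on every loop iteration. A pointer-based sketch of the same comparison avoids that. The name same_residues() is hypothetical. As in the original, any residues left over in the second sequence are not checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of ProFit: returns 1 if the residues of
   a, taken in order with '-' skipped, match the leading residues of b
   (also with '-' skipped), and 0 otherwise. Trailing residues in b are
   deliberately ignored, as in VerifySequence().
*/
int same_residues(const char *a, const char *b)
{
   if(a == NULL || b == NULL)
      return 0;

   for(; *a != '\0'; a++)
   {
      if(*a == '-')               /* Skip gaps in sequence A */
         continue;

      while(*b == '-')            /* Skip gaps in sequence B */
         b++;

      if(*b == '\0' || *a != *b)  /* B exhausted, or residues differ */
         return 0;

      b++;
   }

   return 1;
}
```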
+
+
+/************************************************************************/
+/*>BOOL BuildAlignment(char **seqs, ZONE *zones, char **alns,
+ char **flagseq)
+ ----------------------------------------------------------
+ Inputs: char **seqs Sequences to align
+ ZONE *zones Linked list of zones
+ Outputs: char **alns Aligned sequences
+ char **flagseq Seq of flags to indicate aligned residues
+ Returns: BOOL OK?
+
+ 21.08.06 Original By: ACRM
+ 17.06.08 Incorporated into ProFit with minor modification to ZONE
+ datatype By: CTP
+*/
+BOOL BuildAlignment(char **seqs, ZONE *zones, char **alns, char **flagseq)
+{
+ int i;
+ int len[2];
+ int idx[2], diff;
+ AA *aa[2];
+ ZONE *z;
+
+ /* Build the sequences as linked lists */
+ if((aa[0] = BuildAAList(seqs[0]))==NULL)
+ return(FALSE);
+ if((aa[1] = BuildAAList(seqs[1]))==NULL)
+ return(FALSE);
+
+ /* Cycle through the zones */
+ for(z=zones; z!=NULL; NEXT(z))
+ {
+ /* Find the position in the linked list of the start of the 2
+ equivalent zones
+ */
+ if((idx[0] = FindAAListOffsetByResnum(aa[0], z->start1))==(-1))
+ return(FALSE);
+ if((idx[1] = FindAAListOffsetByResnum(aa[1], z->start2))==(-1))
+ return(FALSE);
+
+ /* Determine the difference in the positions and make the appropriate
+ number of inserts in the right sequence
+ */
+ if(idx[0] > idx[1])
+ {
+ diff = idx[0] - idx[1];
+ if((aa[1] =
+ InsertResiduesInAAListAt(aa[1], '-', diff, idx[1]-1))==NULL)
+ return(FALSE);
+ }
+ else
+ {
+ diff = idx[1] - idx[0];
+ if((aa[0] =
+ InsertResiduesInAAListAt(aa[0], '-', diff, idx[0]-1))==NULL)
+ return(FALSE);
+ }
+
+ /* Flag the aligned residues */
+ for(i=z->start1; i<=z->stop1; i++)
+ {
+ SetAAListFlagByResnum(aa[0], i);
+ }
+ for(i=z->start2; i<=z->stop2; i++)
+ {
+ SetAAListFlagByResnum(aa[1], i);
+ }
+ }
+
+ /* Add inserts at the end of the shorter sequence */
+ len[0] = GetAAListLen(aa[0]);
+ len[1] = GetAAListLen(aa[1]);
+ if(len[0] > len[1])
+ {
+ diff = len[0] - len[1];
+ if((aa[1] =
+ InsertResiduesInAAListAt(aa[1], '-', diff, len[1]))==NULL)
+ return(FALSE);
+ }
+ else
+ {
+ diff = len[1] - len[0];
+ if((aa[0] =
+ InsertResiduesInAAListAt(aa[0], '-', diff, len[0]))==NULL)
+ return(FALSE);
+ }
+
+ /* Convert the linked lists back into sequences */
+ if((alns[0] = BuildSeqFromAAList(aa[0]))==NULL)
+ return(FALSE);
+ if((alns[1] = BuildSeqFromAAList(aa[1]))==NULL)
+ return(FALSE);
+
+ /* Build a sequence version of the flags to show aligned residues */
+ if((*flagseq = BuildFlagSeqFromAAList(aa[0], '*'))==NULL)
+ return(FALSE);
+
+ /* Free the linked lists */
+ FREELIST(aa[0], AA);
+ FREELIST(aa[1], AA);
+
+ return(TRUE);
+}
+
+
+/************************************************************************/
+/*>BOOL BuildAlignmentAB(char **seqs_in, ZONE *zones, char **alns,
+ char **flagseq, int *res)
+ ----------------------------------------------------------
+ Inputs: char **seqs_in Full sequences to align
+ ZONE *zones Linked list of zones
+ int *res Residue ranges to align: res[0]..res[1] in the
+ reference, res[2]..res[3] in the mobile
+ Outputs: char **alns Aligned sequences
+ char **flagseq Seq of flags to indicate aligned residues
+ Returns: BOOL OK?
+
+ 21.08.06 Original By: ACRM
+ 17.06.08 Incorporated into ProFit with minor modification to ZONE
+ datatype By: CTP
+ 23.06.08 Modified to build alignment from residue A to residue B.
+ 25.11.08 Changed output format - unaligned residues are matched with
+ gaps.
+*/
+BOOL BuildAlignmentAB(char **seqs_in, ZONE *zones, char **alns,
+ char **flagseq, int *res)
+{
+ int i;
+ int len[2];
+ int idx[2], diff;
+ int insert_idx = 0;
+ int insert_len[2];
+ int zonelength = 0;
+ AA *aa[2];
+ ZONE *z;
+
+ char *seqs[2];
+
+ /* Make truncated sequences By: CTP */
+ seqs[0] = seqs[1] = NULL;
+ seqs[0] = malloc((strlen(seqs_in[0])+1) * sizeof(char));
+ seqs[1] = malloc((strlen(seqs_in[1])+1) * sizeof(char));
+
+ TruncateSeq(seqs[0], seqs_in[0], res[0], res[1]);
+ TruncateSeq(seqs[1], seqs_in[1], res[2], res[3]);
+
+ /* Build the sequences as linked lists */
+ if((aa[0] = BuildAAList(seqs[0]))==NULL)
+ return(FALSE);
+ if((aa[1] = BuildAAList(seqs[1]))==NULL)
+ return(FALSE);
+
+ /* Run through the zones */
+ for(z=zones; z!=NULL; NEXT(z))
+ {
+ /* Check in region By: CTP */
+ if((z->start1 < res[0]) || (z->stop1 > res[1]) ||
+ (z->start2 < res[2]) || (z->stop2 > res[3]))
+ {
+ continue;
+ }
+
+ /* Find the position in the linked list of the start of the 2
+ equivalent zones
+ */
+ if((idx[0] =
+ FindAAListOffsetByResnum(aa[0], z->start1 - res[0] + 1))==(-1))
+ return(FALSE);
+ if((idx[1] =
+ FindAAListOffsetByResnum(aa[1], z->start2 - res[2] + 1))==(-1))
+ return(FALSE);
+
+ /* Determine the difference in the positions and make the appropriate
+ number of inserts in the right sequences
+ */
+
+ /* Check Length of Zone */
+ if(z->stop1 - z->start1 != z->stop2 - z->start2)
+ return(FALSE);
+ zonelength = z->stop1 - z->start1;
+
+ /* Insert into reference seq */
+ insert_len[0] = idx[1] - insert_idx - 1;
+ if((aa[0] = InsertResiduesInAAListAt(aa[0], '-', insert_len[0],
+ idx[0]-1))==NULL)
+ return(FALSE);
+
+ /* Insert into mobile seq */
+ insert_len[1] = idx[0] - insert_idx - 1;
+ if((aa[1] = InsertResiduesInAAListAt(aa[1], '-', insert_len[1],
+ insert_idx))==NULL)
+ return(FALSE);
+
+ /* Update insert start point */
+ insert_idx += insert_len[0] + insert_len[1] + zonelength + 1;
+
+ /* Flag the aligned residues */
+ for(i=z->start1 - res[0] + 1; i<=z->stop1 - res[0] + 1; i++)
+ {
+ SetAAListFlagByResnum(aa[0], i);
+ }
+
+ for(i=z->start2 - res[2] + 1; i<=z->stop2 - res[2] + 1; i++)
+ {
+ SetAAListFlagByResnum(aa[1], i);
+ }
+ }
+
+ /* Deal with end of chains */
+ len[0] = GetAAListLen(aa[0]);
+ len[1] = GetAAListLen(aa[1]);
+
+ /* Find length to end of ref and mobile chains */
+ insert_len[0] = len[1] - insert_idx;
+ insert_len[1] = len[0] - insert_idx;
+
+ if((aa[0] = InsertResiduesInAAListAt(aa[0], '-', insert_len[0],
+ len[0]))==NULL)
+ return(FALSE);
+ if((aa[1] = InsertResiduesInAAListAt(aa[1], '-', insert_len[1],
+ insert_idx))==NULL)
+ return(FALSE);
+
+ /* Add inserts at the end of the shorter sequence */
+ len[0] = GetAAListLen(aa[0]);
+ len[1] = GetAAListLen(aa[1]);
+ if(len[0] > len[1])
+ {
+ diff = len[0] - len[1];
+ if((aa[1] =
+ InsertResiduesInAAListAt(aa[1], '-', diff, len[1]))==NULL)
+ return(FALSE);
+ }
+ else
+ {
+ diff = len[1] - len[0];
+ if((aa[0] =
+ InsertResiduesInAAListAt(aa[0], '-', diff, len[0]))==NULL)
+ return(FALSE);
+ }
+
+ /* Convert the linked lists back into sequences */
+ if((alns[0] = BuildSeqFromAAList(aa[0]))==NULL)
+ return(FALSE);
+ if((alns[1] = BuildSeqFromAAList(aa[1]))==NULL)
+ return(FALSE);
+
+ /* Build a sequence version of the flags to show aligned residues */
+ if((*flagseq = BuildFlagSeqFromAAList(aa[0], '*'))==NULL)
+ return(FALSE);
+
+ /* Free the linked lists */
+ FREELIST(aa[0], AA);
+ FREELIST(aa[1], AA);
+ free(seqs[0]);
+ free(seqs[1]);
+
+ return(TRUE);
+}
+
+
+/************************************************************************/
+/*>void PrintSequence(FILE *fp, char *sequence)
+ ---------------------------------------------
+ Prints sequence.
+
+ 21.08.06 Original By: ACRM
+ 14.01.09 Added output to file. By: CTP
+*/
+void PrintSequence(FILE *fp, char *sequence)
+{
+ int width = 60;
+ int i = 0;
+ int j = 0;
+ char line[MAXBUFF];
+
+ if(strlen(sequence) > 0)
+ {
+ for(i=0, j=0; i<strlen(sequence); i++)
+ {
+ line[j++] = sequence[i];
+
+ if(j == width || i == strlen(sequence)-1)
+ {
+ line[j] = '\0';
+ fprintf(fp," %s\n",line);
+ j=0;
+ }
+ }
+ }
+ else
+ {
+ fprintf(fp," Undefined\n");
+ }
+
+ return;
+}
+
+
+/************************************************************************/
+/*>void PrintNiceAlignment(FILE *fp, char *ref_align, char *mob_align)
+ -------------------------------------------------------------------
+ Prints a pairwise alignment in user-friendly format.
+
+ 23.07.08 Original By: CTP
+ 14.01.09 Added output to file.
+*/
+void PrintNiceAlignment(FILE *fp,char *ref_align, char *mob_align)
+{
+ int i,j;
+ char refline[61];
+ char mobline[61];
+ int width = 60;
+
+ int alignlength = strlen(ref_align);
+
+ /* Deal with zero length sequences. */
+ if(!alignlength)
+ {
+ fprintf(fp," Undefined\n\n");
+ return;
+ }
+
+ /* Print sequence */
+ for(i=0, j=0; i<alignlength; i++)
+ {
+ refline[j] = ref_align[i];
+ mobline[j] = mob_align[i];
+ j++;
+
+ if(j == width || i == alignlength-1)
+ {
+ refline[j] = '\0';
+ mobline[j] = '\0';
+
+ fprintf(fp," %s\n", refline);
+ fprintf(fp," %s\n\n",mobline);
+ j=0;
+ }
+ }
+ fprintf(fp,"\n");
+ return;
+}
+
+
+/************************************************************************/
+/*>int KillChainBreak(char *outstring, char *instring)
+ ---------------------------------------------------
+ Removes chainbreak characters '*' from sequence instring.
+
+ 23.07.08 Original. By: CTP
+*/
+int KillChainBreak(char *outstring, char *instring)
+{
+ int x, y;
+
+ for(x=0, y=0; x<strlen(instring); x++)
+ {
+ if(instring[x] != '*')
+ {
+ outstring[y++] = instring[x];
+ }
+ }
+ outstring[y] = '\0';
+
+ return(0);
+}
+
+
+/************************************************************************/
+/*>int ChainBreakToGap(char *instring)
+ -----------------------------------
+ Converts chainbreak characters '*' to gaps '-' in sequence instring,
+ modifying the string in place.
+
+ 02.12.08 Original By: CTP
+*/
+int ChainBreakToGap(char *instring)
+{
+ int x;
+
+ for(x=0; x<strlen(instring); x++)
+ {
+ if(instring[x] == '*')
+ instring[x] = '-';
+ }
+ return(0);
+}
+
+
+/************************************************************************/
+/*>int TruncateSeq(char *outstring, char *instring, int start, int stop)
+ ---------------------------------------------------------------------
+ Writes section of sequence between positions start and stop in
+ instring to outstring.
+
+ 23.07.08 Original. By: CTP
+*/
+int TruncateSeq(char *outstring, char *instring, int start, int stop)
+{
+ int x, y;
+
+ for(x=start-1, y=0; x < stop; x++)
+ {
+ outstring[y++] = instring[x];
+ }
+ outstring[y] = '\0';
+
+ return(0);
+}
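TruncateSeq() above trusts start and stop to lie within instring. The following sketch shows one way the range could be clamped to the input before copying. The name truncate_seq_checked() and its return value are assumptions made for illustration and are not part of ProFit.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bounds-checked variant of TruncateSeq(): copies residues
   start..stop (1-based, inclusive) from in to out, clamping the range to
   the input. out must have room for strlen(in)+1 characters. Returns the
   number of characters copied.
*/
size_t truncate_seq_checked(char *out, const char *in, int start, int stop)
{
   size_t len, first, last, n = 0;

   assert(out != NULL && in != NULL);
   len = strlen(in);

   if(start < 1)
      start = 1;

   if((stop > 0) && ((size_t)start <= len) && (start <= stop))
   {
      first = (size_t)start - 1;                          /* inclusive */
      last  = ((size_t)stop < len) ? (size_t)stop : len;  /* exclusive */
      n     = last - first;
      memcpy(out, in + first, n);
   }
   out[n] = '\0';

   return n;
}
```

An empty or inverted range yields an empty string instead of undefined behaviour.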
+
+
+/************************************************************************/
+/*>int AlignmentFromZones(char *filename, BOOL fasta)
+ --------------------------------------------------
+ Generates an alignment based on ProFit fitting Zones.
+
+ The default output is a (user-friendly) pairwise alignment with the
+ reference and mobile sequences printed as pairs of 60-character wide
+ lines.
+
+ The fasta flag sets the printout to (machine-friendly) FASTA formatting
+ for the chain names and sequences.
+
+ 23.07.08 Original based on program by ACRM By: CTP
+ 10.09.08 Ensured sequentially numbered fitting zones had breaks
+ between chains.
+ 14.01.09 Added output to file.
+*/
+int AlignmentFromZones(char *filename, BOOL fasta)
+{
+ char *seqs[2], *alns[2], ids[2][MAXBUFF];
+ char *flagseq = NULL;
+ ZONE *chainlist[2];
+ ZONE *ref, *mob, *z;
+ int i=0;
+ FILE *fp = stdout;
+
+
+ /* Set Pointers to NULL */
+ chainlist[0] = chainlist[1] = NULL;
+ alns[0] = alns[1] = NULL;
+ seqs[0] = seqs[1] = NULL;
+
+ /* Convert to sequential numbering with breaks between chains */
+ if(ConvertAllZones(ZONE_MODE_RESNUM) ||
+ ConvertAllZones(ZONE_MODE_SEQUENTIAL))
+ {
+ printf(" Error: Could not find zones.\n");
+ return(1);
+ }
+
+ /* Sort Zones */
+ SortAllZones();
+
+ /* Set chainlist and sequence for reference */
+ strcpy(ids[0],gRefFilename);
+ chainlist[0] = ChainList(gRefPDB);
+ seqs[0] = malloc((strlen(gRefSeq)+1) * sizeof(char));
+ KillChainBreak(seqs[0],gRefSeq);
+
+ /* Open output file/pipe */
+ if(filename)
+ {
+ if((fp=OpenOrPipe(filename))==NULL)
+ {
+ printf(" Warning: Unable to open output file\n");
+ fp = stdout;
+ }
+ }
+
+ for(i=0; i<MAXSTRUC; i++)
+ {
+ if(!gZoneList[i]) break;
+
+ fprintf(fp,"\n Mobile Structure: %2d\n",i+1);
+
+ if(!OneToOneChains(i))
+ {
+ fprintf(fp,"\n Warning: Chain aligned to more than one \
+chain\n\n");
+ }
+
+ if(!SequentialZones(i))
+ {
+ fprintf(fp,"\n");
+ fprintf(fp," Error: Could not convert zones to alignment.\n");
+ fprintf(fp," Zones must not overlap and must be in\n");
+ fprintf(fp," sequence along chain.\n\n");
+ continue;
+ }
+
+ /* Set chainlist and sequence for mobile */
+ chainlist[1] = ChainList(gMobPDB[i]);
+ strcpy(ids[1],gMobFilename[i]);
+ seqs[1] = malloc((strlen(gMobSeq[i])+1) * sizeof(char));
+ KillChainBreak(seqs[1],gMobSeq[i]);
+
+ for(ref = chainlist[0]; ref!=NULL; NEXT(ref))
+ {
+ for(mob = chainlist[1]; mob!=NULL; NEXT(mob))
+ {
+ for(z=gZoneList[i]; z!=NULL; NEXT(z))
+ {
+ if((z->start1>=ref->start1)&&(z->stop1<=ref->stop1)&&
+ (z->start2>=mob->start1)&&(z->stop2<=mob->stop1)&&
+ (z->mode == ZONE_MODE_SEQUENTIAL))
+ {
+ int position[4];
+ char chainids[2][MAXBUFF];
+
+ position[0] = ref->start1;
+ position[1] = ref->stop1;
+ position[2] = mob->start1;
+ position[3] = mob->stop1;
+
+ sprintf(chainids[0],"%s Chain '%c'",ids[0],ref->chain1);
+ sprintf(chainids[1],"%s Chain '%c'",ids[1],mob->chain1);
+
+ /* NULL Alignment Pointers */
+ alns[0] = alns[1] = NULL;
+ flagseq = NULL;
+
+ if(BuildAlignmentAB(seqs, gZoneList[i], alns, &flagseq,
+ position))
+ {
+ if(fasta)
+ {
+ /* Print FASTA Alignment */
+ fprintf(fp," >%s\n",chainids[0]);
+ PrintSequence(fp, alns[0]);
+ fprintf(fp,"\n");
+ fprintf(fp," >%s\n",chainids[1]);
+ PrintSequence(fp, alns[1]);
+ fprintf(fp,"\n\n");
+ }
+ else
+ {
+ /* Print Nice Alignment */
+ fprintf(fp," %s\n %s\n",
+ chainids[0], chainids[1]);
+ PrintNiceAlignment(fp,alns[0],alns[1]);
+ }
+ }
+
+ /* Free Alignment Memory */
+ if(alns[0]) free(alns[0]);
+ if(alns[1]) free(alns[1]);
+ if(flagseq) free(flagseq);
+
+ /* NULL Alignment Pointers */
+ alns[0] = alns[1] = NULL;
+ flagseq = NULL;
+
+ break;
+ }
+ }
+ }
+ }
+
+ /* Free mobile sequence memory */
+ if(seqs[1]) free(seqs[1]);
+ if(chainlist[1]) FREELIST(chainlist[1],ZONE);
+ seqs[1] = NULL;
+ chainlist[1] = NULL;
+ }
+
+ /* Close output file/pipe */
+ if(fp != stdout)
+ CloseOrPipe(fp);
+
+ if(chainlist[0]) FREELIST(chainlist[0],ZONE);
+ if(chainlist[1]) FREELIST(chainlist[1],ZONE);
+ if(seqs[0]) free(seqs[0]);
+ if(seqs[1]) free(seqs[1]);
+ if(alns[0]) free(alns[0]);
+ if(alns[1]) free(alns[1]);
+ if(flagseq) free(flagseq);
+
+ return(0);
+}
+
+
+/************************************************************************/
+/*>ZONE *AlignToZone(char *ref_align, char *mob_align, int ref_start,
+ int mob_start)
+ --------------------------------------------------------------------
+ Derive a linked list of zones from an alignment.
+
+ 23.07.08 Original based on SetNWZones() By: CTP
+*/
+ZONE *AlignToZone(char *ref_align, char *mob_align,
+ int ref_start, int mob_start)
+{
+ ZONE *zonelist = NULL;
+ ZONE *z, *zn;
+ int i, ref, mob;
+ BOOL converged;
+
+ for(i=0, ref=0, mob=0; i<strlen(ref_align); i++)
+ {
+ /* Add single-residue zone */
+ if((ref_align[i] != '-') && (mob_align[i] != '-'))
+ {
+ /* Allocate memory */
+ if(!zonelist)
+ {
+ INIT(zonelist,ZONE);
+ z = zonelist;
+ }
+ else
+ {
+ z = zonelist;
+ LAST(z);
+ ALLOCNEXT(z,ZONE);
+ }
+
+ /* Set Start and Stop */
+ z->chain1 = ' ';
+ z->start1 = ref + ref_start;
+ z->startinsert1 = ' ';
+ z->stop1 = ref + ref_start;
+ z->stopinsert1 = ' ';
+ z->chain2 = ' ';
+ z->start2 = mob + mob_start;
+ z->startinsert2 = ' ';
+ z->stop2 = mob + mob_start;
+ z->stopinsert2 = ' ';
+ z->mode = ZONE_MODE_SEQUENTIAL;
+ z->next = NULL;
+ }
+
+ /* Increment residue count */
+ if(ref_align[i] != '-') ref++;
+ if(mob_align[i] != '-') mob++;
+ }
+
+
+ /* Merge Zones */
+ if(zonelist)
+ {
+ do
+ {
+ /* Assume we have converged */
+ converged = TRUE;
+ for(z=zonelist; z!=NULL; NEXT(z))
+ {
+ zn = z->next;
+ if(zn)
+ {
+ /* See if the two zones are sequential */
+ if((zn->start1 == (z->stop1 + 1)) &&
+ (zn->start2 == (z->stop2 + 1)))
+ {
+ z->stop1 = zn->stop1;
+ z->stop2 = zn->stop2;
+ z->next = zn->next;
+ free(zn);
+ converged = FALSE;
+ }
+ }
+ }
+ } while(!converged);
+ }
+
+ return(zonelist);
+}
+
+
+/************************************************************************/
+/*>void AlignChainStandard(int strucnum)
+ -------------------------------------
+ Default method of doing pairwise alignments in ProFit.
+ Performs chain by chain pairwise alignments.
+
+ 23.07.08 Original based on NWAlign() By: CTP
+ 29.08.08 Added normalised score.
+*/
+void AlignChainStandard(int strucnum)
+{
+ int ref_len, mob_len, align_len, score;
+ char *ref_seq_all = NULL,
+ *mob_seq_all = NULL,
+ *ref_seq = NULL,
+ *mob_seq = NULL,
+ *ref_align = NULL,
+ *mob_align = NULL;
+
+ ZONE *ref_chain = NULL,
+ *mob_chain = NULL,
+ *ref, *mob;
+ ZONE *zonelist, *z;
+
+ printf("\n Mobile Structure: %2d\n",strucnum+1);
+
+ /* Get Sequences and Chains */
+ ref_seq_all = malloc((strlen(gRefSeq)+1) * sizeof(char));
+ ref_seq = malloc((strlen(gRefSeq)+1) * sizeof(char));
+ KillChainBreak(ref_seq_all,gRefSeq);
+ ref_chain = ChainList(gRefPDB);
+
+ mob_seq_all = malloc((strlen(gMobSeq[strucnum])+1) * sizeof(char));
+ mob_seq = malloc((strlen(gMobSeq[strucnum])+1) * sizeof(char));
+ KillChainBreak(mob_seq_all,gMobSeq[strucnum]);
+ mob_chain = ChainList(gMobPDB[strucnum]);
+
+ /* Check same number of chains */
+ /* Print warning if number of chains doesn't match. */
+ if(!gQuiet)
+ {
+ for(ref = ref_chain, mob = mob_chain; ref != NULL && mob != NULL;
+ NEXT(ref), NEXT(mob))
+ {}
+
+ if(ref || mob)
+ {
+ printf(" Warning: Number of chains does not match.\n");
+ }
+ }
+
+ /* Free gZoneList */
+ if(gZoneList[strucnum])
+ {
+ FREELIST(gZoneList[strucnum],ZONE);
+ gZoneList[strucnum] = NULL;
+ gUserFitZone = FALSE;
+ }
+
+ /* Align Chain By Chain */
+ for(ref = ref_chain, mob = mob_chain; ref != NULL && mob != NULL;
+ NEXT(ref), NEXT(mob))
+ {
+ /* Set sequences */
+ TruncateSeq(ref_seq,ref_seq_all,ref->start1,ref->stop1);
+ TruncateSeq(mob_seq,mob_seq_all,mob->start1,mob->stop1);
+
+ ref_len = strlen(ref_seq);
+ mob_len = strlen(mob_seq);
+
+ ref_align = (char *)malloc((ref_len+mob_len+1)*sizeof(char));
+ mob_align = (char *)malloc((ref_len+mob_len+1)*sizeof(char));
+
+ score = affinealign(ref_seq, ref_len, mob_seq, mob_len, FALSE,
+ FALSE, gGapPen, gGapPenExt, ref_align,
+ mob_align, &align_len);
+
+ ref_align[align_len] = '\0';
+ mob_align[align_len] = '\0';
+
+ printf(" %s Chain '%c'\n",gRefFilename,ref->chain1);
+ printf(" %s Chain '%c'\n",gMobFilename[strucnum],mob->chain1);
+
+ /* printf(" Score: %d\n",score); */
+ printf(" Score: %d Normalised score: %.2f\n",
+ score, (REAL)score/(REAL)(MIN(ref_len,mob_len)));
+ PrintNiceAlignment(stdout,ref_align,mob_align);
+
+ /* Convert to Zones */
+ zonelist = AlignToZone(ref_align, mob_align, ref->start1,
+ mob->start1);
+
+ /* Append zonelist to gZoneList */
+ if(!gZoneList[strucnum])
+ {
+ /* Make zone list new gZoneList */
+ gZoneList[strucnum] = zonelist ;
+ if(zonelist) gUserFitZone = TRUE;
+ }
+ else
+ {
+ /* Append zonelist to gZoneList */
+ z = gZoneList[strucnum];
+ LAST(z);
+ z->next = zonelist;
+ }
+
+ /* Free Memory */
+ if(ref_align) free(ref_align);
+ if(mob_align) free(mob_align);
+ }
+
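+ /* Tidy (sequences and the chain lists built by ChainList()) */
+ if(ref_seq_all) free(ref_seq_all);
+ if(mob_seq_all) free(mob_seq_all);
+ if(ref_seq) free(ref_seq);
+ if(mob_seq) free(mob_seq);
+ if(ref_chain) FREELIST(ref_chain, ZONE);
+ if(mob_chain) FREELIST(mob_chain, ZONE);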
+}
+
+
+/************************************************************************/
+/*>void AlignWholeSequence(int strucnum)
+ -------------------------------------
+ Method of doing pairwise alignments in ProFit.
+ Aligns whole sequence ignoring chain breaks.
+
+ 23.07.08 Original based on NWAlign() By: CTP
+ 29.08.08 Added normalised score.
+*/
+void AlignWholeSequence(int strucnum)
+{
+ int ref_len, mob_len, align_len, score;
+ char *ref_seq_all = NULL,
+ *mob_seq_all = NULL;
+ char *ref_align = NULL,
+ *mob_align = NULL;
+
+ ZONE *zonelist;
+
+ /* Get Sequences and Chains */
+ ref_seq_all = malloc((strlen(gRefSeq)+1) * sizeof(char));
+ KillChainBreak(ref_seq_all,gRefSeq);
+
+ mob_seq_all = malloc((strlen(gMobSeq[strucnum])+1) * sizeof(char));
+ KillChainBreak(mob_seq_all,gMobSeq[strucnum]);
+
+ ref_len = strlen(ref_seq_all);
+ mob_len = strlen(mob_seq_all);
+
+ ref_align = (char *)malloc((ref_len+mob_len+1)*sizeof(char));
+ mob_align = (char *)malloc((ref_len+mob_len+1)*sizeof(char));
+
+ score = affinealign(ref_seq_all, ref_len, mob_seq_all, mob_len,
+ FALSE, FALSE, gGapPen, gGapPenExt,
+ ref_align, mob_align, &align_len);
+
+ ref_align[align_len] = '\0';
+ mob_align[align_len] = '\0';
+
+ printf(" Mobile Structure: %2d\n",strucnum+1);
+ printf(" %s vs %s\n",gRefFilename,gMobFilename[strucnum]);
+ /* printf(" Score: %d\n",score); */
+ printf(" Score: %d Normalised score: %.2f\n",
+ score, (REAL)score/(REAL)(MIN(ref_len,mob_len)));
+ PrintNiceAlignment(stdout,ref_align,mob_align);
+
+ /* Convert to Zones */
+ zonelist = AlignToZone(ref_align, mob_align, 1, 1);
+
+ /* Set breaks between chains by converting to residue and back */
+ ConvertZoneList(zonelist, strucnum, ZONE_MODE_RESNUM);
+ ConvertZoneList(zonelist, strucnum, ZONE_MODE_SEQUENTIAL);
+
+ /* Replace global fit zones */
+ if(gZoneList[strucnum]) FREELIST(gZoneList[strucnum],ZONE);
+ gZoneList[strucnum] = zonelist ;
+ gUserFitZone = TRUE;
+
+ /* Free Memory */
+ if(ref_align) free(ref_align);
+ if(mob_align) free(mob_align);
+
+ /* Tidy */
+ if(ref_seq_all) free(ref_seq_all);
+ if(mob_seq_all) free(mob_seq_all);
+
+ return;
+}
+
+
+/************************************************************************/
+/*>void AlignZone(ZONE *alignzone, int strucnum, BOOL append)
+ -------------------------------------
+ Method of doing pairwise alignments in ProFit.
+ Performs pairwise alignment on zone.
+
+ 23.07.08 Original based on NWAlign() By: CTP
+ 29.08.08 Added normalised score.
+*/
+void AlignZone(ZONE *alignzone, int strucnum, BOOL append)
+{
+ int ref_len, mob_len, align_len, score;
+ char *ref_seq_all = NULL,
+ *mob_seq_all = NULL,
+ *ref_seq = NULL,
+ *mob_seq = NULL,
+ *ref_align = NULL,
+ *mob_align = NULL;
+
+ ZONE *ref_chain = NULL,
+ *mob_chain = NULL;
+ ZONE *zonelist, *z;
+
+ char zone1[64], zone2[64];
+
+ printf("\n Mobile Structure: %2d\n",strucnum+1);
+
+ /* Check zone before it is dereferenced or anything is allocated */
+ if(!alignzone)
+ {
+ printf(" Error: Cannot align non-existent zone.\n");
+ return;
+ }
+
+ if(alignzone->next)
+ {
+ printf(" Error: Structure must be single zone.\n");
+ return;
+ }
+
+ /* Convert to sequential */
+ if(alignzone->mode == ZONE_MODE_RESNUM)
+ {
+ if(ConvertResidueToSequential(alignzone, strucnum))
+ {
+ printf(" Error: Failed to find alignment zone.\n");
+ return;
+ }
+ }
+
+ /* Get Sequences and Chains */
+ ref_seq_all = malloc((strlen(gRefSeq)+1) * sizeof(char));
+ ref_seq = malloc((strlen(gRefSeq)+1) * sizeof(char));
+ KillChainBreak(ref_seq_all,gRefSeq);
+ ref_chain = ChainList(gRefPDB);
+
+ mob_seq_all = malloc((strlen(gMobSeq[strucnum])+1) * sizeof(char));
+ mob_seq = malloc((strlen(gMobSeq[strucnum])+1) * sizeof(char));
+ KillChainBreak(mob_seq_all,gMobSeq[strucnum]);
+ mob_chain = ChainList(gMobPDB[strucnum]);
+
+ TruncateSeq(ref_seq,ref_seq_all,alignzone->start1,alignzone->stop1);
+ TruncateSeq(mob_seq,mob_seq_all,alignzone->start2,alignzone->stop2);
+
+ ref_len = strlen(ref_seq);
+ mob_len = strlen(mob_seq);
+
+ ref_align = (char *)malloc((ref_len+mob_len+1)*sizeof(char));
+ mob_align = (char *)malloc((ref_len+mob_len+1)*sizeof(char));
+
+ score = affinealign(ref_seq, ref_len, mob_seq, mob_len, FALSE, FALSE,
+ gGapPen, gGapPenExt, ref_align, mob_align,
+ &align_len);
+
+ ref_align[align_len] = '\0';
+ mob_align[align_len] = '\0';
+
+ /* Format Zone */
+ FormatZone(zone1,' ',alignzone->start1,' ',alignzone->stop1,' ');
+ FormatZone(zone2,' ',alignzone->start2,' ',alignzone->stop2,' ');
+
+ /* Print Zone */
+ printf(" %-16s vs %-16s (Sequential numbering)\n",zone1,zone2);
+ /* printf(" Score: %d\n",score); */
+ printf(" Score: %d Normalised score: %.2f\n",
+ score, (REAL)score/(REAL)(MIN(ref_len,mob_len)));
+ PrintNiceAlignment(stdout,ref_align, mob_align);
+
+ /* Convert alignment to zones */
+ zonelist = AlignToZone(ref_align, mob_align,
+ alignzone->start1, alignzone->start2);
+
+ /* Add zones to zone list */
+ if(!gZoneList[strucnum])
+ {
+ /* Make zone list new gZoneList */
+ gZoneList[strucnum] = zonelist ;
+ if(zonelist) gUserFitZone = TRUE;
+ }
+ else
+ {
+ if(append)
+ {
+ /* Append zonelist to gZoneList */
+ z = gZoneList[strucnum];
+ LAST(z);
+ z->next = zonelist;
+ }
+ else
+ {
+ /* Replace gZoneList with zonelist */
+ FREELIST(gZoneList[strucnum], ZONE);
+ gZoneList[strucnum] = zonelist;
+ if(zonelist) gUserFitZone = TRUE;
+ }
+ }
+
+ /* Case where no zones found .... */
+ if(!zonelist) printf(" Warning: No matching zones found...\n");
+
+ /* Free Memory */
+ if(ref_align) free(ref_align);
+ if(mob_align) free(mob_align);
+
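+ /* Tidy (sequences and the chain lists built by ChainList()) */
+ if(ref_seq_all) free(ref_seq_all);
+ if(mob_seq_all) free(mob_seq_all);
+ if(ref_seq) free(ref_seq);
+ if(mob_seq) free(mob_seq);
+ if(ref_chain) FREELIST(ref_chain, ZONE);
+ if(mob_chain) FREELIST(mob_chain, ZONE);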
+}
+
+
+/************************************************************************/
+/*>int AlignmentWrapper(int strucnum, char *command, BOOL append)
+ --------------------------------------------------------------
+ Wrapper function calling AlignChainStandard(), AlignWholeSequence()
+ or AlignZone().
+
+ 23.07.08 Original By: CTP
+ 30.10.08 Reset gFitted to FALSE
+*/
+int AlignmentWrapper(int strucnum, char *command, BOOL append)
+{
+ int struc, start, stop, method;
+ ZONE *alignzone = NULL;
+
+ /* Read Mutation Data Matrix File */
+ static int FirstCall = TRUE;
+ if(FirstCall)
+ {
+ if(!ReadMDM(MDMFILE))
+ {
+ printf(" Error==> Unable to read mutation data matrix\n");
+ return(1);
+ }
+
+ FirstCall = FALSE;
+ }
+
+ /* Set structures to align */
+ if(strucnum == -1)
+ {
+ /* Align all structures */
+ start = 0;
+ stop = gMultiCount;
+ }
+ else
+ {
+ /* Align single structure */
+ start = strucnum;
+ stop = strucnum + 1;
+ }
+
+ /* Parse Command */
+ if(!command || strlen(command) == 0)
+ {
+ /* Do standard align */
+ method = 0;
+ }
+ else if(!upstrncmp(command,"WHOLE",5) || !strcmp(command,"*"))
+ {
+ /* Perform whole sequence comparison */
+ method = 1;
+ }
+ else
+ {
+ /* Align zone */
+ int SeqZone = 0;
+ int start1, stop1, start2, stop2;
+ char chain1, startinsert1, stopinsert1,
+ chain2, startinsert2, stopinsert2;
+
+ SeqZone = ParseZone(command, &start1, &stop1, &chain1,
+ &startinsert1, &stopinsert1,
+ &start2, &stop2, &chain2,
+ &startinsert2, &stopinsert2,
+ strucnum);
+
+ if(SeqZone == (-2))
+ {
+ printf(" Error==> You cannot specify zones for both the \
+reference\n");
+ printf(" and mobile structures when performing \
+multiple\n");
+ printf(" structure fitting.\n");
+ return(1);
+ }
+
+ INIT(alignzone, ZONE);
+ alignzone->chain1 = chain1;
+ alignzone->start1 = start1;
+ alignzone->startinsert1 = startinsert1;
+ alignzone->stop1 = stop1;
+ alignzone->stopinsert1 = stopinsert1;
+ alignzone->chain2 = chain2;
+ alignzone->start2 = start2;
+ alignzone->startinsert2 = startinsert2;
+ alignzone->stop2 = stop2;
+ alignzone->stopinsert2 = stopinsert2;
+ alignzone->mode = SeqZone ? ZONE_MODE_SEQUENTIAL :
+ gCurrentMode;
+ alignzone->next = NULL;
+
+ method = 2;
+ }
+
+
+ /* Perform Alignment */
+ for(struc=start; struc<stop; struc++)
+ {
+ switch(method)
+ {
+ case 0: /* Standard Chain by Chain Alignment */
+ AlignChainStandard(struc);
+ break;
+
+ case 1: /* Whole Sequence Alignment */
+ AlignWholeSequence(struc);
+ break;
+
+ case 2: /* Align Zone */
+ AlignZone(alignzone, struc, append);
+ break;
+
+ default:
+ break;
+ }
+ }
+
+ /* Reset fitted flag */
+ gFitted = FALSE;
+ gUserFitZone = TRUE;
+
+ /* Tidy Up */
+ if(alignzone) FREELIST(alignzone, ZONE);
+ return(0);
+}
+
+
+/************************************************************************/
+/*>BOOL CommonZones(void)
+ ----------------------
+ Test to check if ZONEs for each mobile structure are identical.
+
+ 13.01.09 Original by CTP.
+*/
+BOOL CommonZones(void)
+{
+ int i;
+ ZONE *z1, *z2;
+
+ /* Convert to sequential numbering with breaks between chains */
+ /* Note - Could do this before calling function. */
+ if(ConvertAllZones(ZONE_MODE_RESNUM) ||
+ ConvertAllZones(ZONE_MODE_SEQUENTIAL))
+ {
+ printf(" Error: Could not find zones.\n");
+ return(FALSE);
+ }
+
+ /* Sort Zones */
+ SortAllZones();
+
+ /* Loop - Structures */
+ for(i=1;i<gMultiCount;i++)
+ {
+ /* Loop - Zones */
+ z1 = gZoneList[0];
+ z2 = gZoneList[i];
+ for(;z1 != NULL && z2 != NULL; NEXT(z1), NEXT(z2))
+ {
+ if((z1->chain1 != z2->chain1 ) ||
+ (z1->start1 != z2->start1 ) ||
+ (z1->startinsert1 != z2->startinsert1) ||
+ (z1->stop1 != z2->stop1 ) ||
+ (z1->stopinsert1 != z2->stopinsert1 ) ||
+ (z1->mode != z2->mode ))
+ {
+ return(FALSE);
+ }
+ }
+
+ if(z1 || z2) return(FALSE);
+ }
+
+ return(TRUE);
+}
+
+
+/************************************************************************/
+/*>int AlignmentFromZones_PIR(char *filename)
+ ------------------------------------------
+ Generates an alignment based on ProFit fitting Zones.
+
+ Output is a multiple sequence alignment with the reference and mobile
+ sequences printed as series of 60-character wide lines.
+
+ The format is machine-readable (well... ProFit-readable) but requires
+ the zones generated by ProFit to be:
+
+ a) Non-Overlapping
+ b) Sequential along the WHOLE sequence - NOT just sequential for each
+ individual chain.
+ (e.g. The user can't align Chain A with B then B with A.)
+
+ 02.12.08 Original based on AlignmentFromZones(). By: CTP
+ 13.01.09 Removed requirement for common zones.
+ 14.01.09 Added output to file.
+ 29.01.09 Fixed bug in checking sequential zones. (Only checked for
+ each individual chain) and added additional error checking at
+ start of function.
+*/
+int AlignmentFromZones_PIR(char *filename)
+{
+ char *seqs[MAXSTRUC + 1], *alns[MAXSTRUC + 1];
+ int i;
+ FILE *fp = stdout;
+
+ /*** Check Zones can be output in PIR Format ***/
+ /* Check for User-Defined Zones */
+ if(!gUserFitZone)
+ {
+ printf(" Error: No user-defined zones found.\n");
+ return(1);
+ }
+
+ /* Convert to sequential numbering with breaks between chains */
+ if(ConvertAllZones(ZONE_MODE_RESNUM) ||
+ ConvertAllZones(ZONE_MODE_SEQUENTIAL))
+ {
+ printf(" Error: Could not convert zones.\n");
+ return(1);
+ }
+
+ /* Check for sequential zones across whole sequence */
+ for(i=0; i<gMultiCount; i++)
+ {
+ if(!SequentialZonesWholeSeq(i))
+ {
+ printf("\n");
+ printf(" Error: Could not convert zones to PIR alignment.\n");
+ printf(" Zones must not overlap and must be in\n");
+ printf(" sequence over whole sequence.\n\n");
+ return(1);
+ }
+ }
+
+ /*** Make Sequences ***/
+ /* Set reference sequence as Sequence zero */
+ seqs[0] = malloc((strlen(gRefSeq)+1) * sizeof(char));
+ KillChainBreak(seqs[0],gRefSeq);
+
+ /* Set Mobile Sequences */
+ for(i=1;i<=gMultiCount;i++)
+ {
+ seqs[i] = malloc((strlen(gMobSeq[i-1])+1) * sizeof(char));
+ KillChainBreak(seqs[i],gMobSeq[i-1]);
+ }
+
+ /* Build Multiple Alignment from Zones */
+ if(!BuildMultiAlignment(seqs, alns))
+ {
+ printf(" Error: Could not create alignment from user zones.\n");
+ return(1);
+ }
+
+ /*** Print PIR Alignment ***/
+ /* Open output file/pipe */
+ if(filename)
+ {
+ if((fp=OpenOrPipe(filename))==NULL)
+ {
+ printf(" Warning: Unable to open output file\n");
+ fp = stdout;
+ }
+ }
+
+ /* Ref Sequence */
+ fprintf(fp,">P1;REFSEQ\n");
+ fprintf(fp,"Reference Sequence - %s\n",gRefFilename);
+ PrintSequencePIR(fp,alns[0],60);
+
+ /* Mobile Sequence */
+ for(i=1;i<=gMultiCount;i++)
+ {
+ fprintf(fp,"\n");
+ fprintf(fp,">P1;M_%04d\n",i);
+ fprintf(fp,"Mobile Sequence - %s\n",gMobFilename[i-1]);
+ PrintSequencePIR(fp,alns[i],60);
+ }
+
+ if(fp == stdout) printf("\n\n");
+
+ /* DEBUG: Print Multiple Alignment */
+/***
+ for(i=0;i<=gMultiCount;i++) printf("%s\n",alns[i]);
+***/
+
+ /* Close output file/pipe */
+ if(fp != stdout)
+ CloseOrPipe(fp);
+
+ /* Free Memory */
+ for(i=0;i<gMultiCount+1;i++)
+ {
+ if(seqs[i]) free(seqs[i]);
+ if(alns[i]) free(alns[i]);
+ }
+
+ return(0);
+}
+
+
+/************************************************************************/
+/*>void PrintSequencePIR(FILE *fp, char *sequence, int width)
+ ----------------------------------------------------------
+ Prints PIR sequence.
+
+ 17.12.08 Original based on PrintSequence(). By: CTP
+ 14.01.09 Added output to file.
+*/
+void PrintSequencePIR(FILE *fp, char *sequence, int width)
+{
+ int i = 0;
+ int j = 0;
+ char line[MAXBUFF];
+
+ if(strlen(sequence) > 0)
+ {
+ for(i=0, j=0; i<strlen(sequence); i++)
+ {
+ line[j++] = sequence[i];
+
+ if(j == width || i == strlen(sequence)-1)
+ {
+ line[j] = '\0';
+ fprintf(fp,"%s\n",line);
+ j=0;
+ }
+ }
+ }
+ else
+ {
+ fprintf(fp,"Undefined\n");
+ }
+
+ return;
+}
+
+
+/************************************************************************/
+/*>BOOL BuildMultiAlignment(char **seqs, char **alns)
+ --------------------------------------------------
+ Inputs: char **seqs Sequences to align
+ Outputs: char **alns Aligned sequences
+ Returns: BOOL OK?
+
+ Builds multiple sequence alignment based on zones for output in PIR
+ format.
+
+ Mobile sequences are added one at a time and gaps for unmatched
+ residues are added to the reference sequence and previously included
+ sequences. Gaps are then added to the mobile sequence. The ends of the
+ alignment are dealt with and chain breaks are added.
+
+ The zones used to create the alignment do not have to be identical for
+ each mobile but must occur sequentially along the whole structure. In
+ other words, zones can't be converted into a sequence alignment unless
+ they're in sequence.
+ eg
+ Zones 1-5:1-5, 6-10:11-15, 11-15:6-10 cannot be converted into a
+ sequence alignment.
+
+ 13.01.09 Original By: CTP
+ 29.01.09 Fixed bug handling chain ends.
+*/
+BOOL BuildMultiAlignment(char **seqs, char **alns)
+{
+ ZONE *mobzone[MAXSTRUC], *chainlist[MAXSTRUC +1]; /* Zone lists */
+ ZONE *z; /* Zone pointer */
+ AA *aa[MAXSTRUC + 1]; /* AA List */
+ AA *ref_aa, *mob_aa; /* AA pointers */
+
+ int mobile; /* Current mobile. */
+ int i, j, len, idx[2];
+ int insert_idx = 0;
+ int insert_len[2];
+ int max_len = 0;
+ int length_ref = 0;
+ int length_mob = 0;
+
+
+ /* Convert sequences into AA lists */
+ for(i=0;i<=gMultiCount;i++)
+ {
+ if((aa[i] = BuildAAList(seqs[i]))==NULL)
+ return(FALSE);
+ }
+
+ /* Set zone pointer for each sequence */
+ for(i=0;i<gMultiCount;i++)
+ {
+ mobzone[i] = gZoneList[i];
+ if(mobzone[i] == NULL)
+ return(FALSE);
+ }
+
+ /* Insert Gaps for each mobile */
+ /* --------------------------- */
+
+ /* Cycle Through Mobile Sequences */
+ for(mobile=1; mobile<=gMultiCount; mobile++)
+ {
+ /* Reset Insert Index */
+ insert_idx = 0;
+
+ /* Cycle Through Mobile Zones */
+ for(z=gZoneList[mobile - 1]; z!=NULL; NEXT(z))
+ {
+ /* Find offset for reference */
+ if((idx[0] = FindAAListOffsetByResnum(aa[0], z->start1))==(-1))
+ return(FALSE);
+
+ /* Find Offset for current mobile */
+ if((idx[1] = FindAAListOffsetByResnum(aa[mobile],
+ z->start2))==(-1))
+ return(FALSE);
+
+ /* Insert Length for Reference Sequence Gap */
+ insert_len[0] = idx[1] - insert_idx - 1;
+
+ /* Insert Length for Mobile Sequence Gap */
+ insert_len[1] = idx[0] - insert_idx - 1;
+
+ /* Add Gap to Reference and Previous Mobiles */
+ for(i=0;i<mobile;i++)
+ {
+ if((aa[i] = InsertResiduesInAAListAt(aa[i], '-',
+ insert_len[0],
+ idx[0]-1))==NULL)
+ return(FALSE);
+ }
+
+ /* Add Gap to Current Mobile */
+ if((aa[mobile] = InsertResiduesInAAListAt(aa[mobile], '-',
+ insert_len[1],
+ insert_idx))==NULL)
+ {
+ return(FALSE);
+ }
+
+ /* Update Insert Index */
+ insert_idx = FindAAListOffsetByResnum(aa[0], z->stop1);
+
+ /* Insert gaps within zone. */
+ /* ------------------------ */
+
+ /* Find start of zone */
+ idx[0] = FindAAListOffsetByResnum(aa[0], z->start1);
+
+ /* Set pointers for Ref and Mob sequences */
+ ref_aa = aa[0];
+ mob_aa = aa[mobile];
+
+ for(i=1; i<idx[0]; i++)
+ {
+ NEXT(ref_aa);
+ NEXT(mob_aa);
+ }
+
+ /* Loop through zone */
+ for(i=idx[0]; i<=insert_idx; i++)
+ {
+ /* Splice gaps into aa list */
+ if(ref_aa->res == '-')
+ {
+ AA *tmp_aa = NULL;
+ tmp_aa = InsertResidueInAAListAt(aa[mobile], '-', i-1);
+
+ /* No memory */
+ if(!tmp_aa) return(FALSE);
+
+ /* Do we need to reset pointer? */
+ if(tmp_aa != aa[mobile])
+ {
+ /* Reset pointer */
+ aa[mobile] = tmp_aa;
+ mob_aa = aa[mobile];
+ for(j=0; j<i && mob_aa != NULL; j++) NEXT(mob_aa);
+ }
+ }
+ else
+ {
+ NEXT(mob_aa);
+ }
+
+ NEXT(ref_aa);
+ }
+
+ } /* End of zones loop */
+
+
+ /* Deal with unaligned ends */
+ /* ------------------------ */
+
+ /* Find length of sequences */
+ length_ref = GetAAListLen(aa[0]);
+ length_mob = GetAAListLen(aa[mobile]);
+
+ /* Find Final Mobile Zone */
+ z=gZoneList[mobile - 1];
+ LAST(z);
+
+ /* Find offset for reference zones finish */
+ if((idx[0] = FindAAListOffsetByResnum(aa[0], z->stop1))==(-1))
+ return(FALSE);
+
+ /* Find Offset for current mobile zones finish */
+ if((idx[1] = FindAAListOffsetByResnum(aa[mobile], z->stop2))==(-1))
+ return(FALSE);
+
+ /* Insert Length for Reference Sequence Gap */
+ insert_len[0] = length_mob - idx[1];
+
+ /* Insert Length for Mobile Sequence Gap */
+ insert_len[1] = length_ref - idx[0];
+
+ /* Add Gap to End of Reference */
+ if((aa[0] = InsertResiduesInAAListAt(aa[0], '-', insert_len[0],
+ length_ref))==NULL)
+ return(FALSE);
+
+ /* Add Gap to Mobile */
+ if((aa[mobile] = InsertResiduesInAAListAt(aa[mobile], '-',
+ insert_len[1],
+ idx[1]))==NULL)
+ return(FALSE);
+ } /* End of mobiles loop */
+
+
+ /* Add inserts at the end of the shorter sequences */
+ for(i=0;i<gMultiCount+1;i++)
+ {
+ len = GetAAListLen(aa[i]);
+ max_len = (len > max_len) ? len : max_len;
+ }
+
+ for(i=0;i<gMultiCount+1;i++)
+ {
+ len = GetAAListLen(aa[i]);
+
+ if(max_len > len)
+ {
+ if((aa[i] = InsertResiduesInAAListAt(aa[i], '-', max_len-len,
+ len))==NULL)
+ return(FALSE);
+ }
+ }
+
+ /* Add Chain Breaks */
+ /* ---------------- */
+
+ /* Get Chain breaks */
+ chainlist[0] = ChainList(gRefPDB);
+ for(i=1; i<gMultiCount+1;i++)
+ {
+ chainlist[i] = ChainList(gMobPDB[i-1]);
+ }
+
+ /* Cycle Through Structures */
+ for(i=0; i<gMultiCount+1;i++)
+ {
+ ZONE *chain = NULL;
+ BOOL found = FALSE;
+
+ /* Cycle Through Chains */
+ chain = chainlist[i];
+ NEXT(chain);
+
+ for(;chain != NULL; NEXT(chain))
+ {
+ int index = 0;
+ /* Find offset of chainbreak */
+ if((index = FindAAListOffsetByResnum(aa[i],
+ chain->start1))==(-1))
+ return(FALSE);
+
+ /* Is this one we did earlier? */
+ for(j=0; j<i-1 && !found; j++)
+ {
+ z=chainlist[j];
+ NEXT(z);
+ for(;z!=NULL && !found;NEXT(z))
+ {
+ int curr_index = 0;
+ /* Find current offset */
+ if((curr_index =
+ FindAAListOffsetByResnum(aa[j], z->start1))==(-1))
+ return(FALSE);
+
+ if(curr_index == index)
+ found = TRUE;
+ }
+ }
+
+ /* Insert new column into alignment and add breaks */
+ if(!found)
+ {
+ /* Add column to alignment */
+ for(j=0; j<gMultiCount+1;j++)
+ {
+ if(i == j)
+ {
+ /* Add break */
+ if((aa[j] = InsertResiduesInAAListAt(aa[j],'*',1,
+ index-1))==NULL)
+ return(FALSE);
+ }
+ else
+ {
+ /* Add gap */
+ if((aa[j]=InsertResiduesInAAListAt(aa[j],'-',1,
+ index-1))==NULL)
+ return(FALSE);
+ }
+ }
+ }
+ else
+ {
+ AA *aa_index = aa[i];
+
+ /* Set pointer */
+ for(j=1;j<index-1;j++)
+ {
+ NEXT(aa_index);
+ }
+
+ /* Delete gap then insert break */
+ DELETE(aa[i],aa_index,AA);
+
+ if((aa[i]=InsertResiduesInAAListAt(aa[i],'*',1,index-2))
+ == NULL)
+ return(FALSE);
+ }
+ }
+ }
+
+ /* Add Chainbreak to end of sequence */
+ for(i=0; i<gMultiCount+1;i++)
+ {
+ len = GetAAListLen(aa[i]);
+ if((aa[i] = InsertResiduesInAAListAt(aa[i], '*',1, len))==NULL)
+ return(FALSE);
+ }
+
+ /* Convert the linked lists back into sequences */
+ for(i=0;i<gMultiCount+1;i++)
+ {
+ if((alns[i] = BuildSeqFromAAList(aa[i]))==NULL)
+ return(FALSE);
+ }
+
+ /* Free Linked Lists */
+ for(i=0;i<gMultiCount+1;i++)
+ {
+ FREELIST(aa[i], AA);
+ FREELIST(chainlist[i], ZONE);
+ }
+
+ return(TRUE);
+}
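
Note on AlignToZone() in the NWAlign.c hunk above: it builds one zone per
matched alignment column and then merges neighbouring zones until nothing
changes. The following is only a minimal sketch of the same idea done in a
single pass. It is not ProFit code: SimpleZone and AlignmentToZones() are
hypothetical names, zones are stored in a caller-supplied array rather than
a linked list, and only sequential numbering is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified zone: sequential residue ranges only */
typedef struct
{
   int start1, stop1,   /* range in the reference sequence */
       start2, stop2;   /* range in the mobile sequence    */
} SimpleZone;

/* Walk two aligned strings column by column. A column with a residue
   in both rows either extends the previous zone (if it directly
   follows it in both sequences) or opens a new zone.
   Returns the number of zones, or -1 if zones[] is too small. */
int AlignmentToZones(const char *ref_align, const char *mob_align,
                     int ref_start, int mob_start,
                     SimpleZone *zones, int maxzones)
{
   int    nzones = 0, ref = 0, mob = 0;
   size_t i, len;

   assert(ref_align != NULL && mob_align != NULL && zones != NULL);
   len = strlen(ref_align);

   for(i=0; i<len && mob_align[i] != '\0'; i++)
   {
      if(ref_align[i] != '-' && mob_align[i] != '-')
      {
         int r = ref + ref_start,
             m = mob + mob_start;

         if(nzones > 0 &&
            zones[nzones-1].stop1 + 1 == r &&
            zones[nzones-1].stop2 + 1 == m)
         {
            /* Contiguous in both sequences: extend current zone */
            zones[nzones-1].stop1 = r;
            zones[nzones-1].stop2 = m;
         }
         else
         {
            /* Gap since the last matched pair: start a new zone */
            if(nzones == maxzones) return(-1);
            zones[nzones].start1 = zones[nzones].stop1 = r;
            zones[nzones].start2 = zones[nzones].stop2 = m;
            nzones++;
         }
      }

      /* Residue counters advance only on non-gap characters */
      if(ref_align[i] != '-') ref++;
      if(mob_align[i] != '-') mob++;
   }

   return(nzones);
}
```

Under these assumptions, aligning "ACD-EFG" with "AC-WEFG" (both starting
at residue 1) should give two zones, 1-2:1-2 and 4-6:4-6, because the two
gap columns break the run in both sequences.
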
diff --git a/src/NWAlign.p b/src/NWAlign.p
new file mode 100644
index 0000000..ef8bebb
--- /dev/null
+++ b/src/NWAlign.p
@@ -0,0 +1,33 @@
+void NWAlign(int strucnum)
+;
+void ReadAlignment(char *alnfile)
+;
+BOOL RemoveDoubleDeletions(char *seqa, char *seqb)
+;
+void SetNWZones(char *ref_align, char *mob_align, int align_len,
+ PDB **RefIndex, PDB **MobIndex, int strucnum)
+;
+void MergeZones(int strucnum)
+;
+BOOL VerifySequence(char *seqa, char *seqb)
+;
+int AlignmentWrapper(int strucnum, char *command, BOOL append)
+;
+int AlignmentFromZones(char *filename, BOOL fasta)
+;
+int TruncateSeq(char *outstring, char *instring, int start, int stop)
+;
+int AlignmentFromZones_PIR(char *filename)
+;
+int ChainBreakToGap(char *instring)
+;
+ZONE *AlignToZone(char *ref_align, char *mob_align,
+ int ref_start, int mob_start)
+;
+BOOL CommonZones(void)
+;
+BOOL BuildMultiAlignment(char **seqs, char **alns)
+;
+void PrintSequencePIR(FILE *fp, char *sequence, int width)
+;
+
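
Note on PrintSequencePIR(), declared in the prototypes above: it emits a
sequence as fixed-width lines (60 characters in AlignmentFromZones_PIR()).
The wrapping step can be sketched on its own as below. This is an
illustrative sketch only: WrapSequence() is a hypothetical helper that
writes into a caller-supplied buffer instead of a FILE pointer, and unlike
the original it produces no "Undefined" line for an empty sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of ProFit: copy seq into out as
   newline-terminated lines of at most width characters.
   Returns the number of lines written, or -1 if out is too small. */
int WrapSequence(const char *seq, int width, char *out, size_t outsize)
{
   size_t len, pos = 0, used = 0;
   int    nlines = 0;

   assert(seq != NULL && out != NULL && width > 0);
   if(outsize == 0) return(-1);

   len = strlen(seq);

   while(pos < len)
   {
      size_t chunk = len - pos;
      if(chunk > (size_t)width) chunk = (size_t)width;

      /* Need room for the chunk, its newline and the final NUL */
      if(used + chunk + 2 > outsize) return(-1);

      memcpy(out + used, seq + pos, chunk);
      used += chunk;
      out[used++] = '\n';
      pos += chunk;
      nlines++;
   }

   out[used] = '\0';
   return(nlines);
}
```

Copying whole chunks avoids both the per-character loop and a fixed-size
line buffer, so the line width is not limited by a constant such as
MAXBUFF; whether that matters for ProFit itself is not something this
commit states.
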
diff --git a/src/ProFit.h b/src/ProFit.h
new file mode 100644
index 0000000..4b056a1
--- /dev/null
+++ b/src/ProFit.h
@@ -0,0 +1,343 @@
+/*************************************************************************
+
+ Program: ProFit
+ File: ProFit.h
+
+ Version: V3.1
+ Date: 31.03.09
+ Function: Protein Fitting program. Includes and defines.
+
+ Copyright: SciTech Software / UCL 1992-2009
+ Author: Dr. Andrew C. R. Martin
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain.
+
+ It may not be copied or made available to third parties, but may be
+ freely used by non-profit-making organisations who have obtained it
+ directly from the author or by FTP.
+
+ You are requested to send EMail to the author to say that you are
+ using this code so that you may be informed of future updates.
+
+ The code may not be made available on other FTP sites without express
+ permission from the author.
+
+ The code may be modified as required, but any modifications must be
+ documented so that the person responsible can be identified. If
+ someone else breaks this code, the author doesn't want to be blamed
+ for code that does not work! You may not distribute any
+ modifications, but are encouraged to send them to the author so
+ that they may be incorporated into future versions of the code.
+
+ Such modifications become the property of Dr. Andrew C.R. Martin and
+ SciTech Software though their origin will be acknowledged.
+
+ The code may not be sold commercially or used for commercial purposes
+ without prior permission from the author.
+
+**************************************************************************
+
+ Description:
+ ============
+ ProFit is a least squares fitting program for proteins.
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V0.1 25.09.92 Original
+ V0.5 08.10.93 Various tidying for Unix & book routines
+ V0.6 05.01.94 Modified help and data defines for unix getenv()
+ V0.7 24.11.94 Skipped
+ V0.8 17.07.95 Removed all windowing stuff
+ V1.0 18.07.95 First official release (at last!).
+ V1.1 20.07.95 Added WEIGHT command support and translation vector
+ output from MATRIX command
+ V1.2 22.07.95 Added GAPPEN command
+ V1.3 31.07.95 Skipped
+ V1.4 14.08.95 Skipped
+ V1.5 21.08.95 Skipped
+ V1.6 20.11.95 Added READALIGNMENT command
+ V1.6e 31.05.96 Added BVALUE command
+ V1.6f 13.06.96 Added BWEIGHT command; changed gDoWeights initialisation
+ V1.6g 18.06.96 Removed MODE_SEQUENTIAL and MODE_RESNUM. Replaced
+ by ZONE_* versions from pdb.h
+ V1.7 23.07.96 Added MAXATSPEC
+ V1.7b 11.11.96 gUseBVal is now handled as an int
+ V1.7c 18.11.96 Added IGNOREMISSING option
+ V1.7d 20.12.96 Added gNFittedCoor and NFITTED command
+ V1.8 07.05.98 Skipped for release
+ V2.0 01.03.01 More commands; various things made arrays for multiple
+ structure fitting
+ V2.1 28.03.01 Parameter for ITERATE and added CENTRE command
+ V2.2 20.12.01 Skipped for release
+ V2.3 01.12.04 Skipped for release
+ V2.4 03.06.05 Skipped for release
+ V2.5 07.06.05 Skipped for release
+ V-.- 28.03.08 Added gCZoneList[]. Increased NCOMM for new commands.
+ V-.- 02.04.08 Added globals for handling PDB headers and footers.
+ V-.- 02.04.08 Increased NCOMM for new command.
+ V-.- 07.04.08 Added distance cutoff for including atom pairs in RMSd.
+ V-.- 15.04.08 Added WHOLEPDB linked lists.
+ V-.- 01.05.08 Added gOccRank to control reading of low-occupancy atoms.
+ V-.- 02.05.08 Headers and footers now handled by WHOLEPDB.
+ V2.6 28.05.08 Removed unused globals and defines for headers and
+ footers.
+ V-.- 04.06.08 Added gMatchSymAtoms and gSymType[] for matching symmetrical
+ atom pairs (e.g. CD1 - CD2 and CE1 - CE2 in Tyr).
+ Increased NCOMM for new command to set symmetric matching.
+ V-.- 16.07.08 Added gap extension parameter, gGapPenExt, used in alignment
+ functions. Set default parameters for alignment to:
+ gGapPen = 10 and gGapPenExt to 2.
+ V-.- 18.07.08 Increased NCOMM for new command.
+ V-.- 30.07.08 Included bioplib/matrix.h.
+ V2.6 07.08.08 Added gMultiVsRef.
+ V2.6 20.08.08 Added gTwistAngle and gTwistMatrix used for rotating and
+ refitting a mobile structure.
+ V2.6 21.08.08 Added gRotateRefit flag to switch on rotate and refit.
+ V2.6 23.10.08 Added gWtAverage flag to allow use of the old weighting
+ system.
+ V3.0 06.11.08 Release Version.
+ V3.0 07.11.08 Added reference number,gMultiRef, for multistructure.
+ V3.0 14.11.08 Added GNU Readline Library support.
+ V3.0 04.02.09 Replaced gRotateRefit with ROTATE_REFIT #define
+ V3.0 18.02.09 Added #include for bioplib/aalist.h
+ V3.0 04.02.09 Removed ROTATE_REFIT #define - Now compile option.
+ V3.1 31.03.09 Skipped for release
+
+*************************************************************************/
+/* Includes
+*/
+#include <stdio.h>
+#include <math.h>
+#include <string.h>
+#include <ctype.h>
+#include <stdlib.h>
+
+#include "bioplib/MathType.h"
+#include "bioplib/macros.h"
+#include "bioplib/pdb.h"
+#include "bioplib/parse.h"
+#include "bioplib/fit.h"
+#include "bioplib/seq.h"
+#include "bioplib/general.h"
+#include "bioplib/help.h"
+#include "bioplib/array.h"
+#include "bioplib/matrix.h"
+#include "bioplib/aalist.h"
+
+#ifdef READLINE_SUPPORT
+#undef NEWLINE
+#include <readline/readline.h>
+#include <readline/history.h>
+#endif
+
+/************************************************************************/
+/* Defines
+*/
+#define NUMTYPES 50 /* Number of atom types */
+#define MAXATSPEC 8 /* Max length of a single atom spec */
+#define NCOMM 53 /* Number of keywords */
+#define MAXNUMPARAM 2 /* Max number of numeric parameters */
+#define MAXSTRPARAM 2 /* Max number of string parameters */
+#define MAXSTRLEN 160 /* Max length of returned string */
+#define MAXCHAIN 160 /* Max number of chains in an align file */
+#define MAXBUFF 256 /* Buffer for filenames etc */
+#define MAXSTRUC 1000 /* Max number of structures to fit */
+#define MAXITER 1000 /* Max allowed number of iterations for
+ updating zones */
+#define MAXMULTIITER 100 /* Max allowed number of iterations for
+ multiple structures */
+#define DEF_MAXEQUIVDISTSQ 9.0/* Max distance before a pair is added to
+ equivalence list in iterative mode */
+#define ITER_STOP 0.01 /* Change in RMSD for zone convergence */
+#define MULTI_ITER_STOP 0.001 /* Change in RMSD for multi struc
+ convergence */
+#define STRUC_REFERENCE 0 /* Flags for which structure to be loaded */
+#define STRUC_MOBILE 1
+
+#define ATOM_FITTING 0 /* Flags for ValidAtom() */
+#define ATOM_RMS 1
+
+#define WEIGHT_NONE 0 /* B-value weighting schemes */
+#define WEIGHT_BVAL 1
+#define WEIGHT_INVBVAL 2
+
+#define SYMM_ATM_PAIRS 11 /* Number of symmetric atom pairs */
+
+typedef FILE file;
+
+#define PDBDISTSQ(a, b) ((((a)->x - (b)->x) * ((a)->x - (b)->x)) + \
+ (((a)->y - (b)->y) * ((a)->y - (b)->y)) + \
+ (((a)->z - (b)->z) * ((a)->z - (b)->z)))
+
+#define HELPFILE "ProFit.help" /* Help file */
+#define MDMFILE "mdm78.mat" /* Mutation data matrix */
+
+/************************************************************************/
+/* Structure definitions
+*/
+typedef struct zonestruct
+{
+ struct zonestruct *next;
+ int start1,
+ stop1,
+ start2,
+ stop2,
+ mode;
+ char chain1,
+ chain2,
+ startinsert1,
+ startinsert2,
+ stopinsert1,
+ stopinsert2;
+} ZONE;
+
+/* Type definition to store an X,Y coordinate pair in the matrix */
+typedef struct
+{
+
+ int x, y;
+
+}
+XY;
+
+/************************************************************************/
+/* Prototype definitions
+*/
+#include "protos.h"
+
+/************************************************************************/
+/* Globals
+*/
+#ifdef MAIN /*----------------------------------------------------------*/
+char gFitAtoms[NUMTYPES][MAXATSPEC], /* Atom types to be fitted */
+ gRMSAtoms[NUMTYPES][MAXATSPEC], /* Atom types for RMS calculation*/
+ gRefFilename[MAXBUFF], /* Reference filename */
+ gMobFilename[MAXSTRUC][MAXBUFF],/* Mobile filename */
+ *gRefSeq = NULL, /* Sequences */
+ *gMobSeq[MAXSTRUC],
+ gSymType[SYMM_ATM_PAIRS][4][5]; /* Symmetric Atom Types */
+WHOLEPDB *gRefWPDB = NULL, /* WHOLEPDB linked lists */
+ *gMobWPDB[MAXSTRUC],
+ *gFitWPDB[MAXSTRUC];
+PDB *gRefPDB = NULL, /* PDB linked lists */
+ *gMobPDB[MAXSTRUC],
+ *gFitPDB[MAXSTRUC];
+MKeyWd gKeyWords[NCOMM]; /* Array to store keywords */
+char *gStrParam[MAXSTRPARAM]; /* Array for returned strings */
+REAL gNumParam[MAXNUMPARAM], /* Array for returned numbers */
+ gRotMat[MAXSTRUC][3][3], /* Rotation matrix */
+ *gWeights = NULL, /* Weights array */
+ gBValue = 10000.0, /* Max BVal to consider in fitting */
+ gMaxEquivDistSq= DEF_MAXEQUIVDISTSQ, /* Max distance before a pair
+ is added to equivalence list in
+ iterative mode */
+ gDistCutoff = 0.0, /* Distance cutoff for including atom
+ pairs when calculating RMSd */
+ gTwistAngle = 42.0, /* Rotation angle for refit routine */
+ gRotMatTwist[3][3]; /* Rotation matrix for refit routine*/
+
+ZONE *gZoneList[MAXSTRUC], /* List of zones */
+ *gRZoneList[MAXSTRUC],
+ *gCZoneList[MAXSTRUC];
+int gCurrentMode = ZONE_MODE_RESNUM, /* Numbering mode */
+ gUserRMSZone = FALSE, /* User has specified things for RMS*/
+ gUserRMSAtoms = FALSE,
+ gUserFitZone = FALSE, /* User has specified fit zone */
+ gFitted = FALSE, /* Structures fitted */
+ gNOTFitAtoms = FALSE, /* NOT atom selections */
+ gNOTRMSAtoms = FALSE,
+ gHetAtoms = FALSE, /* Include het atoms? */
+ gIterate = FALSE, /* Iterative fitting? */
+ gDoWeights = WEIGHT_NONE,/* Weight by BVal column? */
+ gGapPen = 10, /* Align Gap penalty */
+ gGapPenExt = 2, /* Align Gap Extension penalty */
+ gUseBVal = 0, /* Use BValue cutoff on atom sel */
+ gIgnoreMissing = 0, /* Ignore missing atoms */
+ gNFittedCoor = 0, /* Number of coordinates fitted */
+ gMultiCount = 0, /* Number of strucs in multi mode */
+ gQuiet = FALSE, /* Stop warning messages */
+ gCentre = FALSE, /* Leave structure centred at origin*/
+ gLimit[2], /* Limit range from alignment */
+ gReadHeader = FALSE, /* Read/Write PDB Headers */
+ gUseDistCutoff = FALSE, /* Use distance cutoff for including
+ atom pairs when calculating RMSd */
+ gOccRank = 1, /* Occupancy ranking (>= 1) */
+ gMatchSymAtoms = FALSE, /* Match Symmetrical Atoms */
+ gMultiVsRef = FALSE, /* Set multi RMSD calculations */
+ gWtAverage = TRUE, /* Weighted averaging in multi mode */
+ gMultiRef = 0; /* Mobile used as multi reference */
+COOR *gRefCoor = NULL, /* Coordinate arrays */
+ *gMobCoor[MAXSTRUC];
+VEC3F gRefCofG, /* CofG of fitted region */
+ gMobCofG[MAXSTRUC];
+#else /*----------------------------------------------------------*/
+extern char gFitAtoms[NUMTYPES][MAXATSPEC],
+ gRMSAtoms[NUMTYPES][MAXATSPEC],
+ gRefFilename[MAXBUFF],
+ gMobFilename[MAXSTRUC][MAXBUFF],
+ *gRefSeq,
+ *gMobSeq[MAXSTRUC],
+ gSymType[SYMM_ATM_PAIRS][4][5];
+extern WHOLEPDB *gRefWPDB,
+ *gMobWPDB[MAXSTRUC],
+ *gFitWPDB[MAXSTRUC];
+extern PDB *gRefPDB,
+ *gMobPDB[MAXSTRUC],
+ *gFitPDB[MAXSTRUC];
+extern MKeyWd gKeyWords[NCOMM];
+extern char *gStrParam[MAXSTRPARAM];
+extern REAL gNumParam[MAXNUMPARAM],
+ gRotMat[MAXSTRUC][3][3],
+ *gWeights,
+ gBValue,
+ gMaxEquivDistSq,
+ gDistCutoff,
+ gTwistAngle,
+ gRotMatTwist[3][3];
+extern ZONE *gZoneList[MAXSTRUC],
+ *gRZoneList[MAXSTRUC],
+ *gCZoneList[MAXSTRUC];
+extern int gCurrentMode,
+ gUserRMSZone,
+ gUserRMSAtoms,
+ gUserFitZone,
+ gFitted,
+ gNOTFitAtoms,
+ gNOTRMSAtoms,
+ gHetAtoms,
+ gIterate,
+ gDoWeights,
+ gGapPen,
+ gGapPenExt,
+ gUseBVal,
+ gIgnoreMissing,
+ gNFittedCoor,
+ gMultiCount,
+ gQuiet,
+ gCentre,
+ gLimit[2],
+ gReadHeader,
+ gUseDistCutoff,
+ gOccRank,
+ gMatchSymAtoms,
+ gMultiVsRef,
+ gWtAverage,
+ gMultiRef;
+
+extern COOR *gRefCoor,
+ *gMobCoor[MAXSTRUC];
+extern VEC3F gRefCofG,
+ gMobCofG[MAXSTRUC];
+#endif /*----------------------------------------------------------*/
+
+
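Note on the PDBDISTSQ macro and DEF_MAXEQUIVDISTSQ above: the cutoff is stored as a squared distance, so 9.0 corresponds to 3 angstroms and no square root is needed when testing a pair. The following is a minimal sketch only. The Point type, the DISTSQ and MAX_EQUIV_DISTSQ names and the WithinCutoff() helper are hypothetical stand-ins, and the use of <= is an assumption, since the comparison ProFit actually performs is not part of this header.

```c
#include <assert.h>

/* Hypothetical stand-in for a PDB record: coordinates only */
typedef struct
{
   double x, y, z;
} Point;

/* Same shape as PDBDISTSQ: squared distance between two pointed-to atoms */
#define DISTSQ(a, b) ((((a)->x - (b)->x) * ((a)->x - (b)->x)) + \
                      (((a)->y - (b)->y) * ((a)->y - (b)->y)) + \
                      (((a)->z - (b)->z) * ((a)->z - (b)->z)))

/* Squared cutoff, mirroring DEF_MAXEQUIVDISTSQ (3.0 * 3.0) */
#define MAX_EQUIV_DISTSQ 9.0

/* Assumption: a pair qualifies when its squared distance is <= the cutoff */
static int WithinCutoff(const Point *a, const Point *b)
{
   return (DISTSQ(a, b) <= MAX_EQUIV_DISTSQ);
}
```

Comparing squared values keeps the inner loop of an iterative fit free of sqrt() calls; the arguments are fully parenthesised in the macro so that expressions such as DISTSQ(p->next, q) expand safely.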
diff --git a/src/bioplib/00READ.ME b/src/bioplib/00READ.ME
new file mode 100644
index 0000000..a5ee276
--- /dev/null
+++ b/src/bioplib/00READ.ME
@@ -0,0 +1,11 @@
+
+
+The files in this directory form part of the Bioplib package.
+
+Please read the file COPYING.DOC for information on Bioplib and
+restrictions on the use of these files. You may obtain a licence
+form to use these files outside ProFit via the WWW address:
+http://www.biochem.ucl.ac.uk/~martin/text/BioplibLicence.ps
+
+
+
diff --git a/src/bioplib/ApMatPDB.c b/src/bioplib/ApMatPDB.c
new file mode 100644
index 0000000..1880aaf
--- /dev/null
+++ b/src/bioplib/ApMatPDB.c
@@ -0,0 +1,96 @@
+/*************************************************************************
+
+ Program:
+ File: ApMatPDB.c
+
+ Version: V1.0R
+ Date: August 1993
+ Function:
+
+ Copyright: (c) SciTech Software 1993
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+
+*************************************************************************/
+/* Includes
+*/
+#include <math.h>
+#include "MathType.h"
+#include "pdb.h"
+#include "matrix.h"
+#include "macros.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+/************************************************************************/
+/* Variables global to this file only
+*/
+
+/************************************************************************/
+/*>void ApplyMatrixPDB(PDB *pdb, REAL matrix[3][3])
+ ------------------------------------------------
+ I/O: PDB *pdb PDB linked list
+ Input: REAL matrix[3][3] Matrix to apply
+
+ Apply a rotation matrix to a PDB linked list.
+
+ 22.07.93 Original (old RotatePDB()) By: ACRM
+*/
+void ApplyMatrixPDB(PDB *pdb,
+ REAL matrix[3][3])
+{
+ PDB *p;
+ VEC3F incoords,
+ outcoords;
+
+ for(p=pdb; p!=NULL; NEXT(p))
+ {
+ if(p->x != 9999.0 && p->y != 9999.0 && p->z != 9999.0)
+ {
+ incoords.x = p->x;
+ incoords.y = p->y;
+ incoords.z = p->z;
+ MatMult3_33(incoords,matrix,&outcoords);
+ p->x = outcoords.x;
+ p->y = outcoords.y;
+ p->z = outcoords.z;
+ }
+ }
+}
+
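Note on ApplyMatrixPDB() above: the loop walks the linked list, skips any atom carrying the 9999.0 null-coordinate marker, and replaces each remaining coordinate with its rotated value. A minimal standalone sketch follows. The Atom type and RotateAtoms() are hypothetical stand-ins for PDB and ApplyMatrixPDB(), and the row-vector-times-matrix convention is an assumption, because MatMult3_33() is not shown in this part of the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Bioplib's PDB record: only the fields used here */
typedef struct atom
{
   double x, y, z;
   struct atom *next;
} Atom;

#define NULL_COORD 9999.0   /* marker for an undefined coordinate */

/* Assumed convention: out[j] = sum over i of in[i] * m[i][j] */
static void RotateAtoms(Atom *atoms, double m[3][3])
{
   Atom *p;

   for(p=atoms; p!=NULL; p=p->next)
   {
      /* Leave atoms with any undefined coordinate untouched */
      if(p->x != NULL_COORD && p->y != NULL_COORD && p->z != NULL_COORD)
      {
         /* Copy first so the outputs never read a half-updated atom */
         double x = p->x, y = p->y, z = p->z;

         p->x = x*m[0][0] + y*m[1][0] + z*m[2][0];
         p->y = x*m[0][1] + y*m[1][1] + z*m[2][1];
         p->z = x*m[0][2] + y*m[1][2] + z*m[2][2];
      }
   }
}
```

Copying the input coordinates before writing plays the same role as the separate incoords and outcoords variables in the original.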
diff --git a/src/bioplib/AtomNameMatch.c b/src/bioplib/AtomNameMatch.c
new file mode 100644
index 0000000..81598fa
--- /dev/null
+++ b/src/bioplib/AtomNameMatch.c
@@ -0,0 +1,265 @@
+/*************************************************************************
+
+ Program:
+ File: AtomNameMatch.c
+
+ Version: V1.7R
+ Date: 11.10.99
+ Function: Tests for matching atom names with wild cards
+
+ Copyright: (c) SciTech Software 1993-9
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 01.03.94 Original
+ V1.1 07.07.95 Now non-destructive
+ V1.2 17.07.95 Now checks that a number was specified as part of the
+ spec. and returns a BOOL
+ V1.3 23.10.95 Moved FindResidueSpec() from PDBList.c
+ V1.4 08.02.96 Added FindResidue() and changed FindResidueSpec() to
+ use it
+ V1.5 23.07.96 Added AtomNameMatch() and LegalAtomSpec()
+ V1.6 18.03.98 Added option to include a . to separate chain and
+ residue number so numeric chain names can be used
+ V1.7 11.10.99 Allow a . to be used to start a number (such that the
+ default blank chain name is used). Allows negative
+ residue numbers
+
+*************************************************************************/
+/* Includes
+*/
+#include <ctype.h>
+#include <stdio.h>
+#include <string.h>
+
+#include "macros.h"
+#include "SysDefs.h"
+#include "pdb.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+/************************************************************************/
+/*>BOOL AtomNameMatch(char *atnam, char *spec, BOOL *ErrorWarn)
+ ------------------------------------------------------------
+ Input: char *atnam The atom name to test
+ char *spec The atom specification
+ I/O: BOOL *ErrorWarn On input, if TRUE, this routine will
+ indicate errors.
+ On output, indicates whether there
+ was an error.
+ Note that you must be careful to supply
+ an lvalue here, you can't just use TRUE
+ or FALSE since it's modified on return.
+ NULL is allowed if you don't care about
+ errors.
+
+ Tests whether an atom name matches an atom name specification.
+ ? or % is used to match a single character
+ * is used to match any trailing characters; it may not be used for
+ leading characters or in the middle of a specification (e.g. *B*,
+ C*2 are both illegal).
+ Wildcards may be escaped with a backslash.
+
+ For example: C* matches all carbon atoms,
+ O5\* matches an atom called O5*
+ ?B* matches all beta atoms
+
+ 23.07.96 Original By: ACRM
+*/
+BOOL AtomNameMatch(char *atnam, char *spec, BOOL *ErrorWarn)
+{
+ char *specp,
+ *atnamp;
+
+ /* Step through the specification and the atom name */
+ for(specp=spec, atnamp = atnam; *specp; specp++, atnamp++)
+ {
+ switch(*specp)
+ {
+ case '\\':
+ /* If the specification has a \ then we are escaping the next
+ character, so just step on to that character
+ */
+ specp++;
+ break;
+ case '?':
+ /* A query in the specification matches anything, so just
+ continue
+ */
+ continue;
+ case '*':
+ /* Matches the rest of the string */
+ if(ErrorWarn != NULL)
+ {
+ /* Check that there aren't any illegal characters following */
+ if(*(specp+1) && *(specp+1) != ' ')
+ {
+ if(*ErrorWarn)
+ {
+ fprintf(stderr,"Error in atom wildcard: %s\n",spec);
+ }
+ *ErrorWarn = TRUE;
+ }
+ else
+ {
+ *ErrorWarn = FALSE;
+ }
+ }
+ return(TRUE);
+ default:
+ break;
+ }
+
+ /* If there is a mismatch return FALSE */
+ if(*specp != *atnamp)
+ {
+ if(ErrorWarn != NULL)
+ *ErrorWarn = FALSE;
+ return(FALSE);
+ }
+
+ /* 07.06.05 If both specifications have ended with a space or
+ end of string then return TRUE. Fixed for the case where the
+ atnam is shorter (after moving the alternate atom indicator
+ into its own field)
+ */
+ if((*specp == ' ') && ((*atnamp == ' ') || (*atnamp == '\0')))
+ {
+ if(ErrorWarn != NULL)
+ *ErrorWarn = FALSE;
+ return(TRUE);
+ }
+ }
+
+ /* There have been no errors and we don't need the error flag again */
+ if(ErrorWarn != NULL)
+ *ErrorWarn = FALSE;
+
+ /* The specification has run out, see if there are any atom characters
+ left
+ */
+ if(*atnamp && *atnamp!=' ')
+ return(FALSE);
+
+ /* Both have ended OK, so the names match */
+ return(TRUE);
+}
+
+
+/************************************************************************/
+/*>BOOL AtomNameRawMatch(char *atnam, char *spec, BOOL *ErrorWarn)
+ ---------------------------------------------------------------
+ Input: char *atnam The atom name to check
+ char *spec The atom specification
+ I/O: BOOL *ErrorWarn On input, if TRUE, this routine will
+ indicate errors.
+ On output, indicates whether there
+ was an error.
+ Note that you must be careful to supply
+ an lvalue here, you can't just use TRUE
+ or FALSE since it's modified on return.
+ NULL is allowed if you don't care about
+ errors.
+
+ Tests whether an atom name matches an atom name specification.
+
+ This version should be given the raw atom name rather than the
+ massaged one. i.e. " CA " is C-alpha, "CA " is Calcium
+
+ Normally it checks against the second character onwards unless the
+ spec starts with a < in which case it checks from the beginning of
+ the string
+
+ Written as a wrapper to AtomNameMatch()
+
+ 15.02.01 Original By: ACRM
+*/
+BOOL AtomNameRawMatch(char *atnam, char *spec, BOOL *ErrorWarn)
+{
+ /* If atom spec starts with a < then just bump the spec pointer,
+ otherwise bump the atom name pointer since we will look from the
+ second character of the atom name
+ */
+ if(*spec == '<')
+ {
+ spec++;
+ }
+ else
+ {
+ atnam++;
+ }
+
+ return(AtomNameMatch(atnam, spec, ErrorWarn));
+}
+
+#ifdef TEST_MAIN
+int main(int argc, char **argv)
+{
+ char spec[8], atnam[8];
+
+ strcpy(atnam, " CA*");
+ printf("Atom name '%s':\n", atnam);
+
+ strcpy(spec,"CA");
+ printf("'%s' matches? %s\n", spec, (AtomNameRawMatch(atnam, spec, NULL)?"YES":"NO"));
+
+ strcpy(spec,"<CA");
+ printf("'%s' matches? %s\n", spec, (AtomNameRawMatch(atnam, spec, NULL)?"YES":"NO"));
+
+ strcpy(spec,"C*");
+ printf("'%s' matches? %s\n", spec, (AtomNameRawMatch(atnam, spec, NULL)?"YES":"NO"));
+
+ strcpy(spec,"CA*");
+ printf("'%s' matches? %s\n", spec, (AtomNameRawMatch(atnam, spec, NULL)?"YES":"NO"));
+
+ strcpy(spec,"CA?");
+ printf("'%s' matches? %s\n", spec, (AtomNameRawMatch(atnam, spec, NULL)?"YES":"NO"));
+
+ strcpy(spec,"C\\*");
+ printf("'%s' matches? %s\n", spec, (AtomNameRawMatch(atnam, spec, NULL)?"YES":"NO"));
+
+ strcpy(spec,"C?");
+ printf("'%s' matches? %s\n", spec, (AtomNameRawMatch(atnam, spec, NULL)?"YES":"NO"));
+
+ return(0);
+}
+#endif
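Note on AtomNameMatch() above: the wildcard rules (? for one character, a trailing * for the rest of the name, backslash to escape a wildcard) can be sketched in a reduced form without the ErrorWarn reporting. SpecMatches() below is a hypothetical helper, not part of Bioplib. It differs from the original in two labelled ways: it does not diagnose characters after a *, and it refuses to let ? match past the end of the atom name.

```c
#include <assert.h>

/* Returns 1 if the atom name matches the spec, 0 otherwise.
   Simplified sketch: no error reporting for an illegal '*' position.
*/
static int SpecMatches(const char *name, const char *spec)
{
   const char *s = spec,
              *n = name;

   for(; *s; s++, n++)
   {
      if(*s == '*')              /* Wildcard: accept the rest of the name */
         return(1);

      if(*s == '?')              /* Matches exactly one character         */
      {
         if(*n == '\0')          /* Deviation from the original: nothing
                                    is left in the name to match          */
            return(0);
         continue;
      }

      if(*s == '\\' && *(s+1))   /* Escape: take the next char literally  */
         s++;

      if(*s != *n)               /* Literal mismatch                      */
         return(0);
   }

   /* Spec exhausted: the name must also have ended (or be space padded) */
   return(*n == '\0' || *n == ' ');
}
```

With this sketch, "C*" accepts "CA", "?B*" accepts "CB", and "O5\*" (written "O5\\*" as a C string literal) accepts only an atom literally named O5*.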
diff --git a/src/bioplib/COPYING.DOC b/src/bioplib/COPYING.DOC
new file mode 100644
index 0000000..150a59d
--- /dev/null
+++ b/src/bioplib/COPYING.DOC
@@ -0,0 +1,170 @@
+ Bioplib
+ =======
+
+ (c)1990-1996, Dr. Andrew C.R. Martin
+
+ SciTech Software
+ 23 Stag Leys, Ashtead, Surrey, KT21 2TD, UK.
+
+ and
+
+ University College London
+ BSM Unit, Department of Biochemistry & Molecular Biology
+ Darwin Building
+ Gower Street
+ London WC1E 6BT
+
+ EMail: andrew at stagleys.demon.co.uk
+ martin at biochem.ucl.ac.uk
+ Fax: +44 (0) 1372 813069
+
+
+
+
+ Bioplib is a library of routines for the manipulation of protein
+structure and sequence using the C programming language.
+
+
+
+
+In these conditions, the term `Bioplib' refers to link libraries, and
+to the source code and object code which comprises these
+libraries. Should the organisation of the code into libraries change
+in the future, the term `Bioplib' will refer to any such link
+libraries (and associated source and object code) which comprise the
+future release of a package described by the author as `Bioplib'.
+
+The term `compiling' is used to describe the process of compiling and
+linking software on a computer to form an executable program.
+
+Bioplib was written by and is copyright (c)1990-1996, Dr. Andrew
+C. R. Martin and was mostly written while self-employed. A number of
+enhancements and additions have been made since working at University
+College London. All copyright and other intellectual property rights
+remain with the author. Bioplib may not be used for any purposes other
+than as described in the following terms and conditions.
+
+
+
+
+
+
+If you have received the Bioplib package directly from the author for
+the purposes of compiling your own software, see Section 1 of these
+conditions.
+
+If you have received Bioplib as part of another software package, see
+Section 2 of these conditions.
+
+
+
+
+
+
+
+Section 1
+=========
+If you have received the Bioplib package directly from the author:
+------------------------------------------------------------------
+
+ If you (the LICENSEE) received Bioplib from the author (Dr. Andrew
+C.R. Martin, hereinafter called the LICENSOR) for the purposes of
+developing your own software, then you will have signed a licence
+agreement. The conditions to which you have agreed are repeated here.
+
+1 Bioplib may only be used by the LICENSEE and members of the
+ LICENSEE's laboratory for compiling software provided by the
+ LICENSOR and for compiling software developed in the LICENSEE's
+ laboratory for research carried out there. Bioplib will be used
+ only by those members of the LICENSEE's laboratory to whom it must
+ reasonably be communicated to enable research to be undertaken and
+ who agree to be bound by the same conditions. The LICENSEE shall
+ procure and enforce such agreement from members of the LICENSEE's
+ laboratory for the benefit of the LICENSOR.
+
+2 The publication of research using software depending on Bioplib
+ must reference the Bioplib library and the LICENSOR. This agreement
+ may be taken as permission for citation as a `Personal
+ Communication' from the LICENSOR.
+
+3 All software compiled with Bioplib must include reference to Bioplib
+ and the LICENSOR in any copyright messages.
+
+4 All forms of Bioplib will be kept in a reasonably secure place to
+ prevent unauthorised access.
+
+5 Bioplib may not be copied for distribution to any third party except
+ as expressly described below.
+
+6 The complete Bioplib library may be distributed to third parties,
+ unmodified, solely for the purpose of compiling software developed
+ by the LICENSEE. The LICENSEE must make clear to the third party in
+ the documentation accompanying the LICENSEE's software, that the
+ Bioplib library has been supplied by the LICENSOR and that the third
+ party may only use Bioplib for the purposes of compiling the
+ LICENSEE's software. The third party may not redistribute Bioplib in
+ any circumstances except as part of the LICENSEE's software. Third
+ parties wishing to use Bioplib as part of their own software should
+ contact the LICENSOR. A file describing these conditions to third
+ parties is included with Bioplib.
+
+7 Any modifications or changes made to Bioplib should be sent to
+ the LICENSOR for possible inclusion in future versions of
+ Bioplib. Any such changes which are incorporated into Bioplib will
+ be acknowledged, but become the property of the LICENSOR.
+
+8 Bioplib shall be used exclusively for academic teaching and
+ research. Bioplib will not be used for any commercial research or
+ research associated with an industrial company unless a separate
+ agreement is made with the LICENSOR.
+
+9 Bioplib may not be used as part of any software which is sold for
+ more than a nominal sum to cover copying and distribution
+ costs. Should the LICENSEE wish to sell for profit software which
+ relies on Bioplib, or any part thereof, then a separate agreement
+ must be made with the LICENSOR.
+
+
+
+
+
+
+
+Section 2
+=========
+If you have received Bioplib as part of another software package:
+-----------------------------------------------------------------
+
+ The following conditions apply whether you have received Bioplib
+from the author (Dr. Andrew C.R. Martin, hereinafter called the
+LICENSOR), or from another source.
+
+ If you have received Bioplib for the purposes of compiling some
+other software package (hereinafter called the SOFTWARE PACKAGE) then
+you may only use Bioplib in order to compile and link the SOFTWARE
+PACKAGE. Bioplib has been provided by the LICENSOR solely for this
+purpose.
+
+ You may not use Bioplib to develop your own software and may not
+redistribute Bioplib except as part of the SOFTWARE PACKAGE.
+
+ Whether or not, under the conditions of redistribution of the
+SOFTWARE PACKAGE, you are permitted to redistribute modifications to
+the said SOFTWARE PACKAGE, you are not allowed to redistribute any
+modifications or changes to Bioplib. If you do make modifications or
+changes to Bioplib as part of a modification you make to the SOFTWARE
+PACKAGE for your own purposes, you should send the changes to the
+LICENSOR for possible inclusion in a future version of Bioplib. Any
+such changes which are incorporated into Bioplib will be acknowledged,
+but become the property of the LICENSOR.
+
+ If, under the conditions of redistribution of the SOFTWARE PACKAGE,
+you are permitted to redistribute modifications to the said SOFTWARE
+PACKAGE, then you must not remove, or modify, any copyright messages
+relating to Bioplib.
+
+ If you wish to use Bioplib for the development of your own
+software, you should contact the LICENSOR in order to obtain a full
+licence agreement.
+
+
diff --git a/src/bioplib/CopyPDB.c b/src/bioplib/CopyPDB.c
new file mode 100644
index 0000000..8048d1f
--- /dev/null
+++ b/src/bioplib/CopyPDB.c
@@ -0,0 +1,96 @@
+/*************************************************************************
+
+ Program:
+ File: CopyPDB.c
+
+ Version: V1.10R
+ Date: 08.10.99
+ Function: PDB linked list manipulation
+
+ Copyright: (c) SciTech Software 1992-6
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: andrew at stagleys.demon.co.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 22.02.94 Original release
+ V1.1 23.05.94 Added FindNextChainPDB()
+ V1.2 05.10.94 KillSidechain() uses BOOL rather than int
+ V1.3 24.07.95 Added TermPDB()
+ V1.4 25.07.95 Added GetPDBChainLabels()
+ V1.5 26.09.95 Fixed bug in TermPDB()
+ V1.6 12.10.95 Added DupePDB(), CopyPDBCoords()
+ V1.7 23.10.95 Moved FindResidueSpec() to ParseRes.c
+ V1.8 10.01.96 Added ExtractZonePDB()
+ V1.9 14.03.96 Added FindAtomInRes()
+ V1.10 08.10.99 Initialised some variables
+
+*************************************************************************/
+/* Includes
+*/
+#include <math.h>
+#include <stdlib.h>
+
+#include "MathType.h"
+#include "SysDefs.h"
+#include "pdb.h"
+#include "macros.h"
+#include "general.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+/************************************************************************/
+/*>void CopyPDB(PDB *out, PDB *in)
+ -------------------------------
+ Input: PDB *in Input PDB record pointer
+ Output: PDB *out Output PDB record pointer
+
+ Copy a PDB record, except that the ->next is set to NULL.
+
+ 12.05.92 Original By: ACRM
+ 17.07.01 Now uses the generic *out=*in
+*/
+void CopyPDB(PDB *out,
+ PDB *in)
+{
+ *out = *in;
+ out->next = NULL;
+}
+
diff --git a/src/bioplib/CreateRotMat.c b/src/bioplib/CreateRotMat.c
new file mode 100644
index 0000000..5ed10b6
--- /dev/null
+++ b/src/bioplib/CreateRotMat.c
@@ -0,0 +1,128 @@
+/*************************************************************************
+
+ Program:
+ File: CreateRotMat.c
+
+ Version: V1.6R
+ Date: 27.09.95
+ Function: Simple matrix and vector operations
+
+ Copyright: (c) SciTech Software 1991-5
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 06.09.91 Original
+ V1.0a 01.06.92 Documented
+ V1.1 30.09.92 Matrix multiplication added
+ V1.2 10.06.93 void return from matrix multiplication
+ V1.3 22.07.93 Added CreateRotMat()
+ V1.4 03.08.93 Changed matrix multiplication to standard direction
+ V1.5 28.07.95 Added VecDist()
+ V1.6 27.09.95 Added MatMult33_33()
+
+*************************************************************************/
+/* Includes
+*/
+#include <math.h>
+#include "MathType.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+/************************************************************************/
+/*>void CreateRotMat(char direction, REAL angle, REAL matrix[3][3])
+ ----------------------------------------------------------------
+ Input: char direction Axis about which to rotate
+ REAL angle Angle (in rads) to rotate
+ Output: REAL matrix[3][3] Rotation matrix
+
+ Create a 3x3 rotation matrix. Takes a direction as a single character
+ ('x', 'y', or 'z'), an angle (in rads) and outputs a rotation matrix
+
+ 22.07.93 Original By: ACRM
+*/
+void CreateRotMat(char direction, REAL angle, REAL matrix[3][3])
+{
+ int i, j,
+ m,
+ m1,
+ m2;
+ REAL CosTheta,
+ SinTheta;
+
+ /* Initialise matrix to all 0.0 */
+ for(i=0; i<3; i++)
+ for(j=0; j<3; j++)
+ matrix[i][j] = 0.0;
+
+ /* Select the items that need to be filled in */
+ switch(direction)
+ {
+ case 'x': case 'X':
+ m = 0;
+ break;
+ case 'y': case 'Y':
+ m = 1;
+ break;
+ case 'z': case 'Z':
+ m = 2;
+ break;
+ default: /* Just return the unit matrix */
+ for(i=0; i<3; i++)
+ matrix[i][i] = 1.0;
+ return;
+ }
+
+ /* Find which items these relate to */
+ m1 = (m+1) % 3;
+ m2 = (m1+1) % 3;
+
+ /* Fill in the values */
+ matrix[m][m] = 1.0;
+ CosTheta = (REAL)cos((double)angle);
+ SinTheta = (REAL)sin((double)angle);
+ matrix[m1][m1] = CosTheta;
+ matrix[m2][m2] = CosTheta;
+ matrix[m1][m2] = SinTheta;
+ matrix[m2][m1] = -SinTheta;
+}
+
+
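Note on CreateRotMat() above: the axis letter selects index m, and the other two indices follow cyclically as m1 = (m+1) % 3 and m2 = (m1+1) % 3, so one block of assignments covers all three axes. The sketch below reproduces that index logic only. FillRotMat() is a hypothetical name, and as a simplification the caller passes the cosine and sine directly rather than an angle, so the sketch needs no maths library.

```c
#include <assert.h>

/* Fill a 3x3 rotation matrix about axis 'x', 'y' or 'z' from a given
   cosine (c) and sine (s). Returns 1 for a valid axis; otherwise writes
   the unit matrix and returns 0.
*/
static int FillRotMat(char axis, double c, double s, double m[3][3])
{
   int i, j, a, a1, a2;

   for(i=0; i<3; i++)
      for(j=0; j<3; j++)
         m[i][j] = 0.0;

   switch(axis)
   {
   case 'x': case 'X': a = 0; break;
   case 'y': case 'Y': a = 1; break;
   case 'z': case 'Z': a = 2; break;
   default:
      for(i=0; i<3; i++)
         m[i][i] = 1.0;
      return(0);
   }

   a1 = (a+1)  % 3;      /* The two axes that are mixed by the rotation */
   a2 = (a1+1) % 3;

   m[a][a]   = 1.0;      /* The rotation axis itself is unchanged       */
   m[a1][a1] = c;
   m[a2][a2] = c;
   m[a1][a2] = s;        /* Same sign placement as CreateRotMat()       */
   m[a2][a1] = -s;

   return(1);
}
```

For a quarter turn about z (c = 0, s = 1) this gives m[0][1] = 1 and m[1][0] = -1, with m[2][2] = 1.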
diff --git a/src/bioplib/DupePDB.c b/src/bioplib/DupePDB.c
new file mode 100644
index 0000000..fcec925
--- /dev/null
+++ b/src/bioplib/DupePDB.c
@@ -0,0 +1,120 @@
+/*************************************************************************
+
+ Program:
+ File: DupePDB.c
+
+ Version: V1.10R
+ Date: 08.10.99
+ Function: PDB linked list manipulation
+
+ Copyright: (c) SciTech Software 1992-6
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: andrew at stagleys.demon.co.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 22.02.94 Original release
+ V1.1 23.05.94 Added FindNextChainPDB()
+ V1.2 05.10.94 KillSidechain() uses BOOL rather than int
+ V1.3 24.07.95 Added TermPDB()
+ V1.4 25.07.95 Added GetPDBChainLabels()
+ V1.5 26.09.95 Fixed bug in TermPDB()
+ V1.6 12.10.95 Added DupePDB(), CopyPDBCoords()
+ V1.7 23.10.95 Moved FindResidueSpec() to ParseRes.c
+ V1.8 10.01.96 Added ExtractZonePDB()
+ V1.9 14.03.96 Added FindAtomInRes()
+ V1.10 08.10.99 Initialised some variables
+
+*************************************************************************/
+/* Includes
+*/
+#include <math.h>
+#include <stdlib.h>
+
+#include "MathType.h"
+#include "SysDefs.h"
+#include "pdb.h"
+#include "macros.h"
+#include "general.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+/************************************************************************/
+/*>PDB *DupePDB(PDB *in)
+ ---------------------
+ Input: PDB *in Input PDB linked list
+ Returns: PDB * Duplicated PDB linked list
+ (NULL on allocation failure)
+
+ Duplicates a PDB linked list. Allocates new linked list with identical
+ data.
+
+ 11.10.95 Original By: ACRM
+ 08.10.99 Initialise q to NULL
+*/
+PDB *DupePDB(PDB *in)
+{
+ PDB *out = NULL,
+ *p, *q = NULL;
+
+ for(p=in; p!=NULL; NEXT(p))
+ {
+ if(out==NULL)
+ {
+ INIT(out, PDB);
+ q=out;
+ }
+ else
+ {
+ ALLOCNEXT(q, PDB);
+ }
+ if(q==NULL)
+ {
+ FREELIST(out, PDB);
+ return(NULL);
+ }
+
+ CopyPDB(q, p);
+ }
+
+ return(out);
+}
+
+
diff --git a/src/bioplib/FindNextResidue.c b/src/bioplib/FindNextResidue.c
new file mode 100644
index 0000000..0f1eea5
--- /dev/null
+++ b/src/bioplib/FindNextResidue.c
@@ -0,0 +1,122 @@
+/*************************************************************************
+
+ Program:
+ File: FindNextResidue.c
+
+ Version: V1.10R
+ Date: 08.10.99
+ Function: PDB linked list manipulation
+
+ Copyright: (c) SciTech Software 1992-6
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: andrew at stagleys.demon.co.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 22.02.94 Original release
+ V1.1 23.05.94 Added FindNextChainPDB()
+ V1.2 05.10.94 KillSidechain() uses BOOL rather than int
+ V1.3 24.07.95 Added TermPDB()
+ V1.4 25.07.95 Added GetPDBChainLabels()
+ V1.5 26.09.95 Fixed bug in TermPDB()
+ V1.6 12.10.95 Added DupePDB(), CopyPDBCoords()
+ V1.7 23.10.95 Moved FindResidueSpec() to ParseRes.c
+ V1.8 10.01.96 Added ExtractZonePDB()
+ V1.9 14.03.96 Added FindAtomInRes()
+ V1.10 08.10.99 Initialised some variables
+
+*************************************************************************/
+/* Includes
+*/
+#include <math.h>
+#include <stdlib.h>
+
+#include "MathType.h"
+#include "SysDefs.h"
+#include "pdb.h"
+#include "macros.h"
+#include "general.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+/************************************************************************/
+/*>PDB *FindEndPDB(PDB *start)
+ ---------------------------
+ Input: PDB *start PDB linked list
+ Returns: PDB * pointer to next residue
+
+ Step along a PDB linked list from start until we find a different
+ residue. Return a pointer to this PDB item.
+
+ 08.07.93 Original By: ACRM
+ 09.08.95 Now simply calls FindNextResidue() which is a rather more
+ sensible name. Retained for backwards compatibility
+*/
+PDB *FindEndPDB(PDB *start)
+{
+ return(FindNextResidue(start));
+}
+
+/************************************************************************/
+/*>PDB *FindNextResidue(PDB *pdb)
+ ------------------------------
+ Input: PDB *pdb PDB linked list
+ Returns: PDB * Next residue in PDB linked list or NULL if
+ there is none.
+
+ Finds the next residue in a PDB linked list.
+
+ 08.08.95 Original By: ACRM
+*/
+PDB *FindNextResidue(PDB *pdb)
+{
+ PDB *p;
+
+ for(p=pdb; p!=NULL; NEXT(p))
+ {
+ if((p->resnum != pdb->resnum) ||
+ (p->insert[0] != pdb->insert[0]) ||
+ (p->chain[0] != pdb->chain[0]))
+ return(p);
+ }
+
+ return(NULL);
+}
+
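Note on FindNextResidue(): because the routine returns the first record of the following residue (or NULL at the end of the list), it can drive a loop that visits each residue once. The sketch below illustrates that pattern under stated assumptions: ATOM_REC, next_residue() and count_residues() are hypothetical stand-ins written for this note, not part of bioplib, and ATOM_REC carries only the fields the comparison needs.

```c
#include <stddef.h>

/* Hypothetical stand-in for the PDB record; the real struct is
   declared in pdb.h and has many more fields */
typedef struct atom_rec
{
   struct atom_rec *next;
   int  resnum;
   char insert[2];
   char chain[2];
} ATOM_REC;

/* Same comparison as FindNextResidue(): return the first record whose
   residue number, insert code or chain label differs from the start */
static ATOM_REC *next_residue(ATOM_REC *start)
{
   ATOM_REC *p;

   for(p = start; p != NULL; p = p->next)
   {
      if(p->resnum    != start->resnum    ||
         p->insert[0] != start->insert[0] ||
         p->chain[0]  != start->chain[0])
         return p;
   }
   return NULL;
}

/* Count residues by hopping from one residue start to the next */
static int count_residues(ATOM_REC *list)
{
   int       n = 0;
   ATOM_REC *p;

   for(p = list; p != NULL; p = next_residue(p))
      n++;

   return n;
}
```

With the real library the same loop shape would presumably read for(p=pdb; p!=NULL; p=FindNextResidue(p)).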
diff --git a/src/bioplib/FindZonePDB.c b/src/bioplib/FindZonePDB.c
new file mode 100644
index 0000000..87e45f4
--- /dev/null
+++ b/src/bioplib/FindZonePDB.c
@@ -0,0 +1,237 @@
+/*************************************************************************
+
+ Program:
+ File: FindZonePDB.c
+
+ Version: V1.4R
+ Date: 20.02.01
+ Function: Routines for handling zones in PDB linked lists
+
+ Copyright: (c) SciTech Software 1993-6
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 30.09.92 Original
+ V1.1 16.06.93 Tidied for book. Mode now a char.
+ V1.2 18.06.96 Added InPDBZone() from QTree program
+ V1.3 19.09.96 Added InPDBZoneSpec()
+
+*************************************************************************/
+
+/* Includes
+*/
+#include "SysDefs.h"
+#include "MathType.h"
+
+#include "pdb.h"
+#include "macros.h"
+
+/************************************************************************/
+/*>BOOL FindZonePDB(PDB *pdb, int start, char startinsert, int stop,
+ char stopinsert, char chain, int mode,
+ PDB **pdb_start, PDB **pdb_stop)
+ -------------------------------------------------------------
+ Input: PDB *pdb PDB linked list
+ int start Resnum of start of zone
+ char startinsert Insert code for start of zone
+ int stop Resnum of end of zone
+ char stopinsert Insert code for end of zone
+ char chain Chain name
+ int mode ZONE_MODE_RESNUM: Use PDB residue
+ numbers/chain
+ ZONE_MODE_SEQUENTIAL: Use sequential
+ numbering
+ Output: PDB **pdb_start Start of zone
+ PDB **pdb_stop End of zone
+ Returns: BOOL OK?
+
+ Finds pointers to the start and end of a zone in a PDB linked list. The
+ end is the atom *after* the specified zone
+
+ 30.09.92 Original
+ 17.07.95 Chain name was being ignored in specs like L* (for whole
+ of light chain)
+ 18.08.95 Now handles inserts
+ 31.07.95 Fixed bug when zone end==chain end
+ 20.02.01 Changed to -999/-999 for beginning/end of chain rather than -1/-1
+*/
+BOOL FindZonePDB(PDB *pdb,
+ int start,
+ char startinsert,
+ int stop,
+ char stopinsert,
+ char chain,
+ int mode,
+ PDB **pdb_start,
+ PDB **pdb_stop)
+{
+ PDB *p;
+ int rescount,
+ resnum,
+ InStop = FALSE;
+ char insert;
+
+ /* To start, we don't know where either are */
+ *pdb_start = NULL;
+ *pdb_stop = NULL;
+
+ /* If both start and stop are -999, then the whole structure (or a
+ whole chain) is being specified
+ */
+ if((start == (-999)) && (stop == (-999)))
+ {
+ if(chain == ' ') /* Whole structure */
+ {
+ *pdb_start = pdb;
+ *pdb_stop = NULL;
+ return(TRUE);
+ }
+ else /* An individual chain */
+ {
+ for(p=pdb; p!=NULL; NEXT(p))
+ {
+ if(p->chain[0] == chain)
+ {
+ if(*pdb_start == NULL)
+ {
+ *pdb_start = p;
+ }
+ }
+ else if(*pdb_start != NULL) /* We've already got the start */
+ {
+ *pdb_stop = p;
+ return(TRUE);
+ }
+ }
+ if(*pdb_start==NULL)
+ return(FALSE); /* Chain not found */
+ else
+ return(TRUE);
+ }
+ }
+
+ /* Handle one end of a zone being set to -999 */
+ if(start == -999) *pdb_start = pdb;
+ if(stop == -999) *pdb_stop = NULL;
+
+ /* If either end is still undefined */
+ if(*pdb_start == NULL || *pdb_stop == NULL)
+ {
+ /* Search reference structure for start and end of zone */
+ rescount = 1;
+ resnum = pdb->resnum;
+ insert = pdb->insert[0];
+ InStop = FALSE;
+
+ for(p=pdb; p!=NULL; NEXT(p))
+ {
+ if(mode == ZONE_MODE_RESNUM)
+ {
+ if(chain == ' ' || chain == p->chain[0])
+ { /* We are in the correct chain */
+
+ /* If start undefined, see if residue matches */
+ if(*pdb_start == NULL)
+ {
+ if((p->resnum == start) &&
+ (p->insert[0] == startinsert))
+ *pdb_start = p;
+ }
+
+ /* If stop undefined, then find the following residue */
+ if(*pdb_stop == NULL)
+ {
+ /* See if we have just moved out of the stop residue.
+ If so, set the stop position and return
+ */
+ if(InStop &&
+ (p->resnum != stop || p->insert[0] != stopinsert))
+ {
+ *pdb_stop = p;
+ return((*pdb_start==NULL)?FALSE:TRUE);
+ }
+
+ /* Residue matches, so set flag to say we're in the
+ last residue of the zone.
+ */
+ if((p->resnum == stop) &&
+ (p->insert[0] == stopinsert))
+ InStop = TRUE;
+ }
+ if(*pdb_start != NULL && *pdb_stop != NULL) /*Found both*/
+ break;
+ }
+ else if(InStop)
+ {
+ /* We will get here if InStop has been set without having
+ found the start of the next residue. This will occur
+ if the last residue of a zone was also the last
+ residue of a chain, since the chain name will now have
+ changed.
+ We just set *pdb_stop to this pointer and return.
+ */
+ *pdb_stop = p;
+ return((*pdb_start==NULL)?FALSE:TRUE);
+ }
+ } /* End of ZONE_MODE_RESNUM */
+ else if(mode == ZONE_MODE_SEQUENTIAL)
+ {
+ /* Correct the residue count */
+ if(p->resnum != resnum || p->insert[0] != insert)
+ {
+ rescount++;
+ resnum = p->resnum;
+ insert = p->insert[0];
+ }
+
+ if(*pdb_start == NULL) /* Identify zone start */
+ if(rescount == start) *pdb_start = p;
+ if(*pdb_stop == NULL) /* Identify zone stop */
+ {
+ if(InStop && rescount != stop) *pdb_stop = p;
+
+ if(rescount == stop) InStop = TRUE;
+ }
+ if(*pdb_start != NULL && *pdb_stop != NULL) /* Found both */
+ break;
+ }
+ else
+ {
+ printf(" Error==> FindZonePDB(): Internal confusion!\n");
+ }
+ } /* End of loop through PDB linked list */
+ } /* End of if() one pointer undefined */
+
+ /* Check start of range has been found and return */
+ return((*pdb_start==NULL)?FALSE:TRUE);
+}
+
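Note on FindZonePDB(): the header comment says the returned end is the atom *after* the zone, so the two output pointers describe a half-open range, and a NULL stop means "run to the end of the list". A minimal sketch of walking such a range follows; ATOM_REC and count_zone_atoms() are hypothetical names used only for this illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for the PDB record */
typedef struct atom_rec
{
   struct atom_rec *next;
   int resnum;
} ATOM_REC;

/* Count the records in the half-open range [start, stop).
   stop == NULL means the range runs to the end of the list. */
static int count_zone_atoms(const ATOM_REC *start, const ATOM_REC *stop)
{
   int             n = 0;
   const ATOM_REC *p;

   for(p = start; p != NULL && p != stop; p = p->next)
      n++;

   return n;
}
```

The p != NULL test is a safety net in case stop is not reachable from start.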
diff --git a/src/bioplib/FreeStringList.c b/src/bioplib/FreeStringList.c
new file mode 100644
index 0000000..0945c77
--- /dev/null
+++ b/src/bioplib/FreeStringList.c
@@ -0,0 +1,117 @@
+/*************************************************************************
+
+ Program:
+ File: FreeStringList.c
+
+ Version: V1.20R
+ Date: 18.09.96
+ Function: General purpose routines
+
+ Copyright: (c) SciTech Software 1991-6
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+ These are generally useful C routines, mostly string handling.
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.1 08.02.91 Added KillLine()
+ V1.2 10.02.91 Added setextn() and index()
+ V1.3 20.03.91 Added Word()
+ V1.4 28.05.92 ANSIed
+ V1.5 22.06.92 Added tab check to Word(). Improved setextn().
+ Added WordN(). Documented other routines.
+ V1.6 27.07.93 Corrected fsscanf() for double precision
+ V1.7 07.10.93 Checks made on case before toupper()/tolower()
+ for SysV compatibility. Also index() becomes
+ chindex()
+ V1.8 18.03.94 getc() -> fgetc()
+ V1.9 11.05.94 Added GetFilestem(), upstrcmp(), upstrncmp() &
+ GetWord()
+ V1.10 24.08.94 Added OpenStdFiles()
+ V1.11 08.03.95 Corrected OpenFile() for non-UNIX
+ V1.12 09.03.95 Added check on non-NULL filename in OpenFile()
+ V1.13 17.07.95 Added countchar()
+ V1.14 18.10.95 Moved YorN() to WindIO.c
+ V1.15 06.11.95 Added StoreString(), InStringList() and FreeStringList()
+ V1.16 22.11.95 Moved ftostr() to generam.c
+ V1.17 15.12.95 Added QueryStrStr()
+ V1.18 18.12.95 OpenStdFiles() treats filename of - as stdin/stdout
+ V1.19 05.02.96 OpenStdFiles() allows NULL pointers instead of filenames
+ V1.20 18.09.96 Added padchar()
+
+*************************************************************************/
+/* Includes
+*/
+#include <string.h>
+#include <ctype.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <math.h>
+
+#include "MathType.h"
+#include "SysDefs.h"
+#include "general.h"
+#include "macros.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+/************************************************************************/
+/*>void FreeStringList(STRINGLIST *StringList)
+ -------------------------------------------
+ Input: STRINGLIST *StringList Linked list of strings
+
+ Frees memory allocated for a string list.
+
+ 06.11.95 Original By: ACRM
+*/
+void FreeStringList(STRINGLIST *StringList)
+{
+ STRINGLIST *p;
+
+ for(p=StringList; p!=NULL; NEXT(p))
+ {
+ if(p->string != NULL)
+ free(p->string);
+ }
+
+ FREELIST(StringList, STRINGLIST);
+}
+
+
diff --git a/src/bioplib/GetPDBChainLabels.c b/src/bioplib/GetPDBChainLabels.c
new file mode 100644
index 0000000..6f2e391
--- /dev/null
+++ b/src/bioplib/GetPDBChainLabels.c
@@ -0,0 +1,135 @@
+/*************************************************************************
+
+ Program:
+ File: GetPDBChainLabels.c
+
+ Version: V1.10
+ Date: 08.10.99
+ Function:
+
+ Copyright: (c) SciTech Software 1992-6
+ Author: Dr. Andrew C. R. Martin
+ Phone: +44 (0) 1372 275775
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 22.02.94 Original release
+ V1.1 23.05.94 Added FindNextChainPDB()
+ V1.2 05.10.94 KillSidechain() uses BOOL rather than int
+ V1.3 24.07.95 Added TermPDB()
+ V1.4 25.07.95 Added GetPDBChainLabels()
+ V1.5 26.09.95 Fixed bug in TermPDB()
+ V1.6 12.10.95 Added DupePDB(), CopyPDBCoords()
+ V1.7 23.10.95 Moved FindResidueSpec() to ParseRes.c
+ V1.8 10.01.96 Added ExtractZonePDB()
+ V1.9 14.03.96 Added FindAtomInRes()
+ V1.10 08.10.99 Initialised some variables
+
+*************************************************************************/
+/* Includes
+*/
+#include <stdlib.h>
+
+#include "pdb.h"
+#include "macros.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+
+/************************************************************************/
+/*>char *GetPDBChainLabels(PDB *pdb)
+ ---------------------------------
+ Input: PDB *pdb PDB linked list
+ Returns: char * Allocated string containing chain labels
+ NULL if unable to allocate memory
+
+ Scans a PDB linked list for chain names. Allocates memory for a
+ string containing these labels which is returned.
+
+ N.B. You must free the allocated memory when you've finished with it!
+
+ 25.07.95 Original By: ACRM
+*/
+char *GetPDBChainLabels(PDB *pdb)
+{
+ char *chains;
+ int nchains = 0,
+ maxchains = 16;
+ PDB *p;
+
+ /* Just return if linked list is NULL */
+ if(pdb==NULL)
+ return(NULL);
+
+ /* Allocate a chunk for storing the chains */
+ if((chains = (char *)malloc(maxchains * sizeof(char)))==NULL)
+ return(NULL);
+
+ /* Set up first chain label */
+ chains[nchains] = pdb->chain[0];
+
+ /* Run through the linked list */
+ for(p=pdb; p!=NULL; NEXT(p))
+ {
+ /* If chain label has changed */
+ if(p->chain[0] != chains[nchains])
+ {
+ /* Increment chain count and reallocate memory if needed */
+ if(++nchains == maxchains)
+ {
+ maxchains += 16;
+ if((chains = realloc(chains, maxchains * sizeof(char)))==NULL)
+ return(NULL);
+ }
+ /* Store this new chain label */
+ chains[nchains] = p->chain[0];
+ }
+ }
+
+ /* Increment chain count and reallocate memory if needed */
+ if(++nchains == maxchains)
+ {
+ maxchains += 16;
+ if((chains = realloc(chains, maxchains * sizeof(char)))==NULL)
+ return(NULL);
+ }
+
+ /* Terminate the chain list with a NUL character */
+ chains[nchains] = '\0';
+
+ return(chains);
+}
+
+
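Note on GetPDBChainLabels(): assigning the result of realloc() straight back to the pointer being resized loses the original block if realloc() fails, so the early return(NULL) cannot free it. The sketch below shows one common way around this, using a temporary pointer. It is an illustration only: append_label() and chain_labels() are hypothetical helpers, and the input is a plain string of per-atom chain characters rather than a PDB linked list.

```c
#include <stdlib.h>

/* Hypothetical helper: append one character to a growable buffer.
   Returns 1 on success, 0 on allocation failure. On failure the
   buffer is left untouched and is still owned by the caller. */
static int append_label(char **buf, size_t *used, size_t *cap, char label)
{
   if(*used == *cap)
   {
      size_t newcap = *cap + 16;
      char  *tmp    = realloc(*buf, newcap);  /* realloc(NULL,n) allocates */

      if(tmp == NULL)
         return 0;

      *buf = tmp;
      *cap = newcap;
   }
   (*buf)[(*used)++] = label;
   return 1;
}

/* Collapse runs of identical characters: "AAABBC" gives "ABC".
   Returns a malloc'd string the caller must free, or NULL. */
static char *chain_labels(const char *per_atom)
{
   char  *out  = NULL;
   size_t used = 0, cap = 0, i;

   for(i = 0; per_atom[i] != '\0'; i++)
   {
      if(i == 0 || per_atom[i] != per_atom[i-1])
      {
         if(!append_label(&out, &used, &cap, per_atom[i]))
         {
            free(out);
            return NULL;
         }
      }
   }

   /* Terminate the label string */
   if(!append_label(&out, &used, &cap, '\0'))
   {
      free(out);
      return NULL;
   }
   return out;
}
```

As in the original routine, a label is recorded each time the chain character changes, so a chain that reappears later in the list is listed again.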
diff --git a/src/bioplib/GetWord.c b/src/bioplib/GetWord.c
new file mode 100644
index 0000000..1452c14
--- /dev/null
+++ b/src/bioplib/GetWord.c
@@ -0,0 +1,261 @@
+/*************************************************************************
+
+ Program:
+ File: GetWord.c
+
+ Version: V2.0
+ Date: 10.06.99
+ Function: Get a space delimited word from a string
+
+ Copyright: (c) SciTech Software 1995
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 02.03.99 Original By: ACRM
+ V2.0 10.06.99 Complete rewrite to allow escaping of characters
+
+*************************************************************************/
+/* Includes
+*/
+#include <stdio.h>
+#include "macros.h"
+#include "SysDefs.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+char *doGetWord(char *buffer, char *word, int maxlen, BOOL comma);
+
+
+/************************************************************************/
+/*>char *doGetWord(char *buffer, char *word, int maxlen, BOOL comma)
+ -----------------------------------------------------------------
+ Input: char *buffer Input buffer to read words from
+ int maxlen Max length of output word
+ BOOL comma Treat commas like white space?
+ Output: char *word Word read from buffer
+ Returns: char * Pointer to start of next word in buffer
+ or NULL
+
+ This code is designed to be called from GetWord() or GetWordNC()
+
+ Reads a whitespace delimited word out of buffer into word. If comma is
+ TRUE, then commas are treated just like white space, otherwise they
+ are treated like normal characters.
+
+ Words containing white space may be wrapped in double inverted commas.
+ A \ is used as an escape character and may be used to escape *any*
+ following character. In particular:
+ "\\" -> '\' To get a backslash
+ "\ " -> ' ' To get a hard whitespace (alternatively wrap the
+ string in double inverted commas)
+ "\"" -> '"' To get a double inverted comma
+
+ 10.06.99 Original By: ACRM (based on code from Bioplib)
+*/
+char *doGetWord(char *buffer, char *word, int maxlen, BOOL comma)
+{
+ int i, j;
+ BOOL dic = FALSE,
+ escape = FALSE;
+ char *chp;
+
+ /* Decrement maxlen so we can terminate correctly */
+ maxlen--;
+
+ /* Check validity of passed pointers */
+ if(word==NULL)
+ return(NULL);
+
+ word[0] = '\0';
+ if(buffer==NULL)
+ return(NULL);
+
+ KILLLEADSPACES(chp, buffer);
+
+ /* Run through each character in the input buffer */
+ for(i=0, j=0; chp[i]; i++)
+ {
+ switch(chp[i])
+ {
+ case '\\':
+ /* Use backslash as an escape character. If we've just had an
+ escape, then simply store it
+ */
+ if(escape)
+ {
+ escape = FALSE;
+ if(j<maxlen)
+ word[j++] = chp[i];
+ }
+ else
+ {
+ escape = TRUE;
+ }
+ break;
+ case '\"':
+ /* Double inverted commas enclose strings containing white space
+ If we've just had an escape then handle as a normal character,
+ otherwise, toggle the dic flag
+ */
+ if(escape)
+ {
+ if(j<maxlen)
+ word[j++] = chp[i];
+ }
+ else
+ {
+ TOGGLE(dic);
+ }
+ escape = FALSE;
+ break;
+ case ',':
+ /* A comma is handled as white space or a normal character,
+ depending on the comma flag
+ */
+ if(!comma) /* Treat as default */
+ {
+ if(j<maxlen)
+ word[j++] = chp[i];
+ escape = FALSE;
+ break;
+ }
+ /* Otherwise, if comma is true, just fall through to treat it
+ like whitespace
+ */
+ case ' ':
+ case '\t':
+ /* If we are in double inverted commas or last char was an escape
+ just handle as a normal character
+ */
+ if(dic || escape)
+ {
+ if(j<maxlen)
+ word[j++] = chp[i];
+ }
+ else
+ {
+ /* Otherwise, this terminates the word, so terminate, move
+ the pointer on and return
+ */
+ word[j] = '\0';
+ chp += i;
+ KILLLEADSPACES(chp, chp);
+ if(comma)
+ {
+ /* If we are handling commas as whitespace, then skip
+ the comma if found
+ */
+ if(*chp == ',') chp++;
+ }
+ if(*chp == '\0') chp = NULL;
+ return(chp);
+ }
+ escape = FALSE;
+ break;
+ default:
+ /* A normal character, copy it across */
+ if(j<maxlen)
+ word[j++] = chp[i];
+ escape = FALSE;
+ }
+ }
+
+ word[j] = '\0';
+ return(NULL);
+}
+
+/************************************************************************/
+/*>char *GetWord(char *buffer, char *word, int maxlen)
+ ---------------------------------------------------
+ Input: char *buffer Input buffer to read words from
+ int maxlen Max length of output word
+ Output: char *word Word read from buffer
+ Returns: char * Pointer to start of next word in buffer
+ or NULL
+
+ This code is a wrapper to doGetWord()
+
+ Reads a whitespace/comma delimited word out of buffer into word.
+
+ Words containing white space may be wrapped in double inverted commas.
+ A \ is used as an escape character and may be used to escape *any*
+ following character. In particular:
+ "\\" -> '\' To get a backslash
+ "\ " -> ' ' To get a hard whitespace (alternatively wrap the
+ string in double inverted commas)
+ "\"" -> '"' To get a double inverted comma
+
+ 10.06.99 Original By: ACRM
+*/
+char *GetWord(char *buffer, char *word, int maxlen)
+{
+ return(doGetWord(buffer, word, maxlen, TRUE));
+}
+
+/************************************************************************/
+/*>char *GetWordNC(char *buffer, char *word, int maxlen)
+ -----------------------------------------------------
+ Input: char *buffer Input buffer to read words from
+ int maxlen Max length of output word
+ Output: char *word Word read from buffer
+ Returns: char * Pointer to start of next word in buffer
+ or NULL
+
+ This code is a wrapper to doGetWord()
+
+ Reads a whitespace delimited word out of buffer into word. Commas
+ are treated just like normal characters.
+
+ Words containing white space may be wrapped in double inverted commas.
+ A \ is used as an escape character and may be used to escape *any*
+ following character. In particular:
+ "\\" -> '\' To get a backslash
+ "\ " -> ' ' To get a hard whitespace (alternatively wrap the
+ string in double inverted commas)
+ "\"" -> '"' To get a double inverted comma
+
+ 10.06.99 Original By: ACRM
+*/
+char *GetWordNC(char *buffer, char *word, int maxlen)
+{
+ return(doGetWord(buffer, word, maxlen, FALSE));
+}
+
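Note on GetWord() and GetWordNC(): both return a pointer to the start of the next word, or NULL once the buffer is exhausted, so the usual calling pattern is a loop that feeds the returned pointer back in. The sketch below illustrates only that calling pattern. next_word() is a deliberately simplified, hypothetical stand-in that splits on spaces, tabs and commas; it does not implement the quoting or backslash escapes of the real routine.

```c
#include <stddef.h>

/* Simplified stand-in for GetWord(): splits on spaces, tabs and commas
   only, with no quoting or escape handling. Copies at most maxlen-1
   characters into word. Returns the start of the next word, or NULL. */
static const char *next_word(const char *buffer, char *word, int maxlen)
{
   int j = 0;

   word[0] = '\0';
   if(buffer == NULL)
      return NULL;

   /* Skip leading white space */
   while(*buffer == ' ' || *buffer == '\t') buffer++;

   /* Copy up to the next delimiter, truncating if word is too small */
   while(*buffer && *buffer != ' ' && *buffer != '\t' && *buffer != ',')
   {
      if(j < maxlen - 1)
         word[j++] = *buffer;
      buffer++;
   }
   word[j] = '\0';

   /* Skip trailing white space and at most one comma */
   while(*buffer == ' ' || *buffer == '\t') buffer++;
   if(*buffer == ',') buffer++;

   return (*buffer == '\0') ? NULL : buffer;
}

/* The calling pattern: keep asking for words until NULL comes back */
static int count_words(const char *line)
{
   char        word[32];
   int         n = 0;
   const char *p = line;

   do
   {
      p = next_word(p, word, (int)sizeof(word));
      if(word[0] != '\0')
         n++;
   } while(p != NULL);

   return n;
}
```

Note that the final word is delivered on the same call that returns NULL, so the word must be examined before the pointer is tested.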
diff --git a/src/bioplib/IndexPDB.c b/src/bioplib/IndexPDB.c
new file mode 100644
index 0000000..ab9b933
--- /dev/null
+++ b/src/bioplib/IndexPDB.c
@@ -0,0 +1,108 @@
+/*************************************************************************
+
+ Program:
+ File: IndexPDB.c
+
+ Version: V2.0R
+ Date: 01.03.94
+ Function: Create an array of pointers into a PDB linked list
+
+ Copyright: (c) SciTech Software 1993-4
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+ IndexPDB() creates an array of pointers to each PDB record in a linked
+ list. This allows random access to atoms without having to step through
+ the PDB linked list.
+
+**************************************************************************
+
+ Usage:
+ ======
+ pdb.h must be included before using this routine.
+
+ PDB **indx,
+ *pdb;
+ int natom;
+
+ indx = IndexPDB(pdb, &natom);
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 19.07.90 Original
+ V1.0a 15.02.91 Corrected comments to match new standard.
+ V1.1 01.06.92 ANSIed and documented, FPU condition added
+ V2.0 24.02.94 Completely re-written. Note that the calling format
+ has changed!! NOT BACKWARDLY COMPATIBLE!
+
+*************************************************************************/
+/* Includes
+*/
+#include <math.h>
+#include <stdlib.h>
+
+#include "MathType.h"
+#include "SysDefs.h"
+#include "pdb.h"
+#include "macros.h"
+
+/************************************************************************/
+/*>PDB **IndexPDB(PDB *pdb, int *natom)
+ ------------------------------------
+ Input: PDB *pdb Pointer to the start of a PDB linked list.
+ Output: int *natom Number of atoms in the PDB linked list.
+ Returns: PDB **indx An array of pointers to the PDB records.
+ NULL if unable to allocate memory.
+
+ Creates an array of pointers to PDB from a linked list. This is used
+ to allow array style access to items in the linked list:
+ e.g. (indx[23])->x will give the x coordinate of the 24th item (zero-based)
+
+ 19.07.90 Original
+ 01.06.92 ANSIed and documented.
+ 24.02.94 Re-written. Now allocates and returns the index.
+*/
+PDB **IndexPDB(PDB *pdb, int *natom)
+{
+ PDB *p,
+ **indx;
+ int i=0;
+
+ /* Count the number of entries */
+ for(p=pdb, i=0; p!=NULL; NEXT(p)) i++;
+ *natom = i;
+
+ /* Allocate memory for the index array */
+ if((indx = (PDB **)malloc((i+1) * sizeof(PDB *)))==NULL)
+ return(NULL);
+
+
+ for(p=pdb, i=0; p!=NULL; NEXT(p))
+ indx[i++] = p;
+
+ indx[i] = NULL;
+
+ return(indx);
+}
+
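Note on IndexPDB(): the returned array is zero-based, holds natom pointers followed by a NULL terminator, and is allocated separately from the list, so the caller frees the array but not the records through it. The following sketch restates the routine on a hypothetical stand-in record (ATOM_REC and index_list() are names invented for this note) to make those three points concrete.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the PDB record */
typedef struct atom_rec
{
   struct atom_rec *next;
   double x;
} ATOM_REC;

/* Build a NULL-terminated array of pointers to each record in the
   list. The caller frees the array; the records are not copied. */
static ATOM_REC **index_list(ATOM_REC *list, int *natom)
{
   ATOM_REC *p, **indx;
   int       i = 0;

   /* Count the records */
   for(p = list; p != NULL; p = p->next) i++;
   *natom = i;

   /* One extra slot for the NULL terminator */
   indx = malloc((size_t)(i + 1) * sizeof *indx);
   if(indx == NULL)
      return NULL;

   for(p = list, i = 0; p != NULL; p = p->next)
      indx[i++] = p;
   indx[i] = NULL;

   return indx;
}
```

After the call, indx[0] is the first record and indx[natom-1] the last.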
diff --git a/src/bioplib/KillLeadSpaces.c b/src/bioplib/KillLeadSpaces.c
new file mode 100644
index 0000000..cdfdba3
--- /dev/null
+++ b/src/bioplib/KillLeadSpaces.c
@@ -0,0 +1,102 @@
+/*************************************************************************
+
+ Program:
+ File: KillLeadSpaces.c
+
+ Version: V1.20
+ Date: 18.09.96
+ Function:
+
+ Copyright: (c) SciTech Software 1991-6
+ Author: Dr. Andrew C. R. Martin
+ Phone: +44 (0) 1372 275775
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.1 08.02.91 Added KillLine()
+ V1.2 10.02.91 Added setextn() and index()
+ V1.3 20.03.91 Added Word()
+ V1.4 28.05.92 ANSIed
+ V1.5 22.06.92 Added tab check to Word(). Improved setextn().
+ Added WordN(). Documented other routines.
+ V1.6 27.07.93 Corrected fsscanf() for double precision
+ V1.7 07.10.93 Checks made on case before toupper()/tolower()
+ for SysV compatibility. Also index() becomes
+ chindex()
+ V1.8 18.03.94 getc() -> fgetc()
+ V1.9 11.05.94 Added GetFilestem(), upstrcmp(), upstrncmp() &
+ GetWord()
+ V1.10 24.08.94 Added OpenStdFiles()
+ V1.11 08.03.95 Corrected OpenFile() for non-UNIX
+ V1.12 09.03.95 Added check on non-NULL filename in OpenFile()
+ V1.13 17.07.95 Added countchar()
+ V1.14 18.10.95 Moved YorN() to WindIO.c
+ V1.15 06.11.95 Added StoreString(), InStringList() and FreeStringList()
+ V1.16 22.11.95 Moved ftostr() to generam.c
+ V1.17 15.12.95 Added QueryStrStr()
+ V1.18 18.12.95 OpenStdFiles() treats filename of - as stdin/stdout
+ V1.19 05.02.96 OpenStdFiles() allows NULL pointers instead of filenames
+ V1.20 18.09.96 Added padchar()
+
+*************************************************************************/
+/* Includes
+*/
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+
+/************************************************************************/
+/*>char *KillLeadSpaces(char *string)
+ ----------------------------------
+ Input: char *string A character string
+ Returns: char * A pointer to the string with the
+ leading spaces removed
+
+ This routine strips leading spaces and tabs from a string returning
+ a pointer to the first non-whitespace character.
+
+ N.B. THE MACRO KILLLEADSPACES() MAY NOW BE USED INSTEAD
+
+ 06.02.91 Original
+ 28.05.92 ANSIed
+ 06.07.93 Added tab skipping
+*/
+char *KillLeadSpaces(char *string)
+{
+ while (*string == ' ' || *string == '\t') string++;
+ return(string);
+}
+
+
diff --git a/src/bioplib/LegalAtomSpec.c b/src/bioplib/LegalAtomSpec.c
new file mode 100644
index 0000000..ceb3e0c
--- /dev/null
+++ b/src/bioplib/LegalAtomSpec.c
@@ -0,0 +1,100 @@
+/*************************************************************************
+
+ Program:
+ File: LegalAtomSpec.c
+
+ Version: V1.7
+ Date: 11.10.99
+ Function:
+
+ Copyright: (c) Dr. Andrew C. R. Martin, University of Reading, 2002
+ Author: Dr. Andrew C. R. Martin
+ Phone: +44 (0) 1372 275775
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 01.03.94 Original
+ V1.1 07.07.95 Now non-destructive
+ V1.2 17.07.95 Now checks that a number was specified as part of the
+ spec. and returns a BOOL
+ V1.3 23.10.95 Moved FindResidueSpec() from PDBList.c
+ V1.4 08.02.96 Added FindResidue() and changed FindResidueSpec() to
+ use it
+ V1.5 23.07.96 Added AtomNameMatch() and LegalAtomSpec()
+ V1.6 18.03.98 Added option to include a . to separate chain and
+ residue number so numeric chain names can be used
+ V1.7 11.10.99 Allow a . to be used to start a number (such that the
+ default blank chain name is used). Allows negative
+ residue numbers
+
+*************************************************************************/
+/* Includes
+*/
+#include "SysDefs.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+
+/************************************************************************/
+/*>BOOL LegalAtomSpec(char *spec)
+ ------------------------------
+ Partner routine for AtomNameMatch(). Checks whether a wildcard
+ specification is legal (i.e. will not return an error when used
+ with AtomNameMatch()).
+
+ The only thing which is not legal is characters following a *
+
+ 23.07.96 Original By: ACRM
+*/
+BOOL LegalAtomSpec(char *spec)
+{
+ char *chp;
+
+ for(chp=spec; *chp; chp++)
+ {
+ if(*chp == '\\')
+ {
+ chp++;
+ }
+ else if(*chp == '*')
+ {
+ chp++;
+ if(*chp && *chp != ' ')
+ return(FALSE);
+ }
+ }
+ return(TRUE);
+}
+
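Note on the loop in LegalAtomSpec(): when a backslash or a * is the last character of the spec, chp is advanced onto the terminating NUL inside the loop body and then advanced again by the for statement, so the loop test appears to read one byte past the end of the string. The following is a sketch of a bounds-safe variant under that reading. The name legal_atom_spec_checked() is hypothetical, it returns int rather than BOOL to stay self-contained, and it is intended to keep the same rule (only a space or the end of the string may directly follow an unescaped *).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounds-safe variant of LegalAtomSpec().
   Returns 1 if the wildcard spec is legal, 0 otherwise. */
int legal_atom_spec_checked(const char *spec)
{
   const char *chp;

   assert(spec != NULL);

   for(chp = spec; *chp != '\0'; chp++)
   {
      if(*chp == '\\')
      {
         /* Escaped character: skip it, unless the backslash is last */
         if(chp[1] == '\0')
            break;
         chp++;
      }
      else if(*chp == '*')
      {
         /* Only a space or the end of the string may follow a wildcard */
         if(chp[1] != '\0' && chp[1] != ' ')
            return 0;
         if(chp[1] == '\0')
            break;
         chp++;
      }
   }
   return 1;
}
```

The explicit checks on chp[1] before each extra increment are what keep the pointer from stepping over the terminator.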
diff --git a/src/bioplib/MatMult33_33.c b/src/bioplib/MatMult33_33.c
new file mode 100644
index 0000000..3504c93
--- /dev/null
+++ b/src/bioplib/MatMult33_33.c
@@ -0,0 +1,98 @@
+/*************************************************************************
+
+ Program:
+ File: MatMult33_33.c
+
+ Version: V1.6
+ Date: 27.09.95
+ Function:
+
+ Copyright: (c) Dr. Andrew C. R. Martin, University of Reading, 2002
+ Author: Dr. Andrew C. R. Martin
+ Phone: +44 (0) 1372 275775
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 06.09.91 Original
+ V1.0a 01.06.92 Documented
+ V1.1 30.09.92 Matrix multiplication added
+ V1.2 10.06.93 void return from matrix multiplication
+ V1.3 22.07.93 Added CreateRotMat()
+ V1.4 03.08.93 Changed matrix multiplication to standard direction
+ V1.5 28.07.95 Added VecDist()
+ V1.6 27.09.95 Added MatMult33_33()
+
+*************************************************************************/
+/* Includes
+*/
+#include "MathType.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+
+/************************************************************************/
+/*>void MatMult33_33(REAL a[3][3], REAL b[3][3], REAL out[3][3])
+ -------------------------------------------------------------
+ Input: REAL a[3][3] Matrix to be multiplied
+ REAL b[3][3] Matrix to be multiplied
+ Output: REAL out[3][3] Output matrix
+
+ Multiply two 3x3 matrices
+
+ 27.09.95 Original
+*/
+void MatMult33_33(REAL a[3][3], REAL b[3][3], REAL out[3][3])
+{
+ int i, j, k;
+ REAL ab;
+
+ for(i=0; i<3; i++)
+ {
+ for(j=0; j<3; j++)
+ {
+ ab = (REAL)0.0;
+ for(k=0; k<3; k++)
+ {
+ ab += a[i][k]*b[k][j];
+ }
+ out[i][j]=ab;
+ }
+ }
+}
+
+
+
+
+
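Note on MatMult33_33(): the routine writes each out[i][j] while a and b are still being read, so passing the same array as both an input and the output would overwrite values that later iterations need. A sketch of an alias-safe variant follows. The name mat_mult33_alias_safe() is hypothetical, and the REAL typedef is repeated here only to keep the example self-contained.

```c
#include <assert.h>
#include <string.h>

typedef double REAL;   /* mirrors the typedef in MathType.h */

/* Same triple loop as MatMult33_33(), but the product is accumulated in a
   local temporary so that out may be the same array as a or b. */
void mat_mult33_alias_safe(REAL a[3][3], REAL b[3][3], REAL out[3][3])
{
   REAL tmp[3][3];
   int  i, j, k;

   assert(a != NULL && b != NULL && out != NULL);

   for(i=0; i<3; i++)
   {
      for(j=0; j<3; j++)
      {
         tmp[i][j] = (REAL)0.0;
         for(k=0; k<3; k++)
            tmp[i][j] += a[i][k] * b[k][j];
      }
   }

   /* Copy the finished product to the caller's array in one step */
   memcpy(out, tmp, sizeof(tmp));
}
```

The cost is nine extra REAL values on the stack and one memcpy(), which is usually negligible next to the convenience of in-place updates such as accumulating successive rotations.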
diff --git a/src/bioplib/MatMult3_33.c b/src/bioplib/MatMult3_33.c
new file mode 100644
index 0000000..21d479b
--- /dev/null
+++ b/src/bioplib/MatMult3_33.c
@@ -0,0 +1,91 @@
+/*************************************************************************
+
+ Program:
+ File: MatMult3_33.c
+
+ Version: V1.6
+ Date: 27.09.95
+ Function:
+
+ Copyright: (c) SciTech Software 1991-5
+ Author: Dr. Andrew C. R. Martin
+ Phone: +44 (0) 1372 275775
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 06.09.91 Original
+ V1.0a 01.06.92 Documented
+ V1.1 30.09.92 Matrix multiplication added
+ V1.2 10.06.93 void return from matrix multiplication
+ V1.3 22.07.93 Added CreateRotMat()
+ V1.4 03.08.93 Changed matrix multiplication to standard direction
+ V1.5 28.07.95 Added VecDist()
+ V1.6 27.09.95 Added MatMult33_33()
+
+*************************************************************************/
+/* Includes
+*/
+#include "MathType.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+
+/************************************************************************/
+/*>void MatMult3_33(VEC3F vecin, REAL matin[3][3], VEC3F *vecout)
+ -------------------------------------------------------------
+ Input: VEC3F vecin Vector to be multiplied
+ REAL matin[3][3] Rotation matrix
+ Output: VEC3F *vecout Output multiplied vector
+
+ Multiply a 3-vector by a 3x3 matrix
+
+ 30.09.92 Original
+ 03.08.93 Changed multiplication to standard direction
+*/
+void MatMult3_33(VEC3F vecin,
+ REAL matin[3][3],
+ VEC3F *vecout)
+{
+ vecout->x = vecin.x * matin[0][0] +
+ vecin.y * matin[1][0] +
+ vecin.z * matin[2][0];
+ vecout->y = vecin.x * matin[0][1] +
+ vecin.y * matin[1][1] +
+ vecin.z * matin[2][1];
+ vecout->z = vecin.x * matin[0][2] +
+ vecin.y * matin[1][2] +
+ vecin.z * matin[2][2];
+}
+
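Note on MatMult3_33(): the index pattern (vecin.x * matin[0][j] + vecin.y * matin[1][j] + vecin.z * matin[2][j]) treats the input as a row vector multiplied on the left of the matrix. This is equivalent to applying the transpose of matin to a column vector. The sketch below restates that convention with a worked case. The name row_vec_times_mat33() is hypothetical, and the typedefs are repeated only so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

typedef double REAL;                       /* mirrors MathType.h */
typedef struct { REAL x, y, z; } VEC3F;    /* mirrors MathType.h */

/* Row vector times 3x3 matrix: out_j = sum over i of v_i * m[i][j] */
void row_vec_times_mat33(VEC3F v, REAL m[3][3], VEC3F *out)
{
   assert(m != NULL && out != NULL);

   out->x = v.x*m[0][0] + v.y*m[1][0] + v.z*m[2][0];
   out->y = v.x*m[0][1] + v.y*m[1][1] + v.z*m[2][1];
   out->z = v.x*m[0][2] + v.y*m[1][2] + v.z*m[2][2];
}
```

With m = {{0,1,0},{-1,0,0},{0,0,1}}, the vector (1,0,0) maps to (0,1,0) and (0,1,0) maps to (-1,0,0). Under this row-vector convention the matrix therefore rotates by +90 degrees about z, whereas the same array applied to a column vector would rotate in the opposite sense.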
diff --git a/src/bioplib/MathType.h b/src/bioplib/MathType.h
new file mode 100644
index 0000000..0216ec3
--- /dev/null
+++ b/src/bioplib/MathType.h
@@ -0,0 +1,74 @@
+/*************************************************************************
+
+ Program:
+ File: MathType.h
+
+ Version: V1.0R
+ Date: 30.08.94
+ Function: Type definitions for maths
+
+ Copyright: (c) SciTech Software 1993-4
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+
+*************************************************************************/
+#ifndef _MATHTYPE_H
+#define _MATHTYPE_H
+
+/* This is for compilers running on machines such as Amigas, Macs and
+ older Sun workstations using 680X0 series processors with maths
+ coprocessors. This assumes that the symbol _M68881 is defined when
+ the compiler is run to use the maths coprocessor and that a file
+ called m68881.h is to be included to make full use of the coprocessor
+*/
+#ifdef _M68881
+#include <m68881.h>
+#endif
+
+/* Note, that if this is changed to float, all I/O routines using type
+ REAL will need %lf's changing to %f's
+*/
+typedef double REAL;
+
+typedef struct
+{ REAL x, y, z;
+} VEC3F;
+
+typedef VEC3F COOR;
+
+/* Define PI if not done */
+#ifndef PI
+#define PI (4.0 * atan(1.0))
+#endif
+
+#endif
diff --git a/src/bioplib/OpenFile.c b/src/bioplib/OpenFile.c
new file mode 100644
index 0000000..b62bb43
--- /dev/null
+++ b/src/bioplib/OpenFile.c
@@ -0,0 +1,158 @@
+/*************************************************************************
+
+ Program:
+ File: OpenFile.c
+
+ Version: V1.22
+ Date: 28.07.05
+ Function:
+
+ Copyright: (c) SciTech Software 1991-2005
+ Author: Dr. Andrew C. R. Martin
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.1 08.02.91 Added KillLine()
+ V1.2 10.02.91 Added setextn() and index()
+ V1.3 20.03.91 Added Word()
+ V1.4 28.05.92 ANSIed
+ V1.5 22.06.92 Added tab check to Word(). Improved setextn().
+ Added WordN(). Documented other routines.
+ V1.6 27.07.93 Corrected fsscanf() for double precision
+ V1.7 07.10.93 Checks made on case before toupper()/tolower()
+ for SysV compatibility. Also index() becomes
+ chindex()
+ V1.8 18.03.94 getc() -> fgetc()
+ V1.9 11.05.94 Added GetFilestem(), upstrcmp(), upstrncmp() &
+ GetWord()
+ V1.10 24.08.94 Added OpenStdFiles()
+ V1.11 08.03.95 Corrected OpenFile() for non-UNIX
+ V1.12 09.03.95 Added check on non-NULL filename in OpenFile()
+ V1.13 17.07.95 Added countchar()
+ V1.14 18.10.95 Moved YorN() to WindIO.c
+ V1.15 06.11.95 Added StoreString(), InStringList() and FreeStringList()
+ V1.16 22.11.95 Moved ftostr() to general.c
+ V1.17 15.12.95 Added QueryStrStr()
+ V1.18 18.12.95 OpenStdFiles() treats filename of - as stdin/stdout
+ V1.19 05.02.96 OpenStdFiles() allows NULL pointers instead of filenames
+ V1.20 18.09.96 Added padchar()
+ V1.21 18.06.02 Added string.h
+ V1.22 28.07.05 Added conditionals for Mac OS/X
+
+*************************************************************************/
+/* Includes
+*/
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include "SysDefs.h"
+#include "port.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+
+/************************************************************************/
+/*>FILE *OpenFile(char *filename, char *envvar, char *mode, BOOL *noenv)
+ ---------------------------------------------------------------------
+ Input: char *filename Filename to be opened
+ char *envvar Unix/MS-DOS environment variable
+ Other OS assign name (with :)
+ char *mode Mode in which to open file (r, w, etc)
+ Output: BOOL *noenv Set to TRUE under Unix/MS-DOS if
+ the reason for failure was that the
+ environment variable was not set.
+ Returns: FILE * File pointer or NULL on failure
+
+ Attempts to open a filename as specified. Returns a file
+ pointer. If this fails:
+
+ Under UNIX/MS-DOS:
+ gets the contents of the envvar environment variable and prepends
+ that to the filename and tries again. If envvar was not set, noenv
+ is set to TRUE and the routine returns a NULL pointer.
+
+ Under other OSs:
+ prepends the envvar string onto the filename and tries to open the
+ file again.
+
+ Returns the pointer returned by the fopen() call after all this.
+
+ 22.09.94 Original By: ACRM
+ 11.09.94 Puts a : in for the assign type.
+ 24.11.94 Added __unix define. Checks for trailing / in environment
+ variable
+ 08.03.95 Corrected basename to filename in non-unix version
+ 09.03.95 Checks that filename is not a NULL or blank string
+ 28.07.05 Added conditionals for Mac OS/X: __MACH__ and __APPLE__
+*/
+FILE *OpenFile(char *filename, char *envvar, char *mode, BOOL *noenv)
+{
+ char *datadir,
+ buffer[160];
+ FILE *fp;
+
+ if(filename == NULL || filename[0] == '\0')
+ return(NULL);
+
+ if(noenv != NULL) *noenv = FALSE;
+
+ /* Try to open the filename as specified */
+ if((fp=fopen(filename,mode)) == NULL)
+ {
+ /* Failed, so build alternative directory/filename */
+#if (unix || __unix__ || MS_WINDOWS || __unix || __MACH__ || __APPLE__)
+ if((datadir = getenv(envvar)) != NULL)
+ {
+ if(datadir[0] != '\0' && datadir[strlen(datadir)-1] == '/')
+ sprintf(buffer,"%s%s",datadir,filename);
+ else
+ sprintf(buffer,"%s/%s",datadir,filename);
+ fp = fopen(buffer,mode);
+ }
+ else
+ {
+ if(noenv != NULL) *noenv = TRUE;
+ return(NULL);
+ }
+#else
+ sprintf(buffer,"%s:%s",envvar,filename);
+ fp = fopen(buffer,mode);
+#endif
+ }
+
+ return(fp);
+}
+
+
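Note on OpenFile(): the fallback path is assembled with sprintf() into a fixed 160-byte buffer, so a long environment value plus a long filename could overrun it. The following is a sketch of a bounded alternative for the path-building step only. The helper name build_data_path() is hypothetical, and it assumes a C99 library that provides snprintf(), which the original ANSI C code does not rely on.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: joins datadir and filename into buffer, inserting
   a '/' only when datadir is non-empty and does not already end in one.
   Returns 1 on success, 0 if the result would not fit in buflen bytes. */
int build_data_path(char *buffer, size_t buflen,
                    const char *datadir, const char *filename)
{
   size_t      dirlen;
   const char *sep;
   int         written;

   assert(buffer != NULL && datadir != NULL && filename != NULL);

   dirlen = strlen(datadir);

   /* An empty directory is treated here as "no prefix" (an assumption) */
   sep = (dirlen == 0 || datadir[dirlen-1] == '/') ? "" : "/";

   written = snprintf(buffer, buflen, "%s%s%s", datadir, sep, filename);

   /* snprintf() reports the length it wanted to write; compare to size */
   return (written >= 0 && (size_t)written < buflen);
}
```

A caller would attempt fopen() on the buffer only when the helper returns 1, and treat a 0 return like any other failure to locate the file.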
diff --git a/src/bioplib/PDB2Seq.c b/src/bioplib/PDB2Seq.c
new file mode 100644
index 0000000..b55aa9f
--- /dev/null
+++ b/src/bioplib/PDB2Seq.c
@@ -0,0 +1,251 @@
+/*************************************************************************
+
+ Program:
+ File: PDB2Seq.c
+
+ Version: V1.12R
+ Date: 10.06.05
+ Function: Conversion from PDB to sequence and other sequence
+ related routines
+
+ Copyright: (c) SciTech Software 1993-2005
+ Author: Dr. Andrew C. R. Martin
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 29.09.92 Original
+ V1.1 07.06.93 Corrected allocation
+ V1.2 18.06.93 Handles multi-chains and skips NTER and CTER residues.
+ Added SplitSeq()
+ V1.3 09.07.93 SplitSeq() cleans up properly if allocation failed
+ V1.4 11.05.94 Added TrueSeqLen()
+ V1.5 13.05.94 Fixed bug in PDB2Seq().
+ Added KnownSeqLen().
+ V1.6 07.09.94 Fixed allocation bug in SplitSeq()
+ V1.7 19.07.95 Added check for ATOM records
+ V1.8 24.01.96 Fixed bug when no ATOM records in linked list
+ Returns a blank string
+ V1.9 26.08.97 Renamed to DoPDB2Seq() with handling of Asx/Glx and
+ protein-only. Added macros to recreate the
+ old PDB2Seq() interface and similar new calls
+ V1.10 02.10.00 Added NoX option
+ V1.11 30.05.02 Changed PDB field from 'junk' to 'record_type'
+ V1.12 10.06.05 Fixed bug - was undercounting by 1 for CA-only chains
+
+*************************************************************************/
+/* Includes
+*/
+#include <stdlib.h>
+#include <string.h>
+
+#include "macros.h"
+#include "pdb.h"
+#include "seq.h"
+
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+/************************************************************************/
+/*>char *DoPDB2Seq(PDB *pdb, BOOL DoAsxGlx, BOOL ProtOnly, BOOL NoX)
+ -----------------------------------------------------------------
+ Input: PDB *pdb PDB linked list
+ BOOL DoAsxGlx Handle Asx and Glx as B and Z rather than X
+ BOOL ProtOnly Don't do DNA/RNA; these simply don't get
+ done rather than being handled as X
+ BOOL NoX Skip amino acids which would be assigned as X
+ Returns: char * Allocated character array containing sequence
+
+ malloc()'s an array containing the 1-letter sequence corresponding to
+ an input PDB linked list. Returns NULL if given a NULL parameter or
+ memory allocation fails. Puts *'s in the sequence for multi-chains.
+
+ This routine is normally called via the macro interfaces:
+ PDB2Seq(pdb), PDB2SeqX(pdb), PDBProt2Seq(pdb), PDBProt2SeqX(pdb)
+ Those with Prot in their names handle protein only; those with
+ X handle Asx/Glx as B/Z rather than as X
+
+ 29.09.92 Original By: ACRM
+ 07.06.93 Corrected allocation.
+ 18.06.93 Handles multi-chains and skips NTER and CTER residues
+ 13.05.94 Check for chain change *before* copy residue (!)
+ (Bug reported by Bob MacCullum)
+ 19.07.95 Added check for ATOM records
+ 24.01.96 Returns blank string (rather than core dumping!) if the
+ linked list contained no ATOM records
+ 26.08.97 Changed to DoPDB2Seq with extra parameters (DoAsxGlx &
+ ProtOnly). The old calling forms have now become macros
+ 02.10.00 Added NoX
+ 10.06.05 Changed the initialization of rescount, resnum, etc. so
+ it correctly points to the first residue. This solves a
+ bug with CA-only chains where it was undercounting by 1
+*/
+char *DoPDB2Seq(PDB *pdb, BOOL DoAsxGlx, BOOL ProtOnly, BOOL NoX)
+{
+ int resnum,
+ rescount,
+ NBreak = 0;
+ char insert,
+ chain,
+ *sequence = NULL;
+ PDB *p = NULL;
+
+ /* Sanity check */
+ if(pdb==NULL) return(NULL);
+
+ /* First step through the pdb linked list to see how many residues
+ and chains.
+ 10.06.05 Fixed bug - was undercounting by one for CA-only chains
+ */
+ rescount = 1;
+ resnum = pdb->resnum;
+ insert = pdb->insert[0];
+ chain = pdb->chain[0];
+
+ for(p=pdb->next; p!=NULL; NEXT(p))
+ {
+ if(p->resnum != resnum || p->insert[0] != insert)
+ {
+ if(strncmp(p->resnam,"NTER",4) &&
+ strncmp(p->resnam,"CTER",4) &&
+ !strncmp(p->record_type,"ATOM  ",6)) /* V1.7 */
+ rescount++;
+
+ resnum = p->resnum;
+ insert = p->insert[0];
+
+ /* Check for chain change */
+ if(chain != p->chain[0])
+ {
+ NBreak++;
+ chain = p->chain[0];
+ }
+ }
+ }
+
+ if(NBreak) rescount += NBreak;
+
+ /* Allocate memory for the sequence array */
+ sequence = malloc((rescount + 1) * sizeof(char));
+ if(sequence == NULL) return(NULL);
+
+ /* Step through the pdb linked list again, setting sequence array */
+ p = pdb;
+
+ /* Skip an NTER residue */
+ /* 24.01.96 Added NULL check; occurs when no ATOM records present */
+ while(p!=NULL &&
+ (!strncmp(p->resnam,"NTER",4) ||
+ strncmp(p->record_type,"ATOM  ",6)))
+ NEXT(p);
+ if(p==NULL)
+ {
+ sequence[0] = '\0';
+ return(sequence);
+ }
+
+ sequence[0] = ((DoAsxGlx)?thronex(p->resnam):throne(p->resnam));
+ if((!ProtOnly) || (!gBioplibSeqNucleicAcid))
+ rescount = 1;
+ else
+ rescount = 0;
+
+ /* 02.10.00 Reset count if it's an X character and we are ignoring
+ them
+ */
+ if(NoX && sequence[0] == 'X')
+ rescount = 0;
+
+ resnum = p->resnum;
+ insert = p->insert[0];
+ chain = p->chain[0];
+
+ for(p=p->next; p!=NULL; NEXT(p))
+ {
+ if(!strncmp(p->record_type,"ATOM  ",6)) /* V1.7 */
+ {
+ if(p->resnum != resnum || p->insert[0] != insert)
+ {
+ /* Check for chain change */
+ if(chain != p->chain[0])
+ {
+ sequence[rescount++] = '*';
+ chain = p->chain[0];
+ }
+
+ /* 06.02.03 Fixed bug - was incrementing rescount even when
+ it was NTER/CTER
+ */
+ if(strncmp(p->resnam,"NTER",4) && strncmp(p->resnam,"CTER",4))
+ {
+ sequence[rescount] = ((DoAsxGlx) ?
+ thronex(p->resnam):
+ throne(p->resnam));
+ if((!ProtOnly) || (!gBioplibSeqNucleicAcid))
+ rescount++;
+
+ /* 02.10.00 Reset count if it's an X character and we are
+ ignoring them
+ */
+ if(NoX && sequence[rescount-1] == 'X')
+ rescount--;
+ }
+
+ resnum = p->resnum;
+ insert = p->insert[0];
+ }
+ }
+ }
+
+ sequence[rescount] = '\0';
+
+ return(sequence);
+}
+
+#ifdef TEST_MAIN
+#include <stdio.h>
+int main(int argc, char **argv)
+{
+ PDB *pdb;
+ int natoms;
+ char *seq;
+ FILE *fp;
+ fp = fopen("/acrm/data/pdb/pdb1crn.ent", "r");
+ pdb=ReadPDB(fp, &natoms);
+
+ seq = DoPDB2Seq(pdb, FALSE, FALSE, FALSE);
+ return(0);
+}
+#endif
+
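Note on DoPDB2Seq(): the routine makes two passes over the atom list, first counting residues and chain breaks to size the allocation, then filling the array and writing a * at each chain change. The following is a simplified, self-contained sketch of that two-pass pattern. MINIATOM and atoms_to_sequence() are hypothetical and stand in for the PDB linked list and the throne() lookup. Unlike the original, this sketch also treats a chain change on its own as a residue boundary and ignores the NTER/CTER, ATOM-record and NoX handling.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct
{
   int  resnum;   /* residue number                          */
   char insert;   /* insertion code, ' ' if none             */
   char chain;    /* chain label                             */
   char code;     /* one-letter residue code, already mapped */
} MINIATOM;

/* True when atom i is the first atom of a new residue */
static int starts_residue(const MINIATOM *atoms, size_t i)
{
   return (i == 0 ||
           atoms[i].resnum != atoms[i-1].resnum ||
           atoms[i].insert != atoms[i-1].insert ||
           atoms[i].chain  != atoms[i-1].chain);
}

/* True when atom i is the first atom of a new chain (not the first) */
static int starts_chain(const MINIATOM *atoms, size_t i)
{
   return (i > 0 && atoms[i].chain != atoms[i-1].chain);
}

/* Returns a malloc()'d sequence string, or NULL if allocation fails */
char *atoms_to_sequence(const MINIATOM *atoms, size_t natoms)
{
   size_t i, nres = 0, nbreak = 0, pos = 0;
   char  *seq;

   assert(atoms != NULL || natoms == 0);

   /* Pass 1: count residues and chain breaks to size the buffer */
   for(i=0; i<natoms; i++)
   {
      if(starts_residue(atoms, i)) nres++;
      if(starts_chain(atoms, i))   nbreak++;
   }

   seq = malloc(nres + nbreak + 1);
   if(seq == NULL) return NULL;

   /* Pass 2: emit one code per residue, with '*' between chains */
   for(i=0; i<natoms; i++)
   {
      if(starts_chain(atoms, i))   seq[pos++] = '*';
      if(starts_residue(atoms, i)) seq[pos++] = atoms[i].code;
   }
   seq[pos] = '\0';

   return seq;
}
```

Because both passes use the same boundary predicates, the count from the first pass always matches what the second pass writes. An empty input yields an empty string rather than NULL, in the same spirit as the blank-string return described for the 24.01.96 change.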
diff --git a/src/bioplib/ParseRes.c b/src/bioplib/ParseRes.c
new file mode 100644
index 0000000..72079a6
--- /dev/null
+++ b/src/bioplib/ParseRes.c
@@ -0,0 +1,233 @@
+/*************************************************************************
+
+ Program:
+ File: ParseRes.c
+
+ Version: V1.8R
+ Date: 29.09.05
+ Function: Parse a residue specification
+
+ Copyright: (c) SciTech Software 1993-2005
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ EMail: andrew at bioinf.org.uk
+ martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 01.03.94 Original
+ V1.1 07.07.95 Now non-destructive
+ V1.2 17.07.95 Now checks that a number was specified as part of the
+ spec. and returns a BOOL
+ V1.3 23.10.95 Moved FindResidueSpec() from PDBList.c
+ V1.4 08.02.96 Added FindResidue() and changed FindResidueSpec() to
+ use it
+ V1.5 23.07.96 Added AtomNameMatch() and LegalAtomSpec()
+ V1.6 18.03.98 Added option to include a . to separate chain and
+ residue number so numeric chain names can be used
+ V1.7 11.10.99 Allow a . to be used to start a number (such that the
+ default blank chain name is used). Allows negative
+ residue numbers
+ V1.8 29.09.05 Moved ParseResSpec() into DoParseResSpec() with extra
+ param and added wrappers for ParseResSpec() and
+ ParseResSpecNoUpper() (Changes by Tony Lewis) By: TL
+
+*************************************************************************/
+/* Includes
+*/
+#include <ctype.h>
+#include <stdio.h>
+#include <string.h>
+
+#include "macros.h"
+#include "SysDefs.h"
+#include "pdb.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+/************************************************************************/
+/*>BOOL ParseResSpec(char *spec, char *chain, int *resnum, char *insert)
+ ---------------------------------------------------------------------
+ Input: char *spec Residue specification
+ Output: char *chain Chain label
+ int *resnum Residue number
+ char *insert Insert label
+ Returns: BOOL Success?
+
+ Splits up a residue specification of the form
+ [c][.]num[i]
+ into chain, resnum and insert. Chain and insert are optional and will
+ be set to spaces if not specified. Converts the residue specification
+ to upper case before processing.
+
+ Moved the code that was here to a new function, DoParseResSpec()
+ and made this function just call that new function. See
+ DoParseResSpec()'s comments for notes on previous changes. This
+ move is to allow the underlying function to have an extra parameter
+ to specify whether or not the residue specification should be upper
+ cased (without affecting code that calls this function).
+
+ 29.09.05 Original By: TL
+*/
+BOOL ParseResSpec(char *spec, char *chain, int *resnum, char *insert)
+{
+ return DoParseResSpec(spec, chain, resnum, insert, TRUE);
+}
+
+/************************************************************************/
+/*>BOOL ParseResSpecNoUpper(char *spec, char *chain, int *resnum,
+ char *insert)
+ --------------------------------------------------------------
+ Input: char *spec Residue specification
+ Output: char *chain Chain label
+ int *resnum Residue number
+ char *insert Insert label
+ Returns: BOOL Success?
+
+ Splits up a residue specification of the form
+ [c][.]num[i]
+ into chain, resnum and insert. Chain and insert are optional and will
+ be set to spaces if not specified. Does not convert the residue
+ specification to upper case before processing.
+
+ 29.09.05 Original By: TL
+*/
+BOOL ParseResSpecNoUpper(char *spec, char *chain, int *resnum,
+ char *insert)
+{
+ return DoParseResSpec(spec, chain, resnum, insert, FALSE);
+}
+
+/************************************************************************/
+/*>BOOL DoParseResSpec(char *spec, char *chain, int *resnum, char *insert,
+ BOOL uppercaseresspec)
+ -----------------------------------------------------------------------
+ Input: char *spec Residue specification
+ BOOL uppercaseresspec
+ Output: char *chain Chain label
+ int *resnum Residue number
+ char *insert Insert label
+ Returns: BOOL Success?
+
+ Splits up a residue specification of the form
+ [c][.]num[i]
+ into chain, resnum and insert. Chain and insert are optional and will
+ be set to spaces if not specified. If uppercaseresspec equals TRUE,
+ the spec is upper cased before processing
+
+ 21.07.93 Original By: ACRM
+ 17.07.95 Added BOOL return
+ 18.03.98 Added option to include a . to separate chain and residue
+ number so numeric chain names can be used
+ 29.09.05 Moved this code from ParseResSpec() to DoParseResSpec()
+ and made that function just call this new function.
+ This move is to allow this underlying function to have an
+ extra parameter to specify whether or not the residue
+ specification should be upper cased (without affecting code
+ that calls the old function). By: TL
+*/
+BOOL DoParseResSpec(char *spec, char *chain, int *resnum, char *insert,
+ BOOL uppercaseresspec)
+{
+ char *ptr,
+ *ptr2;
+ BOOL DoRestore = FALSE,
+ retval = TRUE;
+
+ /* 11.10.99 Default resnum of 0 */
+ *resnum = 0;
+
+ /* Upper case the residue specification if it has been requested */
+ if (uppercaseresspec == TRUE)
+ {
+ UPPER(spec);
+ }
+ KILLLEADSPACES(ptr, spec);
+
+ /* Extract chain from spec */
+ if(*ptr == '.')
+ {
+ *chain = ' ';
+ ptr++;
+ }
+ else if((*(ptr+1) == '.') || (!isdigit(*ptr) && (*ptr != '-')))
+ {
+ /* Chain was specified */
+ *chain = *ptr;
+ ptr++;
+ if(*ptr == '.')
+ {
+ ptr++;
+ }
+ }
+ else
+ {
+ /* Spec started with a digit, so no chain specified */
+ *chain = ' ';
+ }
+
+ /* Extract insert from spec */
+ *insert = ' ';
+ for(ptr2 = ptr; *ptr2; ptr2++)
+ {
+ /* 11.10.99 Now also checks that it isn't a - as the first
+ character
+ */
+ if(!isdigit(*ptr2) && ((ptr2!=ptr)||(*ptr2 != '-')))
+ {
+ *insert = *ptr2;
+ *ptr2 = '\0';
+ DoRestore = TRUE;
+ break;
+ }
+ }
+
+ /* Extract residue number from spec */
+ if(sscanf(ptr,"%d",resnum) != 1)
+ retval = FALSE;
+
+ if(DoRestore)
+ {
+ /* V1.1: Restore the original string */
+ *ptr2 = *insert;
+ }
+
+ return(retval);
+}
+
+
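Note on the [c][.]num[i] format handled by DoParseResSpec(): the routine temporarily writes a NUL into the caller's string and restores it afterwards. The following is a sketch of a variant that never modifies its input. The name parse_res_spec() is hypothetical, it returns int rather than BOOL to stay self-contained, it does not upper-case the spec, and it uses strtol() in place of sscanf(), which is a substitution rather than the library's own approach.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical read-only parser for specs of the form [c][.]num[i].
   Returns 1 on success, 0 if no residue number was found.
   Range checking of the number is omitted for brevity. */
int parse_res_spec(const char *spec, char *chain, int *resnum, char *insert)
{
   const char *p;
   char       *end;

   assert(spec != NULL && chain != NULL &&
          resnum != NULL && insert != NULL);

   *chain  = ' ';
   *insert = ' ';
   *resnum = 0;

   /* Skip leading spaces and tabs */
   for(p = spec; *p == ' ' || *p == '\t'; p++)
      ;
   if(*p == '\0') return 0;

   if(*p == '.')
   {
      p++;                        /* leading '.': blank chain name */
   }
   else if(p[1] == '.' || (!isdigit((unsigned char)*p) && *p != '-'))
   {
      *chain = *p++;              /* explicit chain label */
      if(*p == '.') p++;
   }

   /* Require a digit, or a '-' directly followed by a digit */
   if(!(isdigit((unsigned char)*p) ||
        (*p == '-' && isdigit((unsigned char)p[1]))))
      return 0;

   *resnum = (int)strtol(p, &end, 10);

   /* The character after the number, if any, is the insert code */
   if(*end != '\0') *insert = *end;

   return 1;
}
```

Under these assumptions "L24A" gives chain L, residue 24, insert A; "1.30" gives chain 1, residue 30; ".-5" gives a blank chain and residue -5; and a spec with no number, such as "A", is rejected.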
diff --git a/src/bioplib/ReadPDB.c b/src/bioplib/ReadPDB.c
new file mode 100644
index 0000000..a9f7462
--- /dev/null
+++ b/src/bioplib/ReadPDB.c
@@ -0,0 +1,1074 @@
+/*************************************************************************
+
+ Program:
+ File: ReadPDB.c
+
+ Version: V2.21
+ Date: 17.03.09
+ Function: Read coordinates from a PDB file
+
+ Copyright: (c) SciTech Software 1988-2009
+ Author: Dr. Andrew C. R. Martin
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+ pdb = ReadPDB(fp,natom) - This subroutine will read a .PDB file
+ of any size and form a linked list of the protein structure.
+ This list is contained in a linked set of structures of type
+ pdb_entry. The structure is set up by including the file
+ "pdb.h". For details of the structure, see this file.
+
+ To free the space created by this routine, call FREELIST(pdb,PDB).
+
+ The parameters passed to the subroutine are:
+ fp - A pointer to type FILE in which the .PDB file is stored.
+ pdb - A pointer to type PDB.
+ natom - A pointer to type integer in which the number of atoms
+ found is stored.
+
+ As of V2.3, the routine makes provision for partial occupancies. If
+ the occupancies are 1.0 or 0.0, the atoms are read verbatim. If not,
+ only the highest occupancy atoms are read and the atom names are
+ corrected to remove alternative labels. This behaviour can be
+ overridden by calling one of the ...OccRank() routines to read lower
+ occupancy atoms. If any partial occupancy atoms are read the global
+ flag gPDBPartialOcc is set to TRUE.
+
+NOTE: Although some of the fields are represented by a single character,
+ they are still stored in character arrays.
+
+BUGS: The subroutine cannot read files with VAX Fortran carriage control!
+ It just sits there and page faults like crazy.
+
+BUGS: The multiple occupancy code assumes that all positions for a given
+ atom are in consecutive records of the file
+
+BUGS: 25.01.05 Note the multiple occupancy code won't work properly for
+ 3pga where atoms have occupancies of zero and one
+
+**************************************************************************
+
+ Usage:
+ ======
+ pdb = ReadPDB(fp,natom)
+
+ Input: FILE *fp A pointer to type FILE in which the
+ .PDB file is stored.
+ Returns: PDB *pdb A pointer to the first allocated item of
+ the PDB linked list
+ Output: int *natom Number of atoms read.
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 04.11.88 Original
+ V1.1 07.02.89 Now ignores any records from the .PDB file which
+ don't start with ATOM or HETATM.
+ V1.2 28.03.90 Some fields altered to match the exact specifications
+ of the PDB. The only differences from the standard
+ are:
+ 1. The residue name is 4 characters rather than 3
+ (allowing LYSH, HISA, etc.).
+ 2. The atom name starts one column later than the
+ standard and is four columns wide encompassing the
+ standard's `alternate' field. These two
+ differences from the standard reflect the common
+ usage.
+ V1.2a 28.06.90 Buffer size increased to 85 chars.
+ V1.2b 15.02.91 Simply changed comment header to match new standard.
+ V1.3 07.01.92 Corrected small bug in while() loop. Now ignores
+ blank lines properly
+ V1.4 11.05.92 Added check on EOF in while() loop and memset() of
+ buffer. ANSIfied.
+ V1.5 01.06.92 Documented for autodoc
+ V1.6 19.06.92 Corrected use of stdlib
+ V1.7 01.10.92 Changed to use fgets()
+ V1.8 08.12.92 SAS/C V6 now defines atof() in stdlib
+ V1.9 10.06.93 Returns TRUE or FALSE rather than exiting on failure
+ V2.0 17.06.93 Rewritten to use fsscanf()
+ V2.1 08.07.93 Modified to give ReadPDB() and ReadPDBAtoms()
+ V2.2 09.07.93 Modified to return the PDB pointer rather than a BOOL.
+ There is now no need to initialise the structure first.
+ Rewrote allocation scheme.
+ V2.3 17.03.94 Handles partial occupancies. If occupancies are not
+ 1.0 or 0.0, the normal routine now reads only the
+ highest occupancy atoms and corrects the atom names
+ to remove alternative labels. This behaviour can be
+ overridden by calling one of the ...OccRank()
+ routines to read lower occupancy atoms.
+ Sets natom to -1 if there was an error to distinguish
+ from no atoms.
+ Handles atom names which start in column 13 rather
+ than column 14. This is allowed in the standard, but
+ very rare.
+ Added ReadPDBOccRank() & ReadPDBAtomsOccRank()
+ Sets gPDBPartialOcc flag.
+ V2.4 06.04.94 With atom names which start in column 13, now checks
+ if the first character is a digit. If so, moves it
+ to the end of the atom name. Thus, 1HH1 becomes HH11
+ and 2HH1 becomes HH12.
+ V2.5 04.10.94 Fixed partial occ when resnum changes as well as atom
+ name. Fixed bug when MAXPARTIAL exceeded.
+ V2.6 03.11.94 Simply corrected description. No code changes
+ V2.7 06.03.95 Now reads just the first NMR model by default
+ doReadPDB() no longer static
+ Sets gPDBMultiNMR if ENDMDL records found.
+ V2.8 13.01.97 Added check on return from fsscanf. Blank lines used
+ to result in duplication of the previous line since
+ fsscanf() does not reset the variables on receiving
+ a blank line. Also fixed in fsscanf().
+ V2.9 25.02.98 Added transparent reading of gzipped PDB files if
+ GUNZIP_SUPPORT is defined
+ V2.10 18.08.98 Added cast to popen() for SunOS
+ V2.11 08.10.99 Initialised some variables
+ V2.12 15.02.01 Added atnam_raw into PDB structure
+ V2.13 30.05.02 Changed PDB field from 'junk' to 'record_type'
+ V2.14 27.04.05 Fixed bug in atnam_raw for multiple occupancies
+ V2.15 03.06.05 Added altpos field to PDB structure. The massaged atom
+ name no longer contains the alternate indicator and
+ atnam_raw has only the atom name with altpos having the
+ alternate indicator (as it should!)
+ V2.16 14.10.05 Fixed a problem in StoreOccRankAtom() when a lower
+ occupancy atom has (erroneously) been set to occupancy
+ of zero and you want to pull out that atom
+ V2.17 25.01.06 Added calls to RemoveAlternates()
+ V2.18 03.02.06 Added prototypes for popen() and pclose()
+ V2.19 05.06.07 Added support for Unix compress'd files
+ V2.20 29.06.07 popen() and pclose() prototypes now skipped for MAC OSX
+ which defines them differently
+ V2.21 17.03.09 popen() prototype skipped for Windows. By: CTP
+
+*************************************************************************/
+/* Defines required for includes
+*/
+#define READPDB_MAIN
+
+/************************************************************************/
+/* Includes
+*/
+#include "port.h" /* Required before stdio.h */
+
+#include <stdio.h>
+#include <string.h>
+#include <math.h>
+#include <stdlib.h>
+#include <ctype.h>
+#include <unistd.h>
+
+#include "SysDefs.h"
+#include "MathType.h"
+#include "pdb.h"
+#include "macros.h"
+#include "fsscanf.h"
+#include "general.h"
+
+#define MAXPARTIAL 8
+#define SMALL 0.000001
+
+/************************************************************************/
+/* Prototypes
+*/
+static BOOL StoreOccRankAtom(int OccRank, PDB multi[MAXPARTIAL],
+ int NPartial, PDB **ppdb, PDB **pp,
+ int *natom);
+#if !defined(__APPLE__) && !defined(MS_WINDOWS)
+FILE *popen(char *, char *);
+#endif
+#ifndef __APPLE__
+int pclose(FILE *);
+#endif
+
+/************************************************************************/
+/*>PDB *ReadPDB(FILE *fp, int *natom)
+ ----------------------------------
+ Input: FILE *fp A pointer to type FILE in which the
+ .PDB file is stored.
+ Output: int *natom Number of atoms read. -1 if error.
+ Returns: PDB *pdb A pointer to the first allocated item of
+ the PDB linked list
+
+ Reads a PDB file into a PDB linked list
+
+ 08.07.93 Written as entry for doReadPDB()
+ 09.07.93 Modified to return pointer to PDB
+ 17.03.94 Modified to handle OccRank
+ 06.03.95 Added value for NMR model to read (1 = first)
+ 25.01.06 Added call to RemoveAlternates() - this deals with odd
+ cases where alternate atom positions don't appear where
+ they should!
+ 25.01.06 Added call to RemoveAlternates(). This deals with odd uses
+ of multiple occupancies like 3pga and the instance where
+ the alternates are all grouped at the end of the file.
+*/
+PDB *ReadPDB(FILE *fp,
+ int *natom)
+{
+ PDB *pdb;
+ pdb = doReadPDB(fp, natom, TRUE, 1, 1);
+ pdb = RemoveAlternates(pdb);
+ return(pdb);
+}
+
+/************************************************************************/
+/*>PDB *ReadPDBAll(FILE *fp, int *natom)
+ -------------------------------------
+ Input: FILE *fp A pointer to type FILE in which the
+ .PDB file is stored.
+ Output: int *natom Number of atoms read. -1 if error.
+ Returns: PDB *pdb A pointer to the first allocated item of
+ the PDB linked list
+
+ Reads a PDB file into a PDB linked list. Reads all partial occupancy
+ atoms. Reads both ATOM and HETATM records.
+
+ 04.10.94 Original By: ACRM
+ 06.03.95 Added value for NMR model to read (0 = all)
+*/
+PDB *ReadPDBAll(FILE *fp,
+ int *natom)
+{
+ return(doReadPDB(fp, natom, TRUE, 0, 0));
+}
+
+/************************************************************************/
+/*>PDB *ReadPDBAtoms(FILE *fp, int *natom)
+ ---------------------------------------
+ Input: FILE *fp A pointer to type FILE in which the
+ .PDB file is stored.
+ Output: int *natom Number of atoms read. -1 if error.
+ Returns: PDB *pdb A pointer to the first allocated item of
+ the PDB linked list
+
+ Reads a PDB file into a PDB linked list. Atoms only (no HETATM cards).
+
+ 08.07.93 Written as entry for doReadPDB()
+ 09.07.93 Modified to return pointer to PDB
+ 17.03.94 Modified to handle OccRank
+ 06.03.95 Added value for NMR model to read (1 = first)
+ 25.01.06 Added call to RemoveAlternates(). This deals with odd uses
+ of multiple occupancies like 3pga and the instance where
+ the alternates are all grouped at the end of the file.
+*/
+PDB *ReadPDBAtoms(FILE *fp,
+ int *natom)
+{
+ PDB *pdb;
+ pdb = doReadPDB(fp, natom, FALSE, 1, 1);
+ pdb = RemoveAlternates(pdb);
+ return(pdb);
+}
+
+/************************************************************************/
+/*>PDB *ReadPDBOccRank(FILE *fp, int *natom, int OccRank)
+ ------------------------------------------------------
+ Input: FILE *fp A pointer to type FILE in which the
+ .PDB file is stored.
+ int OccRank Occupancy ranking (>=1)
+ Output: int *natom Number of atoms read. -1 if error.
+ Returns: PDB *pdb A pointer to the first allocated item of
+ the PDB linked list
+
+ Reads a PDB file into a PDB linked list selecting the OccRank'th
+ highest occupancy atoms
+
+ 17.03.94 Original By: ACRM
+ 06.03.95 Added value for NMR model to read (1 = first)
+*/
+PDB *ReadPDBOccRank(FILE *fp, int *natom, int OccRank)
+{
+ return(doReadPDB(fp, natom, TRUE, OccRank, 1));
+}
+
+/************************************************************************/
+/*>PDB *ReadPDBAtomsOccRank(FILE *fp, int *natom, int OccRank)
+ -----------------------------------------------------------
+ Input: FILE *fp A pointer to type FILE in which the
+ .PDB file is stored.
+ int OccRank Occupancy ranking (>=1)
+ Output: int *natom Number of atoms read. -1 if error.
+ Returns: PDB *pdb A pointer to the first allocated item of
+ the PDB linked list
+
+ Reads a PDB file into a PDB linked list ignoring HETATM records
+ and selecting the OccRank'th highest occupancy atoms
+
+ 17.03.94 Original By: ACRM
+ 06.03.95 Added value for NMR model to read (1 = first)
+*/
+PDB *ReadPDBAtomsOccRank(FILE *fp, int *natom, int OccRank)
+{
+ return(doReadPDB(fp, natom, FALSE, OccRank, 1));
+}
+
+/************************************************************************/
+/*>PDB *doReadPDB(FILE *fp, int *natom, BOOL AllAtoms, int OccRank,
+ int ModelNum)
+ ----------------------------------------------------------------
+ Input: FILE *fp A pointer to type FILE in which the
+ .PDB file is stored.
+ BOOL AllAtoms TRUE: ATOM & HETATM records
+ FALSE: ATOM records only
+ int OccRank Occupancy ranking
+ int ModelNum NMR Model number (0 = all)
+ Output: int *natom Number of atoms read. -1 if error.
+ Returns: PDB *pdb A pointer to the first allocated item of
+ the PDB linked list
+
+ Reads a PDB file into a PDB linked list. The OccRank value indicates
+ occupancy ranking to read for partial occupancy atoms.
+ If any partial occupancy atoms are read the global flag
+ gPDBPartialOcc is set to TRUE.
+
+ 04.11.88 V1.0 Original
+ 07.02.89 V1.1 Ignore records which aren't ATOM or HETATM
+ 28.03.90 V1.2 Altered field widths to match PDB standard better
+ See notes above for deviations
+ 28.06.90 V1.2a Buffer size increased to 85 chars.
+ 15.02.91 V1.2b Changed comment header to match new standard.
+ 07.01.92 V1.3 Ignores blank lines properly
+ 11.05.92 V1.4 Check on EOF in while() loop, memset() buffer.
+ ANSIfied.
+ 01.06.92 V1.5 Documented for autodoc
+ 19.06.92 V1.6 Corrected use of stdlib
+ 01.10.92 V1.7 Changed to use fgets()
+ 10.06.93 V1.9 Returns 0 on failure rather than exiting
+ Replaced SIZE with sizeof(PDB) directly
+ 17.06.93 V2.0 Rewritten to use fsscanf()
+ 08.07.93 V2.1 Split from ReadPDB()
+ 09.07.93 V2.2 Modified to return pointer to PDB. Rewrote allocation
+ scheme.
+ 17.03.94 V2.3 Handles partial occupancies
+ Sets natom to -1 if there was an error to distinguish
+ from no atoms.
+ Handles atom names which start in column 13 rather
+ than column 14. This is allowed in the standard, but
+ very rare.
+ Sets flag for partials.
+ 06.04.94 V2.4 Atom names starting in column 13 have their first
+ character moved to the end if it is a digit.
+ 03.10.94 V2.5 Check residue number as well as atom name when running
+ through alternative atoms for partial occupancy
+ Moved increment of NPartial, so only done if there
+ is space in the array. If OccRank is 0, all atoms are
+ read regardless of occupancy.
+ 06.03.95 V2.7 Added value for NMR model to read (0 = all)
+ No longer static. Sets gPDBMultiNMR if ENDMDL records
+ found.
+ 13.01.97 V2.8 Added check on return from fsscanf. Blank lines used
+ to result in duplication of the previous line since
+ fsscanf() does not reset the variables on receiving
+ a blank line. Also fixed in fsscanf().
+ 25.02.98 V2.9 Added code to read gzipped PDB files transparently
+ when GUNZIP_SUPPORT is defined
+ 17.08.98 V2.10 Added cast to popen() for SunOS
+ 08.10.99 V2.11 Initialise CurIns and CurRes
+ 15.02.01 V2.12 Added atnam_raw
+ 27.04.05 V2.14 Added another atnam_raw for multiple occupancies
+ 03.06.05 V2.15 Added altpos
+ 14.10.05 V2.16 Modified detection of partial occupancy. Handles
+ residues like 1zeh/B16 where a lower partial is
+ erroneously set to zero
+ 05.06.07 V2.19 Added support for Unix compress'd files
+*/
+PDB *doReadPDB(FILE *fpin,
+ int *natom,
+ BOOL AllAtoms,
+ int OccRank,
+ int ModelNum)
+{
+ char record_type[8],
+ atnambuff[8],
+ *atnam,
+ atnam_raw[8],
+ resnam[8],
+ chain[4],
+ insert[4],
+ buffer[160],
+ CurAtom[8],
+ cmd[80],
+ CurIns = ' ',
+ altpos;
+ int atnum,
+ resnum,
+ CurRes = 0,
+ NPartial,
+ ModelCount = 1;
+ FILE *fp = fpin;
+ double x,y,z,
+ occ,
+ bval;
+ PDB *pdb = NULL,
+ *p,
+ multi[MAXPARTIAL]; /* Temporary storage for partial occ */
+
+#ifdef GUNZIP_SUPPORT
+ int signature[3],
+ i,
+ ch;
+#endif
+
+ *natom = 0;
+ CurAtom[0] = '\0';
+ NPartial = 0;
+ gPDBPartialOcc = FALSE;
+ gPDBMultiNMR = FALSE;
+ cmd[0] = '\0';
+
+#ifdef GUNZIP_SUPPORT
+ /* See whether this is a gzipped file */
+ for(i=0; i<3; i++)
+ signature[i] = fgetc(fpin);
+ for(i=2; i>=0; i--)
+ ungetc(signature[i], fpin);
+ if(((signature[0] == (int)0x1F) && /* gzip */
+ (signature[1] == (int)0x8B) &&
+ (signature[2] == (int)0x08)) ||
+ ((signature[0] == (int)0x1F) && /* 05.06.07 compress */
+ (signature[1] == (int)0x9D) &&
+ (signature[2] == (int)0x90)))
+ {
+ /* It is gzipped so we'll open gunzip as a pipe and send the data
+ through that into a temporary file
+ */
+ sprintf(cmd,"gunzip >/tmp/readpdb_%d",(int)getpid());
+ if((fp = (FILE *)popen(cmd,"w"))==NULL)
+ {
+ *natom = (-1);
+ return(NULL);
+ }
+ while((ch=fgetc(fpin))!=EOF)
+ fputc(ch, fp);
+ pclose(fp);
+
+ /* We now reopen the temporary file as our PDB input file */
+ sprintf(cmd,"/tmp/readpdb_%d",(int)getpid());
+ if((fp = fopen(cmd,"r"))==NULL)
+ {
+ *natom = (-1);
+ return(NULL);
+ }
+ }
+#endif
+
+ while(fgets(buffer,159,fp))
+ {
+ if(ModelNum != 0) /* We are interested in model numbers */
+ {
+ if(!strncmp(buffer,"ENDMDL",6))
+ {
+ ModelCount++;
+ }
+
+ if(ModelCount < ModelNum) /* Haven't reached the right model */
+ continue;
+ else if(ModelCount > ModelNum) /* Gone past the right model */
+ break;
+ }
+
+ if(!strncmp(buffer,"ENDMDL",6))
+ gPDBMultiNMR = TRUE;
+
+ if(fsscanf(buffer,"%6s%5d%1x%5s%4s%1s%4d%1s%3x%8lf%8lf%8lf%6lf%6lf",
+ record_type,&atnum,atnambuff,resnam,chain,&resnum,insert,
+ &x,&y,&z,&occ,&bval) != EOF)
+ {
+ if((!strncmp(record_type,"ATOM ",6)) ||
+ (!strncmp(record_type,"HETATM",6) && AllAtoms))
+ {
+ /* Copy the raw atom name */
+ /* 03.06.05 Note: this reads the alternate atom position as
+ well as the atom name - changes in FixAtomName() now strip
+ that
+ We now copy only the first 4 characters into atnam_raw and
+ put the 5th character into altpos
+ */
+ strncpy(atnam_raw, atnambuff, 4);
+ atnam_raw[4] = '\0';
+ altpos = atnambuff[4];
+
+ /* Fix the atom name accounting for start in column 13 or 14*/
+ atnam = FixAtomName(atnambuff, occ);
+
+ /* Check for full occupancy. If occupancy is 0.0 assume that
+ it is actually fully occupied; the column just hasn't been
+ filled in correctly
+
+ 04.10.94 Read all atoms if OccRank is 0
+
+ 14.10.05 Now takes an atom as full occupancy:
+ if occ==1.0
+ if occ==0.0 and altpos==' '
+ if OccRank==0
+ This fixes problems where a lower (partial)
+ occupancy has erroneously been set to zero
+ */
+ if(((altpos == ' ') && (occ < (double)SMALL)) ||
+ (occ > (double)0.999) ||
+ (OccRank == 0))
+ {
+ /* Trim the atom name to 4 characters */
+ atnam[4] = '\0';
+
+ if(NPartial != 0)
+ {
+ if(!StoreOccRankAtom(OccRank,multi,NPartial,&pdb,&p,
+ natom))
+ {
+ if(pdb != NULL) FREELIST(pdb, PDB);
+ *natom = (-1);
+ if(cmd[0]) unlink(cmd);
+ return(NULL);
+ }
+
+ /* Set partial occupancy counter to 0 */
+ NPartial = 0;
+ }
+
+ /* Allocate space in the linked list */
+ if(pdb == NULL)
+ {
+ INIT(pdb, PDB);
+ p = pdb;
+ }
+ else
+ {
+ ALLOCNEXT(p, PDB);
+ }
+
+ /* Failed to allocate space; free up list so far & return*/
+ if(p==NULL)
+ {
+ if(pdb != NULL) FREELIST(pdb, PDB);
+ *natom = (-1);
+ if(cmd[0]) unlink(cmd);
+ return(NULL);
+ }
+
+ /* Increment the number of atoms */
+ (*natom)++;
+
+ /* Store the information read */
+ p->atnum = atnum;
+ p->resnum = resnum;
+ p->x = (REAL)x;
+ p->y = (REAL)y;
+ p->z = (REAL)z;
+ p->occ = (REAL)occ;
+ p->bval = (REAL)bval;
+ p->altpos = altpos; /* 03.06.05 Added this one */
+ p->next = NULL;
+ strcpy(p->record_type, record_type);
+ strcpy(p->atnam, atnam);
+ strcpy(p->atnam_raw, atnam_raw);
+ strcpy(p->resnam, resnam);
+ strcpy(p->chain, chain);
+ strcpy(p->insert, insert);
+ }
+ else /* Partial occupancy */
+ {
+ /* Set flag to say we've got a partial occupancy atom */
+ gPDBPartialOcc = TRUE;
+
+ /* First in a group, store atom name */
+ if(NPartial == 0)
+ {
+ CurIns = insert[0];
+ CurRes = resnum;
+ strncpy(CurAtom,atnam,8);
+ }
+
+ if(strncmp(CurAtom,atnam,strlen(CurAtom)-1) ||
+ resnum != CurRes ||
+ CurIns != insert[0])
+ {
+ /* Atom name has changed
+ Select and store the OccRank highest occupancy atom
+ */
+ if(!StoreOccRankAtom(OccRank,multi,NPartial,&pdb,&p,
+ natom))
+ {
+ if(pdb != NULL) FREELIST(pdb, PDB);
+ *natom = (-1);
+ if(cmd[0]) unlink(cmd);
+ return(NULL);
+ }
+
+ /* Reset the partial atom counter */
+ NPartial = 0;
+ strncpy(CurAtom,atnam,8);
+ CurRes = resnum;
+ CurIns = insert[0];
+ }
+
+ if(NPartial < MAXPARTIAL)
+ {
+ /* Store the partial atom data */
+ multi[NPartial].atnum = atnum;
+ multi[NPartial].resnum = resnum;
+ multi[NPartial].x = (REAL)x;
+ multi[NPartial].y = (REAL)y;
+ multi[NPartial].z = (REAL)z;
+ multi[NPartial].occ = (REAL)occ;
+ multi[NPartial].bval = (REAL)bval;
+ multi[NPartial].next = NULL;
+ strcpy(multi[NPartial].record_type, record_type);
+ strcpy(multi[NPartial].atnam, atnam);
+ /* 27.04.05 - added this line */
+ strcpy(multi[NPartial].atnam_raw, atnam_raw);
+ strcpy(multi[NPartial].resnam, resnam);
+ strcpy(multi[NPartial].chain, chain);
+ strcpy(multi[NPartial].insert, insert);
+ /* 03.06.05 - added this line */
+ multi[NPartial].altpos = altpos;
+
+ NPartial++;
+ }
+ }
+ }
+ }
+ }
+
+ if(NPartial != 0)
+ {
+ if(!StoreOccRankAtom(OccRank,multi,NPartial,&pdb,&p,natom))
+ {
+ if(pdb != NULL) FREELIST(pdb, PDB);
+ *natom = (-1);
+ if(cmd[0]) unlink(cmd);
+ return(NULL);
+ }
+ }
+
+ if(cmd[0]) unlink(cmd);
+
+ /* Return pointer to start of linked list */
+ return(pdb);
+}
+
+/************************************************************************/
+/*>static BOOL StoreOccRankAtom(int OccRank, PDB multi[MAXPARTIAL],
+ int NPartial, PDB **ppdb, PDB **pp,
+ int *natom)
+ ----------------------------------------------------------------
+ Input: int OccRank Occupancy ranking required (>=1)
+ PDB multi[] Array of PDB records for alternative atom
+ positions
+ int NPartial Number of items in multi array
+ I/O: PDB **ppdb Start of PDB linked list (or NULL)
+ PDB **pp Current position in PDB linked list (or NULL)
+ int *natom Number of atoms read
+ Returns: BOOL Memory allocation success
+
+ Takes an array of PDB records which represent alternative atom
+ positions for an atom. Select the OccRank'th highest occupancy and
+ add this one into the PDB linked list.
+
+ To be called by doReadPDB().
+
+ 17.03.94 Original By: ACRM
+ 08.10.99 Initialise IMaxOcc and MaxOcc
+ 27.04.05 Added atnam_raw
+ 03.06.05 Added altpos
+ 14.10.05 Modified the flag value from 0.0 to -1.0 so that erroneous
+ lower occupancies of 0.0 are read properly and written back
+ with their occupancy (0.0) rather than the next higher
+ occupancy. Handles residues like 1zeh/B16
+*/
+static BOOL StoreOccRankAtom(int OccRank, PDB multi[MAXPARTIAL],
+ int NPartial, PDB **ppdb, PDB **pp,
+ int *natom)
+{
+ int i,
+ j,
+ IMaxOcc = 0;
+ REAL MaxOcc = (REAL)0.0,
+ LastOcc = (REAL)0.0;
+
+ if(OccRank < 1) OccRank = 1;
+
+ for(i=0; i<OccRank; i++)
+ {
+ MaxOcc = (REAL)0.0;
+ IMaxOcc = 0;
+
+ for(j=0; j<NPartial; j++)
+ {
+ if(multi[j].occ >= MaxOcc)
+ {
+ MaxOcc = multi[j].occ;
+ IMaxOcc = j;
+ }
+ }
+ /* 14.10.05 Changed flag value to -1 so that erroneous occupancies
+ of zero are treated properly
+ */
+ multi[IMaxOcc].occ = (REAL)-1.0;
+
+ /* 14.10.05 Changed flag value to -1 so that erroneous occupancies
+ of zero are treated properly
+ */
+ if(MaxOcc < (REAL)0.0) break;
+ LastOcc = MaxOcc;
+ }
+
+ /* If we ran out of rankings, take the last one to be found */
+ /* 14.10.05 Changed flag value to -1 so that erroneous occupancies
+ of zero are treated properly
+ */
+ if(MaxOcc < (REAL)0.0)
+ MaxOcc = LastOcc;
+
+ /* Store this atom
+ Allocate space in the linked list
+ */
+ if(*ppdb == NULL)
+ {
+ INIT((*ppdb), PDB);
+ *pp = *ppdb;
+ }
+ else
+ {
+ ALLOCNEXT(*pp, PDB);
+ }
+
+ /* Failed to allocate space; error return. */
+ if(*pp==NULL)
+ return(FALSE);
+
+ /* Increment the number of atoms */
+ (*natom)++;
+
+ /* Store the information read */
+ (*pp)->atnum = multi[IMaxOcc].atnum;
+ (*pp)->resnum = multi[IMaxOcc].resnum;
+ (*pp)->x = multi[IMaxOcc].x;
+ (*pp)->y = multi[IMaxOcc].y;
+ (*pp)->z = multi[IMaxOcc].z;
+ (*pp)->occ = MaxOcc;
+ (*pp)->bval = multi[IMaxOcc].bval;
+ (*pp)->next = NULL;
+ /* 03.06.05 Added this line */
+ (*pp)->altpos = multi[IMaxOcc].altpos;
+ strcpy((*pp)->record_type, multi[IMaxOcc].record_type);
+ strcpy((*pp)->atnam, multi[IMaxOcc].atnam);
+ /* 27.04.05 Added this line */
+ strcpy((*pp)->atnam_raw, multi[IMaxOcc].atnam_raw);
+ strcpy((*pp)->resnam, multi[IMaxOcc].resnam);
+ strcpy((*pp)->chain, multi[IMaxOcc].chain);
+ strcpy((*pp)->insert, multi[IMaxOcc].insert);
+
+ /* Patch the atom name to remove the alternate letter */
+ if(strlen((*pp)->atnam) > 4)
+ ((*pp)->atnam)[4] = '\0';
+ else
+ ((*pp)->atnam)[3] = ' ';
+
+ return(TRUE);
+}
+
+/************************************************************************/
+/*>char *FixAtomName(char *name, REAL occup)
+ -----------------------------------------
+ Input: char *name Atom name read from file
+ REAL occup Occupancy to allow fixing of partial occupancy
+ atom names
+ Returns: char * Fixed atom name (pointer into name)
+
+ Fixes an atom name by removing leading spaces, or moving a leading
+ digit to the end of the string. Used by doReadPDB()
+
+ 06.04.94 Original By: ACRM
+ 01.03.01 No longer static
+ 03.06.05 The name passed in has always contained the column which is
+ officially the alternate atom position indicator, but is
+ used by some programs as part of the atom name. Thus the
+ properly constructed variable coming into the routine should
+ be something like '1HG1 ' or '1HG1A' for an alternate atom
+ position. However some programs use ' HG11'. Therefore we
+ now check for a character in the last position and replace
+ it with a space if there is a space in the preceding
+ position (e.g. ' CA A' -> ' CA ') or if there is a
+ character in the first position (e.g. '1HG1A' -> '1HG1 ')
+ or if the occupancy is not zero/one
+
+ NOTE!!! To support this, the routine now has a second
+ parameter: REAL occup
+*/
+char *FixAtomName(char *name, REAL occup)
+{
+ char *newname;
+ int len;
+
+ /* Default behaviour, just return the input string */
+ newname = name;
+
+ if(name[0] == ' ') /* Name starts in column 14 */
+ {
+ /* remove leading spaces */
+ KILLLEADSPACES(newname,name);
+ /* 03.06.05 If the last-but-one position is a space, force the last
+ position (the alternate atom indicator) to be a space
+ */
+ if(newname[2] == ' ')
+ {
+ newname[3] = ' ';
+ }
+ }
+ else /* Name starts in column 13 */
+ {
+ /* 03.06.05 The last character is the alternate atom indicator,
+ so force it to be a space
+ */
+ name[4] = ' ';
+
+ /* If the first character is a digit, move it to the end */
+ if(isdigit(name[0]))
+ {
+ if((len = chindex(name,' ')) == (-1))
+ {
+ /* We didn't find a space in the name, so add the character
+ onto the end of the string and re-terminate
+ */
+ len = strlen(name);
+ newname = name+1;
+ name[len] = name[0];
+ name[len+1] = '\0';
+ }
+ else
+ {
+ /* We did find a space in the name, so put the first
+ character there
+ */
+ newname = name+1;
+ name[len] = name[0];
+ }
+ }
+ }
+ return(newname);
+}
+
+/************************************************************************/
+/*>PDB *RemoveAlternates(PDB *pdb)
+ -------------------------------
+ I/O: PDB *pdb PDB
+ Returns: PDB * Amended linked list (in case start has
+ changed)
+
+ Remove alternate atoms - we keep only the highest occupancy, or the
+ first if several have the same occupancy.
+
+ 25.01.05 Original based on code written for Inpharmatica By: ACRM
+*/
+PDB *RemoveAlternates(PDB *pdb)
+{
+ PDB *p,
+ *q,
+ *r,
+ *s,
+ *s_prev,
+ *r_prev,
+ *a_prev,
+ *next,
+ *alts[MAXPARTIAL];
+ int i,
+ altCount,
+ highest;
+
+
+ /* Step through residues */
+ r_prev=NULL;
+ for(p=pdb; p!=NULL; p=q)
+ {
+ q=FindNextResidue(p);
+
+ /* Step through atoms */
+ for(r=p; r!=q; NEXT(r))
+ {
+ if(r->altpos != ' ')
+ {
+#ifdef DEBUG
+ fprintf(stderr,"\n\nAlt pos found for record:\n");
+ WritePDBRecord(stderr, r);
+#endif
+ /* We have an alternate, store it and search for the other
+ ones
+ */
+ altCount=0;
+ alts[altCount++] = r;
+ /* Search through this residue for the alternates.
+ This will work for 99.9% of files where the alternates are
+ with the main atoms
+ */
+ for(s=r->next; s!=q; NEXT(s))
+ {
+ if(!strcmp(s->atnam_raw, alts[0]->atnam_raw))
+ {
+ if(altCount < MAXPARTIAL)
+ {
+ alts[altCount++] = s;
+#ifdef DEBUG
+ fprintf(stderr,"Partner atom found in res:\n");
+ WritePDBRecord(stderr, s);
+#endif
+ }
+ else
+ {
+ fprintf(stderr,"Warning==> More than %d alternative \
+conformations in\n", MAXPARTIAL);
+ fprintf(stderr," residue %c%d%c atom %s. \
+Increase MAXPARTIAL in ReadPDB.c\n", s->chain[0],
+ s->resnum,
+ s->insert[0],
+ s->atnam);
+ }
+ }
+ }
+ /* If we didn't find the alternates within the residue, then
+ we search the rest of the records.
+ This covers the known entry where the alternates are shoved
+ on the end instead!
+ */
+ if(altCount<2)
+ {
+#ifdef DEBUG
+ fprintf(stderr,"No partner found in residue\n");
+#endif
+
+ s_prev = NULL;
+ for(s=q; s!=NULL; NEXT(s))
+ {
+ if((s->resnum == alts[0]->resnum) &&
+ (s->insert[0] == alts[0]->insert[0]) &&
+ (s->chain[0] == alts[0]->chain[0]) &&
+ !strcmp(s->atnam_raw, alts[0]->atnam_raw))
+ {
+ if(altCount < MAXPARTIAL)
+ {
+ alts[altCount++] = s;
+#ifdef DEBUG
+ fprintf(stderr,"Partner found outside \
+residue:\n");
+ WritePDBRecord(stderr, s);
+#endif
+ }
+ else
+ {
+ fprintf(stderr,"Warning==> More than %d \
+alternative conformations in\n", MAXPARTIAL);
+ fprintf(stderr," residue %c%d%c atom \
+%s. Increase MAXPARTIAL in ReadPDB.c\n", s->chain[0],
+ s->resnum,
+ s->insert[0],
+ s->atnam);
+
+ /* Move this record to the correct position in the
+ linked list
+
+ First unlink s from its old position
+ */
+ if(s_prev != NULL)
+ s_prev->next = s->next;
+
+ /* Now link it back in where it should be */
+ next = r->next;
+ r->next = s;
+ s->next = next;
+ }
+ }
+ s_prev = s;
+ }
+ }
+
+ if(altCount < 2)
+ {
+#ifdef DEBUG
+ fprintf(stderr,"No alternates found. Resetting ALT \
+flag\n\n");
+#endif
+ alts[0]->altpos = ' ';
+
+ }
+ else
+ {
+ /* Find the highest occupancy, defaulting to the first */
+ highest = 0;
+ for(i=0; i<altCount; i++)
+ {
+ if(alts[i]->occ > alts[highest]->occ)
+ highest = i;
+ }
+
+ /* Delete the unwanted alternates */
+ for(i=0; i<altCount; i++)
+ {
+ if(i==highest) /* For the highest remove the ALT flag */
+ {
+#ifdef DEBUG
+ fprintf(stderr,"Highest occupancy selected:\n");
+ WritePDBRecord(stderr, alts[i]);
+#endif
+ alts[i]->altpos = ' ';
+ }
+ else
+ {
+ /* If we are deleting the current record pointer,
+ then we need to update it
+ */
+ if(alts[i] == r)
+ {
+#ifdef DEBUG
+ fprintf(stderr,"Deleting current record \
+pointer\n");
+#endif
+
+ if(r_prev == NULL)
+ {
+ r_prev = r;
+ NEXT(r);
+ /* We are deleting the head of the list so we
+ must update the main list pointer
+ */
+ pdb = r;
+ }
+ else
+ {
+ r = r_prev;
+ FINDPREV(r_prev, pdb, r);
+ }
+ }
+
+ /* Delete the alternate we don't need */
+#ifdef DEBUG
+ fprintf(stderr,"Deleting Alt pos record:\n");
+ WritePDBRecord(stderr, alts[i]);
+#endif
+
+ FINDPREV(a_prev, pdb, alts[i]);
+ if(a_prev != NULL)
+ a_prev->next = alts[i]->next;
+ free(alts[i]);
+
+ } /* Not the highest, so we delete it */
+ } /* Stepping through the alternates */
+ }
+
+ } /* We have an alternate */
+ r_prev = r;
+ } /* Stepping through the atoms of this residue */
+ } /* Stepping through the residues */
+ return(pdb);
+}
+
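The occupancy-rank selection described for StoreOccRankAtom() in the ReadPDB.c hunk above can be sketched in isolation. This is a minimal illustration, not the bioplib code: the helper name occ_rank_index() and the MAX_ALT constant are hypothetical, the input is a plain array of occupancies rather than PDB records, and ties here go to the earlier entry. A used-flag array stands in for the -1.0 sentinel so the caller's data is left untouched.

```c
/* Hypothetical stand-in for MAXPARTIAL in ReadPDB.c */
#define MAX_ALT 8

/* Hypothetical helper (not part of bioplib).
   Returns the index of the rank'th highest occupancy in occ[0..n-1],
   where rank 1 is the highest. Ties go to the earlier index.
   A rank below 1 is treated as 1; a rank above n selects the last
   ranking available (the lowest occupancy).
   Returns -1 if n is out of range. */
int occ_rank_index(const double *occ, int n, int rank)
{
    int used[MAX_ALT] = {0};   /* marks entries already ranked */
    int pick = -1;
    int i, j;

    if (n <= 0 || n > MAX_ALT) return -1;
    if (rank < 1) rank = 1;
    if (rank > n) rank = n;    /* ran out of rankings: take the last */

    for (i = 0; i < rank; i++)
    {
        pick = -1;
        for (j = 0; j < n; j++)
        {
            /* Highest occupancy among the entries not yet ranked */
            if (!used[j] && (pick < 0 || occ[j] > occ[pick]))
                pick = j;
        }
        used[pick] = 1;
    }
    return pick;
}
```

Because entries are excluded by flag rather than by overwriting the occupancy, an alternate whose occupancy was recorded as 0.0 can still be ranked and returned, which is the case the 14.10.05 change notes are concerned with.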
diff --git a/src/bioplib/ReadPIR.c b/src/bioplib/ReadPIR.c
new file mode 100644
index 0000000..3da2759
--- /dev/null
+++ b/src/bioplib/ReadPIR.c
@@ -0,0 +1,445 @@
+/*************************************************************************
+
+ Program:
+ File: ReadPIR.c
+
+ Version: V2.7R
+ Date: 06.02.96
+ Function: Read a PIR sequence file
+
+ Copyright: (c) SciTech Software 1991-6
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+ int ReadPIR(FILE *fp, BOOL DoInsert, char **seqs, int maxchain,
+ SEQINFO *seqinfo, BOOL *punct, BOOL *error)
+ ---------------------------------------------------------------
+ This version attempts to read any PIR file following the PIR
+ specifications. It also accepts a few non-standard features:
+ lower case sequence, no star at end of last chain, dashes in the
+ sequence to indicate insertions.
+
+ See also:
+ int SimpleReadPIR(FILE *fp, int maxres, char **seqs)
+ int ReadRawPIR(FILE *fp, BOOL DoInsert, char **seqs, int maxchain,
+ SEQINFO *seqinfo, BOOL *punct, BOOL *error)
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 01.06.92 Original
+ V2.0 08.03.94 Changed name of ReadPIR() to ReadSimplePIR()
+ Added new ReadPIR().
+ V2.1 18.03.94 getc() -> fgetc()
+ V2.2 11.05.94 Changes to ReadPIR() for better compatibility with
+ PIR V38.0 and V39.0
+ V2.3 28.02.95 Added ReadRawPIR()
+ V2.4 13.03.95 Fixed bug in reading text lines in ReadRawPIR()
+ V2.5 26.07.95 Removed unused variables
+ V2.6 30.10.95 Cosmetic
+ V2.7 06.02.96 Removes trailing spaces from comment line
+
+*************************************************************************/
+/* Includes
+*/
+#include <string.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <ctype.h>
+
+#include "SysDefs.h"
+#include "macros.h"
+#include "seq.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+/************************************************************************/
+/*>int ReadPIR(FILE *fp, BOOL DoInsert, char **seqs, int maxchain,
+ SEQINFO *seqinfo, BOOL *punct, BOOL *error)
+ ------------------------------------------------------------------
+ Input: FILE *fp File pointer
+ BOOL DoInsert TRUE Read - characters into the sequence
+ FALSE Skip - characters
+ int maxchain Max number of chains to read. This is the
+ dimension of the seqs array.
+ N.B. THIS SHOULD BE AT LEAST 1 MORE THAN
+ THE EXPECTED MAXIMUM NUMBER OF SEQUENCES
+ Output: char **seqs Array of character pointers which will
+ be filled in with sequence information.
+ Memory will be allocated for any sequence
+ length.
+ SEQINFO *seqinfo This structure will be filled in with
+ extra information about the sequence.
+ Header & title information and details
+ of any punctuation.
+ BOOL *punct TRUE if any punctuation found.
+ BOOL *error TRUE if an error occurred (e.g. memory
+ allocation)
+ Returns: int Number of chains in this sequence.
+ 0 if file ended, or no valid sequence
+ entries found.
+
+ This is an all-singing, all-dancing PIR reader which should handle
+ all legal PIR files and some (slightly) incorrect ones. The only
+ requirements of the code are that the PIR file should have 2 title
+ lines per entry, the first line starting with a > sign.
+
+ The routine will handle multiple sequence files. Successive calls
+ will return information on the next entry. The routine will return
+ 0 when there are no more entries.
+
+ Header line: Must start with >. Will handle files which don't have
+ the proper P1; or F1; parts of the header as well as those which
+ do.
+
+ Title line: Will read the name and source fields if correctly
+ separated by a -, otherwise copies all information into the name.
+
+ Sequence: May contain allowed punctuation. This will set the punct
+ flag and information on the types found will be placed in seqinfo.
+ White space and line breaks are ignored. Each chain should end with
+ a *, but the routine will accept the last chain of an entry with no
+ *. While the standard requires upper case text, this routine will
+ handle lower case and convert it to upper case. While the routine
+ does pretty well at last chains not terminated with a *, a last
+ chain ending with a / not followed by a * but followed by a text
+ line will be identified as incomplete rather than truncated.
+ If the DoInsert flag is set, - signs in the sequence will be
+ read as part of the sequence, otherwise they will be skipped. This
+ is an addition to the PIR standard.
+
+ Text lines: Text lines after an entry (beginning with R;, C;, A;,
+ N; or F;) are ignored.
+
+ 02.03.94 Original By: ACRM
+ 03.03.94 Added / and = handling, upcasing, strcpy()->strncpy(),
+ header lines without semi-colon, title lines without -
+ 07.03.94 Added sequence insertion handling and DoInsert parameter.
+ 11.05.94 buffer is now 504 characters (V38.0 spec allows 500 chars)
+ Removes leading spaces from entry code and terminates at
+ first space (V39.0 spec allows comments after the code).
+ 28.02.95 Added check that buffer doesn't overflow. Check on nseq
+ changed to >=
+ 06.02.96 Removes trailing spaces from comment line
+*/
+int ReadPIR(FILE *fp, BOOL DoInsert, char **seqs, int maxchain,
+ SEQINFO *seqinfo, BOOL *punct, BOOL *error)
+{
+ int ch,
+ i,
+ chpos,
+ nseq = 0,
+ ArraySize,
+ SeqPos;
+ char buffer[504],
+ *ptr;
+ BOOL InParen,
+ GotStar;
+
+ /* Initialise error and punct outputs */
+ *error = FALSE;
+ *punct = FALSE;
+
+ /* Initialise seqinfo structure */
+ if(seqinfo != NULL)
+ {
+ seqinfo->code[0] = '\0';
+ seqinfo->name[0] = '\0';
+ seqinfo->source[0] = '\0';
+ seqinfo->fragment = FALSE;
+ seqinfo->paren = FALSE;
+ seqinfo->DotInParen = FALSE;
+ seqinfo->NonExpJoin = FALSE;
+ seqinfo->UnknownPos = FALSE;
+ seqinfo->Incomplete = FALSE;
+ seqinfo->Juxtapose = FALSE;
+ seqinfo->Truncated = FALSE;
+ }
+
+ /* Skip over any characters until the first > sign */
+ while((ch=fgetc(fp)) != EOF && ch != '>') ;
+
+ /* Check for end of file */
+ if(ch==EOF) return(0);
+
+ /* Read the rest of this line into a buffer */
+ i = 0;
+ while((ch=fgetc(fp)) != EOF && ch != '\n' && i<503)
+ buffer[i++] = (char)ch;
+ buffer[i] = '\0';
+
+ /* Check for end of file */
+ if(ch==EOF) return(0);
+
+ /* Set information in the seqinfo structure */
+ if(seqinfo != NULL)
+ {
+ /* Fragment flag */
+ if(buffer[2] == ';' && buffer[0] == 'F')
+ seqinfo->fragment = TRUE;
+ else
+ seqinfo->fragment = FALSE;
+
+ /* Entry code */
+ if(buffer[2] == ';')
+ {
+ KILLLEADSPACES(ptr,(buffer+3));
+ }
+ else
+ {
+ KILLLEADSPACES(ptr,buffer);
+ }
+
+ strncpy(seqinfo->code, ptr, 16);
+ seqinfo->code[15] = '\0';
+
+ /* Terminate entry code at first space since comments are allowed
+ after the entry code (V39.0 spec)
+ */
+ for(i=0; seqinfo->code[i]; i++)
+ {
+ if(seqinfo->code[i] == ' ' || seqinfo->code[i] == '\t')
+ {
+ seqinfo->code[i] = '\0';
+ break;
+ }
+ }
+ }
+
+ /* Now read the title line */
+ if(!fgets(buffer,240,fp))
+ return(0);
+ buffer[240] = '\0';
+
+ /* 06.02.96 Remove any trailing spaces */
+ KILLTRAILSPACES(buffer);
+
+ /* Set information in the seqinfo structure */
+ if(seqinfo)
+ {
+ TERMINATE(buffer);
+ /* If it's a fully legal PIR file, there will be a - in the middle
+ of the title line to separate name from source. If we don't
+ find one, we copy the whole line into the name
+ */
+ if((ptr = strstr(buffer," - ")) != NULL)
+ {
+ *ptr = '\0';
+ strncpy(seqinfo->source, ptr+3, 160);
+ seqinfo->source[159] = '\0';
+ }
+ strncpy(seqinfo->name, buffer, 160);
+ seqinfo->name[159] = '\0';
+ /* 06.02.96 Remove any trailing spaces */
+ KILLTRAILSPACES(seqinfo->name);
+ }
+
+ /* Read the actual sequence info. */
+ chpos = 0;
+ for(;;)
+ {
+ GotStar = FALSE;
+ InParen = FALSE;
+
+ /* Allocate some space for the sequence */
+ ArraySize = ALLOCSIZE;
+ if((seqs[nseq] = (char *)malloc(ArraySize * sizeof(char)))==NULL)
+ {
+ *error = TRUE;
+ return(0);
+ }
+
+ SeqPos = 0;
+
+ /* Read characters, storing sequence and handling any
+ punctuation
+ */
+ while((ch = fgetc(fp)) != EOF && ch != '*' && ch != '>')
+ {
+ chpos++;
+
+ if(isalpha(ch) || (ch == '-' && DoInsert))
+ {
+ /* This is a sequence entry (probably!) */
+ seqs[nseq][SeqPos++] = (isupper(ch) ? ch : toupper(ch));
+
+ /* If necessary, expand the sequence array */
+ if(SeqPos >= ArraySize)
+ {
+ ArraySize += ALLOCSIZE;
+ seqs[nseq] = (char *)realloc((void *)(seqs[nseq]),
+ ArraySize);
+ if(seqs[nseq] == NULL)
+ {
+ *error = TRUE;
+ return(0);
+ }
+ }
+ }
+ else if(ch == '/')
+ {
+ /* Sequence is incomplete or truncated */
+ *punct = TRUE;
+
+ if(seqinfo != NULL)
+ {
+ if(SeqPos == 0) /* It's the first character in a chain */
+ {
+
+ seqinfo->Truncated = TRUE;
+ }
+ else /* Not first, is it last? */
+ {
+ /* Skip spaces and newlines till we get the next real
+ character
+ */
+ while((ch = fgetc(fp)) != EOF &&
+ (ch == ' ' || ch == '\t' || ch == '\n')) ;
+ /* Replace the character in the input stream */
+ ungetc(ch,fp);
+
+ if(ch == '*' ||
+ ch == EOF ||
+ ch == '>') /* End of chain */
+ seqinfo->Truncated = TRUE;
+ else /* Middle of chain */
+ seqinfo->Incomplete = TRUE;
+ }
+ }
+ }
+ else if(ch == '=')
+ {
+ /* Parts of the sequence may be juxtaposed */
+ *punct = TRUE;
+ if(seqinfo != NULL) seqinfo->Juxtapose = TRUE;
+ }
+ else if(ch == '(')
+ {
+ /* Start of a region in parentheses */
+ InParen = TRUE;
+ *punct = TRUE;
+ if(seqinfo != NULL) seqinfo->paren = TRUE;
+ }
+ else if(ch == ')')
+ {
+ /* End of region in parentheses */
+ InParen = FALSE;
+ *punct = TRUE;
+ if(seqinfo != NULL) seqinfo->paren = TRUE;
+ }
+ else if(ch == '.')
+ {
+ *punct = TRUE;
+
+ if(InParen)
+ {
+ /* Previous aa >90% certain in position */
+ if(seqinfo != NULL) seqinfo->DotInParen = TRUE;
+ }
+ else
+ {
+ /* Join in sequence not known experimentally but is clear
+ from sequence homology.
+ */
+ if(seqinfo != NULL) seqinfo->NonExpJoin = TRUE;
+ }
+ }
+ else if(ch == ',')
+ {
+ /* Position of previous aa not known with confidence */
+ if(seqinfo != NULL) seqinfo->UnknownPos = TRUE;
+ }
+ else if(ch == '\n')
+ {
+ /* Start of new line, relevant to check on ; */
+ chpos = 0;
+ }
+ else if(ch == ';' && chpos == 2)
+ {
+ /* This is a text line, so the previous character wasn't
+ a sequence item
+ */
+ SeqPos--;
+
+ /* Ignore the rest of this line and reset chpos */
+ while((ch = fgetc(fp))!=EOF && ch != '\n') ;
+ chpos = 0;
+ }
+ } /* Reading this sequence */
+
+ /* Test the exit conditions from the read character loop */
+ if(ch == '*')
+ {
+ /* End of chain */
+ seqs[nseq][SeqPos] = '\0';
+ GotStar = TRUE;
+ if(++nseq >= maxchain)
+ {
+ *error = TRUE;
+ return(nseq);
+ }
+ }
+ else if(ch == '>')
+ {
+ /* Start of new entry */
+ ungetc(ch,fp);
+ break; /* Out of read for this sequence */
+ }
+ else if(ch == EOF)
+ {
+ /* End of file */
+ break; /* Out of read for this sequence */
+ }
+ } /* Loop on with this sequence (next chain) */
+
+
+ /* Now tidy up if we have an unfinished sequence */
+ if(!GotStar)
+ {
+ seqs[nseq][SeqPos] = '\0';
+ if(!strlen(seqs[nseq]))
+ free(seqs[nseq]);
+ else
+ nseq++;
+ }
+
+ return(nseq);
+}
+
+
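ReadPIR() above grows each sequence buffer in ALLOCSIZE steps and assigns the result of realloc() straight back to seqs[nseq], so the old block can no longer be freed if the call fails. The following is a minimal sketch of the same chunked-growth idea that keeps the old pointer on failure. SeqBuf, seqbuf_append() and CHUNK are hypothetical names used for illustration only; they are not part of bioplib.

```c
#include <assert.h>
#include <stdlib.h>

#define CHUNK 80   /* hypothetical growth step, standing in for ALLOCSIZE */

/* Growable, always NUL-terminated character buffer */
typedef struct
{
   char   *data;   /* NULL until the first append       */
   size_t len;     /* characters stored (excluding NUL) */
   size_t cap;     /* bytes currently allocated         */
} SeqBuf;

/* Append one character, growing the buffer by CHUNK bytes when there is
   no room left for the character plus its terminator.
   Returns 1 on success, 0 if realloc() fails; on failure the buffer is
   left untouched so the caller can still use or free it.
*/
int seqbuf_append(SeqBuf *b, char c)
{
   assert(b != NULL);

   if(b->len + 1 >= b->cap)
   {
      size_t newcap = b->cap + CHUNK;
      char   *tmp   = (char *)realloc(b->data, newcap);

      if(tmp == NULL)
         return(0);          /* b->data is still valid here */

      b->data = tmp;
      b->cap  = newcap;
   }

   b->data[b->len++] = c;
   b->data[b->len]   = '\0';
   return(1);
}
```

Starting from a zeroed SeqBuf, the first call allocates the first chunk because realloc(NULL, n) behaves like malloc(n). The caller releases the storage with free(b.data).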
diff --git a/src/bioplib/SelAtPDB.c b/src/bioplib/SelAtPDB.c
new file mode 100644
index 0000000..02687e6
--- /dev/null
+++ b/src/bioplib/SelAtPDB.c
@@ -0,0 +1,186 @@
+/*************************************************************************
+
+ Program:
+ File: SelAtPDB.c
+
+ Version: V1.8
+ Date: 04.02.09
+ Function: Select a subset of atom types from a PDB linked list
+
+ Copyright: (c) SciTech Software 1990-2009
+ Author: Dr. Andrew C. R. Martin
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+ pdbout = selectatoms(pdbin,nsel,sel,natom)
+
+ This routine takes a linked list of type PDB and returns a list
+ containing only those atom types specified in the sel array.
+
+ Input: pdbin *PDB Input list
+ nsel int Number of atom types to keep
+ sel **char List of atom types to keep
+ Output: natom *int Number of atoms kept
+ Returns: pdbout *PDB Output list
+
+ To set up the list of atoms to keep, define an array of pointers
+ to char:
+ e.g. char *sel[10]
+ Then define the atoms in the list thus:
+ SELECT(sel[0],"N   ");
+ SELECT(sel[1],"CA  ");
+ SELECT(sel[2],"C   ");
+ SELECT(sel[3],"O   ");
+ The SELECT macro returns a character pointer which will be NULL if
+ the allocation it performs fails.
+
+ N.B. The routines are non-destructive; i.e. the original PDB linked
+ list is intact after the selection process
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 01.03.90 Original By: ACRM
+ V1.1 28.03.90 Modified to match new version of pdb.h
+ V1.2 24.05.90 Fixed so the variables passed in as sel[] don't
+ *have* to be 4 chars.
+ V1.3 17.05.93 Modified for book. Returns BOOL.
+ V1.4 09.07.93 Modified to return PDB pointer. Changed allocation
+ scheme. Changed back to sel[] variables *must* be 4
+ chars.
+ V1.5 01.11.94 Added HStripPDB()
+ V1.6 26.07.95 Removed unused variables
+ V1.7 16.10.96 Added SelectCaPDB()
+ V1.8 04.02.09 SelectAtomsPDB(): Initialize q for fussy compilers
+
+*************************************************************************/
+/* Includes
+*/
+#include <string.h>
+#include <math.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "SysDefs.h"
+#include "MathType.h"
+
+#include "pdb.h"
+#include "macros.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+/************************************************************************/
+/*>PDB *SelectAtomsPDB(PDB *pdbin, int nsel, char **sel, int *natom)
+ -----------------------------------------------------------------
+ Input: pdbin *PDB Input list
+ nsel int Number of atom types to keep
+ sel **char List of atom types to keep
+ Output: natom *int Number of atoms kept
+ Returns: *PDB Output list
+
+ Takes a PDB linked list and returns a list containing only those atom
+ types specified in the sel array.
+
+ To set up the list of atoms to keep, define an array of pointers
+ to char:
+ e.g. char *sel[10]
+ Then define the atoms in the list thus:
+ SELECT(sel[0],"N   ");
+ SELECT(sel[1],"CA  ");
+ SELECT(sel[2],"C   ");
+ SELECT(sel[3],"O   ");
+ Ensure the spaces are used!!
+
+ N.B. The routine is non-destructive; i.e. the original PDB linked
+ list is intact after the selection process
+
+ 01.03.90 Original By: ACRM
+ 28.03.90 Modified to match new version of pdb.h
+ 24.05.90 Fixed so the variables passed in as sel[] don't
+ *have* to be 4 chars.
+ 17.05.93 Modified for book. Returns BOOL.
+ 09.07.93 Modified to return PDB pointer. Changed allocation
+ scheme. Changed back to sel[] variables *must* be 4
+ chars.
+ 04.02.09 Initialize q for fussy compilers
+*/
+PDB *SelectAtomsPDB(PDB *pdbin, int nsel, char **sel, int *natom)
+{
+ PDB *pdbout = NULL,
+ *p,
+ *q = NULL;
+ int i;
+
+ *natom = 0;
+
+ /* Step through the input PDB linked list */
+ for(p=pdbin; p!= NULL; NEXT(p))
+ {
+ /* Step through the selection list */
+ for(i=0; i<nsel; i++)
+ {
+ /* See if there is a match */
+ if(!strncmp(p->atnam,sel[i],4))
+ {
+ /* Allocate a new entry */
+ if(pdbout==NULL)
+ {
+ INIT(pdbout, PDB);
+ q = pdbout;
+ }
+ else
+ {
+ ALLOCNEXT(q, PDB);
+ }
+
+ /* If failed, free anything allocated and return */
+ if(q==NULL)
+ {
+ if(pdbout != NULL) FREELIST(pdbout,PDB);
+ *natom = 0;
+ return(NULL);
+ }
+
+ /* Increment atom count */
+ (*natom)++;
+
+ /* Copy the record to the output list (sets ->next to NULL) */
+ CopyPDB(q, p);
+
+ break;
+ }
+ }
+ }
+
+ /* Return pointer to start of output list */
+ return(pdbout);
+}
+
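The selection technique used by SelectAtomsPDB() above (compare the first four characters of each atom name against a list of space-padded names and copy matching records into a new list) can be sketched without bioplib as follows. The Atom type, select_atoms() and free_atoms() are hypothetical stand-ins for the PDB type, the function above and FREELIST(); this illustrates the idea and is not the library's implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the PDB record: only the fields needed here */
typedef struct atom
{
   struct atom *next;
   char        atnam[8];   /* space-padded atom name, e.g. "CA  " */
} Atom;

/* Free every node of a heap-allocated list */
void free_atoms(Atom *list)
{
   while(list != NULL)
   {
      Atom *next = list->next;
      free(list);
      list = next;
   }
}

/* Return a newly allocated copy of those nodes of 'in' whose first four
   name characters match one of the nsel entries of sel[]. The input list
   is not modified. On allocation failure the partial copy is freed,
   *natom is set to 0 and NULL is returned.
*/
Atom *select_atoms(const Atom *in, int nsel, const char **sel, int *natom)
{
   Atom       *out  = NULL,
              *tail = NULL;
   const Atom *p;
   int        i;

   assert(natom != NULL);
   *natom = 0;

   for(p=in; p!=NULL; p=p->next)
   {
      for(i=0; i<nsel; i++)
      {
         if(!strncmp(p->atnam, sel[i], 4))
         {
            Atom *node = (Atom *)malloc(sizeof(Atom));
            if(node == NULL)
            {
               free_atoms(out);
               *natom = 0;
               return(NULL);
            }

            *node      = *p;      /* copy the record...          */
            node->next = NULL;    /* ...but not its list linkage */

            if(tail == NULL) out        = node;
            else             tail->next = node;
            tail = node;

            (*natom)++;
            break;                /* one match per atom is enough */
         }
      }
   }

   return(out);
}
```

Because the comparison covers exactly four characters, a selector such as "CA" without the trailing spaces would not match an atom named "CA  ", which is why the padding matters.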
diff --git a/src/bioplib/StoreString.c b/src/bioplib/StoreString.c
new file mode 100644
index 0000000..a5767b5
--- /dev/null
+++ b/src/bioplib/StoreString.c
@@ -0,0 +1,160 @@
+/*************************************************************************
+
+ Program:
+ File: StoreString.c
+
+ Version: V1.21
+ Date: 18.06.02
+ Function:
+
+ Copyright: (c) SciTech Software 1991-2002
+ Author: Dr. Andrew C. R. Martin
+ Phone: +44 (0) 1372 275775
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.1 08.02.91 Added KillLine()
+ V1.2 10.02.91 Added setextn() and index()
+ V1.3 20.03.91 Added Word()
+ V1.4 28.05.92 ANSIed
+ V1.5 22.06.92 Added tab check to Word(). Improved setextn().
+ Added WordN(). Documented other routines.
+ V1.6 27.07.93 Corrected fsscanf() for double precision
+ V1.7 07.10.93 Checks made on case before toupper()/tolower()
+ for SysV compatibility. Also index() becomes
+ chindex()
+ V1.8 18.03.94 getc() -> fgetc()
+ V1.9 11.05.94 Added GetFilestem(), upstrcmp(), upstrncmp() &
+ GetWord()
+ V1.10 24.08.94 Added OpenStdFiles()
+ V1.11 08.03.95 Corrected OpenFile() for non-UNIX
+ V1.12 09.03.95 Added check on non-NULL filename in OpenFile()
+ V1.13 17.07.95 Added countchar()
+ V1.14 18.10.95 Moved YorN() to WindIO.c
+ V1.15 06.11.95 Added StoreString(), InStringList() and FreeStringList()
+ V1.16 22.11.95 Moved ftostr() to generam.c
+ V1.17 15.12.95 Added QueryStrStr()
+ V1.18 18.12.95 OpenStdFiles() treats filename of - as stdin/stdout
+ V1.19 05.02.96 OpenStdFiles() allows NULL pointers instead of filenames
+ V1.20 18.09.96 Added padchar()
+ V1.21 18.06.02 Added string.h
+
+*************************************************************************/
+/* Includes
+*/
+#include <stdlib.h>
+#include <string.h>
+#include "general.h"
+#include "macros.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+
+/************************************************************************/
+/*>STRINGLIST *StoreString(STRINGLIST *StringList, char *string)
+ -------------------------------------------------------------
+ Input: STRINGLIST *StringList The current linked list or NULL
+ if nothing yet allocated
+ char *string The string to store
+ Returns: STRINGLIST * Start of linked list. Used on
+ first call (when input StringList
+ is NULL) to return the pointer to
+ the start of the linked list.
+ NULL if unable to allocate.
+
+ Stores strings (of any length) in a linked list of type STRINGLIST.
+ Returns a pointer to the start of the linked list which is used on
+ the first call to access the newly allocated memory.
+
+ If allocation fails, memory allocated so far is freed and the routine
+ returns NULL.
+
+ 06.11.95 Original By: ACRM
+*/
+STRINGLIST *StoreString(STRINGLIST *StringList, char *string)
+{
+ STRINGLIST *p,
+ *start;
+
+ if((StringList!=NULL) && ((string == NULL) || (string[0] == '\0')))
+ return(StringList);
+
+ /* Remember the start of the list; p is moved to its end below */
+ start = StringList;
+
+ /* If nothing in the list, initialise it */
+ if(start == NULL)
+ {
+ INIT(start,STRINGLIST);
+ p=start;
+ }
+ else /* Move to end of current list and add another item */
+ {
+ p=start;
+ LAST(p);
+
+ /* Only allocate another slot if a string is inserted in this one */
+ if(p->string != NULL)
+ ALLOCNEXT(p,STRINGLIST);
+ }
+
+ /* Check allocation */
+ if(p==NULL)
+ {
+ /* If failed, free the list so far and return NULL */
+ FREELIST(start, STRINGLIST);
+ return(NULL);
+ }
+ p->string = NULL;
+
+ /* Everything OK, allocate memory for the string */
+ if((string != NULL) && (string[0] != '\0'))
+ {
+ if((p->string = (char *)malloc((1+strlen(string))*sizeof(char)))
+ ==NULL)
+ {
+ /* No memory, free linked list and return */
+ FREELIST(start, STRINGLIST);
+ return(NULL);
+ }
+
+ /* Still OK, copy in the string and return */
+ strcpy(p->string,string);
+ }
+
+ return(start);
+}
+
+
diff --git a/src/bioplib/StringToUpper.c b/src/bioplib/StringToUpper.c
new file mode 100644
index 0000000..b284e6c
--- /dev/null
+++ b/src/bioplib/StringToUpper.c
@@ -0,0 +1,108 @@
+/*************************************************************************
+
+ Program:
+ File: StringToUpper.c
+
+ Version: V1.21
+ Date: 18.06.02
+ Function:
+
+ Copyright: (c) SciTech Software 1991-2002
+ Author: Dr. Andrew C. R. Martin
+ Phone: +44 (0) 1372 275775
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.1 08.02.91 Added KillLine()
+ V1.2 10.02.91 Added setextn() and index()
+ V1.3 20.03.91 Added Word()
+ V1.4 28.05.92 ANSIed
+ V1.5 22.06.92 Added tab check to Word(). Improved setextn().
+ Added WordN(). Documented other routines.
+ V1.6 27.07.93 Corrected fsscanf() for double precision
+ V1.7 07.10.93 Checks made on case before toupper()/tolower()
+ for SysV compatibility. Also index() becomes
+ chindex()
+ V1.8 18.03.94 getc() -> fgetc()
+ V1.9 11.05.94 Added GetFilestem(), upstrcmp(), upstrncmp() &
+ GetWord()
+ V1.10 24.08.94 Added OpenStdFiles()
+ V1.11 08.03.95 Corrected OpenFile() for non-UNIX
+ V1.12 09.03.95 Added check on non-NULL filename in OpenFile()
+ V1.13 17.07.95 Added countchar()
+ V1.14 18.10.95 Moved YorN() to WindIO.c
+ V1.15 06.11.95 Added StoreString(), InStringList() and FreeStringList()
+ V1.16 22.11.95 Moved ftostr() to generam.c
+ V1.17 15.12.95 Added QueryStrStr()
+ V1.18 18.12.95 OpenStdFiles() treats filename of - as stdin/stdout
+ V1.19 05.02.96 OpenStdFiles() allows NULL pointers instead of filenames
+ V1.20 18.09.96 Added padchar()
+ V1.21 18.06.02 Added string.h
+
+*************************************************************************/
+/* Includes
+*/
+#include <ctype.h>
+#include <string.h>
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+
+/************************************************************************/
+/*>void StringToUpper(char *string1, char *string2)
+ ------------------------------------------------
+ Input: char *string1 A character string
+ Output: char *string2 Upper case version of string1
+
+ This routine converts a lower or mixed case string to upper case.
+
+ 06.02.91 Original
+ 28.05.92 ANSIed
+ 07.01.93 Checks case before converting for SysV
+*/
+void StringToUpper(char *string1,
+ char *string2)
+{
+ int i;
+
+ for(i=0;i<strlen(string1);i++)
+ if(islower(string1[i]))
+ string2[i]=toupper(string1[i]);
+ else
+ string2[i]=string1[i];
+ string2[i]='\0';
+}
+
+
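StringToUpper() above re-evaluates strlen(string1) on every pass of the loop and passes plain char values to islower()/toupper(). A minimal sketch of an equivalent loop that stops at the terminator and casts to unsigned char first (the <ctype.h> functions are only defined for values representable as unsigned char, or EOF). string_to_upper() is a hypothetical name, not a bioplib routine.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Copy src to dst, converting lower case letters to upper case.
   dst must have room for at least strlen(src)+1 characters.
*/
void string_to_upper(const char *src, char *dst)
{
   size_t i;

   assert(src != NULL && dst != NULL);

   /* Walk to the terminator rather than calling strlen() each time */
   for(i=0; src[i] != '\0'; i++)
   {
      /* toupper() leaves non-lower-case characters unchanged */
      dst[i] = (char)toupper((unsigned char)src[i]);
   }
   dst[i] = '\0';
}
```

As with the original routine, the caller is responsible for making the output buffer large enough.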
diff --git a/src/bioplib/SysDefs.h b/src/bioplib/SysDefs.h
new file mode 100644
index 0000000..6b0557a
--- /dev/null
+++ b/src/bioplib/SysDefs.h
@@ -0,0 +1,80 @@
+/*************************************************************************
+
+ Program:
+ File: SysDefs.h
+
+ Version: V1.2R
+ Date: 01.02.96
+ Function: System-type variable type definitions
+
+ Copyright: (c) SciTech Software 1993-6
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 01.03.94 Original By: ACRM
+ V1.1 02.08.95 Added UCHAR
+ V1.2 01.02.96 Added UBYTE
+
+*************************************************************************/
+#ifndef _SYSDEFS_H
+#define _SYSDEFS_H
+
+#ifndef EXEC_TYPES_H /* Commodore Amiga; defines in <exec/types.h> */
+typedef void *APTR;
+
+#ifndef SYS_TYPES_H /* Unix: <sys/types.h>, MS-DOS: <sys\types.h> */
+#ifndef _TYPES_ /* Ditto */
+typedef short BOOL;
+typedef long LONG;
+typedef unsigned long ULONG;
+typedef short SHORT;
+typedef unsigned short USHORT;
+typedef unsigned char UCHAR;
+typedef unsigned char UBYTE;
+#endif
+#endif
+#endif
+
+#ifndef TRUE
+#define TRUE 1
+#endif
+#ifndef FALSE
+#define FALSE 0
+#endif
+
+#ifdef _ESV_
+typedef long time_t; /* Required on E&S System V */
+typedef long clock_t; /* Ditto */
+#define CLOCKS_PER_SEC 1000000 /* Ditto */
+#endif
+
+#endif
diff --git a/src/bioplib/TranslatePDB.c b/src/bioplib/TranslatePDB.c
new file mode 100644
index 0000000..80c5dfb
--- /dev/null
+++ b/src/bioplib/TranslatePDB.c
@@ -0,0 +1,86 @@
+/*************************************************************************
+
+ Program:
+ File: TranslatePDB.c
+
+ Version: V1.2
+ Date: 27.02.98
+ Function:
+
+ Copyright: (c) SciTech Software 1993-8
+ Author: Dr. Andrew C. R. Martin
+ Phone: +44 (0) 1372 275775
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.1 01.03.94 Original
+ V1.2 27.02.98 Removed unreachable break from switch()
+
+*************************************************************************/
+/* Includes
+*/
+#include "MathType.h"
+#include "pdb.h"
+#include "macros.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+
+/************************************************************************/
+/*>void TranslatePDB(PDB *pdb,VEC3F tvect)
+ ---------------------------------------
+ I/O: PDB *pdb PDB linked list to move
+ Input: VEC3F tvect Translation vector
+
+ Translates a PDB linked list, ignoring null (9999.0) coordinates.
+ 01.10.92 Original
+ 11.03.94 Changed check on 9999.0 to >9998.0 and cast to REAL
+*/
+void TranslatePDB(PDB *pdb,
+ VEC3F tvect)
+{
+ PDB *p;
+
+ for(p=pdb; p!=NULL; NEXT(p))
+ {
+ if(p->x < (REAL)9999.0 && p->y < (REAL)9999.0 && p->z < (REAL)9999.0)
+ {
+ p->x += tvect.x;
+ p->y += tvect.y;
+ p->z += tvect.z;
+ }
+ }
+}
+
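The null-coordinate guard in TranslatePDB() above can be illustrated in isolation: records whose x, y and z are all below the 9999.0 sentinel are shifted, anything else is left alone. Point, Vec3, NULL_COORD and translate_points() are hypothetical stand-ins for PDB, VEC3F and the routine above.

```c
#include <stddef.h>

#define NULL_COORD 9999.0   /* sentinel marking an undefined coordinate */

/* Hypothetical stand-in for VEC3F */
typedef struct { double x, y, z; } Vec3;

/* Hypothetical stand-in for the PDB record */
typedef struct point
{
   struct point *next;
   double       x, y, z;
} Point;

/* Add t to every point in the list, skipping points that carry the
   null-coordinate sentinel in any of x, y or z.
*/
void translate_points(Point *list, Vec3 t)
{
   Point *p;

   for(p=list; p!=NULL; p=p->next)
   {
      if(p->x < NULL_COORD && p->y < NULL_COORD && p->z < NULL_COORD)
      {
         p->x += t.x;
         p->y += t.y;
         p->z += t.z;
      }
   }
}
```

Skipping the sentinel keeps undefined coordinates recognisable after the move; translating them would turn 9999.0 into an ordinary-looking value.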
diff --git a/src/bioplib/WholePDB.c b/src/bioplib/WholePDB.c
new file mode 100644
index 0000000..3258892
--- /dev/null
+++ b/src/bioplib/WholePDB.c
@@ -0,0 +1,335 @@
+/*************************************************************************
+
+ Program:
+ File: WholePDB.c
+
+ Version: V1.3
+ Date: 17.03.09
+ Function:
+
+ Copyright: (c) Dr. Andrew C. R. Martin, University of Reading, 2002
+ Author: Dr. Andrew C. R. Martin
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 30.05.02 Original
+ V1.1 12.06.08 CTP Added include for port.h
+ V1.2 13.06.08 popen() and pclose() prototypes skipped for Mac OS X.
+ V1.3 17.03.09 popen() prototype skipped for Windows. By: CTP
+
+*************************************************************************/
+/* Includes
+*/
+#include "port.h" /* Required before stdio.h */
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include "macros.h"
+#include "general.h"
+#include "pdb.h"
+
+
+/************************************************************************/
+/* Defines and macros
+*/
+#define MAXBUFF 160
+
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+static WHOLEPDB *doReadWholePDB(FILE *fpin, BOOL atomsonly);
+
+#if !defined(__APPLE__) && !defined(MS_WINDOWS)
+FILE *popen(char *, char *);
+#endif
+#ifndef __APPLE__
+int pclose(FILE *);
+#endif
+
+/************************************************************************/
+/*>void FreeWholePDB(WHOLEPDB *wpdb)
+ ---------------------------------
+ Input: WHOLEPDB *wpdb WHOLEPDB structure to be freed
+
+ Frees the header, trailer and atom content from a WHOLEPDB structure
+
+ 30.05.02 Original By: ACRM
+*/
+void FreeWholePDB(WHOLEPDB *wpdb)
+{
+ FreeStringList(wpdb->header);
+ FreeStringList(wpdb->trailer);
+ FREELIST(wpdb->pdb, PDB);
+ free(wpdb);
+}
+
+/************************************************************************/
+/*>void WriteWholePDB(FILE *fp, WHOLEPDB *wpdb)
+ --------------------------------------------
+ Input: FILE *fp File pointer
+ WHOLEPDB *wpdb Whole PDB structure pointer
+
+ Writes a PDB file including header and trailer information
+
+ 30.05.02 Original By: ACRM
+*/
+void WriteWholePDB(FILE *fp, WHOLEPDB *wpdb)
+{
+ WriteWholePDBHeader(fp, wpdb);
+ WritePDB(fp, wpdb->pdb);
+ WriteWholePDBTrailer(fp, wpdb);
+}
+
+
+/************************************************************************/
+/*>void WriteWholePDBHeader(FILE *fp, WHOLEPDB *wpdb)
+ --------------------------------------------------
+ Input: FILE *fp File pointer
+ WHOLEPDB *wpdb Whole PDB structure pointer
+
+ Writes the header of a PDB file
+
+ 30.05.02 Original By: ACRM
+*/
+void WriteWholePDBHeader(FILE *fp, WHOLEPDB *wpdb)
+{
+ STRINGLIST *s;
+
+ for(s=wpdb->header; s!=NULL; NEXT(s))
+ {
+ fputs(s->string, fp);
+ }
+}
+
+
+/************************************************************************/
+/*>void WriteWholePDBTrailer(FILE *fp, WHOLEPDB *wpdb)
+ ---------------------------------------------------
+ Input: FILE *fp File pointer
+ WHOLEPDB *wpdb Whole PDB structure pointer
+
+ Writes the trailer of a PDB file
+
+ 30.05.02 Original By: ACRM
+*/
+void WriteWholePDBTrailer(FILE *fp, WHOLEPDB *wpdb)
+{
+ STRINGLIST *s;
+
+ for(s=wpdb->trailer; s!=NULL; NEXT(s))
+ {
+ fputs(s->string, fp);
+ }
+}
+
+
+/************************************************************************/
+/*>WHOLEPDB *ReadWholePDB(FILE *fpin)
+ ----------------------------------
+ Input: FILE *fpin File pointer
+ Returns: WHOLEPDB * Whole PDB structure containing linked
+ list to PDB coordinate data
+
+ Reads a PDB file, storing the header and trailer information as
+ well as the coordinate data. Can read gzipped files as well as
+ uncompressed files.
+
+ Coordinate data is accessed as a linked list of type PDB as follows:
+
+ WHOLEPDB *wpdb;
+ PDB *p;
+ wpdb = ReadWholePDB(fp);
+ for(p=wpdb->pdb; p!=NULL; p=p->next)
+ {
+ ... Do something with p ...
+ }
+
+ 07.03.07 Made into a wrapper to doReadWholePDB()
+*/
+WHOLEPDB *ReadWholePDB(FILE *fpin)
+{
+ return(doReadWholePDB(fpin, FALSE));
+}
+
+/************************************************************************/
+/*>WHOLEPDB *ReadWholePDBAtoms(FILE *fpin)
+ ---------------------------------------
+ Input: FILE *fpin File pointer
+ Returns: WHOLEPDB * Whole PDB structure containing linked
+ list to PDB coordinate data
+
+ Reads a PDB file, storing the header and trailer information as
+ well as the coordinate data, which is read with ReadPDBAtoms(). Can
+ read gzipped files as well as uncompressed files.
+
+ Coordinate data is accessed as a linked list of type PDB as follows:
+
+ WHOLEPDB *wpdb;
+ PDB *p;
+ wpdb = ReadWholePDBAtoms(fp);
+ for(p=wpdb->pdb; p!=NULL; p=p->next)
+ {
+ ... Do something with p ...
+ }
+
+ 07.03.07 Made into a wrapper to doReadWholePDB()
+*/
+WHOLEPDB *ReadWholePDBAtoms(FILE *fpin)
+{
+ return(doReadWholePDB(fpin, TRUE));
+}
+
+
+/************************************************************************/
+/*>static WHOLEPDB *doReadWholePDB(FILE *fpin, BOOL atomsonly)
+ -----------------------------------------------------------
+ Input: FILE *fpin File pointer
+ BOOL atomsonly Read coordinates with ReadPDBAtoms()?
+ Returns: WHOLEPDB * Whole PDB structure containing the PDB linked list
+
+ Reads a PDB file, storing the header and trailer information as
+ well as the coordinate data. Can read gzipped files as well as
+ uncompressed files.
+
+ Coordinate data is accessed as a linked list of type PDB as follows:
+
+ WHOLEPDB *wpdb;
+ PDB *p;
+ wpdb = ReadWholePDB(fp);
+ for(p=wpdb->pdb; p!=NULL; p=p->next)
+ {
+ ... Do something with p ...
+ }
+
+ 30.05.02 Original By: ACRM
+ 07.03.07 Made into a doXXX routine to add an atomsonly parameter
+ 05.06.07 Added support for Unix compress'd files
+
+ TODO FIXME!!!!! Move all this into doReadPDB so that we don't worry
+ about rewinding any more
+*/
+static WHOLEPDB *doReadWholePDB(FILE *fpin, BOOL atomsonly)
+{
+ WHOLEPDB *wpdb;
+ char buffer[MAXBUFF];
+ FILE *fp = fpin;
+
+#ifdef GUNZIP_SUPPORT
+ int signature[3],
+ i,
+ ch;
+ char cmd[80];
+#endif
+
+ if((wpdb=(WHOLEPDB *)malloc(sizeof(WHOLEPDB)))==NULL)
+ return(NULL);
+
+ wpdb->pdb = NULL;
+ wpdb->header = NULL;
+ wpdb->trailer = NULL;
+
+#ifdef GUNZIP_SUPPORT
+ cmd[0] = '\0';
+
+ /* See whether this is a gzipped file */
+ for(i=0; i<3; i++)
+ signature[i] = fgetc(fpin);
+ for(i=2; i>=0; i--)
+ ungetc(signature[i], fpin);
+ if(((signature[0] == (int)0x1F) && /* gzip */
+ (signature[1] == (int)0x8B) &&
+ (signature[2] == (int)0x08)) ||
+ ((signature[0] == (int)0x1F) && /* 05.06.07 compress */
+ (signature[1] == (int)0x9D) &&
+ (signature[2] == (int)0x90)))
+ {
+ /* It is gzipped so we'll open gunzip as a pipe and send the data
+ through that into a temporary file
+ */
+ cmd[0] = '\0';
+ sprintf(cmd,"gunzip >/tmp/readpdb_%d",(int)getpid());
+ if((fp = (FILE *)popen(cmd,"w"))==NULL)
+ {
+ free(wpdb); /* Don't leak the structure on failure */
+ return(NULL);
+ }
+ while((ch=fgetc(fpin))!=EOF)
+ fputc(ch, fp);
+ pclose(fp);
+
+ /* We now reopen the temporary file as our PDB input file */
+ sprintf(cmd,"/tmp/readpdb_%d",(int)getpid());
+ if((fp = fopen(cmd,"r"))==NULL)
+ {
+ free(wpdb); /* Don't leak the structure on failure */
+ return(NULL);
+ }
+ }
+#endif
+
+ /* Read the header from the PDB file */
+ while(fgets(buffer,MAXBUFF,fp))
+ {
+ if(!strncmp(buffer, "ATOM  ", 6) ||
+ !strncmp(buffer, "HETATM", 6) ||
+ !strncmp(buffer, "MODEL ", 6))
+ {
+ break;
+ }
+ if((wpdb->header = StoreString(wpdb->header, buffer))==NULL)
+ return(NULL);
+ }
+
+ /* Read the coordinates */
+ rewind(fp);
+ if(atomsonly)
+ {
+ wpdb->pdb = ReadPDBAtoms(fp, &(wpdb->natoms));
+ }
+ else
+ {
+ wpdb->pdb = ReadPDB(fp, &(wpdb->natoms));
+ }
+
+ /* Read the trailer */
+ rewind(fp);
+ while(fgets(buffer,MAXBUFF,fp))
+ {
+ if(!strncmp(buffer, "CONECT", 6) ||
+ !strncmp(buffer, "MASTER", 6) ||
+ !strncmp(buffer, "END   ", 6))
+ {
+ wpdb->trailer = StoreString(wpdb->trailer, buffer);
+ }
+ }
+
+ return(wpdb);
+}
+
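Editorial sketch (not part of the commit): doReadWholePDB() above decides
whether to pipe its input through gunzip by looking at the first three bytes
of the file. The stand-alone C below shows that signature test in isolation.
The names SigType and ClassifySignature are hypothetical and do not exist in
bioplib; the byte values are the ones tested in the code above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: what the leading bytes of a file look like */
typedef enum
{
   SIG_PLAIN = 0,   /* No recognised compression signature */
   SIG_GZIP,        /* 0x1F 0x8B 0x08, as written by gzip  */
   SIG_COMPRESS     /* 0x1F 0x9D 0x90, Unix compress       */
} SigType;

/* Classify up to three leading bytes. 'nread' is the number of bytes
   actually available, so short files are treated as uncompressed.
*/
SigType ClassifySignature(const unsigned char *sig, size_t nread)
{
   assert(sig != NULL);

   if(nread < 3)
      return(SIG_PLAIN);

   if((sig[0] == 0x1F) && (sig[1] == 0x8B) && (sig[2] == 0x08))
      return(SIG_GZIP);

   if((sig[0] == 0x1F) && (sig[1] == 0x9D) && (sig[2] == 0x90))
      return(SIG_COMPRESS);

   return(SIG_PLAIN);
}
```

A caller would read up to three bytes with fread() and pass the count that
was actually read. Working on a byte buffer also sidesteps pushing three
characters back with ungetc(), which goes beyond the single character of
pushback that the C standard guarantees.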
diff --git a/src/bioplib/WindIO.c b/src/bioplib/WindIO.c
new file mode 100644
index 0000000..62cb2af
--- /dev/null
+++ b/src/bioplib/WindIO.c
@@ -0,0 +1,352 @@
+/*************************************************************************
+
+ Program:
+ File: WindIO.c
+
+ Version: V1.4R
+ Date: 01.02.01
+ Function: Windowing I/O for various systems
+
+ Copyright: (c) SciTech Software 1992-2001
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: andrew at andrew-martin.org
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V0.4 25.09.92 Original test version
+ V1.0 11.03.94 Various tidying up for release
+ Added RePrompt() and WindowMode()
+ V1.1 15.03.94 Added sInteractive and modified logic with sWindowMode
+ V1.2 18.03.94 Bug fix in WindowInteractive(). Includes SysDefs.h
+ V1.3 18.10.95 Moved YorN() here from general.c
+ V1.4 01.02.01 Changed gets() to fgets()
+
+*************************************************************************/
+/* Definition of windowing type. If nothing defined, simple screen I/O
+ will be used.
+*/
+/* #define CURSES */ /* Curses windowing */
+/* #define AMIGA_WINDOWS */ /* Amiga windowing */
+
+/************************************************************************/
+/* Includes
+*/
+#include <stdio.h>
+#include <string.h>
+
+#include "SysDefs.h"
+#include "WindIO.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+#if defined(CURSES) || defined(AMIGA_WINDOWS)
+# define WINDOWING
+#endif
+
+/************************************************************************/
+/* Globals
+*/
+static char sPromptString[40];
+static int sLineCount = 0;
+static BOOL sDoPaging = FALSE,
+ sWindowMode = FALSE,
+ sInteractive = TRUE;
+
+/************************************************************************/
+/* Prototypes
+*/
+
+/************************************************************************/
+/*>void screen(char *string)
+ -------------------------
+ Input: char *string String to write on window
+
+ Writes information to the screen. Handles any windows as appropriate.
+ 25.09.92 Original
+ 02.10.92 Added CURSES support
+ 05.10.92 Added AMIGA_WINDOWS support
+ 07.10.92 Added paging support
+ 11.03.94 Added check on sWindowMode
+ 14.03.94 Changed this to check on sInteractive
+ Changed check on WINDOWING to sWindowMode
+ 01.02.01 Added maxlen parameter
+ 15.02.01 oops! removed maxlen parameter, value of 80 should have gone
+ into GetKybdString()
+*/
+void screen(char *string)
+{
+ if(sDoPaging && sInteractive)
+ {
+ if(strchr(string,'\n'))
+ {
+ if(sLineCount++ > 18)
+ {
+ char dummy[80];
+
+ sLineCount = 0;
+
+ if(!sWindowMode)
+ printf("\n");
+
+ prompt("More...");
+ GetKybdString(dummy, 80);
+ }
+ }
+ }
+
+#ifndef WINDOWING
+ printf("%s",string);
+#else
+ if(sWindowMode)
+ {
+# ifdef AMIGA_WINDOWS
+ WriteMessageNR(string);
+# endif
+# ifdef CURSES
+ outputstring(string);
+# endif
+ }
+ else
+ {
+ printf("%s",string);
+ }
+#endif
+
+ return;
+}
+
+/************************************************************************/
+/*>void prompt(char *string)
+ -------------------------
+ Input: char *string Prompt string
+
+ Sets a prompt for input. If windowing is on, this simply sets the
+ prompt variable (the actual prompt is issued by the GetKybdString()
+ function). If no windowing is used, the actual string is printed.
+ If the prompt ends with a . it is simply printed; if not, a > is
+ appended.
+
+ 25.09.92 Original
+ 02.10.92 Added CURSES support
+ 05.10.92 Added AMIGA_WINDOWS support
+ 11.03.94 Modified to save prompt string even if not windowing
+ 15.03.94 Now sets up string and just calls RePrompt()
+*/
+void prompt(char *string)
+{
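+ size_t len = strlen(string);
+
+ /* Guard against an empty prompt; truncate to fit sPromptString[40] */
+ if(len > 0 && string[len-1] == '.')
+ sprintf(sPromptString,"%.37s ",string);
+ else
+ sprintf(sPromptString,"%.37s> ",string);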
+
+ RePrompt();
+}
+
+/************************************************************************/
+/*>void RePrompt(void)
+ -------------------
+ Reissue the current prompt. Only has any effect when windowing is not
+ in use. Normally only used by ReadBufferedFile() and
+ ProbeBufferedFile() to re-issue prompts while eating blank lines.
+
+ 10.03.94 Original By: ACRM
+ 15.03.94 Changed to work whenever we're not windowing and are
+ interactive
+*/
+void RePrompt(void)
+{
+ if(!sWindowMode && sInteractive)
+ {
+ printf("%s",sPromptString);
+ fflush(stdout);
+ }
+}
+
+/************************************************************************/
+/*>void GetKybdString(char *string, int maxlen)
+ --------------------------------------------
+ Reads a string from the keyboard
+ 02.10.92 Original
+ 05.10.92 Added AMIGA_WINDOWS support
+ 15.03.94 Added check on sWindowMode
+ 01.02.01 Added maxlen parameter and changed gets() to fgets()
+*/
+void GetKybdString(char *string, int maxlen)
+{
+ if(sWindowMode)
+ {
+#ifdef AMIGA_WINDOWS
+ ReadKybd(sPromptString,string,79);
+#endif
+
+#ifdef CURSES
+ GetCursesString(sPromptString, string);
+#endif
+
+#ifndef WINDOWING
+ fgets(string,maxlen,stdin);
+#endif
+ }
+ else
+ {
+ fgets(string,maxlen,stdin);
+ }
+}
+
+/************************************************************************/
+/*>void PagingOn(void)
+ -------------------
+ Switches on screen paging.
+ 07.10.92 Original
+*/
+void PagingOn(void)
+{
+ sLineCount = 0;
+ sDoPaging = TRUE;
+}
+
+/************************************************************************/
+/*>void PagingOff(void)
+ --------------------
+ Switches off screen paging.
+ 07.10.92 Original
+*/
+void PagingOff(void)
+{
+ sDoPaging = FALSE;
+}
+
+/************************************************************************/
+/*>void WindowMode(BOOL mode)
+ --------------------------
+ Input: BOOL mode TRUE: Use windowing
+ FALSE: Output normally (default)
+
+ Switch window mode on or off.
+ 11.03.94 Original By: ACRM
+ 15.03.94 Added check on WINDOWING
+*/
+void WindowMode(BOOL mode)
+{
+#ifdef WINDOWING
+ sWindowMode = mode;
+#else
+ sWindowMode = FALSE;
+#endif
+}
+
+/************************************************************************/
+/*>void WindowInteractive(BOOL mode)
+ ---------------------------------
+ Input: BOOL mode TRUE: Is interactive (default)
+ FALSE: Not interactive
+
+ Switch interactive mode on or off.
+ If switched off, calls WindowMode(FALSE) to switch off windowing
+
+ 15.03.94 Original By: ACRM
+ 17.03.94 Set sInteractive not sWindowMode!
+*/
+void WindowInteractive(BOOL mode)
+{
+ sInteractive = mode;
+
+ if(!mode) WindowMode(FALSE);
+}
+
+
+/************************************************************************/
+/*>int YorN(char deflt)
+ --------------------
+ Input: char deflt Default response ('y' or 'n') if return is
+ pressed without a letter or an invalid letter
+ is given
+ Returns: int 0 if the user responds with N or n
+ 1 if the user responds with Y or y
+ 2 if the user responds with A or a
+ 3 if the user responds with Q or q
+
+ Get a yes or no response from the keyboard
+
+ A default ('y' or 'n') is supplied in the function call and hitting
+ <return> or supplying any invalid character will result in the
+ default being used.
+
+ The routine will work correctly with any response which starts with
+ the right letter (e.g. Yes, Yeah, yellow(!), no, Never, etc.)
+
+ 18.06.93 Original By: ACRM
+ 01.02.01 Added maxlen parameter to GetKybdString()
+*/
+int YorN(char deflt)
+{
+ char buffer[80],
+ response;
+ int i;
+
+ buffer[0] = '\0'; /* In case nothing is read (e.g. EOF) */
+ GetKybdString(buffer, 20);
+ response = buffer[0];
+
+ if((response != 'Y') && (response != 'y') &&
+ (response != 'N') && (response != 'n') &&
+ (response != 'A') && (response != 'a') &&
+ (response != 'Q') && (response != 'q'))
+ {
+ response = deflt;
+ }
+
+ switch(response)
+ {
+ case 'Y':
+ case 'y':
+ i = 1;
+ break;
+ case 'N':
+ case 'n':
+ i = 0;
+ break;
+ case 'A':
+ case 'a':
+ i = 2;
+ break;
+ case 'Q':
+ case 'q':
+ i = 3;
+ break;
+ default: /* Should never occur */
+ i=0;
+ break;
+ }
+
+ return(i);
+}
+
+
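Editorial sketch (not part of the commit): the decision logic of YorN() above
can be separated from keyboard input, which makes it easy to exercise
without a terminal. ParseYorN is a hypothetical name, not a bioplib
function; the return codes follow the YorN() comment (0 = no, 1 = yes,
2 = all, 3 = quit).

```c
#include <ctype.h>
#include <stddef.h>

/* Map the first character of a response line to a YorN()-style code.
   Anything other than Y/N/A/Q (in either case), including an empty
   line or a NULL pointer, falls back to the supplied default.
*/
int ParseYorN(const char *line, char deflt)
{
   int c = (line != NULL) ? tolower((unsigned char)line[0]) : 0;

   if((c != 'y') && (c != 'n') && (c != 'a') && (c != 'q'))
      c = tolower((unsigned char)deflt);

   switch(c)
   {
   case 'y':
      return(1);
   case 'a':
      return(2);
   case 'q':
      return(3);
   default:            /* 'n', or an unrecognised default */
      return(0);
   }
}
```

As in YorN(), only the first character matters, so "Yes", "yeah" and
"yellow" all count as yes.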
diff --git a/src/bioplib/WindIO.h b/src/bioplib/WindIO.h
new file mode 100644
index 0000000..92fb4a2
--- /dev/null
+++ b/src/bioplib/WindIO.h
@@ -0,0 +1,60 @@
+/*************************************************************************
+
+ Program:
+ File: WindIO.h
+
+ Version: V1.3R
+ Date: 18.10.95
+ Function: Header for window/normal interface routines
+
+ Copyright: (c) SciTech Software 1993-5
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+
+*************************************************************************/
+#ifndef _WINDIO_H
+#define _WINDIO_H
+
+#include "SysDefs.h"
+
+void screen(char *string);
+void prompt(char *string);
+void RePrompt(void);
+void GetKybdString(char *string, int maxlen);
+void PagingOn(void);
+void PagingOff(void);
+void WindowMode(BOOL mode);
+void WindowInteractive(BOOL mode);
+int YorN(char deflt);
+
+#endif
diff --git a/src/bioplib/WritePDB.c b/src/bioplib/WritePDB.c
new file mode 100644
index 0000000..b003a39
--- /dev/null
+++ b/src/bioplib/WritePDB.c
@@ -0,0 +1,174 @@
+/*************************************************************************
+
+ Program:
+ File: WritePDB.c
+
+ Version: V1.9R
+ Date: 22.09.05
+ Function: Write a PDB file from a linked list
+
+ Copyright: (c) SciTech Software 1993-2005
+ Author: Dr. Andrew C. R. Martin
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+ This routine will write a .PDB file of any size from a linked list of
+ the protein structure. This list is contained in a linked set of
+ structures of type pdb_entry. The structure is set up by including the
+ file "pdb.h". For details of the structure, see this file.
+
+**************************************************************************
+
+ Usage:
+ ======
+ WritePDB(fp, pdb)
+ Input: FILE *fp A pointer to the file to write
+ PDB *pdb The start of the PDB linked list.
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 08.03.89 Original
+ V1.2 28.03.90 Modified to match the correct column definition of
+ ReadPDB V1.2 (N.B. There was no V1.1)
+ V1.3 01.06.92 Corrected header, to match standard. Autodoc'd,
+ ANSIed. Added FPU check.
+ V1.4 10.06.93 Changed to use NEXT() macro. void types
+ V1.5 22.02.94 Added TER card at end of file
+ V1.6 15.02.01 Writes using atnam_raw so atom name is unchanged from
+ input
+ V1.7 30.05.02 Changed PDB field from 'junk' to 'record_type'
+ V1.8 03.06.05 'atnam_raw' no longer includes the alternate indicator
+ which is now in 'altpos'
+ V1.9 22.09.05 Added WritePDBRecordAtnam()
+
+*************************************************************************/
+#include <stdio.h>
+#include <string.h>
+#include <math.h>
+#include <stdlib.h>
+
+#include "MathType.h"
+#include "pdb.h"
+#include "macros.h"
+
+/************************************************************************/
+/*>void WritePDB(FILE *fp, PDB *pdb)
+ ---------------------------------
+ Input: FILE *fp PDB file pointer to be written
+ PDB *pdb PDB linked list to write
+
+ Write a PDB linked list by calls to WritePDBRecord()
+
+ 08.03.89 Original
+ 01.06.92 ANSIed and autodoc'd
+ 10.06.93 Uses NEXT macro; void type
+ 08.07.93 Added insertion of TER cards
+ 22.02.94 And a TER card at the end of the file
+*/
+void WritePDB(FILE *fp,
+ PDB *pdb)
+{
+ PDB *p;
+ char PrevChain[8];
+
+ if(pdb == NULL) /* Nothing to write */
+ return;
+
+ strcpy(PrevChain,pdb->chain);
+
+ for(p = pdb ; p ; NEXT(p))
+ {
+ if(strncmp(PrevChain,p->chain,1))
+ {
+ /* Chain change, insert TER card */
+ fprintf(fp,"TER \n");
+ strcpy(PrevChain,p->chain);
+ }
+ WritePDBRecord(fp,p);
+ }
+ fprintf(fp,"TER \n");
+}
+
+
+/************************************************************************/
+/*>void WritePDBRecord(FILE *fp, PDB *pdb)
+ ---------------------------------------
+ Input: FILE *fp PDB file pointer to be written
+ PDB *pdb PDB linked list record to write
+
+ Write a PDB record
+
+ 08.03.89 Original
+ 28.03.90 Changed to match ReadPDB() V1.2 for column widths
+ 01.06.92 ANSIed and autodoc'd
+ 10.06.93 void type
+ 22.06.93 Changed to %lf. Ljust strings
+ 11.03.94 %lf back to %f (!)
+ 15.02.01 Modified to use atnam_raw
+ 03.06.05 Modified to use altpos
+*/
+void WritePDBRecord(FILE *fp,
+ PDB *pdb)
+{
+ fprintf(fp,"%-6s%5d %-4s%c%-4s%1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f\n",
+ pdb->record_type,
+ pdb->atnum,
+ pdb->atnam_raw,
+ pdb->altpos,
+ pdb->resnam,
+ pdb->chain,
+ pdb->resnum,
+ pdb->insert,
+ pdb->x,
+ pdb->y,
+ pdb->z,
+ pdb->occ,
+ pdb->bval);
+}
+/************************************************************************/
+/*>void WritePDBRecordAtnam(FILE *fp, PDB *pdb)
+ --------------------------------------------
+ Input: FILE *fp PDB file pointer to be written
+ PDB *pdb PDB linked list record to write
+
+ Write a PDB record
+
+ 08.03.89 Original
+ 28.03.90 Changed to match ReadPDB() V1.2 for column widths
+ 01.06.92 ANSIed and autodoc'd
+ 10.06.93 void type
+ 22.06.93 Changed to %lf. Ljust strings
+ 11.03.94 %lf back to %f (!)
+ 15.02.01 Modified to use atnam_raw
+ 03.06.05 Modified to use altpos
+ 22.09.05 This is like the old version which used atnam rather
+ than atnam_raw
+*/
+void WritePDBRecordAtnam(FILE *fp,
+ PDB *pdb)
+{
+ fprintf(fp,"%-6s%5d  %-4s%-4s%1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f\n",
+ pdb->record_type,
+ pdb->atnum,
+ pdb->atnam,
+ pdb->resnam,
+ pdb->chain,
+ pdb->resnum,
+ pdb->insert,
+ pdb->x,
+ pdb->y,
+ pdb->z,
+ pdb->occ,
+ pdb->bval);
+}
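Editorial sketch (not part of the commit): the format string in
WritePDBRecord() encodes the fixed-column PDB coordinate record, with x
starting in column 31 and the B-value ending in column 66. The stand-alone
C below writes the same layout into a buffer so the columns can be checked.
AtomRecord and FormatAtomRecord are hypothetical stand-ins for bioplib's PDB
structure and are not part of the library. Blank columns are produced with
"%1s" and "%3s" applied to empty strings, so the layout does not depend on
literal runs of spaces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the coordinate fields of a PDB record */
typedef struct
{
   const char *record_type;   /* "ATOM" or "HETATM"              */
   int         atnum;         /* Atom serial number              */
   const char *atnam_raw;     /* Atom name, e.g. " N" or " CA"   */
   char        altpos;        /* Alternate location indicator    */
   const char *resnam;        /* Residue name                    */
   const char *chain;         /* Chain label                     */
   int         resnum;        /* Residue number                  */
   const char *insert;        /* Insertion code                  */
   double      x, y, z, occ, bval;
} AtomRecord;

/* Format one record into buf; returns the value of snprintf() */
int FormatAtomRecord(char *buf, size_t buflen, const AtomRecord *a)
{
   assert(buf != NULL && a != NULL);

   return(snprintf(buf, buflen,
                   "%-6s%5d%1s%-4s%c%-4s%1s%4d%1s%3s"
                   "%8.3f%8.3f%8.3f%6.2f%6.2f\n",
                   a->record_type, a->atnum, "",
                   a->atnam_raw, a->altpos, a->resnam,
                   a->chain, a->resnum, a->insert, "",
                   a->x, a->y, a->z, a->occ, a->bval));
}
```

For a typical atom the result is 66 characters plus the newline, and each
field can be located by its column index, which is a convenient check after
any change to the format string.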
diff --git a/src/bioplib/aalist.c b/src/bioplib/aalist.c
new file mode 100644
index 0000000..09480c3
--- /dev/null
+++ b/src/bioplib/aalist.c
@@ -0,0 +1,432 @@
+/*************************************************************************
+
+ Program:
+ File: aalist.c
+
+ Version: V3.0
+ Date: 18.02.09
+ Function: Amino acid linked lists.
+
+ Copyright: (c) UCL / Dr. Andrew C.R. Martin 2006-2009
+ Author: Dr. Andrew C.R. Martin
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may be modified as required, but any modifications must be
+ documented so that the person responsible can be identified. If someone
+ else breaks this code, I don't want to be blamed for code that does not
+ work!
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+ #include "bioplib/aalist.h" to define AA datatype.
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 21.08.06 Original By: ACRM
+ V3.0 06.11.08 Incorporated into ProFit V3 By: CTP
+ V3.0 18.02.09 Moved to bioplib. By: CTP
+
+*************************************************************************/
+/* Includes
+*/
+#include <stdlib.h>
+#include "macros.h"
+#include "aalist.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+
+/************************************************************************/
+/*>AA *InsertNextResiduesInAAList(AA *a, char res, int nres)
+ ---------------------------------------------------------
+ Inputs: AA *a Sequence linked list
+ char res Residue to insert
+ int nres Number of residues to insert
+ Returns: AA * Pointer to the residue that has just been
+ inserted
+
+ Inserts a set of identical residues after the current position in
+ the linked list. The returned value is the last residue which has
+ been inserted so this can be called again on the returned aa to
+ insert another aa
+
+ 21.08.06 Original By: ACRM
+*/
+AA *InsertNextResiduesInAAList(AA *a, char res, int nres)
+{
+ int i;
+ for(i=0; i<nres; i++)
+ {
+ a = InsertNextResidueInAAList(a, res);
+ }
+ return(a);
+}
+
+
+/************************************************************************/
+/*>AA *InsertNextResidueInAAList(AA *a, char res)
+ ----------------------------------------------
+ Inputs: AA *a Sequence linked list
+ char res Residue to insert
+ Returns: AA * Pointer to the residue that has just been
+ inserted
+
+ Inserts a residue after the current position in the linked list.
+ The returned value is the residue which has been inserted so this
+ can be called again on the returned aa to insert another aa
+
+ 21.08.06 Original By: ACRM
+*/
+AA *InsertNextResidueInAAList(AA *a, char res)
+{
+ AA *b = NULL;
+
+ if(a!=NULL)
+ {
+ INITPREV(b, AA);
+ if(b==NULL)
+ {
+ FREELIST(a, AA);
+ return(NULL);
+ }
+ b->seqnum = (-1);
+ b->res = res;
+
+ b->next = a->next;
+ b->prev = a;
+ if(b->next != NULL)
+ b->next->prev = b;
+ a->next = b;
+ }
+
+ return(b);
+}
+
+
+/************************************************************************/
+/*>char *BuildSeqFromAAList(AA *aa)
+ --------------------------------
+ Inputs: AA *aa Sequence linked list
+ Returns: char * Sequence as a string (malloc'd)
+
+ Converts the linked list back into a string which is malloc'd
+
+ 21.08.06 Original By: ACRM
+*/
+char *BuildSeqFromAAList(AA *aa)
+{
+ AA *a;
+ char *seq=NULL;
+ int count=0;
+
+ count = GetAAListLen(aa);
+ if((seq=(char *)malloc((1+count)*sizeof(char)))!=NULL)
+ {
+ count = 0;
+ for(a=aa; a!=NULL; NEXT(a))
+ {
+ seq[count++] = a->res;
+ }
+ seq[count] = '\0';
+ }
+ return(seq);
+}
+
+
+/************************************************************************/
+/*>AA *InsertResidueInAAListAt(AA *aa, char res, int pos)
+ ------------------------------------------------------
+ Inputs: AA *aa Sequence linked list
+ char res Residue to insert
+ int pos Position at which to insert (from 1...)
+ Returns: AA * Updated sequence linked list
+
+ Inserts a residue after the specified position in the
+ list. Residues are numbered from 1. If the position is > length of
+ the list then the residue will be added at the end. If the position
+ is zero, it will be at the start of the list in which case the
+ return value for the list will be different from the input value.
+
+ 21.08.06 Original By: ACRM
+ 18.06.08 Set inserted residue's flag to FALSE. By: CTP
+*/
+AA *InsertResidueInAAListAt(AA *aa, char res, int pos)
+{
+ AA *a, *b;
+ int count=0;
+
+ for(a=aa, count=0; a!=NULL && count<pos-1; count++, NEXT(a));
+
+ if(a!=NULL)
+ {
+ INITPREV(b, AA);
+ if(b==NULL)
+ {
+ FREELIST(aa, AA);
+ return(NULL);
+ }
+ b->seqnum = (-1);
+ b->res = res;
+ b->flag = FALSE;
+
+ if(pos==0)
+ {
+ b->next = aa;
+ b->prev = NULL;
+ aa->prev = b;
+ aa = b;
+ }
+ else
+ {
+ b->next = a->next;
+ b->prev = a;
+ if(b->next != NULL)
+ b->next->prev = b;
+ a->next = b;
+ }
+ }
+ else /* Append to end of sequence */
+ {
+ a = aa;
+ LAST(a);
+ ALLOCNEXTPREV(a, AA);
+ if(a==NULL)
+ {
+ FREELIST(aa, AA);
+ return(NULL);
+ }
+ a->seqnum = (-1);
+ a->res = res;
+ a->flag = FALSE;
+ }
+
+ return(aa);
+}
+
+
+/************************************************************************/
+/*>AA *InsertResiduesInAAListAt(AA *aa, char res, int nres, int pos)
+ -----------------------------------------------------------------
+ Inputs: AA *aa Sequence linked list
+ char res Residue to insert
+ int nres Number of residues to insert
+ int pos Position at which to insert (from 1...)
+ Returns: AA * Updated sequence linked list
+
+ Inserts a set of residues after the specified position in the
+ list. Residues are numbered from 1. If the position is > length of
+ the list then the residues will be added at the end. If the position
+ is zero, they will be at the start of the list in which case the
+ return value for the list will be different from the input value.
+
+ 21.08.06 Original By: ACRM
+*/
+AA *InsertResiduesInAAListAt(AA *aa, char res, int nres, int pos)
+{
+ int i;
+ for(i=0; i<nres; i++)
+ {
+ /* Stop on allocation failure: the list has already been freed */
+ if((aa = InsertResidueInAAListAt(aa, res, pos))==NULL)
+ break;
+ }
+ return(aa);
+}
+
+
+/************************************************************************/
+/*>AA *BuildAAList(char *seq)
+ --------------------------
+ Inputs: char *seq The sequence as a string
+ Returns: AA * A linked list representation
+
+ Converts a sequence string into a linked list
+
+ 21.08.06 Original By: ACRM
+*/
+AA *BuildAAList(char *seq)
+{
+ AA *aa = NULL,
+ *a = NULL;
+ int seqnum = 1;
+
+ while(*seq)
+ {
+ if(aa == NULL)
+ {
+ INITPREV(aa, AA);
+ a = aa;
+ }
+ else
+ {
+ ALLOCNEXTPREV(a, AA);
+ }
+ if(a==NULL)
+ {
+ FREELIST(aa, AA);
+ return(NULL);
+ }
+ a->res = *(seq++);
+ a->seqnum = seqnum++;
+ a->flag = FALSE;
+ }
+
+ return(aa);
+}
+
+
+/************************************************************************/
+/*>int FindAAListOffsetByResnum(AA *aa, int resnum)
+ ------------------------------------------------
+ Inputs: AA *aa Sequence linked list
+ int resnum Residue number
+ Returns: int Linked list offset
+
+ Searches the linked list for the specified resnum (i.e. the original
+ residue number in the sequence before any insertions were made) and
+ returns the position of that residue in the list (numbered from 1)
+
+ 21.08.06 Original By: ACRM
+*/
+int FindAAListOffsetByResnum(AA *aa, int resnum)
+{
+ int count=1;
+ AA *a;
+
+ for(a=aa; a!=NULL; NEXT(a))
+ {
+ if(a->seqnum == resnum)
+ break;
+ count++;
+ }
+
+ if(a==NULL)
+ return(-1);
+
+ return(count);
+}
+
+
+/************************************************************************/
+/*>AA *FindAAListItemByResnum(AA *aa, int resnum)
+ ----------------------------------------------
+ Inputs: AA *aa Sequence linked list
+ int resnum Residue number
+ Returns: AA * Linked list item
+
+ Searches the linked list for the specified resnum (i.e. the original
+ residue number in the sequence before any insertions were made) and
+ returns a pointer to that item in the list.
+
+ 21.08.06 Original By: ACRM
+*/
+AA *FindAAListItemByResnum(AA *aa, int resnum)
+{
+ AA *a;
+
+ for(a=aa; a!=NULL; NEXT(a))
+ {
+ if(a->seqnum == resnum)
+ break;
+ }
+
+ return(a);
+}
+
+
+/************************************************************************/
+/*>void SetAAListFlagByResnum(AA *aa, int resnum)
+ ----------------------------------------------
+ Inputs: AA *aa Sequence linked list
+ int resnum Residue number
+
+ Searches the linked list for the specified resnum (i.e. the original
+ residue number in the sequence before any insertions were made) and
+ sets the flag in that item in the linked list
+
+ 21.08.06 Original By: ACRM
+*/
+void SetAAListFlagByResnum(AA *aa, int resnum)
+{
+ AA *a;
+ if((a = FindAAListItemByResnum(aa, resnum))!=NULL)
+ a->flag = TRUE;
+}
+
+
+/************************************************************************/
+/*>char *BuildFlagSeqFromAAList(AA *aa, char ch)
+ ---------------------------------------------
+ Inputs: AA *aa Sequence linked list
+ char ch Character to use in the sequence
+ Returns: char * Sequence string (malloc'd)
+
+ Builds a sequence string with blanks except where the flag in the
+ sequence structure is set. At these positions the character specified
+ in ch is used instead.
+
+ 21.08.06 Original By: ACRM
+*/
+char *BuildFlagSeqFromAAList(AA *aa, char ch)
+{
+ AA *a;
+ char *seq=NULL;
+ int count=0;
+
+ count = GetAAListLen(aa);
+ if((seq=(char *)malloc((1+count)*sizeof(char)))!=NULL)
+ {
+ count = 0;
+ for(a=aa; a!=NULL; NEXT(a))
+ {
+ seq[count++] = ((a->flag)?ch:' ');
+ }
+ seq[count] = '\0';
+ }
+ return(seq);
+}
+
+
+/************************************************************************/
+/*>int GetAAListLen(AA *aa)
+ ------------------------
+ Inputs: AA *aa Sequence linked list
+ Returns: int Length of sequence linked list
+
+ Returns the number of items in the linked list
+
+ 21.08.06 Original By: ACRM
+*/
+int GetAAListLen(AA *aa)
+{
+ AA *a;
+ int count;
+ for(a=aa, count=0; a!=NULL; count++, NEXT(a));
+ return(count);
+}
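Editorial sketch (not part of the commit): aalist.c above keeps a sequence as
a doubly linked list so that gap characters can be inserted at a position
and the result turned back into a string. The stand-alone C below walks
through that round trip without bioplib's macros. Node, BuildList, InsertAt,
ListToString and FreeList are hypothetical names that only mirror the
behaviour described in the comments above (position 0 prepends, a position
past the end appends); they are not the library's functions.

```c
#include <stdlib.h>
#include <string.h>

typedef struct node
{
   struct node *next, *prev;
   char        res;
} Node;

/* Free every item from head onwards */
void FreeList(Node *head)
{
   while(head != NULL)
   {
      Node *next = head->next;
      free(head);
      head = next;
   }
}

/* Build a list from a string; NULL for "" or on allocation failure */
Node *BuildList(const char *seq)
{
   Node *head = NULL, *tail = NULL;

   for(; *seq != '\0'; seq++)
   {
      Node *n = malloc(sizeof(Node));
      if(n == NULL)
      {
         FreeList(head);
         return(NULL);
      }
      n->res  = *seq;
      n->next = NULL;
      n->prev = tail;
      if(tail != NULL) tail->next = n; else head = n;
      tail = n;
   }
   return(head);
}

/* Insert res after 1-based position pos and return the (new) head.
   pos <= 0 prepends; pos beyond the end appends. On allocation
   failure the list is freed and NULL is returned.
*/
Node *InsertAt(Node *head, char res, int pos)
{
   Node *n = malloc(sizeof(Node)), *a = head;
   int  i;

   if(n == NULL)
   {
      FreeList(head);
      return(NULL);
   }
   n->res  = res;
   n->next = n->prev = NULL;

   if(head == NULL)                 /* Empty list: n is the list   */
      return(n);

   if(pos <= 0)                     /* Prepend: head changes       */
   {
      n->next    = head;
      head->prev = n;
      return(n);
   }

   for(i=1; i<pos && a->next != NULL; i++)
      a = a->next;                  /* Stop at pos or at last item */

   n->next = a->next;
   n->prev = a;
   if(a->next != NULL) a->next->prev = n;
   a->next = n;
   return(head);
}

/* Convert the list back to a malloc'd string (NULL on failure) */
char *ListToString(const Node *head)
{
   const Node *a;
   size_t     len = 0, i = 0;
   char       *s;

   for(a=head; a!=NULL; a=a->next) len++;
   if((s = malloc(len+1)) == NULL)
      return(NULL);
   for(a=head; a!=NULL; a=a->next) s[i++] = a->res;
   s[i] = '\0';
   return(s);
}
```

Unlike the library code, this sketch handles an empty list explicitly, so
inserting into NULL simply yields a one-item list.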
diff --git a/src/bioplib/aalist.h b/src/bioplib/aalist.h
new file mode 100644
index 0000000..e6c1157
--- /dev/null
+++ b/src/bioplib/aalist.h
@@ -0,0 +1,87 @@
+/*************************************************************************
+
+ Program:
+ File: aalist.h
+
+ Version: V3.0
+ Date: 18.02.09
+ Function: Include file for amino acid linked lists.
+
+ Copyright: (c) UCL / Dr. Andrew C.R. Martin 2006-2009
+ Author: Dr. Andrew C.R. Martin
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may be modified as required, but any modifications must be
+ documented so that the person responsible can be identified. If someone
+ else breaks this code, I don't want to be blamed for code that does not
+ work!
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 21.08.06 Original By: ACRM
+ V3.0 06.11.08 Incorporated into ProFit V3 By: CTP
+ V3.0 18.02.09 Moved to bioplib. By: CTP
+
+
+*************************************************************************/
+#ifndef _AALIST_H
+#define _AALIST_H
+
+
+/* Includes
+*/
+#include "general.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+typedef struct aa
+{
+ struct aa *next, *prev;
+ int seqnum;
+ BOOL flag;
+ char res;
+} AA;
+
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+AA *InsertNextResiduesInAAList(AA *a, char res, int nres);
+AA *InsertNextResidueInAAList(AA *a, char res);
+char *BuildSeqFromAAList(AA *aa);
+AA *InsertResidueInAAListAt(AA *aa, char res, int pos);
+AA *InsertResiduesInAAListAt(AA *aa, char res, int nres, int pos);
+AA *BuildAAList(char *seq);
+int FindAAListOffsetByResnum(AA *aa, int resnum);
+AA *FindAAListItemByResnum(AA *aa, int resnum);
+void SetAAListFlagByResnum(AA *aa, int resnum);
+char *BuildFlagSeqFromAAList(AA *aa, char ch);
+int GetAAListLen(AA *aa);
+
+#endif
diff --git a/src/bioplib/align.c b/src/bioplib/align.c
new file mode 100644
index 0000000..d247b19
--- /dev/null
+++ b/src/bioplib/align.c
@@ -0,0 +1,1327 @@
+/*************************************************************************
+
+ Program:
+ File: align.c
+
+ Version: V3.3
+ Date: 07.04.09
+ Function: Perform Needleman & Wunsch sequence alignment
+
+ Copyright: (c) SciTech Software 1993-2009
+ Author: Dr. Andrew C. R. Martin
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+ A simple Needleman & Wunsch Dynamic Programming alignment of 2
+ sequences.
+ A window is not used so the routine may be a bit slow on long
+ sequences.
+
+**************************************************************************
+
+ Usage:
+ ======
+ First call ReadMDM() to read the mutation data matrix, then call
+ align() to align the sequences.
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 19.06.90 Original used in NW program
+ V2.0 07.10.92 Original extracted from old NW program
+ V2.1 16.06.93 Tidied for book
+ V2.2 01.03.94 Changed static variable names
+ V2.3 18.03.94 getc() -> fgetc()
+ V2.4 24.11.94 ReadMDM() now looks after searching DATADIR
+ V2.5 28.02.95 ReadMDM() and other code improved to cope with MDMs of
+ any size
+ V2.6 26.07.95 Removed unused variables
+ V2.7 21.08.95 Initialisation of matrix was incorrect leading to errors
+ at the end of the alignment.
+ V2.8 24.08.95 calcscore() was doing an out-of-bounds array reference
+ if a character wasn't found
+ V2.9 11.07.96 calcscore() changed to CalcMDMScore() and made
+ non-static
+ V2.10 09.09.96 Improved comments for ReadMDM()
+ V2.11 17.09.96 Added ZeroMDM()
+ V3.0 06.03.00 Traceback code rewritten to use a trace matrix created
+ while the main matrix is populated. New affinealign()
+ routine implemented. align() is now a wrapper to that.
+ V3.1 06.02.03 Fixed for new version of GetWord()
+ V3.2 27.02.07 Added affinealignuc() and CalcMDMScoreUC()
+ V3.3 07.04.09 Complete re-write of ReadMDM() so it can read BLAST
+ style matrix files as well as our own
+
+*************************************************************************/
+/* Includes
+*/
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <ctype.h>
+
+#include "SysDefs.h"
+#include "macros.h"
+#include "array.h"
+#include "general.h"
+#include "seq.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+#ifndef MAX
+#define MAX(a,b) ((a) > (b) ? (a) : (b))
+#endif
+
+#define MAX3(c,d,e) (MAX(MAX((c),(d)),(e)))
+
+#define DATAENV "DATADIR" /* Environment variable or assign */
+
+#define MAXBUFF 400
+#define MAXWORD 16
+
+/* Type definition to store a X,Y coordinate pair in the matrix */
+typedef struct
+{
+ int x, y;
+} XY;
+
+
+/************************************************************************/
+/* Globals
+*/
+static int **sMDMScore;
+static char *sMDM_AAList;
+static int sMDMSize = 0;
+
+/************************************************************************/
+/* Prototypes
+*/
+static int SearchForBest(int **matrix, int length1, int length2,
+ int *BestI, int *BestJ, char *seq1, char *seq2,
+ char *align1, char *align2);
+static int TraceBack(int **matrix, XY **dirn, int length1, int length2,
+ char *seq1, char *seq2, char *align1, char *align2,
+ int *align_len);
+
+
+/************************************************************************/
+/*>int align(char *seq1, int length1, char *seq2, int length2,
+ BOOL verbose, BOOL identity, int penalty,
+ char *align1, char *align2, int *align_len)
+ -----------------------------------------------------------
+ Input: char *seq1 First sequence
+ int length1 First sequence length
+ char *seq2 Second sequence
+ int length2 Second sequence length
+ BOOL verbose Display N&W matrix
+ BOOL identity Use identity matrix
+ int penalty Gap insertion penalty value
+ Output: char *align1 Sequence 1 aligned
+ char *align2 Sequence 2 aligned
+ int *align_len Alignment length
+ Returns: int Alignment score (0 on error)
+
+ Perform simple N&W alignment of seq1 and seq2. No window is used, so
+ will be slow for long sequences.
+
+ A single gap penalty is used, so gap extension incurs no further
+ penalty.
+
+ Note that you must allocate sufficient memory for the aligned
+ sequences.
+ The easy way to do this is to ensure that align1 and align2 are
+ of length (length1+length2).
+
+ 06.03.00 Implemented as a wrapper to affinealign() which is the old
+ align() routine, plus support for affine gap penalties,
+ plus new traceback code based on storing the path as we
+ go
+*/
+int align(char *seq1,
+ int length1,
+ char *seq2,
+ int length2,
+ BOOL verbose,
+ BOOL identity,
+ int penalty,
+ char *align1,
+ char *align2,
+ int *align_len)
+{
+ return(affinealign(seq1, length1, seq2, length2, verbose, identity,
+ penalty, 0, align1, align2, align_len));
+}
+
+
+/************************************************************************/
+/*>int affinealign(char *seq1, int length1, char *seq2, int length2,
+ BOOL verbose, BOOL identity, int penalty, int penext,
+ char *align1, char *align2, int *align_len)
+ ---------------------------------------------------------------------
+ Input: char *seq1 First sequence
+ int length1 First sequence length
+ char *seq2 Second sequence
+ int length2 Second sequence length
+ BOOL verbose Display N&W matrix
+ BOOL identity Use identity matrix
+ int penalty Gap insertion penalty value
+ int penext Extension penalty
+ Output: char *align1 Sequence 1 aligned
+ char *align2 Sequence 2 aligned
+ int *align_len Alignment length
+ Returns: int Alignment score (0 on error)
+
+ Perform simple N&W alignment of seq1 and seq2. No window is used, so
+ will be slow for long sequences.
+
+ Note that you must allocate sufficient memory for the aligned
+ sequences.
+ The easy way to do this is to ensure that align1 and align2 are
+ of length (length1+length2).
+
+ 07.10.92 Adapted from original written while at NIMR
+ 08.10.92 Split into separate routines
+ 09.10.92 Changed best structure to simple integers, moved
+ SearchForBest() into TraceBack()
+ 21.08.95 Was only filling in the bottom right cell at initialisation
+ rather than all the right hand column and bottom row
+ 11.07.96 Changed calls to calcscore() to CalcMDMScore()
+ 06.03.00 Changed name to affinealign() (the routine align() is
+ provided as a backwards compatible wrapper). Added penext
+ parameter. Now supports affine gap penalties with separate
+ opening and extension penalties. The code now maintains
+ the path as it goes.
+**************************************************************************
+****** NOTE: ANY CHANGES SHOULD BE PROPAGATED TO affinealignuc() ******
+**************************************************************************
+*/
+int affinealign(char *seq1,
+ int length1,
+ char *seq2,
+ int length2,
+ BOOL verbose,
+ BOOL identity,
+ int penalty,
+ int penext,
+ char *align1,
+ char *align2,
+ int *align_len)
+{
+ XY **dirn = NULL;
+ int **matrix = NULL,
+ maxdim,
+ i, j, k, l,
+ i1, j1,
+ dia, right, down,
+ rcell, dcell, maxoff,
+ match = 1,
+ thisscore,
+ gapext,
+ score;
+
+ maxdim = MAX(length1, length2);
+
+ /* Initialise the score matrix */
+ if((matrix = (int **)Array2D(sizeof(int), maxdim, maxdim))==NULL)
+ return(0);
+ if((dirn = (XY **)Array2D(sizeof(XY), maxdim, maxdim))==NULL)
+ { FreeArray2D((char **)matrix, maxdim, maxdim); return(0); }
+
+ for(i=0;i<maxdim;i++)
+ {
+ for(j=0;j<maxdim;j++)
+ {
+ matrix[i][j] = 0;
+ dirn[i][j].x = -1;
+ dirn[i][j].y = -1;
+ }
+ }
+
+ /* Fill in scores up the right hand side of the matrix */
+ for(j=0; j<length2; j++)
+ {
+ if(identity)
+ {
+ if(seq1[length1-1] == seq2[j]) matrix[length1-1][j] = match;
+ }
+ else
+ {
+ matrix[length1-1][j] = CalcMDMScore(seq1[length1-1], seq2[j]);
+ }
+ }
+
+ /* Fill in scores along the bottom row of the matrix */
+ for(i=0; i<length1; i++)
+ {
+ if(identity)
+ {
+ if(seq1[i] == seq2[length2-1]) matrix[i][length2-1] = match;
+ }
+ else
+ {
+ matrix[i][length2-1] = CalcMDMScore(seq1[i], seq2[length2-1]);
+ }
+ }
+
+ i = length1 - 1;
+ j = length2 - 1;
+
+ /* Move back along the diagonal */
+ while(i > 0 && j > 0)
+ {
+ i--;
+ j--;
+
+ /* Fill in the scores along this row */
+ for(i1 = i; i1 > -1; i1--)
+ {
+ dia = matrix[i1+1][j+1];
+
+ /* Find highest score to right of diagonal */
+ rcell = i1+2;
+ if(i1+2 >= length1) right = 0;
+ else right = matrix[i1+2][j+1] - penalty;
+
+ gapext = 1;
+ for(k = i1+3; k<length1; k++, gapext++)
+ {
+ thisscore = matrix[k][j+1] - (penalty + gapext*penext);
+
+ if(thisscore > right)
+ {
+ right = thisscore;
+ rcell = k;
+ }
+ }
+
+ /* Find highest score below diagonal */
+ dcell = j+2;
+ if(j+2 >= length2) down = 0;
+ else down = matrix[i1+1][j+2] - penalty;
+
+ gapext = 1;
+ for(l = j+3; l<length2; l++, gapext++)
+ {
+ thisscore = matrix[i1+1][l] - (penalty + gapext*penext);
+
+ if(thisscore > down)
+ {
+ down = thisscore;
+ dcell = l;
+ }
+ }
+
+ /* Set score to best of these */
+ maxoff = MAX(right, down);
+ if(dia >= maxoff)
+ {
+ matrix[i1][j] = dia;
+ dirn[i1][j].x = i1+1;
+ dirn[i1][j].y = j+1;
+ }
+ else
+ {
+ if(right > down)
+ {
+ matrix[i1][j] = right;
+ dirn[i1][j].x = rcell;
+ dirn[i1][j].y = j+1;
+ }
+ else
+ {
+ matrix[i1][j] = down;
+ dirn[i1][j].x = i1+1;
+ dirn[i1][j].y = dcell;
+ }
+ }
+
+ /* Add the score for a match */
+ if(identity)
+ {
+ if(seq1[i1] == seq2[j]) matrix[i1][j] += match;
+ }
+ else
+ {
+ matrix[i1][j] += CalcMDMScore(seq1[i1],seq2[j]);
+ }
+ }
+
+ /* Fill in the scores in this column */
+ for(j1 = j; j1 > -1; j1--)
+ {
+ dia = matrix[i+1][j1+1];
+
+ /* Find highest score to right of diagonal */
+ rcell = i+2;
+ if(i+2 >= length1) right = 0;
+ else right = matrix[i+2][j1+1] - penalty;
+
+ gapext = 1;
+ for(k = i+3; k<length1; k++, gapext++)
+ {
+ thisscore = matrix[k][j1+1] - (penalty + gapext*penext);
+
+ if(thisscore > right)
+ {
+ right = thisscore;
+ rcell = k;
+ }
+ }
+
+ /* Find highest score below diagonal */
+ dcell = j1+2;
+ if(j1+2 >= length2) down = 0;
+ else down = matrix[i+1][j1+2] - penalty;
+
+ gapext = 1;
+ for(l = j1+3; l<length2; l++, gapext++)
+ {
+ thisscore = matrix[i+1][l] - (penalty + gapext*penext);
+
+ if(thisscore > down)
+ {
+ down = thisscore;
+ dcell = l;
+ }
+ }
+
+ /* Set score to best of these */
+ maxoff = MAX(right, down);
+ if(dia >= maxoff)
+ {
+ matrix[i][j1] = dia;
+ dirn[i][j1].x = i+1;
+ dirn[i][j1].y = j1+1;
+ }
+ else
+ {
+ if(right > down)
+ {
+ matrix[i][j1] = right;
+ dirn[i][j1].x = rcell;
+ dirn[i][j1].y = j1+1;
+ }
+ else
+ {
+ matrix[i][j1] = down;
+ dirn[i][j1].x = i+1;
+ dirn[i][j1].y = dcell;
+ }
+ }
+
+ /* Add the score for a match */
+ if(identity)
+ {
+ if(seq1[i] == seq2[j1]) matrix[i][j1] += match;
+ }
+ else
+ {
+ matrix[i][j1] += CalcMDMScore(seq1[i],seq2[j1]);
+ }
+ }
+ }
+
+ score = TraceBack(matrix, dirn, length1, length2,
+ seq1, seq2, align1, align2, align_len);
+
+ if(verbose)
+ {
+ printf("Matrix:\n-------\n");
+ for(j=0; j<length2;j++)
+ {
+ for(i=0; i<length1; i++)
+ {
+ printf("%3d ",matrix[i][j]);
+ }
+ printf("\n");
+ }
+
+ printf("Path:\n-----\n");
+ for(j=0; j<length2;j++)
+ {
+ for(i=0; i<length1; i++)
+ {
+ printf("(%3d,%3d) ",dirn[i][j].x,dirn[i][j].y);
+ }
+ printf("\n");
+ }
+ }
+
+ FreeArray2D((char **)matrix, maxdim, maxdim);
+ FreeArray2D((char **)dirn, maxdim, maxdim);
+
+ return(score);
+}
+
+
+/************************************************************************/
+/*>int affinealignuc(char *seq1, int length1, char *seq2, int length2,
+ BOOL verbose, BOOL identity, int penalty, int penext,
+ char *align1, char *align2, int *align_len)
+ ---------------------------------------------------------------------
+ Input: char *seq1 First sequence
+ int length1 First sequence length
+ char *seq2 Second sequence
+ int length2 Second sequence length
+ BOOL verbose Display N&W matrix
+ BOOL identity Use identity matrix
+ int penalty Gap insertion penalty value
+ int penext Extension penalty
+ Output: char *align1 Sequence 1 aligned
+ char *align2 Sequence 2 aligned
+ int *align_len Alignment length
+ Returns: int Alignment score (0 on error)
+
+ Perform simple N&W alignment of seq1 and seq2. No window is used, so
+ will be slow for long sequences.
+
+ Note that you must allocate sufficient memory for the aligned
+ sequences.
+ The easy way to do this is to ensure that align1 and align2 are
+ of length (length1+length2).
+
+ 07.10.92 Adapted from original written while at NIMR
+ 08.10.92 Split into separate routines
+ 09.10.92 Changed best structure to simple integers, moved
+ SearchForBest() into TraceBack()
+ 21.08.95 Was only filling in the bottom right cell at initialisation
+ rather than all the right hand column and bottom row
+ 11.07.96 Changed calls to calcscore() to CalcMDMScore()
+ 06.03.00 Changed name to affinealign() (the routine align() is
+ provided as a backwards compatible wrapper). Added penext
+ parameter. Now supports affine gap penalties with separate
+ opening and extension penalties. The code now maintains
+ the path as it goes.
+ 27.02.07 Exactly as affinealign() but upcases characters before
+ comparison
+
+**************************************************************************
+****** NOTE: ANY CHANGES SHOULD BE PROPAGATED TO affinealign() ******
+**************************************************************************
+*/
+int affinealignuc(char *seq1,
+ int length1,
+ char *seq2,
+ int length2,
+ BOOL verbose,
+ BOOL identity,
+ int penalty,
+ int penext,
+ char *align1,
+ char *align2,
+ int *align_len)
+{
+ XY **dirn = NULL;
+ int **matrix = NULL,
+ maxdim,
+ i, j, k, l,
+ i1, j1,
+ dia, right, down,
+ rcell, dcell, maxoff,
+ match = 1,
+ thisscore,
+ gapext,
+ score;
+
+ maxdim = MAX(length1, length2);
+
+ /* Initialise the score matrix */
+ if((matrix = (int **)Array2D(sizeof(int), maxdim, maxdim))==NULL)
+ return(0);
+ if((dirn = (XY **)Array2D(sizeof(XY), maxdim, maxdim))==NULL)
+ { FreeArray2D((char **)matrix, maxdim, maxdim); return(0); }
+
+ for(i=0;i<maxdim;i++)
+ {
+ for(j=0;j<maxdim;j++)
+ {
+ matrix[i][j] = 0;
+ dirn[i][j].x = -1;
+ dirn[i][j].y = -1;
+ }
+ }
+
+ /* Fill in scores up the right hand side of the matrix */
+ for(j=0; j<length2; j++)
+ {
+ if(identity)
+ {
+ if(seq1[length1-1] == seq2[j]) matrix[length1-1][j] = match;
+ }
+ else
+ {
+ matrix[length1-1][j] = CalcMDMScoreUC(seq1[length1-1], seq2[j]);
+ }
+ }
+
+ /* Fill in scores along the bottom row of the matrix */
+ for(i=0; i<length1; i++)
+ {
+ if(identity)
+ {
+ if(seq1[i] == seq2[length2-1]) matrix[i][length2-1] = match;
+ }
+ else
+ {
+ matrix[i][length2-1] = CalcMDMScoreUC(seq1[i], seq2[length2-1]);
+ }
+ }
+
+ i = length1 - 1;
+ j = length2 - 1;
+
+ /* Move back along the diagonal */
+ while(i > 0 && j > 0)
+ {
+ i--;
+ j--;
+
+ /* Fill in the scores along this row */
+ for(i1 = i; i1 > -1; i1--)
+ {
+ dia = matrix[i1+1][j+1];
+
+ /* Find highest score to right of diagonal */
+ rcell = i1+2;
+ if(i1+2 >= length1) right = 0;
+ else right = matrix[i1+2][j+1] - penalty;
+
+ gapext = 1;
+ for(k = i1+3; k<length1; k++, gapext++)
+ {
+ thisscore = matrix[k][j+1] - (penalty + gapext*penext);
+
+ if(thisscore > right)
+ {
+ right = thisscore;
+ rcell = k;
+ }
+ }
+
+ /* Find highest score below diagonal */
+ dcell = j+2;
+ if(j+2 >= length2) down = 0;
+ else down = matrix[i1+1][j+2] - penalty;
+
+ gapext = 1;
+ for(l = j+3; l<length2; l++, gapext++)
+ {
+ thisscore = matrix[i1+1][l] - (penalty + gapext*penext);
+
+ if(thisscore > down)
+ {
+ down = thisscore;
+ dcell = l;
+ }
+ }
+
+ /* Set score to best of these */
+ maxoff = MAX(right, down);
+ if(dia >= maxoff)
+ {
+ matrix[i1][j] = dia;
+ dirn[i1][j].x = i1+1;
+ dirn[i1][j].y = j+1;
+ }
+ else
+ {
+ if(right > down)
+ {
+ matrix[i1][j] = right;
+ dirn[i1][j].x = rcell;
+ dirn[i1][j].y = j+1;
+ }
+ else
+ {
+ matrix[i1][j] = down;
+ dirn[i1][j].x = i1+1;
+ dirn[i1][j].y = dcell;
+ }
+ }
+
+ /* Add the score for a match */
+ if(identity)
+ {
+ if(seq1[i1] == seq2[j]) matrix[i1][j] += match;
+ }
+ else
+ {
+ matrix[i1][j] += CalcMDMScoreUC(seq1[i1],seq2[j]);
+ }
+ }
+
+ /* Fill in the scores in this column */
+ for(j1 = j; j1 > -1; j1--)
+ {
+ dia = matrix[i+1][j1+1];
+
+ /* Find highest score to right of diagonal */
+ rcell = i+2;
+ if(i+2 >= length1) right = 0;
+ else right = matrix[i+2][j1+1] - penalty;
+
+ gapext = 1;
+ for(k = i+3; k<length1; k++, gapext++)
+ {
+ thisscore = matrix[k][j1+1] - (penalty + gapext*penext);
+
+ if(thisscore > right)
+ {
+ right = thisscore;
+ rcell = k;
+ }
+ }
+
+ /* Find highest score below diagonal */
+ dcell = j1+2;
+ if(j1+2 >= length2) down = 0;
+ else down = matrix[i+1][j1+2] - penalty;
+
+ gapext = 1;
+ for(l = j1+3; l<length2; l++, gapext++)
+ {
+ thisscore = matrix[i+1][l] - (penalty + gapext*penext);
+
+ if(thisscore > down)
+ {
+ down = thisscore;
+ dcell = l;
+ }
+ }
+
+ /* Set score to best of these */
+ maxoff = MAX(right, down);
+ if(dia >= maxoff)
+ {
+ matrix[i][j1] = dia;
+ dirn[i][j1].x = i+1;
+ dirn[i][j1].y = j1+1;
+ }
+ else
+ {
+ if(right > down)
+ {
+ matrix[i][j1] = right;
+ dirn[i][j1].x = rcell;
+ dirn[i][j1].y = j1+1;
+ }
+ else
+ {
+ matrix[i][j1] = down;
+ dirn[i][j1].x = i+1;
+ dirn[i][j1].y = dcell;
+ }
+ }
+
+ /* Add the score for a match */
+ if(identity)
+ {
+ if(seq1[i] == seq2[j1]) matrix[i][j1] += match;
+ }
+ else
+ {
+ matrix[i][j1] += CalcMDMScoreUC(seq1[i],seq2[j1]);
+ }
+ }
+ }
+
+ score = TraceBack(matrix, dirn, length1, length2,
+ seq1, seq2, align1, align2, align_len);
+
+ if(verbose)
+ {
+ printf("Matrix:\n-------\n");
+ for(j=0; j<length2;j++)
+ {
+ for(i=0; i<length1; i++)
+ {
+ printf("%3d ",matrix[i][j]);
+ }
+ printf("\n");
+ }
+
+ printf("Path:\n-----\n");
+ for(j=0; j<length2;j++)
+ {
+ for(i=0; i<length1; i++)
+ {
+ printf("(%3d,%3d) ",dirn[i][j].x,dirn[i][j].y);
+ }
+ printf("\n");
+ }
+ }
+
+ FreeArray2D((char **)matrix, maxdim, maxdim);
+ FreeArray2D((char **)dirn, maxdim, maxdim);
+
+ return(score);
+}
+
+
+/************************************************************************/
+/*>BOOL ReadMDM(char *mdmfile)
+ ---------------------------
+ Input: char *mdmfile Mutation data matrix filename
+ Returns: BOOL Success?
+
+ Read mutation data matrix into static global arrays. The matrix may
+ have comments at the start introduced with a ! in the first column.
+ The matrix must be complete (i.e. a triangular matrix will not
+ work). A line describing the residue types must appear, and may
+ be placed before or after the matrix itself
+
+ 07.10.92 Original
+ 18.03.94 getc() -> fgetc()
+ 24.11.94 Automatically looks in DATAENV if not found in current
+ directory
+ 28.02.95 Modified to read any size MDM and allow comments
+ Also allows the list of aa types before or after the actual
+ matrix
+ 26.07.95 Removed unused variables
+ 06.02.03 Fixed for new version of GetWord()
+ 07.04.09 Completely re-written to allow it to read BLAST style matrix
+ files as well as the ones used previously
+ Allow comments introduced with # as well as !
+ Uses MAXWORD rather than hardcoded 16
+*/
+BOOL ReadMDM(char *mdmfile)
+{
+ FILE *mdm = NULL;
+ int i, j, k, row, tmpStoreSize;
+ char buffer[MAXBUFF],
+ word[MAXWORD],
+ *p,
+ **tmpStore;
+ BOOL noenv;
+
+ if((mdm=OpenFile(mdmfile, DATAENV, "r", &noenv))==NULL)
+ {
+ return(FALSE);
+ }
+
+ /* First read the file to determine the dimensions */
+ while(fgets(buffer,MAXBUFF,mdm))
+ {
+ TERMINATE(buffer);
+ KILLLEADSPACES(p,buffer);
+
+ /* First line which is non-blank and non-comment */
+ if(strlen(p) && p[0] != '!' && p[0] != '#')
+ {
+ sMDMSize = 0;
+ for(p = buffer; p!=NULL;)
+ {
+ p = GetWord(p, word, MAXWORD);
+ /* Increment counter if this is numeric */
+ if(isdigit(word[0]) ||
+ ((word[0] == '-')&&(isdigit(word[1]))))
+ sMDMSize++;
+ }
+ if(sMDMSize)
+ break;
+ }
+ }
+
+ /* Allocate memory for the MDM and the AA List */
+ if((sMDMScore = (int **)Array2D(sizeof(int),sMDMSize,sMDMSize))==NULL)
+ return(FALSE);
+ if((sMDM_AAList = (char *)malloc((sMDMSize+1)*sizeof(char)))==NULL)
+ {
+ FreeArray2D((char **)sMDMScore, sMDMSize, sMDMSize);
+ return(FALSE);
+ }
+
+ /* Allocate temporary storage for a row from the matrix */
+ tmpStoreSize = 2*sMDMSize;
+ if((tmpStore = (char **)Array2D(sizeof(char), tmpStoreSize, MAXWORD))
+ ==NULL)
+ {
+ free(sMDM_AAList);
+ FreeArray2D((char **)sMDMScore, sMDMSize, sMDMSize);
+ return(FALSE);
+ }
+
+ /* Fill the matrix with zeros */
+ for(i=0; i<sMDMSize; i++)
+ {
+ for(j=0; j<sMDMSize; j++)
+ {
+ sMDMScore[i][j] = 0;
+ }
+ }
+
+ /* Rewind the file and read the actual data */
+ rewind(mdm);
+ row = 0;
+ while(fgets(buffer,MAXBUFF,mdm))
+ {
+ int Numeric;
+
+ TERMINATE(buffer);
+ KILLLEADSPACES(p,buffer);
+
+ /* Check line is non-blank and non-comment */
+ if(strlen(p) && p[0] != '!' && p[0] != '#')
+ {
+ Numeric = 0;
+ for(p = buffer, i = 0; p!=NULL && i<tmpStoreSize; i++)
+ {
+ p = GetWord(p, tmpStore[i], MAXWORD);
+ /* Increment Numeric counter if it's a numeric field */
+ if(isdigit(tmpStore[i][0]) ||
+ ((tmpStore[i][0] == '-')&&(isdigit(tmpStore[i][1]))))
+ {
+ Numeric++;
+ }
+ }
+
+ /* No numeric fields so it is the amino acid names */
+ if(Numeric == 0)
+ {
+ for(j = 0; j<i && j<sMDMSize; j++)
+ {
+ sMDM_AAList[j] = tmpStore[j][0];
+ }
+ }
+ else
+ {
+ /* There were numeric fields, so copy them into the matrix,
+ skipping any non-numeric fields
+ j counts the input fields
+ k counts the fields in sMDMScore
+ row counts the row in sMDMScore
+ */
+ for(j=0, k=0; j<i && k<sMDMSize; j++)
+ {
+ if(isdigit(tmpStore[j][0]) ||
+ ((tmpStore[j][0] == '-')&&(isdigit(tmpStore[j][1]))))
+ {
+ sscanf(tmpStore[j],"%d",&(sMDMScore[row][k]));
+ k++;
+ }
+ }
+
+ row++;
+ }
+ }
+ }
+ fclose(mdm);
+ FreeArray2D((char **)tmpStore, tmpStoreSize, MAXWORD);
+
+ return(TRUE);
+}
+
+
+/************************************************************************/
+/*>static int SearchForBest(int **matrix, int length1, int length2,
+ int *BestI, int *BestJ, char *seq1,
+ char *seq2, char *align1, char *align2)
+ ----------------------------------------------------------------
+ Input: int **matrix N&W matrix
+ int length1 Length of first sequence
+ int length2 Length of second sequence
+ int *BestI x position of highest score
+ int *BestJ y position of highest score
+ char *seq1 First sequence
+ char *seq2 Second sequence
+ Output: char *align1 First sequence with end aligned correctly
+ char *align2 Second sequence with end aligned correctly
+ Returns: int Alignment length thus far
+
+ Searches the outside of the matrix for the best score and starts the
+ alignment by putting in any starting - characters.
+
+ 08.10.92 Original extracted from Align()
+*/
+static int SearchForBest(int **matrix,
+ int length1,
+ int length2,
+ int *BestI,
+ int *BestJ,
+ char *seq1,
+ char *seq2,
+ char *align1,
+ char *align2)
+{
+ int ai,
+ besti, bestj,
+ i, j;
+
+ /* Now search the outside of the matrix for the highest scoring cell */
+ ai = 0;
+ besti = 0;
+ for(i = 1; i < length1; i++)
+ {
+ if(matrix[i][0] > matrix[besti][0]) besti = i;
+ }
+ bestj = 0;
+ for(j = 1; j < length2; j++)
+ {
+ if(matrix[0][j] > matrix[0][bestj]) bestj = j;
+ }
+ if(matrix[besti][0] > matrix[0][bestj])
+ {
+ *BestI = besti;
+ *BestJ = 0;
+ for(i=0; i<*BestI; i++)
+ {
+ align1[ai] = seq1[i];
+ align2[ai++] = '-';
+ }
+ }
+ else
+ {
+ *BestI = 0;
+ *BestJ = bestj;
+ for(j=0; j<*BestJ; j++)
+ {
+ align1[ai] = '-';
+ align2[ai++] = seq2[j];
+ }
+ }
+ return(ai);
+}
+
+
+/************************************************************************/
+/*>static int TraceBack(int **matrix, XY **dirn,
+ int length1, int length2,
+ char *seq1, char *seq2, char *align1,
+ char *align2, int *align_len)
+ ----------------------------------------------------------
+ Input: int **matrix N&W matrix
+ XY **dirn Direction Matrix
+ int length1 Length of first sequence
+ int length2 Length of second sequence
+ char *seq1 First sequence
+ char *seq2 Second sequence
+ Output: char *align1 First sequence aligned
+ char *align2 Second sequence aligned
+ int *align_len Aligned sequence length
+ Returns: int Alignment score
+
+ Does the traceback to find the alignment.
+
+ 08.10.92 Original extracted from Align(). Rewritten to do tracing
+ correctly.
+ 09.10.92 Changed to call SearchForBest(). Now returns score rather than
+ length.
+ 06.03.00 Recoded to take the path matrix which is calculated within
+ the main affinealign() routine rather than calculating the
+ path as we go. penalty parameter removed as this is no longer
+ needed. dirn parameter added.
+ 28.09.00 Fixed bug where last inserts were not printed properly if one
+ chain ended first
+*/
+static int TraceBack(int **matrix,
+ XY **dirn,
+ int length1,
+ int length2,
+ char *seq1,
+ char *seq2,
+ char *align1,
+ char *align2,
+ int *align_len)
+{
+ int i, j,
+ ai,
+ BestI,BestJ;
+ XY nextCell;
+
+ ai = SearchForBest(matrix, length1, length2, &BestI, &BestJ,
+ seq1, seq2, align1, align2);
+
+ /* Now trace back to find the alignment */
+ i = BestI;
+ j = BestJ;
+ align1[ai] = seq1[i];
+ align2[ai++] = seq2[j];
+
+ while(i < length1-1 && j < length2-1)
+ {
+ nextCell.x = dirn[i][j].x;
+ nextCell.y = dirn[i][j].y;
+ if((nextCell.x == i+1) && (nextCell.y == j+1))
+ {
+ /* We are inheriting from the diagonal */
+ i++;
+ j++;
+ }
+ else if(nextCell.y == j+1)
+ {
+ /* We are inheriting from the off-diagonal inserting a gap in
+ the y-sequence (seq2)
+ */
+ i++;
+ j++;
+ while((i < nextCell.x) && (i < length1-1))
+ {
+ align1[ai] = seq1[i++];
+ align2[ai++] = '-';
+ }
+ }
+ else if(nextCell.x == i+1)
+ {
+ /* We are inheriting from the off-diagonal inserting a gap in
+ the x-sequence (seq1)
+ */
+ i++;
+ j++;
+ while((j < nextCell.y) && (j < length2-1))
+ {
+ align1[ai] = '-';
+ align2[ai++] = seq2[j++];
+ }
+ }
+ else
+ {
+ /* Cockup! */
+ fprintf(stderr,"align.c/TraceBack() internal error\n");
+ }
+
+ align1[ai] = seq1[i];
+ align2[ai++] = seq2[j];
+ }
+
+ /* If one sequence finished first, fill in the end with insertions */
+ if(i < length1-1)
+ {
+ for(j=i+1; j<length1; j++)
+ {
+ align1[ai] = seq1[j];
+ align2[ai++] = '-';
+ }
+ }
+ else if(j < length2-1)
+ {
+ for(i=j+1; i<length2; i++)
+ {
+ align1[ai] = '-';
+ align2[ai++] = seq2[i];
+ }
+ }
+
+ *align_len = ai;
+
+ return(matrix[BestI][BestJ]);
+}
+
+
+/************************************************************************/
+/*>int CalcMDMScore(char resa, char resb)
+ --------------------------------------
+ Input: char resa First residue
+ char resb Second residue
+ Returns: int score
+
+ Calculate score from static globally stored mutation data matrix
+
+ 07.10.92 Adapted from NIMR-written original
+ 24.11.94 Only gives 10 warnings
+ 28.02.95 Modified to use sMDMSize
+ 24.08.95 If a residue was not found was doing an out-of-bounds array
+ reference causing a potential core dump
+ 11.07.96 Name changed from calcscore() and now non-static
+*/
+int CalcMDMScore(char resa, char resb)
+{
+ int i,j;
+ static int NWarn = 0;
+ BOOL Warned = FALSE;
+
+ for(i=0;i<sMDMSize;i++)
+ {
+ if(resa==sMDM_AAList[i]) break;
+ }
+ if(i==sMDMSize)
+ {
+ if(NWarn < 10)
+ printf("Residue %c not found in matrix\n",resa);
+ else if(NWarn == 10)
+ printf("More residues not found in matrix...\n");
+ Warned = TRUE;
+ }
+ for(j=0;j<sMDMSize;j++)
+ {
+ if(resb==sMDM_AAList[j]) break;
+ }
+ if(j==sMDMSize)
+ {
+ if(NWarn < 10)
+ printf("Residue %c not found in matrix\n",resb);
+ else if(NWarn == 10)
+ printf("More residues not found in matrix...\n");
+ Warned = TRUE;
+ }
+
+ if(Warned)
+ {
+ NWarn++;
+ return(0);
+ }
+
+ return(sMDMScore[i][j]);
+}
+
+/************************************************************************/
+/*>int CalcMDMScoreUC(char resa, char resb)
+ ----------------------------------------
+ Input: char resa First residue
+ char resb Second residue
+ Returns: int score
+
+ Calculate score from static globally stored mutation data matrix
+
+ 07.10.92 Adapted from NIMR-written original
+ 24.11.94 Only gives 10 warnings
+ 28.02.95 Modified to use sMDMSize
+ 24.08.95 If a residue was not found was doing an out-of-bounds array
+ reference causing a potential core dump
+ 11.07.96 Name changed from calcscore() and now non-static
+ 27.02.07 As CalcMDMScore() but upcases characters before comparison
+*/
+int CalcMDMScoreUC(char resa, char resb)
+{
+ int i,j;
+ static int NWarn = 0;
+ BOOL Warned = FALSE;
+
+ resa = (islower(resa)?toupper(resa):resa);
+ resb = (islower(resb)?toupper(resb):resb);
+
+ for(i=0;i<sMDMSize;i++)
+ {
+ if(resa==sMDM_AAList[i]) break;
+ }
+ if(i==sMDMSize)
+ {
+ if(NWarn < 10)
+ printf("Residue %c not found in matrix\n",resa);
+ else if(NWarn == 10)
+ printf("More residues not found in matrix...\n");
+ Warned = TRUE;
+ }
+ for(j=0;j<sMDMSize;j++)
+ {
+ if(resb==sMDM_AAList[j]) break;
+ }
+ if(j==sMDMSize)
+ {
+ if(NWarn < 10)
+ printf("Residue %c not found in matrix\n",resb);
+ else if(NWarn == 10)
+ printf("More residues not found in matrix...\n");
+ Warned = TRUE;
+ }
+
+ if(Warned)
+ {
+ NWarn++;
+ return(0);
+ }
+
+ return(sMDMScore[i][j]);
+}
+
+/************************************************************************/
+/*>int ZeroMDM(void)
+ -----------------
+ Returns: int Maximum value in modified matrix
+
+ Modifies all values in the MDM such that the minimum value is 0
+ 17.09.96 Original
+*/
+int ZeroMDM(void)
+{
+ int MinVal = sMDMScore[0][0],
+ MaxVal = sMDMScore[0][0],
+ i, j;
+
+ /* Find the minimum and maximum values on the matrix */
+ for(i=0; i<sMDMSize; i++)
+ {
+ for(j=0; j<sMDMSize; j++)
+ {
+ if(sMDMScore[i][j] < MinVal)
+ {
+ MinVal = sMDMScore[i][j];
+ }
+ else if(sMDMScore[i][j] > MaxVal)
+ {
+ MaxVal = sMDMScore[i][j];
+ }
+ }
+ }
+
+ /* Now subtract the MinVal from all cells in the matrix so it starts
+ at zero.
+ */
+ for(i=0; i<sMDMSize; i++)
+ {
+ for(j=0; j<sMDMSize; j++)
+ {
+ sMDMScore[i][j] -= MinVal;
+ }
+ }
+
+ /* Return maximum value in modified matrix */
+ return(MaxVal-MinVal);
+}
+
+
+
+
+#ifdef DEMO
+int main(int argc, char **argv)
+{
+ char seq1[] = "ACTCLMCT",
+ seq2[] = "ACTCCT",
+ align1[100],
+ align2[100];
+ int score, al_len;
+ int i, j;
+
+ ReadMDM("pet91.mat");
+
+ for(i=0; i<sMDMSize; i++)
+ {
+ printf(" %c", sMDM_AAList[i]);
+ }
+ printf("\n");
+
+ for(i=0; i<sMDMSize; i++)
+ {
+ for(j=0; j<sMDMSize; j++)
+ {
+ printf("%3d", sMDMScore[i][j]);
+ }
+ printf("\n");
+ }
+
+ score = affinealign(seq1, strlen(seq1), seq2, strlen(seq2),
+ TRUE, FALSE,
+ 10, 1, align1, align2, &al_len);
+
+ align1[al_len] = '\0';
+ align2[al_len] = '\0';
+
+ printf("%s\n", align1);
+ printf("%s\n", align2);
+
+ return(0);
+}
+#endif
diff --git a/src/bioplib/angle.c b/src/bioplib/angle.c
new file mode 100644
index 0000000..4e192a9
--- /dev/null
+++ b/src/bioplib/angle.c
@@ -0,0 +1,148 @@
+/*************************************************************************
+
+ Program:
+ File: angle.c
+
+ Version: V1.5R
+ Date: 27.03.95
+ Function: Calculate angles, torsions, etc.
+
+ Copyright: (c) SciTech Software 1993-5
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+ These routines return angles and torsion angles. The definition of a
+ torsion angle is the chemical definition:
+ i.e. Assuming the atoms are co-planar:
+ A---B A---B
+ | = 0.0 | = 180.0
+ | |
+ D---C C---D
+
+**************************************************************************
+
+ Usage:
+ ======
+ REAL angle(xi,yi,zi,xj,yj,zj,xk,yk,zk)
+ Input: REAL xi,yi,zi Input coordinates
+ xj,yj,zj
+ xk,yk,zk
+ Returns: REAL The angle between the 3 atoms
+
+
+ REAL phi(xi,yi,zi,xj,yj,zj,xk,yk,zk,xl,yl,zl)
+ Input: REAL xi,yi,zi Input coordinates
+ xj,yj,zj
+ xk,yk,zk
+ xl,yl,zl
+ Returns: REAL The torsion angle between the 4 atoms
+
+
+ REAL simpleang(ang)
+ Input: REAL ang An angle
+ Returns: REAL Simplified angle
+
+ Simplifies a signed angle to an unsigned angle <=2*PI
+
+
+ REAL TrueAngle(REAL opp, REAL adj)
+ Input: REAL opp Length of opposite side
+ REAL adj Length of adjacent side
+ Returns: REAL The angle from 0 to 2PI
+
+ Returns the true positive angle between 0 and 2PI given the opp and
+ adj lengths
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 07.02.91 Original
+ V1.1 17.02.91 Corrected comments to new standard and added phi()
+ V1.2 04.03.91 angle() and phi() now return _correct_ values!
+ V1.3 01.06.92 ANSIed
+ V1.4 08.12.92 Changed abs() to ABS() from macros.h
+ V1.5 27.03.95 Added TrueAngle()
+
+*************************************************************************/
+/* Includes
+*/
+#include <math.h>
+
+#include "MathType.h"
+#include "macros.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+/************************************************************************/
+/*>REAL angle(REAL xi,REAL yi,REAL zi,REAL xj,REAL yj,
+ REAL zj,REAL xk,REAL yk,REAL zk)
+ ---------------------------------------------------
+ Input: REAL xi,yi,zi Input coordinates
+ xj,yj,zj
+ xk,yk,zk
+ Returns: REAL The angle between the 3 atoms
+
+ Calculates the angle between three sets of coordinates
+
+ 07.02.89 Original By: ACRM
+ 04.03.91 Fixed return value
+ 16.06.93 Changed float to REAL
+*/
+REAL angle(REAL xi,
+ REAL yi,
+ REAL zi,
+ REAL xj,
+ REAL yj,
+ REAL zj,
+ REAL xk,
+ REAL yk,
+ REAL zk)
+{
+ REAL qx,qy,qz,sq,px,py,pz,sp,cosa2,a2;
+
+ px = xi - xj;
+ py = yi - yj;
+ pz = zi - zj;
+ sp = sqrt(px * px + py * py + pz * pz);
+
+ qx = xk - xj;
+ qy = yk - yj;
+ qz = zk - zj;
+ sq = sqrt(qx * qx + qy * qy + qz * qz);
+
+ cosa2 = (qx * px + qy * py + qz * pz) / (sp * sq);
+ a2 = acos(cosa2);
+
+ return(a2);
+}
+
diff --git a/src/bioplib/angle.h b/src/bioplib/angle.h
new file mode 100644
index 0000000..e568ba7
--- /dev/null
+++ b/src/bioplib/angle.h
@@ -0,0 +1,80 @@
+/*************************************************************************
+
+ Program:
+ File: angle.h
+
+ Version: V1.7R
+ Date: 06.09.96
+ Function: Include file for angle functions
+
+ Copyright: (c) SciTech Software 1993-6
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.5 27.03.95
+ V1.6 09.07.96 Added TorToCoor()
+ V1.7 06.09.96 Added includes
+
+*************************************************************************/
+/* Includes
+*/
+#include "MathType.h"
+#include "SysDefs.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+/************************************************************************/
+
+#ifndef _ANGLE_H
+#define _ANGLE_H
+
+REAL angle(REAL xi, REAL yi, REAL zi, REAL xj, REAL yj, REAL zj,
+ REAL xk, REAL yk, REAL zk);
+REAL phi(REAL xi, REAL yi, REAL zi, REAL xj, REAL yj, REAL zj,
+ REAL xk, REAL yk, REAL zk, REAL xl, REAL yl, REAL zl);
+REAL simpleangle(REAL ang);
+REAL TrueAngle(REAL opp, REAL adj);
+BOOL TorToCoor(VEC3F ant1, VEC3F ant2, VEC3F ant3,
+ REAL bond, REAL theta, REAL torsion,
+ VEC3F *coords);
+
+#endif
diff --git a/src/bioplib/array.h b/src/bioplib/array.h
new file mode 100644
index 0000000..8ba192e
--- /dev/null
+++ b/src/bioplib/array.h
@@ -0,0 +1,73 @@
+/*************************************************************************
+
+ Program:
+ File: array.h
+
+ Version: V1.5R
+ Date: 30.05.02
+ Function: Include file for 2D/3D array functions
+
+ Copyright: (c) SciTech Software 1994-2002
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.4 18.03.94
+ V1.5 30.05.02 Added 3D functions
+
+*************************************************************************/
+/* Includes
+*/
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+/************************************************************************/
+
+#ifndef _ARRAY_H
+#define _ARRAY_H
+
+char **Array2D(int size, int dim1, int dim2);
+void FreeArray2D(char **array, int dim1, int dim2);
+
+char ***Array3D(int size, int dim1, int dim2, int dim3);
+void FreeArray3D(char ***array, int dim1, int dim2, int dim3);
+
+#endif
diff --git a/src/bioplib/array2.c b/src/bioplib/array2.c
new file mode 100644
index 0000000..029e263
--- /dev/null
+++ b/src/bioplib/array2.c
@@ -0,0 +1,154 @@
+/*************************************************************************
+
+ Program:
+ File: array2.c
+
+ Version: V1.4R
+ Date: 18.03.94
+ Function: Allocate and free 2D arrays
+
+ Copyright: (c) SciTech Software 1993-4
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+ Creates a 2D array where the first dimension is a set of pointers. This
+ is better for passing into subroutines than the conventional C method
+ of simply declaring:
+ TYPE matrix[10][10];
+ which, when passed to a function, loses the concept of dimensions
+ unless the matrix is explicitly defined with these dimensions in the
+ function.
+
+ This routine creates an array of pointers to 1-D arrays and can thus be
+ passed to functions successfully.
+
+**************************************************************************
+
+ Usage:
+ ======
+ matrix = (TYPE **)Array2D(sizeof(TYPE), nrows, ncolumns);
+
+ e.g.
+ matrix = (float **)Array2D(sizeof(float), 10, 10);
+
+ Returns NULL (having freed any allocated memory) if there is a problem.
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 07.10.92 Original
+ V1.1 29.01.93 Added includes of sysdefs.h & malloc.h for MS-DOS
+ V1.2 16.06.93 Includes stdlib.h rather than malloc.h
+ V1.3 01.03.94 Corrected other include file usage
+ V1.4 18.03.94 Added NULL definition for systems which don't define
+ it in stdlib.h
+
+*************************************************************************/
+/* Includes
+*/
+#include <stdlib.h>
+
+/************************************************************************/
+/* Defines and macros
+*/
+#ifndef NULL
+#define NULL ((void *)0)
+#endif
+
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+/************************************************************************/
+/*>char **Array2D(int size, int dim1, int dim2)
+ --------------------------------------------
+ Input: int size Size of an array element
+ int dim1 First dimension (number of rows)
+ int dim2 Second dimension (number of columns)
+ Returns: char ** Array of pointers. Must be cast to required
+ type
+
+ Create a 2D array of elements of size `size' with dimensions `dim1'
+ rows by `dim2' columns.
+
+ 07.10.92 Original
+ 12.07.93 Tidied and commented
+*/
+char **Array2D(int size,
+ int dim1,
+ int dim2)
+{
+ char **array = NULL;
+ int i;
+
+ /* Allocate memory for the outer dimension array */
+ if((array = (char **)malloc(dim1 * sizeof(char *))) == NULL)
+ return(NULL);
+
+ /* Set all positions to NULL */
+ for(i=0; i<dim1; i++) array[i] = NULL;
+
+ /* Allocate memory for each array in the second dimension */
+ for(i=0; i<dim1; i++)
+ {
+ /* If allocation fails, jump to badexit */
+ if((array[i] = (char *)malloc(dim2 * size)) == NULL)
+ goto badexit;
+ }
+
+ return(array);
+
+badexit:
+ for(i=0; i<dim1; i++) if(array[i]) free(array[i]);
+ free(array);
+ return(NULL);
+}
+
+/************************************************************************/
+/*>void FreeArray2D(char **array, int dim1, int dim2)
+ --------------------------------------------------
+ Input: char ** Array of pointers to be freed
+ int dim1 First dimension (number of rows)
+ int dim2 Second dimension (number of columns)
+
+ Frees a 2D array with dimensions `dim1' rows by `dim2' columns.
+
+ 07.10.92 Original
+*/
+void FreeArray2D(char **array,
+ int dim1,
+ int dim2)
+{
+ int i;
+
+ if(array)
+ {
+ for(i=0; i<dim1; i++) if(array[i]) free(array[i]);
+ free(array);
+ }
+}
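The row-pointer layout described in array2.c can be illustrated with a self-contained sketch. This is not the upstream code: alloc_rows(), free_rows() and sum_matrix() are hypothetical names, and the sketch uses calloc() for the rows so that elements start zeroed, which Array2D() itself does not promise. It shows why the layout is convenient: sum_matrix() receives the matrix without any compile-time dimensions.

```c
#include <stdlib.h>

/* Hypothetical stand-in for Array2D(): an array of dim1 row pointers,
   each pointing at dim2 elements of `size' bytes. Returns NULL on
   failure, having freed anything already allocated.
*/
char **alloc_rows(size_t size, int dim1, int dim2)
{
   char **array;
   int  i;

   if(dim1 <= 0 || dim2 <= 0 || size == 0)
      return(NULL);

   if((array = (char **)malloc((size_t)dim1 * sizeof(char *))) == NULL)
      return(NULL);

   for(i=0; i<dim1; i++)
   {
      if((array[i] = (char *)calloc((size_t)dim2, size)) == NULL)
      {
         /* Free only the rows allocated so far */
         while(i-- > 0) free(array[i]);
         free(array);
         return(NULL);
      }
   }

   return(array);
}

/* Hypothetical stand-in for FreeArray2D() */
void free_rows(char **array, int dim1)
{
   int i;

   if(array == NULL) return;
   for(i=0; i<dim1; i++) free(array[i]);
   free(array);
}

/* A function that accepts the matrix with run-time dimensions */
double sum_matrix(double **m, int nrows, int ncols)
{
   double total = 0.0;
   int    i, j;

   for(i=0; i<nrows; i++)
      for(j=0; j<ncols; j++)
         total += m[i][j];

   return(total);
}
```

As with Array2D(), the caller casts the returned pointer to the required type, for example (double **)alloc_rows(sizeof(double), 3, 4), and casts back to (char **) when freeing.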
diff --git a/src/bioplib/chindex.c b/src/bioplib/chindex.c
new file mode 100644
index 0000000..fd31ac8
--- /dev/null
+++ b/src/bioplib/chindex.c
@@ -0,0 +1,110 @@
+/*************************************************************************
+
+ Program:
+ File: chindex.c
+
+ Version: V1.21
+ Date: 18.06.02
+ Function:
+
+ Copyright: (c) Dr. Andrew C. R. Martin, University of Reading, 2002
+ Author: Dr. Andrew C. R. Martin
+ Phone: +44 (0) 1372 275775
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.1 08.02.91 Added KillLine()
+ V1.2 10.02.91 Added setextn() and index()
+ V1.3 20.03.91 Added Word()
+ V1.4 28.05.92 ANSIed
+ V1.5 22.06.92 Added tab check to Word(). Improved setextn().
+ Added WordN(). Documented other routines.
+ V1.6 27.07.93 Corrected fsscanf() for double precision
+ V1.7 07.10.93 Checks made on case before toupper()/tolower()
+ for SysV compatibility. Also index() becomes
+ chindex()
+ V1.8 18.03.94 getc() -> fgetc()
+ V1.9 11.05.94 Added GetFilestem(), upstrcmp(), upstrncmp() &
+ GetWord()
+ V1.10 24.08.94 Added OpenStdFiles()
+ V1.11 08.03.95 Corrected OpenFile() for non-UNIX
+ V1.12 09.03.95 Added check on non-NULL filename in OpenFile()
+ V1.13 17.07.95 Added countchar()
+ V1.14 18.10.95 Moved YorN() to WindIO.c
+ V1.15 06.11.95 Added StoreString(), InStringList() and FreeStringList()
+ V1.16 22.11.95 Moved ftostr() to generam.c
+ V1.17 15.12.95 Added QueryStrStr()
+ V1.18 18.12.95 OpenStdFiles() treats filename of - as stdin/stdout
+ V1.19 05.02.96 OpenStdFiles() allows NULL pointers instead of filenames
+ V1.20 18.09.96 Added padchar()
+ V1.21 18.06.02 Added string.h
+
+*************************************************************************/
+/* Includes
+*/
+#include <string.h>
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+
+/************************************************************************/
+/*>int chindex(char *string, char ch)
+ ----------------------------------
+ Input: char *string A string
+ ch A character for which to search
+ Returns: int The offset of ch in string.
+
+ Returns the offset of a character in a string. -1 if not found. This is
+ used in a similar manner to strchr(), but gives an offset in the string
+ rather than a pointer to the character.
+
+ 10.02.91 Original
+ 28.05.92 ANSIed
+ 06.10.93 Changed name to chindex() to avoid UNIX name clash
+*/
+int chindex(char *string,
+ char ch)
+{
+ int count;
+
+ for(count=0;count<strlen(string);count++)
+ if(string[count] == ch) break;
+
+ if(count >= strlen(string)) count = -1;
+
+ return(count);
+}
+
+
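Note on chindex() above: it calls strlen() on every loop iteration and once more afterwards. A minimal sketch of an equivalent single-pass lookup built on the standard strchr() follows. The name chindex_sketch() is hypothetical and the NULL check is an addition that the original does not have.

```c
#include <string.h>

/* Hypothetical variant of chindex(): offset of the first occurrence of
   ch in string, or -1 if it is absent. A search for '\0' returns -1,
   matching the behaviour of the loop in chindex().
*/
int chindex_sketch(const char *string, char ch)
{
   const char *p;

   if(string == NULL || ch == '\0')
      return(-1);

   p = strchr(string, ch);

   return((p != NULL) ? (int)(p - string) : -1);
}
```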
diff --git a/src/bioplib/countchar.c b/src/bioplib/countchar.c
new file mode 100644
index 0000000..83dd2b6
--- /dev/null
+++ b/src/bioplib/countchar.c
@@ -0,0 +1,108 @@
+/*************************************************************************
+
+ Program:
+ File: countchar.c
+
+ Version: V1.20
+ Date: 18.09.96
+ Function:
+
+ Copyright: (c) SciTech Software 1991-6
+ Author: Dr. Andrew C. R. Martin
+ Phone: +44 (0) 1372 275775
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.1 08.02.91 Added KillLine()
+ V1.2 10.02.91 Added setextn() and index()
+ V1.3 20.03.91 Added Word()
+ V1.4 28.05.92 ANSIed
+ V1.5 22.06.92 Added tab check to Word(). Improved setextn().
+ Added WordN(). Documented other routines.
+ V1.6 27.07.93 Corrected fsscanf() for double precision
+ V1.7 07.10.93 Checks made on case before toupper()/tolower()
+ for SysV compatibility. Also index() becomes
+ chindex()
+ V1.8 18.03.94 getc() -> fgetc()
+ V1.9 11.05.94 Added GetFilestem(), upstrcmp(), upstrncmp() &
+ GetWord()
+ V1.10 24.08.94 Added OpenStdFiles()
+ V1.11 08.03.95 Corrected OpenFile() for non-UNIX
+ V1.12 09.03.95 Added check on non-NULL filename in OpenFile()
+ V1.13 17.07.95 Added countchar()
+ V1.14 18.10.95 Moved YorN() to WindIO.c
+ V1.15 06.11.95 Added StoreString(), InStringList() and FreeStringList()
+ V1.16 22.11.95 Moved ftostr() to generam.c
+ V1.17 15.12.95 Added QueryStrStr()
+ V1.18 18.12.95 OpenStdFiles() treats filename of - as stdin/stdout
+ V1.19 05.02.96 OpenStdFiles() allows NULL pointers instead of filenames
+ V1.20 18.09.96 Added padchar()
+
+*************************************************************************/
+/* Includes
+*/
+#include <string.h>
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+
+/************************************************************************/
+/*>int countchar(char *string, char ch)
+ ------------------------------------
+ Input: char *string String to search for characters
+ char ch Character for which to search
+ Returns: int Number of occurrences of ch in string
+
+ Counts occurrences of character ch in string.
+
+ 17.07.95 Original By: ACRM
+*/
+int countchar(char *string, char ch)
+{
+ char *chp;
+ int count = 0;
+
+ if(string==NULL)
+ return(0);
+
+ for(chp=string, count=0; *chp; chp++)
+ {
+ if(*chp == ch)
+ count++;
+ }
+ return(count);
+}
+
+
diff --git a/src/bioplib/fit.c b/src/bioplib/fit.c
new file mode 100644
index 0000000..804dc42
--- /dev/null
+++ b/src/bioplib/fit.c
@@ -0,0 +1,373 @@
+/*************************************************************************
+
+ Program:
+ File: fit.c
+
+ Version: V1.5
+ Date: 03.04.09
+ Function: Perform least squares fitting of coordinate sets
+
+ Copyright: (c) SciTech Software 1993-7
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: amartin at stagleys.demon.co.uk
+ martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+ This code performs least squares fitting of two coordinate sets using
+ the method of McLachlan as modified by Sutcliffe.
+
+**************************************************************************
+
+ Usage:
+ ======
+ Passed two coordinate arrays both centred around the origin and,
+ optionally, an array of weights, returns a rotation matrix.
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 04.02.91 Original
+ V1.1 01.06.92 ANSIed and static'd
+ V1.2 08.12.92 Changed abs() to ABS() using macros.h. Includes stdio.h
+ V1.3 11.02.94 Changed column flag to BOOL
+ V1.4 03.06.97 Corrected documentation
+ V1.5 03.04.09 Initialize clep in qikfit() By: CTP
+
+*************************************************************************/
+/* Includes
+*/
+#include <math.h>
+#include <stdio.h>
+
+#include "MathType.h"
+#include "fit.h"
+#include "macros.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+#define SMALL 1.0e-20 /* Convergence cutoffs */
+#define SMALSN 1.0e-10
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+static void qikfit(REAL umat[3][3], REAL rm[3][3], BOOL column);
+
+/************************************************************************/
+/*>BOOL matfit(COOR *x1, COOR *x2, REAL rm[3][3], int n,
+ REAL *wt1, BOOL column)
+ -----------------------------------------------------
+ Input: COOR *x1 First (fixed) array of coordinates
+ COOR *x2 Second (mobile) array of coordinates
+ int n Number of coordinates
+ REAL *wt1 Weight array or NULL
+ BOOL column TRUE: Output a column-wise matrix (as used
+ by FRODO)
+ FALSE: Output a standard row-wise matrix.
+ Output: REAL rm[3][3] Returned rotation matrix
+ Returns: BOOL TRUE: success
+ FALSE: error
+
+ Fit coordinate array x2 to x1 both centred around the origin and of
+ length n. Optionally weighted with the wt1 array if wt1 is not NULL.
+ If column is set the matrix will be returned column-wise rather
+ than row-wise.
+
+ 04.02.91 Original based on code by Mike Sutcliffe
+ 01.06.92 ANSIed & doc'd
+ 17.06.93 various changes for release (including parameters)
+ 11.03.94 column changed to BOOL
+ 25.11.02 Corrected header!
+*/
+BOOL matfit(COOR *x1, /* First coord array */
+ COOR *x2, /* Second coord array */
+ REAL rm[3][3], /* Rotation matrix */
+ int n, /* Number of points */
+ REAL *wt1, /* Weight array */
+ BOOL column) /* Column-wise output */
+{
+ int i,j;
+ REAL umat[3][3];
+
+
+ if(n<2)
+ {
+ return(FALSE);
+ }
+
+ if(wt1)
+ {
+ for(i=0;i<3;i++)
+ {
+ for(j=0;j<3;j++) umat[i][j] = 0.0;
+
+ for(j=0;j<n;j++)
+ {
+ switch(i)
+ {
+ case 0:
+ umat[i][0] += wt1[j] * x1[j].x * x2[j].x;
+ umat[i][1] += wt1[j] * x1[j].x * x2[j].y;
+ umat[i][2] += wt1[j] * x1[j].x * x2[j].z;
+ break;
+ case 1:
+ umat[i][0] += wt1[j] * x1[j].y * x2[j].x;
+ umat[i][1] += wt1[j] * x1[j].y * x2[j].y;
+ umat[i][2] += wt1[j] * x1[j].y * x2[j].z;
+ break;
+ case 2:
+ umat[i][0] += wt1[j] * x1[j].z * x2[j].x;
+ umat[i][1] += wt1[j] * x1[j].z * x2[j].y;
+ umat[i][2] += wt1[j] * x1[j].z * x2[j].z;
+ break;
+ }
+ }
+ }
+ }
+ else
+ {
+ for(i=0;i<3;i++)
+ {
+ for(j=0;j<3;j++) umat[i][j] = 0.0;
+
+ for(j=0;j<n;j++)
+ {
+ switch(i)
+ {
+ case 0:
+ umat[i][0] += x1[j].x * x2[j].x;
+ umat[i][1] += x1[j].x * x2[j].y;
+ umat[i][2] += x1[j].x * x2[j].z;
+ break;
+ case 1:
+ umat[i][0] += x1[j].y * x2[j].x;
+ umat[i][1] += x1[j].y * x2[j].y;
+ umat[i][2] += x1[j].y * x2[j].z;
+ break;
+ case 2:
+ umat[i][0] += x1[j].z * x2[j].x;
+ umat[i][1] += x1[j].z * x2[j].y;
+ umat[i][2] += x1[j].z * x2[j].z;
+ break;
+ }
+ }
+ }
+ }
+ qikfit(umat,rm,column);
+
+ return(TRUE);
+}
+
+/************************************************************************/
+/*>static void qikfit(REAL umat[3][3], REAL rm[3][3], BOOL column)
+ ---------------------------------------------------------------
+ Input: REAL umat[3][3] The U matrix
+ BOOL column TRUE: Create a column-wise matrix
+ (other way round from normal).
+ Output: REAL rm[3][3] The output rotation matrix
+
+ Does the actual fitting for matfit().
+ 04.02.91 Original based on code by Mike Sutcliffe
+ 01.06.92 ANSIed & doc'd
+ 11.03.94 column changed to BOOL
+ 03.04.09 Initialize clep for fussy compilers. By: CTP
+*/
+static void qikfit(REAL umat[3][3],
+ REAL rm[3][3],
+ BOOL column)
+{
+
+ REAL rot[3][3],
+ turmat[3][3],
+ c[3][3],
+ coup[3],
+ dir[3],
+ step[3],
+ v[3],
+ rtsum,rtsump,
+ rsum,
+ stp,stcoup,
+ ud,tr,ta,cs,sn,ac,
+ delta,deltap,
+ gfac,
+ cle,
+ clep = 0.0;
+ int i,j,k,l,m,
+ jmax,
+ ncyc,
+ nsteep,
+ nrem;
+
+ /* Rotate repeatedly to reduce couple about initial direction to zero.
+ Clear the rotation matrix
+ */
+ for(l=0;l<3;l++)
+ {
+ for(m=0;m<3;m++)
+ rot[l][m] = 0.0;
+ rot[l][l] = 1.0;
+ }
+
+ /* Set the cycle limit and the initial trace of umat[][] */
+ jmax = 30;
+ rtsum = umat[0][0] + umat[1][1] + umat[2][2];
+ delta = 0.0;
+
+ for(ncyc=0;ncyc<jmax;ncyc++)
+ {
+ /* Modified CG. For first and every NSTEEP cycles, set previous
+ step as zero and do an SD step
+ */
+ nsteep = 3;
+ nrem = ncyc-nsteep*(int)(ncyc/nsteep);
+
+ if(!nrem)
+ {
+ for(i=0;i<3;i++) step[i]=0.0;
+ clep = 1.0;
+ }
+
+ /* Couple */
+ coup[0] = umat[1][2]-umat[2][1];
+ coup[1] = umat[2][0]-umat[0][2];
+ coup[2] = umat[0][1]-umat[1][0];
+ cle = sqrt(coup[0]*coup[0] + coup[1]*coup[1] + coup[2]*coup[2]);
+
+ /* Gradient vector is now -coup */
+ gfac = (cle/clep)*(cle/clep);
+
+ /* Value of rtsum from previous step */
+ rtsump = rtsum;
+ deltap = delta;
+ clep = cle;
+ if(cle < SMALL) break;
+
+ /* Step vector conjugate to previous */
+ stp = 0.0;
+ for(i=0;i<3;i++)
+ {
+ step[i]=coup[i]+step[i]*gfac;
+ stp += (step[i] * step[i]);
+ }
+ stp = 1.0/sqrt(stp);
+
+ /* Normalised step */
+ for(i=0;i<3;i++) dir[i] = stp*step[i];
+
+ /* Couple resolved along step direction */
+ stcoup = coup[0]*dir[0] + coup[1]*dir[1] + coup[2]*dir[2];
+
+ /* Component of UMAT along direction */
+ ud = 0.0;
+ for(l=0;l<3;l++)
+ for(m=0;m<3;m++)
+ ud += umat[l][m]*dir[l]*dir[m];
+
+
+ tr = umat[0][0]+umat[1][1]+umat[2][2]-ud;
+ ta = sqrt(tr*tr + stcoup*stcoup);
+ cs=tr/ta;
+ sn=stcoup/ta;
+
+ /* If cs<0 then position is unstable, so don't stop */
+ if((cs>0.0) && (ABS(sn)<SMALSN)) break;
+
+ /* Turn matrix for correcting rotation:
+
+ Symmetric part
+ */
+ ac = 1.0-cs;
+ for(l=0;l<3;l++)
+ {
+ v[l] = ac*dir[l];
+ for(m=0;m<3;m++)
+ turmat[l][m] = v[l]*dir[m];
+ turmat[l][l] += cs;
+ v[l]=dir[l]*sn;
+ }
+
+ /* Asymmetric part */
+ turmat[0][1] -= v[2];
+ turmat[1][2] -= v[0];
+ turmat[2][0] -= v[1];
+ turmat[1][0] += v[2];
+ turmat[2][1] += v[0];
+ turmat[0][2] += v[1];
+
+ /* Update total rotation matrix */
+ for(l=0;l<3;l++)
+ {
+ for(m=0;m<3;m++)
+ {
+ c[l][m] = 0.0;
+ for(k=0;k<3;k++)
+ c[l][m] += turmat[l][k]*rot[k][m];
+ }
+ }
+
+ for(l=0;l<3;l++)
+ for(m=0;m<3;m++)
+ rot[l][m] = c[l][m];
+
+ /* Update umat tensor */
+ for(l=0;l<3;l++)
+ for(m=0;m<3;m++)
+ {
+ c[l][m] = 0.0;
+ for(k=0;k<3;k++)
+ c[l][m] += turmat[l][k]*umat[k][m];
+ }
+
+ for(l=0;l<3;l++)
+ for(m=0;m<3;m++)
+ umat[l][m] = c[l][m];
+
+ rtsum = umat[0][0] + umat[1][1] + umat[2][2];
+ delta = rtsum - rtsump;
+
+ /* If no improvement in this cycle then stop */
+ if(ABS(delta)<SMALL) break;
+
+ /* Next cycle */
+ }
+
+ rsum = rtsum;
+
+ /* Copy rotation matrix for output */
+ if(column)
+ {
+ for(i=0;i<3;i++)
+ for(j=0;j<3;j++)
+ rm[j][i] = rot[i][j];
+ }
+ else
+ {
+ for(i=0;i<3;i++)
+ for(j=0;j<3;j++)
+ rm[i][j] = rot[i][j];
+ }
+}
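matfit() above expects both coordinate sets to be centred on the origin already, and its two switch blocks accumulate the matrix U[i][k] = sum over j of w_j * x1_j[i] * x2_j[k]. The sketch below shows those two preparatory steps in compact form. It is not the upstream code: SketchCoor, centre_coords() and build_umat() are hypothetical names, REAL is assumed to be double, and the x, y, z members mirror what fit.c uses for COOR.

```c
#include <stddef.h>

/* Assumptions: REAL is double and coordinates carry x, y, z members;
   the real REAL and COOR definitions live in MathType.h.
*/
typedef double REAL;
typedef struct { REAL x, y, z; } SketchCoor;

/* Translate n coordinates so that their centroid is at the origin */
void centre_coords(SketchCoor *c, int n)
{
   REAL cx = 0.0, cy = 0.0, cz = 0.0;
   int  i;

   if(c == NULL || n <= 0) return;

   for(i=0; i<n; i++)
   {
      cx += c[i].x;
      cy += c[i].y;
      cz += c[i].z;
   }
   cx /= n;  cy /= n;  cz /= n;

   for(i=0; i<n; i++)
   {
      c[i].x -= cx;
      c[i].y -= cy;
      c[i].z -= cz;
   }
}

/* Accumulate umat[i][k] = sum_j w_j * x1_j[i] * x2_j[k].
   wt may be NULL, in which case every weight is 1.0.
*/
void build_umat(const SketchCoor *x1, const SketchCoor *x2,
                const REAL *wt, int n, REAL umat[3][3])
{
   int i, j, k;

   for(i=0; i<3; i++)
      for(k=0; k<3; k++)
         umat[i][k] = 0.0;

   for(j=0; j<n; j++)
   {
      REAL w = (wt != NULL) ? wt[j] : 1.0;
      REAL a[3], b[3];

      a[0] = x1[j].x;  a[1] = x1[j].y;  a[2] = x1[j].z;
      b[0] = x2[j].x;  b[1] = x2[j].y;  b[2] = x2[j].z;

      for(i=0; i<3; i++)
         for(k=0; k<3; k++)
            umat[i][k] += w * a[i] * b[k];
   }
}
```

Folding the weighted and unweighted cases into one loop trades the duplicated switch blocks for a per-point test of the weight pointer; which form is preferable depends on how much the inner loop cost matters.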
diff --git a/src/bioplib/fit.h b/src/bioplib/fit.h
new file mode 100644
index 0000000..695ea1e
--- /dev/null
+++ b/src/bioplib/fit.h
@@ -0,0 +1,58 @@
+/*************************************************************************
+
+ Program:
+ File: fit.h
+
+ Version: V1.1R
+ Date: 01.03.94
+ Function: Include file for least squares fitting
+
+ Copyright: (c) SciTech Software 1993-4
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 04.02.91 Original
+ V1.1 08.12.92 Removed qikfit() prototype as is static
+
+*************************************************************************/
+#ifndef _FIT_H
+#define _FIT_H
+
+#include "MathType.h"
+#include "SysDefs.h"
+
+/* Prototypes for functions defined in fit.c */
+BOOL matfit(COOR *x1, COOR *x2, REAL rm[3][3], int n, REAL *wt1,
+ BOOL column);
+
+#endif
+
diff --git a/src/bioplib/fsscanf.c b/src/bioplib/fsscanf.c
new file mode 100644
index 0000000..e8e21c9
--- /dev/null
+++ b/src/bioplib/fsscanf.c
@@ -0,0 +1,344 @@
+/*************************************************************************
+
+ Program:
+ File: fsscanf.c
+
+ Version: V1.3R
+ Date: 13.01.97
+ Function: Read from a string using FORTRAN-like rigid formatting
+
+ Copyright: (c) SciTech Software 1993-7
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+ Hard formatted version of sscanf(). Implements FORTRAN-like file
+ reading.
+
+ The only parsing characters recognised are:
+ %<n>f A single precision floating point number of width <n>
+ %<n>lf A double precision floating point number of width <n>
+ %<n>d An integer of width <n>
+ %<n>ld A long integer of width <n>
+ %<n>u An unsigned of width <n>
+ %<n>lu An unsigned long of width <n>
+ %<n>s A string of width <n>
+ %c A character (of width 1)
+ %<n>x <n> spaces (like FORTRAN).
+ With the exception of the %c parser, the column width, <n>,
+ *must* be specified.
+
+ Blank fields read as numbers are given a value of zero.
+
+ Returns: The number of arguments filled in (EOF if blank string or no
+ specifiers found in format string).
+
+**************************************************************************
+
+ Usage:
+ ======
+ For example:
+
+ double DoubVar;
+ int IntVar;
+ char CharVar,
+ StringVar[16];
+
+ fsscanf(buffer,"%8lf%5x%3d%c%3x%8s",
+ &DoubVar,&IntVar,&CharVar,StringVar);
+
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 17.06.93 Original By: ACRM
+ V1.1 12.07.93 Added %u and %lu. Corrected %s and %c to blank rather
+ than NULL strings if buffer runs out. Pads string if
+ buffer ran out in the middle. Takes \n in buffer as end
+ of string.
+ V1.2 24.11.95 `value' was a fixed 40 character buffer. Now changed to
+ allocate a suitable number of characters as required.
+ V1.3 13.01.97 Now does the EOF return at the end of the routine
+ rather than at the beginning so that all the variables
+ get set to blank or zero first.
+
+*************************************************************************/
+/* Includes
+*/
+#include <stdio.h>
+#include <string.h>
+#include <stdarg.h>
+#include <stdlib.h>
+#include <ctype.h>
+
+#include "SysDefs.h"
+#include "macros.h"
+#include "general.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+/************************************************************************/
+/*>int fsscanf(char *buffer, char *format, ...)
+ --------------------------------------------
+ Input: char *buffer Buffer from which to read information
+ char *format Format string (like scanf() et al., but see
+ restrictions below)
+ Output: ... Scanned output variables
+ Returns: int Number of values read (EOF if end of file or
+ no specifiers found in format string)
+
+ Hard formatted version of sscanf(). Implements FORTRAN-like rigid
+ column reading out of a string.
+
+ The only parsing characters recognised are:
+ %<n>f A single precision floating point number of width <n>
+ %<n>lf A double precision floating point number of width <n>
+ %<n>d An integer of width <n>
+ %<n>ld A long integer of width <n>
+ %<n>u An unsigned of width <n>
+ %<n>lu An unsigned long of width <n>
+ %<n>s A string of width <n>
+ %c A character (of width 1)
+ %<n>x <n> spaces (like FORTRAN).
+ With the exception of the %c parser, the column width, <n>,
+ *must* be specified.
+
+ Blank fields read as numbers are given a value of zero.
+
+
+ 17.06.93 Original By: ACRM
+ 12.07.93 Added %u and %lu. Corrected %s and %c to blank rather than
+ NULL strings if buffer runs out. Pads string if buffer ran
+ out in the middle. Takes \n in buffer as end of string.
+ 24.11.95 `value' was a fixed 40 character buffer. Now changed to
+ allocate a suitable number of characters as required.
+ 13.01.97 Previously if reading from a blank line the output variables
+ were unmodified since an EOF return was done immediately.
+ Now the immediate EOF return only happens if the input
+ buffer is a NULL variable and the EOF on blank string is
+ moved to the end such that all output variables are set to
+ zero or blank before the EOF return.
+*/
+int fsscanf(char *buffer, char *format, ...)
+{
+ va_list ap;
+ char *FormStart,
+ *BuffStart,
+ *stop,
+ form[16], /* Store a single formatting code */
+ *value = NULL, /* Store an item */
+ *ptr,
+ type;
+ int i,
+ MaxValLength = 40, /* Initial max value width */
+ *IntPtr,
+ NArg = 0,
+ width = 0;
+ BOOL LongType = FALSE;
+ double *DblPtr;
+ float *FloatPtr;
+ long *LongPtr;
+ unsigned *UPtr;
+ unsigned long *ULongPtr;
+
+ /* Return EOF immediately if the buffer pointer is NULL */
+ if(!buffer) return(EOF);
+
+ /* Allocate initial memory for storing a value */
+ if((value=(char *)malloc((1+MaxValLength)*sizeof(char)))==NULL)
+ return(0);
+
+ /* Start the variable argument processing */
+ va_start(ap, format);
+
+ /* Initialise FormStart to the start of the format string and BuffStart
+ to start of input buffer
+ */
+ FormStart = format;
+ BuffStart = buffer;
+
+ for(;;)
+ {
+ /* Flag for long variables */
+ LongType = FALSE;
+
+ /* Find the start of a % group from the format string */
+ while(*FormStart && *FormStart != '%') FormStart++;
+ if(!(*FormStart)) break; /* Exit routine */
+
+ /* Find the next occurrence of a % */
+ stop = FormStart+1;
+ while(*stop && *stop != '%') stop++;
+
+ /* Copy these format characters into our working buffer */
+ for(i=0; FormStart != stop; i++)
+ form[i] = *(FormStart++);
+ form[i] = '\0';
+
+ /* Find the type we're dealing with */
+ ptr = form + i;
+ while(*ptr == '\0' || *ptr == ' ' || *ptr == '\t') ptr--;
+ type = toupper(*ptr);
+
+ /* Set long flag if appropriate */
+ if((*(ptr-1) == 'l') || (*(ptr-1) == 'L'))
+ LongType = TRUE;
+
+ /* If it's not a character, read the width from the form string */
+ width = 0;
+ if(type == 'C')
+ {
+ width = 1;
+ }
+ else
+ {
+ for(ptr = form+1; *ptr && isdigit(*ptr); ptr++)
+ {
+ width *= 10;
+ width += (*ptr) - '0';
+ }
+ }
+
+ /* See if our buffer is wide enough for this item. If not, make
+ more space
+ */
+ if(width > MaxValLength)
+ {
+ if((value = (char *)realloc(value, (width+1) * sizeof(char)))
+ ==NULL)
+ {
+ /* Unable to do allocation */
+ va_end(ap);
+ return(0);
+ }
+ MaxValLength = width;
+ }
+
+
+ /* Extract width characters from the input buffer. If the input
+ buffer has run out, value will be an empty string.
+ */
+ stop = BuffStart + width;
+ for(i=0; *BuffStart && *BuffStart != '\n' && BuffStart != stop; i++)
+ value[i] = *(BuffStart++);
+ value[i] = '\0';
+
+ /* Act on each type */
+ switch(type)
+ {
+ case 'F': /* A double precision or float */
+ if(LongType)
+ {
+ DblPtr = va_arg(ap, double *);
+ if(sscanf(value,"%lf", DblPtr) == (-1))
+ *DblPtr = (double)0.0;
+ }
+ else
+ {
+ FloatPtr = va_arg(ap, float *);
+ if(sscanf(value,"%f", FloatPtr) == (-1))
+ *FloatPtr = (float)0.0;
+ }
+
+ break;
+ case 'D': /* An integer or long int */
+ if(LongType)
+ {
+ LongPtr = va_arg(ap, long *);
+ if(sscanf(value,"%ld", LongPtr) == (-1))
+ *LongPtr = 0L;
+ }
+ else
+ {
+ IntPtr = va_arg(ap, int *);
+ if(sscanf(value,"%d", IntPtr) == (-1))
+ *IntPtr = 0;
+ }
+ break;
+ case 'U': /* An unsigned or unsigned long */
+ if(LongType)
+ {
+ ULongPtr = va_arg(ap, unsigned long *);
+ if(sscanf(value,"%lu", ULongPtr) == (-1))
+ *ULongPtr = 0L;
+ }
+ else
+ {
+ UPtr = va_arg(ap, unsigned *);
+ if(sscanf(value,"%u", UPtr) == (-1))
+ *UPtr = 0;
+ }
+ break;
+ case 'S': /* A string */
+ ptr = va_arg(ap, char *);
+ if(value[0]) /* Input buffer not empty */
+ {
+ *(value + width) = '\0';
+ strncpy(ptr, value, width+1);
+
+ /* If the input buffer ran out in this string, pad with
+ spaces and terminate.
+ */
+ if((int)strlen(ptr) < width) padterm(ptr, width);
+ }
+ else /* Input buffer empty */
+ {
+ for(i=0; i<width; i++)
+ *(ptr + i) = ' ';
+ *(ptr + width) = '\0';
+ }
+ break;
+ case 'C': /* A character (insert a space if buffer empty) */
+ *(va_arg(ap, char *)) = (value[0] ? value[0]: ' ');
+ break;
+ case 'X': /* A column to skip */
+ /* Fall through to default action */
+ default:
+ /* Do nothing */
+ ;
+ }
+
+ /* If not a blank column, increment arg count */
+ if(type != 'X') NArg++;
+
+ }
+
+ /* End variable argument parsing */
+ va_end(ap);
+
+ /* Free the allocated buffer */
+ free(value);
+
+ /* Return number of values read or EOF if it was a blank input */
+ if(buffer[0] == '\0' || buffer[0] == '\n') return(EOF);
+ return(NArg);
+}
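
The fixed-column idea documented for fsscanf() above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the bioplib implementation: read_field() and field_to_int() are hypothetical helpers introduced here, and only the integer case is shown. A blank field converts to zero, as the fsscanf() comments describe.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of bioplib): copy up to `width`
   characters starting at column `start` (0-based, non-negative) of
   `line` into `out`, stopping early at a newline or at the end of the
   string. `out` must have room for width+1 characters.
   Returns the number of characters copied.
*/
static int read_field(const char *line, int start, int width, char *out)
{
   int    i   = 0;
   size_t len = strlen(line);

   if((size_t)start < len)
   {
      const char *p = line + start;
      while(i < width && p[i] != '\0' && p[i] != '\n')
      {
         out[i] = p[i];
         i++;
      }
   }
   out[i] = '\0';
   return(i);
}

/* Hypothetical helper: convert a field to an int, giving zero for a
   blank or unparsable field.
*/
static int field_to_int(const char *field)
{
   int value = 0;

   if(sscanf(field, "%d", &value) != 1)
      value = 0;
   return(value);
}
```

Calling read_field(line, 4, 6, buf) followed by field_to_int(buf) reads a six-column integer starting at the fifth column, which is roughly what a format of "%4x%6d" asks fsscanf() to do.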
diff --git a/src/bioplib/fsscanf.h b/src/bioplib/fsscanf.h
new file mode 100644
index 0000000..ee8ae8f
--- /dev/null
+++ b/src/bioplib/fsscanf.h
@@ -0,0 +1,67 @@
+/*************************************************************************
+
+ Program:
+ File: fsscanf.h
+
+ Version: V1.1R
+ Date: 01.03.94
+ Function: Include file for fsscanf()
+
+ Copyright: (c) SciTech Software 1994
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+
+*************************************************************************/
+/* Includes
+*/
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+/************************************************************************/
+
+#ifndef _FSSCANF_H
+#define _FSSCANF_H
+
+int fsscanf(char *buffer, char *format, ...);
+
+#endif
diff --git a/src/bioplib/general.h b/src/bioplib/general.h
new file mode 100644
index 0000000..49f579a
--- /dev/null
+++ b/src/bioplib/general.h
@@ -0,0 +1,111 @@
+/*************************************************************************
+
+ Program:
+ File: general.h
+
+ Version: V1.12R
+ Date: 30.05.02
+ Function: Header file for general purpose routines
+
+ Copyright: (c) SciTech Software 1994-2002
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: amartin at stagleys.demon.co.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 11.05.94 Original By: ACRM
+ V1.1 24.08.94 Added OpenStdFiles()
+ V1.2 22.09.94 Added OpenFile()
+ V1.3 17.07.95 Added countchar()
+ V1.4 11.09.95 Added fgetsany()
+ V1.5 18.10.95 Moved YorN() to WindIO.h
+ V1.6 06.11.95 Added StoreString(), InStringList() and FreeStringList()
+ V1.7 15.12.95 Added QueryStrStr()
+ V1.8 09.07.96 Added IndexReal()
+ V1.9 18.09.96 Added padchar()
+ V1.11 13.06.00 Added strcatalloc()
+ V1.12 30.05.02 Added WrapString(), WrapPrint(), RightJustify(),
+ GetWordNC() and getfield()
+
+*************************************************************************/
+#ifndef _GENERAL_H
+#define _GENERAL_H
+
+#include <stdio.h>
+#include "SysDefs.h"
+#include "MathType.h"
+
+typedef struct _stringlist
+{
+ struct _stringlist *next;
+ char *string;
+} STRINGLIST;
+
+void StringToLower(char *string1, char *string2);
+void StringToUpper(char *string1, char *string2);
+char *KillLeadSpaces(char *string);
+void KillLine(FILE *fp);
+void SetExtn(char *File, char *Ext);
+int chindex(char *string, char ch);
+void Word(char *string1, char *string2);
+void WordN(char *string1, char *string2, int MaxChar);
+void padterm(char *string, int length);
+void padchar(char *string, int length, char ch);
+BOOL CheckExtn(char *string, char *ext);
+char *ftostr(char *str, int maxlen, REAL x, int precision);
+
+void GetFilestem(char *filename, char *stem);
+int upstrcmp(char *word1, char *word2);
+int upstrncmp(char *word1, char *word2, int ncomp);
+char *GetWord(char *buffer, char *word, int maxsize);
+BOOL OpenStdFiles(char *infile, char *outfile, FILE **in, FILE **out);
+FILE *OpenFile(char *filename, char *envvar, char *mode, BOOL *noenv);
+int countchar(char *string, char ch);
+char *fgetsany(FILE *fp);
+char *strcatalloc(char *instr, char *catstr);
+
+STRINGLIST *StoreString(STRINGLIST *StringList, char *string);
+BOOL InStringList(STRINGLIST *StringList, char *string);
+void FreeStringList(STRINGLIST *StringList);
+
+char *QueryStrStr(char *string, char *substring);
+
+void IndexReal(REAL *arrin, int *indx, int n);
+
+FILE *OpenOrPipe(char *filename);
+int CloseOrPipe(FILE *fp);
+
+BOOL WrapString(char *in, char *out, int maxlen);
+BOOL WrapPrint(FILE *out, char *string);
+void RightJustify(char *string);
+char *GetWordNC(char *buffer, char *word, int maxlen);
+void getfield(char *buffer, int start, int width, char *str);
+
+#endif
diff --git a/src/bioplib/help.c b/src/bioplib/help.c
new file mode 100644
index 0000000..f77a525
--- /dev/null
+++ b/src/bioplib/help.c
@@ -0,0 +1,323 @@
+/*************************************************************************
+
+ Program:
+ File: help.c
+
+ Version: V1.3R
+ Date: 18.01.95
+ Function: Provides a simple file-based command line help utility
+
+ Copyright: (c) SciTech Software 1992-5
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+ A pair of routines for handling a help facility. All screen/keyboard
+ I/O is via screen() and GetKybdString() routines.
+
+**************************************************************************
+
+ Usage:
+ ======
+ DoHelp(string,helpfilename) This is the normal entry point and should
+ be supplied with the complete command
+ including the word `help'. If this is the
+ only word given, the routine will prompt
+ with Help> and give help on each word
+ typed until return is hit to exit help.
+ If help followed by a keyword is given,
+ help only on that topic will be supplied.
+
+ Help(string,helpfilename) Generates help from helpfilename on the
+ topic named by string. If this is `help'
+ or `?', available topics will be listed.
+
+ Help(NULL,"CLOSE") Used to close the help file
+
+ Under Unix, the environment variable, HELPDIR should be set to
+ specify the directory in which help files are stored.
+ Under operating systems such as VAX/VMS and AmigaDOS which support
+ the assign command, the HELPDIR: assign should be set up for the help
+ directory.
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 29.09.92 Original
+ V1.1 12.08.93 Returns correctly if help file not found
+ V1.2 04.01.94 Changed DoHelp() to fix problem with compilers which
+ don't let you write to strings defined in double
+ inverted commas and never assigned to a variable.
+ Added getenv call for Unix support
+ V1.3 18.01.95 Help() changed to call OpenFile() rather than handling
+ alternative directory internally. Consequently assign
+ or envvar is called HELPDIR.
+
+*************************************************************************/
+/* Includes
+*/
+#include <stdio.h>
+#include <string.h>
+#include <ctype.h>
+#include <stdlib.h>
+
+#include "macros.h"
+#include "WindIO.h"
+#include "parse.h"
+#include "help.h"
+#include "general.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+#define BUFFLEN 240 /* Buffer for reading from file */
+#define HELPENV "HELPDIR" /* The help directory (variable) for unix or
+ assign name for VMS/AmigaDOS */
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+/************************************************************************/
+/*>void Help(char *string, char *HelpFile)
+ ---------------------------------------
+ Input: char *string Topic on which to provide help. If "help" or
+ "?" then list all topics
+ char *HelpFile Name of help file, or "CLOSE" to close the
+ help file
+
+ Generates help from a help file on the topic named by string. If
+ this is `help' or `?', available topics will be listed.
+
+ Only one help file may be open at a time. Once a file is open, no check
+ is made on the HelpFile string to check that you still require the same
+ help file. You must first close the file before changing to a new file.
+
+ Calling:
+ Help(NULL,"CLOSE")
+ will close the help file.
+
+ The directory specified by the environment variable or assign
+ specified by the #define HELPENV (normally HELPDIR) will be searched
+ for the help file if not found in the local directory.
+
+ 25.09.92 Original
+ 28.09.92 Added HELP
+ 29.09.92 Changed to file-based version
+ 07.10.92 Added paging support
+ 12.08.93 Correctly returns if help file not found
+ 05.01.94 Handles getenv() call for Unix
+ 11.03.94 Resets FirstCall to TRUE when file is closed
+ 18.01.95 Calls OpenFile() rather than handling alternative directory
+ internally. Consequently assign or envvar is called HELPDIR.
+*/
+void Help(char *string,
+ char *HelpFile)
+{
+ int nletters,
+ buffpos,
+ i,
+ Found;
+ static int FirstCall = TRUE;
+ char FileBuff[BUFFLEN],
+ buffer[80],
+ *ptr;
+ static FILE *fp = NULL;
+
+ if(!strcmp(HelpFile,"CLOSE"))
+ {
+ if(fp)
+ fclose(fp);
+ FirstCall = TRUE;
+ return;
+ }
+
+ /* If first call, open file */
+ if(FirstCall)
+ {
+ BOOL NoEnv;
+
+ if((fp=OpenFile(HelpFile, HELPENV, "r", &NoEnv))==NULL)
+ {
+ screen(" Error==> Unable to open help file.\n");
+ sprintf(FileBuff," The %s environment variable \
+or assign has not been set.\n",HELPENV);
+ screen(FileBuff);
+ return;
+ }
+
+ FirstCall = FALSE;
+ }
+
+ /* Rewind the file */
+ rewind(fp);
+
+ /* If asking from general help, display known commands */
+ if(match(string,"HELP",&nletters) || string[0] == '?')
+ {
+ /* Search the file for keywords, echoing them to the screen */
+ buffpos = 0;
+
+ while(fgets(FileBuff, BUFFLEN, fp))
+ {
+ TERMINATE(FileBuff);
+ if(FileBuff[0] == '#')
+ {
+ if(buffpos + strlen(FileBuff) > 58)
+ {
+ buffer[buffpos] = '\0';
+
+ screen(" ");
+ screen(buffer);
+ screen("\n");
+ buffpos = 0;
+ }
+ for(i=1; i<strlen(FileBuff); i++)
+ {
+ buffer[buffpos++] = FileBuff[i];
+ }
+ buffer[buffpos++] = ' ';
+ }
+ }
+
+ /* Print the last line */
+ if(buffpos != 0)
+ {
+ buffer[buffpos] = '\0';
+
+ screen(" ");
+ screen(buffer);
+ screen("\n");
+ }
+ }
+ else /* Asking for help on a specific subject */
+ {
+ Found = FALSE;
+ PagingOn();
+
+ while(fgets(FileBuff, BUFFLEN, fp))
+ {
+ TERMINATE(FileBuff);
+ if(FileBuff[0] == '#')
+ {
+ ptr = FileBuff+1;
+ UPPER(ptr);
+ if(match(string,ptr,&nletters))
+ {
+ Found = TRUE;
+ while(fgets(FileBuff, BUFFLEN, fp))
+ {
+ TERMINATE(FileBuff);
+ if(FileBuff[0] == '#') break;
+
+ screen(FileBuff);
+ screen("\n");
+ }
+ }
+ }
+ }
+ if(!Found)
+ {
+ screen(" Sorry, no help on '");
+ screen(string);
+ screen("'\n");
+ }
+ PagingOff();
+ }
+}
+
+/************************************************************************/
+/*>void DoHelp(char *string, char *HelpFile)
+ -----------------------------------------
+ Input: char *string String on which to give help, must include
+ the word "help". If given on its own, sits
+ in a loop prompting for help.
+ char *HelpFile The help file to search
+
+ Handles help facility.
+ This is the normal entry point and should be supplied with the
+ complete command including the word `help'.
+
+ e.g. DoHelp("help","foo.hlp");
+ or DoHelp("help bar","foo.hlp");
+
+ If help is the only word given, the routine will prompt with Help>
+ and give help on each word typed until return is hit to exit help. If
+ help followed by a keyword is given, help only on that topic will be
+ supplied.
+
+ 25.09.92 Original
+ 28.09.92 Changed to call Help("Help")
+ 02.10.92 Added GetKybdString() rather than gets()
+ 04.01.94 Changed to fix problem with compilers which
+ don't let you write to strings defined in double
+ inverted commas and never assigned to a variable
+*/
+void DoHelp(char *string,
+ char *HelpFile)
+{
+ int i;
+ char *str,
+ buffer[160];
+
+ /* Trim any trailing spaces */
+ strcpy(buffer,string); /* 04.01.94: Put string in buffer */
+ for(i=strlen(buffer)-1; i>=0 && (buffer[i]==' '||buffer[i]=='\t'); i--);
+ buffer[++i]='\0';
+
+ if((str=strchr(buffer,' '))!=NULL) /* See if a space is in the string*/
+ {
+ /* Yes, set pointer to the position after the space
+ (i.e. the keyword)
+ */
+ str++;
+ }
+ else
+ {
+ /* No keyword was specified, so give help on help */
+ Help("Help",HelpFile);
+ }
+
+
+ if(str) /* If specified, just give help on the keyword */
+ {
+ Help(str,HelpFile);
+ }
+ else /* Sit in a loop handling each keyword */
+ {
+ for(;;)
+ {
+ prompt("Help");
+ GetKybdString(buffer, 160);
+
+ TERMINATE(buffer);
+ if(buffer[0])
+ Help(buffer,HelpFile);
+ else
+ break;
+ }
+ }
+}
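
As read from Help() above, a help file is a sequence of lines in which a line beginning with '#' names a topic and the lines that follow, up to the next '#' line, form its text. The sketch below shows that lookup on an in-memory array of lines. It is an illustration only: same_word() and find_topic() are hypothetical names, exact case-insensitive comparison stands in for bioplib's match(), and only the first matching topic is reported.

```c
#include <ctype.h>

/* Hypothetical helper: case-insensitive string equality. The real
   Help() uses bioplib's match(); exact comparison is a simplification.
*/
static int same_word(const char *a, const char *b)
{
   while(*a && *b)
   {
      if(toupper((unsigned char)*a) != toupper((unsigned char)*b))
         return(0);
      a++;
      b++;
   }
   return(*a == '\0' && *b == '\0');
}

/* Hypothetical helper: scan `nlines` lines of help text for a line of
   the form "#topic" and count the body lines that follow it, up to the
   next line starting with '#'. The index of the first body line is
   stored in *first. Returns the number of body lines, or -1 if the
   topic is not present.
*/
static int find_topic(const char **lines, int nlines, const char *topic,
                      int *first)
{
   int i, n;

   for(i=0; i<nlines; i++)
   {
      if(lines[i][0] == '#' && same_word(lines[i]+1, topic))
      {
         *first = i+1;
         for(n=0; i+1+n < nlines && lines[i+1+n][0] != '#'; n++) ;
         return(n);
      }
   }
   return(-1);
}
```

A caller would print lines[first] through lines[first+n-1] for a positive return value and report that no help is available when the result is -1.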
diff --git a/src/bioplib/help.h b/src/bioplib/help.h
new file mode 100644
index 0000000..222f96d
--- /dev/null
+++ b/src/bioplib/help.h
@@ -0,0 +1,51 @@
+/*************************************************************************
+
+ Program:
+ File: help.h
+
+ Version: V1.0R
+ Date: 01.03.94
+ Function: Include file for help functions
+
+ Copyright: (c) SciTech Software 1993-4
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+
+*************************************************************************/
+#ifndef _HELP_H
+#define _HELP_H
+
+void Help(char *string, char *HelpFile);
+void DoHelp(char *string, char *HelpFile);
+
+#endif
diff --git a/src/bioplib/macros.h b/src/bioplib/macros.h
new file mode 100644
index 0000000..c4bd538
--- /dev/null
+++ b/src/bioplib/macros.h
@@ -0,0 +1,483 @@
+/*************************************************************************
+
+ Program:
+ File: Macros.h
+
+ Version: V2.17
+ Date: 10.04.08
+ Function: Useful macros
+
+ Copyright: SciTech Software 1991-2008
+ Author: Andrew C. R. Martin
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+ If not Amiga defines abs().
+ Defines max(), min() and PI if not done.
+ Defines list handling macros.
+ Defines newline() and toggle() macros.
+
+**************************************************************************
+
+ Usage:
+ ======
+ INIT(x,y) Initialise list of name x and type y.
+ Set x->next to NULL
+ INITPREV(x,y) Ditto, but also sets x->prev to NULL
+ NEXT(x) Step on in linked list
+ PREV(x) Step back in linked list
+ ALLOCNEXT(x,y) Allocate next item in list and step on
+ ALLOCNEXTPREV(x,y) Allocate next item in list and step on.
+ Also set ->prev item in next item
+ LAST(x) Move to end of list
+ FREELIST(y,z) Free list y of type z
+ NEWLINE Print a newline character to stdout
+ TOGGLE(x) Toggle a flag
+ RANGECHECK(x,y,z) Return x constrained to range y to z
+ TERMINATE(x) Terminate a string at the first \n
+ MAX(x,y) max() as macro
+ MIN(x,y) min() as macro
+ ABS(x) abs() as macro
+ UPPER(x) Converts a string to upper case
+ KILLLEADSPACES(x,y) Makes x a pointer into string y after any spaces
+ or tabs.
+ D(BUG) Prints the BUG string if DEBUG is defined first
+ DELETE(lst,itm,type) Deletes (itm) from linked list (lst) of type
+ (type)
+ TESTINARRAY(x,l,y,r) Tests whether value (y) is in array (x) of length
+ (l) returning the result in (r)
+ FINDINARRAY(x,l,y,r) Finds value (y) in array (x) of length
+ (l) returning the offset in (r). Offset is -1 if
+ not found
+ SET(x,y) Sets bit y (a hex value) in variable x
+ UNSET(x,y) Clears bit y (a hex value) in variable x
+ ISSET(x,y) Tests bit y (a hex value) in variable x
+ TERMAT(x,y) Terminates character string x at first character y
+ KILLTRAILSPACES(x) Terminate string to remove any trailing white
+ space
+ PROMPT(fp,x) Issue a prompt to stdout if fp is a terminal
+ PADMINTERM(str,len) Pads a string to len chars only if it is shorter
+ DELETEDOUBLE(lst,itm,type) Deletes (itm) from a doubly linked list
+ (lst) of type (type)
+ PADCHARMINTERM(str,char,len) Pads a string to len chars with specified
+ character only if it is shorter
+ DOTIFY(str) Replace ' ' with '.' in string
+ DEDOTIFY(str) Replace '.' with ' ' in string
+ FINDPREV(p, start, q) Set p to item in linked list start before q
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 06.02.91 Original
+ V1.1 15.02.91 Moved PI definition to non-Amiga's only
+ V1.2 21.03.91 Added RANGECHECK
+ V1.3 06.09.91 Added DIST, DISTSQ and Vec3f
+ V1.4 09.09.91 Fixed multi-command macros with {}
+ V1.5 24.01.92 Fixed for 32 bit addresses and added malloc checks.
+ V1.6 03.04.92 Small change to ALLOCNEXT and ALLOCNEXTPREV, so
+ will do a NEXT() even if malloc() fails.
+ V1.7 06.05.92 Added TERMINATE()
+ V1.8 06.07.92 Added MAX(), MIN() and ABS()
+ V1.9 22.07.92 Fixed ABS()
+ V1.10 28.09.92 Added TRUE & FALSE and UPPER()
+ V1.11 03.11.92 Changed TOGGLE and newline is now upper case.
+ V1.12 16.11.92 Added KILLLEADSPACES()
+ V1.13 18.11.92 Fixed UPPER() for MicrosoftC which returns strlen()
+ as unsigned
+ V1.14 20.11.92 ABS() now uses 0 rather than 0.0, so we don't
+ try to use floats with ints...
+ V2.0 24.11.92 Removed all small letter macros
+ V2.1 12.07.93 Added double include check, moved math definitions
+ to MathType.h and added LOWER()
+ V2.2 07.10.93 UPPER() and LOWER() check case first for ESV
+ compatibility
+ V2.3 23.05.94 Added D(BUG)
+ V2.4 14.07.94 Added do{}while(0) bracketing of all multi-line macros
+ V2.5 21.11.94 ABS, MAX and MIN check that they're not already defined
+ V2.6 16.02.95 Added DELETE()
+ V2.7 21.02.95 Updated some internal variable names
+ V2.8 02.08.95 Added TESTINARRAY(), FINDINARRAY(),
+ SET(), UNSET() and ISSET()
+ V2.9 20.11.95 Added TERMAT()
+ V2.10 06.02.96 Added KILLTRAILSPACES()
+ V2.11 14.06.96 Added PROMPT()
+ V2.12 23.07.96 Added PADMINTERM()
+ V2.13 19.09.96 Include ctype for UPPER() etc
+ V2.14 13.03.99 Added DELETEDOUBLE()
+ V2.15 01.03.01 Added DOTIFY() DEDOTIFY() PADCHARMINTERM() SUBSCHAR()
+ V2.16 25.01.06 Added FINDPREV()
+ V2.17 10.04.08 Fixed bug in DELETE() - the break was not properly
+ stopping prev from being changed
+
+*************************************************************************/
+#ifndef _MACROS_H
+#define _MACROS_H
+
+/***************************** Includes *********************************/
+#include <ctype.h>
+
+/**************************** Definitions *******************************/
+#ifndef PI
+#define PI (4.0 * atan(1.0))
+#endif
+
+#ifndef TRUE
+#define TRUE 1
+#endif
+
+#ifndef FALSE
+#define FALSE 0
+#endif
+
+/***************************** Maths macros *****************************/
+#define RANGECHECK(x,y,z) (((x)<(y)) ? (y) : ((x)>(z)) ? (z) : (x))
+#define DISTSQ(a,b) (((a)->x - (b)->x) * ((a)->x - (b)->x) + \
+ ((a)->y - (b)->y) * ((a)->y - (b)->y) + \
+ ((a)->z - (b)->z) * ((a)->z - (b)->z))
+#define DIST(a,b) sqrt(((a)->x - (b)->x) * ((a)->x - (b)->x) + \
+ ((a)->y - (b)->y) * ((a)->y - (b)->y) + \
+ ((a)->z - (b)->z) * ((a)->z - (b)->z))
+#ifndef ABS
+#define ABS(x) (((x)<0) ? (-(x)) : (x))
+#endif
+
+#ifndef MAX
+#define MAX(a,b) (((a)>(b)) ? (a) : (b))
+#define MIN(a,b) (((a)<(b)) ? (a) : (b))
+#endif
+
+/***************************** List macros ******************************/
+#define INIT(x,y) do { x=(y *)malloc(sizeof(y)); \
+ if(x != NULL) x->next = NULL; } while(0)
+#define INITPREV(x,y) do { x=(y *)malloc(sizeof(y));\
+ if(x != NULL) {x->next=NULL; x->prev=NULL;} } \
+ while(0)
+#define NEXT(x) (x)=(x)->next
+#define PREV(x) (x)=(x)->prev
+#define ALLOCNEXT(x,y) do { (x)->next=(y *)malloc(sizeof(y));\
+ if((x)->next != NULL) { (x)->next->next=NULL; }\
+ NEXT(x); } while(0)
+#define ALLOCNEXTPREV(x,y) do { (x)->next=(y *)malloc(sizeof(y));\
+ if((x)->next != NULL)\
+ { (x)->next->prev = (x); \
+ (x)->next->next=NULL; }\
+ NEXT(x);} while(0)
+#define LAST(x) while((x)->next != NULL) NEXT(x)
+/* FREELIST takes 2 parameters:
+ y: name of list
+ z: type of list
+*/
+#define FREELIST(y,z) while((y)!=NULL) \
+ { z *_freelist_macro_q; \
+ _freelist_macro_q = (y)->next; \
+ free((char *)(y)); \
+ (y) = _freelist_macro_q; \
+ }
+
+/*>DELETE(start, item, type)
+ -------------------------
+ Deletes (item) from a linked list.
+ (start) will be modified if (item) is the first in the list.
+ (item) is returned as the pointer to the next item in the list (i.e.
+ as item->next). One can therefore simply call the routine N times
+ to delete N items. If (start) or (item) is NULL, does nothing
+
+ 16.02.95 Original By: ACRM
+ 10.04.08 Fixed position of break. By: CTP
+*/
+#define DELETE(x, y, z) \
+do { \
+ z *_delete_macro_p, \
+ *_delete_macro_prev = NULL, \
+ *_delete_macro_temp, \
+ *_delete_macro_temp2; \
+ if((x)!=NULL && (y)!=NULL) \
+ { \
+ for(_delete_macro_p=(x); \
+ _delete_macro_p!=NULL; \
+ NEXT(_delete_macro_p)) \
+ { \
+ if(_delete_macro_p == (y)) \
+ { \
+ _delete_macro_temp2 = (y)->next; \
+ if(_delete_macro_prev == NULL) \
+ { \
+ _delete_macro_temp = (x)->next; \
+ free(x); \
+ (x) = _delete_macro_temp; \
+ } \
+ else \
+ { \
+ _delete_macro_prev->next = _delete_macro_p->next; \
+ free(_delete_macro_p); \
+ } \
+ break; \
+ } \
+ _delete_macro_prev = _delete_macro_p; \
+ } \
+ (y) = _delete_macro_temp2; \
+ } \
+} while(FALSE)
+
+
+/*>DELETEDOUBLE(start, item, type)
+ -------------------------------
+ Deletes (item) from a doubly linked list.
+ (start) will be modified if (item) is the first in the list.
+ (item) is returned as the pointer to the next item in the list (i.e.
+ as item->next). One can therefore simply call the routine N times
+ to delete N items. If (start) or (item) is NULL, does nothing
+
+ 13.03.99 Original By: ACRM
+*/
+#define DELETEDOUBLE(s, x, y) \
+ do { y *_deleteandnext_macro_temp; \
+ if(((s)!=NULL) && ((x)!=NULL)) \
+ { if((x)==(s)) (s) = (x)->next; \
+ _deleteandnext_macro_temp = (x)->next; \
+ if((x)->prev != NULL) (x)->prev->next = (x)->next; \
+ if((x)->next != NULL) (x)->next->prev = (x)->prev; \
+ free(x); \
+ (x) = _deleteandnext_macro_temp; \
+ } } while(0)
+
+/*>FINDPREV(ptr, start, item)
+ --------------------------
+ Searches a linked list beginning at (start) to find the item which
+ precedes (item). Its address is put into (ptr). If (item) is the
+ same as (start) or (item) is not found, then the routine returns
+ NULL in (ptr)
+ This is used when wanting to look at the previous item in a singly
+ linked list.
+
+ 26.01.06 Original By: ACRM
+*/
+#define FINDPREV(p, s, l) \
+ do { p = (s); \
+ if((s)==(l)) \
+ { p = NULL; } else \
+ { \
+ while((p != NULL) && (p->next != (l))) \
+ { p = p->next; \
+ } } } while(0)
+
+
+/***************************** Misc. macros *****************************/
+#define NEWLINE printf("\n")
+
+#define TOGGLE(x) (x) = (x) ? FALSE : TRUE
+
+#define TERMINATE(x) do { int _terminate_macro_j; \
+ for(_terminate_macro_j=0; \
+ (x)[_terminate_macro_j]; \
+ _terminate_macro_j++) \
+ { if((x)[_terminate_macro_j] == '\n') \
+ { (x)[_terminate_macro_j] = '\0'; \
+ break; \
+ } } } while(0)
+#define TERMAT(x, y) do { int _termat_macro_j; \
+ for(_termat_macro_j=0; \
+ (x)[_termat_macro_j]; \
+ _termat_macro_j++) \
+ { if((x)[_termat_macro_j] == (y)) \
+ { (x)[_termat_macro_j] = '\0'; \
+ break; \
+ } } } while(0)
+#define UPPER(x) do { int _upper_macro_i; \
+ for(_upper_macro_i=0; \
+ _upper_macro_i<(int)strlen(x) && \
+ (x)[_upper_macro_i]; \
+ _upper_macro_i++) \
+ if(islower((x)[_upper_macro_i])) \
+ (x)[_upper_macro_i] = \
+ (char)toupper((x)[_upper_macro_i]); \
+ } while(0)
+#define LOWER(x) do { int _lower_macro_i; \
+ for(_lower_macro_i=0; \
+ _lower_macro_i<(int)strlen(x) && \
+ (x)[_lower_macro_i]; \
+ _lower_macro_i++) \
+ if(isupper((x)[_lower_macro_i])) \
+ (x)[_lower_macro_i] = \
+ (char)tolower((x)[_lower_macro_i]); \
+ } while(0)
+#define KILLLEADSPACES(y,x) \
+ do \
+ { for((y)=(x); *(y) == ' ' || *(y) == '\t'; (y)++) ; } \
+ while(0)
+
+
+#define KILLTRAILSPACES(x) \
+do { int _kts_macro_i; \
+ _kts_macro_i = strlen(x) - 1; \
+ while(_kts_macro_i>=0 && \
+ ((x)[_kts_macro_i] == ' ' || \
+ (x)[_kts_macro_i] == '\t')) \
+ (_kts_macro_i)--; \
+ (x)[++(_kts_macro_i)] = '\0'; \
+ } while(0)
+
+
+/* Tests for the presence of (y) in array (x) of length (l). The result
+ (TRUE or FALSE) is returned in (r)
+ 02.08.95 Original
+*/
+#define TESTINARRAY(x, l, y, r) \
+do { \
+ int _inarray_macro_i; \
+ (r) = FALSE; \
+ if((x)==NULL) break; \
+ for(_inarray_macro_i=0; _inarray_macro_i<(l); _inarray_macro_i++) \
+ { if((x)[_inarray_macro_i] == (y)) \
+ { (r) = TRUE; \
+ break; \
+} } } while(FALSE)
+
+/* Finds offset of item (y) in array (x) of length (l). The result
+ is returned in (r) which is -1 if item not found
+ 02.08.95 Original
+*/
+#define FINDINARRAY(x, l, y, r) \
+do { \
+ int _inarray_macro_i; \
+ (r) = (-1); \
+ if((x)==NULL) break; \
+ for(_inarray_macro_i=0; _inarray_macro_i<(l); _inarray_macro_i++) \
+ { if((x)[_inarray_macro_i] == (y)) \
+ { (r) = _inarray_macro_i; \
+ break; \
+} } } while(FALSE)
+
+
+/* Used just like padterm, but doesn't touch the string if it's already
+ longer than len characters
+*/
+#define PADMINTERM(string, len) \
+ do { \
+ if(strlen((string)) < (len)) padterm((string), (len)); \
+ } while(0)
+
+/************************************************************************/
+/*>PADCHARMINTERM(string, char, length)
+ ------------------------------------
+ Pads a string to a specified length using char and terminates at that
+ point
+
+ 13.03.99 Original By: ACRM
+*/
+#define PADCHARMINTERM(s, c, l) \
+do { int _padminterm_macro_i; \
+ if(strlen((s)) < (l)) \
+ { for(_padminterm_macro_i=strlen((s)); \
+ _padminterm_macro_i<(l); \
+ _padminterm_macro_i++) \
+ (s)[_padminterm_macro_i] = (c); \
+ (s)[(l)] = '\0'; \
+ } } while(0)
+
+
+/************************************************************************/
+/*>DOTIFY(char *str)
+ -----------------
+ Macro to replace ' ' in a string with '.'
+
+ 21.04.99 Original By: ACRM
+*/
+#define DOTIFY(str) \
+do { \
+ char *_dotify_macro_chp; \
+ _dotify_macro_chp = str; \
+ while(*_dotify_macro_chp) { \
+ if(*_dotify_macro_chp==' ') *_dotify_macro_chp = '.'; \
+ _dotify_macro_chp++; \
+} } while(0)
+
+/************************************************************************/
+/*>DEDOTIFY(char *str)
+ -------------------
+ Macro to replace '.' in a string with ' '
+
+ 21.04.99 Original By: ACRM
+*/
+#define DEDOTIFY(str) \
+do { \
+ char *_dedotify_macro_chp; \
+ _dedotify_macro_chp = str; \
+ while(*_dedotify_macro_chp) { \
+ if(*_dedotify_macro_chp=='.') *_dedotify_macro_chp = ' '; \
+ _dedotify_macro_chp++; \
+} } while(0)
+
+
+/************************************************************************/
+/*>SUBSCHAR(s, x, y)
+ -----------------
+ Substitute character x by character y in string s
+
+ 21.05.99 Original
+*/
+#define SUBSCHAR(s, x, y) \
+do { char *_subschar_macro_ch = (s); \
+ while(*_subschar_macro_ch != '\0') \
+ { if(*_subschar_macro_ch == (x)) *_subschar_macro_ch = (y); \
+ _subschar_macro_ch++; \
+ } } while(0)
+
+
+
+
+
+/* Bit-wise operators
+ 02.08.95 Original
+*/
+#define SET(x, y) (x) |= (y)
+#define UNSET(x, y) (x) &= (~(y))
+#define ISSET(x, y) ((BOOL)((x)&(y)))
+
+
+
+#ifdef DEBUG
+#define D(BUG) do { fprintf(stderr,"%s",BUG); fflush(stderr); } while(0)
+#else
+#define D(BUG)
+#endif
+
+
+/************************** The PROMPT macro ****************************/
+/* isatty() is not POSIX */
+#ifdef __unix
+# if defined(_POSIX_SOURCE) || !defined(_SVR4_SOURCE)
+ extern int isatty(int);
+# endif
+#endif
+
+/* Default is just to print a string as a prompt */
+#define PROMPT(in,x) printf("%s",(x))
+
+/* More intelligent prompts for systems where we know the FILE structure*/
+#ifdef __sgi
+# undef PROMPT
+# define PROMPT(in,x) do{if(isatty((in)->_file)) \
+ printf("%s",(x));}while(0)
+#endif
+#ifdef __linux__
+# undef PROMPT
+# define PROMPT(in,x) do{if(isatty((in)->_fileno)) \
+ printf("%s",(x));}while(0)
+#endif
+
+#endif /* _MACROS_H */
+
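
Note on the macros.h additions above: the following is a minimal sketch of how FINDINARRAY and the bit-wise macros might be used. The macro bodies are copied from the diff so that the example compiles on its own. SysDefs.h is not part of this section, so BOOL, TRUE and FALSE are stand-ins declared here as an assumption, and the helpers find_offset() and toggle_flags() are hypothetical names that do not exist in bioplib.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: minimal stand-ins for the SysDefs.h definitions */
typedef short BOOL;
#ifndef TRUE
#define TRUE  1
#define FALSE 0
#endif

/* Copied from macros.h as added by this commit */
#define FINDINARRAY(x, l, y, r)                                      \
do {                                                                 \
   int _inarray_macro_i;                                             \
   (r) = (-1);                                                       \
   if((x)==NULL) break;                                              \
   for(_inarray_macro_i=0; _inarray_macro_i<(l); _inarray_macro_i++) \
   {  if((x)[_inarray_macro_i] == (y))                               \
      {  (r) = _inarray_macro_i;                                     \
         break;                                                      \
}  }  } while(FALSE)

#define SET(x, y)   (x) |= (y)
#define UNSET(x, y) (x) &= (~(y))
#define ISSET(x, y) ((BOOL)((x)&(y)))

/* Hypothetical helper: offset of value in arr, or -1 if absent
   (also -1 when arr is NULL, as the macro guards against that) */
int find_offset(const int *arr, int len, int value)
{
   int offset;
   assert(len >= 0);
   FINDINARRAY(arr, len, value, offset);
   return offset;
}

/* Hypothetical helper: set the bits in on, clear the bits in off */
unsigned int toggle_flags(unsigned int flags, unsigned int on,
                          unsigned int off)
{
   SET(flags, on);
   UNSET(flags, off);
   return flags;
}
```

Because each array macro is wrapped in do { ... } while(FALSE), it behaves as a single statement and can be followed by a semicolon like a function call. SET and UNSET are plain expressions, and ISSET narrows its result to BOOL, so flag values wider than BOOL may need extra care.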
diff --git a/src/bioplib/matrix.h b/src/bioplib/matrix.h
new file mode 100644
index 0000000..f1317bf
--- /dev/null
+++ b/src/bioplib/matrix.h
@@ -0,0 +1,72 @@
+/*************************************************************************
+
+ Program:
+ File: matrix.h
+
+ Version: V1.6R
+ Date: 27.09.95
+ Function: Include file for matrix operations
+
+ Copyright: (c) SciTech Software 1995
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+
+*************************************************************************/
+/* Includes
+*/
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+/************************************************************************/
+
+#ifndef _MATRIX_H
+#define _MATRIX_H
+#include "MathType.h"
+
+void MatMult3_33(VEC3F vecin, REAL matin[3][3], VEC3F *vecout);
+void MatMult33_33(REAL a[3][3], REAL b[3][3], REAL out[3][3]);
+void invert33(REAL s[3][3], REAL ss[3][3]);
+void CreateRotMat(char direction, REAL angle, REAL matrix[3][3]);
+REAL VecDist(REAL *a, REAL *b, int len);
+
+#endif
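
Note on matrix.h above: the header only declares the routines, so the following is a sketch of what a 3x3 product with the same argument shape as MatMult33_33() could look like. It is not the bioplib implementation. REAL is assumed to be double (MathType.h is not shown in this section), and mat_mult_33() is a hypothetical name.

```c
#include <assert.h>

/* Assumption: REAL is double; MathType.h is not shown here */
typedef double REAL;

/* Hypothetical stand-in: out = a * b using the usual row-by-column
   rule. out must not be the same array as a or b, because it is
   overwritten while the inputs are still being read. */
void mat_mult_33(REAL a[3][3], REAL b[3][3], REAL out[3][3])
{
   int i, j, k;

   assert(out != a && out != b);

   for(i=0; i<3; i++)
   {
      for(j=0; j<3; j++)
      {
         out[i][j] = 0.0;
         for(k=0; k<3; k++)
            out[i][j] += a[i][k] * b[k][j];
      }
   }
}
```

Whether the real routine tolerates aliased arguments cannot be told from the prototype alone, so callers may prefer to pass a separate output array.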
diff --git a/src/bioplib/openorpipe.c b/src/bioplib/openorpipe.c
new file mode 100644
index 0000000..53c415b
--- /dev/null
+++ b/src/bioplib/openorpipe.c
@@ -0,0 +1,147 @@
+/*************************************************************************
+
+ Program:
+ File: openorpipe.c
+
+ Version: V1.8
+ Date: 02.04.09
+ Function: Open a file for writing unless the filename starts with
+ a | in which case open as a pipe
+
+ Copyright: (c) SciTech Software 1997-2009
+ Author: Dr. Andrew C. R. Martin
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 26.05.97 Original By: ACRM
+ V1.1 26.06.97 Added calls to signal()
+ V1.2 27.02.98 Uses port.h
+ V1.3 18.08.98 Added cast to popen() for SunOS
+ V1.4 28.01.04 Added NOPIPE define. Allows compilation on systems
+ which don't support unix pipes
+ V1.5 03.02.06 Added prototypes for popen() and pclose()
+ V1.6 29.06.07 popen() and pclose() prototypes now skipped for MAC OSX
+ which defines them differently
+ V1.7 17.03.09 popen() prototype now skipped for Windows.
+ V1.8 02.04.09 Clean compile with NOPIPE defined
+
+*************************************************************************/
+/* Includes
+*/
+#ifndef NOPIPE
+#include "port.h" /* Required before stdio.h */
+#endif
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <signal.h>
+#include "macros.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+#if !defined(__APPLE__) && !defined(MS_WINDOWS)
+FILE *popen(char *, char *);
+#endif
+#ifndef __APPLE__
+int pclose(FILE *);
+#endif
+
+/************************************************************************/
+/*>FILE *OpenOrPipe(char *filename)
+ --------------------------------
+ Input: char *filename A file or pipe to be opened
+ Returns: FILE * A file pointer
+
+ Opens a file for writing unless the filename begins with a | in which
+ case it is opened as a pipe.
+
+ Broken pipe signals are ignored.
+
+ 26.05.97 Original By: ACRM
+ 26.06.97 Added call to signal()
+ 18.08.98 Added cast to popen() for SunOS
+ 28.01.05 Added NOPIPE define
+*/
+FILE *OpenOrPipe(char *filename)
+{
+ char *fnam;
+
+ KILLLEADSPACES(fnam, filename);
+#ifdef NOPIPE
+ return(fopen(fnam, "w"));
+#else
+ if(fnam[0] == '|')
+ {
+ signal(SIGPIPE, SIG_IGN);
+ fnam++;
+ KILLLEADSPACES(fnam, fnam);
+ return((FILE *)popen(fnam, "w"));
+ }
+ else
+ {
+ return(fopen(fnam, "w"));
+ }
+#endif
+}
+
+/************************************************************************/
+/*>int CloseOrPipe(FILE *fp)
+ -------------------------
+ Input: FILE *fp File pointer to be closed
+ Returns: int Error code (as for fclose())
+
+ Attempts to close a file pointer as a pipe. If it isn't associated
+ with a pipe (i.e. pclose() returns (-1)), tries again to close it as
+ a normal file.
+
+ 26.05.97 Original By: ACRM
+ 26.06.97 Added call to signal()
+ 28.01.05 Added NOPIPE define
+ 02.04.09 Moved 'int ret' to be in the #else
+*/
+int CloseOrPipe(FILE *fp)
+{
+#ifdef NOPIPE
+ return(fclose(fp));
+#else
+ int ret;
+
+ if((ret=pclose(fp)) == (-1))
+ return(fclose(fp));
+
+ signal(SIGPIPE, SIG_DFL);
+ return(ret);
+#endif
+}
+
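
Note on openorpipe.c above: CloseOrPipe() has to guess whether the stream came from popen(), since nothing records it. A minimal sketch of an alternative that carries a flag is given below. The names pipe_command(), open_or_pipe() and close_or_pipe() are hypothetical, only plain spaces are skipped (the behaviour of KILLLEADSPACES is not shown in this section), and the SIGPIPE handling of the original is left out for brevity.

```c
#define _POSIX_C_SOURCE 200809L   /* for popen() and pclose() */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if spec, after leading spaces, starts with '|',
   return a pointer to the command that follows it (leading spaces
   skipped); otherwise return NULL. */
const char *pipe_command(const char *spec)
{
   assert(spec != NULL);

   while(*spec == ' ') spec++;
   if(*spec != '|') return NULL;

   spec++;
   while(*spec == ' ') spec++;
   return spec;
}

/* Hypothetical variant of the open-or-pipe idea: *is_pipe records
   whether popen() was used so the matching close call can be chosen */
FILE *open_or_pipe(const char *spec, int *is_pipe)
{
   const char *cmd = pipe_command(spec);

   *is_pipe = (cmd != NULL);
   if(cmd != NULL)
      return popen(cmd, "w");   /* POSIX only */

   while(*spec == ' ') spec++;
   return fopen(spec, "w");
}

/* Close with the call that matches how the stream was opened */
int close_or_pipe(FILE *fp, int is_pipe)
{
   return is_pipe ? pclose(fp) : fclose(fp);
}
```

The extra flag costs one int per stream, but it avoids passing a stream to a close function it was not opened for.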
diff --git a/src/bioplib/padterm.c b/src/bioplib/padterm.c
new file mode 100644
index 0000000..b8d1843
--- /dev/null
+++ b/src/bioplib/padterm.c
@@ -0,0 +1,103 @@
+/*************************************************************************
+
+ Program:
+ File: padterm.c
+
+ Version: V1.21
+ Date: 18.06.02
+ Function:
+
+ Copyright: (c) SciTech Software 1991-2002
+ Author: Dr. Andrew C. R. Martin
+ Phone: +44 (0) 1372 275775
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 06.09.91 Original
+ V1.1 08.02.91 Added KillLine()
+ V1.2 10.02.91 Added setextn() and index()
+ V1.3 20.03.91 Added Word()
+ V1.4 28.05.92 ANSIed
+ V1.5 22.06.92 Added tab check to Word(). Improved setextn().
+ Added WordN(). Documented other routines.
+ V1.6 27.07.93 Corrected fsscanf() for double precision
+ V1.7 07.10.93 Checks made on case before toupper()/tolower()
+ for SysV compatibility. Also index() becomes
+ chindex()
+ V1.8 18.03.94 getc() -> fgetc()
+ V1.9 11.05.94 Added GetFilestem(), upstrcmp(), upstrncmp() &
+ GetWord()
+ V1.10 24.08.94 Added OpenStdFiles()
+ V1.11 08.03.95 Corrected OpenFile() for non-UNIX
+ V1.12 09.03.95 Added check on non-NULL filename in OpenFile()
+ V1.13 17.07.95 Added countchar()
+ V1.14 18.10.95 Moved YorN() to WindIO.c
+ V1.15 06.11.95 Added StoreString(), InStringList() and FreeStringList()
+ V1.16 22.11.95 Moved ftostr() to general.c
+ V1.17 15.12.95 Added QueryStrStr()
+ V1.18 18.12.95 OpenStdFiles() treats filename of - as stdin/stdout
+ V1.19 05.02.96 OpenStdFiles() allows NULL pointers instead of filenames
+ V1.20 18.09.96 Added padchar()
+ V1.21 18.06.02 Added string.h
+
+*************************************************************************/
+/* Includes
+*/
+#include <string.h>
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+
+/************************************************************************/
+/*>void padterm(char *string, int length)
+ ---------------------------------------
+ I/O: char *string String to be padded with spaces
+ Input: int length Required size for string
+
+ Pads a string with spaces to length characters, then terminates it.
+
+ 06.09.91 Original By: ACRM
+*/
+void padterm(char *string,
+ int length)
+{
+ int i;
+
+ for(i=strlen(string); i<length; i++)
+ string[i] = ' ';
+ string[length] = '\0';
+}
+
+
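
Note on padterm.c above: padterm() writes string[length] unconditionally, so the caller must supply a buffer of at least length+1 characters, and a string longer than length is truncated. The following sketch shows a bounded variant under those observations; pad_to_width() is a hypothetical name and is not part of bioplib.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bounded variant of the padding idea: pads buf with
   spaces up to width characters and terminates it there. Returns 0
   without touching buf if the buffer cannot hold width characters
   plus the terminating NUL, 1 otherwise. As with padterm(), a string
   already longer than width is truncated to width characters.
   buf must already hold a NUL-terminated string. */
int pad_to_width(char *buf, size_t bufsize, size_t width)
{
   size_t i;

   assert(buf != NULL);

   if(width + 1 > bufsize) return 0;

   for(i = strlen(buf); i < width; i++)
      buf[i] = ' ';
   buf[width] = '\0';

   return 1;
}
```

The size check matters for the fixed 8-character fields used in the PDB structure later in this commit, where a request to pad to 8 characters would otherwise write one byte past the end of the field.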
diff --git a/src/bioplib/parse.c b/src/bioplib/parse.c
new file mode 100644
index 0000000..908cc68
--- /dev/null
+++ b/src/bioplib/parse.c
@@ -0,0 +1,467 @@
+/*************************************************************************
+
+ Program:
+ File: parse.c
+
+ Version: V1.9R
+ Date: 08.10.99
+ Function: A keyword command parser
+
+ Copyright: (c) SciTech Software 1990-4
+ Author: Dr. Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: andrew at stagleys.demon.co.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+ parse() is a command line parser which will accept upper or
+ lower case commands and abbreviations. Comment lines may be
+ indicated using a !. The keyword structure array and returned
+ string array are defined thus:
+ KeyWd keywords[NCOMM];
+ char *strparam[MAXSTRPARAM];
+ The returned REAL parameters are defined thus:
+ REAL floatparam[MAXFLOATPARAM];
+ Space for the returned strings must be allocated thus:
+ strparam[n] = (char *)malloc(MAXSTRLEN * sizeof(char));
+ and repeated for each parameter.
+
+ The keyword list with type and numbers of returned parameters
+ is constructed using the MAKEKEY macros:
+ MAKEKEY(keywords[0],"RANGE",NUMBER,2);
+ MAKEKEY(keywords[1],"STRING",STRING,1);
+ Here, the keywords must be defined in upper case.
+
+
+ mparse() is used in the same way, but allows a variable number of
+ parameters for each keyword. Keywords are of type MKeyWd and are
+ defined using the macro MAKEMKEY:
+ MAKEMKEY(keywords[0],"RANGE",NUMBER,2,2);
+ MAKEMKEY(keywords[1],"STRING",STRING,1,3);
+
+**************************************************************************
+
+ Usage:
+ ======
+ parse(comline,nkeys,keywords,floatparam,strparam)
+ Input: char *comline A command line string to parse
+ int nkeys Number of keywords
+ KeyWd *keywords Array of keyword structures
+ Output: REAL *floatparam Array of returned numbers
+ char **strparam Array of pointers to returned strings
+ Returns: int Index of found command or error flag
+
+ mparse(comline,nkeys,keywords,floatparam,strparam,nparam)
+ Input: char *comline A command line string to parse
+ int nkeys Number of keywords
+ MKeyWd *keywords Array of keyword structures
+ Output: REAL *floatparam Array of returned numbers
+ char **strparam Array of pointers to returned strings
+ int *nparam Number of parameters found
+ Returns: int Index of found command or error flag
+
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 11.07.90 Original
+ V1.1 29.10.90 match() now frees the memory it allocates and calls
+ terminate()
+ parse() now calls terminate() on the keyword string
+ V1.2 25.09.91 Messages will only appear from parse() if NOISY is
+ #defined.
+ Added FPU support.
+ V1.3 28.05.92 ANSIed and autodoc'd
+ V1.4 08.12.92 Includes stdlib.h
+ V1.5 22.04.93 Various tidying to exact ANSI standard and of function
+ headers. Corrected some calls to free()
+ V1.6 16.06.93 Tidied for book
+ V1.7 01.03.94 Added mparse()
+ V1.8 11.03.94 Added internal support for lines starting with a $.
+ The line is passed as a system() call and parse()
+ acts as if the line had been a comment.
+ V1.9 08.10.99 Initialised some variables
+
+*************************************************************************/
+/* Includes
+*/
+#include <stdio.h>
+#include <string.h>
+#include <math.h>
+#include <stdlib.h>
+#include <ctype.h>
+
+#include "MathType.h"
+#include "SysDefs.h"
+#include "macros.h"
+#include "parse.h"
+#include "general.h"
+
+/************************************************************************/
+/* General defines for these routines
+*/
+#define LF 10
+#define CR 13
+#define DIC 34 /* Double inverted commas */
+
+/************************************************************************/
+/*>int parse(char *comline, int nkeys, KeyWd *keywords,
+ REAL *floatparam, char **strparam)
+ ----------------------------------------------------
+ Input: char *comline A command line string to parse
+ int nkeys Number of keywords
+ KeyWd *keywords Array of keyword structures
+ Output: REAL *floatparam Array of returned numbers
+ char **strparam Array of pointers to returned strings
+ Returns: int Index of found command or error flag
+
+ Keyword-based command parser. Fixed number of parameters.
+
+ 11.07.90 Original By: ACRM
+ 22.04.93 Tidied comments, etc. Corrected NULL to 0.
+ 11.03.94 Added $ line handling
+ 08.10.99 Initialise nlett
+*/
+int parse(char *comline,
+ int nkeys,
+ KeyWd *keywords,
+ REAL *floatparam,
+ char **strparam)
+{
+ char *command;
+ int i,n,found,nletters,nlett = 0;
+
+ command = KillLeadSpaces(comline);
+ TERMINATE(command);
+
+ if(command[0] == '$')
+ {
+ system(command+1);
+ return(PARSE_COMMENT);
+ }
+
+ found = 0;
+ if((command[0]=='!') ||
+ (command[0]==LF) ||
+ (command[0]==CR) ||
+ (command[0]=='\0'))
+ return(PARSE_COMMENT);
+
+ for(i=0;i<nkeys;i++)
+ {
+ /* match() returns 1 if first string finishes first or exact match
+ 2 if second string finishes first
+ 0 if a mismatch
+ We only want to act in the first case
+ */
+ if((n=match(command,(keywords[i]).name,&nletters))==1)
+ {
+ if(found) /* If found already */
+ {
+ return(PARSE_ERRC);
+ }
+ found = i+1; /* +1, so keyword 0 will flag TRUE */
+ nlett = nletters;
+ }
+ }
+ if(!found)
+ {
+ return(PARSE_ERRC);
+ }
+ command+=nlett;
+ found--; /* Reset to point to the correct keyword */
+
+ /* Get data requirements for this keyword */
+ if((keywords[found]).string)
+ {
+ for(i=0; i<(keywords[found]).nparam; i++)
+ {
+ command = KillLeadSpaces(command);
+ if((nletters = GetString(command,strparam[i]))==0)
+ {
+ return(PARSE_ERRP);
+ }
+ command += nletters;
+ } /* End of for(i) */
+ }
+ else
+ {
+ /* A numeric or no parameter */
+ for(i=0; i<(keywords[found]).nparam; i++)
+ {
+ command = KillLeadSpaces(command);
+ if(!GetParam(command,&(floatparam[i]),&nletters))
+ {
+ return(PARSE_ERRP);
+ }
+ command += nletters;
+ } /* End of for(i) */
+ } /* End of else */
+ return(found);
+}
+
+/************************************************************************/
+/*>int match(char *comstring, char *string2, int *nletters)
+ --------------------------------------------------------
+ Input: char *comstring A character string
+ char *string2 A second string
+ Output: int *nletters Number of letters matched
+ Returns: int 0 String mismatch
+ 1 First string finished first
+ 2 Second string finished first
+
+ This routine matches two strings, but stops the comparison as soon
+ as a space or NUL is found in either string. The returned value
+ indicates which string finished first or 0 if the letters before the
+ space or NUL have a mismatch. The routine calls StringToUpper()
+ on `comstring' before the comparison.
+
+ 11.07.90 Original By: ACRM
+ 22.04.93 Tidied comments, etc. Added check on malloc and corrected
+ calls to free()
+*/
+int match(char *comstring,
+ char *string2,
+ int *nletters)
+{
+ int i;
+ char *string1;
+
+ TERMINATE(comstring);
+ TERMINATE(string2);
+ string1 = (char *)malloc((strlen(comstring) + 2) * sizeof(char));
+ if(string1 == NULL) return(0);
+
+ StringToUpper(comstring,string1);
+
+ for(i=0;;i++)
+ {
+ if((!string1[i])||(string1[i]==' '))
+ {
+ *nletters = i;
+ free(string1);
+ return(1);
+ }
+ if((!string2[i])||(string2[i]==' '))
+ {
+ *nletters = i;
+ free(string1);
+ return(2);
+ }
+ if(string1[i] != string2[i])
+ {
+ *nletters = i;
+ free(string1);
+ return(0);
+ }
+ }
+}
+
+/************************************************************************/
+/*>int GetString(char *command, char *strparam)
+ --------------------------------------------
+ Input: char *command A character string
+ Output: char *strparam Returned character string
+ Returns: int Number of characters pulled out
+ of the command string
+
+ This routine returns the first space-delimited group of characters
+ from character string `command'
+
+ 11.07.90 Original By: ACRM
+ 22.04.93 Tidied comments, etc. Changed toggle method
+*/
+int GetString(char *command,
+ char *strparam)
+{
+ int i,j,inv_commas;
+
+ inv_commas=0;
+ j=0;
+ for(i=0;;i++)
+ {
+ if(command[i]==DIC)
+ {
+ /* Toggle the inv_commas flag */
+ inv_commas = !inv_commas;
+
+ /* Don't copy anything */
+ continue;
+ }
+
+ /* Break out if we're at the end of a line */
+ if((command[i]==LF)
+ ||(command[i]==CR)
+ ||(command[i]=='\0')) break;
+
+ /* Also break out if we've a space and we're not between
+ inverted commas
+ */
+ if((command[i]==' ') && (!inv_commas)) break;
+
+ /* Otherwise copy the character */
+ strparam[j++] = command[i];
+ }
+ strparam[j]='\0';
+ return(i);
+}
+
+/************************************************************************/
+/*>int GetParam(char *command, REAL *value, int *nletters)
+ -------------------------------------------------------
+ Input: char *command A character string
+ Output: REAL *value Returned float value
+ int *nletters Number of characters pulled out
+ of the command string
+ Returns: int 0 If error
+ 1 If OK
+
+ This routine extracts the first space-delimited number from the
+ `command' character string.
+
+ 11.07.90 Original By: ACRM
+ 22.04.93 Tidied comments, etc. Corrected NULL to 0
+*/
+int GetParam(char *command,
+ REAL *value,
+ int *nletters)
+{
+ char buffer[50];
+ int retval;
+
+ if((*nletters = GetString(command,buffer))==0)
+ return(0);
+
+ retval = sscanf(buffer,"%lf",value);
+ return(retval);
+}
+
+/************************************************************************/
+/*>int mparse(char *comline, int nkeys, MKeyWd *keywords,
+ REAL *floatparam, char **strparam, int *nparam)
+ ----------------------------------------------------------
+ Input: char *comline A command line string to parse
+ int nkeys Number of keywords
+ MKeyWd *keywords Array of keyword structures
+ Output: REAL *floatparam Array of returned numbers
+ char **strparam Array of pointers to returned strings
+ int *nparam Number of parameters found
+ Returns: int Index of found command or error flag
+
+ As parse(), but allows variable number of parameters to each keyword.
+
+ 23.02.94 Original based on parse() By: ACRM
+ 11.03.94 Added $ line handling
+ 08.10.99 Initialise nlett to 0
+*/
+int mparse(char *comline,
+ int nkeys,
+ MKeyWd *keywords,
+ REAL *floatparam,
+ char **strparam,
+ int *nparam)
+{
+ char *command;
+ int i,n,found,nletters,nlett=0;
+
+ command = KillLeadSpaces(comline);
+ TERMINATE(command);
+
+ if(command[0] == '$')
+ {
+ system(command+1);
+ return(PARSE_COMMENT);
+ }
+
+ found = 0;
+ if((command[0]=='!') ||
+ (command[0]==LF) ||
+ (command[0]==CR) ||
+ (command[0]=='\0'))
+ return(PARSE_COMMENT);
+
+ for(i=0;i<nkeys;i++)
+ {
+ /* match() returns 1 if first string finishes first or exact match
+ 2 if second string finishes first
+ 0 if a mismatch
+ We only want to act in the first case
+ */
+ if((n=match(command,(keywords[i]).name,&nletters))==1)
+ {
+ if(found) /* If found already */
+ {
+ return(PARSE_ERRC);
+ }
+ found = i+1; /* +1, so keyword 0 will flag TRUE */
+ nlett = nletters;
+ }
+ }
+
+ if(!found)
+ {
+ return(PARSE_ERRC);
+ }
+
+ command+=nlett;
+ found--; /* Reset to point to the correct keyword */
+
+ *nparam = 0; /* Zero the parameter count */
+
+ /* Get data requirements for this keyword */
+ if((keywords[found]).string)
+ {
+ for(i=0; i<(keywords[found]).maxparam; i++)
+ {
+ command = KillLeadSpaces(command);
+ if((nletters = GetString(command,strparam[i]))==0)
+ {
+ if(i < (keywords[found]).minparam)
+ return(PARSE_ERRP);
+ else
+ break;
+ }
+ else
+ {
+ (*nparam)++;
+ }
+ command += nletters;
+ } /* End of for(i) */
+ }
+ else
+ {
+ /* A numeric or no parameter */
+ for(i=0; i<(keywords[found]).maxparam; i++)
+ {
+ command = KillLeadSpaces(command);
+ if(!GetParam(command,&(floatparam[i]),&nletters))
+ {
+ if(i < (keywords[found]).minparam)
+ return(PARSE_ERRP);
+ else
+ break;
+ }
+ command += nletters;
+ (*nparam)++;
+ } /* End of for(i) */
+ } /* End of else */
+ return(found);
+}
+
diff --git a/src/bioplib/parse.h b/src/bioplib/parse.h
new file mode 100644
index 0000000..79f3722
--- /dev/null
+++ b/src/bioplib/parse.h
@@ -0,0 +1,116 @@
+/*************************************************************************
+
+ Program:
+ File: Parse.h
+
+ Version: V1.8R
+ Date: 11.03.94
+ Function: Include file for the command parser
+
+ Copyright: SciTech Software 1991-4
+ Author: Andrew C. R. Martin
+ Address: SciTech Software
+ 23, Stag Leys,
+ Ashtead,
+ Surrey,
+ KT21 2TD.
+ Phone: +44 (0) 1372 275775
+ EMail: martin at biochem.ucl.ac.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+************************************************************************
+
+ Description:
+ ============
+ Here are defined the MAKEKEY macro, STRING and NUMBER defines, the
+ KeyWd structure and return values for the parser.
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 11.07.90 Original
+ V1.1 08.12.92 Defines prototypes
+ V1.2 16.06.93 Added memory check to MAKEKEY
+ V1.3-1.6 Skipped
+ V1.7 01.03.94 Added mparse()
+ V1.8 11.03.94 Skipped
+
+*************************************************************************/
+#ifndef _PARSE_H
+#define _PARSE_H
+
+/************************************************************************/
+/* Includes
+*/
+#include "MathType.h"
+
+/************************************************************************/
+/* Defines
+*/
+
+#define STRING 1 /* Defines used for the MAKEKEY macro */
+#define NUMBER 0
+
+#define PARSE_ERRC -1 /* Error return values from parse() */
+#define PARSE_ERRP -2
+#define PARSE_COMMENT -3
+
+/************************************************************************/
+/* Type definitions
+*/
+typedef struct /* Used to store keywords for parse() */
+{
+ char *name;
+ int string, nparam;
+} KeyWd;
+
+typedef struct /* Used to store keywords for mparse() */
+{
+ char *name;
+ int string, minparam, maxparam;
+} MKeyWd;
+
+/************************************************************************/
+/* Macros
+*/
+/* Create a keyword for parse() */
+#define MAKEKEY(x,w,v,z) \
+ (x).name = (char *)malloc((strlen(w)+2) * sizeof(char)); \
+ if((x).name != NULL) strcpy((x).name,w); \
+ (x).string = v; \
+ (x).nparam = z
+
+/* Create a keyword for mparse() */
+#define MAKEMKEY(x,w,v,mn,mx) \
+ (x).name = (char *)malloc((strlen(w)+2) * sizeof(char)); \
+ if((x).name != NULL) strcpy((x).name,w); \
+ (x).string = v; \
+ (x).minparam = mn; \
+ (x).maxparam = mx
+
+/************************************************************************/
+/* Prototypes
+*/
+int parse(char *comline, int nkeys, KeyWd *keywords, REAL *REALparam,
+ char **strparam);
+int mparse(char *comline, int nkeys, MKeyWd *keywords, REAL *REALparam,
+ char **strparam, int *nparams);
+int match(char *comstring, char *string2, int *nletters);
+int GetString(char *command, char *strparam);
+int GetParam(char *command, REAL *value, int *nletters);
+
+#endif
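
Note on parse.h above: MAKEKEY expands to several statements and allocates the keyword name with malloc(), so it needs a braced context and a matching free(). A minimal sketch of building and releasing a keyword table follows. The KeyWd type and the MAKEKEY body are copied from the diff; NCOMM is set to 2 for this example only, and build_keywords() and free_keywords() are hypothetical helpers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STRING 1
#define NUMBER 0

/* Copied from parse.h as added by this commit */
typedef struct
{
   char *name;
   int  string, nparam;
} KeyWd;

#define MAKEKEY(x,w,v,z) \
   (x).name = (char *)malloc((strlen(w)+2) * sizeof(char)); \
   if((x).name != NULL) strcpy((x).name,w);                 \
   (x).string = v;                                          \
   (x).nparam = z

#define NCOMM 2   /* hypothetical table size for this example */

/* Hypothetical helper: fill the table with the two keywords from the
   parse.c header comment. Returns 0 if any allocation failed. */
int build_keywords(KeyWd *keywords)
{
   int i;

   assert(keywords != NULL);

   MAKEKEY(keywords[0], "RANGE",  NUMBER, 2);
   MAKEKEY(keywords[1], "STRING", STRING, 1);

   for(i=0; i<NCOMM; i++)
      if(keywords[i].name == NULL) return 0;

   return 1;
}

/* Hypothetical helper: release the names allocated by MAKEKEY */
void free_keywords(KeyWd *keywords)
{
   int i;

   for(i=0; i<NCOMM; i++)
   {
      free(keywords[i].name);
      keywords[i].name = NULL;
   }
}
```

Since the macro is not wrapped in do { ... } while(0), writing it as the body of an unbraced if or for would leave only its first statement under the condition.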
diff --git a/src/bioplib/pdb.h b/src/bioplib/pdb.h
new file mode 100644
index 0000000..a31d639
--- /dev/null
+++ b/src/bioplib/pdb.h
@@ -0,0 +1,360 @@
+/*************************************************************************
+
+ Program:
+ File: pdb.h
+
+ Version: V1.42R
+ Date: 08.11.07
+ Function: Include file for pdb routines
+
+ Copyright: (c) SciTech Software, UCL, Reading 1993-2007
+ Author: Dr. Andrew C. R. Martin
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 04.11.88 Original
+ V1.1 22.03.90 Added Secondary structure routines
+ V1.2 28.03.90 Corrected field widths for V1.2 of ReadPDB
+ V1.3 04.05.90 Added clear_pdb()
+ V1.4 19.06.90 Changed SEC structure to correct chain and ins widths
+ V1.5 19.07.90 Added INITINDEX macro
+ V1.6 09.09.91 Added define so won't screw up if included twice
+ V1.7 22.09.91 Altered character sizes for alignment
+ V1.8 10.06.93 Changed to use REAL rather than float. Changed
+ order within structure
+ V1.9 22.02.94 Added MAXSTDAA and MAXATINRES definitions
+ V1.10 01.03.94 Added stuff for ResolPDB. Removed INIT_INDEX().
+ Added DISULPHIDE definition.
+ Added HADDINFO definition.
+ V1.11 18.03.94 Added prototypes for ReadPDBOccRank() and
+ ReadPDBAtomsOccRank()
+ Added gPDBPartialOcc
+ V1.12 23.05.94 Added FindNextChainPDB() prototype
+ V1.13 24.08.94 Added OpenPGPFile() prototype. Added prototypes for
+ new version of FixPDB(). Added CTER styles
+ V1.14 03.10.94 Added FindCofGPDBRange(), FindCofGPDBSCRange(),
+ ReadPDBALL()
+ V1.15 05.10.94 Changed KillSidechain()
+ V1.16 11.01.94 Added StripHPDB()
+ V1.17 06.03.95 doReadPDB() is now defined here rather than static
+ V1.18 17.07.95 ParseResSpec() is now a BOOL
+ V1.19 24.07.95 Added FNam2PDB(), TermPDB()
+ V1.20 25.07.95 Added GetPDBChainLabels()
+ V1.21 08.08.95 Added FindResidueSpec() and FindNextResidue()
+ V1.22 12.10.95 Added DupePDB(), CopyPDBCoords(), CalcCellTrans(),
+ GetCrystPDB(), WriteCrystPDB()
+ V1.23 10.01.96 Added ExtractZonePDB()
+ V1.24 08.02.96 Added FindResidue()
+ V1.25 14.03.96 Added FitCaPDB(), FindAtomInRes()
+ V1.26 18.06.96 Added InPDBZone() and ZONE_MODE_*. Modified prototype
+ for FindZonePDB()
+ V1.27 23.07.96 Added AtomNameMatch() and LegalAtomSpec()
+ V1.28 12.08.96 Added RepOneSChain() and EndRepSChain()
+ V1.29 19.09.96 Added InPDBZoneSpec()
+ V1.30 14.10.96 Added ReadSeqresPDB();
+ V1.31 16.10.96 Added SelectCaPDB()
+ V1.32 18.08.98 Changed SEC to SECSTRUC 'cos of conflict in SunOS
+ Also defines SEC macro if not defined to warn you to
+ change your code!
+ V1.33 28.04.99 Added GetExptl()
+ V1.34 15.02.01 Added atnam_raw[] to PDB
+ Added WriteGromosPDB(), WriteGromosPDBRecord(),
+ AtomNameRawMatch()
+ V1.35 12.12.01 Added FitNCaCPDB()
+ V1.36 30.05.02 Changed PDB field from 'junk' to 'record_type'
+ Added the WholePDB routines and definition
+ V1.37 03.06.05 Added altpos to PDB.
+ Added altpos and atnam_raw to CLEAR_PDB
+ V1.38 22.09.05 Added WritePDBRecordAtnam()
+ V1.39 29.09.05 Added ParseResSpecNoUpper() and DoParseResSpec() By: TL
+ V1.40 04.01.06 Added AddCBtoGly(), AddCBtoAllGly(),
+ StripGlyCB() By: ACRM
+ V1.41 25.01.06 Added RemoveAlternates()
+ V1.42 08.11.07 Added BuildAtomNeighbourPDBList()
+ FindAtomWildcardInRes()
+ DupeResiduePDB()
+ V1.43 30.04.08 Added StripWatersPDB() and ISWATER() macro
+
+*************************************************************************/
+#ifndef _PDB_H
+#define _PDB_H
+
+#include <stdio.h>
+#include <string.h>
+
+#include "MathType.h"
+#include "SysDefs.h"
+#include "general.h"
+
+#define MAXSTDAA 21 /* Number of standard amino acids (w/ PCA)*/
+#define MAXATINRES 14 /* Max number of atoms in a standard aa */
+
+typedef struct pdb_entry
+{
+ REAL x,y,z,occ,bval;
+ struct pdb_entry *next;
+ int atnum;
+ int resnum;
+ char record_type[8];
+ char atnam[8];
+ char atnam_raw[8];
+ char resnam[8];
+ char insert[8];
+ char chain[8];
+ char altpos;
+} PDB;
+
+#define SELECT(x,w) (x) = (char *)malloc(5 * sizeof(char)); \
+ if((x) != NULL) strncpy((x),(w),5)
+
+typedef struct sec_entry
+{
+ struct sec_entry *next;
+ char chain1[8];
+ char ins1[8];
+ char chain2[8];
+ char ins2[8];
+ int res1;
+ int res2;
+ char type;
+} SECSTRUC;
+
+typedef struct _wholepdb
+{
+ PDB *pdb;
+ STRINGLIST *header;
+ STRINGLIST *trailer;
+ int natoms;
+} WHOLEPDB;
+
+/* This is designed to cause an error message which prints this line
+ It has been tested with gcc and Irix cc and does as required in
+ both cases
+*/
+#ifndef SEC
+# define SEC (The_type_SEC_is_now_called_SECSTRUC_You_must_change_your_code *)
+#endif
+
+typedef struct _disulphide
+{
+ struct _disulphide *next;
+ int res1,
+ res2;
+ char chain1[8],
+ chain2[8],
+ insert1[8],
+ insert2[8];
+} DISULPHIDE;
+
+typedef struct
+{
+ int Total, /* Total hydrogens */
+ T1, /* Type 1 C-H's */
+ T2, /* Type 2 C-H2's */
+ T3, /* Type 3 C-H3's */
+ T4, /* Type 4 sp2 C-H's,>N-H */
+ T5; /* Type 5 O-H's =N-H's */
+} HADDINFO;
+
+#define CLEAR_PDB(p) strcpy(p->record_type," "); \
+ p->atnum=0; \
+ strcpy(p->atnam," "); \
+ strcpy(p->atnam_raw," "); \
+ strcpy(p->resnam," "); \
+ p->resnum=0; \
+ strcpy(p->insert," "); \
+ strcpy(p->chain," "); \
+ p->x = 0.0; p->y = 0.0; p->z = 0.0; \
+ p->altpos = ' '; \
+ p->occ = 0.0; p->bval = 0.0; \
+ p->next = NULL
+
+#define ISWATER(z) (!strncmp((z)->resnam,"HOH",3) || \
+ !strncmp((z)->resnam,"OH2",3) || \
+ !strncmp((z)->resnam,"OHH",3) || \
+ !strncmp((z)->resnam,"DOD",3) || \
+ !strncmp((z)->resnam,"OD2",3) || \
+ !strncmp((z)->resnam,"ODD",3) || \
+ !strncmp((z)->resnam,"WAT",3))
+
+
+/* These are the types returned in StrucType by GetResolPDB() and GetExptl() */
+#define STRUCTURE_TYPE_UNKNOWN 0
+#define STRUCTURE_TYPE_XTAL 1
+#define STRUCTURE_TYPE_NMR 2
+#define STRUCTURE_TYPE_MODEL 3
+#define STRUCTURE_TYPE_ELECTDIFF 4
+
+/* These are the styles used by FixCterPDB() */
+#define CTER_STYLE_STD 0
+#define CTER_STYLE_GROMOS 1
+#define CTER_STYLE_CHARMM 2
+
+/* Return flags from GetCrystPDB() */
+#define XTAL_DATA_CRYST 0x0001
+#define XTAL_DATA_ORIGX 0x0002
+#define XTAL_DATA_SCALE 0x0004
+
+/* Modes for FindZonePDB() */
+#define ZONE_MODE_RESNUM 0
+#define ZONE_MODE_SEQUENTIAL 1
+
+
+/************************************************************************/
+/* Globals
+*/
+#ifdef RSC_MAIN
+ char gRSCError[80];
+#else
+ extern char gRSCError[80];
+#endif
+
+#ifdef READPDB_MAIN
+ BOOL gPDBPartialOcc;
+ BOOL gPDBMultiNMR;
+#else
+ extern BOOL gPDBPartialOcc;
+ extern BOOL gPDBMultiNMR;
+#endif
+
+/************************************************************************/
+/* Prototypes
+*/
+PDB *ReadPDB(FILE *fp, int *natom);
+PDB *ReadPDBAll(FILE *fp, int *natom);
+PDB *ReadPDBAtoms(FILE *fp, int *natom);
+PDB *ReadPDBOccRank(FILE *fp, int *natom, int OccRank);
+PDB *ReadPDBAtomsOccRank(FILE *fp, int *natom, int OccRank);
+PDB *doReadPDB(FILE *fp, int *natom, BOOL AllAtoms, int OccRank,
+ int ModelNum);
+void WritePDB(FILE *fp, PDB *pdb);
+void WritePDBRecord(FILE *fp, PDB *pdb);
+void WritePDBRecordAtnam(FILE *fp, PDB *pdb);
+void WriteGromosPDB(FILE *fp, PDB *pdb);
+void WriteGromosPDBRecord(FILE *fp, PDB *pdb);
+void GetCofGPDB(PDB *pdb, VEC3F *cg);
+void GetCofGPDBRange(PDB *start, PDB *stop, VEC3F *cg);
+void GetCofGPDBSCRange(PDB *start, PDB *stop, VEC3F *cg);
+void OriginPDB(PDB *pdb);
+void RotatePDB(PDB *pdb, REAL rm[3][3]);
+void TranslatePDB(PDB *pdb, VEC3F tvect);
+BOOL FitPDB(PDB *ref_pdb, PDB *fit_pdb, REAL rm[3][3]);
+BOOL FitCaPDB(PDB *ref_pdb, PDB *fit_pdb, REAL rm[3][3]);
+BOOL FitNCaCPDB(PDB *ref_pdb, PDB *fit_pdb, REAL rm[3][3]);
+BOOL FitCaCbPDB(PDB *ref_pdb, PDB *fit_pdb, REAL rm[3][3]);
+REAL CalcRMSPDB(PDB *pdb1, PDB *pdb2);
+int GetPDBCoor(PDB *pdb, COOR **coor);
+BOOL FindZonePDB(PDB *pdb, int start, char startinsert, int stop, char stopinsert,
+ char chain, int mode, PDB **pdb_start, PDB **pdb_stop);
+int HAddPDB(FILE *fp, PDB *pdb);
+int ReadPGP(FILE *fp);
+FILE *OpenPGPFile(char *pgpfile, BOOL AllHyd);
+PDB *SelectAtomsPDB(PDB *pdbin, int nsel, char **sel, int *natom);
+PDB *StripHPDB(PDB *pdbin, int *natom);
+SECSTRUC *ReadSecPDB(FILE *fp, int *nsec);
+void RenumAtomsPDB(PDB *pdb);
+BOOL UnPackPDB(FILE *in, FILE *out);
+PDB *ReadPackedPDB(FILE *in, int *natom);
+BOOL PackPDB(FILE *in, FILE *out);
+void WritePackedResidue(FILE *out, PDB *start, PDB *end);
+PDB *FindEndPDB(PDB *start);
+PDB *FixOrderPDB(PDB *pdb, BOOL Pad, BOOL Renum);
+PDB *ShuffleResPDB(PDB *start, PDB *end, BOOL Pad);
+BOOL GetAtomTypes(char *resnam, char **AtomTypes);
+PDB *KillPDB(PDB *pdb, PDB *prev);
+void CopyPDB(PDB *out, PDB *in);
+BOOL MovePDB(PDB *move, PDB **from, PDB **to);
+PDB *AppendPDB(PDB *first, PDB *second);
+PDB *ShuffleBB(PDB *pdb);
+REAL CalcChi(PDB *pdb, int type);
+PDB *GetPDBByN(PDB *pdb, int n);
+void SetChi(PDB *pdb, PDB *next, REAL chi, int type);
+BOOL KillSidechain(PDB *ResStart, PDB *NextRes, BOOL doCB);
+void SetResnam(PDB *ResStart, PDB *NextRes, char *resnam, int resnum,
+ char *insert, char *chain);
+void ApplyMatrixPDB(PDB *pdb, REAL matrix[3][3]);
+BOOL GetResolPDB(FILE *fp, REAL *resolution, REAL *RFactor,
+ int *StrucType);
+BOOL GetExptl(FILE *fp, REAL *resolution, REAL *RFactor, REAL *FreeR,
+ int *StrucType);
+PDB **IndexPDB(PDB *pdb, int *natom);
+DISULPHIDE *ReadDisulphidesPDB(FILE *fp, BOOL *error);
+BOOL ParseResSpec(char *spec, char *chain, int *resnum, char *insert);
+BOOL ParseResSpecNoUpper(char *spec, char *chain, int *resnum, char *insert);
+BOOL DoParseResSpec(char *spec, char *chain, int *resnum, char *insert,
+ BOOL uppercaseresspec);
+BOOL RepSChain(PDB *pdb, char *sequence, char *ChiTable, char *RefCoords);
+PDB *FindNextChainPDB(PDB *pdb);
+BOOL FixCterPDB(PDB *pdb, int style);
+BOOL CalcCterCoords(PDB *p, PDB *ca_p, PDB *c_p, PDB *o_p);
+int CalcTetraHCoords(PDB *nter, COOR *coor);
+int AddNTerHs(PDB **ppdb, BOOL Charmm);
+char *FNam2PDB(char *filename);
+PDB *TermPDB(PDB *pdb, int length);
+char *GetPDBChainLabels(PDB *pdb);
+PDB *FindResidueSpec(PDB *pdb, char *resspec);
+PDB *FindNextResidue(PDB *pdb);
+PDB *DupePDB(PDB *in);
+BOOL CopyPDBCoords(PDB *out, PDB *in);
+void CalcCellTrans(VEC3F UnitCell, VEC3F CellAngles,
+ VEC3F *xtrans, VEC3F *ytrans, VEC3F *ztrans);
+int GetCrystPDB(FILE *fp, VEC3F *UnitCell, VEC3F *CellAngles,
+ char *spacegroup,
+ REAL OrigMatrix[3][4], REAL ScaleMatrix[3][4]);
+void WriteCrystPDB(FILE *fp, VEC3F UnitCell, VEC3F CellAngles,
+ char *spacegroup,
+ REAL OrigMatrix[3][4], REAL ScaleMatrix[3][4]);
+PDB *ExtractZonePDB(PDB *inpdb, char *chain1, int resnum1, char *insert1,
+ char *chain2, int resnum2, char *insert2);
+PDB *FindResidue(PDB *pdb, char chain, int resnum, char insert);
+PDB *FindAtomInRes(PDB *pdb, char *atnam);
+BOOL InPDBZone(PDB *p, char chain, int resnum1, char insert1,
+ int resnum2, char insert2);
+BOOL InPDBZoneSpec(PDB *p, char *resspec1, char *resspec2);
+BOOL AtomNameMatch(char *atnam, char *spec, BOOL *ErrorWarn);
+BOOL AtomNameRawMatch(char *atnam, char *spec, BOOL *ErrorWarn);
+BOOL LegalAtomSpec(char *spec);
+BOOL RepOneSChain(PDB *pdb, char *ResSpec, char aa, char *ChiTable,
+ char *RefCoords);
+void EndRepSChain(void);
+char **ReadSeqresPDB(FILE *fp, int *nchains);
+PDB *SelectCaPDB(PDB *pdb);
+char *FixAtomName(char *name, REAL occup);
+
+void FreeWholePDB(WHOLEPDB *wpdb);
+void WriteWholePDB(FILE *fp, WHOLEPDB *wpdb);
+void WriteWholePDBHeader(FILE *fp, WHOLEPDB *wpdb);
+void WriteWholePDBTrailer(FILE *fp, WHOLEPDB *wpdb);
+WHOLEPDB *ReadWholePDB(FILE *fpin);
+WHOLEPDB *ReadWholePDBAtoms(FILE *fpin);
+BOOL AddCBtoGly(PDB *pdb);
+BOOL AddCBtoAllGly(PDB *pdb);
+PDB *StripGlyCB(PDB *pdb);
+PDB *RemoveAlternates(PDB *pdb);
+PDB *BuildAtomNeighbourPDBList(PDB *pdb, PDB *pRes, REAL NeighbDist);
+PDB *FindAtomWildcardInRes(PDB *pdb, char *pattern);
+PDB *DupeResiduePDB(PDB *in);
+PDB *StripWatersPDB(PDB *pdbin, int *natom);
+#endif
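The PDB type declared in pdb.h is a singly linked list with one node per atom, and ISWATER() tests a node's residue name against the known water names. A minimal sketch of walking such a list is given here. DEMO_PDB, DEMO_ISWATER and DemoCountWaters() are hypothetical names made up for this illustration; they are cut-down stand-ins so the fragment compiles without Bioplib and are not part of the library.

```c
#include <stddef.h>
#include <string.h>

/* Reduced stand-in for the PDB record: only the fields this sketch needs */
typedef struct demo_pdb
{
   struct demo_pdb *next;
   char            resnam[8];
} DEMO_PDB;

/* Same style of test as ISWATER(), shortened to three residue names */
#define DEMO_ISWATER(z) (!strncmp((z)->resnam,"HOH",3) || \
                         !strncmp((z)->resnam,"DOD",3) || \
                         !strncmp((z)->resnam,"WAT",3))

/* Walk the linked list and count the atoms that belong to waters */
int DemoCountWaters(const DEMO_PDB *pdb)
{
   const DEMO_PDB *p;
   int            count = 0;

   for(p=pdb; p!=NULL; p=p->next)
   {
      if(DEMO_ISWATER(p))
         count++;
   }

   return count;
}
```

With the real header the same loop would presumably run over PDB nodes and use ISWATER() directly; only the first three characters of resnam are compared, so the trailing pad character does not matter.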
diff --git a/src/bioplib/port.h b/src/bioplib/port.h
new file mode 100644
index 0000000..189c5bd
--- /dev/null
+++ b/src/bioplib/port.h
@@ -0,0 +1,99 @@
+/*************************************************************************
+
+ Program:
+ File: port.h
+
+ Version: V1.2
+ Date: 03.04.09
+ Function: Port-specific defines to allow us to use things like
+ popen() in a clean compile
+
+ Copyright: (c) SciTech Software 1988-2009
+ Author: Dr. Andrew C. R. Martin
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+ This must be included before the system includes (i.e. stdio.h etc)
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 27.02.98 Original
+ V1.1 17.03.09 Added Mac OS X and Windows. By: CTP
+ V1.2 03.04.09 Added check for linux whether _POSIX_SOURCE already
+ defined and added further checks for MS_WINDOWS
+
+*************************************************************************/
+/***
+ *** The following are necessary for getting popen() to work cleanly
+ ***/
+
+/* Silicon graphics IRIX */
+#ifdef __sgi
+/* These are for Irix6 */
+# ifndef __EXTENSIONS__
+# define __EXTENSIONS__
+# endif
+
+# ifdef _POSIX_C_SOURCE
+# undef _POSIX_C_SOURCE
+# endif
+# define _POSIX_C_SOURCE 2
+
+/* These are for Irix5 */
+# ifdef _POSIX_SOURCE
+# undef _POSIX_SOURCE
+# endif
+#endif
+
+/* DEC OSF/1 (Alpha) */
+#ifdef __osf__
+# ifndef _XOPEN_SOURCE
+# define _XOPEN_SOURCE
+# endif
+# ifndef __STDC__
+# define __STDC__
+# endif
+#endif
+
+/* SunOS - doesn't need anything for SunOS 4.1.2 */
+#if defined(sun) && defined(sparc) && !defined(__svr4)
+#endif
+
+/* Linux */
+#ifdef linux
+# ifndef _POSIX_SOURCE
+# define _POSIX_SOURCE
+# endif
+#endif
+
+/* MacOS - Doesn't need anything. */
+#ifdef __APPLE__
+#endif
+
+/* Windows - Does not support unix pipes. */
+#if defined(__WIN32__) || defined(_WIN32) || defined(WIN32) || \
+ defined(__WIN64__) || defined(_WIN64) || defined(WIN64) || \
+ defined(__WIN95__) || defined(__NT__) || defined(__WINDOWS__) || \
+ defined(msdos) || defined(__msdos__)
+# define MS_WINDOWS 1
+# define NOPIPE
+#endif
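The Usage note in port.h says it must be included before the system headers, because feature-test macros only take effect if they are set before the first system include. The sketch below shows that ordering together with a guard on NOPIPE. The preprocessor block at the top is a hypothetical stand-in for port.h, and DemoOpenOutput() and its "|command" naming convention are assumptions made for this illustration, not functions from ProFit or Bioplib.

```c
/* Hypothetical stand-in for port.h: feature macros must be set before
   the first system header is included, or they have no effect */
#if defined(_WIN32) || defined(WIN32) || defined(_WIN64)
#  define NOPIPE
#elif !defined(_POSIX_C_SOURCE)
#  define _POSIX_C_SOURCE 2   /* Asks for the popen()/pclose() prototypes */
#endif

#include <assert.h>
#include <stdio.h>

/* Open an output stream. A name starting with '|' is treated as a command
   to pipe into (assumed convention); anything else is a plain file.
   Sets *isPipe so the caller knows whether to pclose() or fclose().
   Returns NULL on failure or when pipes are not available. */
FILE *DemoOpenOutput(const char *name, int *isPipe)
{
   assert(isPipe != NULL);
   *isPipe = 0;

   if(name == NULL || name[0] == '\0')
      return NULL;

   if(name[0] == '|')
   {
#ifdef NOPIPE
      return NULL;             /* No unix pipes on this platform */
#else
      *isPipe = 1;
      return popen(name+1, "w");
#endif
   }

   return fopen(name, "w");
}
```

Keeping the platform test in one place means calling code only has to check NOPIPE, rather than repeating the list of compiler-specific Windows macros.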
diff --git a/src/bioplib/seq.h b/src/bioplib/seq.h
new file mode 100644
index 0000000..975701b
--- /dev/null
+++ b/src/bioplib/seq.h
@@ -0,0 +1,122 @@
+/*************************************************************************
+
+ Program:
+ File: seq.h
+
+ Version: V2.10R
+ Date: 27.02.07
+ Function: Header file for sequence handling
+
+ Copyright: (c) SciTech Software 1991-2007
+ Author: Dr. Andrew C. R. Martin
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V2.0 11.03.94 Original V2 release
+ V2.1 11.05.94 Added DNAtoAA() & TrueSeqLen() prototypes
+ V2.2 13.05.93 Added KnownSeqLen() prototype
+ V2.3 28.02.95 Added ReadRawPIR()
+ V2.4 25.07.95 Added the gBioplibSeqNucleicAcid external for throne()
+ V2.5 11.07.96 Added CalcMDMScore()
+ V2.6 17.09.95 Added ZeroMDM()
+ V2.7 26.08.97 Added macro interfaces to new DoPDB2Seq()
+ V2.8 08.03.00 Added Numeric***() alignment routines
+ V2.9 02.10.00 Modified DoPDB2Seq()
+ V2.10 27.02.07 Added CalcMDMScoreUC() and affinealignuc()
+
+*************************************************************************/
+#ifndef _SEQ_H
+#define _SEQ_H
+
+#include "MathType.h"
+#include "SysDefs.h"
+#include "pdb.h"
+
+/************************************************************************/
+/* Defines and macros
+ */
+#define ALLOCSIZE 80 /* ReadPIR() uses this as a chunk size for
+ allocating memory
+ */
+
+typedef struct
+{
+ BOOL fragment,
+ paren,
+ DotInParen,
+ NonExpJoin,
+ UnknownPos,
+ Incomplete,
+ Truncated,
+ Juxtapose;
+ char code[16],
+ name[160],
+ source[160];
+} SEQINFO;
+
+extern BOOL gBioplibSeqNucleicAcid;
+
+#define PDB2Seq(x) DoPDB2Seq((x), FALSE, FALSE, FALSE)
+#define PDB2SeqX(x) DoPDB2Seq((x), TRUE, FALSE, FALSE)
+#define PDB2SeqNoX(x) DoPDB2Seq((x), FALSE, FALSE, TRUE)
+#define PDB2SeqXNoX(x) DoPDB2Seq((x), TRUE, FALSE, TRUE)
+
+#define PDBProt2Seq(x) DoPDB2Seq((x), FALSE, TRUE, FALSE)
+#define PDBProt2SeqX(x) DoPDB2Seq((x), TRUE, TRUE, FALSE)
+#define PDBProt2SeqNoX(x) DoPDB2Seq((x), FALSE, TRUE, TRUE)
+#define PDBProt2SeqXNoX(x) DoPDB2Seq((x), TRUE, TRUE, TRUE)
+
+char throne(char *three);
+char thronex(char *three);
+char *onethr(char one);
+char *DoPDB2Seq(PDB *pdb, BOOL DoAsxGlx, BOOL ProtOnly, BOOL NoX);
+int SplitSeq(char *LinearSeq, char **seqs);
+int ReadSimplePIR(FILE *fp, int maxres, char **seqs);
+int ReadPIR(FILE *fp, BOOL DoInsert, char **seqs, int maxchain,
+ SEQINFO *seqinfo, BOOL *punct, BOOL *error);
+int ReadRawPIR(FILE *fp, char **seqs, int maxchain, BOOL upcase,
+ SEQINFO *seqinfo, BOOL *error);
+int align(char *seq1, int length1, char *seq2, int length2,
+ BOOL verbose, BOOL identity, int penalty,
+ char *align1, char *align2, int *align_len);
+int affinealign(char *seq1, int length1, char *seq2, int length2,
+ BOOL verbose, BOOL identity, int penalty, int penext,
+ char *align1, char *align2, int *align_len);
+int CalcMDMScore(char resa, char resb);
+int affinealignuc(char *seq1, int length1, char *seq2, int length2,
+ BOOL verbose, BOOL identity, int penalty, int penext,
+ char *align1, char *align2, int *align_len);
+int CalcMDMScoreUC(char resa, char resb);
+BOOL ReadMDM(char *mdmfile);
+int ZeroMDM(void);
+char DNAtoAA(char *dna);
+int TrueSeqLen(char *sequence);
+int KnownSeqLen(char *sequence);
+BOOL NumericReadMDM(char *mdmfile);
+int NumericCalcMDMScore(int resa, int resb);
+int NumericAffineAlign(int *seq1, int length1, int *seq2, int length2,
+ BOOL verbose, BOOL identity, int penalty, int penext,
+ int *align1, int *align2, int *align_len);
+#endif
diff --git a/src/bioplib/throne.c b/src/bioplib/throne.c
new file mode 100644
index 0000000..34389ff
--- /dev/null
+++ b/src/bioplib/throne.c
@@ -0,0 +1,198 @@
+/*************************************************************************
+
+ Program:
+ File: throne.c
+
+ Version: V1.7
+ Date: 18.02.09
+ Function: Convert between 1 and 3 letter aa codes
+
+ Copyright: (c) SciTech Software 1993-2009
+ Author: Dr. Andrew C. R. Martin
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.0 29.09.92 Original By: ACRM
+ V1.1 11.03.94 Added PCA, ASX and GLX to translation table.
+ PCA translates to E
+ Added routines to handle asx/glx
+ V1.2 25.07.95 handles nucleic acids
+ Sets the gBioplibSeqNucleicAcid flag if it's a
+ nucleic acid.
+ V1.3 08.03.07 Added PGA (Pyroglutamate) to translation table
+ (same as PCA: pyrrolidone carboxylic acid).
+ Note that it isn't clear whether this should translate
+ to Glu or Gln
+ V1.4 21.07.08 Added CGN (5-OXO-PYRROLIDINE-2-CARBALDEHYDE) which
+ is again the same as PCA
+ V1.5 19.12.08 Corrected NUMAAKNOWN - wasn't looking at U or X as
+ these were > NUMAAKNOWN!
+ V1.6 04.02.09 onethr() was not properly working from end of list
+ for nucleic acids
+ V1.7 18.02.09 Fixed for new PDB files which have " DT" etc for DNA
+ sequences
+
+*************************************************************************/
+/* Includes
+*/
+#include <string.h>
+#include "SysDefs.h"
+
+/************************************************************************/
+/* Defines and macros
+*/
+#define NUMAAKNOWN 37
+
+/************************************************************************/
+/* Globals
+*/
+
+/* N.B. The order in sTab1[] and sTab3[] must be the same and they must
+ end with X/UNK.
+ Also, nucleic acids must come *after* amino acids.
+*/
+/* Don't forget to fix NUMAAKNOWN if adding to this table! */
+static char sTab1[] = {'A','C','D','E','F',
+ 'G','H','I','K','L',
+ 'M','N','P','Q','R',
+ 'S','T','V','W','Y',
+ 'E','B','Z','E','E',
+ 'A','T','C','G','U','I',
+ 'A','T','C','G','I','X'
+ };
+/* Don't forget to fix NUMAAKNOWN if adding to this table! */
+static char sTab3[][8] = {"ALA ","CYS ","ASP ","GLU ","PHE ",
+ "GLY ","HIS ","ILE ","LYS ","LEU ",
+ "MET ","ASN ","PRO ","GLN ","ARG ",
+ "SER ","THR ","VAL ","TRP ","TYR ",
+ "PCA ","ASX ","GLX ","PGA ","CGN ",
+ "  A ","  T ","  C ","  G ","  U ","  I ",
+ " DA "," DT "," DC "," DG "," DI ","UNK "
+ };
+/* Don't forget to fix NUMAAKNOWN if adding to this table! */
+
+BOOL gBioplibSeqNucleicAcid = FALSE;
+
+/************************************************************************/
+/* Prototypes
+*/
+
+
+/************************************************************************/
+/*>char throne(char *three)
+ ------------------------
+ Input: char *three Three letter code
+ Returns: char One letter code
+
+ Converts 3-letter code to 1-letter code.
+ Handles ASX and GLX as X
+
+ 29.09.92 Original By: ACRM
+ 11.03.94 Modified to handle ASX and GLX in the tables
+ 25.07.95 Added handling of gBioplibSeqNucleicAcid
+*/
+char throne(char *three)
+{
+ int j;
+
+ if(three[0] == ' ' && three[1] == ' ')
+ gBioplibSeqNucleicAcid = TRUE;
+ else
+ gBioplibSeqNucleicAcid = FALSE;
+
+ if(three[2] == 'X')
+ return('X');
+
+ for(j=0;j<NUMAAKNOWN;j++)
+ if(!strncmp(sTab3[j],three,3)) return(sTab1[j]);
+
+ /* Only get here if the three letter code was not found */
+ return('X');
+}
+
+
+/************************************************************************/
+/*>char thronex(char *three)
+ -------------------------
+ Input: char *three Three letter code
+ Returns: char One letter code
+
+ Converts 3-letter code to 1-letter code.
+ Handles ASX and GLX as B and Z.
+
+ 29.09.92 Original By: ACRM
+ 25.07.95 Added handling of gBioplibSeqNucleicAcid
+*/
+char thronex(char *three)
+{
+ int j;
+
+ if(three[0] == ' ' && three[1] == ' ')
+ gBioplibSeqNucleicAcid = TRUE;
+ else
+ gBioplibSeqNucleicAcid = FALSE;
+
+ for(j=0;j<NUMAAKNOWN;j++)
+ if(!strncmp(sTab3[j],three,3)) return(sTab1[j]);
+
+ /* Only get here if the three letter code was not found */
+ return('X');
+}
+
+
+/************************************************************************/
+/*>char *onethr(char one)
+ ----------------------
+ Input: char one One letter code
+ Returns: char * Three letter code (padded to 4 chars with a
+ space)
+
+ Converts 1-letter code to 3-letter code (actually as 4 chars).
+
+ 07.06.93 Original By: ACRM
+ 25.07.95 If the gBioplibSeqNucleicAcid flag is set, assumes nucleic
+ acids rather than amino acids
+ 03.02.09 Fixed nucleic search - j was incrementing instead of
+ decrementing!
+*/
+char *onethr(char one)
+{
+ int j;
+
+ if(gBioplibSeqNucleicAcid) /* Work from end of table */
+ {
+ for(j=NUMAAKNOWN-1;j>=0;j--)
+ if(sTab1[j] == one) return(sTab3[j]);
+ }
+ else /* Work from start of table */
+ {
+ for(j=0;j<NUMAAKNOWN;j++)
+ if(sTab1[j] == one) return(sTab3[j]);
+ }
+
+ /* Only get here if the one letter code was not found */
+ return(sTab3[NUMAAKNOWN-1]);
+}
+
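throne.c uses two parallel tables: a three-letter code is found by a linear search in one table and the entry at the same index of the other is returned, with X/UNK as the fallback. Because 'A', 'C', 'G' and 'T' occur both as amino acids and as bases, the reverse lookup searches from the end of the table when nucleic acids are expected. A reduced sketch of that scheme follows. The seven-entry tables and the names DemoThrone() and DemoOnethr() are hypothetical, and the nucleic-acid choice is passed as a parameter here rather than through the gBioplibSeqNucleicAcid global.

```c
#include <string.h>

/* Reduced tables: same order in both, amino acids first, then a
   nucleic acid entry, and X/UNK last */
static const char sDemoTab1[]    = {'A','C','D','E','G','A','X'};
static const char sDemoTab3[][8] = {"ALA ","CYS ","ASP ","GLU ","GLY ",
                                    "  A ","UNK "};
#define DEMO_NKNOWN ((int)(sizeof(sDemoTab1)/sizeof(sDemoTab1[0])))

/* Three-letter code to one-letter code; 'X' if the code is unknown */
char DemoThrone(const char *three)
{
   int j;

   for(j=0; j<DEMO_NKNOWN; j++)
   {
      if(!strncmp(sDemoTab3[j], three, 3))
         return sDemoTab1[j];
   }

   return 'X';
}

/* One-letter code to padded three-letter code. When nucleic is non-zero
   the table is searched from the end so that the base is found first. */
const char *DemoOnethr(char one, int nucleic)
{
   int j;

   if(nucleic)
   {
      for(j=DEMO_NKNOWN-1; j>=0; j--)
         if(sDemoTab1[j] == one) return sDemoTab3[j];
   }
   else
   {
      for(j=0; j<DEMO_NKNOWN; j++)
         if(sDemoTab1[j] == one) return sDemoTab3[j];
   }

   return sDemoTab3[DEMO_NKNOWN-1];   /* Not found: return "UNK " */
}
```

The ordering constraint in the original comment (nucleic acids after amino acids, X/UNK last) is what makes both the direction-dependent search and the fallback index work.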
diff --git a/src/bioplib/upstrncmp.c b/src/bioplib/upstrncmp.c
new file mode 100644
index 0000000..0f76bda
--- /dev/null
+++ b/src/bioplib/upstrncmp.c
@@ -0,0 +1,109 @@
+/*************************************************************************
+
+ Program:
+ File: upstrncmp.c
+
+ Version: V1.20
+ Date: 18.09.96
+ Function:
+
+ Copyright: (c) SciTech Software 1991-6
+ Author: Dr. Andrew C. R. Martin
+ Phone: +44 (0) 1372 275775
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain, but it may be copied
+ according to the conditions laid out in the accompanying file
+ COPYING.DOC
+
+ The code may not be sold commercially or included as part of a
+ commercial product except as described in the file COPYING.DOC.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V1.1 08.02.91 Added KillLine()
+ V1.2 10.02.91 Added setextn() and index()
+ V1.3 20.03.91 Added Word()
+ V1.4 28.05.92 ANSIed
+ V1.5 22.06.92 Added tab check to Word(). Improved setextn().
+ Added WordN(). Documented other routines.
+ V1.6 27.07.93 Corrected fsscanf() for double precision
+ V1.7 07.10.93 Checks made on case before toupper()/tolower()
+ for SysV compatibility. Also index() becomes
+ chindex()
+ V1.8 18.03.94 getc() -> fgetc()
+ V1.9 11.05.94 Added GetFilestem(), upstrcmp(), upstrncmp() &
+ GetWord()
+ V1.10 24.08.94 Added OpenStdFiles()
+ V1.11 08.03.95 Corrected OpenFile() for non-UNIX
+ V1.12 09.03.95 Added check on non-NULL filename in OpenFile()
+ V1.13 17.07.95 Added countchar()
+ V1.14 18.10.95 Moved YorN() to WindIO.c
+ V1.15 06.11.95 Added StoreString(), InStringList() and FreeStringList()
+ V1.16 22.11.95 Moved ftostr() to general.c
+ V1.17 15.12.95 Added QueryStrStr()
+ V1.18 18.12.95 OpenStdFiles() treats filename of - as stdin/stdout
+ V1.19 05.02.96 OpenStdFiles() allows NULL pointers instead of filenames
+ V1.20 18.09.96 Added padchar()
+
+*************************************************************************/
+/* Includes
+*/
+#include <ctype.h>
+
+/************************************************************************/
+/* Defines and macros
+*/
+
+/************************************************************************/
+/* Globals
+*/
+
+/************************************************************************/
+/* Prototypes
+*/
+
+/************************************************************************/
+/*>int upstrncmp(char *word1, char *word2, int ncomp)
+ --------------------------------------------------
+ Input: char *word1 First word
+ char *word2 Second word
+ int ncomp Number of characters to compare
+ Returns: int 0 if strings match or offset of first
+ mismatched character
+
+ Like strncmp(), but upcases each character before comparison
+
+ 20.04.94 Original By: ACRM
+*/
+int upstrncmp(char *word1, char *word2, int ncomp)
+{
+ int i;
+
+ for(i=0; i<ncomp; i++)
+ {
+ if(!word1[i] || !word2[i]) return(i+1);
+
+ if((islower(word1[i])?toupper(word1[i]):word1[i]) !=
+ (islower(word2[i])?toupper(word2[i]):word2[i]))
+ return(i+1);
+ }
+
+ return(0);
+}
+
+
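The header comment for upstrncmp() describes a comparison in which each character is upcased before it is compared, returning 0 on a match and otherwise the 1-based offset of the first mismatch. A compact sketch of that behaviour is given here, with each side upcased from its own character. DemoUpstrncmp() is a hypothetical name used for this illustration; the casts to unsigned char are an addition, made because toupper() is only defined for values representable as unsigned char (or EOF).

```c
#include <ctype.h>

/* Case-insensitive comparison of the first ncomp characters.
   Returns 0 if they match, else the 1-based offset of the first
   mismatch. Reaching the end of either string before ncomp
   characters have been compared counts as a mismatch. */
int DemoUpstrncmp(const char *word1, const char *word2, int ncomp)
{
   int i;

   for(i=0; i<ncomp; i++)
   {
      int c1 = toupper((unsigned char)word1[i]);
      int c2 = toupper((unsigned char)word2[i]);

      if(c1 == '\0' || c2 == '\0' || c1 != c2)
         return i+1;
   }

   return 0;
}
```

Note that, unlike strncmp(), two identical strings shorter than ncomp are reported as a mismatch at the terminator, so callers would normally pass an ncomp no longer than the shorter string.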
diff --git a/src/fitting.c b/src/fitting.c
new file mode 100644
index 0000000..6361eb2
--- /dev/null
+++ b/src/fitting.c
@@ -0,0 +1,3359 @@
+/*************************************************************************
+
+ Program: ProFit
+ File: fitting.c
+
+ Version: V3.1
+ Date: 31.03.09
+ Function: Protein Fitting program.
+
+ Copyright: SciTech Software / UCL 1992-2009
+ Author: Dr. Andrew C. R. Martin
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain.
+
+ It may not be copied or made available to third parties, but may be
+ freely used by non-profit-making organisations who have obtained it
+ directly from the author or by FTP.
+
+ You are requested to send EMail to the author to say that you are
+ using this code so that you may be informed of future updates.
+
+ The code may not be made available on other FTP sites without express
+ permission from the author.
+
+ The code may be modified as required, but any modifications must be
+ documented so that the person responsible can be identified. If
+ someone else breaks this code, the author doesn't want to be blamed
+ for code that does not work! You may not distribute any
+ modifications, but are encouraged to send them to the author so
+ that they may be incorporated into future versions of the code.
+
+ Such modifications become the property of Dr. Andrew C.R. Martin and
+ SciTech Software though their origin will be acknowledged.
+
+ The code may not be sold commercially or used for commercial purposes
+ without prior permission from the author.
+
+**************************************************************************
+
+ Description:
+ ============
+ These routines perform the actual fitting and RMS calculation.
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V0.1 25.09.92 Original
+ V0.5 08.10.93 Various tidying for Unix & changed for booklib
+ V0.6 05.01.94 More tidying
+ V0.7 24.11.94 Skipped
+ V0.8 17.07.95 Changed screen() calls to printf()
+ Multiple chains now work correctly.
+ V1.0 18.07.95 Insert codes now work.
+ First official release (at last!).
+ V1.1 20.07.95 Added WEIGHT command support and translation vector
+ output from MATRIX command
+ V1.2 22.07.95 Skipped
+ V1.3 31.07.95 Skipped
+ V1.4 14.08.95 Fixed bug in CalcRMS() which was skipping the last
+ residue when printing RMS by res.
+ V1.5 21.08.95 Skipped
+ V1.6 20.11.95 Skipped
+ V1.6c 13.12.95 Extra info printed when zones mismatch
+ V1.6e 31.05.96 Added test on B-values
+ V1.6f 13.06.96 Added weight inverting for BWEIGHT command
+ V1.6g 18.06.96 Moved FindZone() to FindZonePDB() in Bioplib
+ V1.7 23.07.96 Supports atom wildcards. Some comment tidying.
+ V1.7a 07.11.96 Added -ve Bvalues to mean ignore < cutoff
+ V1.7b 11.11.96 Checks actual value of gUseBVal
+ V1.7c 18.11.96 Added option to ignore missing atoms
+ V1.7d 20.12.96 Added setting of gNFittedCoor
+ V1.7e 27.06.97 Allows WRITE and RESIDUE to output to a pipe
+ V1.7f 03.07.97 Added break into CreateFitArrays() to fix core dump
+ on bad multiple-occupancy PDB files
+ V1.8 07.05.98 Skipped for release
+ V2.0 01.03.01 Now supports multiple structure fitting and iterative
+ zone updating
+ V2.1 28.03.01 Parameter for ITERATE and added CENTRE command
+ V2.2 20.12.01 Skipped for release
+ V2.3 01.12.04 Skipped for release
+ V2.4 03.06.05 Skipped for release
+ V2.5 07.06.05 Skipped for release
+ V2.6 18.03.08 Added CentreOnZone() By: CTP
+ V2.6 27.10.08 Added NoFitStructures() and DoNoFit().
+ V3.0 06.11.08 Release Version
+ V3.0 16.02.09 Rewrote CalculateRotationMatrix().
+ V3.1 31.03.09 Skipped for release
+
+*************************************************************************/
+/* Includes
+*/
+#include "ProFit.h"
+
+
+
+
+/************************************************************************/
+/*>void FitStructures(void)
+ ------------------------
+ Fits the 2 structures using the currently defined ranges and displays
+ the RMSd.
+
+ 28.09.92 Framework
+ 29.09.92 Various subsidiary bits added, still doesn't actually do
+ fitting
+ 30.09.92 Added NOT option and calls to do actual fitting.
+ 17.07.95 Changed screen() to printf()
+ 18.07.95 Added initialisation of inserts in zones
+ 12.01.01 Moved ShowRMS() out of DoFitting() into here
+ 15.01.01 Added iteration of fitting zones
+ 01.02.01 Added multi-structure fitting and iteration of multiple
+ structures
+ 15.02.01 Was iterating over multiple structures even when there
+ were only 2.
+ 03.04.08 Added parameter to ShowRMS()
+ 10.06.08 disabled error check for multiple chains when using
+ iterative fitting. By: CTP
+ 29.08.08 added a final iteration for multi during which the coordinates
+ for the averaged reference structure are NOT updated.
+ 05.09.08 Final iteration for multiple structure fitting does not
+ update fitting zones when using iterative zone updating.
+*/
+void FitStructures(void)
+{
+ ZONE *z1,
+ *z2;
+ int atmnum,
+ NCoor,
+ strucnum,
+ niter;
+ REAL rmstot,
+ rmsprev = (-100.0),
+ deltaRMS;
+ BOOL final = FALSE;
+
+
+ gFitted = FALSE;
+
+ if(!gRefFilename[0])
+ {
+ printf(" Error==> Reference structure undefined.\n");
+ return;
+ }
+ if(!gMobFilename[0][0])
+ {
+ printf(" Error==> Mobile structure undefined.\n");
+ return;
+ }
+
+ /* 10.06.08 Error check for multiple chains disabled as this is now
+ supported By: CTP
+ */
+/***
+ if(gIterate)
+ {
+ // Check for numbers of chains
+ if(countchar(gRefSeq,'*') > 0)
+ {
+ printf(" Error==> Structures must have only one chain \
+for iterative zones\n");
+ return;
+ }
+ for(strucnum=0; strucnum<gMultiCount; strucnum++)
+ {
+ if(countchar(gMobSeq[strucnum],'*') > 0)
+ {
+ printf(" Error==> Structures must have only one \
+chain for iterative zones\n");
+ return;
+ }
+ }
+ }
+***/
+
+ if(!gQuiet)
+ {
+ printf(" Fitting structures...\n");
+ }
+
+ /* First copy the zones for display to match those for fitting */
+ for(strucnum=0; strucnum<gMultiCount; strucnum++)
+ {
+ if(gRZoneList[strucnum] != NULL)
+ {
+ FREELIST(gRZoneList[strucnum],ZONE);
+ gRZoneList[strucnum] = NULL;
+ }
+ for(z1=gZoneList[strucnum]; z1!=NULL; NEXT(z1))
+ {
+ /* Allocate an entry in RMS zone list */
+ if(gRZoneList[strucnum])
+ {
+ /* Move to end of zone list */
+ z2=gRZoneList[strucnum];
+ LAST(z2);
+ ALLOCNEXT(z2,ZONE);
+ }
+ else
+ {
+ INIT(gRZoneList[strucnum],ZONE);
+ z2 = gRZoneList[strucnum];
+ }
+
+ if(z2==NULL)
+ {
+ printf(" Error==> No memory for RMS zone!\n");
+ }
+ else
+ {
+ /* Add this zone to the RMS zone list */
+ z2->chain1 = z1->chain1;
+ z2->start1 = z1->start1;
+ z2->startinsert1 = z1->startinsert1;
+ z2->stop1 = z1->stop1;
+ z2->stopinsert1 = z1->stopinsert1;
+ z2->chain2 = z1->chain2;
+ z2->start2 = z1->start2;
+ z2->startinsert2 = z1->startinsert2;
+ z2->stop2 = z1->stop2;
+ z2->stopinsert2 = z1->stopinsert2;
+ z2->mode = z1->mode;
+ }
+ }
+ }
+
+ /* Now copy the atoms for RMS calculation */
+ gNOTRMSAtoms = gNOTFitAtoms;
+ for(atmnum=0; atmnum<NUMTYPES; atmnum++)
+ strcpy(gRMSAtoms[atmnum],gFitAtoms[atmnum]);
+
+ if(gMultiCount > 1)
+ {
+ /* Keep looping counting the iterations */
+ for(niter=0; ; niter++)
+ {
+ printf (" Multi-structure fit iteration %d\n", niter);
+
+ rmstot = (REAL)0.0;
+
+ /* Loop through the structures we are fitting */
+ for(strucnum=0; strucnum<gMultiCount; strucnum++)
+ {
+ /* Set up arrays for fitting */
+ if((NCoor=CreateFitArrays(strucnum))!=0)
+ {
+ /* Reset the convergence criterion */
+ CheckForConvergence(0, strucnum);
+
+ /* Perform the fit */
+ if(DoFitting(NCoor, strucnum))
+ {
+ while(gIterate && !final)
+ {
+ if((NCoor = UpdateFitArrays(strucnum))!=0)
+ {
+ if(!DoFitting(NCoor, strucnum))
+ return;
+ if(CheckForConvergence(NCoor, strucnum))
+ break;
+ }
+ else
+ {
+ break;
+ }
+ }
+
+ /* Find RMS - Do not update during final iteration */
+ rmstot += ShowRMS(FALSE,NULL,strucnum,!final,FALSE);
+ if(gIterate && !gQuiet)
+ {
+ printf(" (Over %d equivalenced CA-atoms)\n",
+ NCoor);
+ }
+ }
+ }
+ }
+ deltaRMS = (rmstot - rmsprev);
+ rmsprev = rmstot;
+
+ /* If we've converged or done too many iterations, do a final */
+ /* iteration then break out. */
+ if(final) break;
+
+ if((ABS(deltaRMS) < MULTI_ITER_STOP) ||
+ (niter > MAXMULTIITER))
+ final = TRUE;
+ }
+ }
+ else
+ {
+ /* Set up arrays for fitting */
+ if((NCoor=CreateFitArrays(0))!=0)
+ {
+ /* Reset the convergence criterion */
+ CheckForConvergence(0,0);
+
+ /* Perform the fit */
+ DoFitting(NCoor, 0);
+
+ while(gIterate)
+ {
+ printf("Iterating fit zones\n");
+ if((NCoor = UpdateFitArrays(0))!=0)
+ {
+ DoFitting(NCoor, 0);
+ if(CheckForConvergence(NCoor, 0))
+ break;
+ }
+ else
+ {
+ break;
+ }
+ }
+
+ ShowRMS(FALSE,NULL,0,FALSE,FALSE);
+ if(gIterate && !gQuiet)
+ {
+ printf(" (Over %d equivalenced CA-atoms)\n",
+ NCoor);
+ }
+ }
+ }
+
+ return;
+}
+
+
+/************************************************************************/
+/*>void NoFitStructures(void)
+ --------------------------
+ Based on FitStructures(). Sets fitting zones but doesn't perform a fit
+ of the structures. Called by NOFIT. Used when wanting to perform
+ RMS,MATRIX,etc... without fitting first.
+
+ 27.10.08 Original based on FitStructures() By: CTP
+*/
+void NoFitStructures(void)
+{
+ ZONE *z1,
+ *z2;
+ int atmnum,
+ strucnum;
+
+ /* Set gFitted and gNFittedCoor */
+ gFitted = TRUE;
+ gNFittedCoor = 0;
+
+ if(!gRefFilename[0])
+ {
+ printf(" Error==> Reference structure undefined.\n");
+ return;
+ }
+ if(!gMobFilename[0][0])
+ {
+ printf(" Error==> Mobile structure undefined.\n");
+ return;
+ }
+
+ if(!gQuiet)
+ {
+ if(gMultiCount == 1)
+ printf(" Mobile structure marked as fitted...\n");
+ else
+ printf(" Mobile structures marked as fitted...\n");
+ }
+
+ /* First copy the zones for display to match those for fitting */
+ for(strucnum=0; strucnum<gMultiCount; strucnum++)
+ {
+ if(gRZoneList[strucnum] != NULL)
+ {
+ FREELIST(gRZoneList[strucnum],ZONE);
+ gRZoneList[strucnum] = NULL;
+ }
+ for(z1=gZoneList[strucnum]; z1!=NULL; NEXT(z1))
+ {
+ /* Allocate an entry in RMS zone list */
+ if(gRZoneList[strucnum])
+ {
+ /* Move to end of zone list */
+ z2=gRZoneList[strucnum];
+ LAST(z2);
+ ALLOCNEXT(z2,ZONE);
+ }
+ else
+ {
+ INIT(gRZoneList[strucnum],ZONE);
+ z2 = gRZoneList[strucnum];
+ }
+
+ if(z2==NULL)
+ {
+ printf(" Error==> No memory for RMS zone!\n");
+ }
+ else
+ {
+ /* Add this zone to the RMS zone list */
+ z2->chain1 = z1->chain1;
+ z2->start1 = z1->start1;
+ z2->startinsert1 = z1->startinsert1;
+ z2->stop1 = z1->stop1;
+ z2->stopinsert1 = z1->stopinsert1;
+ z2->chain2 = z1->chain2;
+ z2->start2 = z1->start2;
+ z2->startinsert2 = z1->startinsert2;
+ z2->stop2 = z1->stop2;
+ z2->stopinsert2 = z1->stopinsert2;
+ z2->mode = z1->mode;
+ }
+ }
+ }
+
+ /* Now copy the atoms for RMS calculation */
+ gNOTRMSAtoms = gNOTFitAtoms;
+ for(atmnum=0; atmnum<NUMTYPES; atmnum++)
+ strcpy(gRMSAtoms[atmnum],gFitAtoms[atmnum]);
+
+ /* Copy the mobile PDB linked list to the rotation list. */
+ if(gMultiCount > 1)
+ {
+ for(strucnum=0; strucnum<gMultiCount; strucnum++)
+ {
+ DoNoFitting(strucnum);
+ }
+ }
+ else
+ {
+ DoNoFitting(0);
+ }
+
+ return;
+}
+
+
+/************************************************************************/
+/*>BOOL DoFitting(int NCoor, int strucnum)
+ ---------------------------------------
+ Does the actual fitting of the coordinate arrays.
+
+ 30.09.92 Original
+ 01.10.92 Corrected rotation procedure (!)
+ 08.10.93 Modified for new version of matfit().
+ RotatePDB() -> ApplyMatrixPDB()
+ 17.07.95 Changed screen() to printf()
+ 19.07.95 Added parameter to ShowRMS()
+ 20.07.95 Added Weighted fitting
+ Separate COOR structure for CofG so we don't corrupt the
+ global version (allows printing of translation vector).
+ 25.07.95 Added another parameter to ShowRMS()
+ 13.06.96 Added B-value inverting for BWEIGHT command
+ 20.12.96 Added setting of gNFittedCoor
+ 12.01.01 gMobPDB[] and gFitPDB[] now arrays
+ Moved ShowRMS() up to FitStructures()
+ 15.01.01 Now returns success/failure
+ 01.02.01 Added strucnum parameter
+ 20.02.01 gMobCofG now an array
+ RotMat now local and a copy made into gRotMat
+ 13.08.08 Modified to Fit, Rotate & Refit. By: CTP
+*/
+BOOL DoFitting(int NCoor, int strucnum)
+{
+ PDB *p,
+ *q;
+ VEC3F CofG;
+ REAL RotMat[3][3];
+ int i, j;
+
+ gNFittedCoor = 0;
+
+ if(NCoor < 3)
+ {
+ printf(" Error==> Fewer than 3 points to fit\n");
+ return(FALSE);
+ }
+ else
+ {
+ /* If we are using inverse B-values, then invert the weights array*/
+ if(gDoWeights==WEIGHT_INVBVAL)
+ {
+ int i;
+ for(i=0; i<NCoor; i++)
+ gWeights[i] = (REAL)1.0 / gWeights[i];
+ }
+
+ matfit(gRefCoor,gMobCoor[strucnum],RotMat,NCoor,
+ ((gDoWeights!=WEIGHT_NONE)?gWeights:NULL),0);
+
+ /* If we used inverse B-values, invert the weights back again */
+ if(gDoWeights==WEIGHT_INVBVAL)
+ {
+ int i;
+ for(i=0; i<NCoor; i++)
+ gWeights[i] = (REAL)1.0 / gWeights[i];
+ }
+
+ /* Now copy the mobile PDB linked list to the rotation list */
+ if(gFitPDB[strucnum] != NULL) FREELIST(gFitPDB[strucnum], PDB);
+ gFitPDB[strucnum] = NULL;
+
+ for(p=gMobPDB[strucnum], q=NULL; p!=NULL; NEXT(p))
+ {
+ if(q==NULL)
+ {
+ INIT(gFitPDB[strucnum],PDB);
+ q = gFitPDB[strucnum];
+ }
+ else
+ {
+ ALLOCNEXT(q,PDB);
+ }
+
+ if(q==NULL)
+ {
+ printf(" Error==> No memory for creating fitted \
+structure.\n");
+ return(FALSE);
+ }
+
+ /* Copy the coordinates */
+ CopyPDB(q, p);
+ }
+
+ /* Now we can rotate the rotation list */
+ CofG.x = -1.0 * gMobCofG[strucnum].x;
+ CofG.y = -1.0 * gMobCofG[strucnum].y;
+ CofG.z = -1.0 * gMobCofG[strucnum].z;
+ TranslatePDB(gFitPDB[strucnum], CofG);
+
+ ApplyMatrixPDB(gFitPDB[strucnum], RotMat);
+
+ TranslatePDB(gFitPDB[strucnum], gRefCofG);
+
+ /* Record a copy of the rotation matrix for display with the MATRIX
+ command
+ */
+ for(i=0; i<3; i++)
+ {
+ for(j=0; j<3; j++)
+ {
+ gRotMat[strucnum][i][j] = RotMat[i][j];
+ }
+ }
+
+ gFitted = TRUE;
+ gNFittedCoor = NCoor;
+ }
+
+#ifdef ROTATE_REFIT
+ /* Rotate and Refit */
+ /* ================ */
+
+ /* This was an attempt to get around the saddle point local minimum
+ problem in which fitting could converge 180 degrees from the true
+ minimum. This is now fixed in fit.c
+ */
+ if(!gIterate)
+ {
+ /* Set Variables */
+ BOOL multivsref = gMultiVsRef;
+ REAL rmsd_a = 0.0;
+ REAL rmsd_b = 0.0;
+
+ REAL rotmat_repos[3][3],
+ rotmat_refit[3][3],
+ rotmat_final[3][3];
+
+ PDB *ReFitPDB = NULL,
+ *SwapPDB = NULL;
+
+ /* Set RMSD Calc vs Ref */
+ gMultiVsRef = TRUE;
+
+ /* Calculate RMSD -A- */
+ rmsd_a = CalcRMS(FALSE,NULL,strucnum,FALSE,FALSE);
+
+ /* Calculate Reposition Matrix */
+ MatMult33_33(gRotMat[strucnum],gRotMatTwist,rotmat_repos);
+
+ /* Rotate Coordinates to New Position */
+ ApplyMatrixCOOR(gMobCoor[strucnum], rotmat_repos, NCoor);
+
+ /* Re-Fit */
+ matfit(gRefCoor,gMobCoor[strucnum],rotmat_refit,NCoor,
+ ((gDoWeights!=WEIGHT_NONE)?gWeights:NULL),0);
+
+ /* Calculate Final Matrix */
+ MatMult33_33(rotmat_repos,rotmat_refit,rotmat_final);
+
+ /* Make New Fitted PDB */
+ for(p=gMobPDB[strucnum], q=NULL; p!=NULL; NEXT(p))
+ {
+ if(q==NULL)
+ {
+ INIT(ReFitPDB,PDB);
+ q = ReFitPDB;
+ }
+ else
+ {
+ ALLOCNEXT(q,PDB);
+ }
+
+ if(q==NULL)
+ {
+ printf(" Error==> No memory for creating fitted \
+structure.\n");
+ return(FALSE);
+ }
+
+ /* Copy the coordinates */
+ CopyPDB(q, p);
+ }
+
+ /* Now we can rotate the rotation list */
+ TranslatePDB(ReFitPDB, CofG);
+ ApplyMatrixPDB(ReFitPDB, rotmat_final);
+ TranslatePDB(ReFitPDB, gRefCofG);
+
+ /* Calculate RMSD -B- */
+ SwapPDB = gFitPDB[strucnum];
+ gFitPDB[strucnum] = ReFitPDB;
+ rmsd_b = CalcRMS(FALSE,NULL,strucnum,FALSE,FALSE);
+ gFitPDB[strucnum] = SwapPDB;
+
+ /* Reset RMSD Calc vs Ref */
+ gMultiVsRef = multivsref;
+
+ /* Select best result */
+ if(rmsd_a <= rmsd_b)
+ {
+ /* Free ReFitPDB */
+ if(ReFitPDB != NULL) FREELIST(ReFitPDB, PDB);
+ }
+ else
+ {
+ /* Free gFitPDB[strucnum] and point it to ReFitPDB */
+ if(gFitPDB[strucnum] != NULL) FREELIST(gFitPDB[strucnum], PDB);
+ gFitPDB[strucnum] = ReFitPDB;
+
+ /* Copy New Rotation Matrix */
+ for(i=0; i<3; i++)
+ {
+ for(j=0; j<3; j++)
+ {
+ gRotMat[strucnum][i][j] = rotmat_final[i][j];
+ }
+ }
+ }
+ }
+#endif /* ROTATE_REFIT */
+
+ gFitted = TRUE;
+ gNFittedCoor = NCoor;
+
+ return(TRUE);
+}
+
+
+/************************************************************************/
+/*>BOOL DoNoFitting(int strucnum)
+ ------------------------------
+ Sets gFitPDB if not already set. Sets rotation matrix to identity.
+
+ 27.10.08 Original based on DoFitting() By: CTP
+ 29.10.08 Tidied code.
+ 11.11.08 Simplified function - function no longer resets to starting
+ structures.
+*/
+BOOL DoNoFitting(int strucnum)
+{
+ PDB *p,
+ *q;
+ int i, j;
+
+ gNFittedCoor = 0;
+
+ /* Set rotation matrix to identity */
+ for(i=0; i<3; i++)
+ {
+ for(j=0; j<3; j++)
+ {
+ gRotMat[strucnum][i][j] = (REAL)0.0;
+ }
+ gRotMat[strucnum][i][i] = (REAL)1.0;
+ }
+
+ /* Reset centres of geometry */
+ gRefCofG.x = 0.0;
+ gRefCofG.y = 0.0;
+ gRefCofG.z = 0.0;
+ gMobCofG[strucnum].x = 0.0;
+ gMobCofG[strucnum].y = 0.0;
+ gMobCofG[strucnum].z = 0.0;
+
+ /* Return if gFitPDB exists */
+ if(gFitPDB[0] != NULL)
+ {
+ gFitted = TRUE;
+ return(TRUE);
+ }
+
+ /* Copy the mobile PDB linked list to the rotation list */
+ if(gFitPDB[strucnum] != NULL)
+ FREELIST(gFitPDB[strucnum], PDB);
+ gFitPDB[strucnum] = NULL;
+
+ for(p=gMobPDB[strucnum], q=NULL; p!=NULL; NEXT(p))
+ {
+ if(q==NULL)
+ {
+ INIT(gFitPDB[strucnum],PDB);
+ q = gFitPDB[strucnum];
+ }
+ else
+ {
+ ALLOCNEXT(q,PDB);
+ }
+
+ if(q==NULL)
+ {
+ printf(" Error==> ");
+ printf("No memory for creating structure.\n");
+ return(FALSE);
+ }
+
+ /* Copy the coordinates */
+ CopyPDB(q, p);
+ }
+
+ gFitted = TRUE;
+ return(TRUE);
+}
+
+
+/************************************************************************/
+/*>int ValidAtom(char *atnam, int mode)
+ ------------------------------------
+ Tests whether this atom is in the appropriate list
+
+ 30.09.92 Original
+ 23.07.96 Calls AtomNameMatch() rather than strncmp; this handles
+ wildcards in atom names
+ 15.02.01 Calls AtomNameRawMatch() instead of AtomNameMatch()
+*/
+
+int ValidAtom(char *atnam, int mode)
+{
+ int DefReturn = FALSE,
+ j;
+ BOOL ErrorWarn;
+
+ if(mode == ATOM_FITTING)
+ {
+ /* First check for all atoms */
+ if(gFitAtoms[0][0] == '*') return(TRUE);
+
+ for(j=0;j<NUMTYPES;j++)
+ {
+ if(gFitAtoms[j][0] == '\0') break;
+
+ if(gNOTFitAtoms)
+ {
+ DefReturn = TRUE;
+ ErrorWarn = TRUE;
+ if(AtomNameRawMatch(atnam,gFitAtoms[j],&ErrorWarn))
+ return(FALSE);
+ }
+ else
+ {
+ ErrorWarn = TRUE;
+ if(AtomNameRawMatch(atnam,gFitAtoms[j],&ErrorWarn))
+ return(TRUE);
+ }
+ }
+ }
+
+ if(mode == ATOM_RMS)
+ {
+ /* First check for all atoms */
+ if(gRMSAtoms[0][0] == '*') return(TRUE);
+
+ for(j=0;j<NUMTYPES;j++)
+ {
+ if(gRMSAtoms[j][0] == '\0') break;
+
+ if(gNOTRMSAtoms)
+ {
+ DefReturn = TRUE;
+ ErrorWarn = TRUE;
+ if(AtomNameRawMatch(atnam,gRMSAtoms[j],&ErrorWarn))
+ return(FALSE);
+ }
+ else
+ {
+ ErrorWarn = TRUE;
+ if(AtomNameRawMatch(atnam,gRMSAtoms[j],&ErrorWarn))
+ return(TRUE);
+ }
+ }
+ }
+
+ return(DefReturn);
+}
+
+
+/************************************************************************/
+/*>REAL CalcRMS(BOOL ByRes, FILE *fp, int strucnum, BOOL UpdateReference,
+ BOOL ByAtm)
+ ----------------------------------------------------------------------
+ Calculates RMS over currently defined RMS zones and atoms and prints
+ it.
+
+ 30.09.92 Original
+ 01.10.92 Added check on NULL coordinates. Fix to finding mobile atoms.
+ 17.07.95 Changed screen() to printf()
+ 18.07.95 Removed zeroing of CoorCount which broke multi-zone fitting
+ as in CreateFitArrays()
+ Added initialisation of inserts in zones
+ Added calls to FormatZone()
+ 19.07.95 Added ByRes parameter and calculation. Now prints the RMS
+ and returns type void rather than the RMS
+ 25.07.95 Added fp parameter
+ 31.07.95 Added printing of number of residues if mismatch
+ 14.08.95 Was skipping the last residue when printing RMS by residue
+ 31.05.96 Added test on b-value
+ 18.06.96 Replaced MODE_* with ZONE_MODE_*
+ Replaced FindZone() with FindZonePDB()
+ 06.11.96 Negative BVal cutoff interpreted as > bval
+ 11.11.96 Checks actual value of gUseBVal
+ 18.11.96 Added check on gIgnoreMissing
+ 11.01.01 gFitPDB now an array
+ 15.01.01 Now returns the RMS. Checks fp is non-null before printing
+ Returns (-1.0) if error or doing by-residue RMSD.
+ 01.02.01 Added UpdateReference parameter
+ 20.02.01 -999 for start or end of structure rather than -1
+ 28.02.01 Fixed gRZoneList[0] to gRZoneList[strucnum]
+ 03.04.08 Modified to print distances between equivalenced atom pairs.
+ Added parameter ByAtm to control printing. By: CTP
+ 07.04.08 Added include pair based on atom distance.
+ 16.04.08 Atom pairs outside of cutoff are marked in output.
+ 17.04.08 Residues partially/fully outside distance cutoff are marked.
+ 29.07.08 Multi-structure fitting updated. Fitting is against averaged
+ reference but simple rms/distance calculations are against
+ the first mobile structure.
+ 12.09.08 Added weighted average option for updating reference.
+ 23.10.08 Added gWtAverage flag.
+ 07.11.08 Simple rms/distance calculations are against the mobile
+ structure indicated by gMultiRef.
+*/
+REAL CalcRMS(BOOL ByRes, FILE *fp, int strucnum, BOOL UpdateReference,
+ BOOL ByAtm)
+{
+ REAL SumSq = 0.0,
+ rms = 0.0;
+ PDB *refpdblist = NULL,
+ *ref_start = NULL,
+ *ref_stop = NULL,
+ *fit_start = NULL,
+ *fit_stop = NULL,
+ *prevp = NULL,
+ *prevq = NULL,
+ *p,
+ *q,
+ *r,
+ *m;
+ ZONE *z;
+ char ref_insert,
+ fit_insert;
+ int ref_resnum,
+ fit_resnum,
+ ref_nres,
+ fit_nres,
+ CoorCount = 0,
+ Found,
+ CoorOutside = 0;
+
+
+ /* If no zones have been specified, create a single all atoms zone */
+ if(gRZoneList[strucnum] == NULL)
+ {
+ INIT(gRZoneList[strucnum],ZONE);
+ gRZoneList[strucnum]->chain1 = ' ';
+ gRZoneList[strucnum]->start1 = -999;
+ gRZoneList[strucnum]->startinsert1 = ' ';
+ gRZoneList[strucnum]->stop1 = -999;
+ gRZoneList[strucnum]->stopinsert1 = ' ';
+ gRZoneList[strucnum]->chain2 = ' ';
+ gRZoneList[strucnum]->start2 = -999;
+ gRZoneList[strucnum]->startinsert2 = ' ';
+ gRZoneList[strucnum]->stop2 = -999;
+ gRZoneList[strucnum]->stopinsert2 = ' ';
+ gRZoneList[strucnum]->mode = gCurrentMode;
+ }
+
+ /* Step through each zone */
+ for(z=gRZoneList[strucnum]; z!=NULL; NEXT(z))
+ {
+ /* Set Reference Zone */
+ /* - Multi-structure fitting will fit to an averaged structure */
+ /* - Simple RMS or Distance calculations are against the mobile */
+ /* structure set as the reference (gMultiRef) or the averaged */
+ /* reference depending on gMultiVsRef. */
+/***
+ refpdblist = (gMultiCount > 1 && !UpdateReference && !gMultiVsRef)
+ ? gFitPDB[0] : gRefPDB;
+***/
+ refpdblist = (gMultiCount > 1 && !UpdateReference && !gMultiVsRef)
+ ? gFitPDB[gMultiRef] : gRefPDB;
+
+ /* Reference structure */
+ /*
+ if(!FindZonePDB(gRefPDB, z->start1, z->startinsert1,
+ z->stop1, z->stopinsert1, z->chain1, z->mode,
+ &ref_start, &ref_stop))
+ */
+ if(!FindZonePDB(refpdblist, z->start1, z->startinsert1,
+ z->stop1, z->stopinsert1, z->chain1, z->mode,
+ &ref_start, &ref_stop))
+ {
+ char zone1[64],
+ zone2[64];
+
+ /* Check ranges have been found */
+ printf(" Error==> Reference structure zone not found.\n");
+
+ FormatZone(zone1, z->chain1,
+ z->start1, z->startinsert1,
+ z->stop1, z->stopinsert1);
+ FormatZone(zone2, z->chain2,
+ z->start2, z->startinsert2,
+ z->stop2, z->stopinsert2);
+ printf(" %-16s with %-16s %s\n",
+ zone1, zone2,
+ ((z->mode == ZONE_MODE_RESNUM)?"(Residue numbering)"
+ :"(Sequential numbering)"));
+
+ return(-1.0);
+ }
+
+ /* Mobile structure */
+ if(!FindZonePDB(gFitPDB[strucnum], z->start2, z->startinsert2,
+ z->stop2, z->stopinsert2, z->chain2, z->mode,
+ &fit_start, &fit_stop))
+ {
+ char zone1[64],
+ zone2[64];
+
+ /* Check ranges have been found */
+ printf(" Error==> Mobile structure zone not found.\n");
+
+ FormatZone(zone1, z->chain1,
+ z->start1, z->startinsert1,
+ z->stop1, z->stopinsert1);
+ FormatZone(zone2, z->chain2,
+ z->start2, z->startinsert2,
+ z->stop2, z->stopinsert2);
+ printf(" %-16s with %-16s %s\n",
+ zone1, zone2,
+ ((z->mode == ZONE_MODE_RESNUM)?"(Residue numbering)"
+ :"(Sequential numbering)"));
+
+ return(-1.0);
+ }
+
+
+ /* Check we have the same number of residues in each zone */
+ ref_nres = 1;
+ ref_resnum = ref_start->resnum;
+ ref_insert = ref_start->insert[0];
+ for(p=ref_start; p!=ref_stop; NEXT(p))
+ {
+ if(p->resnum != ref_resnum || p->insert[0] != ref_insert)
+ {
+ ref_nres++;
+ ref_resnum = p->resnum;
+ ref_insert = p->insert[0];
+ }
+ }
+
+ fit_nres = 1;
+ fit_resnum = fit_start->resnum;
+ fit_insert = fit_start->insert[0];
+ for(p=fit_start; p!=fit_stop; NEXT(p))
+ {
+ if(p->resnum != fit_resnum || p->insert[0] != fit_insert)
+ {
+ fit_nres++;
+ fit_resnum = p->resnum;
+ fit_insert = p->insert[0];
+ }
+ }
+
+ if(ref_nres != fit_nres)
+ {
+ char zone1[64],
+ zone2[64];
+
+ printf(" Error==> Number of residues in zone does not \
+match.\n");
+ /* Added 13.12.95 */
+ FormatZone(zone1, z->chain1,
+ z->start1, z->startinsert1,
+ z->stop1, z->stopinsert1);
+
+ FormatZone(zone2, z->chain2,
+ z->start2, z->startinsert2,
+ z->stop2, z->stopinsert2);
+
+ printf(" %-16s with %-16s %s\n",
+ zone1, zone2,
+ ((z->mode == ZONE_MODE_RESNUM)?"(Residue numbering)"
+ :"(Sequential numbering)"));
+
+ /* Added 31.07.95 */
+ printf(" Reference: %d, Mobile: %d\n",
+ ref_nres, fit_nres);
+
+
+ return(-1.0);
+ }
+
+ /* Insert the atoms from this zone into the coordinate arrays */
+/* Removed 17.07.95.....
+// CoorCount = 0;
+*/
+
+ ref_nres = 0;
+ ref_resnum = -999;
+ ref_insert = ' ';
+
+ for(p=ref_start; p!=ref_stop; NEXT(p))
+ {
+ /* Is this the start of a new reference residue? */
+ if(p->resnum != ref_resnum || p->insert[0] != ref_insert)
+ {
+ /* If we're displaying by residue and at least one residue
+ has been processed, display the RMS for that (previous)
+ residue. Reset other value to zero.
+ */
+ if(ByRes && ref_nres && CoorCount && fp!=NULL && !ByAtm)
+ {
+ char buff1[16],
+ buff2[16];
+
+ SumSq /= CoorCount;
+ rms = (REAL)sqrt((double)SumSq);
+
+ sprintf(buff1,"%c%d%c",
+ prevp->chain[0], prevp->resnum, prevp->insert[0]);
+ sprintf(buff2,"%c%d%c",
+ prevq->chain[0], prevq->resnum, prevq->insert[0]);
+
+ fprintf(fp,"%8s %s : %8s %s RMS: %.3f",
+ buff1,
+ prevp->resnam,
+ buff2,
+ prevq->resnam,
+ rms);
+
+ /* Flag Residues Outside Distance Cutoff By: CTP */
+ if(gUseDistCutoff && CoorOutside)
+ {
+ if(CoorOutside < CoorCount)
+ fprintf(fp," *\n"); /* Partially within cutoff */
+ else
+ fprintf(fp," **\n"); /* Outside cutoff */
+ }
+ else
+ fprintf(fp,"\n"); /* Inside cutoff */
+
+ /* Zero the sum of squares and the coordinate count */
+ SumSq = (REAL)0.0;
+ CoorCount = 0;
+ CoorOutside = 0;
+ }
+
+ ref_nres++;
+ ref_resnum = p->resnum;
+ ref_insert = p->insert[0];
+
+ fit_nres = 0;
+ fit_resnum = -999;
+ fit_insert = ' ';
+
+ for(q=fit_start; q!=fit_stop; NEXT(q))
+ {
+ if(q->resnum != fit_resnum || q->insert[0] != fit_insert)
+ {
+ /* Start of a new mobile residue */
+ fit_nres++;
+ fit_resnum = q->resnum;
+ fit_insert = q->insert[0];
+
+ /* Consider the equivalent residues */
+ if(ref_nres == fit_nres)
+ {
+ /* p points to the start of a residue in reference,
+ q points to the start of a residue in mobile
+
+ Step through reference set
+ */
+ for(r=p;
+ r!=NULL && r->resnum==ref_resnum
+ && r->insert[0]==ref_insert;
+ NEXT(r))
+ {
+ /* 15.02.01 Changed from atnam to atnam_raw */
+ if(ValidAtom(r->atnam_raw, ATOM_RMS))
+ {
+ if(r->x == 9999.0 &&
+ r->y == 9999.0 &&
+ r->z == 9999.0)
+ {
+ /* Ignore NULL atoms */
+ continue;
+ }
+
+ /* 31.05.96 Test the BVal
+ 06.11.96 Modified to handle -ve
+ specifications
+ 11.11.96 Checks actual value of gUseBVal
+ */
+ if(gUseBVal==1 || gUseBVal==2)
+ {
+ if(gBValue >= (REAL)0.0)
+ {
+ if(r->bval > gBValue)
+ continue;
+ }
+ else
+ {
+ if(-(r->bval) > gBValue)
+ continue;
+ }
+ }
+
+ /* Find this atom in the mobile set */
+ Found = FALSE;
+ for(m=q;
+ m!=NULL && m->resnum==fit_resnum
+ && m->insert[0]==fit_insert;
+ NEXT(m))
+ {
+ /* 28.02.01 Changed from atnam... */
+ if(!strcmp(r->atnam_raw,m->atnam_raw))
+ {
+ int pair_total = 1;
+ int i = 0;
+ Found = TRUE;
+
+ if(m->x == 9999.0 &&
+ m->y == 9999.0 &&
+ m->z == 9999.0)
+ {
+ /* Ignore NULL atoms */
+ continue;
+ }
+
+
+ /* 31.05.96 Test the BVal
+ 06.11.96 Modified to handle -ve
+ specifications
+ 11.11.96 Checks actual value of
+ gUseBVal
+ */
+ if(gUseBVal==1 || gUseBVal==3)
+ {
+ if(gBValue >= (REAL)0.0)
+ {
+ if(m->bval > gBValue)
+ continue;
+ }
+ else
+ {
+ if(-(m->bval) > gBValue)
+ continue;
+ }
+ }
+
+
+ /* Auto-match Symmetrical Atoms By: CTP*/
+ if(gMatchSymAtoms &&
+ r->next != NULL && m->next != NULL)
+ {
+ int i = 0;
+ for(i=0;i < SYMM_ATM_PAIRS ;i++)
+ {
+ if(gSymType[i][3][0] == TRUE &&
+ !strcmp(gSymType[i][0],
+ r->resnam) &&
+ !strcmp(gSymType[i][1],
+ r->atnam_raw) &&
+ !strcmp(gSymType[i][2],
+ r->next->atnam_raw) &&
+ !strcmp(gSymType[i][0],
+ m->resnam) &&
+ !strcmp(gSymType[i][1],
+ m->atnam_raw) &&
+ !strcmp(gSymType[i][2],
+ m->next->atnam_raw))
+ {
+ double sqdist_a = 0.0;
+ double sqdist_b = 0.0;
+
+ sqdist_a = PDBDISTSQ(r,m);
+ sqdist_a +=
+ PDBDISTSQ(r->next, m->next);
+
+ sqdist_b = PDBDISTSQ(r,m->next);
+ sqdist_b +=
+ PDBDISTSQ(r->next, m);
+
+ /* Set Number Pairs to Process*/
+ if(sqdist_a > sqdist_b)
+ pair_total = 2;
+
+ break;
+ }
+ }
+ }
+
+ /* Calculate RMSd By: CTP */
+ for(i = 0; i < pair_total; i++)
+ {
+
+ /* Set atom pair to process */
+ PDB *r_atm = NULL;
+ PDB *m_atm = NULL;
+
+ if(pair_total == 1)
+ {
+ /* Process single pair */
+ r_atm = r;
+ m_atm = m;
+ }
+ else if(pair_total == 2 && i == 0)
+ {
+ /* Process pair one of two */
+ r_atm = r;
+ m_atm = m->next;
+ }
+ else
+ {
+ /* Process pair two of two */
+ r_atm = r->next;
+ m_atm = m;
+ }
+
+ /* Output Atom Distances */
+ if(ByRes && ByAtm)
+ {
+ double dist = 0.0;
+ char res1[10];
+ char res2[10];
+
+ dist =
+ sqrt(PDBDISTSQ(r_atm,m_atm));
+
+ sprintf(res1,"%c%4d%c",
+ r_atm->chain[0],
+ r_atm->resnum,
+ r_atm->insert[0]);
+ sprintf(res2,"%c%4d%c",
+ m_atm->chain[0],
+ m_atm->resnum,
+ m_atm->insert[0]);
+
+ fprintf(fp,"%8s %4s %4s :",res1,
+ r_atm->resnam,
+ r_atm->atnam_raw);
+ fprintf(fp,"%8s %4s %4s ",res2,
+ m_atm->resnam,
+ m_atm->atnam_raw);
+ fprintf(fp,"Dist: %.3f",dist);
+
+ /* Flag pairs outside cutoff */
+ if(gUseDistCutoff &&
+ dist > gDistCutoff )
+ fprintf(fp," *");
+
+ fprintf(fp,"\n");
+ }
+
+ /* Include pair based on atom
+ distance
+ */
+ /* If displaying RMSd by residue then
+ count coordinates outside cutoff.
+ */
+ if((gUseDistCutoff) &&
+ (PDBDISTSQ(r_atm,m_atm) >
+ (gDistCutoff * gDistCutoff)))
+ {
+ if(ByRes)
+ {
+ CoorOutside++;
+ }
+ else
+ {
+ continue;
+ }
+ }
+
+ /* Sum Squares */
+ SumSq += PDBDISTSQ(r_atm,m_atm);
+ CoorCount ++;
+
+
+ /* If we are averaging the coordinates
+ for multi-structure fitting, then
+ update the reference atom
+ coordinates with the mean of that
+ and the mobile
+ */
+
+ /* Mean of Ref + Mobile Coordinates */
+ if(UpdateReference && !gWtAverage)
+ {
+ r_atm->x = (r_atm->x+m_atm->x)/2.0;
+ r_atm->y = (r_atm->y+m_atm->y)/2.0;
+ r_atm->z = (r_atm->z+m_atm->z)/2.0;
+ }
+
+ /* Weighted Mean of Ref + Mobile
+ Coords
+ */
+ if(UpdateReference && gWtAverage)
+ {
+ REAL MultiCount = gMultiCount;
+
+ r_atm->x = ((MultiCount - 1) *
+ r_atm->x
+ + m_atm->x)/MultiCount;
+ r_atm->y = ((MultiCount - 1) *
+ r_atm->y
+ + m_atm->y)/MultiCount;
+ r_atm->z = ((MultiCount - 1) *
+ r_atm->z
+ + m_atm->z)/MultiCount;
+ }
+
+ }
+
+ /* Set Pointers */
+ if(pair_total == 2)
+ {
+ r = r->next;
+ m = m->next;
+ }
+ prevq = m;
+
+ }
+ }
+
+ if(!Found)
+ {
+ if(!gIgnoreMissing)
+ {
+ printf(" Error==> Atoms do not match \
+in residue:\n");
+ printf(" Reference %4s %5d%c \
+Mobile %4s %5d%c\n",
+ p->resnam,p->resnum,p->insert[0],
+ q->resnam,q->resnum,q->insert[0]);
+ /* 28.02.01 Changed from r->atnam */
+ printf(" Unable to find reference \
+atom %4s in mobile.\n", r->atnam_raw);
+ return(-1.0);
+ }
+ }
+ }
+ } /* End of loop through reference residue */
+ break;
+ } /* End of if() equivalent residues */
+ } /* End of start-of-new-mobile-residue */
+ } /* End of loop through mobile set */
+ } /* End of start-of-new-reference-residue */
+ prevp = p;
+ } /* End of loop through reference set */
+
+ /* For the last residue, if we're displaying by residue and we have
+ some atoms, display the RMS for this residue.
+ */
+ if(ByRes && ref_nres && CoorCount && fp!=NULL && !ByAtm)
+ {
+ char buff1[16],
+ buff2[16];
+
+ SumSq /= CoorCount;
+ rms = (REAL)sqrt((double)SumSq);
+
+ sprintf(buff1,"%c%d%c",
+ prevp->chain[0], prevp->resnum, prevp->insert[0]);
+ sprintf(buff2,"%c%d%c",
+ prevq->chain[0], prevq->resnum, prevq->insert[0]);
+
+ fprintf(fp,"%8s %s : %8s %s RMS: %.3f",
+ buff1,
+ prevp->resnam,
+ buff2,
+ prevq->resnam,
+ rms);
+
+ /* Flag Residues Outside Distance Cutoff By: CTP */
+ if(gUseDistCutoff && CoorOutside)
+ {
+ if(CoorOutside < CoorCount)
+ fprintf(fp," *\n"); /* Partially within cutoff */
+ else
+ fprintf(fp," **\n"); /* Outside cutoff */
+ }
+ else
+ {
+ fprintf(fp,"\n"); /* Inside cutoff */
+ }
+
+ /* Zero the sum of squares and the coordinate count */
+ SumSq = (REAL)0.0;
+ CoorCount = 0;
+ CoorOutside = 0;
+ }
+ } /* End of loop through zones */
+
+ if(!ByRes && CoorCount == 0)
+ {
+ printf(" Error==> No atoms in specified zones\n");
+ /* Return the documented error value rather than dividing by zero */
+ return(-1.0);
+ }
+
+ /* Calculate the RMS */
+#ifdef DEBUG
+ fprintf(stderr,"\nCalculating RMS on %d atoms\n",CoorCount);
+ fprintf(stderr,"Sum of squares = %8.3f\n",SumSq);
+#endif
+
+ /* Output legend if using distance cutoff By: CTP */
+ if(gUseDistCutoff && ByRes)
+ {
+ if(ByAtm)
+ {
+ if(fp!=NULL)
+ fprintf(fp," %35s * Outside distance cutoff\n","");
+ }
+ else
+ {
+ if(fp!=NULL)
+ {
+ fprintf(fp," %28s * Partially outside distance cutoff.\n",
+ "");
+ fprintf(fp," %28s ** Fully outside distance cutoff\n",
+ "");
+ }
+ }
+ }
+
+ /* Output RMSd By: CTP */
+ if(!ByRes)
+ {
+ SumSq /= CoorCount;
+ rms = (REAL)sqrt((double)SumSq);
+
+ if(fp!=NULL)
+ fprintf(fp," RMS: %.3f\n",rms);
+
+ return(rms);
+ }
+
+ return(-1.0);
+}
+
+
+/************************************************************************/
+/*>void ShowNFitted(void)
+ ----------------------
+ Displays the number of equivalent atom pairs used in the last fitting.
+
+ 20.12.96 Original By: ACRM
+*/
+void ShowNFitted(void)
+{
+ if(gFitted && gNFittedCoor)
+ {
+ printf(" Number of fitted atoms: %d\n",gNFittedCoor);
+ }
+ else
+ {
+ printf(" Warning==> Structures have not yet been fitted.\n");
+ }
+}
+
+
+/************************************************************************/
+/*>REAL ShowRMS(BOOL ByRes, char *filename, int strucnum,
+ BOOL UpdateReference, BOOL ByAtm)
+ ------------------------------------------------------
+ Display the RMS over the currently defined zones.
+
+ 29.09.92 Framework
+ 30.09.92 Original
+ 17.07.95 Changed screen() to printf()
+ Handles inserts
+ 19.07.95 Added ByRes parameter passed to CalcRMS()
+ RMS is now printed by CalcRMS()
+ 25.07.95 Added filename parameter
+ Opens file if specified and passes FILE pointer to CalcRMS()
+ 27.06.97 Changed call to fopen() to OpenOrPipe
+ 01.02.01 Added strucnum and UpdateReference parameters
+ Now returns the RMSD
+ 20.02.01 -999 for start or end of structure rather than -1
+ 03.04.08 Added parameter to ShowRMS() The parameter, ByAtm, turns on
+ printing of atom distances by CalcRMS(). By: CTP
+*/
+REAL ShowRMS(BOOL ByRes, char *filename, int strucnum,
+ BOOL UpdateReference, BOOL ByAtm)
+{
+ ZONE *z1, *z2;
+ int atmnum;
+ FILE *fp = stdout;
+ REAL rmsd = (REAL)(-1.0);
+
+
+ if(gFitted)
+ {
+ if(filename)
+ {
+ if((fp=OpenOrPipe(filename))==NULL)
+ {
+ printf(" Warning==> unable to open file for by-residue \
+RMS\n");
+ fp = stdout;
+ }
+ }
+
+ /* Copy zones if user hasn't specified otherwise */
+ if(!gUserRMSZone)
+ {
+ /* Free the current zone list */
+ if(gRZoneList[strucnum] != NULL)
+ {
+ FREELIST(gRZoneList[strucnum],ZONE);
+ gRZoneList[strucnum] = NULL;
+ }
+
+ /* Create or copy a new one */
+ if(gZoneList[strucnum] == NULL)
+ {
+ INIT(gRZoneList[strucnum],ZONE);
+ gRZoneList[strucnum]->chain1 = ' ';
+ gRZoneList[strucnum]->start1 = -999;
+ gRZoneList[strucnum]->startinsert1 = ' ';
+ gRZoneList[strucnum]->stop1 = -999;
+ gRZoneList[strucnum]->stopinsert1 = ' ';
+ gRZoneList[strucnum]->chain2 = ' ';
+ gRZoneList[strucnum]->start2 = -999;
+ gRZoneList[strucnum]->startinsert2 = ' ';
+ gRZoneList[strucnum]->stop2 = -999;
+ gRZoneList[strucnum]->stopinsert2 = ' ';
+ gRZoneList[strucnum]->mode = gCurrentMode;
+ }
+ else
+ {
+ for(z1=gZoneList[strucnum]; z1!=NULL; NEXT(z1))
+ {
+ /* Allocate an entry in RMS zone list */
+ if(gRZoneList[strucnum])
+ {
+ /* Move to end of zone list */
+ z2=gRZoneList[strucnum];
+ LAST(z2);
+ ALLOCNEXT(z2,ZONE);
+ }
+ else
+ {
+ INIT(gRZoneList[strucnum],ZONE);
+ z2 = gRZoneList[strucnum];
+ }
+
+ if(z2==NULL)
+ {
+ printf(" Error==> No memory for RMS zone!\n");
+ }
+ else
+ {
+ /* Add this zone to the RMS zone list */
+ z2->chain1 = z1->chain1;
+ z2->start1 = z1->start1;
+ z2->startinsert1 = z1->startinsert1;
+ z2->stop1 = z1->stop1;
+ z2->stopinsert1 = z1->stopinsert1;
+ z2->chain2 = z1->chain2;
+ z2->start2 = z1->start2;
+ z2->startinsert2 = z1->startinsert2;
+ z2->stop2 = z1->stop2;
+ z2->stopinsert2 = z1->stopinsert2;
+ z2->mode = z1->mode;
+ }
+ }
+ }
+ }
+
+ if(!gUserRMSAtoms)
+ {
+ /* Copy the atoms for RMS calculation if user hasn't specified */
+ for(atmnum=0; atmnum<NUMTYPES; atmnum++)
+ strcpy(gRMSAtoms[atmnum],gFitAtoms[atmnum]);
+ }
+
+ rmsd = CalcRMS(ByRes,fp, strucnum, UpdateReference, ByAtm);
+
+ if(fp != stdout)
+ CloseOrPipe(fp);
+ }
+ else
+ {
+ printf(" Warning==> Structures have not yet been fitted.\n");
+ }
+
+ return(rmsd);
+}
+
+
+/************************************************************************/
+/*>int CheckForConvergence(int NCoor, int strucnum)
+ ------------------------------------------------
+ Checks whether the RMSD has converged or we have done too many
+ iterations.
+
+ 15.01.01 Original By: ACRM
+ 01.02.01 Added strucnum parameter
+ 03.04.08 Added parameter to CalcRMS() By: CTP
+*/
+int CheckForConvergence(int NCoor, int strucnum)
+{
+ static REAL lastRMS = (REAL)(-1.0);
+ static int niter = 0;
+ REAL rms;
+
+ if(NCoor == 0)
+ {
+ lastRMS = (REAL)(-1.0);
+ niter = 0;
+ }
+ else
+ {
+ rms = CalcRMS(FALSE, NULL, strucnum, FALSE, FALSE);
+ if(lastRMS >= (REAL)(0.0))
+ {
+ REAL deltaRMS = (rms - lastRMS);
+ if((ABS(deltaRMS) < ITER_STOP) ||
+ (niter > MAXITER))
+ return(TRUE);
+ }
+
+ lastRMS = rms;
+ niter++;
+ }
+
+ return(FALSE);
+}
+
+
+/************************************************************************/
+/*>int UpdateFitArrays(int strucnum)
+ ---------------------------------
+ Update the fit arrays for iterative fitting by updating the equivalence
+ list using dynamic programming (DP)
+
+ 15.01.01 Original By: ACRM
+*/
+int UpdateFitArrays(int strucnum)
+{
+ int length1,
+ length2,
+ align_len,
+ NCoor;
+ char *ref_align = NULL,
+ *mob_align = NULL;
+ REAL score;
+ PDB *RefCaPDB = NULL,
+ *MobCaPDB = NULL,
+ **RefIndex = NULL,
+ **MobIndex = NULL;
+ char *sel[2];
+
+ if(!gQuiet)
+ {
+ printf(" Updating Fitting Zones...\n");
+ }
+
+ /* Extract the CA atoms and index them so they can be accessed by
+ offset number
+ */
+ SELECT(sel[0],"CA ");
+ RefCaPDB = SelectAtomsPDB(gRefPDB, 1, sel, &length1);
+ MobCaPDB = SelectAtomsPDB(gFitPDB[strucnum], 1, sel, &length2);
+ RefIndex = IndexPDB(RefCaPDB, &length1);
+ MobIndex = IndexPDB(MobCaPDB, &length2);
+
+ /* Allocate memory for alignment sequences */
+ if((ref_align = (char *)malloc((length1+length2)*sizeof(char)))==
+ NULL)
+ {
+ printf(" Warning==> No memory for alignment!\n");
+ return(0);
+ }
+ if((mob_align = (char *)malloc((length1+length2)*sizeof(char)))==
+ NULL)
+ {
+ printf(" Warning==> No memory for alignment!\n");
+ free(ref_align);
+ return(0);
+ }
+
+ /* Perform the alignment */
+ score = AlignOnCADistances(RefIndex, length1,
+ MobIndex, length2,
+ ref_align, mob_align, &align_len);
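+ if(score <= (REAL)0.0)
+ {
+ printf(" Error==> Unable to perform alignment!\n");
+ /* Release everything allocated above before bailing out */
+ free(ref_align);
+ free(mob_align);
+ FREELIST(RefCaPDB, PDB);
+ FREELIST(MobCaPDB, PDB);
+ free(RefIndex);
+ free(MobIndex);
+ return(0);
+ }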
+
+ /* Clear any current fitting zones */
+ SetFitZone("CLEAR", strucnum);
+
+ /* Now set zones based on alignment */
+ SetNWZones(ref_align, mob_align, align_len, RefIndex, MobIndex,
+ strucnum);
+
+ /* Create the fitting arrays from the zones */
+ NCoor = CreateFitArrays(strucnum);
+
+ /* Free allocated memory */
+ free(ref_align);
+ free(mob_align);
+ FREELIST(RefCaPDB, PDB);
+ FREELIST(MobCaPDB, PDB);
+ free(RefIndex);
+ free(MobIndex);
+
+ return(NCoor);
+}
+
+/************************************************************************/
+/*>REAL Distance(PDB *p, PDB *q)
+ -----------------------------
+ Calculate a distance-based score for AlignOnCADistances() used to
+ update the equivalence list. Score returned is 1/distance (with
+ a minimum allowed distance of 0.0001).
+
+ 15.01.01 Original By: ACRM
+*/
+#define TINY 0.0001
+REAL Distance(PDB *p, PDB *q)
+{
+ REAL dist;
+
+ dist = DIST(p, q);
+
+ if(dist < TINY)
+ dist = TINY;
+
+ return((REAL)1.0/dist);
+}
+
+/************************************************************************/
+/*>REAL AlignOnCADistances(PDB **RefIndex, int length1,
+ PDB **MobIndex, int length2,
+ char *align1, char *align2, int *align_len)
+ -------------------------------------------------------------------
+ Performs a DP alignment to find the updated equivalences on the basis
+ of selecting closest distances
+
+ 15.01.01 Original By: ACRM
+*/
+REAL AlignOnCADistances(PDB **RefIndex, int length1,
+ PDB **MobIndex, int length2,
+ char *align1, char *align2, int *align_len)
+{
+ XY **dirn = NULL;
+ int maxdim,
+ i, j, k, l,
+ i1, j1,
+ rcell, dcell;
+ REAL **matrix = NULL,
+ thisscore,
+ gapext,
+ score, maxoff,
+ dia, right, down;
+
+ /* gap penalties are set to zero - we don't care how many gaps we
+ introduce
+ */
+ REAL penalty = 0.0,
+ penext = 0.0;
+
+ maxdim = MAX(length1, length2);
+
+ /* Initialise the score matrix */
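+ if((matrix = (REAL **)Array2D(sizeof(REAL), maxdim, maxdim))==NULL)
+ return(0);
+ if((dirn = (XY **)Array2D(sizeof(XY), maxdim, maxdim))==NULL)
+ {
+ /* Do not leak the score matrix if the direction matrix fails */
+ FreeArray2D((char **)matrix, maxdim, maxdim);
+ return(0);
+ }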
+
+ for(i=0;i<maxdim;i++)
+ {
+ for(j=0;j<maxdim;j++)
+ {
+ matrix[i][j] = (REAL)0.0;
+ dirn[i][j].x = -1;
+ dirn[i][j].y = -1;
+ }
+ }
+
+ /* Fill in scores up the right hand side of the matrix */
+ for(j=0; j<length2; j++)
+ {
+ REAL dist;
+ dist = Distance(RefIndex[length1-1], MobIndex[j]);
+ matrix[length1-1][j] = dist;
+ }
+
+ /* Fill in scores along the bottom row of the matrix */
+ for(i=0; i<length1; i++)
+ {
+ matrix[i][length2-1] = Distance(RefIndex[i], MobIndex[length2-1]);
+ }
+
+ i = length1 - 1;
+ j = length2 - 1;
+
+ /* Move back along the diagonal */
+ while(i > 0 && j > 0)
+ {
+ i--;
+ j--;
+
+ /* Fill in the scores along this row */
+ for(i1 = i; i1 > -1; i1--)
+ {
+ dia = matrix[i1+1][j+1];
+
+ /* Find highest score to right of diagonal */
+ rcell = i1+2;
+ if(i1+2 >= length1) right = 0;
+ else right = matrix[i1+2][j+1] - penalty;
+
+ gapext = 1;
+ for(k = i1+3; k<length1; k++, gapext++)
+ {
+ thisscore = matrix[k][j+1] - (penalty + gapext*penext);
+
+ if(thisscore > right)
+ {
+ right = thisscore;
+ rcell = k;
+ }
+ }
+
+ /* Find highest score below diagonal */
+ dcell = j+2;
+ if(j+2 >= length2) down = 0;
+ else down = matrix[i1+1][j+2] - penalty;
+
+ gapext = 1;
+ for(l = j+3; l<length2; l++, gapext++)
+ {
+ thisscore = matrix[i1+1][l] - (penalty + gapext*penext);
+
+ if(thisscore > down)
+ {
+ down = thisscore;
+ dcell = l;
+ }
+ }
+
+ /* Set score to best of these */
+ maxoff = MAX(right, down);
+ if(dia >= maxoff)
+ {
+ matrix[i1][j] = dia;
+ dirn[i1][j].x = i1+1;
+ dirn[i1][j].y = j+1;
+ }
+ else
+ {
+ if(right > down)
+ {
+ matrix[i1][j] = right;
+ dirn[i1][j].x = rcell;
+ dirn[i1][j].y = j+1;
+ }
+ else
+ {
+ matrix[i1][j] = down;
+ dirn[i1][j].x = i1+1;
+ dirn[i1][j].y = dcell;
+ }
+ }
+
+ /* Add the score for a match */
+ matrix[i1][j] += Distance(RefIndex[i1],MobIndex[j]);
+ }
+
+ /* Fill in the scores in this column */
+ for(j1 = j; j1 > -1; j1--)
+ {
+ dia = matrix[i+1][j1+1];
+
+ /* Find highest score to right of diagonal */
+ rcell = i+2;
+ if(i+2 >= length1) right = 0;
+ else right = matrix[i+2][j1+1] - penalty;
+
+ gapext = 1;
+ for(k = i+3; k<length1; k++, gapext++)
+ {
+ thisscore = matrix[k][j1+1] - (penalty + gapext*penext);
+
+ if(thisscore > right)
+ {
+ right = thisscore;
+ rcell = k;
+ }
+ }
+
+ /* Find highest score below diagonal */
+ dcell = j1+2;
+ if(j1+2 >= length2) down = 0;
+ else down = matrix[i+1][j1+2] - penalty;
+
+ gapext = 1;
+ for(l = j1+3; l<length2; l++, gapext++)
+ {
+ thisscore = matrix[i+1][l] - (penalty + gapext*penext);
+
+ if(thisscore > down)
+ {
+ down = thisscore;
+ dcell = l;
+ }
+ }
+
+ /* Set score to best of these */
+ maxoff = MAX(right, down);
+ if(dia >= maxoff)
+ {
+ matrix[i][j1] = dia;
+ dirn[i][j1].x = i+1;
+ dirn[i][j1].y = j1+1;
+ }
+ else
+ {
+ if(right > down)
+ {
+ matrix[i][j1] = right;
+ dirn[i][j1].x = rcell;
+ dirn[i][j1].y = j1+1;
+ }
+ else
+ {
+ matrix[i][j1] = down;
+ dirn[i][j1].x = i+1;
+ dirn[i][j1].y = dcell;
+ }
+ }
+
+ /* Add the score for a match */
+ matrix[i][j1] += Distance(RefIndex[i],MobIndex[j1]);
+ }
+ }
+
+ score = TraceBackDistMat(matrix, dirn, length1, length2,
+ RefIndex, MobIndex, align1, align2,
+ align_len);
+
+#ifdef VERBOSE
+ printf("Matrix:\n-------\n");
+ for(j=0; j<length2;j++)
+ {
+ for(i=0; i<length1; i++)
+ {
+ printf("%8.3f ",matrix[i][j]);
+ }
+ printf("\n");
+ }
+
+ printf("Path:\n-----\n");
+ for(j=0; j<length2;j++)
+ {
+ for(i=0; i<length1; i++)
+ {
+ printf("(%3d,%3d) ",dirn[i][j].x,dirn[i][j].y);
+ }
+ printf("\n");
+ }
+#endif
+
+ FreeArray2D((char **)matrix, maxdim, maxdim);
+ FreeArray2D((char **)dirn, maxdim, maxdim);
+
+ return(score);
+}
+
+
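The routine above scores every reference/mobile CA pair and looks for the order-preserving pairing with the highest summed score, with gap penalties set to zero. A minimal sketch of that idea follows. It uses a simplified textbook recurrence rather than ProFit's exact matrix fill, and the function name `best_pairing_score` and the flat row-major `score` array are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

/* Maximum total score of an order-preserving pairing between the rows
   (n1) and columns (n2) of a row-major score matrix, with no gap
   penalty. Returns -1.0 if memory cannot be allocated.
   Hypothetical helper, not part of ProFit.
*/
double best_pairing_score(const double *score, int n1, int n2)
{
    int    i, j, stride = n2 + 1;
    double result, *best;

    assert(score != NULL && n1 > 0 && n2 > 0);

    /* One extra row and column hold the zero boundary values */
    best = malloc((size_t)(n1 + 1) * (size_t)stride * sizeof(double));
    if(best == NULL)
        return -1.0;
    for(i = 0; i < (n1 + 1) * stride; i++)
        best[i] = 0.0;

    /* Fill from the bottom-right corner back towards the origin */
    for(i = n1 - 1; i >= 0; i--)
    {
        for(j = n2 - 1; j >= 0; j--)
        {
            /* Pair i with j, or leave i or j unpaired at no cost */
            double match  = score[i*n2 + j] + best[(i+1)*stride + (j+1)];
            double skip_i = best[(i+1)*stride + j];
            double skip_j = best[i*stride + (j+1)];
            double m      = match;

            if(skip_i > m) m = skip_i;
            if(skip_j > m) m = skip_j;
            best[i*stride + j] = m;
        }
    }

    result = best[0];
    free(best);
    return result;
}
```

Because skipping a residue costs nothing, the result depends only on which pairs are matched, which mirrors the "we don't care how many gaps we introduce" comment in the source.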
+
+/************************************************************************/
+/*>REAL TraceBackDistMat(REAL **matrix, XY **dirn,
+ int length1, int length2,
+ PDB **RefIndex, PDB **MobIndex,
+ char *align1,
+ char *align2, int *align_len)
+ -----------------------------------------------------
+ Input: REAL **matrix N&W matrix
+ XY **dirn Direction Matrix
+ int length1 Length of first sequence
+ int length2 Length of second sequence
+ PDB **RefIndex First sequence
+ PDB **MobIndex Second sequence
+ Output: char *align1 First sequence aligned
+ char *align2 Second sequence aligned
+ int *align_len Aligned sequence length
+ Returns: REAL Alignment score
+
+ Does the traceback to find the alignment.
+
+ 15.01.01 Original based on TraceBack()
+*/
+REAL TraceBackDistMat(REAL **matrix,
+ XY **dirn,
+ int length1,
+ int length2,
+ PDB **RefIndex,
+ PDB **MobIndex,
+ char *align1,
+ char *align2,
+ int *align_len)
+{
+ int i, j,
+ ai,
+ BestI,BestJ;
+ XY nextCell;
+
+ ai = SearchForBestDistMat(matrix, length1, length2, &BestI, &BestJ,
+ RefIndex, MobIndex, align1, align2);
+
+ /* Now trace back to find the alignment */
+ i = BestI;
+ j = BestJ;
+ align1[ai] = throne(RefIndex[i]->resnam);
+ align2[ai++] = throne(MobIndex[j]->resnam);
+
+ while(i < length1-1 && j < length2-1)
+ {
+ nextCell.x = dirn[i][j].x;
+ nextCell.y = dirn[i][j].y;
+ if((nextCell.x == i+1) && (nextCell.y == j+1))
+ {
+ /* We are inheriting from the diagonal */
+ i++;
+ j++;
+ }
+ else if(nextCell.y == j+1)
+ {
+ /* We are inheriting from the off-diagonal inserting a gap in
+ the y-sequence (MobIndex)
+ */
+ i++;
+ j++;
+ while((i < nextCell.x) && (i < length1-1))
+ {
+ align1[ai] = throne(RefIndex[i++]->resnam);
+ align2[ai++] = '-';
+ }
+ }
+ else if(nextCell.x == i+1)
+ {
+ /* We are inheriting from the off-diagonal inserting a gap in
+ the x-sequence (RefIndex)
+ */
+ i++;
+ j++;
+ while((j < nextCell.y) && (j < length2-1))
+ {
+ align1[ai] = '-';
+ align2[ai++] = throne(MobIndex[j++]->resnam);
+ }
+ }
+ else
+ {
+ /* Cockup! */
+ fprintf(stderr,"TraceBackDistMat() internal error\n");
+ }
+
+ align1[ai] = throne(RefIndex[i]->resnam);
+ align2[ai++] = throne(MobIndex[j]->resnam);
+ }
+
+ /* If one sequence finished first, fill in the end with insertions */
+ if(i < length1-1)
+ {
+ for(j=i+1; j<length1; j++)
+ {
+ align1[ai] = throne(RefIndex[j]->resnam);
+ align2[ai++] = '-';
+ }
+ }
+ else if(j < length2-1)
+ {
+ for(i=j+1; i<length2; i++)
+ {
+ align1[ai] = '-';
+ align2[ai++] = throne(MobIndex[i]->resnam);
+ }
+ }
+
+ *align_len = ai;
+
+ return(matrix[BestI][BestJ]);
+}
+
+
+/************************************************************************/
+/*>int SearchForBestDistMat(REAL **matrix, int length1,
+ int length2, int *BestI, int *BestJ,
+ PDB **RefIndex, PDB **MobIndex,
+ char *align1, char *align2)
+ -------------------------------------------------------------
+ Input: REAL **matrix N&W matrix
+ int length1 Length of first sequence
+ int length2 Length of second sequence
+ PDB **RefIndex First sequence
+ PDB **MobIndex Second sequence
+ Output: int *BestI x position of highest score
+ int *BestJ y position of highest score
+ char *align1 First sequence with end aligned correctly
+ char *align2 Second sequence with end aligned correctly
+ Returns: int Alignment length thus far
+
+ Searches the outside of the matrix for the best score and starts the
+ alignment by putting in any starting - characters.
+
+ 15.01.01 Original based on SearchForBest()
+*/
+int SearchForBestDistMat(REAL **matrix,
+ int length1,
+ int length2,
+ int *BestI,
+ int *BestJ,
+ PDB **RefIndex,
+ PDB **MobIndex,
+ char *align1,
+ char *align2)
+{
+ int ai,
+ besti, bestj,
+ i, j;
+
+ /* Now search the outside of the matrix for the highest scoring cell */
+ ai = 0;
+ besti = 0;
+ for(i = 1; i < length1; i++)
+ {
+ if(matrix[i][0] > matrix[besti][0]) besti = i;
+ }
+ bestj = 0;
+ for(j = 1; j < length2; j++)
+ {
+ if(matrix[0][j] > matrix[0][bestj]) bestj = j;
+ }
+ if(matrix[besti][0] > matrix[0][bestj])
+ {
+ *BestI = besti;
+ *BestJ = 0;
+ for(i=0; i<*BestI; i++)
+ {
+ align1[ai] = throne(RefIndex[i]->resnam);
+ align2[ai++] = '-';
+ }
+ }
+ else
+ {
+ *BestI = 0;
+ *BestJ = bestj;
+ for(j=0; j<*BestJ; j++)
+ {
+ align1[ai] = '-';
+ align2[ai++] = throne(MobIndex[j]->resnam);
+ }
+ }
+ return(ai);
+}
+
+
+/************************************************************************/
+/*>int CreateFitArrays(int strucnum)
+ ---------------------------------
+ Returns: Number of matched coordinates
+ 0: Failure owing to mismatch
+
+ Creates the coordinate arrays for fitting from the currently defined
+ zones.
+
+ 30.09.92 Original
+ 01.10.92 Ignores undefined atoms. Fix in finding mobile atoms.
+ Corrected use of CofGs
+ 09.10.92 Removed incorrect resetting of CoorCount. This broke
+ multi-zone fitting
+ 17.07.95 Changed screen() to printf()
+ 18.07.95 Added initialisation of inserts in zones
+ Added calls to FormatZone()
+ 31.07.95 Prints numbers of residues if mismatch
+ Fixed bug in counting for weights array; was only counting
+ reference structure, not mobile
+ 13.12.95 Added printing of zone info on number of residues mismatch
+ 31.05.96 Added test on B-values
+ 18.06.96 Replaced MODE_* with ZONE_MODE_*
+ Replaced FindZone() with FindZonePDB()
+ 06.11.96 Negative BVal cutoff interpreted as > bval
+ 11.11.96 Checks actual value of gUseBVal
+ 18.11.96 Added gIgnoreMissing handling
+ 03.07.97 Added a break when the mobile atom has been found. This will
+ speed things up and mean that we only find the first atom
+ from mobile if multiple occupancies have not been specified
+ correctly (i.e. the same atoms are not together); previously
+ this would core dump.
+ 12.01.01 gMobPDB[] now an array
+ 01.02.01 Added strucnum parameter
+ 15.02.01 Changed ValidAtom() calls to atnam_raw
+ 20.02.01 gMobCofG now an array
+ 20.02.01 -999 for start or end of structure rather than -1
+ 18.03.08 added call to CentreOnZone() for setting centres of
+ geometry. By: CTP
+*/
+int CreateFitArrays(int strucnum)
+{
+ PDB *ref_start = NULL,
+ *ref_stop = NULL,
+ *mob_start = NULL,
+ *mob_stop = NULL,
+ *p,
+ *q,
+ *r,
+ *m;
+ ZONE *z;
+ char ref_insert,
+ mob_insert;
+ int ref_resnum,
+ mob_resnum,
+ ref_nres,
+ mob_nres,
+ CoorCount = 0,
+ Found,
+ i,
+ natom1,
+ natom2;
+
+
+ if(gRefCoor==NULL || gMobCoor[strucnum]==NULL)
+ {
+ printf(" Error==> A coordinate array is undefined!\n");
+ return(0);
+ }
+
+ /* Allocate memory for a weights array */
+ for(p=gRefPDB, natom1=0; p!=NULL; NEXT(p))
+ natom1++;
+ for(p=gMobPDB[strucnum], natom2=0; p!=NULL; NEXT(p))
+ natom2++;
+
+ if(gWeights != NULL)
+ free(gWeights);
+ if((gWeights = (REAL *)malloc(MAX(natom1, natom2) * sizeof(REAL)))
+ == NULL)
+ {
+ printf(" Error==> No memory for weights array!\n");
+ return(0);
+ }
+
+ /* If no zones have been specified, create a single all atoms zone */
+ if(gZoneList[strucnum] == NULL)
+ {
+ INIT(gZoneList[strucnum],ZONE);
+ gZoneList[strucnum]->chain1 = ' ';
+ gZoneList[strucnum]->start1 = -999;
+ gZoneList[strucnum]->startinsert1 = ' ';
+ gZoneList[strucnum]->stop1 = -999;
+ gZoneList[strucnum]->stopinsert1 = ' ';
+ gZoneList[strucnum]->chain2 = ' ';
+ gZoneList[strucnum]->start2 = -999;
+ gZoneList[strucnum]->startinsert2 = ' ';
+ gZoneList[strucnum]->stop2 = -999;
+ gZoneList[strucnum]->stopinsert2 = ' ';
+ gZoneList[strucnum]->mode = gCurrentMode;
+ }
+
+ /* Step through each zone */
+ for(z=gZoneList[strucnum]; z!=NULL; NEXT(z))
+ {
+ /* Reference structure */
+ if(!FindZonePDB(gRefPDB, z->start1, z->startinsert1,
+ z->stop1, z->stopinsert1, z->chain1, z->mode,
+ &ref_start, &ref_stop))
+ {
+ char zone1[64],
+ zone2[64];
+
+ /* Check ranges have been found */
+ printf(" Error==> Reference structure zone not found.\n");
+
+ FormatZone(zone1, z->chain1,
+ z->start1, z->startinsert1,
+ z->stop1, z->stopinsert1);
+ FormatZone(zone2, z->chain2,
+ z->start2, z->startinsert2,
+ z->stop2, z->stopinsert2);
+ printf(" %-16s with %-16s %s\n",
+ zone1, zone2,
+ ((z->mode == ZONE_MODE_RESNUM)?"(Residue numbering)"
+ :"(Sequential numbering)"));
+
+ return(0);
+ }
+
+ /* Mobile structure */
+ if(!FindZonePDB(gMobPDB[strucnum], z->start2, z->startinsert2,
+ z->stop2, z->stopinsert2, z->chain2, z->mode,
+ &mob_start, &mob_stop))
+ {
+ char zone1[64],
+ zone2[64];
+
+ /* Check ranges have been found */
+ printf(" Error==> Mobile structure zone not found.\n");
+
+ FormatZone(zone1, z->chain1,
+ z->start1, z->startinsert1,
+ z->stop1, z->stopinsert1);
+ FormatZone(zone2, z->chain2,
+ z->start2, z->startinsert2,
+ z->stop2, z->stopinsert2);
+ printf(" %-16s with %-16s %s\n",
+ zone1, zone2,
+ ((z->mode == ZONE_MODE_RESNUM)?"(Residue numbering)"
+ :"(Sequential numbering)"));
+ return(0);
+ }
+
+ /* Check we have the same number of residues in each zone */
+ ref_nres = 1;
+ ref_resnum = ref_start->resnum;
+ ref_insert = ref_start->insert[0];
+ for(p=ref_start; p!=ref_stop; NEXT(p))
+ {
+ if(p->resnum != ref_resnum || p->insert[0] != ref_insert)
+ {
+ ref_nres++;
+ ref_resnum = p->resnum;
+ ref_insert = p->insert[0];
+ }
+ }
+
+ mob_nres = 1;
+ mob_resnum = mob_start->resnum;
+ mob_insert = mob_start->insert[0];
+ for(p=mob_start; p!=mob_stop; NEXT(p))
+ {
+ if(p->resnum != mob_resnum || p->insert[0] != mob_insert)
+ {
+ mob_nres++;
+ mob_resnum = p->resnum;
+ mob_insert = p->insert[0];
+ }
+ }
+
+ if(ref_nres != mob_nres)
+ {
+ char zone1[64],
+ zone2[64];
+
+ printf(" Error==> Number of residues in zone does not \
+match.\n");
+ /* Added 13.12.95 */
+ FormatZone(zone1, z->chain1,
+ z->start1, z->startinsert1,
+ z->stop1, z->stopinsert1);
+
+ FormatZone(zone2, z->chain2,
+ z->start2, z->startinsert2,
+ z->stop2, z->stopinsert2);
+
+ printf(" %-16s with %-16s %s\n",
+ zone1, zone2,
+ ((z->mode == ZONE_MODE_RESNUM)?"(Residue numbering)"
+ :"(Sequential numbering)"));
+
+ /* Added 31.07.95 */
+ printf(" Reference: %d, Mobile: %d\n",
+ ref_nres, mob_nres);
+
+ return(0);
+ }
+
+
+ /* Insert the atoms from this zone into the coordinate arrays */
+
+/* Removed 09.10.92.....
+// CoorCount = 0;
+*/
+
+ ref_nres = 0;
+ ref_resnum = -999;
+ ref_insert = ' ';
+
+ for(p=ref_start; p!=ref_stop; NEXT(p))
+ {
+ if(p->resnum != ref_resnum || p->insert[0] != ref_insert)
+ {
+ /* Start of a new reference residue */
+ ref_nres++;
+ ref_resnum = p->resnum;
+ ref_insert = p->insert[0];
+
+ mob_nres = 0;
+ mob_resnum = -999;
+ mob_insert = ' ';
+
+ for(q=mob_start;
+ q!=mob_stop;
+ NEXT(q))
+ {
+ if(q->resnum != mob_resnum || q->insert[0] != mob_insert)
+ {
+ /* Start of a new mobile residue */
+ mob_nres++;
+ mob_resnum = q->resnum;
+ mob_insert = q->insert[0];
+
+ /* Consider the equivalent residues */
+ if(ref_nres == mob_nres)
+ {
+ /* p points to the start of a residue in reference,
+ q points to the start of a residue in mobile
+
+ Step through reference set
+ */
+ for(r=p;
+ r!=NULL && r->resnum==ref_resnum
+ && r->insert[0]==ref_insert;
+ NEXT(r))
+ {
+ /* 15.02.01 Changed from atnam to atnam_raw */
+ if(ValidAtom(r->atnam_raw, ATOM_FITTING))
+ {
+ if(r->x == 9999.0 &&
+ r->y == 9999.0 &&
+ r->z == 9999.0)
+ {
+ if(!gQuiet)
+ {
+ printf(" Warning==> Undefined atom in \
+reference set ignored:\n");
+ /* 28.02.01 Changed from r->atnam */
+ printf(" Residue: %4s %5d%2s, \
+Atom: %4s\n",
+ r->resnam,r->resnum,
+ r->insert,r->atnam_raw);
+ }
+
+ continue;
+ }
+
+ /* 31.05.96 Test the BVal
+ 06.11.96 Modified to handle -ve
+ specifications
+ 11.11.96 Checks for 1 or 2 as value
+ */
+ if(gUseBVal==1 || gUseBVal==2)
+ {
+ if(gBValue >= (REAL)0.0)
+ {
+ if(r->bval > gBValue)
+ continue;
+ }
+ else
+ {
+ if(-(r->bval) > gBValue)
+ continue;
+ }
+ }
+
+ /* Find this atom in the mobile set */
+ Found = FALSE;
+ for(m=q;
+ m!=NULL && m->resnum==mob_resnum
+ && m->insert[0]==mob_insert;
+ NEXT(m))
+ {
+ /* 28.02.01 Changed from ->atnam */
+ if(!strcmp(r->atnam_raw,m->atnam_raw))
+ {
+ Found = TRUE;
+
+ if(m->x == 9999.0 &&
+ m->y == 9999.0 &&
+ m->z == 9999.0)
+ {
+ if(!gQuiet)
+ {
+ printf(" Warning==> Undefined \
+atom in mobile set ignored:\n");
+ /* 28.02.01 Changed from r->atnam*/
+ printf(" Residue: %4s \
+%5d%2s, Atom: %4s\n",
+ m->resnam,m->resnum,
+ m->insert,m->atnam_raw);
+ }
+
+ continue;
+ }
+
+
+ /* 31.05.96 Test the BVal
+ 06.11.96 Modified to handle -ve
+ specifications
+ 11.11.96 Checks for 1 or 3 as value
+ */
+ if(gUseBVal==1 || gUseBVal==3)
+ {
+ if(gBValue >= (REAL)0.0)
+ {
+ if(m->bval > gBValue)
+ continue;
+ }
+ else
+ {
+ if(-(m->bval) > gBValue)
+ continue;
+ }
+ }
+
+ /* Copy the coordinates */
+ gRefCoor[CoorCount].x = r->x;
+ gRefCoor[CoorCount].y = r->y;
+ gRefCoor[CoorCount].z = r->z;
+ gMobCoor[strucnum][CoorCount].x = m->x;
+ gMobCoor[strucnum][CoorCount].y = m->y;
+ gMobCoor[strucnum][CoorCount].z = m->z;
+ gWeights[CoorCount] =
+ (m->bval + r->bval)/(REAL)2.0;
+ CoorCount++;
+ break; /* 03.07.97 ACRM */
+ }
+ }
+ if(!Found)
+ {
+ if(gIgnoreMissing)
+ {
+ if(!gQuiet)
+ {
+ /* 28.02.01 Changed from r->atnam */
+ printf(" Warning==> Ignored \
+reference atom %4s not found in mobile.\n",
+ r->atnam_raw);
+ printf(" Reference %4s %5d%c \
+Mobile %4s %5d%c\n",
+ p->resnam,p->resnum,
+ p->insert[0],
+ q->resnam,q->resnum,
+ q->insert[0]);
+ }
+ }
+ else
+ {
+ printf(" Error==> Atoms do not match \
+in residue:\n");
+ printf(" Reference %4s %5d%c \
+Mobile %4s %5d%c\n",
+ p->resnam,p->resnum,p->insert[0],
+ q->resnam,q->resnum,q->insert[0]);
+ /* 28.02.01 Changed from r->atnam */
+ printf(" Unable to find \
+reference atom %4s in mobile.\n",
+ r->atnam_raw);
+ return(0);
+ }
+ }
+ }
+ } /* End of loop through reference residue */
+ break;
+ } /* End of if() equivalent residues */
+ } /* End of start-of-new-mobile-residue */
+ } /* End of loop through mobile set */
+ } /* End of start-of-new-reference-residue */
+ } /* End of loop through reference set */
+ } /* End of loop through zones */
+
+ if(CoorCount == 0)
+ {
+ printf(" Error==> No atoms in specified zones\n");
+ }
+ else
+ {
+ /* Calculate the centre of geometry */
+ /* 18.03.08 Call to CentreOnZone added By: CTP */
+ gMobCofG[strucnum].x = gMobCofG[strucnum].y
+ = gMobCofG[strucnum].z = 0.0;
+ gRefCofG.x = gRefCofG.y = gRefCofG.z = 0.0;
+
+ if(gCZoneList[strucnum] == NULL)
+ {
+ for(i=0; i<CoorCount; i++)
+ {
+ gMobCofG[strucnum].x += gMobCoor[strucnum][i].x;
+ gMobCofG[strucnum].y += gMobCoor[strucnum][i].y;
+ gMobCofG[strucnum].z += gMobCoor[strucnum][i].z;
+ gRefCofG.x += gRefCoor[i].x;
+ gRefCofG.y += gRefCoor[i].y;
+ gRefCofG.z += gRefCoor[i].z;
+ }
+
+ gMobCofG[strucnum].x /= CoorCount;
+ gMobCofG[strucnum].y /= CoorCount;
+ gMobCofG[strucnum].z /= CoorCount;
+ gRefCofG.x /= CoorCount;
+ gRefCofG.y /= CoorCount;
+ gRefCofG.z /= CoorCount;
+ }
+ else
+ {
+ if(!CentreOnZone(strucnum))
+ {
+ printf(" Error==> No centre residues set.\n");
+ return(0);
+ }
+ }
+
+#ifdef DEBUG
+ fprintf(stderr,"Before fitting\n%d coordinates.\n", CoorCount);
+ fprintf(stderr,"Ref CofG: %8.3f %8.3f %8.3f\n",
+ gRefCofG.x, gRefCofG.y, gRefCofG.z);
+ fprintf(stderr,"Mob CofG: %8.3f %8.3f %8.3f\n",
+ gMobCofG[strucnum].x,
+ gMobCofG[strucnum].y,
+ gMobCofG[strucnum].z);
+#endif
+
+ /* Move coordinate arrays to the origin */
+ for(i=0; i<CoorCount; i++)
+ {
+ gRefCoor[i].x -= gRefCofG.x;
+ gRefCoor[i].y -= gRefCofG.y;
+ gRefCoor[i].z -= gRefCofG.z;
+ gMobCoor[strucnum][i].x -= gMobCofG[strucnum].x;
+ gMobCoor[strucnum][i].y -= gMobCofG[strucnum].y;
+ gMobCoor[strucnum][i].z -= gMobCofG[strucnum].z;
+ }
+ }
+
+ return(CoorCount);
+}
+
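The B-value test inside CreateFitArrays() treats the sign of the cutoff as a switch: a non-negative cutoff drops atoms whose B-value is above it, while a negative cutoff drops atoms whose B-value is below its magnitude. A minimal sketch of that rule in isolation, assuming plain `double` values in place of `REAL` (the function names `exclude_by_bvalue` and `check_bvalue_examples` are hypothetical, not part of ProFit):

```c
#include <assert.h>

/* Returns 1 if an atom with temperature factor bval should be left out
   of the fit for the given cutoff, 0 if it should be kept.
   Hypothetical helper mirroring the comparisons in CreateFitArrays().
*/
int exclude_by_bvalue(double bval, double cutoff)
{
    if(cutoff >= 0.0)
        return (bval > cutoff);     /* keep atoms at or below the cutoff */

    return (-bval > cutoff);        /* i.e. bval < |cutoff| is excluded  */
}

/* Small self-check showing both senses of the cutoff */
void check_bvalue_examples(void)
{
    assert(exclude_by_bvalue(30.0,  20.0) == 1);
    assert(exclude_by_bvalue(10.0,  20.0) == 0);
    assert(exclude_by_bvalue(10.0, -20.0) == 1);
    assert(exclude_by_bvalue(30.0, -20.0) == 0);
}
```

In the source this test is applied to reference atoms when gUseBVal is 1 or 2 and to mobile atoms when it is 1 or 3.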
+
+/************************************************************************/
+/*>int CentreOnZone(int strucnum)
+ ------------------------------
+ Returns: Number of matched coordinates
+ 0: Failure owing to mismatch
+
+ Sets the centres of geometry (gRefCofG and gMobCofG[strucnum]) to
+ correspond to centres of user-defined zones.
+
+ 18.03.08 Original based on CreateFitArrays() By: CTP
+*/
+int CentreOnZone(int strucnum)
+{
+ PDB *ref_start = NULL,
+ *ref_stop = NULL,
+ *mob_start = NULL,
+ *mob_stop = NULL,
+ *p,
+ *q,
+ *r,
+ *m;
+ ZONE *z;
+ char ref_insert,
+ mob_insert;
+ int ref_resnum,
+ mob_resnum,
+ ref_nres,
+ mob_nres,
+ CoorCount = 0;
+ BOOL Found;
+
+ VEC3F ref_CofG;
+ VEC3F mob_CofG;
+
+ ref_CofG.x = ref_CofG.y = ref_CofG.z = 0.0;
+ mob_CofG.x = mob_CofG.y = mob_CofG.z = 0.0;
+
+ if(gRefCoor==NULL || gMobCoor[strucnum]==NULL)
+ {
+ printf(" Error==> A coordinate array is undefined!\n");
+ return(0);
+ }
+
+ /* If no zones have been specified, return. */
+ if(gCZoneList[strucnum] == NULL)
+ {
+ printf(" CentreOnZone: No Zones Specified.\n");
+ return(0);
+ }
+
+ /* Step through each zone */
+ for(z=gCZoneList[strucnum]; z!=NULL; NEXT(z))
+ {
+ /* Reference structure */
+ if(!FindZonePDB(gRefPDB, z->start1, z->startinsert1,
+ z->stop1, z->stopinsert1, z->chain1, z->mode,
+ &ref_start, &ref_stop))
+ {
+
+ /* Check ranges have been found */
+ printf(" Error==> Reference centre residue not found.\n");
+ return(0);
+ }
+
+ /* Mobile structure */
+ if(!FindZonePDB(gMobPDB[strucnum], z->start2, z->startinsert2,
+ z->stop2, z->stopinsert2, z->chain2, z->mode,
+ &mob_start, &mob_stop))
+ {
+ /* Check ranges have been found */
+ printf(" Error==> Mobile centre residue not found.\n");
+ return(0);
+ }
+
+
+ /* Find atom coordinates */
+ ref_nres = 0;
+ ref_resnum = -999;
+ ref_insert = ' ';
+
+ for(p=ref_start; p!=ref_stop; NEXT(p))
+ {
+ if(p->resnum != ref_resnum || p->insert[0] != ref_insert)
+ {
+ /* Start of a new reference residue */
+ ref_nres++;
+ ref_resnum = p->resnum;
+ ref_insert = p->insert[0];
+
+ mob_nres = 0;
+ mob_resnum = -999;
+ mob_insert = ' ';
+
+ for(q=mob_start;
+ q!=mob_stop;
+ NEXT(q))
+ {
+ if(q->resnum != mob_resnum || q->insert[0] != mob_insert)
+ {
+ /* Start of a new mobile residue */
+ mob_nres++;
+ mob_resnum = q->resnum;
+ mob_insert = q->insert[0];
+
+ /* Consider the equivalent residues */
+ if(ref_nres == mob_nres)
+ {
+ /* p points to the start of a residue in reference,
+ q points to the start of a residue in mobile
+
+ Step through reference set
+ */
+ for(r=p;
+ r!=NULL && r->resnum==ref_resnum
+ && r->insert[0]==ref_insert;
+ NEXT(r))
+ {
+ /* 15.02.01 Changed from atnam to atnam_raw */
+ if(ValidAtom(r->atnam_raw, ATOM_FITTING))
+ {
+ if(r->x == 9999.0 &&
+ r->y == 9999.0 &&
+ r->z == 9999.0)
+ {
+ if(!gQuiet)
+ {
+ printf(" Warning==> Undefined atom in \
+reference set ignored:\n");
+ /* 28.02.01 Changed from r->atnam */
+ printf(" Residue: %4s %5d%2s, \
+Atom: %4s\n",
+ r->resnam,r->resnum,
+ r->insert,r->atnam_raw);
+ }
+
+ continue;
+ }
+
+ /* Find this atom in the mobile set */
+ Found = FALSE;
+ for(m=q;
+ m!=NULL && m->resnum==mob_resnum
+ && m->insert[0]==mob_insert;
+ NEXT(m))
+ {
+ /* 28.02.01 Changed from ->atnam */
+ if(!strcmp(r->atnam_raw,m->atnam_raw))
+ {
+ Found = TRUE;
+
+ if(m->x == 9999.0 &&
+ m->y == 9999.0 &&
+ m->z == 9999.0)
+ {
+ if(!gQuiet)
+ {
+ printf(" Warning==> Undefined \
+atom in mobile set ignored:\n");
+ /* 28.02.01 Changed from r->atnam*/
+ printf(" Residue: %4s \
+%5d%2s, Atom: %4s\n",
+ m->resnam,m->resnum,
+ m->insert,m->atnam_raw);
+ }
+
+ continue;
+ }
+
+ /* Sum of Coordinates */
+ ref_CofG.x += r->x;
+ ref_CofG.y += r->y;
+ ref_CofG.z += r->z;
+ mob_CofG.x += m->x;
+ mob_CofG.y += m->y;
+ mob_CofG.z += m->z;
+ CoorCount++;
+ break;
+ }
+ }
+ if(!Found)
+ {
+ if(gIgnoreMissing)
+ {
+ if(!gQuiet)
+ {
+ /* 28.02.01 Changed from r->atnam */
+ printf(" Warning==> Ignored \
+reference atom %4s not found in mobile.\n",
+ r->atnam_raw);
+ printf(" Reference %4s %5d%c \
+Mobile %4s %5d%c\n",
+ p->resnam,p->resnum,
+ p->insert[0],
+ q->resnam,q->resnum,
+ q->insert[0]);
+ }
+ }
+ else
+ {
+ printf(" Error==> Atoms do not match \
+in residue:\n");
+ printf(" Reference %4s %5d%c \
+Mobile %4s %5d%c\n",
+ p->resnam,p->resnum,p->insert[0],
+ q->resnam,q->resnum,q->insert[0]);
+ /* 28.02.01 Changed from r->atnam */
+ printf(" Unable to find \
+reference atom %4s in mobile.\n",
+ r->atnam_raw);
+ return(0);
+ }
+ }
+ }
+ } /* End of loop through reference residue */
+ break;
+ } /* End of if() equivalent residues */
+ } /* End of start-of-new-mobile-residue */
+ } /* End of loop through mobile set */
+ } /* End of start-of-new-reference-residue */
+ } /* End of loop through reference set */
+ } /* End of loop through zones */
+
+ if(CoorCount == 0)
+ {
+ printf(" Error==> No atoms in specified zones\n");
+ }
+ else
+ {
+ /* Calculate the centre of geometry */
+ gRefCofG.x = ref_CofG.x / CoorCount;
+ gRefCofG.y = ref_CofG.y / CoorCount;
+ gRefCofG.z = ref_CofG.z / CoorCount;
+ gMobCofG[strucnum].x = mob_CofG.x / CoorCount;
+ gMobCofG[strucnum].y = mob_CofG.y / CoorCount;
+ gMobCofG[strucnum].z = mob_CofG.z / CoorCount;
+ }
+
+ return(CoorCount);
+}
+
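Both CreateFitArrays() and CentreOnZone() reduce to the same arithmetic: sum the matched coordinates, divide by their count to get the centre of geometry, then (in CreateFitArrays()) subtract that centre so the set sits on the origin. A minimal sketch of that step, assuming a hypothetical `Vec3` struct in place of ProFit's `COOR` and `VEC3F` types and a hypothetical function name `centre_on_origin`:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y, z; } Vec3;   /* hypothetical stand-in type */

/* Compute the centre of geometry of n points, move the points so that
   the centre lies on the origin, and return the original centre.
*/
Vec3 centre_on_origin(Vec3 *coor, int n)
{
    Vec3 cofg = {0.0, 0.0, 0.0};
    int  i;

    assert(coor != NULL && n > 0);

    /* Sum the coordinates, then average */
    for(i = 0; i < n; i++)
    {
        cofg.x += coor[i].x;
        cofg.y += coor[i].y;
        cofg.z += coor[i].z;
    }
    cofg.x /= n;
    cofg.y /= n;
    cofg.z /= n;

    /* Translate every point so the centre of geometry is at the origin */
    for(i = 0; i < n; i++)
    {
        coor[i].x -= cofg.x;
        coor[i].y -= cofg.y;
        coor[i].z -= cofg.z;
    }

    return cofg;
}
```

When centre zones are defined, the source takes the centre from the atoms in those zones instead of from all fitted coordinates. The translation step is unchanged.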
+
+/************************************************************************/
+/*>void SetSymmetricalAtomPAirs()
+ ------------------------------
+ Sets the atom pairs used when auto-matching symmetrical atoms.
+ (eg CD1 - CD2 and CE1 - CE2 in Tyr)
+
+ 04.06.08 Original By: CTP
+*/
+void SetSymmetricalAtomPAirs(void)
+{
+ /* Charged */
+ strcpy(gSymType[0][0],"ARG ");
+ strcpy(gSymType[0][1]," NH1");
+ strcpy(gSymType[0][2]," NH2");
+
+ strcpy(gSymType[1][0],"ASP ");
+ strcpy(gSymType[1][1]," OD1");
+ strcpy(gSymType[1][2]," OD2");
+
+ strcpy(gSymType[2][0],"GLU ");
+ strcpy(gSymType[2][1]," OE1");
+ strcpy(gSymType[2][2]," OE2");
+
+ /* Aromatic */
+ strcpy(gSymType[3][0],"PHE ");
+ strcpy(gSymType[3][1]," CD1");
+ strcpy(gSymType[3][2]," CD2");
+ strcpy(gSymType[4][0],"PHE ");
+ strcpy(gSymType[4][1]," CE1");
+ strcpy(gSymType[4][2]," CE2");
+
+ strcpy(gSymType[5][0],"TYR ");
+ strcpy(gSymType[5][1]," CD1");
+ strcpy(gSymType[5][2]," CD2");
+ strcpy(gSymType[6][0],"TYR ");
+ strcpy(gSymType[6][1]," CE1");
+ strcpy(gSymType[6][2]," CE2");
+
+ /* Amide Nitrogen and Oxygen */
+ strcpy(gSymType[7][0],"ASN ");
+ strcpy(gSymType[7][1]," OD1");
+ strcpy(gSymType[7][2]," ND2");
+
+ strcpy(gSymType[8][0],"GLN ");
+ strcpy(gSymType[8][1]," OE1");
+ strcpy(gSymType[8][2]," NE2");
+
+ /* Prochiral Methyls */
+ strcpy(gSymType[9][0],"VAL ");
+ strcpy(gSymType[9][1]," CG1");
+ strcpy(gSymType[9][2]," CG2");
+
+ strcpy(gSymType[10][0],"LEU ");
+ strcpy(gSymType[10][1]," CD1");
+ strcpy(gSymType[10][2]," CD2");
+
+ /* Default Match Settings */
+ gSymType[0][3][0] = TRUE; /* Arg */
+ gSymType[1][3][0] = TRUE; /* Asp */
+ gSymType[2][3][0] = TRUE; /* Glu */
+ gSymType[3][3][0] = TRUE; /* Phe */
+ gSymType[4][3][0] = TRUE; /* Phe */
+ gSymType[5][3][0] = TRUE; /* Tyr */
+ gSymType[6][3][0] = TRUE; /* Tyr */
+ gSymType[7][3][0] = FALSE; /* Asn */
+ gSymType[8][3][0] = FALSE; /* Gln */
+ gSymType[9][3][0] = FALSE; /* Val */
+ gSymType[10][3][0] = FALSE; /* Leu */
+
+ return;
+}
+
+/************************************************************************/
+/*>void ApplyMatrixCOOR(COOR *incoords, REAL matrix[3][3],int ncoor)
+ ------------------------------------------------------------------
+ I/O: COOR *incoords Coordinate Array
+ Input: REAL matrix[3][3] Matrix to apply
+ int ncoor Number of coordinates.
+
+ Apply a rotation matrix to a coordinate array.
+ Based on ApplyMatrixPDB() by ACRM.
+
+ 18.08.08 Original By: CTP
+*/
+void ApplyMatrixCOOR(COOR *incoords,
+ REAL matrix[3][3],
+ int ncoor)
+{
+ int i;
+ VEC3F outcoords;
+
+ for(i=0; i<ncoor; i++)
+ {
+ if(incoords[i].x != 9999.0 && incoords[i].y != 9999.0 &&
+ incoords[i].z != 9999.0)
+ {
+ MatMult3_33(incoords[i],matrix,&outcoords);
+
+ incoords[i].x = outcoords.x;
+ incoords[i].y = outcoords.y;
+ incoords[i].z = outcoords.z;
+ }
+ }
+
+ return;
+}
+
+
+/************************************************************************/
+/*>void CalculateRotationMatrix(REAL RotAngle, REAL Matrix[3][3])
+ --------------------------------------------------------------
+ Calculate rotation matrix around Z axis for input angle, RotAngle, in
+ degrees.
+
+ 18.08.08 Original By: CTP
+ 16.02.09 Rewritten as wrapper for bioplib function CreateRotMat()
+*/
+void CalculateRotationMatrix(REAL RotAngle, REAL Matrix[3][3])
+{
+ /* Convert to radians, reversing the sense of rotation */
+ REAL psi = RotAngle * -1.0 * (PI/180.0);
+
+ /* Rotation Matrix */
+ CreateRotMat('z', psi, Matrix);
+ return;
+}
+
+
+/************************************************************************/
+/*>REAL FitSingleStructure(int strucnum, BOOL single_iteration)
+ ------------------------------------------------------------
+ Fits a single mobile structure to the reference structure and returns
+ the RMSD. Called during all vs all comparisons, when determining the
+ fitting order, and when fitting structures in that order.
+
+ Function returns -1.0 if error.
+
+ 20.10.08 Original based on FitStructures() By: CTP
+*/
+REAL FitSingleStructure(int strucnum, BOOL single_iteration)
+{
+ ZONE *z1,
+ *z2;
+ int atmnum,
+ NCoor,
+ niter;
+ REAL rmstot,
+ rmsprev = (-100.0),
+ deltaRMS,
+ rmscurr = -1.0;
+ BOOL final = FALSE;
+
+
+ gFitted = FALSE;
+
+ if(!gRefFilename[0])
+ {
+ printf(" Error==> Reference structure undefined.\n");
+ return(-1.0);
+ }
+ if(!gMobFilename[0][0])
+ {
+ printf(" Error==> Mobile structure undefined.\n");
+ return(-1.0);
+ }
+
+ /* First copy the zones for display to match those for fitting */
+ if(gRZoneList[strucnum] != NULL)
+ {
+ FREELIST(gRZoneList[strucnum],ZONE);
+ gRZoneList[strucnum] = NULL;
+ }
+ for(z1=gZoneList[strucnum]; z1!=NULL; NEXT(z1))
+ {
+ /* Allocate an entry in RMS zone list */
+ if(gRZoneList[strucnum])
+ {
+ /* Move to end of zone list */
+ z2=gRZoneList[strucnum];
+ LAST(z2);
+ ALLOCNEXT(z2,ZONE);
+ }
+ else
+ {
+ INIT(gRZoneList[strucnum],ZONE);
+ z2 = gRZoneList[strucnum];
+ }
+
+ if(z2==NULL)
+ {
+ printf(" Error==> No memory for RMS zone!\n");
+ }
+ else
+ {
+ /* Add this zone to the RMS zone list */
+ z2->chain1 = z1->chain1;
+ z2->start1 = z1->start1;
+ z2->startinsert1 = z1->startinsert1;
+ z2->stop1 = z1->stop1;
+ z2->stopinsert1 = z1->stopinsert1;
+ z2->chain2 = z1->chain2;
+ z2->start2 = z1->start2;
+ z2->startinsert2 = z1->startinsert2;
+ z2->stop2 = z1->stop2;
+ z2->stopinsert2 = z1->stopinsert2;
+ z2->mode = z1->mode;
+ }
+ }
+
+ /* Now copy the atoms for RMS calculation */
+ gNOTRMSAtoms = gNOTFitAtoms;
+ for(atmnum=0; atmnum<NUMTYPES; atmnum++)
+ strcpy(gRMSAtoms[atmnum],gFitAtoms[atmnum]);
+
+ if(gMultiCount > 1)
+ {
+ /* Keep looping counting the iterations */
+ for(niter=0; ; niter++)
+ {
+ /* printf (" Multi-structure fit iteration %d\n", niter); */
+ rmstot = (REAL)0.0;
+
+ /*** Fit single structure ***/
+
+ /* Set up arrays for fitting */
+ if((NCoor=CreateFitArrays(strucnum))!=0)
+ {
+ /* Reset the convergence criterion */
+ CheckForConvergence(0, strucnum);
+
+ /* Perform the fit */
+ if(DoFitting(NCoor, strucnum))
+ {
+ while(gIterate && !final)
+ {
+ if((NCoor = UpdateFitArrays(strucnum))!=0)
+ {
+ if(!DoFitting(NCoor, strucnum))
+ return(-1.0);
+ if(CheckForConvergence(NCoor, strucnum))
+ break;
+ }
+ else
+ {
+ break;
+ }
+ }
+
+ /* Find RMS - Do not update during final iteration */
+ /* rmstot += ShowRMS(FALSE,NULL,strucnum,!final,FALSE); */
+ rmscurr = ShowRMS(FALSE,NULL,strucnum,
+ !final && !single_iteration,FALSE);
+ rmstot += rmscurr;
+ if(gIterate && !gQuiet)
+ {
+/***
+// printf(" (Over %d equivalenced CA-atoms)\n",
+// NCoor);
+***/
+ }
+ }
+ }
+
+ deltaRMS = (rmstot - rmsprev);
+ rmsprev = rmstot;
+
+ /* If we've converged or done too many iterations, do a final */
+ /* iteration then break out. */
+ if(final) break;
+
+ if((ABS(deltaRMS) < MULTI_ITER_STOP) ||
+ (niter > MAXMULTIITER))
+ final = TRUE;
+
+ /* Break out if single iteration */
+ if(single_iteration) break;
+ }
+ }
+
+ return(rmscurr);
+}
+
+
+/************************************************************************/
+/*>void FitStructuresInOrder(REAL sortorder[][2])
+ ----------------------------------------------
+ Fit structures in order based on 2D array, sortorder. The first element
+ of sortorder is the structure number and the second is a score used for
+ sorting (eg RMSD).
+
+ 20.10.08 Original based on FitStructures() By: CTP
+*/
+void FitStructuresInOrder(REAL sortorder[][2])
+{
+ ZONE *z1,
+ *z2;
+ int atmnum,
+ NCoor,
+ strucnum,
+ niter,
+ i;
+ REAL rmstot,
+ rmsprev = (-100.0),
+ deltaRMS;
+ BOOL final = FALSE;
+
+ gFitted = FALSE;
+
+ if(!gRefFilename[0])
+ {
+ printf(" Error==> Reference structure undefined.\n");
+ return;
+ }
+ if(!gMobFilename[0][0])
+ {
+ printf(" Error==> Mobile structure undefined.\n");
+ return;
+ }
+ if(gMultiCount <= 1)
+ {
+ printf(" Error==> ");
+ printf("ORDERFIT can only be used with multiple structures.\n");
+ return;
+ }
+
+ if(!gQuiet)
+ {
+ printf(" Fitting structures...\n");
+ }
+
+ /* First copy the zones for display to match those for fitting */
+ for(strucnum=0; strucnum<gMultiCount; strucnum++)
+ {
+ if(gRZoneList[strucnum] != NULL)
+ {
+ FREELIST(gRZoneList[strucnum],ZONE);
+ gRZoneList[strucnum] = NULL;
+ }
+ for(z1=gZoneList[strucnum]; z1!=NULL; NEXT(z1))
+ {
+ /* Allocate an entry in RMS zone list */
+ if(gRZoneList[strucnum])
+ {
+ /* Move to end of zone list */
+ z2=gRZoneList[strucnum];
+ LAST(z2);
+ ALLOCNEXT(z2,ZONE);
+ }
+ else
+ {
+ INIT(gRZoneList[strucnum],ZONE);
+ z2 = gRZoneList[strucnum];
+ }
+
+ if(z2==NULL)
+ {
+ printf(" Error==> No memory for RMS zone!\n");
+ }
+ else
+ {
+ /* Add this zone to the RMS zone list */
+ z2->chain1 = z1->chain1;
+ z2->start1 = z1->start1;
+ z2->startinsert1 = z1->startinsert1;
+ z2->stop1 = z1->stop1;
+ z2->stopinsert1 = z1->stopinsert1;
+ z2->chain2 = z1->chain2;
+ z2->start2 = z1->start2;
+ z2->startinsert2 = z1->startinsert2;
+ z2->stop2 = z1->stop2;
+ z2->stopinsert2 = z1->stopinsert2;
+ z2->mode = z1->mode;
+ }
+ }
+ }
+
+ /* Now copy the atoms for RMS calculation */
+ gNOTRMSAtoms = gNOTFitAtoms;
+ for(atmnum=0; atmnum<NUMTYPES; atmnum++)
+ strcpy(gRMSAtoms[atmnum],gFitAtoms[atmnum]);
+
+ /* Keep looping counting the iterations */
+ for(niter=0; ; niter++)
+ {
+ printf (" Multi-structure fit iteration %d\n", niter);
+
+ rmstot = (REAL)0.0;
+
+ /* Loop through the structures we are fitting */
+ /* for(strucnum=0; strucnum<gMultiCount; strucnum++) */
+ for(i=0; i<gMultiCount; i++)
+ {
+
+ /* Set strucnum */
+ strucnum = (int)sortorder[i][0];
+ printf(" Fitting structure %d\n",strucnum + 1);
+
+ /* Set up arrays for fitting */
+ if((NCoor=CreateFitArrays(strucnum))!=0)
+ {
+ /* Reset the convergence criterion */
+ CheckForConvergence(0, strucnum);
+
+ /* Perform the fit */
+ if(DoFitting(NCoor, strucnum))
+ {
+ while(gIterate && !final)
+ {
+ if((NCoor = UpdateFitArrays(strucnum))!=0)
+ {
+ if(!DoFitting(NCoor, strucnum))
+ return;
+ if(CheckForConvergence(NCoor, strucnum))
+ break;
+ }
+ else
+ {
+ break;
+ }
+ }
+
+ /* Find RMS - Do not update during final iteration */
+ rmstot += ShowRMS(FALSE,NULL,strucnum,!final,FALSE);
+ if(gIterate && !gQuiet)
+ {
+ printf(" (Over %d equivalenced CA-atoms)\n",
+ NCoor);
+ }
+ }
+ }
+ }
+ deltaRMS = (rmstot - rmsprev);
+ rmsprev = rmstot;
+
+ /* If we've converged or done too many iterations, do a final
+ iteration then break out.
+ */
+ if(final) break;
+
+ if((ABS(deltaRMS) < MULTI_ITER_STOP) ||
+ (niter > MAXMULTIITER))
+ final = TRUE;
+ }
+
+ return;
+}
+
+
+
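The multi-structure loops in FitSingleStructure() and FitStructuresInOrder() above share one stopping rule: once the change in summed RMS drops below a threshold (or the iteration cap is passed), a flag is set, one further "final" pass is run, and only then does the loop exit. A minimal sketch of that control flow follows. It is an illustration only, not ProFit code: fit_pass(), ITER_STOP, MAX_ITER and iterate_until_converged() are hypothetical stand-ins for the real fitting pass, MULTI_ITER_STOP, MAXMULTIITER and the enclosing function, and the threshold values are assumed.

```c
#include <math.h>

#define ITER_STOP 0.01   /* hypothetical stand-in for MULTI_ITER_STOP */
#define MAX_ITER  1000   /* hypothetical stand-in for MAXMULTIITER    */

/* Hypothetical stand-in for one fitting pass over all structures:
   returns a summed RMS that halves its distance to 1.0 on each pass. */
static double fit_pass(int iter)
{
   return 1.0 + 8.0 / (double)(1 << (iter < 30 ? iter : 30));
}

/* Returns the number of passes performed, including the final one. */
int iterate_until_converged(void)
{
   double rmstot,
          rmsprev = -100.0,
          delta;
   int    niter,
          final = 0;

   for(niter=0; ; niter++)
   {
      rmstot  = fit_pass(niter);
      delta   = rmstot - rmsprev;
      rmsprev = rmstot;

      /* The pass just completed was the final one, so stop */
      if(final) break;

      /* Converged, or too many passes: flag one more (final) pass */
      if(fabs(delta) < ITER_STOP || niter > MAX_ITER)
         final = 1;
   }

   return niter + 1;
}
```

With this stand-in, the change first falls below the threshold on the eleventh pass, so the function performs twelve passes in total. Testing the flag before setting it is what guarantees exactly one extra pass after convergence is detected.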
diff --git a/src/fitting.p b/src/fitting.p
new file mode 100644
index 0000000..0700631
--- /dev/null
+++ b/src/fitting.p
@@ -0,0 +1,66 @@
+void FitStructures(void)
+;
+BOOL DoFitting(int NCoor, int strucnum)
+;
+int ValidAtom(char *atnam, int mode)
+;
+REAL CalcRMS(BOOL ByRes, FILE *fp, int strucnum, BOOL UpdateReference,
+ BOOL ByAtm)
+;
+void ShowNFitted(void)
+;
+REAL ShowRMS(BOOL ByRes, char *filename, int strucnum,
+ BOOL UpdateReference, BOOL ByAtm)
+;
+int CheckForConvergence(int NCoor, int strucnum)
+;
+int UpdateFitArrays(int strucnum)
+;
+REAL Distance(PDB *p, PDB *q)
+;
+REAL AlignOnCADistances(PDB **RefIndex, int length1,
+ PDB **MobIndex, int length2,
+ char *align1, char *align2, int *align_len)
+;
+REAL TraceBackDistMat(REAL **matrix,
+ XY **dirn,
+ int length1,
+ int length2,
+ PDB **RefIndex,
+ PDB **MobIndex,
+ char *align1,
+ char *align2,
+ int *align_len)
+;
+int SearchForBestDistMat(REAL **matrix,
+ int length1,
+ int length2,
+ int *BestI,
+ int *BestJ,
+ PDB **RefIndex,
+ PDB **MobIndex,
+ char *align1,
+ char *align2)
+;
+int CreateFitArrays(int strucnum)
+;
+int CentreOnZone(int strucnum)
+;
+void SetSymmetricalAtomPAirs(void)
+;
+void ApplyMatrixCOOR(COOR *incoords,
+ REAL matrix[3][3],
+ int ncoor)
+;
+void CalculateRotationMatrix(REAL RotAngle,
+ REAL Matrix[3][3])
+;
+REAL FitSingleStructure(int strucnum,
+ BOOL single_iteration)
+;
+void FitStructuresInOrder(REAL sortorder[][2])
+;
+void NoFitStructures(void)
+;
+BOOL DoNoFitting(int strucnum)
+;
diff --git a/src/main.c b/src/main.c
new file mode 100644
index 0000000..396de53
--- /dev/null
+++ b/src/main.c
@@ -0,0 +1,6159 @@
+/*************************************************************************
+
+ Program: ProFit
+ File: main.c
+
+ Version: V3.1
+ Date: 31.03.09
+ Function: Protein Fitting program. Main routines.
+
+ Copyright: SciTech Software / UCL 1992-2009
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain.
+
+ It may not be copied or made available to third parties, but may be
+ freely used by non-profit-making organisations who have obtained it
+ directly from the author or by FTP.
+
+ You are requested to send EMail to the author to say that you are
+ using this code so that you may be informed of future updates.
+
+ The code may not be made available on other FTP sites without express
+ permission from the author.
+
+ The code may be modified as required, but any modifications must be
+ documented so that the person responsible can be identified. If
+ someone else breaks this code, the author doesn't want to be blamed
+ for code that does not work! You may not distribute any
+ modifications, but are encouraged to send them to the author so
+ that they may be incorporated into future versions of the code.
+
+ Such modifications become the property of Dr. Andrew C.R. Martin and
+ SciTech Software though their origin will be acknowledged.
+
+ The code may not be sold commercially or used for commercial purposes
+ without prior permission from the author.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V0.1 25.09.92 Original
+ V0.2 02.10.92 Added CURSES support
+ V0.3 07.10.92 Added Amiga windows and paging support
+ V0.4 09.10.92 Added N&W alignment support & fixed bug in multi-zones
+ V0.5 08.10.93 Various tidying for Unix & changed for booklib
+ V0.6 05.01.94 Reads HELPDIR and DATADIR environment variables under
+ Unix
+ V0.7 24.11.94 Uses ReadPDBAtoms()
+ V0.8 17.07.95 Stripped out the windowing support (never used!). Will
+ add a Tcl/Tk interface later!
+ Multiple chains now work correctly.
+ V1.0 18.07.95 Insert codes now work.
+ First official release (at last!).
+ V1.1 20.07.95 Added WEIGHT command support and translation vector
+ output from MATRIX command
+ V1.1a 22.07.95 Stop crash on writing when not fitted
+ V1.2 25.07.95 Added GAPPEN command
+ Added chain label printing in Status
+ Added optional filename parameter to RESIDUE command
+ V1.3 31.07.95 Fixed bug in fitting.c where end of zone=end of chain
+ other than the last chain
+ V1.4 14.08.95 Fixed bug in fitting.c where RESIDUE command was not
+ printing RMS for last residue
+ V1.5 21.08.95 Fixed bug in NWAlign.c (when last zone not at end of
+ chain) and Bioplib library bug in align()
+ V1.5a 02.10.95 Added printing of CofG information with MATRIX command
+ V1.5b 15.11.95 NWAlign.c: Prints normalised score
+ V1.6 20.11.95 Added READALIGNMENT command and code
+ V1.6a 21.11.95 Fixed a couple of warnings under gcc
+ V1.6b 22.11.95 Modified code in SetNWZones() such that deletions at
+ the same position in both sequences don't cause a
+ problem.
+ V1.6c 13.12.95 Further fix to double deletions. Added info when
+ zones mismatch in fitting.c
+ V1.6d 24.01.96 Fixed bug in status command when printing atom names
+ containing spaces.
+ V1.6e 31.05.96 Added BVAL command and code
+ V1.6f 13.06.96 Added BWEIGHT command and code
+ V1.6g 18.06.96 Replaced MODE_* with ZONE_MODE_* and use FindZonePDB()
+ from bioplib rather than own version
+ V1.7 23.07.96 Supports atom wildcards. Some comment tidying.
+ V1.7b 11.11.96 Added REF keyword to end of BVAL command allowing
+ only the reference structure to be considered
+ V1.7c 18.11.96 Added IGNOREMISSING option
+ V1.7d 20.12.96 Added NFITTED command
+ V1.7e 27.06.97 Allows WRITE and RESIDUE to output to a pipe
+ V1.7f 03.07.97 Added break into CreateFitArrays() to fix core dump
+ on bad multiple-occupancy PDB files
+ V1.7g 06.05.98 Rewrite of NWAlign/SetNWZones()
+ V1.8 07.05.98 Skipped for release
+ V2.0 01.03.01 Now supports multiple structure fitting and iterative
+ zone updating
+ V2.1 28.03.01 Parameter for ITERATE and added CENTRE command
+ V2.2 20.02.01 Fixed some Bioplib problems related to raw atom names
+ and multiple occupancies
+ V2.3 01.12.04 Skipped for release
+ V2.4 03.06.05 Some significant changes in Bioplib for better handling
+ of raw atom names with multiple occupancies
+ V2.5 07.06.05 Another bug in bioplib
+ V2.5.1 10.06.05 Bug in bioplib PDB2Seq() with CA-only chains
+ V2.5.2 14.10.05 Modified doReadPDB() to cope with corrupt partial
+ occupancies like 1zeh/B13
+ V2.5.3 06.07.06 Fixed raw atom name when ILE-CD changed to ILE-CD1
+ V2.5.4 29.06.07 Bioplib change for Mac compilation
+ V-.-.- 14.03.08 Added DELZONE command.
+ V-.-.- 17.03.08 Added DELRZONE command.
+ V-.-.- 19.03.08 Added SETCENTRE/SETCENTER command.
+ V-.-.- 27.03.08 Added SCRIPT command.
+ V-.-.- 02.04.08 Added HEADER command.
+ V-.-.- 03.04.08 Added PAIRDIST command.
+ V-.-.- 07.04.08 Added DISTCUTOFF command.
+ V-.-.- 22.04.08 Added handling of lowercase chain and inserts.
+ V-.-.- 29.04.08 Added calls to ReadWholePDB() for reading PDB structures.
+ V-.-.- 01.05.08 Added support for reading low occupancy atoms.
+ V-.-.- 02.05.08 Removed InitPDBCoordRecord() and IsPDBCoordRecord().
+ Removed ReadPDBHeader(), WritePDBHeader() and
+ WritePDBFooter().
+ V2.5.9 22.05.08 Cleaned-up comments. Added version number. Removed unused
+ functions, ShowOverlap() and ShowAllOverlap().
+ V2.6 12.06.08 Modified ReadStructure() to use updated read functions in
+ bioplib.
+ V2.6 11.08.08 ShowRMS() turned-off for SetRMSAtoms() and SetRMSZone()
+ when in quiet mode. Added MULTIREF command.
+ V2.6 16.10.08 Added ALLVSALL, SETREF, ORDERFIT and TRIMZONES commands.
+ V2.6 20.10.08 Cleaned-up code.
+ V2.6 23.10.08 Added gWtAverage flag to allow use of the old
+ weighting system.
+ V2.6 27.10.08 Added NOFIT command.
+ V3.0 06.11.08 Fixed bug with SETREF.
+ V3.0 07.11.08 Added option for STATUS to output to a file.
+ V3.0 14.11.08 Added GNU Readline Library support.
+ V3.0 11.02.09 Changed DELZONE and DELRZONE commands to use ALL.
+ V3.0 16.02.09 Removed Invert3x3Matrix() and Transpose3x3Matrix().
+ V3.0 13.03.09 Fixed bug with ATOMS and RATOMS commands. Set
+ ReadStructure() to clear fitted coordinates when
+ loading a new mobile structure.
+ V3.1 31.03.09 Updated version number.
+
+*************************************************************************/
+#define MAIN
+#include "ProFit.h"
+
+
+/************************************************************************/
+/*>void logo(void)
+ ---------------
+ Displays the ProFit logo.
+
+ 25.09.92 Original
+ 01.10.92 Added version
+ 10.10.93 Version 0.5
+ 05.01.94 Version 0.6
+ 24.11.94 Version 0.7
+ 17.07.95 Version 0.8; removed screen() version
+ 18.07.95 Version 1.0
+ 20.07.95 Version 1.1
+ 22.07.95 Version 1.2
+ 31.07.95 Version 1.3
+ 14.08.95 Version 1.4
+ 21.08.95 Version 1.5
+ 20.11.95 Version 1.6
+ 23.07.96 Version 1.7
+ 07.05.98 Version 1.8
+ 15.02.01 Version 2.0
+ 20.03.01 Version 2.1
+ 20.12.01 Version 2.2
+ 01.12.04 Version 2.3
+ 03.06.05 Version 2.4
+ 07.06.05 Version 2.5
+ 10.06.05 Version 2.5.1
+ 14.10.05 Version 2.5.2
+ 06.07.06 Version 2.5.3
+ 29.06.07 Version 2.5.4
+ 28.03.08 Version Working_Copy By: CTP
+ 22.05.08 Version 2.6
+ 04.06.08 Version Working Copy
+ 03.11.08 Version 3.0
+ 31.03.09 Version 3.1
+*/
+void logo(void)
+{
+ printf("\n PPPPP FFFFFF ii tt\n");
+ printf(" PP PP FF tt\n");
+ printf(" PP PP rrrrr oooo FF ii ttttt\n");
+ printf(" PPPPP rr rr oo oo FFFF ii tt\n");
+ printf(" PP rr oo oo FF ii tt\n");
+ printf(" PP rr oo oo FF ii tt\n");
+ printf(" PP rr oooo FF ii ttt\n\n");
+ printf(" Protein Least Squares Fitting\n\n");
+ printf(" Version 3.1\n\n");
+ /*printf(" 3.0\n\n");*/
+ printf(" Copyright (c) Dr. Andrew C.R. Martin, SciTech Software \
+1992-2009\n");
+ printf(" Copyright (c) Dr. Craig T. Porter, UCL \
+2008-2009\n\n");
+}
+
+
+/************************************************************************/
+/*>int main(int argc, char **argv)
+ --------------------------------
+ The main program. Checks parameters, sets some defaults, reads files if
+ specified, initialises command parser and starts the command loop.
+
+ 25.09.92 Original
+ 29.09.92 Added gCurrentMode, Changed to call ReadStructure()
+ 02.10.92 Calls Cleanup() as it should have done!
+ 05.10.92 Added AMIGA_WINDOWS support
+ 06.10.92 Changed argc check for icon start
+ 08.10.93 Changed version number
+ Changed CURSES calls for book library
+ 17.07.95 Removed windowing support
+ 19.07.95 Handles -h flag
+ 20.07.95 Initialise gDoWeights
+ 13.06.96 Changed initialisation of gDoWeights since it now has 3 states
+ 18.06.96 Replaced MODE_* with ZONE_MODE_*
+ 01.03.01 Added -x handling
+ 19.03.08 Initialise gCZoneList[] By: CTP
+ 27.03.08 Changed call to DoCommandLoop()
+ 02.04.08 Initialise gRefHeader, etc... for PDB headers and footers.
+ Added call to InitPDBCoordRecord().
+ 15.04.08 Added option to run script from command line.
+ 02.05.08 Removed call to InitPDBCoordRecord(). Removed gRefHeader and
+ gRefFooter
+ 04.06.08 Added gMatchSymAtoms and call to SetSymmetricalAtomPAirs()
+ 21.08.08 Added gMultiVsRef and gRotateRefit flags and calculation of 42
+ degree rotation matrix for refitting structure to avoid local
+ minimum.
+ 07.11.08 Added gMultiRef.
+ 04.02.09 gRotateRefit replaced by ROTATE_REFIT #define
+*/
+int main(int argc, char **argv)
+{
+ BOOL CmdError = FALSE;
+ BOOL xmasFormat = FALSE;
+ BOOL runscript = FALSE;
+ char filename[MAXSTRLEN];
+ int i;
+
+ logo();
+
+ argc--;
+ argv++;
+
+ /* See if a flag has been given */
+ while(argc>0 && argv[0][0] == '-')
+ {
+ switch(argv[0][1])
+ {
+ case 'h':
+ gHetAtoms = TRUE;
+ break;
+ case 'x':
+#ifdef USE_XMAS
+ xmasFormat = TRUE;
+#else
+ fprintf(stderr,"Error: XMAS support not compiled in this \
+version\n");
+ return(1);
+#endif
+ break;
+ case 'f':
+ if(runscript)
+ {
+ CmdError = TRUE;
+ break;
+ }
+ runscript = TRUE;
+ if(argc < 2) { CmdError = TRUE; break; } /* -f needs a filename */
+ strncpy(filename,argv[1],MAXSTRLEN-1); filename[MAXSTRLEN-1] = '\0';
+ argc--;
+ argv++;
+ break;
+ default:
+ CmdError = TRUE;
+ break;
+ }
+ argc--;
+ argv++;
+ }
+
+
+ if(CmdError || (argc>0 && argc!=2))
+ {
+ printf("\n\nSyntax: ProFit [-h] [-f <scriptfile.txt>]");
+ printf(" [<reference.pdb> <mobile.pdb>]\n");
+ printf(" -h Include HETATM records when reading PDB \
+files\n");
+ printf(" -f Run script\n\n");
+
+ exit(1);
+ }
+
+ /* Initialise various things */
+ strcpy(gFitAtoms[0],"*");
+ strcpy(gRMSAtoms[0],"*");
+ gRefFilename[0] = '\0';
+ gCurrentMode = ZONE_MODE_RESNUM;
+ gUserRMSAtoms = FALSE;
+ gUserRMSZone = FALSE;
+ gUserFitZone = FALSE;
+ gFitted = FALSE;
+ gDoWeights = WEIGHT_NONE;
+ gReadHeader = FALSE;
+ gMatchSymAtoms = FALSE;
+ gMultiVsRef = FALSE;
+ gTwistAngle = 42.0;
+ CalculateRotationMatrix(gTwistAngle, gRotMatTwist);
+
+ gWtAverage = TRUE;
+ gMultiRef = 0;
+
+ for(i=0; i<MAXSTRUC; i++)
+ {
+ gMobPDB[i] = NULL;
+ gMobWPDB[i] = NULL;
+ gFitPDB[i] = NULL;
+ gMobCoor[i] = NULL;
+ gMobFilename[i][0] = '\0';
+ gZoneList[i] = NULL;
+ gRZoneList[i] = NULL;
+ gCZoneList[i] = NULL;
+ }
+ gLimit[0] = gLimit[1] = (-1);
+
+
+ if(argc==2)
+ {
+ /* Filenames have been specified, so open and read files */
+ ReadStructure(STRUC_REFERENCE, argv[0], 0, xmasFormat);
+ ReadStructure(STRUC_MOBILE, argv[1], 0, xmasFormat);
+ gMultiCount = 1;
+ }
+
+
+ InitParser();
+ SetSymmetricalAtomPAirs();
+
+ if(runscript)
+ RunScript(filename);
+ else
+ DoCommandLoop(stdin);
+
+ Cleanup();
+
+ return(0);
+}
+
+
+/************************************************************************/
+/*>void SetRMSAtoms(char *command)
+ -------------------------------
+ Responds to a command to set the atoms for calculation of RMS
+
+ 28.09.92 Original.
+ 29.09.92 Added gUserRMSAtoms
+ 30.09.92 Added NOT option; pad gRMSAtoms[]
+ 17.07.95 Replaced calls to screen() with printf()
+ 19.07.95 Added parameter to ShowRMS()
+ 25.07.95 Added another parameter to ShowRMS()
+ 23.07.96 Changed to call PADMINTERM() rather than padterm()
+ Checks for legal wildcard specifications
+ Improved out-of-bounds error checking
+ 03.04.08 Added parameter to ShowRMS() By: CTP
+ 11.08.08 ShowRMS() turned-off when in quiet mode.
+ 13.03.09 Added temporary fix for atom type recognition.
+
+*/
+void SetRMSAtoms(char *command)
+{
+ int comptr,
+ atmnum,
+ atmpos,
+ comstart,
+ strucnum;
+ char *cmnd;
+
+ /* Put to upper case */
+ UPPER(command);
+
+ /* Assume this is not a NOT selection */
+ gNOTRMSAtoms = FALSE;
+ cmnd = command;
+
+ /* Set NOT flag if required and step over the symbol */
+ if(command[0] == '~' || command[0] == '^')
+ {
+ gNOTRMSAtoms = TRUE;
+ cmnd++;
+ }
+
+ /* Blank all rmsatoms */
+ for(atmnum=0; atmnum<NUMTYPES; atmnum++)
+ for(atmpos=0; atmpos<8; atmpos++)
+ gRMSAtoms[atmnum][atmpos] = '\0';
+
+ atmnum=0;
+ atmpos=0;
+ comstart=0;
+ for(comptr=0;comptr<strlen(cmnd);comptr++)
+ {
+ if(cmnd[comptr]==',')
+ {
+ comstart=comptr+1;
+ gRMSAtoms[atmnum][atmpos] = '\0'; /* Terminate string */
+ PADMINTERM(gRMSAtoms[atmnum],4);
+ if(++atmnum >= NUMTYPES)
+ {
+ if(!gQuiet)
+ {
+ printf(" Warning==> Too many atoms in specification. \
+Only %d used\n",NUMTYPES);
+ }
+ break;
+ }
+ atmpos=0;
+ }
+ else
+ {
+ if(atmpos >= MAXATSPEC)
+ {
+ TERMAT(cmnd+comptr, ',');
+ if(!gQuiet)
+ {
+ printf(" Warning==> Atom name specification too long \
+(%s)\n", cmnd+comstart);
+ printf(" Using all atoms.\n");
+ }
+ gRMSAtoms[0][0] = '*';
+ gRMSAtoms[0][1] = '\0';
+ break;
+ }
+ gRMSAtoms[atmnum][atmpos++] = cmnd[comptr];
+ }
+ }
+
+ /* Terminate last one */
+ gRMSAtoms[atmnum][atmpos] = '\0';
+ PADMINTERM(gRMSAtoms[atmnum],4);
+
+ /* See if a * was specified not in first position; move it if so.
+ Also check for legal atom wildcard specifications
+ */
+ for(atmnum=0; atmnum<NUMTYPES; atmnum++)
+ {
+ if(!LegalAtomSpec(gRMSAtoms[atmnum]))
+ {
+ if(!gQuiet)
+ {
+ printf(" Warning==> Illegal atom specification (%s). Using \
+all atoms.\n",
+ gRMSAtoms[atmnum]);
+ }
+ gRMSAtoms[0][0] = '*';
+ gRMSAtoms[0][1] = '\0';
+ break;
+ }
+
+ if(gRMSAtoms[atmnum][0] == '*')
+ {
+ if(gNOTRMSAtoms)
+ {
+ if(!gQuiet)
+ {
+ printf(" Warning==> NOT option ignored.\n");
+ }
+ gNOTRMSAtoms = FALSE;
+ }
+ gRMSAtoms[0][0] = '*';
+ gRMSAtoms[0][1] = '\0';
+ break;
+ }
+ }
+
+ gUserRMSAtoms = TRUE;
+
+
+ /* Bugfix - Atom Recognition By: CTP */
+ /* Temporary fix for atom type recognition. Removed trailing spaces
+ for atom type strings. Need to check with ACRM why strings were
+ padded. */
+ if(TRUE)
+ {
+ int i = 0;
+ for(i=0;i<NUMTYPES;i++)
+ {
+ if(gRMSAtoms[i][0] == '\0') break;
+ KILLTRAILSPACES(gRMSAtoms[i]);
+ }
+ }
+ /* End of Bugfix */
+
+
+ if(!gQuiet)
+ {
+ for(strucnum=0; strucnum<gMultiCount; strucnum++)
+ {
+ ShowRMS(FALSE,NULL,strucnum,FALSE,FALSE);
+ }
+ }
+
+ return;
+}
+
+
+/************************************************************************/
+/*>void SetRMSZone(char *command)
+ ------------------------------
+ Responds to a command to set the zone for RMS calculation
+
+ 29.09.92 Original.
+ 17.07.95 Replaced calls to screen() with printf()
+ 18.07.95 Added initialisation of inserts in zones
+ 19.07.95 Added parameter to ShowRMS()
+ * on its own equivalent to CLEAR
+ 25.07.95 Added another parameter to ShowRMS()
+ 18.06.96 Replaced MODE_* with ZONE_MODE_*
+ 01.02.01 Added multiple structures
+ 28.02.01 Error message from ParseZone() when trying to specify multiple
+ zones for multi-structure now here instead of in ParseZone()
+ 03.04.08 Added parameter to ShowRMS() By: CTP
+ 11.04.08 Added warning for overlapping zones.
+ 22.04.08 Added handling of lowercase chain and inserts.
+ 11.08.08 ShowRMS() turned-off when in quiet mode.
+ 29.10.08 Fixed bug that gave segmentation fault with CLEAR.
+*/
+void SetRMSZone(char *command)
+{
+ int start1, stop1,
+ start2, stop2,
+ SeqZone, strucnum;
+ char chain1, chain2,
+ startinsert1, stopinsert1,
+ startinsert2, stopinsert2;
+ ZONE *z;
+ int warned = 0;
+
+
+ /* See if this is clearing the zones */
+ if(!upstrncmp(command,"CLEAR",5) || !strcmp(command,"*"))
+ {
+ gUserRMSZone = FALSE;
+
+ for(strucnum=0; strucnum<gMultiCount; strucnum++)
+ {
+ if(gRZoneList[strucnum]!=NULL)
+ {
+ FREELIST(gRZoneList[strucnum],ZONE);
+ gRZoneList[strucnum] = NULL;
+ }
+
+ if(!gQuiet)
+ {
+ ShowRMS(FALSE,NULL,strucnum,FALSE,FALSE);
+ }
+ }
+ return;
+ }
+
+ for(strucnum=0; strucnum<gMultiCount; strucnum++)
+ {
+ SeqZone = ParseZone(command, &start1, &stop1, &chain1,
+ &startinsert1, &stopinsert1,
+ &start2, &stop2, &chain2,
+ &startinsert2, &stopinsert2,
+ strucnum);
+
+ if((SeqZone == (-2)) && (!warned))
+ {
+ printf(" Error==> You cannot specify zones for each \
+structure when performing\n");
+ printf(" multiple structure fitting.\n");
+ warned = 1;
+ }
+
+ if(SeqZone > -1)
+ {
+ /* If the user has not already specified an RMS zone, blank
+ the current list.
+ */
+ if(!gUserRMSZone)
+ {
+ if(gRZoneList[strucnum]!=NULL)
+ {
+ FREELIST(gRZoneList[strucnum],ZONE);
+ gRZoneList[strucnum] = NULL;
+ }
+ }
+
+ /* Allocate an entry in zone list */
+ if(gRZoneList[strucnum])
+ {
+ /* Move to end of zone list */
+ z=gRZoneList[strucnum];
+ LAST(z);
+ ALLOCNEXT(z,ZONE);
+ }
+ else
+ {
+ INIT(gRZoneList[strucnum],ZONE);
+ z = gRZoneList[strucnum];
+ }
+
+ if(z==NULL)
+ {
+ printf(" Error==> No memory for zone!\n");
+ }
+ else
+ {
+ /* Add this zone to the zone list */
+ z->chain1 = chain1;
+ z->start1 = start1;
+ z->startinsert1 = startinsert1;
+ z->stop1 = stop1;
+ z->stopinsert1 = stopinsert1;
+ z->chain2 = chain2;
+ z->start2 = start2;
+ z->startinsert2 = startinsert2;
+ z->stop2 = stop2;
+ z->stopinsert2 = stopinsert2;
+ z->mode = SeqZone?ZONE_MODE_SEQUENTIAL:gCurrentMode;
+
+ /* CTP: Check for overlap */
+ if(CheckOverlap(z,gRZoneList[strucnum],strucnum) > 1)
+ printf(" Warning: New zone overlaps existing zone.\n");
+ else if(CheckOverlap(z,gRZoneList[strucnum],strucnum) == -1)
+ printf(" Error: Failed to find new zone.\n");
+ }
+
+ gUserRMSZone = TRUE;
+ if(!gQuiet)
+ {
+ ShowRMS(FALSE,NULL,strucnum,FALSE,FALSE);
+ }
+ }
+ }
+
+ return;
+}
+
+
+/************************************************************************/
+/*>void Cleanup(void)
+ ------------------
+ Clean up screen and any malloc'd memory
+
+ 28.09.92 Original
+ 29.09.92 Added sequences
+ 01.10.92 Freed strparam and *Coor memory.
+ 05.10.92 Added libraries for AMIGA_WINDOWS
+ 08.10.93 Changed CURSES calls for book library
+ 17.07.95 Removed CURSES clean up and Amiga Windows cleanup
+ 20.07.95 Frees gWeights
+ 12.01.01 gMobPDB[] now an array
+ 01.02.01 Various other arrays now freed for multi-structure support
+ 28.03.08 Added gCZoneList[] By: CTP
+ 02.04.08 Added Header and Footer storage.
+ 02.04.08 Removed Header and Footer storage.
+*/
+void Cleanup(void)
+{
+ int i;
+
+ for(i=0; i<MAXSTRUC; i++)
+ {
+ if(gMobPDB[i]) FREELIST(gMobPDB[i], PDB);
+ gMobPDB[i] = NULL;
+ }
+
+ if(gRefPDB) FREELIST(gRefPDB, PDB);
+ gRefPDB = NULL;
+
+ for(i=0; i<MAXSTRUC; i++)
+ {
+ if(gFitPDB[i]) FREELIST(gFitPDB[i], PDB);
+ gFitPDB[i] = NULL;
+
+ if(gMobCoor[i]) free(gMobCoor[i]);
+ gMobCoor[i] = NULL;
+
+ if(gMobSeq[i]) free(gMobSeq[i]);
+ gMobSeq[i] = NULL;
+
+ if(gZoneList[i]) FREELIST(gZoneList[i], ZONE);
+ gZoneList[i] = NULL;
+
+ if(gRZoneList[i]) FREELIST(gRZoneList[i], ZONE);
+ gRZoneList[i] = NULL;
+
+ if(gCZoneList[i]) FREELIST(gCZoneList[i], ZONE);
+ gCZoneList[i] = NULL;
+ }
+
+ if(gRefSeq) free(gRefSeq);
+ gRefSeq = NULL;
+
+ if(gRefCoor) free(gRefCoor);
+ gRefCoor = NULL;
+
+ if(gWeights) free(gWeights);
+ gWeights = NULL;
+
+ for(i=0; i<MAXSTRPARAM; i++)
+ free(gStrParam[i]);
+
+ Help("Dummy","CLOSE");
+}
+
+
+/************************************************************************/
+/*>void Die(char *message)
+ -----------------------
+ Program death. Writes message and cleans up before exit.
+
+ 25.09.92 Original
+ 17.07.95 Replaced calls to screen() with puts()
+*/
+void Die(char *message)
+{
+ puts(message);
+
+ Cleanup();
+
+ exit(0);
+}
+
+
+/************************************************************************/
+/*>BOOL ReadStructure(int structure, char *filename, int strucnum,
+ int xmasFormat)
+ ---------------------------------------------------------------
+ Reads one of the 2 PDB structures.
+
+ 28.09.92 Original
+ 30.09.92 Added coordinate array allocation & check for inserts
+ 08.10.93 Modified for new ReadPDB()
+ 24.11.94 Changed to call ReadPDBAtoms() rather than ReadPDB()
+ 17.07.95 Replaced calls to screen() with printf()
+ 18.07.95 Uses fopen() rather than OpenRead()
+ Uses fclose() rather than CloseFile()
+ 19.07.95 Selects whether to read HETATM records depending on
+ gHetAtoms
+ 24.01.96 Improved error messages when no atoms read since ReadPDB()
+ can distinguish between no memory and no atoms.
+ 03.07.97 Warning message if finds multiple occupancies
+ Changed to CD1 rather than CD for ILE
+ 12.01.01 gMobPDB[] now an array
+ 01.02.01 Added strucnum param; returns BOOL
+ 20.02.01 Frees all previously loaded structures if we had a multiple
+ set loaded previously.
+ 01.03.01 Added xmasFormat parameter
+ 06.07.06 Correction of CD -> CD1 had an extra space in the raw atom
+ name (i.e. it was " CD1 " rather than " CD1") By: CTP
+ 02.04.08 CTP added call to ReadPDBHeader()
+ 29.04.08 CTP added calls to ReadWholePDB() for reading pdb structures.
+ Removed call to ReadPDBHeader().
+ 01.05.08 CTP added support for reading low occupancy atoms.
+ 02.05.08 CTP header and trailer now stored within wholepdb datatype.
+ 12.06.08 Changed call from doReadWholePDB() to ReadWholePDB() and
+ ReadWholePDBAtom().
+ Added calls to ReadPDBOccRank() and ReadPDBAtomsOccRank()
+ for reading partial occupancies.
+ 13.03.09 Set function to clear fitted coordinates when loading a new
+ mobile structure. Required for the NOFIT command which
+ (unlike the FIT command) does not update any existing fitted
+ coordinates.
+*/
+BOOL ReadStructure(int structure,
+ char *filename,
+ int strucnum,
+ int xmasFormat)
+{
+ FILE *fp;
+ int natoms;
+ PDB *p;
+ int i;
+ static int sLastStrucNum = (-1);
+
+ gPDBPartialOcc = FALSE;
+
+ /* Open the file */
+ if((fp = fopen(filename,"r"))==NULL)
+ {
+ printf(" Unable to open file %s\n",filename);
+ return(FALSE);
+ }
+
+ /* Handle appropriate pdb linked list and give message */
+ if(structure==STRUC_REFERENCE)
+ {
+ printf(" Reading reference structure...\n");
+ strcpy(gRefFilename,filename);
+
+ /* CTP: If there is something in the current PDB list, free it */
+ if(gRefWPDB)
+ {
+ /* Free current WHOLEPDB. */
+ FreeWholePDB(gRefWPDB);
+ gRefWPDB = NULL;
+ gRefPDB = NULL;
+ natoms = 0;
+ }
+ /* CTP: Free current PDB (Needed if using XMAS format) */
+ if(gRefPDB)
+ {
+ FREELIST(gRefPDB,PDB);
+ gRefPDB = NULL;
+ gRefWPDB = NULL;
+ natoms = 0;
+ }
+
+ /* Read the structure */
+ if(xmasFormat)
+ {
+ if(gHetAtoms)
+ gRefPDB = ReadXMAS(fp, &natoms);
+ else
+ gRefPDB = ReadXMASAtoms(fp, &natoms);
+ }
+ else
+ {
+ if(gHetAtoms)
+ gRefWPDB = ReadWholePDB(fp);
+ else
+ gRefWPDB = ReadWholePDBAtoms(fp);
+
+ if(gOccRank > 1)
+ {
+ FREELIST(gRefWPDB->pdb, PDB);
+ gRefWPDB->pdb = NULL;
+ rewind(fp);
+
+ if(gHetAtoms)
+ gRefWPDB->pdb = ReadPDBOccRank(fp,&natoms,gOccRank);
+ else
+ gRefWPDB->pdb = ReadPDBAtomsOccRank(fp,&natoms,gOccRank);
+ }
+
+ gRefWPDB->pdb = RemoveAlternates(gRefWPDB->pdb);
+ natoms = gRefWPDB->natoms;
+ gRefPDB = gRefWPDB->pdb;
+ }
+
+ if(gRefPDB==NULL)
+ {
+ if(!natoms)
+ {
+ printf(" Error==> No atoms read from reference PDB \
+file!\n");
+ }
+ else
+ {
+ printf(" Error==> No memory to read reference PDB file!\n");
+ }
+
+ gRefFilename[0] = '\0';
+ fclose(fp);
+ return(FALSE);
+ }
+
+ /* 03.07.97 Warning about multiple-occupancies */
+ if(gPDBPartialOcc && !gQuiet)
+ {
+ printf(" Warning==> Reference set contains multiple occupancy \
+atoms.\n");
+ if(gOccRank == 1)
+ printf(" Only the first atom will be \
+considered.\n");
+ }
+
+ /* Allocate coordinate array */
+ if(gRefCoor) free(gRefCoor);
+ if((gRefCoor = (COOR *)malloc(natoms * sizeof(COOR))) == NULL)
+ printf(" Error==> Unable to allocate reference coordinate \
+memory!\n");
+
+ /* Convert to sequence */
+ if(gRefSeq != NULL) free(gRefSeq);
+ if((gRefSeq = PDB2Seq(gRefPDB))==NULL)
+ printf(" Error==> Unable to read sequence for reference \
+structure!\n");
+
+ /* Check for inserts */
+ for(p=gRefPDB; p!=NULL; NEXT(p))
+ {
+ if(p->insert[0] != ' ')
+ {
+ if(!gQuiet)
+ {
+ printf(" Warning==> Reference protein contains \
+insertions.\n");
+ }
+ break;
+ }
+ }
+
+ /* Fix ILE CD to CD1 (03.07.97 was other way round) */
+ for(p=gRefPDB; p!=NULL; NEXT(p))
+ {
+ /* 28.02.01 added ->atnam_raw */
+ if(!strncmp(p->resnam,"ILE ",4) && !strncmp(p->atnam,"CD ",4))
+ {
+ strcpy(p->atnam,"CD1 ");
+ strcpy(p->atnam_raw," CD1"); /* 06.07.06 Removed extra space*/
+ }
+ }
+ }
+ else if(structure==STRUC_MOBILE)
+ {
+ printf(" Reading mobile structure...\n");
+ strcpy(gMobFilename[strucnum],filename);
+
+ /* If there is something in the current list, free it */
+ for(i=strucnum; i<=sLastStrucNum; i++)
+ {
+ if(gMobWPDB[i])
+ {
+ FreeWholePDB(gMobWPDB[i]);
+ gMobWPDB[i] = NULL;
+ gMobPDB[i] = NULL;
+ natoms = 0;
+ }
+ if(gMobPDB[i])
+ {
+ FREELIST(gMobPDB[i],PDB);
+ gMobPDB[i] = NULL;
+ gMobWPDB[i] = NULL;
+ natoms = 0;
+ }
+ }
+ sLastStrucNum = strucnum;
+
+
+ /* Read the structure */
+ if(xmasFormat)
+ {
+ if(gHetAtoms)
+ gMobPDB[strucnum] = ReadXMAS(fp, &natoms);
+ else
+ gMobPDB[strucnum] = ReadXMASAtoms(fp, &natoms);
+ }
+ else
+ {
+ if(gHetAtoms)
+ gMobWPDB[strucnum] = ReadWholePDB(fp);
+ else
+ gMobWPDB[strucnum] = ReadWholePDBAtoms(fp);
+
+ if(gOccRank > 1)
+ {
+ FREELIST(gMobWPDB[strucnum]->pdb, PDB);
+ gMobWPDB[strucnum]->pdb = NULL;
+ rewind(fp);
+
+ if(gHetAtoms)
+ gMobWPDB[strucnum]->pdb =
+ ReadPDBOccRank(fp,&natoms,gOccRank);
+ else
+ gMobWPDB[strucnum]->pdb =
+ ReadPDBAtomsOccRank(fp,&natoms,gOccRank);
+ }
+
+ gMobWPDB[strucnum]->pdb =
+ RemoveAlternates(gMobWPDB[strucnum]->pdb);
+ natoms = gMobWPDB[strucnum]->natoms;
+ gMobPDB[strucnum] = gMobWPDB[strucnum]->pdb;
+ }
+
+
+ if(gMobPDB[strucnum]==NULL)
+ {
+ if(!natoms)
+ {
+ printf(" Error==> No atoms read from mobile PDB file!\n");
+ }
+ else
+ {
+ printf(" Error==> No memory to read mobile PDB file!\n");
+ }
+
+ gMobFilename[strucnum][0] = '\0';
+ fclose(fp);
+ return(FALSE);
+ }
+
+ /* 03.07.97 Warning about multiple-occupancies */
+ if(gPDBPartialOcc && !gQuiet)
+ {
+ printf(" Warning==> Mobile set contains multiple occupancy \
+atoms.\n");
+ if(gOccRank == 1)
+ printf(" Only the first atom will be \
+considered.\n");
+ }
+
+ /* Allocate coordinate array */
+ if(gMobCoor[strucnum]) free(gMobCoor[strucnum]);
+ if((gMobCoor[strucnum] = (COOR *)malloc(natoms * sizeof(COOR)))
+ == NULL)
+ printf(" Error==> Unable to allocate mobile coordinate \
+memory!\n");
+
+ /* Convert to sequence */
+ if(gMobSeq[strucnum] != NULL) free(gMobSeq[strucnum]);
+ if((gMobSeq[strucnum] = PDB2Seq(gMobPDB[strucnum]))==NULL)
+ printf(" Error==> Unable to read sequence for mobile \
+structure!\n");
+
+ /* Check for inserts */
+ for(p=gMobPDB[strucnum]; p!=NULL; NEXT(p))
+ {
+ if(p->insert[0] != ' ')
+ {
+ if(!gQuiet)
+ {
+ printf(" Warning==> Mobile protein contains \
+insertions.\n");
+ }
+ break;
+ }
+ }
+
+ /* Fix ILE CD to CD1 (03.07.97 was other way round) */
+ for(p=gMobPDB[strucnum]; p!=NULL; NEXT(p))
+ {
+ /* 28.02.01 added ->atnam_raw */
+ if(!strncmp(p->resnam,"ILE ",4) && !strncmp(p->atnam,"CD ",4))
+ {
+ strcpy(p->atnam,"CD1 ");
+ strcpy(p->atnam_raw," CD1"); /* 06.07.06 Removed extra space*/
+ }
+ }
+
+ /* Free Fitted Structure */
+ if(gFitPDB[strucnum] != NULL) FREELIST(gFitPDB[strucnum], PDB);
+ gFitPDB[strucnum] = NULL;
+
+ }
+ else
+ {
+ printf(" ReadStructure(): Internal Error!\n");
+ return(FALSE);
+ }
+
+ gUserRMSAtoms = FALSE;
+ gUserRMSZone = FALSE;
+ gFitted = FALSE;
+
+ /* and close file */
+ fclose(fp);
+
+ return(TRUE);
+}
+
+
+/************************************************************************/
+/*>int InitParser(void)
+ --------------------
+ Initialise the command parser.
+
+ 25.09.92 Original
+ 09.10.92 Changed ALIGN to 0 parameters
+ 19.07.95 Added RESIDUE, HETATOMS & NOHETATOMS
+ 20.07.95 Added WEIGHT/NOWEIGHT
+ 22.07.95 Added GAPPEN
+ 20.11.95 Added READALIGNMENT
+ 31.05.96 Added BVALUE
+ 13.06.96 Added BWEIGHT
+ 11.11.96 BVALUE command now takes 1 or 2 params
+ 18.11.96 Added (NO)IGNOREMISSING
+ 20.12.96 Added NFITTED
+ 12.01.01 Added ITERATE
+ 01.02.01 Added MULTI, QUIET and MWRITE
+ 20.02.01 Added LIMIT
+ 01.03.01 Added optional 'xmas' parameter to REF, MOB, MULTI if
+ XMAS support is compiled in
+ 20.03.01 Added CENTER/CENTRE commands
+ 28.03.01 Added optional 'reference' parameter to WRITE
+ 14.03.08 Added DELZONE command. By: CTP
+ 17.03.08 Added DELRZONE command.
+ 19.03.08 Added SETCENTRE/SETCENTER command.
+ 27.03.08 Added SCRIPT command.
+ 02.04.08 Added HEADER command.
+ 03.04.08 Added PAIRDIST command.
+ 07.04.08 Added DISTCUTOFF command.
+ 15.04.08 Added RETURN command
+ 04.06.08 Added OCCRANK, BZONE and SYMMATOMS commands.
+ Removed RETURN command.
+ 16.07.08 GAPPEN command now takes 1 or 2 params
+ 18.07.08 Altered ALIGN command to take string parameters.
+ Added PRINTALIGN command.
+ 07.08.08 Added SETMULTI command.
+ 16.10.08 Added ALLVSALL, SETREF, ORDERFIT and TRIMZONES commands.
+ 23.10.08 Added WTAVERAGE
+ 27.10.08 Added NOFIT.
+ 07.11.08 Changed STATUS to take 0 or 1 string parameters.
+
+*/
+int InitParser(void)
+{
+ int i;
+
+ /* Initialise returned string array */
+ for(i=0; i<MAXSTRPARAM; i++)
+ gStrParam[i] = (char *)malloc(MAXSTRLEN * sizeof(char));
+
+ /* Construct the gKeyWords */
+#ifdef USE_XMAS
+ MAKEMKEY(gKeyWords[0], "REFERENCE", STRING,1,2);
+ MAKEMKEY(gKeyWords[1], "MOBILE", STRING,1,2);
+#else
+ MAKEMKEY(gKeyWords[0], "REFERENCE", STRING,1,1);
+ MAKEMKEY(gKeyWords[1], "MOBILE", STRING,1,1);
+#endif
+ MAKEMKEY(gKeyWords[2], "FIT", NUMBER,0,0);
+ MAKEMKEY(gKeyWords[3], "ATOMS", STRING,1,1);
+ MAKEMKEY(gKeyWords[4], "ZONE", STRING,1,1);
+ MAKEMKEY(gKeyWords[5], "GRAPHIC", NUMBER,0,0);
+ MAKEMKEY(gKeyWords[6], "ALIGN", STRING,0,2);
+ MAKEMKEY(gKeyWords[7], "RATOMS", STRING,1,1);
+ MAKEMKEY(gKeyWords[8], "RZONE", STRING,1,1);
+ MAKEMKEY(gKeyWords[9], "WRITE", STRING,1,2);
+ MAKEMKEY(gKeyWords[10], "MATRIX", NUMBER,0,0);
+ MAKEMKEY(gKeyWords[11], "STATUS", STRING,0,1);
+ MAKEMKEY(gKeyWords[12], "QUIT", NUMBER,0,0);
+ MAKEMKEY(gKeyWords[13], "NUMBER", STRING,1,1);
+ MAKEMKEY(gKeyWords[14], "RMS", NUMBER,0,0);
+ MAKEMKEY(gKeyWords[15], "RESIDUE", STRING,0,1);
+ MAKEMKEY(gKeyWords[16], "HETATOMS", NUMBER,0,0);
+ MAKEMKEY(gKeyWords[17], "NOHETATOMS", NUMBER,0,0);
+ MAKEMKEY(gKeyWords[18], "WEIGHT", NUMBER,0,0);
+ MAKEMKEY(gKeyWords[19], "NOWEIGHT", NUMBER,0,0);
+ MAKEMKEY(gKeyWords[20], "GAPPEN", NUMBER,1,2);
+ MAKEMKEY(gKeyWords[21], "READALIGNMENT", STRING,1,1);
+ MAKEMKEY(gKeyWords[22], "BVALUE", STRING,1,2);
+ MAKEMKEY(gKeyWords[23], "BWEIGHT", NUMBER,0,0);
+ MAKEMKEY(gKeyWords[24], "IGNOREMISSING", NUMBER,0,0);
+ MAKEMKEY(gKeyWords[25], "NOIGNOREMISSING", NUMBER,0,0);
+ MAKEMKEY(gKeyWords[26], "NFITTED", NUMBER,0,0);
+ MAKEMKEY(gKeyWords[27], "ITERATE", STRING,0,1);
+#ifdef USE_XMAS
+ MAKEMKEY(gKeyWords[28], "MULTI", STRING,1,2);
+#else
+ MAKEMKEY(gKeyWords[28], "MULTI", STRING,1,1);
+#endif
+ MAKEMKEY(gKeyWords[29], "QUIET", STRING,0,1);
+ MAKEMKEY(gKeyWords[30], "MWRITE", STRING,0,1);
+ MAKEMKEY(gKeyWords[31], "LIMIT", STRING,1,2);
+ MAKEMKEY(gKeyWords[32], "CENTRE", STRING,0,1);
+ MAKEMKEY(gKeyWords[33], "CENTER", STRING,0,1);
+ MAKEMKEY(gKeyWords[34], "DELZONE", STRING,1,1);
+ MAKEMKEY(gKeyWords[35], "DELRZONE", STRING,1,1);
+ MAKEMKEY(gKeyWords[36], "SETCENTRE", STRING,1,1);
+ MAKEMKEY(gKeyWords[37], "SETCENTER", STRING,1,1);
+ MAKEMKEY(gKeyWords[38], "SCRIPT", STRING,1,1);
+ MAKEMKEY(gKeyWords[39], "HEADER", STRING,0,1);
+ MAKEMKEY(gKeyWords[40], "PAIRDIST", STRING,0,1);
+ MAKEMKEY(gKeyWords[41], "DISTCUTOFF", STRING,0,1);
+ MAKEMKEY(gKeyWords[42], "OCCRANK", NUMBER,1,1);
+ MAKEMKEY(gKeyWords[43], "BZONE", NUMBER,0,0);
+ MAKEMKEY(gKeyWords[44], "SYMMATOMS", STRING,0,2);
+ MAKEMKEY(gKeyWords[45], "PRINTALIGN", STRING,0,2);
+ MAKEMKEY(gKeyWords[46], "MULTREF", STRING,0,1);
+ MAKEMKEY(gKeyWords[47], "ALLVSALL", STRING,0,1);
+ MAKEMKEY(gKeyWords[48], "SETREF", NUMBER,0,1);
+ MAKEMKEY(gKeyWords[49], "ORDERFIT", STRING,0,0);
+ MAKEMKEY(gKeyWords[50], "TRIMZONES", STRING,0,0);
+ MAKEMKEY(gKeyWords[51], "WTAVERAGE", STRING,0,1);
+ MAKEMKEY(gKeyWords[52], "NOFIT", STRING,0,0);
+
+ return(0);
+}
+
+
+/************************************************************************/
+/*>int DoCommandLoop(FILE *script)
+ -------------------------------
+ Sit and wait for commands to be processed by the parser. Also checks
+ for help commands.
+
+ 25.09.92 Original
+ 28.09.92 Added ReadStructure(), OS commands
+ 02.10.92 Changed gets() to GetKybdString()
+ 17.07.95 Replaced calls to screen() with printf()
+ Back to fgets() rather than GetKybdString()
+ 19.07.95 Added ByRes parameter to ShowRMS() and added RESIDUE, HETATOMS
+ and NOHETATOMS
+ 20.07.95 Added WEIGHT/NOWEIGHT
+ 25.07.95 Changed to use mparse() and RESIDUE command may now take
+ a filename for output. Added filename parameter to ShowRMS()
+ 20.11.95 Added READALIGNMENT handling
+ 21.11.95 Corrected `defaut:' to `default:'
+ 31.05.96 Added BVALUE handling
+ 13.06.96 Added BWEIGHT handling
+ 11.11.96 Handles second parameter to BVALUE
+ 18.11.96 Added IGNOREMISSING handling
+ 20.12.96 Added NFITTED handling
+ 15.01.01 Added ITER handling
+ 01.02.01 Added MULTI, QUIET, MWRITE handling
+ 20.02.01 Added LIMIT handling
+ 01.03.01 Added xmas parameters to REF, MOB, MULTI
+ 20.03.01 Added numeric parameter for ITERATE
+ Added CENTRE/CENTER
+ 28.03.01 Added REFERENCE parameter to WRITE command
+ 14.03.08 Added DELZONE command. By: CTP
+ 17.03.08 Added DELRZONE command.
+ 19.03.08 Added SETCENTRE/SETCENTER command.
+ 27.03.08 Added SCRIPT command. Added parameter - DoCommandLoop() now
+ takes input from FILE (either stdin or from script file).
+ Added comment marker #
+ Prompt only displayed when input from stdin.
+ 02.04.08 Added HEADER command.
+ 03.04.08 Added PAIRDIST command. Added parameter to ShowRMS()
+ 07.04.08 Added DISTCUTOFF command.
+ 16.04.08 Quitting anywhere exits from ProFit.
+ 04.06.08 Added SYMMATOMS command.
+ 16.07.08 GAPPEN now takes one or two parameters.
+ 18.07.08 Altered ALIGN command to call AlignWrapper().
+ Added PRINTALIGN command.
+ 24.07.08 Set RMS, RESIDUE and PAIRDIST commands to cycle through all
+ mobile structures.
+ 07.08.08 Added MULTREF command.
+ 16.10.08 Added ALLVSALL, SETREF, ORDERFIT and TRIMZONES commands.
+ 23.10.08 Added WTAVERAGE command
+ 27.10.08 Added NOFIT command.
+ 07.11.08 Changed STATUS to take optional output filename.
+ 14.11.08 Added GNU Readline Library support.
+ 14.01.09 Added Output to file for PRINTALIGN.
+
+*/
+int DoCommandLoop(FILE *script)
+{
+ int nletters,
+ NParam,
+ i;
+ char comline[MAXBUFF],
+ *ptr;
+
+ for(;;)
+ {
+#ifdef READLINE_SUPPORT
+ if(script == stdin)
+ {
+ /* Get user input using readline */
+ char *line_read;
+ line_read = readline("ProFit>");
+
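+ /* readline() returns NULL on EOF; treat it like fgets() failing */
+ if(line_read == NULL)
+ return(0);
+ if(*line_read)
+ add_history (line_read);
+
+ strncpy(comline, line_read, MAXBUFF-1);
+ comline[MAXBUFF-1] = '\0';
+ TERMINATE(comline);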
+
+ if(line_read)
+ {
+ free(line_read);
+ line_read =(char *)NULL;
+ }
+ }
+ else
+ {
+ /* Read script using fgets */
+ if(!fgets(comline, MAXBUFF, script))
+ return(0);
+ TERMINATE(comline);
+ }
+#else
+ /* Get all input using fgets */
+ if(script == stdin)
+ stdprompt("ProFit");
+
+ if(!fgets(comline, MAXBUFF, script))
+ return(0);
+ TERMINATE(comline);
+#endif
+
+ /* Any line which starts with a $ is passed to the OS */
+ if(comline[0] == '$')
+ {
+ ptr = comline+1;
+ system(ptr);
+ continue;
+ }
+
+ /* Any line which starts with a # is echoed to the screen */
+ ptr = KillLeadSpaces(comline);
+ if(ptr[0] == '#')
+ {
+ if(!gQuiet)
+ printf(" %s\n",comline);
+ continue;
+ }
+
+ /* We need to check HELP outside the main parser as it has an
+ optional parameter (N.B. mparse() could now handle this)
+ */
+ if(match(comline,"HELP",&nletters))
+ {
+ if(nletters == 4)
+ {
+ DoHelp(comline,HELPFILE);
+ continue;
+ }
+ }
+
+ /* Main parser */
+ switch(mparse(comline,NCOMM,gKeyWords,gNumParam,gStrParam,&NParam))
+ {
+ case PARSE_ERRC:
+ printf(" Unrecognised keyword: %s\n",comline);
+ break;
+ case PARSE_ERRP:
+ printf(" Invalid parameters: %s\n",comline);
+ break;
+ case PARSE_COMMENT:
+ break;
+ case 0: /* REFERENCE */
+ gMultiCount = 1;
+ if(NParam==1)
+ ReadStructure(STRUC_REFERENCE,gStrParam[0],0,FALSE);
+ else
+ ReadStructure(STRUC_REFERENCE,gStrParam[1],0,TRUE);
+ break;
+ case 1: /* MOBILE */
+ gMultiCount = 1;
+ if(NParam==1)
+ ReadStructure(STRUC_MOBILE,gStrParam[0],0,FALSE);
+ else
+ ReadStructure(STRUC_MOBILE,gStrParam[1],0,TRUE);
+ break;
+ case 2: /* FIT */
+ FitStructures();
+ break;
+ case 3: /* ATOMS */
+ if(gIterate)
+ {
+ printf(" Warning==> You cannot change the atoms when \
+ITERATE is set.\n");
+ printf(" Command ignored\n");
+ }
+ else
+ {
+ SetFitAtoms(gStrParam[0]);
+ }
+ break;
+ case 4: /* ZONE */
+ SetFitZone(gStrParam[0], -1);
+ break;
+ case 5: /* GRAPHIC */
+ GraphicAlign();
+ break;
+ case 6: /* ALIGN */
+ /* CTP: Change to new alignment code and new command */
+ /*
+ for(i=0; i< gMultiCount; i++)
+ NWAlign(i);
+ break;
+ */
+ if(NParam==0)
+ {
+ AlignmentWrapper(-1,NULL,FALSE);
+ }
+ else if(NParam==1)
+ {
+ AlignmentWrapper(-1,gStrParam[0],FALSE);
+ }
+ else if(NParam==2 && !upstrncmp(gStrParam[1],"APPEND",6))
+ {
+ AlignmentWrapper(-1,gStrParam[0],TRUE);
+ }
+ else
+ {
+ printf(" Error: Command format is ");
+ printf("ALIGN [[WHOLE|*]|zone [APPEND]]\n");
+ }
+ break;
+ case 7: /* RATOMS */
+ SetRMSAtoms(gStrParam[0]);
+ break;
+ case 8: /* RZONE */
+ SetRMSZone(gStrParam[0]);
+ break;
+ case 9: /* WRITE */
+ if(NParam == 2)
+ {
+ if(!upstrncmp(gStrParam[0],"REF", 3))
+ {
+ WriteCoordinates(gStrParam[1], -1);
+ }
+ else
+ {
+ printf(" Error==> Invalid qualifier for WRITE: %s\n",
+ gStrParam[0]);
+ }
+ }
+ else
+ {
+ WriteCoordinates(gStrParam[0], 0);
+ }
+ break;
+ case 10: /* MATRIX */
+ ShowMatrix();
+ break;
+ case 11: /* STATUS */
+ ShowStatus(NParam?gStrParam[0]:NULL);
+ break;
+ case 12: /* QUIT */
+ Cleanup();
+ exit(0);
+ break;
+ case 13: /* NUMBER */
+ SetZoneStatus(gStrParam[0]);
+ break;
+ case 14: /* RMS */
+ if(gFitted)
+ {
+ if(gMultiCount == 1)
+ {
+ ShowRMS(FALSE,NULL,0,FALSE,FALSE);
+ }
+ else
+ {
+ int i;
+ for(i=0;i<gMultiCount;i++)
+ {
+ ShowRMS(FALSE,NULL,i,FALSE,FALSE);
+ }
+ }
+ }
+ else
+ {
+ printf(" Warning: Structures have not been fitted.\n");
+ }
+ break;
+ case 15: /* RESIDUE */
+ if(gFitted)
+ {
+ if(gMultiCount == 1)
+ {
+ ShowRMS(TRUE,(NParam?gStrParam[0]:NULL),0,FALSE,FALSE);
+ }
+ else
+ {
+ int i;
+ for(i=0;i<gMultiCount;i++)
+ ShowRMS(TRUE,(NParam?gStrParam[0]:NULL),i,FALSE,FALSE);
+ }
+ }
+ else
+ {
+ printf(" Warning: Structures have not been fitted.\n");
+ }
+ break;
+ case 16: /* HETATOMS */
+ gHetAtoms = TRUE;
+ printf(" Hetatoms will be read with future MOBILE or \
+REFERENCE commands\n");
+ break;
+ case 17: /* NOHETATOMS */
+ gHetAtoms = FALSE;
+ printf(" Hetatoms will be ignored with future MOBILE or \
+REFERENCE commands\n");
+ break;
+ case 18: /* WEIGHT */
+ gDoWeights = WEIGHT_BVAL;
+ break;
+ case 19: /* NOWEIGHT */
+ gDoWeights = WEIGHT_NONE;
+ break;
+ case 20: /* GAPPEN */
+ gGapPen = (int)gNumParam[0];
+ if(NParam == 2)
+ {
+ gGapPenExt = (int)gNumParam[1];
+ }
+ break;
+ case 21: /* READALIGNMENT */
+ ReadAlignment(gStrParam[0]);
+ break;
+ case 22: /* BVALUE */
+ if((!isdigit(gStrParam[0][0])) &&
+ (gStrParam[0][0] != '-') &&
+ (gStrParam[0][0] != '+') &&
+ (gStrParam[0][0] != '.'))
+ {
+ gUseBVal = 0;
+ printf(" Atoms will be included regardless of B-value\n");
+ }
+ else
+ {
+ if(NParam == 2)
+ {
+ if(!upstrncmp(gStrParam[1],"REF",3))
+ {
+ gUseBVal = 2;
+ }
+ else if(!upstrncmp(gStrParam[1],"MOB",3))
+ {
+ gUseBVal = 3;
+ }
+ else
+ {
+ printf(" %s is not a valid parameter to BVALUE. \
+Command ignored.\n",gStrParam[1]);
+ break;
+ }
+ }
+ else
+ {
+ gUseBVal = 1;
+ }
+ sscanf(gStrParam[0], "%lf", &gBValue);
+ }
+ break;
+ case 23: /* BWEIGHT */
+ gDoWeights = WEIGHT_INVBVAL;
+ break;
+ case 24: /* IGNOREMISSING */
+ gIgnoreMissing = TRUE;
+ break;
+ case 25: /* NOIGNOREMISSING */
+ gIgnoreMissing = FALSE;
+ break;
+ case 26: /* NFITTED */
+ ShowNFitted();
+ break;
+ case 27: /* ITERATE */
+ if((NParam == 1) && !upstrncmp(gStrParam[0], "OFF",3))
+ {
+ gIterate = FALSE;
+ }
+ else
+ {
+ int ok = TRUE;
+
+ /* If a parameter specified use this as the cutoff for adding
+ or removing equivalenced pairs
+ */
+ if(NParam == 1)
+ {
+ if(!sscanf(gStrParam[0], "%lf", &gMaxEquivDistSq))
+ {
+ printf(" Error==> Couldn't read cutoff from \
+parameter\n");
+ break;
+ }
+ gMaxEquivDistSq *= gMaxEquivDistSq;
+ }
+
+ /* CTP: Removed this check as we now handle multiple chains */
+ /* Check for numbers of chains */
+/***
+ if(countchar(gRefSeq,'*') > 0)
+ {
+ printf(" Error==> Structures must have only one chain \
+for iterative zones\n");
+ ok = FALSE;
+ }
+ for(i=0; i<gMultiCount; i++)
+ {
+ if(countchar(gMobSeq[i],'*') > 0)
+ {
+ printf(" Error==> Structures must have only one \
+chain for iterative zones\n");
+ ok = FALSE;
+ break;
+ }
+ }
+***/
+
+ if(ok)
+ {
+ gIterate = TRUE;
+ SetFitAtoms("CA");
+ if(!gQuiet)
+ {
+ printf(" Info==> Setting atom selection to CA \
+only\n");
+ }
+ }
+ }
+ break;
+ case 28: /* MULTI */
+ if(NParam==1)
+ ReadMulti(gStrParam[0], FALSE);
+ else
+ ReadMulti(gStrParam[0], TRUE);
+ break;
+ case 29: /* QUIET */
+ if((NParam == 1) && !upstrncmp(gStrParam[0], "OFF",3))
+ {
+ gQuiet = FALSE;
+ }
+ else
+ {
+ gQuiet = TRUE;
+ }
+ break;
+ case 30: /* MWRITE */
+ if(NParam == 1)
+ {
+ WriteMulti(gStrParam[0]);
+ }
+ else
+ {
+ WriteMulti("fit");
+ }
+ break;
+ case 31: /* LIMIT */
+ if((NParam == 1) && (!upstrncmp(gStrParam[0], "OFF",3)))
+ {
+ gLimit[0] = gLimit[1] = (-1);
+ break;
+ }
+ else if(NParam == 2)
+ {
+ if(sscanf(gStrParam[0], "%d", &(gLimit[0])) &&
+ sscanf(gStrParam[1], "%d", &(gLimit[1])))
+ {
+ break;
+ }
+ }
+
+ /* Error message */
+ printf(" Invalid parameters: %s\n",comline);
+ break;
+ case 32: /* CENTRE */
+ case 33: /* CENTER */
+ if((NParam == 1) && (!upstrncmp(gStrParam[0], "OFF",3)))
+ {
+ gCentre = FALSE;
+ }
+ else
+ {
+ gCentre = TRUE;
+ }
+ break;
+ case 34: /* DELZONE */
+ DelFitZone(gStrParam[0], -1);
+ break;
+ case 35: /* DELRZONE */
+ DelRMSZone(gStrParam[0], -1);
+ break;
+ case 36: /* SETCENTRE */
+ case 37: /* SETCENTER */
+ SetCentreResidue(gStrParam[0]);
+ break;
+ case 38: /* SCRIPT */
+ RunScript(gStrParam[0]);
+ break;
+ case 39: /* HEADER */
+ if((NParam == 1) && !upstrncmp(gStrParam[0], "OFF",3))
+ {
+ gReadHeader = FALSE;
+ }
+ else
+ {
+ gReadHeader = TRUE;
+ }
+ break;
+ case 40: /* PAIRDIST */
+ if(gFitted)
+ {
+ if(gMultiCount == 1)
+ {
+ ShowRMS(TRUE,(NParam?gStrParam[0]:NULL),0,FALSE,TRUE);
+ }
+ else
+ {
+ int i = 0;
+ for(i = 0; i<gMultiCount; i++)
+ {
+ printf("\n Mobile Structure: %d\n",i+1);
+ ShowRMS(TRUE,(NParam?gStrParam[0]:NULL),i,FALSE,TRUE);
+ }
+ }
+ }
+ else
+ {
+ printf(" Warning: Structures have not been fitted.\n");
+ }
+
+ break;
+ case 41: /* DISTCUTOFF */
+ if((NParam == 1) && !upstrncmp(gStrParam[0], "OFF",3))
+ {
+ gUseDistCutoff = FALSE;
+ }
+ else if((NParam == 1) && !upstrncmp(gStrParam[0], "ON",2))
+ {
+ gUseDistCutoff = TRUE;
+ }
+ else
+ {
+ sscanf(gStrParam[0], "%lf", &gDistCutoff);
+ gUseDistCutoff = TRUE;
+ }
+ if(!gQuiet)
+ {
+ if(gUseDistCutoff)
+ {
+ printf(" Atom pairs will be discarded if their \
+interatomic distance is > %.2f\n", gDistCutoff);
+ if(!gDistCutoff)
+ {
+ printf(" Warning: No atom pairs will be \
+included when the cutoff is set to zero.\n");
+ }
+ }
+ else
+ {
+ printf(" Atom pairs will be included regardless\
+ of interatomic distance\n");
+ }
+ }
+ break;
+ case 42: /* OCCRANK */
+ if((int)gNumParam[0] >= 1)
+ gOccRank = (int)gNumParam[0];
+ else
+ printf(" Error: Occupancy rank must be >= 1\n");
+ break;
+ case 43: /* BZONE */
+ SetZoneFromBValCol();
+ break;
+ case 44: /* SYMMATOMS */
+ /* Set Status */
+ if(NParam)
+ {
+ if(!upstrncmp(gStrParam[0],"ON", 2) ||
+ !upstrncmp(gStrParam[1],"ON", 2) ||
+ !upstrncmp(gStrParam[0],"ALL",3))
+ gMatchSymAtoms = TRUE;
+
+ if(!upstrncmp(gStrParam[0],"OFF",3) ||
+ (!upstrncmp(gStrParam[0],"ALL",3) &&
+ !upstrncmp(gStrParam[1],"OFF",3)))
+ gMatchSymAtoms = FALSE;
+
+ for(i=0; i < SYMM_ATM_PAIRS; i++)
+ {
+ if(!upstrncmp(gStrParam[0],gSymType[i][0],3) ||
+ !upstrncmp(gStrParam[0],"ALL",3))
+ {
+ if(!upstrncmp(gStrParam[1],"OFF",3))
+ gSymType[i][3][0] = FALSE;
+ else
+ gSymType[i][3][0] = TRUE;
+ }
+ }
+ }
+
+ /* Print Status */
+ if(gMatchSymAtoms)
+ printf(" Match Symmetric Atoms is ON\n");
+ else
+ printf(" Match Symmetric Atoms is OFF\n");
+
+ printf(" Atom pairs matched:\n");
+
+ for(i=0; i<SYMM_ATM_PAIRS; i++)
+ {
+ printf(" %s%s -%s ",gSymType[i][0],
+ gSymType[i][1],gSymType[i][2]);
+
+ if(gSymType[i][3][0] == TRUE)
+ printf(" ON\n");
+ else
+ printf(" OFF\n");
+ }
+
+ break;
+ case 45: /* PRINTALIGN */
+ /* Error if no user fit zones */
+ if(!gUserFitZone)
+ {
+ printf(" Error: No user-defined zones found.\n");
+ break;
+ }
+
+ /* Parameter FASTA prints a FASTA-formatted pairwise alignment */
+ if(NParam>=1 && !upstrncmp(gStrParam[0],"FASTA",5) &&
+ strlen(gStrParam[0]) == 5)
+ {
+ AlignmentFromZones((NParam == 2)?gStrParam[1]:NULL, TRUE);
+ }
+ else if(NParam>=1 && !upstrncmp(gStrParam[0],"PIR",3) &&
+ strlen(gStrParam[0]) == 3)
+ {
+ AlignmentFromZones_PIR((NParam == 2)?gStrParam[1]:NULL);
+ }
+ else
+ {
+ AlignmentFromZones(NParam?gStrParam[0]:NULL, FALSE);
+ }
+ break;
+ case 46: /* MULTREF */
+ if(NParam==1 && !upstrncmp(gStrParam[0],"OFF",3))
+ {
+ gMultiVsRef = FALSE;
+ /* Reset reference to current multistructure reference */
+ SetMobileToReference(gMultiRef);
+ printf(" Multi: RMS, RESIDUE, PAIRDIST and MATRIX \
+commands\n");
+ printf(" compare with mobile structure: %d.\n",
+ gMultiRef+1);
+
+ }
+ else
+ {
+ gMultiVsRef = TRUE;
+ printf(" Multi: RMS, RESIDUE, PAIRDIST and MATRIX \
+commands\n");
+ printf(" compare with Averaged Reference \
+structure.\n");
+ }
+ break;
+ case 47: /* ALLVSALL */
+ if(gMultiCount > 1 && !gIterate)
+ {
+ AllVsAllRMS((NParam?gStrParam[0]:NULL),
+ (NParam?TRUE:!gQuiet),FALSE);
+ }
+ else
+ {
+ /* Error Messages */
+ if(gMultiCount < 2)
+ {
+ printf(" Error: All vs all comparison ");
+ printf("can only be used with MULTI.\n");
+ }
+ if(gIterate)
+ {
+ printf(" Error: All vs all comparison ");
+ printf("cannot be used with iterative zones.\n");
+ }
+ }
+ break;
+ case 48: /* SETREF */
+ if(gMultiCount > 1)
+ {
+ if(NParam==0)
+ {
+ if(!gIterate)
+ {
+ /* Automated set */
+ AllVsAllRMS(NULL,FALSE,TRUE);
+ }
+ else
+ {
+ /* Error: Cannot use with iterative zones */
+ printf(" Error: Automated reference selection ");
+ printf("cannot be used with iterative zones.\n");
+
+ }
+ }
+ else
+ {
+ if((int)gNumParam[0] > 0 &&
+ (int)gNumParam[0] <= gMultiCount)
+ {
+ /* Manual set */
+ SetMobileToReference((int)gNumParam[0] - 1);
+ gMultiRef = (int)gNumParam[0] - 1;
+ if(!gQuiet)
+ printf(" Reference set to mobile %d\n",
+ (int)gNumParam[0]);
+
+ }
+ else
+ {
+ /* Structure number out of range */
+ printf(" Error: ");
+ printf("Number entered must be between 1 and %d\n",
+ gMultiCount);
+ }
+ }
+ }
+ else
+ {
+ /* Error: Can only use with MULTI */
+ printf(" Error: ");
+ printf("SETREF can only be used with MULTI.\n");
+ }
+
+ break;
+ case 49: /* ORDERFIT */
+ FitStructuresWrapper();
+ break;
+ case 50: /* TRIMZONES */
+ if(gMultiCount > 1)
+ {
+ TrimZones();
+ }
+ else
+ {
+ printf(" Error: ");
+ printf("TRIMZONES can only be used with MULTI.\n");
+ }
+ break;
+ case 51: /* WTAVERAGE */
+
+ if(NParam==1 && !upstrncmp(gStrParam[0],"OFF",3))
+ {
+ gWtAverage = FALSE;
+ }
+ else if((NParam==1 && !upstrncmp(gStrParam[0],"ON",3)) ||
+ (NParam==0))
+ {
+ gWtAverage = TRUE;
+ }
+ break;
+ case 52: /* NOFIT */
+ NoFitStructures();
+ break;
+ default:
+ break;
+ }
+ }
+ return(0);
+}
+
+
+/************************************************************************/
+/*>void SetFitAtoms(char *command)
+ -------------------------------
+ This splits up a list of atoms names and inserts them in the
+ fitatoms list.
+
+ 28.09.92 Framework
+ 29.09.92 Original
+ 30.09.92 Added NOT option; pad fitatoms[]
+ 17.07.95 Replaced calls to screen() with printf()
+ 23.07.96 Changed to call PADMINTERM() rather than padterm()
+ Checks for legal wildcard specifications
+ Improved out-of-bounds error checking
+ 13.03.09 Added temporary fix for atom type recognition.
+*/
+void SetFitAtoms(char *command)
+{
+ int comptr,
+ comstart,
+ atmnum,
+ atmpos;
+ char *cmnd;
+
+ /* Put to upper case */
+ UPPER(command);
+
+ /* Assume this is not a NOT selection */
+ gNOTFitAtoms = FALSE;
+ cmnd = command;
+
+ /* Set NOT flag if required and step over the symbol */
+ if(command[0] == '~' || command[0] == '^')
+ {
+ gNOTFitAtoms = TRUE;
+ cmnd++;
+ }
+
+ /* Blank all fitatoms */
+ for(atmnum=0; atmnum<NUMTYPES; atmnum++)
+ for(atmpos=0; atmpos<8; atmpos++)
+ gFitAtoms[atmnum][atmpos] = '\0';
+
+ atmnum=0;
+ atmpos=0;
+ comstart=0;
+ for(comptr=0;comptr<strlen(cmnd);comptr++)
+ {
+ if(cmnd[comptr]==',')
+ {
+ comstart=comptr+1;
+ gFitAtoms[atmnum][atmpos] = '\0'; /* Terminate string */
+ PADMINTERM(gFitAtoms[atmnum],4);
+ if(++atmnum >= NUMTYPES)
+ {
+ if(!gQuiet)
+ {
+ printf(" Warning==> Too many atoms in specification. \
+Only %d used\n",NUMTYPES);
+ }
+ break;
+ }
+ atmpos=0;
+ }
+ else
+ {
+ if(atmpos >= MAXATSPEC)
+ {
+ TERMAT(cmnd+comptr, ',');
+ if(!gQuiet)
+ {
+ printf(" Warning==> Atom name specification too long \
+(%s)\n", cmnd+comstart);
+ printf(" Using all atoms.\n");
+ }
+ gFitAtoms[0][0] = '*';
+ gFitAtoms[0][1] = '\0';
+ break;
+ }
+
+ gFitAtoms[atmnum][atmpos++] = cmnd[comptr];
+ }
+ }
+
+ /* Terminate last one */
+ gFitAtoms[atmnum][atmpos] = '\0';
+ PADMINTERM(gFitAtoms[atmnum],4);
+
+ /* See if a * was specified not in first position; move it if so.
+ Also check for legal atom wildcard specifications
+ */
+ for(atmnum=0; atmnum<NUMTYPES; atmnum++)
+ {
+ if(!LegalAtomSpec(gFitAtoms[atmnum]))
+ {
+ if(!gQuiet)
+ {
+ printf(" Warning==> Illegal atom specification (%s). Using \
+all atoms\n",
+ gFitAtoms[atmnum]);
+ }
+ gFitAtoms[0][0] = '*';
+ gFitAtoms[0][1] = '\0';
+ break;
+ }
+
+ if(gFitAtoms[atmnum][0] == '*')
+ {
+ if(gNOTFitAtoms)
+ {
+ if(!gQuiet)
+ {
+ printf(" Warning==> NOT option ignored.\n");
+ }
+ gNOTFitAtoms = FALSE;
+ }
+ gFitAtoms[0][0] = '*';
+ gFitAtoms[0][1] = '\0';
+ break;
+ }
+ }
+
+ gFitted = FALSE;
+
+
+ /* Bugfix - Atom Recognition By: CTP */
+ /* Temporary fix for atom type recognition. Removed trailing spaces
+ for atom type strings. Need to check with ACRM why strings were
+ padded. */
+ if(TRUE)
+ {
+ int i = 0;
+ for(i=0;i<NUMTYPES;i++)
+ {
+ if(gFitAtoms[i][0] == '\0') break;
+ KILLTRAILSPACES(gFitAtoms[i]);
+ }
+ }
+ /* End of Bugfix */
+
+
+ return;
+}
+
+
+/************************************************************************/
+/*>void SetFitZone(char *command, int strucnum)
+ --------------------------------------------
+ Processes commands to specify the zone for fitting.
+
+ 28.09.92 Original
+ 29.09.92 Added gCurrentMode
+ 01.10.92 Added gUserFitZone flag
+ 17.07.95 Replaced calls to screen() with printf()
+ 18.07.95 Added initialisation of inserts in zones
+ 19.07.95 * on its own equivalent to CLEAR
+ 18.06.96 Replaced MODE_* with ZONE_MODE_*
+ 01.02.01 Added multi-structure support incl. strucnum param
+ 28.02.01 Error message from ParseZone() when trying to specify multiple
+ zones for multi-structure now here instead of in ParseZone()
+ 11.04.08 Added warning for overlapping zones. By: CTP
+ 22.04.08 Added handling of lowercase chain and inserts.
+*/
+void SetFitZone(char *command, int strucnum)
+{
+ int start1, stop1,
+ start2, stop2,
+ SeqZone, maxstruc,
+ snum;
+ char chain1, chain2,
+ startinsert1, stopinsert1,
+ startinsert2, stopinsert2;
+ ZONE *z;
+ int warned = 0;
+
+ snum = strucnum;
+
+ /* See if this is clearing the zones */
+ if(!gUserFitZone || !upstrncmp(command,"CLEAR",5) ||
+ !strcmp(command,"*"))
+ {
+ if(strucnum > (-1))
+ {
+ if(gZoneList[snum] != NULL)
+ {
+ FREELIST(gZoneList[snum],ZONE);
+ gZoneList[snum] = NULL;
+ }
+ }
+ else
+ {
+ for(snum=0; snum<gMultiCount; snum++)
+ {
+ if(gZoneList[snum] != NULL)
+ {
+ FREELIST(gZoneList[snum],ZONE);
+ gZoneList[snum] = NULL;
+ }
+ }
+ }
+ gUserFitZone = FALSE;
+ if(!upstrncmp(command,"CLEAR",5) || !strcmp(command,"*"))
+ return;
+ }
+
+ /* If strucnum is -1 then set range so we will do all structures */
+ if(strucnum == (-1))
+ {
+ snum = 0;
+ maxstruc = gMultiCount;
+ }
+ else /* Just do the one structure specified */
+ {
+ maxstruc = strucnum+1;
+ }
+
+ for(; snum<maxstruc; snum++)
+ {
+ SeqZone = ParseZone(command, &start1, &stop1, &chain1,
+ &startinsert1, &stopinsert1,
+ &start2, &stop2, &chain2,
+ &startinsert2, &stopinsert2, snum);
+ if((SeqZone == (-2)) && (!warned))
+ {
+ printf(" Error==> You cannot specify zones for each \
+structure when performing\n");
+ printf(" multiple structure fitting.\n");
+ warned = 1;
+ }
+
+ if(SeqZone > -1)
+ {
+ /* Allocate an entry in zone list */
+ if(gZoneList[snum])
+ {
+ /* Move to end of zone list */
+ z=gZoneList[snum];
+ LAST(z);
+ ALLOCNEXT(z,ZONE);
+ }
+ else
+ {
+ INIT(gZoneList[snum],ZONE);
+ z = gZoneList[snum];
+ }
+
+ if(z==NULL)
+ {
+ printf(" Error==> No memory for zone!\n");
+ }
+ else
+ {
+ /* Add this zone to the zone list */
+ z->chain1 = chain1;
+ z->start1 = start1;
+ z->startinsert1 = startinsert1;
+ z->stop1 = stop1;
+ z->stopinsert1 = stopinsert1;
+ z->chain2 = chain2;
+ z->start2 = start2;
+ z->startinsert2 = startinsert2;
+ z->stop2 = stop2;
+ z->stopinsert2 = stopinsert2;
+ z->mode = SeqZone ?
+ ZONE_MODE_SEQUENTIAL : gCurrentMode;
+
+ /* CTP: Check for overlap */
+ if(CheckOverlap(z,gZoneList[snum],snum) > 1)
+ printf(" Warning: New zone overlaps existing zone.\n");
+ else if(CheckOverlap(z,gZoneList[snum],snum) == -1)
+ printf(" Error: Failed to find new zone.\n");
+
+ }
+ }
+ }
+
+ gFitted = FALSE;
+ gUserFitZone = TRUE;
+
+ return;
+}
+
+/************************************************************************/
+/*>void DelFitZone(char *command, int strucnum)
+ --------------------------------------------
+ Processes commands to delete a zone for fitting.
+
+ 14.03.08 Original based on SetFitZone() By: CTP
+ 22.04.08 Added handling of lowercase chain and inserts.
+ 11.02.09 Changed CLEAR to ALL.
+*/
+void DelFitZone(char *command, int strucnum)
+{
+ int start1, stop1;
+ int start2, stop2;
+ int SeqZone, maxstruc;
+ int snum;
+ char chain1, chain2;
+ char startinsert1, stopinsert1;
+ char startinsert2, stopinsert2;
+ ZONE *z;
+ int warned = 0;
+
+ snum = strucnum;
+
+ /* Clear All Zones */
+ if(!gUserFitZone || !upstrncmp(command,"ALL",3) ||
+ !strcmp(command,"*"))
+ {
+ if(strucnum > (-1))
+ {
+ if(gZoneList[snum] != NULL)
+ {
+ FREELIST(gZoneList[snum],ZONE);
+ gZoneList[snum] = NULL;
+ }
+ }
+ else
+ {
+ for(snum=0; snum<gMultiCount; snum++)
+ {
+ if(gZoneList[snum] != NULL)
+ {
+ FREELIST(gZoneList[snum],ZONE);
+ gZoneList[snum] = NULL;
+ }
+ }
+ }
+ gUserFitZone = FALSE;
+ if(!upstrncmp(command,"ALL",3) || !strcmp(command,"*"))
+ {
+ printf(" All zones cleared.\n");
+ return;
+ }
+ }
+
+ /* If strucnum is -1 then set range so we will do all structures */
+ if(strucnum == (-1))
+ {
+ snum = 0;
+ maxstruc = gMultiCount;
+ }
+ else /* Just do the one structure specified */
+ {
+ maxstruc = strucnum+1;
+ }
+
+ /* Main Loop: Find and Delete Zone */
+ for(;snum<maxstruc; snum++)
+ {
+ /* Parse Zone */
+ SeqZone = ParseZone(command, &start1, &stop1, &chain1,
+ &startinsert1, &stopinsert1,
+ &start2, &stop2, &chain2,
+ &startinsert2, &stopinsert2, snum);
+
+ if((SeqZone == (-2)) && (!warned))
+ {
+ printf(" Error==> You cannot specify zones for each \
+structure when performing\n");
+ printf(" multiple structure fitting.\n");
+ warned = 1;
+ }
+
+ /* Compare with Zone list */
+ z=gZoneList[snum];
+
+ while(z!=NULL)
+ {
+ /* Compare with current zone */
+ if((z->chain1 == chain1) &&
+ (z->start1 == start1) &&
+ (z->startinsert1 == startinsert1) &&
+ (z->stop1 == stop1) &&
+ (z->stopinsert1 == stopinsert1) &&
+ (z->chain2 == chain2) &&
+ (z->start2 == start2) &&
+ (z->startinsert2 == startinsert2) &&
+ (z->stop2 == stop2) &&
+ (z->stopinsert2 == stopinsert2) &&
+ (z->mode == (SeqZone ?
+ ZONE_MODE_SEQUENTIAL : gCurrentMode)))
+ {
+ /* Delete Zone and Return */
+ DELETE(gZoneList[snum],z,ZONE);
+ gFitted = FALSE;
+ if(gZoneList[snum]!=NULL)
+ gUserFitZone = TRUE;
+ else
+ gUserFitZone = FALSE;
+ printf(" Zone Deleted\n");
+ return;
+ }
+ NEXT(z);
+ }
+ }
+ /* Zone Not Found */
+ printf(" No matching zone found.\n");
+ return;
+}
+
+
+/************************************************************************/
+/*>void DelRMSZone(char *command, int strucnum)
+ --------------------------------------------
+ Processes commands to delete a user-defined RMSd comparison zone.
+
+ 17.03.08 Original based on DelFitZone() By: CTP
+ 22.04.08 Added handling of lowercase chain and inserts.
+ 11.02.09 Changed CLEAR to ALL.
+*/
+void DelRMSZone(char *command, int strucnum)
+{
+ int start1, stop1;
+ int start2, stop2;
+ int SeqZone, maxstruc;
+ int snum;
+ char chain1, chain2;
+ char startinsert1, stopinsert1;
+ char startinsert2, stopinsert2;
+ ZONE *z;
+ int warned = 0;
+
+ snum = strucnum;
+
+ /* Clear All Zones */
+ if(!upstrncmp(command,"ALL",3) || !strcmp(command,"*"))
+ {
+ if(strucnum > (-1))
+ {
+ if(gRZoneList[snum] != NULL)
+ {
+ FREELIST(gRZoneList[snum],ZONE);
+ gRZoneList[snum] = NULL;
+ }
+ }
+ else
+ {
+ for(snum=0; snum<gMultiCount; snum++)
+ {
+ if(gRZoneList[snum] != NULL)
+ {
+ FREELIST(gRZoneList[snum],ZONE);
+ gRZoneList[snum] = NULL;
+ }
+ }
+ }
+ gUserRMSZone = TRUE;
+ printf(" All zones cleared.\n");
+ return;
+ }
+
+ /* If strucnum is -1 then set range so we will do all structures */
+ if(strucnum == (-1))
+ {
+ snum = 0;
+ maxstruc = gMultiCount;
+ }
+ else /* Just do the one structure specified */
+ {
+ maxstruc = strucnum+1;
+ }
+
+ /* Main Loop: Find and Delete RMSd Calc Zone */
+ for(;snum<maxstruc; snum++)
+ {
+ /* Parse Zone */
+ SeqZone = ParseZone(command, &start1, &stop1, &chain1,
+ &startinsert1, &stopinsert1,
+ &start2, &stop2, &chain2,
+ &startinsert2, &stopinsert2, snum);
+
+ if((SeqZone == (-2)) && (!warned))
+ {
+ printf(" Error==> You cannot specify zones for each \
+structure when performing\n");
+ printf(" multiple structure fitting.\n");
+ warned = 1;
+ }
+
+ /* Compare with Zone list */
+ z=gRZoneList[snum];
+
+ while(z!=NULL)
+ {
+ /* Compare to current zone */
+ if((z->chain1 == chain1) &&
+ (z->start1 == start1) &&
+ (z->startinsert1 == startinsert1) &&
+ (z->stop1 == stop1) &&
+ (z->stopinsert1 == stopinsert1) &&
+ (z->chain2 == chain2) &&
+ (z->start2 == start2) &&
+ (z->startinsert2 == startinsert2) &&
+ (z->stop2 == stop2) &&
+ (z->stopinsert2 == stopinsert2) &&
+ (z->mode == (SeqZone ?
+ ZONE_MODE_SEQUENTIAL : gCurrentMode)))
+ {
+ /* Delete Zone and Return */
+ DELETE(gRZoneList[snum],z,ZONE);
+ gUserRMSZone = TRUE;
+ printf(" Zone Deleted\n");
+ return;
+ }
+ NEXT(z);
+ }
+ }
+ /* Zone Not Found */
+ printf(" No matching zone found.\n");
+ return;
+}
+
+
+/************************************************************************/
+/*>void SetZoneStatus(char *status)
+ --------------------------------
+ Processes the command to set the mode for interpretation of residue
+ numbers.
+
+ 28.09.92 Framework
+ 29.09.92 Original
+ 08.10.93 Modified only to toupper lowercase letters
+ 17.07.95 Replaced calls to screen() with printf()
+ 18.06.96 Replaced MODE_* with ZONE_MODE_*
+*/
+void SetZoneStatus(char *status)
+{
+ char ch;
+
+ ch = status[0];
+ if(islower(ch)) ch = toupper(ch);
+
+ if(ch == 'R')
+ gCurrentMode = ZONE_MODE_RESNUM;
+ else if(ch == 'S')
+ gCurrentMode = ZONE_MODE_SEQUENTIAL;
+ else
+ printf(" Error==> Invalid numbering mode. Must be RESIDUE or \
+SEQUENTIAL\n");
+
+ return;
+}
+
+
+/************************************************************************/
+/*>int GetResSpec(char *resspec, int *resnum, char *chain, char *insert)
+ ---------------------------------------------------------------------
+ Extracts residue number and chain name from a string. If the string is
+ blank, the residue number will be specified as -999 and the chain name
+ will be unmodified.
+
+ 29.09.92 Original
+ 17.07.95 Returns insert code. Uses ParseResSpec()
+ 20.02.01 Now makes its own copy of the string with any \ characters
+ removed
+ 20.02.01 -999 for start or end of structure rather than -1
+ 22.04.08 Set to use ParseResSpecNoUpper() allowing function to deal
+ with lowercase chain names and inserts. By: CTP
+*/
+int GetResSpec(char *resspec, int *resnum, char *chain, char *insert)
+{
+ char *ptr, *buffer;
+ int i,
+ retval = 0;
+
+ if((buffer = (char *)malloc(strlen(resspec)+1))==NULL)
+ {
+ printf(" Error==> no memory for copy of residue spec\n");
+ return(2);
+ }
+
+ /* Move pointer over any spaces */
+ for(ptr=resspec; *ptr==' '||*ptr=='\t'; ptr++) ;
+
+ /* Copy the resspec string skipping \ characters */
+ for(i=0; *ptr; ptr++)
+ {
+ if(*ptr != '\\')
+ buffer[i++] = *ptr;
+ }
+ buffer[i] = '\0';
+
+ /* Assume blank insert code */
+ *insert = ' ';
+
+ /* If the zone is blank, it will be all residues */
+ if(*buffer == '\0')
+ {
+ *resnum = -999;
+ free(buffer);
+ return(0);
+ }
+
+ if(!ParseResSpecNoUpper(buffer, chain, resnum, insert))
+ retval = 1; /* Indicates an error */
+
+ free(buffer);
+ return(retval);
+}
+
+
+/************************************************************************/
+/*>int ParseZone(char *zonespec,
+ int *start1, int *stop1, char *chain1,
+ char *startinsert1, char *stopinsert1,
+ int *start2, int *stop2, char *chain2,
+ char *startinsert2, char *stopinsert2,
+ int strucnum)
+ ----------------------------------------------------
+ Returns: 0 for numeric zones,
+ 1 for sequence specified zones,
+ -1 for error.
+ -2 for error in multi-zone
+
+ Sorts out zone specification. The specified residue ranges will be
+ returned in start1/stop1/chain1 and start2/stop2/chain2. The first or
+ last residue present will be specified by -999. Zones may be specified
+ by number or by sequence.
+
+ 28.09.92 Original
+ 29.09.92 Added Sequence zones and return flag
+ 17.07.95 Replaced calls to screen() with printf()
+ Added inserts
+ 01.02.01 Added strucnum parameter
+ 20.02.01 Allow negative residue numbers to be escaped with a \
+ Code realises this - is not a range specifier.
+ 20.02.01 -999 for start or end of structure rather than -1
+ 28.02.01 -2 return for multi-zone error - moved message out to calling
+ routine
+ 22.04.08 Modified to allow use of lower case chain names. By: CTP
+ 16.05.08 Case sensitivity set after zonespec split into zone1 and zone2
+
+*/
+int ParseZone(char *zonespec,
+ int *start1,
+ int *stop1,
+ char *chain1,
+ char *startinsert1,
+ char *stopinsert1,
+ int *start2,
+ int *stop2,
+ char *chain2,
+ char *startinsert2,
+ char *stopinsert2,
+ int strucnum)
+{
+ char *zone1,
+ *zone2,
+ *ptr,
+ *dash = NULL,
+ buffer[80];
+ int retval = 0;
+
+ /* Blank the chain and insert names */
+ *chain1 = *chain2 = ' ';
+ *startinsert1 = *startinsert2 = ' ';
+ *stopinsert1 = *stopinsert2 = ' ';
+
+ /* First split into 2 parts if required
+ 17.07.95 Added calls to KILLLEADSPACES()
+ */
+ KILLLEADSPACES(zone1, zonespec);
+ zone2 = zone1;
+
+ if((ptr=strchr(zonespec,':'))!=NULL)
+ {
+ /* 20.02.01 We don't allow this type of zone spec when we have
+ multiple structures
+ */
+ if(gMultiCount > 1)
+ {
+ return(-2);
+ }
+
+ KILLLEADSPACES(zone2, ptr+1);
+ *ptr = '\0';
+ }
+
+ /* Convert to uppercase unless a full stop is used as a separator. */
+ /* A full stop is used as a separator with either numeric or lower
+ case chain names. */
+ if(strchr(zone1,'.')==NULL)
+ UPPER(zone1);
+ if(strchr(zone2,'.')==NULL)
+ UPPER(zone2);
+
+/*
+*** Do zone1 first ***
+*/
+
+ /* See if there is a dash representing a range, skipping any - sign
+ escaped with a \ (\- is used to represent a negative residue
+ number)
+ */
+ dash=FindDash(zone1);
+
+ /* If there's a * in the zone spec. it's all residues */
+ if((ptr=strchr(zone1,'*'))!=NULL)
+ {
+ *start1 = -999;
+ *stop1 = -999;
+
+ /* If * was not the first char in the zone spec., then the first
+ character was the chain name.
+ */
+ if(ptr != zone1)
+ *chain1 = *zone1;
+ }
+ else if(dash != NULL) /* - indicates a numeric zone */
+ {
+ /* Make a temporary copy */
+ strcpy(buffer,zone1);
+ /* Terminate copy at the - */
+ *(FindDash(buffer)) = '\0';
+ /* Read the start of the zone */
+ if(GetResSpec(buffer,start1,chain1,startinsert1))
+ {
+ printf(" Invalid zone specification: ");
+ printf("%s\n",buffer);
+ return(-1);
+ }
+
+ /* Read end of zone */
+ if(GetResSpec(dash+1,stop1,chain1,stopinsert1))
+ {
+ printf(" Invalid zone specification: ");
+ printf("%s\n",dash+1);
+ return(-1);
+ }
+ }
+ else /* It's a sequence specified zone */
+ {
+ if(FindSeq(zone1,gRefSeq,start1,stop1,chain1))
+ {
+ printf(" Sequence zone specification not found: ");
+ printf("%s\n",zone1);
+ return(-1);
+ }
+ else
+ {
+ retval = 1;
+ }
+ }
+
+/*
+*** Do zone2 if different from zone1 ***
+*/
+ if((zone1 == zone2) && retval==0)
+ {
+ *start2 = *start1;
+ *stop2 = *stop1;
+ *chain2 = *chain1;
+ *startinsert2 = *startinsert1;
+ *stopinsert2 = *stopinsert1;
+ }
+ else
+ {
+ dash=FindDash(zone2);
+
+ /* If there's a * in the zone spec. it's all residues */
+ if((ptr=strchr(zone2,'*'))!=NULL)
+ {
+ *start2 = -999;
+ *stop2 = -999;
+
+ /* If * was not the first char in the zone spec., then the first
+ character was the chain name.
+ */
+ if(ptr != zone2)
+ *chain2 = *zone2;
+ }
+ else if(dash!=NULL) /* - shows a numeric zone */
+ {
+ /* Make a temporary copy */
+ strcpy(buffer,zone2);
+ /* Terminate copy at the - */
+ *(FindDash(buffer)) = '\0';
+ /* Read the start of the zone */
+ if(GetResSpec(buffer,start2,chain2,startinsert2))
+ {
+ printf(" Invalid zone specification: ");
+ printf("%s\n",buffer);
+ return(-1);
+ }
+
+ /* Read end of zone */
+ if(GetResSpec(dash+1,stop2,chain2,stopinsert2))
+ {
+ printf(" Invalid zone specification: ");
+ printf("%s\n",dash+1);
+ return(-1);
+ }
+ }
+ else /* It's a sequence specified zone */
+ {
+ if(FindSeq(zone2,gMobSeq[strucnum],start2,stop2,chain2))
+ {
+ printf(" Sequence zone specification not found: ");
+ printf("%s\n",zone2);
+ return(-1);
+ }
+ else
+ {
+ retval = 1;
+ }
+ }
+ }
+
+ return(retval);
+}
+
+
+/************************************************************************/
+/*>int FindSeq(char *zonespec, char *sequence, int *start, int *stop,
+ char *chain)
+ ------------------------------------------------------------------
+ Returns: 0 Sequence found
+ 1 Sequence not found
+
+ Finds the start and stop on the basis of sequence.
+
+ 29.09.92 Original
+ 30.09.92 +1 Correction to length
+*/
+int FindSeq(char *zonespec,
+ char *sequence,
+ int *start,
+ int *stop,
+ char *chain)
+{
+ char zoneseq[40],
+ *ptr;
+ int length,
+ occurence,
+ noccur,
+ j;
+
+ /* Extract the sequence part */
+ strcpy(zoneseq,zonespec);
+ ptr = strchr(zoneseq,',');
+ if(ptr!=NULL) *ptr = '\0';
+ ptr = strchr(zoneseq,'/');
+ if(ptr!=NULL) *ptr = '\0';
+
+ /* Extract the length */
+ ptr = strchr(zonespec,',');
+ if(ptr!=NULL) sscanf(ptr+1,"%d",&length);
+ else length = strlen(zoneseq);
+
+ /* Extract the occurrence number */
+ ptr = strchr(zonespec,'/');
+ if(ptr!=NULL) sscanf(ptr+1,"%d",&occurence);
+ else occurence = 1;
+
+ /* Now search for the occurence'th occurrence of zoneseq */
+ noccur = 0;
+ for(j=0; j+length<=(int)strlen(sequence); j++)
+ {
+ if(!strncmp(zoneseq,sequence+j,strlen(zoneseq))) /* 30.09.92 */
+ {
+ if(++noccur == occurence)
+ {
+ *start = j+1;
+ *stop = j+length;
+ *chain = ' ';
+ return(0);
+ }
+ }
+ }
+
+ *start = -2;
+ *stop = -2;
+ *chain = 'X';
+
+ return(1);
+}
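
Editor's note: FindSeq() above takes a zone written as a sequence fragment, optionally followed by ",length" and "/occurrence". The sketch below restates that lookup as a standalone routine with bounds checks on the fragment buffer and on the search range. The name find_seq_zone() and its argument list are illustrative only and are not part of ProFit; the 0 (found) / 1 (not found) return convention is borrowed from FindSeq().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a ProFit routine): find the nth occurrence of
   a sequence fragment and report a 1-based start/stop range.
   spec is "FRAG", "FRAG,length", "FRAG/nth" or "FRAG,length/nth".
   Returns 0 if found, 1 if not found or if the spec is malformed. */
int find_seq_zone(const char *spec, const char *sequence,
                  int *start, int *stop)
{
   char        frag[40];
   size_t      fraglen = strcspn(spec, ",/"); /* fragment ends at , or / */
   int         length, nth = 1, seen = 0, seqlen, need, j;
   const char *p;

   /* Reject empty fragments and ones that will not fit in the buffer */
   if(fraglen == 0 || fraglen >= sizeof(frag))
      return 1;
   memcpy(frag, spec, fraglen);
   frag[fraglen] = '\0';

   /* Zone length defaults to the fragment length */
   length = (int)fraglen;
   if((p = strchr(spec, ',')) != NULL && sscanf(p+1, "%d", &length) != 1)
      return 1;
   if((p = strchr(spec, '/')) != NULL && sscanf(p+1, "%d", &nth) != 1)
      return 1;
   if(length < 1 || nth < 1)
      return 1;

   /* Both the fragment and the requested zone must fit in the sequence */
   seqlen = (int)strlen(sequence);
   need   = (length > (int)fraglen) ? length : (int)fraglen;

   for(j=0; j+need <= seqlen; j++)
   {
      if(!strncmp(frag, sequence+j, fraglen) && (++seen == nth))
      {
         *start = j+1;
         *stop  = j+length;
         return 0;
      }
   }
   return 1;
}
```

With the sequence ACDEFGACDKLM, the spec "ACD,5/2" selects the second ACD and extends the zone to five residues, giving positions 7 to 11.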
+
+
+/************************************************************************/
+/*>void ShowMatrix(void)
+ ---------------------
+ Displays the rotation matrix.
+
+ 28.09.92 Framework
+ 30.09.92 Original
+ 17.07.95 Replaced calls to screen() with printf()
+ 02.10.95 Added printing of CofGs
+ 20.02.01 gMobCofG now an array
+ 25.07.08 Modified to give translation and rotation vector to
+ gMobCofG[0] instead of the averaged reference structure,
+ gRefCofG. By: CTP
+ 11.11.08 Modified to give translation + rotation to gMultiRef.
+ 16.02.09 Modified rotation matrix inversion.
+*/
+void ShowMatrix(void)
+{
+ int i,j,
+ strucnum;
+
+ /* Warn if not fitted, but carry on and show the current values */
+ if(!gFitted)
+ {
+ printf(" Warning==> Structures have not yet been fitted.\n");
+ }
+
+ if(gMultiCount == 1)
+ {
+ /* Deal with single mobile structure. */
+ /* ================================== */
+
+ printf(" Reference CofG...\n");
+ printf(" %8.4f %8.4f %8.4f\n",gRefCofG.x,
+ gRefCofG.y,
+ gRefCofG.z);
+ printf(" Mobile CofG...\n");
+ printf(" %8.4f %8.4f %8.4f\n",gMobCofG[0].x,
+ gMobCofG[0].y,
+ gMobCofG[0].z);
+
+ printf(" Rotation matrix...\n");
+ for(i=0; i<3; i++)
+ {
+ printf(" %8.4f %8.4f %8.4f\n",gRotMat[0][i][0],
+ gRotMat[0][i][1],
+ gRotMat[0][i][2]);
+ }
+
+ printf(" Translation vector (between CofGs)...\n");
+ printf(" %8.4f %8.4f %8.4f\n",gRefCofG.x - gMobCofG[0].x,
+ gRefCofG.y - gMobCofG[0].y,
+ gRefCofG.z - gMobCofG[0].z);
+ }
+ else
+ {
+ /* Deal with multiple mobile structures. */
+ /* ===================================== */
+ REAL InvRotMat[3][3], ModRotMat[3][3];
+
+ /* Invert rotation matrix gRotMat[gMultiRef] */
+ /* The inverse of a rotation matrix is its transpose so copy
+ gRotMat[gMultiRef] to InvRotMat while transposing elements. */
+ for(i=0;i<3;i++)
+ {
+ for(j=0;j<3;j++)
+ {
+ InvRotMat[j][i] = gRotMat[gMultiRef][i][j];
+ }
+ }
+
+ for(strucnum=0; strucnum<gMultiCount; strucnum++)
+ {
+ printf(" Structure %d CofG...\n", strucnum+1);
+
+ printf(" %8.4f %8.4f %8.4f\n",gMobCofG[strucnum].x,
+ gMobCofG[strucnum].y,
+ gMobCofG[strucnum].z);
+ }
+
+ printf(" Rotation matrix...\n");
+ for(strucnum=0; strucnum<gMultiCount; strucnum++)
+ {
+ /* Calculate rotation matrix */
+ MatMult33_33(gRotMat[strucnum],InvRotMat,ModRotMat);
+ printf(" Structure %d:\n", strucnum+1);
+
+ if(gMultiVsRef)
+ {
+ for(i=0; i<3; i++)
+ {
+ printf(" %8.4f %8.4f %8.4f\n",
+ gRotMat[strucnum][i][0],
+ gRotMat[strucnum][i][1],
+ gRotMat[strucnum][i][2]);
+ }
+ }
+ else
+ {
+ for(i=0; i<3; i++)
+ {
+ printf(" %8.4f %8.4f %8.4f\n",
+ ModRotMat[i][0],
+ ModRotMat[i][1],
+ ModRotMat[i][2]);
+ }
+ }
+
+ }
+
+ printf(" Translation vector (between CofGs)...\n");
+ for(strucnum=0; strucnum<gMultiCount; strucnum++)
+ {
+ printf(" Structure %d:\n", strucnum+1);
+
+ if(gMultiVsRef)
+ {
+ printf(" %8.4f %8.4f %8.4f\n",
+ gRefCofG.x - gMobCofG[strucnum].x,
+ gRefCofG.y - gMobCofG[strucnum].y,
+ gRefCofG.z - gMobCofG[strucnum].z);
+ }
+ else
+ {
+ printf(" %8.4f %8.4f %8.4f\n",
+ gMobCofG[gMultiRef].x - gMobCofG[strucnum].x,
+ gMobCofG[gMultiRef].y - gMobCofG[strucnum].y,
+ gMobCofG[gMultiRef].z - gMobCofG[strucnum].z);
+ }
+ }
+ }
+
+ return;
+}
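
Editor's note: the multi-structure branch of ShowMatrix() uses the fact that the inverse of a rotation matrix is its transpose, then multiplies each structure's rotation by the inverted rotation of the chosen reference. A minimal sketch of those two steps follows. transpose33() and matmult33() are illustrative stand-ins, not the bioplib routines, and double is assumed in place of REAL.

```c
#include <stdio.h>

/* Transpose a 3x3 matrix. For a pure rotation matrix the transpose is
   also the inverse. Hypothetical helper, not part of bioplib. */
void transpose33(double in[3][3], double out[3][3])
{
   int i, j;

   for(i=0; i<3; i++)
      for(j=0; j<3; j++)
         out[j][i] = in[i][j];
}

/* out = a * b for 3x3 matrices. Hypothetical helper; the argument order
   of bioplib's MatMult33_33() is not assumed here. */
void matmult33(double a[3][3], double b[3][3], double out[3][3])
{
   int i, j, k;

   for(i=0; i<3; i++)
   {
      for(j=0; j<3; j++)
      {
         out[i][j] = 0.0;
         for(k=0; k<3; k++)
            out[i][j] += a[i][k] * b[k][j];
      }
   }
}
```

Multiplying a rotation by its own transpose gives the identity, which is why the structure selected as the reference shows an identity matrix when rotations are reported relative to it.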
+
+
+/************************************************************************/
+/*>void ShowStatus(char *filename)
+ -------------------------------
+ Shows the current program status.
+
+ 28.09.92 Framework
+ 29.09.92 Original
+ 30.09.92 Modified for padding of atom names
+ 01.10.92 Modification to All zone printing
+ 17.07.95 Replaced calls to screen() with printf()
+ 18.07.95 Prints zones with chain first and with inserts
+ Prints *'s rather than -1 for all residues
+ Added calls to FormatZone()
+ 20.07.95 Added WEIGHTS
+ 25.07.95 Added code to print chain labels
+ 24.01.96 Fixed bug in printing atom names containing spaces
+ 31.06.96 Added BValue cutoff
+ 13.06.96 Modified B-value weighting message
+ 18.06.96 Replaced MODE_* with ZONE_MODE_*
+ 11.11.96 BValue cutoff message reflects new REF and MOB
+ parameters
+ 12.01.01 gMobPDB[] now an array
+ Added iterate mode printing
+ 01.02.01 Added printing of multi structure data
+ 20.02.01 -999 for start or end of structure rather than -1
+ 28.03.01 Reports CENTRE mode
+ 20.03.08 Reports fit centred on residue. By: CTP
+ 07.04.08 Distance cutoff included.
+ 10.06.08 Trying auto-setting zones to current numbering mode.
+ 01.08.08 Auto-setting zones feature disabled.
+ 05.09.08 Added reporting of gap penalties.
+ 07.11.08 Altered option to output to a file.
+ Marked reference for multi fitting.
+*/
+void ShowStatus(char *filename)
+{
+ char buffer[240],
+ atm[8],
+ *chains;
+ int i, j, strucnum;
+ ZONE *z;
+ FILE *fp = stdout;
+
+ /* Convert zones to current numbering mode */
+ /* Feature disabled - 01.08.08
+ This will put all zones into the current residue numbering scheme.
+ However, this is slow and may not be what users want, so has been
+ disabled. It hasn't been tested for some while, so re-enable at your
+ own risk!
+ */
+/***
+ SortAllZones();
+ ConvertAllZones(gCurrentMode);
+***/
+
+ /* Open output file/pipe */
+ if(filename)
+ {
+ if((fp=OpenOrPipe(filename))==NULL)
+ {
+ printf(" Warning: Unable to open output file\n");
+ fp = stdout;
+ }
+ }
+
+ fprintf(fp," Reference structure: %s\n",
+ (gRefFilename[0]?gRefFilename:"Undefined"));
+
+ for(strucnum=0; strucnum<gMultiCount; strucnum++)
+ {
+ /* Mark reference */
+ if(gMultiCount > 1 && strucnum == gMultiRef)
+ fprintf(fp," >");
+ else
+ fprintf(fp," ");
+ if(gMultiCount == 1)
+ {
+ fprintf(fp,"Mobile structure: %s\n",
+ (gMobFilename[strucnum][0] ?
+ gMobFilename[strucnum]:"Undefined"));
+ }
+ else
+ {
+ fprintf(fp,"Mobile structure:%4d %s\n",
+ strucnum + 1,
+ (gMobFilename[strucnum][0] ?
+ gMobFilename[strucnum]:"Undefined"));
+ }
+ }
+
+ fprintf(fp," HETATM records are: %s\n",
+ (gHetAtoms?"Included":"Ignored"));
+
+ fprintf(fp," Align gap penalty: %2d\n",gGapPen);
+ fprintf(fp," Align gap extend penalty: %2d\n",gGapPenExt);
+
+ fprintf(fp," Fitting will be: %s\n",
+ (gDoWeights==WEIGHT_BVAL?"Weighted by B-value":
+ (gDoWeights==WEIGHT_INVBVAL?"Weighted by 1/B-value":
+ "Normal (unweighted)")));
+
+ if(((chains = GetPDBChainLabels(gRefPDB)) != NULL) &&
+ ((strlen(chains) > 1) || (chains[0] != ' ')))
+ fprintf(fp," Reference structure Chains: %s\n",chains);
+ if(chains != NULL)
+ free(chains);
+
+ if(((chains = GetPDBChainLabels(gMobPDB[0])) != NULL) &&
+ ((strlen(chains) > 1) || (chains[0] != ' ')))
+ fprintf(fp," Mobile structure Chains: %s\n",chains);
+ if(chains != NULL)
+ free(chains);
+
+ fprintf(fp," Current numbering mode: ");
+ if(gCurrentMode == ZONE_MODE_RESNUM)
+ fprintf(fp,"Residue\n");
+ else if(gCurrentMode == ZONE_MODE_SEQUENTIAL)
+ fprintf(fp,"Sequential\n");
+
+ fprintf(fp," Iterative zone updating: ");
+ fprintf(fp,"%s\n", ((gIterate)?"On":"Off"));
+
+ fprintf(fp," Atoms being fitted: ");
+ if(gFitAtoms[0][0] == '*')
+ {
+ fprintf(fp,"All\n");
+ }
+ else
+ {
+ if(gNOTFitAtoms) fprintf(fp,"NOT ");
+
+ for(i=0;i<NUMTYPES;i++)
+ {
+ if(gFitAtoms[i][0] == '\0') break;
+ strcpy(atm,gFitAtoms[i]);
+ /* Remove trailing spaces */
+ /* 24.01.96 Was terminating at the first space. This broke
+ atom names containing spaces (e.g. "N A" in heme groups)
+ */
+ for(j=strlen(atm)-1; j>=0; j--)
+ {
+ if(atm[j] == ' ')
+ atm[j] = '\0';
+ else
+ break;
+ }
+
+ if(i)
+ sprintf(buffer, ", %s", atm);
+ else
+ sprintf(buffer, "%s", atm);
+ fputs(buffer, fp);
+ }
+ fprintf(fp,"\n");
+ }
+ if(gUseBVal)
+ {
+ fprintf(fp," Atoms will be discarded if their B-value is > %.2f \
+in %s structure\n", gBValue,
+ (gUseBVal==2?"the reference":
+ (gUseBVal==3?"the mobile":
+ "either")));
+ }
+ else
+ {
+ fprintf(fp," Atoms will be included regardless of B-value\n");
+ }
+
+ if(gUseDistCutoff)
+ {
+ fprintf(fp," Atom pairs will be discarded if their interatomic \
+distance is > %.2f\n", gDistCutoff);
+ }
+ else
+ {
+ fprintf(fp," Atom pairs will be included regardless of \
+interatomic distance\n");
+ }
+
+ fprintf(fp," Zones being fitted: ");
+
+ for(strucnum=0; strucnum<gMultiCount; strucnum++)
+ {
+ if(gMultiCount > 2)
+ fprintf(fp,"\n (Mobile Structure: %d) ", strucnum+1);
+
+ if(gZoneList[strucnum]==NULL || !gUserFitZone)
+ {
+ fprintf(fp,"All\n");
+ }
+ else
+ {
+ int overlap = 0;
+
+ fprintf(fp,"\n");
+ for(z=gZoneList[strucnum]; z!=NULL; NEXT(z))
+ {
+ char zone1[64],
+ zone2[64];
+
+ FormatZone(zone1, z->chain1,
+ z->start1, z->startinsert1,
+ z->stop1, z->stopinsert1);
+
+ FormatZone(zone2, z->chain2,
+ z->start2, z->startinsert2,
+ z->stop2, z->stopinsert2);
+
+ fprintf(fp," %-16s with %-16s %s",
+ zone1, zone2,
+ ((z->mode == ZONE_MODE_RESNUM)?"(Residue numbering)"
+ :"(Sequential numbering)"));
+
+ /* CTP: Check for Overlap */
+ if(CheckOverlap(z,gZoneList[strucnum],strucnum) > 1)
+ {
+ overlap++;
+ fprintf(fp,"*\n");
+ }
+ else
+ {
+ fprintf(fp,"\n");
+ }
+ }
+ if(overlap)
+ fprintf(fp,"%44s * Overlapping zones.\n","");
+ }
+
+ /* CTP: Display centre for fitting */
+ if(gCZoneList[strucnum] != NULL)
+ {
+ char res1[64], res2[64];
+
+ sprintf(res1,"%c%d%c",
+ gCZoneList[strucnum]->chain1,
+ gCZoneList[strucnum]->start1,
+ gCZoneList[strucnum]->startinsert1);
+
+ sprintf(res2,"%c%d%c",
+ gCZoneList[strucnum]->chain2,
+ gCZoneList[strucnum]->start2,
+ gCZoneList[strucnum]->startinsert2);
+
+ fprintf(fp," Fit centred on residues:\n");
+ fprintf(fp," %-6s to %-6s %s\n",
+ res1, res2,
+ ((gCZoneList[strucnum]->mode == ZONE_MODE_RESNUM)?
+ "(Residue numbering)":"(Sequential numbering)"));
+ }
+ }
+
+ if(gFitted) /* Only display this when it's definitely valid */
+ {
+ fprintf(fp," Atoms for RMS calculation: ");
+
+ if(gUserRMSAtoms)
+ {
+ if(gRMSAtoms[0][0] == '*')
+ {
+ fprintf(fp,"All\n");
+ }
+ else
+ {
+ if(gNOTRMSAtoms) fprintf(fp,"NOT ");
+
+ for(i=0;i<NUMTYPES;i++)
+ {
+ if(gRMSAtoms[i][0] == '\0') break;
+ strcpy(atm,gRMSAtoms[i]);
+ if(strchr(atm,' ')) *strchr(atm,' ') = '\0';
+
+ if(i)
+ sprintf(buffer, ", %s", atm);
+ else
+ sprintf(buffer, "%s", atm);
+ fputs(buffer, fp);
+ }
+ fprintf(fp,"\n");
+ }
+ }
+ else
+ {
+ if(gFitAtoms[0][0] == '*')
+ {
+ fprintf(fp,"All\n");
+ }
+ else
+ {
+ if(gNOTFitAtoms) fprintf(fp,"NOT ");
+
+ for(i=0;i<NUMTYPES;i++)
+ {
+ if(gFitAtoms[i][0] == '\0') break;
+ strcpy(atm,gFitAtoms[i]);
+ if(strchr(atm,' ')) *strchr(atm,' ') = '\0';
+
+ if(i)
+ sprintf(buffer, ", %s", atm);
+ else
+ sprintf(buffer, "%s", atm);
+ fputs(buffer, fp);
+ }
+ fprintf(fp,"\n");
+ }
+ }
+
+ fprintf(fp," Zones for RMS calculation: ");
+
+ if(gUserRMSZone)
+ {
+ for(strucnum=0; strucnum<gMultiCount; strucnum++)
+ {
+ int overlap = 0;
+ if(gMultiCount > 2)
+ fprintf(fp,"\n (Mobile Structure: %d) ", strucnum+1);
+
+ if(gRZoneList[strucnum]==NULL) fprintf(fp,"All\n");
+ else fprintf(fp,"\n");
+
+ for(z=gRZoneList[strucnum]; z!=NULL; NEXT(z))
+ {
+ char zone1[64],
+ zone2[64];
+
+ FormatZone(zone1, z->chain1,
+ z->start1, z->startinsert1,
+ z->stop1, z->stopinsert1);
+
+ FormatZone(zone2, z->chain2,
+ z->start2, z->startinsert2,
+ z->stop2, z->stopinsert2);
+
+ fprintf(fp," %-16s with %-16s %s",
+ zone1, zone2,
+ ((z->mode == ZONE_MODE_RESNUM)?"(Residue numbering)"
+ :"(Sequential numbering)"));
+ if(CheckOverlap(z,gRZoneList[strucnum],strucnum) > 1)
+ {
+ overlap++;
+ fprintf(fp,"*\n");
+ }
+ else
+ {
+ fprintf(fp,"\n");
+ }
+ }
+ if(overlap)
+ fprintf(fp,"%44s * Overlapping zones.\n","");
+ }
+ }
+ else
+ {
+ for(strucnum=0; strucnum<gMultiCount; strucnum++)
+ {
+ if(gMultiCount > 2)
+ fprintf(fp,"\n (Mobile Structure: %d) ", strucnum+1);
+ if(gZoneList[strucnum]==NULL || !gUserFitZone)
+ {
+ fprintf(fp,"All\n");
+ }
+ else
+ {
+ int overlap = 0;
+ fprintf(fp,"\n");
+
+ for(z=gZoneList[strucnum]; z!=NULL; NEXT(z))
+ {
+ char zone1[64],
+ zone2[64];
+
+ FormatZone(zone1, z->chain1,
+ z->start1, z->startinsert1,
+ z->stop1, z->stopinsert1);
+
+ FormatZone(zone2, z->chain2,
+ z->start2, z->startinsert2,
+ z->stop2, z->stopinsert2);
+
+ fprintf(fp," %-16s with %-16s %s",
+ zone1, zone2,
+ ((z->mode == ZONE_MODE_RESNUM)?
+ "(Residue numbering)"
+ :"(Sequential numbering)"));
+
+ /* CTP: Check for overlap */
+ if(CheckOverlap(z,gZoneList[strucnum],strucnum) > 1)
+ {
+ overlap++;
+ fprintf(fp,"*\n");
+ }
+ else
+ {
+ fprintf(fp,"\n");
+ }
+ }
+ if(overlap)
+ fprintf(fp,"%44s * Overlapping zones.\n","");
+ }
+ }
+ }
+ }
+
+ if(gCentre)
+ {
+ fprintf(fp,"\n Coordinates written centred on: ORIGIN\n\n");
+ }
+ else
+ {
+ fprintf(fp,"\n Coordinates written centred on: REFERENCE \
+SET\n\n");
+ }
+
+ fprintf(fp," Reference sequence: ");
+ if(gRefSeq)
+ {
+ for(i=0, j=0; i<strlen(gRefSeq); i++)
+ {
+ buffer[j++] = gRefSeq[i];
+
+ if(j>59 || i==strlen(gRefSeq)-1)
+ {
+ buffer[j] = '\0';
+ fprintf(fp,"\n ");
+ fputs(buffer, fp);
+ j=0;
+ }
+ }
+ fprintf(fp,"\n");
+ }
+ else
+ {
+ fprintf(fp,"Undefined\n");
+ }
+
+ /* CTP */
+ for(strucnum=0; strucnum<gMultiCount; strucnum++)
+ {
+ if(gMultiCount == 1)
+ fprintf(fp," Mobile sequence: ");
+ else
+ fprintf(fp," Mobile sequence:%4d ", strucnum+1);
+
+ if(gMobSeq[strucnum])
+ {
+ for(i=0, j=0; i<strlen(gMobSeq[strucnum]); i++)
+ {
+ buffer[j++] = gMobSeq[strucnum][i];
+
+ if(j>59 || i==strlen(gMobSeq[strucnum])-1)
+ {
+ buffer[j] = '\0';
+ fprintf(fp,"\n ");
+ fputs(buffer, fp);
+ j=0;
+ }
+ }
+ fprintf(fp,"\n");
+ }
+ else
+ {
+ fprintf(fp,"Undefined\n");
+ }
+ }
+
+ /* Close output file/pipe */
+ if(fp != stdout)
+ CloseOrPipe(fp);
+
+ return;
+}
+
+
+/************************************************************************/
+/*>void stdprompt(char *string)
+ ----------------------------
+ Issues a prompt to stdout providing stdin is a tty
+
+ 18.07.95 Original By: ACRM
+*/
+#include <unistd.h>
+void stdprompt(char *string)
+{
+#if (unix || __unix__)
+ if(!isatty(0))
+ return;
+#endif
+
+ printf("%s> ",string);
+ fflush(stdout);
+}
+
+
+/************************************************************************/
+/*>void FormatZone(char *zone, char chain, int start, char startinsert,
+ int stop, char stopinsert)
+ --------------------------------------------------------------------
+ Formats a zone specification accounting for -999 values which represent
+ all residues
+
+ 18.07.95 Original By: ACRM
+ 20.02.01 -999 for start or end of structure rather than -1
+*/
+void FormatZone(char *zone, char chain, int start, char startinsert,
+ int stop, char stopinsert)
+{
+ char part1[16],
+ part2[16];
+
+ if((start == (-999)) && (stop == (-999)))
+ {
+ if(chain == ' ')
+ sprintf(zone,"All residues");
+ else
+ sprintf(zone,"Chain %c",chain);
+ }
+ else
+ {
+ if(start==(-999))
+ {
+ sprintf(part1,"%c*", chain);
+ sprintf(part2,"%c%d%c", chain, stop, stopinsert);
+ }
+ else if(stop==(-999))
+ {
+ sprintf(part1,"%c%d%c", chain, start, startinsert);
+ sprintf(part2,"%c*", chain);
+ }
+ else
+ {
+ sprintf(part1,"%c%d%c", chain, start, startinsert);
+ sprintf(part2,"%c%d%c", chain, stop, stopinsert);
+ }
+
+ sprintf(zone,"%-6s to %-6s", part1, part2);
+ }
+}
+
+
+/************************************************************************/
+/*>void ReadMulti(char *filename, BOOL xmasFormat)
+ -----------------------------------------------
+ Reads in multiple structures as given by the MULTI command
+
+ 01.02.01 Original By: ACRM
+ 01.03.01 Added xmasFormat parameter
+ 08.01.09 CTP modified error check for maximum number of structures.
+*/
+void ReadMulti(char *filename, BOOL xmasFormat)
+{
+ FILE *fof = NULL;
+ BOOL GotRef = FALSE;
+ char buffer[MAXBUFF], *ch;
+
+ gMultiCount = 0;
+
+ if((fof=fopen(filename,"r"))==NULL)
+ {
+ printf(" Error==> Can't open list of files: %s\n",
+ filename);
+ }
+ else
+ {
+ while(fgets(buffer, MAXBUFF, fof))
+ {
+ TERMINATE(buffer);
+ KILLTRAILSPACES(buffer);
+ KILLLEADSPACES(ch, buffer);
+
+ if((*ch != '#') && (*ch != '!') && (strlen(ch)))
+ {
+ if(!GotRef)
+ {
+ ReadStructure(STRUC_REFERENCE, ch, 0, xmasFormat);
+ GotRef=TRUE;
+ }
+
+ /* CTP */
+ if(gMultiCount < MAXSTRUC)
+ {
+ if(ReadStructure(STRUC_MOBILE,ch,gMultiCount,xmasFormat))
+ {
+ gMultiCount++;
+ }
+ }
+ else
+ {
+ printf(" Error==> Maximum structure count (%d) \
+exceeded. Increase MAXSTRUC\n", MAXSTRUC);
+ printf(" Skipped structure: %s\n",ch);
+ }
+ }
+ }
+ }
+}
+
+
+/************************************************************************/
+/*>void WriteCoordinates(char *filename, int strucnum)
+ ---------------------------------------------------
+ Writes a coordinate file.
+
+ 28.09.92 Framework
+ 01.10.92 Original
+ 17.07.95 Replaced calls to screen() with printf()
+ 18.07.95 Uses fopen() rather than OpenWrite()
+ Uses fclose() rather than CloseFile()
+ 21.07.95 Corrected logic of test for un-fitted PDB
+ 27.06.97 Changed call to fopen() to OpenOrPipe
+ 11.01.01 gFitPDB now an array
+ 01.02.01 Added strucnum parameter
+ 28.03.01 Added code to support strucnum < 0 to write the centred
+ reference set and centering of coordinates
+ 02.04.08 added calls to WritePDBHeader() and WritePDBFooter() By: CTP
+ 02.05.08 changed WritePDBHeader() and WritePDBFooter() calls to
+ calls to WriteWholePDBHeader() and WriteWholePDBTrailer()
+*/
+void WriteCoordinates(char *filename, int strucnum)
+{
+ FILE *fp;
+ PDB *pdb,
+ *pdbc;
+ VEC3F CofG;
+ WHOLEPDB *wpdb;
+
+ /* Point pdb to the coordinate set of interest */
+ if(strucnum < 0) /* use the reference set */
+ {
+ pdb = gRefPDB;
+ wpdb = gRefWPDB;
+ }
+ else /* use a mobile set */
+ {
+ pdb = gFitPDB[strucnum];
+ wpdb = gMobWPDB[strucnum];
+ }
+
+ CofG.x = -1.0 * gRefCofG.x;
+ CofG.y = -1.0 * gRefCofG.y;
+ CofG.z = -1.0 * gRefCofG.z;
+
+ /* If centering, then make a copy of the coordinates and translate
+ to the origin
+ */
+ if(gCentre && (pdb!=NULL))
+ {
+ if((pdbc = DupePDB(pdb))==NULL)
+ {
+ printf(" Error==> No memory for translating the reference \
+coordinates\n");
+ printf(" Centred reference coordinates not \
+written.\n");
+ return;
+ }
+
+ /* Point pdb to this copy of the coordinates */
+ pdb = pdbc;
+ TranslatePDB(pdb, CofG);
+ }
+
+ if(!gFitted || pdb == NULL)
+ {
+ printf(" Error==> Fitting has not yet been performed.\n");
+ }
+ else
+ {
+ if((fp=OpenOrPipe(filename))==NULL)
+ {
+ printf(" Error==> Unable to open file for writing.\n");
+ }
+ else
+ {
+ printf(" Writing coordinates...\n");
+
+ if(gReadHeader)
+ WriteWholePDBHeader(fp, wpdb);
+
+ WritePDB(fp, pdb);
+
+ if(gReadHeader)
+ WriteWholePDBTrailer(fp, wpdb);
+
+ CloseOrPipe(fp);
+ }
+ }
+
+ /* If centering, free the copy of the reference set */
+ if(gCentre && pdb!=NULL)
+ {
+ FREELIST(pdb, PDB);
+ }
+
+ return;
+}
+
+
+/************************************************************************/
+/*>void WriteMulti(char *ext)
+ --------------------------
+ Writes a set of fitted files from multi-fitting
+
+ 01.02.01 Original By: ACRM
+ 20.02.01 Added more defensive checks on string lengths and handle case
+ where input filename didn't contain a .
+*/
+void WriteMulti(char *ext)
+{
+ char filename[MAXBUFF];
+ int i,
+ j;
+
+ for(i=0; i<gMultiCount; i++)
+ {
+ strncpy(filename, gMobFilename[i], MAXBUFF-1);
+ filename[MAXBUFF-1] = '\0';
+ /* Work from the end of string to strip off the extension */
+ j = strlen(filename)-1;
+
+
+ /* Step back until we find a . delimiting the extension */
+ while((j>=0) && (filename[j] != '.'))
+ {
+ /* Break out if we find a / \ ] or : since we are in the path */
+ if(filename[j] == '/' ||
+ filename[j] == '\\' ||
+ filename[j] == ']' ||
+ filename[j] == ':')
+ {
+ j=0;
+ break;
+ }
+ j--;
+ }
+
+ /* There was a . in the filename, so truncate the string there */
+ if(j > 0)
+ {
+ filename[j] = '\0';
+ }
+
+ /* Report error if filename too long */
+ if((strlen(filename) + strlen(ext) + 1) >= MAXBUFF)
+ {
+ printf(" Error==> Filename too long to add new extension\n");
+ printf(" Fitted file not written: %s\n",
+ gMobFilename[i]);
+ return;
+ }
+
+ /* Append extension */
+ strcat(filename,".");
+ strcat(filename, ext);
+
+ /* Write the file */
+ WriteCoordinates(filename, i);
+ }
+}
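
Editor's note: WriteMulti() strips the old extension by scanning back from the end of the filename and stopping at a path separator. The sketch below shows the same idea with the index kept non-negative and the output size checked before anything is written. replace_extension() and its parameters are illustrative names, not part of ProFit. As in WriteMulti(), a dot in the first position is not treated as an extension delimiter.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a ProFit routine): write "<stem>.<ext>" into
   out, where <stem> is name with any trailing extension removed.
   Returns 0 on success, 1 if the result would not fit in outsize bytes. */
int replace_extension(const char *name, const char *ext,
                      char *out, size_t outsize)
{
   size_t stem = strlen(name);   /* default: keep the whole name */
   size_t i    = stem;

   /* Step back from the end looking for a . delimiting the extension */
   while(i > 0)
   {
      char c = name[i-1];

      if(c == '.')
      {
         if(i-1 > 0)             /* ignore a dot in the first position */
            stem = i-1;
         break;
      }

      /* A / \ ] or : means we have reached the path, so stop looking */
      if(c == '/' || c == '\\' || c == ']' || c == ':')
         break;

      i--;
   }

   /* stem + '.' + ext + terminating NUL must fit */
   if(stem + 1 + strlen(ext) + 1 > outsize)
      return 1;

   memcpy(out, name, stem);
   out[stem] = '\0';
   strcat(out, ".");
   strcat(out, ext);
   return 0;
}
```

For example, "model.pdb" with extension "fit" becomes "model.fit", while "dir.v1/model" becomes "dir.v1/model.fit" because the dot belongs to the directory name.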
+
+
+/************************************************************************/
+/*>char *FindDash(char *buffer)
+ ----------------------------
+ Find a dash representing a range, skipping any - sign escaped with a
+ \ (\- is used to represent a negative residue number)
+
+ 20.02.01 Original By: ACRM
+*/
+char *FindDash(char *buffer)
+{
+ char *dash = NULL;
+
+ dash=strchr(buffer,'-');
+ while((dash != NULL) && (dash > buffer) && (*(dash-1) == '\\'))
+ {
+ buffer = dash+1;
+ dash=strchr(buffer,'-');
+ }
+
+ return(dash);
+}
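
Editor's note: FindDash() lets a zone such as \-5-10 mean "residue -5 to residue 10" by skipping any dash that is escaped with a backslash. The sketch below shows that rule applied to a purely numeric range. find_range_dash() and parse_range() are illustrative names, and the sketch deliberately ignores chain labels, insert codes and open-ended ranges, all of which the real ParseZone() and GetResSpec() handle.

```c
#include <stdlib.h>
#include <string.h>

/* Return a pointer to the first '-' that is not preceded by a backslash,
   or NULL if there is none. Hypothetical helper, not a ProFit routine. */
const char *find_range_dash(const char *s)
{
   const char *p;

   for(p=s; *p; p++)
   {
      if((*p == '-') && ((p == s) || (p[-1] != '\\')))
         return p;
   }
   return NULL;
}

/* Parse "start-stop", where a negative number is written as \-N.
   Returns 0 on success, 1 if there is no range dash or a part is empty
   or too long. Hypothetical helper, numeric residues only. */
int parse_range(const char *spec, int *start, int *stop)
{
   char        clean[2][32];
   const char *part[2];
   size_t      len[2], k, n;
   int         i;
   const char *dash = find_range_dash(spec);

   if(dash == NULL)
      return 1;

   part[0] = spec;    len[0] = (size_t)(dash - spec);
   part[1] = dash+1;  len[1] = strlen(dash+1);

   for(i=0; i<2; i++)
   {
      if((len[i] == 0) || (len[i] >= sizeof(clean[i])))
         return 1;

      /* Copy this part, dropping the escaping backslashes */
      for(k=0, n=0; k<len[i]; k++)
      {
         if(part[i][k] != '\\')
            clean[i][n++] = part[i][k];
      }
      clean[i][n] = '\0';
   }

   *start = atoi(clean[0]);
   *stop  = atoi(clean[1]);
   return 0;
}
```

In C source the zone \-5-10 is written as the string literal "\\-5-10". The first dash is escaped and so belongs to the number, and the second is taken as the range separator.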
+
+
+/************************************************************************/
+/*>PDB *ReadXMAS(FILE *fp, int *natoms)
+ ------------------------------------
+ Like ReadPDB() but reads from an XMAS file instead of a PDB file
+
+ 01.03.01 Original By: ACRM
+*/
+PDB *ReadXMAS(FILE *fp, int *natoms)
+{
+ return(doReadXMAS(fp, natoms, TRUE));
+}
+
+
+/************************************************************************/
+/*>PDB *ReadXMASAtoms(FILE *fp, int *natoms)
+ -----------------------------------------
+ Like ReadPDBAtoms() but reads from an XMAS file instead of a PDB file
+
+ 01.03.01 Original By: ACRM
+*/
+PDB *ReadXMASAtoms(FILE *fp, int *natoms)
+{
+ return(doReadXMAS(fp, natoms, FALSE));
+}
+
+
+/************************************************************************/
+#ifdef USE_XMAS
+# define _LIBACRM_H /* Stops libacrm.h being included */
+# define FDWRAP_MAX_BUFF 1024
+ typedef struct
+ {
+ int fd;
+ char buffer[FDWRAP_MAX_BUFF];
+ int buffpos,
+ maxbuff,
+ eof,
+ socket;
+ }
+ FDWRAP;
+
+# include "xmas.h"
+#endif
+
+
+/************************************************************************/
+/*>PDB *doReadXMAS(FILE *fp, int *natoms, int readhet)
+ ---------------------------------------------------
+ Reads an XMAS file into a PDB linked list
+
+ 01.03.01 Original By: ACRM
+ 15.03.01 Removed special call to ReadXmasData() - no longer needed as
+ XMAS column index is now stored in the XMAS structure
+*/
+PDB *doReadXMAS(FILE *fp, int *natoms, int readhet)
+{
+ PDB *pdb = NULL;
+
+#ifdef USE_XMAS
+ XMAS *xmas = NULL;
+ PDB *p;
+ char atnum[16],
+ atnam[16],
+ x[16],
+ y[16],
+ z[16],
+ occup[16],
+ bval[16],
+ resnam[16],
+ resnum[16],
+ chain[16],
+ type[16];
+
+ *natoms = 0;
+
+ /* Read the XMAS header */
+ if((xmas = ReadXmasHeader(fp))==NULL)
+ {
+ printf(" Error==> Couldn't read XMAS header: %s\n", gXMASError);
+ return(NULL);
+ }
+
+ /* Read in the XMAS data */
+ if(!CacheXmasData(xmas))
+ {
+ printf(" Error==> Couldn't read XMAS data: %s\n", gXMASError);
+ FreeXmasData(xmas);
+ return(NULL);
+ }
+
+ /* Check the data contains atom records */
+ if(!DoesXmasContain(xmas, "atoms"))
+ {
+ fprintf(stderr," Error==> XMAS file does not have ATOM \
+records!\n");
+ FreeXmasData(xmas);
+ return(NULL);
+ }
+
+ /* All looks OK, read the ATOM data into a PDB linked list */
+ while(ReadXmasData(xmas, "atoms",
+ "atnum, atnam, x, y, z, occup, bval, resnam, \
+ resnum, chain, type",
+ atnum, atnam, x, y, z, occup,
+ bval, resnam, resnum, chain, type))
+ {
+ if((!strcmp(type, "ATOM")) ||
+ (!strcmp(type, "HETATM") && readhet))
+ {
+
+ /* Allocate memory in the PDB linked list */
+ if(pdb==NULL)
+ {
+ INIT(pdb, PDB);
+ p=pdb;
+ }
+ else
+ {
+ ALLOCNEXT(p, PDB);
+ }
+ if(p==NULL)
+ {
+ FREELIST(pdb, PDB);
+ *natoms = 0;
+ return(NULL);
+ }
+
+ sscanf(atnum, "%d", &(p->atnum));
+
+ DEDOTIFY(atnam);
+ strcat(atnam, " ");
+ strcpy(p->atnam_raw, atnam);
+ strcpy(p->atnam, FixAtomName(atnam));
+ p->atnam[4] = '\0';
+ sscanf(x, "%lf", &(p->x));
+ sscanf(y, "%lf", &(p->y));
+ sscanf(z, "%lf", &(p->z));
+ sscanf(occup, "%lf", &(p->occ));
+ sscanf(bval, "%lf", &(p->bval));
+ strncpy(p->resnam, resnam, 3);
+ p->resnam[3] = '\0'; /* strncpy() need not terminate the string */
+ strcat(p->resnam, " ");
+
+ DEDOTIFY(resnum);
+ p->insert[0] = resnum[4];
+ p->insert[1] = '\0';
+ resnum[4] = '\0';
+ sscanf(resnum, "%d", &(p->resnum));
+
+ p->chain[0] = chain[0];
+ p->chain[1] = '\0';
+
+ strcpy(p->junk, type);
+ PADCHARMINTERM(p->junk, ' ', 6);
+
+ (*natoms)++;
+ }
+ }
+
+ /* Free the XMAS data */
+ FreeXmasData(xmas);
+
+#endif
+
+ return(pdb);
+}
+
+
+/************************************************************************/
+/*>void SetCentreResidue(char *command)
+ ------------------------------------
+ Sets a residue (a single-residue zone in a one zone list) as the
+ centre for fitting.
+ As the function sets-up a list of zones, future developments of ProFit
+ could be set up to define the centre for fitting around a list of
+ zones. (For instance, around an active site.)
+
+ 19.03.08 Original based on SetRMSZone() By: CTP
+ 22.04.08 Added handling of lowercase chain and inserts.
+*/
+void SetCentreResidue(char *command)
+{
+ int start1, stop1,
+ start2, stop2,
+ SeqZone, strucnum;
+ char chain1, chain2,
+ startinsert1, stopinsert1,
+ startinsert2, stopinsert2;
+ ZONE *z;
+ int warned = 0;
+
+ char *zone1, *zone2, *ptr;
+ char zone_command[20];
+
+ /* See if this is clearing the zones */
+ if(!upstrncmp(command,"CLEAR",5) || !strcmp(command,"*"))
+ {
+ for(strucnum=0; strucnum<gMultiCount; strucnum++)
+ {
+ if(gCZoneList[strucnum]!=NULL)
+ {
+ FREELIST(gCZoneList[strucnum],ZONE);
+ gCZoneList[strucnum] = NULL;
+ }
+ }
+
+ gFitted = FALSE;
+ return;
+ }
+
+ /* Parse command here */
+ KILLLEADSPACES(zone1, command);
+ zone2 = zone1;
+ if((ptr=strchr(command,':'))!=NULL)
+ {
+ if(gMultiCount > 1)
+ {
+ printf(" Warning: Cannot define residue using ':' \
+notation.\n");
+ return;
+ }
+
+ KILLLEADSPACES(zone2, ptr+1);
+ *ptr = '\0';
+ }
+
+ strcpy(zone_command,zone1);
+ strcat(zone_command,"-");
+ strcat(zone_command,zone1);
+
+ if(zone1 != zone2)
+ {
+ strcat(zone_command,":");
+ strcat(zone_command,zone2);
+ strcat(zone_command,"-");
+ strcat(zone_command,zone2);
+ }
+
+ for(strucnum=0; strucnum<gMultiCount; strucnum++)
+ {
+ SeqZone = ParseZone(zone_command, &start1, &stop1, &chain1,
+ &startinsert1, &stopinsert1,
+ &start2, &stop2, &chain2,
+ &startinsert2, &stopinsert2,
+ strucnum);
+
+ if((SeqZone == (-2)) && (!warned))
+ {
+ printf(" Error==> You cannot specify zones for each \
+structure when performing\n");
+ printf(" multiple structure fitting.\n");
+ warned = 1;
+ }
+
+ if(SeqZone > -1)
+ {
+ /* Blank Current List */
+ if(gCZoneList[strucnum]!=NULL)
+ {
+ FREELIST(gCZoneList[strucnum],ZONE);
+ gCZoneList[strucnum] = NULL;
+ }
+
+ /* Allocate entry */
+ INIT(gCZoneList[strucnum],ZONE);
+ z = gCZoneList[strucnum];
+
+ if(z==NULL)
+ {
+ printf(" Error==> No memory for zone!\n");
+ }
+ else
+ {
+ /* Add this zone to the zone list */
+ z->chain1 = chain1;
+ z->start1 = start1;
+ z->startinsert1 = startinsert1;
+ z->stop1 = stop1;
+ z->stopinsert1 = stopinsert1;
+ z->chain2 = chain2;
+ z->start2 = start2;
+ z->startinsert2 = startinsert2;
+ z->stop2 = stop2;
+ z->stopinsert2 = stopinsert2;
+ z->mode = SeqZone?ZONE_MODE_SEQUENTIAL:gCurrentMode;
+ }
+
+ gFitted = FALSE;
+ }
+ }
+
+ return;
+}
+
+
+/************************************************************************/
+/*>int RunScript(char *command)
+ ----------------------------
+ Run a script file.
+ Opens file, runs DoCommandLoop() then closes file. Also limits number
+ of scripts within scripts.
+
+ 27.03.08 Original By: CTP
+ 15.04.08 Quitting in a script now exits from ProFit instead of
+ returning to DoCommandLoop().
+*/
+int RunScript(char *command)
+{
+ const int max_recur_level = 1000;
+ static int recur_level = 0;
+ FILE *script = NULL;
+ char filename[MAXSTRLEN];
+
+
+ /* Only allow limited level of scripts running scripts */
+ if(recur_level >= max_recur_level)
+ {
+ printf(" Error==> The maximum number of nested scripts is %d.\n",
+ max_recur_level);
+ return(1);
+ }
+
+ /* Copy global variable 'command' to local 'filename' */
+ strcpy(filename, command);
+
+ /* Open script file */
+ if((script = fopen(filename,"r")) == NULL)
+ {
+ printf(" Error==> Failed to open: '%s'\n",filename);
+ return(1);
+ }
+
+ /* Run Script File */
+ recur_level++;
+ if(!gQuiet)
+ printf("\n Starting script: '%s'\n",filename);
+
+ if(!DoCommandLoop(script))
+ {
+ if(!gQuiet)
+ printf(" Finished script: '%s'\n\n",filename);
+ }
+ else
+ {
+ printf(" Error in script: '%s'\n\n",filename);
+ }
+
+ fclose(script);
+ recur_level--;
+
+ return(0);
+}
+
+
+
+/************************************************************************/
+/*>int ConvertResidueToSequential(ZONE *input_zone, int strucnum)
+ --------------------------------------------------------------
+ Converts residue-numbered zone into sequential-numbered zone.
+ Returns 0 on success or 1 if the zone could not be found.
+
+ 08.04.08 Original By: CTP
+ 10.06.08 Set chain id to space for sequential-numbered zone.
+*/
+int ConvertResidueToSequential(ZONE *input_zone, int strucnum)
+{
+ PDB *p = NULL;
+ ZONE *output_zone = NULL;
+ int residue_count = 0;
+ int prev_residue = 0;
+ char prev_insert = ' ';
+ char prev_chain = ' ';
+
+ /* Zero start and stop for output_zone */
+ INIT(output_zone,ZONE);
+ output_zone->start1 = 0;
+ output_zone->start2 = 0;
+ output_zone->stop1 = 0;
+ output_zone->stop2 = 0;
+
+ /* Reference Residue */
+ residue_count = 0;
+ prev_residue = 0;
+ prev_insert = ' ';
+ prev_chain = ' ';
+
+ for(p=gRefPDB; p!=NULL; NEXT(p))
+ {
+ /* Update residue count */
+ if((p->resnum != prev_residue) ||
+ (p->insert[0] != prev_insert) ||
+ (p->chain[0] != prev_chain))
+ {
+ residue_count++;
+ }
+
+ /* Check for Start */
+ /* Start residue number == -999 */
+ if(((input_zone->chain1 == ' ') ||
+ (input_zone->chain1 == p->chain[0])) &&
+ (input_zone->start1 == -999 && output_zone->start1 == 0))
+ {
+ output_zone->start1 = residue_count;
+ }
+ /* Start residue number defined */
+ if(((input_zone->chain1 == ' ') ||
+ (input_zone->chain1 == p->chain[0])) &&
+ (input_zone->start1 == p->resnum) &&
+ (input_zone->startinsert1 == p->insert[0]) &&
+ (output_zone->start1 == 0))
+ {
+ output_zone->start1 = residue_count;
+ }
+
+ /* Check for Finish */
+ /* Stop residue number == -999 */
+ if(((input_zone->chain1 == ' ') ||
+ (input_zone->chain1 == p->chain[0])) &&
+ (input_zone->stop1 == -999))
+ {
+ output_zone->stop1 = residue_count;
+ }
+ /* Stop residue number defined */
+ if(((input_zone->chain1 == ' ') ||
+ (input_zone->chain1 == p->chain[0])) &&
+ (input_zone->stop1 == p->resnum) &&
+ (input_zone->stopinsert1 == p->insert[0]) &&
+ (output_zone->stop1 == 0))
+ {
+ output_zone->stop1 = residue_count;
+ }
+
+ /* Update previous residue */
+ prev_residue = p->resnum;
+ prev_insert = p->insert[0];
+ prev_chain = p->chain[0];
+ }
+
+ /* Mobile Residue */
+ residue_count = 0;
+ prev_residue = 0;
+ prev_insert = ' ';
+ prev_chain = ' ';
+
+ for(p=gMobPDB[strucnum]; p!=NULL; NEXT(p))
+ {
+ /* Update residue count */
+ if((p->resnum != prev_residue) ||
+ (p->insert[0] != prev_insert) ||
+ (p->chain[0] != prev_chain))
+ {
+ residue_count++;
+ }
+
+ /* Check for Start */
+ /* Start residue number == -999 */
+ if(((input_zone->chain2 == ' ') ||
+ (input_zone->chain2 == p->chain[0])) &&
+ (input_zone->start2 == -999 && output_zone->start2 == 0))
+ {
+ output_zone->start2 = residue_count;
+ }
+ /* Start residue number defined */
+ if(((input_zone->chain2 == ' ') ||
+ (input_zone->chain2 == p->chain[0])) &&
+ (input_zone->start2 == p->resnum) &&
+ (input_zone->startinsert2 == p->insert[0]) &&
+ (output_zone->start2 == 0))
+ {
+ output_zone->start2 = residue_count;
+ }
+
+ /* Check for Finish */
+ /* Stop residue number == -999 */
+ if(((input_zone->chain2 == ' ') ||
+ (input_zone->chain2 == p->chain[0])) &&
+ (input_zone->stop2 == -999))
+ {
+ output_zone->stop2 = residue_count;
+ }
+ /* Stop residue number defined */
+ if(((input_zone->chain2 == ' ') ||
+ (input_zone->chain2 == p->chain[0])) &&
+ (input_zone->stop2 == p->resnum) &&
+ (input_zone->stopinsert2 == p->insert[0]) &&
+ (output_zone->stop2 == 0))
+ {
+ output_zone->stop2 = residue_count;
+ }
+
+ /* Update previous residue */
+ prev_residue = p->resnum;
+ prev_insert = p->insert[0];
+ prev_chain = p->chain[0];
+ }
+
+ /* Start and Stop Assigned? */
+ if(!(output_zone->start1) || !(output_zone->stop1) ||
+ !(output_zone->start2) || !(output_zone->stop2))
+ {
+ /* ZONE NOT FOUND - free the temporary zone before returning */
+ FREELIST(output_zone,ZONE);
+ return(1);
+ }
+
+ /* Set Sequential Numbering */
+ input_zone->start1 = output_zone->start1;
+ input_zone->stop1 = output_zone->stop1;
+ input_zone->start2 = output_zone->start2;
+ input_zone->stop2 = output_zone->stop2;
+ input_zone->chain1 = ' ';
+ input_zone->chain2 = ' ';
+ input_zone->mode = ZONE_MODE_SEQUENTIAL;
+
+ /* Cleanup */
+ FREELIST(output_zone,ZONE);
+ return(0);
+}
+
+
+/************************************************************************/
+/*>int CheckOverlap(ZONE *inputtest, ZONE *inputlist, int strucnum)
+ ----------------------------------------------------------------
+ Function returns number of pairs of zones overlapping or -1 if the
+ function cannot find the zone inputtest or the list of zones
+ inputlist.
+
+ This function is called by:
+ ShowStatus() when flagging overlapping zones.
+ SetFitZone() and SetRMSZone() when checking new user defined zones
+ for overlaps with existing zones.
+
+ The function duplicates and converts residue-numbered zones into
+ sequentially-numbered zones for comparison.
+
+ 11.04.08 Original By: CTP
+ 14.04.08 Modified to make zonelist a list of two zones, A and B.
+*/
+int CheckOverlap(ZONE *inputtest, ZONE *inputlist, int strucnum)
+{
+ ZONE *zonelist = NULL; /* New list */
+ ZONE *za = NULL; /* Pointers to zones */
+ ZONE *zb = NULL;
+ ZONE *zc = NULL;
+ int overlap = 0; /* Number of overlapping zone pairs */
+
+ /* Check input */
+ if(!inputtest || !inputlist)
+ {
+ /* No Zone or Zone List Specified */
+ return(-1);
+ }
+
+ /* Duplicate test zone and put at head of list of zones */
+ INIT(zonelist,ZONE);
+ zonelist->chain1 = inputtest->chain1;
+ zonelist->start1 = inputtest->start1;
+ zonelist->startinsert1 = inputtest->startinsert1;
+ zonelist->stop1 = inputtest->stop1;
+ zonelist->stopinsert1 = inputtest->stopinsert1;
+ zonelist->chain2 = inputtest->chain2;
+ zonelist->start2 = inputtest->start2;
+ zonelist->startinsert2 = inputtest->startinsert2;
+ zonelist->stop2 = inputtest->stop2;
+ zonelist->stopinsert2 = inputtest->stopinsert2;
+ zonelist->mode = inputtest->mode;
+
+ /* Convert to sequential zone */
+ if(zonelist->mode == ZONE_MODE_RESNUM)
+ {
+ if(ConvertResidueToSequential(zonelist, strucnum))
+ {
+ /* Could not convert test zone - free the duplicate */
+ FREELIST(zonelist,ZONE);
+ return(-1);
+ }
+ }
+
+ /* Set zone za and zb */
+ za = zb = zonelist;
+ ALLOCNEXT(zb,ZONE);
+
+ /* Check zones in inputlist */
+ for(zc=inputlist; zc!=NULL; NEXT(zc))
+ {
+ zb->chain1 = zc->chain1;
+ zb->start1 = zc->start1;
+ zb->startinsert1 = zc->startinsert1;
+ zb->stop1 = zc->stop1;
+ zb->stopinsert1 = zc->stopinsert1;
+ zb->chain2 = zc->chain2;
+ zb->start2 = zc->start2;
+ zb->startinsert2 = zc->startinsert2;
+ zb->stop2 = zc->stop2;
+ zb->stopinsert2 = zc->stopinsert2;
+ zb->mode = zc->mode;
+
+ /* Convert residue zones to sequential zones */
+ if(zb->mode == ZONE_MODE_RESNUM)
+ ConvertResidueToSequential(zb, strucnum);
+
+ /* Test for overlap between zone za and zone zb */
+ /* Ignore zones that can't be converted */
+ if((((za->start1 <= zb->start1) && (za->stop1 >= zb->start1)) ||
+ ((zb->start1 <= za->start1) && (zb->stop1 >= za->start1)) ||
+ ((za->start2 <= zb->start2) && (za->stop2 >= zb->start2)) ||
+ ((zb->start2 <= za->start2) && (zb->stop2 >= za->start2))) &&
+ (zb->mode != ZONE_MODE_RESNUM))
+ overlap++;
+ }
+
+ /* Cleanup */
+ FREELIST(zonelist,ZONE);
+
+ return(overlap);
+}
+
+
+/************************************************************************/
+/*>void SetZoneFromBValCol(void)
+ -----------------------------
+ Function sets zones using markers set in the temperature factor column
+ of a PDB file.
+
+ Zones are marked by stretches of positive whole numbers or integers in
+ the temperature factor column. Zeros are ignored. If both the Reference
+ structure and the mobile structure(s) are marked then zones are assigned
+ between corresponding stretches of labeled residues. If only the
+ reference structure is marked then the same residue numbers are used
+ for both the reference structure and the mobile structure zones.
+
+ 12.05.08 Original By: CTP
+ 16.05.08 Corrected bug with assigning zones. Added assignment from
+ Reference Structure only if there are no markers set in the
+ mobile structure.
+ 22.05.08 Removed Remove-Duplicate-Zones section - not needed.
+*/
+void SetZoneFromBValCol(void)
+{
+ int snum = 0;
+ int label = 0;
+ int maxbval = 0;
+ int ref_res = 0;
+ int mob_res = 0;
+ PDB *pdbref = NULL;
+ PDB *pdbmob = NULL;
+ PDB *currpdbref = NULL;
+ PDB *currpdbmob = NULL;
+ ZONE *zonelist = NULL;
+ ZONE *z = NULL;
+ ZONE *zn = NULL;
+ BOOL converged = TRUE;
+ BOOL ref_assigned = FALSE;
+ BOOL mob_assigned = FALSE;
+ BOOL ref_only = FALSE;
+ BOOL all_zero = TRUE;
+
+ if(!gQuiet)
+ printf(" Assigning zones from Temperature Factor column.\n");
+
+ /* Find highest BVal + Error check */
+ for(pdbref = gRefPDB; pdbref!= NULL; NEXT(pdbref))
+ {
+ if( (int)(pdbref->bval) > maxbval)
+ maxbval = (int)(pdbref->bval);
+
+ /* Return if column not integers/whole numbers */
+ if(pdbref->bval != (int)(pdbref->bval))
+ {
+ printf(" ERROR: Integer or whole numbers");
+ printf(" needed in Temperature Factor column\n");
+ return;
+ }
+
+ /* Return if column is less than zero */
+ if((int)(pdbref->bval) < 0)
+ {
+ printf(" ERROR: Temperature Factors cannot be negative.\n");
+ return;
+ }
+ }
+
+ /* Return if no values in column */
+ if(!maxbval)
+ {
+ printf(" ERROR: No values found in Temperature Factor column\n");
+ return;
+ }
+
+ /* Check format of Mobile Structures */
+ for(snum = 0; snum < gMultiCount; snum++)
+ {
+ for(pdbmob = gMobPDB[snum]; pdbmob != NULL; NEXT(pdbmob))
+ {
+ /* Check Format */
+ if(pdbmob->bval != (int)(pdbmob->bval))
+ ref_only = TRUE;
+
+ if(pdbmob->bval != 0.0)
+ all_zero = FALSE;
+ }
+ }
+
+ if(all_zero)
+ ref_only = TRUE;
+
+ /* Clear Existing Zones */
+ for(snum = 0; snum < gMultiCount; snum++)
+ {
+ if(gZoneList[snum] != NULL)
+ {
+ FREELIST(gZoneList[snum],ZONE);
+ gZoneList[snum] = NULL;
+ }
+ }
+ gUserFitZone = FALSE;
+
+ /* Cycle through mobile structures */
+ for(snum = 0; snum < gMultiCount; snum++)
+ {
+ /* Loop through ZoneLabels (B Value Column) */
+ for(label = 1; label <= maxbval; label++)
+ {
+ /* Assign Zones */
+ /* Clear zonelist */
+ if(zonelist != NULL)
+ {
+ FREELIST(zonelist,ZONE);
+ zonelist = NULL;
+ }
+
+ /* Reset Residue Count */
+ ref_res = 1;
+ mob_res = 1;
+ pdbref = gRefPDB;
+ pdbmob = gMobPDB[snum];
+ currpdbref = gRefPDB;
+ currpdbmob = gMobPDB[snum];
+ ref_assigned = FALSE;
+ mob_assigned = FALSE;
+
+ /* Loop Through Ref PDB */
+ for(; (pdbref!= NULL) && (pdbmob != NULL); NEXT(pdbref))
+ {
+ if((pdbref->resnum != currpdbref->resnum) ||
+ strcmp(pdbref->chain, currpdbref->chain)||
+ strcmp(pdbref->insert,currpdbref->insert))
+ {
+ currpdbref = pdbref;
+ ref_res++;
+ ref_assigned = FALSE;
+ }
+
+ /* Find Label in Reference */
+ if(((int)(pdbref->bval) == label) && !ref_assigned)
+ {
+ if(ref_only)
+ {
+ /* Assign Zone from Reference Structure Alone */
+ ref_assigned = TRUE;
+
+ /* Update Zone */
+ if(!zonelist)
+ {
+ INIT(zonelist,ZONE);
+ z = zonelist;
+ }
+ else
+ {
+ z=zonelist;
+ LAST(z);
+ ALLOCNEXT(z,ZONE);
+ }
+
+ z->chain1 = ' ';
+ z->start1 = ref_res;
+ z->startinsert1 = ' ';
+ z->stop1 = ref_res;
+ z->stopinsert1 = ' ';
+ z->chain2 = ' ';
+ z->start2 = ref_res;
+ z->startinsert2 = ' ';
+ z->stop2 = ref_res;
+ z->stopinsert2 = ' ';
+ z->mode = ZONE_MODE_SEQUENTIAL;
+ }
+ else
+ {
+ /* Find Label in Mobile */
+ for(; pdbmob != NULL && !ref_assigned; NEXT(pdbmob))
+ {
+ if((pdbmob->resnum != currpdbmob->resnum) ||
+ strcmp(pdbmob->chain, currpdbmob->chain)||
+ strcmp(pdbmob->insert,currpdbmob->insert))
+ {
+ currpdbmob = pdbmob;
+ mob_res++;
+ mob_assigned = FALSE;
+ }
+
+ if(((int)(pdbmob->bval) == label) &&
+ !ref_assigned &&
+ !mob_assigned)
+ {
+ ref_assigned = TRUE;
+ mob_assigned = TRUE;
+
+ /* Update Zone */
+ if(!zonelist)
+ {
+ INIT(zonelist,ZONE);
+ z = zonelist;
+ }
+ else
+ {
+ z=zonelist;
+ LAST(z);
+ ALLOCNEXT(z,ZONE);
+ }
+
+ z->chain1 = ' ';
+ z->start1 = ref_res;
+ z->startinsert1 = ' ';
+ z->stop1 = ref_res;
+ z->stopinsert1 = ' ';
+ z->chain2 = ' ';
+ z->start2 = mob_res;
+ z->startinsert2 = ' ';
+ z->stop2 = mob_res;
+ z->stopinsert2 = ' ';
+ z->mode = ZONE_MODE_SEQUENTIAL;
+ }
+ }
+ }
+ }
+ }
+
+ /* Merge Zones */
+ if(zonelist)
+ {
+ do
+ {
+ /* Assume we have converged */
+ converged = TRUE;
+ for(z=zonelist; z!=NULL; NEXT(z))
+ {
+ zn = z->next;
+ if(zn)
+ {
+ /* See if the two zones are sequential */
+ if((zn->start1 == (z->stop1 + 1)) &&
+ (zn->start2 == (z->stop2 + 1)))
+ {
+ z->stop1 = zn->stop1;
+ z->stop2 = zn->stop2;
+ z->next = zn->next;
+ free(zn);
+ converged = FALSE;
+ }
+ }
+ }
+ } while(!converged);
+ }
+
+
+ /* Append to global ZoneList */
+ if(zonelist)
+ {
+ if(!gZoneList[snum])
+ {
+ gZoneList[snum] = zonelist;
+ }
+ else
+ {
+ z = gZoneList[snum];
+ LAST(z);
+ z->next = zonelist;
+ }
+ zonelist = NULL;
+ }
+
+ } /* End of loop through labels */
+ } /* End of loop through mobile structures */
+
+ gUserFitZone = TRUE;
+ return;
+}
+
+
+/************************************************************************/
+/*>int ConvertSequentialToResidue(ZONE *input_zone, int strucnum)
+ --------------------------------------------------------------
+ Converts sequential-numbered zone into residue-numbered zone.
+
+ 06.06.08 Original By: CTP
+*/
+int ConvertSequentialToResidue(ZONE *input_zone, int strucnum)
+{
+ PDB *p = NULL;
+ PDB *p_start = NULL;
+ PDB *q = NULL;
+ PDB *q_start = NULL;
+ ZONE *output_zonelist = NULL;
+ ZONE *z = NULL;
+ int residue_count = 0;
+ int prev_residue = 0;
+ char prev_insert = ' ';
+ char prev_chain = ' ';
+ int ref_start = 0;
+ int ref_stop = 0;
+ int mob_start = 0;
+ int mob_stop = 0;
+ int i = 0;
+
+
+ /* Check input zone is sequential and not NULL */
+ if(!input_zone || input_zone->mode == ZONE_MODE_RESNUM)
+ return(0);
+
+ /* Zero start and stop for output_zone */
+ INIT(output_zonelist,ZONE);
+ output_zonelist->start1 = 0;
+ output_zonelist->start2 = 0;
+ output_zonelist->stop1 = 0;
+ output_zonelist->stop2 = 0;
+
+ /* Find Reference Residues */
+ residue_count = 0;
+ prev_residue = 0;
+ prev_insert = ' ';
+ prev_chain = ' ';
+
+ /* Find ref start residue p */
+ ref_start = input_zone->start1;
+ ref_stop = input_zone->stop1;
+
+ /* Start is undefined */
+ if(ref_start == -999)
+ ref_start = 1;
+
+ for(p=gRefPDB; p!=NULL && (p_start==NULL || ref_stop == -999); NEXT(p))
+ {
+ /* Update residue count */
+ if((p->resnum != prev_residue) ||
+ (p->insert[0] != prev_insert) ||
+ (p->chain[0] != prev_chain))
+ {
+ residue_count++;
+ }
+
+ /* Set start atom */
+ if(!p_start && residue_count == ref_start)
+ p_start = p;
+
+ /* Update previous residue */
+ prev_residue = p->resnum;
+ prev_insert = p->insert[0];
+ prev_chain = p->chain[0];
+ }
+
+ /* Set stop if undefined */
+ if(ref_stop == -999)
+ ref_stop = residue_count;
+
+ /* Find Mobile Residues */
+ residue_count = 0;
+ prev_residue = 0;
+ prev_insert = ' ';
+ prev_chain = ' ';
+
+ /* Find mob start residue q */
+ mob_start = input_zone->start2;
+ mob_stop = input_zone->stop2;
+
+ /* Start is undefined */
+ if(mob_start == -999)
+ mob_start = 1;
+
+ for(p=gMobPDB[strucnum];
+ p!=NULL && (q_start==NULL || mob_stop == -999);
+ NEXT(p))
+ {
+ /* Update residue count */
+ if((p->resnum != prev_residue) ||
+ (p->insert[0] != prev_insert) ||
+ (p->chain[0] != prev_chain))
+ {
+ residue_count++;
+ }
+
+ /* Set start atom */
+ if(!q_start && residue_count == mob_start)
+ q_start = p;
+
+ /* Update previous residue */
+ prev_residue = p->resnum;
+ prev_insert = p->insert[0];
+ prev_chain = p->chain[0];
+ }
+
+ /* Set stop if undefined */
+ if(mob_stop == -999)
+ mob_stop = residue_count;
+
+ /* Error Checking */
+
+ /* Error: Ref Start Not Set */
+ if(!p_start)
+ {
+ printf(" Error: Reference start residue not found.\n");
+ free(output_zonelist);
+ return(1);
+ }
+
+ /* Error: Mob Start Not Set */
+ if(!q_start)
+ {
+ printf(" Error: Mobile start residue not found.\n");
+ free(output_zonelist);
+ return(1);
+ }
+
+ /* Error: Number of Residues Not Matched */
+ if((ref_stop - ref_start) != (mob_stop - mob_start))
+ {
+ char zone1[64];
+ char zone2[64];
+
+ printf(" Error: Number of residues in zone does not match.\n");
+
+ FormatZone(zone1, input_zone->chain1,
+ input_zone->start1, input_zone->startinsert1,
+ input_zone->stop1, input_zone->stopinsert1);
+
+ FormatZone(zone2, input_zone->chain2,
+ input_zone->start2, input_zone->startinsert2,
+ input_zone->stop2, input_zone->stopinsert2);
+
+ printf(" %-16s with %-16s %s\n",
+ zone1, zone2,
+ ((input_zone->mode == ZONE_MODE_RESNUM)?"(Residue numbering)"
+ :"(Sequential numbering)"));
+
+ printf(" Reference: %d, Mobile: %d\n\n",
+ (ref_stop - ref_start + 1),(mob_stop - mob_start + 1));
+
+ free(output_zonelist);
+ return(1);
+ }
+
+ /* Assign the Zones */
+
+ /* Set Start Residues */
+ p = p_start;
+ q = q_start;
+
+ /* Set Zone start1 resnum,insert,chain & start2 resnum,insert,chain */
+ z = output_zonelist;
+ z->chain1 = p->chain[0];
+ z->start1 = p->resnum;
+ z->startinsert1 = p->insert[0];
+ z->stop1 = p->resnum;
+ z->stopinsert1 = p->insert[0];
+ z->chain2 = q->chain[0];
+ z->start2 = q->resnum;
+ z->startinsert2 = q->insert[0];
+ z->stop2 = q->resnum;
+ z->stopinsert2 = q->insert[0];
+ z->mode = ZONE_MODE_RESNUM;
+ z->next = NULL;
+
+ /* Find continuous zones */
+ for(i=ref_start; i < ref_stop; i++)
+ {
+ /* Skip to Next Ref */
+ while(p->chain[0] == z->chain1 &&
+ p->resnum == z->stop1 &&
+ p->insert[0] == z->stopinsert1)
+ {
+ NEXT(p);
+ }
+
+ /* Skip to Next Mob */
+ while(q->chain[0] == z->chain2 &&
+ q->resnum == z->stop2 &&
+ q->insert[0] == z->stopinsert2)
+ {
+ NEXT(q);
+ }
+
+ /* Check Prev Chain */
+ if(p->chain[0] != z->chain1 || q->chain[0] != z->chain2)
+ {
+ /* Add New Zone */
+ ALLOCNEXT(z,ZONE);
+
+ /* Set Start + Stop Position */
+ z->chain1 = p->chain[0];
+ z->start1 = p->resnum;
+ z->startinsert1 = p->insert[0];
+ z->stop1 = p->resnum;
+ z->stopinsert1 = p->insert[0];
+ z->chain2 = q->chain[0];
+ z->start2 = q->resnum;
+ z->startinsert2 = q->insert[0];
+ z->stop2 = q->resnum;
+ z->stopinsert2 = q->insert[0];
+ z->mode = ZONE_MODE_RESNUM;
+ z->next = NULL;
+ }
+ else
+ {
+ /* Update Stop Position */
+ z->stop1 = p->resnum;
+ z->stopinsert1 = p->insert[0];
+ z->stop2 = q->resnum;
+ z->stopinsert2 = q->insert[0];
+ }
+ }
+
+
+ /* Splice output_zonelist into input_zone position */
+ /* Set End of List */
+ z->next = input_zone->next;
+
+ /* Set Start of List */
+ input_zone->chain1 = output_zonelist->chain1;
+ input_zone->start1 = output_zonelist->start1;
+ input_zone->startinsert1 = output_zonelist->startinsert1;
+ input_zone->stop1 = output_zonelist->stop1;
+ input_zone->stopinsert1 = output_zonelist->stopinsert1;
+ input_zone->chain2 = output_zonelist->chain2;
+ input_zone->start2 = output_zonelist->start2;
+ input_zone->startinsert2 = output_zonelist->startinsert2;
+ input_zone->stop2 = output_zonelist->stop2;
+ input_zone->stopinsert2 = output_zonelist->stopinsert2;
+ input_zone->mode = output_zonelist->mode;
+ input_zone->next = output_zonelist->next;
+
+ /* Cleanup - this is only used as a single structure and not a linked
+ list so it only needs a free()
+ */
+ free(output_zonelist);
+
+ return(0);
+}
+
+
+/************************************************************************/
+/*>ZONE *SortZoneList(ZONE *zonelist)
+ ----------------------------------
+ Sorts a sequentially-numbered zone list by setting head of sorted list
+ and inserting zones into sorted list either when start of next zone is
+ greater than current zone or at end of sorted list. Residue-numbered
+ zones are appended to the end of the linked list.
+
+ 10.06.08 Original By: CTP
+*/
+ZONE *SortZoneList(ZONE *zonelist)
+{
+ ZONE *sortlist = NULL;
+ ZONE *z_curr = NULL;
+ ZONE *z_next = NULL;
+ ZONE *z_sort = NULL;
+
+ /* Make sortlist with blank zone at head as placeholder */
+ INIT(sortlist,ZONE);
+
+ z_curr = zonelist; /* Set current zone at head of zonelist */
+ z_sort = sortlist; /* Set sort zone at head of sortlist */
+
+
+ /* Insertion Sort (sort of...) */
+ z_curr = zonelist; /* Set current zone at head of zonelist */
+ while(z_curr != NULL)
+ {
+ /* Set zone pointers */
+ z_sort = sortlist;
+ z_next = z_curr->next;
+
+ /* Append unconverted zones to end of sortlist */
+ if(z_curr->mode == ZONE_MODE_RESNUM)
+ {
+ LAST(z_sort);
+ z_sort->next = z_curr;
+ z_curr->next = NULL;
+ z_sort = NULL;
+ }
+
+ /* Scan through sortlist */
+ while(z_sort != NULL)
+ {
+ if(z_sort->next != NULL)
+ {
+ /* Insert curr_zone in sortlist */
+ if((z_sort->next->start1 > z_curr->start1) ||
+ (z_sort->next->mode == ZONE_MODE_RESNUM))
+ {
+ z_curr->next = z_sort->next;
+ z_sort->next = z_curr;
+ z_sort = NULL;
+ }
+ else
+ {
+ NEXT(z_sort);
+ }
+ }
+ else
+ {
+ /* Append unmatched zone to end of sortlist */
+ z_sort->next = z_curr;
+ z_curr->next = NULL;
+ z_sort = NULL;
+ }
+ }
+
+ /* Set current zone to next zone */
+ z_curr = z_next;
+ }
+
+ /* Return sorted list */
+ zonelist = sortlist->next;
+ sortlist->next = NULL;
+ FREELIST(sortlist,ZONE);
+
+ return(zonelist);
+}
+
+
+/************************************************************************/
+/*>int ConvertZoneList(ZONE *zonelist, int strucnum, int mode)
+ -----------------------------------------------------------
+ Convert zonelist to numbering mode.
+
+ 10.06.08 Original By: CTP
+*/
+int ConvertZoneList(ZONE *zonelist, int strucnum, int mode)
+{
+ ZONE *z = NULL;
+ int error = 0;
+
+ for(z=zonelist; z!=NULL; NEXT(z))
+ {
+ if(z->mode == mode)
+ {
+ continue;
+ }
+
+ if(mode == ZONE_MODE_RESNUM)
+ error += ConvertSequentialToResidue(z, strucnum);
+ else
+ error += ConvertResidueToSequential(z, strucnum);
+ }
+
+ return(error);
+}
+
+
+/************************************************************************/
+/*>int ConvertAllZones(int mode)
+ -----------------------------
+ Convert all zonelists to numbering mode.
+
+ 10.06.08 Original By: CTP
+*/
+int ConvertAllZones(int mode)
+{
+ int strucnum = 0;
+ int error = 0;
+
+ for(strucnum=0; strucnum<gMultiCount; strucnum++)
+ {
+ /* Convert Fitting Zones */
+ error += ConvertZoneList(gZoneList[strucnum], strucnum, mode);
+
+ /* Convert RMSd Calculation Zones */
+ error += ConvertZoneList(gRZoneList[strucnum], strucnum, mode);
+
+ /* Convert Centre Zone */
+ error += ConvertZoneList(gCZoneList[strucnum], strucnum, mode);
+ }
+ return(error);
+}
+
+
+/************************************************************************/
+/*>int SortAllZones(void)
+ ----------------------
+ Sort all zone lists.
+
+ 10.06.08 Original By: CTP
+*/
+int SortAllZones(void)
+{
+ int strucnum = 0;
+ int error = 0;
+
+ for(strucnum=0; strucnum<gMultiCount; strucnum++)
+ {
+ /* Convert Zones to Sequential Numbering */
+ error += ConvertZoneList(gZoneList[strucnum], strucnum,
+ ZONE_MODE_SEQUENTIAL);
+ error += ConvertZoneList(gRZoneList[strucnum], strucnum,
+ ZONE_MODE_SEQUENTIAL);
+
+ /* Sort Zonelists */
+ gZoneList[strucnum] = SortZoneList(gZoneList[strucnum]);
+ gRZoneList[strucnum] = SortZoneList(gRZoneList[strucnum]);
+ }
+ return(error);
+}
+
+
+/************************************************************************/
+/*>ZONE *ChainList(PDB *pdblist)
+ -----------------------------
+ Takes a PDB sequence and returns the chains as a linked list of ZONEs.
+ Output is a sequentially numbered zonelist with the chain ID set.
+
+ Unlike the ZONEs defined by other functions (such as SetFitZone()) the
+ ZONEs defined by this function only cover a single chain rather than
+ indicating the equivalent regions of two chains.
+
+ 25.06.08 Original By: CTP
+*/
+ZONE *ChainList(PDB *pdblist)
+{
+ ZONE *chainlist, *z;
+ PDB *p = NULL;
+ int residue_count = 0;
+ int prev_residue = 0;
+ char prev_insert = ' ';
+ char prev_chain = ' ';
+
+ /* Return if no pdblist */
+ if(!pdblist)
+ {
+ return(NULL);
+ }
+
+ /* Zero start stop for chainlist */
+ INIT(chainlist,ZONE);
+ chainlist->start1 = 1;
+ chainlist->mode = ZONE_MODE_SEQUENTIAL;
+
+ /* Zero Residue Count */
+ residue_count = 0;
+ prev_residue = 0;
+ prev_insert = ' ';
+
+ /* Set Chain ID */
+ chainlist->chain1 = pdblist->chain[0];
+ prev_chain = pdblist->chain[0];
+
+ /* Cycle through pdblist */
+ for(z = chainlist, p=pdblist; p!=NULL; NEXT(p))
+ {
+ /* Update residue count */
+ if((p->resnum != prev_residue) ||
+ (p->insert[0] != prev_insert) ||
+ (p->chain[0] != prev_chain))
+ {
+ residue_count++;
+ }
+
+ /* If chain changes then add zone */
+ if(p->chain[0] != prev_chain)
+ {
+ z->stop1 = residue_count - 1;
+ ALLOCNEXT(z,ZONE);
+ z->start1 = residue_count;
+ z->chain1 = p->chain[0];
+ z->mode = ZONE_MODE_SEQUENTIAL;
+ }
+
+ /* Update previous residue */
+ prev_residue = p->resnum;
+ prev_insert = p->insert[0];
+ prev_chain = p->chain[0];
+
+ }
+
+ /* Set Final Stop */
+ z->stop1 = residue_count;
+
+ /* Return */
+ return(chainlist);
+}
+
+
+/************************************************************************/
+/*>BOOL SequentialZones(int strucnum)
+ ----------------------------------
+ Checks zones in a list occur sequentially along the reference and
+ mobile protein sequences for each chain.
+
+ Fit zones CANNOT be converted into sequence alignments if the zones
+ aren't in sequence.
+
+ Note: The zones in the zonelist should be converted to SEQUENTIAL
+ numbering mode and sorted prior to calling this function.
+
+ 31.07.08 Original By: CTP
+*/
+BOOL SequentialZones(int strucnum)
+{
+ ZONE *chainlist_mob, *chainlist_ref;
+ ZONE *ref, *mob, *z, *p;
+
+ /* Set chains */
+ chainlist_ref = ChainList(gRefPDB);
+ chainlist_mob = ChainList(gMobPDB[strucnum]);
+
+ /* Cycle through ref */
+ for(ref=chainlist_ref; ref!=NULL; NEXT(ref))
+ {
+ /* Cycle through mob */
+ for(mob=chainlist_mob; mob!=NULL; NEXT(mob))
+ {
+ p=NULL; /* Reset previous residue pointer */
+ for(z=gZoneList[strucnum];z!=NULL; NEXT(z))
+ {
+ /* Check zone between ref and mob */
+ if((z->start1 >= ref->start1)&&(z->stop1 <= ref->stop1)&&
+ (z->start2 >= mob->start1)&&(z->stop2 <= mob->stop1)&&
+ (z->mode == ZONE_MODE_SEQUENTIAL))
+ {
+ if(p==NULL)
+ {
+ p=z;
+ continue;
+ }
+
+ if((p->stop1 >= z->start1)||
+ (p->stop2 >= z->start2))
+ {
+ /* Return - Out of Sequence */
+ FREELIST(chainlist_ref,ZONE);
+ FREELIST(chainlist_mob,ZONE);
+ return(FALSE);
+ }
+ else
+ {
+ p=z;
+ }
+ }
+ }
+ }
+ }
+
+ /* Return - In Sequence */
+ FREELIST(chainlist_ref,ZONE);
+ FREELIST(chainlist_mob,ZONE);
+ return(TRUE);
+}
+
+
+/************************************************************************/
+/*>BOOL SequentialZonesWholeSeq(int strucnum)
+ ------------------------------------------
+ Checks that zones in a list occur sequentially along the reference and
+ mobile protein sequences for whole sequence.
+
+ Fit zones CANNOT be converted into sequence alignments if the zones
+ aren't in sequence.
+
+ Note: The zones in the zonelist should be converted to SEQUENTIAL
+ numbering mode and sorted prior to calling this function.
+
+ 30.01.09 Original based on SequentialZones() By: CTP
+*/
+BOOL SequentialZonesWholeSeq(int strucnum)
+{
+ ZONE *z = NULL;
+ ZONE *p = NULL;
+
+
+ for(z=gZoneList[strucnum];z!=NULL; NEXT(z))
+ {
+ /* Check zone numbering */
+ if(z->mode != ZONE_MODE_SEQUENTIAL)
+ return(FALSE);
+
+ /* Set p = z for first zone */
+ if(p==NULL)
+ {
+ p=z;
+ continue;
+ }
+
+ if((p->stop1 >= z->start1)|| (p->stop2 >= z->start2))
+ {
+ /* Return - Out of Sequence */
+ return(FALSE);
+ }
+ else
+ {
+ p=z;
+ }
+ }
+
+ /* Return - In Sequence */
+ return(TRUE);
+}
+
+
+/************************************************************************/
+/*>BOOL OneToOneChains(int strucnum)
+ ---------------------------------
+ Function tests to see if there is a one to one match between aligned
+ chains when assigning zones based on alignment.
+
+ 20.10.08 Original By: CTP
+*/
+BOOL OneToOneChains(int strucnum)
+{
+ ZONE *chainlist[2];
+ ZONE *ref, *mob, *z;
+ BOOL found = FALSE;
+ BOOL OneToOne = TRUE; /* Return Value */
+ char chainid = ' ';
+ int i;
+
+ if(gZoneList[strucnum] == NULL) return(TRUE);
+
+ chainlist[0] = ChainList(gRefPDB);
+ chainlist[1] = ChainList(gMobPDB[strucnum]);
+
+ /* Check each ref chain goes to one mob chain */
+ /* Check each mob chain goes to one ref chain */
+
+ for(i=0; i<2; i++)
+ {
+ for(ref = chainlist[i]; ref!=NULL; NEXT(ref))
+ {
+ /* Note:
+ The expression [i * -1 + 1] alternates between 0 and 1
+ when: i = 0, i * -1 +1 = 1
+ i = 1, i * -1 +1 = 0
+ */
+ for(mob = chainlist[i * -1 + 1]; mob!=NULL; NEXT(mob))
+ {
+ for(z=gZoneList[strucnum]; z!=NULL; NEXT(z))
+ {
+ /* Find zones in both chains */
+ if(((i == 0) &&
+ (z->start1>=ref->start1)&&(z->stop1<=ref->stop1)&&
+ (z->start2>=mob->start1)&&(z->stop2<=mob->stop1)&&
+ (z->mode == ZONE_MODE_SEQUENTIAL)) ||
+ ((i == 1) &&
+ (z->start1>=mob->start1)&&(z->stop1<=mob->stop1)&&
+ (z->start2>=ref->start1)&&(z->stop2<=ref->stop1)&&
+ (z->mode == ZONE_MODE_SEQUENTIAL)))
+ {
+ if(!found)
+ {
+ /* Remember first chain ID */
+ chainid = mob->chain1;
+ found = TRUE;
+ }
+ else
+ {
+ /* Return FALSE if different chain ID found */
+ if(mob->chain1 != chainid)
+ {
+ OneToOne = FALSE;
+ goto Return;
+ }
+ }
+
+ }
+ }
+ }
+
+ /* Reset found flag */
+ found = FALSE;
+ }
+ }
+
+Return:
+ /* Free memory and return */
+ if(chainlist[0]) FREELIST(chainlist[0],ZONE);
+ if(chainlist[1]) FREELIST(chainlist[1],ZONE);
+
+ return(OneToOne);
+}
+
+
+/************************************************************************/
+/*>int EnforceOneToOneChains(int strucnum)
+ ---------------------------------------
+ Enforces a one-to-one alignment of chains between the reference
+ structure and the mobile structure.
+
+ 31.07.08 Original By: CTP
+*/
+int EnforceOneToOneChains(int strucnum)
+{
+ ZONE *chainlist_mob, *chainlist_ref;
+ ZONE *zonelist, *deletelist;
+ ZONE *ref, *mob, *z, *p, *d;
+
+ int best;
+ int curr;
+
+ /* Null Pointers */
+ chainlist_mob = chainlist_ref = zonelist = deletelist = NULL;
+ ref = mob = z = p = d = NULL;
+
+ /* Set blank place holder */
+ INIT(zonelist, ZONE);
+ INIT(deletelist,ZONE);
+
+ /* Copy global zones to function */
+ zonelist->next = gZoneList[strucnum];
+
+ /* Set chains */
+ chainlist_ref = ChainList(gRefPDB);
+ chainlist_mob = ChainList(gMobPDB[strucnum]);
+
+ /*** Pass A: Ref to Mob ***/
+
+ /* Cycle through ref */
+ for(ref=chainlist_ref; ref!=NULL; NEXT(ref))
+ {
+ best = 0;
+ ref->chain2 = chainlist_mob->chain1;
+ ref->start2 = chainlist_mob->start1;
+ ref->stop2 = chainlist_mob->stop1;
+
+ /* Cycle through mob */
+ for(mob=chainlist_mob; mob!=NULL; NEXT(mob))
+ {
+ curr = 0;
+ /* Count matching residues */
+ for(z=zonelist->next;z!=NULL; NEXT(z))
+ {
+ if((z->start1 >= ref->start1)&&(z->stop1 <= ref->stop1)&&
+ (z->start2 >= mob->start1)&&(z->stop2 <= mob->stop1)&&
+ (z->mode == ZONE_MODE_SEQUENTIAL))
+ {
+ curr += (z->stop1 - z->start1 + 1);
+ }
+ }
+
+ /* Set current best match. */
+ if(curr > best)
+ {
+ best = curr;
+ ref->chain2 = mob->chain1;
+ ref->start2 = mob->start1;
+ ref->stop2 = mob->stop1;
+ }
+ }
+ }
+
+ /*** Pass B: Mob to Ref ***/
+
+ /* Cycle through mob */
+ for(mob=chainlist_mob; mob!=NULL; NEXT(mob))
+ {
+ best = 0;
+ mob->chain2 = chainlist_ref->chain1;
+ mob->start2 = chainlist_ref->start1;
+ mob->stop2 = chainlist_ref->stop1;
+
+
+ /* Cycle through ref */
+ for(ref=chainlist_ref; ref!=NULL; NEXT(ref))
+ {
+ curr = 0;
+ /* Count matching residues */
+ for(z=zonelist->next;z!=NULL; NEXT(z))
+ {
+ if((z->start1 >= ref->start1)&&(z->stop1 <= ref->stop1)&&
+ (z->start2 >= mob->start1)&&(z->stop2 <= mob->stop1)&&
+ (z->mode == ZONE_MODE_SEQUENTIAL))
+ {
+ curr += (z->stop1 - z->start1 + 1);
+ }
+ }
+
+ /* Set current best match. */
+ if(curr > best)
+ {
+ best = curr;
+ mob->chain2 = ref->chain1;
+ mob->start2 = ref->start1;
+ mob->stop2 = ref->stop1;
+ }
+ }
+ }
+
+ /* Keep / Delete Structures */
+ /* ------------------------ */
+
+ /* Set delete pointer */
+ d = deletelist;
+
+ /*** Pass A Ref to Mobile ***/
+ /* Cycle through ref */
+ for(ref=chainlist_ref; ref!=NULL; NEXT(ref))
+ {
+ /* Cycle through mob */
+ for(mob=chainlist_mob; mob!=NULL; NEXT(mob))
+ {
+ /* Cycle through zones */
+ p = zonelist; /* previous pointer */
+ for(z=zonelist->next;z!=NULL; NEXT(z))
+ {
+ if(((z->start1 >= ref->start1)&&(z->stop1 <= ref->stop1))&&
+ ((z->start2 < ref->start2)||(z->stop2 > ref->stop2))&&
+ (z->mode == ZONE_MODE_SEQUENTIAL))
+ {
+ /* Remove zone from zonelist */
+ p->next = z->next;
+
+ /* Append zone to deletelist */
+ d->next = z;
+ z->next = NULL;
+ d = z;
+
+ /* Set zone pointer to previous zone */
+ z = p;
+ }
+ else
+ {
+ p = z;
+ }
+ }
+ }
+ }
+
+ /*** Pass B Mobile to Ref ***/
+ /* Cycle through mob */
+ for(mob=chainlist_mob; mob!=NULL; NEXT(mob))
+ {
+ /* Cycle through ref */
+ for(ref=chainlist_ref; ref!=NULL; NEXT(ref))
+ {
+ p = zonelist; /* Reset previous pointer, then cycle through zones */
+ for(z=zonelist->next;z!=NULL; NEXT(z))
+ {
+ if(((z->start2 >= mob->start1)&&(z->stop2 <= mob->stop1))&&
+ ((z->start1 < mob->start2)||(z->stop1 > mob->stop2))&&
+ (z->mode == ZONE_MODE_SEQUENTIAL))
+ {
+ /* Remove zone from zonelist */
+ p->next = z->next;
+
+ /* Append zone to deletelist */
+ d->next = z;
+ z->next = NULL;
+ d = z;
+
+ /* Set zone pointer to previous zone */
+ z = p;
+ }
+ else
+ {
+ p = z;
+ }
+ }
+ }
+ }
+
+ /* Announce deleted zones */
+ if(!gQuiet)
+ {
+ if(deletelist->next != NULL)
+ {
+ /* Convert to Current Numbering Scheme */
+ ConvertZoneList(deletelist, strucnum, gCurrentMode);
+
+ printf(" Deleted zones:\n");
+ for(d=deletelist->next; d!=NULL; NEXT(d))
+ {
+ char zone1[64], zone2[64];
+
+ FormatZone(zone1, d->chain1,
+ d->start1, d->startinsert1,
+ d->stop1, d->stopinsert1);
+
+ FormatZone(zone2, d->chain2,
+ d->start2, d->startinsert2,
+ d->stop2, d->stopinsert2);
+
+ printf(" %-16s with %-16s %s\n",
+ zone1, zone2,
+ ((d->mode == ZONE_MODE_RESNUM)
+ ?"(Residue numbering)"
+ :"(Sequential numbering)"));
+ }
+ }
+ else
+ {
+ printf(" No zones deleted\n");
+ }
+ }
+
+ /* Reset Global Zonelist */
+ gZoneList[strucnum] = zonelist->next;
+ zonelist->next = NULL;
+
+ /* Free Memory */
+ FREELIST(zonelist ,ZONE);
+ FREELIST(deletelist,ZONE);
+
+ return(0);
+}
+
+
+/************************************************************************/
+/*>int CopyPDBListToRef(int strucnum)
+ ----------------------------------
+ Copies gMobPDB[strucnum] to gRefPDB. Called by SetMobileToReference().
+
+ 20.10.08 Original By: CTP
+ 05.11.08 Fixed bug - Function scanned through whole PDBList when
+ allocating memory for next item in new reference list.
+*/
+int CopyPDBListToRef(int strucnum)
+{
+ PDB *mp = NULL;
+ PDB *rp = NULL;
+ int natoms = 0;
+
+ for(mp = gMobPDB[strucnum]; mp != NULL; NEXT(mp), natoms++)
+ {
+ if(!gRefPDB)
+ {
+ INIT(gRefPDB,PDB);
+ rp = gRefPDB;
+ }
+ else
+ {
+ ALLOCNEXT(rp,PDB);
+ }
+
+ if(rp)
+ {
+ rp->atnum = mp->atnum;
+ rp->resnum = mp->resnum;
+ rp->x = mp->x;
+ rp->y = mp->y;
+ rp->z = mp->z;
+ rp->occ = mp->occ;
+ rp->bval = mp->bval;
+ rp->altpos = mp->altpos;
+
+ strcpy(rp->record_type, mp->record_type);
+ strcpy(rp->atnam, mp->atnam);
+ strcpy(rp->atnam_raw, mp->atnam_raw);
+ strcpy(rp->resnam, mp->resnam);
+ strcpy(rp->chain, mp->chain);
+ strcpy(rp->insert, mp->insert);
+
+ rp->next = NULL;
+ }
+ else
+ {
+ printf(" Error==> No memory for operation!\n");
+ return(0);
+ }
+ }
+
+ return(natoms);
+}
+
+
+/************************************************************************/
+/*>int SetMobileToReference(int strucnum)
+ --------------------------------------
+ This function sets a mobile structure to the reference structure. This
+ function is called by the SETREF command and AllVsAllRMS().
+
+ 20.10.08 Original By: CTP
+*/
+int SetMobileToReference(int strucnum)
+{
+ int natoms = 0;
+ int i = 0;
+
+ /* Copy Filename */
+ strcpy(gRefFilename,gMobFilename[strucnum]);
+
+ /* Free current reference WHOLEPDB */
+ if(gRefWPDB)
+ {
+ STRINGLIST *s = NULL;
+
+ if(gRefWPDB->header) FreeStringList(gRefWPDB->header);
+ if(gRefWPDB->trailer) FreeStringList(gRefWPDB->trailer);
+ gRefWPDB->header = NULL;
+ gRefWPDB->trailer = NULL;
+
+ for(s=gMobWPDB[strucnum]->header; s!=NULL; NEXT(s))
+ {
+ gRefWPDB->header = StoreString(gRefWPDB->header, s->string);
+ }
+
+ for(s=gMobWPDB[strucnum]->trailer; s!=NULL; NEXT(s))
+ {
+ gRefWPDB->trailer = StoreString(gRefWPDB->trailer,
+ s->string);
+ }
+
+ gRefWPDB->natoms = 0;
+ }
+
+ /* Free current reference PDB */
+ if(gRefPDB)
+ {
+ FREELIST(gRefPDB,PDB);
+ gRefPDB = NULL;
+ }
+
+ /* Copy Mobile PDB List to Reference */
+ natoms = CopyPDBListToRef(strucnum);
+
+ if(gRefWPDB)
+ {
+ gRefWPDB->natoms = natoms;
+ }
+
+ /* Allocate coordinate array */
+ if(gRefCoor) free(gRefCoor);
+ if((gRefCoor = (COOR *)malloc(natoms * sizeof(COOR))) == NULL)
+ printf(" Error==> Unable to allocate reference coordinate \
+memory!\n");
+
+ /* Convert sequence */
+ if(gRefSeq != NULL) free(gRefSeq);
+ if((gRefSeq = PDB2Seq(gRefPDB))==NULL)
+ printf(" Error==> Unable to read sequence for reference \
+structure!\n");
+
+ /* Renumber old ref zones with new mobile zones. */
+ for(i=0;i<gMultiCount && gUserFitZone;i++)
+ {
+ ZONE *za, *zb;
+ za=gZoneList[i];
+ zb=gZoneList[strucnum];
+
+ for(;za != NULL && zb != NULL; NEXT(za), NEXT(zb))
+ {
+ za->chain1 = zb->chain2;
+ za->start1 = zb->start2;
+ za->startinsert1 = zb->startinsert2;
+ za->stop1 = zb->stop2;
+ za->stopinsert1 = zb->stopinsert2;
+ }
+
+ za=gRZoneList[i];
+ zb=gRZoneList[strucnum];
+
+ for(;za != NULL && zb != NULL; NEXT(za), NEXT(zb))
+ {
+ za->chain1 = zb->chain2;
+ za->start1 = zb->start2;
+ za->startinsert1 = zb->startinsert2;
+ za->stop1 = zb->stop2;
+ za->stopinsert1 = zb->stopinsert2;
+ }
+
+ za=gCZoneList[i];
+ zb=gRZoneList[strucnum];
+
+ for(;za != NULL && zb != NULL; NEXT(za), NEXT(zb))
+ {
+ za->chain1 = zb->chain2;
+ za->start1 = zb->start2;
+ za->startinsert1 = zb->startinsert2;
+ za->stop1 = zb->stop2;
+ za->stopinsert1 = zb->stopinsert2;
+ }
+ }
+
+ return(0);
+}
+
+
+/************************************************************************/
+/*>int fit_order_cmp(const void *ptr_scoreA, const void *ptr_scoreB)
+ -----------------------------------------------------------------
+ Comparison function for quicksorting a 2D array in ascending order.
+ The first element of the array is the structure number and the second
+ element is a score (usually RMSD). The array elements are first
+ compared by the score and (if necessary) the structure number. The
+ comparison returns either -1 or 1, giving a deterministic sort order.
+
+ 20.10.08 Original By: CTP
+*/
+int fit_order_cmp(const void *ptr_scoreA, const void *ptr_scoreB)
+{
+ const REAL *pa = *(const REAL (*)[2])ptr_scoreA;
+ const REAL *pb = *(const REAL (*)[2])ptr_scoreB;
+
+ if(pa[1] != pb[1])
+ {
+ /* Sort by score (ascending) */
+ return((pa[1] < pb[1])?-1:1);
+ }
+ else
+ {
+ /* Sort by structure number (ascending) */
+ return((pa[0] < pb[0])?-1:1);
+ }
+}
+
+
+/************************************************************************/
+/*>int AllVsAllRMS(char *filename, BOOL print_tab, BOOL set_ref)
+ -------------------------------------------------------------
+ Runs an all vs all comparison of mobile structures. The function
+ prints the all vs all comparison as a tab-delimited list if the
+ print_tab flag is set. The function prints to stdout by default but
+ will print to a file if a filename is supplied.
+
+ The set_ref flag makes the function select the most central mobile
+ structure as the reference structure by performing an all vs all
+ comparison and selecting the structure with the lowest overall RMSD to
+ the other structures.
+
+ 20.10.08 Original By: CTP
+ 31.10.08 Set to detect error when printing tab-delim output.
+ 07.11.08 Sets gMultiRef for automatic selection. Resets to gMultiRef
+ after all vs all.
+ 25.11.08 Added error message under AllVsAll output.
+*/
+int AllVsAllRMS(char *filename, BOOL print_tab, BOOL set_ref)
+{
+ FILE *fp = stdout;
+ int mob_i, mob_j;
+ BOOL error_found = FALSE;
+
+ REAL (*sortrms)[2] = NULL;
+ REAL **all_rms = NULL;
+
+ /* Set gMultiVsRef to TRUE */
+ BOOL multivsref = gMultiVsRef;
+ gMultiVsRef = TRUE;
+
+ /* Open Output File/Pipe */
+ if(filename)
+ {
+ if((fp=OpenOrPipe(filename))==NULL)
+ {
+ printf(" Warning==> Unable to open file: %s\n", filename);
+ fp = stdout;
+ }
+ }
+
+ /* Allocate memory */
+ sortrms = malloc(gMultiCount * sizeof *sortrms);
+ all_rms = malloc(gMultiCount * sizeof(REAL *));
+ for(mob_i = 0; mob_i < gMultiCount; mob_i++)
+ {
+ all_rms[mob_i] = malloc(gMultiCount * sizeof(REAL));
+ }
+
+ /* Feedback for user */
+ if(!gQuiet && !set_ref)
+ printf(" All vs All...\n");
+
+ /* Trim Zones */
+ TrimZones();
+
+ /* All vs All RMSD */
+ for(mob_i = 0;mob_i < gMultiCount; mob_i++)
+ {
+ for(mob_j = 0;mob_j < gMultiCount; mob_j++)
+ all_rms[mob_i][mob_j] = 0.0;
+ }
+
+ for(mob_i = 0;mob_i < gMultiCount - 1; mob_i++)
+ {
+ SetMobileToReference(mob_i);
+
+ /* all_rms[mob_i][mob_i] = 0.0; */
+
+ for(mob_j = mob_i + 1 ;mob_j < gMultiCount; mob_j++)
+ {
+ REAL rmsd = 0.0;
+ printf(" RMS %2d vs %2d ",mob_i+1,mob_j+1);
+ rmsd = FitSingleStructure(mob_j, TRUE);
+ all_rms[mob_i][mob_j] = rmsd;
+ all_rms[mob_j][mob_i] = rmsd;
+
+ /* Error Flag */
+ if(rmsd == -1.0) error_found = TRUE;
+ }
+ }
+
+ /* Print All vs All Comparison */
+ if(print_tab)
+ {
+ for(mob_i = 0; mob_i < gMultiCount; mob_i++)
+ {
+ fprintf(fp,"\t%d",mob_i+1);
+ }
+ fprintf(fp,"\n");
+
+ for(mob_i = 0; mob_i < gMultiCount; mob_i++)
+ {
+ fprintf(fp,"%d",mob_i+1);
+ for(mob_j = 0; mob_j < gMultiCount; mob_j++)
+ {
+ if(all_rms[mob_i][mob_j] != -1.0)
+ fprintf(fp,"\t%.3f",all_rms[mob_i][mob_j]);
+ else
+ fprintf(fp,"\terror");
+ }
+ fprintf(fp,"\n");
+ }
+
+ /* Print Error Message */
+ if(error_found)
+ {
+ printf("\n Error: Error found calculating RMSD");
+ printf(" in all vs all comparison\n");
+ }
+ }
+
+ /* Set Reference Structure */
+ if(set_ref)
+ {
+ /* Auto Set Reference Structure */
+ /* Find sum of RMSDs from all other structures */
+ for(mob_i = 0; mob_i < gMultiCount; mob_i++)
+ {
+ sortrms[mob_i][0] = (REAL)mob_i;
+ sortrms[mob_i][1] = 0.0;
+
+ for(mob_j = 0; mob_j < gMultiCount; mob_j++)
+ {
+ sortrms[mob_i][1] += all_rms[mob_i][mob_j];
+ }
+ }
+
+ /* Sort Array to Set Fit Order */
+ qsort(sortrms,gMultiCount,sizeof(*sortrms),fit_order_cmp);
+
+ /* Set Reference Structure */
+ SetMobileToReference((int)sortrms[0][0]);
+ gMultiRef = (int)sortrms[0][0];
+
+ if(!gQuiet)
+ {
+ printf(" Mobile structure %d used as reference.\n",
+ (int)sortrms[0][0] + 1);
+ }
+ }
+ else
+ {
+ /* Reset reference to current multistructure reference */
+ SetMobileToReference(gMultiRef);
+ }
+
+ /* Reset gMultiVsRef */
+ gMultiVsRef = multivsref;
+
+ /* Close Output File/Pipe */
+ if(fp != stdout)
+ CloseOrPipe(fp);
+
+ /* Free Memory */
+ free(sortrms);
+ for(mob_i = 0; mob_i < gMultiCount; mob_i++)
+ {
+ free(all_rms[mob_i]);
+ }
+ free(all_rms);
+
+ return(0);
+}
+
+
+/************************************************************************/
+/*>ZONE *SetOverlappingZones(ZONE *InputA, ZONE *InputB)
+ -----------------------------------------------------
+ This function is called by TrimZones() and returns a zonelist with the
+ sections of sequence in the reference structure that are common to both
+ the zonelists InputA and InputB.
+
+ 20.10.08 Original By: CTP
+ 04.02.09 Altered if statement from form 'x == y == z' to form
+ 'x == z && y == z' as first form generates warning on Mac.
+ */
+ZONE *SetOverlappingZones(ZONE *InputA, ZONE *InputB)
+{
+ ZONE *OutputList;
+ ZONE *za, *zb, *zo;
+
+ OutputList = za = zb = zo = NULL;
+
+
+ for(za = InputA; za != NULL; NEXT(za))
+ {
+ for(zb = InputB; zb != NULL; NEXT(zb))
+ {
+ /* Test for overlap between zone za and zone zb */
+ if(((za->start1 <= zb->start1 && za->stop1 >= zb->start1) ||
+ (zb->start1 <= za->start1 && zb->stop1 >= za->start1)) &&
+ (za->mode == ZONE_MODE_SEQUENTIAL &&
+ zb->mode == ZONE_MODE_SEQUENTIAL))
+ {
+ /* Add to Output List */
+ if(!OutputList)
+ {
+ /* Init zonelist */
+ INIT(OutputList,ZONE);
+ zo = OutputList;
+ }
+ else
+ {
+ /* Append zone */
+ zo = OutputList;
+ LAST(zo);
+ ALLOCNEXT(zo,ZONE);
+ }
+
+ /* Set Start/Stop */
+ zo->start1 = (za->start1 <= zb->start1) ?
+ zb->start1 : za->start1;
+ zo->stop1 = (za->stop1 <= zb->stop1 ) ?
+ za->stop1 : zb->stop1;
+
+ /* Set Everything Else */
+ zo->startinsert1 = zo->stopinsert1 = ' ';
+ zo->startinsert2 = zo->stopinsert2 = ' ';
+ zo->chain1 = zo->chain2 = ' ';
+ zo->start2 = zo->stop2 = 0;
+ zo->mode = ZONE_MODE_SEQUENTIAL;
+ }
+ }
+ }
+
+ return(OutputList);
+}
+
+
+/************************************************************************/
+/*>ZONE *RenumberZone(ZONE *InputZone, ZONE *OverlapZone)
+ ------------------------------------------------------
+ This function is called by TrimZones() and returns a zonelist with the
+ fitting zones for a mobile structure trimmed/renumbered based on the
+ sections of fitting zones common to all mobile structures. Each zone in
+ the InputZone list is compared to the OverlapZone list and a renumbered
+ zonelist is returned.
+
+ 20.10.08 Original By: CTP
+ 04.02.09 Altered if statement from form 'x == y == z' to form
+ 'x == z && y == z' as first form generates warning on Mac.
+*/
+ZONE *RenumberZone(ZONE *InputZone, ZONE *OverlapZone)
+{
+ ZONE *NewZoneList = NULL;
+ ZONE *zi, *zo, *zn;
+
+ zi = zo = zn = NULL;
+
+ for(zi = InputZone; zi != NULL; NEXT(zi))
+ {
+ for(zo = OverlapZone; zo != NULL; NEXT(zo))
+ {
+ if(((zi->start1 <= zo->start1 && zi->stop1 >= zo->start1) ||
+ (zo->start1 <= zi->start1 && zo->stop1 >= zi->start1)) &&
+ (zi->mode == ZONE_MODE_SEQUENTIAL &&
+ zo->mode == ZONE_MODE_SEQUENTIAL))
+ {
+ /* If overlap then add to New Zone List */
+ if(!NewZoneList)
+ {
+ /* Init zonelist */
+ INIT(NewZoneList,ZONE);
+ zn = NewZoneList;
+ }
+ else
+ {
+ /* Append zone */
+ zn = NewZoneList;
+ LAST(zn);
+ ALLOCNEXT(zn,ZONE);
+ }
+
+ /* Set reference residues */
+ zn->start1 = zo->start1;
+ zn->stop1 = zo->stop1;
+
+ /* Renumber mobile residues */
+ zn->start2 = zi->start2 + (zo->start1 - zi->start1);
+ zn->stop2 = zi->stop2 - (zi->stop1 - zo->stop1 );
+
+ /* Set other values */
+ zn->startinsert1 = zn->stopinsert1 = ' ';
+ zn->startinsert2 = zn->stopinsert2 = ' ';
+ zn->chain1 = zn->chain2 = ' ';
+ zn->mode = ZONE_MODE_SEQUENTIAL;
+ }
+ }
+ }
+
+ return(NewZoneList);
+}
+
+/************************************************************************/
+/*>int TrimZones(void)
+ -------------------
+ This function organizes the fitting zones so that the zones are
+ identical for all mobile structures. For zones derived by pairwise
+ alignment this meant "trimming" the ends of the alignment and ensuring
+ that gaps from one sequence are included across all sequences. This
+ allows meaningful like vs like comparisons when comparing multiple
+ structures and allows setting of mobile structures to the reference
+ structure without having to redefine zones.
+
+ 20.10.08 Original By: CTP
+ 31.10.08 Set error traps for no overlapping/user-defined zones.
+ 18.12.08 Added conversion to residue numbering then sequential
+ numbering to ensure chain breaks included.
+ */
+int TrimZones(void)
+{
+ int i;
+ ZONE *OverlapList = NULL;
+
+ /* Check for User-Defined Zones */
+ if(!gUserFitZone)
+ {
+ if(!gQuiet)
+ printf(" Error: No user-defined zones found.\n");
+ return(0);
+ }
+ else
+ {
+ if(!gQuiet)
+ printf(" Finding common zones...\n");
+ }
+
+ /* Convert to sequential numbering with breaks between chains */
+ if(ConvertAllZones(ZONE_MODE_RESNUM) ||
+ ConvertAllZones(ZONE_MODE_SEQUENTIAL))
+ {
+ printf(" Error: Could not convert zones.\n");
+ return(1);
+ }
+
+ /* Find common residues across all structures */
+ for(i=1;i<gMultiCount && gUserFitZone;i++)
+ {
+ ZONE *TempList = NULL;
+ if(OverlapList)
+ {
+ TempList = SetOverlappingZones(OverlapList, gZoneList[i]);
+ FREELIST(OverlapList,ZONE);
+ }
+ else
+ {
+ TempList = SetOverlappingZones(gZoneList[0], gZoneList[i]);
+ }
+
+ OverlapList = TempList;
+ }
+
+ /* Error Trap: No ZONES */
+ if(OverlapList == NULL)
+ {
+ if(!gQuiet)
+ printf(" Warning: No common zones found.\n");
+ return(1);
+ }
+
+ /* Renumber Fit Zones */
+ for(i=0;i<gMultiCount && gUserFitZone;i++)
+ {
+ ZONE *TempList = NULL;
+
+ TempList = RenumberZone(gZoneList[i],OverlapList);
+ FREELIST(gZoneList[i],ZONE);
+ gZoneList[i] = TempList;
+ }
+
+ /* Set Fitted Flags */
+ gFitted = FALSE;
+ gUserFitZone = TRUE;
+
+ return(0);
+}
+
+
+/************************************************************************/
+/*>int FitStructuresWrapper(void)
+ ------------------------------
+ Wrapper function for FitStructuresInOrder(). This function finds the
+ RMSD for each mobile structure and sorts the fitting order by RMSD.
+ Although the structures are ordered by RMSD, the function could be
+ expanded to use any other score (for example, fitting by alignment
+ score).
+
+ 20.10.08 Original By: CTP
+ 03.11.08 Turned-off iterative updates when setting fit order.
+*/
+int FitStructuresWrapper(void)
+{
+ int mob_i;
+ REAL (*sortrms)[2] = NULL;
+ BOOL iterate = FALSE;
+
+ /* Allocate memory */
+ sortrms = malloc(gMultiCount * sizeof *sortrms);
+
+ /* Find Sort Order */
+ if(!gQuiet)
+ printf(" Setting Fit Order...\n");
+
+ iterate = gIterate;
+ gIterate = FALSE;
+
+ for(mob_i = 0;mob_i < gMultiCount; mob_i++)
+ {
+ printf(" Mobile %2d ",mob_i + 1);
+ sortrms[mob_i][0] = (REAL)mob_i;
+ sortrms[mob_i][1] = FitSingleStructure(mob_i, TRUE);
+ }
+
+ gIterate = iterate;
+
+ /* Sort Array to Set Fit Order */
+ qsort(sortrms,gMultiCount,sizeof(*sortrms),fit_order_cmp);
+
+ /* Fit Structures in order */
+ FitStructuresInOrder(sortrms);
+
+ /* Free Memory */
+ free(sortrms);
+
+ return(0);
+}
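Note on the fit-order sort in src/main.c above: AllVsAllRMS() and
FitStructuresWrapper() both fill a REAL (*)[2] array with
(structure number, score) rows and hand it to qsort() with
fit_order_cmp(). The following is only a minimal standalone sketch of that
pattern, not part of the patch. It assumes REAL is a double, and the names
cmp_score_then_index() and best_structure() are hypothetical. Unlike
fit_order_cmp(), the comparator here returns 0 for two identical rows.

```c
#include <assert.h>
#include <stdlib.h>

typedef double REAL;   /* assumption: ProFit's REAL is a floating-point type */

/* Order rows by score (element 1), then by structure number (element 0).
   Hypothetical stand-in for fit_order_cmp(); returns 0 for identical rows. */
int cmp_score_then_index(const void *a, const void *b)
{
   const REAL *pa = *(const REAL (*)[2])a;
   const REAL *pb = *(const REAL (*)[2])b;

   if(pa[1] != pb[1])
      return((pa[1] < pb[1]) ? -1 : 1);
   if(pa[0] != pb[0])
      return((pa[0] < pb[0]) ? -1 : 1);
   return(0);
}

/* Hypothetical helper: index of the lowest-scoring structure, or -1 if
   n is not positive or memory cannot be allocated. */
int best_structure(const REAL *scores, int n)
{
   REAL (*rows)[2];
   int  i, best;

   assert(scores != NULL);
   if(n <= 0)
      return(-1);
   if((rows = malloc((size_t)n * sizeof *rows)) == NULL)
      return(-1);

   /* Build (structure number, score) rows as the patch does */
   for(i = 0; i < n; i++)
   {
      rows[i][0] = (REAL)i;
      rows[i][1] = scores[i];
   }

   qsort(rows, (size_t)n, sizeof *rows, cmp_score_then_index);

   best = (int)rows[0][0];
   free(rows);
   return(best);
}
```

Breaking ties on the structure number means equal scores always come out in
the same order, whichever algorithm the C library uses for qsort().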
diff --git a/src/main.p b/src/main.p
new file mode 100644
index 0000000..4f59615
--- /dev/null
+++ b/src/main.p
@@ -0,0 +1,123 @@
+void logo(void)
+;
+int main(int argc, char **argv)
+;
+void SetRMSAtoms(char *command)
+;
+void SetRMSZone(char *command)
+;
+void Cleanup(void)
+;
+void Die(char *message)
+;
+BOOL ReadStructure(int structure,
+ char *filename,
+ int strucnum,
+ int xmasFormat)
+;
+int InitParser(void)
+;
+int DoCommandLoop(FILE *script)
+;
+void SetFitAtoms(char *command)
+;
+void SetFitZone(char *command, int strucnum)
+;
+void DelFitZone(char *command, int strucnum)
+;
+void DelRMSZone(char *command, int strucnum)
+;
+void SetZoneStatus(char *status)
+;
+int GetResSpec(char *resspec, int *resnum, char *chain, char *insert)
+;
+int ParseZone(char *zonespec,
+ int *start1,
+ int *stop1,
+ char *chain1,
+ char *startinsert1,
+ char *stopinsert1,
+ int *start2,
+ int *stop2,
+ char *chain2,
+ char *startinsert2,
+ char *stopinsert2,
+ int strucnum)
+;
+int FindSeq(char *zonespec,
+ char *sequence,
+ int *start,
+ int *stop,
+ char *chain)
+;
+void ShowMatrix(void)
+;
+void ShowStatus(char *filename)
+;
+void stdprompt(char *string)
+;
+void FormatZone(char *zone, char chain, int start, char startinsert,
+ int stop, char stopinsert)
+;
+void ReadMulti(char *filename, BOOL xmasFormat)
+;
+void WriteCoordinates(char *filename, int strucnum)
+;
+void WriteMulti(char *ext)
+;
+char *FindDash(char *buffer)
+;
+PDB *ReadXMAS(FILE *fp, int *natoms)
+;
+PDB *ReadXMASAtoms(FILE *fp, int *natoms)
+;
+PDB *doReadXMAS(FILE *fp, int *natoms, int readhet)
+;
+void SetCentreResidue(char *command)
+;
+int RunScript(char *command)
+;
+int ConvertResidueToSequential(ZONE *input_zone, int strucnum)
+;
+int CheckOverlap(ZONE *inputtest, ZONE *inputlist, int strucnum)
+;
+void SetZoneFromBValCol(void)
+;
+int ConvertSequentialToResidue(ZONE *input_zone, int strucnum)
+;
+ZONE *SortZoneList(ZONE *zonelist)
+;
+int ConvertZoneList(ZONE *zonelist, int strucnum, int mode)
+;
+int ConvertAllZones(int mode)
+;
+int SortAllZones(void)
+;
+ZONE *ChainList(PDB *pdblist)
+;
+BOOL SequentialZones(int strucnum)
+;
+BOOL SequentialZonesWholeSeq(int strucnum)
+;
+BOOL OneToOneChains(int strucnum)
+;
+int EnforceOneToOneChains(int strucnum)
+;
+BOOL *CheckListOverlap(ZONE *zonelist)
+;
+int CopyPDBListToRef(int strucnum)
+;
+int SetMobileToReference(int strucnum)
+;
+int fit_order_cmp(const void *ptr_scoreA, const void *ptr_scoreB)
+;
+int AllVsAllRMS(char *filename, BOOL print_tab, BOOL set_ref)
+;
+ZONE *SetOverlappingZones(ZONE *InputA, ZONE *InputB)
+;
+ZONE *RenumberZone(ZONE *InputZone, ZONE *OverlapZone)
+;
+int TrimZones(void)
+;
+int FitStructuresWrapper(void)
+;
diff --git a/src/protos.h b/src/protos.h
new file mode 100644
index 0000000..6763909
--- /dev/null
+++ b/src/protos.h
@@ -0,0 +1,87 @@
+/*************************************************************************
+
+ Program: ProFit
+ File: Protos.h
+
+ Version: V3.1
+ Date: 31.03.09
+ Function: Include prototype files
+
+ Copyright: SciTech Software 1992-2009
+ Author: Dr. Andrew C. R. Martin
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain.
+
+ It may not be copied or made available to third parties, but may be
+ freely used by non-profit-making organisations who have obtained it
+ directly from the author or by FTP.
+
+ You are requested to send EMail to the author to say that you are
+ using this code so that you may be informed of future updates.
+
+ The code may not be made available on other FTP sites without express
+ permission from the author.
+
+ The code may be modified as required, but any modifications must be
+ documented so that the person responsible can be identified. If
+ someone else breaks this code, the author doesn't want to be blamed
+ for code that does not work! You may not distribute any
+ modifications, but are encouraged to send them to the author so
+ that they may be incorporated into future versions of the code.
+
+ Such modifications become the property of Dr. Andrew C.R. Martin and
+ SciTech Software though their origin will be acknowledged.
+
+ The code may not be sold commercially or used for commercial purposes
+ without prior permission from the author.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V0.1 25.09.92 Original
+ V0.5 08.10.93 Various tidying for Unix & changed for booklib
+ V0.6 05.01.94 Changed to .p prototype filenames
+ V0.7 24.11.94 Skipped
+ V0.8 17.07.95 Skipped
+ V1.0 18.07.95 Removed io.p
+ First official release (at last!).
+ V1.1 20.07.95 Skipped
+ V1.2 22.07.95 Skipped
+ V1.3 31.07.95 Skipped
+ V1.4 14.08.95 Skipped
+ V1.5 21.08.95 Skipped
+ V1.6 20.11.95 Skipped
+ V1.6a 21.11.95 Changed case for nwalpign.p so makefile works correctly
+ V1.7 23.07.96 Skipped
+ V1.8 07.05.98 Skipped for release
+ V2.0 01.03.01 Skipped for release
+ V2.1 28.03.01 Skipped for release
+ V2.2 20.12.01 Skipped for release
+ V2.3 01.12.04 Skipped for release
+ V2.4 03.06.05 Skipped for release
+ V2.5 07.06.05 Skipped for release
+ V3.0 06.11.08 Skipped for release
+ V3.1 31.03.09 Skipped for release
+
+*************************************************************************/
+
+#ifndef DOPROTOS
+#include "main.p"
+#include "todo.p"
+#include "fitting.p"
+#include "NWAlign.p"
+#endif
diff --git a/src/todo.c b/src/todo.c
new file mode 100644
index 0000000..fe1a02a
--- /dev/null
+++ b/src/todo.c
@@ -0,0 +1,95 @@
+/*************************************************************************
+
+ Program: ProFit
+ File: todo.c
+
+ Version: V3.1
+ Date: 31.03.09
+ Function: Protein Fitting program.
+
+ Copyright: SciTech Software / UCL 1992-2009
+ Author: Dr. Andrew C. R. Martin
+ EMail: andrew at bioinf.org.uk
+
+**************************************************************************
+
+ This program is not in the public domain.
+
+ It may not be copied or made available to third parties, but may be
+ freely used by non-profit-making organisations who have obtained it
+ directly from the author or by FTP.
+
+ You are requested to send EMail to the author to say that you are
+ using this code so that you may be informed of future updates.
+
+ The code may not be made available on other FTP sites without express
+ permission from the author.
+
+ The code may be modified as required, but any modifications must be
+ documented so that the person responsible can be identified. If
+ someone else breaks this code, the author doesn't want to be blamed
+ for code that does not work! You may not distribute any
+ modifications, but are encouraged to send them to the author so
+ that they may be incorporated into future versions of the code.
+
+ Such modifications become the property of Dr. Andrew C.R. Martin and
+ SciTech Software though their origin will be acknowledged.
+
+ The code may not be sold commercially or used for commercial purposes
+ without prior permission from the author.
+
+**************************************************************************
+
+ Description:
+ ============
+
+**************************************************************************
+
+ Usage:
+ ======
+
+**************************************************************************
+
+ Revision History:
+ =================
+ V0.1 25.09.92 Original
+ V0.5 08.10.93 Various tidying for Unix & changed for booklib
+ V0.6 05.01.94 Skipped
+ V0.7 24.11.94 Skipped
+ V0.8 17.07.95 Skipped
+ V1.0 18.07.95 Initial release version (at last!)
+ V1.1 20.07.95 Skipped
+ V1.2 22.07.95 Skipped
+ V1.3 31.07.95 Skipped
+ V1.4 14.08.95 Skipped
+ V1.5 21.08.95 Skipped
+ V1.6 20.11.95 Skipped
+ V1.7 23.07.96 Skipped
+ V1.8 07.05.98 Skipped for release
+ V2.0 01.03.01 Skipped for release
+ V2.1 28.03.01 Skipped for release
+ V2.2 20.12.01 Skipped for release
+ V2.3 01.12.04 Skipped for release
+ V2.4 03.06.05 Skipped for release
+ V2.5 07.06.05 Skipped for release
+ V2.6 16.06.08 Skipped for release
+ V3.0 06.11.08 Skipped for release
+ V3.1 31.03.09 Skipped for release
+
+*************************************************************************/
+/* Includes
+*/
+#include "ProFit.h"
+
+/************************************************************************/
+/*>GraphicAlign(void)
+ ------------------
+ 28.09.92 Framework
+*/
+void GraphicAlign(void)
+{
+ printf(" Starting graphical alignment...\n");
+ printf(" Sorry! Not implemented...\n");
+ return;
+}
+
diff --git a/src/todo.p b/src/todo.p
new file mode 100644
index 0000000..cc59562
--- /dev/null
+++ b/src/todo.p
@@ -0,0 +1,2 @@
+void GraphicAlign(void)
+;
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/profit.git