[med-svn] [Git][med-team/htslib][debian/experimental] 1058 commits: Merge version number and soname bump from master

Michael R. Crusoe gitlab at salsa.debian.org
Tue Dec 11 02:17:52 GMT 2018


Michael R. Crusoe pushed to branch debian/experimental at Debian Med / htslib


Commits:
1f1e3f63 by John Marshall at 2014-08-15T10:06:34Z
Merge version number and soname bump from master

- - - - -
1903fd40 by John Marshall at 2014-08-20T10:06:29Z
Set explicit REF_PATH to make tests self-contained

Set REF_PATH to an empty value (but not the empty string, as that
produces the default setting), or in general to point within the test
suite, for tests that exercise it -- currently just test_view.pl.

This makes the test suite self-contained for reference areas, and
in particular prevents travis build testing from trying to access
the EBI reference server.

- - - - -
b6934ff8 by James Bonfield at 2014-08-26T11:20:06Z
Force nul termination of the kstr holding the CRAM index so that
sscanf doesn't attempt to check beyond the bounds of the string
memory.

On correctly formatted indices sscanf should have no need to read that
far (though apparently it does), but the fix is needed anyway to be
robust against invalid indices.

- - - - -
6c6c75b9 by dpryan79 at 2014-08-26T12:05:46Z
Make hts_idx_push no longer return 0 after the first unmapped read. This fixes idxstats

- - - - -
3fe4cf8e by John Marshall at 2014-08-26T12:59:52Z
Ensure hand-made kstring is NUL-terminated

The kputsn()s used to load the file into memory do NUL-terminate (like
most other kstring functions), so we only need to ensure this by hand
for the hand-made kstring made from zlib_mem_inflate().

- - - - -
89e267a4 by John Marshall at 2014-08-26T16:09:25Z
Ensure items to be installed are built and up-to-date

Fixes #125.

- - - - -
546bfcb5 by Petr Danecek at 2014-08-27T12:48:08Z
vcf: Output "." not "nan" in scalar INFO fields

- - - - -
6af42b15 by Mauricio Carneiro at 2014-08-30T04:48:07Z
vcf: separate header lookup from field getter

Create two alternatives to bcf_get_info and bcf_get_fmt allowing the
user to query the header id for the tag once and then access the
fields directly using index lookups.  Modified the two existing
functions to use the index lookups so the functionality is not duplicated.

Added the following functions to the public API:
```c
bcf_fmt_t *bcf_get_fmt_id(bcf1_t *line, const int id);
bcf_info_t *bcf_get_info_id(bcf1_t *line, const int id);
```
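A minimal sketch of the lookup-once pattern described above. It uses plain C with hypothetical types (`demo_hdr`, `demo_field`, `demo_record`) rather than htslib's own structures. The tag name is resolved to a numeric id a single time, and each record is then searched by integer id only, with no string comparisons.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical header dictionary: tag names indexed by numeric id. */
typedef struct { const char **names; int n; } demo_hdr;

/* Slow path: resolve a tag name to its id; done once, not per record. */
static int demo_hdr_id(const demo_hdr *h, const char *tag) {
    assert(h != NULL && tag != NULL);
    for (int i = 0; i < h->n; i++)
        if (strcmp(h->names[i], tag) == 0) return i;
    return -1;
}

/* Hypothetical record: a list of (header id, value) pairs. */
typedef struct { int key; int value; } demo_field;
typedef struct { const demo_field *fields; int n; } demo_record;

/* Fast path: per-record lookup compares integer ids only. */
static const demo_field *demo_get_by_id(const demo_record *r, int id) {
    for (int i = 0; i < r->n; i++)
        if (r->fields[i].key == id) return &r->fields[i];
    return NULL;
}
```

In a loop over many records, `demo_hdr_id()` would be called once before the loop and `demo_get_by_id()` inside it.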

- - - - -
410cf54d by pd3 at 2014-09-01T07:18:32Z
Merge pull request #127 from broadinstitute/broad_header_lookup

vcf: separate header lookup from field getter
- - - - -
ebbad4b7 by pd3 at 2014-09-01T11:14:42Z
Merge pull request #124 from dpryan79/develop

Make hts_idx_push no longer return 0 after the first unmapped read. This...
- - - - -
830ea739 by Petr Danecek at 2014-09-03T14:23:47Z
faidx API to access sequence stats

- - - - -
20558bc9 by Petr Danecek at 2014-09-08T16:09:43Z
bcf_get_info_values: Return correct INT8 and INT16 missing value

- - - - -
cf4eab09 by Petr Danecek at 2014-09-09T17:40:11Z
Minor documentation update

- - - - -
575be149 by Joel Thibault at 2014-09-09T18:59:15Z
Correct reference lines

- - - - -
3f308835 by James Bonfield at 2014-09-12T13:34:07Z
Removed the .gz, .Z, .sz etc. suffix searching when looking for reference files.

These are a hang-over from the Staden Package io_lib
days, but did not function correctly in htslib anyway, as we had
already removed the decompression code (and they were inappropriate for
http requests besides).

- - - - -
d25c678a by James Bonfield at 2014-09-12T13:45:08Z
Remove the | from the default REF_PATH.

Although this is still syntactically legal as far as the code is
concerned, it has no effect and so may only confuse the reader.

- - - - -
061cd10a by James Bonfield at 2014-09-12T14:29:59Z
Minor speed increases to cram_byte_array_stop_decode_init and
GET_BIT_MSB macros.

- - - - -
6454bd4e by James Bonfield at 2014-09-15T09:00:46Z
Added a range coder (order 0 and 1) plus support for LZMA if compiled
in (no option for this at present).

Internally the choice of compression method is now a bit field
listing the various methods suitable.  To aid this, ZLIB_RLE is its
own method internally (although it is just ZLIB in the file format, as
it uses the same ZLIB byte stream format), allowing better auto-tuning
between full ZLIB and RLE-only variants.

This means we can specify as many encoding methods as we support and
get the code to auto-tune to the best methods available.
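A rough sketch of how a bit field of candidate methods can be represented and walked. The `M_*` names are hypothetical and are not htslib's actual enum. The sketch also shows the internal-only RLE variant mapping back to plain gzip when the method is recorded in the file.

```c
#include <assert.h>

/* Hypothetical internal method bits (not htslib's actual enum). */
enum {
    M_GZIP     = 1 << 0,
    M_GZIP_RLE = 1 << 1,  /* written to the file as plain gzip */
    M_BZIP2    = 1 << 2,
    M_LZMA     = 1 << 3,
    M_RANGE0   = 1 << 4,  /* order-0 range coder */
    M_RANGE1   = 1 << 5,  /* order-1 range coder */
};

/* The file format only knows gzip, so the RLE variant maps back to it. */
static unsigned file_method(unsigned m) {
    return m == M_GZIP_RLE ? (unsigned)M_GZIP : m;
}

/* Number of candidate methods still enabled in a mask. */
static int n_candidates(unsigned mask) {
    int n = 0;
    for (; mask; mask &= mask - 1)  /* clears the lowest set bit */
        n++;
    return n;
}

/* Lowest-numbered enabled method, e.g. the first one to try. */
static unsigned first_method(unsigned mask) {
    assert(mask != 0);
    return mask & (~mask + 1u);
}
```

Trying every candidate then becomes a loop that takes `first_method()`, tries it, and clears that bit until the mask is empty.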

- - - - -
94cf60ac by James Bonfield at 2014-09-15T09:12:21Z
Tweaks for specifying version number. Now accepting version 3.0, and
also correctly setting the globals.

FIXME: shouldn't be using globals!

- - - - -
3d3f4001 by James Bonfield at 2014-09-15T10:28:01Z
Updated the version checks to lop off major/minor components rather
than using an exact == comparison.

Also fixed the EOF block for version 3.0.
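A small sketch of component-wise version checks. It assumes, for illustration only, that a version is packed as `(major << 8) | minor`, and the macro names are hypothetical. Testing the major component, rather than using `v == VERS(3, 0)`, keeps later 3.x revisions on the v3 code paths.

```c
#include <assert.h>

/* Hypothetical packing of a version number as (major << 8) | minor. */
#define VERS(maj, min)  (((maj) << 8) | (min))
#define MAJOR_VERS(v)   ((v) >> 8)
#define MINOR_VERS(v)   ((v) & 0xff)

/* Gate features on the major component instead of an exact match. */
static int uses_v3_features(int v) {
    assert(v >= 0);
    return MAJOR_VERS(v) >= 3;
}
```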

- - - - -
37f42cd4 by James Bonfield at 2014-09-15T10:30:00Z
Fixed BETA codec so that it honours beta offset value for zero length
codes.
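A minimal sketch of BETA decoding with an MSB-first bit reader. The `bit_reader` type is hypothetical. The decoded value is the `nbits`-bit field minus the codec's offset. The point of the fix is that the offset still applies when `nbits` is 0: every value then decodes to `-offset` rather than 0, and no bits are consumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader over a byte buffer. */
typedef struct {
    const uint8_t *data;
    size_t bit_pos;  /* next bit, counted from the MSB of data[0] */
} bit_reader;

static uint32_t read_bits_msb(bit_reader *br, int nbits) {
    uint32_t v = 0;
    for (int i = 0; i < nbits; i++) {
        size_t byte = br->bit_pos >> 3;
        int shift = 7 - (int)(br->bit_pos & 7);
        v = (v << 1) | ((br->data[byte] >> shift) & 1u);
        br->bit_pos++;
    }
    return v;
}

/* BETA: read nbits, then subtract the offset, even when nbits == 0. */
static int32_t beta_decode(bit_reader *br, int nbits, int32_t offset) {
    assert(nbits >= 0 && nbits <= 32);
    return (int32_t)read_bits_msb(br, nbits) - offset;
}
```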

- - - - -
1de851bb by James Bonfield at 2014-09-15T10:52:32Z
Bug fix to the external decoders. If there is an attempt to decode 0
bytes then it no longer matters if the block does not exist. (This
comes about when faced with silly CIGAR strings like "0S".)

Bug fix to cope with block searching during encoding where
slice->block_by_id is not defined.  Note this doesn't happen in normal
behaviour, but the change is for code validity.

Plus a couple of minor spacing and commenting changes.

- - - - -
0d394594 by James Bonfield at 2014-09-15T11:05:10Z
Added support for more codecs, as part of refactoring how data is
pushed.  These were in the V2.1 spec, but not used.

- EXTERNAL now accepts type char, instead of only int.
- BYTE_ARRAY_LEN is now implemented fully for encoding allowing two
  distinct sub-codecs.
- BYTE_ARRAY_STOP is now fully implemented.

Both the _LEN and _STOP codecs can now be used as proper obj->encode()
style codecs, rather than assuming the calling code is manually
filling out the data in the appropriate format. (Although this is
still done by cram_encode.c.)
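A sketch of the "codec object with its own encode method" style, using BYTE_ARRAY_STOP as the example. Each value is emitted followed by a stop byte, and the decoder reads until it meets that byte. The `codec` struct, its fixed toy output buffer, and the function name are hypothetical and are not htslib's cram_codec.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical codec object: each codec carries its own encode method. */
typedef struct codec codec;
struct codec {
    int (*encode)(codec *c, const char *in, size_t len);
    unsigned char stop;  /* BYTE_ARRAY_STOP terminator byte */
    char out[256];       /* toy output buffer */
    size_t out_len;
};

/* Emit the bytes followed by the stop byte. Values containing the stop
 * byte cannot be represented, so they are rejected. */
static int byte_array_stop_encode(codec *c, const char *in, size_t len) {
    assert(c != NULL && in != NULL);
    if (memchr(in, c->stop, len) || c->out_len + len + 1 > sizeof c->out)
        return -1;
    memcpy(c->out + c->out_len, in, len);
    c->out_len += len;
    c->out[c->out_len++] = (char)c->stop;
    return 0;
}
```

Callers then write `c->encode(c, data, len)` without knowing which codec sits behind the pointer.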

- - - - -
672f0798 by James Bonfield at 2014-09-16T11:35:37Z
Major refactoring of the way CRAM handles external blocks.
We now prefer to output as many things as possible to their own
specific external block instead of utilising the CORE block more
often.  This has the impact that it is much easier to do a partial
decode.

- The C structs are now more array based, preferring ->block[DS_XX]
  instead of ->XX_blk and similar.

- Many more block types will be external by default now. It
  auto-selects CORE vs external based on size.

- Selection of compression algorithm for external blocks is more
  advanced.  Every external block has a metrics array element. This
  now tracks more than 2 types of compression and it culls candidate
  methods if they are repeatedly unpromising.

  The upshot is the hard-coded selection of which method to use for
  which block is (mostly) removed in favour of auto detection.

- CRAM now has the option to ignore certain types of fields when
  decoding.  Combined with more external blocks this permits faster
  decoding for tools like scram_flagstat, potentially 2-3x faster.

- Auxiliary tags now get split up into multiple external blocks based
  on data type. We still aggregate many together, but tag strings,
  integers, sequences and quality strings now get their own blocks.

Also sped up the process_one_read() function when converting from BAM
packed sequence to base-calls.
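The per-block metrics bullet above can be illustrated with a deliberately simplified culling rule. Here a method is dropped after losing `CULL_AFTER` consecutive trials. The threshold, struct, and rule are all assumptions for illustration and are not the heuristic htslib actually uses.

```c
#include <assert.h>

#define N_METHODS  4
#define CULL_AFTER 3  /* hypothetical: consecutive losses before culling */

/* Hypothetical per-block metrics. */
typedef struct {
    unsigned enabled;         /* bit i set => method i still tried */
    int losses[N_METHODS];    /* consecutive trials lost per method */
} block_metrics;

/* Record one trial won by `winner`; cull methods that keep losing. */
static void metrics_update(block_metrics *m, int winner) {
    assert(winner >= 0 && winner < N_METHODS);
    for (int i = 0; i < N_METHODS; i++) {
        if (!(m->enabled & (1u << i))) continue;
        if (i == winner) { m->losses[i] = 0; continue; }
        if (++m->losses[i] >= CULL_AFTER) m->enabled &= ~(1u << i);
    }
}
```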

- - - - -
8210cf35 by Charles Plessy at 2014-09-16T11:44:40Z
htslib (1.0-2) unstable; urgency=medium

  Uploaded to unstable (see previous changelog from experimental).

 -- Charles Plessy <plessy at debian.org>  Tue, 16 Sep 2014 20:27:04 +0900

- - - - -
100f2d05 by James Bonfield at 2014-09-17T09:00:43Z
Removed a couple of memory leaks introduced by the restructuring in the
last patch.

- - - - -
b73b5274 by James Bonfield at 2014-09-17T12:41:41Z
Bug fix of use of TAG_ID macro and k vs key in hash for auxiliary
headers.

Updated the encoding of auxiliary data itself to agree with the
headers (and with the latest Scramble updates).

Bug fix to encoding the 'd' (double) aux data type.  We've never
tested this as it's not even in the SAM spec, but I added it anyway as
samtools seems to include it.

- - - - -
5a840dae by James Bonfield at 2014-09-17T12:42:22Z
Modified the thread pool to use as few threads as possible. The
intention is that when given, say, 16 threads but being I/O bound such
that 12 cores is enough to keep up with the I/O then we have 12
threads kept busy and 4 totally idle.

This is implemented by dispatching jobs on lower thread IDs in
preference and maintaining a queue of pending jobs in proportion to
the number of threads currently running, rather than instantly
dispatching.

Removed uninitialised memory behaviour (worse, freeing an
uninitialised pointer). This triggered crashes on SPARC/Solaris when
testing multi-threading, but bizarrely never fired on Intel/Linux,
not even under valgrind.

- - - - -
6929b9c7 by James Bonfield at 2014-09-17T12:56:22Z
Fixed a bug in scramble -x where it was not correctly setting the RI
data series for some tests.

- - - - -
ef7eeed6 by James Bonfield at 2014-09-17T13:36:25Z
A mishmash of changes for CRAM v3.0 (not yet the default output
format). Sorry it's munged together.

The 'b' and 'q' feature types have been implemented, allowing multiple
bases and qualities to be stored as a single feature. This greatly
speeds up scramble -x mode (referenceless) while also reducing the
file size.  Portions of this snuck in with the earlier commits; an
artifact of merging bit by bit with io_lib.

Added support for the BYTE_ARRAY_LEN encoder. It already existed in
the decoder, but for some cases it is better to use it as the encoder
instead of the usual BYTE_ARRAY_STOP.

Bug fix to the external decoders. If there is an attempt to decode 0
bytes then it no longer matters if the block does not exist. (This
comes about when faced with silly CIGAR strings like "0S".) Changed
scramble to no longer output empty blocks.

Cram_dump: this now outputs the compression types used for each
external block as g(gzip), b(bzip2), l(lzma), r(rANS0) and R(rANS1).

- - - - -
a2542855 by James Bonfield at 2014-09-17T13:42:53Z
Minor gcc warnings fixups.

Small speed increase: only attempt to compress the CORE block when it
is large enough to make it worthwhile.

- - - - -
2e298614 by James Bonfield at 2014-09-17T13:53:17Z
Added support for compressed SAM headers.

This involved removing the various #ifdefs for different types of
padding and instead coding it up with a variable that changes
depending on CRAM v2.1 or v3.0. (Cramtools doesn't allow for a second
padding block in 2.1, although technically it could fit within the
existing spec.)

Bug in the EOF block writing code. It was outputting
0xff 0xff 0xff 0xff 0xff instead of 0xff 0xff 0xff 0xff 0x0f for -1.
This caused the CRC to be invalid if you checksummed the raw data
instead of re-encoding it as Scramble does.

- - - - -
c7f8feec by James Bonfield at 2014-09-17T13:57:33Z
Final part of Staden io_lib's commit r3686.

The BD:Z and BI:Z were already split up in this git version, but I
failed to initialise the BI:Z block.

- - - - -
6d01daaa by James Bonfield at 2014-09-17T14:08:39Z
Initialise refs_t ->ref_id in refs_load_fai().  This has no bearing on
Samtools/htslib, but this function is used within Gap5 and calling it
in this order gave rise to crashes unless this initialisation code is
here, so it is a good belt-and-braces approach.

In htslib for now this function is still static and unexported, which
is appropriate until we see a good use case claiming otherwise.

- - - - -
fa6011a5 by Petr Danecek at 2014-09-17T14:48:29Z
bgzf: Write plain gzip files if is_gzip set

- - - - -
22718e5e by James Bonfield at 2014-09-17T15:38:34Z
Added CRC32 to the blocks and containers.
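For reference, a table-free bitwise CRC-32 sketch. It assumes the standard reflected polynomial 0xEDB88320 with pre- and post-inversion, which is the same checksum zlib's `crc32()` computes. A real implementation would simply call zlib or use a lookup table. The chaining convention (pass the previous result back in) also matches zlib.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320); chainable. */
static uint32_t crc32_bitwise(uint32_t crc, const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            /* shift right, xoring in the polynomial if the low bit was set */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

The usual check value holds: the CRC of the ASCII string "123456789" is 0xCBF43926.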

- - - - -
3ca319f5 by James Bonfield at 2014-09-17T15:42:46Z
Fixed the file format detection code so it handles CRAM v3.0 magic
number as a valid CRAM file.

- - - - -
10958ff8 by James Bonfield at 2014-09-17T16:01:43Z
Changed the hts_open code to support "C" as a format mode as well as
"c".  Both are CRAM, but "C" is version 3.0.  This is a temporary hack
while CRAM V3.0 is still undergoing work, but we will need a more
formal way of indicating output version numbers in the future.
Equally so the CRAM code itself for specifying versions also needs
improvements, to avoid global variables. Test_view has a -3 option to
use "C" instead of "c".

Also changed the compression formats (this is something else we cannot
yet expose to test_view) so that the rANS codec is used by default in
V3.0.

- - - - -
bf2d07c5 by James Bonfield at 2014-09-17T17:04:46Z
Fixed bam_construct_seq to cope with qual being NULL.

Moved the cram options and SAM field enums from cram_structs into
hts.h and added an hts_set_opt function to control them.

I'm not sure this is the correct location yet, but it is a start,
allowing me to test the rest of the functionality.

- - - - -
efa6537a by James Bonfield at 2014-09-18T09:03:03Z
Removed the previous "c" vs "C" hack and added a proper option parser
to test_view.  We still need to decide where this API belongs
properly, but test_view is a good point to test it.

- - - - -
1965be21 by James Bonfield at 2014-09-18T09:23:08Z
Fixed generation of MD and NM tags in cram_decode_seq() when using a
non-reference encoding. These cannot be stored or regenerated in such cases.

(Question: should we therefore explicitly store any MD and NM tags
when generating the CRAM if using non-reference encoding? Probably yes.)

- - - - -
b56f540e by James Bonfield at 2014-09-18T09:27:09Z
Added a special case for setting the reference.  When converting from
SAM to CRAM we tend to think of the reference as an input property; if
it is not in the SAM headers (M5 & UR @SQ tags) then we want to
specify it, as an input property.

However the way the code works is we parse the input header and use
that to generate the output header, in doing so setting the reference
to use for the output.  Hence technically the reference is an output
variable (as in this example in=SAM and out=CRAM).

To make things sane for the user, if we set the reference for the
input then we also now apply that option to the output file too.

- - - - -
ad3c4c56 by James Bonfield at 2014-09-18T10:42:39Z
Cope with cram_compress_slice failing. We were unwinding the stack
returning -1 each time, but then called cram_close, which attempted to
flush any remaining data (dying in the process). We now free the
current in-progress container upon an error, to avoid this flush later.

Fixed the cram_compress_block code so it doesn't end up thinking a
failure to compress with bzip2 or lzma (say, because they're not
linked in) means the zero sized output is the best compression option
to use.

- - - - -
1ef04ee8 by James Bonfield at 2014-09-18T11:04:13Z
Removed the need for -DSAMTOOLS when compiling CRAM.

The Io_lib and HTSLib sources are now divergent enough that there is
no need to try and keep common files between the two with #ifdefs.

- - - - -
b5ecdf7f by James Bonfield at 2014-09-18T13:38:02Z
(Commented out): ugly auto-configuration of HAVE_LIBBZ2 and
HAVE_LIBLZMA definitions.

These cannot yet go in place due to lacking a way of specifying
dependent libraries, when linking into samtools etc.

- - - - -
a223cae8 by James Bonfield at 2014-09-18T13:38:10Z
Added multi-threading support for reading and writing CRAM.

Note that the samtools command only calls hts_set_threads on the
output file descriptor, so only has multi-threading writing enabled.

- - - - -
4835f533 by James Bonfield at 2014-09-18T14:19:31Z
CRAM_OPT_VERSION now works (and only works) on an open file
descriptor, freshly opened before the call to sam_hdr_write.

This avoids the need for global variables and a requirement to process
the VERSION option at a different time to all other CRAM options.

- - - - -
f5aeebbb by Petr Danecek at 2014-09-19T07:24:21Z
bgzf_open(..,"g") for compressed gzip output, as opposed to BGZF

- - - - -
ac13591d by James Bonfield at 2014-09-19T13:56:19Z
Added SAM_RGAUX as another column identifier to allow for tools that
need to use RG tag without other tags to perform optimally.

Sped up the calculation of MD/NM approximately fourfold by replacing
sprintf with custom number construction (append_uint).
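A sketch of the kind of hand-rolled number construction described here. The name `append_uint32` is hypothetical and only echoes the commit's append_uint; the body is our own illustration, not htslib's code. It writes decimal digits straight into the output and returns the new end pointer, so building an MD-style string avoids sprintf's format parsing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Append the decimal form of v at cp; returns a pointer just past the
 * last digit. No NUL terminator is written. */
static char *append_uint32(char *cp, uint32_t v) {
    char tmp[10];  /* 2^32-1 has 10 digits */
    int n = 0;
    assert(cp != NULL);
    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v);
    while (n) *cp++ = tmp[--n];  /* digits were produced in reverse */
    return cp;
}
```

For example, appending 10, the character 'A', and then 5 builds the MD-like string "10A5".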

- - - - -
2f10ff9e by James Bonfield at 2014-09-19T16:39:38Z
Removed defunct comment.

- - - - -
0cdb7b1a by James Bonfield at 2014-09-22T14:19:08Z
Minor tidyups to prevent some clang warnings.

- - - - -
dcefe5b7 by John Marshall at 2014-09-23T14:36:49Z
Don't install libhts.so.1 as a man page

- - - - -
5a0ee032 by John Marshall at 2014-09-23T14:39:41Z
Release 1.1: various minor bug fixes

- - - - -
f2080d5c by John Marshall at 2014-09-23T15:49:16Z
Merge version number bump from master

- - - - -
1357fbd6 by Charles Plessy at 2014-09-24T12:10:32Z
Merge tag '1.1' into debian/unstable

HTSlib release 1.1, various bug fixes

- - - - -
be365db8 by Charles Plessy at 2014-09-24T12:13:44Z
New upstream release, no new copyright nor license notice.

- - - - -
87549cd9 by Charles Plessy at 2014-09-24T12:17:01Z
Updated symbols file.

- - - - -
873ea9b6 by Charles Plessy at 2014-09-24T12:21:03Z
Removed orphan paragraph.

- - - - -
49de2b98 by Charles Plessy at 2014-09-24T12:21:24Z
Allow parallel build.

- - - - -
7fe3bb5d by Charles Plessy at 2014-09-24T12:23:07Z
Normalise source package control file (VCS-Browser, Pre-Depends).

- - - - -
0e5e9e62 by Charles Plessy at 2014-09-24T12:23:51Z
Conforms to Policy 3.9.6.

- - - - -
15eac2bd by Charles Plessy at 2014-09-24T12:29:59Z
htslib (1.1-1) unstable; urgency=medium

  1357fbd Merge tag '1.1' into debian/unstable
  87549cd Updated symbols file.  One symbol is missing.
  873ea9b Removed orphan paragraph in machine-readable copyright file.
  49de2b9 Allow parallel build.
  7fe3bb5 Normalise control file (VCS-Browser, Pre-Depends).
  0e5e9e6 Conforms to Policy 3.9.6.

 -- Charles Plessy <plessy at debian.org>  Wed, 24 Sep 2014 21:24:11 +0900

- - - - -
5b98adcd by Petr Danecek at 2014-09-25T09:18:44Z
Added regidx API which should replace bcf_sr_regions at some point

- - - - -
f4b4b617 by Petr Danecek at 2014-10-01T07:55:06Z
Clean after regidx init failure; fixed build dependencies

- - - - -
d27ae83d by Petr Danecek at 2014-10-01T11:57:10Z
Fix in bcf regions which in some situations would skip the first
position on a new chromosome in the loci list.

- - - - -
d2cb7bac by Petr Danecek at 2014-10-01T12:15:42Z
regidx: simple file format autodetection

- - - - -
eecc982e by James Bonfield at 2014-10-01T17:09:40Z
Fixes for handling range requests while also multi-threading.

- - - - -
d6cdff67 by James Bonfield at 2014-10-02T08:52:35Z
Fix for FSECONDARY reads; do not link into PNEXT/RNEXT.

- - - - -
c116e113 by Shane McCarthy at 2014-10-02T09:10:54Z
add new test-regidx binary to .gitignore

- - - - -
a0e35e03 by John Marshall at 2014-10-02T09:23:12Z
Formatting fixes for man page

There is no .R; replace badly-nested .RE with .PP;
use hyphens to introduce options rather than en dashes
(addresses samtools/www.htslib.org#5 for tabix man page).

- - - - -
db774cc5 by pd3 at 2014-10-02T09:50:31Z
Merge pull request #136 from mcshane/feature/ignore_test-regidx

add new test-regidx binary to .gitignore
- - - - -
9eb4bed0 by James Bonfield at 2014-10-02T10:28:47Z
Removed various small memory leaks.

The largest of these was 1 (empty) block per container, due to block
DS_SC being allocated twice: once explicitly and once in the main
block creation loop.

- - - - -
14a4a81f by Petr Danecek at 2014-10-03T07:15:06Z
Return type of bgzf_getc is int, not char

- - - - -
56d50b27 by Petr Danecek at 2014-10-03T07:15:07Z
regidx: use size_t, not ssize_t

- - - - -
5ebf9b24 by John Marshall at 2014-10-03T14:01:47Z
Fix regidx.o dependencies and htslib.mk; alphabetise

- - - - -
ac4c98ad by Charles Plessy at 2014-10-05T00:34:38Z
Merge branch 'develop' into debian/unstable

- - - - -
97501235 by Charles Plessy at 2014-10-05T00:39:24Z
Current changelog; not sure to upload or not.

- - - - -
effc6fdf by John Marshall at 2014-10-06T10:15:38Z
Merge man page formatting fixes

- - - - -
912a7d09 by Petr Danecek at 2014-10-06T19:02:50Z
Support for Type=Character, in htslib same as Type=String

- - - - -
34f8089d by Petr Danecek at 2014-10-10T11:47:37Z
VCF header editing speedup for large number of ref sequences

- Don't sync the internals with each edit, just set a 'dirty' flag and call
  bcf_hdr_sync() when the header is written.  This may have undesired consequences
  in existing programs which do not call bcf_hdr_write() or try to access newly
  added fields before bcf_hdr_sync() is called.

- bcf_hdr_add_sample() does not require NULL any more, but programs may
  continue calling it for backward compatibility.

Resolves https://github.com/samtools/samtools/issues/308

- - - - -
9d01cd60 by Petr Danecek at 2014-10-14T12:31:11Z
New bcf_copy API

- - - - -
432a2240 by James Bonfield at 2014-10-15T16:38:51Z
Split off the rans_byte.h portion of rANS_static.c back into its own
file, for reasons of copyright clarity.

Fixed the arguments to rans_uncompress and indeed the whole notion of
having RANS0 and RANS1 as two distinct codecs.  This was historical.
They got merged into a generic RANS codec during the CRAM3 discussions.

- - - - -
b5d11c3b by James Bonfield at 2014-10-16T13:00:32Z
Added copyright notice.

- - - - -
43f2d111 by John Marshall at 2014-10-16T13:11:40Z
Remove -DSAMTOOLS vestiges

hclose() is always paranoid (it flushes, syncs, and reports errors),
and cram_FILE is now always hFILE.

- - - - -
24c86999 by John Marshall at 2014-10-16T13:29:00Z
Fix cram/rANS_* dependencies

- - - - -
233e1592 by John Marshall at 2014-10-16T14:31:15Z
Merge CRAM v3 updates (PR #132)

- - - - -
abd1efb3 by John Marshall at 2014-10-17T10:17:42Z
Add htsFormat and format-detection API functions

Adds hts_detect_format() that returns an htsFormat based on peeking
at the beginning of an hFILE, and hts_hopen() that allows hts_open()
to be replaced by hopen()...hts_detect_format()...hts_hopen() where
desired.

Followups will add functions deriving an htsFormat based on filename
extension or -O-style option, functions to return a human-readable
description or hts_open()-mode based on an htsFormat, and an htsfile
utility that uses them.
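A simplified sketch of detection by peeking at leading magic bytes. The enum and function are hypothetical and are not the htsFormat API. A real detector has more to do: a BGZF-compressed BAM or BCF starts with the gzip magic, so it must inflate the first block and look again, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { FMT_UNKNOWN, FMT_GZIP, FMT_BAM, FMT_CRAM, FMT_BCF } demo_format;

/* Classify a stream from the first bytes peeked from it. */
static demo_format detect_format(const unsigned char *s, size_t len) {
    assert(s != NULL || len == 0);
    if (len >= 2 && s[0] == 0x1f && s[1] == 0x8b) return FMT_GZIP;
    if (len >= 4 && memcmp(s, "CRAM", 4) == 0) return FMT_CRAM;
    if (len >= 4 && memcmp(s, "BAM\1", 4) == 0) return FMT_BAM;  /* uncompressed */
    if (len >= 3 && memcmp(s, "BCF", 3) == 0) return FMT_BCF;    /* uncompressed */
    return FMT_UNKNOWN;
}
```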

- - - - -
bf909d66 by Petr Danecek at 2014-10-20T17:41:45Z
Removed old hts_file_type() API and replaced it with the new htsFormat;

vcf writers and synced_bcf_reader to use the new API; bgzf set
is_gzip flag when reading; new tabix -i option;  check return
status of hts_close in test-vcf-api

- - - - -
7c42dccd by Petr Danecek at 2014-10-21T08:19:52Z
Clean up, is_compressed and is_cram no longer necessary

- - - - -
eda497bd by Petr Danecek at 2014-10-21T09:05:30Z
Do not use 0x80 & co for missing alleles

- - - - -
3e24dfd2 by Petr Danecek at 2014-10-21T11:45:36Z
Do not load remote index if already exists locally

and be silent when testing the existence of .tbi or .csi.
Resolves https://github.com/samtools/bcftools/issues/121

- - - - -
529ca880 by John Marshall at 2014-10-22T09:52:11Z
Use htsFormat to replace htsFile's is_foo flags

Add an htsFormat member to htsFile, readable via hts_get_format().
Use hts_detect_format() within hts_hopen(fname, "r"), and fill in the
format based on mode letters for "w".

The htsFormat settings are clearer than the old is_foo flags, so begin
recoding SAM/BAM/CRAM and VCF/BCF tests to use fp->format.format (within
htslib; hts_get_format(fp)->format elsewhere) instead.  However several
of the flags are used directly outside htslib (see notes in htslib/hts.h)
so even though they are unused within htslib they cannot be removed
unless/until libhts.so's soversion is changed.

The is_be, is_compressed, and is_kstream flags are not used in samtools
or bcftools and we assume they are not used in other third-party code,
so they are safe to remove (and is_kstream has been removed).

- - - - -
41ab01b2 by John Marshall at 2014-10-22T09:52:11Z
Add htsfile utility

Note that hts_format_description() may possibly need to change to return
a string that the user must free() before release.  Manpage to follow.

- - - - -
5dec96ba by James Bonfield at 2014-10-27T12:36:23Z
Amended the compression level checking code in cram_dopen() to follow
the same detection logic used in bgzf.c.  Previously changing the
compression level did not work for CRAM.

- - - - -
b0df3d15 by Petr Danecek at 2014-10-29T10:12:50Z
Fix in bgzf's gzip reading plus added a test for this

- - - - -
9c510fb5 by Petr Danecek at 2014-10-29T10:51:04Z
bcf_translate: Be aware of gaps in BCF headers

- - - - -
b28efa4f by James Bonfield at 2014-10-29T16:50:12Z
In a bid to keep the DEBUG_printf line potentially printing the
value of 'i', while also avoiding the complaints about 'i' being set but
unused, I changed the code to also handle potentially wide
characters.  (I haven't actually tested that it works with wide characters,
but it'll be better than before and the main goal was to silence the
annoying warning!)

- - - - -
a98d88fd by James Bonfield at 2014-10-29T16:57:18Z
Fix a bug where hts_set_fai_filename() didn't pass this through to
CRAM.  It appears this bug has been long standing, since
https://github.com/samtools/htslib/commit/2402fc00fe1f2360cd9056173045f65bc0b683dc,
but was not detected due to the tests finding the reference via a UR:
@SQ tag instead.

Also amended the CRAM reference handling code to handle bgzf at the
same time.  Ideally we would rip out a lot more code from cram_io.c
and replace it with more calls to faidx.c instead, but that can be the
subject of another update.

- - - - -
9e844e0b by James Bonfield at 2014-10-29T17:10:38Z
Fix bug spotted by valgrind when running "./test_view -D
ce#unmap.tmp.cram".

Normally the header is nul terminated, but in this case the header is
zero bytes long, leading to malloc(0) and a later strlen() of that
block.  We now forcibly always allocate +1 byte and nul terminate the
buffer, avoiding the issue and better coping with a potential cram
corruption too.
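A minimal sketch of the "+1 byte and NUL terminate" approach, using a hypothetical helper name. Because the allocation is `len + 1`, a zero-length header becomes a valid empty string instead of the result of `malloc(0)` being passed to `strlen()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy a length-delimited header into an always NUL-terminated buffer. */
static char *dup_header(const char *data, size_t len) {
    assert(data != NULL || len == 0);
    char *buf = malloc(len + 1);  /* +1 even when len == 0 */
    if (!buf) return NULL;
    if (len) memcpy(buf, data, len);
    buf[len] = '\0';
    return buf;
}
```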

- - - - -
7a13d832 by James Bonfield at 2014-10-29T17:13:12Z
Fixed an error found via valgrind of test_view -D c1#pad1.tmp.cram.

The extra_len field for bam_construct_seq didn't include the +1 for
nul terminating the aux string.  I am unsure why this didn't crop up
in other files.

- - - - -
dd08ee0a by James Bonfield at 2014-10-30T10:08:01Z
Reverting 9e844e0be26875d29b6e2c853c2b4d02c4f1faa4 and rewriting in a
better manner.

Passing +1 to the extra_length in bam_construct_seq causes the length
of data in the bam structure to be +1, and then causes bam decoding to
have issues in some cases.

The correct solution made here is simply to remove the nul terminating
of the auxiliary block.  It's a hang-over from the io_lib days and
isn't necessary.

- - - - -
244dde88 by Petr Danecek at 2014-10-30T10:14:41Z
tbx: Detect faulty tbi files

- - - - -
b7f74f42 by John Marshall at 2014-11-03T12:05:37Z
Merge origin/develop, replacing htsFile.type by .format

Use the followups mentioned in abd1efb rather than the alternatives
via htsFile.type -- in particular, retain is_cram for the sake of
libhts.so binary compatibility.  The -i option added to tabix now
duplicates htsfile functionality, and can be removed in due course.

- - - - -
845c5153 by John Marshall at 2014-11-03T16:10:34Z
Parse SAM aux 'i' values > 2^31 correctly

BAM can represent 'i' values from -2^31 and 'I' values up to 2^32-1.
Especially on machines where long is 32 bits, we need to consider
positive and negative values carefully and separately to parse this
whole range correctly.

Addresses samtools/hts-specs#36 -- even if the spec doesn't require
this (which may soon change), as a quality of implementation issue
htslib should support the same range in SAM and BAM.
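A sketch of accepting the full BAM integer range from SAM text. Note that it swaps in a different technique from the commit's separate positive and negative parsing: it parses into a `long long` with `strtoll`, since `long` may be only 32 bits, and then range-checks against -2^31 .. 2^32-1. The function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a SAM 'i' value; accept -2^31 .. 2^32-1. Returns 0 on success. */
static int parse_aux_int(const char *s, long long *out) {
    char *end;
    assert(s != NULL && out != NULL);
    errno = 0;
    long long v = strtoll(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE) return -1;
    if (v < -2147483648LL || v > 4294967295LL) return -1;
    *out = v;
    return 0;
}
```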

- - - - -
07c94ecb by James Bonfield at 2014-11-06T14:54:26Z
Sped up cram_index_load some 200-fold by replacing sscanf with our own
number decoding.

- - - - -
b96a3020 by James Bonfield at 2014-11-06T15:25:24Z
Fixed an incorrect EOF-style case when seeking multiple times.

When asking to fetch data for a specific chr:range, if the current
container entirely fits within the range but the next container is
outside the range then it sets the fd->ooc (out of containers) flag to
prevent subsequent containers from being decoded, to mark the end of
the sub-query.

However this then causes a subsequent cram_seek to also fail as the
ooc flag is still set. We now clear this on every new seek.

- - - - -
4d13ff04 by Petr Danecek at 2014-11-06T15:28:28Z
Fix of memmove bug in bcf_remove_filter()

- - - - -
4aa494f6 by reinders at 2014-11-06T20:49:22Z
Buffer overflow error in synced_bcf_reader.c

Clearly, out is being accessed at index nout, so it must have room for at least nout+1 elements.
- - - - -
c4043c5f by pd3 at 2014-11-07T09:24:27Z
Merge pull request #142 from reinders/patch-1

Buffer overflow error in synced_bcf_reader.c
- - - - -
bd6f52ab by John Marshall at 2014-11-07T14:25:34Z
Fix compilation when ALLOW_UAC is not defined

- - - - -
7838da8e by David Roazen at 2014-11-07T17:47:25Z
bcf_sr_add_reader(): do not increment nreaders or perform any reallocs upon file open errors

This avoids a segfault if bcf_sr_destroy() is called after bcf_sr_add_reader()
when a file couldn't be opened.

- - - - -
54618dd8 by pd3 at 2014-11-10T09:08:26Z
Merge pull request #144 from broadinstitute/broad_bcf_sr_add_reader_segfault_fix

bcf_sr_add_reader(): do not increment nreaders or perform any reallocs upon file open errors
- - - - -
ee7343ed by Petr Danecek at 2014-11-10T09:14:15Z
bcf_sr_get_header() macro for accessing the synced readers' headers

- - - - -
29305dd8 by Petr Danecek at 2014-11-10T10:43:08Z
Add IDX to hdr tags of different type, fixes issue https://github.com/samtools/bcftools/issues/141

- - - - -
83a4e30d by James Bonfield at 2014-11-11T16:44:14Z
Amended/removed comments about scram_* API.

- - - - -
089c900a by John Marshall at 2014-11-13T11:57:56Z
Merge short read() bug fix from upstream

See attractivechaos/klib at 8d8d1a19f0c69b53d5ed8d9f6592dfa4b91c23f3
and lh3/seqtk#43.

- - - - -
3c4f33a7 by John Marshall at 2014-11-13T14:46:57Z
Fix ks_getuntil2() extra empty record at EOF bug

When the stream is an exact multiple of the buffer size, ks_getuntil2()
was returning a final empty record when it should have returned -1.
Fixed by moving the "EOF => return -1" check to after the read loop.

Fixes samtools/samtools#318.  See upstream PR attractivechaos/klib#39.
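A simplified, byte-at-a-time sketch of the fixed control flow. It is not klib's code, and the `reader` type is hypothetical. The EOF decision is made after the read loop, based on whether anything was consumed. A file whose size is an exact multiple of the buffer size therefore ends with -1 rather than a spurious empty record, while a genuine empty line still returns 0.

```c
#include <assert.h>
#include <string.h>

#define CHUNK 4  /* tiny buffer so the exact-multiple case is easy to hit */

/* Hypothetical buffered reader over an in-memory "file". */
typedef struct {
    const char *src; size_t src_len, src_pos;
    char buf[CHUNK]; size_t begin, end;
    int is_eof;
} reader;

static size_t source_read(reader *r, char *dst, size_t n) {
    size_t left = r->src_len - r->src_pos;
    if (n > left) n = left;
    memcpy(dst, r->src + r->src_pos, n);
    r->src_pos += n;
    return n;
}

/* Read up to '\n' into out (truncated to outsz-1 chars, NUL-terminated).
 * Returns the stored length, or -1 if nothing was left to read. */
static int get_line(reader *r, char *out, size_t outsz) {
    size_t len = 0;
    int got_any = 0;
    assert(outsz > 0);
    for (;;) {
        if (r->begin >= r->end) {
            if (r->is_eof) break;
            r->begin = 0;
            r->end = source_read(r, r->buf, CHUNK);
            if (r->end == 0) { r->is_eof = 1; break; }
        }
        got_any = 1;
        char c = r->buf[r->begin++];
        if (c == '\n') break;
        if (len + 1 < outsz) out[len++] = c;
    }
    out[len] = '\0';
    return got_any ? (int)len : -1;  /* EOF check after the loop */
}
```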

- - - - -
d8c03cf8 by Petr Danecek at 2014-11-17T16:48:41Z
bcf_hdr_subset: Return NULL on duplicate sample names

- - - - -
8916744b by Petr Danecek at 2014-11-18T12:31:25Z
Sanity check to detect broken GT fields in bcf_calc_ac()

- - - - -
9a88137b by John Marshall at 2014-11-18T14:30:43Z
Add seq_nt16_int[], equivalent to the old API's bam_nt16_nt4_table[]

Also improve descriptions of the usage of all three tables.

- - - - -
bf7e0ecf by Petr Danecek at 2014-11-27T08:54:11Z
bcf_*hrec* functions: check for existing/multiple IDX keys

- - - - -
4770a410 by James Bonfield at 2014-11-27T11:40:23Z
Fixed the generation of read names to use record_counter properly,
counting from 1, rather than slice:record-in-slice.

Also changed the record_counter to be 64-bit for CRAM v3.0 (change
under discussion). (I believe this to be invisible within the file
format though as itf8 0-2billion encode using the same binary values
as ltf8 0-2billion.)

- - - - -
876bfe4b by James Bonfield at 2014-11-28T10:16:12Z
Replaced sprintf with a home-brew append_uint64 function.  This is a
sizeable speed increase to the read name auto-generation code.

- - - - -
89bfcc4a by Martin O. Pollard at 2014-11-28T11:44:14Z
Fix comment in sam.h

Fix reference to bam1_seq to be bam_get_seq.

- - - - -
32a43b27 by Petr Danecek at 2014-11-28T14:42:23Z
vcf: skip empty INFO tags ";;". (Error might be more appropriate?)

- - - - -
8d921a5c by John Marshall at 2014-11-28T15:03:45Z
Account for read buffering in hseek(SEEK_CUR)

These relative offsets need to be converted to be relative to the
backend's stream position, which is different due to buffering.
It could be done when writing too, as begin==end after flush_buffer(),
but it seems clearer to convert only when reading.  Fixes #152.
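The arithmetic behind the conversion, as a sketch with a hypothetical state struct (not hFILE's real layout). When reading, the backend has already advanced past the bytes still sitting unread in the buffer. So the caller's logical position is the backend position minus (end - begin), and a SEEK_CUR offset must be reduced by the same amount.

```c
/* Hypothetical read-buffer state.
 * backend_pos: current position of the underlying fd/stream.
 * begin/end:   unconsumed bytes are buf[begin..end). */
typedef struct {
    long backend_pos;
    long begin, end;
} rbuf_state;

/* The caller's logical position lags the backend by the unread bytes. */
static long logical_pos(const rbuf_state *s) {
    return s->backend_pos - (s->end - s->begin);
}

/* Convert a caller-relative SEEK_CUR offset into a backend-relative one. */
static long backend_seek_cur_offset(const rbuf_state *s, long offset) {
    return offset - (s->end - s->begin);
}
```

For example, with the backend at 100 and 30 unread bytes buffered, the caller is at 70. "Forward 5" must become "backward 25" for the backend. When writing, begin == end after a flush, so the adjustment would be zero.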

- - - - -
57462fb7 by James Bonfield at 2014-11-28T15:08:22Z
Minor improvement to binary searching in CRAM indices.

Sometimes it could pick a slice one before the optimal case, causing
one additional slice header to be fetched and decoded.

- - - - -
9da961af by Petr Danecek at 2014-12-02T11:39:31Z
bcf_hdr_combine: Complain when trying to merge different types

- - - - -
7a0fdf4e by John Marshall at 2014-12-02T15:53:27Z
Sanity-check tid in hts_itr_query()

e.g. if the headers given to hts_itr_querys() have more reference
sequences than the index, we'd rather return failure than crash here.

- - - - -
10ed34e7 by Petr Danecek at 2014-12-03T11:18:48Z
bcf_calc_ac: Check for incorrect AC/AN counts

- - - - -
cef706f6 by Petr Danecek at 2014-12-03T22:32:56Z
vcf_parse_format: Throw an error on extra FORMAT fields

- - - - -
dfd67733 by Petr Danecek at 2014-12-05T10:54:14Z
vcf: Propagate hdr_add_sample() error

- - - - -
5e76b1ce by Petr Danecek at 2014-12-08T20:48:33Z
Fix alleles trimming with format Number=R/A/G tags

Resolves https://github.com/samtools/bcftools/issues/173

- - - - -
948a68c8 by James Bonfield at 2014-12-09T17:21:32Z
Removed spurious messages about missing EOF blocks in CRAM when
dealing with older versions of the file format.  It worries users to
see messages about lack of an EOF block (although technically true)
when reading v2.0 or earlier CRAM files.

- - - - -
ce1a547f by John Marshall at 2014-12-11T17:45:01Z
Parse regions without begin/end as 1..MAX_INT rather than 1..2^29

Indexing now works with large chromosomes (longer than 2^29) both when
coordinates are specified ("foo:1000-600,000,000") and when they are
defaulted to the whole chromosome ("foo").

For hts_itr_querys("."), HTS_IDX_START ignores begin/end, so pass these
as 0,0 rather than a misleading 1<<29.

/<< *29/ no longer appears in the htslib source code.

Fixes part 2 of samtools/samtools#241.
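A sketch of defaulting a region's missing coordinates to "the whole chromosome" using INT_MAX rather than a hard-coded 1<<29 (536,870,912). The function `parse_region_sketch()` is hypothetical and is not hts_parse_reg(). It returns 1-based inclusive coordinates and does not handle commas inside numbers.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Split "name[:beg[-end]]"; returns 0 on success, -1 on a malformed region. */
static int parse_region_sketch(const char *reg, char *name, size_t namesz,
                               long *beg, long *end)
{
    const char *colon = strrchr(reg, ':');
    size_t len = colon ? (size_t)(colon - reg) : strlen(reg);
    char *p;
    if (len + 1 > namesz) return -1;
    memcpy(name, reg, len);
    name[len] = '\0';
    *beg = 1;
    *end = INT_MAX;                    /* whole chromosome: no 1<<29 cap */
    if (!colon) return 0;
    *beg = strtol(colon + 1, &p, 10);
    if (p == colon + 1) return -1;     /* no digits after ':' */
    if (*p == '-') {
        const char *q = p + 1;
        *end = strtol(q, &p, 10);
        if (p == q) return -1;         /* no digits after '-' */
    }
    if (*p != '\0' || *beg < 1 || *beg > *end) return -1;
    return 0;
}
```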

- - - - -
e5a964e2 by John Marshall at 2014-12-12T13:30:01Z
Update khash.h from upstream sources

Updated to attractivechaos/klib at 7163c2137856c22bebce87fd8f68a37fbb84a430
and its parents, in particular fixing new_flags memory leaks (cf #138).

- - - - -
20238f35 by John Marshall at 2014-12-12T14:07:48Z
Fix various simple memory leaks (cf #138)

- - - - -
fe88482d by James Bonfield at 2014-12-12T15:17:22Z
Added in the compressed length field to the rANS codec header.

Technically this isn't needed, but it adds a very small extra amount
of space and it allows for the internal rANS codec to be the same
format as an external block-based rANS codec, or to allow the rANS
codec internally to compress very large buffers using a smaller block
size.

This now brings this code into line with the Java cramtools.

- - - - -
ef59ef2d by John Marshall at 2014-12-15T14:04:16Z
Deobfuscate memory allocations etc via sizeof()

- - - - -
6ee481f2 by Petr Danecek at 2014-12-16T13:48:40Z
vcf headers: Allow contig lines without length attribute. Resolves #155

- - - - -
547a3495 by Petr Danecek at 2014-12-18T08:41:11Z
tabix: Remove bcf and bam from presets,

both formats are autodetected, so the presets are unnecessary.
Resolves #158

- - - - -
21fbc8be by Petr Danecek at 2014-12-19T16:11:44Z
vcf: Abort on duplicate sample names, resolves #184

- - - - -
ed3efe90 by Petr Danecek at 2014-12-19T22:07:00Z
New -R/-T options to tabix.

This requires caching of coordinates by hts_itr_next;
extended regidx API

- - - - -
f3e16021 by Petr Danecek at 2015-01-05T09:45:35Z
Fix a typo, VCF contig length should be stored.

- - - - -
dcffda58 by James Bonfield at 2015-01-05T12:42:20Z
Two changes to cram_encode_container() related to efficient encoding
of name-sorted data.

Firstly, if we have multiple reference sequences packed into one
container, only pre-populate the relevant fd->refs[] array entries if
we're doing reference based encoding.

Secondly, a bug in this same code was that we pre-cached without
incrementing the reference count (but did decrement it at the end of
the function).  The practical upshot of this is that the reference
count added due to fd->shared_ref (enabled when doing multi-ref
slices) was decremented and consequentially caused reference sequences
to be reloaded every container instead of cached between them.

- - - - -
ca6f60e1 by James Bonfield at 2015-01-05T15:11:08Z
Fixed a memory leak when destroying a BYTE_ARRAY_LEN encoder.

- - - - -
f7caefc2 by James Bonfield at 2015-01-05T15:12:06Z
Fixed a small memory leak where we didn't deallocate a cram_block that
we had created but later culled due to containing zero bytes.

- - - - -
cba1bf09 by James Bonfield at 2015-01-05T15:12:20Z
Fixed memory leak when trying to O1 compress a block <= 4 bytes long.

- - - - -
8ad29126 by James Bonfield at 2015-01-05T17:08:05Z
Make the multi_seq parameter default to auto.  This is the default in
Scramble, but oddly not the default in the code (so scramble always
reset it from 0 to -1).

The effect of this is to allow the CRAM containers to switch from one
ref per container to multiple refs per container if it spots unsorted
data or lots of rarely used references.

- - - - -
2b31b7d8 by James Bonfield at 2015-01-06T14:19:03Z
More ref and memory management fixes.

Fixed a logic error in the cram_ref_decr_locked function. It
should never set r->last_id to -1, as with containers large enough to
fit the entire reference this caused it to end up alternating between
freeing and not freeing the reference.

The set of -1 in cram_ref_incr_locked was already sufficient to
prevent repeated incr#1/decr#1/incr#1/decr#1 from freeing and
reloading.

Also only now create c->refs_used when multi_seq is enabled. This
prevents excess incr/decr calls.

- - - - -
ab238981 by James Bonfield at 2015-01-06T17:13:53Z
Reverted the cram_encode_container change to call cram_ref_incr and
added a ref incr in cram_get_ref instead. This better fixes the issue
of sharing references between containers, fixing decoding as well as
encoding.

Also prevented encoding from creating c->refs_used when multi_seq
isn't set.

Fixed ref count leak in cram_encode_container caused by not
decrementing the final sequence we ended up processing. (It has a
decr/incr loop every time we switch from one seq to the next, to cope
with packed chr-pos-sorted slices.)

- - - - -
11a33a61 by James Bonfield at 2015-01-07T09:33:28Z
Merge pull request #160 from jkbonfield/fix_unsorted_cram

Misc CRAM fixes (mainly unsorted data)
- - - - -
5f7a4eab by James Bonfield at 2015-01-08T17:06:49Z
Fixed a bogus warning about using 'cp' before it was initialised. (Bogus
as it's promptly reassigned, so tidied up the code.)

- - - - -
25e8fac6 by James Bonfield at 2015-01-08T17:59:05Z
Minor change to allow REQUIRED_FIELDS option to be specified in hex or
octal.

- - - - -
dd709640 by James Bonfield at 2015-01-08T18:00:23Z
Overhauled the cram_dependent_data_series function and associated
code.

Previously it contained

    if (hdr->data_series & CRAM_SEQ) hdr->data_series |= CRAM_CIGAR;

but this is too permissive, as it means even something like CRAM_BF (in
the CRAM_SEQ expansion) would cause all other members of CRAM_SEQ to be
brought in when they are not strictly needed by the code path.

Instead these have been replaced by more explicit dependencies,
analysed from the source code. This has been tested by producing
CRAM files with random mixing of data series and then explicitly
requesting single columns vs all columns to compare the results.

In doing so, found and fixed a few other long-standing data-series bugs
too, such as a dependence on CRAM_BF for more fields than would
appear obviously necessary.
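One way to express "explicit dependencies" is a table of (if requested, also need) rows, expanded to a fixed point. This is only a sketch. The DS_* bits and the dependency rows are hypothetical placeholders, not HTSlib's CRAM_* values or the dependencies the commit derived from the source.

```c
#include <stddef.h>

/* Hypothetical data-series bits (NOT htslib's CRAM_* constants). */
enum {
    DS_BF    = 1u << 0,   /* flags */
    DS_CIGAR = 1u << 1,
    DS_FEAT  = 1u << 2,   /* read features */
    DS_SEQ   = 1u << 3,
    DS_QUAL  = 1u << 4
};

/* Each row: if any bit of `needs` is requested, also decode `also`. */
static const struct { unsigned needs, also; } ds_deps[] = {
    { DS_CIGAR, DS_FEAT },
    { DS_SEQ,   DS_FEAT | DS_CIGAR },
    { DS_QUAL,  DS_FEAT },
};

/* Expand a requested set to its transitive closure under ds_deps. */
static unsigned ds_closure(unsigned wanted) {
    unsigned prev;
    do {
        prev = wanted;
        for (size_t i = 0; i < sizeof ds_deps / sizeof ds_deps[0]; i++)
            if (wanted & ds_deps[i].needs)
                wanted |= ds_deps[i].also;
    } while (wanted != prev);   /* repeat until nothing new is added */
    return wanted;
}
```

Unlike a blanket "CRAM_SEQ implies everything" rule, asking for one series here pulls in only what its rows name.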

- - - - -
9b1cb948 by John Marshall at 2015-01-16T10:33:24Z
Avoid aux.* filenames, which are invalid on Windows

Windows believes paths like aux.* refer to its AUX device and refuses
to create files with such names.  Hat tip Clare Venney.
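Windows reserves the device names CON, PRN, AUX, NUL, COM1-COM9 and LPT1-LPT9, case-insensitively and regardless of any extension. A small check like the hypothetical `is_windows_reserved_name()` below could flag such names in a test-data directory. This is a sketch and not part of HTSlib.

```c
#include <ctype.h>
#include <string.h>

/* Return 1 if name's stem (text before the first '.') is a Windows device name. */
static int is_windows_reserved_name(const char *name) {
    static const char *const fixed[] = { "CON", "PRN", "AUX", "NUL" };
    size_t stem = strcspn(name, ".");
    char up[5];
    if (stem < 3 || stem > 4) return 0;
    for (size_t i = 0; i < stem; i++) up[i] = (char)toupper((unsigned char)name[i]);
    up[stem] = '\0';
    if (stem == 3) {
        for (size_t i = 0; i < sizeof fixed / sizeof fixed[0]; i++)
            if (strcmp(up, fixed[i]) == 0) return 1;
        return 0;
    }
    /* 4-character stems: COM1..COM9, LPT1..LPT9 */
    return (strncmp(up, "COM", 3) == 0 || strncmp(up, "LPT", 3) == 0)
           && up[3] >= '1' && up[3] <= '9';
}
```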

- - - - -
37687077 by Matt Shirley at 2015-01-20T14:25:33Z
Update faidx.h

Just a small misspelling in the `fai_destroy` docs.
- - - - -
0ccc935a by John Marshall at 2015-01-21T09:47:12Z
Add iRODS hFILE backend

Initial version, as per October 2013 feature branch.  The code is only
lightly tested, but hopefully feature complete.

Not yet activated in hfile.c or added to the Makefile.
This awaits a configure script and probably --enable-irods and/or
--with-irods=IRODS_HOME configure options.

- - - - -
8bc776da by John Marshall at 2015-01-21T10:04:56Z
Add MIT/Expat license boilerplate

- - - - -
772ba540 by Petr Danecek at 2015-01-21T15:19:14Z
Allow scientific notation when specifying regions,

for example 20:30000000-32000000 can now also be given as 20:3e7-3.2e7.
Replaced atoi parameter parsing with strtol to recognise user errors.

- - - - -
3ac7d001 by John Marshall at 2015-01-26T11:43:07Z
Add configure.ac script

Use autoconf to generate a configure script.  Checks that zlib development
files are available, and emits a hopefully informative error if not.

The inclusion of config.mk is carefully placed so that it can add to
prerequisites as appropriate when later --enable-xxx/--with-yyy options
need to add extra libhts objects and rules.

Rewrite INSTALL to describe configure usage.

- - - - -
e13b6901 by James Bonfield at 2015-01-26T12:26:48Z
Fixes imported from Staden io_lib revisions 3792/3795.

Bug fix the TLEN decoding sign.  It wasn't correctly handling the case
when two reads start at the same coordinate.

Overhauled the pnext/tlen/flags for detached reads.

Now when we find a second (or more) copy of a read within a slice, we
do not automatically mark it as non-detached.  We check whether the
derived fields match those found in the file and if not we emit a
"detached" read causing these fields to be written verbatim.  We also
do this if the data is supplementary, as the meaning of that flag is
poorly defined.

This is particularly useful on bwa-mem output where there is currently
a disparity between bwa and io_lib on the interpretation of
supplementary/primary.

- - - - -
2ac7a826 by John Marshall at 2015-01-26T15:37:07Z
Add notes for building from a Git repository

- - - - -
c360ce47 by James Bonfield at 2015-01-27T10:28:28Z
Enforce the use of a local cache (use home dir if not defined)
whenever we automatically fall back to using the EBI reference
sequence server.

It is still possible for a user to manually set REF_PATH to the EBI
while not manually setting REF_CACHE, but in such situations the user
is taking responsibility for manual control.

- - - - -
77cdbec6 by James Bonfield at 2015-01-27T13:57:55Z
Use TMPDIR and if not set TEMP (common on Windows) environment
variables as the location of temporary files, in preference to a hard
coded /tmp.

Also changed the home directory path from .cram_cache to
.cache/genome-ref, as per linux base-directory standards.

- - - - -
0c74c754 by James Bonfield at 2015-01-27T14:14:57Z
Stylistic code change: !*ptr vs *ptr=='\0'.

- - - - -
8c80202a by James Bonfield at 2015-01-27T16:51:47Z
Additional comments.

- - - - -
83f1dbc7 by John Marshall at 2015-01-27T17:00:56Z
Merge iRODS hFILE backend

This merge commit adds --with-irods[=DIR] support in configure.ac
and config.mk.in.

Addresses PR #146 with the exception of test harness iRODS support.

- - - - -
7bd8c08f by James Bonfield at 2015-01-28T09:36:36Z
Merge pull request #161 from jkbonfield/cram_dependent_data_series

Improved the data-series interdependence analysis
- - - - -
ef3bd194 by John Marshall at 2015-01-28T11:50:35Z
Document ./configure --with-irods

- - - - -
91a471d8 by James Bonfield at 2015-01-28T14:13:48Z
Fixed unnecessary FAI building.

Bug fix a98d88fd423ef0b52c69100315ca93e0073f5911 added, amongst other
things, code to build the FAI file with the incorrect assumption that
this is only done when the file does not already exist.  It now
performs this check itself.

- - - - -
97a79332 by James Bonfield at 2015-01-28T14:14:05Z
Bug fix to refs_from_header().

The patch to hts_set_fai_filename() caused a new failure mode in this
function. If given a .fai file with sequence records that are not in
the same order as the @SQ lines in a SAM header, the code to parse the
@SQ header and potentially add new records that aren't in the original
.fai file was failing, returning -1.

We now do a better job of merging the references together.

- - - - -
0ad99655 by John Marshall at 2015-01-28T14:20:56Z
Add hisremote(), and convert faidx.c from knet to hFILE

Now that we have another remote hFILE backend (hfile_irods.c), we need
an hisremote() helper to test whether a file is remote and would benefit
from caching locally, rather than just using /^http|^ftp/.  For now
hisremote() is a simple check for several known schemes, but eventually
it should become another entry point within the hFILE backend.

Use hisremote() in hts.c and faidx.c's index file downloading/caching.

Rewrite faidx.c's download_and_open() using hFILE, so that it works with
other remote protocols.  _USE_KNETFILE no longer appears in HTSlib, and
config.h is used only by bgzf.c.

See also babaaf3381719904d328a7624c3d17de78633347 which converted
hts.c's similar index downloading to hFILE.

(Addresses one small aspect of PR #164.)
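A sketch of a scheme-prefix check in the spirit of hisremote(). The function name and the scheme list are assumptions for illustration, not necessarily the set HTSlib checks. Matching a full "scheme:" prefix also avoids the false positive a bare /^http/ test gives for a local path such as "httpdata/a.bam".

```c
#include <string.h>

/* Hypothetical: 1 if fn starts with a known remote scheme, else 0. */
static int is_remote_sketch(const char *fn) {
    static const char *const schemes[] = { "ftp://", "http://", "https://", "irods:" };
    for (size_t i = 0; i < sizeof schemes / sizeof schemes[0]; i++)
        if (strncmp(fn, schemes[i], strlen(schemes[i])) == 0) return 1;
    return 0;
}
```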

- - - - -
cf4811a6 by John Marshall at 2015-01-28T17:14:28Z
Set resource when writing to iRODS

(Fixes part of #168.)

- - - - -
519e0e76 by John Marshall at 2015-01-29T10:46:51Z
Add htsfile(1) man page

- - - - -
b2cfe4e8 by John Marshall at 2015-01-29T10:48:08Z
Remove tabix -i, which duplicates htsfile functionality

See b7f74f420bf325d34eaab7cc9971131666883b5f.

- - - - -
61a79f11 by Petr Danecek at 2015-01-29T14:30:12Z
Better index bugfix

Fixes samtools/samtools#341, index crashed for unmapped (via flags
and tid = -1) records with filled-in CIGAR.

- - - - -
22442633 by John Marshall at 2015-01-30T09:49:09Z
Detect file format versions in hts_detect_format()

Adds version fields to htsFormat and removes bcfv1 from htsExactFormat
(even though it's really a separate format, this way is less confusing).
As htsExactFormat and htsFormat have not yet appeared in an HTSlib release,
changing these types' layout now is not a compatibility problem.  But soon
it will be, so also adds compression_level and [format-]specific fields, to
be used by an upcoming hopen(filename, "w", &format) open-for-write call.

hts_format_description() now builds a more accurate description string
based on all the format fields.  The string it returns must now be freed
by the caller -- cf htsfile.c, to which we also add --version and fix -?.

(Addresses #149, include compression info in htsfile description output.)

- - - - -
0ec1bb14 by John Marshall at 2015-01-30T09:52:33Z
Temporarily avoid rcDataObjFsync() [workaround]

Work around #168 (rcDataObjFsync() always fails and seems unnecessary
anyway) while we figure out what's going on here.

- - - - -
3ec78c15 by James Bonfield at 2015-01-30T10:27:36Z
Improved the CRAM stats array usage.

Read-pair detection during encoding now has a saner view, with
better commenting, of the hoops to go through to keep the cram_stats
arrays up to date.  A few bugs were detected here, potentially leading
to suboptimal Huffman trees, by instrumenting the ->encode() calls
and comparing them to cram_dump_stats() outputs.

It passes the tests and real-world data files, and I struggled to
construct torture cases by hand that cause it to fail.  It's still
hairy and complex though!

- - - - -
92a01290 by John Marshall at 2015-01-30T10:30:39Z
Merge CRAM TLEN updates (PR #165)

Fixes imported from Staden io_lib revisions 3792/3795.

- - - - -
32b534fa by James Bonfield at 2015-01-30T11:13:29Z
Added support for XDG_CACHE_HOME.

- - - - -
1493ea07 by John Marshall at 2015-01-30T12:01:48Z
Avoid hiding under .cache in temp directories

If we end up falling back to $TMPDIR/$TEMP/tmp, the cache directory
should be visible as $TMPDIR/hts-ref rather than under a hidden
directory in $TMPDIR/.cache/hts-ref.
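A sketch of the cache-directory fallback chain described in the last few commits. It takes environment values as parameters so it can be tested without touching the real environment. The function names and the exact precedence (XDG_CACHE_HOME, then HOME/.cache, then TMPDIR, TEMP, /tmp) are assumptions based on these commit messages, not HTSlib's code.

```c
#include <stdio.h>
#include <string.h>

static int is_set(const char *s) { return s != NULL && *s != '\0'; }

/* Choose a reference-cache directory from (possibly NULL) environment values.
 * Returns 0 on success, -1 if the path did not fit in out. */
static int choose_cache_dir(const char *xdg, const char *home,
                            const char *tmpdir, const char *temp,
                            char *out, size_t outsz)
{
    int n;
    if (is_set(xdg))
        n = snprintf(out, outsz, "%s/hts-ref", xdg);
    else if (is_set(home))
        n = snprintf(out, outsz, "%s/.cache/hts-ref", home);
    else  /* temp fallback: keep it visible, not under a hidden .cache */
        n = snprintf(out, outsz, "%s/hts-ref",
                     is_set(tmpdir) ? tmpdir : is_set(temp) ? temp : "/tmp");
    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}
```

A caller would pass getenv("XDG_CACHE_HOME"), getenv("HOME"), getenv("TMPDIR") and getenv("TEMP").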

- - - - -
6c6f02e5 by John Marshall at 2015-01-30T12:07:29Z
Merge default local reference cache (PR #166)

- - - - -
af9768db by John Marshall at 2015-01-30T13:21:46Z
Rationalise include guard macro name

See also dde9bdbe4174728f99d1bbe7326ffd631539bef6.

- - - - -
9ecdaaec by John Marshall at 2015-01-30T13:28:39Z
Move remainder of config.h to bgzf.c and remove it

The remaining configuration #defines in config.h are used only in
bgzf.c, so move them there where they can be switched on/off in place.

Now that we have a configure script, having an unrelated config.h
could have led to confusion.

- - - - -
38d93e22 by John Marshall at 2015-01-30T14:52:38Z
Reinstate faidx_fetch_nseq() alongside faidx_nseq()

(See #156 for background).  To re-establish compile-time and binary
compatibility, reinstate both functions -- they're trivial anyway.

At our leisure, we can deprecate and remove the faidx_fetch_nseq()
declaration, and eventually remove the implementation too (probably
when we bump the soversion).

Add trivial test case that just exercises both functions.

Fixes #156.

- - - - -
38b53743 by John Marshall at 2015-01-30T15:12:49Z
Formatting fix for HTML man page

- - - - -
09d86177 by Martin O. Pollard at 2015-02-02T12:50:34Z
Add macro for deprecating old APIs

Add the HTS_DEPRECATED macro to mark APIs that are going to be removed.

- - - - -
139317e2 by Martin O. Pollard at 2015-02-02T12:50:45Z
Deprecate faidx_fetch_nseq in favour of faidx_nseq

- - - - -
3839b529 by John Marshall at 2015-02-02T13:36:41Z
Add hts_parse_decimal() to parse scientific notation as int

Add a function encapsulating parsing either integer or scientific
notation and returning int.  This allows integer items such as regions
to be given in scientific notation if desired, while ensuring that when
given as integers they are unequivocally parsed accurately as integers.
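A sketch of parsing integers that may be written with commas or in scientific notation. It is a hypothetical `parse_decimal_sketch()`, not hts_parse_decimal(). Digits are accumulated exactly as an integer mantissa, and the exponent is applied by repeated multiplication. This way "3.2e7" gives exactly 32000000 with no floating-point rounding. Overflow is not checked and fractional remainders are truncated.

```c
#include <ctype.h>
#include <stddef.h>

/* Parse "1,000", "3e7", "3.2e7" etc. as an integer; *end (if non-NULL)
 * is set to the first unparsed character. */
static long long parse_decimal_sketch(const char *str, const char **end) {
    const char *s = str;
    long long n = 0;
    int decimals = 0, e = 0, neg = 0, eneg = 0;
    while (isspace((unsigned char)*s)) s++;
    if (*s == '+' || *s == '-') neg = (*s++ == '-');
    for (; isdigit((unsigned char)*s) || *s == ','; s++)
        if (*s != ',') n = n * 10 + (*s - '0');     /* commas are skipped */
    if (*s == '.')
        for (s++; isdigit((unsigned char)*s); s++) { n = n * 10 + (*s - '0'); decimals++; }
    if (*s == 'e' || *s == 'E') {
        s++;
        if (*s == '+' || *s == '-') eneg = (*s++ == '-');
        for (; isdigit((unsigned char)*s); s++) e = e * 10 + (*s - '0');
        if (eneg) e = -e;
    }
    e -= decimals;                    /* account for digits after the point */
    for (; e > 0; e--) n *= 10;
    for (; e < 0; e++) n /= 10;       /* truncates any fractional part */
    if (end) *end = s;
    return neg ? -n : n;
}
```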

- - - - -
bff5efb8 by John Marshall at 2015-02-02T14:52:31Z
Release 1.2: various bug fixes, htsfile utility, CRAM improvements, etc

- - - - -
b0742b0b by John Marshall at 2015-02-02T16:16:10Z
Merge version number bump and NEWS file from master

- - - - -
d963c7d8 by Martin O. Pollard at 2015-02-02T17:38:29Z
Add c++ name mangling protection where it's missing

regidx.h
vcf_sweep.h
vcfutils.h

are missing their:

#ifdef __cplusplus
extern "C" {
#endif
// functions
#ifdef __cplusplus
}
#endif

anti-name-mangling measures. Also, vcf.h has some of its inline functions outside the extern "C" braces.

- - - - -
849ac47e by pd3 at 2015-02-03T08:06:26Z
Merge pull request #173 from mp15/cpp_mangling

Add c++ name mangling protection where it's missing
- - - - -
5d93cf17 by pd3 at 2015-02-03T08:07:13Z
Merge pull request #170 from mp15/deprecation

Add support for deprecation and deprecate old faidx_nseq API
- - - - -
7ebc5ae7 by John Marshall at 2015-02-03T15:37:46Z
Reinstate deprecated hts_file_type() and FT_*

Reinstate hts_file_type() for ABI compatibility with htslib 1.1 and
previous.  Reinstate FT_* macros for source compatibility.  Both are
deprecated and will be removed in a future HTSlib release; calling code
should migrate to hts_detect_format() and friends instead.

- - - - -
26229a36 by John Marshall at 2015-02-03T16:22:23Z
Release 1.2.1: patch release over 1.2, reinstating hts_file_type()

- - - - -
94d13ce0 by John Marshall at 2015-02-04T09:54:16Z
Merge hts_file_type() patch release from master

- - - - -
a2333856 by John Marshall at 2015-02-09T13:52:04Z
GCC added __deprecated__(message) in version 4.5

Fix breakage with GCC versions 3.1 to 4.4.x, which only implemented
__attribute__((__deprecated__)) without any message parameter.

Introduce macros to aid in these compiler version checks.

For GCC, use HTS_GCC_AT_LEAST(3,0) as an approximation of since forever;
releases before that (2001) are old enough to be considered pre-history.

Clang has had __has_attribute() since release 2.9; versions prior to that
are old enough (2010 or before) that losing these annotations there is
acceptable.  Clang's __deprecated__ attribute has taken a parameter since
at least that release.
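A sketch of the version-check pattern described above, using hypothetical MY_* names rather than HTSlib's own macros. It uses the message form where the compiler supports it, falls back to the bare attribute on GCC 3.1 to 4.4, and expands to nothing elsewhere. The old_api()/new_api() pair is likewise hypothetical and shows keeping a trivial deprecated wrapper for compatibility.

```c
#ifdef __GNUC__
#define MY_GCC_AT_LEAST(major, minor) \
    (__GNUC__ > (major) || (__GNUC__ == (major) && __GNUC_MINOR__ >= (minor)))
#else
#define MY_GCC_AT_LEAST(major, minor) 0
#endif

#ifdef __has_attribute
#define MY_HAS_ATTRIBUTE(attr) __has_attribute(attr)
#else
#define MY_HAS_ATTRIBUTE(attr) 0
#endif

#if MY_HAS_ATTRIBUTE(__deprecated__) || MY_GCC_AT_LEAST(4,5)
#define MY_DEPRECATED(msg) __attribute__((__deprecated__(msg)))  /* with message */
#elif MY_GCC_AT_LEAST(3,1)
#define MY_DEPRECATED(msg) __attribute__((__deprecated__))       /* no message */
#else
#define MY_DEPRECATED(msg)                                       /* unsupported */
#endif

int new_api(void);
MY_DEPRECATED("use new_api() instead") int old_api(void);

int new_api(void) { return 2; }
int old_api(void) { return new_api(); }  /* trivial wrapper kept for compatibility */
```

Calling old_api() from client code then produces a compile-time warning rather than an error, so existing callers keep building.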

- - - - -
90525e00 by John Marshall at 2015-02-10T15:47:23Z
Add $(htslib_kfoo_h) make variables for htslib/k* headers

These variables are needed after all, so that they can be empty
when samtools (or other third-party software) is compiled against
an installed system htslib rather than a development htslib.

Also it is no longer true that they don't include other files; see
for example khash_str2int.h.

Building HTSlib always uses its own headers, so this Makefile doesn't
itself need to use $(htslib_k*_h) variables for the first reason.
However the second reason still applies, so change them all in case
any other k*.h headers acquire #includes of other headers.

- - - - -
d1eef285 by John Marshall at 2015-02-11T11:45:42Z
Fix dependencies

Added htslib/faidx.h prereq missed by 139317e2830b8edf530983c5374866b55f203a05
Added vcf.c prereq missed by d8c03cf8f976ce230cd2d7dc80f9e1fde9789cd1
Added cram_io.c prereqs missed by a98d88fd423ef0b52c69100315ca93e0073f5911
Added tabix.c prereq missed by ed3efe902d11a65a2f782bf40051e6cf06942fc3
Added test-vcf-api.c prereq missed by b0df3d153e6eb12a1188208e54b8a0a2dd3fadba

Fix test/test-regidx.o alphabetisation.

- - - - -
0ec5202d by Nathan Weeks at 2015-02-11T17:00:55Z
Include unistd.h & exec bash instead of /bin/bash

Explicitly include unistd.h in cram_io.c, as it declares several
functions (fsync(), getcwd(), and access()) and the symbolic constant
R_OK, and may not be included by any of the other included headers in cram_io.c.

In test.pl, use exec('bash', ...) instead of exec('/bin/bash', ...), as
bash isn't guaranteed to be in /bin/ on all platforms (e.g., FreeBSD),
or the user may want to specify a different version of bash via their
PATH environment variable.

- - - - -
e5a01245 by James Bonfield at 2015-02-12T15:26:19Z
Hard clips no longer add to the number of mismatches for the NM tag.
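For reference, the SAM specification defines NM as the edit distance to the reference: mismatched aligned bases plus inserted and deleted bases. Clipped bases do not count. The hypothetical `nm_from_cigar()` below sketches that rule from a CIGAR string and a caller-supplied mismatch count. It is not HTSlib's CRAM code.

```c
#include <stdlib.h>

/* NM from a CIGAR string plus the number of mismatches in aligned columns.
 * Returns -1 for a malformed CIGAR. */
static int nm_from_cigar(const char *cigar, int mismatches) {
    int nm = mismatches;
    while (*cigar) {
        char *end;
        long len = strtol(cigar, &end, 10);
        if (end == cigar || *end == '\0') return -1;   /* need <len><op> */
        switch (*end) {
        case 'I': case 'D': nm += (int)len; break;     /* indels are edits */
        case 'H': case 'S': break;                     /* clips are not */
        case 'M': case '=': case 'X': case 'N': case 'P': break;
        default: return -1;
        }
        cigar = end + 1;
    }
    return nm;
}
```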

- - - - -
11cee578 by James Bonfield at 2015-02-13T11:34:15Z
Fixed .crai loading of local irods:... caches.

For .bai files on remote protocols we download the .bai locally and
use the local copy.

The previous .crai implementation just used hopen so it opened the
remote end and never cached it.  We now follow the same procedure used
in hts_idx_load for bai.

- - - - -
83190e90 by James Bonfield at 2015-02-13T15:29:38Z
Added a function to compare @SQ headers against the .fai file.

If the @SQ lines differ in length, then use the .fai ones in
preference as it is the actual .fa file which we use for performing
delta encoding. This avoids reading off the end of buffers.

- - - - -
1f3ef0ae by James Bonfield at 2015-02-13T15:30:13Z
Consistency with calmd for undefined MD/NM tag behaviour.

MD and NM tags now handle the cases where edits are off the end of
a reference.  The SAM specification leaves this situation undefined,
but these should behave in the same manner as samtools calmd.

- - - - -
477e31e0 by James Bonfield at 2015-02-17T15:19:59Z
Moved hts_idx_getfn() into a newly created hts_internal.h.

- - - - -
24e0337c by James Bonfield at 2015-02-17T16:53:21Z
Fixed additional prototype gaffe.

- - - - -
694ff3f2 by James Bonfield at 2015-02-17T17:31:12Z
Fixed Makefile dependencies for hts_internal.h.

Also added include of it into hts.c so it sanity checks its own
external prototype.

- - - - -
b8473493 by James Bonfield at 2015-02-18T10:26:40Z
Removal of now-defunct commented out code.

- - - - -
38b5cfd5 by James Bonfield at 2015-02-18T10:27:54Z
Fixed faked up @SQ headers.

For the sake of brevity, the reference sequences have been truncated
in the test data.  The newer cram_io code now spots the disparity
between the .fai file and the @SQ header, issuing a warning.  The
tests worked, but this change removes the unnecessary warnings.

- - - - -
a8f2afe6 by James Bonfield at 2015-02-18T14:43:01Z
Moved the cram/md5.[ch] code up a level to the public API and renamed it.

- - - - -
18ac533e by James Bonfield at 2015-02-23T12:26:00Z
Switched to using sam_* functions over bam_* functions so that index
querying works on CRAM files too.

- - - - -
cc02b2e7 by John Marshall at 2015-02-23T15:01:26Z
Merge fix to caching of remote .crai indices (PR #178)

Fixes #176.

- - - - -
73e0eb7f by James Bonfield at 2015-02-23T17:54:55Z
Changed the htslib MD5 interface.

The hts_md5_init function now returns a pointer to an md5 context,
which must be freed with hts_md5_destroy.  Various other utility
functions have been added at the same time to keep things efficient.

- - - - -
b255985f by James Bonfield at 2015-02-24T09:39:32Z
Fixed cram header-ref parsing.

Silly i vs j bug, but only causing issues when the .fai file
mismatches the SAM headers so not detected earlier.

- - - - -
47a2046a by James Bonfield at 2015-02-24T10:08:30Z
Added error checking for failed "-t file" option.

- - - - -
889ee011 by John Marshall at 2015-02-26T09:15:03Z
Configure fixes

Propagate @CPPFLAGS@ from configure.  Hat tip @ghuls, fixes #183.
Now all 5 current "influential environment variables" are propagated.

Rebuild affected objects if configure is re-run.  Hat tip @jrandall.
The alternative is '$(LIBHTS_OBJS) $(EVERY_OBJ_THERE_IS): config.mk',
but that seems annoying if you made only localised option changes.

- - - - -
c5521eea by John Marshall at 2015-02-26T16:50:15Z
Tidy up header #ifdef __cplusplus / extern "C" wrappers

cram/string_alloc.h had the #ifdef/}/#endif outside the multiple
inclusion #endif, leading to unbalanced-brace syntax errors in C++
on the second and later inclusions.  Fixed; hat tip thorfinn.

Canonically headers should be of the form:

    /* boilerplate comments */
    #ifndef GUARD_H / #define GUARD_H
    #includes
    #ifdef __cplusplus / extern "C" { / #endif
    declarations etc
    #ifdef __cplusplus / } / #endif
    #endif /* GUARD_H */

(See http://stackoverflow.com/a/16087609 and nearby.)  Tidied up headers
to adhere to this form.  Fixes #172.

Added extern "C" wrapper to: hfile_internal.h, hts_internal.h, kfunc.h,
cram_samtools.h (lower than usual due to intermixed type decls/#includes),
pooled_alloc.h, rANS_static.h, thread_pool.h, zfio.h.

Changed multiple extern "C" wrappers to a single one in: hts.h.

Lowered extern "C" to below #includes in: cram_codecs.h, cram_io.h,
cram_structs.h, cram/sam_header.h, string_alloc.h.

Removed extern "C" wrapper from cram.h, which contains only #includes.

Lifted extern "C" to the top of: bgzf.h, faidx.h, regidx.h, sam.h,
synced_bcf_reader.h, tbx.h, vcf.h, vcf_sweep.h; and lowered } to the
bottom of: vcfutils.h.

(Left most htslib/k*.h files as is, as they track upstream klib.
Most contain only #defines and static inline functions, so are probably
correct enough.  Those that have non-inline functions have extern "C"
wrappers around those function declarations.)
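A concrete instance of the canonical layout above, with a hypothetical header example.h and its definition shown in one listing for brevity. Keeping the closing brace of the extern "C" block inside the include guard is what prevents the unbalanced-brace error on a second inclusion.

```c
/* example.h: hypothetical header following the canonical layout */
#ifndef EXAMPLE_H
#define EXAMPLE_H

#include <stddef.h>   /* #includes: inside the guard, outside extern "C" */

#ifdef __cplusplus
extern "C" {
#endif

size_t example_len(const char *s);

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_H */

/* example.c: definition */
size_t example_len(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') n++;
    return n;
}
```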

- - - - -
2ff3610d by James Bonfield at 2015-03-03T14:11:22Z
Improvements to handling CRAM slices with multiple references.

For claimed genome sorted data (SO:coordinate), we no longer drop back
into unsorted mode for slices containing multiple reference sequences
with the AP (assembly pos) delta encoding disabled.  This avoids high
memory usage on very small files where a single slice can have
mappings against all chromosomes, and yet is still sorted.

Also avoid excessive mutex locking when processing a slice with mixed
reference IDs. If the record level reference ID doesn't change from
one record to the next, don't attempt to query the new reference.

- - - - -
57a10151 by James Bonfield at 2015-03-06T16:47:42Z
Protection against various CIGAR/seq issues.

1) During encoding, if the cigar string is too long or too short for
the sequence then we produce an error and bail out.

2) During decoding, if the cigar string maps off the end of the
sequence and our sequence was "*" then we now have better bounds
checking when copying from the reference sequence.

- - - - -
fbb9e25c by James Bonfield at 2015-03-10T14:36:23Z
Fix for unmapped reads with "*" sequence.  These were defaulting to a
sequence length of -1 instead of 0 (it's unknown really), causing an
attempt to extract -1 bytes.

Also fixed NM and MD handling for sequence "*".  In this case we
cannot derive it as we have nothing to compare against, so we store
any values verbatim.  The same bug has been fixed for CRAM_OPT_NO_REF
mode.

The change is quite complex as it now also checks which of MD and/or
NM tags are already present (now that we can sometimes store them
verbatim) in order to avoid creating duplicate tags.

- - - - -
f83dfd23 by James Bonfield at 2015-03-10T15:05:20Z
Tests of mapped and unmapped data with "*" sequence.

- - - - -
3af49b48 by James Bonfield at 2015-03-12T17:09:41Z
Added mmap support for references.

This greatly reduces memory usage when many jobs are running on the
same machine as the references are then shared between processes.

- - - - -
be6bd1e9 by Joshua Randall at 2015-03-15T10:38:36Z
Adds handling of corrupted BAMs to bam_index()

bam_index will now fail if bam_read1 returns an error

- - - - -
0555cd5c by James Bonfield at 2015-03-16T17:17:24Z
Changed hts_md5_hex args.

- - - - -
3accd193 by John Marshall at 2015-03-16T17:43:33Z
Merge addition of MD5 API (PR #180)

- - - - -
67c074ea by John Marshall at 2015-03-17T15:16:02Z
Canonicalise md5.c whitespace

Leave the Openwall parts of the file as is (and move the non-Openwall
hts_md5_init() to lower down), for ease of synchronisation with upstream.

Canonicalise the remainder by expanding tabs as spaces.

- - - - -
6f64ca3f by John Marshall at 2015-03-17T15:22:54Z
Const fixes; define hts_md5_context instead of adding hts_md5_ctx

Add two const fixes, one of which fixes -DHAVE_OPENSSL compilation.

Define the hts_md5_context struct and use it, avoiding lots of casts
and keeping the MD5 code more similar to upstream.

- - - - -
8cb307b2 by John Marshall at 2015-03-17T15:33:51Z
Update md5.c from upstream sources

Updated to Openwall md5.c rev 1.13 tag Owl-3_1-release (previously we
had rev 1.9 tag Owl-3_0-release):

- Help the compiler detect a common subexpression between steps in round 3
- Applied a trivial patch by Werner LEMBERG to make md5.c compile with g++
- Renamed the local variable "free" to "available" [avoiding free(3)]
- Added const qualifiers where appropriate

- - - - -
8ca328a1 by John Marshall at 2015-03-17T15:39:52Z
Merge minor MD5 API fixes

- - - - -
56f06408 by Adrian Tan at 2015-03-17T17:17:51Z
updated documentation for hts_open for uncompressed bcf.

- - - - -
9affea05 by John Marshall at 2015-03-19T11:14:55Z
Check hts_verbose before printing various VCF warnings

In particular, enables control of the message that led to ga4gh/server#253
and ga4gh/server#255.

- - - - -
5ffc4a20 by John Marshall at 2015-03-24T09:45:12Z
Prevent klib unused function warnings

Recent versions of Clang warn about unused static inline functions
in .c files (though they suppress this warning for such definitions
in header files).  Definitions via KHASH_INIT etc are effectively in
the .c file, and it's impractical to make these inline other than
static inline; so add attributes to suppress these warnings.

See upstream PR attractivechaos/klib#47.

(One warning about ks_getc() remains; htslib's use of kstreams is
a disaster area of clashing types that needs further surgery.)

- - - - -
4b4349a6 by James Bonfield at 2015-03-27T17:52:14Z
Fixed the broken "special case" in index querying.

Because of the comparisons it uses, the binary search may yield a value 1 bin too
low when the position asked for precisely matches the start coordinate
of an index bin. The intention of the special case is to correct this,
but it did so without checking whether the current bin returned
actually did overlap the requested range too.

This therefore lead to starting to decode too late into the file,
giving fewer overlapping sequences than desired.

- - - - -
6b403236 by John Marshall at 2015-03-30T09:45:49Z
Allow commas in hts_parse_decimal(), simplify hts_parse_reg()

Parse digits by hand in hts_parse_decimal(), allowing integers to be
parsed from plain or scientific notation, possibly with commas (before
the decimal point), and allowing possible future expansion to "3Mbp" and
the like.

Add doxygen-style documentation, noting that warnings will be produced
for invalid input.

Rewrite hts_parse_reg() to take advantage of hts_parse_decimal()'s comma
abilities and validation.  Return NULL (instead of suggesting the sequence
name is the whole region string) when the region cannot be parsed.

(Changing the "*beg > *end" test to ">=" fixes the off-by-1 aspect
of samtools/samtools#353; returning NULL provides the opportunity to
improve the error message.)
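The parsing rules described above can be sketched as follows. This is a hypothetical parse_decimal_sketch(), not htslib's hts_parse_decimal(). It accepts commas in the integer part, an optional fraction and an optional decimal exponent, and truncates the result to an integer. It is deliberately lax: it does no overflow checking and prints no warnings.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch: parse "1,234", "1.5e3", "12e2" etc. into an integer.
 * If end is non-NULL it receives the first character not consumed. */
long long parse_decimal_sketch(const char *str, const char **end)
{
    const char *s = str;
    long long n = 0;      /* all significant digits, as one integer */
    int decimals = 0;     /* digits seen after the decimal point */
    int e = 0, sign = 1;

    while (isspace((unsigned char)*s)) s++;
    if (*s == '-') { sign = -1; s++; }
    else if (*s == '+') s++;

    /* Integer part: digits, with commas allowed among them. */
    while (isdigit((unsigned char)*s) || *s == ',') {
        if (*s != ',') n = n * 10 + (*s - '0');
        s++;
    }
    /* Fractional part: digits only. */
    if (*s == '.') {
        s++;
        while (isdigit((unsigned char)*s)) {
            n = n * 10 + (*s - '0');
            decimals++;
            s++;
        }
    }
    /* Optional exponent; a bare trailing 'e' is left unconsumed. */
    if (*s == 'e' || *s == 'E') {
        const char *p = s + 1;
        int esign = 1;
        if (*p == '-') { esign = -1; p++; }
        else if (*p == '+') p++;
        if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p)) e = e * 10 + (*p++ - '0');
            e *= esign;
            s = p;
        }
    }
    /* Scale by the exponent, less the fractional digits; truncate. */
    e -= decimals;
    while (e > 0) { n *= 10; e--; }
    while (e < 0) { n /= 10; e++; }

    if (end) *end = s;
    return sign * n;
}
```

For a region such as "chr1:10,000-20,000", parsing the text after the colon stops at the '-', which is where a caller would continue with the end coordinate.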

- - - - -
62e4541d by John Marshall at 2015-03-30T10:12:49Z
Merge hts_parse_numeric() and hts_parse_reg() scientific notation (PR #171)

- - - - -
767cea0f by John Marshall at 2015-03-30T10:22:24Z
Rearrange hts_itr_querys() more comprehensibly [minor]

- - - - -
5f5aa02b by John Marshall at 2015-03-31T16:44:55Z
Set configure #defines via config.h

While using GNU Make target-specific variables to provide configuration
values only where they are needed is superficially attractive, doing so
introduces the risk that they may be accidentally omitted from some object
file that needs them, leading to inconsistencies.

(Compare for example 750f564ef5a65ec70cc7688519af734b05d595bb.)

So it is better to ensure that everything sees the same configuration
values by setting them via a universally-#included config.h header instead.

Add #include <config.h> to all .c source files except test/*.c, which
act as client code, and klib's upstream k*.c source files.  Add config.h
dependencies to these objects in the Makefile.

Add a Makefile rule to generate an empty config.h, preserving the ability
to build without running configure and without needing -DHAVE_CONFIG_H
command-line noise (hat tip @daviesrob).  We may make running configure
required in future.

- - - - -
4cbbfe90 by John Marshall at 2015-03-31T16:48:25Z
Merge mmap() support for CRAM reference sequences (PR #187)

Check AC_FUNC_MMAP in configure, but note that this invokes a bunch of
tests for standard headers etc, which is not ideal in 2015.  We use
AC_CHECK_HEADER carefully to avoid it pulling these tests in and perhaps
we can do something similar for AC_FUNC_*.

- - - - -
3524f63e by John Marshall at 2015-04-07T10:57:03Z
Remove unneeded system #include

The need for <sys/select.h> disappeared along with is_ready()
in samtools/tabix at 33faf18d012d8e6e3ff68b9a348139be54129de8.

- - - - -
7a13faab by Joshua C. Randall at 2015-04-07T12:56:22Z
fixes typo in perror

sam_hdr_read -> bam_hdr_read

- - - - -
f79fc6a7 by John Marshall at 2015-04-07T15:21:29Z
Merge bam_index() corruption detection (PR #189)

Check return value of bam_read1() in bam_index().  Refactored from
@jrandall's proposed code; in particular, avoids printing an error
message from this library routine.

- - - - -
45c14cc3 by John Marshall at 2015-04-07T15:27:09Z
Use malloc(32K) rather than getpagesize()/valloc()

These functions are not portable and have been obsoleted and removed
from POSIX.  (getpagesize() typically returned 4K.)

- - - - -
d620a735 by James Bonfield at 2015-04-08T15:41:58Z
Added support for CRAM_FLAG 0x8.

This indicates that the sequence started off life as "*" and so should
be decoded as such.

- - - - -
40a401b9 by James Bonfield at 2015-04-08T16:32:57Z
Tests now use both CRAM v2.1 and v3.0.

Added a supplementary read test. (These worked for our 2.1
implementation, although technically only had formal support in 3.0.)

- - - - -
48e3b2ff by James Bonfield at 2015-04-09T10:14:59Z
Check CRAM->CRAM conversion works

- - - - -
4c8fb666 by James Bonfield at 2015-04-09T10:15:27Z
Improved error handling for when we have no bases at all due to
cr->len being zero.  This is a specific case triggered by
the torturous xx#minimal.sam.

Also removed an incorrect assertion that sequences with zero-length
CIGAR strings are unmapped.  The specification states that a CIGAR of "*"
simply implies the alignment is unavailable; the read could still be
mapped (e.g. by a rough-and-ready hashing algorithm).

- - - - -
e86391fe by John Marshall at 2015-04-09T14:51:23Z
Avoid -Wundef warnings in compiler detection

Check whether compiler-specific macros exist before testing their
values: undefined macros evaluate to 0 as expected, but with a warning
when -Wundef is used.  GCC (of course) does not define __clang_major__,
and with -no-gcc icc doesn't define __GNUC__ either.

Hat tip @noporpoise.  Fixes #197.
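Here is a sketch of the guarded-test pattern. The SKETCH_* macro and function names are hypothetical. Each comparison is preceded by defined(), so under -Wundef an undefined compiler macro is never evaluated: the && short-circuits first.

```c
/* Guard every value test with defined(): "#if __clang_major__ >= 3" under
 * GCC would still evaluate to 0, but -Wundef would warn about it. */
#if defined(__clang__) && defined(__clang_major__) && __clang_major__ >= 3
#define SKETCH_RECENT_CLANG 1
#else
#define SKETCH_RECENT_CLANG 0
#endif

#if defined(__GNUC__) && defined(__GNUC_MINOR__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))
#define SKETCH_GCC_4_5 1
#else
#define SKETCH_GCC_4_5 0
#endif

/* Expose the results so they can be inspected at run time. */
int sketch_recent_clang(void) { return SKETCH_RECENT_CLANG; }
int sketch_gcc_4_5(void) { return SKETCH_GCC_4_5; }
```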

- - - - -
93621010 by John Marshall at 2015-04-15T12:42:17Z
Convert relative hseek(SEEK_CUR) offsets to absolute SEEK_SET positions

It turns out to be preferable to convert to SEEK_SET rather than to adjust
the SEEK_CUR offset, as some backends (e.g. the upcoming libcurl backend)
may not track their current physical position and would prefer not to
implement SEEK_CUR.  With this change, we send only SEEK_SET or SEEK_END
to the backend.

Compare glibc's fseek(), which uses lseek(SEEK_SET) for everything except
in unusual circumstances.

(Ideally we'd check for imminent overflow without actually overflowing, but
that's non-trivial in the absence of an OFF_T_MAX or a known file size.)

Refixes #152.
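The conversion described above can be sketched as a small helper. The name normalise_seek() and its signature are hypothetical and do not come from hfile.c. It assumes the buffering layer knows the current logical position even when the backend does not track it.

```c
#include <stdio.h>      /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <sys/types.h>  /* off_t (POSIX) */

/* Hypothetical helper: rewrite a relative SEEK_CUR request as an absolute
 * SEEK_SET one so the backend only ever sees SEEK_SET or SEEK_END.
 * Returns 0 on success, -1 if the target lies before the start of file. */
int normalise_seek(off_t curpos, off_t *offset, int *whence)
{
    if (*whence == SEEK_CUR) {
        /* As in the commit, this does not guard against off_t overflow:
         * there is no standard OFF_T_MAX to test against. */
        off_t target = curpos + *offset;
        if (target < 0)
            return -1;
        *offset = target;
        *whence = SEEK_SET;
    }
    return 0;   /* SEEK_SET and SEEK_END pass through unchanged */
}
```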

- - - - -
f0aaf9a4 by John Marshall at 2015-04-15T13:33:37Z
Handle local "file://" URLs directly

(The upcoming libcurl-based backend uses curl_easy_pause(), which doesn't
work for the "file" scheme.  So implement this scheme directly rather than
risk the libcurl backend claiming it.)

- - - - -
85bbb17e by John Marshall at 2015-04-16T09:13:45Z
Use -include rather than sinclude [minor]

It appears that an upcoming revision to POSIX will accept -include:
see http://austingroupbugs.net/view.php?id=333

- - - - -
2c0ea514 by John Marshall at 2015-04-16T15:38:27Z
Add libcurl hFILE backend

This code is a reworking of Heng Li's kurl.c (PR #164) into an hFILE
backend.  Improvements include leaving buffering to the existing
buffering in hfile.c, full error checking, and some bug fixes.

SEEK_CUR is not supported as the current file position is not tracked:
possibly it could be derived from CURLINFO_SIZE_DOWNLOAD and the most
recently-set CURLOPT_RESUME_FROM position, but this is probably more
trouble than it's worth.  (Currently hseek(SEEK_CUR) results in a
backend SEEK_SET, so this is not a limitation.)

The single shared curl.multi might be more trouble than it's worth.
If checking nrunning proves to be unreliable or having a single CURLM
causes multi-threading trouble, we may need to have a CURLM in each
hFILE_libcurl instead.  (If there were a form of CURL_WRITEFUNC_PAUSE
that caused curl_easy_perform() to return early, we wouldn't need to
use the multi interface at all!  However this is unlikely as it's a
significant change to how curl_easy_perform() works.)

Soon to follow: writing; handling of s3://bucket/filepath URLs.

- - - - -
0d0909a2 by John Marshall at 2015-04-17T13:21:06Z
Implement libcurl hFILE backend writing/uploading

- - - - -
6de7b82d by John Marshall at 2015-04-20T10:13:37Z
Rewrite libcurl timeout calculation

A more careful reading of curl_multi_fdset(3) shows that when it returns
maxfd == -1, a short fixed timeout should be used and curl_multi_timeout()
should not be called at all.  (The previous code worked with libcurl 7.22
but with 7.30 resulted in select(2) with no fds and a multi-minute timeout.)

- - - - -
aeb3c0a3 by John Marshall at 2015-04-21T08:20:12Z
Add another error translation

Not supporting range requests is like seeking on a pipe.

- - - - -
1a2e234e by James Bonfield at 2015-04-23T16:28:56Z
Added C/Java cross-validation script for CRAM.

This isn't yet part of our automated test harness, but it is useful to
have a copy in git for periodic manual checking.  Also improved the
compare_sam.pl with an option for comparing B-type auxiliary tags as
the Java code changes sign and also changes H to B.  I am unsure if
this is valid, but it helps if we can hide the known differences in
order to spot unknown ones.

- - - - -
6185ab74 by James Bonfield at 2015-04-23T16:31:06Z
Bug fix to the CRC32 checking for Java/C integration.

The block and container header CRCs check sum variable sized data.
The old method was to decode the structure and then reencode to a
block of memory that we can compute the CRC on.  Unfortunately there
is more than one valid encoding for the same numerical integer value,
and even more unfortunately Java and C implementations differ.  This
gives rise to false CRC failures.

The new code computes the CRC as it goes using the actual bytes being
decoded.  This isn't as quick, but the measured difference in speed is
under 1%.
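Here is a sketch of CRC-as-you-decode. ITF8 can write the same value in more than one way: for example, 5 as the single byte 0x05 or as the two-byte form 0x80 0x05. Re-encoding always produces one canonical form, so only the bytes actually read reproduce the writer's CRC. The helper names are hypothetical. The CRC is a plain bitwise CRC-32 using the reflected 0xEDB88320 polynomial, as zlib's crc32() does, and only the 1- and 2-byte ITF8 forms are handled, for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320).  Pass crc = 0 for the
 * first chunk and the previous return value for subsequent chunks. */
uint32_t crc32_bitwise(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical: decode one ITF8 value (1- or 2-byte forms only) and fold
 * the bytes actually consumed into *crc.  Returns bytes consumed, or 0. */
int itf8_decode_crc(const unsigned char *p, size_t avail,
                    int32_t *val, uint32_t *crc)
{
    int n;
    if (avail >= 1 && p[0] < 0x80) {
        *val = p[0];
        n = 1;
    } else if (avail >= 2 && p[0] < 0xC0) {
        *val = ((p[0] & 0x3F) << 8) | p[1];
        n = 2;
    } else {
        return 0;   /* longer forms or truncated input: not handled here */
    }
    *crc = crc32_bitwise(*crc, p, (size_t)n);
    return n;
}
```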

- - - - -
35746af6 by James Bonfield at 2015-04-24T09:13:56Z
Fixed a Java/C integration failure.

When c1#bounds.sam is encoded by Java and decoded by C the additional
bases overhanging the end of the reference are encoded using feature
'X' and an assumption that the matching reference is 'N'. (This C code
uses feature BA to store the base directly instead.)

Changed the code here to omit the warnings about going off the end of
the reference and to treat out of bounds reference as N.
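Here is a minimal sketch of the "treat out-of-bounds reference as N" rule. The helper name ref_base_or_n() is hypothetical and this is not the htslib decoder. With this rule, an 'X' substitution past the reference end can be resolved against 'N', matching the Java encoder's assumption described above.

```c
#include <stddef.h>

/* Hypothetical helper: reference base at 0-based position pos, or 'N'
 * (silently, without a warning) when pos lies outside the loaded reference. */
char ref_base_or_n(const char *ref, size_t ref_len, long pos)
{
    if (ref == NULL || pos < 0 || (size_t)pos >= ref_len)
        return 'N';
    return ref[pos];
}
```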

- - - - -
7725a747 by James Bonfield at 2015-04-28T15:36:37Z
Changed default quality value when unable to decode to 255. This then
gets converted to "*" within BAM format.

- - - - -
ccd0a6b0 by James Bonfield at 2015-04-29T08:34:56Z
C/Java validation testing updates.

Reduced the number of exceptions passed into compare_sam.pl so we can
catch more differences.

Expanded the set of test files to be all data rather than a more
restricted subset as we're now approaching agreement.

- - - - -
b8720d00 by James Bonfield at 2015-04-29T11:44:55Z
Fixed buffer overrun in MD tag calculation.

This occurred where the CRAM container/slice header is shorter than
the alignments held within it.

- - - - -
7e59273e by James Bonfield at 2015-04-29T13:56:47Z
Don't create MD/NM tags on records with seq "*".

Also added more test cases for checking pileup output (untested here)
and TLEN sign/size checks (spec vs picard).

- - - - -
2be9f9f1 by James Bonfield at 2015-04-29T13:58:33Z
Fixed compilation sign warning.

- - - - -
3eb77b89 by James Bonfield at 2015-04-29T14:19:45Z
Added missing test file. Sorry!

- - - - -
b79f40a7 by James Bonfield at 2015-04-29T14:24:35Z
Added missing test file. Sorry!

- - - - -
4d707943 by James Bonfield at 2015-05-01T16:05:59Z
Added ~ files and patch .rej/.orig.

- - - - -
eb357efd by James Bonfield at 2015-05-01T16:07:13Z
Moved cram option setting into htslib from test.

Specifically the tools for processing command line arguments are in
htslib now. These were originally in test_view only.

Renamed cram_option to a more general hts_fmt_option so we can use it
for controlling verbosity, number of threads, compression level, etc.
This is still a work in progress as for now it is still only honoured
by the CRAM I/O code.

The main API is via hts_opt_add during CLI parsing, hts_opt_apply to
apply options to a file descriptor and hts_opt_free to deallocate
memory.  Additionally for programmers the hts_open mode string can now
take comma separated options, like "wc" vs "wc,version=3.0,embed_ref".
(I am unsure if this latter bit is worthwhile keeping.)
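The comma-separated mode string can be split roughly like this. This is a hypothetical split_mode() with fixed-size buffers, not the htslib parser. It separates the plain mode ("wc") from key[=value] options; a bare key such as embed_ref gets an empty value.

```c
#include <string.h>

/* Hypothetical option record; sizes are arbitrary for the sketch. */
typedef struct { char key[32]; char value[32]; } mode_opt;

/* Split e.g. "wc,version=3.0,embed_ref" into plain mode "wc" plus options.
 * Returns the number of options stored (at most max_opts), or -1 if the
 * mode or an option does not fit in the fixed-size buffers. */
int split_mode(const char *mode, char *plain, size_t plain_sz,
               mode_opt *opts, int max_opts)
{
    const char *comma = strchr(mode, ',');
    size_t len = comma ? (size_t)(comma - mode) : strlen(mode);
    int n = 0;

    if (len >= plain_sz) return -1;
    memcpy(plain, mode, len);
    plain[len] = '\0';

    while (comma && n < max_opts) {
        const char *start = comma + 1;
        const char *eq;
        size_t klen, vlen = 0;

        comma = strchr(start, ',');
        len = comma ? (size_t)(comma - start) : strlen(start);
        eq = memchr(start, '=', len);          /* '=' within this option only */
        klen = eq ? (size_t)(eq - start) : len;
        if (eq) vlen = len - klen - 1;
        if (klen >= sizeof opts[n].key || vlen >= sizeof opts[n].value)
            return -1;

        memcpy(opts[n].key, start, klen);
        opts[n].key[klen] = '\0';
        if (eq) memcpy(opts[n].value, eq + 1, vlen);
        opts[n].value[vlen] = '\0';            /* bare key: empty value */
        n++;
    }
    return n;
}
```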

- - - - -
15c92bfa by Petr Danecek at 2015-05-07T14:54:04Z
vcf: New bcf_add_id() function

bcf_add_id() - adds to the ID string checking for duplicates
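Here is a sketch of the duplicate-checking append on a semicolon-separated VCF ID string. The names add_id_sketch() and id_present() are hypothetical and are not the bcf_add_id() implementation, which works on a bcf1_t. A missing ID (".") is simply replaced.

```c
#include <stdlib.h>
#include <string.h>

/* Return 1 if id is one of the ';'-separated tokens in ids. */
static int id_present(const char *ids, const char *id)
{
    size_t len = strlen(id);
    const char *p = ids;
    while (*p) {
        const char *semi = strchr(p, ';');
        size_t tok = semi ? (size_t)(semi - p) : strlen(p);
        if (tok == len && strncmp(p, id, len) == 0) return 1;
        if (!semi) break;
        p = semi + 1;
    }
    return 0;
}

/* Hypothetical: append id to the heap string ids unless already present.
 * Returns the possibly reallocated string, or NULL on allocation failure
 * (in which case the original ids is left untouched). */
char *add_id_sketch(char *ids, const char *id)
{
    size_t old_len, id_len = strlen(id);
    char *grown;

    if (ids == NULL || strcmp(ids, ".") == 0) {   /* missing ID: replace */
        grown = realloc(ids, id_len + 1);
        if (!grown) return NULL;
        memcpy(grown, id, id_len + 1);
        return grown;
    }
    if (id_present(ids, id)) return ids;          /* duplicate: no change */

    old_len = strlen(ids);
    grown = realloc(ids, old_len + 1 + id_len + 1);
    if (!grown) return NULL;
    grown[old_len] = ';';
    memcpy(grown + old_len + 1, id, id_len + 1);
    return grown;
}
```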

- - - - -
30fb9eee by Shane McCarthy at 2015-05-07T15:02:05Z
allow underscore and dot in keys for structured header lines

We were forcing alphanumeric-only characters in the keys for structured
header lines. From the spec, INFO and FORMAT ID keys should match the regex
`^[A-Za-z_][0-9A-Za-z_.]*$`. The spec is unclear about keys for other
structured header lines, but this applies the same restrictions to them.
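The quoted regex translates directly into a character-class check. valid_header_key() is a hypothetical name, and the sketch assumes the "C" locale, where isalpha() and isalnum() match exactly A-Za-z and 0-9A-Za-z.

```c
#include <ctype.h>

/* Return 1 if key matches ^[A-Za-z_][0-9A-Za-z_.]*$, else 0 (C locale). */
int valid_header_key(const char *key)
{
    if (!(isalpha((unsigned char)*key) || *key == '_'))
        return 0;                       /* also rejects the empty string */
    for (key++; *key; key++)
        if (!(isalnum((unsigned char)*key) || *key == '_' || *key == '.'))
            return 0;
    return 1;
}
```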

- - - - -
41f11938 by James Bonfield at 2015-05-13T16:56:26Z
Added hts_open_opts() and an htsFileOpts struct.

Rejigged the opening code so that we don't have to open a file and
then apply options.  Instead we reuse the htsFormat structure as a way
to control output formats, along with the hts_opt list.

- - - - -
1f626c13 by James Bonfield at 2015-05-14T11:26:51Z
Fixed requests to output SAM when the default mode is "wb".

- - - - -
b6893767 by James Bonfield at 2015-05-14T11:27:36Z
Added htsExactFormatString() to convert enum to string.

- - - - -
431aafed by James Bonfield at 2015-05-15T08:39:14Z
Allow the nthreads=N option to control BAM encoding threads too.

- - - - -
7a4f7e27 by James Bonfield at 2015-05-15T10:40:56Z
Added a -N INT option to stop decoding after INT reads.

- - - - -
56f3d78d by James Bonfield at 2015-05-15T10:40:57Z
Added more decoding error detection.

- - - - -
15a3be52 by James Bonfield at 2015-05-15T10:41:04Z
Fixed error handling in CRAM.

A bit of a howler: closing a file when we're not at EOF is not an
error!  We can legitimately open a file, read 10 of 100 records, and
then close.

Improved sam_read1 return values so that it can distinguish between
eof (-1) and error (-2 or lower).  This is done by using cram_eof and
checking that a -1 return value is due to eof rather than something
else.

These two combined essentially move the onus of error detection away
from hts_close to sam_read1, as per bam I/O.

- - - - -
6011a93d by James Bonfield at 2015-05-15T11:49:25Z
Merge branch 'develop' into output-fmt-option

Conflicts:
	test/test_view.c

- - - - -
a419848d by James Bonfield at 2015-05-15T14:54:03Z
Use htsFormat instead of htsFileOpts struct for opening.

htsFormat now uses the void *specific field for storing the options,
rather than having an htsFileOpts structure with options and htsFormat
as sub-fields.  I'm not entirely convinced by this, as the void * is
messy, but it's a start.

Renamed hts_open_opts as hts_open_format.

- - - - -
7579577f by Rob Davies at 2015-05-18T11:02:36Z
Add and use safe_itf8_get() to avoid more buffer over-runs.

Add new static inline safe_itf8_get, to replace the old itf8_get macro.
It takes a pointer to the end of the buffer so it can test if it has run
out of data.  If it fails to read a complete integer it returns 0;
otherwise it returns the number of input bytes consumed, as for itf8_get.

Change lots of uses of itf8_get() to safe_itf8_get().  Also add some
other checks for running out of data.

Make cram_decode_slice_header uncompress the slice header block if it
isn't of type RAW.  This goes beyond what the CRAM specification says,
but there's no real reason why it can't be compressed.

All this fixes several bugs found during "american fuzzy lop" fuzz testing.

- - - - -
2c20ba3d by Rob Davies at 2015-05-18T11:02:36Z
Add more codec sanity checks.

Add, or fix, checks for running out of input to get_zero_bits_MSB,
cram_gamma_decode, cram_huffman_decode_char0 and cram_huffman_decode_int.

Ensure trying to decode a null (i.e. zero symbols) huffman stream returns
an error.

Catch attempts to use cram_byte_array_stop_decode_init on anything other
than a BYTE_ARRAY or BYTE_ARRAY_BLOCK.

Prevent an invalid read if cram_decoder_init is given an out of range codec
number.

Fixes more bugs found during "american fuzzy lop" fuzz testing.

- - - - -
62ad4119 by Rob Davies at 2015-05-18T11:02:37Z
Prevent running on after decoding errors.

Bail out of cram_decode_slice, cram_decode_aux and cram_decode_seq faster
if errors are detected.  Prevents use of uninitialized values.

Fixes more bugs found during "american fuzzy lop" fuzz testing.

- - - - -
3954acee by Rob Davies at 2015-05-18T11:02:37Z
rANS decoder error checking.

Make rans_uncompress check that it has enough bytes for the decoder to start.

Make cram_uncompress_block check for rANS decoder failures.

Fixes bugs found during "american fuzzy lop" fuzz testing.

- - - - -
e33a6feb by Rob Davies at 2015-05-18T11:02:38Z
Make cram_get_seq loop round straight after calling cram_next_slice.

There would be an out-of-bounds array access if the next slice contained
no records.  This is fixed by looping around again so that it detects
there are no more records and calls cram_next_slice again.

Bug found during "american fuzzy lop" fuzz testing.

- - - - -
a30f9069 by Rob Davies at 2015-05-18T11:06:26Z
Fix various reference related bugs in cram_decode_slice

When using embedded references, ensure that s->hdr->ref_base_id is in
the valid range for the s->block_by_id lookup table.  Also check that the
block decompresses correctly.

Don't try to call cram_ref_decr on embedded references.

For multi-reference blocks where the RI information was missing or not
decoded, use -1 (i.e. unmapped) for the reference id instead of 0.

Check that the reference id is within the range of reference ids in the
SAM header.

Fixes bugs found during "american fuzzy lop" fuzz testing.

- - - - -
83a50014 by Rob Davies at 2015-05-18T11:06:27Z
Ensure last read group name is not NULL before trying to strcmp it.

Fixes bug found during "american fuzzy lop" fuzz testing.

- - - - -
876e71d5 by Rob Davies at 2015-05-18T11:06:27Z
Catch slices with no data blocks.

The specification says that a slice should have at least one data block.

Fixes bugs found during "american fuzzy lop" fuzz testing.

- - - - -
c1c89092 by Rob Davies at 2015-05-18T11:08:03Z
Add more checks to cram_read_SAM_hdr

Ensure that the block decompresses successfully.

Check that header_len >= 0.  The specification allows a signed value.

Fixes bugs found during "american fuzzy lop" fuzz testing.

- - - - -
fb91924b by Rob Davies at 2015-05-18T11:08:04Z
Change assertion in cram_decode_slice_xref to if (...) return -1.

Make cram_decode_slice_xref return int instead of void, and change the
assertion to return -1 instead.  Also add a check to catch out of bounds
array access if mate_line is outside the valid range.

Check the return value of cram_decode_slice_xref in cram_decode_slice.

Fixes crashes found during "american fuzzy lop" fuzz testing.

- - - - -
dee92608 by Rob Davies at 2015-05-18T11:08:04Z
Ensure header_len >= 0

Prevents malloc(0) if header_len == -1.

Bug found during "american fuzzy lop" fuzz testing.

- - - - -
f7be681f by Rob Davies at 2015-05-18T11:08:05Z
Make cram_byte_array_len_decode return errors from sub-codecs.

If either len_codec or value_codec fail in cram_byte_array_len_decode then
it will return non-zero.

Fixes bugs found during "american fuzzy lop" fuzz testing.

- - - - -
81d6ea65 by Rob Davies at 2015-05-18T11:08:06Z
Fix buffer over-runs in cram_decode_seq for sequences starting beyond the ref

Incorrect handling of sequences that started beyond the end of the reference
sequence could lead to memset writing past the end of the seq array.  Add
code to handle this case correctly.

Fixes bugs found during "american fuzzy lop" fuzz testing.

- - - - -
3a2ddb84 by Rob Davies at 2015-05-18T11:08:07Z
Catch negative length in cram_byte_array_len_decode.

Fixes bug found during "american fuzzy lop" fuzz testing.

- - - - -
b43e9b7b by Rob Davies at 2015-05-18T11:08:09Z
Pull code to lookup external blocks into cram_get_block_by_id function.

Add a new static inline cram_get_block_by_id to lookup external blocks.  It
includes checks to ensure looking up via the slice->block_by_id array
doesn't go out of bounds.

Change code that looked up block ids to use the new function.  This fixes
a fixme in cram_decode_slice, so embedded references should work for any
block id, not just those below 1024.

Fixes bugs found during "american fuzzy lop" fuzz testing.

- - - - -
f2863494 by Rob Davies at 2015-05-18T11:08:09Z
Re-order test for end of input in cram_byte_array_stop_decode_block

Make it check that it has not reached the end of the input data before
trying to dereference the pointer.

Fixes bug found during "american fuzzy lop" fuzz testing.

- - - - -
8013558f by Rob Davies at 2015-05-18T11:08:10Z
Better sanity checking when reading content_ids in cram_decode_slice_header

hdr->num_content_ids is passed to malloc, so ensure it has a reasonable
value.

Bail out of reading block_content_ids as soon as possible if it runs out
of input.

- - - - -
05cd2fad by Rob Davies at 2015-05-18T11:08:12Z
Various fixes in cram_decode_seq.

Ensure read features don't start before the beginning of the read.

Fix check for which data series are required when decoding soft clips.  Should
check for CRAM_SC and not CRAM_IN for version 2+ CRAM files.

Convert abort on unknown feature code to return -1.

- - - - -
ca1bcefe by Rob Davies at 2015-05-18T11:08:13Z
Add more sanity checks to cram_decode_slice.

Error on reads with apparently negative length.

Error on alignments with position <= 0 and no unmapped flag.

Fixes problems found by "american fuzzy lop".

- - - - -
93094c70 by Rob Davies at 2015-05-18T11:08:13Z
Better tests for bit-based codecs running out of input.

"american fuzzy lop" found a case where the checks for running out of
input added in commit 1ee139e86b failed.  This happened due to the result
of in->uncomp_size - in->byte being unsigned rather than signed, so the
check failed when in->byte was greater than in->uncomp_size.

The checks are pulled into a new static inline cram_not_enough_bits.  This
has been checked with Frama-C to work as long as:
   0 <= blk->bit < 8
   0 <= blk->uncomp_size
   blk->uncomp_size, blk->bit and nbits are of type int32_t
   blk->byte is of type size_t

Instances of the old code are replaced with cram_not_enough_bits.

get_one_bits_MSB gets the same checks for running out of input as are already
in get_zero_bits_MSB.

cram_subexp_decode_init gets a check to ensure subexp.k >= 0 and
cram_subexp_decode checks for i >= 0.
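The signedness trap described above can be sketched as follows. The function names are hypothetical. The sketch assumes bits are read MSB first, with `bit` counting down from 7 as the next bit to read in the current byte, so the number of bits remaining is (uncomp_size - byte) * 8 + bit - 7. The buggy variant is included only to show the wrap-around.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signed-safe check: returns 1 if fewer than nbits remain.
 * Assumes 0 <= bit < 8 and MSB-first reading (bit counts down from 7). */
int not_enough_bits(size_t byte, int32_t bit, int32_t uncomp_size,
                    int32_t nbits)
{
    int64_t remaining;

    if (nbits < 0 || uncomp_size < 0)
        return 1;
    /* Compare first: uncomp_size - byte would be computed in size_t and
     * wrap to a huge value whenever byte > uncomp_size. */
    if (byte >= (size_t)uncomp_size)
        return nbits > 0;
    remaining = ((int64_t)uncomp_size - (int64_t)byte) * 8 + bit - 7;
    return remaining < nbits;
}

/* The unsigned form, for contrast: with byte > uncomp_size the subtraction
 * wraps and the check wrongly reports that enough input remains. */
int not_enough_bits_buggy(size_t byte, int32_t bit, int32_t uncomp_size,
                          int32_t nbits)
{
    return (uncomp_size - byte) * 8 + bit - 7 < (size_t)nbits;
}
```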

- - - - -
6e3f665a by Rob Davies at 2015-05-18T11:08:14Z
Make cram_subexp_decode_init use safe_itf8_get.

- - - - -
509b7720 by Rob Davies at 2015-05-18T11:08:15Z
Set sub_size to -1 to catch cases where it doesn't get set.

- - - - -
e7c11026 by Rob Davies at 2015-05-18T11:08:16Z
Remove redundant code from cram_byte_array_stop_decode_init.

- - - - -
4299ea1e by Rob Davies at 2015-05-18T11:35:02Z
Avoid possible integer wrap-around in cram_read_SAM_hdr

Add some casts to size_t to avoid (header_len + 1) becoming negative.

Put in checks for positive c->length and len so (c->length - len) always
makes sense.
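Here is a minimal sketch of the header_len guard. The name alloc_header_text() is hypothetical and this is not cram_read_SAM_hdr(). Rejecting negative lengths avoids malloc(0) for header_len == -1. Doing the "+ 1" in size_t means header_len == INT32_MAX cannot overflow a signed int.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate header_len bytes plus a terminating NUL, where header_len was
 * read from the file as a signed 32-bit value.  Returns NULL on error. */
char *alloc_header_text(int32_t header_len)
{
    char *text;
    if (header_len < 0)
        return NULL;                          /* signed in the spec */
    text = malloc((size_t)header_len + 1);    /* add 1 in size_t */
    if (text)
        text[header_len] = '\0';
    return text;
}
```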

- - - - -
bbd13730 by James Bonfield at 2015-05-18T12:40:57Z
Merge pull request #207 from daviesrob/afl_1

CRAM fuzz testing bug fixes
- - - - -
06fc1d05 by James Bonfield at 2015-05-19T16:01:26Z
Protection against crashes when given broken data.

- - - - -
de591275 by James Bonfield at 2015-05-19T16:02:25Z
Improvements to inline documentation.

- - - - -
cd83252a by James Bonfield at 2015-05-19T16:02:45Z
Removed quality values from the CRAM_SEQ macro and added a CRAM_QUAL one.

This is an optimisation so that, for example, bam2depth doesn't decode
quality values unnecessarily.  Note that due to the existence of some
combined feature codes such as 'A' (base and score) and 'B' (read base)
that access both the BA and QS data series, decoding a sequence may
explicitly require fetching from the quality stream too.

In this situation there are explicit ds&CRAM_BA and ds&CRAM_QS checks, so
only the half of the data series that is needed will be decoded.

- - - - -
590a3527 by James Bonfield at 2015-05-20T15:31:29Z
Improvements to htsFormat handling.

hts_open_format() now automatically sets the file type based on
filename suffix.

hts_parse_format() no longer clears the "specific" field.  This means
that using --reference x.fa --output-format cram works. (Previously it
cleared the reference setting again.)

- - - - -
0dbb3b8b by James Bonfield at 2015-05-21T08:29:23Z
Removed unused variable caused by hts_set_opt migration.

- - - - -
0662716f by Charles Plessy at 2015-05-25T02:44:54Z
Merge tag '1.2.1' into debian/unstable

HTSlib patch release 1.2.1, reinstated hts_file_type()

- - - - -
4fc92f5a by Charles Plessy at 2015-05-25T03:40:23Z
Update symbols file.  Note that there are missing symbols.

 #MISSING: 1.2.1# cram_byte_array_stop_decode_char at Base 1.0
 #MISSING: 1.2.1# cram_external_decode_block at Base 1.0
 #MISSING: 1.2.1# cram_external_encode at Base 1.0
 #MISSING: 1.2.1# download_and_open at Base 1.0

- - - - -
e047ee43 by Charles Plessy at 2015-05-25T04:00:44Z
New upstream release; no new copyright nor license statement.

- - - - -
084f27db by Charles Plessy at 2015-05-25T04:05:28Z
Install NEWS as upstream changelog.

- - - - -
316fb961 by Charles Plessy at 2015-05-25T04:25:48Z
Override a Lintian false positive.

- - - - -
33636795 by Charles Plessy at 2015-05-25T04:31:02Z
Clean test directory.

- - - - -
69bf167d by Charles Plessy at 2015-05-25T04:33:56Z
Corrected a typo.

- - - - -
12251926 by Charles Plessy at 2015-05-25T04:56:39Z
htslib (1.2.1-1) unstable; urgency=medium

  0662716 Merge tag '1.2.1' into debian/unstable
  084f27d Install NEWS as upstream changelog.
  316fb96 Override a Lintian false positive.
  3363679 Clean test directory.
  69bf167 Corrected a typo in tabix description.

 -- Charles Plessy <plessy at debian.org>  Mon, 25 May 2015 13:35:03 +0900

- - - - -
7b0edf55 by John Marshall at 2015-05-26T08:21:58Z
[make] Add maintainer-clean as a synonym of distclean

- - - - -
4854ee66 by John Marshall at 2015-05-26T11:10:12Z
[faidx] Warn if duplicate sequence names are encountered

Add only the first sequence of each name to the faidx_t and .fai file.
An alternative would be to uniquify subsequent duplicates by changing
their names to "<NAME>_<N>" or something, but there's a risk of introducing
further clashes -- better to leave the user to sort it out.

Fixes samtools/samtools#380.  An upcoming samtools documentation change
will note that subsequences retrieved are from the *first* instance of
duplicate-named reference sequences.

- - - - -
f9577275 by John Marshall at 2015-05-26T14:18:48Z
Use bam_endpos() to fix bins for unmapped reads

In sam_parse1() we can't use bam_endpos() but we now do the same
BAM_FUNMAP test as bam_endpos() would do.  (Add "no CIGAR operations"
parse error so that we can assume that n_cigar>0 here.)

Hat tip @dpryan79.  Fixes #206.

Also deobfuscate one more memory allocation via sizeof() (cf ef59ef2d6425985732d41bf7389df569a2a14c0a).
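Here is a sketch of why the end position matters for the bin. reg2bin() is copied from the SAM specification. end_pos_sketch() is a hypothetical stand-in for the bam_endpos() rule described above, under which an unmapped read, or one with no CIGAR operations, covers one base. Its bin is then computed from [pos, pos + 1) instead of from a possibly stale CIGAR length.

```c
#include <stdint.h>

#define SKETCH_FUNMAP 0x4   /* SAM FLAG bit: segment unmapped */

/* reg2bin() from the SAM specification: bin for 0-based [beg, end). */
int reg2bin(int beg, int end)
{
    --end;
    if (beg >> 14 == end >> 14) return ((1 << 15) - 1) / 7 + (beg >> 14);
    if (beg >> 17 == end >> 17) return ((1 << 12) - 1) / 7 + (beg >> 17);
    if (beg >> 20 == end >> 20) return ((1 << 9) - 1) / 7 + (beg >> 20);
    if (beg >> 23 == end >> 23) return ((1 << 6) - 1) / 7 + (beg >> 23);
    if (beg >> 26 == end >> 26) return ((1 << 3) - 1) / 7 + (beg >> 26);
    return 0;
}

/* Hypothetical: ref_len is the number of reference bases the CIGAR spans. */
int64_t end_pos_sketch(int64_t pos, int flag, int n_cigar, int64_t ref_len)
{
    if (!(flag & SKETCH_FUNMAP) && n_cigar > 0)
        return pos + ref_len;
    return pos + 1;   /* unmapped or no CIGAR: treat as one base */
}
```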

- - - - -
949840b0 by James Bonfield at 2015-05-27T13:55:32Z
Fixed a bug in cram_load_reference().

When given NULL as the filename it is meant to, and indeed does, just
load the reference by extracting the @SQ headers.  Unfortunately it
returned -1 as it also reuses fn to go from .fai to .fa and detects
NULL as a failure to process the .fai.

- - - - -
2e7a2223 by Rob Davies at 2015-05-27T16:16:27Z
Add more checks to beta codec.

Add missing check for enough bits to cram_beta_decode_char().

Ensure that beta.nbits is within the valid range in cram_beta_decode_init().

- - - - -
a1d2bb5c by James Bonfield at 2015-05-27T16:37:22Z
Merge pull request #212 from daviesrob/afl_2

Add more checks to beta codec.
- - - - -
ec7151a2 by James Bonfield at 2015-05-28T10:29:22Z
Added a -B (benchmarking) mode to test_view.

The purpose of this is simply to be able to run benchmarks of format
decoding only, without requiring other analysis to be run (eg samtools
flagstat, etc).

- - - - -
7e3add3f by James Bonfield at 2015-05-28T10:30:17Z
Merge branch 'develop' of github.com:samtools/htslib into develop

- - - - -
e2743e6a by John Marshall at 2015-05-28T11:48:43Z
Add missing entries

- - - - -
7c1de4ba by John Marshall at 2015-05-29T08:13:39Z
Add rule to generate config.h

Usually config.h will be made by running configure, but while we have
the Makefile-makes-fallback-config.h approach (and until samtools et al
get their own configure scripts that call htslib's configure) we need a
rule here to invoke it.

- - - - -
b5a15369 by John Marshall at 2015-05-29T12:29:43Z
Add support for S3 pseudo-URLs

Rewrite S3 pseudo-URLs to http/https URLs, adding Date and Authorization
headers for Amazon S3.

At present, access keys may be specified in the URL (in the usual URL
authority "[id:secret@]bucket" way) or via the usual AWS_ACCESS_KEY_ID
and AWS_SECRET_ACCESS_KEY environment variables.  It remains to add code
to read them from config files -- probably just ~/.aws/credentials and
~/.awssecret.

- - - - -
d8172f91 by James Bonfield at 2015-06-01T10:40:29Z
Made v3.0 the default version of CRAM (was 2.1).

- - - - -
d75ed7a5 by James Bonfield at 2015-06-02T16:13:54Z
CRAM ref of "*" bug-fix.

The bug was that ref seq '*' was being converted to '\n' by
'*' & ~0x20.  A quick test shows this trick saved only ~1% of the run
time on a low depth (~10%) chr20 CRAM, so it was simply a case of
over-optimisation.
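The arithmetic behind the bug is easy to check. Clearing bit 0x20 upper-cases ASCII letters, but '*' is 0x2A, and 0x2A & ~0x20 is 0x0A, which is '\n'. The function names below are hypothetical. The plain alternative shown is the standard toupper(), which leaves non-letters alone.

```c
#include <ctype.h>

/* Bit-trick upper-casing: correct for letters only ('a' 0x61 -> 'A' 0x41),
 * but it also maps '*' (0x2A) to '\n' (0x0A). */
char fast_upper_broken(char c) { return (char)(c & ~0x20); }

/* Plain alternative: toupper() leaves non-letters such as '*' unchanged. */
char safe_upper(char c) { return (char)toupper((unsigned char)c); }
```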

- - - - -
1980e585 by John Marshall at 2015-06-03T06:46:19Z
Improve faidx blank line handling

Multiple sequence FASTA files shouldn't have blank lines before subsequent
">" headers, but we mostly support that so we need to support it in the
case where all sequence lines are the same length too.

Fixes http://seqanswers.com/forums/showpost.php?p=172494&postcount=6

Probably other bugs are lurking here; this incomprehensible state
machine could benefit from some rewriting...

- - - - -
826454f8 by John Marshall at 2015-06-04T13:53:22Z
[make] Add distdir hook for Automake superprojects

We would prefer clients to build against an already-installed separate
htslib; but we'll also grudgingly enable "make dist" when we are bundled
as a subdirectory of an Automake project.

(See info automake "Third-Party Makefiles".)

Also build tags with either "TAGS" or "tags" targets.

- - - - -
d72ce03a by Rob Davies at 2015-06-05T14:05:00Z
Add extra checks on input data and memory allocations to bam_hdr_read.

  Check that all calls to bgzf_read() return the expected number of bytes.

  Check all malloc return values for NULL.

  Ensure h->l_text won't wrap around when 1 is added to it.

  Check h->n_targets (n_ref in spec.) is not negative.

  Check that h->n_targets isn't so big that it is not possible to allocate
memory for the h->target_name and h->target_len arrays on 32-bit platforms.
Also ensures that there will be no problems with integer overflow when
working out how much memory to allocate.

  When reading the list of references:
    Check that name_len (l_name in spec.) is >= 0.
    Check that h->target_name[i] (name in spec.) does end with NUL.  Add one
if it doesn't, ensuring that the length of the resulting string won't
overflow an int32_t.

If anything goes wrong, clean up ensuring that all memory allocated by the
function gets freed, and return NULL.

Calls to calloc have been replaced with malloc.  Checking the bgzf_read
return values ensures that all allocated memory will be filled using data
read from the input file, so zeroing it first is a waste of time.
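The n_targets checks described above follow a standard pattern: reject negatives, reject counts whose byte size would overflow size_t, and free everything on failure. Here is a sketch using a hypothetical alloc_target_arrays(); the array element types mirror bam_hdr_t's target_name and target_len. On 64-bit platforms the SIZE_MAX tests can never fire for an int32_t count; they matter on 32-bit ones.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate the name and length arrays for n_targets references.
 * Returns 0 on success, -1 on bad input or allocation failure; on failure
 * nothing is left allocated. */
int alloc_target_arrays(int32_t n_targets, char ***names, uint32_t **lens)
{
    *names = NULL;
    *lens = NULL;
    if (n_targets < 0)
        return -1;
    /* Would n * element size overflow size_t? */
    if ((size_t)n_targets > SIZE_MAX / sizeof(char *) ||
        (size_t)n_targets > SIZE_MAX / sizeof(uint32_t))
        return -1;
    if (n_targets == 0)
        return 0;
    *names = malloc((size_t)n_targets * sizeof(char *));
    *lens  = malloc((size_t)n_targets * sizeof(uint32_t));
    if (*names == NULL || *lens == NULL) {
        free(*names);          /* free whichever allocation succeeded */
        free(*lens);
        *names = NULL;
        *lens = NULL;
        return -1;
    }
    return 0;
}
```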

- - - - -
5f5a7aea by Rob Davies at 2015-06-05T14:05:00Z
Add HTS_RESULT_USED for bam_hdr_read() and sam_hdr_read().

- - - - -
14831e12 by Rob Davies at 2015-06-09T16:59:49Z
Make view_sam set status to EXIT_FAILURE if it fails.

Put in a check for sam_hdr_read() returning NULL in view_sam.  It
turns out that view_sam didn't have a way of returning a failure code,
so add that as well along with some other missing error checks.

- - - - -
8380601f by Rob Davies at 2015-06-09T17:07:25Z
Add check for bam_hdr_read() returning NULL to bam_index.

- - - - -
9de542c6 by Afif Elghraoui at 2015-06-11T09:32:49Z
Fix various spelling errors

Debian QC tools complained about spelling errors in the source
code. This patch corrects them.

[Applied pysam-developers/pysam#122.]

- - - - -
10c49208 by Petr Danecek at 2015-06-11T13:47:47Z
Fix in VCF header parsing

bcf_hdr_append() followed by bcf_hdr_write() should not make a
subsequent bcf_read() call fail if tag IDs remain unchanged, as
in issue #213.

- - - - -
a89f22f6 by Petr Danecek at 2015-06-11T13:53:03Z
bcf_hdr_remove: NULL to remove all records of the given type

- - - - -
adec4d33 by Rob Davies at 2015-06-15T10:05:19Z
Ensure SAM query name length fits in a byte.

- - - - -
a03a15fd by Rob Davies at 2015-06-15T11:32:25Z
Ensure names are present for @SQ/@RG/@PG lines in sam_hdr_update_hashes

sam_hdr_update_hashes goes through the tags on a header line that it's
interested in, filling in values in a SAM_hdr_tag struct as it goes.
One of these is name, which is set to a copy of the SN tag for @SQ lines
and ID for @RG and @PG lines.  These must be present according to the SAM
specification, so ensure that name has been set after going through all
the tags for the line.  This means anything using the name field (notably
refs_from_header) can assume that it isn't NULL.

Fixes a crash found by the American Fuzzy Lop fuzz tester.

Also fix some potential memory leaks that could happen if realloc
fails in this function.

- - - - -
e1a67ee3 by Rob Davies at 2015-06-16T08:28:31Z
Stop gzip header code from reading too far in bgzf_read_block.

When working out how many bytes to skip, make it compare to count (the
number of bytes read) instead of BGZF_BLOCK_SIZE.  This prevents it from
reading uninitialised memory if the file finishes part way through the
header.

- - - - -
76f64dac by Rob Davies at 2015-06-16T15:33:18Z
Fix check on return value of sam_write1.

sam_write1 returns the number of bytes written on success for bam and sam
so the check for failure needs to be < 0 and not != 0.

- - - - -
ad122a7a by Rob Davies at 2015-06-16T16:03:00Z
Make view_vcf set status to EXIT_FAILURE if it fails.

Change view_vcf in a similar way to view_sam so that it will set status
if it detects a failure.

- - - - -
ea63c205 by Rob Davies at 2015-06-16T17:09:42Z
Fix handling of zero-bit beta codec case.

Using the beta codec with nbits=0 is valid, but confuses
cram_not_enough_bits() as it's possible to have no input data in
this case so cram_not_enough_bits thinks it has run out.

Move the calls to cram_not_enough_bits inside checks that c->beta.nbits
is non-zero to avoid the problem.

- - - - -
9853332b by Rob Davies at 2015-06-16T17:26:53Z
Allow cram_not_enough_bits to return 0 if no bytes left but nbits is 0.

If the caller isn't going to use any bits then it's acceptable for the
input data to have been completely consumed.  Fixes an odd edge case
in the beta codec (and possibly others).
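
The rule above can be illustrated with a minimal sketch of a bounds check on a bit reader. The struct and the not_enough_bits() function below are hypothetical stand-ins, loosely modelled on reading bits MSB-first from a CRAM block; they are not the actual cram_not_enough_bits() code. The point is that a request for zero bits succeeds even when every input byte has been consumed.

```c
#include <stddef.h>

/* Hypothetical bit-reader state: `byte` indexes the current byte and
 * `bit` (7..0) is the next bit to read within it, most significant
 * first. */
struct bit_reader {
    const unsigned char *data;
    size_t size;  /* total bytes available */
    size_t byte;  /* current byte; equals size once all input is used */
    int bit;      /* next bit within data[byte], 7..0 */
};

/* Return nonzero if fewer than nbits bits remain.  Asking for zero bits
 * always succeeds, even when the input has been completely consumed. */
static int not_enough_bits(const struct bit_reader *r, int nbits)
{
    if (nbits <= 0)
        return 0;
    if (r->byte >= r->size)
        return 1;
    size_t remaining = (r->size - r->byte - 1) * 8 + (size_t)(r->bit + 1);
    return remaining < (size_t) nbits;
}
```

Checking nbits before looking at the remaining input is what makes an nbits=0 codec, such as beta with zero bits, safe on an empty block.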

- - - - -
e51893d0 by Rob Davies at 2015-06-17T13:21:21Z
Ensure sam_hdr_parse doesn't run off the end of the string it's reading.

When looking for @SQ header lines, search for "@SQ\t" instead of just @SQ.
This ensures the term q = p + 4 points to either a NUL byte or more string
and not past the end.  The spec. says there should always be a tab after
@SQ so this should be OK on valid files.

Add extra checks for NUL to the loops that iterate through the string.

Fixes a crash found by the American Fuzzy Lop fuzz tester.
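
A minimal sketch of the scanning pattern described above, using a hypothetical count_sq_lines() helper rather than the real sam_hdr_parse(). Matching "@SQ\t" guarantees that p + 4 is still inside the string (at worst it is the terminating NUL), and the inner loop also stops at NUL, so it never reads past the end.

```c
#include <stddef.h>
#include <string.h>

/* Count @SQ lines in a NUL-terminated SAM header text. */
static int count_sq_lines(const char *text)
{
    int n = 0;
    const char *p = text;
    while ((p = strstr(p, "@SQ\t")) != NULL) {
        /* Only count matches at the start of a line. */
        if (p == text || p[-1] == '\n')
            n++;
        const char *q = p + 4;          /* safe: "@SQ\t" is 4 bytes */
        while (*q && *q != '\n')        /* stop at end of line or NUL */
            q++;
        p = q;
    }
    return n;
}
```

A truncated header ending in a bare "@SQ" is simply not matched, instead of being followed past its end.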

- - - - -
6eff5257 by Rob Davies at 2015-06-17T15:19:03Z
Convert status to a global variable.

John wrote:

"It's a simple little self-contained program, we should just make status
a global alongside mode and show_headers and these functions can set it
to EXIT_FAILURE themselves."

So make view_sam and view_vcf set global status instead of taking a pointer.

- - - - -
53a75288 by John Marshall at 2015-06-19T16:40:23Z
Merge fuzz fixes (PR #218 and #223)

- - - - -
36bb0b66 by James Bonfield at 2015-06-22T15:28:42Z
Renamed int32_{get,put} to int32_{get,put}_blk.

This is so they match the naming used by the externally visible
itf8_get_blk function.  Added these to the external cram_io.h so they
can be used by the Samtools cram_reheader branch.

- - - - -
93ae5c67 by James Bonfield at 2015-06-23T14:04:54Z
Fixed bugs in CRAM header manipulation.

Exposed by sam_hdr_add_PG(), the khash-ification of the original
io_lib code introduced some not so subtle bugs.  Also fixed an overly
assertive assert statement.  Oops.

- - - - -
633a4eb1 by John Marshall at 2015-06-23T14:17:40Z
Specify signedness for bit-fields explicitly

Plain int bit-fields technically have indeterminate signedness, and in
particular 1-bit signed bit-fields are unexpectedly entirely sign bit,
producing warnings on "s.foo = 1; /* should be -1 */" assignments with
gcc -Wsign-conversion and copious warnings on the struct definitions in
the headers with NVIDIA's nvcc.  Fixes #225, hat tip @nsubtil.

The changed hFILE fields are in (semi-)opaque structs so changing them
brings no compatibility concerns.  Some BGZF fields are used by client
code, so we are careful not to change the struct's layout.  The 2-bit
bit-fields are in fact used as booleans, so are presumably 2 bits as
an inferior sign bit workaround.  We've left them with the same sizes
so as not to change their layout.

(There remains one plain int bit-field in htslib/kseq.h.  We may fix
that later in conjunction with klib.)
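
A small illustration of the bit-field issue, using a hypothetical struct flags rather than the actual hFILE or BGZF definitions. Declaring the 1-bit fields as unsigned int means they hold 0 or 1. A plain int 1-bit field may be signed, in which case its only bit is the sign bit and its values are 0 and -1.

```c
/* Explicitly unsigned 1-bit flags (hypothetical field names). */
struct flags {
    unsigned int is_write : 1;
    unsigned int at_eof   : 1;
};

/* Store v into a 1-bit unsigned field and read it back.  Values are
 * reduced modulo 2, which is well defined for unsigned bit-fields. */
static unsigned flag_roundtrip(unsigned v)
{
    struct flags f = {0, 0};
    f.is_write = v;
    return f.is_write;
}
```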

- - - - -
fe11d211 by James Bonfield at 2015-06-24T16:00:47Z
Added an external CRAM API to htslib/cram.h.

Note that this file contains duplicates of function declarations from
cram_io.h and sam_header.h along with incomplete types copied from
cram_structs.h.[1]

Any duplicated code can lead to errors, so to stave this off the
internal cram/cram.h also includes the external htslib/cram.h to check
function declaration compatibility.

I could of course move the function declarations instead of copying them,
but that also leads to untidiness with half of the functions in cram_io.c being
listed in cram_io.h and the other half being listed in a header file
in a totally different directory.  Neither duplication nor
distribution are particularly elegant solutions.  (More elegant would
be #ifdef CRAM_INTERNAL to have all in the same file, but that poses
more questions over the cram vs htslib subdirectories.)

[1] To allow gdb to still view the completed versions, I added struct
names to cram_structs.h too.

- - - - -
5c51f840 by David K Jackson at 2015-06-26T11:08:14Z
Always link with -pthread [minor]

Regularise linking options (except for OS X-specific dylibs etc, where
pthreads are in libc); even test/hfile needs it when built with iRODS 3.3,
whose rcConnect() can be configured to use pthreads.

- - - - -
d8f5fa60 by John Marshall at 2015-06-29T16:40:15Z
Detect index format in hts_idx_load_local()

hts_idx_load()'s format parameter is really for deciding what suffixes to
try adding to the filename.  Index files (except perhaps CRAI, which can
be identified by being associated with a CRAM file) have magic numbers
that we can use instead.

hts_idx_load_core() now reads only from BGZF*, so idx_read() can disappear.
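
Magic-number detection of this kind can be sketched as follows. The enum and detect_index_magic() are hypothetical; the magic strings "BAI\1", "CSI\1" and "TBI\1" are those given in the BAI, CSI and tabix format descriptions. The caller is assumed to pass the first bytes as read through a BGZF layer, which handles both the uncompressed BAI and the compressed CSI/TBI files.

```c
#include <string.h>

enum idx_kind { IDX_UNKNOWN, IDX_BAI, IDX_CSI, IDX_TBI };

/* Classify an index from the first (decompressed) bytes of its data. */
static enum idx_kind detect_index_magic(const unsigned char *buf, size_t len)
{
    if (len < 4) return IDX_UNKNOWN;
    if (memcmp(buf, "BAI\1", 4) == 0) return IDX_BAI;
    if (memcmp(buf, "CSI\1", 4) == 0) return IDX_CSI;
    if (memcmp(buf, "TBI\1", 4) == 0) return IDX_TBI;
    return IDX_UNKNOWN;  /* e.g. CRAI, which has no magic number */
}
```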

- - - - -
e9b7db44 by John Marshall at 2015-06-30T15:21:50Z
Add index build/load functions with explicit index filenames

hts_idx_load2() takes no fmt parameter as hts_idx_load() only really
uses it to choose what filename extensions to try.  Conversely,
hts_idx_load2() is the only one of these new functions that does not
accept fnidx==NULL to look alongside fn.

Change hts_idx_save() to return int so it can signal errors.  It now
checks bgzf_open()/fopen() and returns -1, addressing samtools/samtools#312
except that the error message from samtools is misleading.

Provides functions required for pysam-developers/pysam/issues/87 and
samtools/samtools#199.

- - - - -
3a95a67f by James Bonfield at 2015-06-30T16:48:47Z
Fixed a bug in CRAM container num_blocks field when dealing with
multiple slices per container.  It was counting the compression header
per slice rather than only once.

Oddly this didn't cause any decoding issues for scramble, samtools or
cramtools.

- - - - -
1201e419 by Shane McCarthy at 2015-07-06T09:36:28Z
Merge pull request #220 from mcshane/feature/bcf_hdr_remove_all

bcf_hdr_remove: NULL to remove all records of the given type
- - - - -
acc25049 by Ryan Wilson at 2015-07-06T11:21:01Z
Increase size of max format string size from 2^16 to 2^32.

Fixes the issue of integer overflow when a VCF format string
contains a large string (in our case, nucleotide sequence).
2^32 should be a large enough number of characters (> 1 GB) such
that it's an acceptable upper bound.

Closes #204, closes #221

- - - - -
a151ef02 by Shane McCarthy at 2015-07-06T11:52:36Z
Merge pull request #191 from atks/develop

updated documentation for hts_open for uncompressed bcf.
- - - - -
58358e15 by Shane McCarthy at 2015-07-06T14:29:56Z
warn if no BGZF EOF for VCF/BCF files

see samtools/bcftools#220

bcf_synced_reader will warn if the BGZF EOF marker is missing
when adding a new file. Error is added to bcf_sr_strerror so it
could be used. However, we don't fall over, just warn. We want
to be able to, say, view a BCF file while it is still being
written. Samtools also just warns when EOF missing from a BAM.

- - - - -
2bb9370f by Petr Danecek at 2015-07-06T15:24:59Z
deprecate bcf_hdr_combine in favour of new bcf_hdr_merge

This is a htslib-side fix for samtools/bcftools#208

- - - - -
1f45b194 by James Bonfield at 2015-07-06T17:05:19Z
Merge pull request #236 from samtools/feature/idx-filenames

Add index build/load functions with explicit index filenames
- - - - -
b32afba7 by John Marshall at 2015-07-14T10:13:43Z
Fall back to path-style S3 bucket access if necessary

Check whether bucket names are DNS-compliant according to the rules at
http://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html
Fixes part 3 of #232, hat tip @DonFreed.
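
A simplified sketch of such a check, using a hypothetical is_dns_compliant() function. It covers only a subset of the AWS rules (3 to 63 characters; lowercase letters, digits, '-' and '.'; a letter or digit at each end; no ".."). It does not, for example, reject names formatted like IP addresses.

```c
#include <ctype.h>
#include <string.h>

/* Return 1 if name passes this simplified bucket-name check, else 0. */
static int is_dns_compliant(const char *name)
{
    size_t len = strlen(name);
    if (len < 3 || len > 63) return 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char) name[i];
        if (!(islower(c) || isdigit(c) || c == '-' || c == '.')) return 0;
        if (c == '.' && i > 0 && name[i - 1] == '.') return 0;
    }
    unsigned char first = (unsigned char) name[0];
    unsigned char last = (unsigned char) name[len - 1];
    return (islower(first) || isdigit(first)) && (islower(last) || isdigit(last));
}
```

When the check fails, a client can use the path-style form https://s3.amazonaws.com/BUCKET/KEY instead of the virtual-hosted form https://BUCKET.s3.amazonaws.com/KEY.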

- - - - -
3fcf7c90 by Petr Danecek at 2015-07-17T12:05:26Z
vcf: Case-insensitive detection of the variant type

htslib side changes to make variant type detection case
insensitive. See samtools/bcftools#285

- - - - -
897a34f9 by Shane McCarthy at 2015-07-17T12:11:54Z
Merge pull request #244 from samtools/feature/vcf_type_case

vcf: Case-insensitive detection of the variant type
- - - - -
0d947909 by Rob Davies at 2015-07-20T16:58:37Z
Make failure of bam_hdr_init in bam_hdr_read print an error if verbose.

- - - - -
ac4f8296 by James Bonfield at 2015-07-21T15:09:27Z
Merge branch 'afl_2' of github.com:daviesrob/htslib into develop

- - - - -
26d2504c by James Bonfield at 2015-07-21T16:10:59Z
Further improved Rob's PR (#215) with error checking on reads.

I haven't added checks for BCF reading as it doesn't appear possible
for the library to generate any (another issue entirely).

- - - - -
f1875405 by John Marshall at 2015-07-23T12:14:55Z
Remove HTS_RESULT_USED from bam_hdr_read() and sam_hdr_read()

This annotation is pointless as the result is always *used*; the desired
check would be __attribute__((warn_result_not_checked_as_nonnull)), but
of course such an annotation doesn't exist.

This reverts commit 5f5a7aea1c329dee17a5b64c5f627207c20f4ed9.

- - - - -
061c8599 by John Marshall at 2015-07-27T11:19:13Z
Check bam_hdr_read() mallocs without warnings

Checking a uint32_t against SIZE_MAX causes tautological comparison
warnings on platforms where sizeof(size_t) is more than 32 bits.

Instead check whether l_text+1 overflowed directly, and for the
multiplications revert to using calloc(), which (is specified to)
do this check itself.
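
A minimal sketch of both checks, with hypothetical helper names rather than the actual bam_hdr_read() code. The l_text + 1 computation is checked for wraparound in size_t, which can only happen when size_t is 32 bits, and the per-reference array relies on calloc() to reject an overflowing count times size.

```c
#include <stdint.h>
#include <stdlib.h>

/* Room for l_text bytes of header text plus a terminating NUL. */
static char *alloc_header_text(uint32_t l_text)
{
    size_t size = (size_t) l_text + 1;
    if (size < (size_t) l_text)   /* wrapped: only possible if size_t is 32 bits */
        return NULL;
    return malloc(size);
}

/* One length per reference; calloc() fails if n_targets * size overflows. */
static uint32_t *alloc_target_lengths(int32_t n_targets)
{
    if (n_targets < 0)            /* n_ref must not be negative */
        return NULL;
    return calloc((size_t) n_targets, sizeof(uint32_t));
}
```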

- - - - -
90cf5c4b by John Marshall at 2015-07-27T12:26:49Z
Remerge daviesrob/afl_2 (PR #215) with several fixes

- - - - -
1af175ea by John Marshall at 2015-07-27T12:28:46Z
Add kgetline() to kstring.c/.h

Similar to BSD's getline() but omits the \n terminator and manages the
memory as a kstring.  Call with "(kgets_func *) fgets" to read from stdio,
or implement an fgets()-style function to read from other streams, e.g.,
a wrapper around gzgets() that reorders its parameters as per fgets().

[Cherry-picked attractivechaos/klib at cbcfcabc8f9b7f0ca14c6dce2ee5644e4e80e5e4.]
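
A simplified, self-contained sketch of the same idea. The kstr type, the gets_fn typedef and read_line() are hypothetical stand-ins, not klib's kstring_t and kgetline(). Unlike kgetline(), this version overwrites the string instead of appending to it. A small wrapper adapts fgets() to the void-pointer stream parameter, which avoids calling fgets() through a cast function pointer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { size_t l, m; char *s; } kstr;   /* length, capacity, data */
typedef char *gets_fn(char *buf, int size, void *stream);

static char *file_gets(char *buf, int size, void *stream)
{
    return fgets(buf, size, (FILE *) stream);
}

/* Read one line into s without its '\n'.  Returns 0 on success, EOF at
 * end of input or if memory cannot be allocated. */
static int read_line(kstr *s, gets_fn *getsf, void *stream)
{
    s->l = 0;
    for (;;) {
        if (s->m - s->l < 200) {               /* keep room for another chunk */
            size_t m = s->m ? s->m * 2 : 256;
            char *tmp = realloc(s->s, m);
            if (!tmp) return EOF;
            s->s = tmp;
            s->m = m;
        }
        if (!getsf(s->s + s->l, (int)(s->m - s->l), stream))
            break;                             /* EOF or read error */
        s->l += strlen(s->s + s->l);
        if (s->l > 0 && s->s[s->l - 1] == '\n') {
            s->s[--s->l] = '\0';               /* drop the terminator */
            return 0;
        }
    }
    return s->l > 0 ? 0 : EOF;                 /* last line may lack '\n' */
}
```

Reading from another stream type only needs another fgets()-shaped wrapper, e.g. one that reorders gzgets()'s parameters.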

- - - - -
d4dbf8d8 by Andreas Tille at 2015-07-28T10:06:57Z
Add README.source to explain the workflow of this Git repository

- - - - -
04d85d9d by Andreas Tille at 2015-07-28T10:13:17Z
In this repository the 'v' in front of the version number needs to be left out ...

- - - - -
c3137257 by James Bonfield at 2015-07-28T14:03:45Z
Fixed the ordering of enum so that the old CRAM_OPT enums stay
constant.

Also added a #define for compatibility of renaming cram_option.

- - - - -
eca68e8e by John Marshall at 2015-07-28T14:16:44Z
Revert .gitignore additions

Whether to ignore .rej/.orig/backup~ is up to each developer's taste.
Those who wish to ignore them can do so via .git/info/exclude or
core.excludesfile.

This reverts commit 4d707943c77585ecf1547f87747e19c5925cd857.

- - - - -
a4aa7b3f by John Marshall at 2015-07-28T14:39:46Z
Merge hts_fmt_option, hts_open_format(), et al (PR #233)

- - - - -
a902f295 by John Marshall at 2015-07-28T15:29:25Z
Update dependencies and *.mk variables

- - - - -
23c8f2e8 by John Marshall at 2015-07-28T15:45:22Z
Change to MIT/Expat license for this public header

As agreed with James Bonfield and GRL.  The licenses are equivalent,
this just maintains the convention for public headers in htslib/.

- - - - -
3edc3a30 by John Marshall at 2015-07-28T16:02:36Z
Merge initial CRAM public API (PR #235)

- - - - -
e399970f by James Bonfield at 2015-07-29T08:03:49Z
Added support for samtools cat, along with some CRAM API improvements.

- Sanitised byte_array_len structures so the encoder and decoder both
  use the same terminology (value vs val).

- Added cram_codec_decoder2encoder function.  This is a bit messy, but
  it is needed because cram_decoder.c fills out different internal
structure elements than cram_encoder.c (for efficiency's sake).  So
  if we're reading from one fd and writing to another fd we have to
  fix up a few internals.

- Similarly (re-encoding a decoded struct) improved hdr->AP_delta vs
  container->pos_sorted and avoid a memory leak of the preservation
  map.

- Fixed typo in reheader branch for cram_minor_vers.

- More external CRAM APIs: detecting empty containers, changing RG
  values in compression headers, copying slices, and various block
  handling functions.

- - - - -
8872c109 by John Marshall at 2015-08-04T12:51:09Z
Merge bcf_hdr_merge() addition (PR #238)

Fix dependencies.

- - - - -
4c82a744 by Shane McCarthy at 2015-08-05T08:12:14Z
tabix: fixes --only-header option so that it now works!

Fixes #249

[NEWS]
* tabix --only-header works again (was broken in 1.2.x; #249)

- - - - -
f7904553 by John Marshall at 2015-08-07T13:20:02Z
Constify hts_open_format() parameter and don't parse filename

Don't override mode based on the filename extension, as this level of
intelligence is not desirable by default in a low-level open() function.

Constify fmt parameter, as it is naturally input-only (if the detailed
format as opened is desired, it should be queried via hts_get_format()).

(Note that hts_open_format() documentation needs updating.)

- - - - -
4494fa66 by Rob Davies at 2015-08-10T16:50:53Z
Change rANS O1 codecs to allocate large arrays on the heap.

rans_compress_O1 needed about 1.25 MB of stack for a couple of 256x256
arrays.  This caused problems on MacOS X when running in threaded mode
due to running out of per-thread stack.  The default stack size for
threads on MacOS X is 512 KB.  For details, see:
https://developer.apple.com/library/mac/qa/qa1419/_index.html

This changes rans_compress_O1 and rans_uncompress_O1 to use malloc
and calloc to allocate the arrays instead of using the stack.
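
A minimal sketch of the heap-allocation pattern, using a hypothetical order-1 frequency table rather than the actual rANS encoder state. A single 256x256 table of uint32_t is 256 KiB, half of a 512 KB thread stack, so it is allocated with calloc() and returned to the caller, who frees it.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t freq_row[256];

/* F[prev][cur] counts how often byte `cur` follows byte `prev`. */
static freq_row *order1_freqs(const unsigned char *in, size_t len)
{
    freq_row *F = calloc(256, sizeof *F);   /* zeroed, on the heap */
    if (!F) return NULL;
    unsigned char prev = 0;                 /* context for the first symbol */
    for (size_t i = 0; i < len; i++) {
        F[prev][in[i]]++;
        prev = in[i];
    }
    return F;
}
```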

- - - - -
778653c5 by John Marshall at 2015-08-17T12:13:43Z
Merge CRAM public API additions for samtools cat (PR #237)

- - - - -
23ba31e6 by John Marshall at 2015-08-17T16:18:16Z
Remove debug printf [minor]

- - - - -
d7541514 by James Bonfield at 2015-08-18T16:40:24Z
Bug fix for CRAM index querying when the query sequence has zero
entries in the index.  It was accessing element -1 in a NULL array.

Also fixed the code so that searching on ref id -1 (unmapped data)
works.

Fixes samtools#454

- - - - -
a17f23c7 by John Marshall at 2015-08-19T09:06:35Z
Update Makefile dependencies [minor]

In particular, add cram/cram_external.c header prerequisites
and make it #include config.h.

- - - - -
a72be23f by James Bonfield at 2015-08-19T13:20:28Z
Force safe_itf8_get to zero *val_p when failing, to simplify error
handling elsewhere. (Eg in cram_transcode_rg)
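
A sketch of a bounds-checked ITF8 decoder that zeroes its output on failure. The function name itf8_get_checked() is hypothetical. The byte layout follows the ITF8 encoding in the CRAM specification: the number of leading 1 bits in the first byte gives the number of extra bytes, up to five bytes in total.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one ITF8 value from [cp, endp).  Returns bytes consumed, or 0
 * if the input is truncated, in which case *val_p is set to 0. */
static int itf8_get_checked(const unsigned char *cp, const unsigned char *endp,
                            int32_t *val_p)
{
    /* Total length indexed by the top nibble of the first byte. */
    static const int nbytes[16] = {1,1,1,1,1,1,1,1, 2,2,2,2, 3,3, 4, 5};
    if (cp >= endp) { *val_p = 0; return 0; }
    int n = nbytes[cp[0] >> 4];
    if (endp - cp < n) { *val_p = 0; return 0; }
    uint32_t v;
    switch (n) {
    case 1: v = cp[0]; break;
    case 2: v = ((uint32_t)(cp[0] & 0x3f) << 8) | cp[1]; break;
    case 3: v = ((uint32_t)(cp[0] & 0x1f) << 16) | ((uint32_t) cp[1] << 8) | cp[2]; break;
    case 4: v = ((uint32_t)(cp[0] & 0x0f) << 24) | ((uint32_t) cp[1] << 16)
              | ((uint32_t) cp[2] << 8) | cp[3]; break;
    default: v = ((uint32_t)(cp[0] & 0x0f) << 28) | ((uint32_t) cp[1] << 20)
              | ((uint32_t) cp[2] << 12) | ((uint32_t) cp[3] << 4) | (cp[4] & 0x0f); break;
    }
    *val_p = (int32_t) v;
    return n;
}
```

Because a failed call always leaves *val_p at 0, a caller that only checks the return value at the end of a sequence of reads never works with an uninitialised value.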

- - - - -
47da3929 by James Bonfield at 2015-08-19T13:51:53Z
Fixes https://github.com/samtools/samtools/issues/455

Added an is_md5 flag to the refs_t structure.  This is used to
indicate where a reference filename has been identified via the M5
tag.

Previously for newly observed refs it looked for M5 and then rolled on to
fasta loading, but for previously used refs (load ref 1, then 2, then
1 again) it had the r->length field as non-zero so fell past the
MD5 opening code and straight onto the fasta loading code, causing the
observed error.

This commit also fixes an inefficiency in loading of MD5 seq files if
they are found via REF_PATH but not via REF_CACHE.  For local files we
no longer bother to use open_path_mfile to load it entirely into
memory.

- - - - -
8eba63af by James Bonfield at 2015-08-19T16:40:49Z
Fixed bug with non-reference base CRAMs and indexing.

When using --output-fmt cram,no_ref the reference sequence is not
loaded into memory, causing the "ref_end" container field to be
unset.  This in turn meant that the code that clipped the container
header start/end fields (pos+span) to the actual reference length, to
work around buggy aligners that position data off the end of the
reference, was itself buggy.

- - - - -
42734f62 by James Bonfield at 2015-08-20T10:54:35Z
Improved the efficiency of reference loading again.

Commit 47da392939b43c9761454278de5f457119c94297 included a change to
use bgzf_open and read a local file rather than relying on
open_path_mfile if it was found via REF_PATH as a local on-disk file.

It turns out this is only beneficial if we don't have HAVE_MMAP
defined, which by default isn't on in the Makefile (but likely will be
if using ./configure).  With mmap the mfopen becomes a trivial
operation.

- - - - -
c50bfe05 by James Bonfield at 2015-08-20T11:31:16Z
Further improvement to index querying.

Now instead of returning an index entry close to the one asked for,
when querying a reference that has no data aligned against it, we
instead return a different error code (-2) and let this get treated in
the same manner as HTS_IDX_NONE.  Far cleaner and likely more
efficient.

- - - - -
6d2810cd by James Bonfield at 2015-08-24T11:41:51Z
Reduced memory for CRAM decoding.

This is particularly important during merging or multi-file mpileup.
The changes involve reducing the block size in pooled_alloc (saves
SAM header memory), freeing loaded disk-blocks immediately after
decoding rather than when freeing the slice, and reducing the memory
usage of the decoded arrays to the exact size needed rather than an
auto-growing array (which at worst case is 50% larger).

It also attempts to automatically predict the size of the sequence,
quality and read-name memory blocks instead of auto-growing.  This is
both a performance improvement and potentially a reduction in heap
fragmentation.

- - - - -
5b0bb75c by James Bonfield at 2015-08-24T16:34:45Z
Added checks to return value of hts_opt_apply.

Also made cram_set_voption set errno using a couple of appropriate POSIX
error codes, to avoid confusing "Success" error messages.

Fixes #452

- - - - -
82203a27 by John Marshall at 2015-08-24T16:38:57Z
Fix whitespace [trivial]

- - - - -
306664a7 by John Marshall at 2015-08-25T09:15:23Z
Allow initial whitespace in FASTA ">" headers

Fixes samtools/samtools#449.
Also ensure an empty name works; fixes #258, hat tip @mtmorgan.

Add test/faidx.fa test cases, with unnamed sequence, extra whitespace,
and tests for previously-fixed blank line-related bugs fixed in
1980e5859251741f7905e754c07112a0fe3ec3e5.

Fix memory leaks introduced by 642783e2aa8a0697fb7352dde6f359cd74593437.

[NEWS]
* fai_build() and samtools faidx now accept initial whitespace in ">"
  headers (e.g., "> chr1 description" is taken to refer to "chr1")
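
A minimal sketch of this header rule, using a hypothetical fasta_name() helper rather than fai_build()'s own parser. It skips the '>' and any spaces or tabs, then takes the name up to the next whitespace character. An empty name has length 0.

```c
#include <ctype.h>
#include <stddef.h>

/* Find the sequence name in a FASTA header line.  Sets *name to its
 * start and returns its length (0 for an unnamed sequence). */
static size_t fasta_name(const char *line, const char **name)
{
    const char *p = line;
    if (*p == '>') p++;
    while (*p == ' ' || *p == '\t')          /* allow initial whitespace */
        p++;
    const char *q = p;
    while (*q && !isspace((unsigned char) *q))
        q++;
    *name = p;
    return (size_t)(q - p);
}
```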

- - - - -
e7e2b3dd by John Marshall at 2015-08-27T21:51:53Z
Add "all-htslib" target for third-party code

- - - - -
b82f5efc by John Marshall at 2015-08-29T09:08:52Z
Add sam_parse1() "empty query name" warning

- - - - -
f859e8d2 by John Marshall at 2015-09-02T16:00:13Z
Add hts_parse_decimal() flags parameter and HTS_PARSE_THOUSANDS_SEP

hts_parse_decimal() has not yet appeared in an HTSlib release so we
can still change its signature.  We may in future add other parser
flags and/or an hts_parse_region() with a flags parameter alongside
the existing hts_parse_reg().

Use HTS_PARSE_THOUSANDS_SEP in hts_parse_reg() (for historical reasons)
but not in regidx.c/synced_bcf_reader.c which are used in a list argument
like "REGION,REGION,REGION".  Fixes samtools/bcftools#309.
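
A sketch of why the flag matters, using a hypothetical parse_decimal() rather than hts_parse_decimal() itself. With thousands separators enabled, "1,000,000" parses as one number. Without them, the comma ends the number, so "100,200" can be split as a list. Overflow is not checked in this sketch.

```c
#include <ctype.h>
#include <stdint.h>

/* Parse a non-negative decimal.  If allow_commas is nonzero, a comma
 * that follows a digit and precedes a digit is skipped.  *end is set
 * to the first unconsumed character. */
static int64_t parse_decimal(const char *s, const char **end, int allow_commas)
{
    int64_t v = 0;
    const char *p = s;
    while (isdigit((unsigned char) *p) ||
           (allow_commas && *p == ',' && p > s && isdigit((unsigned char) p[1]))) {
        if (*p != ',')
            v = v * 10 + (*p - '0');
        p++;
    }
    *end = p;
    return v;
}
```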

- - - - -
81b173e6 by John Marshall at 2015-09-04T10:09:43Z
hts_idx_destroy() now actively frees CRAM indices

CRAM indices are stored within the cram_fd (see 655edabb50d6da6a6be1e474e371ab325be19461)
so previously hts_idx_destroy() just destroyed the fake hts_cram_idx_t and
left the real index to be freed when the cram_fd was eventually closed.

But cram_index_free() exists, so hts_idx_destroy() should be calling it.

- - - - -
0004f1a1 by James Bonfield at 2015-09-10T11:23:59Z
Fixes bug that appeared with commit ca1bcefe.

The commit fails to check whether we have requested the AP data series
to be decoded, so causes a failure in samtools flagstat.  However this
failure then triggers another bug when running in multithreaded mode;
it attempts to flush any remaining blocks during close, even when
decoding produced an error.  This crashes due to an incomplete
container.

Fixes #265

- - - - -
eb4dcaef by John Marshall at 2015-09-10T16:32:13Z
Increment line number

- - - - -
4c66fe01 by Cristina Yenyxe Gonzalez Garcia at 2015-09-11T13:30:44Z
Function bcf_hdr_parse_line now ignores trailing spaces. #248

- - - - -
2c89e84e by John Marshall at 2015-09-14T10:19:20Z
Check bam_header_to_cram() return value

In general, cram_set_header() can accept NULL to remove a header;
but within sam_hdr_write() we expect bam_header_to_cram() to succeed.

- - - - -
d2c31ce5 by John Marshall at 2015-09-14T12:34:25Z
Add NUL-padded BAM header CRAM conversion test cases

Most tools write BAM l_text/text headers without any NUL padding, but the
specification allows for any number of extra NULs.  The CRAM conversion
code is not currently expecting this, as these test cases demonstrate.

- - - - -
04997f6d by John Marshall at 2015-09-14T16:04:17Z
Allow NUL-termination in sam_hdr_add_lines()

Most tools write BAM l_text/text headers without any NUL padding, but the
specification allows for any number of extra NULs.  Allow for such padding
in the CRAM conversion code.

Merge test cases.

- - - - -
d71004b7 by John Marshall at 2015-09-14T17:02:42Z
Merge VCF header trailing whitespace fix (PR #267)

Use htsfile -c rather than bespoke test program.

- - - - -
2b486ed9 by Adrian Tan at 2015-09-14T18:45:25Z
changed bcf_id2int to bcf_hdr_id2int.

- - - - -
8197cfdf by John Marshall at 2015-09-14T19:52:02Z
Update to reflect bcf_hdr_id2int() renaming (PR #269)

The previous bcf_id2int() was renamed in bc86c107dd89d4909cfe8548774ded5542551559.

- - - - -
a017f927 by Petr Danecek at 2015-09-25T13:30:39Z
vcf: Symbolic allele <*> is not a variant

- - - - -
852650c5 by Petr Danecek at 2015-09-25T14:00:34Z
bcf_gt_type: Count "./1" genotypes as missing

- - - - -
0697d01c by Petr Danecek at 2015-09-25T14:01:49Z
Fix in vcf filtering:

filters which were not defined in the header were misinterpreted as "."
by the synced reader.

- - - - -
5c6b71bf by Petr Danecek at 2015-09-25T14:03:15Z
vcf: Prevent segfault when querying freshly removed tags

- - - - -
66383089 by Petr Danecek at 2015-09-25T14:09:34Z
Update bcf_translate() accordingly to reflect the bcf_hdr_merge commit 2bb9370f5a24938d8a2dc56f404e584661bf413f

- - - - -
53ca19d8 by Petr Danecek at 2015-09-25T14:19:10Z
vcf: update rlen when adding/removing END tag

* update reference length from END tag, no need to set rlen manually anymore
* unset rlen when removing the END tag

- - - - -
57d8aba2 by Petr Danecek at 2015-09-25T14:26:12Z
Allow a legitimate use case in bcf1_sync:

INFO can be dirty while ID is not unpacked, consider bcf_write followed
by bcf_update_info.

- - - - -
297ac307 by Petr Danecek at 2015-09-25T14:42:23Z
add bcf_sr_swap_line macro

* free fname after synced reader is done
* do not crash if the buffered lines have been swapped (as vcfmerge may do)

- - - - -
1e2ed489 by Isaac Turner at 2015-09-28T09:28:00Z
Fixed typo in vcf.h documentation
- - - - -
d4482eb5 by John Marshall at 2015-09-29T13:21:59Z
Define bcf_empty() to match its declaration in vcf.h

Previously we declared bcf_empty() but defined bcf_empty1(), so the
function in the public header was unusable (and there was no public
declaration for the function that did exist).

Correct this by adding a bcf_empty1() alias macro for consistency with
similar functions, and defining bcf_empty() to match the API function
declaration.  Deobfuscate bcf_init() and bcf_destroy() definitions,
which previously appeared to be defining bcf_init1()/bcf_destroy1()
but were not as the alias macros act on these definitions too.

This constitutes adding bcf_empty() to the API.  There is no ABI
breakage due to removing the bcf_empty1() symbol from the library,
as bcf_empty1() was not declared in the public header files so is
not part of the API.

- - - - -
8f3b53b4 by Petr Danecek at 2015-09-30T16:44:35Z
Fix handling of BCF string dictionaries

Gaps in IDX blocks were causing incorrect tags on output.

The allocated size of the dictionary block in use is added to the
bcf_hdr_t typedef.  It has been added to the end of the struct to
preserve the structure layout, and a note added recommending allocation
via bcf_hdr_init().

Fixes samtools/bcftools#317.

- - - - -
628eb5bc by David K Jackson at 2015-10-01T09:01:37Z
Always link with -pthread [minor]

Regularise linking options (except for OS X-specific dylibs etc, where
pthreads are in libc); even test/hfile needs it when built with iRODS 3.3,
whose rcConnect() can be configured to use pthreads.
(cherry picked from commit 5c51f8401462a507a090978a529dc94e08280bf6)

- - - - -
d2ed7e6e by John Marshall at 2015-10-01T10:38:36Z
Fix incorrect asserts

Fortunately both of these inadvertent assignments were either
to dead variables or immediately overwritten.

- - - - -
66fbb28b by James Bonfield at 2015-10-05T11:12:25Z
Fixed a bug with CRAM compression level >= 6.

We could attempt to compress the CORE block twice, yielding corrupted
data during decode.  Note, this was not possible to trigger from
samtools, but could be triggered from test/test_view -l 6 or from
coding directly against htslib API.

- - - - -
9f6fa0fc by John Marshall at 2015-10-06T13:46:17Z
Add 'x' (O_EXCL) and 'e' (O_CLOEXEC) open mode letters

Also document them for hopen() and hts_open().  (Write the latter's
regexp with excess spaces (to avoid */ terminating the comment) rather
than with [letters]+ because zero extra letters is to be accepted.)
Fixes #245.
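
A sketch of how such mode letters can map onto open(2) flags. The mode_to_oflags() function is hypothetical, not htslib's internal mapping. 'x' adds O_EXCL and 'e' adds O_CLOEXEC, the latter only where the platform defines it.

```c
#include <fcntl.h>
#include <string.h>

/* Translate an fopen()-style mode string to open(2) flags.  Returns -1
 * if the mode contains none of 'r', 'w' or 'a'. */
static int mode_to_oflags(const char *mode)
{
    int flags = 0, base = -1, rdwr = 0;
    for (const char *m = mode; *m; m++) {
        switch (*m) {
        case 'r': base = O_RDONLY; break;
        case 'w': base = O_WRONLY; flags |= O_CREAT | O_TRUNC; break;
        case 'a': base = O_WRONLY; flags |= O_CREAT | O_APPEND; break;
        case '+': rdwr = 1; break;
        case 'x': flags |= O_EXCL; break;      /* fail if the file exists */
#ifdef O_CLOEXEC
        case 'e': flags |= O_CLOEXEC; break;   /* close on exec() */
#endif
        default: break;                        /* ignore e.g. 'b' */
        }
    }
    if (base < 0) return -1;
    return (rdwr ? O_RDWR : base) | flags;
}
```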

- - - - -
1cdbbcd0 by John Marshall at 2015-10-07T13:59:43Z
"make tags" should always remake the tag file

- - - - -
91d48726 by John Marshall at 2015-10-07T16:42:37Z
Add bit set data structure

See upstream PR attractivechaos/klib#59.

- - - - -
4c35976a by Shane McCarthy at 2015-10-09T13:16:21Z
bcf_write: sync dirty header before writing

* const has been removed
* Fixes samtools/bcftools#332

- - - - -
1a7fdc53 by James Bonfield at 2015-10-12T10:05:54Z
Fixed sam_hdr_add_PG to avoid va_copy requirement.

Also documented sam_hdr_vadd better to indicate this.
Fixes samtools/samtools#477

- - - - -
34c3e59f by James Bonfield at 2015-10-12T14:12:30Z
Fixed sam_hdr_add_PG to avoid va_copy requirement.

Also documented sam_hdr_vadd better to indicate this.
Fixes samtools/samtools#477

- - - - -
636d7043 by Shane McCarthy at 2015-10-12T14:12:31Z
vcfutils: add bcf_remove_alleles_set

* new bcf_remove_alleles_set function, which uses the new kbitset.h API to indicate
  the alleles to be removed. This avoids overflowing the old integer bitmask in
  bcf_remove_alleles when there are more than 31 alleles
* bcf_remove_alleles updated to use the new bcf_remove_alleles_set
* Fixes samtools/bcftools#319
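
A minimal bit set illustrating why this removes the 31-allele limit. The bitset type and functions are hypothetical and are not the kbitset.h API. Bits are spread across an array of unsigned long, so any allele index up to the allocated size can be marked.

```c
#include <limits.h>
#include <stdlib.h>

typedef struct { size_t n; unsigned long *b; } bitset;

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Allocate a zeroed set of n bits; returns 0 on success, -1 on failure. */
static int bitset_init(bitset *bs, size_t n)
{
    bs->n = n;
    bs->b = calloc((n + BITS_PER_WORD - 1) / BITS_PER_WORD, sizeof(unsigned long));
    return bs->b ? 0 : -1;
}

static void bitset_set(bitset *bs, size_t i)
{
    bs->b[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

static int bitset_test(const bitset *bs, size_t i)
{
    return (int)((bs->b[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL);
}
```

Marking allele 40 for removal, which a 32-bit int mask cannot represent, is then just bitset_set(&bs, 40).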

- - - - -
c700fb16 by James Bonfield at 2015-10-12T14:17:17Z
Merge pull request #287 from samtools/feature/fix.332

bcf_write: sync dirty header before writing
- - - - -
1e5c3777 by Shane McCarthy at 2015-10-12T15:45:23Z
Merge pull request #285 from samtools/feature/fix.319

vcfutils: add bcf_remove_alleles_set
- - - - -
f91da3c6 by John Marshall at 2015-10-20T20:44:47Z
Remove obsolete -i/--file-info option from tabix man page

This option was removed in b2cfe4e8e5faebdebd2891cdd2445bacabd9f891,
as it duplicates htsfile functionality.

Fixes (the main part of) #291.

- - - - -
fa07527e by John Marshall at 2015-10-21T10:05:31Z
Add generic plugin infrastructure

PLUGIN_PATH is set in the makefile (rather than config.h) as it likely
contains installation directory variables, which need to be expanded
and ought to be alterable at "make" time.

Moved "ifeq $(PLATFORM)" etc to below "include config.mk" so that it
can act on the config.mk-modified $(PLUGIN_OBJS).

This way of adding -rdynamic is something of a hack, but it seems
desirable to do this alongside -ldl.  (TODO) The eventual plugins
themselves don't want to be linked with -rdynamic -ldl.

Document new configure options (including --enable-libcurl already added).

- - - - -
df3fcad8 by Petr Danecek at 2015-10-22T08:21:43Z
Prevent segfault on empty data files

This resolves https://github.com/samtools/bcftools/issues/339

- - - - -
0e2bc689 by Shane McCarthy at 2015-10-22T08:53:21Z
fix typo in bcf_remove_alleles_set doc (minor)

- - - - -
823274d3 by John Marshall at 2015-10-28T14:17:33Z
Make plugin handling API slightly less generic

For HTSlib, we will have the convention that each category of plugin has
a well-known symbol that the plugin must provide, and load_plugin() both
opens the plugin and finds that symbol.  This simplifies plugin-loading
code for the usual case of having one extern symbol, and plugin_sym() is
available for any subsequent extern symbols.

We consider failure in plugin loading to be a case of missing optional
functionality, so also put the dynamic loader error reporting here at a
verbose log level (hts_verbose >= 4).

- - - - -
07d3be74 by John Marshall at 2015-10-28T17:16:32Z
Add hFILE plugin interface and pluginise irods/libcurl backends

Define an hFILE plugin interface based on plugins supplying a function
hfile_plugin_init(), which calls HTSlib's hfile_add_scheme_handler()
to register an interest in handling URLs based on their URL scheme.

On the first request for a URL-like filename, $HTS_PATH or the built-in
path is searched for files matching hfile_*.{so,bundle} (as appropriate
for the platform), and all are loaded.

Put -lcrypto in @CRYPTO_LIBS@ rather than @LIBS@ so that it appears only
where -lcurl appears; don't check for it on OS X, as where we have CCHmac()
we don't need any additional libraries at all.

Convert hfile_irods.c to register a destroy() function rather than use
atexit().  (TODO) Both hfile_net.c and hfile_libcurl.c should also be
converted, but may require further rejigging of their global initialisation.

(TODO) Add -rdynamic and -ldl as appropriate to htslib.pc.
(TODO) Add some way to enumerate the active plugins, beyond the logging
produced when hts_verbose >= 5.

[NEWS]

* HTSlib can be built to use remote access hFILE backends (such as iRODS
  and libcurl) via a plugin mechanism.  This allows other backends to be
  easily added and facilitates building tools that use HTSlib, as they
  don't need to be linked with the backends' various required libraries.

- - - - -
fe1f08a3 by John Marshall at 2015-10-29T09:54:17Z
Merge libcurl backend and hFILE plugin infrastructure

[NEWS]
* Files can now be accessed via HTTPS and Amazon S3 in addition to HTTP
  and FTP, when HTSlib is configured to use libcurl for network file access
  rather than the included basic knetfile networking.

- - - - -
ccc2cac5 by John Marshall at 2015-10-29T13:39:42Z
Make hFILE_backend::flush() optional

Only call it (in hflush()) if it is non-NULL, and remove pointless
no-op backend flush() methods.
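A minimal sketch of the optional-method pattern, assuming a simplified backend table. The names `my_backend`, `my_file` and `my_hflush()` are hypothetical, not HTSlib's real hFILE definitions, and the real hflush() also writes out buffered data first, which this sketch omits. The wrapper checks the function pointer before calling it, so a backend with nothing to flush can leave it NULL instead of supplying a no-op.

```c
#include <stddef.h>

/* Hypothetical, simplified backend method table: flush may be NULL. */
struct my_backend {
    int (*flush)(void *fp);
};

struct my_file {
    const struct my_backend *backend;
    int failed;  /* hypothetical sticky error flag */
};

/* Call the backend's flush() only if the backend provides one. */
int my_hflush(struct my_file *fp)
{
    if (fp->backend->flush != NULL && fp->backend->flush(fp) < 0) {
        fp->failed = 1;
        return -1;
    }
    return 0;
}
```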

For iRODS, rcDataObjFsync() doesn't work on 3.x and doesn't exist on
4.x (irods/irods#2499), so remove irods_flush() entirely.  Fixes #168.

- - - - -
44d6cfc0 by Isaac Turner at 2015-10-29T14:14:25Z
vcf.h macros should wrap args in brackets

Wrap macro arguments in brackets to avoid bugs in e.g.: bcf_gt_phased(x == 2)
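To illustrate why the brackets matter, here is a pair of hypothetical macros modelled on the phased-genotype encoding `(idx + 1) << 1 | 1`; they are not copied from vcf.h. With `x == 2` as the argument, the unbracketed version expands to `(x == 2+1<<1|1)`, which C parses as `(x == 6) | 1` because `+` and `<<` bind more tightly than `==`.

```c
/* Hypothetical macros encoding a phased allele index as (idx + 1) << 1 | 1. */
#define GT_PHASED_UNSAFE(idx) (idx+1<<1|1)      /* argument not bracketed */
#define GT_PHASED_SAFE(idx)   (((idx)+1)<<1|1)  /* argument bracketed */
```

For `int x = 2`, `GT_PHASED_SAFE(x == 2)` yields 5 (allele index 1, phased), while `GT_PHASED_UNSAFE(x == 2)` silently yields 1.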

- - - - -
c3881750 by John Marshall at 2015-10-29T14:46:14Z
[Makefile] Rename LDLIBS to LIBS

Rename LDLIBS to LIBS; it seems that LDLIBS doesn't particularly
exist outside GNU Make's built-in rules.  Instead LIBS is configure's
conventional variable for the purpose, so is the right thing to use
for a user-visible variable.

(See also 889ee0114656bdfad43c8447b63d896f7a6f2328 and #183.)

[NEWS]
* HTSlib's configure script and Makefile now fully support the standard
  convention of allowing CC/CPPFLAGS/CFLAGS/LDFLAGS/LIBS to be overridden
  as needed.  Previously the Makefile listened to $(LDLIBS) instead; if you
  were overriding that, you should now override LIBS rather than LDLIBS.

- - - - -
2e4226f1 by John Marshall at 2015-10-29T17:20:02Z
[libcurl] Set User-Agent header and optional verbosity

Always add a "User-Agent: htslib/<version>" header.  Fixes #295.

When hts_verbose is 8 or more, set the voluminous CURLOPT_VERBOSE.

- - - - -
7e4d2b11 by John Marshall at 2015-10-30T15:09:47Z
Fix plugin installation rules

With plugins disabled, 'test -n ""' is false but we need to ensure that
the exit status of the entire recipe line is success.  D'oh.  Fixes #296.

It might be nicer to have new installdirs-plugins/install-plugins rules
that are activated by config.mk, but that too would need to handle the
case of --enable-plugins with no plugins actually being built -- which
would make it not so nice.

- - - - -
360b67c3 by John Marshall at 2015-11-02T13:41:35Z
Override iRODS 3.x/4.0.x's misguided SIGPIPE handler

Prior to iRODS 4.1, rcConnect() (even if it fails) installs its own
SIGPIPE handler, which just prints a message and otherwise ignores the
signal.  Most actual SIGPIPEs encountered will pertain to e.g. stdout
rather than iRODS's connection, so we save and restore the existing state.

Fixes samtools/samtools#350; see also irods/irods#1970 which removed the
installation of iRODS's SIGPIPE handler, from iRODS 4.1.0 onwards.
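The save-and-restore approach can be sketched with POSIX sigaction(). The function `library_call_that_clobbers_sigpipe()` is a hypothetical stand-in for a call such as pre-4.1 rcConnect() that installs its own handler; `call_preserving_sigpipe()` is likewise an illustrative name, not HTSlib code.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical handler that just swallows the signal. */
static void noisy_handler(int sig) { (void) sig; }

/* Hypothetical stand-in for a library call that replaces the
 * process-wide SIGPIPE disposition as a side effect. */
static void library_call_that_clobbers_sigpipe(void)
{
    struct sigaction sa;
    sa.sa_handler = noisy_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGPIPE, &sa, NULL);
}

/* Save the caller's SIGPIPE disposition, make the call, then restore it.
 * Returns 0 on success, -1 if sigaction() fails. */
int call_preserving_sigpipe(void (*call)(void))
{
    struct sigaction saved;
    if (sigaction(SIGPIPE, NULL, &saved) < 0) return -1;
    call();
    if (sigaction(SIGPIPE, &saved, NULL) < 0) return -1;
    return 0;
}
```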

- - - - -
00080ba2 by John Marshall at 2015-11-05T16:00:22Z
[iRODS] Set User-Agent and verbosity; basic iRODS 4.x support

* Set a User-Agent for display by ips
* Propagate hts_verbose (>= 5) to rodsLogLevel()
* Decode some more iRODS status codes
* Source code changes for iRODS 4.1+: add header no longer included by
  other headers; work around the change in clientLogin()'s arguments

- - - - -
fa6ed9ac by Petr Danecek at 2015-11-06T10:18:44Z
synced_bcf_reader: _regions_match_alleles exit with an error when payload is not available

Prevent errors such as reported in samtools/bcftools#346 and samtools/bcftools#239

Closes samtools/bcftools#346 and closes samtools/bcftools#239

- - - - -
20725273 by John Marshall at 2015-11-19T12:00:33Z
Add system includes and extern "C" to public cram.h header

Ensure that "#include <htslib/cram.h>" works even when it is not
preceded by any other #includes; needs various system headers,
and hts.h for enum hts_fmt_option.  Hat tip Ivo Palli.

- - - - -
a2656aa0 by John Marshall at 2015-11-23T10:27:30Z
Add htsfile -v option to increase hts_verbose

Adds a simple way to see e.g. plugin debug logging.

Remove the previous "hts_verbose = 2" -- the default htslib log level
is already 3.

- - - - -
d9e9e3a9 by John Marshall at 2015-11-26T16:03:20Z
Fix error return value typos and clamp zfclose() return

Ensure that cram_index_build() returns exactly -1 when zfclose() fails.

- - - - -
6462e349 by John Marshall at 2015-11-26T16:09:39Z
Distinguish sam_index_build() file opening and can't-index failures

- - - - -
e9fcc573 by John Marshall at 2015-11-30T10:36:47Z
Support S3 config files and temporary credentials

Extend the URL authority part parsing to "[ID:SECRET:TOKEN@]BUCKET".
The latter fields are optional, and if only a username is provided,
interpret it as the profile to look for in the configuration files.

Check the conventional AWS environment variables for session token,
profile, and credentials config file location.

Pick up credential settings from ~/.aws/credentials (standard shared AWS
credentials file), ~/.s3cfg (as per s3tools), and ~/.awssecret (as per
Tim Kay's aws and Heng Li's kurl.c), in that order.

Check for $AWS_SESSION_TOKEN etc and propagate any temporary credentials
found to X-Amz-Security-Token header.  Hat tip @DonFreed.
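A minimal sketch of splitting an authority of the form "[ID:SECRET:TOKEN@]BUCKET", assuming fixed-size output buffers. `struct s3_auth` and `parse_s3_authority()` are hypothetical names, and the environment-variable and config-file lookups described above are not shown. A lone username before '@' is treated as a profile name, as the commit describes.

```c
#include <string.h>

/* Hypothetical result of splitting "[ID:SECRET:TOKEN@]BUCKET". */
struct s3_auth {
    char id[128], secret[128], token[256], profile[128], bucket[256];
};

/* Copy n bytes of s into dst as a NUL-terminated string, truncating
 * rather than overflowing dst. */
static void copy_span(char *dst, size_t dstsize, const char *s, size_t n)
{
    if (n >= dstsize) n = dstsize - 1;
    memcpy(dst, s, n);
    dst[n] = '\0';
}

void parse_s3_authority(const char *authority, struct s3_auth *out)
{
    memset(out, 0, sizeof *out);
    const char *at = strrchr(authority, '@');
    if (at == NULL) {  /* bucket only: no credentials in the URL */
        copy_span(out->bucket, sizeof out->bucket, authority, strlen(authority));
        return;
    }
    copy_span(out->bucket, sizeof out->bucket, at + 1, strlen(at + 1));

    const char *c1 = memchr(authority, ':', (size_t) (at - authority));
    if (c1 == NULL) {  /* only a username: use it as a profile name */
        copy_span(out->profile, sizeof out->profile, authority, (size_t) (at - authority));
        return;
    }
    copy_span(out->id, sizeof out->id, authority, (size_t) (c1 - authority));
    const char *rest = c1 + 1;
    const char *c2 = memchr(rest, ':', (size_t) (at - rest));
    if (c2 == NULL) {  /* ID:SECRET */
        copy_span(out->secret, sizeof out->secret, rest, (size_t) (at - rest));
    } else {           /* ID:SECRET:TOKEN */
        copy_span(out->secret, sizeof out->secret, rest, (size_t) (c2 - rest));
        copy_span(out->token, sizeof out->token, c2 + 1, (size_t) (at - (c2 + 1)));
    }
}
```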

- - - - -
a3567461 by John Marshall at 2015-11-30T17:08:12Z
Register a plugin destroy() method rather than use atexit()

Move curl_global_init() etc to the plugin init routine.  This only
allocates a little memory -- no sockets or connections are started.

Add libcurl version to the User-Agent string.  Remove TODO re HTTP being
disabled; in this unlikely case, we'll translate s3[+*]: to an http[s]: URL
and return CURLE_UNSUPPORTED_PROTOCOL => EINVAL, which suffices.

- - - - -
b8204c12 by John Marshall at 2015-12-02T17:31:32Z
Eliminate bgzf_fdopen() from tabix and bgzip

Use bgzf_open("-", "[rw]") as appropriate instead.

Refactor write_open() into confirm_overwrite(), as one invocation now
uses bgzf_open() rather than open(), and localising the error handling
at the two call sites improves the error messages.

- - - - -
82bdc9f0 by Isaac Turner at 2015-12-07T17:49:21Z
Make bgzf_compress() a public API function

Change function parameter types to size_t as elsewhere in bgzf.h, and
const void *src as it really is and as next_in is in zlib 1.2.6+.

- - - - -
bbd33342 by John Marshall at 2015-12-07T18:53:23Z
Restore open(O_CREAT) mode arguments [minor]

Fix oops from b8204c12e1a474cf4b3ed845e0323f71a12c0cda.

- - - - -
cb6642a9 by John Marshall at 2015-12-09T11:40:47Z
Allow for n_targets==0 in bam_hdr_read() malloc checking

The malloc()/etc functions might return NULL when asked to allocate
zero bytes.  The clearest way to avoid the collision between this and
NULL-meaning-ENOMEM is to be explicit about n_targets==0 implying that
target_name and target_len will be NULL.
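The idea can be sketched with a cut-down, hypothetical header struct; `mini_hdr` and `mini_hdr_alloc_targets()` are illustrative names, not the real bam_hdr_t code. Handling zero targets before allocating means a NULL from the allocator can only mean ENOMEM.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, cut-down header holding per-target arrays. */
struct mini_hdr {
    int32_t n_targets;
    char **target_name;
    uint32_t *target_len;
};

/* Returns 0 on success, -1 on a bad count or allocation failure. */
int mini_hdr_alloc_targets(struct mini_hdr *h, int32_t n_targets)
{
    h->n_targets = n_targets;
    h->target_name = NULL;
    h->target_len = NULL;
    if (n_targets < 0) return -1;
    if (n_targets == 0) return 0;  /* no targets: arrays stay NULL by design */

    h->target_name = calloc((size_t) n_targets, sizeof *h->target_name);
    h->target_len = calloc((size_t) n_targets, sizeof *h->target_len);
    if (h->target_name == NULL || h->target_len == NULL) {
        /* Here NULL unambiguously means out of memory. */
        free(h->target_name);
        free(h->target_len);
        h->target_name = NULL;
        h->target_len = NULL;
        return -1;
    }
    return 0;
}
```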

- - - - -
5fb0d2db by Andreas Tille at 2015-12-09T15:43:31Z
Define PATH_MAX

- - - - -
57b6def3 by Andreas Tille at 2015-12-09T15:54:31Z
More verbose description for -dev and -test packages

- - - - -
c1eaca0f by Andreas Tille at 2015-12-09T15:59:51Z
Fix lintian override

- - - - -
eb12aa09 by Andreas Tille at 2015-12-09T16:00:39Z
Verified Mayhem bug report which is solved in this version

- - - - -
e6d7e9d1 by Andreas Tille at 2015-12-09T16:00:51Z
Upload to unstable

- - - - -
d341aaf8 by Rob Davies at 2015-12-10T11:18:06Z
More fuzz-detected BAM file input checks

(Squashed from PR #217, and partly tweaked by the committer.)

* Add missing test for running out of data to sam_format1.

Makes the code to handle B-type tags check that it has at least 4 bytes
available before reading the number of items.

* Ensure c->l_qname >= 1 in bam_read1.

qname must have at least a NUL in it, so the minimum possible length is 1.
Prevents a possible wrap-around in sam_format1 which uses the value
c->l_qname-1.

* Fix test for enough bytes left when converting B-type tags in sam_format1

The check wasn't strict enough as it did not take the size of the array
elements into account.

The test for enough bytes in sam_format1 has been arranged so that it
doesn't try to work out n * sub_type_size.  This avoids the possibility
of overflowing an int32_t.
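A sketch of a B-array bounds check in this spirit, assuming the BAM layout of a subtype byte followed by a little-endian 32-bit element count and then the elements. `b_array_in_bounds()` is a hypothetical name, not the code in sam_format1. Comparing the count against `remaining / size` gives the same answer as `count * size <= remaining` without ever forming the product.

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of one element of a B-array subtype, or 0 if unknown. */
static size_t b_subtype_size(uint8_t t)
{
    switch (t) {
    case 'c': case 'C': return 1;
    case 's': case 'S': return 2;
    case 'i': case 'I': case 'f': return 4;
    default: return 0;
    }
}

/* s points at the subtype byte after 'B'; end is one past the data.
 * Returns 1 if the whole array lies within [s, end), else 0. */
int b_array_in_bounds(const uint8_t *s, const uint8_t *end)
{
    if (end - s < 5) return 0;  /* need subtype + 4-byte count */
    size_t size = b_subtype_size(s[0]);
    if (size == 0) return 0;
    uint32_t n = (uint32_t) s[1] | (uint32_t) s[2] << 8
               | (uint32_t) s[3] << 16 | (uint32_t) s[4] << 24;
    size_t remaining = (size_t) (end - s) - 5;
    return n <= remaining / size;  /* division avoids n * size overflow */
}
```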

- - - - -
fe8b2100 by John Marshall at 2015-12-11T10:34:58Z
Set binary mode on stdin/stdout and improve Windows/MinGW compilation

Implement the longstanding TODO re setting Windows binary I/O mode when
operating on standard input and standard output.

Define fallbacks for undefined errno values.  Use Windows <winsock2.h>
instead of POSIX networking headers (non-MinGW may also need to #include
<io.h> explicitly).  Just avoid struct stat::st_blksize for now (we may
want to implement a native Windows equivalent in future).

- - - - -
2b2239a8 by Jason Piper at 2015-12-11T18:24:45Z
Enable multithreaded BGZF compression in bgzip command

(Squashed from PR #272.)

- - - - -
449b20af by John Marshall at 2015-12-15T13:11:21Z
Add "plugins-htslib" target for third-party code

- - - - -
b7256675 by John Marshall at 2015-12-15T13:35:48Z
Configure should fail when HMAC() cannot be found

- - - - -
0f6c1852 by John Marshall at 2015-12-15T14:09:38Z
Explicitly define HAVE_HMAC

...decoupling it from -lcrypto and HAVE_LIBCRYPTO (which is no longer
defined with the move to AC_SEARCH_LIBS).

- - - - -
87141ea6 by John Marshall at 2015-12-15T16:34:33Z
Release 1.3: plugins, libcurl, CRAM v3.0, many bug fixes

- - - - -
c72ae908 by John Marshall at 2015-12-15T23:11:23Z
Merge version number bump and NEWS file from master

- - - - -
68e6f6c1 by Charles Plessy at 2015-12-24T04:14:35Z
Merge tag '1.3' into debian/unstable

HTSlib release 1.3, plugins, libcurl, CRAM v3.0, bug fixes

* Files can now be accessed via HTTPS and Amazon S3 in addition to HTTP
  and FTP, when HTSlib is configured to use libcurl for network file access
  rather than the included basic knetfile networking.

* HTSlib can be built to use remote access hFILE backends (such as iRODS
  and libcurl) via a plugin mechanism.  This allows other backends to be
  easily added and facilitates building tools that use HTSlib, as they
  don't need to be linked with the backends' various required libraries.

* fai_build() and samtools faidx now accept initial whitespace in ">"
  headers (e.g., "> chr1 description" is taken to refer to "chr1").

* tabix --only-header works again (was broken in 1.2.x; #249).

* HTSlib's configure script and Makefile now fully support the standard
  convention of allowing CC/CPPFLAGS/CFLAGS/LDFLAGS/LIBS to be overridden
  as needed.  Previously the Makefile listened to $(LDLIBS) instead; if you
  were overriding that, you should now override LIBS rather than LDLIBS.

* Fixed bugs #168, #172, #176, #197, #206, #225, #245, #265, #295, and #296.

- - - - -
b333524c by Charles Plessy at 2015-12-24T04:17:51Z
New upstream release; corrected path to a copyright statement.

- - - - -
d6a8f091 by Charles Plessy at 2015-12-24T04:20:10Z
Upstream does not prefix tags with 'v'.

- - - - -
ae4c86a1 by John Marshall at 2015-12-31T02:22:40Z
Avoid POSIX Issue 7-specific errno value [minor]

(FreeBSD and derivatives apparently don't have ENOTRECOVERABLE.)
Bugs in our callback routines are a "can't happen" scenario anyway,
so falling back to the default EIO suffices.

- - - - -
0382edf8 by John Marshall at 2016-01-02T20:31:21Z
Use getopt.h's descriptive constants in long options array

- - - - -
70bfd530 by John Marshall at 2016-01-02T20:38:57Z
Happy New Year

Add --version options to bgzip and tabix.

- - - - -
df4a80e9 by John Marshall at 2016-01-04T10:55:13Z
Free all in-flight pileups in bam_plp_destroy()

Fixes #299.

Write bam_plp_next()'s node-deleting loop more clearly, using a pointer
to the pointer-that-points-to-the-node rather than an obscurely-used
dummy node.  Write list-is-empty tests more clearly.  Lift the call to
overlap_remove() out of the loop in bam_plp_reset(), as it is idempotent.
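The pointer-to-pointer deletion idiom mentioned above can be sketched on a hypothetical singly linked list; `struct node` and `remove_before()` are illustrative, not bam_plp_next() itself. Because `pp` always addresses the link that points at the current node, removing the head needs no special case and no dummy node.

```c
#include <stdlib.h>

/* Hypothetical singly linked list node. */
struct node {
    int pos;
    struct node *next;
};

/* Remove and free every node whose pos is less than min_pos.
 * pp points at the head pointer first, then at some node's next field. */
void remove_before(struct node **head, int min_pos)
{
    struct node **pp = head;
    while (*pp != NULL) {
        struct node *n = *pp;
        if (n->pos < min_pos) {
            *pp = n->next;  /* unlink n; pp stays on the same link */
            free(n);
        } else {
            pp = &n->next;  /* keep n; advance to its outgoing link */
        }
    }
}
```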

- - - - -
ab9e80d4 by James Bonfield at 2016-01-14T12:20:10Z
Fixed a CRAM encoding assertion failure on repeated templates.

If we have a complex template where more than 2 reads exist and we
have valid pnext/tlen fields between each successive pair, the code
was erroneously removing the statistics for the tlen (etc.) values
multiple times instead of just once.  Test data included.

- - - - -
a6e48836 by John Marshall at 2016-01-20T13:50:17Z
Ignore .exe and .dSYM extensions

We remove *.dSYM in "make clean", so we should ignore it ourselves too.

- - - - -
4c0c448f by John Marshall at 2016-01-22T15:16:56Z
Clarify misleading code formatting [minor]

Prevents GCC 6 -Wmisleading-indentation warning.

- - - - -
92654ef7 by James Bonfield at 2016-01-26T14:21:58Z
Fixed crash in CRAM generation with cigar ops > 2^27 in size.

- - - - -
c789bed7 by James Bonfield at 2016-01-27T11:19:55Z
Force detached read-pairs if they span references.

Previously it was possible for a read-pair aligned to different
references to still be claimed as an attached-pair if they were in the
same slice and their 5' to 3' size was as listed in their (broken?)
tlen fields.

- - - - -
f1327a70 by John Marshall at 2016-01-27T14:00:45Z
Add AC_SYS_LARGEFILE test so 32-bit systems can access 2GiB+ files

Defines _FILE_OFFSET_BITS in config.h as appropriate.  Hat tip @trifud.

Now that config.h has ODR implications, add #include <config.h> to
absolutely all *.c source files (cf 5f5aa02bfeb3857e8671f12a85787404ef9fa799).
Most k*.c files differ from their klib upstream versions due to #include
path-related trivia anyway, and knetfile.c in particular needs to see the
newly-added definitions.

- - - - -
72cb247f by Shane McCarthy at 2016-01-27T17:16:37Z
allow spaces in vcf header lines

* allow spaces between keys and values when parsing in header lines
* these spaces will be dropped when writing out the header

e.g. `##reference=<ID=hs37d5 , Source=blah>` and `##reference=<ID=hs37d5, Source = blah >`
     will become `##reference=<ID=hs37d5,Source=blah>`

Fixes samtools/bcftools#266

- - - - -
3fac82bc by Artem Tarasov at 2016-01-27T17:49:58Z
remove code duplication from thread_pool.c

- - - - -
406c7d0a by James Bonfield at 2016-01-27T17:54:38Z
Squash a duplicated conditional into a single one.

It was already done this way in the original t_pool_dispatch()
function, but apparently not in the non-blocking variant.

- - - - -
1d38e31f by Sascha Steinbiss at 2016-01-30T12:18:42Z
use secure Vcs-Git

- - - - -
6cd8493c by Sascha Steinbiss at 2016-01-30T12:18:43Z
update symbols file

- - - - -
9eb53943 by Sascha Steinbiss at 2016-01-30T12:18:43Z
ignore metadata different between tarball and git repo

- - - - -
3e3d9a2c by Sascha Steinbiss at 2016-01-30T12:18:43Z
update d/changelog

- - - - -
6c3a0f56 by Sascha Steinbiss at 2016-01-31T11:36:50Z
set release distribution to unstable

- - - - -
cc07c669 by John Marshall at 2016-02-01T13:21:38Z
Silence faidx_fetch_nseq() deprecation warning in test suite

To test ABI compatibility, test/sam.c calls faidx_fetch_nseq() to check
that it still exists.  The deprecation warning is just noise here.

- - - - -
2805ae80 by John Marshall at 2016-02-02T09:46:44Z
Fix compilation with ancient libcurl (7.18 to 7.21.x)

CURLE_NOT_BUILT_IN was introduced in 7.21.5; hat tip @AndreasHeger.
In theory we should test for this with configure, but libcurl maintains
good records of when things were introduced, and life is short.

- - - - -
4f98dcd2 by James Bonfield at 2016-02-02T14:38:22Z
Improved robustness of fai_build_core.

Fixes samtools/samtools#131

- - - - -
f8441c3b by daviesrob at 2016-02-03T16:32:14Z
Merge pull request #312 from samtools/cram_stats_fix

Fixed a CRAM encoding assertion failure on repeated templates.

- - - - -
1e94726d by John Marshall at 2016-02-04T11:23:43Z
Make local vcf_parse_format() function static [minor]

This means the _vcf_parse_format() symbol no longer appears
in libhts.so, but that was an internal function private to the
library, so affects neither API nor ABI.

- - - - -
cad00ea0 by John Marshall at 2016-02-04T11:35:08Z
Fix format string length computation

Add test case.  Fixes #325.

- - - - -
ed35e415 by John Marshall at 2016-02-05T14:32:08Z
Detect non-numeric characters in numeric format fields

Stop reading on characters-other-than-comma rather than when we see
the expected colon or end-of-field.  Fixes #321, fixes #322, fixes #323.

At present prints an error message and aborts parsing.
TODO We may need to add a new BCF_ERR_* value to |= into v->errcode,
and we need to consider whether plain "return -1" is appropriate.
(Adding a setting to reduce this to a warning would facilitate adding
the reported example VCF records as test cases in test/formatcols.vcf.)
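A minimal sketch of the "stop on anything other than a comma, then check the terminator" rule, assuming integer sub-fields terminated by ':', tab, newline or NUL. `parse_int_list()` is hypothetical and, unlike the real parser, does not handle the "." missing-value marker.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse comma-separated integers such as "1,2,3" at the start of s.
 * After each number, a comma continues the list; ':', '\t', '\n' or NUL
 * ends the field; any other character is an error.  Stores up to max
 * values in out and returns how many numbers were parsed, or -1. */
int parse_int_list(const char *s, long *out, int max)
{
    int n = 0;
    for (;;) {
        char *end;
        errno = 0;
        long v = strtol(s, &end, 10);
        if (end == s || errno == ERANGE) return -1;  /* no digits, or overflow */
        if (n < max) out[n] = v;
        n++;
        if (*end != ',') {
            /* Not a comma: it must be a legitimate end of the field. */
            if (*end == ':' || *end == '\t' || *end == '\n' || *end == '\0')
                return n;
            return -1;
        }
        s = end + 1;
    }
}
```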

- - - - -
67634efa by Steffen Moeller at 2016-02-06T11:31:44Z
Introduced EDAM annotation for TABIX

- - - - -
e25e9710 by Steffen Moeller at 2016-02-06T11:34:13Z
Newly introduced Tabix index file format

- - - - -
5632d5d4 by Steffen Moeller at 2016-02-06T11:37:55Z
Tabix index file format, longer form

- - - - -
59917d4f by Sascha Steinbiss at 2016-02-08T16:01:46Z
add upstream's patch for largefile issue

- - - - -
87e9cf56 by Sascha Steinbiss at 2016-02-08T16:34:42Z
add patch to run configure

- - - - -
a0b32414 by Sascha Steinbiss at 2016-02-08T16:35:17Z
run configure step before build

- - - - -
00bd24c6 by Sascha Steinbiss at 2016-02-08T16:35:26Z
update changelog

- - - - -
231557d3 by Sascha Steinbiss at 2016-02-08T16:46:15Z
Merge branch 'debian/unstable' of ssh://git.debian.org/git/debian-med/htslib into debian/unstable

- - - - -
388bb16a by Sascha Steinbiss at 2016-02-08T16:46:49Z
dch -r

- - - - -
48353e17 by Sascha Steinbiss at 2016-02-08T16:52:01Z
move around team upload notice

- - - - -
acc0763d by James Bonfield at 2016-02-09T14:59:45Z
Adds a warning about RNEXT fields with no matching @SQ record.

Also see samtools/samtools#489

- - - - -
7c84a4aa by James Bonfield at 2016-02-10T15:11:52Z
File format strings can now be specified uppercase or lowercase.

- - - - -
d7ecf685 by James Bonfield at 2016-02-12T15:14:29Z
Extra validity checking when caching a local copy of the reference.

A broken network connection, tested by using our own proxy, causes a
truncated reference to be cached, followed by (continued) breakage when
decoding CRAMs.

We weren't checking for errors from hread() in find_file_url(), so
added them.  However this actually makes no difference as recv() on a
socket closed by the other end just returns 0 - same as EOF. Therefore
a second, more robust check is to validate that the md5sum of the
returned buffer matches the requested md5sum.

[Diff reviewed and approved by Rob, so doing a direct commit.]

- - - - -
62ed90a9 by John Marshall at 2016-02-12T16:32:47Z
Merge file format string case insensitivity (PR #337)

Add scan_keyword() helper function, which could also be used in hts_opt_add().
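Case-insensitive keyword matching of this kind can be sketched as below. `keyword_matches()` is a hypothetical helper, not the actual scan_keyword() signature; it also requires the keyword to end at a non-alphanumeric character, so "bamx" is not mistaken for "bam".

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: does s begin with keyword, ignoring case, and is
 * the keyword followed by a non-alphanumeric character or end of string? */
int keyword_matches(const char *s, const char *keyword)
{
    size_t i;
    for (i = 0; keyword[i] != '\0'; i++)
        /* A shorter s hits its NUL here and fails the comparison,
         * so we never read past its end. */
        if (tolower((unsigned char) s[i]) != tolower((unsigned char) keyword[i]))
            return 0;
    return !isalnum((unsigned char) s[i]);
}
```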

- - - - -
eb3481f9 by James Bonfield at 2016-02-18T16:51:38Z
Bug fix to CRAM required_fields option.

(See also io_lib SVN r4003.)

Initialise cram_flags when required_fields indicates it is not
needed.  This avoids a valgrind error in
valgrind ./test/test_view -i required_fields=0x3f foo.cram

Changed default sequence length to 0 instead of 1.  I've no idea why
it was 1, but it causes seq to come out as "N" instead of "*".  The
former breaks samtools as seq is a different length to the cigar
string.  The change now means that omitting sequence entirely is
possible and still yields a valid BAM stream.

- - - - -
fc9aeb6f by John Marshall at 2016-02-19T10:52:17Z
Add internal isdigit_c()/etc functions for plain chars

Using `char c; ... isdigit(c)` is technically incorrect (as the
<ctype.h> functions expect an int such as is returned by fgetc()) and
produces a warning on Windows.  See also CERT STR37-C's explanation.

Add our own *_c() functions operating on plain chars to hts_internal.h,
which also allows us to in future reimplement them directly to make them
immune to locales.

Use *_c() instead of casting to unsigned char in hts.c and vcf.c.
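A sketch of plain-char wrappers along these lines. The function names follow the commit's *_c() convention, but the bodies here are an illustration rather than a copy of hts_internal.h. The cast to unsigned char keeps negative char values (bytes at or above 0x80 where char is signed) away from the <ctype.h> functions, whose argument must be EOF or representable as unsigned char.

```c
#include <ctype.h>

/* Plain-char wrappers: convert to unsigned char before calling <ctype.h>. */
static inline int isdigit_c(char c) { return isdigit((unsigned char) c); }
static inline int isspace_c(char c) { return isspace((unsigned char) c); }
static inline int toupper_c(char c) { return toupper((unsigned char) c); }
```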

- - - - -
a918794d by John Marshall at 2016-02-19T10:52:17Z
Interpret digits directly in parse_version()

The buffer is not NUL-terminated, so using atoi() may be problematic.
Consider e.g. "##fileformat=VCFv4. -1" which contains delimiters for
/[0-9]+[.][0-9]+/ but ordinary characters for atoi().  Hat tip @daviesrob.

(Valid format versions will have at most a couple of digits, so we are
not too concerned about overflow here.)
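Interpreting digits directly within a length bound might look like the sketch below; `parse_major_minor()` is a hypothetical name, not the real parse_version(). It only accepts text matching [0-9]+[.][0-9]+ within the first `len` bytes, so input like "4. -1" is rejected instead of being read by atoi() as minor version -1. As the commit notes, overflow is not guarded against.

```c
#include <stddef.h>

/* Parse "<major>.<minor>" from the first len bytes of buf, which need
 * not be NUL-terminated.  Returns 0 on success, -1 on a mismatch. */
int parse_major_minor(const char *buf, size_t len, int *major, int *minor)
{
    size_t i = 0;
    int v = 0;
    if (i >= len || buf[i] < '0' || buf[i] > '9') return -1;
    while (i < len && buf[i] >= '0' && buf[i] <= '9')
        v = v * 10 + (buf[i++] - '0');
    *major = v;

    if (i >= len || buf[i] != '.') return -1;
    i++;

    v = 0;
    if (i >= len || buf[i] < '0' || buf[i] > '9') return -1;
    while (i < len && buf[i] >= '0' && buf[i] <= '9')
        v = v * 10 + (buf[i++] - '0');
    *minor = v;
    return 0;
}
```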

- - - - -
091c89ca by John Marshall at 2016-02-19T11:35:06Z
Avoid tautology warning [minor]

Rather than being malloc()ed, ch->codecs is an array so is by definition
not NULL.

- - - - -
f4718299 by John Marshall at 2016-02-24T11:55:56Z
Wrap load_hfile_plugins() in a critical section

- - - - -
64b788f6 by jenniferliddle at 2016-02-26T09:47:12Z
Avoid unnecessary warning on unknown sort order

- - - - -
ecdc3489 by James Bonfield at 2016-03-01T12:04:12Z
Fixed build with -DTEST_MAIN.

This enables a little main() function for standalone testing, which
apparently hasn't worked in a while.  It's purely debug only and has
no impact on the library usage.

- - - - -
30fb013d by John Marshall at 2016-03-01T13:44:27Z
Simplify hts_idx_save_core() endianness handling

Use idx_write_[u]int{32,64}() functions to abstract endianness handling
away from hts_idx_save_core(), removing the need for two separate code
paths.  (Fixes n_no_coor big-endian bug.)

Use "wu" to open BAI files as uncompressed BGZF files.  With this,
hts_idx_save_core() writes only to BGZF*, so idx_write() can disappear.
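The single-code-path idea behind helpers like idx_write_uint32() can be sketched as follows. For simplicity, this sketch encodes into a byte buffer that a caller would then pass to its writer; the real helpers write to a BGZF*, and `put_u32_le()`/`put_i32_le()` are hypothetical names. Shifts act on values rather than on memory layout, so the same code produces little-endian output on any host.

```c
#include <stdint.h>

/* Store x into buf[0..3] in little-endian order, whatever the host's
 * byte order. */
void put_u32_le(unsigned char buf[4], uint32_t x)
{
    buf[0] = (unsigned char) (x & 0xff);
    buf[1] = (unsigned char) ((x >> 8) & 0xff);
    buf[2] = (unsigned char) ((x >> 16) & 0xff);
    buf[3] = (unsigned char) ((x >> 24) & 0xff);
}

/* Signed values reuse the unsigned path via their two's-complement bits. */
void put_i32_le(unsigned char buf[4], int32_t x)
{
    put_u32_le(buf, (uint32_t) x);
}
```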

- - - - -
3b2de7b7 by John Marshall at 2016-03-01T15:10:13Z
Incorporate test_view.pl into the test.pl harness

Interrupting the tests with <Ctrl-C> now works.  The previous test
harness used Perl's system(), which ignores interrupt signals.

- - - - -
820689c4 by John Marshall at 2016-03-01T15:32:57Z
Write failure messages to stderr

No difference by default, but enables highlighting the failures or
capturing them by redirecting stderr.

- - - - -
c5187a88 by John Marshall at 2016-03-03T11:31:12Z
Tidy up hts_idx_save_core() / hts_idx_save_as() code

Clarify loops; lift bgzf_open/bgzf_close out of the per-format code.

- - - - -
26b3085c by John Marshall at 2016-03-04T11:26:31Z
Constify cram_encoding2str() and remove duplicate function [minor]

This returns a string literal, so const char *.  Remove codec2str()
which just duplicates it.

- - - - -
dd69d7a4 by kirkmcclure at 2016-03-07T11:23:57Z
Report zlib error codes in bgzf.c

- - - - -
c32590a4 by Rob Davies at 2016-03-08T09:21:57Z
Remove abort() and improve multi-threaded worker error handling

Get rid of the calls to abort() and return error codes instead.

Make worker_aux() give up in a cleaner fashion if it spots an error.  In
particular, after setting its error flag and tidying up, it ensures that
w->mt->proc_cnt is incremented so that mt_flush_queue() won't get stuck
in an endless loop waiting for it to finish.  It then returns 0 so it
will be called again, where it can wait for done to be signalled in
the normal way.

Ensure that mt_flush_queue() only writes data if no errors were reported
from the threads (otherwise the output would have missing blocks).

- - - - -
6d5124b5 by Rob Davies at 2016-03-10T11:21:07Z
Add HTS_RESULT_USED to some function prototypes and fix resulting warnings.

HTS_RESULT_USED is __attribute__ ((warn_unused_result)) on gcc and clang.
Add it to some functions that return error codes as an int, to ensure they
are checked.  Notable exceptions are *_close functions, which lead to too
many false positives due to them being called in error clean-up code.

The warnings are fixed, along with some other obvious cases where errors
are not being handled properly.
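A sketch of how such an annotation macro can be defined portably. `MY_RESULT_USED` and `flush_buffers()` are hypothetical; the compiler check HTSlib uses for HTS_RESULT_USED may differ from the simple `__GNUC__`/`__clang__` test shown here. On other compilers the macro expands to nothing, so headers still compile.

```c
/* Expand to the GCC/clang attribute where available, else to nothing. */
#if defined(__GNUC__) || defined(__clang__)
#define MY_RESULT_USED __attribute__ ((warn_unused_result))
#else
#define MY_RESULT_USED
#endif

/* Hypothetical function whose error code callers must check. */
MY_RESULT_USED int flush_buffers(int simulate_failure);

int flush_buffers(int simulate_failure)
{
    return simulate_failure ? -1 : 0;
}
```

A bare statement `flush_buffers(0);` would then draw a -Wunused-result warning, while `if (flush_buffers(0) < 0) ...` would not.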

- - - - -
fc9c7bde by John Marshall at 2016-03-10T13:44:12Z
Fix warnings; don't mark bgzf_mt() as HTS_RESULT_USED

bgzf_mt() is mostly advisory, so don't mark it as HTS_RESULT_USED.

- - - - -
52204884 by Rob Davies at 2016-03-10T14:41:19Z
Tidy up error reporting and fix possible memory leaks.

Turn bgzf_zerr into a function that returns strings instead of having
it write to a file directly.  Also don't export it any more as it isn't
very useful outside bgzf.c.

Improve error checking in bgzf_write_init.  Put better clean-up code in
bgzf_write_init and bgzf_index_load to avoid memory and file leaks.

- - - - -
aef535ca by Rob Davies at 2016-03-11T09:03:45Z
Makes bgzf_zerr return the z_stream msg parameter where useful.

Unfortunately zlib doesn't set msg for every possible error return, so
this only uses it in the cases where it might be set to something that
is better than the static messages in bgzf_zerr, and not likely to be
a stale message.  In practice this is mainly when inflate returns
Z_DATA_ERROR.

- - - - -
27d1c805 by John Marshall at 2016-03-11T14:53:08Z
Simplify bgzf_index_dump() / bgzf_index_load() endianness handling

Use f{read,write}_uint64() functions to abstract endianness handling
away from the main functions, removing the need for two separate
code paths.
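The same approach extends to 64-bit values and to the reading side. This sketch uses hypothetical `put_u64_le()`/`get_u64_le()` functions on byte buffers, not the actual fwrite_uint64()/fread_uint64() helpers. Decoding assembles the value from the most significant byte down, which again needs no host-endianness test.

```c
#include <stdint.h>

/* Encode x as 8 little-endian bytes. */
void put_u64_le(unsigned char buf[8], uint64_t x)
{
    for (int i = 0; i < 8; i++)
        buf[i] = (unsigned char) (x >> (8 * i));
}

/* Decode 8 little-endian bytes back into a uint64_t. */
uint64_t get_u64_le(const unsigned char buf[8])
{
    uint64_t x = 0;
    for (int i = 7; i >= 0; i--)
        x = (x << 8) | buf[i];
    return x;
}
```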

- - - - -
5523a23b by Martin O. Pollard at 2016-03-14T16:52:32Z
Fix tabix man page CSI option

-C --csi had duplicate of previous line. Add proper CSI mode
description (Fixes #348).

- - - - -
fc93dfc1 by James Bonfield at 2016-03-15T14:06:33Z
More efficient index loading.

There is no need to load all the unmapped slices in the nested
containment list as a nested list.  All being the same size doesn't
imply containment in this special case and it's more efficient to load
them as a normal linked list instead.

- - - - -
b2f29c86 by James Bonfield at 2016-03-16T09:55:25Z
Minor fix to lzma error message.

- - - - -
3ea35c93 by James Bonfield at 2016-03-16T14:52:08Z
Added a safe_ltf8_decode function to go along with the itf8 variant.

Also fixed a potential speed issue when doing multi-threaded
encoding.  This was fixed in io_lib svn r3956, but for some reason not
duplicated here (unless there was a reason it didn't count, which
escapes me if so).

Plus a couple of harmless/tiny code tweaks to improve diffing against
io_lib.

- - - - -
c62d1ff0 by James Bonfield at 2016-03-16T15:01:48Z
Fixed multi threaded partial decoding. (Eg quitting after N reads)

This triggered a bug where the decoding cram pool was flushed during
closing, attempting to write a container (to a file descriptor opened
for read, leading to crashes due to missing buffers).

See io_lib r3988.

- - - - -
f1eb18dd by James Bonfield at 2016-03-16T15:12:10Z
Fixed a bug where we weren't setting fd->first_container on V2.x or
V3.x files (io_lib r3789).

This doesn't have any impact yet, but will once we make the index work
on a file read from a pipe (io_lib r3790).  I'm not sure that's so
useful though; it is better to index on-the-fly during creation.

- - - - -
22ef7152 by James Bonfield at 2016-03-17T11:48:30Z
Code changes to synchronise cram_encode.c (mostly) with io_lib.

- Various minor formatting changes, to make diff easier.
- Bug fix to cram_compress_slice compression of block zero.  Generally
  no effect, but now done explicitly.
- A few assertions now return error codes instead.
- New cram_update_curr_slice function (svn/3976,3978), fixing issues
  with explicit flushing part way through the encoder.  (Not done in our
  code, so no effect here, but it's tidier code.)

- - - - -
66079072 by James Bonfield at 2016-03-17T12:48:24Z
Merged in several cram_codecs changes from io_lib.

- Cache the cram_block in the codec.  This makes lookup efficient when
  the content_id is high (produced by cramtools / scramble aux tags).

- Fixed various AFL identified issues with reading past buffers during
  decode (see r3920, 3928, 3929).

- Better error reporting.

- Sanitised "value" vs "val".

- - - - -
99f112c1 by James Bonfield at 2016-03-18T10:45:41Z
Improved error checking, copied over from io_lib.

Also some minor changes of no consequence; removal of dead code and
signed changes, purely to make the code easier to diff against
io_lib.

- - - - -
1a050b4f by James Bonfield at 2016-03-18T14:11:47Z
Fixed memory allocation issue when decoding CRAM.

This bug has existed a long time, but is now more likely to be triggered
since 6d2810cdbbbe4646dcf7a4fab1003aa99319c55a, which made overrunning
the buffer more likely.

The issue is caused by zero length sequences ("*") with a CIGAR
alignment.  A full bullet-proof fix will require more work to use the
...decode_block() codec variants, but this fixes the common case of
length 0.

- - - - -
c6187714 by Isaac Turner at 2016-03-20T21:20:38Z
vcf.h:vcf_parse() add documentation

Add a comment to vcf_parse() to document that the input line must not end with a newline character.

- - - - -
5ff6cf00 by James Bonfield at 2016-03-21T15:34:51Z
Added a few prebuilt CRAM files.

These are from Staden io_lib, which exposed the buffer overrun fixed
in 1a050b4fc9d429a821b2ed7085b5d68a0b7fcc51. The CRAMs have been made
by a version of Cramtools-3.0.jar and are designed for
interoperability testing.

- - - - -
ad6fa248 by John Marshall at 2016-03-21T16:13:24Z
Set Git attributes for BAM and CRAM files in the repository

- - - - -
5af1b932 by John Marshall at 2016-03-21T16:34:28Z
Change test output filenames

...to match the patterns that are already git-ignored and cleaned by
"make testclean".

- - - - -
1dee8d5a by John Marshall at 2016-03-21T16:40:56Z
Merge CRAM decoding memory allocation fixes (PR #357)

- - - - -
103f55a7 by John Marshall at 2016-03-21T21:03:47Z
Change test output filenames [minor]

Output SAM files use "*.sam_" to avoid being picked up as input
files by test_view()'s glob().  Oops.

- - - - -
a061cc21 by James Bonfield at 2016-03-22T12:08:24Z
Added a BASES_PER_SLICE hts option.

This is used in CRAM to set an upper limit on the number of bases per
slice in addition to the sequences per slice.  The default value is 5
million, equivalent to 10,000 reads (the default seqs per slice) each
500bp long.  Note that adjusting seqs_per_slice does not automatically
scale bases_per_slice.

The reason for this is to prevent excessive memory usage when encoding
or decoding long read technology data (eg PacBio or ONT).
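A minimal sketch of a slice being closed by whichever limit is reached first, using the defaults quoted above (10,000 sequences, 5,000,000 bases). The structs and `slice_is_full()` are hypothetical, and accepting at least one read into an empty slice, however long, is an assumption of this sketch rather than documented HTSlib behaviour.

```c
#include <stdint.h>

/* Hypothetical, independent per-slice limits. */
struct slice_limits {
    int seqs_per_slice;       /* e.g. 10000 */
    int64_t bases_per_slice;  /* e.g. 5000000 */
};

struct slice_state {
    int nseqs;
    int64_t nbases;
};

/* Returns 1 if adding a read of read_len bases should start a new slice. */
int slice_is_full(const struct slice_state *s, const struct slice_limits *lim,
                  int64_t read_len)
{
    if (s->nseqs == 0) return 0;  /* assumption: an empty slice takes any read */
    return s->nseqs + 1 > lim->seqs_per_slice
        || s->nbases + read_len > lim->bases_per_slice;
}
```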

- - - - -
38578756 by John Marshall at 2016-03-22T14:05:37Z
Add BCF_ERR_CHAR bcf1_t::errcode and improve diagnostic

Show the exact invalid character for unprintable characters too;
improves error messages such as in #358.

- - - - -
53c064cd by John Marshall at 2016-03-23T10:53:02Z
Check dummy header IDs can be retrieved

When we cons up a dummy header for unknown contigs and tags, ensure
that the user-supplied ID is sufficiently syntactically-correct that
the resulting header can be parsed and the same ID retrieved.

Fixes #324 and several test cases by @daviesrob.

- - - - -
ee98182c by James Bonfield at 2016-03-23T14:12:14Z
CRAM encoding now puts auxiliary tags in their own blocks.

This makes it easier and faster to pull out some aux types while
omitting others.  It may also give a small improvement in compression
ratios.

-rw-r--r-- 1 jkb team117 6074896 Mar 23 14:15 /tmp/_new.cram
-rw-r--r-- 1 jkb team117 6094754 Mar 23 14:14 /tmp/_old.cram

New cram blocks ("cram_size" tool output):

Block content_id      32, total size       22432    rR TL
Block content_id 4279619, total size       14811 g  rR AMC
Block content_id 4342618, total size        7744 g     BCZ
Block content_id 5330010, total size      102272     R QTZ
Block content_id 5459267, total size       14223 g   R SMC
Block content_id 5779523, total size        7562 g     X0C
Block content_id 5779539, total size         272 g     X0S
Block content_id 5779779, total size        8193 g     X1C
Block content_id 5779795, total size         417 g     X1S
Block content_id 5783898, total size       59925 g     XAZ
Block content_id 5784387, total size        1063 g     XCC
Block content_id 5785411, total size        3198    r  XGC
Block content_id 5786947, total size       23071    r  XMC
Block content_id 5787459, total size        2729    r  XOC
Block content_id 5788737, total size        5825     R XTA
Block content_id 6370115, total size        1007 g     a3C
Block content_id 6383683, total size          89 g     ahC

Old cram blocks:

Block content_id       1, total size      104953 g     SMC AMC X1C X0C XGC XCC XOC X0S a3C ahC X1S XTA XMC
Block content_id       7, total size      102272     R QTZ
Block content_id       8, total size        7744 g     BCZ
Block content_id       9, total size       59925 g     XAZ
Block content_id      32, total size       22432    rR TL
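Judging from the listing, each new per-tag block appears to take its content_id from the two-character tag name and the type code, packed into one integer.  For example, X0C gives ('X' << 16) | ('0' << 8) | 'C' = 5779523.  The sketch below reproduces the listed IDs under that assumption.  aux_content_id() is a hypothetical helper, not an HTSlib function.

```c
#include <stdint.h>

/* Hypothetical helper: pack an aux tag name and type into a block ID,
 * tag[0] in bits 16-23, tag[1] in bits 8-15, type code in bits 0-7.
 * This is inferred from the content_id values above and is not a quote
 * of the CRAM encoder. */
static int32_t aux_content_id(const char tag[2], char type)
{
    return ((int32_t)(unsigned char)tag[0] << 16)
         | ((int32_t)(unsigned char)tag[1] << 8)
         |  (int32_t)(unsigned char)type;
}
```

Under this reading, a decoder that wants only one tag (say XA:Z, ID 5783898) can presumably locate its block by ID and skip the others.  This fits the stated aim of pulling out some aux types while omitting the rest.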

- - - - -
68867eaa by John Marshall at 2016-03-29T10:15:29Z
Miscellaneous VCF parsing fixes

When scanning header field values, be sure to stop at the end of the
line.  Fixes  ##FILTER=<ID=i"<">  test case.

When scanning quoted header field values, skip the terminating quote
only if it's present.  Fixes devious  A,B="C  CHROM column test case.

When removing trailing spaces, don't trim past the start of the value.
Fixes  ##INFO=<ID=BADNUM,Number= ,Type=Integer>  and  FORMAT-spaces-only
test cases.

Parse FORMAT columns until the end of the input kstring_t.
Fixes valgrind-detected bugs when there are (invalid) NUL characters
in the FORMAT columns.

Fix loop condition to stop on unbalanced '>' characters (i.e., when
nopen goes negative).

Hat tip @daviesrob for the test cases.

- - - - -
1afaf0c7 by John Marshall at 2016-03-29T14:02:31Z
Also escape quotes and backslash in dump_char() [minor]

- - - - -
370c1ed7 by James Bonfield at 2016-03-30T11:36:20Z
Add prototype for cram_update_curr_slice to avoid warning in cram_io.c.

- - - - -
8f782d12 by John Marshall at 2016-03-30T12:05:32Z
VCF GT parsing fix

Also allow for omitted trailing GT FORMAT fields.  Compare 5c65920de10f3068802b6d4df1be86cfcd2ce5cd.
This is not valid VCF (in particular, because GT when present must
be the first sub-field), but we shouldn't write out-of-bounds if we
encounter it.  Hat tip @cryptoad.

Add test case, in which the non-last GT fields are misprinted as "2,4"
because the gt_i code in vcf_format() cannot handle multiple GT sub-fields
(which is invalid anyway).  Integer arrays and GT are represented the same
way in memory, so fortunately this does not have OOB reading implications.

- - - - -
7bf85c25 by James Bonfield at 2016-03-30T12:54:07Z
Sped up rANS decoders by 6% (O0) to 10% (O1).

This is largely achieved by reducing the size of the ari_decoder
struct.  It had a lot of temporary data there which was needed to
create the lookup tables but not needed during decoding.

The R[] array is also no longer zeroed via calloc/memset, as it's entirely defined
by the decoder and can therefore start off uninitialised.

Also slightly reordered the decoding code as this seems to give a
small improvement to speed.

The overall speedup for samtools as a whole when decoding CRAM is
around 5%, mostly achieved through a reduction in idle frontend cycles
from 36% to 35%.  This is still too high for my liking, though.

- - - - -
653f45ef by Rob Davies at 2016-04-04T10:21:24Z
Catch negative IDX tags; treat like other IDX parse failures.

Fixes an invalid read error triggered by negative IDX values.

- - - - -
55ff37c3 by Rob Davies at 2016-04-05T14:26:52Z
Correct constant passed to bcf_hdr_id2int in bcf_hdr_check_sanity.

Change BCF_HL_FMT to BCF_DT_ID when looking up GL header id.
The GL check can't have worked correctly, and using the wrong dictionary
caused odd things to happen if you used GL as a sample name.

- - - - -
33de0e93 by Rob Davies at 2016-04-06T11:12:53Z
Limit number of FORMAT items to maximum allowed by bcf1_t n_fmt field

bcf1_t n_fmt is an 8 bit bitfield, so can only store values up to 255.
vcf_parse_format didn't check for overflow of n_fmt while counting
the number of FORMAT items.  This could result in allocation of an
under-sized array, followed by invalid writes when storing the format
information.

To fix this, replace an alloca call with a fixed-sized array (as the
maximum size is small and known).  Remove the code that counts the items
up-front, as it's no longer needed.  Finally, make the loop that iterates
through the FORMAT string bail out with an error if it counts too many
items.
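Below is a minimal sketch of the pattern described above: a fixed-size array bounded by what an 8-bit field can hold, and a loop that fails instead of wrapping.  split_format(), MAX_N_FMT and struct rec_sketch are hypothetical names, not HTSlib's vcf_parse_format() code.

```c
#include <string.h>

#define MAX_N_FMT 255  /* largest count an 8-bit bitfield can represent */

/* Hypothetical stand-in for the relevant part of bcf1_t. */
struct rec_sketch { unsigned n_fmt : 8; };

/* Split a FORMAT string such as "GT:DP:GQ" into key pointers stored in a
 * fixed-size array.  Return the key count, or -1 as soon as there are
 * more keys than the array (and an 8-bit counter) can hold. */
static int split_format(const char *fmt, const char *keys[MAX_N_FMT])
{
    int n = 0;
    const char *p = fmt;

    if (*p == '\0')
        return 0;
    for (;;) {
        if (n == MAX_N_FMT)
            return -1;              /* bail out rather than overflow */
        keys[n++] = p;
        const char *colon = strchr(p, ':');
        if (colon == NULL)
            break;
        p = colon + 1;
    }
    return n;
}
```

Storing 256 into the 8-bit n_fmt field silently yields 0, which is why counting first and allocating from that count could produce an under-sized array.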

- - - - -
601f7e52 by Martin O. Pollard at 2016-04-12T12:22:58Z
Add read error check to fai_load

fai_load was invoking fai_read which did not check for ferror() on
read.  (Fixes #315.)  Add some system error reporting to fai_load.

This removes fai_read() as a public symbol in libhts.so, but this
function does not appear in a public header file and so is not part of
the HTSlib API.  So there are no ABI implications of this removal.

- - - - -
d557cb01 by John Marshall at 2016-04-15T15:21:18Z
Various minor tidy ups

bgzf.c: in bgzf_close(), don't record errors inside  fp  as it's being
destroyed anyway.

faidx.c: fai_build() has already printed a diagnostic.

vcf.c: check that bgzf_read() has read the complete length expected;
be consistent with NULL vs 0.

- - - - -
5285dc03 by John Marshall at 2016-04-15T15:40:12Z
Merge error checking improvements (PR #271)

Omitted the bgzf.c s/count/BGZF_BLOCK_SIZE/ commit (1801353cda1c6f9ab678efe9cf059ba2584f4c90)
as it seems misguided.

- - - - -
0e5af188 by Rob Davies at 2016-04-18T09:06:01Z
Ensure vcf_parse_format always fills out end of vector values.

vcf_parse_format could skip filling out end of vector values if the
last sample record on a line finished early without including all formats.
This could lead to use of uninitialized memory later on.  Fix by moving
the code that fills end of vector values outside the loop that reads a
sample record, to ensure it always gets run.
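The essence of this fix is loop structure: the padding must run after the parsing loop, however that loop exits.  Here is a minimal sketch of the pattern for one sample's integer values, assuming a fixed vector width.  parse_sample_ints() and SKETCH_INT32_VECTOR_END are hypothetical (HTSlib's own marker is bcf_int32_vector_end), and this is not vcf_parse_format() itself.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for HTSlib's int32 "end of vector" marker. */
#define SKETCH_INT32_VECTOR_END (INT32_MIN + 1)

/* Parse up to `width` comma-separated integers into out[], then pad
 * the remainder.  Returns the number of values actually parsed. */
static int parse_sample_ints(const char *s, int32_t *out, int width)
{
    int n = 0;

    while (n < width && *s != '\0') {
        char *end;
        long val = strtol(s, &end, 10);
        if (end == s)               /* not a number: stop parsing */
            break;
        out[n++] = (int32_t)val;
        if (*end != ',')            /* sample text finished early */
            break;
        s = end + 1;
    }
    /* Outside the loop, so every exit path leaves out[] fully set. */
    for (int i = n; i < width; i++)
        out[i] = SKETCH_INT32_VECTOR_END;
    return n;
}
```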

- - - - -
314cd1ec by James Bonfield at 2016-04-19T13:31:59Z
Populating CRAM cache is now much less likely to have a race
condition.

On filesystems that don't support exclusive fopen, it was possible
to trigger a race condition causing the downloaded data to become
corrupt.  We now randomise the filenames to make this much less
likely.

The data is also md5sumed before writing.  (Logically we'd check after
writing, but if we've hit an NFS issue with multiple clients writing
to the same file then we're unlikely to detect that anyway as each
system will just read back what it wrote from its own local cache.)

- - - - -
e7782691 by James Bonfield at 2016-04-19T14:04:32Z
Removed the MD5 checking code.

This had subsequently been rewritten at a different point,
in commit d7ecf685f686541c6e2badd100594f2134206c92.

- - - - -
a9529a47 by John Marshall at 2016-04-20T10:32:17Z
Print message when first creating default cache directory

When $REF_CACHE is unset, if we write reference cache files we write
them within the user's home directory (typically ~/.cache/hts-ref).
Print a message when actually creating this directory, to alert an
unsuspecting user that we may be going to use a lot of disk space
within their home directory.  Hat tip @tk2.

If $REF_CACHE is set, it is non-trivial to determine which level is
a "root" directory to trigger this from.  In this case, we assume
that setting an environment variable indicates that the user is aware
of what's going on.

- - - - -
cf3e55ce by John Marshall at 2016-04-20T13:54:03Z
Avoid random number generator library functions

We are a library, so shouldn't use random()/srandom() as the client
application code may be using them and not expecting someone else to
be interfering with their entropy.

Instead base the unique temporary file name directly on process id,
thread id, current time (using plain C90 functions), and the integer
value of a pointer.

- - - - -
1bc63985 by John Marshall at 2016-04-20T15:04:51Z
Merge uniquified ref cache temporary filename (PR #320)

- - - - -
50c4f39d by John Marshall at 2016-04-21T10:16:14Z
Avoid pointer/integer cast warning

It was debatable how much ASLR randomness this might have added.
Instead just make thrid an incrementing counter, to alter the filename
even if we go around this loop faster than the clock resolution reported
by time() and clock().  Hat tip @daviesrob, @jkbonfield.
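Here is a minimal sketch of a temporary name built from the ingredients listed in these two commits: process id, an incrementing counter, time() and clock().  make_tmp_name() and its ".tmp_" format are hypothetical, not the HTSlib code.  The static counter is not thread-safe as written.

```c
#include <stddef.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Write "<path>.tmp_<pid>_<counter>_<time>_<clock>" into buf.
 * The counter changes the name even when two calls happen within
 * the resolution of time() and clock().  No library RNG is used,
 * so the application's random()/srandom() state is left alone.
 * Returns 0 on success, -1 if buf is too small. */
static int make_tmp_name(char *buf, size_t size, const char *path)
{
    static unsigned counter = 0;    /* hypothetical; not thread-safe */
    int n = snprintf(buf, size, "%s.tmp_%ld_%u_%ld_%ld", path,
                     (long)getpid(), counter++,
                     (long)time(NULL), (long)clock());
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The resulting file would still be opened exclusively (O_EXCL, or hopen()'s "wx" mode in a later commit).  The distinctive name only matters where exclusive creation is unreliable.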

- - - - -
eeec6499 by John Marshall at 2016-04-21T11:23:26Z
Merge vcf_parse_format() end-of-vector bug fix (PR #370)

Clarify loop invariant: fmt[j] is the first unfilled-in fmt_aux_t.

- - - - -
0f298ce2 by John Marshall at 2016-04-22T08:45:12Z
Release 1.3.1: bug fix release, notably error checking

- - - - -
20648485 by John Marshall at 2016-04-22T13:44:15Z
Merge version number bump and NEWS file from master

- - - - -
9919fca1 by John Marshall at 2016-04-25T08:35:23Z
Use internal plain char isdigit_c()/etc ctype functions

See fc9aeb6f77668afed412119701c5c58b0fca8091.

- - - - -
543f030c by John Marshall at 2016-04-25T10:52:37Z
Add {errmod,kprobaln,bam_md}.[ch] from samtools

This begins the process of adding these functions previously in
samtools to the HTSlib API.  These files are identical to those
in the samtools/samtools@1.3.1 release.

- - - - -
96acc6c5 by John Marshall at 2016-04-25T10:58:13Z
Retain only library parts of realn.c; canonicalise whitespace

Samtools's bam_md.c contains both library functions and the calmd
command; remove the non-library-function parts and rename to realn.c.

Detab kprobaln.c and kprobaln.h.

- - - - -
36bf59a2 by Charles Plessy at 2016-04-25T11:53:00Z
Merge tag '1.3.1' into debian/unstable

HTSlib release 1.3.1: bug fix release, notably error checking

* Improved error checking and reporting, especially of I/O errors when
  writing output files (#17, #315, PR #271, PR #317).

* Build fixes for 32-bit systems; be sure to run configure to enable
  large file support and access to 2GiB+ files.

* Numerous VCF parsing fixes (#321, #322, #323, #324, #325; PR #370).
  Particular thanks to Kostya Kortchinsky of the Google Security Team
  for testing and numerous input parsing bug reports.

* HTSlib now prints an informational message when initially creating a
  CRAM reference cache in the default location under your $HOME directory.
  (No message is printed if you are using $REF_CACHE to specify a location.)

* Avoided rare race condition when caching downloaded CRAM reference sequence
  files, by using distinctive names for temporary files (in addition to O_EXCL,
  which has always been used).  Occasional corruption would previously occur
  when multiple tools were simultaneously caching the same reference sequences
  on an NFS filesystem that did not support O_EXCL (PR #320).

* Prevented race condition in file access plugin loading (PR #341).

* Fixed mpileup memory leak, so no more "[bam_plp_destroy] memory leak [...]
  Continue anyway" warning messages (#299).

* Various minor CRAM fixes.

* Fixed documentation problems #348 and #358.

- - - - -
c2ca1bfa by Charles Plessy at 2016-04-25T11:55:19Z
First go back to debian/unstable; otherwise debian/watch is not there.

- - - - -
b9ed6fcb by Charles Plessy at 2016-04-25T12:13:04Z
A To Do list.

- - - - -
559f8b44 by Charles Plessy at 2016-04-25T12:14:26Z
Remove add_largefile.patch, applied upstream.

- - - - -
7163afb3 by Charles Plessy at 2016-04-25T12:17:02Z
Update symbols file. Missing _vcf_parse_format@Base and fai_read@Base.

- - - - -
973e9ac5 by Charles Plessy at 2016-04-25T12:26:25Z
Updated changelog.

- - - - -
3b219b8e by Charles Plessy at 2016-04-25T12:31:27Z
Enable libcurl.

- - - - -
ac22c29c by Charles Plessy at 2016-04-25T12:35:54Z
Enabled libcurl.  Do not enable plugins (no plugins available...).

- - - - -
55b2ae46 by John Marshall at 2016-04-25T12:47:35Z
Add errmod declarations to htslib/hts.h and errmod.o to Makefile

There are only three functions and one type (which can be opaque) to be
added to the API, so declare them in htslib/hts.h rather than their own
header file.  Now that errmod_t is opaque, refactor errmod.c to use its
coefficients directly there rather than via another errmod_coef_t struct.

- - - - -
1e6909ba by John Marshall at 2016-04-25T12:47:46Z
Rename kpa_glocal() to probaln_glocal() and add to htslib/hts.h

There are only four declarations to be added to the API, so declare them
in htslib/hts.h rather than their own header file.  The kpa_ prefix was
obscure and misleading (as this has always been local to samtools and
never part of klib), so change the prefix to probaln_ now that this is
being made public.  Rename kprobaln.c to probaln.c similarly.

(TODO) Add documentation in hts.h, based on the explanation in probaln.c.

- - - - -
8439061c by John Marshall at 2016-04-25T12:47:55Z
Rename as sam_cap_mapq()/sam_prob_realn() and add to htslib/sam.h

Regularise the names of bam_cap_mapQ() and bam_prob_realn_core() now
that they are being made public.

(TODO) Add documentation in sam.h.

- - - - -
622bfaab by John Marshall at 2016-04-25T12:48:03Z
Remove probaln_par_def/probaln_par_alt constants

These appear to be unused outside sam_prob_realn(), so avoid adding them
to the public API.  We can re-add them at a later date if necessary.

- - - - -
c2c1df36 by Charles Plessy at 2016-04-25T13:19:16Z
Build-depend on libcurl4-gnutls-dev and libssl-dev.

- - - - -
8b103477 by Charles Plessy at 2016-04-25T13:19:16Z
New upstream release; no new copyright nor license notice.

- - - - -
91015809 by Charles Plessy at 2016-04-25T13:19:16Z
Conforms to the Policy version 3.9.8.

- - - - -
6e232693 by Charles Plessy at 2016-04-25T13:20:32Z
htslib (1.3.1-1) unstable; urgency=medium

  36bf59a Merge tag '1.3.1' into debian/unstable
  559f8b4 Remove add_largefile.patch, applied upstream.
  7163afb Update symbols file. Missing _vcf_parse_format@Base and fai_read@Base.
  c2c1df3 Build-depend on libcurl4-gnutls-dev and libgnutls28-dev.
  3b219b8 Enable libcurl.
  9101580 Conforms to the Policy version 3.9.8.

 -- Charles Plessy <plessy at debian.org>  Mon, 25 Apr 2016 22:19:34 +0900

- - - - -
68a17aa2 by Charles Plessy at 2016-04-25T13:28:24Z
htslib (1.3.1-1) unstable; urgency=medium

  36bf59a Merge tag '1.3.1' into debian/unstable
  559f8b4 Remove add_largefile.patch, applied upstream.
  7163afb Update symbols file. Missing _vcf_parse_format@Base and fai_read@Base.
  c2c1df3 Build-depend on libcurl4-gnutls-dev and libssl-dev.
  3b219b8 Enable libcurl.
  9101580 Conforms to the Policy version 3.9.8.

 -- Charles Plessy <plessy at debian.org>  Mon, 25 Apr 2016 22:28:11 +0900

- - - - -
d8c1ca3d by James Bonfield at 2016-04-25T14:04:27Z
Removed use of CRAM "CORE" block while encoding.

Previously the cram_stats_encoding() function was written to
roughly minimise the storage requirements.  As such it used the HUFFMAN or
BETA codec for low-volume data series and EXTERNAL for larger ones.
This avoided any block overheads.  (Actually there was a lot of
additional commented-out code to deal with the subexponential codec too.)

However, this comes at the price that (more or less) any field may
potentially be mixed with any other field in the CORE block.
Checking a random file, I see this is so:

$ cram_dump /tmp/_2.cram | egrep 'HUFFMAN|BETA'|less
         RL =>          HUFFMAN {1, 100, 1, 0}
         BA =>          HUFFMAN {4, 65, 67, 71, 84, 4, 1, 2, 3, 3}
         DL =>          HUFFMAN {4, 1, 2, 3, 4, 4, 1, 2, 3, 3}
         RG =>          HUFFMAN {1, 0, 1, 0}
         RI =>          HUFFMAN {1, 0, 1, 0}
         ...

Huffman 1,100, 1,0 is fine (1 symbol [100] with 1 bit-length [0];
so no storage in CORE), but DL and BA series here both require storage
and are interleaved into CORE.  Now BA and DL will be stored with type
EXTERNAL.

The change in file size is expected to be minimal (around 0.01% in
tests), and the CPU overhead is equally tiny.

The advantage is that it simplifies partial decoding.  Although
cram_dependent_data_series() works out which data series are
co-located in the same block, partial decodes become more efficient
as fewer items share a common block.  It also becomes
much easier to filter by block, dropping specific data series
completely while retaining others.

- - - - -
b183ab2d by John Marshall at 2016-04-26T14:27:51Z
Add errmod / probaln / prob_realn to HTSlib API (PR #343)

Add errmod_cal(), probaln_glocal(), sam_cap_mapq(), and sam_prob_realn()
functions to the HTSlib API.  These functions were previously in samtools;
add them to htslib as they will soon be used by bcftools as well.

- - - - -
10e3c2c1 by John Marshall at 2016-04-26T15:40:57Z
Remove unused variable [minor]

Prior to samtools/samtools@0f490725235e5f93e60577a6a9fbd40053c8be82,
is_diff appeared in a debug printf, indicating whether pb == 1.0.
By observation (with samtools mpileup -v -E -f ref.fa foo.bam),
pb (i.e., b[0][k]) is indeed one in the current code.

- - - - -
df9bd0e7 by Charles Plessy at 2016-04-27T04:33:35Z
Breaks: samtools (<< 1.3.1)

Closes: #822741

- - - - -
80a1c1c9 by Charles Plessy at 2016-04-27T04:34:58Z
Current changelog.

- - - - -
0454d476 by John Marshall at 2016-04-27T10:18:53Z
Fix len parameter type [minor]

Change to match the type of arguments used and of faidx1_t::len.
Fixes #372.

- - - - -
519f75a9 by James Bonfield at 2016-04-27T13:21:05Z
Promoted cram/thread_pool code to the top level and use it within
bgzf writing instead of its own multi-threading.

This can be a very significant multi-threading speed improvement in
some situations (depending on what other bottlenecks exist).

- - - - -
cd2d9157 by John Marshall at 2016-04-28T10:07:15Z
Fix potential infinite loop [minor]

A well-constructed header will always contain a #CHROM header line not at
the start, so strstr() alone would suffice.  But it's easier to handle all
the cases than to verify that we get only well-constructed headers.

- - - - -
1385d7a8 by John Marshall at 2016-04-28T14:47:24Z
Replace bcf_hdr_fmt_text(), which can't handle huge headers

Add bcf_hdr_format(), which operates on a given kstring_t, and use it
throughout.  Mark bcf_hdr_fmt_text() as deprecated, as its conversion
of the kstring_t's size_t length to int limits it to (often) 2GiB.

(TODO) Check errors within bcf_hdr_format().

- - - - -
6d70a0e8 by Martin O. Pollard at 2016-04-29T13:04:12Z
Create API to check EOF on all htsFile that support EOF block

Create EOF checking support for CRAM with public API call.
Create htsFile API call to check EOF on BAM and CRAM

- - - - -
aaf998a9 by James Bonfield at 2016-04-29T16:28:38Z
Tidied up the mass of #ifdefs.

All the #ifdef DEBUG + fprintf are now DBG_OUT which expands to fprintf
or nothing.

The #ifdef IN_ORDER is now mandatory with the old code culled, as we
found it to generally be a win on machines with CPU frequency scaling
turned on.

- - - - -
cb92f20d by John Marshall at 2016-05-03T10:07:52Z
Read BCF header's l_text as unsigned

Corrects bcf_hdr_read()'s handling of 2GiB+ header text.
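Why signedness matters here: a 4-byte length of 2GiB or more has its top bit set, so it becomes negative if read into a signed 32-bit int.  Read as unsigned, the same bytes cover lengths up to 4GiB-1.  The sketch below assumes a little-endian on-disk length, as BCF uses.  le_to_u32() is a hypothetical helper, not the bcf_hdr_read() code.

```c
#include <stdint.h>

/* Decode a 4-byte little-endian length as unsigned. */
static uint32_t le_to_u32(const uint8_t b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```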

- - - - -
f2399ee8 by Petr Danecek at 2016-05-04T15:14:49Z
New bam_mplp_reset function to allow mplp in multiple regions

Resets all of min/pos[]/n_plp[]/plp[] to their bam_mplp_init() values.

- - - - -
f72add4a by James Bonfield at 2016-05-06T11:28:43Z
Further updates to multi-threading.

The BGZF writing code now runs in its own dedicated thread.  This
simply permits it to interleave writing time with job dispatching and
other main loop bits.

The writing thread also does a periodic hflush, which in turn calls
fdatasync if operating on a local file.  This avoids a much larger
amount to sync at the end when we close the file.

A consequence of the separate write thread is that the thread pool
output queue (aka "results queue") now has a size limit.  Otherwise
failure to write fast enough would build up more and more compressed
blocks in memory.

The net effect of these changes is a further speed increase.

- - - - -
2cb33f99 by James Bonfield at 2016-05-06T11:38:36Z
Fixed dependencies & .mk defs so Samtools links too (threading change).

- - - - -
70f1faec by James Bonfield at 2016-05-06T13:15:14Z
Fixed uninitialised memory (queue shutdown).

- - - - -
5f4f5732 by Charles Plessy at 2016-05-07T13:19:46Z
Revert "Breaks: samtools (<< 1.3.1)"

This reverts commit df9bd0e7b1226e93af22cf0c2fb9774a78392ec2.

Only the test suite was incompatible, see https://github.com/samtools/htslib/issues/374

- - - - -
c8c5494b by Charles Plessy at 2016-05-07T13:20:58Z
Step back to 1.3.1-1, ready for backport.

- - - - -
f581de8b by John Marshall at 2016-05-09T15:37:03Z
Merge fixes for huge BCF headers (PR #373)

Read and write BCF headers with more than 2GiB of text.
Fixes samtools/samtools#567.

- - - - -
1cc7d126 by John Marshall at 2016-05-10T15:22:27Z
Add curl_kput(), which slurps a URL into a kstring

Reads from the URL, appending its contents to the kstring.

- - - - -
b64e076c by Rob Davies at 2016-05-13T14:02:13Z
Make hts_itr_query find no-coor reads when last reference is unused.

When searching for HTS_IDX_NOCOOR reads, hts_itr_query would return the
wrong result for indexes where the last reference was unused.  Such
references have no entries in the binning index so it is not possible
to return an end offset for them.  Add a loop so that hts_itr_query finds
the last reference with mapped reads.  If found, its ending offset is
returned as the location where the no-coor reads start.

- - - - -
fc062fb8 by John Marshall at 2016-05-16T10:13:51Z
knetfile.c: Only emit Range header if needed

- - - - -
ab793a8d by Rob Davies at 2016-05-16T10:33:46Z
Don't assume order of sequence_ids when finding HTS_IDX_NOCOOR location.

Previous commits b2aab8, 60c22d and cc207d suggest that the order of
sequence_ids in a file may not match the order in the header.  Such
an ordering is permitted by the index format.  In case it happens,
make hts_itr_query check all references when searching for the
HTS_IDX_NOCOOR location to find the one with the highest offset.
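Combined with the previous commit, the search becomes: skip unused references and take the highest end offset, regardless of reference order.  A minimal sketch follows, assuming a plain array of per-reference end offsets in which 0 marks an unused reference.  nocoor_start() and that layout are hypothetical, not the hts_idx_t internals.

```c
#include <stdint.h>

/* Return the offset where no-coordinate reads would start: the highest
 * end offset over all used references, or -1 if none is used.
 * end_off[i] == 0 means reference i has no binning-index entries. */
static int64_t nocoor_start(const uint64_t *end_off, int n_ref)
{
    uint64_t best = 0;
    int found = 0;

    for (int i = 0; i < n_ref; i++) {
        if (end_off[i] == 0)
            continue;               /* unused reference: no end offset */
        if (!found || end_off[i] > best) {
            best = end_off[i];
            found = 1;
        }
    }
    return found ? (int64_t)best : -1;
}
```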

- - - - -
375ba53e by John Marshall at 2016-05-16T12:40:37Z
Merge hts_itr_query(HTS_IDX_NOCOOR) fixes (PR #376)

- - - - -
e5ab9978 by James Bonfield at 2016-05-17T14:19:42Z
Fixed out by one error in bin calculation (CRAM -> BAM).

Fixes samtools/samtools#574

- - - - -
21e6a8f3 by John Marshall at 2016-05-17T15:23:42Z
Reorder Makefile dependencies [minor]

Make them correspond to the order of #include lines in the source code,
thus facilitating scripted updating.

- - - - -
abfd3371 by John Marshall at 2016-05-18T15:49:47Z
Use internal plain char isdigit_c()/etc ctype functions

See fc9aeb6f77668afed412119701c5c58b0fca8091.
Remove #include <ctype.h> from source files that don't use
any ctype functions.

- - - - -
5720e3e9 by James Bonfield at 2016-05-19T13:39:13Z
Deleted the broken code in zfopen() when HAVE_POPEN is defined.

I've no idea how it ended up so nonsensical.  This code is defunct
anyway (fortunately) as with zlib 1.2.5 gzgets is no longer a glacier.

- - - - -
e1dd6353 by James Bonfield at 2016-05-20T14:16:39Z
Sam_index_build(2) now returns -4 for failure to save/create the index.

Manually tested both bam and cram indices in all the ways I could
think of, including explicitly specifying the filename, read only
directories, corrupted inputs and existing but read only indices.

- - - - -
43a94fba by John Marshall at 2016-05-20T15:47:42Z
Write CRAM ref cache via hFILE rather than stdio

Convert the code to use hFILE, now that hopen() understands "wx".
Mainly motivated by removing paranoid_fclose(): now the only fsync()
call is in hfile.c, so there is only one place we need to provide an
equivalent for Windows, which doesn't have a function of that name.

When aborting, unlink the file *after* closing it, for systems
(i.e., Windows) that can't unlink open files.

- - - - -
bb03b028 by jenniferliddle at 2016-05-23T15:31:34Z
Added bam_aux_update_str()

[Changed uint8_t *data to const char *data as this function operates on
strings rather than various data types; changed bam_aux_append() data
parameter to const uint8_t *.  -- JM]

- - - - -
f58922d3 by John Marshall at 2016-05-23T16:05:41Z
Add doxygen @file documentation

Doxygen will only generate documentation for items within header files
that have @file annotations.  (The autobrief style used here requires
MULTILINE_CPP_IS_BRIEF=YES in Doxyfile.)

- - - - -
4eef9a7f by James Bonfield at 2016-05-25T14:17:22Z
Fixed CRAM MD/NM generation to follow the same logic as calmd.

The changes here are:

- N and P cigar operations no longer count towards the edit distance.
- When seq and ref match but both are "N", this is still considered
  to be a difference.
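Here is a minimal sketch of an NM (edit distance) count that applies the two rules above.  It also uses the usual SAM convention that mismatched, inserted and deleted bases count as edits.  The cig_t layout and compute_nm() are hypothetical, ref is assumed to start at the read's first aligned position, and this is not the CRAM decoder or calmd code.

```c
#include <ctype.h>
#include <stddef.h>

typedef struct { char op; int len; } cig_t;  /* hypothetical CIGAR op */

static int compute_nm(const cig_t *cig, size_t n_cig,
                      const char *seq, const char *ref)
{
    int nm = 0;
    size_t qi = 0, ri = 0;               /* read and reference cursors */

    for (size_t k = 0; k < n_cig; k++) {
        int len = cig[k].len;
        switch (cig[k].op) {
        case 'M': case '=': case 'X':
            for (int j = 0; j < len; j++) {
                int q = toupper((unsigned char)seq[qi + j]);
                int r = toupper((unsigned char)ref[ri + j]);
                if (q != r || q == 'N')  /* N vs N still a difference */
                    nm++;
            }
            qi += len; ri += len;
            break;
        case 'I': nm += len; qi += len; break;
        case 'D': nm += len; ri += len; break;
        case 'N': ri += len; break;      /* skipped region: not an edit */
        case 'S': qi += len; break;
        case 'H': case 'P':              /* P padding: not an edit */
        default: break;
        }
    }
    return nm;
}
```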

- - - - -
67234cb0 by Rob Davies at 2016-05-25T14:30:24Z
Make kstring detect more errors and work better on 64 bit systems

Add kroundup_size_t which works with both 64 bit and 32 bit size_t.

Make ks_resize preserve the value of s->m if realloc fails.

Always use ks_resize to resize the buffer.

Make ksprintf/kvsprintf and kgetline return an error code if ks_resize
fails.
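The round-up and resize behaviour described above can be sketched as follows.  This assumes a kstring-like struct with l, m and s fields.  roundup_pow2(), kstr_sketch and kstr_resize() are hypothetical and are not the kroundup_size_t macro or ks_resize() themselves.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Round x up to a power of two for any width of size_t by smearing the
 * top set bit downwards; the shift loop covers 32- and 64-bit size_t.
 * 0, and values above the largest power of two, wrap to 0. */
static size_t roundup_pow2(size_t x)
{
    x--;
    for (unsigned shift = 1; shift < sizeof(size_t) * CHAR_BIT; shift <<= 1)
        x |= x >> shift;
    return x + 1;
}

typedef struct { size_t l, m; char *s; } kstr_sketch;

/* Grow ks to hold at least size bytes.  On realloc failure, return -1
 * and leave both ks->s and ks->m unchanged. */
static int kstr_resize(kstr_sketch *ks, size_t size)
{
    if (ks->m < size) {
        size_t new_m = roundup_pow2(size);
        if (new_m == 0)             /* rounding overflowed */
            new_m = size;
        char *tmp = realloc(ks->s, new_m);
        if (tmp == NULL)
            return -1;
        ks->s = tmp;
        ks->m = new_m;              /* updated only after success */
    }
    return 0;
}
```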

- - - - -
02220bf0 by John Marshall at 2016-05-25T14:33:24Z
Use hts_get_bgzfp() when format.compression==bgzf

Prevents crashes when reading BGZF-compressed text files (eg foo.vcf.gz),
for which the BGZF file handle is buried under a kstream.

delenda est hts_get_bgzfp.  kstream quoque.

- - - - -
481752cc by John Marshall at 2016-05-25T14:52:51Z
Not with tabs, James! [minor]

- - - - -
242e03ee by James Bonfield at 2016-05-27T09:33:11Z
Tidied up pointless while/if duplication.
(Left over from a more complex structure.)

- - - - -
11661a57 by John Marshall at 2016-05-27T14:44:54Z
Show coordinates in "unsorted positions" error message

Give the user a chance to track down where the problem in their
file is -- and a reason to believe the message is accurate.

TODO Convey the sequence names down to hts_idx_push(); this message
now prints out "...on sequence #2: ...", meaning the second sequence
in the file, which is not unconfusing.

- - - - -
6717817d by James Bonfield at 2016-05-27T16:12:02Z
Draft of the multi-threaded decoder.

Still to-do: fix the termination cases.  It works, but sometimes fails
at the last block.

- - - - -
2ef8e3e3 by James Bonfield at 2016-05-31T16:34:04Z
Committing as it's a working MT decoder now.

It isn't complete and can't handle uncompressed files, but this is a
handy checkpoint to go back to in case I bork it.

- - - - -
41a714a4 by James Bonfield at 2016-06-02T09:28:12Z
Attempt to use shared pool.

ERROR: this isn't complete, but it's our best try so far so
check-pointing for ease of backing out.  Subsequent reorganisations
will be messy. :)

- - - - -
4b1610f5 by James Bonfield at 2016-06-06T14:47:24Z
Added a lossy read-name option (CRAM_OPT_LOSSY_NAME).

This takes care to handle supplementary reads too and also uses the TC
aux field if present (it almost never is!).

Note: converting a CRAM produced with CRAM_OPT_LOSSY_NAME to a new
CRAM without asking to lose names will increase the size, as the
auto-generated names then become real names and are stored in the RN
block.

This is (probably) beneficial as it offers control and keeps things
sane when you're trying to merge files together containing lossy and
lossless names.  To do this properly requires finer control than a
single flag per container, but that would need a CRAM spec change.

- - - - -
4c0b93b2 by James Bonfield at 2016-06-06T14:47:24Z
Made the CRAM_OPT_PREFIX option external via the "name_prefix" option.

- - - - -
f689e188 by John Marshall at 2016-06-06T15:04:54Z
Merge CRAM lossy name compression (PR #326)

- - - - -
53b5f5d7 by James Bonfield at 2016-06-06T15:10:06Z
Expose cram_get_refs() function to return the opaque refs_t data type.

This permits explicit sharing of references between file descriptors.

- - - - -
121bab9d by James Bonfield at 2016-06-07T10:31:31Z
Major revamp of thread pool and associated tests.  In particular the
input queue is now per output queue rather than shared between all
output queues.  This helps resolve deadlocks and permits more advanced
task sharing.

Note: the rest of htslib using this has still to be updated.

- - - - -
5142d24e by James Bonfield at 2016-06-07T10:35:19Z
Bumped the sample size for the tests, and made all tests share a
common size.

- - - - -
8e9bd801 by James Bonfield at 2016-06-07T10:41:59Z
White space updates (tabs -> spaces).

Also culled the deprecated flush method.

- - - - -
7d50ac69 by James Bonfield at 2016-06-07T10:48:50Z
Sorry, updates to the previous commits too; forgetting to add this file!

- - - - -
d496268b by mcshane at 2016-06-07T12:32:46Z
allow bcf_index_build2 to index both bcf and vcf

This is to allow VCF/BCF indexing via streamed input and is a
prerequisite for addressing samtools/bcftools#373

* distinguish negative return values as in sam_index_build2()
* expose tbx_index() via tbx.h
* add documentation for bcf_index_build() and bcf_index_build2()

- - - - -
0ba56364 by James Bonfield at 2016-06-07T13:57:05Z
Updates to cope with the new thread pool API.

Now passes minimal testing.

- - - - -
b6aa0e6e by John Marshall at 2016-06-08T10:04:32Z
Detect shared library and plugin types during configure

As plugins are only enabled if you run configure, lift their
configuration to configure time.  We now have more of a chance to
support cross-compilation via configure --host=xyz.  For now, using
$host_alias suffices; avoid AC_CANONICAL_HOST etc as our needs are
simple and we'd prefer not to bundle config.guess and config.sub.

- - - - -
2af39f8f by James Bonfield at 2016-06-08T17:04:25Z
Removed some debugging and changed the default queue sizes to be smaller.

- - - - -
bd32ec6c by James Bonfield at 2016-06-08T17:04:41Z
Improvements to avoid boom & bust scenarios.

We now continually check how many processing jobs we have vs the
output queue size, so we don't launch more jobs than fit in the output
queue and then leave threads waiting later on because we over-egged
the pudding.  This caused a yo-yo effect that harms overall
performance.

- - - - -
8a8fac27 by James Bonfield at 2016-06-09T09:24:08Z
Culled the duplicate is_compressed assignment.

This appeared in bf909d669d45cb137e7cab5b56db0fc60be48a1c, perhaps as
the result of a failed merge or of forgetting to delete the earlier variant.

- - - - -
2c2330b3 by James Bonfield at 2016-06-09T10:01:41Z
Various memory leaks fixed in bgzf multi-threading.

- - - - -
3f410839 by John Marshall at 2016-06-13T10:04:55Z
stringify_argv(): suppress trailing space

- - - - -
1eec11be by jenniferliddle at 2016-06-14T09:34:19Z
Fixes bug in bam_aux_update_str()

- - - - -
2cddfb3c by John Marshall at 2016-06-14T15:24:20Z
Build DLL and plugins on Cygwin

The Cygwin convention seems to be to just create cyghts-SOVER.dll
without additional symlinks, so we do that.  Linking also creates
an import library, libhts.dll.a, which plays a similar development
role to plain libhts.so; we could make this the true target of the
link rule (cf libhts.so), but currently its name will collide with
the same name used for a plain Windows import library.

NOTE: To use the plugins successfully, there must be only one copy
of libhts.a/cyghts*.dll in the complete executable.  To ensure this,
either make executables use the DLL too (so for htsfile/tabix/etc,
apply s/libhts.a/-L. -lhts/ to the recipe; FIXME) or perhaps we can
link the plugins themselves with some --allow-undefined option rather
than with the import library, as we do on other platforms.

Also ignore plain Windows *.dll for good measure.

- - - - -
bedecccc by James Bonfield at 2016-06-14T17:03:55Z
First apparent working bgzf_seek implementation.

This also adds new functions to thread_pool, most importantly the
ability to reset a queue.  There is now also a requirement to shutdown
queues before destroying the pool itself.  (You can destroy the pool,
but the queue shutdown will then wedge if it has jobs in flight as
they'll never be processed.)

- - - - -
d6dacc7b by James Bonfield at 2016-06-15T10:43:46Z
Added HTS_OPT_CACHE_SIZE option to specify the bgzf cache size.  This
can speed up decoding considerably in certain circumstances.

- - - - -
f522f1ff by James Bonfield at 2016-06-15T16:20:32Z
Made thread job serial numbers 64-bit.

The intention is to permit callers to provide their own serial
numbers, and to then get bgzf to start using file offsets as serials.
This may(!) permit use of the output queue as a cache layer.

- - - - -
15ef81b3 by James Bonfield at 2016-06-16T13:54:24Z
Improvements to multi-threading encode/decode/transcode.

More rigorously tested with respect to shutting down queues, avoiding
a lot of race conditions.

- - - - -
1c0ff9d3 by James Bonfield at 2016-06-16T14:09:19Z
Changed the HTS_OPT_THREAD_POOL argument from a t_pool pointer to a
t_pool plus queue size int.  This is to permit one large queue for
encoding side and lots of small queues on decoding side all sharing
one pool; eg a many-way merge.

Perhaps this would be better done using two HTS_OPTs, but if so it
requires breaking the ABI of htsFile in order to store the value
somewhere? Or maybe create the queue and then resize immediately
after?

- - - - -
c721a740 by James Bonfield at 2016-06-16T16:57:28Z
bgzf_check_EOF now works when input is multi-threaded.

- - - - -
3a8d3d3b by James Bonfield at 2016-06-17T10:40:15Z
Htslib now copes with zero length Z and H aux tags.

It also now forces H type (hex) to be an even number of bytes.

Implements samtools/hts-specs#135

- - - - -
9d2e3ded by James Bonfield at 2016-06-17T16:00:20Z
Propagate read error in thread reader back to fp->errcode error in main thread.

- - - - -
511a2354 by James Bonfield at 2016-06-17T16:58:16Z
Deleted the now defunct read_eof variable.

Fixed the reader error reporting to also report EOF instead (empty
read).  This was necessary for bgzip to spot EOF (samtools didn't spot
this as it stops at the recognised EOF byte values rather than genuine
EOF).

Bgzip now uses threads for decoding too.

- - - - -
d0614e51 by James Bonfield at 2016-06-20T16:45:37Z
Fixed bgzf_getc and bgzf_getline to work in multi-threaded mode.

This means in theory samtools faidx and htslib's tabix could be
multi-threaded.  In practice the gain is slight as both of these
commands spend a lot of cpu doing more than decompressing.  I could
get 60% gain to faidx rate and ~100% gain to tabix index rate.

Maybe worth keeping, but currently not enabled.  The main change is in
bug fixing the functions.

- - - - -
45027425 by John Marshall at 2016-06-21T13:55:55Z
Use hFILE rather than stdio when reading indices

Otherwise hts_idx_load2() does not work with file:/// URLs, which
are local (i.e. not hisremote()) but not understood by fopen().

- - - - -
8ca01755 by James Bonfield at 2016-06-21T14:08:00Z
Added support for multi-threaded BAM indexing.

Also enabled multi-threaded decoding of text-based bgzf formats (SAM,
VCF, maybe others?).  Is there an easy way to detect when the voidp
has been utilised in htsfile and if so whether it is to a bgzf rather
than some other generic type?  I'm not particularly happy with the
sam.gz MT implementation.

- - - - -
e65f67d6 by James Bonfield at 2016-06-21T14:28:18Z
Fixed CRAM_OPT_THREAD_POOL for cram.

This was broken in 74c5fee8c0e510dbfdb9702debfccb0e9844bfec when we
changed the argument to consist of pool + queue-size.

- - - - -
173a5d0a by James Bonfield at 2016-06-21T16:49:37Z
Merge branch 'develop' of https://github.com/samtools/htslib into threading_pool

Switched to using the handy hts_get_bgzfp() function which appears to
be the semi-official way of figuring out whether to use bgzf direct or
kstream via voidp. (BAM vs SAM.gz)

Conflicts:
	hts.c

- - - - -
93be96b8 by James Bonfield at 2016-06-23T08:37:04Z
Bug fix to hts_get_bgzfp.

It returned the voidp redirect for write access too, but the open code
only uses that for reading.

- - - - -
d365af30 by James Bonfield at 2016-06-23T08:38:25Z
Improvements for VCF/BCF multi-threading.

The synched reader now has bcf_sr_set_threads and
bcf_sr_destroy_threads methods.  There are also newer index_build3
functions for vcf/tbx to support multi-threading during the indexing.

ABI breakage: this adds a couple more variables to bcf_srs_t struct.

- - - - -
1c6cf228 by James Bonfield at 2016-06-23T13:17:12Z
Fixed file-descriptor leak in refs_load_fai().

(Present since inception of function in Feb 2013.)
Fixes #394

- - - - -
925a3c9e by John Marshall at 2016-06-24T10:04:56Z
Document required Cygwin (and RPM-style) devel packages

- - - - -
ccd48bd9 by James Bonfield at 2016-06-24T11:44:38Z
Make sure we destroy the thread_pool when created by ourselves.

- - - - -
eceeb01d by James Bonfield at 2016-06-24T13:22:08Z
Merge branch 'develop' of https://github.com/samtools/htslib into threading_pool

- - - - -
3ade6b6f by James Bonfield at 2016-06-24T13:48:30Z
Bug fix to bgzf_read, which was breaking samtools index (and more).

- - - - -
4f29e227 by James Bonfield at 2016-06-24T14:23:17Z
PTHREAD_MUTEX_RECURSIVE_NP vs PTHREAD_MUTEX_RECURSIVE.

The NP is the documented type and stands for non-portable.  As
expected, it's not portable!  It doesn't work on Mac OS X, but removing
the _NP works on more systems.

TODO: Figure out how to avoid getting in the situation of locking
multiple times.  It comes about through pool->queue and queue->pool
interactions, both of them sharing the same mutex.
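For reference, the portable spelling goes through a mutex attribute object rather than the glibc-only `PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP`-style shortcuts.  A minimal sketch, assuming a hypothetical `init_recursive_mutex()` helper (not the thread pool's actual code):

```c
#define _XOPEN_SOURCE 700  /* expose PTHREAD_MUTEX_RECURSIVE on strict glibc builds */
#include <pthread.h>

/* Hypothetical helper: initialise a mutex that the same thread may lock
 * more than once, using the portable POSIX type name rather than
 * PTHREAD_MUTEX_RECURSIVE_NP.  Returns 0 or an error number. */
static int init_recursive_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int ret = pthread_mutexattr_init(&attr);
    if (ret != 0)
        return ret;

    ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (ret == 0)
        ret = pthread_mutex_init(m, &attr);

    /* The attribute object is no longer needed once the mutex exists. */
    pthread_mutexattr_destroy(&attr);
    return ret;
}
```

A recursive mutex only papers over the double locking described in the TODO; each lock still needs a matching unlock.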

- - - - -
72359473 by James Bonfield at 2016-06-24T16:31:42Z
Code tidyup.

Removing #ifdefed code, debugging code, etc.

- - - - -
1659e8f2 by James Bonfield at 2016-06-27T11:44:29Z
test_view now also verifies multi-threading.

- - - - -
cc4fe23f by James Bonfield at 2016-06-28T13:32:53Z
Removal of debugging output.

- - - - -
02e9d211 by John Marshall at 2016-06-29T13:28:07Z
Add is_cram flag to distinguish dummy hts_itr_t objects

(Note dummy:29 not decremented as it was previously one short;
we now have 1+1+1+29 == 32 as desired.)

- - - - -
155e1aac by John Marshall at 2016-07-04T14:28:52Z
Add print-config target

Third parties such as Pysam may need access to this information.

- - - - -
0de7fe54 by John Marshall at 2016-07-05T16:15:48Z
Generate config.h.in with autoheader

Previously we hand-edited config.h.in after using autoheader, to avoid
the dross -- especially PACKAGE_VERSION, as we compute that ourselves
(via git describe) in the Makefile.  However that's used only within the
Makefile, so defining it for C code via config.h is not a major problem.
(If it becomes one, we can add AH_BOTTOM([#include "config_undefines.h"])
or so, or not reuse the conventional PACKAGE_VERSION macro name.)

- - - - -
86260e1c by John Marshall at 2016-07-11T10:03:44Z
Add configure check for fdatasync()

Enables the existing HAVE_FDATASYNC conditional in hfile.c.

OS X has a library symbol named fdatasync() but no declaration in
the headers, and it is unclear whether its function has the expected
fdatasync(2) signature.  So we additionally use AC_CHECK_DECL to
avoid detecting this symbol on OS X.
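A conditional like HAVE_FDATASYNC is typically consumed along these lines.  This is a minimal sketch with a hypothetical `sync_fd()` helper, not hfile.c's actual code; it falls back to `fsync()` when the configure check did not define the macro:

```c
#define _XOPEN_SOURCE 700  /* declare fsync()/fdatasync() on strict glibc builds */
#include <unistd.h>

/* HAVE_FDATASYNC would normally come from the configure-generated
 * config.h; it stays undefined here unless the build sets it. */
static int sync_fd(int fd)
{
#ifdef HAVE_FDATASYNC
    /* Flushes the data plus only the metadata needed to read it back. */
    return fdatasync(fd);
#else
    /* Portable fallback: also flushes all metadata, e.g. timestamps. */
    return fsync(fd);
#endif
}
```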

- - - - -
3bfd2811 by James Bonfield at 2016-07-11T16:20:35Z
Migrated the thread pool structures to thread_pool_internal.h.

Also created a couple of accessor functions for thread pool size
(no. threads), thread queue size, and extracting the void* data from a
thread "result".

[Updated Makefile dependencies -- JM]

- - - - -
c00fe2cd by James Bonfield at 2016-07-11T16:20:35Z
Renamed various thread pool structs/functions.

t_pool        -> hts_tpool
t_pool_result -> hts_tpool_result
t_pool_queue  -> hts_tpool_process

The way we think about queues is that in "foo |gzip| bar" and
"bar |gunzip| foo", the "gzip" and "gunzip" are process names and the
"|" symbols in "|gzip|" are the input and output queues associated
with this process.  Our old queue was actually a pair of queues (the
pipe symbols) with the entire struct modelling the I/O to the process.

Technically it's not the process itself but rather a wrapper around
the process, as that is another C func (whatever was given during
a dispatch call), but in lieu of any better term this will do.
(Please offer other ideas if you have them.)

Also added hts_tpool_kill so that hts_tpool_destroy is now simpler in
operation, with only one argument required.

- - - - -
299c7b67 by James Bonfield at 2016-07-11T16:20:36Z
Culled the DEBUG_TIME code in thread_pool.

It just clutters the code up and if we *really* need to go back into
such fine grained debugging again then we could resurrect it via this
diff.  (I don't see the need arising.)

- - - - -
5c591725 by James Bonfield at 2016-07-11T16:40:11Z
Further tidying up of queue vs process; mostly comments and docs.

Fixed args of hts_tpool_destroy (bgzf.c).

- - - - -
bdf85e4a by James Bonfield at 2016-07-12T13:51:52Z
Speed up to probaln_glocal.

Achieved by using tiny lookup tables rather than nested ?: operators
and replacing the many calloc/frees with far fewer callocs/mallocs.

This function is around 25% faster overall, and total bcftools run time
is ~15% faster in a quick test on Illumina data.

- - - - -
c6cb4c39 by John Marshall at 2016-07-13T12:06:32Z
Use finer-grained $(INSTALL_LIB) and $(INSTALL_MAN) macros

See https://www.freebsd.org/doc/en/books/porters-handbook/install.html
and note that libhts.a, as a static library, stays with $(INSTALL_DATA).

- - - - -
5869a67c by John Marshall at 2016-07-21T13:13:32Z
hts_itr_query(): discard chunks far beyond the query region

Similarly to the existing min_off code, discard chunks starting beyond
max_off, computed by finding a virtual offset in the linear index or
loff of a bin to the right of the query region's end.
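Both min_off and max_off are BGZF virtual offsets, and the linear index holds one entry per 16384bp window.  A small sketch of those two encodings, using hypothetical helper names rather than HTSlib's:

```c
#include <stdint.h>

/* A BGZF virtual offset packs the file offset of a compressed block
 * (upper 48 bits) with an offset into its uncompressed data (lower 16),
 * so virtual offsets compare in file order. */
static uint64_t make_voffset(uint64_t block_start, uint16_t within_block)
{
    return (block_start << 16) | within_block;
}

static uint64_t voffset_block(uint64_t voff)  { return voff >> 16; }
static uint16_t voffset_within(uint64_t voff) { return (uint16_t)(voff & 0xffff); }

/* Linear-index slot for a 0-based position: one slot per 16384bp window. */
static int64_t linear_index_window(int64_t pos)
{
    return pos >> 14;
}
```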

- - - - -
0d5a5eb4 by John Marshall at 2016-07-21T15:12:33Z
Avoid linguist mis-classification [minor]

- - - - -
fd721f10 by John Marshall at 2016-07-26T12:34:07Z
Allocate BGZF::uncompressed_block/compressed_block together

Allocate both blocks in the same block of memory, so that upcoming
code can reuse the block as a single larger temporary buffer.
Add error checking in bgzf_read_init().

Also set fp->errcode on errors in bgzf_raw_read() and bgzf_raw_write().
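The single-allocation layout can be sketched as follows.  The struct is a hypothetical stand-in for the two BGZF buffer fields, and the 64 KiB block size is an assumption for this example:

```c
#include <stdlib.h>

#define BLOCK_SIZE 0x10000  /* assumed 64 KiB per block for this sketch */

/* Hypothetical stand-in for the two BGZF buffers. */
struct two_blocks {
    void *uncompressed_block;
    void *compressed_block;
};

/* One malloc() backs both buffers, so the whole region can later be
 * reused as a single 2*BLOCK_SIZE scratch buffer.  Returns 0 or -1. */
static int two_blocks_init(struct two_blocks *b)
{
    char *mem = malloc(2 * BLOCK_SIZE);
    if (mem == NULL) {
        b->uncompressed_block = b->compressed_block = NULL;
        return -1;              /* caller sets its error code */
    }
    b->uncompressed_block = mem;
    b->compressed_block   = mem + BLOCK_SIZE;
    return 0;
}

/* Only the first pointer owns the allocation, so free only that one. */
static void two_blocks_free(struct two_blocks *b)
{
    free(b->uncompressed_block);
    b->uncompressed_block = b->compressed_block = NULL;
}
```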

- - - - -
344c8257 by John Marshall at 2016-07-28T15:54:44Z
Allow plugins to select RTLD_LOCAL or RTLD_GLOBAL

If the plugin does not provide the requested symbol, fall back to
reopening it with RTLD_GLOBAL and searching for a uniquified symbol,
<symbol>_<filename_basename>.

Most plugins should work with RTLD_LOCAL, but occasionally RTLD_GLOBAL
is needed.  For example, with iRODS 4.1.x, iRODS itself has plugins
(libtcp.so et al) that need to resolve iRODS symbols linked into
whatever is invoking it, i.e., HTSlib's hFILE plugin.
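Building the fallback name might look like this.  The sketch assumes that `<filename_basename>` means the filename without its directory and without anything from the first '.' onwards; `uniquified_symbol()` is hypothetical, and its result would be handed to dlsym() after reopening the plugin with RTLD_GLOBAL:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "<symbol>_<basename>" into buf.
 * Returns 0 on success, -1 if buf is too small. */
static int uniquified_symbol(char *buf, size_t size,
                             const char *symbol, const char *filename)
{
    const char *slash = strrchr(filename, '/');
    const char *base = slash ? slash + 1 : filename;
    size_t base_len = strcspn(base, ".");   /* drop ".so" etc. */

    int n = snprintf(buf, size, "%s_%.*s", symbol, (int) base_len, base);
    return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```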

- - - - -
0f2a88a0 by James Bonfield at 2016-08-01T11:29:59Z
Protect against sequences starting beyond reference end.

This was previously causing the slice reference md5sum to crash.

Fixes samtools/samtools#600, but note this perhaps isn't a perfect
fix.  The test data for that issue is broken, but ideally we should be
able to reproduce the broken input after round-tripping.  Currently we
lose track of MD/NM tags.

However making cram totally lossless even when faced with invalid data
is an issue for another day.

- - - - -
e8bddbe2 by John Marshall at 2016-08-01T14:23:41Z
Use native Doxygen API documentation markup

- - - - -
7d0b90b6 by cbk-guest at 2016-08-03T01:24:47Z
Autopkgtest added

- - - - -
56a0f1d5 by Andreas Tille at 2016-08-03T11:20:02Z
hardening=+bindnow, upload to unstable

- - - - -
e87ae87d by Rob Davies at 2016-08-04T13:48:35Z
Add interfaces to hfile for delimited string input

There are three versions:
  hgetdelim - Reads up to a given delimiter
  hgetln    - Specialization of hgetdelim for '\n'
  hgets     - Wrapper for hgetln that provides the same interface as fgets
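The relationship between the three can be sketched over an in-memory buffer instead of an hFILE.  `mem_reader`, `mem_getdelim()`, `mem_getln()` and `mem_gets()` are hypothetical names, and the return conventions (bytes stored, 0 at end of input) are assumptions modelled on getdelim()/fgets():

```c
#include <stddef.h>

/* Hypothetical in-memory source standing in for an hFILE. */
struct mem_reader {
    const char *data;
    size_t len, pos;
};

/* Copy bytes up to and including `delim`, stopping early if only
 * size-1 bytes fit; always NUL-terminates (size must be >= 1).
 * Returns the number of bytes stored, or 0 at end of input. */
static size_t mem_getdelim(char *buf, size_t size, int delim,
                           struct mem_reader *r)
{
    size_t n = 0;
    while (n + 1 < size && r->pos < r->len) {
        char c = r->data[r->pos++];
        buf[n++] = c;
        if (c == (char) delim)
            break;
    }
    buf[n] = '\0';
    return n;
}

/* Line-oriented specialisation, like hgetln(). */
static size_t mem_getln(char *buf, size_t size, struct mem_reader *r)
{
    return mem_getdelim(buf, size, '\n', r);
}

/* Roughly fgets()-style, like hgets(): buf on success, NULL at EOF. */
static char *mem_gets(char *buf, int size, struct mem_reader *r)
{
    if (size < 1)
        return NULL;
    return mem_getln(buf, (size_t) size, r) > 0 ? buf : NULL;
}
```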

- - - - -
ed502be3 by John Marshall at 2016-08-04T16:11:40Z
[faidx.h] Use native Doxygen API documentation markup

Remove indentation from function declarations -- there's no point
indenting within `extern "C" { ... }`, as that's the entire file!

- - - - -
f86372ea by John Marshall at 2016-08-17T09:43:22Z
Remove iRODS plugin, which has moved to samtools/htslib-plugins

As a demonstration of maintaining a plugin separately from HTSlib,
and as iRODS's linking needs have become ever more convoluted,
hfile_irods.c is now at <https://github.com/samtools/htslib-plugins>.

- - - - -
74bcfd7c by John Marshall at 2016-08-19T08:51:00Z
Embed version number directly in hfile_libcurl plugin

When built as a separate plugin, it is incorrect to use hts_version()
to report HTSlib's version number, as we could be used with a variety
of HTSlib versions.  Instead embed our own copy of HTS_VERSION as of
when we were compiled.  (But just use hts_version() when plugins are
disabled and hfile_libcurl.o is built within HTSlib.)

Embed an SCCS-style @(#) version string to facilitate examination via
strings(1) | grep '@(#)' or what(1).
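An embedded version string of this kind might look like the following.  `PLUGIN_BUILD_VERSION` is a hypothetical placeholder for the HTS_VERSION value captured at compile time, and the identifier names are illustrative only:

```c
#include <string.h>

/* Placeholder for the HTS_VERSION seen when the plugin was compiled. */
#define PLUGIN_BUILD_VERSION "1.x-example"

/* what(1) prints the text after "@(#)" up to the next double quote,
 * '>', newline, backslash or NUL; `strings | grep '@(#)'` finds it too. */
const char plugin_sccs_id[] =
    "@(#)hfile_libcurl plugin (htslib)\t" PLUGIN_BUILD_VERSION;

/* Return just the human-readable part, skipping the "@(#)" marker. */
static const char *plugin_version(void)
{
    return plugin_sccs_id + strlen("@(#)");
}
```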

- - - - -
23d7f17d by John Marshall at 2016-08-30T14:35:01Z
Discard distant chunks based on binning index, not linear index

Contrary to the description in SAMv1.pdf:

    In the linear index, for each tiling 16384bp window on the
    reference, we record the smallest file offset of the alignments
    that start in the window.

it seems the linear index actually records the smallest file offset of
an alignment that *overlaps* the window.  This could start at a position
before the bin-to-the-right we're looking at, and lead to clipping chunks
containing reads in the desired region.

Instead take max_off from a chunk fully within the bin-to-the-right;
due to sorting the first one will be the tightest bound, and as the list
of chunks is set directly, we don't need to check for 0.  Fixes #405.
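The bound-and-filter step can be sketched with a hypothetical `chunk_t` of virtual offsets `[u, v)`.  This illustrates the idea described above; it is not the hts_itr_query() code, and the use of 0 to mean "no upper bound" is this sketch's own convention:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk of BGZF virtual offsets: [u, v). */
typedef struct { uint64_t u, v; } chunk_t;

/* The bin-to-the-right's chunks are assumed sorted by start, so the
 * first one gives the tightest upper bound; 0 means "no bound". */
static uint64_t max_off_from_bin(const chunk_t *right, size_t n_right)
{
    return n_right > 0 ? right[0].u : 0;
}

/* Keep chunks that end after min_off and start before max_off,
 * compacting them to the front.  Returns the number kept. */
static size_t filter_chunks(chunk_t *c, size_t n,
                            uint64_t min_off, uint64_t max_off)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (c[i].v <= min_off)
            continue;                   /* entirely before the region */
        if (max_off > 0 && c[i].u >= max_off)
            continue;                   /* starts beyond the region */
        c[kept++] = c[i];
    }
    return kept;
}
```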

- - - - -
bbaab350 by John Marshall at 2016-08-31T16:25:56Z
[tabix man page] Note coordinate arguments are 1-based inclusive

Clarifies that command-line arguments are human-orientated 1-based
even when the data file is 0-based, e.g. BED.  Fixes #407.

Divide into paragraphs, fix typo.
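The conversion from a 1-based inclusive command-line region to the 0-based half-open form used internally only shifts the start.  A minimal sketch with hypothetical names:

```c
#include <stdint.h>

/* Hypothetical 0-based half-open interval [beg, end). */
typedef struct { int64_t beg, end; } region0_t;

/* "chr1:101-200" is 1-based inclusive, even for a BED file whose
 * line "chr1 100 200" describes the same 100 bases. */
static region0_t region_from_1based(int64_t first, int64_t last)
{
    region0_t r = { first - 1, last };
    return r;
}
```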

- - - - -
86cb62bc by Andreas Tille at 2016-09-01T08:00:37Z
Fix build on hurd

- - - - -
5ec9af01 by Andreas Tille at 2016-09-01T08:01:47Z
Add myself to uploaders, upload to unstable

- - - - -
c31216b0 by Joe Rayner at 2016-09-05T08:59:06Z
Added bgzf_block_write() and rebgzip option.
Check indexing and rebgzip are not attempted simultaneously.

Require index file to be specified when rebgzipping.
Removed stray print statement.

- - - - -
eaef296c by Joe Rayner at 2016-09-05T08:59:06Z
Added bgzip test files.
Add bgzip --rebgzip test and update test files.

- - - - -
54dc2320 by James Bonfield at 2016-09-05T09:10:39Z
Rebased PR#387 and minor code formatting fixes (trailing white space,
indentation, tabs) for consistency with existing code.

- - - - -
e79775f5 by James Bonfield at 2016-09-05T11:38:37Z
Added bgzip to check/test dependency.

This fixes a case where "make clean; make check" would fail.

- - - - -
71db6875 by John Marshall at 2016-09-07T16:17:05Z
Treat regions [-1,n) as [0,n) when indexing

In VCF files, 1-based POS=0 represents an event in a telomere.
POS=0 is represented as 0-based [-1,0), which previously led to a crash
during indexing.  Instead treat [-1,0) as [0,1) and larger [-1,n) as [0,n)
so that such regions will be placed in an appropriately-sized leftmost
bin.  (Treat [-1,0) slightly more specially as [0,0) winds up in bin 0.)

The previous crash occurred within insert_to_l(); this fixes the crash and
alters beg/end for [-1,n) regions for both insert_to_l() and insert_to_b().
Fixes #406.
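The adjustment can be checked against reg2bin() as printed in the SAM specification (SAMv1 section 5.3).  `normalise_region()` is a hypothetical helper expressing the rule above, not the indexing code itself:

```c
/* reg2bin() from the SAM specification: bin for 0-based [beg, end).
 * With end == 0 it right-shifts a negative value, which is
 * implementation-defined (arithmetic on common compilers). */
static int reg2bin(int beg, int end)
{
    --end;
    if (beg >> 14 == end >> 14) return ((1 << 15) - 1) / 7 + (beg >> 14);
    if (beg >> 17 == end >> 17) return ((1 << 12) - 1) / 7 + (beg >> 17);
    if (beg >> 20 == end >> 20) return ((1 << 9) - 1) / 7 + (beg >> 20);
    if (beg >> 23 == end >> 23) return ((1 << 6) - 1) / 7 + (beg >> 23);
    if (beg >> 26 == end >> 26) return ((1 << 3) - 1) / 7 + (beg >> 26);
    return 0;
}

/* [-1,0) becomes [0,1) and [-1,n) becomes [0,n), so a POS=0 record
 * lands in the leftmost smallest bin rather than bin 0 (whole reference). */
static void normalise_region(int *beg, int *end)
{
    if (*beg < 0) {
        *beg = 0;
        if (*end <= 0)
            *end = 1;
    }
}
```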

- - - - -
3d3cc32d by James Bonfield at 2016-09-09T08:29:10Z
Fixed a race condition in the multi-threaded cram encoder.

The changes to the thread pool broke this code. Now rather than a
strict ordering of submit job then consume output, it permits output
consumption while waiting for room in the queue.

- - - - -
8d957c07 by John Marshall at 2016-09-09T13:45:16Z
Rewrite #ifdeffed-out use of now-removed variable [minor]

Completes the removal of unused pb in 10e3c2c151ac239fff357e3b886b17f1b27dbf50.

- - - - -
5b31094e by James Bonfield at 2016-09-09T16:03:39Z
Fixed a double free in multi-threaded CRAM and regions.

When asking for more than one region while also using a thread pool,
we were potentially freeing fd->ctr twice: once in cram_next_slice()
and again in cram_seek_to_refpos().  Contorted code, but this appears to
fix it and passes valgrind memcheck, helgrind and drd.

(Note this doesn't trigger when using the original multi-threading,
although I am unsure why not.)

- - - - -
9c230e7c by James Bonfield at 2016-09-12T14:47:21Z
Fixed out by one error in bin calculation (CRAM -> BAM).

Fixes samtools/samtools#574

(cherry picked from commit e5ab99785bb8ec5a5054d6c6a2c432bdd8ee5ef1)

- - - - -
ebdb5aa5 by John Marshall at 2016-09-12T15:04:22Z
Allow plugins to select RTLD_LOCAL or RTLD_GLOBAL

If the plugin does not provide the requested symbol, fall back to
reopening it with RTLD_GLOBAL and searching for a uniquified symbol,
<symbol>_<filename_basename>.

Most plugins should work with RTLD_LOCAL, but occasionally RTLD_GLOBAL
is needed.  For example, with iRODS 4.1.x, iRODS itself has plugins
(libtcp.so et al) that need to resolve iRODS symbols linked into
whatever is invoking it, i.e., HTSlib's hFILE plugin.

(cherry picked from commit 344c82579a8d56d1f92c8073c71037b99beb13cb)

- - - - -
5586168b by John Marshall at 2016-09-12T15:15:00Z
Provide a fallback for PATH_MAX if it is not defined

(TODO) Ideally we should be using kstrings rather than PATH_MAX and
FILENAME_MAX.  The relative_to functionality of open_path_mfile() is
in fact unused in HTSlib; eventually cram_populate_ref() should use
other facilities and open_trace_file.{c,h} will disappear anyway.

Obsoletes debian/1.3.1-3:debian/patches/define_PATH_MAX.patch
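The usual shape of such a fallback is shown below.  GNU Hurd, for instance, leaves PATH_MAX undefined because it imposes no fixed limit; the value 1024 is an assumption for this sketch:

```c
#include <limits.h>

/* Fallback for platforms that deliberately leave PATH_MAX undefined.
 * The value is an assumption; kstrings would avoid the limit entirely. */
#ifndef PATH_MAX
#define PATH_MAX 1024
#endif

/* Fixed-size buffer sized by whichever PATH_MAX is in effect. */
static char path_buf[PATH_MAX];
```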

- - - - -
88517dfc by James Bonfield at 2016-09-12T15:52:10Z
Added callback + client data hooks to pileup iterators and pileup struct.

The mechanism here is to permit a pileup user to locally cache some
data when a new bam record is in flight, attached to the pileup struct
passed to the caller, and to tidy it up once it goes out of scope.
This greatly simplifies some programming methods and also permits more
efficient code in some places where doing the right thing (caching
values) was just too hard to make it worthwhile.

Also added an mplp version of the callbacks too, which simply iterates
over the plp ones.

- - - - -
3566540b by James Bonfield at 2016-09-12T15:55:00Z
Factored in the renormalisation to the f[] computation.

Previously we computed row f[i] and renormalised f[i] as the next
step.  Now when computing f[i] it renormalises f[i-1] as it goes.
This is around 5-10% faster.

In theory the same could be done for b[i] too, although I had
difficulties getting my changes working there so skipped that
modification.

- - - - -
8c595dfc by James Bonfield at 2016-09-12T15:55:17Z
Cosmetic: 0 to NULL

- - - - -
6bed35a3 by John Marshall at 2016-09-13T06:15:37Z
Release 1.3.2: bin field bug fix, RTLD_GLOBAL plugins

- - - - -
503b09d6 by John Marshall at 2016-09-13T07:48:44Z
Merge version number bump and NEWS file from master

- - - - -
f10f9b27 by John Marshall at 2016-09-14T09:38:25Z
Bump SOVERSION to 2 and note ABI incompatibility in NEWS

- - - - -
bf753361 by John Marshall at 2016-09-14T13:31:51Z
Merge (ABI-changing!) mpileup callbacks (PR #398)

- - - - -
0ca17a0e by Petr Danecek at 2016-09-14T14:12:16Z
Handle VCF lines with missing `FORMAT=.`

For example, this is a valid VCF line

```
1	300	.	C	A	.	PASS	.	.	.	.
```

Previously this would emit a warning saying:
`[W::vcf_parse_format] FORMAT '.' is not defined in the header, assuming Type=String`
and internally we would have a new `FORMAT=.` tag.
This will now be recognised as missing.

htslib already writes out such lines when  `n_fmt == 0` and `n_samples > 0`

Mixing missing and non-missing FORMAT tags (e.g. `.:GT` or `GT:.:AD`) is not allowed.

See conversation in #409
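The rule above can be expressed as a small check on the FORMAT column.  `count_format_keys()` is a hypothetical helper for illustration, not vcf_parse_format() itself:

```c
#include <string.h>

/* Returns the number of colon-separated keys, 0 when the whole column
 * is "." (missing), or -1 when "." is mixed with real keys. */
static int count_format_keys(const char *format)
{
    if (strcmp(format, ".") == 0)
        return 0;

    int n = 0;
    const char *p = format;
    for (;;) {
        size_t len = strcspn(p, ":");
        if (len == 1 && p[0] == '.')
            return -1;          /* e.g. ".:GT" or "GT:.:AD" */
        n++;
        if (p[len] == '\0')
            break;
        p += len + 1;
    }
    return n;
}
```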

- - - - -
19c18943 by jenniferliddle at 2016-09-26T13:01:05Z
Fix bugs in bam_aux_update_str()

s/m_data/data/ and remove casts that were suppressing a warning that
would have diagnosed this mistake.  Resample bam_get_aux(), which may
also be changed by the realloc().
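The underlying hazard is general: a pointer into a buffer may dangle once realloc() moves it, so code should keep an offset and re-derive the pointer afterwards.  A minimal sketch with a hypothetical `record_t`, not bam1_t:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable record: l_data bytes used out of m_data. */
typedef struct {
    uint8_t *data;
    size_t l_data, m_data;
} record_t;

/* Append `len` bytes, then return a pointer `off` bytes into the record,
 * recomputed from the possibly-moved base ("resampled"). */
static uint8_t *append_and_locate(record_t *r, const void *src, size_t len,
                                  size_t off)
{
    if (r->l_data + len > r->m_data) {
        size_t new_m = (r->l_data + len) * 2;
        uint8_t *tmp = realloc(r->data, new_m);
        if (tmp == NULL)
            return NULL;        /* original buffer is left intact */
        r->data = tmp;
        r->m_data = new_m;
    }
    memcpy(r->data + r->l_data, src, len);
    r->l_data += len;
    return r->data + off;
}
```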

- - - - -
4295de42 by Petr Danecek at 2016-09-27T19:37:34Z
Bug fix: 0 is a valid return value of bcf_hrec_find_key

- - - - -
553406d2 by John Marshall at 2016-09-30T10:06:34Z
Use <inttypes.h> instead of old WIN32-specific code

As faidx.c already uses <stdint.h> types and <inttypes.h> macros, use
PRIu64/SCNu64 etc to read and write the uint64_t file offset et al.
(Microsoft Visual Studio supports <inttypes.h> since MSVC 12.0 aka
Visual Studio 2013.)
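For illustration, the <inttypes.h> macros expand to the correct conversion for uint64_t on each platform, replacing specifiers such as MSVC's "%I64u".  The helper names below are hypothetical:

```c
#include <inttypes.h>
#include <stdio.h>

/* Write a uint64_t offset as text; returns snprintf()'s result. */
static int format_offset(char *buf, size_t size, uint64_t off)
{
    return snprintf(buf, size, "%" PRIu64, off);
}

/* Parse it back; returns 0 on success, -1 on failure. */
static int parse_offset(const char *s, uint64_t *off)
{
    return sscanf(s, "%" SCNu64, off) == 1 ? 0 : -1;
}
```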

- - - - -
b077a6bc by James Bonfield at 2016-10-03T14:30:43Z
Replaced BSD license with MIT license for consistency.

Code change brought about due to migration from cram subdirectory,
which originated in Staden "io_lib", to top level directory.
(Note: I am the sole author of the thread pool code.)

- - - - -
7a54ff90 by John Marshall at 2016-10-03T16:34:33Z
Avoid extraneous #includes

Forward declare struct hts_tpool in bgzf.h and hts.h instead of
including thread_pool.h.  Remove system includes in htslib/thread_pool.h
no longer needed after internals were moved to thread_pool_internal.h.
Reduce <inttypes.h> to <stdint.h> where we need int32_t etc rather than
PRId32 etc.  Add includes elsewhere as needed, where htslib/thread_pool.h,
<pthread.h>, <inttypes.h> facilities are actually used.

Add $(htslib_thread_pool_h) to htslib_vars.mk and use it in Makefile.

Add doxygen @file documentation to new public header file.

Rationalise include guard macro names (add HTSLIB_ prefix to the
public one; avoid using the _[A-Z] prefix reserved for the compiler
implementation; cf dde9bdbe4174728f99d1bbe7326ffd631539bef6).

- - - - -
daae2ea2 by John Marshall at 2016-10-03T16:42:28Z
Merge threading pool API (PR #397)

Note the addition to struct BGZF that may have ABI considerations, but
we have already chosen to break ABI compatibility for the next HTSlib
release, so it is open season.

Fixed trailing whitespace.

- - - - -
d58eab8f by Rob Davies at 2016-10-06T13:55:25Z
Fix error handling in cram_index_load.

Eliminate repeated error handling code in cram_index_load, and ensure
that fd->index is freed and set to NULL (by calling cram_index_free)
when exiting on failure.  Setting fd->index = NULL is important as
otherwise repeated calls to cram_index_load using the same fd after an
error will apparently succeed even though the index has not been loaded
(this can happen, for example, if a caller is trying different names for
the index).

Ensure that realloc failures won't cause memory leaks, and that the
value of fd->index_sz is always correct as it's used by cram_index_free.

- - - - -
6ff0ca0d by John Marshall at 2016-10-11T08:33:03Z
Don't redefine thread_pool.h typedefs

Redefining a typedef is a C11 feature.  Fixes #426.
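A C99-safe arrangement keeps the typedef in one header and has other headers refer to the struct tag, which may be declared repeatedly.  The names below are hypothetical:

```c
/* Owning header: the only place the typedef appears. */
typedef struct example_pool example_pool;

/* Another header needing only a pointer.  Repeating the typedef here
 * would be C11-only (strict C99 compilers may reject it), but a bare
 * struct tag declaration may be repeated freely. */
struct example_pool;
int example_attach_pool(struct example_pool *pool);

/* Full definition, normally kept in an internal header. */
struct example_pool { int n_threads; };

int example_attach_pool(struct example_pool *pool)
{
    return pool ? pool->n_threads : -1;
}
```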

- - - - -
609120d2 by Petr Danecek at 2016-10-11T09:24:18Z
More thorough INFO cleaning to prevent issues like https://github.com/samtools/bcftools/issues/428

- - - - -
f7370bb3 by John Marshall at 2016-10-20T15:52:17Z
Activate auxf#values_java.cram test

Rename aux#aux_java.cram as per 9b1cb948e78a51b93473fc1dbd5e05a9c5fddf8b
and regenerate the resulting auxf#values_java.cram (with cramtools) as
auxf#values.sam has since been updated to include test cases for empty
'H' and 'Z' tags.

This test case contains 'H' aux tags, so use -Baux as cramtools converts
these to 'B' array tags.  Display any error message from compare_sam.pl.

Make compare_sam.pl -Baux also canonicalise empty 'H' and 'B' arrays.

Fix whitespace.

- - - - -
1bc5c562 by John Marshall at 2016-10-26T11:52:27Z
Ensure headers compile by themselves [minor]

...without depending on byproducts of previous inclusions.
Hat tip @MikkelSchubert (cf pysam-developers/pysam#362).

- - - - -
50db54b5 by Daniel Cooke at 2016-11-02T15:58:17Z
Suppress index date warning when hts_verbose == 0

Prevents a warning message being written to stderr from calls to hts_idx_load2 when the target index is older than the data file being read.

- - - - -
b9d85dc4 by Andreas Tille at 2016-11-04T11:50:25Z
Merge tag '1.3.2' into debian/unstable

HTSlib patch release 1.3.2: bin bug fix, RTLD_GLOBAL plugins

* Corrected bin calculation when converting directly from CRAM to BAM.
  Previously a small fraction of converted reads would fail Picard's
  validation with "bin field of BAM record does not equal value computed"
  (SAMtools issue #574).

* Plugins can now signal to HTSlib which of RTLD_LOCAL and RTLD_GLOBAL
  they wish to be opened with -- previously they were always RTLD_LOCAL.

- - - - -
97b73733 by Andreas Tille at 2016-11-04T11:56:41Z
New upstream version

- - - - -
e79fb0a5 by Andreas Tille at 2016-11-04T11:57:28Z
Add remark about pristine-tar commit

- - - - -
82819fea by Andreas Tille at 2016-11-04T12:05:01Z
Upload to unstable

- - - - -
6b9abbab by John Marshall at 2016-11-21T11:34:27Z
Add fixed/immobile hFILE buffers

As an optimisation to avoid double memory copies, allow hFILE backends
such as hFILE_mem to use the main hFILE buffer as their entire buffer
without any separate backing store.

Essentially, wherever in base hFILE we would like to alter fp->offset we
need code to handle !mobile buffers (but note that fixed buffers always
have at_eof set, so some fp->offset altering code is already irrelevant).

This implements that for reading; writing requires a little more work.

- - - - -
32984ca1 by John Marshall at 2016-11-21T12:07:13Z
Implement base64-encoded data: URLs

Either percent- or base64-decode the URL text, as appropriate.  We now
always malloc a (decoded) copy of the url argument, rather than somewhat
illegitimately holding on to the pointer provided, as previously.

Add percent- and base64-decoding functions.  When we are happy with their
signatures (perhaps they should have a way to malloc the output buffer
themselves?), we may wish to move their declarations to a public header.

Implements and fixes #422.

[NEWS]
* Data URLs ("data:,text") now follow the standard RFC 2397 format and may
  be base64-encoded (when written as "data:;base64,text") or may include
  percent-encoded characters.  HTSlib's previous over-simplified "data:text"
  format is no longer supported -- you will need to add an initial comma.
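A rough sketch of decoding the two RFC 2397 forms described in the NEWS entry.  `decode_data_url()` is hypothetical, media-type parameters other than `;base64` are ignored, and this is not HTSlib's implementation:

```c
#include <stddef.h>
#include <string.h>

static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode "data:[<mediatype>][;base64],<data>" into out, which needs at
 * most strlen(url) bytes.  Returns the decoded length or -1 if malformed.
 * The output is raw bytes and is not NUL-terminated. */
static long decode_data_url(const char *url, unsigned char *out)
{
    if (strncmp(url, "data:", 5) != 0) return -1;
    const char *comma = strchr(url + 5, ',');
    if (comma == NULL) return -1;      /* old "data:text" form */

    size_t meta_len = (size_t)(comma - (url + 5));
    int is_base64 = meta_len >= 7 && strncmp(comma - 7, ";base64", 7) == 0;
    const char *s = comma + 1;
    long n = 0;

    if (is_base64) {
        unsigned long acc = 0;         /* only the low bits matter */
        int bits = 0;
        for (; *s && *s != '='; s++) {
            int v = b64_value(*s);
            if (v < 0) return -1;
            acc = (acc << 6) | (unsigned long) v;
            bits += 6;
            if (bits >= 8) {
                bits -= 8;
                out[n++] = (unsigned char)((acc >> bits) & 0xff);
            }
        }
    } else {
        for (; *s; s++) {
            if (*s == '%') {
                int hi = hex_value(s[1]);
                int lo = (hi >= 0) ? hex_value(s[2]) : -1;
                if (hi < 0 || lo < 0) return -1;
                out[n++] = (unsigned char)(hi * 16 + lo);
                s += 2;
            } else {
                out[n++] = (unsigned char) *s;
            }
        }
    }
    return n;
}
```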

- - - - -
25204972 by John Marshall at 2016-11-21T12:09:54Z
Add JSON format and very basic recognition

We currently recognise only JSON that's a top-level object, so starts
with `{"field": }`, as that's reasonably distinctive and it suffices
to recognise GA4GH streaming's JSON redirector response format.

- - - - -
293a426d by John Marshall at 2016-11-21T17:46:49Z
Merge CRAM updates, sync with io_lib implementation (PR #361)

[NEWS]
* When writing CRAM, each auxiliary tag is placed in its own block.
  There is also a new bases_per_slice format option.

- - - - -
274ef7da by John Marshall at 2016-11-22T08:23:11Z
Add hopen() varargs; use them for HTTP headers in hfile_libcurl.c

Add a varargs interface to allow calling code to specify scheme-specific
extra arguments to hopen().  For example for a networking backend,
`hopen("http://foo", "r:", "httphdr:v", arrayofstrings, NULL)` allows
for the addition of a NULL-terminated array of headers.  We may later
add `hopen("http://foo", "r:", "httphdr", "Range: 10-20", NULL, NULL)`
and/or other ways to specify headers.

While hFILE_plugin has a version field, unfortunately hFILE_scheme_handler
does not.  So as to remain compatible with existing plugins, we've abused
the priority field to also encode a struct version.
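Consuming a NULL-terminated key/value list of this shape might look like the following.  `count_http_headers()` is hypothetical and only recognises the "httphdr:v" key; the terminator is passed as `(char *) NULL` so that va_arg() reads a pointer type:

```c
#include <stdarg.h>
#include <string.h>

/* Returns the number of headers supplied via "httphdr:v" (value: a
 * NULL-terminated char ** array), or -1 on an unknown key. */
static int count_http_headers(const char *url, const char *mode, ...)
{
    (void) url; (void) mode;    /* a real opener would use these */
    va_list args;
    int n = 0;

    va_start(args, mode);
    const char *key;
    while ((key = va_arg(args, char *)) != NULL) {
        if (strcmp(key, "httphdr:v") == 0) {
            char **hdrs = va_arg(args, char **);
            for (; hdrs && *hdrs; hdrs++)
                n++;
        } else {
            n = -1;             /* unknown key: value type unknown, stop */
            break;
        }
    }
    va_end(args);
    return n;
}
```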

- - - - -
39ca0891 by John Marshall at 2016-11-22T08:28:03Z
Add JSON tokeniser / lexer

Provides functions to read a single JSON token from a string or an hFILE.
User code can call this repeatedly within nested loops that correspond to
the expected structure of their JSON text to parse JSON input.

At some point we may wish to move the declarations from hts_internal.h
to the public API.  At present these functions return tokens in string
form only; before making this public, we should also interpret numeric
tokens -- see the TODO in the hts_json_token declaration.

- - - - -
7e6f35d0 by Rob Davies at 2016-11-22T09:31:38Z
Make cram_decode_estimate_sizes handle missing codecs.

Prevents a segfault if  hdr->codecs[DS_QS] or hdr->codecs[DS_RN] is NULL.

- - - - -
be8b4a50 by Rob Davies at 2016-11-22T12:15:16Z
Ensures rANS uncompressors don't read beyond end of input.

Changes mostly copied over from io_lib, but extended to catch cases where
there aren't enough input bytes to set up the decoder state.

- - - - -
707f60a9 by James Bonfield at 2016-11-28T13:57:27Z
Merge pull request #438 from daviesrob/cram_afl

Cram bug fixes
- - - - -
7aa1ef8c by John Marshall at 2016-11-30T11:27:03Z
Refactor incidental uses of kstream

Using kstream is unnecessary here as bgzf_getline() and kgetline()
are available.  In hts.c, use bgzf_open() rather than gzopen() to
further centralise our zlib usage in bgzf.c and reduce the parts
of the zlib API that we're using.

This vcf_hdr_read() fn_aux code has crashed since d49e3f63184107934e4d00ba1d45d03ac506b6d3
as the kstream as declared here has a different type from its definition
in hts.c.  Rewrite it using plain fopen(), as FAI files are not compressed
anyway (cf fai_read(); sam_hdr_read()'s similar fn_aux code).  This code
used gzopen() when it was first introduced in c70504d1c0c9946b7fa2134251bea3f37cf30a70
but only because vcf.c already had a gzread-based kstream.

- - - - -
71c03b88 by John Marshall at 2016-12-02T11:25:37Z
Remove htsFile's use of kstream

Use of kstream within HTSlib, especially non-static use, should be
avoided as it potentially conflicts with application code's kstream
usage, because (unlike other klib facilities) there is no name mangling
done based on the template argument type.

Also kstream does no error checking.

Use the plain hFILE (or BGZF) for reading uncompressed (respectively
compressed) text formats, just as we do when writing.

Rewrite hts_getline() accordingly, maintaining compatibility with the
existing signature (which returns the number of characters read or
negative on EOF or error; note that some callers control reading loops
with >=0 and some with >0, which seems buggy).  (TODO) This function
is overdue for rationalisation: removing the delimiter parameter and
rationalising the return value.

Account for the delimiter character in bgzf_getline(), and correct the
function's documentation.

In vcf_sweep.c, uncompressed text files no longer have an underlying
BGZF pointer, but bgzf_index_build_init() does nothing for uncompressed
files anyway, so not calling it changes nothing.

kstream delenda est.

- - - - -
941c0439 by John Marshall at 2016-12-02T16:23:12Z
Parse GA4GH Retrieval protocol and handle redirects

Accessing a file via the GA4GH Retrieval protocol returns a JSON ticket
containing a list of URLs whose contents are to be concatenated to form
the file to be retrieved.

Add a multipart hFILE pseudo-backend ("pseudo-" in that it is not
dispatched via a scheme lookup in hopen() itself) that handles reading
from the concatenation of a list of files.

Parse the GA4GH JSON text to initialise this special type of hFILE.

In hts_hopen(), if JSON format is detected, reopen by "redirecting" to
the multipart hFILE specified in the GA4GH ticket file.  With two hFILE
handles in play, care must be taken to close the multipart hFILE if
hts_hopen() subsequently fails, and to close the ticket hFILE only if
hts_hopen() succeeds.  Also ensure all error paths set errno.

- - - - -
54909303 by John Marshall at 2016-12-02T16:23:43Z
Fix hseek() already-read buffer reuse bug

Now that the "desired position is within our read buffer" hseek()
optimisation has been implemented, we need to ensure that the whole buffer
is up-to-date.  For mobile hFILEs, reading directly into the destination
invalidates the already-read portion of the hFILE's buffer.

- - - - -
cc42613c by John Marshall at 2016-12-05T11:15:37Z
Fix hFILE write-after-read bug

The "desired position is within our read buffer" hseek() optimisation
also breaks hread()-hseek()-hwrite() on files opened for update, as
hwrite() expects the hseek() to have left an empty direction-agnostic
buffer.  So we disable the optimisation on files opened for update.

Alternative fixes would be to have hflush() discard any read buffer and
do a backend->seek(begin-end, SEEK_CUR) (or rather the equivalent SEEK_SET)
to correct the backend file position, and require callers to use hflush()
when switching between reading and writing in either direction; or (hat
tip @daviesrob) have hputc()/hputs()/hwrite() check for a non-empty read
buffer, and discard it and do a corrective seek if necessary.

- - - - -
82d0f5bd by John Marshall at 2016-12-05T11:56:36Z
Merge JSON-based GA4GH redirection file access protocol (PR #439)

[NEWS]
* hts_open() supports the upcoming GA4GH redirecting retrieval protocol.

- - - - -
ee472589 by Shane McCarthy at 2016-12-05T13:23:11Z
allele trimming bugfix

* handle cases with missing data
* remove asserts to give more informative error messages

Fixes samtools/bcftools#213, samtools/bcftools#256, samtools/bcftools#322, samtools/bcftools#404

- - - - -
de69feb1 by John Marshall at 2016-12-05T17:14:55Z
Also handle uncompressed (raw) BCF for vcf_sweep

We failed to consider raw BCF files, which use htsFile::fp.bgzf but
have no_compression set.  Add a new htsFile::is_bgzf flag, which like
the other is_xyz flags is for internal use only!

- - - - -
c9e8f3a0 by Rob Davies at 2016-12-06T13:57:17Z
Prevents out-of-bounds array access on ref_id

Stops a possible segfault before cram_decode_slice is able to bail out
with 'Unable to fetch reference #...'.

- - - - -
9ea1bf98 by Rob Davies at 2016-12-06T14:00:04Z
Adds more CRAM decoder checks to prevent overrunning input buffers

Converts more itf8_get() calls to safe_itf8_get().
Adds a few extra tests to ensure running out of input is always spotted.
Error messages in affected functions are now only printed if hts_verbose >= 1.
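For illustration, a bounds-checked decoder for ITF8, the CRAM variable-length integer encoding, in the spirit of safe_itf8_get(). The function name and signature here are hypothetical. Returning 0 on truncated input lets callers notice when they have run out of data instead of reading past the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounds-checked ITF8 decoder.  The number of leading 1 bits in
   the first byte gives the number of extra bytes.  Returns the number of
   bytes consumed, or 0 if the input ends before the value is complete. */
int itf8_get_checked(const uint8_t *cp, const uint8_t *end, int32_t *val)
{
    if (cp >= end) return 0;
    size_t avail = (size_t) (end - cp);
    uint32_t b0 = cp[0], v;
    int len;
    if (b0 < 0x80) {
        len = 1; v = b0;
    } else if (b0 < 0xc0) {
        len = 2; if (avail < 2) return 0;
        v = ((b0 & 0x3f) << 8) | cp[1];
    } else if (b0 < 0xe0) {
        len = 3; if (avail < 3) return 0;
        v = ((b0 & 0x1f) << 16) | ((uint32_t) cp[1] << 8) | cp[2];
    } else if (b0 < 0xf0) {
        len = 4; if (avail < 4) return 0;
        v = ((b0 & 0x0f) << 24) | ((uint32_t) cp[1] << 16)
            | ((uint32_t) cp[2] << 8) | cp[3];
    } else {
        len = 5; if (avail < 5) return 0;
        v = ((b0 & 0x0f) << 28) | ((uint32_t) cp[1] << 20)
            | ((uint32_t) cp[2] << 12) | ((uint32_t) cp[3] << 4)
            | (cp[4] & 0x0f);
    }
    /* Convert to signed without relying on implementation-defined casts. */
    *val = v <= INT32_MAX ? (int32_t) v : -(int32_t) (UINT32_MAX - v) - 1;
    return len;
}
```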

- - - - -
1dd5c15d by Rob Davies at 2016-12-06T14:00:56Z
Fixes test for enough data when reading the preservation map.

Each case of the switch statement uses at least one character.  Plus two
for the key means at least three are needed to avoid running off the end
of the input.

- - - - -
36c5c47e by Rob Davies at 2016-12-06T14:20:43Z
Prevents wrap-around bugs in allocations.

Add a few checks to catch negative allocations due to invalid values
read from CRAM files.

m_data field in bam1_t becomes uint32_t, to prevent kroundup32()
from converting a large positive number into a negative one.
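The wrap-around is easy to see with a local copy of klib's kroundup32() macro: rounding 2^30+1 up gives 2^31, which fits in a uint32_t but not in a 32-bit int. round_up_capacity() is a hypothetical wrapper.

```c
#include <stdint.h>

/* Round x up to the next power of two (local copy of klib's kroundup32). */
#define kroundup32(x) (--(x), (x)|=(x)>>1, (x)|=(x)>>2, (x)|=(x)>>4, (x)|=(x)>>8, (x)|=(x)>>16, ++(x))

/* With an unsigned 32-bit capacity, results up to 2^31 stay positive. */
uint32_t round_up_capacity(uint32_t m)
{
    kroundup32(m);
    return m;
}
```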

- - - - -
677d9b0a by Rob Davies at 2016-12-07T16:45:17Z
Make hts_expand handle realloc failure a bit better.

The hts_expand() and hts_expand0() macros don't return a value, and will
set (ptr) to NULL if realloc fails.  This could lead to segmentation faults
if callers don't check the value of (ptr).  As few if any do, and some
callers are in external packages, the best solution is to make the
hts_expand macros print an error message and call exit(1).

It's not ideal behaviour for library code, but fixing this in any other
way is just too hard.

This implementation removes the assumption that (m) and (n) are of type
int.  It should work with any integer type no bigger than size_t.
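A sketch of the grow-or-exit behaviour, written as a hypothetical EXPAND_OR_DIE() macro rather than the actual hts_expand() definition. The growth factor (about 1.5x) and the message format are assumptions. Like the description above, it compares n and m as size_t so they need not be int, and it assumes the new capacity fits in m's type.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Grow ptr to hold at least n elements of type_t, updating capacity m.
   On failure (including size_t overflow) print an error and exit(1), so
   callers never see ptr set to NULL. */
#define EXPAND_OR_DIE(type_t, n, m, ptr) do {                               \
        if ((size_t) (n) > (size_t) (m)) {                                  \
            size_t new_m_ = (size_t) (n) + ((size_t) (n) >> 1) + 1;         \
            type_t *new_p_ = NULL;                                          \
            if (new_m_ > (size_t) (n) && new_m_ <= SIZE_MAX / sizeof(type_t)) \
                new_p_ = realloc((ptr), new_m_ * sizeof(type_t));           \
            if (new_p_ == NULL) {                                           \
                fprintf(stderr, "[E::%s] Memory allocation failed\n",       \
                        __func__);                                          \
                exit(1);                                                    \
            }                                                               \
            (ptr) = new_p_;                                                 \
            (m) = new_m_;                                                   \
        }                                                                   \
    } while (0)
```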

- - - - -
6ce295dc by John Marshall at 2016-12-07T17:33:37Z
Propagate error return codes from hts_getline()

In hts_getline(), bgzf_getline() already returns -2 => error, -1 => EOF,
>=0 => success.  Check herrno() to distinguish the two meanings when
kgetline() returns EOF.
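The three-way return convention can be sketched with stdio, where ferror() plays the role that herrno() plays for hFILE. read_line() is hypothetical; an over-long line is simply returned in buffer-sized pieces.

```c
#include <stdio.h>
#include <string.h>

/* Returns >= 0 (length of the line read, newline stripped), -1 at end of
   file, or -2 on an I/O error. */
int read_line(FILE *fp, char *buf, int size)
{
    if (fgets(buf, size, fp) == NULL)
        return ferror(fp) ? -2 : -1;   /* distinguish error from EOF */
    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')
        buf[--len] = '\0';
    return (int) len;
}
```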

For SAM and VCF files, sam_hdr_read() and vcf_hdr_read() should fail
on I/O errors, and sam_read1() and vcf_read() should propagate the new
distinct-from-EOF return code.

- - - - -
5dff82a8 by John Marshall at 2016-12-07T18:17:14Z
Use hFILE to read htsFile::fn_aux FAI file

Use hFILE so that samtools view -t etc can use URLs as well as local
files.  In sam_hdr_read(), avoid strtok() for reentrancy reasons.
Improve error handling.

- - - - -
47a29fd1 by John Marshall at 2016-12-08T11:39:17Z
Make htsfile work with (e.g. GA4GH) redirects

When viewing files, use hts_hopen() before querying the format category.
So e.g. for GA4GH JSON redirects, htsfile sees the real redirected format
rather than unknown_category/json.  Opening and closing the htsFile*
outwith view_sam()/view_vcf() simplifies the error handling.

Clarify view_sam()/view_vcf() error handling and print error messages.

- - - - -
6e897526 by Olivier Cinquin at 2016-12-10T21:10:57Z
Provide more informative error message when unknown SAM tag type is encountered.

- - - - -
4ef6c76f by James Bonfield at 2016-12-12T15:22:58Z
Fixed a rare renormalisation bug in the rANS codec.

The symbol frequencies need to sum to TOTFREQ (4096 currently) and are
rounded up/down accordingly.  The combination of integer rounding
means the renormalised frequencies don't always total 4096 exactly, so
the remainder is added-to / subtracted-from the most frequent symbol.
In one particular data set this remainder was larger than the most
frequent symbol, causing it to become negative.

We now just do another round of renormalisation with slightly lower
products until we get it right.  It's not the fastest solution, but this
is a very rare event.
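A sketch of the retry loop described above, assuming 256 symbol counts and TOTFREQ = 4096. normalise_freqs() and the 0.99 scale-down step are illustrative choices, not the rANS codec's actual code.

```c
#include <stdint.h>

#define TOTFREQ 4096

/* Scale counts F[0..255] (summing to total) so they sum to TOTFREQ, keeping
   every present symbol >= 1.  The rounding remainder goes to the most
   frequent symbol; if that would leave it below 1, retry with slightly
   smaller products.  Returns 0 on success, -1 on bad input. */
int normalise_freqs(const uint32_t F[256], uint32_t total, int32_t out[256])
{
    if (total == 0) return -1;
    double scale = (double) TOTFREQ / total;
    for (int attempt = 0; attempt < 100; attempt++) {
        int32_t sum = 0;
        int max_sym = -1;
        for (int i = 0; i < 256; i++) {
            if (F[i] == 0) { out[i] = 0; continue; }
            int32_t f = (int32_t) (F[i] * scale + 0.5);
            if (f < 1) f = 1;              /* present symbols must stay >= 1 */
            out[i] = f;
            sum += f;
            if (max_sym < 0 || out[i] > out[max_sym]) max_sym = i;
        }
        int32_t adjusted = out[max_sym] + (TOTFREQ - sum);
        if (adjusted >= 1) {
            out[max_sym] = adjusted;
            return 0;
        }
        scale *= 0.99;                     /* lower the products and retry */
    }
    return -1;
}
```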

- - - - -
2c75e79b by John Marshall at 2016-12-13T14:18:34Z
Add "httphdr", "httphdr:l", and "va_list" hopen() options

Other plugins such as the upcoming hfile_gcs.c and an eventual
refactored hfile_s3.c will invoke hfile_libcurl.c's hopen() with any
user headers passed to their hopen() along with their own additional
headers.  The easiest way to do this is to provide a way to pass a
caller's va_list into hfile_libcurl.c's hopen(), ergo "va_list".

As a convenience, the pointer arguments to "httphdr" and "va_list" may
be NULL, in which case nothing happens.

- - - - -
7def9fdf by John Marshall at 2016-12-14T11:19:41Z
Constify extern tbx_conf_* preset variables

User code should not be able to change these and confuse other code!

- - - - -
468abb6c by Petr Danecek at 2016-12-14T15:11:32Z
Turn off autodetection when -s,-b,-e,-0,-c,-S, or -p are given

Simplify the code, removing conf_ptr, so that reheader_file() gets the
right conf when -s/etc are used and `-p bed -c %` combinations (in that
order) have a chance to work.

Resolves #428, updated version of PR #429.

- - - - -
63fe9d14 by John Marshall at 2016-12-15T16:37:12Z
Add support for Google Cloud Storage pseudo-URLs

Rewrite gs: pseudo-URLs to http/https URLs, adding an Authorization
header for Google Cloud Storage.

At present, the access token is simply taken from a bespoke environment
variable, $GCS_OAUTH_TOKEN, as per PR #390.  (TODO) This should be
replaced with pulling the information from shared configuration files
or obtaining it via other authentication infrastructure.
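A hypothetical sketch of the gs: rewrite. The storage.googleapis.com host and the "Bearer" header form are assumptions for illustration; a caller following the description above would pass getenv("GCS_OAUTH_TOKEN") as the token.

```c
#include <stdio.h>
#include <string.h>

/* Turn "gs://BUCKET/PATH" (or "gs:BUCKET/PATH") into an https URL and build
   an Authorization header.  The host name is an assumed example.  Returns 0
   on success, -1 if url is not a gs: URL or a buffer is too small. */
int gcs_rewrite(const char *url, const char *token,
                char *out_url, size_t url_size,
                char *out_hdr, size_t hdr_size)
{
    if (strncmp(url, "gs:", 3) != 0) return -1;
    const char *path = url + 3;
    while (*path == '/') path++;           /* accept both gs: and gs:// */
    int n = snprintf(out_url, url_size, "https://storage.googleapis.com/%s", path);
    if (n < 0 || (size_t) n >= url_size) return -1;
    if (token && *token) {
        n = snprintf(out_hdr, hdr_size, "Authorization: Bearer %s", token);
        if (n < 0 || (size_t) n >= hdr_size) return -1;
    } else if (hdr_size > 0) {
        out_hdr[0] = '\0';                 /* no token: no extra header */
    }
    return 0;
}
```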

In configure.ac, this is before AC_SYS_LARGEFILE to keep the options
in ./configure --help in alphabetical order.

Passing the va_list to hopen(..., "va_list", ...) in gcs_vopen() is
painful as passing a va_list (as opposed to a va_list*) via .../va_arg
proved impossible, and we need a local va_list object rather than a
parameter with a possibly-decayed type in order to form a va_list pointer.
See the following for details:
http://stackoverflow.com/questions/4958384/what-is-the-format-of-the-x86-64-va-list-structure
http://stackoverflow.com/questions/8047362/is-gcc-mishandling-a-pointer-to-a-va-list-passed-to-a-function
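A self-contained sketch of passing a caller's va_list by pointer through a "va_list" key. count_strings() and the other functions are hypothetical. The point is that count_with_extra() declares a local va_list, so &args really has type va_list * even on platforms such as x86-64, where va_list is an array type that decays when declared as a parameter. A NULL va_list * is ignored, mirroring the convenience described for hopen().

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Count string arguments up to a terminating NULL.  The key "va_list" is
   followed by a va_list * whose strings are counted too. */
static int count_strings_v(va_list *args)
{
    int n = 0;
    char *s;
    while ((s = va_arg(*args, char *)) != NULL) {
        if (strcmp(s, "va_list") == 0) {
            va_list *inner = va_arg(*args, va_list *);
            if (inner != NULL) n += count_strings_v(inner);
        } else {
            n++;
        }
    }
    return n;
}

int count_strings(int unused, ...)
{
    va_list args;
    va_start(args, unused);
    int n = count_strings_v(&args);
    va_end(args);
    return n;
}

/* Adds one string of its own and forwards its caller's arguments. */
int count_with_extra(int unused, ...)
{
    va_list args;                          /* local object, not a parameter */
    va_start(args, unused);
    int n = count_strings(0, "extra", "va_list", &args, (char *) NULL);
    va_end(args);
    return n;
}
```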

- - - - -
8efbd1b6 by John Marshall at 2016-12-15T16:58:37Z
Add missing entries

- - - - -
afd9b56b by Nathan T. Weeks at 2016-12-16T12:15:01Z
Define _XOPEN_SOURCE so that PTHREAD_MUTEX_RECURSIVE is defined

[Some platforms require this to make PTHREAD_MUTEX_RECURSIVE available
(this is probably a bug on those platforms).

Fixes #420.  I have some reservations about the ODR implications of
doing this in a single translation unit rather than in config.h, but
it's not uncommon and is a simpler less-invasive change.  -- JM]

- - - - -
37c85e4b by Rob Davies at 2016-12-19T14:05:32Z
Prevent reads past the end of the VCF header.

Modify bcf_hdr_read to ensure the header is always NUL-terminated.

Fix returned length in bcf_hdr_parse_line when it reaches a NUL character.
Previously it would include the NUL for lines which did not finish with \n.
This could cause a later call to the same function to read beyond the end
of the input buffer.

- - - - -
cc6ca521 by Rob Davies at 2016-12-19T14:56:11Z
Make bcf_read1_core() return error if ks_resize fails, or on short read.

Check the return value of ks_resize() in bcf_read1_core, and return error
on failure.  Additionally, change bcf_read1_core to return failure
(instead of EOF) if bgzf_read() does not return the expected number of
bytes.

Make bcf_index() return failure if bcf_read1_core() fails, or if
either hts_idx_init() or bcf_init1() fail.

Allow NULL to be passed in to bcf_hdr_destroy() and bcf_destroy().

- - - - -
d3ac0db0 by Rob Davies at 2016-12-21T13:47:39Z
Add function bcf_record_check() to validate bcf records

Check contig and tag ids are within the valid range given in the header.

Check for sensible data types.

Ensure decoding doesn't run off the end of the shared and indiv strings.

It's currently only called by bcf_read(), but might want to be used in
other places (or possibly even exposed in the API).

- - - - -
6d927dfa by John Marshall at 2016-12-21T16:35:18Z
Change bam1_core_t::n_cigar from uint16_t to uint32_t

As various CIGAR-related loops and API functions use int, the practical
limit without further changes is 2^31-1 rather than 2^32-1.  As bam_init1()
uses calloc(), the unused field is initialised to 0 and we don't need to
bother re-zeroing it in sam_read1() et al.

Until the BAM format is extended to deal with it (samtools/hts-specs#40),
trying to write >64K CIGAR operations to a BAM file remains an error.

Implements and fixes #437.  (Update htslib/sam.h copyright notice for
changes in 2015 and 2016.  MinGW does not have EOVERFLOW.)
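A sketch of the check a BAM writer needs now that n_cigar is 32 bits wide while the BAM record field remains 16 bits. check_bam_n_cigar() is hypothetical, and falling back to ERANGE where EOVERFLOW is missing is an assumption.

```c
#include <errno.h>
#include <stdint.h>

/* Refuse to encode more CIGAR operations than a BAM record can hold. */
int check_bam_n_cigar(uint32_t n_cigar)
{
    if (n_cigar > UINT16_MAX) {
#ifdef EOVERFLOW
        errno = EOVERFLOW;
#else
        errno = ERANGE;   /* e.g. MinGW, which lacks EOVERFLOW */
#endif
        return -1;
    }
    return 0;
}
```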

- - - - -
092b5aa4 by John Marshall at 2017-01-03T11:34:15Z
Happy New Year

- - - - -
5d114ebd by John Marshall at 2017-01-05T12:14:47Z
Alter bam1_t data layout so that CIGAR data is 32-bit aligned

Add extra padding NULs after qname so that the CIGAR data in memory is
aligned on a 32 bit boundary.  These NULs are included in l_qname so
that accessor macros are unchanged (as is code that accesses this data
without using the macros, sigh), and a new l_extranul field is added
that counts the extra NULs beyond the existing terminator NUL.
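The padding arithmetic is small enough to show directly. qname_extranul() is a hypothetical helper; l_qname counts the terminating NUL, as in BAM.

```c
#include <stdint.h>
#include <string.h>

/* Number of padding NULs, beyond the terminator, needed so that data
   following the read name starts on a 4-byte boundary. */
uint8_t qname_extranul(const char *qname)
{
    size_t l_qname = strlen(qname) + 1;    /* include the terminating NUL */
    return (uint8_t) ((4 - l_qname % 4) % 4);
}
```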

Add tests that read and write SAM, BAM, and CRAM.  Check sizeof(bam1_t),
taking care to allow for a possible extra 4 bytes of alignment padding
(e.g., on 64-bit platforms).

Fixes #400.

- - - - -
59972b70 by John Marshall at 2017-01-06T14:58:25Z
Fix test/compare_sam.pl -Baux on 32-bit platforms

The B:i mapping's ($_+4294967296)&4294967295 overflows to float
and clamps all positive values to 4294967295 on 32-bit platforms.
Rewrite it to avoid this.

Also remove $_= from all these internal map{}s, as we are interested
in the return values, not the side-effect of changing a non-lvalue.

- - - - -
c8475e82 by John Marshall at 2017-01-09T13:24:51Z
Support custom S3 endpoint host_base setting (in .s3cfg)

Allow the user to specify an endpoint other than s3.amazonaws.com.
This can be set using ~/.s3cfg's host_base setting (only; we ignore
host_bucket); when there's a blessed setting key for .aws/credentials,
we'll support it there too (perhaps endpoint_url; cf aws/aws-cli#1270).

Fixes (part of) #436.

- - - - -
271cbaa9 by James Bonfield at 2017-01-10T10:28:42Z
Added a kputd for %g specialisation.

VCF spends a lot of time doing ksprintf(s,"%g",val), especially on
VCFs with many samples.  The kputd is a dedicated function for
this one specific task that runs significantly quicker than the glibc
printf code.

On a 1000genomes test user time was previously 50.5s and after this
patch was 12.5s, to convert uncompressed BCF to uncompressed VCF. (But
still -Ou for uncomp BCF is 0.2s, so VCF generation is still slow).

I haven't looked at VCF decoding (vs encoding), but likely there is room
there too.  uBCF->uBCF is ~0.2s; uVCF->uBCF is 21.7s.

- - - - -
da61ad59 by Shane McCarthy at 2017-01-10T11:45:16Z
kputd: set kstring len correctly for negative exponential values

- - - - -
19014511 by Shane McCarthy at 2017-01-10T18:09:51Z
vcfutils: replace exit() with return -1 in bcf_remove_allele_set

follow up to ee4725892ee3f29a813ba906d7c8b07928a3f352

* `bcf_remove_allele_set` to return `int` rather than `void`
* replace `exit(1)` in `bcf_remove_allele_set` with `return -1`
  along with cleanup on error
* wrap error messages in `hts_verbose>1`
* indicate `bcf_remove_alleles` is deprecated with HTS_DEPRECATED

- - - - -
515fd209 by Petr Danecek at 2017-01-11T12:35:33Z
propagate vcf errors from synced reader

Fixes #318

- - - - -
f3a3a80f by James Bonfield at 2017-01-11T17:56:36Z
Merge Google Cloud Storage support (PR #446)

- - - - -
1bd62b39 by John Marshall at 2017-01-12T09:23:17Z
Split S3 parts of hfile_libcurl.c into separate hfile_s3.c

Just copy the whole file for now; the next commit will remove
the duplicated portions and add the new file to the build.

- - - - -
acbd58eb by James Bonfield at 2017-01-12T10:18:28Z
Permit CRAM lossy_names mode to accept TLEN 0 or TLEN +/- 1.

The lack of TLEN value (eg "* 0 0" for the 3 SAM fields) or TLEN being
out by 1 causes the CRAM encoder to encode reads in "detached" state
so it can store TLEN verbatim.  This process broke the lossy name
encoding, so we're no longer quite so precise in our round-trip for
TLEN if we're asking for lossy read names.

Ideally these TLEN options would be a separate and orthogonal option
to the lossy_names option, but for now they are tied together for
simplicity.

- - - - -
be9736d7 by Olivier Cinquin at 2017-01-12T15:45:08Z
Undefine macro after it has served its purpose (no functional change).

- - - - -
28aa45f5 by John Marshall at 2017-01-12T16:50:14Z
Ensure max_off is -1 when end bin overflows

When the end coordinate is larger than the index's maximum coordinate,
the starting bin for computing max_off is somewhere on the row beyond
the index's bottom row.  Disable max_off rather than using a random
parent bin of the bin somewhere below the bottom row.  (The min_off
computation has a similar problem, but for min_off it's immaterial.)

Fixes #455.
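A sketch of locating the bottom-level bin and refusing positions beyond the index's range, using the bin numbering of htslib's hts_bin_first() macro. bottom_bin() is hypothetical; the max_off computation described above would then work upwards through parent bins from this starting point.

```c
#include <stdint.h>

/* First bin number on level l: (8^l - 1) / 7, as in hts_bin_first(). */
static int64_t bin_first(int l)
{
    return (((int64_t) 1 << (3 * l)) - 1) / 7;
}

/* Bottom-level bin containing 0-based position pos, or -1 if pos lies
   beyond the largest coordinate the index can represent (in which case
   max_off should be disabled rather than computed from a bogus bin). */
int64_t bottom_bin(int64_t pos, int min_shift, int n_lvls)
{
    int64_t max_coord = (int64_t) 1 << (min_shift + 3 * n_lvls);
    if (pos < 0 || pos >= max_coord) return -1;
    return bin_first(n_lvls) + (pos >> min_shift);
}
```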

- - - - -
3046bbf2 by dlaehnemann at 2017-01-13T13:48:45Z
Extended bcf_get_format*() documentation to emphasize difference between
ndst and the return value. Correspondingly extended the
bcf_get_genotypes() example. See pull request #308.

- - - - -
038a61be by John Marshall at 2017-01-13T14:31:42Z
Fix whitespace, shorten help string [minor]

- - - - -
70f2cc88 by John Marshall at 2017-01-13T15:00:54Z
Move S3 support from hfile_libcurl.c to hfile_s3.c

Delete S3 code from hfile_libcurl.c and libcurl code from hfile_s3.c
and add hfile_s3.c to the build.

Similarly to the GCS support, the S3 code is now a separate plugin
that calls (generally libcurl's) hopen() with some extra HTTP headers.

- - - - -
481ba669 by John Marshall at 2017-01-13T15:02:16Z
Merge separate hfile_s3.c code

- - - - -
e7bf06fa by John Marshall at 2017-01-13T15:31:18Z
Don't FAILONERROR at high verbosity and other minor libcurl changes

Normally we set CURLOPT_FAILONERROR so HTTP errors automatically
become hopen() failures.  At high verbosity don't set it, so that we
have an opportunity to see the error response body received from the
server, if any.

Now that add_header() is only used once, write it out inline.
Simplify CURLOPT_RESUME_FROM_LARGE silliness.

- - - - -
2448fd56 by John Marshall at 2017-01-13T16:47:49Z
Add `htsfile -cv` raw view mode for unknown file formats

In particular (once hts_detect_format() no longer detects everything
vaguely textual as SAM; cf #200), this will aid debugging of network
failures as `htsfile -cvvvvvv URL` will display HTTP error response
body text in addition to the HTTP status code.

- - - - -
ec1d68e2 by John Marshall at 2017-01-13T17:24:51Z
Add bgzf_compression(); reuse check_header() in bgzf_is_bgzf()

Now that bgzf_open() is capable of opening plain-gzipped and
uncompressed files, there's no need to preflight with a bgzf_is_bgzf()
check.  In tbx.c, remove that in favour of a new bgzf_compression()
function that encapsulates the is_compressed/is_gzip flags (which
returns int rather than enum htsCompression so that bgzf.h continues
to not depend on hts.h).  Fixes #451.

Avoid problems in bgzf_close() when gz_stream has not been initialised
(it is used as a flag by bgzf_read_block(), so its initialisation can't
just be lifted to bgzf_read_init()).

Better to just remove bgzf_is_bgzf() entirely as using it means you
end up opening the file twice, but sadly enough third parties (mostly
derived from old tabix code) use it that we'll merely deprecate it
for now.  Have it reuse check_header() rather than use its own test.
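A sketch of telling plain gzip from BGZF by header bytes, in the spirit of check_header(). classify_compression() and its 0/1/2 return values are hypothetical. The BGZF signature checked here is a gzip member with FEXTRA set whose 6-byte extra field holds a 'BC' subfield with a 2-byte payload.

```c
#include <stddef.h>
#include <stdint.h>

/* 0 = uncompressed, 1 = plain gzip, 2 = BGZF. */
int classify_compression(const uint8_t *h, size_t n)
{
    if (n < 2 || h[0] != 0x1f || h[1] != 0x8b) return 0;
    if (n >= 16 && h[2] == 8 && (h[3] & 4) &&   /* deflate, FEXTRA set */
        (h[10] | (h[11] << 8)) == 6 &&          /* XLEN == 6 */
        h[12] == 'B' && h[13] == 'C' &&         /* subfield ID */
        (h[14] | (h[15] << 8)) == 2)            /* subfield length */
        return 2;
    return 1;
}
```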

- - - - -
6fb42cf5 by Andreas Tille at 2017-01-14T18:21:47Z
Add build-essential to autopkgtest

- - - - -
e9945fc7 by Andreas Tille at 2017-01-14T18:22:23Z
debhelper 10

- - - - -
f24104ed by Andreas Tille at 2017-01-14T18:23:48Z
d/watch: version=4

- - - - -
1b39d8d3 by Andreas Tille at 2017-01-14T21:11:18Z
Add @builddeps@ to autopkgtest

- - - - -
cf5950e6 by Steffen Moeller at 2017-01-22T23:44:34Z
Added reference to tabix

There is none for the HTSlib source package. Maybe one should also
add a reference for SAMtools.

- - - - -
536682b0 by Steffen Moeller at 2017-01-22T23:46:04Z
Merge branch 'debian/unstable' of ssh://anonscm.debian.org/git/debian-med/htslib into debian/unstable

- - - - -
40217538 by John Marshall at 2017-01-24T07:43:28Z
Add BZ2/LZMA to configure.ac and infrastructure to config.pc.in

We want to ensure most people can read all CRAM files and so configure
with both BZ2 and LZMA supported.  Check for the relevant libraries and
error out if they're not found.  Require --disable-bz2/--disable-lzma
to be explicitly given to build HTSlib without full CRAM support.

Have autoconf generate htslib.pc.tmp ("template") from htslib.pc.in, and
continue to delay expanding @includedir@, @libdir@, and @PACKAGE_VERSION@
until install-time.

Add static_ldflags and static_libs variables to htslib.pc, which don't
contain variable expansions etc so are usable via simple sed(1) extraction
as well as via pkg-config --variable.  An upcoming samtools change will
use these to link against libbz2/liblzma when they are required.
(TODO) These variables can also be used to communicate exactly when
-rdynamic/-ldl are needed.

- - - - -
a6842b5c by James Bonfield at 2017-01-24T08:04:47Z
Fixed lzma memory limit.

The hard-coded memory wasn't appropriate for lzma -9.  It now queries
the maximum amount necessary.

Also fixed the error message.

- - - - -
c010cc3c by James Bonfield at 2017-01-24T08:23:16Z
Document the --disable-lzma and --disable-bz2 configure options

- - - - -
2434a5ad by John Marshall at 2017-01-25T04:43:45Z
Add -rdynamic/-ldl to htslib.pc.in's static_* variables when needed

- - - - -
942f5d26 by Rob Davies at 2017-01-25T14:08:16Z
Stop test_cmd from merging stderr with its output.

- - - - -
86e32e9e by Rob Davies at 2017-01-25T14:10:01Z
Add hts_endian.h to convert little-endian bytes to/from native integers.

Also handles unaligned access.  This can be controlled by setting
HTS_ALLOW_UNALIGNED - if 0, unaligned access is disabled; if 1 it is allowed.
If unset, it will be enabled on Intel x86-like platforms.

cram/cram_encode.c and cram/os.h are modified to use HTS_ALLOW_UNALIGNED
instead of ALLOW_UAC (which could not be disabled).

One of the macros HTS_LITTLE_ENDIAN or HTS_BIG_ENDIAN is defined on
platforms known to be little- or big-endian respectively.  HTS_ENDIAN_NEUTRAL
can be defined to disable this.  Code in hts_endian.h is intended to
be endian-neutral by default, so it should still work even where
endian-ness detection fails.

Includes unit tests and documentation.
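An endian-neutral sketch of the kind of conversions hts_endian.h provides, written with byte shifts so it works on unaligned buffers on any host. The function names are hypothetical stand-ins, not the header's own.

```c
#include <stdint.h>

/* Read a little-endian 32-bit value byte by byte (alignment-agnostic). */
static inline uint32_t le_bytes_to_u32(const uint8_t *buf)
{
    return (uint32_t) buf[0] | ((uint32_t) buf[1] << 8) |
           ((uint32_t) buf[2] << 16) | ((uint32_t) buf[3] << 24);
}

/* Write a native 32-bit value as little-endian bytes. */
static inline void u32_to_le_bytes(uint32_t v, uint8_t *buf)
{
    buf[0] = v & 0xff;
    buf[1] = (v >> 8) & 0xff;
    buf[2] = (v >> 16) & 0xff;
    buf[3] = (v >> 24) & 0xff;
}

/* Signed variant, avoiding implementation-defined out-of-range casts. */
static inline int32_t le_bytes_to_i32(const uint8_t *buf)
{
    uint32_t u = le_bytes_to_u32(buf);
    return u <= INT32_MAX ? (int32_t) u : -(int32_t) (UINT32_MAX - u) - 1;
}
```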

- - - - -
70622cff by Rob Davies at 2017-01-25T14:12:51Z
Fix undefined behaviour and improve endian-related behaviour

Undefined behaviour includes:

* Illegal shifts
* Calls to memcpy(ptr, NULL, 0)
* Unaligned access

On Intel, HTS_ALLOW_UNALIGNED=0 needs to be defined to enable unaligned access
prevention.

Behaviour on big-endian systems is also changed, notably bam auxiliary
data is now stored in little-endian order instead of being byte-swapped.
This means code accessing aux data needs to byte swap integer and float
values on big-endian platforms.  To assist this, more bam_aux functions
are added.  These include bam_auxB_len(), bam_auxB2i() and bam_auxB2f()
for accessing array elements.

Justification for this:

* The data was previously stored inconsistently depending on if you
  read a sam, bam or cram file.  This meant big-endian platforms basically
  didn't work before this change.
* The values are often unaligned.  Some platforms (e.g. sparc, mips, armhf)
  need special handling for unaligned data, and it's easy to do the byte
  swapping at the same time.
* No time is wasted byte swapping aux values that aren't going to be
  accessed (although this is only an advantage for bam).

Undefined behaviour detected by compiling with -fsanitize=undefined and
running the test harnesses for htslib, samtools and bcftools.

Endian compatibility tested using netbsd on sparc (emulated with qemu).

- - - - -
0e63e293 by James Bonfield at 2017-01-27T12:22:12Z
Fixed dead-lock case in seek + multi-threaded decode.

If the bgzf_mt_reader() function was blocked (queue input is full)
dispatching the very last block of the file, and the main thread then
sends a SEEK command, the reader is woken up but does not process the
seek request.  Instead it wedges on the next dispatch, marking EOF.

The reversal of the blocks of code appears valid given we have one
read-ahead thread per file open each with its own queue of blocks to
decode.  Hence it should not be possible for the very first dispatch
call to block.

This hopefully resolves issue #537, subject to further testing by
the submitter.

- - - - -
2916d7c6 by James Bonfield at 2017-01-30T11:19:13Z
Merge PR #395 (Add a kputd for %g specialisation).

- - - - -
255863e1 by James Bonfield at 2017-01-30T12:02:14Z
Adjusted prototype for kputd to be consistent with other kput functions.

The initial implementation copied the same argument order as ksprintf,
which it specialises, but the integer and character kput* functions
all use the opposite argument order.

- - - - -
f476b53f by James Bonfield at 2017-02-01T15:34:42Z
Fixed bgzf threading dead-lock when trying to read beyond EOF.

This addresses the comment in
https://github.com/samtools/bcftools/issues/537#issuecomment-276194107
and also fixes (for multi-threading only) issue #461, making it skip
internal empty blocks, plus warnings about absent EOF blocks when
reading from a pipe.

- - - - -
2e0e1246 by Petr Danecek at 2017-02-02T12:33:07Z
BGZF skip empty blocks, do not give up reading prematurely

Resolves https://github.com/samtools/htslib/issues/45

- - - - -
99660e02 by Rob Davies at 2017-02-02T12:39:32Z
Report missing BGZF EOF blocks

Make bgzf_read_block report missing EOF blocks while reading in
single-threaded mode. The multi-threaded code is updated so that it uses
the same field in struct BGZF to record having seen an EOF block.  We
also set a bit when we discover the EOF block is missing so as to
avoid repeated warnings on the subject.

The bizarre 2-bit booleans in struct BGZF are modified to be 1 bit long
so we can repurpose some of the bits.  We can get away with this as we
are changing the ABI.

- - - - -
d8d0323b by James Bonfield at 2017-02-02T15:30:27Z
Fixes for dealing with raw gzip streams.

This fixes issue samtools#632 via John's suggested change as well as
fixing multi-threading on non-bgzf streams too.  (It just bails back
out to single threaded.)

- - - - -
0452e7ee by James Bonfield at 2017-02-06T16:42:54Z
Further thread pool fixes.

This replaces PR #462 with a revised method suggested by @daviesrob.
The synchronisation between main and reader is now done within the thread
pool using half-shutdown of the process-queue via reference counting.

Also fixed some memory leaks during shutdowns.

- - - - -
12d6e020 by James Bonfield at 2017-02-06T17:32:08Z
Merge commit PR #459 (Fix undefined behaviour and improve endian-related behaviour)

- - - - -
79f38a35 by Andreas (Kusalananda) Kähäri at 2017-02-07T08:56:09Z
Include <sys/select.h>

The <sys/select.h> header is needed for calling select() on some
non-Linux Unices.

- - - - -
190a7225 by Rob Davies at 2017-02-07T15:25:56Z
Merge "Further thread pool fixes" branch (PR #465)

Add copyright boilerplate.  Update .gitignore.
Stop test/thrash_threads5 from outputting possibly binary data to terminals.
Make test/thrash_threads6 complain if its input is not big enough.

- - - - -
fd178d3a by James Bonfield at 2017-02-08T10:03:47Z
Fixed bgzf_gzip_compress when given incompressible data.

There was an assumption in this code that deflate had no data from the
previous block to flush, hence (by design) the output block is always
large enough for the input data supplied.  Changing from Z_NO_FLUSH to
Z_PARTIAL_FLUSH removes this assumption.  Fixes #270.

- - - - -
ff71a319 by James Bonfield at 2017-02-08T16:31:02Z
Fixed data corruption when switching to threads part way through a stream.

The bug was caused by over-zealous memory leak removal in the previous
bgzf commits.  The thread thrashing code now does this to exercise the bug.

- - - - -
3dc96c56 by Rob Davies at 2017-02-08T16:40:10Z
Allow fai index to be in a different location to the indexed file.

Convert bgzf_index_load, bgzf_index_dump, fai_load, fai_build, fai_read and
fai_save to use hfile instead of stdio.  This allows access to remote
indexes via http, ftp etc. and the plugin infrastructure.

Add new API interfaces fai_build3() and fai_load3() which take separate
names for the fai and gzi index files.  If an index file name is
passed in as NULL, it is derived from the name of the file being indexed
as with fai_build() and fai_load().  As a result, fai_build() and
fai_load() are replaced by simple wrappers that call fai_build3() and
fai_load3() with NULL index file names.
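A sketch of the NULL-means-derive rule for index names. index_name() is hypothetical; it only shows how a NULL .fai or .gzi name could fall back to the indexed file's name plus a suffix.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Use explicit_name if given, otherwise fn followed by suffix (e.g. ".fai"
   or ".gzi").  Returns a malloc()ed string the caller must free, or NULL
   on allocation failure. */
char *index_name(const char *fn, const char *explicit_name, const char *suffix)
{
    const char *base = explicit_name ? explicit_name : fn;
    const char *ext = explicit_name ? "" : suffix;
    size_t len = strlen(base) + strlen(ext) + 1;
    char *out = malloc(len);
    if (out == NULL) return NULL;
    snprintf(out, len, "%s%s", base, ext);
    return out;
}
```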

The download_and_open() function which made local copies of remote index
files is removed.  The side effect of creating local files was not
desirable in some cases, and download_and_open() suffered from race
conditions if two processes tried to access the same index simultaneously.
It was also not called for .gzi files.  fai_build3() and fai_load3() can
directly access remote copies of the indexed file and of its .fai and .gzi
indexes.

This removes fai_save() as a public symbol in libhts.so, but this
function does not appear in a public header file so is not part of
the official HTSlib API or ABI.

- - - - -
2e1c10b8 by Rob Davies at 2017-02-08T16:40:10Z
Add bgzf_index_load_hfile and bgzf_index_dump_hfile

These read and write .gzi indexes in the same way as bgzf_index_load and
bgzf_index_dump, but operate on an already-open hFILE handle.

bgzf_index_load and bgzf_index_dump are modified to simply open an hFILE
and call the new functions.

Doxygen documentation is added for the new functions, and improved for the
existing ones.

- - - - -
84a89da6 by Rob Davies at 2017-02-08T16:40:11Z
Add bgzf unit tests

Directly exercise the functions exposed by htslib/bgzf.h

- - - - -
44f77d57 by Andreas Tille at 2017-02-09T13:30:46Z
Install header files from cram dir in -dev package since these are used in libseqlib

- - - - -
e4ea80c3 by Andreas Tille at 2017-02-09T14:22:18Z
Fix pkg-config

- - - - -
54feaaa2 by Andreas Tille at 2017-02-10T07:46:42Z
Upload to experimental

- - - - -
9e2fbac8 by James Bonfield at 2017-02-13T17:26:37Z
Remove MacOS X dead-lock in bgzf threading.

This is due to an erroneous double unlock of command_m mutex, present
since 8ca01755.  This exposed itself once we added additional
multi-threading in samtools, due to the additional MT checks added by
Rob (thanks).

Fixes samtools/samtools#639

- - - - -
ee2bcaa6 by Rob Davies at 2017-02-14T15:03:53Z
Add more error checks when building indexes

Includes check for chromosome positions bigger than the index
can handle.

- - - - -
7bfa010f by Rob Davies at 2017-02-14T15:03:53Z
Add tabix functional tests

- - - - -
95b1034e by Rob Davies at 2017-02-15T15:07:38Z
Remove abort on corrupt aux data, pass errors up instead

Remove abort from skip_aux() when it finds an aux record of unknown type.
Also add some tests for records that are longer than the space available
for them.  Make it return NULL if anything is wrong.

Make bam_aux_get() and bam_aux_del() report an error if the aux data is
found to be corrupt.  bam_aux_get() now sets errno so callers can tell
the difference between broken aux data and tags that are not present.
errno is used so that the API for bam_aux_get() is essentially unchanged -
callers that do not check will still work, albeit while silently ignoring
the problem.
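A simplified sketch of scanning aux data and reporting failure through errno. find_aux() is hypothetical and ignores the 'B' and 'H' types; using ENOENT for an absent tag and EINVAL for corrupt data is an assumption here.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Size of a fixed-width aux value, or 0 for variable-length/unknown types. */
static size_t aux_type_size(uint8_t type)
{
    switch (type) {
    case 'A': case 'c': case 'C': return 1;
    case 's': case 'S': return 2;
    case 'i': case 'I': case 'f': return 4;
    default: return 0;
    }
}

/* Returns a pointer to the type byte of the matching tag.  On failure
   returns NULL with errno = ENOENT (tag absent) or EINVAL (corrupt data). */
const uint8_t *find_aux(const uint8_t *s, const uint8_t *end, const char tag[2])
{
    while (end - s >= 3) {
        const uint8_t *type = s + 2, *val = s + 3;
        size_t size;
        if (*type == 'Z') {
            const uint8_t *nul = memchr(val, '\0', (size_t) (end - val));
            if (nul == NULL) break;              /* unterminated string */
            size = (size_t) (nul - val) + 1;
        } else {
            size = aux_type_size(*type);
            if (size == 0 || (size_t) (end - val) < size) break;
        }
        if (s[0] == (uint8_t) tag[0] && s[1] == (uint8_t) tag[1]) return type;
        s = val + size;
    }
    errno = (s == end) ? ENOENT : EINVAL;
    return NULL;
}
```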

- - - - -
8dd26ff8 by Rob Davies at 2017-02-15T16:03:54Z
Make sam_format1() fail if it finds an invalid aux type

Add a missing catch for invalid aux types.

If hts_verbose is set it will now print a message if it fails due to
invalid aux data.

Also fix a FIXME: check correctly whether there is enough aux data left.

Thanks to Chris Saunders at Illumina for reporting problems with
sam_format1() and skip_aux() on broken records.

- - - - -
57c71055 by Rob Davies at 2017-02-17T14:07:44Z
Deal with bzip2 pkg-config module not being available everywhere

- - - - -
dc54b0df by James Bonfield at 2017-02-20T16:46:42Z
Fix to iterators when the query overlaps zero bins.

Such a query now sets the iterator to be 'finished'.
This fixes samtools/samtools#637.

- - - - -
7317e4fb by Rob Davies at 2017-02-21T17:09:40Z
Fix endianness, integer type and memory safety issues in index metadata

Index metadata stored in an hts_idx_t struct was byte-swapped for
tbi indices but not for csi.  Make them both the same by leaving the
data in little-endian order, and make tbx_index_load2() access the
data in an endian-neutral manner.

Functions hts_idx_get_meta() and hts_idx_set_meta() are changed to
treat the metadata length as an unsigned value (uint32_t) to match
the integer type that is used to store the length in the hts_idx_t
struct.  hts_idx_set_meta() is also changed to return an int
so callers can detect if it ran out of memory.

hts_idx_load_local() and hts_idx_set_meta() now ensure that the stored
meta-data is always followed by a NUL.  This is to prevent tbx_index_load2()
from running off the end of the metadata if there is an unterminated
string in the list of sequence names.

get_tid() is made to return -1 if it fails to add a hash entry.  A few
other calls to malloc are also made safer.

- - - - -
0c326318 by Shane McCarthy at 2017-02-23T18:34:20Z
bcf_index_build3: return -4 on index write failure as per sam_index_build3

- - - - -
e37732bc by Rob Davies at 2017-03-01T11:53:35Z
Add more libraries to static_LIBS, where required

Add -lz (unconditionally), -lcurl and $CRYPTO_LIBS (if required) to
static_LIBS.  This makes the static linking information in htslib.pc
correct for the various combinations of libcurl, S3 and enabling or
disabling plug-ins.

- - - - -
9558a729 by Rob Davies at 2017-03-01T14:09:34Z
Create a Makefile fragment with static linking flags

htslib_static.mk is built from htslib.pc.tmp, and sets two variables:

  HTSLIB_static_LIBS    : -l flags needed by programs linking libhts.a
  HTSLIB_static_LDFLAGS : other flags needed to link with libhts.a

Rules to make this file are added to the Makefile and htslib.mk.

The file makes it easier for packages building against the source tree
to get the correct linker flags.

- - - - -
06edd7e4 by Rob Davies at 2017-03-01T15:33:29Z
Remove explicit -lz -lm link flags; add to LIBS instead

Add -lz and -lm to $(LIBS) via $(htslib_default_libs).  This means they
no longer need to be given explicitly on link lines.  LIBS will be updated
by config.mk if configure is used.  configure.ac is updated to ensure
configure supplies -lz and -lm.

MacOS libhts.dylib may now get -lm on its link line, which is not strictly
needed, but harmless (libm.dylib is a symbolic link to libSystem.dylib).

- - - - -
1c3c77aa by James Bonfield at 2017-03-02T12:20:38Z
Mention thread pool changes.

- - - - -
1b5652cf by Rob Davies at 2017-03-03T09:47:09Z
Merge "Provide more informative error message for unknown tag type (PR#444)"

Brought up to date with current sam_parse1().

- - - - -
a2d7f075 by James Bonfield at 2017-03-06T13:35:33Z
Merged PR#463 (Configure BZ2/LZMA and make htslib.pc more accurate).

- - - - -
811d30f6 by James Bonfield at 2017-03-06T15:31:21Z
News updates (from historical commits).

- - - - -
ae0bec6a by Rob Davies at 2017-03-06T18:01:39Z
Prevent segfault due to VCFs with very large IDX tag values

- - - - -
442ae75b by Anders Kaplan at 2017-03-07T15:20:06Z
Fixed a few non-portable constructs.

- - - - -
30b9f501 by Anders Kaplan at 2017-03-07T15:20:06Z
Added a description of how the thread pool test program works.

- - - - -
c6d1c343 by Anders Kaplan at 2017-03-07T15:20:06Z
Added missing #include in test_view.c.

- - - - -
6a44f49c by Petr Danecek at 2017-03-09T20:57:45Z
Reworked synced VCF/BCF reading

The original version of allele matching was too simplistic and handled
records with duplicate positions by removing ("collapsing") them by
type, if so requested via the bcf_srs_t.collapse option. This commit
changes that behavior: records will never be discarded by the
reader. Instead, compatible records will be paired intelligently
based on user criteria. If duplicate positions are not desired, they
must be removed by the caller or by an external tool, such as `bcftools norm`.

Notes:

- the usage of the old "collapse" is discouraged, but backward-compatible
  at the API level, with behavior changed as described above

- bcf_sr_set_opt() call was introduced to avoid the need to access reader's
  internal structures directly. Ideally at some point it should become
  an opaque struct, to minimize the need for API/ABI breaking changes in
  future

- the allele pairing is still not perfect, it does not try to resolve
  ambiguous representations of identical alleles, such as multiallelic
  indels AA>A vs AAA>AA,AAA>A

- the new implementation slows the reader by 20-40%. This was measured
  with `bcftools isec`, which has no significant overhead of its own: the
  original version took 113 seconds and the new reader took 157 seconds
  (two sites-only BCFs with 35M sites). In practice this is never a
  bottleneck; for example, the difference in speed was not measurable when
  176 gBCFs with 64M sites were merged.

- - - - -
efbded02 by Rob Davies at 2017-03-09T21:13:50Z
Default to check for libcurl; don't fail on no -lcrypto for s3 check

Default to checking for libcurl as we want to use it if it's there.

Only warn if --enable-s3=check and -lcrypto can't be found.  This is so
we get libcurl even if S3 can't be built (and S3 was not specifically
required).

- - - - -
b1a193c4 by Rob Davies at 2017-03-09T21:13:50Z
Add libbz2 and liblzma to default libraries in the Makefile

Always turn on bzip2 and lzma support unless it is explicitly disabled
by configure options.  Includes NEWS item.

- - - - -
03452dda by Rob Davies at 2017-03-09T21:13:51Z
Add sections on dependencies and making configure to the INSTALL file.

- - - - -
078069bf by Rob Davies at 2017-03-09T21:13:51Z
Travis updates.

bz2 and lzma packages are requested for building against.
MacOS X has been added to the os matrix.  This is primarily to test
the linux vs osx package dependencies.
We test both with and without using autoconf.

- - - - -
5c695a17 by Petr Danecek at 2017-03-10T11:40:41Z
Prevent infinite loop on empty indexes

Resolves https://github.com/samtools/htslib/issues/478

- - - - -
e7203ae9 by Anders Kaplan at 2017-03-12T16:36:49Z
Fixed a bug in bcf_fmt_array with garbage allele frequencies. The special value 0x7f800001 (NaN), used to indicate a missing value, was changed to 0x7fc00001 (also NaN) when the function le_to_float returned.

This problem might be specific to MSVC and the calling convention where the return value is passed on the FPU register stack. In that process the value seems to be converted from signaling NaN to quiet NaN.

The fix is to keep the value as uint32 while testing for special values.

Reported as a bug in samtools:
https://github.com/samtools/hts-specs/issues/145
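A minimal C sketch of the idea behind this fix, assuming a little-endian on-disk layout: the 32-bit pattern is tested as an integer before it is ever turned into a float, so no floating-point register or return path gets the chance to quiet the signaling NaN. The helper names are hypothetical, not HTSlib's; only the value 0x7f800001 is taken from the message above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit pattern used for a missing float value (from the commit message). */
#define MISSING_FLOAT_BITS UINT32_C(0x7f800001)

/* Hypothetical helper: read a little-endian 32-bit word from a byte buffer
 * without passing it through a float variable. */
static uint32_t le_to_u32_bits(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Compare the raw bits, so the signaling NaN cannot be turned into the
 * quiet NaN 0x7fc00001 before the test is made. */
static int is_missing_float(const uint8_t *p)
{
    return le_to_u32_bits(p) == MISSING_FLOAT_BITS;
}

/* Only reinterpret the bits as a float once special values are ruled out. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Keeping the value as uint32_t until the comparison is what makes the check independent of the compiler's calling convention.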

- - - - -
150334ca by Anders Kaplan at 2017-03-12T16:36:49Z
Handling of float-type missing values in VCF files, part 2 of 3: bcf_get_info_values.

Added a test case and removed the special handling of the case info->len == 1, because all it saved was one value on the stack, and it did not handle missing float values correctly.

- - - - -
dd0c31d4 by Anders Kaplan at 2017-03-12T16:36:49Z
Handling of float-type missing values in VCF files, part 3 of 3: bcf_get_format_values.

Added a test case in test-vcf-api.c.

- - - - -
cf469f65 by Anders Kaplan at 2017-03-12T16:36:49Z
Replaced memcpy with the more descriptive bcf_float_set_missing in vcf_parse.

- - - - -
55e64043 by Anders Kaplan at 2017-03-12T17:13:22Z
oops

- - - - -
b7bb8c70 by Anders Kaplan at 2017-03-12T18:06:09Z
Reordered test cases so that the output file ends up the way it's supposed to.

- - - - -
ca1423d6 by Rob Davies at 2017-03-13T14:46:24Z
Fix over-specified location of htslib.pc.tmp

- - - - -
d2d9c76a by jenniferliddle at 2017-03-13T14:48:31Z
Release 1.4: summary

- - - - -
879855be by jenniferliddle at 2017-03-13T14:52:49Z
Merge version number bump and NEWS file from master

- - - - -
3a31ddfc by James Blachly at 2017-03-13T19:51:14Z
Add check for libsocket to autoconf (needed to compile on illumos/Solaris)

- - - - -
9db2e71e by Anders Kaplan at 2017-03-14T19:12:16Z
Made the changes slightly smaller through the use of bcf_float_set.

- - - - -
0172fe3e by Andreas Tille at 2017-03-21T16:40:46Z
Merge tag '1.4' into debian/unstable

Release 1.4 (13 March 2017)

* Incompatible changes: several functions and data types have been changed
  in this release, and the shared library soversion has been bumped to 2.

  - bam_pileup1_t has an additional field (which holds user data)
  - bam1_core_t has been modified to allow for >64K CIGAR operations
    and (along with bam1_t) so that CIGAR entries are aligned in memory
  - hopen() has vararg arguments for setting URL scheme-dependent options
  - the various tbx_conf_* presets are now const
  - auxiliary fields in bam1_t are now always stored in little-endian byte
    order (previously this depended on whether you read a bam, sam or cram file)
  - index metadata (accessible via hts_idx_get_meta()) is now always
    stored in little-endian byte order (previously this depended on whether
    the index was in tbi or csi format)
  - bam_aux2i() now returns an int64_t value
  - fai_load() will no longer save local copies of remote fasta indexes
  - hts_idx_get_meta() now takes a uint32_t * for l_meta (was int32_t *)

* HTSlib now links against libbz2 and liblzma by default.  To remove these
  dependencies, run configure with options --disable-bz2 and --disable-lzma,
  but note that this may make some CRAM files produced elsewhere unreadable.

* Added a thread pool interface and replaced the bgzf multi-threading
  code to use this pool.  BAM and CRAM decoding is now multi-threaded
  too, using the pool to automatically balance the number of threads
  between decode, encode and any data processing jobs.

* New errmod_cal(), probaln_glocal(), sam_cap_mapq(), and sam_prob_realn()
  functions, previously internal to SAMtools, have been added to HTSlib.

* Files can now be accessed via Google Cloud Storage using gs: URLs, when
  HTSlib is configured to use libcurl for network file access rather than
  the included basic knetfile networking.

* S3 file access now also supports the "host_base" setting in the
  $HOME/.s3cfg configuration file.

* Data URLs ("data:,text") now follow the standard RFC 2397 format and may
  be base64-encoded (when written as "data:;base64,text") or may include
  percent-encoded characters.  HTSlib's previous over-simplified "data:text"
  format is no longer supported -- you will need to add an initial comma.

* When plugins are enabled, S3 support is now provided by a separate
  hfile_s3 plugin rather than by hfile_libcurl itself as previously.
  When --enable-libcurl is used, by default both GCS and S3 support
  and plugins will also be built; they can be individually disabled
  via --disable-gcs and --disable-s3.

* The iRODS file access plugin has been moved to a separate repository.
  Configure no longer has a --with-irods option; instead build the plugin
  found at <https://github.com/samtools/htslib-plugins>.

* APIs to portably read and write (possibly unaligned) data in little-endian
  byte order have been added.

* New functions bam_auxB_len(), bam_auxB2i() and bam_auxB2f() have been
  added to make accessing array-type auxiliary data easier.  bam_aux2i()
  can now return the full range of values that can be stored in an integer
  tag (including unsigned 32 bit tags).  bam_aux2f() will return the value
  of integer tags (as a double) as well as floating-point ones.  All of
  the bam_aux2 and bam_auxB2 functions will set errno if the requested
  conversion is not valid.

* New functions fai_load3() and fai_build3() allow fasta indexes to be
  stored in a different location to the indexed fasta file.

* New functions bgzf_index_dump_hfile() and bgzf_index_load_hfile()
  allow bgzf index files (.gzi) to be written to / read from an existing
  hFILE handle.

* hts_idx_push() will report when trying to add a range to an index that
  is beyond the limits that the given index can handle.  This means trying
  to index chromosomes longer than 2^29 bases with a .bai or .tbi index
  will report an error instead of apparently working but creating an invalid
  index entry.

* VCF formatting is now approximately 4x faster.  (Whether this is
  noticeable depends on what was creating the VCF.)

* CRAM lossy_names mode now works with TLEN of 0 or TLEN within +/- 1
  of the computed value.  Note in these situations TLEN will be
  generated / fixed during CRAM decode.

* CRAM now supports bzip2 and lzma codecs.  Within htslib these are
  disabled by default, but can be enabled by specifying "use_bzip2" or
  "use_lzma" in an hts_opt_add() call or via the mode string of the
  hts_open_format() function.
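As a worked illustration of the hts_idx_push() limit mentioned above, the sketch below assumes the usual .bai/.tbi binning parameters: 2^14-base leaf bins and five levels below the root, each level eight times wider than the one beneath it, giving 2^(14 + 3*5) = 2^29. The function names are hypothetical, not HTSlib API.

```c
#include <assert.h>
#include <stdint.h>

/* Largest (exclusive) coordinate a binning index can address, given a
 * leaf bin size of 2^min_shift and n_lvls levels below the root.  Each
 * level is 8x wider than the one below it, i.e. adds 3 bits. */
static int64_t index_max_pos(int min_shift, int n_lvls)
{
    return (int64_t)1 << (min_shift + 3 * n_lvls);
}

/* Hypothetical check: reject a half-open range [beg, end) that the index
 * cannot represent.  Returns 0 if it fits, -1 otherwise. */
static int check_range(int64_t beg, int64_t end, int min_shift, int n_lvls)
{
    int64_t maxpos = index_max_pos(min_shift, n_lvls);
    if (beg < 0 || end > maxpos)
        return -1;
    return 0;
}
```

With min_shift = 14 and n_lvls = 5 the limit is 536,870,912 bases; adding a sixth level raises it to 2^32, which is the kind of headroom a CSI index can be configured for.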

- - - - -
94d97111 by Andreas Tille at 2017-03-21T16:42:12Z
New upstream version

- - - - -
e742362d by Andreas Tille at 2017-03-21T16:43:53Z
Update patches

- - - - -
46d5eee3 by Andreas Tille at 2017-03-21T18:03:50Z
Drop autoconf from Build-Depends, add libbz2-dev and liblzma-dev to Build-Depends

- - - - -
28af9f8c by Andreas Tille at 2017-03-21T20:21:20Z
Remove binary files from test results

- - - - -
f711b016 by Andreas Tille at 2017-03-21T20:25:58Z
Remove symbols file since I'm too lazy to maintain these - volunteers welcome for a final upload to unstable

- - - - -
d6ae67da by Andreas Tille at 2017-03-21T20:30:16Z
Change library name to match soname

- - - - -
e880f572 by Isaac Turner at 2017-03-22T12:25:41Z
Use -lpthread instead of -pthread when only linking; Fixes clang warning

- - - - -
e2a1f187 by James Bonfield at 2017-03-22T12:40:29Z
Merge PR #255 into develop.

Also updated the pthreads link options for the new executables that
had appeared since this PR was made.

Conflicts:
	Makefile

- - - - -
e2a55784 by James Bonfield at 2017-03-22T12:43:09Z
Added tabix to the "make test" dependency.

- - - - -
10bc1a71 by James Bonfield at 2017-03-23T11:31:54Z
Tweak to tabix long opts to remove duplicate 'h'.

Fixes #482

The change is invisible to the command line as "tabix -h" previously
(and now) reported help simply because it had no filename listed, not
because it interpreted -h as help.

- - - - -
a17fc73f by James Bonfield at 2017-03-23T15:31:21Z
Fixes to support a stricter C99 plus POSIX.1-2001 environment.

Removed __restrict.
Include strings.h for strcasecmp.
Replace alloca with static sized arrays or malloc/realloc where appropriate.

- - - - -
65267c74 by James Bonfield at 2017-03-23T17:42:21Z
Replaces the itf8* macro abominations with static inline functions.

This also fixes various issues with shifts on signed data.
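A sketch of what a static inline ITF8 decoder can look like, written from the CRAM ITF8 description rather than copied from HTSlib (the function name is hypothetical). Every shift is done on uint32_t, which avoids the undefined behaviour of shifting a value into the sign bit of a signed int; the final conversion to int32_t is implementation-defined for values above INT32_MAX, but two's complement on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ITF8 decoder.  The number of leading 1 bits in the first
 * byte gives the number of extra bytes that follow.  No bounds checking:
 * the caller must supply a buffer holding a complete value.
 * Returns the number of bytes consumed. */
static inline int itf8_decode(const uint8_t *up, int32_t *val)
{
    uint32_t v;
    int n;
    if (up[0] < 0x80) {                 /* 0xxxxxxx */
        v = up[0];
        n = 1;
    } else if (up[0] < 0xc0) {          /* 10xxxxxx + 1 byte */
        v = ((uint32_t)(up[0] & 0x3f) << 8) | up[1];
        n = 2;
    } else if (up[0] < 0xe0) {          /* 110xxxxx + 2 bytes */
        v = ((uint32_t)(up[0] & 0x1f) << 16) | ((uint32_t)up[1] << 8) | up[2];
        n = 3;
    } else if (up[0] < 0xf0) {          /* 1110xxxx + 3 bytes */
        v = ((uint32_t)(up[0] & 0x0f) << 24) | ((uint32_t)up[1] << 16) |
            ((uint32_t)up[2] << 8) | up[3];
        n = 4;
    } else {                            /* 1111xxxx + 4 bytes, low nibble of last */
        v = ((uint32_t)(up[0] & 0x0f) << 28) | ((uint32_t)up[1] << 20) |
            ((uint32_t)up[2] << 12) | ((uint32_t)up[3] << 4) | (up[4] & 0x0f);
        n = 5;
    }
    *val = (int32_t)v;
    return n;
}
```

Unlike a macro, the inline function evaluates its pointer argument once and gives the compiler real types to check.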

- - - - -
b30fc6a6 by James Bonfield at 2017-03-23T18:13:07Z
Tidy up non-standard C99 issues.

- - - - -
10081c4d by Shane McCarthy at 2017-03-24T12:01:05Z
add test-bcf-sr to .gitignore [minor]

- - - - -
9986f5de by James Bonfield at 2017-04-04T13:32:27Z
Fixed kputd printing of NaN.

- - - - -
052a16ee by Rob Davies at 2017-04-04T13:55:56Z
Merge check for libsocket (commit 3a31ddf from PR#488)

- - - - -
487f3670 by Rob Davies at 2017-04-04T13:57:35Z
Merge NaN issue fix (PR#485)

- - - - -
ff8f83a2 by Rob Davies at 2017-04-04T13:59:08Z
Merge C99 compliance improvements (PR#498)

- - - - -
61631149 by Anders Kaplan at 2017-04-04T20:36:21Z
First draft of a new logging mechanism.

- - - - -
e1848ae6 by Anders Kaplan at 2017-04-04T20:36:22Z
Added helper macros such as log_error.

- - - - -
ddcc3ab5 by Anders Kaplan at 2017-04-04T20:36:22Z
Switched to new logging functions in bgzf.c. NOTE: suspicious, apparently buggy bracket placements on lines 1716 and 1752 (before the edit) were changed.

- - - - -
c54f23ca by Anders Kaplan at 2017-04-04T20:36:22Z
Added tests for logging.

- - - - -
c6d03a1d by Anders Kaplan at 2017-04-04T20:57:51Z
Adapted to gcc/unix environment.

- - - - -
af89ccb2 by Petr Danecek at 2017-04-05T10:18:27Z
Free file name strings upon successful faidx loading

- - - - -
9317ca65 by Rob Davies at 2017-04-07T15:10:53Z
Ensure B aux tags have a comma after the type character

Anything other than a comma caused the number of elements to be miscounted,
leading to a heap overflow in some cases.

Fixes #501
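A hypothetical sketch of the check, in the spirit of the fix but not HTSlib's code: the value of a SAM B-array tag (the text after "XX:B:", for example "c,1,2,3") is only counted by commas after confirming that a known subtype character is followed by a comma. A bare subtype such as "c" is treated as an empty array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the number of elements in a B-array value such as "c,1,2,3",
 * or -1 if the subtype is unknown or not followed by a comma. */
static long count_b_elements(const char *val)
{
    if (val[0] == '\0' || strchr("cCsSiIf", val[0]) == NULL)
        return -1;              /* missing or unknown subtype */
    if (val[1] == '\0')
        return 0;               /* bare subtype: empty array */
    if (val[1] != ',')
        return -1;              /* e.g. "c1,2": elements would be miscounted */
    long n = 0;
    for (const char *p = val + 1; *p != '\0'; p++)
        if (*p == ',')
            n++;                /* one element per comma */
    return n;
}
```

Counting commas is only a valid element count once the first comma is known to sit directly after the subtype; that is the assumption the buffer size is then derived from.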

- - - - -
1acad458 by James Bonfield at 2017-04-10T08:43:19Z
Fix buffer overrun on corrupted data.

Fixes #507

(Note this bug has nothing to do with the c->l_extranul code.  It is
simply that b->l_data (block_len-32) is smaller than c->l_qname.)

- - - - -
1e1bf970 by James Bonfield at 2017-04-11T10:56:21Z
Harden rANS decoding against malicious or random input.

Fixes #510, fixes #511, fixes #512 and fixes #513.
Also tidied up a couple of minor memory leaks when recovering from errors.

- - - - -
b1b28169 by James Bonfield at 2017-04-11T11:56:00Z
CRAM check for embedded references being too small for slice.

- - - - -
2270547b by James Bonfield at 2017-04-12T11:50:05Z
Further rANS protections.

We ensure we can't read from uninitialised parts of the D.R matrices.

- - - - -
0f60c356 by Rob Davies at 2017-04-12T14:18:08Z
Ensure BGZF block length is longer than the block header

Prevents an attempt to read a negative number of bytes if the length
reported in the BC subfield is less than BLOCK_HEADER_LENGTH.
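A sketch of the same check, assuming the standard BGZF layout in which the 18-byte fixed header ends with the BC subfield's BSIZE value (total block size minus one) stored little-endian at offset 16. The function name is hypothetical; BLOCK_HEADER_LENGTH is defined locally with the value that layout implies.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_HEADER_LENGTH 18  /* fixed gzip header + BC extra subfield */

/* Hypothetical helper: return how many bytes of the block remain to be
 * read after the header, or -1 if BSIZE claims the block is shorter than
 * its own header (which would otherwise yield a negative read length). */
static int bgzf_remaining_bytes(const uint8_t header[BLOCK_HEADER_LENGTH])
{
    int block_length = (header[16] | (header[17] << 8)) + 1;
    if (block_length < BLOCK_HEADER_LENGTH)
        return -1;
    return block_length - BLOCK_HEADER_LENGTH;
}
```

For example, the 28-byte BGZF end-of-file block stores BSIZE = 27, leaving 10 bytes after the header.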

- - - - -
42bfe70c by Rob Davies at 2017-04-12T16:17:27Z
Check number of symbols is sensible in Huffman codec stream

Avoid negative values and possible arithmetic overflow.  Also remove
a malloc(0) in the case where ncodes == 0.

- - - - -
08c16ef0 by Anders Kaplan at 2017-04-13T20:11:20Z
Moved the declaration of hts_log & co to the public API.
Added the hts_ prefix to the log_<level> macros.
Switched to standard C99 __VA_ARGS__ instead of ##__VA_ARGS__.
Added HTS_FORMAT checking to the hts_log function.
Changed the initialization of hts_verbose to use the literal HTS_LOG_INFO instead of magic number 3.
Centralized newline handling to the hts_log function.
Re-purposed the test-logging.pl script to test for message consistency.

- - - - -
6a50863e by Petr Danecek at 2017-04-18T08:58:53Z
Make bcf_set_variant_type() aware of breakends

- - - - -
0d3d32f5 by Anders Kaplan at 2017-04-20T18:39:02Z
Restored the default log level (3) but now interpreted as HTS_LOG_WARNING.
Added HTS_LOG_TRACE log level.

- - - - -
ea59199b by Anders Kaplan at 2017-04-20T18:39:02Z
Adjusted log levels in bgzf.c.

- - - - -
49fe80f9 by Anders Kaplan at 2017-04-20T19:12:30Z
Added trace log level to get_severity_tag.

- - - - -
bb159012 by Rob Davies at 2017-04-26T11:21:30Z
Improve end of name detection in fai_read

Use !isspace() instead of isgraph() to find out where the first column
ends.  This better matches how the file is written by fai_save, and
works better on files with unusual sequence names - notably ones that
include utf-8 encoded ligatures.

Also prevent the parser from going beyond the end of the read line.  This
could only happen on broken fai files where the last line has an
unterminated sequence name.

Fixes #521
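A hypothetical sketch of the bounded name scan described above. It uses the standard isspace() with an unsigned char cast, since passing a negative char (as UTF-8 bytes are on platforms where char is signed) to isspace() is undefined behaviour; HTSlib's own isspace_c wrapper is not reproduced here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Length of the sequence name at the start of a .fai line: stop at the
 * first whitespace character, at a NUL, or after len bytes, whichever
 * comes first, so an unterminated last line cannot be over-read. */
static size_t fai_name_len(const char *line, size_t len)
{
    size_t i = 0;
    while (i < len && line[i] != '\0' && !isspace((unsigned char)line[i]))
        i++;
    return i;
}
```

Because only whitespace ends the name, multi-byte characters such as UTF-8 ligatures stay part of it, whereas isgraph() would stop at them in the C locale.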

- - - - -
cd26782a by Rob Davies at 2017-04-26T11:21:30Z
Improve fai_fetch() and faidx_fetch_seq()

Fix bug where fai_fetch() did not return an error if a non-existent
sequence name was followed by a region specification.  Fixes #522.

Improve choice of data type for some variables.

Check malloc() return values.

Replace atoi() with strtol() for better handling of out of range numbers.

Report if an out of range number means part of the sequence cannot be
accessed.  This can only happen with very long sequences on platforms
with 32 bit longs.  Shorter sequences will bring the number back into
range by clamping the value to the sequence length.

Fix some region parsing oddities, so for example 'seq:-4' is interpreted
as 'seq bases 1 to 4' and not 'seq bases minus 4 to end'.

Pulled retrieval code that was duplicated in fai_fetch() and
faidx_fetch_seq() into a new static function.
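A simplified, hypothetical sketch of region parsing with strtol() that follows the 'seq:-4' rule above. Two choices are assumptions of this sketch, not the commit: an out-of-range number is rejected outright rather than clamped, and a missing end is taken to mean the end of the sequence.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse the "beg-end" part of a region (after the colon) into 1-based
 * inclusive coordinates.  "-4" means bases 1 to 4.
 * Returns 0 on success, -1 on error. */
static int parse_range(const char *s, long seq_len, long *beg, long *end)
{
    char *ep;
    *beg = 1;
    *end = seq_len;
    if (*s == '\0')
        return 0;                 /* no range: whole sequence */
    if (*s != '-') {              /* explicit start position */
        errno = 0;
        *beg = strtol(s, &ep, 10);
        if (ep == s || errno == ERANGE)
            return -1;
        s = ep;
    }
    if (*s == '-') {              /* optional end position */
        s++;
        if (*s != '\0') {
            errno = 0;
            *end = strtol(s, &ep, 10);
            if (ep == s || errno == ERANGE || *ep != '\0')
                return -1;
        }
    } else if (*s != '\0') {
        return -1;                /* trailing junk such as "10x" */
    }
    if (*beg < 1)
        *beg = 1;
    if (*end > seq_len)
        *end = seq_len;           /* clamp to the sequence length */
    return *beg <= *end ? 0 : -1;
}
```

Unlike atoi(), strtol() reports overflow through errno and shows where parsing stopped, so both out-of-range values and trailing characters can be detected.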

- - - - -
74cd2222 by Rob Davies at 2017-04-26T11:21:30Z
Prevent fai_retrieve() from reading one character too many

Fix an "unexpected end of file" when trying to retrieve one past
the last base of the last sequence.

- - - - -
8e1be4ae by Rob Davies at 2017-04-26T11:21:30Z
Change isspace to isspace_c

- - - - -
ae4f3df1 by Rob Davies at 2017-04-26T15:07:32Z
Ensure bcf_hdr_read() is reading vcf or bcf

bcf_hdr_read() assumes it can use bgzf to read the file.  This may not
be true if the file is plain text but was not detected as VCF.  To
ensure that bgzf_read() will work, check that the format is BCF before
attempting to read it.

- - - - -
02e2be3c by Andreas Tille at 2017-05-02T13:41:59Z
Include htsfile binary into tabix package since it's used by manta package

- - - - -
903f2b3b by Andreas Tille at 2017-05-02T13:42:35Z
Upload to experimental

- - - - -
079c7b81 by Rob Davies at 2017-05-03T15:58:32Z
Limit query name to 251 bytes to prevent l_qname overflow

Since 5d114ebd8e9b80622769b8e575c5a9359cd51273, up to three extra NULs
have been added after the query name in the bam1_t data so that the
following CIGAR records are 32 bit aligned.  These extra NULs are included
in l_qname so that bam_get_cigar() etc. work without being changed.  As
l_qname is uint8_t, it will overflow if the query name is longer than
251 bytes.

Add checks to bam_read1(), sam_parse1() and bam_construct_seq() to
make them fail if they encounter a query name that is too long.  This
means some files will become unreadable.  Such files are likely to
be artificially-generated data (for example made by
https://github.com/sbg/Mitty) rather than from real sequencing experiments
which usually generate names much shorter than this limit.

TODO: This restriction can be removed on the next ABI change, by
making l_qname 16 bit.

Thanks to @jmarshall for help with this, including work that can be used
for the future ABI breaking fix.
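The arithmetic behind the 251-byte limit, as a hypothetical C helper: the name plus its NUL is padded with 0 to 3 extra NULs up to a multiple of four, and the total must fit in the 8-bit l_qname. A 251-byte name needs 252 bytes with no padding; a 252-byte name needs 253 bytes, which is padded to 256 and overflows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of the name field, including the terminating NUL and the extra
 * NULs that 4-byte-align the CIGAR data that follows.  Returns -1 if the
 * total does not fit in an 8-bit l_qname. */
static int padded_qname_len(size_t name_len)
{
    size_t with_nul = name_len + 1;
    size_t extranul = (4 - with_nul % 4) % 4;   /* 0..3 padding NULs */
    size_t total = with_nul + extranul;
    if (total > UINT8_MAX)
        return -1;
    return (int)total;
}
```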

- - - - -
d6cab913 by Andreas Tille at 2017-05-04T08:57:51Z
add the cram_to_bam function to the public interface of htslib because it is used by code in Sambamba

- - - - -
8e96d8ae by Petr Danecek at 2017-05-05T14:24:26Z
Bug fix in VCF/BCF header parsing

When VCF header contained a line that could not be parsed, all samples
were discarded. For example, in a well-formed VCF all header lines must
be of the form ##key=value and if they are not, the parsing would stop
prematurely and consequently all genotype fields dropped:

    ##fileformat=VCFv4.2
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##contig=<ID=1>
    ##Incorrect comment line
    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample
    1      1   .  G   T   .    .      .    GT     0/1

This commit skips malformed header lines, printing a warning.

On non-recoverable errors, bcf_hdr_parse now returns a negative
value, which is now checked by all callers.

- - - - -
affe5c00 by Rob Davies at 2017-05-05T14:24:26Z
Ensure bcf_hdr_t is destroyed when bcf_hdr_parse() fails

- - - - -
94accfce by Rob Davies at 2017-05-05T14:24:26Z
Ensure bcf_hdr_parse_line() always finishes at the next newline or NUL

Terminate the search for an '=' character if a newline is found.  This
stops bcf_hdr_parse_line() from incorrectly running into the next line
when '=' is missing.

On structured header lines, drop all characters between the closing '>'
and the next '\n' or NUL, not just spaces.  This ensures *len is set to
the correct line length, and the next call starts parsing at the
correct location.  If any non-whitespace characters are seen, a warning
is printed.

(Minor) Remove the need for an intermediate buffer when printing the
'Could not parse the header line' error message.
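A hypothetical sketch of the line-bounded search described above (not the actual bcf_hdr_parse_line() code): the scan for '=' ends at the newline or NUL, and the line length is reported either way so the caller can move on to the next line.

```c
#include <assert.h>
#include <stddef.h>

/* Find the '=' in a "##key=value" header line without crossing into the
 * next line.  Returns the offset of the first '=' or -1 if this line has
 * none; *len receives the length of the line (excluding the '\n'). */
static long find_eq_in_line(const char *line, size_t *len)
{
    long eq = -1;
    size_t i;
    for (i = 0; line[i] != '\0' && line[i] != '\n'; i++) {
        if (eq < 0 && line[i] == '=')
            eq = (long)i;
    }
    *len = i;
    return eq;
}
```

Applied to the "##Incorrect comment line" example from the earlier commit, this returns -1 instead of picking up the '=' in the following ##contig line.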

- - - - -
b8c6f7dd by Rob Davies at 2017-05-05T14:24:26Z
Simplify bcf_hdr_parse()

Use a do ... while loop to allow the header parser to restart when it
finds a malformed line.

- - - - -
72a7c2ae by James Bonfield at 2017-05-07T22:25:37Z
Trivial SAM header sanitising.

Although the htslib code checks things like headers starting with @,
nul termination, etc, not all BAM implementations are as friendly.

This code sanitises on input, also fixing samtools/samtools#661 in the
process.

- - - - -
0639ea17 by Rob Davies at 2017-05-08T09:03:27Z
Improve sam_hdr_sanitise()

Remove some variables that are no longer needed.
Make it count lines.
Ensure adding '\n' doesn't overflow h->l_text.
Simplify logic for adding the '\n' a bit.

- - - - -
89ea70ad by James Bonfield at 2017-05-08T09:53:31Z
NEWS update. (#529)

* NEWS update.

(Ideally merge on the day of release, editing the date if required.)

* Update NEWS

Added 'sanitise headers' #509

- - - - -
6c068335 by jenniferliddle at 2017-05-08T10:07:52Z
Release 1.4.1: summary

- - - - -
77712880 by jenniferliddle at 2017-05-08T10:14:55Z
Merge version number bump and NEWS file from master

- - - - -
0cad436b by Rob Davies at 2017-05-09T10:59:15Z
Add htslib/hts_log.h to htslib_vars.mk and htslib.mk

- - - - -
06642765 by Rob Davies at 2017-05-09T15:36:47Z
Adjusted log levels for a few messages

- - - - -
e66ab048 by Rob Davies at 2017-05-09T15:37:18Z
Make hts_log() preserve errno.

Remove a few places where errno was saved locally that are no longer
needed.  This unwraps a few calls to free(), but while free() might
currently set errno under some conditions, it's fairly unlikely.
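A minimal sketch of the errno-preserving pattern, using a hypothetical function rather than hts_log() itself: errno is saved on entry and restored on exit, because the stdio calls used for output are allowed to change it.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical logger: lets a caller log a failure and still report the
 * errno value set by the call that failed. */
static void log_preserving_errno(FILE *out, const char *fmt, ...)
{
    int saved_errno = errno;
    va_list args;
    va_start(args, fmt);
    vfprintf(out, fmt, args);   /* stdio may modify errno */
    va_end(args);
    fputc('\n', out);
    errno = saved_errno;
}
```

Centralising the save/restore in the logger is what allows the local errno copies at the call sites to be removed.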

- - - - -
3e85cce7 by Rob Davies at 2017-05-10T11:11:18Z
Merge logging mechanism (PR #499)

- - - - -
a58cd854 by James Bonfield at 2017-05-11T09:21:21Z
Bug fix to CRAM index creation (reported by Brent Pedersen).

References with exactly one record aligned against them had the wrong
"span" value in the index.

- - - - -
499245ee by James Bonfield at 2017-05-15T13:37:28Z
Added a PTHREAD_MUTEX_RECURSIVE check.

This was always the intention for the C99 tidyup, but it was sadly
forgotten about, causing compilation errors of thread_pool.c for Centos/RHEL 5
and SUSE/SLES 11 systems.

Fixes samtools/bcftools#610
Fixes samtools/bcftools#611

- - - - -
389c4f3b by Anders Kaplan at 2017-05-20T14:25:44Z
Replaced fprintf(stderr, ...) calls with hts_log_<level> calls.
Fixed a few cases of irregular indentation in the process.
Made the logging test script ignore multi-line messages.

- - - - -
11896602 by Anders Kaplan at 2017-05-20T14:25:44Z
Replaced fprintf(stderr, ...) calls with hts_log_<level> calls.

- - - - -
dc45f188 by Anders Kaplan at 2017-05-20T14:25:44Z
Cosmetic improvements to the error messages.

- - - - -
403353fc by Anders Kaplan at 2017-05-20T14:43:56Z
Fixed format string errors.

- - - - -
a7df7651 by James Bonfield at 2017-05-22T09:08:47Z
Fix buffer overrun in vcf_format.

Fixes #537

Also "fixed" the parser to prevent this from happening in the first
place, but poorly as there is no way to return an error from
bcf_hdr_register_hrec.

The following minimal VCF triggered this crash. Note "dype".

    ##fileformat=VCFv4.1
    ##INFO=<ID=S,Number=1,dype=String,Description="blah">
    ##INFO=<ID=I,Number=1,Type=Integer,Description="blah">
    ##contig=<ID=chr1,length=249250621,assembly=b37>
    #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
    chr1	10327	.	T	C	.	.	S=foo;I=12345
    chr1	10327	.	T	C	.	.	S=foo;I=12345

- - - - -
40518058 by James Bonfield at 2017-05-23T13:11:15Z
Also added Number= checking for INFO headers.

This (along with the previously checked Type) is declared as mandatory
in VCF for INFO.  It is unknown if it should be mandatory for FORMAT,
but the specification implies otherwise so we only validate on INFO.

- - - - -
15b2f215 by James Bonfield at 2017-05-24T12:54:41Z
Remove read-past-buffer when printing error message.

Fixes #538

- - - - -
c4deecd2 by James Bonfield at 2017-05-24T12:54:41Z
Additional argument protection in sam_hdr_write and bcf_hdr_write.

Fixes #541.

- - - - -
31650a41 by Rob Davies at 2017-05-24T12:54:41Z
Prevent out of bounds read on BAM_CIGAR_STR by extending to 16 bytes

Fixes #546 by extending the array in bam_cigar_opchr() so it can't
get past the end.  Attempting to convert an invalid CIGAR operation
now returns '?'.
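A sketch of the padded lookup table, with hypothetical names: the operation is the low four bits of a packed CIGAR element, so a 16-entry table cannot be indexed out of bounds, and the six undefined operations map to '?'. The string assumes the ten defined operations MIDNSHP=XB in that order.

```c
#include <assert.h>
#include <stdint.h>

/* Exactly 16 characters (no room for a terminating NUL, which C allows
 * for a sized char array); index = 4-bit CIGAR operation. */
static const char cigar_op_chars[16] = "MIDNSHP=XB??????";

/* Length is in the upper 28 bits, operation in the lower 4 bits. */
static char cigar_opchr(uint32_t cigar_elem)
{
    return cigar_op_chars[cigar_elem & 0xf];
}
```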

- - - - -
d8e5ec4b by Rob Davies at 2017-05-24T15:09:23Z
Fix clang warning

Stop clang from needlessly complaining:
vcf.c:507:39: warning: adding 'int' to a string does not append to the string
by swapping to code that's a bit more obvious as to how it works.

- - - - -
54bfd95f by Rob Davies at 2017-05-25T13:06:09Z
Fix length calculation in bam_read1 and possible memory leak

Add missing c->l_extranul when checking how much space is needed for
the name, cigar, seq and qual data.  Fixes #547 where the block length
in the bam record was short by a few bytes but the check failed because
the extra NULs after the name were not being counted.  This could cause
a buffer over-run by up to three bytes (but only when l_data was close
to a power of two).

The new version does not use bam_get_aux() to avoid some dubious
pointer arithmetic (especially when d->data has not been allocated).

Also ensure that memory is not leaked and b->m_data is left unchanged
if the realloc on b->data fails for some reason.
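A hypothetical sketch of the realloc pattern the last paragraph describes: the result is stored in a temporary, so if realloc() fails the original buffer is neither leaked nor lost and the recorded capacity stays accurate. The struct only loosely mirrors bam1_t's data and m_data fields.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint8_t *data;
    size_t m_data;   /* allocated size in bytes */
} rec_buf;

/* Grow buf to hold at least need bytes.  Returns 0 on success, -1 on
 * allocation failure, in which case buf is left exactly as it was. */
static int rec_buf_reserve(rec_buf *buf, size_t need)
{
    if (need <= buf->m_data)
        return 0;
    uint8_t *tmp = realloc(buf->data, need);
    if (tmp == NULL)
        return -1;            /* old block still owned by buf->data */
    buf->data = tmp;
    buf->m_data = need;
    return 0;
}
```

Writing `buf->data = realloc(buf->data, need)` directly would overwrite the only pointer to the old block with NULL on failure, which is the leak this pattern avoids.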

- - - - -
f6138671 by Rob Davies at 2017-05-30T08:09:37Z
Limit subexp and gamma decoders to integer cram_external_type

The decoders currently assume that they are writing to an integer array,
and neither htslib nor htsjdk appear to write anything other than
integers using these codecs.  So it should be reasonable to restrict
them to use with only integer data streams.

Fixes #548 (Stack buffer overflows in cram_gamma_decode and
cram_subexp_decode)

- - - - -
df519163 by James Bonfield at 2017-05-30T09:01:06Z
Additional error checking for invalid CRAM ref_id.

Negative reference IDs no longer cause reading from invalid
addresses.  Fixes #549.

It's possible the fix to cram_decode_slice could be
"if (cr->ref_id < 0 || ...", but I have alarm bells ringing regarding
the (broken, but sadly happens) case of unmapped data with CIGAR
strings.  (It shouldn't happen and I don't think *can* in current
CRAM, but it's one of the round-trip issues we want to fix at some
point).  Hence it's safer to check vs -1 and add extra checks to
cram_decode_seq to prevent the memory accesses there.

Validated at -1, -2 and -large.

- - - - -
fe2627f4 by Rob Davies at 2017-06-01T16:06:52Z
Merge Logging improvements 2 (PR #543) into develop

Includes minor update to bring in fix from commit 15b2f21

- - - - -
8d201be9 by Anders Kaplan at 2017-06-04T08:57:07Z
Added dependency from knetfile.c to hts_log.h, with some cascading. Added missing dependencies to the Makefile, including the missing definition of htslib_hts_h.

- - - - -
f0fc3eea by Anders Kaplan at 2017-06-04T08:57:07Z
Replaced fprintf(stderr, ...) calls with hts_log_<level> calls.

- - - - -
03c4be2b by Anders Kaplan at 2017-06-04T08:57:07Z
Replaced fprintf(stderr, ...) calls with hts_log_<level> calls.

- - - - -
a0fe63f5 by John Marshall at 2017-06-05T15:01:01Z
Use BGZF* in cram_index_build() instead of zfp*

Check the return values from cram_index_build_multiref() and from
writing to the index file.

The (never publicly exposed) zfio.c routines are now unused in HTSlib
so can be removed.  Fixes part of #552.

- - - - -
5b9361df by James Bonfield at 2017-06-06T13:06:19Z
Removed the (>20 year old!) unnecessary vlen.[ch] code.

This was used by mFILE in the mfprintf function, but this function is
not needed by CRAM support.  (These functions originate from Staden
io_lib, where mfprintf is used as part of the "Experiment File" format,
which does use this code.)

- - - - -
979571bb by Rob Davies at 2017-06-09T13:38:54Z
Allow out to be NULL in cram_huffman_decode_char()

Various decoders were updated in commit 1a050b4 to allow this, but
cram_huffman_decode_char() was missed out.  Add a test, as it's possible
to use the huffman codec for data streams that may pass out == NULL (in
particular DS_BA, DS_BB, DS_IN and DS_SC).

Fixes #554 (third case, the others were fixed by f613867).

- - - - -
e8b1b66a by Andreas Tille at 2017-06-17T22:10:48Z
Merge tag '1.4.1' into debian/unstable

Noteworthy changes in release 1.4.1  (8th May 2017)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is primarily a security bug fix update.

* Fixed SECURITY issue with buffer overruns with malicious data. (#514).

* S3 support for non Amazon AWS endpoints. (#506)

* Support for variant breakpoints in bcftools. (#516)

* Improved handling of BCF NaNs. (#485)

* Compilation / portability improvements. (#255, #423, #498, #488)

* Miscellaneous bug fixes (#482, #521, #522, #523, #524).

* Sanitise headers (#509)

- - - - -
68e8b570 by Andreas Tille at 2017-06-17T22:16:58Z
New upstream version

- - - - -
04bb939d by Andreas Tille at 2017-06-17T22:21:02Z
Upload to unstable

- - - - -
ab6cc2a1 by Andreas Tille at 2017-06-18T12:19:15Z
Create symlinks only for arch=indep target when target dir exists

- - - - -
b8c54133 by Andreas Tille at 2017-06-18T12:19:42Z
Upload to unstable

- - - - -
7fd21f5b by James Bonfield at 2017-06-18T15:32:58Z
Adds HTS_OPT_BLOCK_SIZE support for SAM/BAM/CRAM.

Allow the size of the internal hFILE buffer to be changed.  This may
be useful for fine tuning I/O speed on filesystems that don't report
an optimal block size.

It's possible to shrink the buffer, but only if the buffer does not
contain data that would be lost after the resize.  If it does, a
warning will be printed and the buffer will be left at the existing
size.

- - - - -
3c0302d4 by Rob Davies at 2017-06-18T16:58:20Z
Use htslib_hts_log_h definition from htslib_vars.mk

- - - - -
f1b95e10 by Rob Davies at 2017-06-18T17:00:04Z
Merge branch 'pr551' into pr551_merge

- - - - -
b28aa8d0 by Rob Davies at 2017-06-18T20:50:20Z
Remove autom4te.cache during `make distclean`

- - - - -
38728612 by James Bonfield at 2017-06-19T14:05:02Z
Updated NEWS file for Solstice release.

- - - - -
fba943d6 by Rob Davies at 2017-06-20T08:41:38Z
Stop threaded bgzf_read_block from using stale values at EOF

j->block_address is not filled out at EOF, so should
not be used.  Also initialize comp_len to zero, in
case anything tries to use that.

Fixes samtools/samtools#687 (samtools 1.4.1 bai indices created
with multiple threads can lose reads)

- - - - -
49fdfbda by Valeriu Ohan at 2017-06-20T12:40:28Z
Release 1.5: Solstice

- - - - -
485cda1a by Sascha Steinbiss at 2017-07-17T16:54:57Z
fix FTBFS

- - - - -
942393ba by Graham Inggs at 2017-07-17T22:54:22Z
Fix FTBFS on s390x and sparc64

- - - - -
7b161023 by Graham Inggs at 2017-07-19T08:40:28Z
Dereference symlinks to fix autopkgtest

- - - - -
ffd8e110 by Andreas Tille at 2017-07-19T19:52:19Z
Merge tag '1.5' into debian/unstable

Noteworthy changes in release 1.5 (21st June 2017)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Added a new logging API: hts_log(), along with hts_log_error(),
  hts_log_warn() etc. convenience macros.  Thanks go to Anders Kaplan
  for the implementation. (#499, #543, #551)

* Added a new file I/O option "block_size" (HTS_OPT_BLOCK_SIZE) to
  alter the hFILE buffer size.

* Fixed various bugs, including compilation issues samtools/bcftools#610,
  samtools/bcftools#611 and robustness to corrupted data #537, #538,
  #541, #546, #548, #549, #554.
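
To illustrate the general shape of a levelled logging API with convenience macros, here is a self-contained C sketch. Everything in it (demo_log(), the DEMO_LOG_* levels and the demo_log_error()/demo_log_warn()/demo_log_info() macros) is hypothetical; htslib's real declarations for hts_log() and its macros are in htslib/hts_log.h and may differ in detail. The sketch assumes messages are filtered against a single global threshold and that each macro passes the calling function's name as the context string.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical severity levels for illustration; not htslib's enum. */
enum demo_log_level { DEMO_LOG_OFF = 0, DEMO_LOG_ERROR, DEMO_LOG_WARNING, DEMO_LOG_INFO };

/* Messages more verbose than this level are suppressed. */
static enum demo_log_level demo_log_threshold = DEMO_LOG_WARNING;

/* Core function: write "[LEVEL] context: message" to out if the severity
   passes the current threshold. Returns 1 if written, 0 if filtered out. */
static int demo_log(FILE *out, enum demo_log_level severity,
                    const char *context, const char *fmt, ...)
{
    static const char *const names[] = { "OFF", "ERROR", "WARNING", "INFO" };
    if (severity == DEMO_LOG_OFF || severity > demo_log_threshold) return 0;
    va_list ap;
    va_start(ap, fmt);
    fprintf(out, "[%s] %s: ", names[severity], context);
    vfprintf(out, fmt, ap);
    fputc('\n', out);
    va_end(ap);
    return 1;
}

/* Convenience macros fill in the severity and use __func__ as context. */
#define demo_log_error(out, ...) demo_log((out), DEMO_LOG_ERROR, __func__, __VA_ARGS__)
#define demo_log_warn(out, ...)  demo_log((out), DEMO_LOG_WARNING, __func__, __VA_ARGS__)
#define demo_log_info(out, ...)  demo_log((out), DEMO_LOG_INFO, __func__, __VA_ARGS__)
```

Keeping the filtering in one function means callers never test the level themselves; the macros only save typing and make every message identify where it came from.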

- - - - -
ada885d3 by Andreas Tille at 2017-07-19T19:53:58Z
New upstream version

- - - - -
78d31d74 by Steffen Moeller at 2017-07-20T10:40:02Z
Added ref to OMICtools

- - - - -
8ce1126f by Steffen Moeller at 2017-07-20T10:41:54Z
Merge branch 'debian/unstable' of ssh://anonscm.debian.org/git/debian-med/htslib into debian/unstable

- - - - -
d3b4f02c by Andreas Tille at 2017-07-20T11:10:32Z
Standards-Version: 4.0.0 (no changes needed)

- - - - -
e335d9b3 by Andreas Tille at 2017-07-20T11:11:16Z
do not parse d/changelog explicitly

- - - - -
bee0b75b by Andreas Tille at 2017-07-20T11:11:16Z
Refresh patches

- - - - -
3ccc8bd1 by Andreas Tille at 2017-07-20T11:11:36Z
hardening=+all

- - - - -
76541927 by Andreas Tille at 2017-07-20T14:36:14Z
install htslib*.mk files that are used in bcftools

- - - - -
cd809515 by Matthias Klumpp at 2017-08-04T02:43:41Z
libhts-dev: Depend on missing liblzma-dev

- - - - -
ff8e09b1 by Matthias Klumpp at 2017-08-04T02:53:54Z
Wrap and sort

- - - - -
f45af8db by Matthias Klumpp at 2017-08-04T03:04:56Z
Finalize changelog for 1.5-1

- - - - -
08915d93 by Andreas Tille at 2017-10-20T08:08:37Z
Fix OMICS entry

- - - - -
49237dbf by Andreas Tille at 2017-11-08T07:23:59Z
Apply patch provided by Graham Inggs <ginggs at debian.org> to fix FTBFS on armel armhf and ppc64el of bcftools

- - - - -
ca1cd8bf by Andreas Tille at 2017-11-08T07:24:23Z
Standards-Version: 4.1.1

- - - - -
0071935d by Diane Trout at 2017-11-09T22:34:58Z
Revert "Remove symbols"

This reverts the deletion of the symbols file in commit f711b01663ee28387dd1c704f4cccd9854f03764.
There were also changes to d/changelog and d/rules in that commit, which I left in place.

This is work needed for properly fixing: Bug #879886

- - - - -
517d1b80 by Diane Trout at 2017-11-09T22:34:58Z
update symbols file names to current SOVERSION name

- - - - -
0a866457 by Diane Trout at 2017-11-09T22:34:58Z
Remove missing symbols

vcf_parse was a public symbol, and upstream bumped the SOVERSION.
fai_read wasn't public.

- - - - -
eaa8b369 by Diane Trout at 2017-11-09T22:34:58Z
Update symbols file for 1.4.1

A number of symbols were added; the symbols that were removed were not
part of the public API.

- - - - -
e6cfeed2 by Diane Trout at 2017-11-09T22:34:58Z
Remove ks_destroy, ks_getuntil2, ks_init

They aren't in the library, but they do appear in the public API as
some form of C macro.

- - - - -
7b50dafa by Diane Trout at 2017-11-09T22:34:58Z
Update symbols file for 1.5

- - - - -
b481c8a9 by Diane Trout at 2017-11-09T22:34:58Z
Update symbols files for 1.5 cram symbols

These removals are problematic.

htslib upstream considers cram internal, but a few applications,
like SeqLib, used these symbols. Those projects' upstreams got them by
embedding htslib, but Debian unbundled htslib and linked against the
system copy.

See #879886 for discussion

- - - - -
e73fa42f by Diane Trout at 2017-11-09T22:36:23Z
Update changelog with work on symbols file

- - - - -
2823ef83 by Andreas Tille at 2017-11-10T11:37:17Z
Import Debian changes 1.5-2

htslib (1.5-2) unstable; urgency=medium

  * Apply patch provided by Graham Inggs <ginggs at debian.org> to fix FTBFS
    on armel armhf and ppc64el of bcftools
    Closes: #877670
  * Standards-Version: 4.1.1

- - - - -
6940ebf3 by Mattia Rizzolo at 2017-11-10T11:39:15Z
Merge tag 'debian/1.5-2' into debian/unstable

Debian release 1.5-2

- - - - -
5540e882 by Graham Inggs at 2017-11-10T12:18:42Z
Update 877670.patch as it was applied upstream

- - - - -
64747721 by Diane Trout at 2017-11-10T18:35:43Z
Indicate closes Bug 879886

- - - - -
7d3f63ed by Diane Trout at 2017-11-10T19:59:50Z
release to unstable

- - - - -
020aa907 by Andreas Tille at 2017-11-20T21:42:01Z
Fix build on i386

- - - - -
fe81062d by Andreas Tille at 2017-11-20T21:42:14Z
Upload to unstable

- - - - -
1bfbb49b by Graham Inggs at 2017-11-21T09:54:57Z
Extend i386 fix to hurd and kfreebsd

- - - - -
092b0c94 by Andreas Tille at 2017-12-10T07:23:37Z
Fix FTCBFS: Let dh_auto_configure pass --host to ./configure

- - - - -
5e812e23 by Andreas Tille at 2017-12-10T07:41:34Z
Provide cram headers in a separate package (libhts-private-dev.install)

- - - - -
6863c083 by Andreas Tille at 2017-12-10T07:44:04Z
Deactivate fix_pkg-config.patch

- - - - -
6fd23217 by Andreas Tille at 2017-12-10T07:46:50Z
Close the corresponding bug

- - - - -
4ff741ab by Andreas Tille at 2017-12-10T07:52:24Z
Add versioned breaks

- - - - -
030a92f7 by Andreas Tille at 2017-12-10T08:00:23Z
Standards-Version: 4.1.2

- - - - -
64f8109a by Andreas Tille at 2017-12-10T08:19:22Z
Test is using private headers

- - - - -
3a01f348 by Andreas Tille at 2017-12-10T08:40:14Z
Upload to unstable

- - - - -
5fedaeab by Andreas Tille at 2017-12-11T12:40:11Z
Do not install htslib*.mk fragments

- - - - -
30646ba4 by Andreas Tille at 2017-12-11T12:58:28Z
Upload to unstable after verifying that bcftools does not need these htslib*.mk fragments

- - - - -
29f04454 by Andreas Tille at 2017-12-11T13:15:41Z
Start using default Debian Med repository layout

- - - - -
a7cd55f9 by Andreas Tille at 2017-12-11T13:15:59Z
New upstream version 1.5
- - - - -
05e80dfe by Andreas Tille at 2017-12-11T13:16:09Z
Update upstream source from tag 'upstream/1.5'

Update to upstream version '1.5'
with Debian dir 9e9526211f346df38aadf27fc7e6501e9cdda14e
- - - - -
a2ccfa86 by Andreas Tille at 2017-12-11T13:17:29Z
Switch to default Debian Med repository layout

- - - - -
0b1efffb by Andreas Tille at 2017-12-11T13:18:07Z
New upstream version 1.6
- - - - -
57b25277 by Andreas Tille at 2017-12-11T13:18:17Z
Update upstream source from tag 'upstream/1.6'

Update to upstream version '1.6'
with Debian dir c44c4d262d49c67ceb10a485df8636b06c710141
- - - - -
0876f84b by Andreas Tille at 2017-12-11T13:19:06Z
New upstream version

- - - - -
86a38d10 by Andreas Tille at 2017-12-11T13:24:52Z
Remove debian/patches/literal_version.patch since upstream no longer runs make to get the version number

- - - - -
02927ed5 by Andreas Tille at 2017-12-11T13:25:56Z
Refresh patches

- - - - -
ca55c40f by Andreas Tille at 2017-12-11T13:48:34Z
Update symbols file

- - - - -
9b683f1f by Andreas Tille at 2017-12-11T13:58:03Z
Upload to experimental

- - - - -
47bccf55 by Andreas Tille at 2017-12-11T16:43:45Z
Fix version of new symbols

- - - - -
f2767224 by Andreas Tille at 2017-12-14T13:23:27Z
Remove links that were provided to enable build of previous versions of bcftools but are not needed any more

- - - - -
73f4b47b by Andreas Tille at 2017-12-14T13:24:42Z
Remove tabix dependency which was provided to enable build of a previous version of bcftools

- - - - -
f696fc20 by Andreas Tille at 2017-12-14T14:36:52Z
Make sure no remnants of the build-time tests end up in the test data package

- - - - -
5d5de29e by Andreas Tille at 2017-12-14T14:44:41Z
Add missing override, upload to unstable

- - - - -
b839a102 by Graham Inggs at 2017-12-20T14:18:33Z
Ship a copy of cram headers in htslib-test instead of using symlinks

- - - - -
fb82f631 by Graham Inggs at 2017-12-20T14:53:04Z
Fix autopkgtest on i386 with GCC 7

- - - - -
6886c3ac by Graham Inggs at 2017-12-20T14:54:25Z
Prepare for upload to unstable

- - - - -
c722ac7c by Graham Inggs at 2017-12-21T15:10:38Z
Ship new files win/rand.c and win/rand.h, upload to unstable

- - - - -
0b801687 by Steffen Moeller at 2018-02-12T17:58:29Z
New upstream version 1.7
- - - - -
f24ec88d by Steffen Moeller at 2018-02-12T17:58:39Z
Update upstream source from tag 'upstream/1.7'

Update to upstream version '1.7'
with Debian dir 097ef2f34f7037d087319628240fa195f51e3b66
- - - - -
bea984e7 by Steffen Moeller at 2018-02-12T18:13:11Z
Adjusting to new 1.7 upstream version.

- - - - -
ede72ab7 by Andreas Tille at 2018-02-14T08:48:49Z
Document how to test the package

- - - - -
0eb2a7cd by Andreas Tille at 2018-02-14T12:13:09Z
Update symbols file

- - - - -
60702c25 by Andreas Tille at 2018-02-14T12:16:38Z
Upload to unstable

- - - - -
753ec1d6 by Andreas Tille at 2018-04-27T15:16:32Z
New upstream version 1.8
- - - - -
4d1f4ec9 by Andreas Tille at 2018-04-27T15:16:41Z
Update upstream source from tag 'upstream/1.8'

Update to upstream version '1.8'
with Debian dir fd77806197dae2253f07cb0fa5b061574f6f52e7
- - - - -
b4c09aca by Andreas Tille at 2018-04-27T15:16:41Z
New upstream version

- - - - -
7ca5391a by Andreas Tille at 2018-04-27T15:16:43Z
Point Vcs fields to salsa.debian.org

- - - - -
603fe868 by Andreas Tille at 2018-04-27T15:16:43Z
Standards-Version: 4.1.4

- - - - -
95fc615f by Andreas Tille at 2018-04-27T15:16:43Z
debhelper 11

- - - - -
3a89a18e by Andreas Tille at 2018-04-27T15:18:21Z
Update patches

- - - - -
169f79ef by Andreas Tille at 2018-04-27T15:25:18Z
version 1.8 drops symbol cram_nop_decode_reset without bumping soversion

- - - - -
8aaefae6 by Andreas Tille at 2018-04-27T15:30:41Z
version 1.8 drops symbol cram_nop_decode_reset without bumping soversion

- - - - -
3ca6d381 by Andreas Tille at 2018-05-03T09:40:11Z
Fix symbols file.
    Note: version 1.8 drops symbol cram_nop_decode_reset without bumping
    soversion but this should be no issue according to upstream
    (https://github.com/samtools/htslib/issues/695)

- - - - -
f2a8dc19 by Andreas Tille at 2018-05-03T09:55:11Z
Remove deactivated patch

- - - - -
3c62e7eb by Andreas Tille at 2018-05-03T09:55:22Z
Upload to experimental

- - - - -
2e7d4cfc by Steffen Moeller at 2018-07-28T17:47:52Z
Preparing for new upstream version 1.9

- - - - -
ad17dd5c by Steffen Moeller at 2018-07-28T17:48:44Z
New upstream version 1.9
- - - - -
faf51b7b by Steffen Moeller at 2018-07-28T17:51:06Z
Update upstream source from tag 'upstream/1.9'

Update to upstream version '1.9'
with Debian dir e5674753a1e14b05e781454e4ba822d6ede0217f

- - - - -
a29f7bad by Steffen Moeller at 2018-07-28T18:11:25Z
Cleaning up for upload of 1.9

- - - - -
e9e50291 by Andreas Tille at 2018-09-10T19:12:54Z
Depends: zlib1g-dev

- - - - -
063f02cf by Andreas Tille at 2018-09-10T19:43:35Z
Remove unused lintian overrides

- - - - -
5524cabb by Andreas Tille at 2018-09-10T19:44:50Z
Standards-Version: 4.2.1

- - - - -
3b196980 by Andreas Tille at 2018-09-12T19:25:42Z
Upload to unstable

- - - - -
01db69dc by Michael R. Crusoe at 2018-10-14T16:53:10Z
drop *i386

- - - - -
024c3225 by Michael R. Crusoe at 2018-10-15T08:42:25Z
re-add mipsel

- - - - -
ec847be4 by Andreas Tille at 2018-12-02T06:54:06Z
Add Breaks: python-pysam (<< 0.15~), python3-pysam (<< 0.15~)

- - - - -
47eee749 by Andreas Tille at 2018-12-02T07:00:45Z
Upload to unstable

- - - - -
bc33ca5a by Michael R. Crusoe at 2018-12-05T04:46:14Z
Add libdeflate for DEFLATE based (de)compression and its crc32 implementation.

- - - - -
81e1ecba by Michael R. Crusoe at 2018-12-05T04:56:22Z
Re-add mipsel to tabix & libhts-dev as well. (Closes: #915404)

- - - - -
147dd806 by Michael R. Crusoe at 2018-12-06T06:59:28Z
Partial Revert "Add libdeflate for DEFLATE based (de)compression and its crc32 implementation."

This reverts commit bc33ca5af9a047ad3e27022953cccedbfcd912ef.

- - - - -
103be525 by Michael R. Crusoe at 2018-12-11T02:13:22Z
Try dropping to gcc-7 on i386*

- - - - -


10 changed files:

- − .gitattributes
- − .gitignore
- − .travis.yml
- INSTALL
- LICENSE
- Makefile
- + NEWS
- − README.md
- + bcf_sr_sort.c
- + bcf_sr_sort.h


The diff was not included because it is too large.


View it on GitLab: https://salsa.debian.org/med-team/htslib/compare/0141e47ab9ca92cd2aeb6d573440d689bbfdf327...103be52531bf6c4303e463ecfff0d5a201f6708c



More information about the debian-med-commit mailing list