[med-svn] [Git][med-team/htscodecs][master] 9 commits: New upstream version 1.6.5
Étienne Mollier (@emollier)
gitlab at salsa.debian.org
Sat Feb 21 10:55:17 GMT 2026
Étienne Mollier pushed to branch master at Debian Med / htscodecs
Commits:
7063c1a4 by Étienne Mollier at 2026-02-21T11:13:49+01:00
New upstream version 1.6.5
- - - - -
e3f38fdf by Étienne Mollier at 2026-02-21T11:14:06+01:00
Update upstream source from tag 'upstream/1.6.5'
Update to upstream version '1.6.5'
with Debian dir 991f6220c4624db393515c0f5398f048687c9159
- - - - -
d8ab3411 by Étienne Mollier at 2026-02-21T11:15:55+01:00
d/copyright: update years where relevant.
- - - - -
d3672f19 by Étienne Mollier at 2026-02-21T11:41:16+01:00
d/control: drop redundant Priority: optional.
- - - - -
36b3dfb7 by Étienne Mollier at 2026-02-21T11:41:34+01:00
d/control: drop redundant Rules-Requires-Root: no.
- - - - -
b7d2f94a by Étienne Mollier at 2026-02-21T11:42:03+01:00
d/control: declare compliance to standards version 4.7.3.
- - - - -
7be99633 by Étienne Mollier at 2026-02-21T11:43:05+01:00
d/watch: convert to v5 Github template.
- - - - -
2a132e99 by Étienne Mollier at 2026-02-21T11:45:29+01:00
link_with_libm: update dep3 header.
The issue looks pretty Debian specific and forwarding not needed.
- - - - -
dce7a913 by Étienne Mollier at 2026-02-21T11:54:41+01:00
d/changelog: ready for upload to unstable.
- - - - -
19 changed files:
- .cirrus.yml
- .github/workflows/unix-build.yml
- NEWS.md
- configure.ac
- debian/changelog
- debian/control
- debian/copyright
- debian/patches/link_with_libm
- debian/watch
- htscodecs/arith_dynamic.c
- htscodecs/htscodecs.h
- htscodecs/rANS_static32x16pr_neon.c
- htscodecs/rANS_static4x16pr.c
- htscodecs/tokenise_name3.c
- javascript/index.js
- + tests/dat/qsimd
- + tests/dat/r4x16/qsimd.69
- tests/fqzcomp.test
- tests/rans4x8.test
Changes:
=====================================
.cirrus.yml
=====================================
@@ -77,10 +77,10 @@ rocky_task:
task:
name: freebsd
freebsd_instance:
- image_family: freebsd-14-0
+ image_family: freebsd-14-3
pkginstall_script:
- - pkg update -f
+ - IGNORE_OSVERSION=yes pkg update -f
- pkg install -y gcc autoconf automake libdeflate libtool
compile_script:
=====================================
.github/workflows/unix-build.yml
=====================================
@@ -9,11 +9,6 @@ jobs:
fail-fast: false
matrix:
os: [ubuntu-latest, macos-latest]
-
- defaults:
- run:
- shell: bash {0}
-
steps:
- name: Checkout
@@ -32,6 +27,8 @@ jobs:
- name: Ubuntu-latest using gcc with sanitizers
if: runner.os == 'Linux'
run: |
+ sudo apt-get update
+ sudo apt-get install -y --no-install-suggests --no-install-recommends libbz2-dev
autoreconf -i
./configure CC="gcc -fsanitize=address,undefined"
=====================================
NEWS.md
=====================================
@@ -1,3 +1,64 @@
+Release 1.6.5: 9th December 2025
+--------------------------------
+
+This is a bug fix release.
+
+Bug fixes
+
+- Add cpuid checks for XSAVE, OSXSAVE and AVX. Corrects auto-detection of
+ SIMD version on machines that have but disable specific CPU features.
+ (PR #140 Robert Davies, fixes samtools/samtools#2256 Ran Fan).
+
+- Avoid undefined behaviour by replacing literal copies with memcpy
+ (PR #142 James Bonfield, fixes Issue #141 Vasudeva Easwara Sarma)
+
+
+Release 1.6.4: 9th July 2025
+----------------------------
+
+This is primarily a bug fix release.
+
+Fixes
+
+- Fixed a minor thread data race in first call of the rans4x16 codec.
+
+- Protect against SIMD rANS encoding on small or highly compressible data
+ sets. This could fail when combined with the RLE method where one
+ sub-component was very small (<32 bytes) and the other was large.
+
+Changes
+
+- UUID4 based read names are now compressed better with the name tokeniser.
+ This also slightly improves name compression of mixed data sets.
+
+
+Release 1.6.3: 22nd May 2025
+----------------------------
+
+A tiny bug fix to 1.6.2 to fix a memory leak in rans_compress_to_4x16,
+detected by htslib's CI system. (#136)
+
+
+Release 1.6.2: 22nd May 2025
+----------------------------
+
+This release has minor bug fixes and some continuous integration test
+improvements.
+
+Bug fixes
+
+- Improved check of out_size in rans4x16 and arithmetic coder, plus
+ better memory freeing on error. (#127]
+
+- [CI] Bump FreeBSD release used to 14.2 and Ubuntu to 24.04 (#129 jkb,
+ #133 from John Marshall).
+
+- [CI] Remove GitHub workflow shell override. (#133, John Marshall)
+
+- [JavaScript] Correct arithmetic coder, (Commit 9d3127d, with thanks to
+ Colin Diesh)
+
+
Release 1.6.1: 22nd August 2024
-------------------------------
=====================================
configure.ac
=====================================
@@ -1,5 +1,5 @@
dnl Process this file with autoconf to produce a configure script.
-AC_INIT(htscodecs, 1.6.1)
+AC_INIT(htscodecs, 1.6.5)
# Some functions benefit from -O3 optimisation, so if the user didn't
# explicitly set any compiler flags, we'll plump for O3.
@@ -61,7 +61,7 @@ AM_EXTRA_RECURSIVE_TARGETS([fuzz])
# libhtscodecs.so.1.1.0
VERS_CURRENT=3
-VERS_REVISION=6
+VERS_REVISION=10
VERS_AGE=1
AC_SUBST(VERS_CURRENT)
AC_SUBST(VERS_REVISION)
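
Note on the VERS_* bump above: assuming these three values are passed to libtool as -version-info current:revision:age (the Makefile is not part of this diff, so that is an assumption), the usual Linux/ELF naming rule gives a file suffix of so.(current - age).age.revision. The helper name below is hypothetical and only illustrates that arithmetic.

```c
#include <stdio.h>

/* Hypothetical helper: format the Linux file-name suffix that libtool
 * conventionally derives from -version-info current:revision:age.
 * Assumes age <= current. Returns what snprintf returns. */
static int libtool_linux_suffix(char *buf, size_t len,
                                unsigned current, unsigned revision,
                                unsigned age) {
    /* First field is current - age, then age, then revision. */
    return snprintf(buf, len, "so.%u.%u.%u", current - age, age, revision);
}
```

Under that assumption, 3:10:1 maps to so.2.1.10 and the previous 3:6:1 to so.2.1.6, so only the last field moves and the leading 2 stays in line with the libhtscodecs2 package name in debian/control.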
=====================================
debian/changelog
=====================================
@@ -1,3 +1,16 @@
+htscodecs (1.6.5-1) unstable; urgency=medium
+
+ * New upstream version 1.6.5
+ * d/copyright: update years where relevant.
+ * d/control: drop redundant Priority: optional.
+ * d/control: drop redundant Rules-Requires-Root: no.
+ * d/control: declare compliance to standards version 4.7.3.
+ * d/watch: convert to v5 Github template.
+ * link_with_libm: update dep3 header.
+ The issue looks pretty Debian specific and forwarding not needed.
+
+ -- Étienne Mollier <emollier at debian.org> Sat, 21 Feb 2026 11:46:25 +0100
+
htscodecs (1.6.1-2) unstable; urgency=medium
* Team upload.
=====================================
debian/control
=====================================
@@ -1,17 +1,15 @@
Source: htscodecs
Section: science
-Priority: optional
Maintainer: Debian Med Packaging Team <debian-med-packaging at lists.alioth.debian.org>
Uploaders: Michael R. Crusoe <crusoe at debian.org>,
Étienne Mollier <emollier at debian.org>
Build-Depends: debhelper-compat (= 13),
d-shlibs,
libbz2-dev
-Standards-Version: 4.7.0
+Standards-Version: 4.7.3
Vcs-Browser: https://salsa.debian.org/med-team/htscodecs
Vcs-Git: https://salsa.debian.org/med-team/htscodecs.git
Homepage: https://github.com/jkbonfield/htscodecs/
-Rules-Requires-Root: no
Package: libhtscodecs2
Architecture: any
=====================================
debian/copyright
=====================================
@@ -3,7 +3,7 @@ Upstream-Name: htscodecs
Source: https://github.com/jkbonfield/htscodecs/
Files: *
-Copyright: 2009-2024, Genome Research Ltd.
+Copyright: 2009-2025, Genome Research Ltd.
License: BSD-3-clause
Files: htscodecs/c_range_coder.h
@@ -26,7 +26,7 @@ Files: debian/*
Copyright: 2020 Michael R. Crusoe <michael.crusoe at gmail.com>,
2020-2021 Andreas Tille <tille at debian.org>,
2021 Steffen Moeller <moeller at debian.org>,
- 2021-2024 Étienne Mollier <emollier at debian.org>
+ 2021-2026 Étienne Mollier <emollier at debian.org>
License: BSD-3-clause
License: BSD-3-clause
=====================================
debian/patches/link_with_libm
=====================================
@@ -1,5 +1,6 @@
From: Michael R. Crusoe <michael.crusoe at gmail.com>
Subject: Make sure the library itself, not the tests, link against libm
+Forwarded: not-needed
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -33,7 +33,7 @@
=====================================
debian/watch
=====================================
@@ -1,4 +1,5 @@
-version=4
+Version: 5
-opts="filenamemangle=s%(?:.*?)?v?(\d[\d.]*)\.tar\.gz%@PACKAGE@-$1.tar.gz%" \
- https://github.com/samtools/htscodecs/tags (?:.*?/)?v?(\d[\d.]*)\.tar\.gz
+Template: Github
+Owner: samtools
+Project: htscodecs
=====================================
htscodecs/arith_dynamic.c
=====================================
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2019-2022 Genome Research Ltd.
+ * Copyright (c) 2019-2022, 2025 Genome Research Ltd.
* Author(s): James Bonfield
*
* Redistribution and use in source and binary forms, with or without
@@ -733,15 +733,17 @@ unsigned char *arith_compress_to(unsigned char *in, unsigned int in_size,
unsigned int c_meta_len;
uint8_t *rle = NULL, *packed = NULL;
- if (in_size > INT_MAX) {
+ if (in_size > INT_MAX || (out && *out_size == 0)) {
*out_size = 0;
return NULL;
}
if (!out) {
*out_size = arith_compress_bound(in_size, order);
- if (!(out = malloc(*out_size)))
+ if (!(out = malloc(*out_size))) {
+ *out_size = 0;
return NULL;
+ }
}
unsigned char *out_end = out + *out_size;
@@ -751,24 +753,30 @@ unsigned char *arith_compress_to(unsigned char *in, unsigned int in_size,
if (order & X_CAT) {
out[0] = X_CAT;
c_meta_len = 1 + var_put_u32(&out[1], out_end, in_size);
+ if (c_meta_len + in_size > *out_size) {
+ *out_size = 0;
+ return NULL;
+ }
memcpy(out+c_meta_len, in, in_size);
*out_size = in_size+c_meta_len;
}
if (order & X_STRIPE) {
- int N = (order>>8);
+ int N = (order>>8) & 0xff;
if (N == 0) N = 4; // default for compatibility with old tests
- if (N > 255)
- return NULL;
+ if (N > in_size)
+ N = in_size;
unsigned char *transposed = malloc(in_size);
unsigned int part_len[256];
unsigned int idx[256];
- if (!transposed)
+ if (!transposed) {
+ *out_size = 0;
return NULL;
- int i, j, x;
+ }
+ int i, j, x;
for (i = 0; i < N; i++) {
part_len[i] = in_size / N + ((in_size % N) > i);
idx[i] = i ? idx[i-1] + part_len[i-1] : 0; // cumulative index
@@ -788,6 +796,12 @@ unsigned char *arith_compress_to(unsigned char *in, unsigned int in_size,
c_meta_len = 1;
*out = order & ~X_NOSZ;
c_meta_len += var_put_u32(out+c_meta_len, out_end, in_size);
+ if (c_meta_len >= *out_size) {
+ free(transposed);
+ *out_size = 0;
+ return NULL;
+ }
+
out[c_meta_len++] = N;
out2_start = out2 = out+7+5*N; // shares a buffer with c_meta
@@ -795,6 +809,7 @@ unsigned char *arith_compress_to(unsigned char *in, unsigned int in_size,
// Brute force try all methods.
// FIXME: optimise this bit. Maybe learn over time?
int j, best_j = 0, best_sz = INT_MAX;
+ uint8_t *r;
// Works OK with read names. The first byte is the most important,
// as it has most variability (little-endian). After that it's
@@ -843,24 +858,37 @@ unsigned char *arith_compress_to(unsigned char *in, unsigned int in_size,
// {1, 128}};
for (j = 1; j <= m[MIN(i,3)][0]; j++) {
+ if (out2 - out > *out_size)
+ continue; // an error, but caught in best_sz check later
+
olen2 = *out_size - (out2 - out);
//fprintf(stderr, "order=%d m=%d\n", order&3, m[MIN(i,4)][j]);
if ((order&3) == 0 && (m[MIN(i,3)][j]&1))
continue;
- arith_compress_to(transposed+idx[i], part_len[i],
- out2, &olen2, m[MIN(i,3)][j] | X_NOSZ);
- if (best_sz > olen2) {
+ r = arith_compress_to(transposed+idx[i], part_len[i],
+ out2, &olen2, m[MIN(i,3)][j] | X_NOSZ);
+ if (r && olen2 && best_sz > olen2) {
best_sz = olen2;
best_j = j;
}
}
-// if (best_j == 0) // none desireable
-// return NULL;
+
+ if (best_sz == INT_MAX) {
+ free(transposed);
+ *out_size = 0;
+ return NULL;
+ }
if (best_j != j-1) {
olen2 = *out_size - (out2 - out);
- arith_compress_to(transposed+idx[i], part_len[i],
- out2, &olen2, m[MIN(i,3)][best_j] | X_NOSZ);
+ r = arith_compress_to(transposed+idx[i], part_len[i],
+ out2, &olen2,
+ m[MIN(i,3)][best_j] | X_NOSZ);
+ if (!r) {
+ free(transposed);
+ *out_size = 0;
+ return NULL;
+ }
}
out2 += olen2;
c_meta_len += var_put_u32(out+c_meta_len, out_end, olen2);
@@ -893,6 +921,10 @@ unsigned char *arith_compress_to(unsigned char *in, unsigned int in_size,
// PACK 2, 4 or 8 symbols into one byte.
int pmeta_len;
uint64_t packed_len;
+ if (c_meta_len + 256 > *out_size) {
+ *out_size = 0;
+ return NULL;
+ }
packed = hts_pack(in, in_size, out+c_meta_len, &pmeta_len, &packed_len);
if (!packed) {
out[0] &= ~X_PACK;
@@ -934,6 +966,7 @@ unsigned char *arith_compress_to(unsigned char *in, unsigned int in_size,
#else
fprintf(stderr, "Htscodecs has been compiled without libbz2 support\n");
free(out);
+ *out_size = 0;
return NULL;
#endif
@@ -945,25 +978,40 @@ unsigned char *arith_compress_to(unsigned char *in, unsigned int in_size,
// *out_size = lzma_size;
} else {
+ uint8_t *r;
if (do_rle) {
if (order == 0)
- arith_compress_O0_RLE(in, in_size, out+c_meta_len, out_size);
+ r=arith_compress_O0_RLE(in, in_size, out+c_meta_len, out_size);
else
- arith_compress_O1_RLE(in, in_size, out+c_meta_len, out_size);
+ r=arith_compress_O1_RLE(in, in_size, out+c_meta_len, out_size);
} else {
//if (order == 2)
// arith_compress_O2(in, in_size, out+c_meta_len, out_size);
//else
if (order == 1)
- arith_compress_O1(in, in_size, out+c_meta_len, out_size);
+ r=arith_compress_O1(in, in_size, out+c_meta_len, out_size);
else
- arith_compress_O0(in, in_size, out+c_meta_len, out_size);
+ r=arith_compress_O0(in, in_size, out+c_meta_len, out_size);
+ }
+
+ if (!r) {
+ free(rle);
+ free(packed);
+ *out_size = 0;
+ return NULL;
}
}
if (*out_size >= in_size) {
out[0] &= ~(3|X_EXT); // no entropy encoding, but keep e.g. PACK
out[0] |= X_CAT | no_size;
+
+ if (out + c_meta_len + in_size > out_end) {
+ free(rle);
+ free(packed);
+ *out_size = 0;
+ return NULL;
+ }
memcpy(out+c_meta_len, in, in_size);
*out_size = in_size;
}
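
Note on the X_STRIPE changes in arith_compress_to above: the stream count N is now masked to 8 bits and clamped to in_size before the per-stream lengths are computed. The sketch below isolates that arithmetic in a hypothetical helper (stripe_layout is not a function in htscodecs) so the effect of the clamp is easy to see; it assumes the caller passes n <= 256, as the masked value guarantees.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the length/offset arithmetic of the
 * X_STRIPE branch. Splits in_size bytes into n streams and fills
 * part_len[] and idx[] (cumulative start offsets).
 * Returns the stream count actually used. Requires n <= 256. */
static unsigned stripe_layout(unsigned in_size, unsigned n,
                              unsigned part_len[256], unsigned idx[256]) {
    if (n == 0)
        n = 4;              /* same default as the hunk */
    if (n > in_size)
        n = in_size;        /* the clamp added in this diff */

    for (unsigned i = 0; i < n; i++) {
        /* The first (in_size % n) streams get one extra byte. */
        part_len[i] = in_size / n + ((in_size % n) > i);
        idx[i] = i ? idx[i-1] + part_len[i-1] : 0;
    }
    return n;
}
```

For in_size = 10 and n = 4 this gives lengths 3, 3, 2, 2 at offsets 0, 3, 6, 8. For in_size = 3 and n = 4 the clamp reduces the count to 3 streams of one byte each, where the unclamped arithmetic would have produced a zero-length fourth stream.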
=====================================
htscodecs/htscodecs.h
=====================================
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2021-2024 Genome Research Ltd.
+ * Copyright (c) 2021-2025 Genome Research Ltd.
* Author(s): James Bonfield
*
* Redistribution and use in source and binary forms, with or without
@@ -43,7 +43,7 @@
* Note currently this needs manually editing as it isn't automatically
* updated by autoconf.
*/
-#define HTSCODECS_VERSION 100601
+#define HTSCODECS_VERSION 100605
/*
* A const string form of the HTSCODECS_VERSION define.
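
Note on the HTSCODECS_VERSION bump above: the two values shown (100601 for 1.6.1, 100605 for 1.6.5) are consistent with an encoding of MAJOR*100000 + MINOR*100 + PATCH. The header's own description of the format is not visible in this hunk, so treat that layout as an inference; the helper names below are hypothetical.

```c
/* Hypothetical decoders, assuming MAJOR*100000 + MINOR*100 + PATCH. */
static unsigned ver_major(unsigned v) { return v / 100000; }
static unsigned ver_minor(unsigned v) { return (v / 100) % 1000; }
static unsigned ver_patch(unsigned v) { return v % 100; }
```

If the assumption holds, a compile-time guard such as #if HTSCODECS_VERSION >= 100605 would select code that relies on the fixes in this release.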
=====================================
htscodecs/rANS_static32x16pr_neon.c
=====================================
@@ -1238,7 +1238,7 @@ static inline void transpose_and_copy(uint8_t *out, int iN[32],
// }
for (z = 0; z < NX; z+=4) {
- *(uint64_t *)&out[iN[z]] =
+ uint64_t t0[4] = {
((uint64_t)(t[0][z])<< 0) +
((uint64_t)(t[1][z])<< 8) +
((uint64_t)(t[2][z])<<16) +
@@ -1246,36 +1246,8 @@ static inline void transpose_and_copy(uint8_t *out, int iN[32],
((uint64_t)(t[4][z])<<32) +
((uint64_t)(t[5][z])<<40) +
((uint64_t)(t[6][z])<<48) +
- ((uint64_t)(t[7][z])<<56);
- *(uint64_t *)&out[iN[z+1]] =
- ((uint64_t)(t[0][z+1])<< 0) +
- ((uint64_t)(t[1][z+1])<< 8) +
- ((uint64_t)(t[2][z+1])<<16) +
- ((uint64_t)(t[3][z+1])<<24) +
- ((uint64_t)(t[4][z+1])<<32) +
- ((uint64_t)(t[5][z+1])<<40) +
- ((uint64_t)(t[6][z+1])<<48) +
- ((uint64_t)(t[7][z+1])<<56);
- *(uint64_t *)&out[iN[z+2]] =
- ((uint64_t)(t[0][z+2])<< 0) +
- ((uint64_t)(t[1][z+2])<< 8) +
- ((uint64_t)(t[2][z+2])<<16) +
- ((uint64_t)(t[3][z+2])<<24) +
- ((uint64_t)(t[4][z+2])<<32) +
- ((uint64_t)(t[5][z+2])<<40) +
- ((uint64_t)(t[6][z+2])<<48) +
- ((uint64_t)(t[7][z+2])<<56);
- *(uint64_t *)&out[iN[z+3]] =
- ((uint64_t)(t[0][z+3])<< 0) +
- ((uint64_t)(t[1][z+3])<< 8) +
- ((uint64_t)(t[2][z+3])<<16) +
- ((uint64_t)(t[3][z+3])<<24) +
- ((uint64_t)(t[4][z+3])<<32) +
- ((uint64_t)(t[5][z+3])<<40) +
- ((uint64_t)(t[6][z+3])<<48) +
- ((uint64_t)(t[7][z+3])<<56);
+ ((uint64_t)(t[7][z])<<56),
- *(uint64_t *)&out[iN[z]+8] =
((uint64_t)(t[8+0][z])<< 0) +
((uint64_t)(t[8+1][z])<< 8) +
((uint64_t)(t[8+2][z])<<16) +
@@ -1283,36 +1255,8 @@ static inline void transpose_and_copy(uint8_t *out, int iN[32],
((uint64_t)(t[8+4][z])<<32) +
((uint64_t)(t[8+5][z])<<40) +
((uint64_t)(t[8+6][z])<<48) +
- ((uint64_t)(t[8+7][z])<<56);
- *(uint64_t *)&out[iN[z+1]+8] =
- ((uint64_t)(t[8+0][z+1])<< 0) +
- ((uint64_t)(t[8+1][z+1])<< 8) +
- ((uint64_t)(t[8+2][z+1])<<16) +
- ((uint64_t)(t[8+3][z+1])<<24) +
- ((uint64_t)(t[8+4][z+1])<<32) +
- ((uint64_t)(t[8+5][z+1])<<40) +
- ((uint64_t)(t[8+6][z+1])<<48) +
- ((uint64_t)(t[8+7][z+1])<<56);
- *(uint64_t *)&out[iN[z+2]+8] =
- ((uint64_t)(t[8+0][z+2])<< 0) +
- ((uint64_t)(t[8+1][z+2])<< 8) +
- ((uint64_t)(t[8+2][z+2])<<16) +
- ((uint64_t)(t[8+3][z+2])<<24) +
- ((uint64_t)(t[8+4][z+2])<<32) +
- ((uint64_t)(t[8+5][z+2])<<40) +
- ((uint64_t)(t[8+6][z+2])<<48) +
- ((uint64_t)(t[8+7][z+2])<<56);
- *(uint64_t *)&out[iN[z+3]+8] =
- ((uint64_t)(t[8+0][z+3])<< 0) +
- ((uint64_t)(t[8+1][z+3])<< 8) +
- ((uint64_t)(t[8+2][z+3])<<16) +
- ((uint64_t)(t[8+3][z+3])<<24) +
- ((uint64_t)(t[8+4][z+3])<<32) +
- ((uint64_t)(t[8+5][z+3])<<40) +
- ((uint64_t)(t[8+6][z+3])<<48) +
- ((uint64_t)(t[8+7][z+3])<<56);
+ ((uint64_t)(t[8+7][z])<<56),
- *(uint64_t *)&out[iN[z]+16] =
((uint64_t)(t[16+0][z])<< 0) +
((uint64_t)(t[16+1][z])<< 8) +
((uint64_t)(t[16+2][z])<<16) +
@@ -1320,36 +1264,8 @@ static inline void transpose_and_copy(uint8_t *out, int iN[32],
((uint64_t)(t[16+4][z])<<32) +
((uint64_t)(t[16+5][z])<<40) +
((uint64_t)(t[16+6][z])<<48) +
- ((uint64_t)(t[16+7][z])<<56);
- *(uint64_t *)&out[iN[z+1]+16] =
- ((uint64_t)(t[16+0][z+1])<< 0) +
- ((uint64_t)(t[16+1][z+1])<< 8) +
- ((uint64_t)(t[16+2][z+1])<<16) +
- ((uint64_t)(t[16+3][z+1])<<24) +
- ((uint64_t)(t[16+4][z+1])<<32) +
- ((uint64_t)(t[16+5][z+1])<<40) +
- ((uint64_t)(t[16+6][z+1])<<48) +
- ((uint64_t)(t[16+7][z+1])<<56);
- *(uint64_t *)&out[iN[z+2]+16] =
- ((uint64_t)(t[16+0][z+2])<< 0) +
- ((uint64_t)(t[16+1][z+2])<< 8) +
- ((uint64_t)(t[16+2][z+2])<<16) +
- ((uint64_t)(t[16+3][z+2])<<24) +
- ((uint64_t)(t[16+4][z+2])<<32) +
- ((uint64_t)(t[16+5][z+2])<<40) +
- ((uint64_t)(t[16+6][z+2])<<48) +
- ((uint64_t)(t[16+7][z+2])<<56);
- *(uint64_t *)&out[iN[z+3]+16] =
- ((uint64_t)(t[16+0][z+3])<< 0) +
- ((uint64_t)(t[16+1][z+3])<< 8) +
- ((uint64_t)(t[16+2][z+3])<<16) +
- ((uint64_t)(t[16+3][z+3])<<24) +
- ((uint64_t)(t[16+4][z+3])<<32) +
- ((uint64_t)(t[16+5][z+3])<<40) +
- ((uint64_t)(t[16+6][z+3])<<48) +
- ((uint64_t)(t[16+7][z+3])<<56);
+ ((uint64_t)(t[16+7][z])<<56),
- *(uint64_t *)&out[iN[z]+24] =
((uint64_t)(t[24+0][z])<< 0) +
((uint64_t)(t[24+1][z])<< 8) +
((uint64_t)(t[24+2][z])<<16) +
@@ -1357,8 +1273,38 @@ static inline void transpose_and_copy(uint8_t *out, int iN[32],
((uint64_t)(t[24+4][z])<<32) +
((uint64_t)(t[24+5][z])<<40) +
((uint64_t)(t[24+6][z])<<48) +
- ((uint64_t)(t[24+7][z])<<56);
- *(uint64_t *)&out[iN[z+1]+24] =
+ ((uint64_t)(t[24+7][z])<<56)
+ };
+ memcpy(&out[iN[z]], &t0, 32);
+
+ uint64_t t1[4] = {
+ ((uint64_t)(t[0][z+1])<< 0) +
+ ((uint64_t)(t[1][z+1])<< 8) +
+ ((uint64_t)(t[2][z+1])<<16) +
+ ((uint64_t)(t[3][z+1])<<24) +
+ ((uint64_t)(t[4][z+1])<<32) +
+ ((uint64_t)(t[5][z+1])<<40) +
+ ((uint64_t)(t[6][z+1])<<48) +
+ ((uint64_t)(t[7][z+1])<<56),
+
+ ((uint64_t)(t[8+0][z+1])<< 0) +
+ ((uint64_t)(t[8+1][z+1])<< 8) +
+ ((uint64_t)(t[8+2][z+1])<<16) +
+ ((uint64_t)(t[8+3][z+1])<<24) +
+ ((uint64_t)(t[8+4][z+1])<<32) +
+ ((uint64_t)(t[8+5][z+1])<<40) +
+ ((uint64_t)(t[8+6][z+1])<<48) +
+ ((uint64_t)(t[8+7][z+1])<<56),
+
+ ((uint64_t)(t[16+0][z+1])<< 0) +
+ ((uint64_t)(t[16+1][z+1])<< 8) +
+ ((uint64_t)(t[16+2][z+1])<<16) +
+ ((uint64_t)(t[16+3][z+1])<<24) +
+ ((uint64_t)(t[16+4][z+1])<<32) +
+ ((uint64_t)(t[16+5][z+1])<<40) +
+ ((uint64_t)(t[16+6][z+1])<<48) +
+ ((uint64_t)(t[16+7][z+1])<<56),
+
((uint64_t)(t[24+0][z+1])<< 0) +
((uint64_t)(t[24+1][z+1])<< 8) +
((uint64_t)(t[24+2][z+1])<<16) +
@@ -1366,8 +1312,38 @@ static inline void transpose_and_copy(uint8_t *out, int iN[32],
((uint64_t)(t[24+4][z+1])<<32) +
((uint64_t)(t[24+5][z+1])<<40) +
((uint64_t)(t[24+6][z+1])<<48) +
- ((uint64_t)(t[24+7][z+1])<<56);
- *(uint64_t *)&out[iN[z+2]+24] =
+ ((uint64_t)(t[24+7][z+1])<<56)
+ };
+ memcpy(&out[iN[z+1]], &t1, 32);
+
+ uint64_t t2[4] = {
+ ((uint64_t)(t[0][z+2])<< 0) +
+ ((uint64_t)(t[1][z+2])<< 8) +
+ ((uint64_t)(t[2][z+2])<<16) +
+ ((uint64_t)(t[3][z+2])<<24) +
+ ((uint64_t)(t[4][z+2])<<32) +
+ ((uint64_t)(t[5][z+2])<<40) +
+ ((uint64_t)(t[6][z+2])<<48) +
+ ((uint64_t)(t[7][z+2])<<56),
+
+ ((uint64_t)(t[8+0][z+2])<< 0) +
+ ((uint64_t)(t[8+1][z+2])<< 8) +
+ ((uint64_t)(t[8+2][z+2])<<16) +
+ ((uint64_t)(t[8+3][z+2])<<24) +
+ ((uint64_t)(t[8+4][z+2])<<32) +
+ ((uint64_t)(t[8+5][z+2])<<40) +
+ ((uint64_t)(t[8+6][z+2])<<48) +
+ ((uint64_t)(t[8+7][z+2])<<56),
+
+ ((uint64_t)(t[16+0][z+2])<< 0) +
+ ((uint64_t)(t[16+1][z+2])<< 8) +
+ ((uint64_t)(t[16+2][z+2])<<16) +
+ ((uint64_t)(t[16+3][z+2])<<24) +
+ ((uint64_t)(t[16+4][z+2])<<32) +
+ ((uint64_t)(t[16+5][z+2])<<40) +
+ ((uint64_t)(t[16+6][z+2])<<48) +
+ ((uint64_t)(t[16+7][z+2])<<56),
+
((uint64_t)(t[24+0][z+2])<< 0) +
((uint64_t)(t[24+1][z+2])<< 8) +
((uint64_t)(t[24+2][z+2])<<16) +
@@ -1375,8 +1351,39 @@ static inline void transpose_and_copy(uint8_t *out, int iN[32],
((uint64_t)(t[24+4][z+2])<<32) +
((uint64_t)(t[24+5][z+2])<<40) +
((uint64_t)(t[24+6][z+2])<<48) +
- ((uint64_t)(t[24+7][z+2])<<56);
- *(uint64_t *)&out[iN[z+3]+24] =
+ ((uint64_t)(t[24+7][z+2])<<56),
+
+ };
+ memcpy(&out[iN[z+2]], &t2, 32);
+
+ uint64_t t3[4] = {
+ ((uint64_t)(t[0][z+3])<< 0) +
+ ((uint64_t)(t[1][z+3])<< 8) +
+ ((uint64_t)(t[2][z+3])<<16) +
+ ((uint64_t)(t[3][z+3])<<24) +
+ ((uint64_t)(t[4][z+3])<<32) +
+ ((uint64_t)(t[5][z+3])<<40) +
+ ((uint64_t)(t[6][z+3])<<48) +
+ ((uint64_t)(t[7][z+3])<<56),
+
+ ((uint64_t)(t[8+0][z+3])<< 0) +
+ ((uint64_t)(t[8+1][z+3])<< 8) +
+ ((uint64_t)(t[8+2][z+3])<<16) +
+ ((uint64_t)(t[8+3][z+3])<<24) +
+ ((uint64_t)(t[8+4][z+3])<<32) +
+ ((uint64_t)(t[8+5][z+3])<<40) +
+ ((uint64_t)(t[8+6][z+3])<<48) +
+ ((uint64_t)(t[8+7][z+3])<<56),
+
+ ((uint64_t)(t[16+0][z+3])<< 0) +
+ ((uint64_t)(t[16+1][z+3])<< 8) +
+ ((uint64_t)(t[16+2][z+3])<<16) +
+ ((uint64_t)(t[16+3][z+3])<<24) +
+ ((uint64_t)(t[16+4][z+3])<<32) +
+ ((uint64_t)(t[16+5][z+3])<<40) +
+ ((uint64_t)(t[16+6][z+3])<<48) +
+ ((uint64_t)(t[16+7][z+3])<<56),
+
((uint64_t)(t[24+0][z+3])<< 0) +
((uint64_t)(t[24+1][z+3])<< 8) +
((uint64_t)(t[24+2][z+3])<<16) +
@@ -1384,7 +1391,9 @@ static inline void transpose_and_copy(uint8_t *out, int iN[32],
((uint64_t)(t[24+4][z+3])<<32) +
((uint64_t)(t[24+5][z+3])<<40) +
((uint64_t)(t[24+6][z+3])<<48) +
- ((uint64_t)(t[24+7][z+3])<<56);
+ ((uint64_t)(t[24+7][z+3])<<56)
+ };
+ memcpy(&out[iN[z+3]], &t3, 32);
iN[z+0] += 32;
iN[z+1] += 32;
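
Note on the transpose_and_copy change above: the old code stored through *(uint64_t *)&out[...], which is undefined behaviour in C when the byte address is not suitably aligned for uint64_t, and can also conflict with the effective-type (aliasing) rules. The new code builds the four words in a local array and copies 32 bytes with memcpy. A minimal sketch of the same idea for a single word, with hypothetical helper names:

```c
#include <stdint.h>
#include <string.h>

/* Store a 64-bit value at any byte address, aligned or not.
 * memcpy copies the object representation, so the byte order in
 * memory is the host's native order, same as a direct store. */
static void store_u64(uint8_t *dst, uint64_t v) {
    memcpy(dst, &v, sizeof v);
}

/* Matching load from any byte address. */
static uint64_t load_u64(const uint8_t *src) {
    uint64_t v;
    memcpy(&v, src, sizeof v);
    return v;
}
```

Compilers typically lower a fixed-size memcpy like this to a plain load or store, so the change should cost little or nothing at run time, though that depends on the toolchain and flags.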
=====================================
htscodecs/rANS_static4x16pr.c
=====================================
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2017-2023 Genome Research Ltd.
+ * Copyright (c) 2017-2023, 2025 Genome Research Ltd.
* Author(s): James Bonfield
*
* Redistribution and use in source and binary forms, with or without
@@ -844,6 +844,33 @@ static int rans_cpu = 0xFFFF; // all
# define UNUSED
#endif
+#if defined(__APPLE__)
+/*
+ MacOS before 12.2 (a.k.a. Darwin 21.3) had a bug that could cause random
+ failures on some AVX512 operations due to opmask registers not being restored
+ correctly following interrupts. For simplicity, check that the major version
+ is 22 or higher before using AVX512.
+ See https://community.intel.com/t5/Software-Tuning-Performance/MacOS-Darwin-kernel-bug-clobbers-AVX-512-opmask-register-state/m-p/1327259
+*/
+
+#include <sys/utsname.h>
+static inline int not_ancient_darwin(void) {
+ static long version = 0;
+ if (!version) {
+ struct utsname uname_info;
+ if (uname(&uname_info) == 0) {
+ version = strtol(uname_info.release, NULL, 10);
+ }
+ }
+ return version >= 22;
+}
+#else
+static inline int not_ancient_darwin(void) {
+ return 1;
+}
+#endif
+
+
// CPU detection is performed once. NB this has an assumption that we're
// not migrating between processes with different instruction stes, but
// to date the only systems I know of that support this don't have different
@@ -862,6 +889,11 @@ static int is_amd UNUSED = 0;
#define HAVE_HTSCODECS_TLS_CPU_INIT
static void htscodecs_tls_cpu_init(void) {
unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
+ unsigned int have_xsave UNUSED = 0;
+ unsigned int have_avx UNUSED = 0;
+ uint64_t xcr0 UNUSED = 0ULL;
+ const uint64_t xcr0_can_use_avx UNUSED = (1ULL << 2);
+ const uint64_t xcr0_can_use_avx512 UNUSED = (7ULL << 5);
// These may be unused, depending on HAVE_* config.h macros
int level = __get_cpuid_max(0, NULL);
@@ -877,15 +909,43 @@ static void htscodecs_tls_cpu_init(void) {
#endif
#if defined(bit_SSE4_1)
have_sse4_1 = ecx & bit_SSE4_1;
+#endif
+#if defined(bit_AVX)
+ have_avx = ecx & bit_AVX;
+#endif
+#if defined(bit_XSAVE) && defined(bit_OSXSAVE)
+ have_xsave = (ecx & bit_XSAVE) && (ecx & bit_OSXSAVE);
+ if (have_xsave) {
+ /* OSXSAVE tells us it's safe to use XGETBV to read XCR0
+ which then describes if AVX / AVX512 instructions can be
+ executed. See Intel 64 and IA-32 Architectures Software
+ Developer’s Manual Vol. 1 sections 13.2 and 13.3.
+
+ Use inline assembly for XGETBV here to avoid problems
+ with builtins either not working correctly, or requiring
+ specific compiler options to be in use. Also emit raw
+ bytes here as older toolchains may not have the XGETBV
+ instruction.
+ */
+ __asm__ volatile (".byte 0x0f, 0x01, 0xd0" :
+ "=d" (edx), "=a" (eax) :
+ "c" (0));
+ xcr0 = ((uint64_t) edx << 32) | eax;
+ }
#endif
}
- if (level >= 7) {
+ // AVX2 and AVX512F depend on XSAVE, AVX and bit 2 of XCR0.
+ if (level >= 7 && have_xsave && have_avx
+ && (xcr0 & xcr0_can_use_avx) == xcr0_can_use_avx) {
__cpuid_count(7, 0, eax, ebx, ecx, edx);
#if defined(bit_AVX2)
have_avx2 = ebx & bit_AVX2;
#endif
#if defined(bit_AVX512F)
- have_avx512f = ebx & bit_AVX512F;
+ // AVX512 depends on bits 5:7 of XCR0
+ if ((xcr0 & xcr0_can_use_avx512) == xcr0_can_use_avx512
+ && not_ancient_darwin())
+ have_avx512f = ebx & bit_AVX512F;
#endif
}
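
Note on the XCR0 checks in htscodecs_tls_cpu_init above: CPUID reports what the processor supports, while XCR0 (read with XGETBV once OSXSAVE is set) reports which register state the operating system has enabled. The hunk requires bit 2 of XCR0 before trusting the AVX2 flag and additionally bits 5 to 7 before trusting AVX512F. The sketch below isolates just those mask tests; the macro and function names are hypothetical, and it takes an already-read XCR0 value rather than executing XGETBV.

```c
#include <stdint.h>

#define XCR0_AVX_STATE    (1ULL << 2)  /* bit 2, as tested in the hunk */
#define XCR0_AVX512_STATE (7ULL << 5)  /* bits 5..7, as tested in the hunk */

/* Non-zero if the OS has enabled the AVX register state. */
static int os_allows_avx(uint64_t xcr0) {
    return (xcr0 & XCR0_AVX_STATE) == XCR0_AVX_STATE;
}

/* Non-zero if the OS has enabled both the AVX state and all three
 * AVX-512 state components. */
static int os_allows_avx512(uint64_t xcr0) {
    return os_allows_avx(xcr0)
        && (xcr0 & XCR0_AVX512_STATE) == XCR0_AVX512_STATE;
}
```

An XCR0 of 0x7 (x87, SSE and AVX state) passes the first test only, and 0xE7 passes both. The real code combines these results with the CPUID feature bits and, for AVX512, with the not_ancient_darwin() check.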
@@ -900,14 +960,6 @@ unsigned char *(*rans_enc_func(int do_simd, int order))
unsigned char *out,
unsigned int *out_size) {
- int have_e_sse4_1 = have_sse4_1;
- int have_e_avx2 = have_avx2;
- int have_e_avx512f = have_avx512f;
-
- if (!(rans_cpu & RANS_CPU_ENC_AVX512)) have_e_avx512f = 0;
- if (!(rans_cpu & RANS_CPU_ENC_AVX2)) have_e_avx2 = 0;
- if (!(rans_cpu & RANS_CPU_ENC_SSE4)) have_e_sse4_1 = 0;
-
if (!do_simd) { // SIMD disabled
return order & 1
? rans_compress_O1_4x16
@@ -925,6 +977,14 @@ unsigned char *(*rans_enc_func(int do_simd, int order))
}
#endif
+ int have_e_sse4_1 = have_sse4_1;
+ int have_e_avx2 = have_avx2;
+ int have_e_avx512f = have_avx512f;
+
+ if (!(rans_cpu & RANS_CPU_ENC_AVX512)) have_e_avx512f = 0;
+ if (!(rans_cpu & RANS_CPU_ENC_AVX2)) have_e_avx2 = 0;
+ if (!(rans_cpu & RANS_CPU_ENC_SSE4)) have_e_sse4_1 = 0;
+
if (order & 1) {
// With simulated gathers, the AVX512 is now slower than AVX2, so
// we avoid using it unless asking for the real avx512 gather.
@@ -974,14 +1034,6 @@ unsigned char *(*rans_dec_func(int do_simd, int order))
unsigned char *out,
unsigned int out_size) {
- int have_d_sse4_1 = have_sse4_1;
- int have_d_avx2 = have_avx2;
- int have_d_avx512f = have_avx512f;
-
- if (!(rans_cpu & RANS_CPU_DEC_AVX512)) have_d_avx512f = 0;
- if (!(rans_cpu & RANS_CPU_DEC_AVX2)) have_d_avx2 = 0;
- if (!(rans_cpu & RANS_CPU_DEC_SSE4)) have_d_sse4_1 = 0;
-
if (!do_simd) { // SIMD disabled
return order & 1
? rans_uncompress_O1_4x16
@@ -999,6 +1051,14 @@ unsigned char *(*rans_dec_func(int do_simd, int order))
}
#endif
+ int have_d_sse4_1 = have_sse4_1;
+ int have_d_avx2 = have_avx2;
+ int have_d_avx512f = have_avx512f;
+
+ if (!(rans_cpu & RANS_CPU_DEC_AVX512)) have_d_avx512f = 0;
+ if (!(rans_cpu & RANS_CPU_DEC_AVX2)) have_d_avx2 = 0;
+ if (!(rans_cpu & RANS_CPU_DEC_SSE4)) have_d_sse4_1 = 0;
+
if (order & 1) {
#if defined(HAVE_AVX512)
if (have_d_avx512f)
@@ -1164,11 +1224,17 @@ void rans_set_cpu(int opts) {
unsigned char *rans_compress_to_4x16(unsigned char *in, unsigned int in_size,
unsigned char *out,unsigned int *out_size,
int order) {
- if (in_size > INT_MAX) {
+ if (in_size > INT_MAX || (out && *out_size == 0)) {
*out_size = 0;
return NULL;
}
+#ifdef VALIDATE_RANS
+ int orig_order = order;
+ int orig_in_size = in_size;
+ unsigned char *orig_in = in;
+#endif
+
unsigned int c_meta_len;
uint8_t *meta = NULL, *rle = NULL, *packed = NULL;
uint8_t *out_free = NULL;
@@ -1177,8 +1243,10 @@ unsigned char *rans_compress_to_4x16(unsigned char *in, unsigned int in_size,
*out_size = rans_compress_bound_4x16(in_size, order);
if (*out_size == 0)
return NULL;
- if (!(out_free = out = malloc(*out_size)))
+ if (!(out_free = out = malloc(*out_size))) {
+ *out_size = 0;
return NULL;
+ }
}
unsigned char *out_end = out + *out_size;
@@ -1199,11 +1267,15 @@ unsigned char *rans_compress_to_4x16(unsigned char *in, unsigned int in_size,
int N = (order>>8) & 0xff;
if (N == 0) N = 4; // default for compatibility with old tests
+ if (N > in_size)
+ N = in_size;
+
unsigned char *transposed = malloc(in_size);
unsigned int part_len[256];
unsigned int idx[256];
if (!transposed) {
free(out_free);
+ *out_size = 0;
return NULL;
}
int i, j, x;
@@ -1241,6 +1313,13 @@ unsigned char *rans_compress_to_4x16(unsigned char *in, unsigned int in_size,
c_meta_len = 1;
*out = order & ~RANS_ORDER_NOSZ;
c_meta_len += var_put_u32(out+c_meta_len, out_end, in_size);
+ if (c_meta_len >= *out_size) {
+ free(out_free);
+ free(transposed);
+ *out_size = 0;
+ return NULL;
+ }
+
out[c_meta_len++] = N;
unsigned char *out_best = NULL;
@@ -1249,7 +1328,8 @@ unsigned char *rans_compress_to_4x16(unsigned char *in, unsigned int in_size,
out2_start = out2 = out+7+5*N; // shares a buffer with c_meta
for (i = 0; i < N; i++) {
// Brute force try all methods.
- int j, m[] = {1,64,128,0}, best_j = 0, best_sz = in_size+10;
+ uint8_t *r;
+ int j, m[] = {1,64,128,0}, best_j = 0, best_sz = INT_MAX;
for (j = 0; j < sizeof(m)/sizeof(*m); j++) {
if ((order & m[j]) != m[j])
continue;
@@ -1257,18 +1337,24 @@ unsigned char *rans_compress_to_4x16(unsigned char *in, unsigned int in_size,
// order-1 *only*; bit check above cannot elide order-0
if ((order & RANS_ORDER_STRIPE_NO0) && (m[j]&1) == 0)
continue;
+
+ if (out2 - out > *out_size)
+ continue; // an error, but caught in best_sz check later
+
olen2 = *out_size - (out2 - out);
- rans_compress_to_4x16(transposed+idx[i], part_len[i],
- out2, &olen2,
- m[j] | RANS_ORDER_NOSZ
- | (order&RANS_ORDER_X32));
- if (best_sz > olen2) {
+ r = rans_compress_to_4x16(transposed+idx[i], part_len[i],
+ out2, &olen2,
+ m[j] | RANS_ORDER_NOSZ
+ | (order&RANS_ORDER_X32));
+ if (r && olen2 && best_sz > olen2) {
best_sz = olen2;
best_j = j;
if (j < sizeof(m)/sizeof(*m) && olen2 > out_best_len) {
unsigned char *tmp = realloc(out_best, olen2);
if (!tmp) {
free(out_free);
+ free(transposed);
+ *out_size = 0;
return NULL;
}
out_best = tmp;
@@ -1279,6 +1365,15 @@ unsigned char *rans_compress_to_4x16(unsigned char *in, unsigned int in_size,
memcpy(out_best, out2, olen2);
}
}
+
+ if (best_sz == INT_MAX) {
+ free(out_best);
+ free(out_free);
+ free(transposed);
+ *out_size = 0;
+ return NULL;
+ }
+
if (best_j < sizeof(m)/sizeof(*m)) {
// Copy the best compression to output buffer if not current
memcpy(out2, out_best, best_sz);
@@ -1301,6 +1396,12 @@ unsigned char *rans_compress_to_4x16(unsigned char *in, unsigned int in_size,
out[0] = RANS_ORDER_CAT;
c_meta_len = 1;
c_meta_len += var_put_u32(&out[1], out_end, in_size);
+
+ if (c_meta_len + in_size > *out_size) {
+ free(out_free);
+ *out_size = 0;
+ return NULL;
+ }
if (in_size)
memcpy(out+c_meta_len, in, in_size);
*out_size = c_meta_len + in_size;
@@ -1329,6 +1430,11 @@ unsigned char *rans_compress_to_4x16(unsigned char *in, unsigned int in_size,
// PACK 2, 4 or 8 symbols into one byte.
int pmeta_len;
uint64_t packed_len;
+ if (c_meta_len + 256 > *out_size) {
+ free(out_free);
+ *out_size = 0;
+ return NULL;
+ }
packed = hts_pack(in, in_size, out+c_meta_len, &pmeta_len, &packed_len);
if (!packed) {
out[0] &= ~RANS_ORDER_PACK;
@@ -1345,6 +1451,11 @@ unsigned char *rans_compress_to_4x16(unsigned char *in, unsigned int in_size,
int sz = var_put_u32(out+c_meta_len, out_end, in_size);
c_meta_len += sz;
*out_size -= sz;
+
+ if (do_simd && in_size < 32) {
+ do_simd = 0;
+ out[0] &= ~RANS_ORDER_X32;
+ }
}
} else if (do_pack) {
out[0] &= ~RANS_ORDER_PACK;
@@ -1357,6 +1468,7 @@ unsigned char *rans_compress_to_4x16(unsigned char *in, unsigned int in_size,
c_rmeta_len = in_size+257;
if (!(meta = malloc(c_rmeta_len))) {
free(out_free);
+ *out_size = 0;
return NULL;
}
@@ -1380,8 +1492,27 @@ unsigned char *rans_compress_to_4x16(unsigned char *in, unsigned int in_size,
// Compress lengths with O0 and literals with O0/O1 ("order" param)
int sz = var_put_u32(out+c_meta_len, out_end, rmeta_len*2), sz2;
sz += var_put_u32(out+c_meta_len+sz, out_end, rle_len);
+ if ((c_meta_len+sz+5) > *out_size) {
+ free(out_free);
+ free(rle);
+ free(meta);
+ free(packed);
+ *out_size = 0;
+ return NULL;
+ }
c_rmeta_len = *out_size - (c_meta_len+sz+5);
- rans_enc_func(do_simd, 0)(meta, rmeta_len, out+c_meta_len+sz+5, &c_rmeta_len);
+ if (do_simd && (rmeta_len < 32 || rle_len < 32)) {
+ do_simd = 0;
+ out[0] &= ~RANS_ORDER_X32;
+ }
+ if (!rans_enc_func(do_simd, 0)(meta, rmeta_len, out+c_meta_len+sz+5, &c_rmeta_len)) {
+ free(out_free);
+ free(rle);
+ free(meta);
+ free(packed);
+ *out_size = 0;
+ return NULL;
+ }
if (c_rmeta_len < rmeta_len) {
sz2 = var_put_u32(out+c_meta_len+sz, out_end, c_rmeta_len);
memmove(out+c_meta_len+sz+sz2, out+c_meta_len+sz+5, c_rmeta_len);
@@ -1404,17 +1535,39 @@ unsigned char *rans_compress_to_4x16(unsigned char *in, unsigned int in_size,
out[0] &= ~RANS_ORDER_RLE;
}
+ if (c_meta_len > *out_size) {
+ free(out_free);
+ free(rle);
+ free(packed);
+ *out_size = 0;
+ return NULL;
+ }
+
*out_size -= c_meta_len;
if (order && in_size < 8) {
out[0] &= ~1;
order &= ~1;
}
- rans_enc_func(do_simd, order)(in, in_size, out+c_meta_len, out_size);
+ if (!rans_enc_func(do_simd, order)(in, in_size, out+c_meta_len, out_size)) {
+ free(out_free);
+ free(rle);
+ free(packed);
+ *out_size = 0;
+ return NULL;
+ }
if (*out_size >= in_size) {
out[0] &= ~3;
out[0] |= RANS_ORDER_CAT | no_size;
+
+ if (out + c_meta_len + in_size > out_end) {
+ free(out_free);
+ free(rle);
+ free(packed);
+ *out_size = 0;
+ return NULL;
+ }
if (in_size)
memcpy(out+c_meta_len, in, in_size);
*out_size = in_size;
@@ -1425,6 +1578,24 @@ unsigned char *rans_compress_to_4x16(unsigned char *in, unsigned int in_size,
*out_size += c_meta_len;
+// Validation mode
+#ifdef VALIDATE_RANS
+ unsigned int decoded_size = orig_in_size;
+ unsigned char *decoded = malloc(decoded_size);
+ decoded = rans_uncompress_to_4x16(out, *out_size,
+ decoded, &decoded_size);
+ if (!decoded ||
+ decoded_size != orig_in_size ||
+ memcmp(orig_in, decoded, orig_in_size) != 0) {
+ fprintf(stderr, "rans round trip failed for order %d. Written to fd 5\n", orig_order);
+ if (write(5, orig_in, orig_in_size) < 0)
+ abort();
+ abort();
+ }
+ free(decoded);
+#endif
+
+
return out;
}
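Note on the rANS_static4x16pr.c hunks above: most of the added lines follow one pattern. Check that the pending write fits in the caller's buffer, and on failure release temporaries, set *out_size to 0 and return NULL. A minimal sketch of that pattern for the raw-copy (RANS_ORDER_CAT) case follows. It assumes a fixed 1-byte header rather than the variable-length size written by var_put_u32, and store_raw is a hypothetical name, not an htscodecs function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the raw-copy path: writes a 1-byte header
 * followed by the unmodified input, but only when both fit in the buffer
 * whose capacity the caller passes in via *out_size. */
static unsigned char *store_raw(const unsigned char *in, unsigned int in_size,
                                unsigned char *out, unsigned int *out_size) {
    const unsigned int meta_len = 1;   /* assumption: fixed-size header */

    assert(out != NULL && out_size != NULL);

    /* Widen to size_t so meta_len + in_size cannot wrap before the compare. */
    if ((size_t)meta_len + in_size > *out_size) {
        *out_size = 0;                 /* report failure through the length too */
        return NULL;
    }

    out[0] = 0;                        /* placeholder header byte */
    if (in_size)
        memcpy(out + meta_len, in, in_size);
    *out_size = meta_len + in_size;    /* bytes actually written */
    return out;
}
```

A caller can then treat either a NULL return or a zero length as failure, which is what the added "*out_size = 0" lines in the diff make possible.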
=====================================
htscodecs/tokenise_name3.c
=====================================
@@ -610,12 +610,16 @@ int search_trie(name_context *ctx, char *data, size_t len, int n, int *exact, in
prefix_len = 6; // IonTorrent
*fixed_len = 6;
*is_fixed = 1;
- } else if (l > 37 && d[f+8] == '-' && d[f+13] == '-' && d[f+18] == '-' && d[f+23] == '-' &&
- ((d[f+0] >= '0' && d[f+0] <='9') || (d[f+0] >= 'a' && d[f+0] <= 'f')) &&
- ((d[f+35] >= '0' && d[f+35] <='9') || (d[f+35] >= 'a' && d[f+35] <= 'f'))) {
+ } else if (l >= 36
+ && d[f+8]=='-' && d[f+13]=='-' && d[f+18]=='-' && d[f+23]=='-'
+ && isxdigit((uint8_t)d[f+0]) && isxdigit((uint8_t)d[f+7])
+ && isxdigit((uint8_t)d[f+9]) && isxdigit((uint8_t)d[f+12])
+ && isxdigit((uint8_t)d[f+14]) && isxdigit((uint8_t)d[f+17])
+ && isxdigit((uint8_t)d[f+19]) && isxdigit((uint8_t)d[f+22])
+ && isxdigit((uint8_t)d[f+24]) && isxdigit((uint8_t)d[f+35])) {
// ONT: f33d30d5-6eb8-4115-8f46-154c2620a5da_Basecall_1D_template...
- prefix_len = 37;
- *fixed_len = 37;
+ prefix_len = 36;
+ *fixed_len = 36;
*is_fixed = 1;
} else {
// Check Illumina and trim back to lane:tile:x:y.
@@ -638,7 +642,6 @@ int search_trie(name_context *ctx, char *data, size_t len, int n, int *exact, in
*is_fixed = 0;
}
}
- //prefix_len = INT_MAX;
if (!ctx->t_head) {
ctx->t_head = calloc(1, sizeof(*ctx->t_head));
@@ -647,6 +650,7 @@ int search_trie(name_context *ctx, char *data, size_t len, int n, int *exact, in
}
// Find an item in the trie
+ int from_punct = from;
for (nlines = i = 0; i < len; i++, nlines++) {
t = ctx->t_head;
while (i < len && data[i] > '\n') {
@@ -661,16 +665,10 @@ int search_trie(name_context *ctx, char *data, size_t len, int n, int *exact, in
x = x->sibling;
t = x;
-// t = t->next[c];
-
-// if (!t)
-// return -1;
-
from = t->n;
+ if ((ispunct(c) || isspace(c)) && t->n != n)
+ from_punct = t->n;
if (i == prefix_len) p3 = t->n;
- //if (t->count >= .0035*ctx->t_head->count && t->n != n) p3 = t->n; // pacbio
- //if (i == 60) p3 = t->n; // pacbio
- //if (i == 7) p3 = t->n; // iontorrent
t->n = n;
}
}
@@ -678,7 +676,7 @@ int search_trie(name_context *ctx, char *data, size_t len, int n, int *exact, in
//printf("Looked for %d, found %d, prefix %d\n", n, from, p3);
*exact = (n != from) && len;
- return *exact ? from : p3;
+ return *exact ? from : (p3 != -1 ? p3 : from_punct);
}
@@ -729,10 +727,29 @@ static int encode_name(name_context *ctx, char *name, int len, int mode) {
if (!ctx->lc[cnum].last)
return -1;
encode_token_diff(ctx, cnum-pnum);
-
int ntok = 1;
- i = 0;
- if (is_fixed) {
+
+ if (fixed_len == 36) {
+ // ONT uuid4 format data
+ if (37 >= ctx->max_tok) {
+ do {
+ memset(&ctx->desc[ctx->max_tok << 4], 0, 16*sizeof(ctx->desc[0]));
+ memset(&ctx->token_dcount[ctx->max_tok], 0, sizeof(int));
+ memset(&ctx->token_icount[ctx->max_tok], 0, sizeof(int));
+ } while (ctx->max_tok++ < 37);
+ }
+#ifdef ENC_DEBUG
+ fprintf(stderr, "Tok %d (%d x uuid chr)", ntok, len);
+#endif
+ for (i = 0; i < 36; i++, ntok++) {
+ encode_token_char(ctx, ntok, name[i]);
+ ctx->lc[cnum].last[ntok].token_int = name[i];
+ ctx->lc[cnum].last[ntok].token_type = N_CHAR;
+ }
+ is_fixed = 0;
+ i = 36;
+ } else if (is_fixed) {
+ // Other fixed length data
if (ntok >= ctx->max_tok) {
memset(&ctx->desc[ctx->max_tok << 4], 0, 16*sizeof(ctx->desc[0]));
memset(&ctx->token_dcount[ctx->max_tok], 0, sizeof(int));
@@ -752,6 +769,8 @@ static int encode_name(name_context *ctx, char *name, int len, int mode) {
ctx->lc[cnum].last[ntok].token_str = 0;
ctx->lc[cnum].last[ntok++].token_type = N_ALPHA;
i = fixed_len;
+ } else {
+ i = 0;
}
for (; i < len; i++) {
@@ -1555,6 +1574,7 @@ uint8_t *tok3_encode_names(char *blk, int len, int level, int use_arith,
if (compress(ctx->desc[i].buf, ctx->desc[i].buf_l, i&0xf, level,
use_arith, out, &out_len) < 0) {
free_context(ctx);
+ free(out);
return NULL;
}
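Note on the search_trie hunk above: the new ONT test accepts a 36-byte UUID prefix, checking the four hyphens and the hex digit on each side of every group. A standalone sketch of the same spot checks follows. The helper name looks_like_uuid_prefix is hypothetical, and the sketch assumes the name starts at offset 0 (the diff indexes from d[f+...]) and that len is the name length.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of htscodecs. Returns 1 when the first 36
 * bytes of s pass the same spot checks as the new condition: hyphens at
 * offsets 8, 13, 18 and 23, and hex digits at the first and last position
 * of each of the five groups. */
static int looks_like_uuid_prefix(const char *s, size_t len) {
    static const int hex_pos[] = {0, 7, 9, 12, 14, 17, 19, 22, 24, 35};
    size_t i;

    assert(s != NULL);

    if (len < 36)
        return 0;
    if (s[8] != '-' || s[13] != '-' || s[18] != '-' || s[23] != '-')
        return 0;

    for (i = 0; i < sizeof(hex_pos) / sizeof(*hex_pos); i++) {
        /* Cast via uint8_t so a negative char is never passed to isxdigit. */
        if (!isxdigit((uint8_t)s[hex_pos[i]]))
            return 0;
    }
    return 1;
}
```

Under these assumptions a bare 36-character UUID such as f33d30d5-6eb8-4115-8f46-154c2620a5da passes, as does the same UUID followed by _Basecall_1D_template, while an Illumina-style name fails at the first hyphen check.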
=====================================
javascript/index.js
=====================================
@@ -54,7 +54,7 @@ function r4x16_uncompress(inputBuffer, outputBuffer) {
}
function arith_uncompress(inputBuffer, outputBuffer) {
- arith.decode(inputBuffer).copy(outputBuffer, 0, 0);
+ new arith().decode(inputBuffer).copy(outputBuffer, 0, 0);
}
function fqzcomp_uncompress(inputBuffer, outputBuffer) {
=====================================
tests/dat/qsimd
=====================================
@@ -0,0 +1 @@
+0000000000000000000000000000000000000000000011111111111111111111111111111111111111111111000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000000000000000111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000001111111111111111111111111111111111111111111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111111111111111111111111111111111111111111
\ No newline at end of file
=====================================
tests/dat/r4x16/qsimd.69
=====================================
@@ -0,0 +1 @@
+`t%
01+++/+WW+W/+010101010101
\ No newline at end of file
=====================================
tests/fqzcomp.test
=====================================
@@ -11,6 +11,10 @@ do
cut -f 1 $f > $out/fqz
for s in 0 1 2 3
do
+ if [ ! -e "$comp.$o" ]
+ then
+ continue
+ fi
printf 'Testing fqzcomp_qual -r -s %s on %s\t' $s "$f"
# Round trip
=====================================
tests/rans4x8.test
=====================================
@@ -11,6 +11,11 @@ do
cut -f 1 < $f | tr -d '\012' > $out/r4x8-nl
for o in 0 1
do
+ if [ ! -e "$comp.$o" ]
+ then
+ continue
+ fi
+
printf 'Testing rans4x8 -r -o%s on %s\t' $o "$f"
# Round trip
View it on GitLab: https://salsa.debian.org/med-team/htscodecs/-/compare/7ebf341055e83f69c54984535af36f675b200fc0...dce7a91309c38a09c3520f880a9a2ff9c3d017b5