[med-svn] [Git][med-team/bolt-lmm][upstream] New upstream version 2.4.1+dfsg

Tue Jul 11 11:14:48 BST 2023


Andreas Tille pushed to branch upstream at Debian Med / bolt-lmm


Commits:
0c1d6c5f by Andreas Tille at 2023-07-11T11:52:18+02:00
New upstream version 2.4.1+dfsg
- - - - -


7 changed files:

- example/example.log
- example/example_reml2.log
- src/Bolt.cpp
- src/Bolt.hpp
- src/BoltMain.cpp
- src/FileUtils.cpp
- src/Makefile


Changes:

=====================================
example/example.log
=====================================
@@ -1,7 +1,7 @@
                       +-----------------------------+
                       |                       ___   |
-                      |   BOLT-LMM, v2.4     /_ /   |
-                      |   July 22, 2022       /_/   |
+                      |   BOLT-LMM, v2.4.1   /_ /   |
+                      |   November 16, 2022   /_/   |
                       |   Po-Ru Loh            //   |
                       |                        /    |
                       +-----------------------------+
@@ -83,7 +83,7 @@ Reading bed file #1: EUR_subset.bed
 Total indivs after QC: 373
 Total post-QC SNPs: M = 2431
   Variance component 1: 2431 post-QC SNPs (name: 'modelSnps')
-Time for SnpData setup = 0.370177 sec
+Time for SnpData setup = 0.352433 sec
 
 === Reading phenotype and covariate data ===
 
@@ -115,13 +115,13 @@ Total independent covariate vectors: Cindep = 4
 Number of chroms with >= 1 good SNP: 6
 Average norm of projected SNPs:           362.015344
 Dimension of all-1s proj space (Nused-1): 365
-Time for covariate data setup + Bolt initialization = 0.017323 sec
+Time for covariate data setup + Bolt initialization = 0.0179222 sec
 
 Phenotype 1:   N = 366   mean = 0.00450586   std = 1.0273
 
 === Computing linear regression (LINREG) stats ===
 
-Time for computing LINREG stats = 0.0035708 sec
+Time for computing LINREG stats = 0.00358891 sec
 
 === Estimating variance parameters ===
 
@@ -134,17 +134,17 @@ Estimating MC scaling f_REML at log(delta) = 1.09865, h2 = 0.25...
   iter 2:  time=0.01  rNorms/orig: (0.01,0.03)  res2s: 791.087..208.371
   iter 3:  time=0.01  rNorms/orig: (0.002,0.004)  res2s: 791.958..209.121
   Converged at iter 3: rNorms/orig all < CGtol=0.005
-  Time breakdown: dgemm = 44.0%, memory/overhead = 56.0%
+  Time breakdown: dgemm = 43.5%, memory/overhead = 56.5%
   MCscaling: logDelta = 1.10, h2 = 0.250, f = 0.0583786
 
 Estimating MC scaling f_REML at log(delta) = 4.23869e-05, h2 = 0.5...
   Batch-solving 16 systems of equations using conjugate gradient iteration
   iter 1:  time=0.01  rNorms/orig: (0.2,0.3)  res2s: 157.403..82.5002
   iter 2:  time=0.01  rNorms/orig: (0.04,0.1)  res2s: 176.427..94.685
-  iter 3:  time=0.01  rNorms/orig: (0.01,0.02)  res2s: 178.429..97.6069
-  iter 4:  time=0.01  rNorms/orig: (0.004,0.005)  res2s: 178.791..97.8407
+  iter 3:  time=0.00  rNorms/orig: (0.01,0.02)  res2s: 178.429..97.6069
+  iter 4:  time=0.00  rNorms/orig: (0.004,0.005)  res2s: 178.791..97.8407
   Converged at iter 4: rNorms/orig all < CGtol=0.005
-  Time breakdown: dgemm = 44.1%, memory/overhead = 55.9%
+  Time breakdown: dgemm = 43.7%, memory/overhead = 56.3%
   MCscaling: logDelta = 0.00, h2 = 0.500, f = 0.00362986
 
 Estimating MC scaling f_REML at log(delta) = -0.0727959, h2 = 0.518202...
@@ -155,7 +155,7 @@ Estimating MC scaling f_REML at log(delta) = -0.0727959, h2 = 0.518202...
   iter 4:  time=0.00  rNorms/orig: (0.004,0.006)  res2s: 160.548..91.4234
   iter 5:  time=0.00  rNorms/orig: (0.0008,0.001)  res2s: 160.575..91.4401
   Converged at iter 5: rNorms/orig all < CGtol=0.005
-  Time breakdown: dgemm = 44.1%, memory/overhead = 55.9%
+  Time breakdown: dgemm = 43.8%, memory/overhead = 56.2%
   MCscaling: logDelta = -0.07, h2 = 0.518, f = -0.000114364
 
 Secant iteration for h2 estimation converged in 1 steps
@@ -163,7 +163,7 @@ Estimated (pseudo-)heritability: h2g = 0.518
 To more precisely estimate variance parameters and estimate s.e., use --reml
 Variance params: sigma^2_K = 0.539611, logDelta = -0.072796, f = -0.000114364
 
-Time for fitting variance components = 0.0747402 sec
+Time for fitting variance components = 0.0710289 sec
 
 === Computing mixed model assoc stats (inf. model) ===
 
@@ -179,19 +179,19 @@ Each chunk is excluded when testing SNPs belonging to the chunk
   iter 5:  time=0.01  rNorms/orig: (0.0008,0.002)  res2s: 95.3793..101.413
   iter 6:  time=0.01  rNorms/orig: (0.0003,0.0004)  res2s: 95.381..101.415
   Converged at iter 6: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 61.2%, memory/overhead = 38.8%
+  Time breakdown: dgemm = 61.1%, memory/overhead = 38.9%
 
 AvgPro: 1.016   AvgRetro: 0.998   Calibration: 1.018 (0.008)   (30 SNPs)
 Ratio of medians: 1.020   Median of ratios: 1.015
 
-Time for computing infinitesimal model assoc stats = 0.0411689 sec
+Time for computing infinitesimal model assoc stats = 0.0401161 sec
 
 === Estimating chip LD Scores using 400 indivs ===
 
 WARNING: Only 373 indivs available; using all
 Reducing sample size to 368 for memory alignment
 
-Time for estimating chip LD Scores = 0.00515795 sec
+Time for estimating chip LD Scores = 0.00524688 sec
 
 === Reading LD Scores for calibration of Bayesian assoc stats ===
 
@@ -255,9 +255,9 @@ Dimension of all-1s proj space (Nused-1): 291
   iter 13:  time=0.00 for  1 active reps  approxLL diffs: (0.01,0.01)
   iter 14:  time=0.00 for  1 active reps  approxLL diffs: (0.01,0.01)
   Converged at iter 14: approxLL diffs each have been < LLtol=0.01
-  Time breakdown: dgemm = 29.5%, memory/overhead = 70.5%
+  Time breakdown: dgemm = 29.6%, memory/overhead = 70.4%
 Computing predictions on left-out cross-validation fold
-Time for computing predictions = 0.00540709 sec
+Time for computing predictions = 0.00543308 sec
 
 Average PVEs obtained by param pairs tested (high to low):
  f2=0.3, p=0.01: 0.126476
@@ -342,9 +342,9 @@ Dimension of all-1s proj space (Nused-1): 292
   iter 22:  time=0.00 for  1 active reps  approxLL diffs: (0.02,0.02)
   iter 23:  time=0.00 for  1 active reps  approxLL diffs: (0.01,0.01)
   Converged at iter 23: approxLL diffs each have been < LLtol=0.01
-  Time breakdown: dgemm = 24.2%, memory/overhead = 75.8%
+  Time breakdown: dgemm = 24.0%, memory/overhead = 76.0%
 Computing predictions on left-out cross-validation fold
-Time for computing predictions = 0.00540304 sec
+Time for computing predictions = 0.00533509 sec
 
 Average PVEs obtained by param pairs tested (high to low):
  f2=0.3, p=0.01: 0.110938
@@ -413,9 +413,9 @@ Dimension of all-1s proj space (Nused-1): 292
   iter 9:  time=0.00 for  1 active reps  approxLL diffs: (0.03,0.03)
   iter 10:  time=0.00 for  1 active reps  approxLL diffs: (0.01,0.01)
   Converged at iter 10: approxLL diffs each have been < LLtol=0.01
-  Time breakdown: dgemm = 31.8%, memory/overhead = 68.2%
+  Time breakdown: dgemm = 31.2%, memory/overhead = 68.8%
 Computing predictions on left-out cross-validation fold
-Time for computing predictions = 0.00174713 sec
+Time for computing predictions = 0.00173998 sec
 
 Average PVEs obtained by param pairs tested (high to low):
  f2=0.5, p=0.01: 0.090904
@@ -487,9 +487,9 @@ Dimension of all-1s proj space (Nused-1): 292
   iter 30:  time=0.00 for  1 active reps  approxLL diffs: (0.01,0.01)
   iter 31:  time=0.00 for  1 active reps  approxLL diffs: (0.01,0.01)
   Converged at iter 31: approxLL diffs each have been < LLtol=0.01
-  Time breakdown: dgemm = 28.1%, memory/overhead = 71.9%
+  Time breakdown: dgemm = 27.7%, memory/overhead = 72.3%
 Computing predictions on left-out cross-validation fold
-Time for computing predictions = 0.00174212 sec
+Time for computing predictions = 0.00175405 sec
 
 Average PVEs obtained by param pairs tested (high to low):
  f2=0.5, p=0.01: 0.087902
@@ -539,9 +539,9 @@ Dimension of all-1s proj space (Nused-1): 292
   iter 8:  time=0.00 for  1 active reps  approxLL diffs: (0.02,0.02)
   iter 9:  time=0.00 for  1 active reps  approxLL diffs: (0.01,0.01)
   Converged at iter 9: approxLL diffs each have been < LLtol=0.01
-  Time breakdown: dgemm = 32.4%, memory/overhead = 67.6%
+  Time breakdown: dgemm = 31.5%, memory/overhead = 68.5%
 Computing predictions on left-out cross-validation fold
-Time for computing predictions = 0.00173402 sec
+Time for computing predictions = 0.0017519 sec
 
 Average PVEs obtained by param pairs tested (high to low):
  f2=0.5, p=0.01: 0.056417
@@ -559,7 +559,7 @@ Detailed CV fold results:
 
 Optimal mixture parameters according to CV: f2 = 0.5, p = 0.01
 
-Time for estimating mixture parameters = 21.4859 sec
+Time for estimating mixture parameters = 22.1844 sec
 
 === Computing Bayesian mixed model assoc stats with mixture prior ===
 
@@ -580,7 +580,7 @@ Each chunk is excluded when testing SNPs belonging to the chunk
   iter 12:  time=0.00 for  1 active reps  approxLL diffs: (0.02,0.02)
   iter 13:  time=0.00 for  1 active reps  approxLL diffs: (0.01,0.01)
   Converged at iter 13: approxLL diffs each have been < LLtol=0.01
-  Time breakdown: dgemm = 27.7%, memory/overhead = 72.3%
+  Time breakdown: dgemm = 27.8%, memory/overhead = 72.2%
 Filtering to SNPs with chisq stats, LD Scores, and MAF > 0.01
 # of SNPs passing filters before outlier removal: 2427/2431
 Masking windows around outlier snps (chisq > 20.0)
@@ -590,7 +590,7 @@ Estimated attenuation: 0.428 (0.415)
 Intercept of LD Score regression for cur stats: 1.038 (0.044)
 Calibration factor (ref/cur) to multiply by:      1.003 (0.015)
 
-Time for computing Bayesian mixed model assoc stats = 0.050467 sec
+Time for computing Bayesian mixed model assoc stats = 0.050324 sec
 
 Calibration stats: mean and lambdaGC (over SNPs used in GRM)
   (note that both should be >1 because of polygenicity)
@@ -599,23 +599,23 @@ Mean BOLT_LMM: 1.0957 (2431 good SNPs)   lambdaGC: 1.06946
 
 === Streaming genotypes to compute and write assoc stats at all SNPs ===
 
-Time for streaming genotypes and writing output = 0.167874 sec
+Time for streaming genotypes and writing output = 0.159266 sec
 
 
 === Streaming genotypes to compute and write assoc stats at dosage SNPs ===
 
-Time for streaming dosage genotypes and writing output = 0.0240848 sec
+Time for streaming dosage genotypes and writing output = 0.0267661 sec
 
 
 === Streaming genotypes to compute and write assoc stats at IMPUTE2 SNPs ===
 
 Read 379 indivs; using 373 in filtered PLINK data
 
-Time for streaming IMPUTE2 genotypes and writing output = 0.0200861 sec
+Time for streaming IMPUTE2 genotypes and writing output = 0.030103 sec
 
 
 === Streaming genotypes to compute and write assoc stats at dosage2 SNPs ===
 
-Time for streaming dosage2 genotypes and writing output = 0.0451181 sec
+Time for streaming dosage2 genotypes and writing output = 0.0679121 sec
 
-Total elapsed time for analysis = 22.3057 sec
+Total elapsed time for analysis = 23.0091 sec


=====================================
example/example_reml2.log
=====================================
@@ -1,7 +1,7 @@
                       +-----------------------------+
                       |                       ___   |
-                      |   BOLT-LMM, v2.4     /_ /   |
-                      |   July 22, 2022       /_/   |
+                      |   BOLT-LMM, v2.4.1   /_ /   |
+                      |   November 16, 2022   /_/   |
                       |   Po-Ru Loh            //   |
                       |                        /    |
                       +-----------------------------+
@@ -60,7 +60,7 @@ Total indivs after QC: 379
 Total post-QC SNPs: M = 1331
   Variance component 1: 660 post-QC SNPs (name: 'chr21')
   Variance component 2: 671 post-QC SNPs (name: 'chr22')
-Time for SnpData setup = 0.327268 sec
+Time for SnpData setup = 0.330156 sec
 
 === Reading phenotype and covariate data ===
 
@@ -80,7 +80,7 @@ Total independent covariate vectors: Cindep = 1
 Number of chroms with >= 1 good SNP: 2
 Average norm of projected SNPs:           368.000000
 Dimension of all-1s proj space (Nused-1): 368
-Time for covariate data setup + Bolt initialization = 0.011085 sec
+Time for covariate data setup + Bolt initialization = 0.0118818 sec
 
 Phenotype 1:   N = 369   mean = -0.000706532   std = 1.02606
 Phenotype 2:   N = 369   mean = 1.53117   std = 0.499705
@@ -99,7 +99,7 @@ Estimating MC scaling f_REML at log(delta) = 1.09861, h2 = 0.25...
   iter 4:  time=0.00  rNorms/orig: (0.001,0.001)  res2s: 874.469..225.607
   iter 5:  time=0.00  rNorms/orig: (0.0002,0.0002)  res2s: 874.479..225.61
   Converged at iter 5: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 26.5%, memory/overhead = 73.5%
+  Time breakdown: dgemm = 27.4%, memory/overhead = 72.6%
   MCscaling: logDelta = 1.10, h2 = 0.250, f = -0.0414761
 
 Estimating MC scaling f_REML at log(delta) = 1.94591, h2 = 0.125...
@@ -109,7 +109,7 @@ Estimating MC scaling f_REML at log(delta) = 1.94591, h2 = 0.125...
   iter 3:  time=0.00  rNorms/orig: (0.0008,0.001)  res2s: 2350.09..296.683
   iter 4:  time=0.00  rNorms/orig: (9e-05,0.0001)  res2s: 2350.11..296.687
   Converged at iter 4: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 26.2%, memory/overhead = 73.8%
+  Time breakdown: dgemm = 26.9%, memory/overhead = 73.1%
   MCscaling: logDelta = 1.95, h2 = 0.125, f = 0.012255
 
 Estimating MC scaling f_REML at log(delta) = 1.75266, h2 = 0.147712...
@@ -119,7 +119,7 @@ Estimating MC scaling f_REML at log(delta) = 1.75266, h2 = 0.147712...
   iter 3:  time=0.00  rNorms/orig: (0.001,0.002)  res2s: 1888.28..282.578
   iter 4:  time=0.00  rNorms/orig: (0.0002,0.0002)  res2s: 1888.31..282.586
   Converged at iter 4: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 26.4%, memory/overhead = 73.6%
+  Time breakdown: dgemm = 27.0%, memory/overhead = 73.0%
   MCscaling: logDelta = 1.75, h2 = 0.148, f = 0.00181293
 
 Estimating MC scaling f_REML at log(delta) = 1.71911, h2 = 0.151986...
@@ -129,7 +129,7 @@ Estimating MC scaling f_REML at log(delta) = 1.71911, h2 = 0.151986...
   iter 3:  time=0.00  rNorms/orig: (0.001,0.002)  res2s: 1817.23..279.989
   iter 4:  time=0.00  rNorms/orig: (0.0002,0.0002)  res2s: 1817.27..279.999
   Converged at iter 4: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 26.6%, memory/overhead = 73.4%
+  Time breakdown: dgemm = 26.8%, memory/overhead = 73.2%
   MCscaling: logDelta = 1.72, h2 = 0.152, f = -0.000107663
 
 Secant iteration for h2 estimation converged in 2 steps
@@ -151,14 +151,14 @@ Estimating MC scaling f_REML at log(delta) = 1.71911, h2 = 0.151986...
   iter 4:  time=0.00  rNorms/orig: (0.0004,0.0007)  res2s: 1809.76..293.283
   iter 5:  time=0.00  rNorms/orig: (6e-05,9e-05)  res2s: 1809.76..293.283
   Converged at iter 5: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 33.5%, memory/overhead = 66.5%
+  Time breakdown: dgemm = 33.0%, memory/overhead = 67.0%
 Estimating MC scaling f_REML at log(delta) = 2.71911, h2 = 0.0618553...
   Batch-solving 8 systems of equations using conjugate gradient iteration
   iter 1:  time=0.00  rNorms/orig: (0.05,0.06)  res2s: 5411.29..341.902
   iter 2:  time=0.00  rNorms/orig: (0.003,0.005)  res2s: 5452.11..344.669
   iter 3:  time=0.00  rNorms/orig: (0.0002,0.0004)  res2s: 5452.34..344.696
   Converged at iter 3: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 33.0%, memory/overhead = 67.0%
+  Time breakdown: dgemm = 32.7%, memory/overhead = 67.3%
 WARNING: Estimated h2 on leave-out batch 0 exceeds all-SNPs h2
          Replacing 0.265571 with 0.151986
 MCscaling:   logDelta[0] = 1.719106,   h2 = 0.152,   Mused = 671  (50.4%)
@@ -180,7 +180,7 @@ Estimating MC scaling f_REML at log(delta) = 1.09861, h2 = 0.25...
   iter 4:  time=0.00  rNorms/orig: (0.001,0.001)  res2s: 874.469..55.2016
   iter 5:  time=0.00  rNorms/orig: (0.0002,0.0002)  res2s: 874.479..55.2022
   Converged at iter 5: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 26.7%, memory/overhead = 73.3%
+  Time breakdown: dgemm = 27.2%, memory/overhead = 72.8%
   MCscaling: logDelta = 1.10, h2 = 0.250, f = -0.103553
 
 Estimating MC scaling f_REML at log(delta) = 1.94591, h2 = 0.125...
@@ -190,7 +190,7 @@ Estimating MC scaling f_REML at log(delta) = 1.94591, h2 = 0.125...
   iter 3:  time=0.00  rNorms/orig: (0.0008,0.001)  res2s: 2350.09..71.6417
   iter 4:  time=0.00  rNorms/orig: (9e-05,0.0001)  res2s: 2350.11..71.6423
   Converged at iter 4: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 26.6%, memory/overhead = 73.4%
+  Time breakdown: dgemm = 27.5%, memory/overhead = 72.5%
   MCscaling: logDelta = 1.95, h2 = 0.125, f = -0.0587766
 
 Estimating MC scaling f_REML at log(delta) = 3.05814, h2 = 0.0448672...
@@ -199,7 +199,7 @@ Estimating MC scaling f_REML at log(delta) = 3.05814, h2 = 0.0448672...
   iter 2:  time=0.00  rNorms/orig: (0.0007,0.001)  res2s: 7840.59..84.0634
   iter 3:  time=0.00  rNorms/orig: (4e-05,7e-05)  res2s: 7840.64..84.0642
   Converged at iter 3: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 26.7%, memory/overhead = 73.3%
+  Time breakdown: dgemm = 26.9%, memory/overhead = 73.1%
   MCscaling: logDelta = 3.06, h2 = 0.045, f = -0.0246466
 
 Estimating MC scaling f_REML at log(delta) = 3.86133, h2 = 0.0206065...
@@ -207,7 +207,7 @@ Estimating MC scaling f_REML at log(delta) = 3.86133, h2 = 0.0206065...
   iter 1:  time=0.00  rNorms/orig: (0.01,0.01)  res2s: 18037.5..88.1625
   iter 2:  time=0.00  rNorms/orig: (0.0001,0.0003)  res2s: 18046.2..88.2065
   Converged at iter 2: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 27.2%, memory/overhead = 72.8%
+  Time breakdown: dgemm = 27.5%, memory/overhead = 72.5%
   MCscaling: logDelta = 3.86, h2 = 0.021, f = -0.0122715
 
 Estimating MC scaling f_REML at log(delta) = 4.65779, h2 = 0.00939822...
@@ -215,7 +215,7 @@ Estimating MC scaling f_REML at log(delta) = 4.65779, h2 = 0.00939822...
   iter 1:  time=0.00  rNorms/orig: (0.005,0.006)  res2s: 40636.6..90.1814
   iter 2:  time=0.00  rNorms/orig: (3e-05,7e-05)  res2s: 40640.7..90.1908
   Converged at iter 2: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 26.6%, memory/overhead = 73.4%
+  Time breakdown: dgemm = 27.1%, memory/overhead = 72.9%
   MCscaling: logDelta = 4.66, h2 = 0.009, f = -0.0057282
 
 Estimating MC scaling f_REML at log(delta) = 5.35504, h2 = 0.00470207...
@@ -223,7 +223,7 @@ Estimating MC scaling f_REML at log(delta) = 5.35504, h2 = 0.00470207...
   iter 1:  time=0.00  rNorms/orig: (0.003,0.003)  res2s: 82199.2..91.034
   iter 2:  time=0.00  rNorms/orig: (7e-06,2e-05)  res2s: 82201.3..91.0364
   Converged at iter 2: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 26.7%, memory/overhead = 73.3%
+  Time breakdown: dgemm = 27.3%, memory/overhead = 72.7%
   MCscaling: logDelta = 5.36, h2 = 0.005, f = -0.00257581
 
 Estimating MC scaling f_REML at log(delta) = 5.92476, h2 = 0.00266533...
@@ -231,7 +231,7 @@ Estimating MC scaling f_REML at log(delta) = 5.92476, h2 = 0.00266533...
   iter 1:  time=0.00  rNorms/orig: (0.001,0.002)  res2s: 145816..91.405
   iter 2:  time=0.00  rNorms/orig: (2e-06,5e-06)  res2s: 145817..91.4057
   Converged at iter 2: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 26.7%, memory/overhead = 73.3%
+  Time breakdown: dgemm = 27.3%, memory/overhead = 72.7%
   MCscaling: logDelta = 5.92, h2 = 0.003, f = -0.00101218
 
 WARNING: Secant iteration for h2 estimation may not have converged
@@ -250,13 +250,13 @@ Estimating MC scaling f_REML at log(delta) = 5.92476, h2 = 0.00266533...
   iter 1:  time=0.00  rNorms/orig: (0.002,0.003)  res2s: 145671..91.3763
   iter 2:  time=0.00  rNorms/orig: (5e-06,9e-06)  res2s: 145674..91.3781
   Converged at iter 2: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 33.3%, memory/overhead = 66.7%
+  Time breakdown: dgemm = 32.7%, memory/overhead = 67.3%
 Estimating MC scaling f_REML at log(delta) = 6.92476, h2 = 0.000982175...
   Batch-solving 8 systems of equations using conjugate gradient iteration
   iter 1:  time=0.00  rNorms/orig: (0.0008,0.001)  res2s: 397458..91.7015
   iter 2:  time=0.00  rNorms/orig: (7e-07,1e-06)  res2s: 397459..91.7018
   Converged at iter 2: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 33.1%, memory/overhead = 66.9%
+  Time breakdown: dgemm = 32.7%, memory/overhead = 67.3%
 MCscaling:   logDelta[0] = 60.456787,   h2 = 0.000,   Mused = 671  (50.4%)
 WARNING: Estimated h2 on leave-out batch 1 exceeds all-SNPs h2
          Replacing 0.10913 with 0.00266533
@@ -279,13 +279,13 @@ Vegs[2][2,2]((0.142948,5.73352e-07),(5.73352e-07,1e-09))
 
 Performing initial gradient evaluation
   Batch-solving 16 systems of equations using conjugate gradient iteration
-  iter 1:  time=0.00  rNorms/orig: (0.09,0.1)  res2s: 757.838..714.073
+  iter 1:  time=0.01  rNorms/orig: (0.09,0.1)  res2s: 757.838..714.073
   iter 2:  time=0.00  rNorms/orig: (0.01,0.02)  res2s: 775.25..735.608
   iter 3:  time=0.00  rNorms/orig: (0.002,0.004)  res2s: 775.947..736.998
   iter 4:  time=0.00  rNorms/orig: (0.0002,0.0005)  res2s: 775.968..737.057
   iter 5:  time=0.00  rNorms/orig: (4e-05,7e-05)  res2s: 775.969..737.058
   Converged at iter 5: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 31.0%, memory/overhead = 69.0%
+  Time breakdown: dgemm = 30.4%, memory/overhead = 69.6%
 grad[9](3.09407,-6.72961,-1.13849,-3.04818,7.07062,1.17961,13.4749,-2.86227,-7.58339)
 
 -------------------------------------------------------------------------------
@@ -299,7 +299,7 @@ Start ITER 1: computing AI matrix
   iter 4:  time=0.00  rNorms/orig: (0.0001,0.0006)  res2s: 391.934..576.579
   iter 5:  time=0.00  rNorms/orig: (2e-05,8e-05)  res2s: 391.935..576.579
   Converged at iter 5: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 33.0%, memory/overhead = 67.0%
+  Time breakdown: dgemm = 32.8%, memory/overhead = 67.2%
 Reducing off-diagonals by a factor of 4.47035e-08 to make matrix positive definite
 Reducing off-diagonals by a factor of 1.86265e-09 to make matrix positive definite
 
@@ -318,7 +318,7 @@ Computing actual (approximate) change in log likelihood
   iter 4:  time=0.01  rNorms/orig: (0.001,0.002)  res2s: 760.12..736.003
   iter 5:  time=0.01  rNorms/orig: (0.0003,0.0004)  res2s: 760.136..736.026
   Converged at iter 5: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 45.5%, memory/overhead = 54.5%
+  Time breakdown: dgemm = 44.7%, memory/overhead = 55.3%
 grad[9](1.70519,0.912804,1.26463,-4.85099,7.69901,-1.24528,1.75122,0.944023,-5.57033)
 
 Approximate change in log likelihood: 0.752385 (attempt 1)
@@ -343,7 +343,7 @@ Start ITER 2: computing AI matrix
   iter 4:  time=0.00  rNorms/orig: (0.0005,0.002)  res2s: 400.293..592.563
   iter 5:  time=0.00  rNorms/orig: (9e-05,0.0005)  res2s: 400.305..592.564
   Converged at iter 5: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 42.2%, memory/overhead = 57.8%
+  Time breakdown: dgemm = 42.1%, memory/overhead = 57.9%
 Reducing off-diagonals by a factor of 6.51926e-09 to make matrix positive definite
 Reducing off-diagonals by a factor of 2.8871e-07 to make matrix positive definite
 Reducing off-diagonals by a factor of 8.3819e-09 to make matrix positive definite
@@ -364,7 +364,7 @@ Computing actual (approximate) change in log likelihood
   iter 4:  time=0.00  rNorms/orig: (0.002,0.002)  res2s: 754.521..724.302
   iter 5:  time=0.00  rNorms/orig: (0.0003,0.0004)  res2s: 754.538..724.327
   Converged at iter 5: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 39.4%, memory/overhead = 60.6%
+  Time breakdown: dgemm = 38.9%, memory/overhead = 61.1%
 grad[9](0.0579094,-0.0410736,-0.0340175,-6.19482,7.16577,-2.04715,0.0273764,0.616035,-6.7186)
 
 Approximate change in log likelihood: 0.0158352 (attempt 1)
@@ -390,7 +390,7 @@ Start ITER 3: computing AI matrix
   iter 5:  time=0.00  rNorms/orig: (9e-05,0.0005)  res2s: 386.707..577.27
   iter 6:  time=0.00  rNorms/orig: (2e-05,9e-05)  res2s: 386.707..577.27
   Converged at iter 6: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 38.5%, memory/overhead = 61.5%
+  Time breakdown: dgemm = 38.2%, memory/overhead = 61.8%
 Reducing off-diagonals by a factor of 1.86265e-09 to make matrix positive definite
 
 Constrained Newton-Raphson optimized variance parameters:
@@ -419,7 +419,7 @@ Performing initial gradient evaluation
   iter 4:  time=0.02  rNorms/orig: (0.001,0.002)  res2s: 692.852..724.206
   iter 5:  time=0.02  rNorms/orig: (0.0002,0.0005)  res2s: 692.865..724.231
   Converged at iter 5: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 34.2%, memory/overhead = 65.8%
+  Time breakdown: dgemm = 33.7%, memory/overhead = 66.3%
 grad[9](-6.9566,10.7443,-1.85918,-15.0527,20.8568,0.457434,-9.31692,3.14494,-11.8159)
 
 -------------------------------------------------------------------------------
@@ -434,7 +434,7 @@ Start ITER 1: computing AI matrix
   iter 5:  time=0.00  rNorms/orig: (9e-05,0.0005)  res2s: 386.165..577.663
   iter 6:  time=0.00  rNorms/orig: (2e-05,9e-05)  res2s: 386.166..577.663
   Converged at iter 6: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 35.8%, memory/overhead = 64.2%
+  Time breakdown: dgemm = 35.2%, memory/overhead = 64.8%
 Reducing off-diagonals by a factor of 3.72529e-09 to make matrix positive definite
 Reducing off-diagonals by a factor of 1.11759e-08 to make matrix positive definite
 Reducing off-diagonals by a factor of 2.70084e-08 to make matrix positive definite
@@ -455,7 +455,7 @@ Computing actual (approximate) change in log likelihood
   iter 4:  time=0.02  rNorms/orig: (0.0006,0.002)  res2s: 711.638..765.655
   iter 5:  time=0.02  rNorms/orig: (0.0001,0.0003)  res2s: 711.643..765.667
   Converged at iter 5: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 34.3%, memory/overhead = 65.7%
+  Time breakdown: dgemm = 33.6%, memory/overhead = 66.4%
 grad[9](0.593026,-0.656444,0.398117,-8.84248,7.66129,-1.54565,0.982601,-1.77881,-9.57942)
 
 Approximate change in log likelihood: 0.505571 (attempt 1)
@@ -480,7 +480,7 @@ Start ITER 2: computing AI matrix
   iter 4:  time=0.00  rNorms/orig: (0.0004,0.002)  res2s: 448.291..603.825
   iter 5:  time=0.00  rNorms/orig: (7e-05,0.0003)  res2s: 448.297..603.826
   Converged at iter 5: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 35.7%, memory/overhead = 64.3%
+  Time breakdown: dgemm = 35.3%, memory/overhead = 64.7%
 Reducing off-diagonals by a factor of 1.55531e-07 to make matrix positive definite
 Reducing off-diagonals by a factor of 2.23517e-08 to make matrix positive definite
 
@@ -499,7 +499,7 @@ Computing actual (approximate) change in log likelihood
   iter 4:  time=0.02  rNorms/orig: (0.0007,0.002)  res2s: 709.767..761.724
   iter 5:  time=0.02  rNorms/orig: (0.0001,0.0003)  res2s: 709.773..761.738
   Converged at iter 5: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 34.0%, memory/overhead = 66.0%
+  Time breakdown: dgemm = 33.6%, memory/overhead = 66.4%
 grad[9](0.0132718,0.00120605,-0.000469693,-9.30953,8.31106,-1.85435,-0.0542454,-1.31603,-9.91664)
 
 Approximate change in log likelihood: 0.00315098 (attempt 1)
@@ -524,7 +524,7 @@ Start ITER 3: computing AI matrix
   iter 4:  time=0.00  rNorms/orig: (0.0004,0.002)  res2s: 444.271..599.25
   iter 5:  time=0.00  rNorms/orig: (7e-05,0.0003)  res2s: 444.279..599.251
   Converged at iter 5: rNorms/orig all < CGtol=0.0005
-  Time breakdown: dgemm = 35.9%, memory/overhead = 64.1%
+  Time breakdown: dgemm = 35.5%, memory/overhead = 64.5%
 
 Constrained Newton-Raphson optimized variance parameters:
 optVegs[0][2,2]((0.785936,0.0422243),(0.0422243,0.93567))
@@ -565,4 +565,4 @@ Variance component 2:  "chr22"
   gen corr (1,2): -0.999999 (53.474367)
   h2g (2,2): 0.000856 (0.090672)
 
-Total elapsed time for analysis = 2.14227 sec
+Total elapsed time for analysis = 2.18387 sec


=====================================
src/Bolt.cpp
=====================================
@@ -32,6 +32,7 @@
 
 #include "omp.h"
 #include "zlib.h"
+#include "zstd.h"
 
 #include <boost/random.hpp>
 #include <boost/random/mersenne_twister.hpp>
@@ -3018,21 +3019,29 @@ namespace LMM {
     fout.close();
   }
 
-  string Bolt::getSnpStatsBgen2(uchar *buf, uint bufLen, const uchar *zBuf, uint zBufLen,
-				uint Nbgen, const vector <uint64> &bgenIndivInds,
-				const string &snpName, int chrom, int physpos,double genpos,
-				const string &allele1, const string &allele0,
-				double snpCovCompVec[], bool verboseStats,
+  string Bolt::getSnpStatsBgen2(uint CompressedSNPBlocks, uchar *buf, uint bufLen,
+				const uchar *zBuf, uint zBufLen, uint Nbgen,
+				const vector <uint64> &bgenIndivInds, const string &snpName,
+				int chrom, int physpos,double genpos, const string &allele1,
+				const string &allele0, double snpCovCompVec[], bool verboseStats,
 				const vector <StatsDataRetroLOCO> &retroData, bool domRecHetTest,
 				double bgenMinMAF, double bgenMinINFO) const {
 
     /********** decompress and check genotype probability block **********/
 
     //cout << "bufLen = " << bufLen << " zBufLen = " << zBufLen << endl;
-    uLongf destLen = bufLen;
-    if (uncompress(buf, &destLen, zBuf, zBufLen) != Z_OK || destLen != bufLen) {
-      cerr << "ERROR: uncompress() failed" << endl;
-      exit(1);
+    if (CompressedSNPBlocks == 1) {
+      uLongf destLen = bufLen;
+      if (uncompress(buf, &destLen, zBuf, zBufLen) != Z_OK || destLen != bufLen) {
+	cerr << "ERROR: uncompress() failed" << endl;
+	exit(1);
+      }
+    }
+    else {
+      if (ZSTD_decompress(buf, bufLen, zBuf, zBufLen) != bufLen) {
+	cerr << "ERROR: ZSTD_decompress() failed" << endl;
+	exit(1);
+      }
     }
     uchar *bufAt = buf;
     uint N = bufAt[0]|(bufAt[1]<<8)|(bufAt[2]<<16)|(bufAt[3]<<24); bufAt += 4;
@@ -3213,7 +3222,7 @@ namespace LMM {
     fseek_check(fin, L_H-20, SEEK_CUR); //cout << "skipping L_H-20 = " << L_H-20 << " bytes (free data area)" << endl;
     uint flags; fread_check(&flags, 4, 1, fin); //cout << "flags: " << flags << endl;
     uint CompressedSNPBlocks = flags&3; cout << "CompressedSNPBlocks: " << CompressedSNPBlocks << endl;
-    assert(CompressedSNPBlocks==1); // REQUIRE CompressedSNPBlocks==1
+    assert(CompressedSNPBlocks==1 || CompressedSNPBlocks==2); // REQUIRE CompressedSNPBlocks==1||2
     uint Layout = (flags>>2)&0xf; cout << "Layout: " << Layout << endl;
     assert(Layout==1 || Layout==2); // REQUIRE Layout==1 or Layout==2
 
@@ -3302,10 +3311,11 @@ namespace LMM {
 	for (int b = 0; b < B; b++) {
 	  int t = omp_get_thread_num();
 	  if (bufLens[b] > bufs[t].size()) bufs[t].resize(bufLens[b]);
-	  outStrs[b] = getSnpStatsBgen2(&bufs[t][0], bufLens[b], &zBufs[b][0], zBufLens[b], Nbgen,
-					bgenIndivInds, snpNames[b], chroms[b], bps[b], gps[b],
-					allele1s[b], allele0s[b], snpCovCompVecs[t], verboseStats,
-					retroData, domRecHetTest, bgenMinMAF, bgenMinINFO);
+	  outStrs[b] = getSnpStatsBgen2(CompressedSNPBlocks, &bufs[t][0], bufLens[b], &zBufs[b][0],
+					zBufLens[b], Nbgen, bgenIndivInds, snpNames[b], chroms[b],
+					bps[b], gps[b], allele1s[b], allele0s[b],
+					snpCovCompVecs[t], verboseStats, retroData, domRecHetTest,
+					bgenMinMAF, bgenMinINFO);
 	}
 
 	for (int b = 0; b < B; b++)
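
Note on the src/Bolt.cpp hunks above: the following is a minimal sketch, not part of the upstream change, of how the BGEN header flag word is split into the two fields that the patched code checks, and which decompressor each accepted compression code selects. The names BgenFlags, decodeBgenFlags and decompressorFor are hypothetical and exist only for illustration. The bit masks (flags&3 and (flags>>2)&0xf) and the accepted values 1 and 2 are taken from the diff. No zlib or zstd calls are made here, so the sketch only shows the dispatch logic.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical container for the two header fields the diff inspects.
struct BgenFlags {
  std::uint32_t compressedSnpBlocks;  // low two bits: flags & 3
  std::uint32_t layout;               // next four bits: (flags >> 2) & 0xf
};

// Splits the 4-byte flag word using the same masks as the patched code.
BgenFlags decodeBgenFlags(std::uint32_t flags) {
  return BgenFlags{flags & 3u, (flags >> 2) & 0xfu};
}

// Names the decompressor chosen for a compression code.
// Code 1 takes the uncompress() branch (zlib); code 2 takes the
// ZSTD_decompress() branch. Any other value fails the patched assert,
// which this sketch reports as "unsupported" instead of aborting.
std::string decompressorFor(std::uint32_t compressedSnpBlocks) {
  if (compressedSnpBlocks == 1) return "zlib";
  if (compressedSnpBlocks == 2) return "zstd";
  return "unsupported";
}

// Small self-check on a few illustrative flag words (values chosen
// for this sketch, not read from any real file).
void selfCheck() {
  BgenFlags a = decodeBgenFlags(0x9u);  // binary 1001: layout 2, code 1
  assert(a.compressedSnpBlocks == 1 && a.layout == 2);
  BgenFlags b = decodeBgenFlags(0xAu);  // binary 1010: layout 2, code 2
  assert(b.compressedSnpBlocks == 2 && b.layout == 2);
  assert(decompressorFor(b.compressedSnpBlocks) == "zstd");
}
```

Before this commit only code 1 passed the assert, so a file whose flag word decodes to code 2 would have been rejected at header parsing. With the change, both codes pass and the per-SNP call receives CompressedSNPBlocks so it can pick the matching branch.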


=====================================
src/Bolt.hpp
=====================================
@@ -325,8 +325,9 @@ namespace LMM {
 			    double dosageLine[], bool verboseStats,
 			    const std::vector <StatsDataRetroLOCO> &retroData, double info=-9)
       const;
-    std::string getSnpStatsBgen2(uchar *buf, uint bufLen, const uchar *zBuf, uint zBufLen,
-				 uint Nbgen, const std::vector <uint64> &bgenIndivInds,
+    std::string getSnpStatsBgen2(uint CompressedSNPBlocks, uchar *buf, uint bufLen,
+				 const uchar *zBuf, uint zBufLen, uint Nbgen,
+				 const std::vector <uint64> &bgenIndivInds,
 				 const std::string &snpName, int chrom, int physpos, double genpos,
 				 const std::string &allele1, const std::string &allele0,
 				 double snpCovCompVec[], bool verboseStats,


=====================================
src/BoltMain.cpp
=====================================
@@ -54,8 +54,8 @@ int main(int argc, char *argv[]) {
 
   cout << "                      +-----------------------------+" << endl;
   cout << "                      |                       ___   |" << endl;
-  cout << "                      |   BOLT-LMM, v2.4     /_ /   |" << endl;
-  cout << "                      |   July 22, 2022       /_/   |" << endl;
+  cout << "                      |   BOLT-LMM, v2.4.1   /_ /   |" << endl;
+  cout << "                      |   November 16, 2022   /_/   |" << endl;
   cout << "                      |   Po-Ru Loh            //   |" << endl;
   cout << "                      |                        /    |" << endl;
   cout << "                      +-----------------------------+" << endl;


=====================================
src/FileUtils.cpp
=====================================
@@ -25,6 +25,7 @@
 #include <cassert>
 
 #include "zlib.h"
+#include "zstd.h"
 
 #include <boost/iostreams/filtering_stream.hpp>
 #include <boost/iostreams/filter/gzip.hpp>
@@ -277,7 +278,7 @@ namespace FileUtils {
     fseek_check(fin, L_H-20, SEEK_CUR); //cout << "skipping L_H-20 = " << L_H-20 << " bytes (free data area)" << endl;
     uint flags; fread_check(&flags, 4, 1, fin); //cout << "flags: " << flags << endl;
     uint CompressedSNPBlocks = flags&3; cout << "CompressedSNPBlocks: " << CompressedSNPBlocks << endl;
-    assert(CompressedSNPBlocks==1); // REQUIRE CompressedSNPBlocks==1
+    assert(CompressedSNPBlocks==1 || CompressedSNPBlocks==2); // REQUIRE CompressedSNPBlocks==1||2
     uint Layout = (flags>>2)&0xf; cout << "Layout: " << Layout << endl;
     assert(Layout==1 || Layout==2); // REQUIRE Layout==1 or Layout==2
 
@@ -336,9 +337,18 @@ namespace FileUtils {
 
       uLongf destLen = D, bufLen = D, zBufLen = C-4;
       //cout << "bufLen = " << bufLen << " zBufLen = " << zBufLen << endl;
-      if (uncompress(buf, &destLen, zBuf, zBufLen) != Z_OK || destLen != bufLen) {
-	cerr << "ERROR: uncompress() failed" << endl;
-	exit(1);
+      if (CompressedSNPBlocks == 1) {
+	uLongf destLen = bufLen;
+	if (uncompress(buf, &destLen, zBuf, zBufLen) != Z_OK || destLen != bufLen) {
+	  cerr << "ERROR: uncompress() failed" << endl;
+	  exit(1);
+	}
+      }
+      else {
+	if (ZSTD_decompress(buf, bufLen, zBuf, zBufLen) != bufLen) {
+	  cerr << "ERROR: ZSTD_decompress() failed" << endl;
+	  exit(1);
+	}
       }
       uchar *bufAt = buf;
       uint N = bufAt[0]|(bufAt[1]<<8)|(bufAt[2]<<16)|(bufAt[3]<<24); bufAt += 4;


=====================================
src/Makefile
=====================================
@@ -8,6 +8,8 @@ ZLIB_STATIC_DIR = /n/groups/price/poru/external_software/zlib/zlib-1.2.11 # prob
 LIBSTDCXX_STATIC_DIR = /n/groups/price/poru/external_software/libstdc++/usr/lib/gcc/x86_64-redhat-linux/4.8.5/
 GLIBC_STATIC_DIR = /home/pl88/glibc-static/usr/lib64
 
+ZSTD_DIR = /n/data1/bwh/medicine/loh/ploh/external_software/zstd-1.5.2/lib
+
 ifeq ($(strip ${linking}),)
 	linking = dynamic
 endif
@@ -51,6 +53,10 @@ ifneq ($(strip ${ZLIB_STATIC_DIR}),)
 	endif
 endif
 
+# add libzstd path
+CPATHS += -I${ZSTD_DIR}
+LPATHS += -L${ZSTD_DIR}
+
 # add MKL paths (if not compiling with g++, i.e., compiling with icpc)
 ifneq (${CC},g++)
 	CPATHS += -I${MKLROOT}/include
@@ -105,7 +111,7 @@ else
 endif
 
 # build link line (minus flags)
-LLIBS = -lboost_program_options -lboost_iostreams -lz -lnlopt
+LLIBS = -lboost_program_options -lboost_iostreams -lzstd -lz -lnlopt
 ifeq (${linking},static-except-glibc)
 	L = -L${LIBSTDCXX_STATIC_DIR} ${LPATHS} -Wl,--wrap=memcpy -Wl,-Bstatic ${LLIBS} ${LLAPACK} -Wl,-Bdynamic ${LIOMP5} -lpthread -lm ${LDL}
 else ifeq (${linking},static-except-glibc-intel)
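
The header check changed in src/Bolt.cpp and src/FileUtils.cpp can be sketched in isolation as follows. This is a minimal illustration, not code from BOLT-LMM: the `BgenFlags` struct and the `decodeBgenFlags` and `codecName` helpers are hypothetical names. The bit layout (`flags&3` and `(flags>>2)&0xf`) is taken from the diff, and the mapping of value 1 to zlib and value 2 to zstd follows the `uncompress()` and `ZSTD_decompress()` branches added above.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper type (not part of BOLT-LMM): the two fields the
// patched code extracts from the 4-byte BGEN header flags word.
struct BgenFlags {
  uint32_t compressedSnpBlocks;  // bits 0-1 of the flags word
  uint32_t layout;               // bits 2-5 of the flags word
};

// Same bit arithmetic as the diff: flags&3 and (flags>>2)&0xf.
BgenFlags decodeBgenFlags(uint32_t flags) {
  BgenFlags f;
  f.compressedSnpBlocks = flags & 3;
  f.layout = (flags >> 2) & 0xf;
  return f;
}

// Mirrors the relaxed assertions: Layout must be 1 or 2, and
// CompressedSNPBlocks must be 1 (zlib) or 2 (zstd). Throws instead of
// asserting so that callers can report the problem (an assumption of
// this sketch; the patched code uses assert()).
std::string codecName(const BgenFlags &f) {
  if (f.layout != 1 && f.layout != 2)
    throw std::runtime_error("unsupported Layout");
  switch (f.compressedSnpBlocks) {
    case 1: return "zlib";
    case 2: return "zstd";
    default: throw std::runtime_error("unsupported CompressedSNPBlocks");
  }
}
```

For example, a flags word of 0x9 decodes to CompressedSNPBlocks 1 and Layout 2 (the zlib path), while 0xA decodes to CompressedSNPBlocks 2 and Layout 2 (the zstd path that this upstream version newly accepts).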



View it on GitLab: https://salsa.debian.org/med-team/bolt-lmm/-/commit/0c1d6c5f78320daff9a9b3093f2241a4d5f6fa11


