[Debichem-devel] Bug#686874: cp2k: segfaults on i386 due to not aligning FFT arrays to 16 bytes for SSE2 and reusing them

Michael Banck mbanck at debian.org
Thu Sep 6 22:12:27 UTC 2012

Package: cp2k
Version: 2.2.426-1
Severity: serious

For some months now, fftw3 has used SSE2 instructions for fast Fourier
transforms if the CPU supports them.  One requirement for this is that
the incoming arrays must be aligned to 16 bytes for the SIMD
instructions.  This is the case by default on amd64, but not on i386.
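
To make the alignment requirement concrete, here is a minimal sketch in
plain C of how a 16-byte boundary can be tested from a pointer's address.
It is not code from cp2k or fftw3; the names is_aligned_16, align_up_16
and demo_alignment are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 1 if p lies on a 16-byte boundary, else 0. */
static int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: round p up to the next 16-byte boundary. */
static void *align_up_16(void *p)
{
    uintptr_t a = ((uintptr_t)p + 15u) & ~(uintptr_t)15u;
    return (void *)a;
}

/* Illustration: an address 8 bytes past an aligned one is still a valid
 * place for a double (assuming 8-byte doubles), but it does not satisfy
 * the 16-byte requirement. */
static void demo_alignment(void)
{
    static unsigned char raw[64];
    double *aligned = (double *)align_up_16(raw);
    double *shifted = aligned + 1;      /* one double further on */

    assert(is_aligned_16(aligned));
    assert(!is_aligned_16(shifted));
}
```

The pointers are only compared, never dereferenced, so the sketch says
nothing about how any particular allocator or compiler actually places
arrays on i386.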

Usually, fftw3 will detect unaligned arrays and not use SSE2.  However,
if a plan is first executed with a properly aligned array and then
re-executed with another, unaligned array, a segfault will occur, see
e.g.:

                        max_diis:                                              4
                        eps_scf:                                        1.00E-05
                        eps_scf_history:                                0.00E+00
                        eps_diis:                                       1.00E-01
                        eps_eigval:                                     1.00E-05
                        level_shift [a.u.]:                                 0.00
                        Mixing method:                           DIRECT_P_MIXING
                        No outer SCF

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x56A8FF6B
#1  0x56A905AC
#2  0x555733FF
#3  0x55FE3C0B
#4  0x55F2008B
#5  0x55F22E9B
#6  0x55F23402
#7  0x55F22E9B
#8  0x55FC558B
#9  0x9666567 in fftw33d_ at fftw3_lib.F:196
#10  0x966195B in fft_3d_ at fft_lib.f90:135
#11  0x80C77D3 in __fft_tools_MOD_fft3d_s at fft_tools.F:441
#12  0x8340A58 in __pw_methods_MOD_fft_wrap_pw1pw2 at pw_methods.F:1462
#13  0x8EABD6D in __qs_collocate_density_MOD_density_rs2pw at qs_collocate_density.F:3433
#14  0x8EC0EF2 in __qs_collocate_density_MOD_calculate_rho_elec at qs_collocate_density.F:1392
#15  0x849ADDE in __qs_rho_methods_MOD_qs_rho_update_rho at qs_rho_methods.F:391
#16  0x84A1E30 in __qs_scf_MOD_scf_env_initial_rho_setup at qs_scf.F:2277
#17  0x84A9F28 in __qs_scf_MOD_init_scf_run at qs_scf.F:1810
#18  0x84B53A7 in __qs_scf_MOD_scf at qs_scf.F:368
#19  0x836FA1C in __qs_energy_MOD_qs_energies_scf at qs_energy.F:231
#20  0x80CD15C in __force_env_methods_MOD_force_env_calc_energy_force at force_env_methods.F:231
#21  0x8054D39 in __cp2k_runs_MOD_cp2k_run at cp2k_runs.F:403
#22  0x8058C1D in __cp2k_runs_MOD_run_input at cp2k_runs.F:1143
#23  0x804E2C5 in cp2k at cp2k.F:285

This problem is common enough (1/3 of the test cases segfaulted on the
autobuilder) that it makes cp2k pretty much unusable on i386.
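
One way to guard against the plan reuse described above is to remember
the alignment of the array a plan was first used with and to compare any
later array against it before reusing the plan.  The following is only a
sketch under that assumption, in plain C without any fftw3 calls; struct
plan_record, record_plan and can_reuse_plan are hypothetical names, not
part of fftw3 or cp2k.  (fftw3 itself also offers the FFTW_UNALIGNED
planner flag, which asks for a plan that makes no alignment assumption.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping, not an fftw3 type: the address residue
 * (mod 16) of the array that a plan was first used with. */
struct plan_record {
    unsigned planned_residue;
};

/* Offset of p from the previous 16-byte boundary, in bytes (0..15). */
static unsigned residue_16(const void *p)
{
    return (unsigned)((uintptr_t)p & 15u);
}

/* Record the alignment of the array used when the plan was made. */
static struct plan_record record_plan(const void *planned_array)
{
    struct plan_record r;
    r.planned_residue = residue_16(planned_array);
    return r;
}

/* Conservative check: 1 only if the new array has the same residue
 * as the planned one, so a plan made for an aligned array is never
 * reused with an unaligned one. */
static int can_reuse_plan(const struct plan_record *r, const void *new_array)
{
    assert(r != NULL);
    return residue_16(new_array) == r->planned_residue;
}
```

When can_reuse_plan returns 0, the caller would have to create a
separate plan for the new array (or copy the data into an aligned
buffer) instead of re-executing the old plan.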
