Bug#788704: gnutls28: VIA PadLock accelerated AES-CBC segfaults
Peter Lebbing
peter at digitalbrains.com
Sun Jun 14 12:40:11 UTC 2015
Source: gnutls28
Version: 3.3.8-6+deb8u1
Severity: normal
Tags: patch
Dear Maintainer,
After upgrading a server with a VIA C3 (Nehemiah) processor to jessie,
exim4 started to crash on pretty much every connection that negotiated
AES-128-CBC or AES-256-CBC on TLS:
> 2015-05-31T16:06:43.641909+02:00 info kernel: [ 3466.879332] exim4[2381]: segfault at 4aeb2453 ip b6a4a5e2 sp bf7bfde0 error 4 in libgnutls-deb0.so.28.41.0[b6996000+13a000]
There's nothing in the exim4 log; it crashes before anything is logged.
The problem is not limited to exim4: I could reproduce the crash with
gnutls-cli. This is also how I discovered AES-CBC was to blame. It turns
out that lib/accelerated/x86/elf/e_padlock-x86.s segfaults in the
function padlock_cbc_encrypt() because it has not been written to handle
the case where it is requested to encrypt 0 bytes of data; it expects a
strictly positive number. Yet, it is called with a length of 0. The
function overwrites 512 bytes on the top of its stack, and then
dereferences a pointer in the overwritten data.
The attached patch simply checks for a 0 length in the C functions that
call the handwritten assembly PadLock AES encryption routines, so the
call is avoided. The patch fixed the issue on my system.
All the functions padlock_{ecb,cbc,cfb,ofb,ctr32}_encrypt() share almost
all their code. The AES-CGM ciphers in GnuTLS don't seem to call the
function with a 0 length, but my patch also checks for 0 in that cipher
mode on the basis "once bitten, twice shy".
The rest of the mail is a detailed description of the problem with
length 0.
I don't include a "steps to reproduce this" because it's likely you
don't have access to a system with a VIA PadLock engine. But it's
trivial to reproduce with gnutls-cli and an SMTP server to connect to.
To my surprise, e_padlock-x86.s is a generated file[4]; the source is
not included with GnuTLS. The source is the file
engines/asm/e_padlock-x86.pl in openssl; from the Git repository for
GnuTLS version 3.3.8, it can be concluded that the source is commit
34ccd24 in the OpenSSL Git.
First, a small aside on return values. OpenSSL checks the return value
(an int) of padlock_cbc_encrypt(), and indeed the function can abort if
the struct padlock_cipher_data is not aligned on a 16-byte boundary, or
if len is not a multiple of 16 (AES block size). But there's something
odd: /if/ the function aborts, the return value (EAX) is not set; it is
kept at the value it was when the function was called. On a succesful
return, it's set to 1.
GnuTLS never checks the return value. If it is possible that the length
is not a multiple of 16, GnuTLS will not notice that the
padlock_cbc_encrypt() call did nothing (unless it does some checks on
the data later).
I analysed the behaviour of padlock_cbc_encrypt() with GDB.
During my debugging, the input and output data were never aligned on a
16-byte boundary. padlock_cbc_encrypt() checks for this because not all
VIA processors can cope with this, or only at a tremendous speed
penalty. To fix the alignment, an aligned block of room is allocated on
the stack, and the data is copied there, encrypted in place, and copied to
the destination.
This part of the program is the problematic part:
File openssl/engines/asm/e_padlock-x86.pl, line 204:
----------------------- 8< ------------------ >8 -----------------------
204: &cmp ($len,$chunk);
205: &cmovc ($chunk,$len); # chunk=len>PADLOCK_CHUNK?PADLOCK_CHUNK:len
206: &and ("eax",$chunk); # out_misaligned?chunk:0
207: &mov ($chunk,$len);
208: &neg ("eax");
209: &and ($chunk,$PADLOCK_CHUNK-1); # chunk=len%PADLOCK_CHUNK
210: &lea ("esp",&DWP(0,"eax","ebp")); # alloca
211: &mov ("eax",$PADLOCK_CHUNK);
212: &cmovz ($chunk,"eax"); # chunk=chunk?:PADLOCK_CHUNK
----------------------- 8< ------------------ >8 -----------------------
[1]
At the start, EAX is all ones (binary), signifying the output is not
aligned on a 16-byte boundary. Room is allocated on the stack for the
data in line 210. Either `len` bytes or PADLOCK_CHUNK (512), whichever
is less.
The purpose of `chunk` is to take up to PADLOCK_CHUNK bytes from
`len` to process. If `len` is not a multiple of 512, the remainder is
processed first. After the remainder is processed, either a multiple of
PADLOCK_CHUNK still needs to be processed, or we're done.
To this end, it ANDs `len` with 511 in line 209. If `len` is a multiple
of 512, the result is zero, so a full PADLOCK_CHUNK is processed (line
212).
But if `len` was already zero to begin with, this test produces the
wrong result, and results in `chunk` also being set to 512, to process
512 bytes of data.
The confusion comes now. The code has just set aside 0 bytes on the
stack for copying data. Then it copies 512 bytes of data from the input
pointer to the stack (in line 265 of the source[2]), thereby overwriting
the other information it is keeping on the stack. For good measure, this
is then encrypted as well :). And here is where the SIGSEGV occurs:
File openssl/engines/asm/e_padlock-x86.pl, line 279:
----------------------- 8< ------------------ >8 -----------------------
279: &mov ($out,&DWP(0,"ebp")); # restore parameters
280: &mov ($chunk,&DWP(12,"ebp"));
[...]
292: &test ($out,0x0f);
293: &jz (&label("${mode}_out_aligned"));
294: &mov ($len,$chunk);
295: &lea ($inp,&DWP(0,"esp"));
296: &shr ($len,2);
297: &data_byte(0xf3,0xa5); # rep movsl
----------------------- 8< ------------------ >8 -----------------------
[3]
`out` and `chunk` are loaded with bogus data; it is from the part of
the stack that was overwritten and subsequently encrypted. In line 297,
the 'rep movsl' copies `len` = `chunk` bytes of data from the stack to the
address `out`, but `chunk` and `out` are both garbage, resulting in a
segmentation fault.
That concludes my analysis.
With regards,
Peter.
-- System Information:
Debian Release: 8.1
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: i386 (i686)
Kernel: Linux 3.16.0-4-586
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
$ cat /proc/cpuinfo
processor : 0
vendor_id : CentaurHauls
cpu family : 6
model : 9
model name : VIA Nehemiah
stepping : 10
cpu MHz : 999.583
cache size : 64 KB
fdiv_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr cx8 sep mtrr pge cmov pat mmx
fxsr sse rng rng_en ace ace_en
bogomips : 1999.16
clflush size : 32
cache_alignment : 32
address sizes : 32 bits physical, 32 bits virtual
power management:
[1] https://git.openssl.org/?p=openssl.git;a=blob;f=engines/asm/e_padlock-x86.pl;h=4148468c41de695751e8731369a948dff171c1ca;hb=34ccd24d0e6#l204
[2] https://git.openssl.org/?p=openssl.git;a=blob;f=engines/asm/e_padlock-x86.pl;h=4148468c41de695751e8731369a948dff171c1ca;hb=34ccd24d0e6#l265
[3] https://git.openssl.org/?p=openssl.git;a=blob;f=engines/asm/e_padlock-x86.pl;h=4148468c41de695751e8731369a948dff171c1ca;hb=34ccd24d0e6#l279
[4] In my opinion, this is not in the spirit of the stipulation "the
preferred form of the work for making modifications to it" of the (L)GPL
license GnuTLS is released under. For one thing, all comments explaining
the code are not in the generated output.
--
I use the GNU Privacy Guard (GnuPG) in combination with Enigmail.
You can send me encrypted mail if you want some privacy.
My key is available at <http://digitalbrains.com/2012/openpgp-key-peter>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Handle_zero_length_padlock_accelerated_AES.patch
Type: text/x-diff
Size: 1116 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/pkg-gnutls-maint/attachments/20150614/6b4d7dfe/attachment.patch>
More information about the Pkg-gnutls-maint
mailing list