[Pkg-xen-devel] Bug#810964: [Xen-devel] [BUG] EDAC infomation partially missing
Jan Beulich
JBeulich at suse.com
Tue May 16 09:54:37 UTC 2017
>>> On 16.05.17 at 05:47, <ehem+debian at m5p.com> wrote:
> On Mon, May 15, 2017 at 02:02:53AM -0600, Jan Beulich wrote:
>> >>> On 14.05.17 at 00:36, <ehem+debian at m5p.com> wrote:
>> > I haven't yet done as much experimentation as Andreas Pflug has, but I
>> > can confirm I'm also running into this bug with Xen 4.4.1.
>> >
>> > I've only tried Linux kernel 3.16.43, but as Dom0:
>> >
>> > EDAC MC: Ver: 3.0.0
>> > AMD64 EDAC driver v3.4.0
>> > EDAC amd64: DRAM ECC enabled.
>> > EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
>> > EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
>> > AMD64 EDAC driver v3.4.0
>> > EDAC amd64: DRAM ECC enabled.
>> > EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
>> > EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
>>
>> Afaict the driver as is simply can't work in a Xen Dom0; it needs
>> enabling (read: para-virtualizing). I'm actually glad to see it doesn't
>> load (the worse alternative would be for it to load and then do the
>> wrong thing or give you a false sense of safety of your data).
>
> I'm unsure of how to evaluate the situation. Since ECC is enabled in the
> BIOS, data should be safe whether or not the EDAC driver loads. I
> /suspect/ the EDAC driver failing to load merely means reporting of ECC
> errors won't happen.

"Merely" being relative here: The missing reports mean a false feeling
of safety, as they may be early indications of later double-bit errors.

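To illustrate what goes missing: on a kernel where an EDAC driver does load, Linux exposes per-controller corrected and uncorrected error counters as `ce_count` and `ue_count` under `/sys/devices/system/edac/mc/`. The following is only a minimal sketch in Python; the helper names `read_edac_counts` and `describe` are made up for this example, and the wording of the messages is an assumption rather than anything a real tool prints. It treats "no controllers found" as the no-reporting case discussed here.

```python
import glob
import os


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (ce_count, ue_count)} for every mcN directory found.

    An empty dict means no EDAC driver registered a memory controller,
    i.e. nothing is reporting ECC events at all.
    """
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        try:
            with open(os.path.join(mc_dir, "ce_count")) as f:
                ce = int(f.read().strip())
            with open(os.path.join(mc_dir, "ue_count")) as f:
                ue = int(f.read().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are unreadable or malformed.
            continue
        counts[os.path.basename(mc_dir)] = (ce, ue)
    return counts


def describe(counts):
    """Summarize the counters in one line (hypothetical wording)."""
    if not counts:
        return "no EDAC reporting: ECC errors would go unnoticed"
    if any(ue > 0 for _, ue in counts.values()):
        return "uncorrected errors seen"
    if any(ce > 0 for ce, _ in counts.values()):
        return "corrected errors seen: possible early warning"
    return "no errors reported"
```

With the driver refusing to load, as in the log above, `read_edac_counts()` would be expected to return an empty dict, so rising corrected-error counts could not be observed this way.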
> I suspect the only paravirtualization needed is to
> map the physical address of the soft|hard errors to the VM whose memory
> range was affected. What this affects is which VM should panic in case
> of hard errors.

Which in turn obviously requires hypervisor interaction. It's not really
clear to me whether perhaps the driver would better live in the
hypervisor in the first place for that reason.

And there's a second piece of paravirtualization needed: The driver
doesn't distinguish physical and machine address spaces, yet the
addresses reported by hardware are machine ones and hence would
generally need translation to physical ones in order to assign Dom0-
local meaning to them (or to determine that the address belongs to
another VM or the hypervisor).
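
A toy sketch of that translation step, in Python purely for illustration. The table `M2P`, its contents, and the function `locate` are hypothetical and do not reflect Xen's actual data structures or interfaces; the only assumption taken from real hardware is the 4 KiB x86 page size. The idea is that a machine address reported by hardware is split into a machine frame number and a page offset, the frame is looked up to find its owner, and only frames owned by a guest have a guest-local ("physical") address at all.

```python
PAGE_SHIFT = 12  # 4 KiB pages, as on x86

# Hypothetical machine-to-physical table:
# machine frame number -> (owner, pseudo-physical frame number or None).
M2P = {
    0x1000: ("dom0", 0x0040),
    0x1001: ("dom0", 0x0041),
    0x2000: ("domU-1", 0x0010),
    0x3000: ("xen", None),  # hypervisor-owned frame, no guest-local address
}


def locate(machine_addr, m2p=M2P):
    """Map a machine address to (owner, owner-local physical address).

    The second element is None when the frame is not tracked or
    belongs to the hypervisor itself.
    """
    mfn = machine_addr >> PAGE_SHIFT
    offset = machine_addr & ((1 << PAGE_SHIFT) - 1)
    entry = m2p.get(mfn)
    if entry is None:
        return ("unknown", None)
    owner, pfn = entry
    if pfn is None:
        return (owner, None)
    return (owner, (pfn << PAGE_SHIFT) | offset)
```

For example, `locate(0x1000123)` yields `("dom0", 0x40123)` with the table above, while `locate(0x3000000)` yields `("xen", None)`. A driver running in Dom0 without such a lookup would take the machine address as if it were its own physical address.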

Jan