[Pkg-xen-devel] Bug#988477: Bug#988477: xen-hypervisor-4.14-amd64: xen dmesg shows (XEN) AMD-Vi: IO_PAGE_FAULT on sata pci device
Elliott Mitchell
ehem+debian at m5p.com
Fri Jul 4 01:25:27 BST 2025
On Wed, May 28, 2025 at 05:21:00PM -0700, Elliott Mitchell wrote:
> On Sun, May 18, 2025 at 02:10:25PM +0200, Maximilian Engelhardt wrote:
> > On Montag, 14. April 2025 00:22:01 CEST Elliott Mitchell wrote:
> > >
> > > Do any of the Debian maintainers have an AMD machine set up for debugging?
> > > I'm not very well set up for debugging this particular issue. If you've
> > > got an AMD machine with a pair of available SATA ports (including SATA
> > > power!), I could send a pair of SATA devices known to readily reproduce
> > > the issue.
> >
> > I'm not aware of anybody in our team having hardware where they can reproduce
> > this issue, else I'm sure they would have already provided feedback here.
> > There are also not many reports here of people running into this problem. Thus
> > I assume it needs a special (and probably rare) hardware combination to
> > trigger this.
> > One thing I can add is that I have been running software raid1 with Xen on two
> > SATA SSDs on an Intel CPU for many years without seeing any data corruption.
>
> I'm skeptical of it being rare, but certainly uncommon. You've got some
> similarity to the reproductions, but there are differences.
>
> First question, what brand/model are the SSDs? Samsung SSDs are known to
> be affected (severely affected for some models), while Crucial/Micron
> SSDs are unaffected (some models might be mildly affected).
>
> Second question, where are the SATA ports? Are they on-motherboard? Add-on
> card? The reproductions were with on-motherboard ports.
>
> What generation is your processor? Are you sure it has an IOMMU and Xen
> is driving the IOMMU? I had suspected Intel systems would be affected,
> but you may have disproven this.

Uh, I had hoped you could help narrow things down some. Right now
we've got two confirmed reproductions, while you're the only person who
isn't seeing this reproduce.

The biggest difference is you've got a system with an Intel processor.
Yet we already know not all SSDs are affected, so it could be that your
pair are ones which won't reproduce the issue. On top of that, similar to
the spurious interrupt issue, it could be less severe on Intel processors
and that has kept you safe.

Presently the shortage of reports seems mostly attributable to few people
using RAID1 with SSDs.

--
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/)
\BS ( | ehem+sigmsg at m5p.com PGP 87145445 | ) /
\_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445