Bug#521717: zaptel-source: "FXO PCI Master abort" while network/disk/cpu usage is high

Tue Mar 31 03:56:48 UTC 2009

Hi

I did some more testing and found a workaround.

First, the new upstream "dahdi" driver doesn't solve the problem.

Second, I found another, possibly related, bug. While doing light
network usage (playing an MP3 over samba), I got this kind of oops
(reproduced 4 times):

[  657.980980] skge 0000:01:04.0: PCI error cmd=0x7 status=0x82b0
[  666.000010] NETDEV WATCHDOG: eth1: transmit timed out
[  666.000109] ------------[ cut here ]------------
[  666.000157] WARNING: at net/sched/sch_generic.c:222 dev_watchdog+0x8f/0xdc()
[  666.000205] Modules linked in: tun nfsd auth_rpcgss exportfs nfs
lockd nfs_acl sunrpc iptable_raw ipt_ULOG ipt_TTL ipt_ttl ipt_REJECT
ipt_REDIRECT ipt_recent ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_ECN
ipt_ecn ipt_ah ipt_addrtype nf_nat_tftp nf_nat_sip nf_nat_irc nf_nat_ftp
nf_conntrack_tftp nf_conntrack_sip nf_conntrack_netlink
nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_ftp xt_helper
xt_hashlimit
xt_conntrack xt_CONNMARK xt_connmark xt_state iptable_nat nf_nat
nf_conntrack_ipv4 nf_conntrack iptable_mangle iptable_filter ip_tables
act_police sch_ingress cls_u32 xt_u32 xt_time xt_tcpudp xt_tcpmss
xt_string xt_statistic xt_sctp xt_realm xt_rateest xt_quota xt_policy
xt_pkttype xt_physdev xt_owner xt_multiport xt_mark xt_mac xt_limit
xt_length xt_iprange xt_esp xt_dscp xt_dccp xt_comment xt_TRACE
xt_TCPOPTSTRIP xt_TCPMSS xt_SECMARK xt_RATEEST xt_NFQUEUE xt_NFLOG
xt_MARK xt_DSCP xt_CLASSIFY nfnetlink_queue nfnetlink_log nfnetlink
x_tables cls_fw sch_cbq sch_gred sch_red sch_sfq ext2 fuse dm_snapshot
dm_mirror dm_log dm_mod asb100 hwmon_vid usbhid hid ff_memless amd74xx
ide_pci_generic ide_core skge wcfxo zaptel crc_ccitt pcspkr ata_generic
forcedeth ehci_hcd ohci_hcd usbcore i2c_nforce2 i2c_core button
nvidia_agp agpgart evdev ext3 jbd mbcache raid1 md_mod sd_mod thermal
processor fan thermal_sys shpchp pci_hotplug sata_sil libata scsi_mod
dock
[  666.005122] Pid: 0, comm: swapper Not tainted 2.6.26-1-686 #1
[  666.005182]  [<c012256f>] warn_on_slowpath+0x40/0x66
[  666.005281]  [<c011845d>] __wake_up_common+0x2e/0x58
[  666.005372]  [<c011a641>] __wake_up+0x29/0x39
[  666.005462]  [<c012289b>] wake_up_klogd+0x2b/0x2d
[  666.005577]  [<f89a8f2e>] skge_tx_clean+0x1f/0x4b [skge]
[  666.005671]  [<c0266b55>] dev_watchdog+0x8f/0xdc
[  666.005753]  [<c012965c>] run_timer_softirq+0x11a/0x17c
[  666.005837]  [<c0266ac6>] dev_watchdog+0x0/0xdc
[  666.005926]  [<c012657d>] __do_softirq+0x66/0xd3
[  666.006013]  [<c012662f>] do_softirq+0x45/0x53
[  666.006094]  [<c01268e6>] irq_exit+0x35/0x67
[  666.006172]  [<c01101c9>] smp_apic_timer_interrupt+0x6b/0x76
[  666.006253]  [<c0102656>] default_idle+0x0/0x53
[  666.006334]  [<c0104364>] apic_timer_interrupt+0x28/0x30
[  666.006415]  [<c0102656>] default_idle+0x0/0x53
[  666.006506]  [<c0114d54>] native_safe_halt+0x2/0x3
[  666.006590]  [<c0102683>] default_idle+0x2d/0x53
[  666.006669]  [<c01025ce>] cpu_idle+0xab/0xcb
[  666.006756]  =======================
[  666.006800] ---[ end trace e58036d16ad6c14d ]---
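
A side note on the skge line at the top of this log: assuming its cmd and status values are the standard PCI configuration-space command and status registers (an assumption; the driver source was not checked here), they can be decoded with a short script. The bit masks follow the standard PCI register layout; the table and function names are only illustrative.

```python
# Bit masks from the standard PCI configuration-space command register.
PCI_COMMAND_FLAGS = {
    0x0001: "I/O space",
    0x0002: "memory space",
    0x0004: "bus master",
    0x0008: "special cycles",
    0x0010: "memory write and invalidate",
    0x0020: "VGA palette snoop",
    0x0040: "parity error response",
    0x0100: "SERR# enable",
    0x0200: "fast back-to-back enable",
    0x0400: "INTx disable",
}

# Bit masks from the standard PCI configuration-space status register.
PCI_STATUS_FLAGS = {
    0x0008: "interrupt status",
    0x0010: "capabilities list",
    0x0020: "66 MHz capable",
    0x0080: "fast back-to-back capable",
    0x0100: "master data parity error",
    0x0800: "signaled target abort",
    0x1000: "received target abort",
    0x2000: "received master abort",
    0x4000: "signaled system error",
    0x8000: "detected parity error",
}

# Bits 9-10 of the status register are a two-bit DEVSEL timing field.
DEVSEL_MASK = 0x0600
DEVSEL_TIMING = {0x0000: "fast", 0x0200: "medium", 0x0400: "slow"}


def decode_flags(value, table):
    """Return the names of all single-bit flags set in value, lowest bit first."""
    return [name for bit, name in sorted(table.items()) if value & bit]


def decode_status(status):
    """Return (list of set status flags, DEVSEL timing name)."""
    flags = decode_flags(status, PCI_STATUS_FLAGS)
    devsel = DEVSEL_TIMING.get(status & DEVSEL_MASK, "reserved")
    return flags, devsel


# Values taken from the skge line in the log above.
print("cmd   :", decode_flags(0x7, PCI_COMMAND_FLAGS))
print("status:", decode_status(0x82b0))
```

Under that assumption, status=0x82b0 has "detected parity error" set and "received master abort" clear, with medium DEVSEL timing, while cmd=0x7 only says that I/O space, memory space and bus mastering are enabled.
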

The strange thing is that I tested both with and without the wcfxo
module loaded, and I only got the problem when the module was loaded. I
think this bug always happens first and might be the cause of the
"FXO PCI Master abort" bug. I didn't notice it earlier because my
kern.log was full of the same message, so I had to look into kern.log.0.

Then I moved my X100P card from one PCI slot to another... and now I
can't reproduce either bug!

Maybe this has something to do with a different IRQ being used? wcfxo
generates a lot of interrupts. Or maybe my motherboard has a problem?
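
One way to test the IRQ theory would be to compare /proc/interrupts before and after moving the card and see whether wcfxo shares its line with another device. The script below is only a rough sketch: it assumes the usual /proc/interrupts layout (a header row of CPU columns, then one row per IRQ ending in a comma-separated device list), and the function names are made up for illustration.

```python
def parse_interrupts(text):
    """Map each numeric IRQ to the list of device names registered on it."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: one column per CPU
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip NMI, LOC, ERR and similar summary rows
        fields = rest.split(None, ncpu)  # per-CPU counts, then the remainder
        if len(fields) <= ncpu:
            continue  # no controller/device text on this row
        pieces = [p.strip() for p in fields[ncpu].split(",")]
        # The first piece still carries the interrupt-controller text in
        # front of it, so keep only its last word as the device name.
        first_words = pieces[0].split()
        devices = first_words[-1:] + [p for p in pieces[1:] if p]
        table[int(label)] = devices
    return table


def shared_with(table, name):
    """Return (irq, other devices on that IRQ), or (None, []) if not found."""
    for irq, devices in table.items():
        if name in devices:
            return irq, [d for d in devices if d != name]
    return None, []


if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as f:
            irq, others = shared_with(parse_interrupts(f.read()), "wcfxo")
        print("wcfxo IRQ:", irq, "shared with:", others)
    except OSError as exc:
        print("could not read /proc/interrupts:", exc)
```

If the list of other devices is empty in the new slot but contained eth1 in the old one, that would point towards IRQ sharing rather than a faulty motherboard, though it would not prove it.
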

Anyway, thanks for your help. If you think this bug won't happen to
anybody else because it's hardware-related, you can close it. Let me know
if you need more information.

Xavier Douville

Tzafrir Cohen wrote:
> On Sun, Mar 29, 2009 at 03:22:59PM -0400, Xavier Douville wrote:
>> Hi
>>
>> When the driver is bugged, syslogd and klogd take 100% of the CPU, even 
>> if I stop the network transfer. 
> 
> Rate-limiting this message should be simple enough, regardless of the
> original issue. I'll try to get to that in the coming days.
> 
>> The only way to stop this is to rmmod wcfxo. 
> 
> Why the driver does not recover is a separate issue. 
> 
>> This bug is very easy to reproduce on my computer. It's really 
>> when I do network transfers. It doesn't bug when I copy a file on the 
>> HDD. I know there are a lot of issues with IRQ with zaptel, but both 
>> eth1 and sata_sil have a higher IRQ number than wcfxo so they should 
>> have less priority right?
> 
> I'm not familiar enough with that, unfortunately.
>