Bug#521717: zaptel-source: "FXO PCI Master abort" while network/disk/cpu usage is high
Xavier Douville
debian at douville.org
Tue Mar 31 03:56:48 UTC 2009
Hi
I did some more testing and found a workaround.
First, the new upstream "dahdi" driver doesn't solve the problem.
Second, I found another, possibly related, bug. While doing light
network usage (playing an MP3 over Samba), I got this kind of oops
(tested 4 times):
[ 657.980980] skge 0000:01:04.0: PCI error cmd=0x7 status=0x82b0
[ 666.000010] NETDEV WATCHDOG: eth1: transmit timed out
[ 666.000109] ------------[ cut here ]------------
[ 666.000157] WARNING: at net/sched/sch_generic.c:222
dev_watchdog+0x8f/0xdc()
[ 666.000205] Modules linked in: tun nfsd auth_rpcgss exportfs nfs
lockd nfs_acl sunrpc iptable_raw ipt_ULOG ipt_TTL ipt_ttl ipt_REJECT
ipt_REDIRECT ipt_recent ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_ECN
ipt_ecn ipt_ah ipt_addrtype nf_nat_tftp nf_nat_sip nf_nat_irc nf_nat_ftp
nf_conntrack_tftp nf_conntrack_sip nf_conntrack_netlink
nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_ftp xt_helper
xt_hashlimit
xt_conntrack xt_CONNMARK xt_connmark xt_state iptable_nat nf_nat
nf_conntrack_ipv4 nf_conntrack iptable_mangle iptable_filter ip_tables
act_police sch_ingress cls_u32 xt_u32 xt_time xt_tcpudp xt_tcpmss
xt_string xt_statistic xt_sctp xt_realm xt_rateest xt_quota xt_policy
xt_pkttype xt_physdev xt_owner xt_multiport xt_mark xt_mac xt_limit
xt_length xt_iprange xt_esp xt_dscp xt_dccp xt_comment xt_TRACE
xt_TCPOPTSTRIP xt_TCPMSS xt_SECMARK xt_RATEEST xt_NFQUEUE xt_NFLOG
xt_MARK xt_DSCP xt_CLASSIFY nfnetlink_queue nfnetlink_log nfnetlink
x_tables cls_fw sch_cbq sch_gred sch_red sch_sfq ext2 fuse dm_snapshot
dm_mirror dm_log dm_mod asb100 hwmon_vid usbhid hid ff_memless amd74xx
ide_pci_generic ide_core skge wcfxo zaptel crc_ccitt pcspkr ata_generic
forcedeth ehci_hcd ohci_hcd usbcore i2c_nforce2 i2c_core button
nvidia_agp agpgart evdev ext3 jbd mbcache raid1 md_mod sd_mod thermal
processor fan thermal_sys shpchp pci_hotplug sata_sil libata scsi_mod
dock
[ 666.005122] Pid: 0, comm: swapper Not tainted 2.6.26-1-686 #1
[ 666.005182] [<c012256f>] warn_on_slowpath+0x40/0x66
[ 666.005281] [<c011845d>] __wake_up_common+0x2e/0x58
[ 666.005372] [<c011a641>] __wake_up+0x29/0x39
[ 666.005462] [<c012289b>] wake_up_klogd+0x2b/0x2d
[ 666.005577] [<f89a8f2e>] skge_tx_clean+0x1f/0x4b [skge]
[ 666.005671] [<c0266b55>] dev_watchdog+0x8f/0xdc
[ 666.005753] [<c012965c>] run_timer_softirq+0x11a/0x17c
[ 666.005837] [<c0266ac6>] dev_watchdog+0x0/0xdc
[ 666.005926] [<c012657d>] __do_softirq+0x66/0xd3
[ 666.006013] [<c012662f>] do_softirq+0x45/0x53
[ 666.006094] [<c01268e6>] irq_exit+0x35/0x67
[ 666.006172] [<c01101c9>] smp_apic_timer_interrupt+0x6b/0x76
[ 666.006253] [<c0102656>] default_idle+0x0/0x53
[ 666.006334] [<c0104364>] apic_timer_interrupt+0x28/0x30
[ 666.006415] [<c0102656>] default_idle+0x0/0x53
[ 666.006506] [<c0114d54>] native_safe_halt+0x2/0x3
[ 666.006590] [<c0102683>] default_idle+0x2d/0x53
[ 666.006669] [<c01025ce>] cpu_idle+0xab/0xcb
[ 666.006756] =======================
[ 666.006800] ---[ end trace e58036d16ad6c14d ]---
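For reference, the status word in the skge line above can be decoded bit by bit. The following is only a sketch, assuming that "status" is the standard PCI configuration-space status register (the bit values below are the standard PCI status bits; the function and table names are made up for this illustration).

```python
# Sketch: decode a PCI configuration-space status word.
# Assumption: the "status=" value printed by skge is the standard
# 16-bit PCI status register. Names here are illustrative only.

PCI_STATUS_BITS = {
    0x0008: "interrupt status",
    0x0010: "capabilities list",
    0x0020: "66 MHz capable",
    0x0080: "fast back-to-back capable",
    0x0100: "master data parity error",
    0x0800: "signaled target abort",
    0x1000: "received target abort",
    0x2000: "received master abort",
    0x4000: "signaled system error",
    0x8000: "detected parity error",
}

DEVSEL_MASK = 0x0600  # bits 9-10: DEVSEL timing
DEVSEL_NAMES = {0: "fast", 1: "medium", 2: "slow"}


def decode_pci_status(status):
    """Return (list of set flag names, DEVSEL timing name)."""
    flags = [name for bit, name in sorted(PCI_STATUS_BITS.items())
             if status & bit]
    devsel = DEVSEL_NAMES.get((status & DEVSEL_MASK) >> 9, "reserved")
    return flags, devsel


if __name__ == "__main__":
    flags, devsel = decode_pci_status(0x82B0)
    print("flags:", ", ".join(flags))
    print("DEVSEL timing:", devsel)
```

Under that assumption, 0x82b0 has the "detected parity error" bit set rather than "received master abort", which would point to a data integrity problem on the bus as seen by the skge card.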
The strange thing is that I tested both with and without the wcfxo
module loaded, and I only got the problem when the module was loaded. I
think this bug always happens first and might be the source of the
"FXO PCI Master abort" bug. I didn't realise this before because my
kern.log was full of the same message, so I had to look into kern.log.0.
Then I moved my X100P card from one PCI slot to another... and now I
can't reproduce either bug!
Maybe this has something to do with a different IRQ being used? wcfxo
generates a lot of interrupts. Or maybe my motherboard has a problem?
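One way to check the IRQ theory is to look at /proc/interrupts before and after moving the card and see whether wcfxo shares a line with another device. A rough sketch follows; it assumes the usual layout where devices sharing an IRQ are listed comma-separated at the end of the line and device names contain no spaces. The sample text is invented for illustration and is not taken from this machine.

```python
# Sketch: list IRQ lines shared by more than one device.
# Assumption: /proc/interrupts lists sharing devices as "dev1, dev2"
# at the end of each line. SAMPLE below is hypothetical data.

SAMPLE = """\
           CPU0
  0:     123456   IO-APIC-edge      timer
 17:      98765   IO-APIC-fasteoi   eth1, wcfxo
 18:       4321   IO-APIC-fasteoi   sata_sil
"""


def shared_irqs(interrupts_text):
    """Map IRQ label -> list of device names, for shared lines only."""
    shared = {}
    for line in interrupts_text.splitlines():
        irq, sep, rest = line.partition(":")
        if not sep or "," not in rest:
            continue  # header line, or an IRQ with a single device
        parts = rest.split(",")
        # The first device name is the last token before the first comma.
        first = parts[0].split()[-1]
        devices = [first] + [p.strip() for p in parts[1:]]
        shared[irq.strip()] = devices
    return shared


if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the hypothetical sample
    for irq, devices in shared_irqs(text).items():
        print(irq, "shared by", ", ".join(devices))
```

If wcfxo shows up on a shared line in one slot and on its own line in the other, that would be consistent with the slot change making the problem go away, though it would not prove the cause.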
Anyway, thanks for your help. If you think this bug won't happen to
anybody else because it's hardware-related, you can close it. Let me know
if you need more information.
Xavier Douville
Tzafrir Cohen wrote:
> On Sun, Mar 29, 2009 at 03:22:59PM -0400, Xavier Douville wrote:
>> Hi
>>
>> When the driver is bugged, syslogd and klogd take 100% of the CPU, even
>> if I stop the network transfer.
>
> Rate-limiting this message should be simple enough, regardless of the
> original issue. I'll try to get to that in the coming days.
>
>> The only way to stop this is to rmmod wcfxo.
>
> Why the driver does not recover is a separate issue.
>
>> This bug is very easy to reproduce on my computer. It's really
>> when I do network transfers. It doesn't bug when I copy a file on the
>> HDD. I know there are a lot of issues with IRQ with zaptel, but both
>> eth1 and sata_sil have a higher IRQ number than wcfxo so they should
>> have less priority right?
>
> I'm not familiar enough with that, unfortunately.
>