[Nut-upsdev] [PATCH] disable nonblocking mode on serial port
Jim Paris
jim at jtan.com
Fri Sep 26 15:37:19 UTC 2008
> >I got a new Cyberpower 1500AVR UPS and nut wouldn't work. It failed
> >to detect the UPS and a strace showed that all writes to the serial
> >port were failing with -EAGAIN. The attached patch disabled
> >nonblocking mode on the serial port and now it works fine.
>
> What operating system are you using? Which NUT driver? If you're using
> the 'cyberpower' driver, switch to the 'powerpanel' driver and you can
> probably skip the remainder of this message.
This is Linux 2.6.24 on amd64 with 'cyberpower'. The serial
connection is through a USB HID->COM converter driven by cypress_m8.
I expect that any difference in behavior between my system and others
is due to the serial driver, although I do not believe that the driver
behavior is at fault here.
I can try powerpanel next time I'm around that machine (I'm out of
town at the moment and the UPS is not connected).
> The thing you should ask yourself is why the connection is timing out
> in the first place. Running the driver in debug mode might reveal what
> is going on. If it is because we are sending it too much data and the
> UPS can't keep up with the data flow, this should be fixed. The same
> may happen if synchronization is lost, which is a frequent problem
> with drivers that don't properly flush in- and outputs.
It's not a timeout, and I already know what's going on. It's the very
first _write_ to the device failing immediately with -EAGAIN. I
verified this with strace. This is independent of whether the device
is connected or not and has nothing to do with whether the device is
responding.
> Using the sledgehammer approach of opening the device in blocking mode
> and just sit it out until it can handle whatever we throw at it is
> definitely not what we want and certainly not something that we will
> integrate into NUT without ruling out the problems already mentioned.
Sure, blocking mode has its own problems when trying to read from a
device that isn't sending anything back. But this is a write over
rs232, so it's independent of the device behavior.
> >Is there a reason the port was put in non-blocking mode?
>
> Of course. :-)
>
> >There's no
> >code I could find to deal with the inevitable -EAGAIN that you'd
> >receive on writes. Maybe the delays introduced by ser_send_pace are
> >why it might work for some people?
>
> Opening a port in non-blocking mode allows us to deal with errors that
> we may encounter (in a timely manner). If you open a device in
> blocking mode, it may take a relatively long time before we notice
> that something is wrong for some error conditions. The timeout depends
> on the system settings, but 15 minutes is not uncommon.
Are you talking about during the open() itself? I'm aware of the
reasons for using O_NONBLOCK during an open, which is why my patch
retains O_NONBLOCK there and switches it off later.
> While this will be dealt with while reading data (through select()
> calls, so these never risk blocking), this is not the case when
> writing to the device. If the UPS happens to be on battery at that
> time, it will probably be empty by the time the call times out and the
> system will be shutdown hard.
But writes to a rs232 serial port will complete in finite time
regardless of whether the device is paying attention to them or not.
What timeout do you mean?
> Your patch makes little sense, since just removing O_NONBLOCK from the
> flags in the call to open() should be sufficient already.
Using O_NONBLOCK during open is different than using O_NONBLOCK for a
write. I'm not trying to change the open() behavior.
> However, if
> we should ever decide that in order to use the Cyberpower 1500AVR UPS
> we need to open it in blocking mode, clearing the O_NONBLOCK flag
> should be done by the driver. By doing it here, you risk breaking all
> other drivers that use this library.
It's wrong to open a device with O_NONBLOCK and then write to it and
expect it to always succeed. One needs to at deal with EAGAIN. If we
can agree on that we can figure out a solution that we're both happy
with :)
One solution would be to get rid of O_NONBLOCK as I did. Another
would be to put the write() in a loop checking for EAGAIN. A third is
to put a select() before the write that checks for writability. I can
whip up a patch to do any of these, they would all fix my issue.
-jim
More information about the Nut-upsdev
mailing list