[Nut-upsuser] Windows 2.6.0-1 usbhid-ups driver infinite loop on USB disconnect

Frédéric Bohé fredericbohe at eaton.com
Wed Jun 29 13:50:30 UTC 2011


Hello David,

First, thanks for this post. The Windows port is still in beta stage and
this kind of post if very useful.


On Tue, 2011-06-28 at 20:54 -0400, David Bolen wrote:
> David Bolen <db3l.net at gmail.com> writes:
> 
> > Attached below is an excerpt of debugging output from usbhid-ups.  It
> > would appear that the interrupt and query failures have reasonably
> > appropriate errors, so maybe it's just a difference in how such errors
> > reflect under Windows vs. *nix.
> 
> Following up on my own note, I think I have a theory on what is
> happening, but am having trouble compiling to test a change as while I
> have an pre-existing Mingw/MSYS toolchain, it doesn't have a full
> autoconf toolchain yet.  If anyone might have a pre-existing
> "configure" script for Win32, that would help jump start things as I
> could configure from there. 

Setting up the build environnent in Windows is kind of complex. I will
send you my configure file if it can help you. Don't hesitate to ask if
you are still stuck.

> 
> I think I've run into two items.  The first is that the errors from
> checking the interrupt pipe in upsdrv_updateinfo aren't actually
> checked for cases where they could identify a disconnected UPS, so it
> just assumes no information and continues.  That's not Win32 specific.
> I'm not quite sure why it loops so quickly, as opposed to the top
> level polltime, but perhaps the state of the driver connection is such
> that it immediately satisfies the WaitForMultipleObjects call in the
> Win32 version of dstate_poll_fds.

It's just a silly bug. The timeout of WaitForMultipleObject is supposed
to be in milliseconds but it is passed in seconds.

> 
> Anyway, normally it looks like that would be short-lived anyway since
> the periodic full poll would detect the disconnect a modest time
> later.  Here I think I run into an assumption in the libhid code that
> errno always identifies all errors, while in the case of a win32
> failure inside of libusb-win32, it's only the result code of the
> function calls that reflect the error, not errno.  So USB I/O level
> errors while polling are getting hidden.

Another bug. errno has to be filled with the result of GetLastError() to
be really useful.

> 
> After shrinking the poll time and letting things run a bit after a
> cable disconnect, I got a few other errors (from libusb_get_report)
> interleaved such as:
> 
>    5.843750     Got 0 HID objects...
>    5.843750     Quick update...
>    5.859375     upsdrv_updateinfo...
>    5.859375     libusb_get_interrupt: libusb0-dll:err [submit_async] submitting
> request failed, win error: The device does not recognize the command.
>    5.859375     Got 0 HID objects...
>    5.859375     Quick update...
>    5.859375     libusb_get_report: libusb0-dll:err [control_msg] sending control
>  message failed, win error: The device does not recognize the command.
>    5.859375     Can't retrieve Report 16: No error
>    5.859375     libusb_get_report: libusb0-dll:err [control_msg] sending control
>  message failed, win error: The device does not recognize the command.
>    5.859375     Can't retrieve Report 15: No error
>    5.859375     upsdrv_updateinfo...
>    5.859375     libusb_get_interrupt: libusb0-dll:err [submit_async] submitting
> request failed, win error: The device does not recognize the command.
> 
> From looking at the libusb-win32 source, this would be the result of a
> Win32 ERROR_BAD_COMMAND code which is turned into a -EIO result for
> the usb_control_message call.  But in this case, it's only the result
> that has the error, as it's a translation from the Windows internal
> error and not from any original errno value.
> 
> It appears that the NUT libusb.c module passes the return code on
> fine, but then it's suppressed with libhid.c (refresh_report_buffer),
> and eventually HIDGetDataValue, who just gets a generic -1 code,
> assumes it can pass along -errno to the usbhid-ups driver, which at
> that point has no error (yielding the "No error" in the "Can't
> retrieve Report" message.
> 
> It seems like adjusting functions along the path to all return the
> function result code as opposed to just -1 would work but also feels
> like a riskier change.  I'm wondering if perhaps just making the lower
> level libusb entry points (probably just #if WIN32) assign the result
> code of the libusb-win32 driver calls to errno to ensure it represents
> the failures would be cleaner and help obscure the Windows differences?

I guess that managing correctly errno with GetLasterror should be
enough.

> 
> -- David
> 
> PS: In reviewing the libusb-win32 source it also looks like in some
> cases it can return either EINVAL or ENOMEM, neither of which are
> checked for in the error checking path for usbhid-ups, so probably
> could get added, if only to the WIN32 block.
> 

Indeed, after setting errno correctly using GetLastError(), I add EINVAL
as a disconnection cause and it seems to work now. Adding ENOMEM might
be useful too.

May I privately mail you a link to a replacement driver with those fixes
for testing ? Or a patch if you manage to set up the build
environnement ?

Regards,
Fred


-- 
Team Open Source Eaton - http://powerquality.eaton.com

--------------------------------------------------------------------------



More information about the Nut-upsuser mailing list