[sane-devel] Yet another saned problem

Henning Meier-Geinitz henning@meier-geinitz.de
Sun, 30 Dec 2001 22:22:46 +0100


Some days ago there was a bug report from Juergen Hammelmann on this
list ("SANE 1.06 and net backend crashes at end of scan...").
He tried to use his Mustek Paragon 600 II N over saned (localhost) and
got an endless loop at the end of the scan.

At first I thought it was a problem with the mustek backend, but now it
looks like a saned problem. It's probably the good old "scanning over
localhost" problem, which we haven't heard about for quite a few
months now. Yes, it is still present (at least sort of).

The symptom:
@ scanimage -d net:some_host:mustek > test.pnm 2> sane.debug

[net] sane_read: reading paket length
[net] sane_read: read 4 bytes, 0 from 4 total
[net] sane_read: next record length=-1 bytes
[net] sane_read: received error signal
[net] sane_read: error code End of file reached
[net] do_cancel: 0x8053e68
[net] do_cancel: closing data pipe
[net] sane_cancel: sending net_cancel
[net] do_cancel: 0x8053e68
[net] sane_cancel: done
[net] sane_close: handle 0x8053e68
[net] sane_close: removing cached option descriptors
[net] sane_close: net_close
scanimage: stopping scanner... (sig 13)
[net] sane_cancel: sending net_cancel
[net] do_cancel: 0x8053e68
[net] sane_cancel: done
scanimage: stopping scanner... (sig 13)
[... endless loop]

Sig 13 is SIGPIPE. When sending NET_CLOSE, saned has already exited, so
scanimage receives signal 13. Now sighandler() is called, prints the
"scanimage: stopping scanner..." message and (tataaa!) calls
sane_cancel(). This raises signal 13 again, which calls sighandler()
again, and so on...

I have committed a small change to scanimage.c in CVS. In sighandler(),
sane_cancel() is now called only once. If more than one signal is
received, scanimage exits via _exit() (to avoid calling the atexit
handler).
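To illustrate the idea (this is a minimal sketch, not the actual
scanimage.c code): count the signals, call the cancel routine only for
the first one, and _exit() on any later one. fake_sane_cancel(),
signal_count and install_handlers() are hypothetical names; the stub
stands in for sane_cancel (device) and only counts calls. The handler
uses write(2) because fprintf() is not async-signal-safe, and for the
same reason it does not print the signal number.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for sane_cancel (device): it only counts calls.
   In the real case this is where the net backend writes to the dead
   socket and triggers another SIGPIPE. */
static volatile sig_atomic_t cancel_calls = 0;

static void fake_sane_cancel (void)
{
  cancel_calls++;
}

static volatile sig_atomic_t signal_count = 0;

/* write(2) is async-signal-safe; fprintf(3) is not. */
static void say (const char *msg)
{
  ssize_t ignored = write (STDERR_FILENO, msg, strlen (msg));
  (void) ignored;
}

static void sighandler (int signum)
{
  (void) signum;
  signal_count++;
  say ("scanimage: received signal\n");
  if (signal_count == 1)
    {
      /* First signal: try to stop the scanner, exactly once. */
      say ("scanimage: trying to stop scanner\n");
      fake_sane_cancel ();
    }
  else
    {
      /* Any further signal: give up. _exit() skips atexit() handlers,
         which could call into the backend yet again. */
      say ("scanimage: aborting\n");
      _exit (1);
    }
}

/* Route SIGINT and SIGPIPE to sighandler(); returns 0 on success. */
static int install_handlers (void)
{
  struct sigaction act;
  memset (&act, 0, sizeof act);
  act.sa_handler = sighandler;
  sigemptyset (&act.sa_mask);
  if (sigaction (SIGINT, &act, NULL) != 0)
    return -1;
  return sigaction (SIGPIPE, &act, NULL);
}
```

With this structure a second SIGPIPE caused by the cancel itself, or a
second Ctrl-C, ends the process instead of re-entering the cancel path.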

Now the output is:
[net] sane_close: net_close
scanimage: received signal 13
scanimage: trying to stop scanner
[net] sane_cancel: sending net_cancel
[net] do_cancel: 0x8052cd8
[net] sane_cancel: done
scanimage: received signal 13
scanimage: aborting

A nice side effect of this is that you can now terminate scanimage by
pressing Ctrl-C twice, even if something hangs during sane_cancel().

Other options to avoid this endless loop:

* Don't catch SIGPIPE. However, scanimage would then also be killed
  without a clean shutdown when SIGPIPE comes from other sources.
* Don't call sane_cancel() on SIGPIPE. Same drawback as above.

However, I think my implementation is the best one. Other ideas or

Ok, now the real source of the problem: getting the image data works
ok. At the end of the scan, saned receives EOF and writes this status
to buf (function do_scan()). Then saned writes the data over the net
to the client (net backend). net.c receives EOF and closes the data
connection to saned. Saned should exit the do_scan loop now, but
select() finds out that the backend (mustek) has also closed the
select_fd, so saned again tries to transfer the error code and status
over the net. Boom! The data connection has already been closed -->
SIGPIPE. Saned exits through its function quit(). The net backend will
also get SIGPIPE.

This is a race condition: if the net backend closes the connection
before saned has sent the additional status code, SIGPIPE occurs.
Otherwise nothing happens. The bug can be forced by adding a usleep()
to the
  if (status_dirty && sizeof (buf) - bytes_in_buf >= 5)
block. It should occur with any backend that uses a select_fd.
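The "bad" ordering of the race can also be reproduced without a
scanner. The sketch below uses a local pipe instead of the real TCP
data connection and assumes the status record is a 4-byte length of -1
followed by one status byte (our reading of the "next record
length=-1" log line and the ">= 5" check, not taken from saned).
send_status_record() and write_after_reader_closed() are hypothetical
helpers. SIGPIPE is ignored here so that write() reports EPIPE instead
of the process being killed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical record layout: 0xffffffff (length -1) plus one status
   byte. Returns 0 on success or -errno on failure. */
static int send_status_record (int fd, unsigned char status)
{
  unsigned char rec[5] = { 0xff, 0xff, 0xff, 0xff, status };
  ssize_t n = write (fd, rec, sizeof rec);
  return n < 0 ? -errno : 0;
}

/* Forces the losing side of the race: the reader (net backend) closes
   its end first, then the writer (saned) sends one more status record. */
static int write_after_reader_closed (void)
{
  int fds[2];
  if (pipe (fds) != 0)
    return -errno;

  /* Ignore SIGPIPE so write() fails with EPIPE instead of the
     default action terminating the process. */
  void (*old) (int) = signal (SIGPIPE, SIG_IGN);

  close (fds[0]);                          /* client closes its end */
  int rc = send_status_record (fds[1], 5); /* hypothetical status byte */

  close (fds[1]);
  signal (SIGPIPE, old);
  return rc;
}
```

With a pipe the failure is immediate. On a TCP socket the first write
after the peer has closed may still succeed, and SIGPIPE/EPIPE
typically shows up on a later write, which is part of why the real
bug is timing-dependent.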

I have changed saned in CVS in the following way: status_dirty will
only be set if the status was GOOD before select_fd was closed. That
way EOF should no longer be detected twice. I have also added some
more DBG statements to show this (and possibly other) problem(s).
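The rule can be sketched as a small state update. This is not the
saned code: struct xfer, note_backend_status(), queue_status_record(),
BUF_SIZE and the STATUS_* constants are hypothetical stand-ins (the
latter for SANE_STATUS_GOOD and SANE_STATUS_EOF), and the 5-byte
record layout is the same assumption as above. The point is that a
second "end of data" notice, after the status has already left GOOD,
no longer queues another status record.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 64      /* hypothetical buffer size */
#define STATUS_GOOD 0    /* hypothetical stand-ins for SANE_STATUS_* */
#define STATUS_EOF 5

struct xfer
{
  unsigned char buf[BUF_SIZE];
  size_t bytes_in_buf;
  int status;         /* last status reported by the backend */
  int status_dirty;   /* status still has to be sent to the client */
};

/* Called when the backend reports end of data, e.g. because select_fd
   was closed. Only the GOOD -> non-GOOD transition marks the status
   dirty, so EOF is queued for the client once, not twice. */
static void note_backend_status (struct xfer *x, int new_status)
{
  if (x->status == STATUS_GOOD)
    {
      x->status = new_status;
      x->status_dirty = 1;
    }
}

/* Queue the status record if there is room. Returns 1 if queued. */
static int queue_status_record (struct xfer *x)
{
  if (x->status_dirty && sizeof (x->buf) - x->bytes_in_buf >= 5)
    {
      unsigned char rec[5] = { 0xff, 0xff, 0xff, 0xff,
                               (unsigned char) x->status };
      memcpy (x->buf + x->bytes_in_buf, rec, sizeof rec);
      x->bytes_in_buf += sizeof rec;
      x->status_dirty = 0;
      return 1;
    }
  return 0;
}
```

In the sequence from this bug, the first EOF queues one record; the
later notice that select_fd was closed leaves status_dirty at 0, so
nothing more is written to a connection the client may already have
closed.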

Is this ok? Please test! It fixes the bug for me. I don't want to
create new problems...