[sane-devel] USB device locking in the snapscan backend

Mon Feb 25 20:23:24 UTC 2008

Hi all,

> The snapscan backend uses a mutex to lock the USB device it's
> working with when sending "atomic" commands.

First off, the original code wasn't written by me and I don't claim I 
fully understand it. The background, as far as I understand it, is 
this:

The backend may fork a reader process or thread who's only purpose is 
to fetch image data from the scanner. It seems that this was invented 
to speed up data transfer from the scanner without depending on the 
frontend to read the data fast enough. As a result there could be two 
processes/threads that access the scanner device at the same time. 
This code, as far as I can tell, was in the backend from the very 
beginning.

When USB support was added measures were taken to protect access to 
the scanner on command level, i.e. only one process/thread can send a 
command to the scanner and receive the result. This should allow for 
mixed access from both processes/threads when implemented correctly. 
It seems, however, that for SCSI no such measures were taken.

To make matters more complicated an USB command queueing was 
introduced in the USB part of the backend. The purpose, according to 
the comment, was this:
     The "Set window" command returns with status "Device busy" when
     the scanner is busy. One consequence is that some frontends exits
     with an error message if it's started when the scanner is warming
     up. A solution was suggested by Dmitri (dmitri at advantrix.com)
     The idea is that a SCSI command which returns "device busy" is
     stored in a "TODO" queue. The send command function is modified
     to first send commands in the queue before the intended command.
     So far this strategy has worked flawlessly. 
As far as I understand it this code makes the locking necessary in the 
first place, since both processes/threads may theoretically add 
commands to the queue while the scanner is busy.

Currently there are three implementations for the mutex: One for BeOS, 
one for pthread (used on OS/2 and OS-X), one for fork(). The one that 
is causing problems is the one for fork() which is used on all Linux 
platforms IIRC.

> We're not talking about pthread mutexes here, which is one of the
> available implementations, but IPC semaphores.
>
> Let's look at the code:
>
> static int snapscani_mutex_open(snapscan_mutex_t* sem_id, const
> char* dev) {
>     *sem_id = semget( ftok(dev,0x12), 1, IPC_CREAT | 0660 );
>     if (*sem_id != -1)
>     {
>         semop(*sem_id, &sem_signal, 1);
>         return 1;
>     }
>     return 0;
> }
>
> ftok() requires its first argument to be a real, accessible
> file. Obviously, when the code was written, USB scanners were
> driven through the kernel scanner driver and this was not a
> problem.
>
> But today, dev is something like "libusb:001:003" which is all but
> a real, accessible file.
>
> ftok() returns -1 in this case, but this isn't checked for in this
> code, so it happily goes on and semget(0xffffffff, 1, ...).
>
> As it happens, most of the time, that works. Except when another
> similarly careless piece of code has already used 0xffffffff as the
> IPC key.
>
> In which case, all hell breaks loose.
>
> So, 2 bugs in this code:
>  - ftok() return value is not checked (BAD BAD BAD)
>  - dev is not a real file
>
> Now, two questions:
>  - is that locking really useful in the backend?

Honest answer: I don't know for sure. I'd sleep better if I knew that 
there was _some_ locking done when there's the remote chance of two 
processes writing to the same device.

>  - if yes, how do we fix this?

Good question. Maybe the easiest solution is to remove all forking / 
threading / locking since it probably won't make a difference on any 
machine faster than 100 MHz. At least I never noticed any difference 
when testing the backend in single-thread mode. But I'm open to other 
suggestions (or even implementations, given my lack of time to 
support the backend...)

Regards,

Oliver