[sane-devel] Proposal for integration tests

Wed Feb 27 11:57:52 GMT 2019

Hi Olaf, Rene,

Thanks for your replies.

On 26/02/2019 14:48, Olaf Meeuwissen wrote:
> # Actually, I've been kind of waiting for others to chime in first.
> # Glad to see that René Rebe, an ex-SANE developer, eventually did but
> # would have liked to see some more opinions before my reply :-(.

I agree, this is a change that may eventually affect every USB backend
if it wants tests, so it's best that every active maintainer is on board.

> Povilas Kanapickas writes:
> 
>> Hey,
>>
>> I would like to propose adding integration tests to guard against
>> accidental regressions especially in rarer scanners that few SANE
>> developers have access to. We could run these tests on each pull request
>> on gitlab without any scanner hardware.
> 
> I think this is a *really* *cool* idea.  But, as René mentioned, the
> devil will be in the details.
> 
>> Given limited development resources and large variety of hardware that
>> SANE supports, a major requirement of the tests would be that they are
>> easy to create and use, otherwise no one will bother to deal with them.
> 
> ACK.  Using them should be as easy as `make check` ;-)
> 
> # ... and only run tests for the backends that have been compiled.
> 
>> Therefore, I would like to propose a simple way of testing SANE by
>> recording low-level communication with the scanner, replaying it at the
>> test time and checking whether SANE behaves in an equivalent way.
> 
> Recording low-level communication can be done with Wireshark for network
> and USB scanners.  This requires *zero* work on SANE's side.
> 
> # OK, so maybe we would need some documentation or pointers to such on
> # how to make captures with Wireshark.
> 
> Apart from use in integration and regression tests, these captures can
> also be useful in reproducing odd behaviour observed by our users.  If
> they can just provide a capture, anyone would be able to reproduce the
> behaviour.  Especially nice when looking at those pesky timing issues.
> 
>> Additionally, I would like to propose that we distribute the recording
>> functionality with release SANE builds. This way we could collect
>> recordings from our users for scanners that no SANE developers have
>> access to and then add to the test database.
> 
> As per suggestion above, Wireshark already does a very good job at
> recording network and USB traffic, runs on Unix and Windows and is
> well-known and well-supported.  Its default PCap file format is very
> easy to understand and work with too.

I use Wireshark currently to reverse-engineer a scanner and I agree it's
a great tool. There were several reasons why I suggested that we also
have a way to capture data on SANE side too.

I think that the functionality should be as user-friendly as possible.
Most of our users are non-technical and due to lack of manpower we need
to them to do part of the work supporting the project. Using Wireshark
is too hard in my opinion, because:

 - it often involves non-obvious permission problems and so on.

 - there are maybe 15 of mouse clicks before a USB capture is ready.

 - user needs to read a lot in order to be able to navigate the
interface at all.

Each of the above is enough to turn off a well-meaning non-technical
user who could give a valuable capture.

In an ideal world SANE frontends would have an option to capture raw
communication data into a separate file which users then could send to
us. Two mouse clicks, no learning needed.

>From the technical side, capturing data on SANE side has a large benefit
in that it's the same place where we would inject test data. We would be
guaranteed that we capture the same thing that we would later inject as
part of the test. Parsing pcapng files introduces extra complexity and
chance for bugs.

Finally, capturing data on SANE side has the benefit that we could use
the produced data files to test against directly. Data captured by
Wireshark most of the time needs manual cleanup. There's high chance
that most captures produced by users will need manual cleanup. That's
work we don't have manpower to manually do, but we can write code to do
it for us.

I don't expect that this would be a complex feature, neither that it
would modify the existing lowest-level USB backend. The maintainability
of the project would not be affected and there's low risk of bugs.
Actually, the primary reason why I came up with this idea in the first
place is that being a new maintainer of Genesys backend I'm afraid to
change anything lest I introduce non-obvious regressions. Before doing
any changes I would like to have a safety net :-)

I think this would be really useful for long-term health of the project.
Currently anyone who wants to add support for a new scanner needs to
have a lot of hardware to check against regressions. It should not
necessarily be that way.

> Personally, I wouldn't bother with SCSI, parallel and serial devices
> anymore.  That is, unless someone is willing to put in the effort and
> implements something that can output device I/O to PCap files for such
> scanners.
> 
>> Finally, I expect that there will be a need for tools for simplify and
>> automate the process of working with the recorded data. I would like to
>> propose that we use python as a implementation language for any of such
>> tools.
> 
> Not too charmed with your choice of Python.  Nothing wrong with the
> language itself but I have next to no experience with it.  Don't know
> about other project members but you probably want some people familiar
> with it around if this is going to be something long(er)-term.
> 
> As for PCap files, there is a C library to deal with them and, as said,
> the file format is pretty simple.  Most all project members are (should
> be?) familiar with C.

I picked Python because it's the most popular scripting language and
also because it's already used within SANE. It's very efficient for the
developer: even though I can't say I know Python really well I'm still
at least 10 times more productive when working on small script-type
programs using Python compared to C or C++ in which I have much more
experience.

I don't suggest we use it for any important parts - just to quickly
automate things that otherwise would need to be done manually. After
such tools are implemented it's not likely that they would need to be
significantly modified, so from that side it matters less which
implementation language they're written in.

>> Distribution of recording functionality together with release versions
>> of SANE would require that we are able to change the communication
>> backend on-demand. The best place to do that it seems to be at sanei
>> layer. For example, in case of sanei_usb.h, I propose that instead of
>> calling functions from that header we pass a pointer to a struct
>> containing function pointers that match the current functionality
>> one-to-one. Then, depending on startup options, we could create a USB
>> backend that supports either of the following options:
>>
>>   - calls directly to sanei_usb.c.
>>   - calls directly to sanei_usb.c, but records all parameters and
>> responses to a file.
>>   - reads expected parameters and responses from a file and compares
>> them with the current parameters. An error is recorded if anything does
>> not match.
> 
> # You basically suggest an OOP approach to device I/O.  I have some
> # experience with that (see the third party epkowa backend code if
> # interested).
> 
> That would be a noble end goal but I'd start with something a bit more
> hackish (and in line with René's suggestions) to get a feel for what's
> needed/involved and expand from there.  A rough roadmap would look a
> bit like this, I guess
> 
>  - implement the sanei_usb API to use the recorded device I/O;
>    control what gets used with a compile time switch
>  - get that to work with a scanner you're familiar with first
>    (for a simple definition of "work")
>  - get some feedback on your code here (or over on GitLab.com, assuming
>    you're using a fork or dedicated branch for your work on this)
>    # Please do *not* do any of the work on our master branch!
>  - collect a few more captures to work with (for other scanners)
>  - adjust your implementation to get those to work too
>  - push your sanei_usb implementation for replaying of recorded
>    device I/O behind a dynamic switch

Agreed with all points, happy to see we're on the same page in this regard.

> # BTW, not all supported USB scanners use sanei_usb ... :-/
> # The ones that don't are pretty old ... let's not care about those for
> # now (and probably forever).
> 
>> If we agree with the above proposal, the use of the new, dynamic APIs
>> would be completely opt-in. For example, the current API of sanei_usb.h
>> would be left unmodified. Once the viability is proved with a single
>> backend we could gradually introduce it to more backends.
> 
> Once you have implemented the dynamic opt-in functionality in sanei_usb
> it is immediately usable by *all* backends that use sanei_usb.  That is
> I think the switching belongs in the sanei_usb code.  There should not
> be any code changes in any backend to accommodate this functionality.
> Backends should be completely unaware of what they are talking to (as
> long as whatever that is talks the expected protocol).

Agreed, thank you!

Regards,
Povilas