[sane-devel] Proposal for integration tests

Sun Mar 10 06:08:10 GMT 2019

Hi again Povilas,

Sorry for the late follow-up.

Povilas Kanapickas writes:

> Hi Olaf, Rene,
>
> Thanks for your replies.
>
> On 26/02/2019 14:48, Olaf Meeuwissen wrote:
>> # Actually, I've been kind of waiting for others to chime in first.
>> # Glad to see that René Rebe, an ex-SANE developer, eventually did but
>> # would have liked to see some more opinions before my reply :-(.
>
> I agree, this is a change that may eventually affect every USB backend
> if it wants tests, so it's best that every active maintainer is on board.
>
>> Povilas Kanapickas writes:
>>
>>> Hey,
>>>
>>> I would like to propose adding integration tests to guard against
>>> accidental regressions especially in rarer scanners that few SANE
>>> developers have access to. We could run these tests on each pull request
>>> on gitlab without any scanner hardware.
>>
>> I think this is a *really* *cool* idea.  But, as René mentioned, the
>> devil will be in the details.
>>
>>> Given limited development resources and large variety of hardware that
>>> SANE supports, a major requirement of the tests would be that they are
>>> easy to create and use, otherwise no one will bother to deal with them.
>>
>> ACK.  Using them should be as easy as `make check` ;-)
>>
>> # ... and only run tests for the backends that have been compiled.
>>
>>> Therefore, I would like to propose a simple way of testing SANE by
>>> recording low-level communication with the scanner, replaying it at the
>>> test time and checking whether SANE behaves in an equivalent way.
>>
>> Recording low-level communication can be done with Wireshark for network
>> and USB scanners.  This requires *zero* work on SANE's side.
>>
>> # OK, so maybe we would need some documentation or pointers to such on
>> # how to make captures with Wireshark.
>>
>> Apart from use in integration and regression tests, these captures can
>> also be useful in reproducing odd behaviour observed by our users.  If
>> they can just provide a capture, anyone would be able to reproduce the
>> behaviour.  Especially nice when looking at those pesky timing issues.
>>
>>> Additionally, I would like to propose that we distribute the recording
>>> functionality with release SANE builds. This way we could collect
>>> recordings from our users for scanners that no SANE developers have
>>> access to and then add to the test database.
>>
>> As per suggestion above, Wireshark already does a very good job at
>> recording network and USB traffic, runs on Unix and Windows and is
>> well-known and well-supported.  Its default PCap file format is very
>> easy to understand and work with too.
>
> I use Wireshark currently to reverse-engineer a scanner and I agree it's
> a great tool. There were several reasons why I suggested that we also
> have a way to capture data on SANE side too.
>
> I think that the functionality should be as user-friendly as possible.
> Most of our users are non-technical and due to lack of manpower we need
> to them to do part of the work supporting the project. Using Wireshark
> is too hard in my opinion, because:
>
>  - it often involves non-obvious permission problems and so on.
>
>  - there are maybe 15 of mouse clicks before a USB capture is ready.
>
>  - user needs to read a lot in order to be able to navigate the
> interface at all.

That can be "solved" by wrapping the capturing in a script using sudo
for permission issues and tshark to get rid of the GUI.  You can even
arrange things so that it runs scanimage with any options passed on the
command-line.

But before we think about getting non-technical users to contribute
captures, let's get some tech-savvy users do so first ;-)

> Each of the above is enough to turn off a well-meaning non-technical
> user who could give a valuable capture.
>
> In an ideal world SANE frontends would have an option to capture raw
> communication data into a separate file which users then could send to
> us. Two mouse clicks, no learning needed.

In an ideal world, SANE frontends would allow users to set all the
options a backend provides.  To the best of my knowledge, at least
Simple Scan doesn't.  Seeing that that is the default frontend on
Ubuntu and Debian for Gnome and Cinnamon desktops ...

Having an external capturing solution like Wireshark is a lot less
work for us and definitely less intrusive for our backends.

Please note that the sanei_usb layer cannot add SANE options.  This
is something that has to be done by each backend itself.

Anyway, whatever capturing mechanisms are used, as long as the file
format is or can be converted to a well-known standard, we should be
good.  BTW, there are Pcap libraries for C, Haskell, Ruby, Perl, Ocaml,
and, oh, Python too, *three* of them it seems ;-P

> From the technical side, capturing data on SANE side has a large benefit
> in that it's the same place where we would inject test data. We would be
> guaranteed that we capture the same thing that we would later inject as
> part of the test. Parsing pcapng files introduces extra complexity and
> chance for bugs.

Capturing from within SANE also introduces extra complexity ;-P
More seriously, I am fine with any way that can be used to capture the
data.  I think using Wireshark for is less work for you/us now.  That
should allow you to focus on getting these captures used as a "scanner"
for regression testing, debugging, whatever.  Being able to use those
captures should be the focus of your effort.  We can make things easy
later, once using the captures has proven its worth.

> Finally, capturing data on SANE side has the benefit that we could use
> the produced data files to test against directly. Data captured by
> Wireshark most of the time needs manual cleanup. There's high chance
> that most captures produced by users will need manual cleanup. That's
> work we don't have manpower to manually do, but we can write code to do
> it for us.

Replaying from a capture should just skip everything that is not
produced by the scanner of interest.  That obviates most of the need to
clean.  Manual cleaning at this point in time is something that should
be doable.  It's not like we are going to be inundated with captures :-)
And if/when we are, we can write code to do it for us (as you said so
yourself).

Moreover, even capturing directly from SANE will probably need cleanup
to get rid of I/O from the scanner probing phase of most frontends.

> I don't expect that this would be a complex feature, neither that it
> would modify the existing lowest-level USB backend. The maintainability
> of the project would not be affected and there's low risk of bugs.
> Actually, the primary reason why I came up with this idea in the first
> place is that being a new maintainer of Genesys backend I'm afraid to
> change anything lest I introduce non-obvious regressions. Before doing
> any changes I would like to have a safety net :-)

:+1: for having a safety net.

> I think this would be really useful for long-term health of the project.

I hope so :-)

> Currently anyone who wants to add support for a new scanner needs to
> have a lot of hardware to check against regressions. It should not
> necessarily be that way.

In practice, most of the time we've been relying on user feedback about
breakage introduced by changes :-/

>> Personally, I wouldn't bother with SCSI, parallel and serial devices
>> anymore.  That is, unless someone is willing to put in the effort and
>> implements something that can output device I/O to PCap files for such
>> scanners.
>>
>>> Finally, I expect that there will be a need for tools for simplify and
>>> automate the process of working with the recorded data. I would like to
>>> propose that we use python as a implementation language for any of such
>>> tools.
>>
>> Not too charmed with your choice of Python.  Nothing wrong with the
>> language itself but I have next to no experience with it.  Don't know
>> about other project members but you probably want some people familiar
>> with it around if this is going to be something long(er)-term.
>>
>> As for PCap files, there is a C library to deal with them and, as said,
>> the file format is pretty simple.  Most all project members are (should
>> be?) familiar with C.
>
> I picked Python because it's the most popular scripting language

The [Stack Overflow Developer Survey 2018][1] says otherwise ;-)

 [1]: https://insights.stackoverflow.com/survey/2018/#technology

But that's splitting hairs.  I know Python is popular and not too
difficult to get started with, if so inclined.

> and also because it's already used within SANE.

I wasn't aware of that but see that the pixma backend uses a little
snippet to generate options.

> It's very efficient for the
> developer: even though I can't say I know Python really well I'm still
> at least 10 times more productive when working on small script-type
> programs using Python compared to C or C++ in which I have much more
> experience.
>
> I don't suggest we use it for any important parts - just to quickly
> automate things that otherwise would need to be done manually. After
> such tools are implemented it's not likely that they would need to be
> significantly modified, so from that side it matters less which
> implementation language they're written in.

Mostly agreed.  But don't forget that such utilities will have to be
maintained in order to keep running in an ever changing environment.
Think of the Python 2 to 3 transition for example.

>>> Distribution of recording functionality together with release versions
>>> of SANE would require that we are able to change the communication
>>> backend on-demand. The best place to do that it seems to be at sanei
>>> layer. For example, in case of sanei_usb.h, I propose that instead of
>>> calling functions from that header we pass a pointer to a struct
>>> containing function pointers that match the current functionality
>>> one-to-one. Then, depending on startup options, we could create a USB
>>> backend that supports either of the following options:
>>>
>>>   - calls directly to sanei_usb.c.
>>>   - calls directly to sanei_usb.c, but records all parameters and
>>> responses to a file.
>>>   - reads expected parameters and responses from a file and compares
>>> them with the current parameters. An error is recorded if anything does
>>> not match.
>>
>> # You basically suggest an OOP approach to device I/O.  I have some
>> # experience with that (see the third party epkowa backend code if
>> # interested).
>>
>> That would be a noble end goal but I'd start with something a bit more
>> hackish (and in line with René's suggestions) to get a feel for what's
>> needed/involved and expand from there.  A rough roadmap would look a
>> bit like this, I guess
>>
>>  - implement the sanei_usb API to use the recorded device I/O;
>>    control what gets used with a compile time switch
>>  - get that to work with a scanner you're familiar with first
>>    (for a simple definition of "work")
>>  - get some feedback on your code here (or over on GitLab.com, assuming
>>    you're using a fork or dedicated branch for your work on this)
>>    # Please do *not* do any of the work on our master branch!
>>  - collect a few more captures to work with (for other scanners)
>>  - adjust your implementation to get those to work too
>>  - push your sanei_usb implementation for replaying of recorded
>>    device I/O behind a dynamic switch
>
> Agreed with all points, happy to see we're on the same page in this regard.
>
>> # BTW, not all supported USB scanners use sanei_usb ... :-/
>> # The ones that don't are pretty old ... let's not care about those for
>> # now (and probably forever).
>>
>>> If we agree with the above proposal, the use of the new, dynamic APIs
>>> would be completely opt-in. For example, the current API of sanei_usb.h
>>> would be left unmodified. Once the viability is proved with a single
>>> backend we could gradually introduce it to more backends.
>>
>> Once you have implemented the dynamic opt-in functionality in sanei_usb
>> it is immediately usable by *all* backends that use sanei_usb.  That is
>> I think the switching belongs in the sanei_usb code.  There should not
>> be any code changes in any backend to accommodate this functionality.
>> Backends should be completely unaware of what they are talking to (as
>> long as whatever that is talks the expected protocol).
>
> Agreed, thank you!

Looking forward to the first proof-of-concept code,
--
Olaf Meeuwissen, LPIC-2            FSF Associate Member since 2004-01-27
 GnuPG key: F84A2DD9/B3C0 2F47 EA19 64F4 9F13  F43E B8A4 A88A F84A 2DD9
 Support Free Software                        https://my.fsf.org/donate
 Join the Free Software Foundation              https://my.fsf.org/join