[sane-devel] reverse engineering techniques

Dave Burns sane@burnsorama.com
Wed, 21 Jul 2004 18:26:26 -0400


Some of you are lucky enough to get developer docs from a manufacturer so
that writing a driver isn't too hard. I'm working on the Canon FS4000 film
scanner and, although I haven't tried, I've heard Canon is tough to get docs
from. So, I'd like to get as far as I can before having to sign an NDA with
Canon (if they'll even do that ;-) ).

I've been following this list for the last few months and I haven't seen
much reverse-engineering advice. Mostly it consists of "dump out a USB/SCSI
transaction log and stare at the numbers until you make some sense of them."
This is what I've done and I can control the FS4000 well enough to scan
slides at high quality. But I'd say that about 50% of the parameters I'm
using are still not well understood. I fill in the blanks in my
understanding by watching what other programs do and reusing their magic values.
Surely the scanning domain is not so big that there isn't shared wisdom to
be had across different scanner brands?

Let me try an example which someone out there might solve readily for all I
know. I can see the parameters where other drivers control the RGB gain and
gamma curves (at least I think they specify gamma curves). RGB gain is fine,
I have the values for that figured out. But there are two other values per
RGB channel that remain unknown to me. The first value ranges from 0 to 63 and
may have something to do with a black point (but I'm not sure of that). Typical
values across RGB are 47, 36, 36. The second value is made up of two bytes.
I'm not 100% sure that it is a short int, though: the 1st byte is only ever 1
or 0, and when I've graphed scanned image data with varying values, there's a
big discontinuity in the data. Typical values that give good
results for me are 0x119, 0x108, 0x106. But I've also seen weird combinations
like 0x115, 0x9, 0x5. These combinations were from a driver that does
autoexposure. If it helps, I can describe the sequence the driver goes
through before settling on those values (it iterates from a known starting
point, then must derive what it thinks are the best values).
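
One way to probe the short-int question is to split each two-byte value into
its first and second byte and look at them separately. The following is only a
minimal Python sketch, not code from any driver: the helper name split_word is
made up for illustration, and it assumes the first byte on the wire is the high
byte of the values as written above.

```python
# Values quoted above, written as 16-bit numbers.
# Assumption: the first byte on the wire is the high byte.
OBSERVED = {
    "manual (good results)": (0x119, 0x108, 0x106),
    "autoexposure driver": (0x115, 0x009, 0x005),
}


def split_word(value):
    """Return (first_byte, second_byte) of a 16-bit value (hypothetical helper)."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    return value >> 8, value & 0xFF


# Print each channel's value next to its two bytes so patterns are easy to spot.
for label, rgb in OBSERVED.items():
    for channel, value in zip("RGB", rgb):
        first, second = split_word(value)
        print(f"{label:22s} {channel}: 0x{value:03x} -> "
              f"first byte {first}, second byte {second}")
```

For the values above, the second byte comes out as 25, 8, 6 for the manual
settings and 21, 9, 5 for the autoexposure driver. The second byte stays in a
similar small range whether the first byte is 1 or 0, which would be consistent
with the first byte being a separate switch rather than the top half of a short
int, though it doesn't prove it.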

Is this some weird way of specifying a gamma curve? Not gamma-related at
all? Demonstrating empirically what changing those values does would require
me to assemble some histograms and post them, so I thought I'd start with
just an email first.

I have plenty of other examples like this that I can give. Any ideas or
suggestions?

db