[sane-devel] SANE2 standard revisited: image data format

Henning Meier-Geinitz henning@meier-geinitz.de
Thu, 5 Dec 2002 22:10:18 +0100


On Thu, Dec 05, 2002 at 06:09:57PM +0100, Henning Meier-Geinitz wrote:
> http://www.meier-geinitz.de/sane/sane2/

I guess the image format is the most controversial point, so I'll start
with that one. Please keep calm, everyone; I don't want to start a
flamewar again :-)

| Section 3.2 (Image Data format):

| 3.2.1 Pixel oriented frames

I like the approach with SANE_FRAME_RAW and textual descriptions of
the channels. But: SANE1 was complicated enough and provided many pitfalls.
Remember the 1-bit bit-order issues? With SANE2 it's even more complicated
because you have more options. E.g. you could do a scan with 4
channels and two frames: "red" and "green,infrared,blue". Oh well.

So the question is: do we really need the concept of a frame? Is there
any other use than in the old three-pass-scanners? Couldn't the
backend just buffer the image data and transfer it using rgb frames
for these old scanners? The frontend has to buffer the image anyway.

Removing the frame paradigm would at least save us one pitfall.

| 3.2.2 Arbitrary data frames

I'm still not sure if MIME is the right approach (instead of e.g.
SANE_FRAME_JPG or other specific frame types). 
I fear that increasing usage of strange MIME types leads to backends
that can be only used with one frontend and vice-versa. 

That said, I can live with both frame types and won't oppose them.

Some details:

| 4.3.8 sane_get_parameters

|typedef struct
|  {
|    char reserved[32]; /* 32 bytes for future use */
|  }
| SANE_Parameters;

--> SANE_Char, not char

| Member format specifies the format of the next frame to be returned.
| The possible values for type SANE_Frame are described in Table 9. The
| meaning of these values is described in more detail in Section 3.2. 
I wouldn't mention the old frame types in the table, that's just
confusing. The SANE2 programmer doesn't need to know them. We can add
a comment to the paragraph that the numbering starts at 5 because of
older, now obsolete frame types used in v1.

| # SANE_PFLAG_LAST_FRAME (bit 0, bitvalue 1) is set to 1 if and only if
| the frame that is currently being acquired (or the frame that will be
| acquired next if there is no current frame) is the last frame of a
| multi frame image (e.g., the current frame is the blue component of a
| red, green, blue image). Note, that it is possible to transmit
| multiple images in succession.

It has to be set to 1 for single-frame images, too!

Proposal: "...is the only frame or the last frame of a multi frame

I also don't understand the last sentence. What's its relevance for the

| # SANE_PFLAG_MORE_IMAGES (bit 1, bitvalue 2) is set to 1 to indicate
| further pending images. It is permissible to set that value to 1 "in
| good faith", as it has to be determined at a very early time, where it
| might not be detectable, if there actually are more images to
| transfer. E.g. you will usually not know if the document feeder
| contains further pages when starting to scan the current one. Thus you
| are allowed to set that bit but later fail at sane_start().

So this flag is intended to signal the availability of some sort of ADF
or film holder? I didn't understand this on my first read. Maybe
rewrite like this:

"# SANE_PFLAG_MORE_IMAGES (bit 1, bitvalue 2) is set to 1 to indicate
   further pending images. The frontend is expected to call sane_start
   again after the end of the current scan to get more images, e.g. from an
   automatic document feeder. It is permissible to set that value to 1
   "in ..."

| # SANE_PFLAG_NEW_PAGE (bit 2, bitvalue 4) is set to 1 to indicate that
| the current frame comes from a new physical page. This bit is of
| informational character only to help frontends to group multi-image
| scans.

As far as I know there is no way yet to specify multi-image scans on
one page. Do we need this feature at all? Can't this be done more easily
in the frontend by selecting a big enough scan area to get all the images?
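
To illustrate how a frontend might interpret the three flag bits quoted
above, here is a minimal sketch. The bit values are taken from the draft
text; the NextStep enum, the helper names and the assumption that the
flags arrive as a plain int are made up for this example and are not part
of the draft.

```c
#include <stdbool.h>

/* Bit values as quoted from the draft above. */
#define SANE_PFLAG_LAST_FRAME  (1 << 0)  /* bitvalue 1 */
#define SANE_PFLAG_MORE_IMAGES (1 << 1)  /* bitvalue 2 */
#define SANE_PFLAG_NEW_PAGE    (1 << 2)  /* bitvalue 4 */

/* Hypothetical: what a frontend does once a frame has been read to EOF. */
typedef enum {
  NEXT_FRAME,  /* same image, another frame follows */
  NEXT_IMAGE,  /* image complete, call sane_start() again */
  SCAN_DONE    /* image complete, no further image announced */
} NextStep;

static NextStep next_step(int flags)
{
  if (!(flags & SANE_PFLAG_LAST_FRAME))
    return NEXT_FRAME;
  if (flags & SANE_PFLAG_MORE_IMAGES)
    return NEXT_IMAGE;  /* may still fail at sane_start(), "in good faith" */
  return SCAN_DONE;
}

/* Informational only: true if the frame comes from a new physical page. */
static bool is_new_page(int flags)
{
  return (flags & SANE_PFLAG_NEW_PAGE) != 0;
}
```

With this logic a single-frame image only terminates correctly if
SANE_PFLAG_LAST_FRAME is set, which is why the wording for that flag
should cover the single-frame case.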

| Member bytes_per_line specifies the number of bytes that comprise one
| scan line. If bytes_per_line is set to 0, which can currently only be
| the case for SANE_FRAME_MIME, the frontend shall not assume a constant
| line length. Instead it should simply try to read until
| SANE_STATUS_EOF with an arbitrary block length. 

There may be no concept of line length at all, e.g. if compressed data
is saved in blocks, not lines. Also, it's not necessary to call
sane_read with a buffer length of one scan line, so I wouldn't mention
the block size at all.


" Member bytes_per_line specifies the number of bytes that comprise one
  scan line. For SANE_FRAME_MIME, this value may not be applicable and
  must be set to 0. In this case the frontend should call sane_read()
  until SANE_STATUS_EOF is returned."
By the way: why 0 and not -1 as in lines?
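
The "read until EOF" behaviour can be sketched as follows. To keep the
example self-contained it does not use the real SANE headers: Status,
ReadFn and the FakeDevice data source are stand-ins invented for this
sketch, with ReadFn loosely modelled on the argument order of sane_read().
It also assumes that no data accompanies the EOF status.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for SANE_STATUS_GOOD, SANE_STATUS_EOF and any error status. */
typedef enum { ST_GOOD, ST_EOF, ST_ERROR } Status;

/* Hypothetical reader callback, shaped like sane_read(). */
typedef Status (*ReadFn)(void *handle, unsigned char *buf, int maxlen, int *len);

/* Reads one frame until EOF. Returns the byte count, or -1 on error. */
static long read_whole_frame(ReadFn read_fn, void *handle)
{
  unsigned char buf[4096];  /* arbitrary size, unrelated to bytes_per_line */
  long total = 0;

  for (;;) {
    int len = 0;
    Status st = read_fn(handle, buf, (int) sizeof buf, &len);
    if (st == ST_EOF)
      return total;
    if (st != ST_GOOD)
      return -1;
    total += len;  /* a real frontend would store buf[0..len) here */
  }
}

/* Toy data source that delivers a fixed number of zero bytes. */
typedef struct { long remaining; } FakeDevice;

static Status fake_read(void *handle, unsigned char *buf, int maxlen, int *len)
{
  FakeDevice *dev = handle;
  if (dev->remaining == 0) {
    *len = 0;
    return ST_EOF;
  }
  int n = dev->remaining < maxlen ? (int) dev->remaining : maxlen;
  memset(buf, 0, (size_t) n);
  dev->remaining -= n;
  *len = n;
  return ST_GOOD;
}

/* Convenience wrapper: "scan" size bytes from the toy source. */
static long fake_scan(long size)
{
  FakeDevice dev = { size };
  return read_whole_frame(fake_read, &dev);
}
```

Nothing in the loop depends on bytes_per_line, so a value of 0 (or any
other "not applicable" marker) does not get in the frontend's way.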

| Member depth specifies the number of bits per sample. Note, that only
| 0 (for not applicable), 1, and n*8 are allowed values. Data with other
| depths has to be scaled up accordingly. 

Same here: Why not -1? Maybe better: "..., and multiples of 8 are
allowed values". n is already used in the formula below this paragraph,
where it means pixels per line.

| Assume B is the number of channels in the frame, then the bit depth d
| (as given by member depth) and the number of pixels per line n (as
| given by this member pixels_per_line) are related to c, the number of
| bytes per line (as given by member bytes_per_line) as follows: 

I'm not sure if the formula that follows after that paragraph is
correct. This issue was already mentioned by Abel Deuring during the
last discussion. The symbol \lceil (looks like a [ bracket without the
bottom part) means "round up" as far as I know. So the second part of
the formula is wrong. Just using B * n * d/8 should do the trick (because
d is always a multiple of 8 or is 1).
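
As a sketch of the simplified relation, the following helper computes c
from B, n and d. For depths that are multiples of 8 the result is exactly
B * n * d/8. For d = 1 the product is not necessarily divisible by 8, so
the sketch rounds up to a whole byte; how 1-bit lines are padded is an
assumption of this example, not something taken from the draft, and the
function name is made up.

```c
#include <assert.h>

/* channels = B, pixels_per_line = n, depth = d (1 or a multiple of 8). */
static long bytes_per_line(int channels, int pixels_per_line, int depth)
{
  assert(channels > 0 && pixels_per_line >= 0);
  assert(depth == 1 || (depth > 0 && depth % 8 == 0));

  long bits = (long) channels * pixels_per_line * depth;
  return (bits + 7) / 8;  /* rounding only has an effect for depth 1 */
}
```

For example, three channels at 100 pixels per line and 16 bits per sample
give 600 bytes, while a single 1-bit channel with 10 pixels per line needs
2 bytes under the padding assumption above.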

| Member format_desc is used for the new frametypes SANE_FRAME_RAW and
| SANE_FRAME_MIME. Its meaning differs between the two types: 

We don't have any old types anymore, so:

"Member format_desc is used to describe the details of frame formats.
 Its meaning differs between the two types: "
> # SANE_FRAME_MIME: The format_desc contains the MIME type/subtype
> *(;parameter) fields as described in RFC 1521, 4. The Content-Type
> header field, without the prefixing "Content-Type:".

It took me four attempts to parse this sentence :-)

First, you must know RFCs to understand what *( ... ) means. Second,
it wasn't obvious to me that the sentence didn't end after "4."

So maybe: 

`SANE_FRAME_MIME: format_desc contains the MIME Content-Type: header
field as described in RFC 1521 (section 4) without the prefixing

> Note, that it is discouraged to transfer proprietary file formats
> over SANE. If at all possible, please stick to the IANA assigned MIME
> types, and make sure the data stream is compliant with the
> corresponding specification.

I like clear words:

"MIME types and subtypes should be chosen either from the RFC or from the
list of IANA-approved values. The data stream must be compliant with
the corresponding specification."
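
For the SANE_FRAME_MIME case, a frontend would typically want the bare
type/subtype from format_desc so it can decide whether it knows the
format. A minimal sketch, assuming format_desc has the form
"type/subtype" optionally followed by ";parameter" fields; the function
name is hypothetical and the parsing is deliberately simplistic (no
quoted strings, no case folding):

```c
#include <stddef.h>
#include <string.h>

/* Copies the "type/subtype" part of format_desc (everything before the
 * first ';', trailing blanks removed) into out.
 * Returns 0 on success, -1 if it is malformed or does not fit. */
static int mime_type_of(const char *format_desc, char *out, size_t out_size)
{
  size_t len = strcspn(format_desc, ";");

  while (len > 0 && (format_desc[len - 1] == ' ' || format_desc[len - 1] == '\t'))
    len--;

  if (len == 0 || len >= out_size || memchr(format_desc, '/', len) == NULL)
    return -1;

  memcpy(out, format_desc, len);
  out[len] = '\0';
  return 0;
}
```

A frontend could compare the result against the types it can decode and
fall back to saving the raw stream for everything else.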

| When data is transmitted with the frame type SANE_FRAME_MIME all data
| has to be transmitted within one frame, multiple frames are not
| allowed (so the flag last_frame has to be set when using this frame
| type). A fully compliant SANE backend is required to transmit in
| either SANE native frametypes, or in a MIME type, for which a
| converting meta backend exists and is freely available for all
| platforms.

"is freely available for all platforms" matches no software at all
(even without "freely"). It won't run on the Commodore C-16, for sure

I don't think the restriction is useful, and it's undermined in the
next paragraph anyway. If the device returns image data, it can and
should be able to use SANE_FRAME_RAW at least optionally. If it can't,
because it's e.g. a barcode scanner that only returns numbers, it
doesn't make sense to impose such a restriction.

We could define that all SANE devices must return some kind of image
data, however, this is a restriction of "SANE is an application
programming interface (API) that provides standardized access to any
raster image scanner hardware". The reason is, that a barcode scanner
is a raster image scanner hardware, it just doesn't return the bitmap
to the computer. I wouldn't like that.

So my proposal is to remove the sentences starting from "A fully
compliant.." and the following paragraph and instead write the
following as a non-indented paragraph:

"A SANE backend must be able to at least optionally transmit
SANE_FRAME_RAW (possibly with the help of a meta backend), if the
hardware supports delivering image data. For data that doesn't
comprise images, it's admissible to only provide MIME frames. As a
general principle, if there are several choices, the format that
is most widely implemented should be used."

I think it's pretty clear that a frontend can save unknown file
formats to a file, so it is not necessary to mention it here.

| Note, that for frontends that are able to parse a given MIME type
| internally, it is perfectly permissible to ignore the extension part
| of the proposed filename and only make use of the basename, when using
| internal save algorithms for different formats.
| In any case, if the frontend makes use of this field, the frontend
| must mangle this proposal or the final filename it produces with its
| help to suit local filesystem restrictions.
| Special care should be taken not to cause security flaws this way. For
| Unix, that means killing out all path separators (/) [to avoid to save
| away stuff in obscure places or create critical files like
| /etc/hosts.allow] and avoiding to overwrite existing files. (Creating
| of leading dot files - like .rhosts - is not an issue here, because
| that's only a proposed filename extension as mentioned above. 

This is slightly off-topic; it doesn't explain the backend-frontend
interface. So if it should stay, I would shorten the text and put it
into a "frontend implementation note" like it's done for the backends.
Or, even better, put it into a yet-to-be-written frontend-writing.txt.
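
Such a frontend note could come with a small example of the mangling
step. The following is only a sketch for Unix-style path separators: the
function name is hypothetical, and replacing '/' with '_' is one possible
policy, not something the draft prescribes.

```c
#include <stddef.h>
#include <string.h>

/* Copies the backend's proposed name into out with every '/' replaced,
 * so the result can never leave the directory chosen by the frontend.
 * Returns 0 on success, -1 if the proposal is empty or does not fit. */
static int sanitize_proposed_name(const char *proposal, char *out, size_t out_size)
{
  size_t len = strlen(proposal);

  if (len == 0 || len >= out_size)
    return -1;

  for (size_t i = 0; i < len; i++)
    out[i] = (proposal[i] == '/') ? '_' : proposal[i];
  out[len] = '\0';
  return 0;
}
```

Not overwriting existing files is a separate step; on POSIX systems,
creating the file with open() and O_CREAT | O_EXCL is one way to get that
guarantee.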

| The string proposed_comment can be used to transmit additional image
| data, that can be stored in the comment areas several fileformats
| offer. It can contain any textual information the backend wishes to
| convey to the user, like date/time of exposure, enganged filters, etc.

Nice, but proposed_comment is not mentioned in struct SANE_Parameters.

And I would use "additional image information", because data sounds
like pixel data. Add: "Set to "" if unused".

| The members dpi_x, dpi_y encode the horizontal and vertical
| resolution.

Which value to use if the resolution is unknown (e.g. for cameras)?

> Note, that multiple-image scans may have different resolutions of
> each image.

Again a reference to a multiple-image scan that is not yet

| The member reserved is an array of 32 bytes (char) to keep the size of
| the struct unchanged when future extensions are done. The backend has
| to set the reserved bytes to 0.