Bug#1081416: poppler-utils: pdftocairo docs: man page BNF expresses a mandatory parameter as optional & somewhat hides quality reduction

Manny debbug.poppler-utils at sideload.33mail.com
Wed Sep 11 14:58:39 BST 2024


Package: poppler-utils
Version: 22.12.0-2+b1
Severity: minor
X-Debbugs-Cc: debbug.poppler-utils at sideload.33mail.com

The pdftocairo man page starts with:

> NAME
>        pdftocairo - Portable Document Format (PDF) to PNG/JPEG/TIFF/PDF/PS/EPS/SVG using cairo
> SYNOPSIS
>        pdftocairo [options] PDF-file [output-file]
> DESCRIPTION
>        pdftocairo  converts  Portable  Document Format (PDF) files …

Bug ①: That synopsis tells the user that they can simply run
pdftocairo on a PDF doc with no options (as the square brackets imply
that the token is not required). This immediately leaves the user
wondering what effect that would have. In reality, pdftocairo
terminates with an error because an output format option (such as
-png) is mandatory. So the synopsis needs to be fixed to show that
option as required.

Bug ②: It’s not obvious from the man page that quality will be
altered. Consider the extraction_works.pdf sample that was attached
to:

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1076283

The PDF is 2.1 MB. When pdfimages extracts the PNG file (which
involves no manipulation), the resulting PNG is about the same size,
as the only difference is metadata and overhead. But when “pdftocairo
-png” is used, the output PNG is only about 60% of that size:

===8<------------------------------
$ identify pdfimages_extraction_works-000.png
pdfimages_extraction_works-000.png PNG 2550x2452 2550x2452+0+0 8-bit sRGB 2130440B 0.000u 0:00.000
$ identify cairo_extraction_works-1.png
cairo_extraction_works-1.png PNG 1275x2100 1275x2100+0+0 8-bit sRGB 1312420B 0.000u 0:00.000
===8<------------------------------

The width was cut in half (2550 to 1275 pixels) and the height was
reduced (2452 to 2100). Why is that? Some would say this fails the
principle of least astonishment. Is it that the PDF metadata
includes some parameters about paper size and resolution, and the
embedded image was much larger than necessary for the PDF’s rendering?
The man page says this:

> The image dimensions will depend on the PDF page size and the
> resolution.
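
For what it's worth, the output dimensions above are consistent with
that sentence. A minimal sketch of the arithmetic follows. It assumes
the default resolution of 150 PPI documented for the -r option was in
effect (no -r was given), and it assumes the page is US Legal (612 x
1008 pt), which is inferred from the output size and not read from the
PDF itself. The function name is made up for illustration, and the
rounding is a guess, not necessarily what pdftocairo does internally.

```python
POINTS_PER_INCH = 72  # PDF user-space units per inch


def rendered_pixels(page_pt, ppi):
    """Pixel length of one page side rendered at the given resolution."""
    return round(page_pt / POINTS_PER_INCH * ppi)


# Assumed page size: US Legal, 8.5 x 14 in = 612 x 1008 pt.
width_px = rendered_pixels(612, 150)
height_px = rendered_pixels(1008, 150)
print(width_px, height_px)  # 1275 2100
```

Under those assumptions the 1275x2100 output is fully determined by
the page box and the resolution; the pixel dimensions of the embedded
image play no part in it.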

That’s clear to careful and meticulous readers but a bit subtle, no?
I think misunderstanding by users can in part be attributed to the
mention of “converts” in the description: “pdftocairo converts
Portable Document Format (PDF) files”. Mere conversion does not lead
the user to expect manipulation. If it said something like:

  “pdftocairo RENDERS output images with the size properties specified
   by the PDF page spec… Resolution does not necessarily match that of
   the source images contained in the PDF and may be increased or
   decreased.”

it might be clearer to users what to expect. Even better, the output
text could inform the user of what happened, e.g. “output image
resolution was decreased by 51% on image 1 page 1, 26% on image 2
page 1, 35% on image 1 page 2, …” etc.
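
To illustrate the kind of message meant here, a rough sketch of the
calculation using the dimensions from the two identify lines above.
The function name and message wording are hypothetical (pdftocairo
prints nothing like this today), and the comparison only makes sense
on the assumption that the embedded image covers the whole page.

```python
def reduction_percent(source_px, output_px):
    """Percentage by which a pixel dimension shrank (negative if it grew)."""
    return (source_px - output_px) / source_px * 100


# Dimensions taken from the two identify lines in this report.
source = (2550, 2452)  # PNG extracted by pdfimages
output = (1275, 2100)  # PNG rendered by "pdftocairo -png"

w = reduction_percent(source[0], output[0])
h = reduction_percent(source[1], output[1])
print(f"image 1 page 1: width decreased by {w:.0f}%, height by {h:.0f}%")
# image 1 page 1: width decreased by 50%, height by 14%
```

Note that for this sample only the width is halved; the height shrinks
by much less.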

It is indeed useful that it generates output that matches the render
quality specified by the PDF. This enables us to repackage a PDF for
transmission such that size is not wasteful for a given quality. But
users could be made more clearly aware of that.

③ (enhancement) It might also be useful if users could specify a
“maintain source quality” option, whereby the output preserves the
internal image parameters. Though I hesitate to suggest this because I
realize that would only be sensible in situations where each page
contains exactly one image that consumes the whole page (like a scanned
doc). Nonetheless, I thought it would be worth mentioning.
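
As a sketch of what such an option would have to compute in the
one-image-per-page case: the resolution at which the page maps one
source pixel to one output pixel. The page size below (US Legal, 8.5 x
14 in) is an assumption inferred from the 1275x2100 output at the
default 150 PPI, and the function name is hypothetical.

```python
def source_ppi(image_px, page_inches):
    """Resolution at which a full-page image keeps its own pixel count."""
    return image_px / page_inches


# Source image size from the identify output; page size is assumed.
x_ppi = source_ppi(2550, 8.5)
y_ppi = source_ppi(2452, 14)
print(f"{x_ppi:.1f} {y_ppi:.1f}")  # 300.0 175.1
```

The two values differ, which suggests the image does not map
uniformly onto the page in this sample, so treat the numbers as an
illustration of the idea only.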

-- System Information:
Debian Release: 12.6
  APT prefers stable-updates
  APT policy: (990, 'stable-updates'), (990, 'stable-security'), (990, 'stable'), (500, 'oldstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.10.0-28-amd64 (SMP w/2 CPU threads)
Kernel taint flags: TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages poppler-utils depends on:
ii  libc6          2.36-9+deb12u7
ii  libcairo2      1.16.0-7
ii  libfreetype6   2.12.1+dfsg-5+deb12u3
ii  liblcms2-2     2.14-2
ii  libpoppler126  22.12.0-2+b1
ii  libstdc++6     12.2.0-14

poppler-utils recommends no packages.

poppler-utils suggests no packages.

-- no debconf information


