[Python-apps-team] Bug#644444: pkpgcounter does not properly handle all postscript documents with copies or n-up options
Brian Paul Kroth
bpkroth at gmail.com
Wed Oct 5 23:05:48 UTC 2011
Package: pkpgcounter
Version: 3.50-7
Severity: normal
As stated, the native implementation in the current pkpgcounter's
postscript.py trusts files to be DSC compliant a little too much. It
doesn't properly handle documents printed n-up or with certain copies options.
For example, I had a document printed from MS Word as 9 copies of 2
pages, which produced PostScript containing 9 duplicates of the 2 pages,
one of which also carried a 9 copies tag. pkpgcounter counted this
as 162 pages (9*9*2) rather than 18.
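The arithmetic behind the 162 can be reproduced with a tiny Python 3 sketch. It assumes, based on the `self.copies = max(...)` context line in the attached diff, that the old code multiplied every page found in the file by the largest copies value seen anywhere; the variable names are illustrative only.

```python
# Hypothetical model of the MS Word job described above (Python 3).
COPIES_REQUESTED = 9  # copies chosen in the print dialog
PAGES_PER_COPY = 2    # pages in the original document

# The driver already wrote every copy out as separate pages...
pages_in_file = COPIES_REQUESTED * PAGES_PER_COPY  # 18 page records

# ...yet one of those pages also carries a "9 copies" tag.
largest_copies_tag = 9

# Assumed old behaviour: multiply all pages by the largest copies tag,
# so the copies end up being counted twice.
naive_total = pages_in_file * largest_copies_tag   # 162

# What actually comes out of the printer.
physical_total = pages_in_file                     # 18

print(naive_total, physical_total)  # prints "162 18"
```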
Similarly, for n-up documents it counts the individual (logical) pages
rather than the physical pages.
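For reference, the number of printed sides for an n-up job is the logical page count divided by the pages per side, rounded up. A minimal Python 3 sketch, assuming simplex printing and a single copy (the function name is hypothetical):

```python
def physical_sides(logical_pages: int, pages_per_side: int) -> int:
    """Printed sides for an n-up job, assuming simplex and one copy."""
    if pages_per_side < 1:
        raise ValueError("pages_per_side must be at least 1")
    # Ceiling division: a partly filled last side still costs a full side.
    return -(-logical_pages // pages_per_side)


# 8 logical pages printed 2-up use 4 sides; 9 pages printed 4-up use 3.
print(physical_sides(8, 2), physical_sides(9, 4))  # prints "4 3"
```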
I have attached a diff, which I have been testing, that simply
attempts to detect either of those options and then falls back to the
ghostscript rendering method.
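The detection and fallback logic of the attached diff can be summarized as a standalone Python 3 sketch. The regular expression is the one the patch adds (written here as a raw string); `needs_ghostscript`, `choose_page_count` and `GhostscriptError` are hypothetical names standing in for the parser's `natively()`/`count()` methods and `pdlparser.PDLParserError`.

```python
import re

# Pattern from the patch: a number, whitespace, then the PostScript
# `scale` operator not followed by "f", so `scalefont` is not matched.
SCALE_PATTERN = re.compile(r"[0-9].*\s+scale([^f]|$)")


class GhostscriptError(Exception):
    """Hypothetical stand-in for pdlparser.PDLParserError."""


def needs_ghostscript(ps_lines, max_copies):
    """Return True when the native DSC page count should not be trusted."""
    if max_copies > 1:  # copies may already be expanded into real pages
        return True
    # `scale` often indicates n-up output; a false positive only costs speed.
    return any(SCALE_PATTERN.search(line.strip()) for line in ps_lines)


def choose_page_count(native_count, notrust, count_with_ghostscript):
    """Mirror the patched count(): prefer ghostscript's answer when used."""
    if notrust or not native_count:
        try:
            # The patch returns this value instead of max(native, gs),
            # since max() would keep an inflated native count such as 162.
            return count_with_ghostscript()
        except GhostscriptError:
            pass  # the real code only logs the error here
    return native_count


lines = ["%%Page: 1 1", "0.5 0.5 scale", "/F1 12 scalefont setfont"]
print(needs_ghostscript(lines, 1))               # True (the scale line)
print(needs_ghostscript(lines[2:], 1))           # False (scalefont only)
print(choose_page_count(162, True, lambda: 18))  # 18
```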
I attempted to pass this info to the original dev, but haven't gotten
any response.
Let me know if you need any more details or the sample postscript.
Thanks,
Brian
-- System Information:
Debian Release: 6.0.2
APT prefers stable
APT policy: (500, 'stable'), (120, 'testing')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.32-5-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages pkpgcounter depends on:
ii ghostscript 8.71~dfsg2-9 The GPL Ghostscript PostScript/PDF
ii python 2.6.6-3+squeeze6 interactive high-level object-orie
ii python-imaging 1.1.7-2 Python Imaging Library
ii python-support 1.0.10 automated rebuilding support for P
Versions of packages pkpgcounter recommends:
pn imagemagick <none> (no description available)
pn python-psyco <none> (no description available)
pn texlive-latex-base <none> (no description available)
pn xauth <none> (no description available)
pn xvfb <none> (no description available)
Versions of packages pkpgcounter suggests:
pn abiword <none> (no description available)
-- no debconf information
-- debsums errors found:
debsums: changed file /usr/share/pyshared/pkpgpdls/postscript.py (from pkpgcounter package)
*** /filespace/people/b/bpkroth/src/postscript.diff
--- postscript.py.orig 2011-10-04 13:53:32.000000000 -0500
+++ postscript.py 2011-10-05 17:48:30.000000000 -0500
@@ -28,6 +28,8 @@
import pdlparser
import inkcoverage
+import re
+
class Parser(pdlparser.PDLParser) :
"""A parser for PostScript documents."""
totiffcommands = [ 'gs -sDEVICE=tiff24nc -dPARANOIDSAFER -dNOPAUSE -dBATCH -dQUIET -r"%(dpi)i" -sOutputFile="%(outfname)s" "%(infname)s"' ]
@@ -54,7 +56,12 @@
if self.isMissing(self.required) :
raise pdlparser.PDLParserError, "The gs interpreter is nowhere to be found in your PATH (%s)" % os.environ.get("PATH", "")
infname = self.filename
- command = 'gs -sDEVICE=bbox -dPARANOIDSAFER -dNOPAUSE -dBATCH -dQUIET "%(infname)s" 2>&1 | grep -c "%%HiResBoundingBox:" 2>/dev/null'
+ # Actually this one reports twice as many pages with older versions of gs (e.g. 8.62.dfsg.1-3.2lenny5)
+ #command = 'gs -sDEVICE=bbox -dPARANOIDSAFER -dNOPAUSE -dBATCH -dQUIET "%(infname)s" 2>&1 | grep -c "%%HiResBoundingBox:" 2>/dev/null'
+ # This seems to be faster and just as accurate.
+ # http://en.wikibooks.org/wiki/PostScript_FAQ#How_to_count_pages_in_a_PS_file.3F
+ # NOTE: As a hack we're hiding the broken pipe messages that yes sometimes spits out.
+ command = 'yes 2>/dev/null | gs -q -dBATCH -dPARANOIDSAFER -sDEVICE=nullpage "%(infname)s" 2>&1 | grep -c showpage 2>/dev/null'
pagecount = 0
fromchild = os.popen(command % locals(), "r")
try :
@@ -66,7 +73,11 @@
if fromchild.close() is not None :
raise pdlparser.PDLParserError, "Problem during analysis of Binary PostScript document"
self.logdebug("GhostScript said : %s pages" % pagecount)
- return pagecount * self.copies
+ # Recent versions of ghostscript (at least 8.71~dfsg2-9, possibly
+ # earlier) seem to process copies correctly, even for goofy Windows
+ # output that produces both individual pages and a copies tag
+ #return pagecount * self.copies
+ return pagecount
def natively(self) :
"""Count pages in a DSC compliant PostScript document."""
@@ -78,6 +89,7 @@
prescribe = False # Kyocera's Prescribe commands
acrobatmarker = False
pagescomment = None
+ scalepattern = re.compile(r"[0-9].*\s+scale([^f]|$)")
for line in self.infile :
line = line.strip()
if (not prescribe) and line.startswith(r"%%BeginResource: procset pdf") \
@@ -167,11 +179,21 @@
else :
if number > self.pages[pagecount]["copies"] :
self.pages[pagecount]["copies"] = number
+ # The scale operator is often used in n-up style printing, which
+ # this native method doesn't handle. Set notrust and make
+ # ghostscript do it for us.
+ # NOTE: This might catch non-n-up printing, but just means we'll
+ # take a small performance hit by calling out to ghostscript.
+ elif scalepattern.search(line) :
+ notrust = True
previousline = line
# extract max number of copies to please the ghostscript parser, just
# in case we will use it later
self.copies = max([ v["copies"] for (k, v) in self.pages.items() ])
+ # See notes above regarding ghostscript and copies.
+ if self.copies > 1 :
+ notrust = True
# now apply the number of copies to each page
if not pagecount and pagescomment :
@@ -189,10 +211,13 @@
"""Count pages in PostScript document."""
self.copies = 1
(nbpages, notrust) = self.natively()
+ #print "nbpages: %d, notrust: %d" % (nbpages, notrust)
newnbpages = nbpages
if notrust or not nbpages :
try :
newnbpages = self.throughGhostScript()
except pdlparser.PDLParserError, msg :
self.logdebug(msg)
- return max(nbpages, newnbpages)
+ # max() is probably the wrong thing to do
+ #return max(nbpages, newnbpages)
+ return newnbpages
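The new `throughGhostScript()` command relies on ghostscript pausing after every page (no `-dNOPAUSE` is given) with a prompt that mentions `showpage`; `yes` answers each prompt and `grep -c` counts the prompt lines. The following Python 3 sketch runs the same pipeline with `subprocess` and does the counting in Python. The gs flags are copied from the patch; the function names are hypothetical, and it is an assumption that each prompt lands on its own output line in every ghostscript version.

```python
import shlex
import subprocess


def count_showpage_lines(gs_output: str) -> int:
    """Mimic `grep -c showpage`: count output lines mentioning showpage."""
    return sum(1 for line in gs_output.splitlines() if "showpage" in line)


def count_pages_with_gs(path: str) -> int:
    """Run the patch's gs pipeline on `path` and count the page prompts.

    Requires a POSIX shell plus `yes` and `gs` on PATH.
    """
    command = (
        "yes 2>/dev/null | "
        "gs -q -dBATCH -dPARANOIDSAFER -sDEVICE=nullpage %s 2>&1"
        % shlex.quote(path)
    )
    result = subprocess.run(
        command, shell=True, stdout=subprocess.PIPE, universal_newlines=True
    )
    # The pipeline's exit status is that of gs, the last command.
    if result.returncode != 0:
        raise RuntimeError("gs failed with status %d" % result.returncode)
    return count_showpage_lines(result.stdout)


# Example with captured output, so no ghostscript is needed:
sample = ">>showpage, press <return> to continue<<\n" * 2
print(count_showpage_lines(sample))  # 2
```

Unlike the shell version, this keeps the counting in Python, so a document with zero pages yields 0 instead of a non-zero `grep` exit status.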