[xml/sgml-pkgs] Bug#692741: Better support for pdftohtml output (specific profile?)
Mathieu Malaterre
malat at debian.org
Thu Nov 8 12:33:45 UTC 2012
Package: herold
Version: 6.0.2-1
Severity: normal
It would be really nice if there were a profile for pdftohtml output. Currently pdftohtml generates something like:
<b>Scope</b><br>
TIFF describes image data that typically comes from scanners, frame grabbers,<br>and paint- and photo-retouching programs.<br>
TIFF is not a printer language or page description language. The purpose of TIFF<br>is to describe and store raster image data.<br>
A primary goal of TIFF is to provide a rich environment within which applica-<br>tions can exchange image data. This richness is required to take advantage of the<br>varying capabilities of scanners and other imaging devices.<br>
Though TIFF is a rich format, it can easily be used for simple scanners and appli-<br>cations as well because the number of required fields is small.<br>
TIFF will be enhanced on a continuing basis as new imaging needs arise. A high<br>priority has been given to structuring TIFF so that future enhancements can be<br>added without causing unnecessary hardship to developers.<br>
which herold converts (with no profile) into:
<para><emphasis remap="b:86:2" role="bold">Scope</emphasis></para>
<para> TIFF describes image data that typically comes from scanners, frame grabbers,</para>
<para>and paint- and photo-retouching programs.</para>
<para> TIFF is not a printer language or page description language. The purpose of TIFF</para>
<para>is to describe and store raster image data.</para>
<para> A primary goal of TIFF is to provide a rich environment within which applica-</para>
<para>tions can exchange image data. This richness is required to take advantage of the</para>
<para>varying capabilities of scanners and other imaging devices.</para>
<para> Though TIFF is a rich format, it can easily be used for simple scanners and appli-</para>
<para>cations as well because the number of required fields is small.</para>
<para> TIFF will be enhanced on a continuing basis as new imaging needs arise. A high</para>
<para>priority has been given to structuring TIFF so that future enhancements can be</para>
<para>added without causing unnecessary hardship to developers.</para>
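This makes it difficult to use in DocBook: every <br>-separated line becomes its own <para>, so a single paragraph (and even a hyphenated word such as "applica-tions") ends up split across several <para> elements.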
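For illustration only, here is a rough Python sketch of the kind of post-processing such a profile could do. It assumes that each input line of pdftohtml output is one paragraph, that <br> only marks a visual line break inside it, and that the text is already HTML-escaped. The function names are hypothetical and not part of herold.

```python
import re

def join_lines(segments):
    """Join visual lines into one paragraph, undoing end-of-line hyphenation."""
    text = ""
    for seg in segments:
        seg = seg.strip()
        if not seg:
            continue
        if text.endswith("-") and seg[:1].islower():
            # Heuristic: "applica-" + "tions" -> "applications"
            text = text[:-1] + seg
        elif text:
            text += " " + seg
        else:
            text = seg
    return text

def pdftohtml_to_paras(html):
    """Turn each input line of pdftohtml output into a single DocBook <para>."""
    paras = []
    for line in html.splitlines():
        # Split the line at every <br>, <br/> or <br /> (case-insensitive)
        segments = re.split(r"<br\s*/?>", line, flags=re.IGNORECASE)
        text = join_lines(segments)
        if not text:
            continue
        # Map pdftohtml bold markup to DocBook emphasis
        text = re.sub(r"<b>(.*?)</b>", r'<emphasis role="bold">\1</emphasis>',
                      text, flags=re.IGNORECASE)
        paras.append("<para>" + text + "</para>")
    return paras
```

The de-hyphenation rule is only a heuristic: a genuine hyphen at the end of a visual line (e.g. "photo-" followed by "retouching") would be wrongly removed.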
Also, pdftohtml extracts the PDF document metadata and places it in HTML <META> elements, e.g.:
<HEAD>
<TITLE>TIFF6.final.9509</TITLE>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<META name="generator" content="pdftohtml 0.36">
<META name="author" content="Adobe Systems Inc.">
<META name="keywords" content="TIFF,,.TIF,,TIF">
<META name="date" content="1995-09-14T14:32:50+00:00">
<META name="subject" content="TIFF 6.0">
</HEAD>
It would be really nice to have them mapped into the DocBook <info> element!
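Similarly, a rough Python sketch of one possible mapping from these <META> elements to a DocBook 5 <info> block (namespace omitted). The choice of target elements (orgname inside author, date, keywordset, subjectset) is an assumption, not something herold does today; the generator and http-equiv entries are simply dropped. The names HeadParser and head_to_info are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

class HeadParser(HTMLParser):
    """Collect <TITLE> text and name/content pairs from <META> elements."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names (TITLE -> title)
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def head_to_info(html):
    """Build a DocBook-style <info> element from a pdftohtml <HEAD>."""
    p = HeadParser()
    p.feed(html)
    p.close()
    info = ET.Element("info")
    if p.title.strip():
        ET.SubElement(info, "title").text = p.title.strip()
    if p.meta.get("author"):
        author = ET.SubElement(info, "author")
        ET.SubElement(author, "orgname").text = p.meta["author"]
    if p.meta.get("date"):
        ET.SubElement(info, "date").text = p.meta["date"]
    # "TIFF,,.TIF,,TIF" -> ["TIFF", ".TIF", "TIF"] (empty items dropped)
    keywords = [k.strip() for k in p.meta.get("keywords", "").split(",") if k.strip()]
    if keywords:
        kwset = ET.SubElement(info, "keywordset")
        for k in keywords:
            ET.SubElement(kwset, "keyword").text = k
    if p.meta.get("subject"):
        subject = ET.SubElement(ET.SubElement(info, "subjectset"), "subject")
        ET.SubElement(subject, "subjectterm").text = p.meta["subject"]
    return ET.tostring(info, encoding="unicode")
```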
Thanks
-- System Information:
Debian Release: 6.0.6
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable'), (200, 'testing'), (100, 'unstable')
Architecture: amd64 (x86_64)
Kernel: Linux 3.2.0-0.bpo.3-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages herold depends on:
ii antlr3 3.2-5 language tool for constructing rec
ii libcommons-codec-java 1.4-2 encoder and decoders such as Base6
ii libcommons-jxpath-jav 1.3-3 manipulate javabean using XPath sy
ii libcommons-logging-ja 1.1.1-8 commmon wrapper interface for seve
ii libxml-commons-resolv 1.2-7~bpo60+1 XML entity and URI resolver librar
ii libxmlgraphics-common 1.4.dfsg-4~bpo60+1 reusable components used by Batik
herold recommends no packages.
herold suggests no packages.
-- debconf-show failed