Bug#856123: pandoc: Convert markdown to bad docbook for non-ascii titles

Petter Reinholdtsen pere at hungry.com
Sat Feb 25 11:19:14 UTC 2017


Package: pandoc
Version: 1.12.4.2~dfsg-1+b14

This issue is also present with pandoc version 1.17.2~dfsg-3 in
unstable.

When converting a Norwegian text file from markdown to docbook with
pandoc, the resulting docbook file contains section IDs that dblatex
rejects whenever the section title has non-ascii characters.

This script demonstrates the problem:

=============================================================================
#!/bin/sh
# Script demonstrating invalid docbook output from pandoc.

cat << EOF > dblatex-nonascii-title-book.xml
<?xml version='1.0' encoding='UTF-8' ?>
<?xml-stylesheet href="docbook-css-0.4/driver.css" type="text/css"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
 "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" []>
<book id="index" lang="nb">
<bookinfo>
  <title>demonstrate problem in title</title>
</bookinfo>
<xi:include href="dblatex-nonascii-title-first-level.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
</book>
EOF

cat << EOF > dblatex-nonascii-title-first-level.xml
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
 "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<chapter>
<title>chapter title</title>
<xi:include href="dblatex-nonascii-title-second-level.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
</chapter>
EOF

cat << EOF > dblatex-nonascii-title-second-level.md
The title with æøå  ÆØÅ
========================
For some reason, this does not work.

subtitle with æøå ÆØÅ
---------------------
This does not work either.
EOF
pandoc -f markdown -t docbook dblatex-nonascii-title-second-level.md \
       --output dblatex-nonascii-title-second-level.xml
# This results in <sect1 id="the-title-with-æøå-æøå">, which is an
# invalid docbook ID according to dblatex.
dblatex dblatex-nonascii-title-book.xml
=============================================================================
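
To confirm that the generated file really carries non-ascii IDs before
handing it to dblatex, a check along these lines may help.  This is only a
sketch: the helper name has_nonascii_id is made up for illustration, and it
assumes a grep that compares raw bytes when LC_ALL=C is set.

```shell
#!/bin/sh
# Hypothetical helper (not part of pandoc or dblatex): succeed if any
# id="..." attribute in the given file contains a byte outside the
# printable ASCII range.
has_nonascii_id() {
    # With LC_ALL=C grep works on bytes, so every byte of a UTF-8 encoded
    # æ, ø or å falls outside the range from space to tilde.
    LC_ALL=C grep -q 'id="[^"]*[^ -~][^"]*"' "$1"
}

f=dblatex-nonascii-title-second-level.xml
if [ -f "$f" ] && has_nonascii_id "$f"; then
    echo "non-ascii section ID found in $f"
fi
```

The helper only inspects id attributes, so non-ascii characters in the
title text itself are not flagged.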

It would be nice if pandoc created IDs with only ascii characters.
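
Until then, a possible workaround is to give each header an explicit
ascii identifier in the markdown source.  The sketch below assumes that
pandoc's header_attributes markdown extension is enabled and that the
docbook writer uses the explicit identifier as the section id; the
identifiers the-title and the-subtitle are made up for illustration.

```shell
#!/bin/sh
# Workaround sketch: explicit header identifiers instead of generated ones.
# The {#...} syntax relies on pandoc's header_attributes extension
# (assumed to be enabled in the pandoc version in use).
cat << 'EOF' > dblatex-nonascii-title-second-level.md
The title with æøå ÆØÅ {#the-title}
===================================
Body text of the section.

subtitle with æøå ÆØÅ {#the-subtitle}
-------------------------------------
Body text of the subsection.
EOF

# Convert only when pandoc is installed, so the script degrades gracefully.
if command -v pandoc >/dev/null 2>&1; then
    pandoc -f markdown -t docbook dblatex-nonascii-title-second-level.md \
           --output dblatex-nonascii-title-second-level.xml
fi
```

The pandoc manual also documents an ascii_identifiers extension
(-f markdown+ascii_identifiers).  I have not checked whether the packaged
versions mentioned above support it, or how it treats æ and ø.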

If you believe this is a problem with dblatex instead, please reassign
it there.  I've tested using dblatex 0.3.5-2 and 0.3.9-1.

-- 
Happy hacking
Petter Reinholdtsen


