[xml/sgml-pkgs] Bug#676717: dh_installcatalogs transition and w3c-dtd-xhtml removal bugs

Jakub Wilk jwilk at debian.org
Tue Jun 26 12:59:52 UTC 2012


* Jakub Wilk <jwilk at debian.org>, 2012-06-26, 08:31:
>We should implement here a real TR9401 parser. This shouldn't be very 
>difficult. I'll try to write such a parser today.

As promised, attached.

I'll leave integrating this with update-catalog as an exercise to
reader^WHelmut. :)
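
For the integration step, here is a minimal sketch of the same loop reworked to return the CATALOG arguments as a list instead of printing them, on the assumption that a caller would rather have a list than parse stdout. The name catalog_entries and the sample catalog are made up for illustration; nothing here reflects how update-catalog actually works.

```perl
use strict;
use warnings;

# Same tokenizer idea as the attached script: a run of whitespace and
# comments, a quoted literal, or a run of other non-space characters.
my $token_re = qr{
    ( (?: \s+ | -- .*? -- )+   # whitespace and comments
    | ' .*? ' | " .*? "        # quoted literal
    | \S+                      # any other token
    )
}sx;

# Hypothetical helper: return the argument of every CATALOG entry
# found in the given string.
sub catalog_entries {
    my ($contents) = @_;
    my @entries;
    my $in_catalog = 0;
    while ($contents =~ m/$token_re/g) {
        my $token = $1;
        if ($in_catalog) {
            next if $token =~ m/^\s|^--/;      # skip whitespace and comments
            $token =~ s/^(['"])(.*)\1$/$2/s;    # strip surrounding quotes
            push @entries, $token;
            $in_catalog = 0;
        } elsif (lc $token eq 'catalog') {
            $in_catalog = 1;
        }
    }
    return @entries;
}

# Made-up sample input, not taken from any real package.
my $sample = <<'EOF';
-- made-up sample catalog --
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "xhtml1-strict.dtd"
CATALOG "/usr/share/sgml/example/catalog"
catalog -- lower case works too -- other.cat
EOF

print "$_\n" for catalog_entries($sample);
```

For the made-up sample this prints the two CATALOG arguments, /usr/share/sgml/example/catalog and other.cat, and ignores the PUBLIC entry.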

-- 
Jakub Wilk
-------------- next part --------------
#!/usr/bin/perl

use strict;
use warnings;

# Reference: https://www.oasis-open.org/specs/a401.htm

my $catalog_tokens = qr{
( (?: \s+ | -- .*? --)+ # whitespace and comments
| ' .*? ' | " .*? " # literal
| \S+ # other tokens
)
}sx;

sub parse_catalog {
    my ($filename) = @_;
    open my $fh, '<', $filename or die "Cannot open $filename: $!\n";
    local $/;
    my $contents = <$fh>;
    my $in_catalog = 0;
    while ($contents =~ m/$catalog_tokens/g) {
        my $token = $1;
        if ($in_catalog) {
            next if $token =~ m/^\s|^--/;
            $token =~ s/^(['"])(.*)\1$/$2/s; # strip quotes, also from multi-line literals
            print "$token\n";
            $in_catalog = 0;
        } elsif ("\L$token" eq 'catalog') {
            $in_catalog = 1;
        }
    }
    close $fh;
}

parse_catalog($_) for @ARGV;

# vim:ts=4 sw=4 et

