[Pkg-haskell-commits] [SCM] haskell-testpack branch, master, updated. debian/1.0.2-1-4-gb0d6b36
John Goerzen
jgoerzen at complete.org
Fri Apr 23 14:54:34 UTC 2010
The following commit has been merged in the master branch:
commit a0eb2f4410db027a06c0e77312b4d788159ddcc4
Author: John Goerzen <jgoerzen at complete.org>
Date: Sat Jan 29 00:29:47 2005 +0100
Added Pesco's regexp library; updated copyright; fixed Printf.hs to work with cpphs
(jgoerzen at complete.org--projects/missingh--head--0.7--patch-180)
index b9c5c56..07cf2b0 100644
@@ -153,5 +153,19 @@ University wherever it may occur in that file.
The code was obtained from
+MissingH.Regex.Pesco was written by Sven Moritz Hallberg,
+<pesco@@gmx.de>, on December 6, 2004. It was pulled from the darcs
+repository at http://www.scannedinavian.org/~pesco/code/Regex/ as of
+January 28, 2005.
+Sven's license is:
+ All of this is mine! Take it and your soul shall be damned forever.
+ But you can keep it and do with it whatever you want.
+ -- SMH, Creator, Programmer.
arch-tag: copyright statement
diff --git a/ChangeLog b/ChangeLog
index f1cf989..4ac9789 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -2,6 +2,27 @@
# arch-tag: automatic-ChangeLog--jgoerzen at complete.org--projects/missingh--head--0.7
+2005-01-28 17:29:47 GMT John Goerzen <jgoerzen at complete.org> patch-180
+ Summary:
+ Added Pesco's regexp library; updated copyright; fixed Printf.hs to work with cpphs
+ Revision:
+ missingh--head--0.7--patch-180
+ new files:
+ libsrc/MissingH/Regex/.arch-ids/=id
+ libsrc/MissingH/Regex/.arch-ids/Pesco.lhs.id
+ libsrc/MissingH/Regex/Pesco.lhs
+ modified files:
+ COPYRIGHT ChangeLog Makefile MissingH.cabal
+ libsrc/MissingH/Printf.hs
+ new directories:
+ libsrc/MissingH/Regex libsrc/MissingH/Regex/.arch-ids
2005-01-27 18:44:49 GMT John Goerzen <jgoerzen at complete.org> patch-179
diff --git a/Makefile b/Makefile
index 7f6c6ed..8c50385 100644
--- a/Makefile
+++ b/Makefile
@@ -1,5 +1,5 @@
# arch-tag: Main Makefile
-# Copyright (C) 2004 John Goerzen <jgoerzen at complete.org>
+# Copyright (C) 2004 - 2005 John Goerzen <jgoerzen at complete.org>
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
diff --git a/MissingH.cabal b/MissingH.cabal
index b587099..ee55654 100644
--- a/MissingH.cabal
+++ b/MissingH.cabal
@@ -13,6 +13,7 @@ Exposed-Modules: MissingH.IO, MissingH.IO.Binary, MissingH.List,
MissingH.Hsemail.Rfc2234, MissingH.Hsemail.Rfc2821,
+ MissingH.Regex.Pesco,
MissingH.FiniteMap, MissingH.Path, MissingH.Path.NameManip,
diff --git a/libsrc/MissingH/Printf.hs b/libsrc/MissingH/Printf.hs
index ef9cfe4..dc41fe2 100644
--- a/libsrc/MissingH/Printf.hs
+++ b/libsrc/MissingH/Printf.hs
@@ -277,7 +277,6 @@ pio = id
Welcome to the Haskell printf support. This module is designed to emulate the
C printf(3) family of functions. Here are some quick introductory examples:
>vsprintf "Hello"
>> "Hello"
@@ -360,8 +359,6 @@ in an association list or FiniteMap passed in. Python programmers will
find this very similar to Python's @%@ operator, which can look up inside
Here's an example:
>import MissingH.Printf
@@ -401,7 +398,7 @@ These make less sense in Haskell.
Please be aware of the following as you use this module:
If the type system cannot determine the type of an argument (as in the
-numeric literals in the examples at "MissingH.Printf#examples"), you may have to explicitly cast it to something.
+numeric literals in the examples in the introduction), you may have to explicitly cast it to something.
In practice, this is only a problem in interactive situations like ghci or
@@ -497,8 +494,7 @@ When applied to the same example file as before, the output will be:
>3 0000000015 000000000F
>4 0000000023 0000000017
-There's a full association list example at
+There's a full association list example elsewhere in this document.
diff --git a/libsrc/MissingH/Regex/Pesco.lhs b/libsrc/MissingH/Regex/Pesco.lhs
new file mode 100644
index 0000000..4e6cdd0
--- /dev/null
+++ b/libsrc/MissingH/Regex/Pesco.lhs
@@ -0,0 +1,356 @@
+%include lhs2TeX.fmt
+%include lhs2TeX.sty
+%include pescofmt.fmt
+\title { module Pesco.Regex
+ \\
+ \large{--- Regular expression matching ``better than Perl'' ---}
+ }
+\author { Sven Moritz Hallberg \texttt{<pesco@@gmx.de>} }
+\date { December 6th, 2004 }
+% 0.2: December 6th, 2004 SMH
+% More documentation, code cleanup, export list.
+% 0.1: November 30th, 2004 SMH
+% Initial version.
+This document is a literate Haskell module.
+It wraps |Text.Regex|. It exposes functions for compiling,
+matching, and substitution.
+The functions are overloaded on the
+type of thing to match against, so strings or compiled regexes
+can be passed interchangeably wherever a regular expression
+is expected. The substitution operator is a polyvariadic function taking
+any combination of replacement strings and submatch references (|Int|s)
+as arguments, thus
+avoiding errors from parsing or constructing a replacement string with
+escape characters.
+{-# OPTIONS -fglasgow-exts #-}
+{- | Documentation for this module can be found in the doc directory
+in the MissingH distribution. -}
+module MissingH.Regex.Pesco
+ ( Regex (match) -- type class
+ , Match (..) -- data type
+ , Subst -- type class
+ , (=~), (~=)
+ , ($~), (~$)
+ , (//~), (~//)
+ , (/~), (~/)
+ , CRegex -- data type
+ , Rexopt (..) -- data type
+ , cregex
+ , subst
+ , subst1
+ , test -- to be removed
+ )
+ where
+import qualified Text.Regex as TR
+import Data.Maybe (isJust)
+import Data.List (unfoldr)
+% ========================================================================
+When asked the inevitable\footnote{``Does it support regexes?''}
+by a Perl programmer, what do we answer?
+Of course it does, it uses the POSIX regex library,
+just import |Text.Regex|, and have a look at
+|mkRegex| and |matchRegex|\ldots
+which to the Perl programmer must sound like ``Basically, it works as in C''.
+Therefore I'd like to answer instead
+Basically, it works just as in Perl.
+followed by appropriate mumbling about strong typing and syntax aesthetics.
+Well, of course Haskell neither can nor should absolutely resemble Perl.
+I've tried to catch the essence that makes the use of regular expressions
+so easy in Perl while still doing so
+in what a prototypical Haskell programmer could consider ``the right way''.
+% ========================================================================
+Motivated by the above, I export operators for the common regex operations:
+\item[|s =~ r|] tests whether string |s| matches the regular expression |r|.
+(=~) :: (Regex rho) => String -> rho -> Bool
+Notice the type class |Regex|. It removes the need to explicitly
+``compile'' or ``make'' regexes. You can pass compiled expressions or
+plain strings anywhere a |Regex| is expected.
+\item[|s $~ r|] applies regex |r| to the string |s|, yielding the list
+of all matches.
+($~) :: (Regex rho) => String -> rho -> [Match]
+The |Match| data type will be defined shortly. It's a record telling
+which substring of |s| matched, as well as any subexpression matches.
+\item[|(s //~ r) p |\ldots] replaces any match of |r| in |s| with
+pattern |p |\ldots.
+(//~) :: (Regex rho, Subst pi) => String -> rho -> pi
+Notice the type class |Subst|. This operator takes a variable number of
+arguments of possibly different types. The mechanism will become clear
+when class |Subst| is defined. The effect, anyway, is that |p |\ldots
+in the above can be an arbitrary sequence of |String| or |Int| arguments.
+The |Int|s represent submatch references, so for example,
+test = ("Hello, World!" //~ "W(o)rld") "Hell" (1::Int) :: String
+yields |"Hello, Hello!"|.
+\item[|(s /~ r) p |\ldots] is like |//~| but replaces only the first match.
+(/~) :: (Regex rho, Subst pi) => String -> rho -> pi
+In addition to the above, each operator has a ``flipped'' sibling, the
+rule being that ``the pattern goes on the same side as the
+tilde\footnote{In plain text code, |=~| is written as @=~@ and |~=| as @~=@,
+so |=~| is the one taking the pattern on the right.} (@~@)''.
+(~=) :: (Regex rho) => rho -> String -> Bool
+(~$) :: (Regex rho) => rho -> String -> [Match]
+(~//) :: (Regex rho, Subst pi) => rho -> String -> pi
+(~/) :: (Regex rho, Subst pi) => rho -> String -> pi
+All exported operators are non-associative and bind with priority 4. That
+makes them bind looser than |++| and |:|, similar to |==|.
+infix 4 =~, ~=, $~, ~$, ~//, //~, ~/, /~
+All operators are based on the fundamental pattern matching operation
+|match|, which is the single method of class |Regex|:
+class Regex rho where
+ match :: rho -> String -> Maybe Match
+For the purpose of substitution, functions of a non-polyvariadic
+type are also provided.
+subst :: (Regex rho) => rho -> [Repl] -> String -> String
+subst1 :: (Regex rho) => rho -> [Repl] -> String -> String
+|subst| performs a global substitution while |subst1| only replaces the
+first match. Both take the replacement pattern as a list of |Repl|s,
+representing consecutive parts of the replacement pattern. Each |Repl|
+is either a literal replacement string or a submatch reference.
+data Repl = Repl_lit String
+ | Repl_ref Int
+Finally, the |Match| data type is a record containing
+\item the substring preceding the match (|m_before|),
+\item the matching substring itself (|m_match|),
+\item the rest of the string after the match (|m_after|), and
+\item the list of strings matching the regex's subexpressions (|m_submatches|).
+data Match = Match { m_before :: String
+ , m_match :: String
+ , m_after :: String
+ , m_submatches :: [String]
+ }
+ deriving (Eq, Show, Read)
+Note that the list of subexpression matches does \emph{not} include the
+match itself, so for example,
+|m_submatches (head ("Foo" $~ "F(o)"))| is |["o"]|, not |["Fo", "o"]|.
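The way a list of |Repl|s is expanded against one |Match| can be sketched as below. This is a minimal, self-contained illustration and not the module's code: the constructor and field names (`ReplLit`, `ReplRef`, `mBefore`, and so on) and the helpers `render` and `substOne` are hypothetical. The rule that reference 0 means the whole match follows the `replace` function later in this file. Returning an empty string for an out-of-range reference is an assumption made here for safety.

```haskell
-- One piece of a replacement pattern, mirroring Repl above.
data Repl = ReplLit String | ReplRef Int

-- Cut-down stand-in for the module's Match record (hypothetical field names).
data Match = Match
  { mBefore     :: String
  , mMatch      :: String
  , mAfter      :: String
  , mSubmatches :: [String]
  }

-- Expand a replacement pattern against a single match.
-- Reference 0 is the whole match; reference i >= 1 is the i-th submatch.
-- Assumption: an out-of-range reference expands to "" instead of failing.
render :: Match -> [Repl] -> String
render m = concatMap piece
  where
    piece (ReplLit s) = s
    piece (ReplRef 0) = mMatch m
    piece (ReplRef i) = case drop (i - 1) (mSubmatches m) of
      (x : _) | i >= 1 -> x
      _                -> ""

-- Splice the rendered replacement between the text around the match.
substOne :: Match -> [Repl] -> String
substOne m rs = mBefore m ++ render m rs ++ mAfter m
```

With a match of `"World"` in `"Hello, World!"` whose only submatch is `"o"`, the pattern `[ReplLit "Hell", ReplRef 1]` renders to `"Hello"`, which reproduces the `"Hello, Hello!"` result quoted earlier.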
+% ========================================================================
+Compiled regular expressions are represented by the abstract data type
+|CRegex|, which wraps |Regex| from |Text.Regex|.
+newtype CRegex = CRegex TR.Regex
+They are created from regular expression strings by the function |cregex|,
+which can take options:
+data Rexopt = Nocase | Linematch deriving (Eq,Show,Read)
+|Nocase| makes the matching case-insensitive. |Linematch| results in
+|'^'| and |'$'| matching start and end of lines instead of the whole
+string, and |'.'| not matching the newline character.
+By default, matches are case-sensitive and |'^'| and |'$'| refer
+to the whole string.
+cregex :: [Rexopt] -> String -> CRegex
+cregex os s = CRegex (TR.mkRegexWithOpts s lm cs)
+ where
+ lm = elem Linematch os
+ cs = not (elem Nocase os)
+The matching operation is overloaded on the regex type. Matching of
+compiled regexes is performed by a helper |match_cregex|.
+If the regex is passed as a plain string
+it is compiled with default options
+before being passed to |match_cregex|.
+instance Regex CRegex where
+ match = match_cregex
+instance Regex String where
+ match = match_cregex . cregex []
+The |match_cregex| function is a wrapper around
+|Text.Regex.matchRegexAll| whose only purpose is to unwrap the |CRegex|
+argument and to wrap the result in a |Match|.
+match_cregex :: CRegex -> String -> Maybe Match
+match_cregex (CRegex cr) str =
+ do
+ (b,m,a,s) <- TR.matchRegexAll cr str
+ return $ Match { m_before = b
+ , m_match = m
+ , m_after = a
+ , m_submatches = s
+ }
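For reference, the tuple that |match_cregex| repackages can be inspected by calling |Text.Regex| directly. The sketch below assumes the regex-compat package is installed. The names `wordRe` and `result` are illustrative only. The two flags are the ones `cregex []` would pass: no per-line matching, case-sensitive.

```haskell
import qualified Text.Regex as TR  -- provided by the regex-compat package

-- Compile with the flags that cregex [] would use:
-- single-line mode off, case sensitivity on.
wordRe :: TR.Regex
wordRe = TR.mkRegexWithOpts "W(o)rld" False True

-- matchRegexAll returns (before, match, after, submatches) on success.
result :: Maybe (String, String, String, [String])
result = TR.matchRegexAll wordRe "Hello, World!"
```

Here `result` should be `Just ("Hello, ", "World", "!", ["o"])`, whose four components map one to one onto the fields of |Match|.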
+Now, the match testing operators are trivial to define.
+(~=) r = isJust . match r
+I define |=~| in terms of |~=| and not the
+other way around, so that applying |(r ~=)| to several
+strings compiles |r| only once (when |r| is a string). The
+same note applies to all other operators as well.
+(=~) = flip (~=)
+($~) = flip (~$)
+The |~$| operator must find all matches within the given string.
+That can be achieved by consecutively applying |match| to the
+|m_after| field of the previous match, if any. That's an instance
+of |unfoldr|.
+match_all :: (Regex rho) => rho -> String -> [Match]
+match_all r = unfoldr step
+ where
+ step :: String -> Maybe (Match, String)
+ step x = do ma <- match r x
+ return (ma, m_after ma)
+This way, however, each match's |m_before| field only extends
+to the end of the previous match. The list returned
+by |match_all| is only meaningful in its original order.
+For the operators, I expand the matches to span the entire string.
+(~$) r = expand_matches . match_all r
+Let |m| be a match, as returned by |match_all|. If |m| is the first match in
+the list, it does not need to be
+expanded. Its expansion is the empty string |""|.
+If, on the other hand, |m| has a predecessor |p|, its expansion is
+|m_before p ++ m_match p|. So the list of expansions for all matches
+is given by:
+expansions :: [Match] -> [String]
+expansions ms = "" : map (\p -> m_before p ++ m_match p) ms
+That list contains one extraneous entry at the end, but that can
+be ignored because |expand_matches| is now a simple instance
+of |zipWith|\footnote{\textsc{Applause!}}.
+expand_matches :: [Match] -> [Match]
+expand_matches ms = zipWith expand ms (expansions ms)
+ where
+ expand m s = m { m_before = s ++ m_before m }
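The unfold-then-expand idea can be tried without a regex library. The sketch below is an illustration under stated assumptions and not the module's code: `findLit` is a hypothetical literal-substring matcher standing in for |match|, and the record `M` is a cut-down |Match|. The prefixes are accumulated with `scanl`, on the assumption that every match, including the third and later ones, should be rebased on the whole input. A list built only from each predecessor's own relative fields would carry the third match back no further than the second.

```haskell
import Data.List (isPrefixOf, unfoldr)

-- Simplified stand-in for the module's Match record (no submatches).
data M = M { before :: String, matched :: String, after :: String }
  deriving (Eq, Show)

-- Hypothetical matcher: first occurrence of a non-empty literal needle.
-- It stands in for a real regex match.
findLit :: String -> String -> Maybe M
findLit needle haystack
  | null needle = Nothing
  | otherwise   = go "" haystack
  where
    go acc rest
      | needle `isPrefixOf` rest =
          Just (M (reverse acc) needle (drop (length needle) rest))
      | otherwise = case rest of
          []       -> Nothing
          (c : cs) -> go (c : acc) cs

-- All matches, each one relative to the end of the previous match.
allRel :: String -> String -> [M]
allRel needle = unfoldr step
  where
    step s = do
      m <- findLit needle s
      return (m, after m)

-- Rebase each match on the whole string by prepending
-- everything consumed before it.
allAbs :: String -> String -> [M]
allAbs needle s = zipWith rebase ms prefixes
  where
    ms       = allRel needle s
    prefixes = scanl (\acc p -> acc ++ before p ++ matched p) "" ms
    rebase m pre = m { before = pre ++ before m }
```

For the needle `"ab"` in `"xabyabzab"`, `map before (allRel ...)` gives `["x", "y", "z"]`, while `map before (allAbs ...)` gives `["x", "xaby", "xabyabz"]`.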
+% ========================================================================
+class Subst pi where
+ subst' :: String -> [Match] -> [Repl] -> pi
+instance Subst String where
+ subst' s ms rs = replace ms (reverse rs) s
+instance (Subst pi) => Subst (String -> pi) where
+ subst' s ms rs = \x -> subst' s ms (Repl_lit x : rs)
+instance (Subst pi) => Subst (Int -> pi) where
+ subst' s ms rs = \i -> subst' s ms (Repl_ref i : rs)
+replace :: [Match] -> [Repl] -> String -> String
+replace [] _ s = s
+replace (m:ms) rs _ = ( m_before m
+ ++ concatMap replstr rs
+ ++ replace ms rs (m_after m)
+ )
+ where
+ replstr r = case r of
+ Repl_lit x -> x
+ Repl_ref 0 -> m_match m
+ Repl_ref i -> m_submatches m !! (i-1)
+subst r = \rs s -> replace (match_all r s) rs s
+subst1 r = \rs s -> replace (take 1 (match_all r s)) rs s
+(~//) r = \s -> subst' s (match_all r s) []
+(~/) r = \s -> subst' s (take 1 (match_all r s)) []
+(//~) = flip (~//)
+(/~) = flip (~/)
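The polyvariadic mechanism behind |Subst| can be isolated in a few lines. The following is a minimal sketch that uses only the base library. The names `Piece`, `Collect`, `collect`, and `pieces` are hypothetical and do not appear in the module. As in the instances above, each function instance consumes one argument and pushes it onto an accumulator, and the base instance reverses the accumulator once the result type is reached.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- One piece of a replacement pattern (plays the role of Repl).
data Piece = Lit String | Ref Int
  deriving (Eq, Show)

-- Hypothetical class: 'collect' carries the pieces seen so far,
-- most recent first.
class Collect r where
  collect :: [Piece] -> r

-- Base case: no more arguments, so return the pieces in call order.
instance Collect [Piece] where
  collect = reverse

-- A String argument becomes a literal piece.
instance Collect r => Collect (String -> r) where
  collect acc = \s -> collect (Lit s : acc)

-- An Int argument becomes a submatch reference.
instance Collect r => Collect (Int -> r) where
  collect acc = \i -> collect (Ref i : acc)

-- Entry point: the number and types of arguments are chosen
-- by the type the caller asks for.
pieces :: Collect r => r
pieces = collect []

example :: [Piece]
example = pieces "Hell" (1 :: Int) "!"
```

`example` evaluates to `[Lit "Hell", Ref 1, Lit "!"]`. The annotation `(1 :: Int)` is needed for the same reason as in the `test` definition earlier: a bare numeric literal does not pin down which instance to use.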
