[Pkg-haskell-commits] [SCM] haskell-testpack branch, master, updated. debian/1.0.2-1-4-gb0d6b36

John Goerzen jgoerzen at complete.org
Fri Apr 23 14:54:34 UTC 2010


The following commit has been merged in the master branch:
commit a0eb2f4410db027a06c0e77312b4d788159ddcc4
Author: John Goerzen <jgoerzen at complete.org>
Date:   Sat Jan 29 00:29:47 2005 +0100

    Added Pesco's regexp library; updated copyright; fixed Printf.hs to work with cpphs
    
    Keywords:
    
    
    (jgoerzen at complete.org--projects/missingh--head--0.7--patch-180)

diff --git a/COPYRIGHT b/COPYRIGHT
index b9c5c56..07cf2b0 100644
--- a/COPYRIGHT
+++ b/COPYRIGHT
@@ -153,5 +153,19 @@ University wherever it may occur in that file.
 The code was obtained from
 http://urchin.earth.li/darcs/ian/inflate/Inflate.lhs
 
+----------------------------------------------------
+MissingH.Regex.Pesco was written by Sven Moritz Hallberg,
+<pesco@gmx.de>, on December 6, 2004.  It was pulled from the darcs
+repository at http://www.scannedinavian.org/~pesco/code/Regex/ as of
+January 28, 2005.
+
+Sven's license is:
+
+TO THOSE CONCERNED ABOUT THE LAW:
+
+         All of this is mine! Take it and your soul shall be damned forever.
+         But you can keep it and do with it whatever you want.
+           -- SMH, Creator, Programmer.
+
 ------------------------------------------------------------------
 arch-tag: copyright statement
diff --git a/ChangeLog b/ChangeLog
index f1cf989..4ac9789 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -2,6 +2,27 @@
 # arch-tag: automatic-ChangeLog--jgoerzen at complete.org--projects/missingh--head--0.7
 #
 
+2005-01-28 17:29:47 GMT	John Goerzen <jgoerzen at complete.org>	patch-180
+
+    Summary:
+      Added Pesco's regexp library; updated copyright; fixed Printf.hs to work with cpphs
+    Revision:
+      missingh--head--0.7--patch-180
+
+
+    new files:
+     libsrc/MissingH/Regex/.arch-ids/=id
+     libsrc/MissingH/Regex/.arch-ids/Pesco.lhs.id
+     libsrc/MissingH/Regex/Pesco.lhs
+
+    modified files:
+     COPYRIGHT ChangeLog Makefile MissingH.cabal
+     libsrc/MissingH/Printf.hs
+
+    new directories:
+     libsrc/MissingH/Regex libsrc/MissingH/Regex/.arch-ids
+
+
 2005-01-27 18:44:49 GMT	John Goerzen <jgoerzen at complete.org>	patch-179
 
     Summary:
diff --git a/Makefile b/Makefile
index 7f6c6ed..8c50385 100644
--- a/Makefile
+++ b/Makefile
@@ -1,5 +1,5 @@
 # arch-tag: Main Makefile
-# Copyright (C) 2004 John Goerzen <jgoerzen at complete.org>
+# Copyright (C) 2004 - 2005 John Goerzen <jgoerzen at complete.org>
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
diff --git a/MissingH.cabal b/MissingH.cabal
index b587099..ee55654 100644
--- a/MissingH.cabal
+++ b/MissingH.cabal
@@ -13,6 +13,7 @@ Exposed-Modules: MissingH.IO, MissingH.IO.Binary, MissingH.List,
   MissingH.Email.Sendmail,
   MissingH.Hsemail.Rfc2234, MissingH.Hsemail.Rfc2821, 
     MissingH.Hsemail.Rfc2822,
+  MissingH.Regex.Pesco,
   MissingH.Str,
   MissingH.Cmd,
   MissingH.FiniteMap, MissingH.Path, MissingH.Path.NameManip,
diff --git a/libsrc/MissingH/Printf.hs b/libsrc/MissingH/Printf.hs
index ef9cfe4..dc41fe2 100644
--- a/libsrc/MissingH/Printf.hs
+++ b/libsrc/MissingH/Printf.hs
@@ -277,7 +277,6 @@ pio = id
 Welcome to the Haskell printf support.  This module is designed to emulate the
 C printf(3) family of functions.  Here are some quick introductory examples:
 
-#examples#
 
 >vsprintf "Hello"
 >> "Hello"
@@ -360,8 +359,6 @@ in an association list or FiniteMap passed in.  Python programmers will
 find this very similar to Python's @%@ operator, which can look up inside
 dicts.
 
-#alexample#
-
 Here's an example:
 
 >import MissingH.Printf
@@ -401,7 +398,7 @@ These make less sense in Haskell.
 Please be aware of the following as you use this module:
 
 If the type system cannot determine the type of an argument (as in the
-numeric literals in the examples at "MissingH.Printf#examples"), you may have to explicitly cast it to something.
+numeric literals in the examples in the introduction), you may have to explicitly cast it to something.
 In practice, this is only a problem in interactive situations like ghci or
 hugs.
 
@@ -497,8 +494,7 @@ When applied to the same example file as before, the output will be:
 >3          0000000015 000000000F
 >4          0000000023 0000000017
 
-There's a full association list example at
-"MissingH.Printf#alexample".
+There's a full association list example elsewhere in this document.
 
 -}
 
diff --git a/libsrc/MissingH/Regex/Pesco.lhs b/libsrc/MissingH/Regex/Pesco.lhs
new file mode 100644
index 0000000..4e6cdd0
--- /dev/null
+++ b/libsrc/MissingH/Regex/Pesco.lhs
@@ -0,0 +1,356 @@
+\documentclass[a4paper]{article}
+%include lhs2TeX.fmt
+%include lhs2TeX.sty
+%include pescofmt.fmt
+
+
+\title   {  module Pesco.Regex
+            \\
+            \large{--- Regular expression matching ``better than Perl'' ---}
+         }
+\author  {  Sven Moritz Hallberg \texttt{<pesco@@gmx.de>}  }
+\date    {  December 6th, 2004  }
+
+%% REVISION HISTORY %%
+%
+% 0.2: December 6th, 2004  SMH
+%   More documentation, code cleanup, export list.
+% 0.1: November 30th, 2004  SMH
+%   Initial version.
+
+
+\begin{document}
+\maketitle
+
+\begin{abstract}
+This document is a literate Haskell module.
+It wraps |Text.Regex|. It exposes functions for compiling,
+matching, and substitution.
+The functions are overloaded on the
+type of thing to match against, so strings or compiled regexes
+can be passed interchangeably wherever a regular expression
+is expected. The substitution operator is a polyvariadic function taking
+any combination of replacement strings and submatch references (|Int|s)
+as arguments, thus
+avoiding errors from parsing or constructing a replacement string with
+escape characters.
+\end{abstract}
+
+
+\begin{code}
+{-# OPTIONS -fglasgow-exts #-}
+{- | Documentation for this module can be found in the doc directory
+in the MissingH distribution. -}
+
+module MissingH.Regex.Pesco
+    (  Regex (match)       -- type class
+    ,  Match (..)          -- data type
+    ,  Subst               -- type class
+
+    ,  (=~),   (~=)
+    ,  ($~),   (~$)
+    ,  (//~),  (~//)
+    ,  (/~),   (~/)
+
+    ,  CRegex              -- data type
+    ,  Rexopt (..)         -- data type
+    ,  cregex
+    ,  subst
+    ,  subst1
+
+    ,  test                -- to be removed
+    )
+    where
+\end{code}
+\scriptsize
+\begin{code}
+import qualified Text.Regex as TR
+import Data.Maybe (isJust)
+import Data.List (unfoldr)
+\end{code}
+\normalsize
+
+\pagebreak
+
+
+% ========================================================================
+\section*{Motivation}
+When asked the inevitable\footnote{``Does it support regexes?''}
+by a Perl programmer, what do we answer?
+
+\begin{quote}
+Of course it does, it uses the POSIX regex library,
+just import |Text.Regex|, and have a look at
+|mkRegex| and |matchRegex|\ldots
+\end{quote}
+which to the Perl programmer must sound like ``Basically, it works as in C''.
+Therefore I'd like to answer instead
+\begin{quote}
+Basically, it works just as in Perl.
+\end{quote}
+followed by appropriate mumbling about strong typing and syntax aesthetics.
+
+Well, of course Haskell neither can nor should absolutely resemble Perl.
+I've tried to catch the essence that makes the use of regular expressions
+so easy in Perl while still doing so
+in what a prototypical Haskell programmer could consider ``the right way''.
+
+
+% ========================================================================
+\section*{Overview}
+
+Motivated by the above, I export operators for the common regex operations:
+
+\begin{description}
+\item[|s =~ r|] tests whether string |s| matches the regular expression |r|.
+\savecolumns
+\begin{code}
+(=~)   :: (Regex rho) => String -> rho -> Bool
+\end{code}
+Notice the type class |Regex|. It alleviates the need to explicitly
+``compile'' or ``make'' regexes. You can pass compiled expressions or
+plain strings anywhere a |Regex| is expected.
+
+\item[|s $~ r|] applies regex |r| to the string |s|, yielding the list
+of all matches.
+\restorecolumns
+\begin{code}
+($~)   :: (Regex rho) => String -> rho -> [Match]
+\end{code}
+The |Match| data type will be defined shortly. It's a record telling
+which substring of |s| matched, as well as any subexpression matches.
+
+\item[|(s //~ r) p |\ldots] replaces any match of |r| in |s| with
+pattern |p |\ldots.
+\restorecolumns
+\begin{code}
+(//~)  :: (Regex rho, Subst pi) => String -> rho -> pi
+\end{code}
+Notice the type class |Subst|. This operator takes a variable number of
+arguments of possibly different types. The mechanism will become clear
+when class |Subst| is defined. The effect, anyway, is that |p |\ldots
+in the above can be an arbitrary sequence of |String| or |Int| arguments.
+The |Int|s represent submatch references, so for example,
+\restorecolumns
+\begin{code}
+test = ("Hello, World!" //~ "W(o)rld") "Hell" (1::Int) :: String
+\end{code}
+yields |"Hello, Hello!"|.
+
+\item[|(s /~ r) p |\ldots] is like |//~| but replaces only the first
+match.
+\restorecolumns
+\begin{code}
+(/~)   :: (Regex rho, Subst pi) => String -> rho -> pi
+\end{code}
+\end{description}
+
+In addition to the above, each operator has a ``flipped'' sibling, the
+rule being that ``the pattern goes on the same side as the
+tilde\footnote{In plain text code, |=~| is written as @=~@ and |~=| as @~=@,
+so |=~| is the one taking the pattern on the right.} (@~@)''.
+\begin{code}
+(~=)   :: (Regex rho) => rho -> String -> Bool
+(~$)   :: (Regex rho) => rho -> String -> [Match]
+(~//)  :: (Regex rho, Subst pi) => rho -> String -> pi
+(~/)   :: (Regex rho, Subst pi) => rho -> String -> pi
+\end{code}
+
+All exported operators are non-associative and bind with priority 4. That
+makes them bind looser than |++| and |:|, similar to |==|.
+\begin{code}
+infix 4 =~, ~=, $~, ~$, ~//, //~, ~/, /~
+\end{code}
+
+All operators are based on the fundamental pattern matching operation
+|match|, which is the single method of class |Regex|:
+\begin{code}
+class Regex rho where
+    match :: rho -> String -> Maybe Match
+\end{code}
+
+For the purpose of substitution, functions of a non-polyvariadic
+type are also provided.
+\begin{code}
+subst   :: (Regex rho) => rho -> [Repl] -> String -> String
+subst1  :: (Regex rho) => rho -> [Repl] -> String -> String
+\end{code}
+|subst| performs a global substitution while |subst1| only replaces the
+first match. Both take the replacement pattern as a list of |Repl|s,
+representing consecutive parts of the replacement pattern. Each |Repl|
+is either a literal replacement string or a submatch reference.
+\begin{code}
+data Repl  =  Repl_lit  String
+           |  Repl_ref  Int
+\end{code}
+
+Finally, the |Match| data type is a record containing
+\begin{enumerate}
+\item the substring preceding the match (|m_before|),
+\item the matching substring itself (|m_match|),
+\item the rest of the string after the match (|m_after|), and
+\item the list of strings matching the regex's subexpressions
+(|m_submatches|).
+\end{enumerate}
+\begin{code}
+data Match  = Match  {  m_before      :: String
+                     ,  m_match       :: String
+                     ,  m_after       :: String
+                     ,  m_submatches  :: [String]
+                     }
+            deriving (Eq, Show, Read)
+\end{code}
+Note that the list of subexpression matches does \emph{not} include the
+match itself, so for example, 
+|m_submatches (head ("Foo" $~ "F(o)"))| is |["o"]|, not |["Fo", "o"]|.
+
+
+% ========================================================================
+\section*{Matching}
+Compiled regular expressions are represented by the abstract data type
+|CRegex|, which wraps |Regex| from |Text.Regex|.
+\begin{code}
+newtype CRegex = CRegex TR.Regex
+\end{code}
+They are created from regular expression strings by the function |cregex|,
+which can take options:
+\begin{code}
+data Rexopt = Nocase | Linematch deriving (Eq,Show,Read)
+\end{code}
+|Nocase| makes the matching case-insensitive. |Linematch| results in
+|'^'| and |'$'| matching start and end of lines instead of the whole
+string, and |'.'| not matching the newline character.
+By default, matches are case-sensitive and  |'^'| and |'$'| refer
+to the whole string.
+
+\begin{code}
+cregex :: [Rexopt] -> String -> CRegex
+cregex os s = CRegex (TR.mkRegexWithOpts s lm cs)
+    where
+    lm  = elem Linematch os
+    cs  = not (elem Nocase os)
+\end{code}
+
+The matching operation is overloaded on the regex type. Matching of
+compiled regexes is performed by a helper |match_cregex|.
+If the regex is passed as a plain string
+it is compiled with default options
+before being passed to |match_cregex|.
+\begin{code}
+instance Regex CRegex where
+    match = match_cregex
+instance Regex String where
+    match = match_cregex . cregex []
+\end{code}
+
+The |match_cregex| function is a wrapper around
+|Text.Regex.matchRegexAll| whose only purpose is to unwrap the |CRegex|
+argument and to wrap the result in a |Match|.
+\begin{code}
+match_cregex :: CRegex -> String -> Maybe Match
+match_cregex (CRegex cr) str =
+    do
+    (b,m,a,s) <- TR.matchRegexAll cr str
+    return $ Match  {  m_before      = b
+                    ,  m_match       = m
+                    ,  m_after       = a
+                    ,  m_submatches  = s
+                    }
+\end{code}
+
+Now, the match testing operators are trivial to define.
+\begin{code}
+(~=) r = isJust . match r
+\end{code}
+
+I define |=~| in terms of |~=| and not the
+other way around, so that applying |(r ~=)| to several
+strings compiles |r| only once (when |r| is a string). The
+same note applies to all other operators as well.
+\begin{code}
+(=~)  = flip (~=)
+($~)  = flip (~$)
+\end{code}
+
+The |~$| operator must find all matches within the given string.
+That can be achieved by consecutively applying |match| to the
+|m_after| field of the previous match, if any. That's an instance
+of |unfoldr|.
+\begin{code}
+match_all :: (Regex rho) => rho -> String -> [Match]
+match_all r = unfoldr step
+    where
+    step :: String -> Maybe (Match, String)
+    step x = do  ma <- match r x
+                 return (ma, m_after ma)
+\end{code}
+This way, however, each match's |m_before| field reaches back only
+to the end of the previous match, so the list returned
+by |match_all| is only meaningful in its original order.
+For the operators, I expand the matches to span the entire
+string.
+\begin{code}
+(~$) r = expand_matches . match_all r
+\end{code}
+
+Let |m| be a match, as returned by |match_all|. If |m| is the first match in
+the list, it does not need to be
+expanded. Its expansion is the empty string |""|.
+If, on the other hand, |m| has a predecessor |p|, its expansion is the
+expansion of |p| followed by |m_before p ++ m_match p|. So the list of
+expansions for all matches is given by a running concatenation:
+\begin{code}
+expansions :: [Match] -> [String]
+expansions = scanl (\e p -> e ++ m_before p ++ m_match p) ""
+\end{code}
+That list contains one extraneous entry at the end, but that can
+be ignored because |expand_matches| is now a simple instance
+of |zipWith|\footnote{\textsc{Applause!}}.
+\begin{code}
+expand_matches :: [Match] -> [Match]
+expand_matches ms = zipWith expand ms (expansions ms)
+    where
+    expand m s = m { m_before = s ++ m_before m }
+\end{code}
+
+
+% ========================================================================
+\section*{Substitution}
+
+\begin{code}
+class Subst pi where
+    subst' :: String -> [Match] -> [Repl] -> pi
+
+instance Subst String where
+    subst' s ms rs = replace ms (reverse rs) s
+
+instance (Subst pi) => Subst (String -> pi) where
+    subst' s ms rs = \x -> subst' s ms (Repl_lit x : rs)
+instance (Subst pi) => Subst (Int -> pi) where
+    subst' s ms rs = \i -> subst' s ms (Repl_ref i : rs)
+
+replace :: [Match] -> [Repl] -> String -> String
+replace []      _   s  =  s
+replace (m:ms)  rs  _  =  (   m_before m
+                          ++  concatMap replstr rs
+                          ++  replace ms rs (m_after m)
+                          )
+    where
+    replstr r = case r of
+        Repl_lit x  ->  x
+        Repl_ref 0  ->  m_match m
+        Repl_ref i  ->  m_submatches m !! (i-1)
+
+
+subst   r = \rs s -> replace (match_all r s) rs s
+subst1  r = \rs s -> replace (take 1 (match_all r s)) rs s
+
+
+(~//) r  = \s -> subst' s (match_all r s) []
+(~/) r   = \s -> subst' s (take 1 (match_all r s)) []
+
+(//~)  = flip (~//)
+(/~)   = flip (~/)
+\end{code}
+
+\end{document}
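
Reviewer's sketch (not part of the commit): the Matching section of
Pesco.lhs builds the match list with unfoldr and then widens each
m_before so that it reaches back to the start of the input. The
standalone program below illustrates that idea without depending on
Text.Regex. It is only a sketch under stated assumptions: matchChar is a
hypothetical stand-in for the library's match and looks for a single
character rather than a regular expression, the Match record is cut down
to three fields, and the prefixes are accumulated with scanl so that the
third and later matches also see everything consumed before them.

```haskell
import Data.List (unfoldr)

-- Same shape as the Match record in Pesco.lhs, minus the submatches.
data Match = Match
    { m_before :: String
    , m_match  :: String
    , m_after  :: String
    } deriving (Eq, Show)

-- Hypothetical stand-in for 'match': finds the first occurrence of a
-- single character instead of running a regular expression.
matchChar :: Char -> String -> Maybe Match
matchChar c s = case break (== c) s of
    (b, x:a) -> Just (Match b [x] a)
    _        -> Nothing

-- Repeatedly match against the remainder left by the previous match.
matchAll :: Char -> String -> [Match]
matchAll c = unfoldr step
  where
    step x = do
        ma <- matchChar c x
        return (ma, m_after ma)

-- For each match, the part of the original string consumed by all
-- earlier matches. The list has one more entry than there are matches.
expansions :: [Match] -> [String]
expansions = scanl (\acc p -> acc ++ m_before p ++ m_match p) ""

-- Make every m_before reach back to the start of the original string.
expandMatches :: [Match] -> [Match]
expandMatches ms = zipWith expand ms (expansions ms)
  where
    expand m s = m { m_before = s ++ m_before m }

main :: IO ()
main = do
    let input = "foo boo"
        ms    = expandMatches (matchAll 'o' input)
    mapM_ (putStrLn . m_before) ms
    -- Invariant: before ++ match ++ after rebuilds the input.
    print (all (\m -> m_before m ++ m_match m ++ m_after m == input) ms)
```

The final line checks the invariant that expansion is meant to
establish: for every expanded match, m_before ++ m_match ++ m_after
reproduces the original input.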

-- 
haskell-testpack


