Josh Stone, Blog

Josh’s projects and security nerdery

Parsing With Parsec

I’ve been dabbling with Haskell again, one of my more favorite languages. Perhaps someday it will eclipse even Common Lisp. One of the more interesting ideas I have run across lately is a comment made in the recent O’Reilly book, Real World Haskell, about parsing:

In many popular languages, people tend to put regular expressions to work for “casual” parsing. They’re notoriously tricky for this purpose … If we can write compact Parsec parsers, we’ll gain in readability, expressiveness, and error reporting.

This idea intrigued me, because in some of my projects I’ve encountered the difficulties that come with using regular expressions to simulate parsing. Exceptions are hard to trap, code is always tightly bound, and it’s very difficult to maintain. I had skipped the parsing chapter in the book for some time, but I finally decided to grit through. Parsing in Haskell had looked scary before, but it doesn’t seem quite so impenetrable to me now.

Not content with just going through the examples in the chapter, I decided to try a few of my own. I can really see how, once familiar with the Parsec library, I would actually write simple parsers for all kinds of text processing. While previously I had thought Haskell lacked in some of the basic text processing facilities (such as having no useful “split” function), I realize now that one could easily use Parsec instead.

For example (and possibly only for my own recollection and posterity), here is my attempt at an S-Expression parser (the foundational syntax for the Lisp languages). It does not parse a particular language, just the expressions themselves. It is also not a great Parsec example, since I have not fully grasped the “applicative functor” style, which would probably “tersen” things up even more.

data SNode = List | Value String
           deriving (Show, Eq)

retNode v cs = return (Node v cs)

sexp :: GenParser Char st (Tree SNode)
sexp     = list <|> literal

literal   = Value `liftM` parseLit >>= flip retNode []
    where parseLit = many (noneOf " ()\n\t")

list = inParens contents >>= retNode List
    where inParens = between (char '(') (char ')')
          contents = sepBy sexp (space >> spaces)

parseSEXP = parse sexp "S-EXP"

Hopefully it is easy to see how each component of the “S-Expression Grammar” is broken out as the behavior of a function (or two) in the parser. The result is an actual parse tree, using the Data.Tree module. Here is a sample S-Expression for parsing (the factorial function in Common Lisp):

(defun fact (n)
  (if (zerop n)
    (* n (fact (1- n)))))

Running this through an interactive Haskell session like so, produces the parse tree shown in the image above:

$ ghci
GHCi, version 6.10.2:  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer ... linking ... done.
Loading package base ... linking ... done.
Prelude> :l Sexp.hs
[1 of 1] Compiling Sexp             ( Sexp.hs, interpreted )
Ok, modules loaded: Sexp.
*Sexp> parseSEXP sample

I am looking forward to integrating my new parser skills with coding projects in the future!