Programming Language Design

Parser Combinators and Lisp


Parser combinators — Higher-order parsers

So far, on this course, we’ve implemented concrete (first-order) parser functions of type StringParseOption(𝑎). Similarly to functions, parsers can also be passed around to other parsers (functions). These “higher-order parsers” are referred to as parser combinators. Parser combinators are not parsers themselves, but instead return a parser.

What is a parser?

So far, we’ve defined a type alias for the very commonly used ParseOption type:

pub type ParseOption(a) = Option(#(String, a))

What about creating a similar alias for a Parser, a function that takes in a String and returns a ParseOption(a):

pub type Parser(a) = fn(String) -> ParseOption(a)

Now that we know what a parser is, let’s try creating one. Consider the simple parser generator tag:StringParser(String). tag takes in a String literal and returns a parser that parses that string literal.

let hello_parser = tag("hello")

hello_parser("hello world") // Some(#(" world", "hello"))

An implementation of tag could utilize the strip_prefix function soon to be added to the Gleam standard library1:

pub fn tag(lit: String) -> Parser(String) {
    fn(s) {
        case strip_prefix(s, lit) {
            Some(rest) -> Some(#(rest, lit))
            None -> None
        }
    }
}

tag could be referred to as a parser generator, how about parser combinators?.

Combining parsers

One of the simplest parser combinators is the and:Parser(𝑎)Parser(𝑏)Parser((𝑎,𝑏)) parser. It takes in two parsers, and applies them in sequence returning their outputs in a tuple if both of them succeed.

let paren_parser = and(tag("("), tag(")"))
paren_parser("()abc") // Some(#("abc", #("(", ")")))
paren_parser("(abc") // None
paren_parser(")abc") // None

The and parser can be implemented in Gleam using the use syntax:

pub fn and(p1: Parser(a), p2: Parser(b)) -> Parser(#(a, b)) {
  fn(s) {
    use #(rest, a) <- option.then(p1(s))
    use #(rest, b) <- option.then(p2(rest))
    Some(#(rest, #(a, b)))
  }
}

As one might expect, there’s also an or parser, namely or:Parser(𝑎)Parser(𝑎)Parser(𝑎). The or parser first tries to apply the first parser and if that fail, it tries to apply the second parser.

Notice that the results of both of the parsers passed to or must be of the same type.

let a_or_b = or(tag("a"), tag("b"))
a_or_b("a") // Some(#("", "a"))
a_or_b("b") // Some(#("", "b"))
a_or_b("c") // None

An implementation of the or parser in Gleam could look something like this:

pub fn or(p1: Parser(a), p2: Parser(a)) -> Parser(a) {
  fn(s) {
    case p1(s) {
      Some(#(rest, val)) -> Some(#(rest, val))
      None -> p2(s)
    }
  }
}

Lisp — a programming language for lists

Lisp is one of the most influential programming languages. It is a functional programming language developed in the late ’50s at MIT.

A Lisp program consists of s-expressions, parenthesized lists. A hello world program in the common lisp dialect looks like this:

(print "Hello, World!")

The above program could be represented with the following AST:

Expr([Identifier("print"),String("Hello, World!")])

A Lisp expression is simply a list of expressions wrapped in parenthesis.

Due to its recognizable and simple syntax, Lisp is notoriously simple to parse, at least compared to other languages. In the exercise of this chapter you’ll be developing your very own Lisp parser.

Parsing lists — Applying the same parser multiple times

Parsing Lisp (and other languages) involves parsing a similar pattern multiple times. This is where parser combinators such as many:Parser(𝑎)Parser([𝑎]) and even more involved combinators like separated:Parser(𝑎)Parser(𝑏)Parser([𝑎]) come in handy.

The many parser applies the parser passed to as many times as possible and returns a list of the results.

let many_a = many(tag("a"))
many_a("aaaab") // Some(#("b", ["a", "a", "a", "a"]))
many_a("b") // Some(#("b", []))

Similarly, the separated parser alternates between applying the two parsers passed to it.

let csv = separated(tag("v"), tag(","))
csv("v,v,va") // Some(#("a"), ["v", "v", "v"])
csv("v,v,") // Some(#(","), ["v", "v"])

Both many and separated always succeed on any input, but there are also stricter versions, usually called many1 and separated1 respectively, that never return an empty list of elements.

There are probably hundreds of parser combinators out there, most of which are not very useful on this course. Many languages, Gleam included, have their own implementations of parser combinator libraries.

LanguageParser combinator librariesHaskellParsec, …Gleamparty, chomp, nibble, atto, pickle, …RustNom, combine, pom, …ScalaParsley, …
Loading Exercise...

Footnotes

  1. https://github.com/gleam-lang/stdlib/pull/780