Parser Combinators and Lisp
Parser combinators — Higher-order parsers
So far, on this course, we’ve implemented concrete (first-order) parser functions of type . Similarly to functions, parsers can also be passed around to other parsers (functions). These “higher-order parsers” are referred to as parser combinators. Parser combinators are not parsers themselves, but instead return a parser.
What is a parser?
So far, we’ve defined a type alias for the very commonly used ParseOption
type:
pub type ParseOption(a) = Option(#(String, a))
What about creating a similar alias for a Parser
, a function that takes in a String
and returns a ParseOption(a)
:
pub type Parser(a) = fn(String) -> ParseOption(a)
Now that we know what a parser is, let’s try creating one.
Consider the simple parser generator .
tag
takes in a String
literal and returns a parser that parses that string literal.
let hello_parser = tag("hello")
hello_parser("hello world") // Some(#(" world", "hello"))
An implementation of tag
could utilize the strip_prefix
function soon to be added to the Gleam standard library1:
pub fn tag(lit: String) -> Parser(String) {
fn(s) {
case strip_prefix(s, lit) {
Some(rest) -> Some(#(rest, lit))
None -> None
}
}
}
tag
could be referred to as a parser generator, how about parser combinators?.
Combining parsers
One of the simplest parser combinators is the parser. It takes in two parsers, and applies them in sequence returning their outputs in a tuple if both of them succeed.
let paren_parser = and(tag("("), tag(")"))
paren_parser("()abc") // Some(#("abc", #("(", ")")))
paren_parser("(abc") // None
paren_parser(")abc") // None
The and
parser can be implemented in Gleam using the use
syntax:
pub fn and(p1: Parser(a), p2: Parser(b)) -> Parser(#(a, b)) {
fn(s) {
use #(rest, a) <- option.then(p1(s))
use #(rest, b) <- option.then(p2(rest))
Some(#(rest, #(a, b)))
}
}
As one might expect, there’s also an or
parser, namely .
The or
parser first tries to apply the first parser and if that fail, it tries to apply the second parser.
Notice that the results of both of the parsers passed to or must be of the same type.
let a_or_b = or(tag("a"), tag("b"))
a_or_b("a") // Some(#("", "a"))
a_or_b("b") // Some(#("", "b"))
a_or_b("c") // None
An implementation of the or
parser in Gleam could look something like this:
pub fn or(p1: Parser(a), p2: Parser(a)) -> Parser(a) {
fn(s) {
case p1(s) {
Some(#(rest, val)) -> Some(#(rest, val))
None -> p2(s)
}
}
}
Lisp — a programming language for lists
Lisp is one of the most influential programming languages. It is a functional programming language developed in the late ’50s at MIT.
A Lisp program consists of s-expressions, parenthesized lists. A hello world program in the common lisp dialect looks like this:
(print "Hello, World!")
The above program could be represented with the following AST:
A Lisp expression is simply a list of expressions wrapped in parenthesis.
Due to its recognizable and simple syntax, Lisp is notoriously simple to parse, at least compared to other languages. In the exercise of this chapter you’ll be developing your very own Lisp parser.
Parsing lists — Applying the same parser multiple times
Parsing Lisp (and other languages) involves parsing a similar pattern multiple times. This is where parser combinators such as and even more involved combinators like come in handy.
The many
parser applies the parser passed to as many times as possible and returns a list of the results.
let many_a = many(tag("a"))
many_a("aaaab") // Some(#("b", ["a", "a", "a", "a"]))
many_a("b") // Some(#("b", []))
Similarly, the separated
parser alternates between applying the two parsers passed to it.
let csv = separated(tag("v"), tag(","))
csv("v,v,va") // Some(#("a"), ["v", "v", "v"])
csv("v,v,") // Some(#(","), ["v", "v"])
Both many
and separated
always succeed on any input, but there are also stricter versions, usually called many1
and separated1
respectively,
that never return an empty list of elements.
There are probably hundreds of parser combinators out there, most of which are not very useful on this course. Many languages, Gleam included, have their own implementations of parser combinator libraries.