Tokens into Expressions and Statements with a Parser
Learning Objectives
- You know how to define expressions and statements for a parser.
- You know how to create a parser that parses tokens into expressions and statements.
To parse the list of tokens into statements and expressions, we need to start defining the statements and expressions that our language supports. So far, we’ve been looking only at the following program.
10 LET A = 5
20 PRINT A, "Hello, BASIC!"
The above program has two statements, a LET
statement and a PRINT
statement. The LET
statement assigns the value 5
to the variable A
, while the PRINT
statement prints the value of the variable A
and the string "Hello, BASIC!"
. The LET
statement has an expression 5
, while the PRINT
statement has two expressions, A
and "Hello, BASIC!"
. All of the expressions are evaluated to produce a value.
Note that even values such as 5
above in source code are considered as expressions, as they can be evaluated to produce a value. This makes the distinction between statements and expressions clear; statements perform actions, while expressions are evaluated to produce values. Moreover, it makes parsing easier, as we do not have to make judgements about what is an expression and what is not.
Expressions
Let’s start by creating a file called expressions.dart
and place a class called Expression
in it. The Expression
class will be the base class for all expressions in our language, which will be extended by specific expression classes. The sealed modifier restricts the class hierarchy so that all subclasses of the class must be declared in the same file, while the <T>
is a generic type that will be used to specify the return type of the evaluate
method.
sealed class Expression<T> {
T evaluate(Map<String, num> variables);
}
The evaluate
method of an expression takes a map of variables and evaluates the expression. As an expression may produce a numeric value or a string, the return type of the is defined by the class that extends the Expression
class.
Next, we create a NumberLiteralExpression
class that represents a number literal, extending the Expression class and providing the generic type parameter as num
. The NumberLiteralExpression
class will have a field called value
that will store the value of the number literal. We also override the evaluate
method to return the value of the number literal.
sealed class Expression<T> {
T evaluate(Map<String, num> variables);
}
class NumberLiteralExpression extends Expression<num> {
final num value;
NumberLiteralExpression(this.value);
@override
num evaluate(Map<String, num> variables) {
return value;
}
}
A class for representing string literals — StringLiteralExpression
— is created the same way.
sealed class Expression<T> {
T evaluate(Map<String, num> variables);
}
class NumberLiteralExpression extends Expression<num> {
// ...
}
class StringLiteralExpression extends Expression<String> {
final String value;
StringLiteralExpression(this.value);
@override
String evaluate(Map<String, num> variables) {
return value;
}
}
In addition, we create a class called IdentifierExpression
that represents an identifier (i.e., a variable) in the code. The IdentifierExpression
class will have a field called identifier
that will store the identifier. We also override the evaluate
method to return the value of the identifier from the map of variables.
As BASIC only supports numeric variables, we will return a num
type from the evaluate
method.
sealed class Expression<T> {
T evaluate(Map<String, num> variables);
}
class NumberLiteralExpression extends Expression<num> {
// ...
}
class StringLiteralExpression extends Expression<String> {
// ...
}
class IdentifierExpression extends Expression<num> {
final String identifier;
IdentifierExpression(this.identifier);
@override
num evaluate(Map<String, num> variables) {
return variables[identifier]!;
}
}
Statements
Let’s next create a file called statements.dart
and create a base class Statement
in it. This class will be the base class for all statements in our language, which will be extended by the specific statement classes. The Statement
class will have a method called execute
that will take a map of variables and execute the statement.
Although statements do not produce a value, we add the possibility for the statements to produce a value. This is useful, e.g., for the PRINT
statement, which will produce a string of the values of the expressions separated by a tab.
import "expressions.dart";
sealed class Statement<T> {
T execute(Map<String, num> variables);
}
Next, we create a LetStatement
that will represent the LET
statement in our language. The LetStatement
takes a name of an identifier and an expression. Executing the statement will assign the evaluation result of the expression to the identifier in the map of variables.
import "expressions.dart";
sealed class Statement<T> {
T execute(Map<String, num> variables);
}
class LetStatement extends Statement<void> {
final String identifier;
final Expression expression;
LetStatement(this.identifier, this.expression);
@override
void execute(Map<String, num> variables) {
variables[identifier] = expression.evaluate(variables);
}
}
Then, we create a PrintStatement
that will represent the PRINT
statement in our language. The PrintStatement
takes a list of expressions as an argument. Executing the statement will produce a string of evaluation results, where the results are joined by a tab.
import "expressions.dart";
sealed class Statement<T> {
T execute(Map<String, num> variables);
}
class LetStatement extends Statement<void> {
// ...
}
class PrintStatement extends Statement<String> {
final List<Expression> arguments;
PrintStatement(this.arguments);
@override
String execute(Map<String, num> variables) {
return arguments.map((argument) => argument.evaluate(variables)).join("\t");
}
}
Parser
Finally, we can create the parser that will take a list of tokens and produce a list of statements. The parser will have a method called parse
that will parse the tokens into statements, returning a map of line numbers to statements. The parser will also have a field called position
that will keep track of the current position in the list of tokens.
Create a file called parser.dart
and add the following code to the file as a starting point. The parse
method will be implemented later.
import "expressions.dart";
import "statements.dart";
import "tokens.dart";
class Parser {
final List<Token> tokens;
int position = 0;
Parser(this.tokens);
Map<int, Statement> parse() {
throw UnimplementedError();
}
}
Let’s again add tests to verify that the parser works as expected. Create a test file called parser_test.dart
to the test
folder of the project and add the following starting point to the file.
import 'package:basic_interpreter/expressions.dart';
import 'package:basic_interpreter/parser.dart';
import 'package:basic_interpreter/statements.dart';
import 'package:basic_interpreter/tokens.dart';
import 'package:test/test.dart';
void main() {
}
Parsing a LET statement
A line with a let statement consists of five tokens: a numeric identifier that indicates the line, the LET
keyword, an identifier, an equals sign, and an expression. Let’s start by adding a test for parsing a LET
statement.
The test checks that the parser can parse a list of tokens with a LET
statement so that the line number is correct and that the statement is of the correct type.
test("Parse LET statement", () {
final tokens = [
Token(Category.numberLiteral, "10"),
Token(Category.let, "LET"),
Token(Category.identifier, "A"),
Token(Category.equals, "="),
Token(Category.numberLiteral, "5"),
];
final parser = Parser(tokens);
final program = parser.parse();
expect(program[10], isA<LetStatement>());
});
When we run the test, this time using dart test test/parser_test.dart
, we see that the test fails as expected. This is because we have not yet implemented the parse
method in the parser.
00:00 +0 -1: Some tests failed.
To parse a let statement, let’s modify the parse
method in the parser to parse a statement. The method should first parse the line number, then look up the keyword, and then based on the keyword parse the statement. Then, finally, the statement should be added to the map, and the position should be advanced.
As a first step, we can use something like the following.
Map<int, Statement> parse() {
Map<int, Statement> program = {};
while (position < tokens.length) {
final lineNumber = int.parse(tokens[position].value);
position++;
final keywordToken = tokens[position];
final statement = switch(keywordToken.category) {
Category.let => parseLetStatement(),
_ => throw Exception("Unexpected token: ${tokens[position]}"),
};
position++;
program[lineNumber] = statement;
}
return program;
}
Now, we need to implement the function parseLetStatement
for parsing the LET
statement. The function should advance the position, parse the identifier, advance the position, handle the equals sign, advance the position, parse the expression, and finally return a LetStatement
with the identifier and the expression. A hacky implementation could look like the following.
Statement parseLetStatement() {
position++;
final identifier = tokens[position].value;
position += 2;
final value = tokens[position].value;
final expression = NumberLiteralExpression(num.parse(value));
return LetStatement(identifier, expression);
}
Now, when we run the test, we see that the test passes.
00:00 +1: All tests passed!
Let’s stop assuming and start expecting
The parser at it’s current state is suspectible to errors. We are assuming that the tokens are in the correct order and that the tokens are correct. Rather than assume that things are in order, we should expect — and verify — that they are.
Expecting token order
Let’s add a test that verifies that the parser throws an error if the tokens are not in the correct order.
test("Throw error if tokens are not in the correct order", () {
final tokens = [
Token(Category.numberLiteral, "10"),
Token(Category.let, "LET"),
Token(Category.equals, "="),
Token(Category.identifier, "A"),
Token(Category.numberLiteral, "5"),
];
final parser = Parser(tokens);
expect(() => parser.parse(), throwsException);
});
When we run the test, we see that the test fails.
00:00 +1 -1: Some tests failed.
To fix the test, let’s add a check to the parser to throw an exception if the tokens are not in an expected order. We can do this by introducing a method expectToken
that advances the position and checks that the token is of the expected category. If not, the method throws an exception.
void expectToken(Category category) {
position++;
if (tokens[position].category != category) {
throw Exception(
"Expected token category $category, found ${tokens[position].category}");
}
}
With the method in place, we can modify the parseLetStatement
method to use the expectToken
method to check that the tokens are in an expected order.
Statement parseLetStatement() {
expectToken(Category.identifier);
final identifier = tokens[position].value;
expectToken(Category.equals);
expectToken(Category.numberLiteral);
final value = tokens[position].value;
final expression = NumberLiteralExpression(num.parse(value));
return LetStatement(identifier, expression);
}
Now, when we run the tests, we see that the tests pass.
00:00 +2: All tests passed!
Expecting a numeric token at beginning
We can further modify the program so that it checks that the first token is a numeric token. To do this, we start the iteration from position -1
, and add an expectToken
call to the beginning of the loop.
// ...
class Parser {
final List<Token> tokens;
int position = -1;
Parser(this.tokens);
Map<int, Statement> parse() {
// ...
while (position < tokens.length) {
expectToken(Category.numberLiteral);
final lineNumber = int.parse(tokens[position].value);
// ...
}
return program;
}
Statement parseLetStatement() {
// ...
}
void expectToken(Category category) {
// ...
}
}
The tests still pass.
00:00 +2: All tests passed!
Failing gracefully at out of bounds
An additional thing that we do not pay attention to is whether the position is out of bounds. Let’s add a test that verifies that the parser throws an error if the position is out of bounds.
test("Throw error if position is out of bounds", () {
final tokens = [
Token(Category.numberLiteral, "10"),
Token(Category.let, "LET"),
Token(Category.identifier, "A"),
Token(Category.equals, "="),
];
final parser = Parser(tokens);
expect(() => parser.parse(), throwsException);
});
The test already passes, as the parser throws an exception when the position is out of bounds. However, we can make the error message more informative by adding a check to the expectToken
method.
void expectToken(Category category) {
position++;
if (position >= tokens.length) {
throw Exception("Unexpected end when looking for next token");
}
if (tokens[position].category != category) {
throw Exception(
"Expected token category $category, found ${tokens[position].category}");
}
}
Now, if we would parse a statement that is missing a token, the parser throws an exception with a more informative error message.
It would be also possible to introduce custom error classes to the parser to make it easier to catch and handle different types of errors. For example, we could introduce a ParserError
class that extends Error
and then create subclasses of the ParserError
class for different types of errors like UnexpectedTokenError
.
Then, in the tests, we could catch the error, verifying also the type of the error. The function throwsA would be useful here.
Multiple LET statements
Let’s add another test for parsing multiple LET statements. The test verifies that the parser can parse a list of tokens consisting of two lines both with LET statements.
test("Two LET statements", () {
final tokens = [
Token(Category.numberLiteral, "10"),
Token(Category.let, "LET"),
Token(Category.identifier, "A"),
Token(Category.equals, "="),
Token(Category.numberLiteral, "5"),
Token(Category.endOfLine, "\n"),
Token(Category.numberLiteral, "20"),
Token(Category.let, "LET"),
Token(Category.identifier, "B"),
Token(Category.equals, "="),
Token(Category.numberLiteral, "10"),
];
final parser = Parser(tokens);
final program = parser.parse();
expect(program[10], isA<LetStatement>());
expect(program[20], isA<LetStatement>());
});
When we run the test, we see that the test passes.
00:00 +4: All tests passed!
Time to move on.
Parsing a PRINT statement
To parse a print statement, we would add a test that verifies that the parser can parse a list of tokens with a PRINT
statement so that the line number is correct and that the statement is of the correct type.
test("Parse PRINT statement", () {
final tokens = [
Token(Category.numberLiteral, "10"),
Token(Category.print, "PRINT"),
Token(Category.identifier, "A"),
Token(Category.comma, ","),
Token(Category.identifier, "A"),
];
final parser = Parser(tokens);
final program = parser.parse();
expect(program[10], isA<PrintStatement>());
});
When we run the test, we see that the test fails.
00:00 +4 -1: Some tests failed.
To parse a print statement, we first modify the parse
method to include the print
token category in the switch statement used to determine what to do based on the token. In the case of the print
, we call a parsePrintStatement
method.
import "expressions.dart";
import "statements.dart";
import "tokens.dart";
class Parser {
final List<Token> tokens;
int position = -1;
Parser(this.tokens);
Map<int, Statement> parse() {
Map<int, Statement> program = {};
while (position < tokens.length) {
// ...
final keywordToken = tokens[position];
final statement = switch (keywordToken.category) {
Category.let => parseLetStatement(),
Category.print => parsePrintStatement(),
_ => throw Exception("Unexpected token: ${tokens[position]}"),
};
// ...
}
return program;
}
Statement parseLetStatement() {
// ...
}
Statement parsePrintStatement() {
throw UnimplementedError();
}
void expectToken(Category category) {
// ...
}
}
The parsePrintStatement
method should parse the print statement on the current line. As print statements have alternating expressions and commas, where the amount of expressions is not limited, we need to parse expressions until an end of line token is found. It is also possible that an end of line token does not exist, in which case the parser should stop at the end of the tokens.
To find the location where parsing should stop, we can look for an end of line token starting from the current position. If an end of line token exists, we can use the index of the token as the end position. If an end of line token does not exist, we can use the length of the tokens as the end position.
Statement parsePrintStatement() {
position++;
int endOfLineIndex =
tokens.indexOf(Token(Category.endOfLine, '\n'), position);
int endPosition = endOfLineIndex == -1 ? tokens.length : endOfLineIndex;
// ...
}
Then, we can loop over the tokens, starting from the current position and ending at the end position, skipping every second token (i.e., comma). When looping over the tokens, we parse them into expressions, and add them into a list of arguments. At the end, we update the position to match the end position (decremented by one due to position++ at the end of the marse method), and return a PrintStatement
with the list of arguments.
As a whole, this looks as follows.
Statement parsePrintStatement() {
position++;
int endOfLineIndex =
tokens.indexOf(Token(Category.endOfLine, '\n'), position);
int endPosition = endOfLineIndex == -1 ? tokens.length : endOfLineIndex;
final arguments = <Expression>[];
for (int i = position; i < endPosition; i += 2) {
final token = tokens[i];
Expression expression = switch (token.category) {
Category.numberLiteral =>
NumberLiteralExpression(num.parse(token.value)),
Category.stringLiteral => StringLiteralExpression(token.value),
Category.identifier => IdentifierExpression(token.value),
_ => throw Exception("Invalid token in PRINT arguments: $token"),
};
arguments.add(expression);
}
position = endPosition - 1;
return PrintStatement(arguments);
}
Now, when we run the tests, we see that the tests pass.
00:00 +5: All tests passed!
Multiple PRINT statements
Let’s finally add a test for parsing multiple PRINT statements. The test verifies that the parser can parse a list of tokens consisting of two lines both with PRINT statements.
test("Two PRINT statements", () {
final tokens = [
Token(Category.numberLiteral, "10"),
Token(Category.print, "PRINT"),
Token(Category.identifier, "A"),
Token(Category.endOfLine, "\n"),
Token(Category.numberLiteral, "20"),
Token(Category.print, "PRINT"),
Token(Category.stringLiteral, "Hello, BASIC!"),
];
final parser = Parser(tokens);
final program = parser.parse();
expect(program[10], isA<PrintStatement>());
expect(program[20], isA<PrintStatement>());
});
When we run the tests, we see that they pass.
00:00 +6: All tests passed!
At this point, we have a parser that can parse LET
and PRINT
statements. We can now move on to revising our interpreter so that it uses the lexer and the parser.
The parser has still plenty of room for improvement. As an example, the parser does not e.g. verify the existence of commas between the expressions in the