Back to BASICs

Tokens into Expressions and Statements with a Parser


Learning Objectives

  • You know how to define expressions and statements for a parser.
  • You know how to create a parser that parses tokens into expressions and statements.

To parse the list of tokens into statements and expressions, we need to start defining the statements and expressions that our language supports. So far, we’ve been looking only at the following program.

10 LET A = 5
20 PRINT A, "Hello, BASIC!"

The above program has two statements, a LET statement and a PRINT statement. The LET statement assigns the value 5 to the variable A, while the PRINT statement prints the value of the variable A and the string "Hello, BASIC!". The LET statement has an expression 5, while the PRINT statement has two expressions, A and "Hello, BASIC!". All of the expressions are evaluated to produce a value.

Note that even values such as 5 above in source code are considered as expressions, as they can be evaluated to produce a value. This makes the distinction between statements and expressions clear; statements perform actions, while expressions are evaluated to produce values. Moreover, it makes parsing easier, as we do not have to make judgements about what is an expression and what is not.

Expressions

Let’s start by creating a file called expressions.dart and place a class called Expression in it. The Expression class will be the base class for all expressions in our language, which will be extended by specific expression classes. The sealed modifier restricts the class hierarchy so that all subclasses of the class must be declared in the same file, while the <T> is a generic type that will be used to specify the return type of the evaluate method.

sealed class Expression<T> {
  T evaluate(Map<String, num> variables);
}

The evaluate method of an expression takes a map of variables and evaluates the expression. As an expression may produce a numeric value or a string, the return type of the is defined by the class that extends the Expression class.

Next, we create a NumberLiteralExpression class that represents a number literal, extending the Expression class and providing the generic type parameter as num. The NumberLiteralExpression class will have a field called value that will store the value of the number literal. We also override the evaluate method to return the value of the number literal.

sealed class Expression<T> {
  T evaluate(Map<String, num> variables);
}

class NumberLiteralExpression extends Expression<num> {
  final num value;

  NumberLiteralExpression(this.value);

  @override
  num evaluate(Map<String, num> variables) {
    return value;
  }
}

A class for representing string literals — StringLiteralExpression — is created the same way.

sealed class Expression<T> {
  T evaluate(Map<String, num> variables);
}

class NumberLiteralExpression extends Expression<num> {
  // ...
}

class StringLiteralExpression extends Expression<String> {
  final String value;

  StringLiteralExpression(this.value);

  @override
  String evaluate(Map<String, num> variables) {
    return value;
  }
}

In addition, we create a class called IdentifierExpression that represents an identifier (i.e., a variable) in the code. The IdentifierExpression class will have a field called identifier that will store the identifier. We also override the evaluate method to return the value of the identifier from the map of variables.

As BASIC only supports numeric variables, we will return a num type from the evaluate method.

sealed class Expression<T> {
  T evaluate(Map<String, num> variables);
}

class NumberLiteralExpression extends Expression<num> {
  // ...
}

class StringLiteralExpression extends Expression<String> {
  // ...
}

class IdentifierExpression extends Expression<num> {
  final String identifier;

  IdentifierExpression(this.identifier);

  @override
  num evaluate(Map<String, num> variables) {
    return variables[identifier]!;
  }
}

Statements

Let’s next create a file called statements.dart and create a base class Statement in it. This class will be the base class for all statements in our language, which will be extended by the specific statement classes. The Statement class will have a method called execute that will take a map of variables and execute the statement.

Although statements do not produce a value, we add the possibility for the statements to produce a value. This is useful, e.g., for the PRINT statement, which will produce a string of the values of the expressions separated by a tab.

import "expressions.dart";

sealed class Statement<T> {
  T execute(Map<String, num> variables);
}

Next, we create a LetStatement that will represent the LET statement in our language. The LetStatement takes a name of an identifier and an expression. Executing the statement will assign the evaluation result of the expression to the identifier in the map of variables.

import "expressions.dart";

sealed class Statement<T> {
  T execute(Map<String, num> variables);
}

class LetStatement extends Statement<void> {
  final String identifier;
  final Expression expression;

  LetStatement(this.identifier, this.expression);

  @override
  void execute(Map<String, num> variables) {
    variables[identifier] = expression.evaluate(variables);
  }
}

Then, we create a PrintStatement that will represent the PRINT statement in our language. The PrintStatement takes a list of expressions as an argument. Executing the statement will produce a string of evaluation results, where the results are joined by a tab.

import "expressions.dart";

sealed class Statement<T> {
  T execute(Map<String, num> variables);
}

class LetStatement extends Statement<void> {
  // ...
}

class PrintStatement extends Statement<String> {
  final List<Expression> arguments;

  PrintStatement(this.arguments);

  @override
  String execute(Map<String, num> variables) {
    return arguments.map((argument) => argument.evaluate(variables)).join("\t");
  }
}

Parser

Finally, we can create the parser that will take a list of tokens and produce a list of statements. The parser will have a method called parse that will parse the tokens into statements, returning a map of line numbers to statements. The parser will also have a field called position that will keep track of the current position in the list of tokens.

Create a file called parser.dart and add the following code to the file as a starting point. The parse method will be implemented later.

import "expressions.dart";
import "statements.dart";
import "tokens.dart";

class Parser {
  final List<Token> tokens;
  int position = 0;

  Parser(this.tokens);

  Map<int, Statement> parse() {
    throw UnimplementedError();
  }
}

Let’s again add tests to verify that the parser works as expected. Create a test file called parser_test.dart to the test folder of the project and add the following starting point to the file.

import 'package:basic_interpreter/expressions.dart';
import 'package:basic_interpreter/parser.dart';
import 'package:basic_interpreter/statements.dart';
import 'package:basic_interpreter/tokens.dart';
import 'package:test/test.dart';

void main() {

}

Parsing a LET statement

A line with a let statement consists of five tokens: a numeric identifier that indicates the line, the LET keyword, an identifier, an equals sign, and an expression. Let’s start by adding a test for parsing a LET statement.

The test checks that the parser can parse a list of tokens with a LET statement so that the line number is correct and that the statement is of the correct type.

  test("Parse LET statement", () {
    final tokens = [
      Token(Category.numberLiteral, "10"),
      Token(Category.let, "LET"),
      Token(Category.identifier, "A"),
      Token(Category.equals, "="),
      Token(Category.numberLiteral, "5"),
    ];

    final parser = Parser(tokens);
    final program = parser.parse();

    expect(program[10], isA<LetStatement>());
  });

When we run the test, this time using dart test test/parser_test.dart, we see that the test fails as expected. This is because we have not yet implemented the parse method in the parser.

00:00 +0 -1: Some tests failed.

To parse a let statement, let’s modify the parse method in the parser to parse a statement. The method should first parse the line number, then look up the keyword, and then based on the keyword parse the statement. Then, finally, the statement should be added to the map, and the position should be advanced.

As a first step, we can use something like the following.

Map<int, Statement> parse() {
  Map<int, Statement> program = {};

  while (position < tokens.length) {
    final lineNumber = int.parse(tokens[position].value);

    position++;

    final keywordToken = tokens[position];

    final statement = switch(keywordToken.category) {
      Category.let => parseLetStatement(),
      _ => throw Exception("Unexpected token: ${tokens[position]}"),
    };

    position++;
    program[lineNumber] = statement;
  }

  return program;
}

Now, we need to implement the function parseLetStatement for parsing the LET statement. The function should advance the position, parse the identifier, advance the position, handle the equals sign, advance the position, parse the expression, and finally return a LetStatement with the identifier and the expression. A hacky implementation could look like the following.

Statement parseLetStatement() {
  position++;
  final identifier = tokens[position].value;
  position += 2;
  final value = tokens[position].value;
  final expression = NumberLiteralExpression(num.parse(value));
  return LetStatement(identifier, expression);
}

Now, when we run the test, we see that the test passes.

00:00 +1: All tests passed!

Let’s stop assuming and start expecting

The parser at it’s current state is suspectible to errors. We are assuming that the tokens are in the correct order and that the tokens are correct. Rather than assume that things are in order, we should expect — and verify — that they are.

Expecting token order

Let’s add a test that verifies that the parser throws an error if the tokens are not in the correct order.

  test("Throw error if tokens are not in the correct order", () {
    final tokens = [
      Token(Category.numberLiteral, "10"),
      Token(Category.let, "LET"),
      Token(Category.equals, "="),
      Token(Category.identifier, "A"),
      Token(Category.numberLiteral, "5"),
    ];

    final parser = Parser(tokens);

    expect(() => parser.parse(), throwsException);
  });

When we run the test, we see that the test fails.

00:00 +1 -1: Some tests failed.

To fix the test, let’s add a check to the parser to throw an exception if the tokens are not in an expected order. We can do this by introducing a method expectToken that advances the position and checks that the token is of the expected category. If not, the method throws an exception.

void expectToken(Category category) {
  position++;
  if (tokens[position].category != category) {
    throw Exception(
        "Expected token category $category, found ${tokens[position].category}");
  }
}

With the method in place, we can modify the parseLetStatement method to use the expectToken method to check that the tokens are in an expected order.

Statement parseLetStatement() {
  expectToken(Category.identifier);
  final identifier = tokens[position].value;
  expectToken(Category.equals);
  expectToken(Category.numberLiteral);
  final value = tokens[position].value;
  final expression = NumberLiteralExpression(num.parse(value));
  return LetStatement(identifier, expression);
}

Now, when we run the tests, we see that the tests pass.

00:00 +2: All tests passed!

Expecting a numeric token at beginning

We can further modify the program so that it checks that the first token is a numeric token. To do this, we start the iteration from position -1, and add an expectToken call to the beginning of the loop.

// ...

class Parser {
  final List<Token> tokens;
  int position = -1;

  Parser(this.tokens);

  Map<int, Statement> parse() {
    // ...

    while (position < tokens.length) {
      expectToken(Category.numberLiteral);
      final lineNumber = int.parse(tokens[position].value);

      // ...
    }

    return program;
  }

  Statement parseLetStatement() {
    // ...
  }

  void expectToken(Category category) {
    // ...
  }
}

The tests still pass.

00:00 +2: All tests passed!

Failing gracefully at out of bounds

An additional thing that we do not pay attention to is whether the position is out of bounds. Let’s add a test that verifies that the parser throws an error if the position is out of bounds.

  test("Throw error if position is out of bounds", () {
    final tokens = [
      Token(Category.numberLiteral, "10"),
      Token(Category.let, "LET"),
      Token(Category.identifier, "A"),
      Token(Category.equals, "="),
    ];

    final parser = Parser(tokens);

    expect(() => parser.parse(), throwsException);
  });

The test already passes, as the parser throws an exception when the position is out of bounds. However, we can make the error message more informative by adding a check to the expectToken method.

void expectToken(Category category) {
  position++;
  if (position >= tokens.length) {
    throw Exception("Unexpected end when looking for next token");
  }

  if (tokens[position].category != category) {
    throw Exception(
        "Expected token category $category, found ${tokens[position].category}");
  }
}

Now, if we would parse a statement that is missing a token, the parser throws an exception with a more informative error message.

Custom error classes

It would be also possible to introduce custom error classes to the parser to make it easier to catch and handle different types of errors. For example, we could introduce a ParserError class that extends Error and then create subclasses of the ParserError class for different types of errors like UnexpectedTokenError.

Then, in the tests, we could catch the error, verifying also the type of the error. The function throwsA would be useful here.

Multiple LET statements

Let’s add another test for parsing multiple LET statements. The test verifies that the parser can parse a list of tokens consisting of two lines both with LET statements.

  test("Two LET statements", () {
    final tokens = [
      Token(Category.numberLiteral, "10"),
      Token(Category.let, "LET"),
      Token(Category.identifier, "A"),
      Token(Category.equals, "="),
      Token(Category.numberLiteral, "5"),
      Token(Category.endOfLine, "\n"),
      Token(Category.numberLiteral, "20"),
      Token(Category.let, "LET"),
      Token(Category.identifier, "B"),
      Token(Category.equals, "="),
      Token(Category.numberLiteral, "10"),
    ];

    final parser = Parser(tokens);
    final program = parser.parse();

    expect(program[10], isA<LetStatement>());
    expect(program[20], isA<LetStatement>());
  });

When we run the test, we see that the test passes.

00:00 +4: All tests passed!

Time to move on.

Parsing a PRINT statement

To parse a print statement, we would add a test that verifies that the parser can parse a list of tokens with a PRINT statement so that the line number is correct and that the statement is of the correct type.

  test("Parse PRINT statement", () {
    final tokens = [
      Token(Category.numberLiteral, "10"),
      Token(Category.print, "PRINT"),
      Token(Category.identifier, "A"),
      Token(Category.comma, ","),
      Token(Category.identifier, "A"),
    ];

    final parser = Parser(tokens);
    final program = parser.parse();

    expect(program[10], isA<PrintStatement>());
  });

When we run the test, we see that the test fails.

00:00 +4 -1: Some tests failed.

To parse a print statement, we first modify the parse method to include the print token category in the switch statement used to determine what to do based on the token. In the case of the print, we call a parsePrintStatement method.

import "expressions.dart";
import "statements.dart";
import "tokens.dart";

class Parser {
  final List<Token> tokens;
  int position = -1;

  Parser(this.tokens);

  Map<int, Statement> parse() {
    Map<int, Statement> program = {};

    while (position < tokens.length) {
      // ...

      final keywordToken = tokens[position];

      final statement = switch (keywordToken.category) {
        Category.let => parseLetStatement(),
        Category.print => parsePrintStatement(),
        _ => throw Exception("Unexpected token: ${tokens[position]}"),
      };

      // ...
    }

    return program;
  }

  Statement parseLetStatement() {
    // ...
  }

  Statement parsePrintStatement() {
    throw UnimplementedError();
  }

  void expectToken(Category category) {
    // ...
  }
}

The parsePrintStatement method should parse the print statement on the current line. As print statements have alternating expressions and commas, where the amount of expressions is not limited, we need to parse expressions until an end of line token is found. It is also possible that an end of line token does not exist, in which case the parser should stop at the end of the tokens.

To find the location where parsing should stop, we can look for an end of line token starting from the current position. If an end of line token exists, we can use the index of the token as the end position. If an end of line token does not exist, we can use the length of the tokens as the end position.

Statement parsePrintStatement() {
  position++;

  int endOfLineIndex =
      tokens.indexOf(Token(Category.endOfLine, '\n'), position);
  int endPosition = endOfLineIndex == -1 ? tokens.length : endOfLineIndex;

  // ...
}

Then, we can loop over the tokens, starting from the current position and ending at the end position, skipping every second token (i.e., comma). When looping over the tokens, we parse them into expressions, and add them into a list of arguments. At the end, we update the position to match the end position (decremented by one due to position++ at the end of the marse method), and return a PrintStatement with the list of arguments.

As a whole, this looks as follows.

  Statement parsePrintStatement() {
    position++;

    int endOfLineIndex =
        tokens.indexOf(Token(Category.endOfLine, '\n'), position);
    int endPosition = endOfLineIndex == -1 ? tokens.length : endOfLineIndex;

    final arguments = <Expression>[];

    for (int i = position; i < endPosition; i += 2) {
      final token = tokens[i];

      Expression expression = switch (token.category) {
        Category.numberLiteral =>
          NumberLiteralExpression(num.parse(token.value)),
        Category.stringLiteral => StringLiteralExpression(token.value),
        Category.identifier => IdentifierExpression(token.value),
        _ => throw Exception("Invalid token in PRINT arguments: $token"),
      };

      arguments.add(expression);
    }

    position = endPosition - 1;

    return PrintStatement(arguments);
  }

Now, when we run the tests, we see that the tests pass.

00:00 +5: All tests passed!

Multiple PRINT statements

Let’s finally add a test for parsing multiple PRINT statements. The test verifies that the parser can parse a list of tokens consisting of two lines both with PRINT statements.

  test("Two PRINT statements", () {
    final tokens = [
      Token(Category.numberLiteral, "10"),
      Token(Category.print, "PRINT"),
      Token(Category.identifier, "A"),
      Token(Category.endOfLine, "\n"),
      Token(Category.numberLiteral, "20"),
      Token(Category.print, "PRINT"),
      Token(Category.stringLiteral, "Hello, BASIC!"),
    ];

    final parser = Parser(tokens);
    final program = parser.parse();

    expect(program[10], isA<PrintStatement>());
    expect(program[20], isA<PrintStatement>());
  });

When we run the tests, we see that they pass.

00:00 +6: All tests passed!

At this point, we have a parser that can parse LET and PRINT statements. We can now move on to revising our interpreter so that it uses the lexer and the parser.

The parser has still plenty of room for improvement. As an example, the parser does not e.g. verify the existence of commas between the expressions in the PRINT statement. We’ll ignore this for now.

Loading Exercise...