Peeking into Compilers

Example: WAT and WASM


Learning Objectives

  • You know of WAT and WebAssembly (WASM), can understand simple WAT code, and know that WAT can be compiled to WebAssembly.
  • You know how to compile Rust to WASM and that WASM can be loaded into other programs.

As another example of an intermediate representation, let’s look at WAT. WAT, short for WebAssembly Text Format, is a human-readable representation of WebAssembly code.

Modules and Functions

WebAssembly programs consist of modules, which contain functions, data, and other elements. A module can export functions and data that can be used by other modules or the host environment. Functions in WebAssembly are defined using a stack-based instruction set, similar to assembly language. Data can be stored in linear memory, which is a contiguous array of bytes that can be accessed by the program.

As an example, the following WAT code defines a simple module with a function that adds two numbers:

(module
  (func $add (param i32 i32) (result i32)
    (local.get 0)
    (local.get 1)
    (i32.add))

  (export "add" (func $add))
)

The module keyword starts the module definition, followed by the function definition using the func keyword. The function $add takes two 32-bit integers as parameters and returns a 32-bit integer. The function body consists of three instructions: two local.get instructions that push the parameters onto the stack, and i32.add, which pops the two values and pushes their sum. Finally, the function is exported using the export keyword.
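To make the stack-based evaluation concrete, the following is a minimal sketch of a stack machine in JavaScript that steps through the body of $add. The run function and the array-based instruction encoding are hypothetical and made up for illustration; a real WebAssembly engine validates and compiles the code rather than interpreting it like this.

```javascript
// Illustrative only: a tiny interpreter for two WebAssembly instructions.
// "run" and the [name, argument] encoding are assumptions made for this sketch.
function run(instructions, locals) {
  const stack = [];
  for (const [op, arg] of instructions) {
    switch (op) {
      case "local.get":
        // Push the value of the local (here: a parameter) with index arg.
        stack.push(locals[arg]);
        break;
      case "i32.add": {
        // Pop two operands and push their sum.
        const right = stack.pop();
        const left = stack.pop();
        stack.push((left + right) | 0); // | 0 wraps to a signed 32-bit integer
        break;
      }
      default:
        throw new Error(`unsupported instruction: ${op}`);
    }
  }
  // The value left on the stack is the function result.
  return stack.pop();
}

// The body of $add, written as data.
const addBody = [["local.get", 0], ["local.get", 1], ["i32.add"]];
const sum = run(addBody, [1, 2]);
console.log(sum); // 3
```

After the two local.get instructions the stack holds 1 and 2; i32.add replaces them with 3, which becomes the return value.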

The following example, on the other hand, shows a module with two functions, add and subtract:

(module
  (func $add (param i32 i32) (result i32)
    (local.get 0)
    (local.get 1)
    (i32.add))

  (func $subtract (param i32 i32) (result i32)
    (local.get 0)
    (local.get 1)
    (i32.sub))

  (export "add" (func $add))
  (export "subtract" (func $subtract))
)

The i32 type indicates a 32-bit integer, and the param and result keywords specify the function parameters and return type, respectively. The local.get instruction pushes a function parameter onto the stack, and the i32.add and i32.sub instructions perform addition and subtraction, respectively. Addition and subtraction are built into the WebAssembly instruction set.

The following example shows a module with three functions: add, subtract, and mystery. While add and subtract are the same as before, the mystery function takes four numbers as input, adds the first two, subtracts the fourth from the third, and finally multiplies the sum by the difference. The mystery function calls the two functions that we created earlier.

(module
  (func $add (param i32 i32) (result i32)
    (local.get 0)
    (local.get 1)
    (i32.add))

  (func $subtract (param i32 i32) (result i32)
    (local.get 0)
    (local.get 1)
    (i32.sub))

  (func $mystery (param i32 i32 i32 i32) (result i32)
    (local.get 0)
    (local.get 1)
    (call $add)
    (local.get 2)
    (local.get 3)
    (call $subtract)
    (i32.mul))

  (export "add" (func $add))
  (export "subtract" (func $subtract))
  (export "mystery" (func $mystery))
)

The mystery function effectively calculates (a + b) * (c - d) for the input numbers a, b, c, and d.
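For comparison, here is a rough JavaScript counterpart of the three functions, with comments tracing the stack contents of the WAT version. It is a sketch rather than a translation produced by any tool: the | 0 and Math.imul operations are used here to mimic the wrap-around behavior of 32-bit integer arithmetic.

```javascript
// Sketch: JavaScript counterparts of $add, $subtract, and $mystery.
const add = (a, b) => (a + b) | 0;      // i32.add: sum wrapped to 32 bits
const subtract = (a, b) => (a - b) | 0; // i32.sub: difference wrapped to 32 bits

function mystery(a, b, c, d) {
  // local.get 0, local.get 1      -> stack: a, b
  // call $add                     -> stack: (a + b)
  const sum = add(a, b);
  // local.get 2, local.get 3      -> stack: (a + b), c, d
  // call $subtract                -> stack: (a + b), (c - d)
  const difference = subtract(c, d);
  // i32.mul                       -> stack: (a + b) * (c - d)
  return Math.imul(sum, difference);
}

console.log(mystery(1, 2, 3, 4));   // -3
console.log(mystery(10, 11, 4, 2)); // 42
```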


Strings and Memory

WebAssembly is a low-level language that does not have built-in support for high-level data types like strings. Instead, strings are represented as sequences of bytes stored in linear memory. Reserving memory for strings and copying data to and from memory is a common pattern in WebAssembly programs. In WAT, the memory keyword declares a linear memory, while the data keyword initializes a region of it with specific bytes.

As an example, the following WAT code declares a memory of one page (a page is 64 KiB), initializes the memory with the string “Hello... WAT?” starting at offset 0, and then exports a function that returns a pointer to the string.

(module
  (memory 1)
  (data (i32.const 0) "Hello... WAT?")

  (func $getString (result i32)
    (i32.const 0)
  )

  (export "getString" (func $getString))
)

The problem with the above code is that it returns only the memory offset, so the caller does not know how long the string is. A better version of the function returns both a pointer to the start of the string and the length of the string (13 bytes), as shown below.

(module
  (memory 1)
  (data (i32.const 0) "Hello... WAT?")

  (func $getString (result i32 i32)
    (i32.const 0)
    (i32.const 13)
  )

  (export "getString" (func $getString))
)
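On the host side, a pointer and a length are enough to recover the string, provided the host can see the module's memory. Note that the WAT above does not export its memory, so the sketch below uses stand-ins: it creates a one-page memory in JavaScript, copies the string into it, and hard-codes the pair that getString would return. With a real instance, one would instead assume an exported memory and call the exported function.

```javascript
// Stand-in for (memory 1): a linear memory with one 64 KiB page.
const memory = new WebAssembly.Memory({ initial: 1 });

// Stand-in for the data segment: copy the string's bytes to offset 0.
const bytes = new TextEncoder().encode("Hello... WAT?");
new Uint8Array(memory.buffer).set(bytes, 0);

// Stand-in for calling getString(): a function with two results
// returns them to JavaScript as an array.
const [offset, length] = [0, 13];

// View only the bytes in [offset, offset + length) and decode them as UTF-8.
const view = new Uint8Array(memory.buffer, offset, length);
const text = new TextDecoder().decode(view);

console.log(memory.buffer.byteLength); // 65536
console.log(text); // Hello... WAT?
```

Without the length, the host would have to rely on a convention such as a terminating zero byte to know where the string ends.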

Compiling to WebAssembly

WAT code can be compiled to WebAssembly using the WebAssembly Binary Toolkit (WABT). One of the tools in WABT is wat2wasm, which compiles WAT to the WebAssembly binary format. There’s also an online demo of wat2wasm.
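The binary format is what a WebAssembly engine actually accepts; the text format has to be compiled first. As a small illustration, the sketch below checks two inputs with WebAssembly.validate: the eight-byte header that every WebAssembly binary starts with (which on its own forms an empty module), and the raw text of a WAT module. The variable names are made up for this example.

```javascript
// Every WebAssembly binary starts with a magic number and a version.
const emptyModule = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number: "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
]);

// The header alone is a valid (empty) module.
const binaryIsValid = WebAssembly.validate(emptyModule);
console.log(binaryIsValid); // true

// WAT source text is not a valid binary; it must be compiled first.
const watSource = new TextEncoder().encode("(module)");
const textIsValid = WebAssembly.validate(watSource);
console.log(textIsValid); // false
```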

Using WebAssembly

WebAssembly modules can be loaded using standard JavaScript APIs. For example, the following JavaScript code instantiates a WebAssembly module, retrieves the exported function add from it, and calls the function with two arguments:

const wasmInstance = new WebAssembly.Instance(wasmModule, {});
const { add } = wasmInstance.exports;
console.log(add(1, 2));

Here, wasmModule is a compiled WebAssembly.Module, which can be obtained, for example, by fetching the .wasm file with the fetch API and compiling it with WebAssembly.compileStreaming. The WebAssembly.Instance constructor creates an instance of the module, and the exports property of the instance provides access to the exported functions and data. The add function is then called with the arguments 1 and 2, and the result is printed to the console.
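For a version that runs without a separate .wasm file, the bytes of the module can be embedded directly. The sketch below does this for the add module from the beginning of this section; the byte sequence was assembled by hand for illustration and should correspond to what wat2wasm produces for that module, although the tool's exact output is not guaranteed to be identical. Synchronous compilation with new WebAssembly.Module is used only because the module is tiny.

```javascript
// Hand-assembled binary for the add module (an assumption for illustration).
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic number and version
  // Type section: one function type, (i32, i32) -> i32
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  // Function section: one function, using type 0
  0x03, 0x02, 0x01, 0x00,
  // Export section: export function 0 under the name "add"
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,
  // Code section: local.get 0, local.get 1, i32.add, end
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,
]);

// Compile the bytes, instantiate the module, and call the export.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule, {});
const { add } = wasmInstance.exports;

const result = add(1, 2);
console.log(result); // 3
```

The empty object passed to the constructor is the import object; this module imports nothing, so nothing needs to be provided.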

WebAssembly is supported in modern browsers and can also be run outside the browser using tools like Deno, which provides a secure runtime for JavaScript and WebAssembly code. Deno has first-class support for WebAssembly and allows importing WebAssembly modules directly from URLs or local files.

As an example, if we have the mystery function in a WebAssembly binary called mepl-wat-mystery.wasm, we can create a web application that allows posting JSON-formatted data to an API and using the data as input to the mystery function:

import { Hono } from "@hono/hono";
import { mystery } from "./mepl-wat-mystery.wasm";
const app = new Hono();

app.post("/wat", async (ctx) => {
  const { a, b, c, d } = await ctx.req.json();
  const result = mystery(a, b, c, d);
  return ctx.json({ result });
});

export default app;

Running the above code in Deno would create a web server that listens for POST requests to the /wat endpoint, reads JSON data from the request body, calls the mystery function with the input data, and returns the result as JSON.

curl -X POST -d '{"a": 1, "b": 2, "c": 3, "d": 4}' localhost:8000/wat
{"result":-3}

For additional details on creating web applications, see the course on Web Software Development. Using WebAssembly in web applications is discussed in the course Designing and Building Scalable Web Applications.



Really, writing WAT?

Writing WebAssembly code directly in WAT is not common. Mostly, developers use higher-level languages like Rust, which can be compiled to WebAssembly. However, understanding WAT can be useful for debugging and optimizing WebAssembly code, as well as for understanding how WebAssembly works under the hood.

As an example, suppose we have a Rust function called mystery, in a file called rust-mystery.rs, that calculates (a + b) * (c - d):

#[no_mangle]
pub fn mystery(a: i32, b: i32, c: i32, d: i32) -> i32 {
  (a + b) * (c - d)
}

pub fn main() {
}

We can compile the Rust code to WebAssembly using the Rust toolchain:

rustc --target wasm32-unknown-unknown -O rust-mystery.rs

The #[no_mangle] attribute ensures that the function name is not mangled during compilation, allowing it to be called from other languages. The --target wasm32-unknown-unknown flag specifies the target platform as WebAssembly. If the target is not available, it can be installed using rustup target add wasm32-unknown-unknown.

With the code compiled into WebAssembly, it can be used in a web application as shown earlier. By default, the output file has the same name as the Rust file, but with a .wasm suffix.

import { Hono } from "@hono/hono";
import { mystery } from "./rust-mystery.wasm";
const app = new Hono();

app.post("/wat", async (ctx) => {
  const { a, b, c, d } = await ctx.req.json();
  const result = mystery(a, b, c, d);
  return ctx.json({ result });
});

export default app;

The above code would work similarly to the previous example, but instead of using the WAT code, it uses the WebAssembly binary compiled from Rust. When the application is run, the mystery function would be called with the input data, and the result would be returned as JSON.

curl -X POST -d '{"a": 10, "b": 11, "c": 4, "d": 2}' localhost:8000/wat
{"result":42}

WebAssembly and Secure Sandbox

WebAssembly runs in a sandboxed environment, which means that it cannot access the host system directly: a module can only call the functions that the host explicitly passes to it as imports. This isolation makes WebAssembly suitable for running untrusted code in web applications. When running WebAssembly outside the browser, as in Deno, the runtime environment decides which of its capabilities are exposed to the module.

With Deno, you can restrict what can be accessed. As an example, when the flags --allow-read and --allow-net are passed, the application can read files and use the network, but it cannot, for example, write files or access the system in any other way.

Similar permission functionality is also being added to other runtime environments, like Node.js.

You could even build an application that allows users to upload WebAssembly modules and run them in a secure environment, providing a platform for running custom code without compromising the host system. More on that in the course on Designing and Building Scalable Web Applications.
