/ Compiler says no!

Wasm for the impatient

Wasm is a virtual-machine-based binary code and module format, available as a compilation target for many other languages. This article contains a succinct description from the ground up.

WebAssembly (or Wasm) is a portable, virtual-machine-based compile target that today finds application in reinventing Java-in-the-Browser[1], as a target for smart contracts and in some hopefully exciting use cases around shipping code into the cloud. In the future, it could also replace things like Lua as a means to integrate user-written functionality.

This is a brief but ground-up explanation of WebAssembly that does not take any shortcuts.

The basics of WebAssembly

Like most assembly languages, WebAssembly has an approximately one-to-one mapping of its statements to machine instructions. Its virtual machine is stack based, with every operation optionally consuming stack values and writing any number of them back. A running program has access to the following environment:

| section | access | description |
| --- | --- | --- |
| stack | read+write | Finite stack where operands and outputs of instructions are read from/written to. Fulfills a function similar to that of CPU registers in other assembly languages. Can contain values, labels and activations. |
| memory | read+write | Finite preallocated (but growable), zeroed slice of memory similar to what in other environments would be called a heap. Written and read using memory instructions, e.g. i32.load offset=0x10 pops an address from the stack, adds the static offset 16 and pushes the 32-bit value found at that byte position onto the stack. |
| locals | read+write | An indexed array of local values, used for passing arguments and storing temporary values, similar to the “stack” of register-based virtual machines. Can be written and loaded using variable instructions. Typically scoped to activation frames, e.g. function calls. |
| globals | read+write | Similar to locals, except that they belong to the module instance rather than to an activation frame, so they are not cleared between function calls. |

Writing raw WebAssembly

To start at the bottom, let’s write a WebAssembly program in the lowest-level way we can imagine, by writing raw binary data. WebAssembly programs are shipped as modules, which have a binary encoding. The minimal WebAssembly module is a completely empty module and consists of 8 bytes, namely:

00 61 73 6d  01 00 00 00

We can see the magic number (\0asm or 00 61 73 6d) followed by the 32-bit version number 1, which is 01 00 00 00 in little-endian encoding.

Let’s write this to a file, then disassemble it using a suitable disassembler like wasm2wat from wabt (the WebAssembly Binary Toolkit):

$ echo -en '\x00\x61\x73\x6D\x01\x00\x00\x00' > minimal.wasm
$ wasm2wat minimal.wasm
(module)

The output (module) here is an empty module definition in WebAssembly text format, a format which we will deal with later.
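The same check can be sketched without wabt. The snippet below assumes a JavaScript runtime that exposes the WebAssembly global (a current browser or Node.js); the variable names are made up for illustration.

```javascript
// The eight bytes of the empty module: magic "\0asm" followed by version 1.
const magic = [0x00, 0x61, 0x73, 0x6d];
const version = [0x01, 0x00, 0x00, 0x00]; // the u32 value 1, little endian
const emptyModule = new Uint8Array([...magic, ...version]);

// WebAssembly.validate() reports whether the bytes form a valid module.
const isValid = WebAssembly.validate(emptyModule);
console.log("valid module:", isValid);

// Reading bytes 4..7 back as a little-endian u32 recovers the version.
const decodedVersion = new DataView(emptyModule.buffer).getUint32(4, true);
console.log("version:", decodedVersion);
```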

Vectors and u32s

Before we continue we should define the concept of a vector, or vec as it is called in WebAssembly: a vec is a list of values of a single kind, prefixed by its length as a 32-bit unsigned integer (u32). With the version number being one notable exception, all integers are encoded using a variable-length encoding called Little Endian Base 128 (LEB128). For small integers (below 128 for unsigned values) this is the same as writing the integer as a single byte instead of four.

As an example, to encode the list of single bytes 0xaa, 0xbb and 0xcc, we write the vector containing them as

03 aa bb cc

Here, 03 is the length, followed by three elements, each one byte in size.
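As a rough sketch of both rules, the following helpers encode a u32 as unsigned LEB128 and wrap already-encoded elements into a vec. The names encodeU32 and encodeVec are hypothetical and not part of any WebAssembly API.

```javascript
// Encode a non-negative integer below 2^32 as unsigned LEB128.
function encodeU32(value) {
  const bytes = [];
  do {
    let byte = value & 0x7f;       // take the low seven bits
    value >>>= 7;                  // and shift them out
    if (value !== 0) byte |= 0x80; // set the continuation bit if more follows
    bytes.push(byte);
  } while (value !== 0);
  return bytes;
}

// Encode a vec: the element count, followed by the encoded elements.
// Each element is either a single byte or an array of bytes.
function encodeVec(elements) {
  return [...encodeU32(elements.length), ...elements.flat()];
}

console.log(encodeVec([0xaa, 0xbb, 0xcc])); // the example vector from above
console.log(encodeU32(300));                // needs two bytes
```

Small values come out as a single byte, while anything from 128 upwards spills into additional bytes with the high bit marking continuation.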

Adding a function

Our next goal is to add a simple example function defined by the following pseudo-code:

push the value 0xaa onto the stack
push the value 0xbb onto the stack
add

The result should be a single stack value of 0x165 or 357 in decimal. Its type signature contains zero arguments, but since it leaves a value on the stack it returns a single i32, as leaving values on the stack is how functions return values in WebAssembly.
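To make the stack behaviour concrete, here is a toy interpreter for just these two operations. It is an illustrative model only, not how a real WebAssembly engine works, and the instruction format is invented for this sketch.

```javascript
// Run a tiny program on an explicit operand stack.
function run(program) {
  const stack = [];
  for (const [op, arg] of program) {
    if (op === "push") {
      stack.push(arg);
    } else if (op === "add") {
      const b = stack.pop();
      const a = stack.pop();
      stack.push((a + b) | 0); // wrap to 32 bits, as i32 arithmetic does
    } else {
      throw new Error("unknown operation: " + op);
    }
  }
  return stack; // whatever is left on the stack is the "return value"
}

const result = run([["push", 0xaa], ["push", 0xbb], ["add"]]);
console.log(result);
```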

Looking at the spec again (and with a little trial and error) we can deduce that to add a function to our module, we need to provide three sections, namely types, funcs and code.

Encoding types

Function types in WebAssembly are expressed as two vectors, one containing input types and one containing output types (at most one of the latter in the original version of WebAssembly). Our signature, written as two arrays, is thus []->[i32] or in WebAssembly terms vec() vec(i32).

A type is encoded in a single byte; i32 is encoded as 7f. The empty vector [] is thus encoded as 00, while our return values vector [i32] is encoded as 01 7f. The rule for encoding an entire function type is to prefix it with 60, thus we end up with 60 00 01 7f. The content of the types section is a vector of all function types, in our case just one. With the contents of our section known, we just need to wrap them in a section and append that to our module.

A section starts with a byte indicating its type, followed by the LEB128 encoded length in bytes. The ID for our types section is 01 and we encoded it using five bytes above, thus we get for the complete section encoding:

"01" LENGTH vec( "60" vec() vec(i32) )
01 05 01 60 00 01 7f
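The two lines above can be reproduced with a few helper functions. This is a minimal sketch: vec and section are hypothetical helpers, and they assume every length is below 128 so that a single LEB128 byte suffices.

```javascript
const I32 = 0x7f;       // value type i32
const FUNC_TYPE = 0x60; // prefix byte for a function type
const TYPE_SECTION = 0x01;

// vec: count followed by the (already encoded) items; counts below 128 only.
function vec(items) {
  return [items.length, ...items.flat()];
}

// section: ID, payload length in bytes, payload; lengths below 128 only.
function section(id, payload) {
  return [id, payload.length, ...payload];
}

// []->[i32] becomes 60 vec() vec(i32)
const funcType = [FUNC_TYPE, ...vec([]), ...vec([I32])];
const typeSection = section(TYPE_SECTION, vec([funcType]));

console.log(typeSection.map((b) => b.toString(16).padStart(2, "0")).join(" "));
```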

Encoding funcs

Encoding the funcs section is rather boring, since it just maps a function to a function signature, which is much more useful in cases where multiple functions share a type signature.

funcs is just a vec of indices into types; each entry corresponds to a function that is identified by its index in funcs. In other words, funcs’ content is a vec(typeidx), with typeidx being a u32 referencing an element of types.

We only have one type definition, which has index 0, and a single function, also with index 0, thus our entire vec encoded with the length prefix is 01 00. Together with section ID 3 and length bytes, we just have to append 03 02 01 00 to our module.
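The indirection is easier to see as a lookup. The sketch below models types and funcs as plain arrays; this in-memory layout and the signatureOf helper are assumptions made for illustration, not something the format prescribes.

```javascript
// One signature, []->[i32], at type index 0.
const types = [{ params: [], results: ["i32"] }];

// One function at function index 0, pointing at type index 0.
const funcs = [0];

// Resolve a function index to its signature via the funcs table.
function signatureOf(funcIdx) {
  return types[funcs[funcIdx]];
}

// Section ID 3, payload length, then vec(typeidx).
// Valid only while all values fit into a single LEB128 byte.
const funcSection = [0x03, 1 + funcs.length, funcs.length, ...funcs];

console.log(signatureOf(0), funcSection);
```

With several functions sharing one signature, funcs would simply repeat the same type index, e.g. [0, 0, 0].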

Encoding code

The final bit to encode is the actual function body. In raw WebAssembly, our function body is called an expression, must be terminated by end and looks like this:

i32.const 0xAA        ;; push 0xAA onto the stack
i32.const 0xBB        ;; push 0xBB onto the stack
i32.add               ;; add the two i32s on top of the stack
end                   ;; terminate expression

The opcode for i32.const is 41, while i32.add is 6a and end is 0b. Keeping in mind that integers too large for a single LEB128 byte, like 0xaa and 0xbb, are encoded as aa 01 and bb 01, we can write the entire sequence as

41 aa 01 41 bb 01 6a 0b

which is 8 bytes long. This code is prefixed by a definition of all its locals; we have not used any, but we still need to include an empty vec to indicate this.
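One detail worth knowing here: the operand of i32.const uses the signed variant of LEB128, which happens to give the same bytes as the unsigned variant for 0xaa and 0xbb. A minimal sketch of that encoding, with encodeS32 being a made-up helper name:

```javascript
// Encode a 32-bit signed integer as signed LEB128.
function encodeS32(value) {
  const bytes = [];
  while (true) {
    const byte = value & 0x7f;
    value >>= 7; // arithmetic shift keeps the sign
    const signBitClear = (byte & 0x40) === 0;
    // Stop once the remaining bits are pure sign extension of this byte.
    if ((value === 0 && signBitClear) || (value === -1 && !signBitClear)) {
      bytes.push(byte);
      return bytes;
    }
    bytes.push(byte | 0x80); // more bytes follow
  }
}

const I32_CONST = 0x41, I32_ADD = 0x6a, END = 0x0b;
const expr = [
  I32_CONST, ...encodeS32(0xaa),
  I32_CONST, ...encodeS32(0xbb),
  I32_ADD,
  END,
];

console.log(expr.map((b) => b.toString(16).padStart(2, "0")).join(" "));
```

The difference only shows for other values: 64, for example, already needs two bytes in the signed encoding because bit 6 of the final byte doubles as the sign.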

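The raw bytes for our locals and expression are prefixed with their combined size in bytes when encoded[2]. The empty locals vec takes one byte (00) and the expression eight, so the prefix is 09. This means our entire function body is encoded as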

09 00 41 aa 01 41 bb 01 6a 0b

The ID of the code section is 0a, and of course we have to provide a vec of function bodies, even though we only have one body in our example:

"0a" LENGTH_SECTION vec( BODY BODY .. )

This encodes to

0a 0b 01 09 00 41 aa 01 41 bb 01 6a 0b

for the entire code section, which we can append to our module file.
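The nesting of the three length prefixes (body size, body count, section size) can be sketched as follows. The variable names are illustrative, and single-byte lengths are assumed throughout, which holds for this tiny module.

```javascript
// The expression from above: two constants, an add and the terminating end.
const expr = [0x41, 0xaa, 0x01, 0x41, 0xbb, 0x01, 0x6a, 0x0b];
const locals = [0x00]; // empty vec: no locals declared

// A body is its size in bytes followed by locals and expression.
const funcContent = [...locals, ...expr];
const body = [funcContent.length, ...funcContent];

// The section payload is a vec of bodies; we have exactly one.
const bodies = [body];
const payload = [bodies.length, ...bodies.flat()];

// Section ID 0a, payload size, payload.
const codeSection = [0x0a, payload.length, ...payload];

console.log(codeSection.map((b) => b.toString(16).padStart(2, "0")).join(" "));
```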

Putting it all together

With nothing up our sleeves we can write the entire WebAssembly module we created to a file called minimal.wasm using a small shell script:

#!/bin/bash
# bash is required: echo -en with \x escapes is not portable to plain sh
# Magic and Header
echo -en '\x00\x61\x73\x6D\x01\x00\x00\x00' > minimal.wasm
# Type section
echo -en '\x01\x05\x01\x60\x00\x01\x7f' >> minimal.wasm
# Function section
echo -en '\x03\x02\x01\x00' >> minimal.wasm
# Code section
echo -en '\x0a\x0b\x01\x09\x00\x41\xaa\x01\x41\xbb\x01\x6a\x0b' >> minimal.wasm

We can verify that it is a valid module by having wasm2wat disassemble it for us:

$ wasm2wat minimal.wasm --generate-names
(module
  (type $t0 (func (result i32)))
  (func $f0 (type $t0) (result i32)
    i32.const 170
    i32.const 187
    i32.add))

wasm2wat added some redundant information and generated some names for convenience.

Adding an export

While we have a complete module that we can roundtrip from binary module to text, we have not exported our function yet. Exporting means making it visible to the outside by giving it a name, so that it will be available to a system that imports our module, e.g. a browser.

We will name our function “demo”, which means we need to encode a string. Strings in WebAssembly modules are called names and are just vecs of bytes that happen to be valid UTF-8. Without going into too much detail this time around, the exports section has an ID of 7 and contains a vec(NAME EXPORTDESC), where NAME is a string and EXPORTDESC, in our case, is 00 00, indicating the export of a function (00) with index 0 (00).

The only thing to pay attention to is the order: we need to add the export snippet

# Exports section
echo -en '\x07\x08\x01\x04\x64\x65\x6d\x6f\x00\x00' >> minimal.wasm

right before the code section.
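The export bytes and the ordering requirement can both be checked from JavaScript. The sketch below assumes a runtime with the TextEncoder and WebAssembly globals (a current browser or Node.js); the variable names are invented for this example.

```javascript
// NAME: a vec of the UTF-8 bytes of "demo".
const exportName = Array.from(new TextEncoder().encode("demo"));
// One export entry: NAME, then EXPORTDESC 00 00 (function, index 0).
const exportEntry = [exportName.length, ...exportName, 0x00, 0x00];
const exportPayload = [0x01, ...exportEntry]; // vec with a single entry
const exportSection = [0x07, exportPayload.length, ...exportPayload];

const header = [0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00];
const typeSection = [0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f];
const funcSection = [0x03, 0x02, 0x01, 0x00];
const codeSection = [0x0a, 0x0b, 0x01, 0x09, 0x00, 0x41, 0xaa, 0x01,
                     0x41, 0xbb, 0x01, 0x6a, 0x0b];

// Exports placed before the code section: accepted.
const ordered = new Uint8Array([
  ...header, ...typeSection, ...funcSection, ...exportSection, ...codeSection,
]);
// Exports appended after the code section: rejected.
const misordered = new Uint8Array([
  ...header, ...typeSection, ...funcSection, ...codeSection, ...exportSection,
]);

const orderedValid = WebAssembly.validate(ordered);
const misorderedValid = WebAssembly.validate(misordered);
console.log(orderedValid, misorderedValid);
```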

Calling our module inside an HTML file

Now our goal is to actually run our code inside a browser. We can leverage the xxd utility, since it has a convenient C-style include formatting function, to save us some work. Note that we’re not interested in anything but the comma-delimited list of hex-encoded bytes it produces that we can copy & paste later:

$ xxd -i minimal.wasm
unsigned char minimal_wasm[] = {
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, 0x01, 0x05, 0x01, 0x60,
  0x00, 0x01, 0x7f, 0x03, 0x02, 0x01, 0x00, 0x07, 0x08, 0x01, 0x04, 0x64,
  0x65, 0x6d, 0x6f, 0x00, 0x00, 0x0a, 0x0b, 0x01, 0x09, 0x00, 0x41, 0xaa,
  0x01, 0x41, 0xbb, 0x01, 0x6a, 0x0b
};
unsigned int minimal_wasm_len = 42;

Inside our HTML file, for various reasons[3], we need to hand our bytes to the JS code as an ArrayBuffer or typed array; the easiest way to do so is to create a Uint8Array from our binary Wasm code.

There are multiple functions available that deal with compiling and instantiating WebAssembly modules, but the most convenient and still fairly simple one is WebAssembly.instantiate(), which we can directly call on our binary Wasm code.

The returned promise will resolve to a result possessing an instance property, which is our compiled[4] and instantiated WebAssembly module. On this, we find a property exports, which itself has a property demo, our exported function from earlier.

Calling exports.demo() will then yield the expected result of 357. This is inspectable in the developer console after running the following complete demo in a suitable browser:

<html>

<body>
  WebAssembly example.

  <script>
    var wasmMod = new Uint8Array([
      0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, 0x01, 0x05, 0x01, 0x60,
      0x00, 0x01, 0x7f, 0x03, 0x02, 0x01, 0x00, 0x07, 0x08, 0x01, 0x04, 0x64,
      0x65, 0x6d, 0x6f, 0x00, 0x00, 0x0a, 0x0b, 0x01, 0x09, 0x00, 0x41, 0xaa,
      0x01, 0x41, 0xbb, 0x01, 0x6a, 0x0b
    ]);

    WebAssembly.instantiate(wasmMod).then(function (result) {
      var fnDemo = result.instance.exports.demo;
      console.log("returned value", fnDemo());
    });
  </script>
</body>

</html>
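The compile and instantiate steps that WebAssembly.instantiate() bundles can also be spelled out separately. The following sketch uses the synchronous constructors, which is convenient in Node.js or for a module as tiny as ours; the variable names are illustrative.

```javascript
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, 0x01, 0x05, 0x01, 0x60,
  0x00, 0x01, 0x7f, 0x03, 0x02, 0x01, 0x00, 0x07, 0x08, 0x01, 0x04, 0x64,
  0x65, 0x6d, 0x6f, 0x00, 0x00, 0x0a, 0x0b, 0x01, 0x09, 0x00, 0x41, 0xaa,
  0x01, 0x41, 0xbb, 0x01, 0x6a, 0x0b
]);

// Step 1: compile the bytes into a module.
const wasmModule = new WebAssembly.Module(wasmBytes);

// The compiled module can be inspected before it is instantiated.
const exportList = WebAssembly.Module.exports(wasmModule);
console.log(exportList);

// Step 2: instantiate it; our module needs no imports.
const wasmInstance = new WebAssembly.Instance(wasmModule);

// Step 3: call the exported function.
const demoResult = wasmInstance.exports.demo();
console.log("returned value", demoResult);
```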

Conclusion

WebAssembly is a stack-based virtual machine language for which we can hack together a module byte-by-byte and put straight into an HTML file without much fuss. Tools like wabt are convenient, but entirely optional. From this point on, we can work our way up from the bottom to run higher level languages inside the browser.


  1. This time around it’s standardized, properly sand-boxed and included, at least. ↩︎

  2. The astute reader will remark that it would not be necessary to add the length of the encoded function body to correctly parse a module, since bodies are terminated by end, but having it makes it easy to skip over entire functions in a single jump. ↩︎

  3. JavaScript being JavaScript. Don’t pretend you’re not reading this article precisely to avoid having to deal with JavaScript. ↩︎

  4. WebAssembly.instantiate includes an implicit WebAssembly.compile step. ↩︎