WebAssembly (or Wasm) is a portable, virtual-machine-based compile target that today finds application in reinventing Java-in-the-Browser[1], as a target for smart contracts, and in some hopefully exciting use cases around shipping code into the cloud. In the future, it could also replace things like Lua as a means to integrate user-written functionality.
This is a brief but ground-up explanation of WebAssembly that does not take any shortcuts.
The basics of WebAssembly
Like most assembly languages, WebAssembly has an approximately one-to-one mapping of its statements to machine instructions. Its virtual machine is stack-based: every operation may consume values from the stack and may write any number of values back. A running program has access to the following environment:
section | access | description |
---|---|---|
stack | read+write | Finite stack where operands and outputs of instructions are read from/written to. Fulfills a similar function as CPU registers do on register-based machines. Can contain values, labels and activations. |
memory | read+write | Finite preallocated (but growable), zeroed slice of memory similar to what in other environments would be called a heap. Written and read using memory instructions, e.g. an i32.load reading from byte position 0x10 (decimal 16) will load a 32-bit value from memory and put it onto the stack. |
locals | read+write | An indexed array of local values, used for passing arguments and storing temporary values, similar to the “stack” of register-based virtual machines. Can be written and loaded using variable instructions. Typically scoped to activation frames, e.g. function calls. |
globals | read+write | Similar to locals, except they are not scoped to an activation frame, so their values persist across function calls. |
Writing raw web assembly
To start at the bottom, let’s write a WebAssembly program in the most low-level way we can imagine: by writing raw binary data. WebAssembly programs are shipped as modules, which have a binary encoding. The minimal WebAssembly module is a completely empty module and consists of 8 bytes, namely:
00 61 73 6d 01 00 00 00
We can see the magic number (\0asm
or 00 61 73 6d
) and the 32-bit version number 1, stored as 01 00 00 00
in little-endian encoding.
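As a small sketch of how a consumer might check this header, the following JavaScript reads the magic number and the version from the eight bytes above. The function name parseHeader is made up for this illustration and is not part of any WebAssembly API.

```javascript
// Check the 8-byte header of a WebAssembly binary: magic number plus version.
// parseHeader is a hypothetical helper, not a standard API.
function parseHeader(bytes) {
  const magic = [0x00, 0x61, 0x73, 0x6d]; // "\0asm"
  if (bytes.length < 8) throw new Error("module too short");
  for (let i = 0; i < 4; i++) {
    if (bytes[i] !== magic[i]) throw new Error("bad magic number");
  }
  // The version is a plain (non-LEB128) 32-bit little-endian integer.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return view.getUint32(4, true); // true = little endian
}

const header = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
console.log(parseHeader(header)); // prints 1
```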
Let’s write this to a file, then disassemble it using a suitable disassembler like wasm2wat
from wabt
(the WebAssembly Binary Toolkit):
$ echo -en '\x00\x61\x73\x6D\x01\x00\x00\x00' > minimal.wasm
$ wasm2wat minimal.wasm
(module)
The output (module)
here is an empty module definition in WebAssembly text format, a format which we will deal with later.
Vectors and u32s
Before we continue we should define the concept of a vector or vec
, as they are called in WebAssembly: A vec
is a list of values of a single kind, prefixed by its length as a 32-bit unsigned integer (u32). With the version number being one notable exception, all integers are encoded using a variable-length encoding called Little Endian Base 128 (LEB128). For integers below 128 this is the same as writing the integer as a single byte, instead of four.
As an example, to encode the list of single bytes 0xaa
, 0xbb
and 0xcc
, we write the vector containing them as
03 aa bb cc
Here, 03
is the length, followed by three elements, each one byte in size.
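The two encodings can be sketched in a few lines of JavaScript. The helper names encodeU32 and encodeVec are assumptions made for this example; the sketch only covers unsigned values in the u32 range.

```javascript
// Unsigned LEB128: 7 bits per byte, high bit set on every byte except the last.
function encodeU32(value) {
  const out = [];
  do {
    let byte = value & 0x7f;       // take the low 7 bits
    value >>>= 7;
    if (value !== 0) byte |= 0x80; // continuation bit: more bytes follow
    out.push(byte);
  } while (value !== 0);
  return out;
}

// vec(...): LEB128 element count, then the already-encoded elements.
// Elements may be single bytes or arrays of bytes.
function encodeVec(elements) {
  return [...encodeU32(elements.length), ...elements.flat()];
}

console.log(encodeVec([0xaa, 0xbb, 0xcc])); // 03 aa bb cc, shown in decimal
console.log(encodeU32(300));                // two bytes: 0xac 0x02
```

Values below 128 come out as a single byte, which is why the length prefix above is just 03.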
Adding a function
Our next goal is to add a simple example function defined by the following pseudo-code:
push the value 0xaa onto the stack
push the value 0xbb onto the stack
add
The result should be a single stack value of 0x165
or 357 in decimal. Its type signature contains zero arguments, but since it leaves a value on the stack it returns a single i32
, as leaving values on the stack is how functions return values in WebAssembly.
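To make the evaluation model concrete, here is a toy stack machine in JavaScript that runs the three steps above. It is only an illustration of the idea, not how a real WebAssembly engine works; the function name run and the tuple format for instructions are made up for this sketch.

```javascript
// A toy stack machine that understands just two instructions.
function run(program) {
  const stack = [];
  for (const [op, arg] of program) {
    if (op === "i32.const") {
      stack.push(arg | 0);       // push a 32-bit integer constant
    } else if (op === "i32.add") {
      const b = stack.pop();     // operands are consumed from the stack...
      const a = stack.pop();
      stack.push((a + b) | 0);   // ...and the result is written back (wrapping at 32 bits)
    } else {
      throw new Error("unknown instruction: " + op);
    }
  }
  return stack; // whatever is left on the stack is the function's result
}

const result = run([["i32.const", 0xaa], ["i32.const", 0xbb], ["i32.add"]]);
console.log(result[0]); // prints 357
```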
Looking at the spec again (and with a little trial and error) we can deduce that to add a function to our module, we need to provide three sections, namely
- the types section, with ID 1, must contain a definition for the signature of our function,
- the funcs section, with ID 3, associates our function with a type signature, and
- the code section, with ID 10, contains the actual instructions.
Encoding types
Function types in WebAssembly are expressed as two vectors, one containing input types and one containing output types (of which there is only one). Our signature, written as two arrays, is thus []->[i32]
or in WebAssembly terms vec() vec(i32)
.
A type is encoded in a single byte, i32
is encoded as 7f
. The empty vector []
is thus encoded as 00
, while our return values vector [i32]
is encoded as 01 7f
. The rule for encoding the entire function type is to prefix it with 60
, thus we end up with 60 00 01 7f
. The contents of the types
section is a vector of all function types, in our case just one. With the contents of our section known, we just need to encode the section and append it to our module.
A section starts with a byte indicating its ID, followed by the LEB128-encoded length of its contents in bytes. The ID for our types section is 01
and its contents, the one-element vector from above, take up five bytes, thus we get for the complete section encoding:
"01" LENGTH vec( "60" vec() vec(i32) )
01 05 01 60 00 01 7f
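The same bytes can be derived programmatically. The following JavaScript is a minimal sketch with made-up helper names (encodeU32, encodeVec, encodeFuncType, encodeSection); it simply follows the rules described above.

```javascript
// Unsigned LEB128: 7 bits per byte, high bit set on every byte except the last.
function encodeU32(value) {
  const out = [];
  do {
    let byte = value & 0x7f;
    value >>>= 7;
    if (value !== 0) byte |= 0x80;
    out.push(byte);
  } while (value !== 0);
  return out;
}

// vec(...): element count, then the already-encoded elements.
function encodeVec(elements) {
  return [...encodeU32(elements.length), ...elements.flat()];
}

// Function type: 0x60, then the parameter types, then the result types.
function encodeFuncType(params, results) {
  return [0x60, ...encodeVec(params), ...encodeVec(results)];
}

// Section: ID byte, byte length of the contents, then the contents.
function encodeSection(id, contents) {
  return [id, ...encodeU32(contents.length), ...contents];
}

const I32 = 0x7f;
const typeSection = encodeSection(1, encodeVec([encodeFuncType([], [I32])]));
console.log(typeSection.map((b) => b.toString(16).padStart(2, "0")).join(" "));
// prints: 01 05 01 60 00 01 7f
```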
Encoding funcs
Encoding the funcs
section is rather boring, since it just maps a function to a function signature, which is much more useful in cases where multiple functions share a type signature.
funcs
is just a vec
of indices into types
; each entry corresponds to a function that is identified by its index in funcs
. In other words, funcs
’ content is a vec(typeidx)
, with typeidx
being a u32
referencing an element of types
.
We only have one type definition, which has index 0, and a single function, also with index 0, thus our entire vec
encoded with the length prefix is 01 00
. Together with section ID 3 and length bytes, we just have to append 03 02 01 00
to our module.
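As a sketch, the funcs section can be produced with two tiny helpers. To keep the example short it assumes that every count, index and length is below 128, so that each LEB128 value is a single byte; the helper names oneByte, vec and section are hypothetical.

```javascript
// Simplification: only values below 128 are supported, where LEB128 is one byte.
function oneByte(n) {
  if (n < 0 || n >= 128) throw new Error("value needs multi-byte LEB128");
  return n;
}

// vec(...): element count, then the elements.
function vec(items) {
  return [oneByte(items.length), ...items.flat()];
}

// Section: ID byte, byte length of the contents, then the contents.
function section(id, contents) {
  return [id, oneByte(contents.length), ...contents];
}

// One function (index 0) that uses type index 0.
const typeIndices = [oneByte(0)];
const funcSection = section(3, vec(typeIndices));
console.log(funcSection); // the bytes 03 02 01 00
```

A module with three functions sharing one signature would simply list the same type index three times.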
Encoding code
The final bit to encode is the actual function body. In raw WebAssembly, our function body is called an expression, must be terminated by end
and looks like this:
i32.const 0xAA ; push 0xAA onto the stack
i32.const 0xBB ; push 0xBB onto the stack
i32.add ; add two i32s on top of the stack
end ; terminate expression
The opcode for i32.const
is 41
, while i32.add
is 6a
and end
is 0b
. Keeping in mind that integers too large for a single LEB128 byte, like 0xaa
and 0xbb
are encoded as aa 01
and bb 01
due to LEB128 encoding, we can write the entire sequence as
41 aa 01 41 bb 01 6a 0b
which is 8 bytes long. This code is prefixed by a definition of all its locals, of which we have not used any, but we still need to include an empty vec
to indicate this.
The raw bytes for our locals and function are stored as a vec
when encoded[2]. This means our entire function body is encoded as
09 00 41 aa 01 41 bb 01 6a 0b
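One detail worth spelling out: the operand of i32.const is encoded with the signed variant of LEB128. For 0xaa and 0xbb this produces exactly the bytes shown above, but values from 64 to 127 already need a second byte in the signed encoding. The following JavaScript sketch builds the function body; encodeS32 is a made-up helper name, and the body length is assumed to fit into one byte.

```javascript
// Signed LEB128 for values in the i32 range.
function encodeS32(value) {
  const out = [];
  while (true) {
    const byte = value & 0x7f;
    value >>= 7; // arithmetic shift keeps the sign
    const signBit = byte & 0x40;
    // Stop once the remaining bits are pure sign extension of this byte.
    if ((value === 0 && !signBit) || (value === -1 && signBit)) {
      out.push(byte);
      return out;
    }
    out.push(byte | 0x80); // continuation bit: more bytes follow
  }
}

const I32_CONST = 0x41, I32_ADD = 0x6a, END = 0x0b;
const instructions = [
  I32_CONST, ...encodeS32(0xaa),
  I32_CONST, ...encodeS32(0xbb),
  I32_ADD,
  END,
];
const locals = [0x00];                   // empty vec: no locals declared
const func = [...locals, ...instructions];
const body = [func.length, ...func];     // assumes the length is below 128
console.log(body.map((b) => b.toString(16).padStart(2, "0")).join(" "));
// prints: 09 00 41 aa 01 41 bb 01 6a 0b
```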
The ID of the code
section is 0a
, and of course we have to provide a vec
of function bodies, even though we only have one body in our example:
"0a" LENGTH_SECTION vec( BODY BODY .. )
This encodes to
0a 0b 01 09 00 41 aa 01 41 bb 01 6a 0b
for the entire code
section, which we can append to our module file.
Putting it all together
With nothing up our sleeves, we can write the entire WebAssembly module we created to a file called minimal.wasm
using a small shell script:
#!/bin/bash
# Requires bash: echo -e with \x escapes is not portable to every /bin/sh
# Magic and Header
echo -en '\x00\x61\x73\x6D\x01\x00\x00\x00' > minimal.wasm
# Type section
echo -en '\x01\x05\x01\x60\x00\x01\x7f' >> minimal.wasm
# Function section
echo -en '\x03\x02\x01\x00' >> minimal.wasm
# Code section
echo -en '\x0a\x0b\x01\x09\x00\x41\xaa\x01\x41\xbb\x01\x6a\x0b' >> minimal.wasm
We can verify that it is a valid module by having wasm2wat
disassemble it for us:
$ wasm2wat minimal.wasm --generate-names
(module
(type $t0 (func (result i32)))
(func $f0 (type $t0) (result i32)
i32.const 170
i32.const 187
i32.add))
wasm2wat
added some redundant information and generated some names for convenience.
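If wabt is not at hand, a JavaScript engine can check the module as well. The sketch below, runnable in Node or a browser console, assembles the same bytes and passes them to the standard WebAssembly.validate() function; the hex helper is made up for this example.

```javascript
// Turn a string of space-separated hex bytes into an array of numbers.
function hex(str) {
  return str.trim().split(/\s+/).map((b) => parseInt(b, 16));
}

const bytes = new Uint8Array([
  ...hex("00 61 73 6d 01 00 00 00"),                // magic and version
  ...hex("01 05 01 60 00 01 7f"),                   // type section
  ...hex("03 02 01 00"),                            // function section
  ...hex("0a 0b 01 09 00 41 aa 01 41 bb 01 6a 0b"), // code section
]);

// validate() returns a boolean instead of throwing on invalid input.
console.log(WebAssembly.validate(bytes)); // prints true
```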
Adding an export
While we have a complete module that we can roundtrip from binary module to text, we have not exported our function yet. Exporting means making it visible to the outside by giving it a name, so that it will be available to a system that imports our module, e.g. a browser.
We will name our function “demo”, which implies we will need to encode a string. Strings in WebAssembly modules are called names and are just vec
s of bytes that happen to be valid UTF-8. Without going into too much detail this time around, the exports section has an ID of 7 and contains a vec(NAME EXPORTDESC)
, where NAME
is a string and EXPORTDESC
, in our case, is 00 00
, indicating the export of a function (00
) with index 0 (00
).
The only thing to pay attention to is the order: we need to add the export snippet
# Exports section
echo -en '\x07\x08\x01\x04\x64\x65\x6d\x6f\x00\x00' >> minimal.wasm
right before the code section.
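The following JavaScript sketch derives the export section bytes and illustrates why the position matters: with the export section placed after the code section, the module no longer validates. The helpers encodeName and section are made up for this example and assume all lengths stay below 128.

```javascript
// A name is a vec of UTF-8 bytes: length prefix, then the bytes.
function encodeName(str) {
  const utf8 = Array.from(new TextEncoder().encode(str));
  return [utf8.length, ...utf8]; // assumes fewer than 128 bytes
}

// Section: ID byte, byte length of the contents, then the contents.
function section(id, contents) {
  return [id, contents.length, ...contents]; // assumes fewer than 128 bytes
}

// One export: the name "demo", kind 00 (function), function index 00.
const exportSection = section(7, [0x01, ...encodeName("demo"), 0x00, 0x00]);

const header = [0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00];
const typeSection = [0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f];
const funcSection = [0x03, 0x02, 0x01, 0x00];
const codeSection = [0x0a, 0x0b, 0x01, 0x09, 0x00, 0x41, 0xaa, 0x01, 0x41, 0xbb, 0x01, 0x6a, 0x0b];

const exportBeforeCode = new Uint8Array(
  [...header, ...typeSection, ...funcSection, ...exportSection, ...codeSection]);
const exportAfterCode = new Uint8Array(
  [...header, ...typeSection, ...funcSection, ...codeSection, ...exportSection]);

console.log(WebAssembly.validate(exportBeforeCode)); // prints true
console.log(WebAssembly.validate(exportAfterCode));  // prints false
```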
Calling our module inside an HTML file
Now our goal is to actually run our code inside a browser. We can leverage the xxd
utility, since it has a convenient C-style include formatting function, to save us some work. Note that we’re not interested in anything but the comma-delimited list of hex-encoded bytes it produces that we can copy & paste later:
$ xxd -i minimal.wasm
unsigned char minimal_wasm[] = {
0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, 0x01, 0x05, 0x01, 0x60,
0x00, 0x01, 0x7f, 0x03, 0x02, 0x01, 0x00, 0x07, 0x08, 0x01, 0x04, 0x64,
0x65, 0x6d, 0x6f, 0x00, 0x00, 0x0a, 0x0b, 0x01, 0x09, 0x00, 0x41, 0xaa,
0x01, 0x41, 0xbb, 0x01, 0x6a, 0x0b
};
unsigned int minimal_wasm_len = 42;
Inside our HTML file, for various reasons[3], we need to create an ArrayBuffer
with our JS code; the easiest way to do so is to create a Uint8Array
typed array from our binary WASM code.
There are multiple functions available that deal with compiling and instantiating WebAssembly modules, but the most convenient and still fairly simple one is WebAssembly.instantiate()
, which we can directly call on our binary WASM code.
The returned promise will resolve to a result possessing an instance
property, which is our compiled[4] and instantiated WebAssembly module. On this, we find a property exports
, which itself has a property demo
, our exported function from earlier.
Calling exports.demo()
will then yield the expected result of 357
. This is inspectable in the developer console after running the following complete demo in a suitable browser:
<html>
<body>
WebAssembly example.
<script>
var wasmMod = new Uint8Array([
0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, 0x01, 0x05, 0x01, 0x60,
0x00, 0x01, 0x7f, 0x03, 0x02, 0x01, 0x00, 0x07, 0x08, 0x01, 0x04, 0x64,
0x65, 0x6d, 0x6f, 0x00, 0x00, 0x0a, 0x0b, 0x01, 0x09, 0x00, 0x41, 0xaa,
0x01, 0x41, 0xbb, 0x01, 0x6a, 0x0b
]);
WebAssembly.instantiate(wasmMod).then(function (result) {
var fnDemo = result.instance.exports.demo;
console.log("returned value", fnDemo());
});
</script>
</body>
</html>
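The same module can also be tried outside of an HTML page, for example in Node. The sketch below uses the synchronous WebAssembly.Module and WebAssembly.Instance constructors instead of WebAssembly.instantiate(), which is fine for a 42-byte module but is a choice made for this example only. The variable names are made up.

```javascript
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, 0x01, 0x05, 0x01, 0x60,
  0x00, 0x01, 0x7f, 0x03, 0x02, 0x01, 0x00, 0x07, 0x08, 0x01, 0x04, 0x64,
  0x65, 0x6d, 0x6f, 0x00, 0x00, 0x0a, 0x0b, 0x01, 0x09, 0x00, 0x41, 0xaa,
  0x01, 0x41, 0xbb, 0x01, 0x6a, 0x0b
]);

// Compile step: turns the bytes into a module.
const wasmModule = new WebAssembly.Module(wasmBytes);

// List what the module exports: a single function named "demo".
const exportList = WebAssembly.Module.exports(wasmModule);
console.log(exportList);

// Instantiate step: our module needs no imports, so none are passed.
const wasmInstance = new WebAssembly.Instance(wasmModule);
console.log(wasmInstance.exports.demo()); // prints 357
```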
Conclusion
WebAssembly is a stack-based virtual machine language for which we can hack together a module byte-by-byte and put straight into an HTML file without much fuss. Tools like wabt
are convenient, but entirely optional. From this point on, we can work our way up from the bottom to run higher level languages inside the browser.
[1] This time around it’s standardized, properly sand-boxed and included, at least.
[2] The astute reader will remark that it would not be necessary to add the length of the encoded function body to correctly parse a module, since function bodies are terminated by end, but having it makes it easy to skip over entire functions in a single jump.
[3] Javascript being Javascript. Don’t pretend you’re not reading this article precisely to avoid having to deal with Javascript.
[4] WebAssembly.instantiate includes an implicit WebAssembly.compile step.