Owning Your Invariants

One of my favorite things to do in Rust is to enforce invariants across a codebase, leveraging the compiler to correctness-check a huge portion of my actual domain logic. Compared to an implementation in a language like Python, a whole class of tests that would be required for correctness there is obviated through newtypes and other patterns.

While it is often straightforward to use the type system for these, the borrow checker and the somewhat recently introduced non-lexical lifetimes are also great tools to leverage for logical soundness.

A motivating example: Parsing custom networking protocols

A common operation when working with chunks of bytes received from an untrusted source is calculating offsets and ranges into a given slice. If you are dealing with a network stream or any other data that arrives piecewise, the excellent bytes crate should typically be involved. Let’s consider the following piece of innocuous-looking code:

use bytes::{Buf, BytesMut};

// A frame of our networking protocol, details omitted.
use crate::Frame;

// The size of a frame header.
const HEADER_SIZE: usize = 4;

// Processes a header.
//
// Returns the size of the payload or `None` if
// insufficient bytes are in `raw`.
fn process_header(raw: &[u8]) -> Option<usize> {
    todo!("left as an exercise for the reader")
}

// A simplified frame processing function.
//
// Returns a full frame taken from `raw`, or `None` if
// there is not enough data yet.
fn process_frame(raw: &mut BytesMut) -> Option<Frame> {
    let payload_size = process_header(raw)?;

    // Determine where the frame ends, then check if we
    // have received enough data.
    let frame_end = HEADER_SIZE + payload_size;
    if raw.remaining() < frame_end {
        return None;
    }

    // The entire frame is contained in `raw`; drop header
    // and return payload.
    raw.advance(HEADER_SIZE);
    let payload = raw.split_to(frame_end);

    Some(Frame::from_payload(payload))
}

There is a subtle¹ bug here: frame_end is an index into raw, pointing to the end of the frame as measured from the start of the buffer before the header was consumed. By calling raw.advance(HEADER_SIZE) we have essentially invalidated this index, since HEADER_SIZE bytes were removed from the start of the buffer.
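To make the effect concrete, here is a minimal sketch that mirrors the buggy logic using only the standard library. It is an assumption-laden stand-in, not the original code: a `Vec<u8>` replaces `BytesMut`, `Vec::drain` plays the role of both `advance` and `split_to`, and the hypothetical function `buggy_take_payload` receives the payload size as a parameter instead of parsing a header.

```rust
const HEADER_SIZE: usize = 4;

/// Mirrors the buggy frame logic. `Vec::drain` stands in for
/// `advance` and `split_to`; this is an illustration, not the
/// `bytes` API.
fn buggy_take_payload(raw: &mut Vec<u8>, payload_size: usize) -> Option<Vec<u8>> {
    // Index of the frame end, relative to the *current* start of `raw`.
    let frame_end = HEADER_SIZE + payload_size;
    if raw.len() < frame_end {
        return None;
    }

    // Remove the header. From here on `frame_end` is stale: it is
    // too large by `HEADER_SIZE`.
    raw.drain(..HEADER_SIZE);

    // BUG: takes `frame_end` bytes instead of `payload_size` bytes.
    Some(raw.drain(..frame_end).collect())
}
```

Given an 11-byte buffer that holds one frame with a 2-byte payload followed by 5 bytes of the next frame, this returns 6 bytes instead of 2 and silently eats most of the following frame. If the buffer holds exactly one frame, the stale index is out of bounds and `drain` panics, which matches the documented behaviour of `BytesMut::split_to` when the index exceeds the length.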

We can attempt to fix this by dropping the index after it is invalidated:

// The entire frame is contained in `raw`; drop header
// and return payload.
// All dependent indices are also invalidated.
raw.advance(HEADER_SIZE);
drop(frame_end);

// Still compiles: `frame_end` is `Copy`, so the bug goes unnoticed.
let payload = raw.split_to(frame_end);

This will only result in disappointment though, since the std::mem::drop documentation clearly states:

Integers and other types implementing Copy are unaffected by drop.

Due to frame_end being a usize implementing Copy, we have to take a different route.
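The difference between `Copy` and non-`Copy` values under `drop` can be sketched with two small functions. Both function names are hypothetical and exist only for illustration.

```rust
/// Returns the index even though it was "dropped" first.
fn index_after_drop() -> usize {
    let frame_end: usize = 4 + 2;
    // `usize` is `Copy`, so `drop` receives a copy and the original
    // binding stays usable. Recent compilers may emit a lint warning
    // for this call, but it is not an error.
    drop(frame_end);
    frame_end
}

/// For contrast: a non-`Copy` value is moved into `drop`.
fn vec_len_before_drop() -> usize {
    let payload = vec![0xAA_u8, 0xBB];
    let len = payload.len();
    drop(payload);
    // Using `payload` from here on would fail to compile with
    // E0382 (use of moved value).
    len
}
```

Only the second case gives us the compile-time invalidation we are after, and an index is unfortunately always of the first kind.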

Shadowing the value is an option, although the boilerplate needed to hide the resulting unused variable warning is painful enough to warrant a macro, which is not ideal:

macro_rules! invalidate {
    ($id:ident) => {
        #[allow(unused_variables)]
        let $id: ::std::convert::Infallible;
    }
}

Note that Infallible is used in the absence of a stabilized never type (!). Replacing drop(frame_end) with invalidate!(frame_end) finally causes the type checker to grace us with an error:

52  |     let payload = raw.split_to(frame_end);
    |                       -------- ^^^^^^^^^ expected `usize`, found `Infallible`
    |                       |
    |                       arguments to this method are incorrect
    |

This is an improvement, but it requires paying close attention to which indices are invalidated, and bugs can still easily be introduced during refactoring.
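For reference, a corrected function using the macro might look like the following sketch. As an assumption for the sake of a standard-library-only example, a `Vec<u8>` stands in for `BytesMut`, `Vec::drain` replaces `advance` and `split_to`, and the hypothetical `take_payload_invalidated` takes the payload size as a parameter.

```rust
macro_rules! invalidate {
    ($id:ident) => {
        #[allow(unused_variables)]
        let $id: ::std::convert::Infallible;
    };
}

const HEADER_SIZE: usize = 4;

/// Takes one payload off the front of `raw`, invalidating the
/// stale index by hand.
fn take_payload_invalidated(raw: &mut Vec<u8>, payload_size: usize) -> Option<Vec<u8>> {
    let frame_end = HEADER_SIZE + payload_size;
    if raw.len() < frame_end {
        return None;
    }

    // Drop the header, then shadow the now stale index.
    raw.drain(..HEADER_SIZE);
    invalidate!(frame_end);

    // `raw.drain(..frame_end)` would be rejected by the type checker
    // here, since `frame_end` is no longer a `usize`.
    Some(raw.drain(..payload_size).collect())
}
```

The weak spot is visible in the sketch: nothing forces the `invalidate!` call to exist, so deleting or moving that one line brings the original bug back without any complaint from the compiler.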

Phantom borrows

By shifting the responsibility to the creation of the index we can make things more robust: As long as the index is created with the dependency on the underlying buffer expressed, it will always be invalidated if the underlying buffer is modified.

use std::marker::PhantomData;

/// Tracks a dependency of an otherwise independent type.
#[repr(transparent)]
struct DependsOn<'a, T, S> {
    pub value: T,
    phantom: PhantomData<&'a S>,
}

fn depends_on<'a, S, T>(_source: &'a S, value: T) -> DependsOn<'a, T, S> {
    DependsOn {
        value,
        phantom: PhantomData,
    }
}

This pattern is similar to a pointer with a lifetime. We can make an argument for adding a Deref implementation as well,

use std::ops::Deref;

impl<'a, S, T> Deref for DependsOn<'a, T, S> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.value
    }
}

though this will only change whether or not we write frame_end.value or *frame_end in our code, which now looks like this:

let frame_end = depends_on(&raw, HEADER_SIZE + payload_size);
if raw.remaining() < *frame_end {
    return None;
}

// The entire frame is contained in `raw`; drop header
// and return payload.
raw.advance(HEADER_SIZE);

let payload = raw.split_to(*frame_end);

This finally gives us a useful error from the borrow checker:

error[E0502]: cannot borrow `*raw` as mutable because it is also borrowed as immutable
  --> src/main.rs:74:5
   |
66 |     let frame_end = depends_on(&raw, HEADER_SIZE + payload_size);
   |                                ---- immutable borrow occurs here
...
74 |     raw.advance(HEADER_SIZE);
   |     ^^^^^^^^^^^^^^^^^^^^^^^^ mutable borrow occurs here
75 |
76 |     let payload = raw.split_to(*frame_end);
   |                                 --------- immutable borrow later used here

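If we correct the buggy line to let payload = raw.split_to(payload_size); the code compiles, as long as the compiler has support for non-lexical lifetimes: the last use of frame_end is then the length check, so the immutable borrow ends before raw.advance(HEADER_SIZE) needs mutable access.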
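Put together, the corrected version can be sketched end to end with only the standard library. This is an illustration under stated assumptions rather than the original code: a `Vec<u8>` stands in for `BytesMut`, `Vec::drain` replaces `advance` and `split_to`, and the hypothetical `take_payload_tracked` takes the payload size as a parameter.

```rust
use std::marker::PhantomData;
use std::ops::Deref;

/// Tracks a dependency of an otherwise independent type.
#[repr(transparent)]
struct DependsOn<'a, T, S> {
    pub value: T,
    phantom: PhantomData<&'a S>,
}

fn depends_on<'a, S, T>(_source: &'a S, value: T) -> DependsOn<'a, T, S> {
    DependsOn {
        value,
        phantom: PhantomData,
    }
}

impl<'a, S, T> Deref for DependsOn<'a, T, S> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.value
    }
}

const HEADER_SIZE: usize = 4;

/// Takes one payload off the front of `raw`.
fn take_payload_tracked(raw: &mut Vec<u8>, payload_size: usize) -> Option<Vec<u8>> {
    // `frame_end` holds a phantom shared borrow of `raw`.
    let frame_end = depends_on(&raw, HEADER_SIZE + payload_size);
    if raw.len() < *frame_end {
        return None;
    }
    // Last use of `frame_end` is above, so the borrow has ended and
    // `raw` may be mutated again.
    raw.drain(..HEADER_SIZE);

    // Using `*frame_end` below instead of `payload_size` would
    // reintroduce the E0502 borrow error.
    Some(raw.drain(..payload_size).collect())
}
```

Unlike the macro approach, there is no separate invalidation step to forget: the dependency is declared once, where the index is created.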

Conclusion

The example above is very close to how pointers with lifetimes are constructed and benefits from the same borrow checker assistance, which is not surprising since indices into buffers are at least “pointer adjacent”. Whether or not one wants to expend the extra effort is a matter of personal preference, i.e. of how complex the code at hand feels to its author.²

¹ It may be easy to see in this contrived example, but I have had more complex code examples from the real world that were similar in structure while being much harder to read.
² …and also the reason why I am not pushing out a small crate for this.