Build, don't call

Sometimes one runs across less-than-readable function calls, like this code for launching a rocket:

launch_rocket(50_000, 200_000, 10_000, false, true).expect("launch failed");

Encountering this in a code review can be tricky. One has to look at the function signature to discover that the last two arguments seem to have been swapped:

fn launch_rocket(payload_kg: u32,
                 fuel_kg: u32,
                 countdown_ms: u32,
                 has_crew: bool,
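                 self_destruct: bool) -> Result<(), LaunchError> {
    // LaunchError stands in for whatever error type the real function returns.
    // ...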
}

The call site is hard to read at a glance, especially if the signature is not available, since it

has many arguments,
includes arguments of the same type that can be confused (payload_kg, fuel_kg), and
includes arguments whose meaning is not immediately obvious even from the function definition (self_destruct).

Code like this is also not very future-proof, as touching this specific section always carries the risk of accidentally messing up some of the arguments.

Luckily, there are ways to address this and while these vary from language to language, here are some approaches usable in Rust:

Type-safe arguments

One way of solving this issue is to introduce types for some or all of the arguments, including units:

struct Kilogram<T>(T);
struct Milliseconds<T>(T);

This solves an adjacent problem, but does not prevent confusion between payload and fuel, so the full example would include one of:

// Fully typed.
struct Payload(Kilogram<u32>);

// Just using newtypes around parameters.
struct Payload(u32);

While using custom types to represent actual units can be a good practice as well, it is a different problem from the one at hand. More often, a bigger payoff comes from introducing custom enums, even if there are only two variants. Arguments named is_* or has_* turn from booleans into first-class types:

enum Crewing {
    Unmanned,
    HumanCrew
}

We can strike a middle ground and transform our call into something like this, correcting the error in the last two parameters:

launch_rocket(Payload(50_000),
              Fuel(200_000),
              Timeout(10_000),
              Crewing::HumanCrew,
              SelfDestruct::Disabled)
    .expect("launch failed");

This is a lot easier to read and review than the previous incarnation, but still feels off. Wrapping values in newtypes is a common technique for attaching additional information, such as validation state or domain, and is usually well worth the effort, but we are still relying on the “accidental” fact that each of our parameters represents a distinct concept. Imagine splitting the payload parameter into payload_base and payload_cabin parameters: we would be hard pressed to produce yet another type distinguishing the two.
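To make the typed variant concrete, here is a minimal sketch of definitions such a call could rest on. Only Payload and Crewing appear above; the definitions of Fuel, Timeout and SelfDestruct, the LaunchError type and the body of launch_rocket are assumptions made for illustration.

```rust
// Newtypes: each wraps a plain u32 but is a distinct type to the compiler.
// The definitions of `Fuel` and `Timeout` are assumed for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Payload(u32);
#[derive(Debug, Clone, Copy, PartialEq)]
struct Fuel(u32);
#[derive(Debug, Clone, Copy, PartialEq)]
struct Timeout(u32);

// Two-variant enums replacing the former `bool` arguments.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Crewing {
    Unmanned,
    HumanCrew,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum SelfDestruct {
    Enabled,
    Disabled,
}

// Hypothetical error type; the only failure modelled here is an empty tank.
#[derive(Debug, PartialEq)]
enum LaunchError {
    NoFuel,
}

fn launch_rocket(
    payload: Payload,
    fuel: Fuel,
    countdown: Timeout,
    crewing: Crewing,
    self_destruct: SelfDestruct,
) -> Result<(), LaunchError> {
    if fuel.0 == 0 {
        return Err(LaunchError::NoFuel);
    }
    // A real implementation would use the remaining arguments here.
    let _ = (payload, countdown, crewing, self_destruct);
    Ok(())
}
```

With these definitions, passing Fuel(200_000) in the payload position, or SelfDestruct::Disabled in the crewing position, is rejected by the compiler instead of silently launching the wrong rocket.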

Builder pattern

One could argue we are actually looking for something similar to named arguments found in other languages. In Rust, their functionality is usually covered by the builder pattern, which makes our call immediately obvious:

RocketLaunch::new()
    .payload(50_000)
    .fuel(200_000)
    .countdown(10_000)
    .has_crew()
    // note: omitting .self_destruct() here means it is disabled
    .launch();

As a neat side effect, we also get a replacement for default arguments for free: here, self_destruct defaults to false.
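A minimal sketch of what could sit behind such a call is shown below. The method names are taken from the call above; the field layout, the default values in new, the owned setters and the Rocket type returned by launch are assumptions.

```rust
// Stand-in for whatever a launch produces (assumed for this sketch).
#[derive(Debug, PartialEq)]
struct Rocket {
    payload_kg: u32,
    fuel_kg: u32,
    countdown_ms: u32,
    has_crew: bool,
    self_destruct: bool,
}

struct RocketLaunch {
    payload_kg: u32,
    fuel_kg: u32,
    countdown_ms: u32,
    has_crew: bool,
    self_destruct: bool,
}

impl RocketLaunch {
    // `new` supplies the defaults, which replaces default arguments.
    fn new() -> Self {
        RocketLaunch {
            payload_kg: 0,
            fuel_kg: 0,
            countdown_ms: 0,
            has_crew: false,
            self_destruct: false,
        }
    }

    fn payload(mut self, kg: u32) -> Self {
        self.payload_kg = kg;
        self
    }

    fn fuel(mut self, kg: u32) -> Self {
        self.fuel_kg = kg;
        self
    }

    fn countdown(mut self, ms: u32) -> Self {
        self.countdown_ms = ms;
        self
    }

    // Flag setters take no argument: calling them switches the flag on.
    fn has_crew(mut self) -> Self {
        self.has_crew = true;
        self
    }

    fn self_destruct(mut self) -> Self {
        self.self_destruct = true;
        self
    }

    // Consumes the builder and hands back the finished value.
    fn launch(self) -> Rocket {
        Rocket {
            payload_kg: self.payload_kg,
            fuel_kg: self.fuel_kg,
            countdown_ms: self.countdown_ms,
            has_crew: self.has_crew,
            self_destruct: self.self_destruct,
        }
    }
}
```

Every setter names the value it sets, so the call site reads like a list of named arguments, and anything not mentioned keeps the default chosen in new.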

Building a bike-shed, owned or mutable?

There is no single strict definition of the builder pattern, and the topic is very amenable to bike-shedding. The most common distinction is whether to use &mut self or self receivers on setter methods. The former is the recommended variant in most cases, as it allows manipulating the builder without having to rebind it:

impl RocketLaunch {
    // ...
    
    fn payload(&mut self, payload: u32) -> &mut Self {
        self.payload = payload;
        self
    }    
}

let mut rocket = RocketLaunch::new();
rocket.payload(50_000);

if extra_fuel {
    rocket.fuel(300_000);    
} else {
    rocket.fuel(200_000);
}

rocket.launch();

Changing these to owned builders

fn payload(mut self, payload: u32) -> Self {
    self.payload = payload;
    self
}

makes conditional use more cumbersome

launch = if extra_fuel {
    launch.fuel(300_000)
} else {
    launch.fuel(200_000)
};

but makes no difference in chaining:

fn ride(rocket: Rocket) {
    // ...
}

// Chaining works equally well with both setter styles (with &mut self
// setters, launch has to take its receiver by reference).
ride(RocketLaunch::new()
    .payload(50_000)
    .fuel(200_000)
    .launch());
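To compare the two setter styles in one compilable piece, the following sketch defines them side by side. The type names MutLaunch and OwnedLaunch and the two helper functions are made up for this comparison, and launch simply returns the configured values as a tuple instead of a real Rocket.

```rust
// Builder with by-reference setters.
#[derive(Default)]
struct MutLaunch {
    payload: u32,
    fuel: u32,
}

impl MutLaunch {
    fn new() -> Self {
        Self::default()
    }

    fn payload(&mut self, payload: u32) -> &mut Self {
        self.payload = payload;
        self
    }

    fn fuel(&mut self, fuel: u32) -> &mut Self {
        self.fuel = fuel;
        self
    }

    // Takes `&self`, so a chain that starts on a temporary can end here.
    fn launch(&self) -> (u32, u32) {
        (self.payload, self.fuel)
    }
}

// Builder with owned setters.
#[derive(Default)]
struct OwnedLaunch {
    payload: u32,
    fuel: u32,
}

impl OwnedLaunch {
    fn new() -> Self {
        Self::default()
    }

    fn payload(mut self, payload: u32) -> Self {
        self.payload = payload;
        self
    }

    fn fuel(mut self, fuel: u32) -> Self {
        self.fuel = fuel;
        self
    }

    fn launch(self) -> (u32, u32) {
        (self.payload, self.fuel)
    }
}

// Conditional configuration without rebinding.
fn launch_with_mut(extra_fuel: bool) -> (u32, u32) {
    let mut rocket = MutLaunch::new();
    rocket.payload(50_000);
    if extra_fuel {
        rocket.fuel(300_000);
    } else {
        rocket.fuel(200_000);
    }
    rocket.launch()
}

// The owned variant has to move the builder through each branch.
fn launch_with_owned(extra_fuel: bool) -> (u32, u32) {
    let mut rocket = OwnedLaunch::new().payload(50_000);
    rocket = if extra_fuel {
        rocket.fuel(300_000)
    } else {
        rocket.fuel(200_000)
    };
    rocket.launch()
}
```

Both MutLaunch::new().payload(1).fuel(2).launch() and the same chain on OwnedLaunch compile, which is the point made above: the difference only shows up once the builder is stored in a variable and configured step by step.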

Reusable vs non-reusable

Creating reusable builders means having the final builder method (launch) take a reference instead of an owned receiver:

impl RocketLaunch {
    // ...
    
    fn payload(&mut self, payload: u32) -> &mut Self {
        self.payload = payload;
        self
    }
    
    fn launch(&self) -> Rocket {
        // ...
    }
}

This makes it possible to launch multiple rockets from the same builder instance, which is a fine use case in many applications. The opposing case is a launch function that takes an owned self, meaning it will consume the builder.

These two styles rarely make sense combined. By-reference setters (&mut self) combined with a consuming (self) launch method mean that a chain can no longer end in launch, since the last setter returns a &mut RocketLaunch, whilst launch requires an owned RocketLaunch. Vice versa, chaining is possible if the setters use owned self receivers, but there is no advantage in using these over &mut self if the final builder method is called by reference.

So why ever use owned, non-reusable builders? These builders are useful if the builder holds resources that are non-cloneable or expensive to duplicate, or if the arguments should not be reused by accident (e.g. when a supposedly unique ID is passed into the builder). One could argue that these cases are much more likely to occur when a builder replaces what would otherwise be a function call. For this reason, non-reusable builders are often the better fit.
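The difference between the two can be sketched as follows. ReusableLaunch, OneShotLaunch, FlightId, TrackedRocket and the helper launch_twice are hypothetical names chosen for this example; FlightId plays the role of the supposedly unique ID mentioned above.

```rust
#[derive(Debug, PartialEq)]
struct Rocket {
    fuel: u32,
}

// Reusable: `launch` only borrows the builder, so it can be called again.
struct ReusableLaunch {
    fuel: u32,
}

impl ReusableLaunch {
    fn new() -> Self {
        ReusableLaunch { fuel: 0 }
    }

    fn fuel(&mut self, fuel: u32) -> &mut Self {
        self.fuel = fuel;
        self
    }

    fn launch(&self) -> Rocket {
        Rocket { fuel: self.fuel }
    }
}

// A value that must not be duplicated; it deliberately does not derive Clone.
#[derive(Debug, PartialEq)]
struct FlightId(String);

struct TrackedRocket {
    id: FlightId,
    fuel: u32,
}

// Non-reusable: `launch` consumes the builder together with its ID.
struct OneShotLaunch {
    id: FlightId,
    fuel: u32,
}

impl OneShotLaunch {
    fn new(id: FlightId) -> Self {
        OneShotLaunch { id, fuel: 0 }
    }

    fn fuel(mut self, fuel: u32) -> Self {
        self.fuel = fuel;
        self
    }

    fn launch(self) -> TrackedRocket {
        // The ID is moved into the rocket, no copy is made.
        TrackedRocket { id: self.id, fuel: self.fuel }
    }
}

// Two identical rockets from one builder instance.
fn launch_twice() -> (Rocket, Rocket) {
    let mut builder = ReusableLaunch::new();
    builder.fuel(200_000);
    (builder.launch(), builder.launch())
}
```

Calling launch a second time on a OneShotLaunch is rejected at compile time as a use of a moved value, so the same FlightId cannot end up on two rockets by accident.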

Constructor arguments and fallibility

Another differentiating factor between builder styles is how to handle absent arguments that do not have a meaningful default value. Analogous to how a function call with named parameters can force these to be present, or in their absence throw an exception or return a regular error, a builder can emulate this behavior by enforcing the argument’s presence in the constructor, by panicking, or by being fallible.

Let’s add a required parameter, a LaunchPermit, for each of our rocket launches to illustrate this:

// Variant A: Build parameter
impl RocketLaunch {
    fn new(permit: LaunchPermit) -> Self { .. }
}

// Variant B: Panic
impl RocketLaunch {
    fn new() -> Self { .. }
    fn permit(mut self, permit: LaunchPermit) -> Self { .. }

    // `launch` can panic if `permit` was not called
    fn launch(self) -> Rocket { .. }
}

// Variant C: Fallible
impl RocketLaunch {
    fn new() -> Self { .. }
    fn permit(mut self, permit: LaunchPermit) -> Self { .. }

    fn launch(self) -> Result<Rocket, _> { .. }
}

Variants B and C defer a potential error (a missing LaunchPermit) to runtime. This is usually not the desired behavior, unless there are multiple absolutely required parameters of the same type: passing all of those to the constructor would reintroduce the initial problem we set out to solve, namely opaque and error-prone function signatures.
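As a rough illustration, variants A and C could be filled in as below. The two builders are named LaunchA and LaunchC only so that they can coexist in one file; the contents of LaunchPermit and Rocket, the fuel setter and the LaunchError type are assumptions.

```rust
#[derive(Debug, PartialEq)]
struct LaunchPermit(u32);

#[derive(Debug, PartialEq)]
struct Rocket {
    permit: LaunchPermit,
    fuel: u32,
}

// Variant A: the permit is a constructor parameter and cannot be forgotten.
struct LaunchA {
    permit: LaunchPermit,
    fuel: u32,
}

impl LaunchA {
    fn new(permit: LaunchPermit) -> Self {
        LaunchA { permit, fuel: 0 }
    }

    fn fuel(mut self, fuel: u32) -> Self {
        self.fuel = fuel;
        self
    }

    fn launch(self) -> Rocket {
        Rocket { permit: self.permit, fuel: self.fuel }
    }
}

// Hypothetical error type for the fallible variant.
#[derive(Debug, PartialEq)]
enum LaunchError {
    MissingPermit,
}

// Variant C: the permit is optional until `launch`, which may fail.
struct LaunchC {
    permit: Option<LaunchPermit>,
    fuel: u32,
}

impl LaunchC {
    fn new() -> Self {
        LaunchC { permit: None, fuel: 0 }
    }

    fn permit(mut self, permit: LaunchPermit) -> Self {
        self.permit = Some(permit);
        self
    }

    fn fuel(mut self, fuel: u32) -> Self {
        self.fuel = fuel;
        self
    }

    fn launch(self) -> Result<Rocket, LaunchError> {
        // The missing permit is only detected here, at runtime.
        let permit = self.permit.ok_or(LaunchError::MissingPermit)?;
        Ok(Rocket { permit, fuel: self.fuel })
    }
}
```

Variant B would look like variant C with the ok_or(..)? line replaced by an expect call, trading the Result in the signature for a panic.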

Although out of scope for this article, consider that setting attributes can also be fallible (e.g. exceeding the rocket’s fuel tank’s capacity), leading to another dimension of choices. At this point we are leaving the “builder” design space though.

Which style to use?

The standard library is not strictly confined to one style: std::thread::Builder is a non-reusable builder whose methods take self by value, while std::process::Command takes its receiver by mutable reference. In general, it rarely pays off to be dogmatic about these decisions.
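Both standard library builders can be seen side by side in a short sketch. The helper functions build_command and spawn_named, the cargo arguments and the thread name are arbitrary choices for this example; the command is only configured, never executed.

```rust
use std::process::Command;
use std::thread;

// std::process::Command uses `&mut self` setters, so conditional
// configuration needs no rebinding.
fn build_command(release: bool) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("build");
    if release {
        cmd.arg("--release");
    }
    cmd
}

// std::thread::Builder uses owned setters and is consumed by `spawn`.
// The spawned thread reports its own name back to the caller.
fn spawn_named() -> Option<String> {
    let handle = thread::Builder::new()
        .name("rocket".to_string())
        .spawn(|| thread::current().name().map(str::to_owned))
        .expect("failed to spawn thread");
    handle.join().expect("thread panicked")
}
```

The Command value returned by build_command can still be extended or spawned several times by the caller, whereas the thread::Builder is gone once spawn has been called.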

For the general case of replacing complex function calls, using non-reusable builders is a good default approach for efficiency. Unless a large number of non-optional, non-defaultable arguments exist, infallible builders pair well with these designs, especially since they can be inserted without modification into any existing call site.

Crates & macros

Writing builders can be fairly repetitive, so using a macro is tempting, but the trade-off of adding another crate should always be considered. To a certain extent, it is completely fine to write builders by hand, especially when aided by an editor snippet!

The issue with builder-deriving proc macro crates is that they tend not to offer all possible styles of builders and can thus be pretty opinionated. Here are some quirks of popular builder crates:

derive_builder offers a lot of options and almost all variants enumerated here, except that it does not allow for generating an infallible build function (it will always return a Result). It always generates an adjacent type MyTypeBuilder for a given MyType.
typed_builder’s unique feature is that it tackles the multiple required arguments problem through the type system. Instead of generating one builder type, it will generate multiple permutated types that cause a compile time error if RocketBuilder::permit is not called before RocketBuilder::launch.
A lot of other crates are available, each favoring one style over another. Comparing these will usually show that they differ across some of the design decisions described above.
bldgen-rs is my own hat thrown into the ring, which does not attempt to derive a builder, but rather make it easy to write one by just deriving the setter and new methods. It purposely has a very limited scope.
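Returning to the typed_builder entry in the list above: the idea of moving a required argument into the type system can be sketched by hand with a type parameter that tracks the builder's state. This illustrates the general technique only and is not the code typed_builder generates; NoPermit, HasPermit and the struct layouts are assumptions.

```rust
struct LaunchPermit(u32);

// Marker types recording whether a permit has been supplied yet.
struct NoPermit;
struct HasPermit(LaunchPermit);

struct Rocket {
    permit_id: u32,
    fuel: u32,
}

// The type parameter `P` tracks the permit state at compile time.
struct RocketBuilder<P> {
    permit: P,
    fuel: u32,
}

impl RocketBuilder<NoPermit> {
    fn new() -> Self {
        RocketBuilder { permit: NoPermit, fuel: 0 }
    }

    // Supplying the permit changes the type of the builder.
    fn permit(self, permit: LaunchPermit) -> RocketBuilder<HasPermit> {
        RocketBuilder { permit: HasPermit(permit), fuel: self.fuel }
    }
}

// Optional setters are available in every state.
impl<P> RocketBuilder<P> {
    fn fuel(mut self, fuel: u32) -> Self {
        self.fuel = fuel;
        self
    }
}

// `launch` only exists once a permit is present.
impl RocketBuilder<HasPermit> {
    fn launch(self) -> Rocket {
        let HasPermit(LaunchPermit(permit_id)) = self.permit;
        Rocket { permit_id, fuel: self.fuel }
    }
}
```

A chain such as RocketBuilder::new().fuel(200_000).launch() fails to compile, because launch is not defined for RocketBuilder<NoPermit>. The price is one state parameter per required argument, which is the kind of boilerplate a derive macro can take over.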

Conclusion

The builder pattern is, in many cases, a very attractive way to replace otherwise unwieldy function signatures, even though in general it is no silver bullet and leaves a lot of room for problem-specific adaptation. Always consider the trade-off when bringing in an external crate to derive it.