The tracing crate is (much more than) a replacement for the classic log crate; it shares ideas with other tracing systems and comes with a large set of documentation¹. Unfortunately, due to the documentation's bottom-up nature, it takes a little over 3500 words² to get to the point where you are able to get started, so for the impatient, here is how to use tracing in traditional log style:
# Cargo.toml
# ...
[dependencies]
tracing = "0.1.25"
tracing-subscriber = "0.2.17"
//! main.rs
use tracing::debug;
use tracing_subscriber;
fn main() {
tracing_subscriber::fmt()
.with_max_level(tracing::Level::TRACE)
.init();
debug!("Hello, world!");
}
This sets a fixed log level of TRACE and makes tracing work like a good drop-in replacement for log in most projects. Want to use the popular env_logger environment variable style, a.k.a. RUST_LOG? That is supported out of the box; just substitute
tracing_subscriber::fmt()
.with_env_filter(tracing_subscriber::EnvFilter::from_default_env())
.init();
for the fmt
call above.
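If you also want a sensible default when RUST_LOG is unset, one possible sketch looks like the following (the fallback directive "debug" and the example RUST_LOG value in the comment are just illustrations):

use tracing_subscriber::EnvFilter;

fn main() {
    tracing_subscriber::fmt()
        // Use RUST_LOG when it is set (e.g. RUST_LOG=info,my_crate=trace),
        // otherwise fall back to a fixed "debug" filter.
        .with_env_filter(
            EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("debug")),
        )
        .init();

    tracing::debug!("Hello, world!");
}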
Reasons for using tracing over log
A legitimate question is “Why would I even want to use this when log seems to do the job just fine?” To answer it, we look at the advanced features tracing offers:
Spans
Spans enclose a time period during which data is logged, forming a context. This may sound a little highbrow, but is quickly illustrated with an example such as handling a job in a worker:
The job handling has a clear start and end: once the worker begins processing a job n, it enters a context of “processing job n”. After it finishes, regardless of the outcome, it exits that context.
We can put this into some simple, naive code, extending our initial boilerplate from earlier, which we will improve upon later:
// This example still has a lot of room for improvement!
use tracing::{info, info_span};

struct Job(i32);

impl Job {
    fn run(&self) {
        info!("running job: {}", self.0);
        let span = info_span!("job processing");
        let _entered = span.enter();
        info!("processing step");
        subtask();
        // _entered is dropped here, causing a span exit.
    }
}
Spans can even be nested: if a job has a subtask that is delegated to a different part of the application, such as running cleanup, we still want to track the fact that it was called as a result of processing a particular job.
fn subtask() {
    let span = info_span!("subtask processing");
    let _entered = span.enter();
    info!("processing subtask, outcome: {}", 42);
}
If we run this on 100 sequential jobs,
let jobs: Vec<_> = (0..100).map(|n| Job(n)).collect();
jobs.iter().for_each(Job::run);
this is the output:
...
Mar 14 17:00:33.427 INFO tracingexample: running job: 71
Mar 14 17:00:33.427 INFO job processing: tracingexample: processing step
Mar 14 17:17:38.835 INFO job processing:subtask processing: tracingexample: processing subtask, outcome: 42
Mar 14 17:00:33.427 INFO tracingexample: running job: 72
Mar 14 17:00:33.427 INFO job processing: tracingexample: processing step
Mar 14 17:17:38.835 INFO job processing:subtask processing: tracingexample: processing subtask, outcome: 42
Mar 14 17:00:33.427 INFO tracingexample: running job: 73
...
Note that the subtask
span is nested under job processing
; should we call subtask
from elsewhere, this would not be the case.
Attributes
The calls to info! and its siblings have in common that they all create an event, which is the tracing term for what we would otherwise call “log entries”. Both events and spans can be made to carry attributes, which saves us from having to stringify all our values:
// in subtask():
info!(outcome = 42, "processing subtask");
// Attributes come before the message in events, and after the name in spans.

// Job::run
fn run(&self) {
    let span = info_span!("job processing", id = self.0);
    let _entered = span.enter();
    info!("processing step");
    subtask();
}
We also no longer need to spell out which job we are processing; that information is carried by the span itself. Our output looks cleaner and more structured:
Mar 14 17:21:06.127 INFO job processing{id=71}: tracingexample: processing step
Mar 14 17:21:06.127 INFO job processing{id=71}:subtask processing: tracingexample: processing subtask outcome=42
Mar 14 17:21:06.127 INFO job processing{id=72}: tracingexample: processing step
Mar 14 17:21:06.127 INFO job processing{id=72}:subtask processing: tracingexample: processing subtask outcome=42
tracing supports other output formats like JSON, in which this pattern really shines.
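For illustration, switching the quick-start subscriber to JSON output is a small change on the builder (this assumes tracing-subscriber's json cargo feature is enabled):

tracing_subscriber::fmt()
    .json() // one JSON object per event, with span fields attached
    .with_max_level(tracing::Level::TRACE)
    .init();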
Parallel execution
At the same time, we also solved another issue we had not even encountered yet: parallel execution. Turning our linear job execution into a non-deterministically ordered parallel one using rayon³,
use rayon::prelude::*; // brings par_iter() into scope

jobs.par_iter().for_each(Job::run);
we can see that it is still possible to associate each log entry with a job, even though they are out of order:
Mar 14 17:26:25.136 INFO job processing{id=71}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=94}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=65}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=76}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=88}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=66}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=13}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=69}:subtask processing: tracingexample: processing subtask outcome=42
Mar 14 17:26:25.136 INFO job processing{id=28}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=64}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=38}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=63}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=32}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=51}:subtask processing: tracingexample: processing subtask outcome=42
Mar 14 17:26:25.136 INFO job processing{id=15}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=19}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=84}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=21}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=71}:subtask processing: tracingexample: processing subtask outcome=42
Without the spans, there would be no way to associate a log message with the job that produced it.
Instrumentation
tracing comes with a set of tools that make life even easier, allowing some of the boilerplate-ish code to be removed. A good example is #[instrument], which can be used to quickly create a span for a function:
use tracing::instrument;

impl Job {
    #[instrument(name = "job processing",
                 skip(self),
                 fields(id = self.0),
                 level = "debug")]
    fn run(&self) {
        info!("processing step");
        subtask();
    }
}

#[instrument(name = "subtask processing", level = "debug")]
fn subtask() {
    info!(outcome = 42, "processing subtask");
}
We had to put a little effort in to make the output exactly the same as before; if you are content with the defaults, the annotations become much shorter.
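As a sketch of the short form: by default the span is named after the function, the level is INFO, and all function arguments are recorded as fields (skip(self) keeps the Job itself out of the field list):

impl Job {
    #[instrument(skip(self))]
    fn run(&self) {
        info!("processing step");
        subtask();
    }
}

#[instrument]
fn subtask() {
    info!(outcome = 42, "processing subtask");
}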
async users beware
There is an important hazard lurking here: as we will cover later, tracing registers the current span in a thread local. Some async runtimes (e.g. tokio) move tasks between threads, which breaks the assumption that a span will be exited on the same thread it was entered on.
To alleviate this, Instrument::instrument() is available: my_future.instrument(info_span!("example")).await, which behind the scenes enters and exits the span every time the future is polled. The #[instrument] macro is clever enough to do this by itself when applied to an async function.
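A brief sketch, with a made-up async function for illustration:

// #[instrument] on an async fn wraps the returned future, so the span is
// entered and exited around every poll instead of being held across awaits.
#[instrument]
async fn fetch_job(id: u32) {
    info!("fetching job");
}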
This also means that holding an entered Span guard across an await point will cause incorrect traces!
let _enter = span.enter();
call_an_async_function().await; // Yielding here.
// At this point, our trace will be wrong.
To correct this, one would have to make sure the span is entered and exited strictly before (or after) the await, never across it. Since this is error prone, tracing offers Span::in_scope to avoid having to call enter manually.
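A sketch of the pattern (call_an_async_function stands in for any async call, as above):

async fn handle() {
    let span = info_span!("example");
    // The span is only entered for the duration of the closure,
    // so no guard is held across the await below.
    span.in_scope(|| {
        info!("synchronous work inside the span");
    });
    call_an_async_function().await;
}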
Distributed tracing & other goodies
tracing can be integrated with various systems that support distributed tracing. When the processing of a single request is spread over multiple servers, this allows collecting and recombining traces to get a better picture of what happened. The related crates are listed in the tracing docs, but are not discussed here.
Under the hood
Since tracing is open source, we can do a little digging to gain a deeper understanding of what, at the time of this writing, was version 0.1.25:
What happens when you call debug!
The debug! macro is a good representative of all its siblings (info!, …) and roughly works as follows:
- It first expands to the event! macro with Level::DEBUG.
- In event!, STATIC_MAX_LEVEL is checked to allow for compile-time filtering of events.
Then the potential setup of the callsite commences:
- A static⁴ instance of a MacroCallsite (these types live in a quasi-hidden⁵ module) is created.
- Each of these callsites encodes metadata like file, line number, level and fields. The first time the program comes across the call site, it is registered.
- Registration means that the callsite is added to a global registry (more on that later).
- A thing called “interest” is determined. This indicates whether or not the callsite needs to be checked at all in the future (it can be “sometimes”, “always” or “never”).
One must remember that statics are essentially evaluated only once, thus the registration process also runs only once, but it leaves a permanent callsite behind in memory that can now be used.
- With a callsite in place, an event is constructed, containing a parent, a reference to the callsite metadata and a value set with all fields, i.e. attributes.
- The resulting event is dispatched to a subscriber⁶ (see below) via the Subscriber::event method and handled.
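To make the first step concrete: the expansion is something you could also write by hand, as the two lines below are roughly equivalent:

use tracing::{debug, event, Level};

debug!(answer = 42, "computed");
event!(Level::DEBUG, answer = 42, "computed");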
What can I actually put into an attribute?
Following the macro, we can answer the question of what we can actually record on an event, e.g. when calling info!
:
info!(foo=bar, "some event")
Here foo
is a field name and bar
a value. These are captured in a ValueSet
, which has the following non-public implementation:
/// A set of fields and values for a span.
pub struct ValueSet<'a> {
    values: &'a [(&'a Field, Option<&'a (dyn Value + 'a)>)],
    fields: &'a FieldSet,
}
At first this seems fairly open: after all, we only need to implement Value for the custom types we would like to pass as attribute values. However, Value just dispatches to field::Visit, which has a pretty narrow set of value types it can handle: i64, u64, bool, str or error.
For convenience, though, tracing offers the sigils % and ? as shortcuts, which record a value through its Display or Debug implementation respectively.
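A brief sketch of both sigils on an event (the error value is just an example):

use std::io::{Error, ErrorKind};

let err = Error::new(ErrorKind::Other, "disk on fire");
// % records err via its Display implementation, ? via Debug.
info!(error = %err, details = ?err, "subtask failed");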
What is this registry and subscriber business?
Inside tracing
, there is a single place where a static Subscriber
registry is allocated. A subscriber is the entity that actually does something with an event, such as formatting it and printing it to stderr
.
A very important facet is that spans are created by the subscriber as well. The debug_span! macro works very similarly to its debug! kin, except that the subscriber has to generate a unique ID for the span. Whenever the span is entered or exited, only its ID is passed to Subscriber::enter/Subscriber::exit.
Performance benefits aside, this has two important effects:
- Typically, only one subscriber is active at a time, since a unique ID has to be generated per Span. This is not possible if there are competing subscribers.
- It is up to the subscriber to track which Span is currently active. Common implementations track this via a thread local, which causes the issues with async described above.
Why does tracing_subscriber exist?
It may seem strange that there is no usable subscriber bundled directly with tracing, but it makes more sense when one considers that the crate might be used in less common circumstances:
- A Rust program compiled to WebAssembly might solely use a browser’s console.log for output, not even allocating additional memory when doing so.
- On a bare-metal embedded system, a subscriber is not likely to have thread locals available.
- If task locals were supported on an async runtime, it is conceivable that a subscriber could be written that does not suffer from the async issues mentioned earlier.
Additionally, tracing_subscriber offers extra functionality in the form of Layers. These work around the issue that a subscriber must provide a unique ID for each created span: layers can be composed freely, ultimately deferring to a single core subscriber for ID generation.
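A minimal sketch of that composition, assuming tracing-subscriber 0.2 with its fmt and env-filter features: the Registry generates span IDs, while filtering and formatting live in layers stacked on top of it:

use tracing_subscriber::{fmt, layer::SubscriberExt, util::SubscriberInitExt, EnvFilter, Registry};

Registry::default()
    .with(EnvFilter::from_default_env()) // filtering as a layer
    .with(fmt::layer())                  // formatting as another layer
    .init(); // install as the global default subscriber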
The more common use case, though, is likely to leverage tracing_subscriber::fmt and the resulting builder to create a subscriber.
Conclusion
The tracing crate is a powerful replacement for any existing logging setup and ties in well with modern observability patterns, at the cost of a somewhat higher initial learning curve. Any Rust developer with a grasp of its core concepts and some implementation details will find it a valuable tool in their toolbox. Hopefully, this brief tour provides a good start for all who have not braved the official crate documentation yet.
1. If you want to dive even deeper, many concepts stem from the Google “Dapper” paper.
2. I counted: start at the top of the docs’ index of tracing and follow the first mention of tracing_subscriber, generously assuming you understand that it is the correct crate. From there on, read tracing_subscriber’s docs until EnvFilter appears and then hopefully notice how to create a basic instance.
3. The fact that rayon is magic and allowed us to parallelize our application by simply substituting .iter with .par_iter never fails to put a smile on my face :).
4. As a reminder, static values in Rust are very similar to consts, except that they will always have a memory address. This will be important later.
5. It is hidden from the docs and has an off-putting name in tracing::__macro_support, but due to how macros work, it cannot actually be non-pub.
6. Technically it goes through a dispatcher first, but this seems to be a minor implementation detail.