The tracing crate is (much more than) a replacement for the classic log crate; it shares ideas with other tracing systems and comes with a large set of documentation¹. Unfortunately, due to the documentation's bottom-up nature, it takes a little over 3500 words² to get to the point where you are able to get started, so for the impatient, here is how to use tracing in traditional log style:
# Cargo.toml
# ...
[dependencies]
tracing = "0.1.25"
tracing-subscriber = "0.2.17"
//! main.rs
use tracing::debug;
use tracing_subscriber;
fn main() {
tracing_subscriber::fmt()
.with_max_level(tracing::Level::TRACE)
.init();
debug!("Hello, world!");
}
This sets a fixed log level of TRACE and makes tracing work like a good drop-in replacement for log in most projects. Want to use the popular env_logger environment variable style, a.k.a. RUST_LOG? That is supported out of the box; just substitute
tracing_subscriber::fmt()
.with_env_filter(tracing_subscriber::EnvFilter::from_default_env())
.init();
for the fmt
call above.
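If you also want a sensible default when RUST_LOG is unset, one possible sketch looks like the following (the fallback directive "debug" and the example RUST_LOG value in the comment are just illustrations):

use tracing_subscriber::EnvFilter;

fn main() {
    tracing_subscriber::fmt()
        // Use RUST_LOG when it is set (e.g. RUST_LOG=info,my_crate=trace),
        // otherwise fall back to a fixed "debug" filter.
        .with_env_filter(
            EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("debug")),
        )
        .init();

    tracing::debug!("Hello, world!");
}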
Reasons for using tracing over log
A legitimate question is “Why would I even want to use this when log seems to do the job just fine?” To answer it, we look at the advanced features tracing offers:
Spans
Spans enclose a time period during which data is logged, forming a context. This may sound a little highbrow, but is quickly illustrated with an example such as handling a job in a worker:
The job handling has a clear start and end: once the worker begins processing a job n, it enters a context of “processing job n”. After it finishes, regardless of the outcome, it exits that context.
We can put this into some simple, naive code, extending our initial boilerplate from earlier, which we will improve upon later:
// This example still has a lot of room for improvement!
use tracing::{info, info_span};

struct Job(i32);

impl Job {
    fn run(&self) {
        info!("running job: {}", self.0);
        let span = info_span!("job processing");
        let _entered = span.enter();
        info!("processing step");
        subtask();
        // _entered is dropped here, causing a span exit.
    }
}
Spans can even be nested: if a job has a subtask that is delegated to a different part of the application, such as running cleanup, we still want to track the fact that it was called as a result of processing a particular job.
fn subtask() {
    let span = info_span!("subtask processing");
    let _entered = span.enter();
    info!("processing subtask, outcome: {}", 42);
}
If we run this on 100 sequential jobs,
let jobs: Vec<_> = (0..100).map(|n| Job(n)).collect();
jobs.iter().for_each(Job::run);
this is the output:
...
Mar 14 17:00:33.427 INFO tracingexample: running job: 71
Mar 14 17:00:33.427 INFO job processing: tracingexample: processing step
Mar 14 17:17:38.835 INFO job processing:subtask processing: tracingexample: processing subtask, outcome: 42
Mar 14 17:00:33.427 INFO tracingexample: running job: 72
Mar 14 17:00:33.427 INFO job processing: tracingexample: processing step
Mar 14 17:17:38.835 INFO job processing:subtask processing: tracingexample: processing subtask, outcome: 42
Mar 14 17:00:33.427 INFO tracingexample: running job: 73
...
Note that the subtask
span is nested under job processing
; should we call subtask
from elsewhere, this would not be the case.
Attributes
The calls to info! and its siblings have in common that they all create an event, which is the tracing term for what we would otherwise call “log entries”. Both events and spans can be made to carry attributes, which saves us from having to stringify all our values:
// in subtask():
info!(outcome = 42, "processing subtask");
// Attributes come before the message in events, and after the name in spans.

// Job::run
fn run(&self) {
    let span = info_span!("job processing", id = self.0);
    let _entered = span.enter();
    info!("processing step");
    subtask();
}
We also no longer need to spell out which job we are processing; that information is carried by the span itself. Our output looks cleaner and more structured:
Mar 14 17:21:06.127 INFO job processing{id=71}: tracingexample: processing step
Mar 14 17:21:06.127 INFO job processing{id=71}:subtask processing: tracingexample: processing subtask outcome=42
Mar 14 17:21:06.127 INFO job processing{id=72}: tracingexample: processing step
Mar 14 17:21:06.127 INFO job processing{id=72}:subtask processing: tracingexample: processing subtask outcome=42
tracing supports other output formats like JSON, in which this pattern really shines.
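For illustration, switching the quick-start subscriber to JSON output is a small change on the builder (this assumes tracing-subscriber's json cargo feature is enabled):

tracing_subscriber::fmt()
    .json() // one JSON object per event, with span fields attached
    .with_max_level(tracing::Level::TRACE)
    .init();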
Parallel execution
At the same time, we also solved another issue we had not even encountered yet: parallel execution. Turning our linear job execution into a non-deterministically ordered parallel one using rayon³,
use rayon::prelude::*; // brings par_iter() into scope

jobs.par_iter().for_each(Job::run);
we can see that it is still possible to associate each log entry with a job, even though they are out of order:
Mar 14 17:26:25.136 INFO job processing{id=71}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=94}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=65}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=76}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=88}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=66}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=13}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=69}:subtask processing: tracingexample: processing subtask outcome=42
Mar 14 17:26:25.136 INFO job processing{id=28}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=64}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=38}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=63}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=32}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=51}:subtask processing: tracingexample: processing subtask outcome=42
Mar 14 17:26:25.136 INFO job processing{id=15}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=19}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=84}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=21}: tracingexample: processing step
Mar 14 17:26:25.136 INFO job processing{id=71}:subtask processing: tracingexample: processing subtask outcome=42
Without the spans, there would be no way to associate a log message with the job that produced it.
Instrumentation
tracing comes with a set of tools that make life even easier, allowing some of the boilerplate-ish code to be removed. A good example is #[instrument], which can be used to quickly create a span for a function:
use tracing::instrument;

impl Job {
    #[instrument(name = "job processing",
                 skip(self),
                 fields(id = self.0),
                 level = "debug")]
    fn run(&self) {
        info!("processing step");
        subtask();
    }
}

#[instrument(name = "subtask processing", level = "debug")]
fn subtask() {
    info!(outcome = 42, "processing subtask");
}
We had to put a little effort in to make the output exactly the same as before; if you are content with the defaults, the annotations become much shorter.
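As a sketch of the short form: by default the span is named after the function, the level is INFO, and all function arguments are recorded as fields (skip(self) keeps the Job itself out of the field list):

impl Job {
    #[instrument(skip(self))]
    fn run(&self) {
        info!("processing step");
        subtask();
    }
}

#[instrument]
fn subtask() {
    info!(outcome = 42, "processing subtask");
}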
async users beware
There is an important hazard lurking here: as we will cover later, tracing registers the current span in a thread local. Some async runtimes (e.g. tokio) move tasks between threads, which breaks the assumption that a span will be exited on the same thread it was entered on.
To alleviate this, Instrument::instrument() is available: my_future.instrument(info_span!("example")).await, which behind the scenes enters and exits the span every time the future is polled. The #[instrument] macro is clever enough to do this by itself when applied to an async function.
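A brief sketch, with a made-up async function for illustration:

// #[instrument] on an async fn wraps the returned future, so the span is
// entered and exited around every poll instead of being held across awaits.
#[instrument]
async fn fetch_job(id: u32) {
    info!("fetching job");
}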
This also means that holding an entered Span guard across an await point will cause incorrect traces!
let _enter = span.enter();
call_an_async_function().await; // Yielding here.
// At this point, our trace will be wrong.
To correct this, one would have to make sure the span is entered and exited strictly before (or after) the await, never across it. Since this is error prone, tracing offers Span::in_scope to avoid having to call enter manually.
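A sketch of the pattern (call_an_async_function stands in for any async call, as above):

async fn handle() {
    let span = info_span!("example");
    // The span is only entered for the duration of the closure,
    // so no guard is held across the await below.
    span.in_scope(|| {
        info!("synchronous work inside the span");
    });
    call_an_async_function().await;
}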
Distributed tracing & other goodies
tracing can be integrated with various systems that support distributed tracing. When the processing of a single request is spread over multiple servers, this allows collecting and recombining traces to get a better picture of what happened. The related crates are listed in the tracing docs, but are not discussed here.
Under the hood
Since tracing is open source, we can do a little digging to gain a deeper understanding of what, at the time of this writing, was version 0.1.25:
What happens when you call debug!
The debug! macro is a good representative of all its siblings (info!, …) and roughly works as follows:
- It first expands to the event! macro with Level::DEBUG.
- In event!, STATIC_MAX_LEVEL is checked to allow for compile-time filtering of events.
Then the potential setup of the callsite commences:
- A static⁴ instance of a MacroCallsite (these types live in a quasi-hidden⁵ module) is created.
- Each of these callsites encodes metadata like file, line number, level and fields. The first time the program comes across the call site, it is registered.
- Registration means that the callsite is added to a global registry (more on that later).
- A thing called “interest” is determined. This indicates whether or not the callsite needs to be checked at all in the future (it can be “sometimes”, “always” or “never”).
One must remember that statics are essentially evaluated only once, thus the registration process also runs only once, but it leaves a permanent callsite behind in memory that can now be used.
- With a callsite in place, an event is constructed, containing a parent, a reference to the callsite metadata and a value set with all fields, i.e. attributes.
- The resulting event is dispatched to a subscriber⁶ (see below) via the Subscriber::event method and handled.
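To make the first step concrete: the expansion is something you could also write by hand, as the two lines below are roughly equivalent:

use tracing::{debug, event, Level};

debug!(answer = 42, "computed");
event!(Level::DEBUG, answer = 42, "computed");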
What can I actually put into an attribute?
Following the macro, we can answer the question of what we can actually record on an event, e.g. when calling info!
:
info!(foo=bar, "some event")
Here foo
is a field name and bar
a value. These are captured in a ValueSet
, which has the following non-public implementation:
/// A set of fields and values for a span.
pub struct ValueSet<'a> {
    values: &'a [(&'a Field, Option<&'a (dyn Value + 'a)>)],
    fields: &'a FieldSet,
}
At first this seems fairly open: after all, we only need to implement Value for the custom types we would like to pass as attribute values. However, Value just dispatches to field::Visit, which has a pretty narrow set of value types it can handle: i64, u64, bool, str or error.
For convenience, though, tracing offers the sigils % and ? as shortcuts, which record a value through its Display or Debug implementation respectively.
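A brief sketch of both sigils on an event (the error value is just an example):

use std::io::{Error, ErrorKind};

let err = Error::new(ErrorKind::Other, "disk on fire");
// % records err via its Display implementation, ? via Debug.
info!(error = %err, details = ?err, "subtask failed");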
What is this registry and subscriber business?
Inside tracing
, there is a single place where a static Subscriber
registry is allocated. A subscriber is the entity that actually does something with an event, such as formatting it and printing it to stderr
.
A very important facet is that spans are created by the subscriber as well. The debug_span! macro works very similarly to its debug! kin, except that the subscriber has to generate a unique ID for the span. Whenever the span is entered or exited, only its ID is passed to Subscriber::enter/Subscriber::exit.
Performance benefits aside, this has two important effects:
- Typically, only one subscriber is active at a time, since a unique ID has to be generated per Span. This is not possible if there are competing subscribers.
- It is up to the subscriber to track which Span is currently active. Common implementations track this via a thread local, which causes the issues with async described above.
Why does tracing_subscriber exist?
It may seem strange that there is no usable subscriber bundled directly with tracing, but it makes more sense when one considers that the crate might be used in less common circumstances:
- A Rust program compiled to WebAssembly might solely use a browser’s console.log for output, not even allocating additional memory when doing so.
- On a bare-metal embedded system, a subscriber is not likely to have thread locals available.
- If task locals were supported on an async runtime, it is conceivable that a subscriber could be written that does not suffer from the async issues mentioned earlier.
Additionally, tracing_subscriber offers extra functionality in the form of Layers. These work around the issue that a subscriber must provide a unique ID for each created span: layers can be composed freely, ultimately deferring to a single core subscriber for ID generation.
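A minimal sketch of that composition, assuming tracing-subscriber 0.2 with its fmt and env-filter features: the Registry generates span IDs, while filtering and formatting live in layers stacked on top of it:

use tracing_subscriber::{fmt, layer::SubscriberExt, util::SubscriberInitExt, EnvFilter, Registry};

Registry::default()
    .with(EnvFilter::from_default_env()) // filtering as a layer
    .with(fmt::layer())                  // formatting as another layer
    .init(); // install as the global default subscriber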
The more common use case, though, is likely to leverage tracing_subscriber::fmt and the resulting builder to create a subscriber.
Conclusion
The tracing crate is a powerful replacement for any existing logging setup and ties in well with modern observability patterns, at the cost of a somewhat higher initial learning curve. Any Rust developer with a grasp of its core concepts and some implementation details will find it a valuable tool in their toolbox. Hopefully, this brief tour provides a good start for all who have not braved the official crate documentation yet.
1. If you want to dive even deeper, many concepts stem from the Google “Dapper” paper.
2. I counted: start at the top of the docs’ index of tracing and follow the first mention of tracing_subscriber, generously assuming you understand that it is the correct crate. From there on, read tracing_subscriber’s docs until EnvFilter appears and then hopefully notice how to create a basic instance.
3. The fact that rayon is magic and allowed us to parallelize our application by simply substituting .iter with .par_iter never fails to put a smile on my face :).
4. As a reminder, static values in Rust are very similar to consts, except that they will always have a memory address. This will be important later.
5. It is hidden from the docs and has an off-putting name in tracing::__macro_support, but due to how macros work, it cannot actually be non-pub.
6. Technically it goes through a dispatcher first, but this seems to be a minor implementation detail.