Safely opening (and closing) an SQLite database

The fundamental step of wrapping a C library for use with Rust is creating the *-sys crate and ensuring applications can link against the compiled C sources. But what to do after all the library’s functions and types have been exposed through unsafe FFI wrappers? We will have a look at this through the example of wrapping the SQLite3 library, for which the sys-crate already exists.

Our goal will be to create the minimal wrapping crate that, with as little overhead as possible, makes working with SQLite feel “rusty”, i.e. ergonomic and familiar for using other canonical source code. We will first start with a simple working version, then gradually refine our abstractions.

Opening and closing databases

SQLite is centered around an opened database “object”, which in typical fasion use a function to allocate/create/open the resource (sqlite3_open), and another one to close and free it (sqlite3_close). In Rust, we want to use RAII to ensure that these things happen automatically.

Looking at the sqlite3_open first, we see the following signature:

int sqlite3_open(
  const char *filename,   /* Database filename (UTF-8) */
  sqlite3 **ppDb          /* OUT: SQLite db handle */
);

which has already been translated for us in the libsqlite3-sys crate as

pub unsafe extern "C" fn sqlite3_open(
    filename: *const c_char,
    ppDb: *mut *mut sqlite3
) -> c_int

Whenever dealing with external C functions, it is extremely important to pay close attention to the documentation¹, which first tells us that filename is a UTF-8 C-style² string and that ppDb is an output parameter, a pointer to a pointer, used to “return” a pointer this way. The use of output parameters for actual produced values is typical³ for C-style APIs, leaving the return value for an integer-based error code.

We also consider memory ownership: The SQLite library allocates the memory for ppDb itself, writing out a pointer, which it expects to be passed back for freeing once we are done using it. In contrast, there is no mention of taking ownership of the passed in string filename, we are responsible for allocating and deallocating it and assume that SQLIte does not reference it after the call to sqlite3_open returns.

The “object” we are handling is of type (libsqlite3-sys::)sqlite3, although the type is irrelevant for us, as we are only ever holding on to a pointer to it. The documentation specifies that ppDb may hold an allocated database even if opening failed with an error, so we will need to check it for a NULL value.

Error handling

Before we can write the function, we need to consider how SQLite reports results. Fallible operations return a result code as a c_int:

There are only a few non-error result codes: SQLITE_OK, SQLITE_ROW, and SQLITE_DONE. The term “error code” means any result code other than these three.

There is no additional data attached to an error code, so we can keep it compact and derive string representations using const char *sqlite3_errstr(int);, which given an error code, returns a pointer to a string managed entirely by SQLite⁴.

/// An SQLite error.
#[derive(Debug)]
pub struct Error {
    /// Error code returned by an SQLite function.
    code: c_int,
}

impl Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // Note: The returned string pointer is entirely managed by SQLite
        //       and guaranteed a 'static lifetime.
        let c_str = unsafe { CStr::from_ptr(libsqlite3_sys::sqlite3_errstr(self.code)) };
        let err_str = c_str
            .to_str()
            .expect("SQLite error string contained invalid UTF-8 characters");

        f.write_str(err_str)
    }
}

To ensure the unsafe block is safe, we need to know how long the returned string pointers live, a quick inspection of the source (sqlite3ErrStr()) shows that they are static strings that we need (and must) not free. We assume the .expect() is never triggered, in case our assumption ever turns out wrong, we will find out quickly. The function should only be called with actual error codes (as opposed to arbitrary c_int values), however we can see that the lookup function guards against invalid values by doing array bounds checking.

The missing piece is a function to determine if a function call failed:

/// A positive outcome of an SQLite operation.
#[derive(Debug)]
enum SqliteSuccess {
    /// No error occurred.
    Ok,
    /// Another row of output is available.
    Row,
    /// Operation completed.
    Done,
}

/// Given an SQLite result code, convert to result.
fn parse_result_code(code: c_int) -> Result<SqliteSuccess, Error> {
    match code {
        libsqlite3_sys::SQLITE_OK => Ok(SqliteSuccess::Ok),
        libsqlite3_sys::SQLITE_ROW => Ok(SqliteSuccess::Row),
        libsqlite3_sys::SQLITE_DONE => Ok(SqliteSuccess::Done),
        _ => Err(Error { code }),
    }
}

Implementing `SqliteDatabase`

We can finally write the opening part to our implementation:

/// An opened SQLite database.
#[derive(Debug)]
struct SqliteDatabase {
    /// Pointer to the SQLite allocated database.
    db: *mut libsqlite3_sys::sqlite3,
}

SqliteDatabase itself is a thin wrapper⁵ around the sqlite3 pointer, for which we implement neither Copy nor Clone. It also is not Send or Sync, all of which we will have to consider later down the line.

Creating such a type may seem a small improvement at first, but it is important to define the invariants we now want to uphold. At the top is memory safety, nothing we do with SqliteDatabase through safe Rust should be able to have any adverse on the memory safety of our program. We also enforce that the type always holds an openened database, and tie the lifetime of the database in memory to the lifetime of our SqliteDatabase object.

We start by wrapping sqlite3_open():

impl SqliteDatabase {
    /// Opens up a new database instance.
    ///
    /// # Panics
    ///
    /// Will panic if a `filename` containing a NUL byte is given, or if freeing a database
    /// after opening failure does not succeed.
    pub(crate) fn open(filename: &str) -> Result<Self, Error> {
        let filename_c = CString::new(filename).expect("cannot handle filename with NUL byte");

        let mut out_ptr = ptr::null_mut();
        let outcome = parse_result_code(unsafe {
            libsqlite3_sys::sqlite3_open(filename_c.as_ptr(), &mut out_ptr)
        });

        match outcome {
            Ok(_success) => Ok(SqliteDatabase { db: out_ptr }),
            Err(err) => {
                if !out_ptr.is_null() {
                    // Even though opening failed, a database object has still been allocated.
                    if unsafe { libsqlite3_sys::sqlite3_close(out_ptr) }
                        != libsqlite3_sys::SQLITE_OK
                    {
                        panic!("failed to close sqlite3 db after error during open");
                    }
                }

                Err(err)
            }
        }
    }
}

Our first step is to make a passed in &str for the filename suitable for SQLite. The documentation explicitly mentions that const char* filename needs to be UTF-8 encoded, so we are spared from having to work with Rust’s OsString string types. We still need to convert it into a C-style zero-terminated string, which involves some reallocation⁶ and can possibly fail, thus we note the resulting panic.

It is notable that we made a decision to restrict the input of filename: &str here in a way that is not expressed in the type system, namely we will not accept a NUL byte in filename. Panicking in this case is a valid design choice even in systems that need to be extremely reliable, as it is similar to failure of a bounds-checked array access. The caller can always ensure that no panic happens by ensuring that there are no NUL bytes present, provided they read the documentation.

After allocating a new NULL pointer on the stack we make our call to sqlite3_open() and review the safety of the unsafe block: We ensured that all input parameters to the call are valid, and no other documented invariants that are not expressible in the C function signature are potentially violated. The return value is guaranteed to be a valid SQLite result code, so we can run it through the helper function defined earlier and end up with a nicer Rust type.

While we would like to use a ?-operator next and exit early, we have to take the fact that out_ptr may contain a valid database pointer even in the case of failure into account, so we conditionally clean it up in the case of an error. The second potential panic here differs from the first one in that it is not avoidable by the caller and would not be permissable in a library used in cases where high reliability is absolutely required (i.e. the program must never crash). To avoid this, we would have to extend our error type or swallow the error silently, which we will avoid for simplicity reasons in this example.

All that’s left is returning our newly constructed object, after which we can set our sights on the accompanying Drop implementation:

impl Drop for SqliteDatabase {
    #[inline]
    fn drop(&mut self) {
        let db = self.db;
        self.db = ptr::null_mut();

        // Safety: It is permitted and harmless to call `sqlite3_close` with a NULL pointer.
        if unsafe { libsqlite3_sys::sqlite3_close(db) } != libsqlite3_sys::SQLITE_OK {
            panic!(
                "failed to close sqlite3 db due to unclosed resources, this should never happen"
            );
        }
    }
}

The first step is to remove the sqlite3 pointer from self, setting self.db to zero, before passing it to sqlite3_close. This a bit of defensive programming⁷, as it ensures that we only ever call sqlite3_close() with the actual pointer once, avoiding a potential double free. Technically there is no need to do this, since the Rust compiler should guard against multiple calls of the destructor (calling it explicitly will result in a compiler error) but some of this code may end up being moved around out the destructor due to refactoring later on. We also would need to find a specification stating it is impossible to deliberately or accidentally end up calling Drop::drop more than once, which is not easy to come by at the moment⁸.

Ensuring that only valid inputs are provided to sqlite3_close without relying on this exactly-once guarantee would also require the following section at the start of the function:

        if self.db.is_null() {
            return;
        }

This is to ensure we never call sqlite3_close with a NULL pointer, which under ordinary circumstances might result in undefined behavior.

All these conditions rely on the following invariants:

An existing SqliteDatabase will always contain a valid sqlite3 pointer before Drop::drop is called. This means a non-NULL value, we are not handling the case of a random value popping up there.
Drop::drop is only called once.

The drawback is that these conditions are not local. By putting the NULL check at the top and setting the variable to NULL afterwards, at the small expense of a single check and assigment outside the hot path, we obviate the need for these invariants elsewhere, the code will be safe without depending on outside conditions. A reviewer of that code section can conclude that the following unsafe call is actually safe just from looking at the function alone.

Due to SQLite stating in its documentation⁹ that passing NULL to sqlite3_close is permissible and harmless, this self.db.is_null() check is omitted in the code, and replaced by the comment, which fulfills the same purpose as the check¹⁰.

Before returning we have to consider the outcome of the operation, which may return a value that is not SQLITE_OK. In general it is not a good idea to panic in destructors, as it will prevent other destructors from running. We can have a look at what causes a non-OK value to be produced:

Ideally, applications should finalize all prepared statements, close all BLOB handles, and finish all sqlite3_backup objects associated with the sqlite3 object prior to attempting to close the object. If the database connection is associated with unfinalized prepared statements, BLOB handlers, and/or unfinished sqlite3_backup objects then sqlite3_close() will leave the database connection open and return SQLITE_BUSY.

As long as we do not leave any resources lying around before closing, we will never encounter a failed close attempt. We have picked up another invariant at this point, which we must be mindful of enforcing through the type system and/or borrow checker later on. The “this should never happen” part of the .expect() call is thus as essential as the earlier comment; if we ever encounter this message, we have found a bug in our code ¹¹.

A simple smoke test

At this point, we can run the simplest of smoke tests:

#[test]
fn open_and_close() {
    let db = SqliteDatabase::open(":memory:").expect("could not open db");
    drop(db);
}

The explicit drop(db) exists to make it clear what the point of the test is, if it was removed, we should put an equivalent comment there.

The only thing to point out here is that it is almost always wrong to “try out” code by hand instead of writing a test instead, as the latter stay around and can be repeatedly run at almost no cost.

Conclusion

Just opening and closing an SQLite database comes with a surprising amount of things to keep track of that are not expressed in the C API itself. With RAII and Rust’s type system, we can carefully move a large portion of these over the compiler, making it a lot easier to correctly use the library. While merely generating the FFI code for the C library is often straightforward, creating an idiomatic wrapper usually involves a deep dive not just into the documentation of the library being wrapped, but often its actual source code, but the reward is a much easier to use product for the actual user of the crate.

With this in place, we can do the preparatory work to ultimately run our first query.

If good documentation is available, sometimes the only recourse is to read the source. Thankfully, SQLite has excellent, precise and concise documentation. ↩︎
C-style string refers to a zero-terminated string here. ↩︎
e.g. most POSIX APIs work the same way. ↩︎
We would get a better error message by using const char *sqlite3_errmsg(sqlite3*);, which is able to build a string with more context, as it has access to the sqlite3 object involved, but this requires careful management of how the function is called. At this point, we leave for later. ↩︎
We could make this a newtype, but naming the parameter db makes the code a bit more readable when we are passing the pointer to various methods later on. ↩︎
There are more efficient methods available for converting a string to a C-style string that in many cases may avoid a new string allocation at the cost of some interface ergonomics, but since open is probably called quite infrequently compared to other methods, there is little gain in optimizing this little corner. ↩︎
I tried to find a good definition for defensive programming, but they are a little hard to come by on the web. To me, it includes the combination of these two:

Defensive programming is an approach in which the programmer assumes that there may be undetected faults or inconsistencies in code.

– Jean-Louis Boulanger, Certifiable Software Applications 1

Defensive programming is programming that assumes the caller will do something wrong.

– Shane Cook, CUDA Programming

↩︎
C++ will happily let you call destructors multiple times, for example. ↩︎
The actual SQLite 3 code performs this check in the first line of the function’s source and states that it is safe and harmless to pass NULL as a parameter. An interesting tidbit is that guarantees/requirements found in the documentation are linked to specific sections of the code through a requirements matrix, this case listed as R-63257-11740:
```
static int sqlite3Close(sqlite3 *db, int forceZombie){
if( !db ){
    /* EVIDENCE-OF: R-63257-11740 Calling sqlite3_close() or
    ** sqlite3_close_v2() with a NULL pointer argument is a harmless no-op. */
    return SQLITE_OK;
}
```
↩︎
This is a good example of an essential comment, it is not merely descriptive but “moves” the guarantee for correctness from external source to the local scope. ↩︎
An alternative would be to use the sqlite3_close_v2 function, which will attempt to keep the database around in a “zombie” state until all the mentioned resources are freed and always return SQLITE_OK, but this would only hide bugs in our implementation. ↩︎