On Rust

This is one of a series of posts on programming languages; you can read more about that here.

Back in 2013, I started a series of posts on programming languages I found interesting. One of the languages I wanted to write about at that time was Rust. As often happens, life got in the way, and it’s only now, in the twilight of 2018, that I’m coming round to a long overdue post.

A disclaimer - I have no production experience with Rust. That said, the language seems very much designed for production use and makes some impressive engineering decisions in that regard. So as for the question, “is Rust production-ready?”, the answer to my mind is yes and any challenges the language has lie elsewhere.

What’s to like?

Syntax, Types & Functions

Syntactically Rust is an Algol, making it familiar to many programmers. Like Go and Java, it provides very limited flexibility on syntax, which simplifies maintenance and comprehension of existing code. As with Scala and again Go, declaration order of types and names is reversed relative to C++ and Java, which improves comprehension ('a of Int' is easier than 'an Int named a'). Rust has expressions and statements - expressions evaluate to a value, statements (which can contain expressions) are terminated using semicolons and typically used to assign to variables.

Rust compiles to an executable that by default statically links its Rust library dependencies (similar to Go and Haskell). The compiler uses LLVM for optimisations and code generation. Notably, Rust does not have a garbage collected runtime. Its memory management is deterministic, akin to C and C++, but it works differently to either, demanding strong controls over memory allocation and references (we’ll cover this in more detail later on). The lack of a heavy runtime means startup times are good, roughly in the same ballpark as C++, Haskell or Go. Rust has hygienic macros, both function-like ones and ones used as annotations (the `println!()` call used throughout this post is an example of a function-like macro).

Rust has inbuilt primitive types, arrays and tuples, as distinct from what I'll loosely call custom objects. Variables are created using the `let` keyword, and are immutable by default. You can declare mutable variables using the syntax `let mut x = ...`. Structs are used to define custom types via a C/Go-like syntax. Rust supports slices on strings and arrays, using a syntax similar to Python’s.
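Here's a small sketch of those basics. The `prefix` and `sum` helpers are my own illustrative names rather than anything from the standard library, and `prefix` assumes ASCII input so the slice lands on a character boundary.

```rust
// Returns the first `n` bytes of `s` as a string slice.
// Assumes `n` is within bounds and on a character boundary (true for ASCII).
fn prefix(s: &str, n: usize) -> &str {
    &s[..n]
}

// Sums a slice of integers; a slice is a borrowed view into an array or Vec.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let x = 1; // immutable by default
    let mut y = 2; // mutation has to be opted into
    y += x;

    let pair = (y, "three"); // a tuple
    let numbers = [1, 2, 3, 4, 5]; // a fixed-size array

    println!("{} {}", pair.0, pair.1);
    println!("{}", prefix("hello world", 5));
    println!("{}", sum(&numbers[1..4])); // slice covering indices 1, 2 and 3
}
```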

The type system is interesting and powerful for a language that considers itself a low level systems one. Rust eschews traditional object orientation, notably in its emphasis on traits (which are kind of like type classes), structs (which are kind of like structs) and enums (which are kind of not like enums, unless you’re coming from Swift). There's no inheritance as you'd find in languages like Java, C++, Python and Ruby, instead types tend to be mixed in together and composed.

Methods can be bound to structs after the fact using the `impl` keyword. Functions defined this way typically take a `&self` reference as the first parameter, akin to Python’s object methods. The equivalent of static methods, called associated functions, can be declared by not supplying `&self`, and are invoked using `::` notation rather than `.`.

    struct Foo {
        x: String,
    }

    impl Foo {
        fn method(&self) {
            println!("woo {}", self.x);
        }

        fn associated_method() {
            println!("hoo");
        }
    }

    fn main() {
        let f = Foo { x: String::from("x") };
        f.method();
        Foo::associated_method()
    }

Enums are a bit more general than structs and can be considered Rust's version of algebraic data types (or sum types). Each variant declared inside the enum can have a different structure, but all of them are considered to belong to the enum’s type. Here’s an example from the Rust book where one variant is a quad of `u8`s and the other is a string:

    enum IpAddr {
        V4(u8, u8, u8, u8),
        V6(String),
    }

Functions are as you’d expect from a modern language. Functions can take functions as arguments and return functions as results. Anonymous functions (closures) are also supported. Collections like `Vec` hand out iterators (for example via `iter()`), and the `Iterator` trait declares combinator methods like map/filter that accept functions as arguments.
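To make that concrete, here's a minimal sketch of functions as arguments, functions as results, and iterator combinators. The names `apply_twice`, `make_adder` and `even_squares` are made up for illustration.

```rust
// Takes a function as an argument and applies it twice.
fn apply_twice<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(f(x))
}

// Returns a closure that captures `n` by value.
fn make_adder(n: i32) -> impl Fn(i32) -> i32 {
    move |x| x + n
}

// Keeps the even numbers and squares them using iterator combinators.
fn even_squares(values: &[i32]) -> Vec<i32> {
    values
        .iter()
        .filter(|v| **v % 2 == 0)
        .map(|v| v * v)
        .collect()
}

fn main() {
    let add_three = make_adder(3);
    println!("{}", apply_twice(add_three, 1));
    println!("{:?}", even_squares(&[1, 2, 3, 4]));
}
```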

I’d be remiss to not point out the excellent Cargo tool for managing packages and organising codebases, and that Rust has unit testing support built in, which can be run easily using Cargo. I won’t debate the merits of inbuilt standard approaches versus allowing evolution via an ecosystem in this post, but suffice to say these particular inbuilts look to be in good shape.

Pattern Matching, Combinators & Error Handling

Structured types afford pattern matching via the `match` keyword. The Option type for example is an enum that can be matched over:

    match my_function(something) {
        Some(thing) => thing,
        None => "Not Found",
    }

Pattern matching is exhaustive, and the compiler will report an error where not all cases are covered by an arm (Rust lingo for a match condition). You can use the `_` wildcard pattern to indicate a default match. Matches can destructure enums and structs (destructuring also works for `let` assignments). Conditional guards can be added to arms using `if`, and arms can match inclusive ranges of values using `..=` (older code spells this `...`). Pattern matching can also be combined with if (`if let`) and while (`while let`) for convenience, and because `match` is an expression its result can simply be assigned to a variable via `let`. Alternatively, you can chain things together using combinators like `map()` or `and_then()` (Rust lingo for flatmap). More generally a range of combinators can be used to elegantly process collections, or anything that implements the Iterator trait; the usual suspects like fold, zip, filter are all available. This is a pretty impressive set of capabilities, again bearing in mind we’re talking about a systems language that’s a viable alternative to C or C++.
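A short sketch pulling these pieces together. The `Shape` enum and the helper functions are hypothetical, and the range arm uses `..=`, the current spelling of inclusive range patterns.

```rust
enum Shape {
    Circle { radius: f64 },
    Rect { width: f64, height: f64 },
}

// Literal, guarded, range and wildcard arms; the match must cover every case.
fn describe(n: i32) -> &'static str {
    match n {
        0 => "zero",
        x if x < 0 => "negative",
        1..=9 => "single digit",
        _ => "large",
    }
}

// Destructures the fields of each enum variant.
fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect { width, height } => width * height,
    }
}

// Chains Option combinators: parse, then keep only even numbers, halved.
fn half_of_even(input: &str) -> Option<i32> {
    input
        .parse::<i32>()
        .ok()
        .and_then(|n| if n % 2 == 0 { Some(n / 2) } else { None })
}

fn main() {
    let (w, h) = (2.0, 3.0); // destructuring in a let
    println!("{}", describe(7));
    println!("{}", area(&Shape::Rect { width: w, height: h }));
    println!("{}", area(&Shape::Circle { radius: 1.0 }));

    // `if let` handles the one case we care about.
    if let Some(n) = half_of_even("42") {
        println!("{}", n);
    }
}
```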

Traits can be used to declare method signatures for a type, with the receiving type having the trait methods implemented for it.

    trait Foo {
        fn woo(&self, s: String);
    }

    struct Bar {
        x: String,
    }

    impl Foo for Bar {
        fn woo(&self, s: String) {
            println!("woo {} and {}", self.x, s);
        }
    }

    fn main() {
        let y = String::from("y");
        let b = Bar { x: String::from("x") };
        b.woo(y);
    }

Traits can be added to existing types allowing for open extensions and abstractions created after the fact.
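As a sketch of what that looks like, here a made-up `Shout` trait is implemented for two types that come from the language itself, `str` and `i32`.

```rust
// A hypothetical trait, declared long after `str` and `i32` were defined.
trait Shout {
    fn shout(&self) -> String;
}

impl Shout for str {
    fn shout(&self) -> String {
        format!("{}!", self.to_uppercase())
    }
}

impl Shout for i32 {
    fn shout(&self) -> String {
        format!("{}!", self)
    }
}

fn main() {
    let n: i32 = 42;
    println!("{}", "woo".shout());
    println!("{}", n.shout());
}
```

The restriction to be aware of is that either the trait or the type has to be defined in your own crate, which is why the trait here is a local one.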

Rust has generics for types and for functions, albeit different in style to C++ and Java. At compile time Rust will type check and then generate a specialised copy of a generic function for each concrete type it is used with (known as monomorphisation); so while syntactically generics look (quite) like Java's, they aren't erased. Generics can have bounds.
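A minimal sketch of a bounded generic function and a generic struct. `largest` and `Pair` are illustrative names; the bounds say what the code is allowed to do with a `T`.

```rust
use std::fmt::Display;

// T must support comparison (PartialOrd) and cheap copying (Copy).
fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut iter = items.iter();
    let mut best = *iter.next()?; // an empty slice yields None
    for &item in iter {
        if item > best {
            best = item;
        }
    }
    Some(best)
}

// A generic struct; `render` is only available when T can be displayed.
struct Pair<T> {
    left: T,
    right: T,
}

impl<T: Display> Pair<T> {
    fn render(&self) -> String {
        format!("({}, {})", self.left, self.right)
    }
}

fn main() {
    // The compiler generates one copy of `largest` for i32 and one for f64.
    println!("{:?}", largest(&[1, 5, 3]));
    println!("{:?}", largest(&[1.5, 0.5]));
    println!("{}", Pair { left: "a", right: "b" }.render());
}
```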

Rust has learned (I would argue) from past languages by not providing exceptions - you can either panic (like Go) and exit, or signal an error that can be handled using a `Result<T, E>`, which captures either the successful computation (`T`) or an error (`E`); conceptually this is similar to eithers in Haskell and Scala. The Result type is an enum, and like Option affords pattern matching. This avoids more traditional but more demanding approaches like checking return codes (eg in C) or result/error tuples (eg in Go). You have to unpack the Result in Rust, but you have elegant means of doing so.
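Here's a rough sketch of what unpacking can look like. `ConfigError` and `parse_port` are invented for the example; the `?` operator returns early with the error, and `map_err` converts the standard library's parse error into our own type.

```rust
#[derive(Debug, PartialEq)]
enum ConfigError {
    Missing,
    Invalid(String),
}

// Parses an optional string into a port number, or says why it couldn't.
fn parse_port(input: Option<&str>) -> Result<u16, ConfigError> {
    // `?` unwraps the Ok value or returns the Err to the caller.
    let raw = input.ok_or(ConfigError::Missing)?;
    raw.trim()
        .parse::<u16>()
        .map_err(|_| ConfigError::Invalid(raw.to_string()))
}

fn main() {
    // Pattern matching forces both outcomes to be considered.
    match parse_port(Some("8080")) {
        Ok(port) => println!("port {}", port),
        Err(e) => println!("error: {:?}", e),
    }

    // Or fall back to a default with a combinator.
    let port = parse_port(None).unwrap_or(80);
    println!("port {}", port);
}
```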

Finally it’s worth mentioning compile time errors and the quality of the resulting error messages, which is simply extraordinary. Here’s an example - this code:

    fn main() {
        let first = String::from("first");
        let next = first;
        println!("{}", first);
    }

produces this compilation error:

[Image: on-rust-error-example.png, a screenshot of the compiler's error output]

Rust sets the standard for error messages in a compiled language. I think if I were a contributor or community member, I would feel incredibly proud of this work.

Memory Management, References & Ownership

The compiler doesn't allow null references and so no billion dollar mistakes are to be had. This is an enormous step forward for production work, and puts Rust into a small elite of languages that don’t have nulls, such as Haskell and OCaml. As well as this, Rust doesn't allow data races between threads mutating a value. To be clear, race conditions in general can happen in Rust, but this is a specific, important case that the language designs out.
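As a rough illustration of both points: absence is expressed with `Option` rather than a null, and state shared between threads has to be wrapped in something like `Arc<Mutex<T>>` before the compiler will accept it. The function names are mine.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// No null: the return type says a value may be absent.
fn find_even(values: &[i32]) -> Option<i32> {
    values.iter().copied().find(|v| v % 2 == 0)
}

// Increments a shared counter from several threads.
// Handing a plain `&mut usize` to each thread would not compile.
fn parallel_count(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{:?}", find_even(&[1, 3, 4]));
    println!("{}", parallel_count(4, 1000));
}
```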

Rust places significant restrictions on memory access that the programmer has to deal with in order to provide these safety properties. This is probably the most notable feature of the language relative to others, whether they’re directly memory managed like C and C++, or garbage collected like Ruby, JavaScript or the JVM and .NET families. It’s an innovative approach and worth looking at in a bit of detail.

For example, this won't compile:

    fn main() {
        let first = String::from("first"); // put a string on the heap
        let next = first; // move the string from first to next
        let next_again = first;  // can't move a second time
    }

because two variables can't both point to, or "own", the same block of heap memory. (Types that implement `Copy`, such as scalars and tuples of scalars, which have a fixed size known at compile time, are copied rather than moved.) This is called moving in Rust and is done to provide memory safety properties. The idea is that memory allocated on the heap has one "owning" reference. This also applies to function calls - passing a value to a function call reassigns ownership, as does returning a value. That means this won't compile:

    fn foo(s: String) {
        println!("foo: {}", s);
    }

    fn bar(s: String) {
        println!("bar: {}", s);
    }

    fn main() {
        let first = String::from("first");
        foo(first); // move the string to foo
        bar(first); // can't move it again to bar
    }

This alternative, which returns the string from foo, will compile as it allows ownership to pass back out of foo and into bar:

    fn foo(s: String) -> String {
        println!("foo: {}", s);
        s
    }

    fn bar(s: String) {
        println!("bar: {}", s)
    }

    fn main() {
        let first = String::from("first");
        bar(foo(first)); // ownership moves into foo, then out via its return value into bar
    }

To handle common cases, Rust provides references. These are signified with an ampersand (the symmetric dereference is signified with an asterisk). References allow a variable to refer to a value without taking ownership of the value’s memory. So this example will compile:

    fn foo(s: &String) {
        println!("foo: {}", s);
    }

    fn bar(s: &String) {
        println!("bar: {}", s)
    }

    fn main() {
        let first = String::from("first");
        foo(&first); // a reference to be borrowed rather than owned
        bar(&first); // and again here
    }

Functions declared with a parameter type prefixed with an ampersand (e.g., `s: &Something`) say they deal with references. When receiving the argument, the function is said to borrow the value rather than take ownership of it.

Rust has some further constructs for references, known as smart pointers, such as `Box<T>`, which allows you to indicate the value `T` should be allocated on the heap (analogous is `Vec<T>`, which is a growable array placed on the heap). `Rc<T>` and `Arc<T>` allow multiple references to share ownership of a value (rc stands for reference counting; `Arc` is the atomic, thread-safe variant). You can annotate a reference with a named lifetime marker, `&'`, allowing you to declare how long the reference is valid rather than have it inferred (e.g., `x: &'a i32` says the reference lives for a lifetime called `'a`).
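A small sketch of each of these in use. The `List` type, `sum` and `longest` are illustrative names in the style of the Rust book rather than standard library items.

```rust
use std::rc::Rc;

// A recursive type needs Box so that each node has a known size.
enum List {
    Cons(i32, Box<List>),
    Nil,
}

fn sum(list: &List) -> i32 {
    match list {
        List::Cons(value, rest) => value + sum(rest),
        List::Nil => 0,
    }
}

// The lifetime 'a ties the returned reference to the inputs it borrows from.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

fn main() {
    let list = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))));
    println!("{}", sum(&list));

    // Rc lets two handles share ownership of one heap value.
    let shared = Rc::new(String::from("shared"));
    let other = Rc::clone(&shared);
    println!("{} owners of {}", Rc::strong_count(&shared), other);

    println!("{}", longest("borrow", "own"));
}
```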

Externalities

Onto some things that bother me about Rust.

The borrow checker and memory management will eat time. The cost you will pay in Rust for its safety properties is typically in what might be called “fighting with the borrow checker”. You will be patiently and cautiously investing time in the allocation of memory and the handling of references. Depending on what you're trying to do, this is a productivity loss, arguably even over-engineering. My sense is that manual care of allocations, and the subsequent dealing with the borrow checker, will be the main reason people bounce off the language and go back to the conveniences of a garbage collected one.

You can write unsafe Rust and work around the compiler. You can leak memory by using reference counting smart pointers. Now, it’s an eminently practical decision to have escape valves in a language when you need them. That said, you will almost certainly depend on unsafe Rust via libraries, so it’s worth being aware this can be the case.
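To illustrate both escape valves, here's a minimal sketch with made-up names. `make_cycle` builds two reference counted nodes that point at each other, so neither count ever reaches zero, and `read_through_raw_pointer` dereferences a raw pointer inside an `unsafe` block.

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

// Builds a two-node cycle and reports the strong count of the first node.
// When the function returns, each node is still kept alive by the other,
// so the memory is never freed, all in safe code.
fn make_cycle() -> usize {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    Rc::strong_count(&a) // the local `a` plus the handle held inside `b`
}

// The compiler can't check a raw pointer dereference; the programmer
// vouches for it by writing `unsafe`.
fn read_through_raw_pointer(value: &i32) -> i32 {
    let ptr = value as *const i32;
    unsafe { *ptr }
}

fn main() {
    println!("strong count inside the cycle: {}", make_cycle());
    println!("{}", read_through_raw_pointer(&42));
}
```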

You won't have anything like the compositional properties you'd find with Haskell’s do-notation or in Scala's for comprehensions and its cats/shapeless functional ecosystems and affordances. How important this is to you is really up to you, but combine it with the focus on memory management and I don’t think there’s any question you may feel a productivity hit coming from a typed functional language.

Compilation isn’t always quick; you pay a cost for type safety and LLVM’s benefits. This seems to be a known issue and a focus of work. Rust does have incremental compilation, and as a workaround you can use `cargo check` to avoid a full pass and just typecheck in a typical code/compile/test dev cycle.

Rust allows variable shadowing. This is fine:

    fn main() {
        let x = 1;
        let x = "woo";
        println!("{}", x);
    }

There are arguments to be made for this, but I can't say it's a feature I like and it sticks out like a sore thumb given the language's overall focus.

Network programming is a work in progress, as is the shape of asynchronous constructs like Futures. A working group has been formed for networking, and an RFC exists for Futures, but it will be a while before they shake out, and these seem like prerequisites for libraries around databases and web development to flourish - the rise of Go and JavaScript (via Node) indicates the importance of batteries included for networking. More generally, Rust is early into its adoption phase and the library ecosystem is a little thin compared to more established platforms.

Conclusions

Rust, I think, results in something of a cognitive shift around programming. Instead of (or, as well as) thinking about threads of execution, sequences of statements, or compositions of functions, you end up thinking about references to blocks of memory being passed from owner to owner. And so while you're still manually managing memory, Rust's design is coherent and can be reasoned about. All in all, it’s very impressive and, to my mind, moves the state of the art forward.

The safety properties for nulls, memory and data races alone make Rust worth looking at. I think if you were writing something like a database, a cache, high performance middleware, or data/control plane software like sidecars and meshes seen in modern clusters, it's reasonable as of late 2018 to look at Rust as well as the usual suspects of C++, Java and Go. There’s arguably no equivalent that offers its combination of performance, safety, and expressiveness.

What Rust provides through its type system is awesome given the systems orientation of the language. No, you don't get the more powerful constructs like higher kinded types, but you do get a lot that’s traditionally only been available to higher level languages typically built on garbage collected runtimes, such as Haskell, Scala, TypeScript, OCaml [1]. Being able to work at a systems level but having a basis to reason about code at a high level is an outstanding engineering achievement, and I suspect a valuable one for broad classes of work. This post spent a lot of time on Rust’s memory management as that distinguishes the language from others, but the type system is likely the more important capability.

The harder question is Rust's general applicability. For example, with rank and file web/microservice development, or line of business apps, there's no question you will sink more upfront time into memory allocation and working with the borrow checker compared to a garbage collected language where those concerns are largely dealt with for you. It’s hard to argue you can throw a microservice or web app together faster on Rust than Node. On the other hand, Rust that simply compiles has a good probability of running effectively, and without a class of issues that plague production use, like segfaults, leaks and nulls. I could see Rust making a dent in emerging spaces like machine learning, IoT/edge and crypto currency, but that would require investment. That speaks to broader points about language adoption, namely that many successful languages have either a large powerful corporate backer, a killer app, or both, to propel them. I would argue Rust has enough to be one of those languages that require neither, and am excited to see how the language progresses over the next decade.


Notes

[1] This list originally included Swift as a gc’d language, but it’s more precisely described as one using reference counting (specifically ARC, or Automatic Reference Counting). Thanks to Tim Vermeulen for the correction.