More advanced aspects of pattern matching in Rust

`ref` keyword, match ergonomics, box patterns, references patterns and other interesting stuff

Introduction

In Haskell pattern matching is mostly nice and easy. It might be made more complicated by strictness annotations (i.e. whether to evaluate sub-patterns lazily or strictly) or irrefutability annotations, but it doesn't have any special interactions with ownership, borrowing and mutability which are bread and butter of Rust. So, naturally, pattern matching in Rust has additional complexity which I wanted to investigate here. Interestingly enough, some of this material is not covered by current edition of the Rust Book.

The plan of attack is as follows:

  • Reference patterns: things like let &x = &1

  • ref keyword: what's the difference between using Some(ref x) => and Some(&x) => in patterns? Maybe no difference?

  • So called "match ergonomics": what Rust compiler does under the hood to make our lives easier? (and maybe less predictable?)

  • Box patterns: things like box (x, y) => in patterns

  • Other useful auxiliary notes: binding modes, patterns (ir-)refutability, auto dereferencing, printing types for debugging purposes, etc.

Reference patterns

As the Rust Book tells us, match is not the only place where pattern matching occurs. It also happens in many other places: let, if let, while let, function arguments, etc. So, even a simple let x = 1 exhibits pattern matching: we match 1 against pattern x. This is called identifier pattern and it never fails, hence it is called irrefutable pattern. We can use only irrefutable patterns with let. If we try something like let Some(x) = Some(1) it will fail, because we don't cover all the cases. A pattern which may fail is called refutable pattern. I am reminding you about those let patterns here just because it's easier to use them in examples rather than fully fledged match even tough such examples may feel more contrived.

Let's move to the next class of patterns called reference patterns. It's a pattern starting with one or two & like this: let &y = &1. After that we can use y and its value would be 1, no surprises:

fn test() {
let x = 1;
print_type_of("x", &x);
println!("x: {}", x);
let &y = &1;
print_type_of("y", &y);
println!("y: {}", y);
}

When run it prints:

type of x: i32
x: 1
type of y: i32
y: 1

A quick aside: here I am using a little helper for printing types which uses std::any::type_name::<T>() function from stable Rust. This is useful for investigating pattern matching, auto-dereferencing and other situations when compiler may do something strange and the type might not be clear from context:

fn print_type_of<T>(msg: &str, _: &T) {
println!("type of {}: {}", msg, std::any::type_name::<T>())
}

Ok, back to reference patterns. When do we use this kind of a pattern? It's used for dereferencing the pointers which are being matched. Or you can say that they are used to "destructure" the pointer. For example let's look at a common idiom for mapping a function with an iterator: xs.iter().map(|&x| x + 1).collect() Note how we use &x pattern here to dereference (deconstruct) the pointer returned by the iterator. We can also use map(|x| *x + 1) for that, making dereference explicit, but I'd prefer the first variant because one can immediately see from &x pattern that a parameter is a reference without looking at its usage. I've run a quick grep over Rust standard library looking for iter/map combination and it feels like the first variant is more popular when we need to dereference, but it's probably a matter of taste.

An exercise for the reader: what error message do you expect if you try let &x = 1?

Binding modes and `ref` keyword

When we pattern match some expressions against patterns, for example let x = y we have several options with respect to binding modes.

A binding mode determines how values are bound to identifiers in patterns. The default binding mode is to move, i.e. we just move those values: y is moved into x and we can't use it again. If y has Copy then we copy. Those two binding modes are normally called binding by value. There is also binding by reference, which is introduced with ref keyword attached to an identifier pattern like this: let ref x = y. This means that we want to borrow y instead of moving/copying it, so x will be a reference. It is exactly the same as let x = &y. For mutable reference there is ref mut, i.e let ref mut x = y which is the same as let x = &mut y.

A thing to remember. Those two statements are identical:

let ref x = y;

let x = &y;

Now the question is: if ref is just a funky way of saying that you want to borrow, why is it needed? Is just using & not enough? ref becomes more useful when used in nested patterns.

When I first saw ref in Rust code, I thought that it was an antiquated way of borrowing, a remnant from previous Rust versions. It is indeed so, mostly because of the feature called "match ergonomics" which we look into in the next section. This feature mostly obsoleted the need for ref and so in modern Rust using ref is rarely needed, but no doubt you'll see it a lot in existing codebases. I think it's easy to understand it better by contrasting with other patterns. Let's say we pattern match on an Option with match

Then the difference between Some(ref y) => and Some(&y) => in patterns is that the first borrows whatever is in Some and the second dereferences the pointer in Some (remember that we've already covered reference patterns in previous section).

The difference between Some(ref y) => and some Some(y) => is that in the first case we borrow and in the second case we move/copy into y.

Match ergonomics

In older versions of Rust (up to 1.26 which was released in May 2018) when you had a reference to Option and wanted to pattern match against it, your life was hard. You had two choices:

  • Use reference patterns and ref because: a) you needed to match references with reference patterns b) you had to use ref because you couldn't move part of Option into s, since you only had a borrowed version of Option:

fn f(x: &Option<String>) {
match x {
&Some(ref s) => // do something with `s`
&None => {}
}
}
  • Dereference the Option pointer and still use ref for the same reasons as in (1).

fn f(x: &Option<String>) {
match *x {
Some(ref s) => // do something with `s`
None => {}
}
}

Both of those choices were unsatisfactory and too verbose for such a common operation. Hence, RFC-2005 "Match Ergonomics" has been proposed and implemented. It allowed one to just write the following code, without ref and reference patterns:

fn f(x: &Option<String>) {
match x {
Some(s) => // do something with `s`
None => {}
}
}

It works like this: it sees that x is a reference which is matched by non-reference patterns. Therefore it automatically dereferences x and uses appropriate binding modes for identifiers in that pattern, in this case "bind by reference". In other words, it automatically inserts ref for you, since it's the only sensible thing to do in this situation.

You can learn more details in the original RFC, it's amazingly readable and has nice motivating examples and links (actually I've stolen my example from there). Here is a memorable quote from that RFC:

Match expressions are an area where programmers often end up playing 'type Tetris': adding operators until the compiler stops complaining, without understanding the underlying issues. This serves little benefit - we can make match expressions much more ergonomic without sacrificing safety or readability.

Case study: tree traversal

Let's start with a case study which actually led me to writing this note. Say I want to implement a data structure for full binary trees (i.e. each node has either 0 or 2 children) and traverse it recursively. Of course I'll need pattern matching for that! Armed with our knowledge from previous sections it now seems like an easy task:

struct T {
data: u8,
children: Option<Box<(T, T)>>,
}
fn traverse(s: &T) {
match &s.children {
None => {}, // TODO: do something
Some(ch) => {
traverse(&ch.0);
traverse(&ch.1);
}
}
}

I'll additionally comment on some relatively subtle points to make sure we understand what's going on in this example.

First, what is the type of s.children? It is a little bit non-obvious, since s is a reference and the we have a field accessor. Knowing that Rust likes to do things automatically for us, we may suspect that something "auto-" is happening here. Indeed, here we have an example of auto-dereferencing. If the left hand side of . operator is a pointer, it's dereferenced as many times as needed to get access to the field, as described here. So basically s.children is equivalent to (*s).children. And therefore its type is Option<Box<(T, T)>>.

Next, since we only have a reference to T and can't move out of it (and don't want to) we need to match a reference to s.children (as opposed to s.children itself) to kick-in the match ergonomics process. Note that we don't need & before None and Some or ref keyword.

Then what is the type of ch when it is bound? Remember that we bind by reference here, so it is &Box<(T, T)>. You can easily check it with print_type_of function listed above.

Now we want to recurse and to get access to our tuple which is hidden behind two references (& and Box). Auto-dereferencing to the rescue! We can just say ch.0 and this will add ** automatically. Finally, we need to borrow that with &: this operator has lower precedence than field accessors, so &a.b is identical to &(a.b) In order to appreciate how much "auto-" is happening here let's look at the fully parenthesised unambiguous analog of &ch.0 (it even broke my Markdown renderer!):

&((**ch).0)

Box patterns

Finally, let's look if we can make previous code slightly nicer by further binding the tuple elements to names like (left, right). How do we do this?

Some(ch) => {
// type of ch: &Box<(T, T)>
//
// and we want bind it to something like (left, right)
}

Being emboldened by impressive deductive powers of match ergonomics we quickly type the following code and expect it to work:

Some((left, right)) => {
traverse(left);
traverse(right);
}

After all, we have a reference and we match it against a non-reference tuple pattern, so match ergonomics should auto-dereference twice and use appropriate binding modes for left and right. But not so fast: unfortunately, match ergonomics only works for ordinary references and we have a Box there! So, we have two options: either use box_patterns feature from unstable Rust or wait for match ergonomics starting to work for anything implementing Deref including Boxes as discussed here.

Solution with box_patterns looks like this:

#![feature(box_patterns)]
struct T {
data: u8,
children: Option<Box<(T, T)>>,
}
fn traverse(s: &T) {
match &s.children {
None => {},
Some(box (left, right)) => {
traverse(left);
traverse(right);
}
}
}

And the tracking issue for box_patterns feature along with interesting discussions on the subject can be found here.

This note is already getting too long, so there won't be any summary or references section: enjoy the ergonomics!