When Ferrous Metals Corrode, pt. IX

Intro

This post corresponds to Chapter 10. Enums and Patterns.

Rusts enums are like C unions: they can be one of several kinds of data.

Patterns in Rust resemble the ones in Erlang; they allow to match on parts of a data structure and bind parts of it to variables.

Enums

We've already used enums: Result is one (it's either Ok or Err).

Simple enums – first with automatic numbering, second with user-provided:

enum Ordering {
    Less,
    Equal,
    Greater,
}

enum HttpStatus {
    Ok = 200,
    NotModified = 304,
    NotFound = 404,
    ...
}

use self::Ordering::*;

The last line demonstrates a self-import, so we can write Greater in the module instead of Ordering::Greater

These C-style enums can be cast to integers too, but not the other way.

To get enums from ints either do something like:

fn http_status_from_u32(n: u32) -> Option<HttpStatus> {
    match n {
        200 => Some(HttpStatus::Ok),
        304 => Some(HttpStatus::NotModified),
        404 => Some(HttpStatus::NotFound),
        ...
        _ => None,
    }
}  

Or, use the enum_primitive crate which has macros for things like this.

Enums can grow impl blocks. This example has time units and some display methods(and gets some std traits too):

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum TimeUnit {
    Seconds, Minutes, Hours, Days, Months, Years,
}

impl TimeUnit {
    /// Return the plural noun for this time unit.
    fn plural(self) -> &'static str {
        match self {
            TimeUnit::Seconds => "seconds",
            TimeUnit::Minutes => "minutes",
            TimeUnit::Hours => "hours",
            TimeUnit::Days => "days",
            TimeUnit::Months => "months",
            TimeUnit::Years => "years",
        }
    }

    /// Return the singular noun for this time unit.
    fn singular(self) -> &'static str {
        self.plural().trim_end_matches('s')
    }
}

Enums with Data

These are enums that hold variants of data.

The first is an enum that holds a tuple variant:

/// A timestamp that has been deliberately rounded off, so our program
/// says "6 months ago" instead of "February 9, 2016, at 9:49 AM".
#[derive(Copy, Clone, Debug, PartialEq)]
enum RoughTime {
    InThePast(TimeUnit, u32),
    JustNow,
    InTheFuture(TimeUnit, u32),
}

let four_score_and_seven_years_ago =
    RoughTime::InThePast(TimeUnit::Years, 4 * 20 + 7);

let three_hours_from_now =
    RoughTime::InTheFuture(TimeUnit::Hours, 3);

InThePast and InTheFuture take a TimeUnit arg and can be used as constructors.

There's a corresponding struct variant enum too. For example:

enum Shape {
    Sphere { center: Point3d, radius: f32 },
    Cuboid { corner1: Point3d, corner2: Point3d },
}

let unit_sphere = Shape::Sphere {
    center: ORIGIN,
    radius: 1.0,
};  

So, Shapes can be either Spheres or Cuboids; each constructor takes specific args.

This mirrors the types of structs: Unit structs (no fields), named field structs and tuple structs.

Fields of the enum share the same visibility as the enum itself.

Enums in Memory

In memory enums have a small tag to identify them plus the mem required for the largest possible field. The exact layout is a rustc implementation detail though.

Rich Data Structures Using Enums

Enums are useful for tree like data structures where each node can be one of several types. For instance, to define a JSON node could use:

use std::collections::HashMap;

enum Json {
    Null,
    Boolean(bool),
    Number(f64),
    String(String),
    Array(Vec<Json>),
    Object(Box<HashMap<String, Json>>),
}  

The Box here is used for memory efficiency – a Box is smaller than using a HashMap directly.

Much more compact than setting up a class hierarchy of JSONxxx node types!

Generic Enums

We already used Results, and of course it's generic, same as Options:

enum Option<T> {
    None,
    Some(T),
}

enum Result<T, E> {
    Ok(T),
    Err(E),
}  

A nice little generic tree structure:

// An ordered collection of `T`s.
enum BinaryTree<T> {
    Empty,
    NonEmpty(Box<TreeNode<T>>),
}

// A part of a BinaryTree.
struct TreeNode<T> {
    element: T,
    left: BinaryTree<T>,
    right: BinaryTree<T>,
}  

The tree can be empty or not; if it's not it'll contain TreeNodes with some T value, and left/right pointers to more subtrees.

The downside of all these variants is that we can't just access fields, as we don't know which actually might be present. This is where patterns come in.

Patterns

Patterns are constructed with match expressions. These can deconstruct an enum value for you. Each leg of the match is compared against the actual run time value, and can then safely access fields. This seems to be quite similar to what Erlang has with case blocks.

Example with the RoughTime enum from before:

fn rough_time_to_english(rt: RoughTime) -> String {
    match rt {
        RoughTime::InThePast(units, count) =>
            format!("{} {} ago", count, units.plural()),
        RoughTime::JustNow =>
            format!("just now"),
        RoughTime::InTheFuture(unit, 1) =>
            format!("a {} from now", unit.singular()),
        RoughTime::InTheFuture(units, count) =>
            format!("{} {} from now", count, units.plural()),
    }
}  

The syntax of field access looks similar to constructors, and in a way they are complementary. Constructors produce values while patterns consume them.

For each leg, the match binds fields of the enum to the mentioned values; the code in the match legs can subsequently operate on the bound values. So here we get counts and units for two legs, while for the JustNow case there is no field to bind. Binding here means copying or moving, depending on the capabilities of the data type.

Patterns are tried from top to bottom. Once a pattern matches, it's run and no other patterns are tried. That's why RoughTime::InTheFuture(unit, 1) needs to come before the more generic RoughTime::InTheFuture(units, count)

The pattern matching language is quite flexible; and pattern matching works with non-enums too (as we've already seen).

Literals, Variables, and Wildcards in Patterns

We've already seen matching against literals and variables. Wildcard is the "_" pattern, it matches anything (and therefore needs to come as the last pattern) and doesn't bind anything.

Example: first matches for specific number of rabbits, and any number of rabbits as a fall-through option last.

match meadow.count_rabbits() {
    0 => {}  // nothing to say
    1 => println!("A rabbit is nosing around in the clover."),
    n => println!("There are {} rabbits hopping about in the meadow", n),
}  

Tuple and Struct Patterns

Here a tuple of x,y coordinates are compared against 0, the outcome is an ordering:

fn describe_point(x: i32, y: i32) -> &'static str {
    use std::cmp::Ordering::*;
    match (x.cmp(&0), y.cmp(&0)) {
        (Equal, Equal) => "at the origin",
        (_, Equal) => "on the x axis",
        (Equal, _) => "on the y axis",
        (Greater, Greater) => "in the first quadrant",
        (Less, Greater) => "in the second quadrant",
        _ => "somewhere else",
    }
}  

Struct patterns use curly braces. The first pattern is against a balloon directly above us (x=0), the second for a balloon some way off:

match balloon.location {
    Point { x: 0, y: height } =>
        println!("straight up {} meters", height),
    Point { x: x, y: y } =>
        println!("at ({}m, {}m)", x, y),
}

A shorthand for the pattern in the second arm is Point {x, y} – it binds the same vars.

For large structs you can tell rustc to match against only some fields and ignore the rest by using ..:

Some(Account { name, language, .. }) =>
    language.show_custom_greeting(name),  

Array and Slice Patterns

These work as you'd expect.

Array pattern example:

fn hsl_to_rgb(hsl: [u8; 3]) -> [u8; 3] {
    match hsl {
        [_, _, 0] => [0, 0, 0],
        [_, _, 255] => [255, 255, 255],
        ...
    }
}

Slices are similar but have variable length. The .. operator again can be used as a placeholder:

fn greet_people(names: &[&str]) {
    match names {
        [] => { println!("Hello, nobody.") },
        [a] => { println!("Hello, {}.", a) },
        [a, b] => { println!("Hello, {} and {}.", a, b) },
        [a, .., b] => { println!("Hello, everyone from {} to {}.", a, b) }
    }
}  

Reference Patterns

Non-copyable matched values are moved by default. Ref patterns are a way of saying we want a reference not the actual value bound. The ref keyword says to borrow a ref for name and language:

match account {
    Account { ref name, ref language, .. } => {
        ui.greet(name, language);
        ui.show_settings(&account); 
    }
}

Similarly, use ref mut to borrow a mut reference.

The opposite is a pattern matching &. Example:

match sphere.center() {
    &Point3d { x, y, z } => ...
}

This matches references to Point3d values. The x,y,z above are copied if possible; otherwise you'd need to borrow with a ref.

Match Guards

Like in Erlang, patterns can also specify extra conditions:

fn check_move(current_hex: Hex, click: Point) -> game::Result<Hex> {
    match point_to_hex(click) {
        None => Err("That's not a game space."),
        Some(hex) if hex == current_hex =>
            Err("You are already there! You must click somewhere else"),
        Some(hex) => Ok(hex)
    }
}

Matching Multiple Possibilities

Combine patterns with "|"

let at_end = match chars.peek() {
    Some(&'\r' | &'\n') | None => true,
    _ => false,
};

Can also match ranges:

match next_char {
    '0'..='9' => self.read_number(),
    'a'..='z' | 'A'..='Z' => self.read_word(),
    ' ' | '\t' | '\n' => self.skip_whitespace(),
    _ => self.handle_punctuation(),
}

Binding with @ Patterns

This works like other patterns except that you can get a binding (copy or move) of the complete value the pattern used. Here we want to get the whole rect instead of individual fields:

match self.get_selection() {
    rect @ Shape::Rect(..) => {
        optimized_paint(&rect)
    }
    ...
}

Where Patterns Are Allowed

Patterns can be used to unpack values in various other places besides match expressions:

// ...unpack a struct into three new local variables
let Track { album, track_number, title, .. } = song;

// ...unpack a function argument that's a tuple
fn distance_to((x, y): (f64, f64)) -> f64 { ... }

// ...iterate over keys and values of a HashMap
for (id, document) in &cache_map {
    println!("Document #{}: {}", id, document.title);
}

// ...automatically dereference an argument to a closure
// (handy because sometimes other code passes you a reference
// when you'd rather have a copy)
let sum = numbers.fold(0, |a, &num| a + num);

These all are patterns that always match – called irrefutable patterns. Let, func args, for, closure args can only take irrefutable patterns. This makes sense because what would a let etc. do with a non-matching pattern? It could only panic.

In contrast, refutable patterns can be used in match, if, while expressions:

/ ...handle just one enum variant specially
if let RoughTime::InTheFuture(_, _) = user.date_of_birth() {
    user.set_time_traveler(true);
}

// ...run some code only if a table lookup succeeds
if let Some(document) = cache_map.get(&id) {
    return send_cached_response(document);
}

// ...repeatedly try something until it succeeds
while let Err(err) = present_cheesy_anti_robot_task() {
    log_robot_attempt(err);
    // let the user try again (it might still be a human)
}

// ...manually loop over an iterator
while let Some(_) = lines.peek() {
    read_paragraph(&mut lines);
}

Populating a Binary Tree

Here's a look at the match expr for add fn for the binary tree class from before.

impl<T: Ord> BinaryTree<T> {
    fn add(&mut self, value: T) {
        match *self {
            BinaryTree::Empty => {
                *self = BinaryTree::NonEmpty(Box::new(TreeNode {
                    element: value,
                    left: BinaryTree::Empty,
                    right: BinaryTree::Empty,
                }))
            }
            BinaryTree::NonEmpty(ref mut node) => {
                if value <= node.element {
                    node.left.add(value);
                } else {
                    node.right.add(value);
                }
            }
        }
    }
}

The impl is defined for Ts that have the Ord trait, i.e. our T values must be comparable.

The first match arm is for the empty case – we construct a node, using a Boxed TreeNode. This is done via the BinaryTrees NonEmpty constructor.

The second match arm is a ref pattern; we bind a mut ref 'node' then add our new val either left or right. The add fn call would then recurse down the tree until it finds and Empty node.

Coda

I found the BinaryTrees example a good motivation that shows how enums and pattern matching can be used to elegantly and safely code up data structures. On the other hand, match patterns and enum structure are pretty closely coupled – making extension of enums likely a bit cumbersome.