When Ferrous Metals Corrode, pt. XV

Intro

This post deals with standard library collections, and corresponds to chapter 16. in the Programming Rust book.

Rust moves elements into collections to avoid deep-copying of values. Due to Rusts borrow checker we won't get dangling pointers while collections are changed or resized.

Vec<T>

We've used that one extensively already. Create it with the vec![] macro or with Vec::new()

Vec implements FromIterator; create a vector with the collect method:

// Convert another collection to a vector.
let my_vec = my_set.into_iter().collect::<Vec<String>>();

Accessing Elements

// Get a reference to an element
let first_line = &lines[0];

// Get a copy of an element
let fifth_number = numbers[4];       // requires Copy
let second_line = lines[1].clone();  // requires Clone

// Get a reference to a slice
let my_ref = &buffer[4..12];

// Get a copy of a slice
let my_copy = buffer[4..12].to_vec();  // requires Clone

If an index is out of bounds the access will cause a panic. The indices must be usizes; cast with x as usize.

Access methods:

l.first()

ref to first element, None if l is empty

l.last()

ref to last

l.get(i)

ref to some elem in the middle (or None)

l.first_mut(), l.last_mut(), l.get_mut(index)

Same as above but return mut refs

l.to_vec()

clone a vec

Iteration

Naturally Vec support iteration. Iterating over a Vec moves elems out. Iterating over a &Vec<T> returns refs, and &mut Vec<T> mut refs. Also see the iter() and iter_mut() methods from the previous chapter.

Growing and Shrinking Vectors

l.len()

length

l.is_empty()

true if empty

Vec::with_capacity()

create and specify an initial cap

l.capacity()

current capacity

l.reserve(n)

ensure enough cap for n more elements

=l.shrinktofit()

maybe free up some mem

l.push(v)

add to the end (moves!)

l.pop()

remove and return last element (moves out!)

l.insert(i, v)

insert value; move existing vals to the right (might be slow)

l.remove(i)

remove value, move existing vals to the left

l.resize(new, v)

set vector to new len. If this increases the vector size fill in copies of v. If the new len is less, the vector gets truncated

l.resize_with(new, f)

like above but calls f() to construct new values

l.truncate(new)

reduce the size to new length. Noop if it's already equal or less

l.clear()

remove all elements

l.extend(iter)

add elements from iter to the end

l.split_off(i)

truncates and returns the split-off tail

l.append(l2)

moves elements from l2 into l. l2 still exists (but is empty)

l.drain(range)

range should be sth like 1..4. will remove those elements and returns an iterator over them

l.retain(f)

keep elements for which the predicate f() returns true

l.dedup()

drop consecutive repeated elements, like uniq

l.dedup_by_key(f)

like dedup above, but passes elems through f – elements are considered equal if f(e1) == f(e2)

Joining

These operate on 2-dim arrays:

l.concat()

concatenate the arrays

l.join(s)

join arrays, separated by s

Splitting

l.split_at(i), l.split_at_mut(i)

break a slice in two, returns two refs (shared or mut)

l.split_first(), l.split_first_mut()

split into first, rest. Returns an Option that may be None if the slice is empty

l.split_last(), l.split_last_mut()

as above, but split off last, rest

l.split(is_sep), l.split_mut(is_sep)

split into one subslices if the predicate is_sep() is true

l.split_inclusive(is_sep), l.split_inclusive_mut(is_sep)

as above, but include the separator element

l.splitn(n, is_sep), l.splitn_mut(n, is_sep)

as above, but split into max. n slices

l.rsplitn(n, is_sep), l.rsplitn_mut(n, is_sep)

as above, but split from the right

l.chunks(n), l.chunks_mut(n)

split into subslices of given lenght, returns iterator. Last chunk may have less than n elements.

l.rchunks(n), l.rchunks_mut(n)

as above, but from the right

l.chunks_exact(n), l.chunks_exact_mut(n)

chunk up slice into subslices of exactly len n. The rest can be retrieved via the results remainder() method

l.windows(n)

sliding window of given length. Has no mut variant as it's overlapping

Swapping

l.swap(i, j)

swap elements at indices

l1.swap(&mut l2)

l1 and l2 must be same len. swap all elements in slices

l.swap_remove(i)

remove and return element at i, swap in last element instead

Filling

l.fill(v)

paint v into l

l.fill_with(f)

call f, use result to paint into l

Sorting and Searching

l.sort()

sort elements, needs Ord trait

l.sort_by(f)

sort, using f(elem) to determine order. f must return an Ordering

l.sort_by_key(f)

sort, using f to retrieve a key

l.reverse()

reverse elements in place

l.binary_search(&value), l.binary_search_by(&value, cmp), l.binary_search_by_key(&value, key)

search for a value in a sorted slice. Return a Result with Ok(idx) for success and Err if not found

=l.contains(&value)

true if any elem in l equals value

slice.iter().position(|x| *x = value)=

use this to find the index of a value

Comparing Slices

Slices of comparable elements are comparable as well – traits PartialEq and PartialOrd are relevant here

l.starts_with(l2)

true if l starts with l2

l.ends_with(l2)

true if l ends with l2

Random Elements

Use the rand crate to get random numbers in Rust

l.choose(&mut rng)

returns ref to a random element

=l.shuffle(&mut rng)

shuffle elements in place

Rust Rules Out Invalidation Errors

The book gives an example of the kind of errors that occur when modifying a list in place, which Rusts ownership rules prevents:

my_list = [1, 3, 5, 7, 9]
for index, val in enumerate(my_list):
    if val > 4:
        del my_list[index]  # bug: modifying list while iterating

The Python interpreter will execute this snippet without complaints, with surprising results.

With a short example like this it's easy to see the bug – just avoid modifying the list you're iterating over and you're good. The question is, is it that easy to see with a more complicated data structure or program?

VecDeque<T>

Rust’s std::collections::VecDeque<T> is a deque, a double-ended queue, for when you have to add elements at the front or at the back.

d.push_front(value), d.push_back(value)

push a value at the beginning or the end of the deque

d.pop_front(), d.pop_back()

remove a value from the beginning or the end of the deque, return it as an Option<T>

d.front(), d.back()

return a value (as an Option<&T>) from beginning or end

d.front_mut(), d.back_mut()

as above, but return Option<&mut T>

d.make_contiguous()

normally deques make an effort to store elements contiguously; if you need that this method can arrange elements

Vec::from(deque)

convert deque into Vec

VecDeque::from(vec)

convert Vec into deque

BinaryHeap<T>

Data type for prio queues. Methods:

heap.push(v)

Add to heap

heap.pop()

Fetch back the greatest value, returns an Option (None if the heap was empty)

heap.peek()

Return a ref to the greatest val (not removing the val), returning an Option ref

heap.peek_mut()

Return a PeekMut which acts as a mut ref to the greatest val; PeekMut has a static method pop() to optionally get the val

BinaryHeaps also have some methods from Vec like new(), len(), append(&mut heap2)

Example:

use std::collections::BinaryHeap;
// Construct from Vec
let mut heap = BinaryHeap::from(vec![1, 5]);
// Add another val
heap.push(2);
// Greatest val is on top
assert_eq!(heap.peek(), Some(&5));

// Iterate (random order!)
for x in &heap {
    println!("{x}");
}

// Pop gets us back vals in order
assert_eq!(heap.pop(), Some(5));
assert_eq!(heap.pop(), Some(2));
assert_eq!(heap.pop(), Some(1));
assert_eq!(heap.pop(), None);

HashMap<K, V> and BTreeMap<K, V>

Rust has two types of maps/dicts in the stdlib; they differ more in implementation than API

  • HashMaps needs keys that implement Hash and Eq, as it's using a hash table

  • BTreeMap needs keys to have Ord for storing entries in a order in a tree

Rust uses B-trees instead of binary trees as the former have better cache locality

Creating:

HashMap::new(), BTreeMap::new()

static constructor

iter.collect()

create from key/val iterator Iterator<Item=(K, V)>

HashMap::with_capacity(n)

HashMaps have a cap, so they can be created with some space pre-alloc for efficiency

Core API:

m.len()

length

m.is_empty()

true if empty

m.contains_key(&k)

true if entry present

m.get(&k)

get Some ref if found, None otherwise

m.get_mut(&k)

as above but get Some mut ref

m.insert(k, v)

insert k/v and return previous value as Option, if there was any

m.extend(iter)

add k/v entries from iter

m.append(&mut m2)

move entries from map m2 into m

m.remove(&k)

remove and Optionally return value for k

m.remove_entry(&k)

remove and Optionally return entry i.e. k/v

m.retain(f)

remove entries which don't pass predicate f (which must be some callable FnMut(&K, &mut V) -> bool

m.clear()

remove everything

Maps also support the Index trait which makes lookups like m[&k] possible – however this will panic if k is not in the map

Keys with a type that have Borrow<K> can be used as keys instead of type K as well

BTreeMaps also support splitting off a part of the tree with m.split_off(&k), leaving all entries with smaller keys in m and returning a tree with the others

Entries

Entries are enums, views into a map which is either Occupied (i.e. a value exists for a key) or Vacant (no value set for this key). The idea is to economize on hash/btree lookups when an entry might or might not exist.

The below checks the student map for "peter" and either returns an existing entry or sets up a newly constructed one, and return that:

let rec = student_map.entry("peter").or_insert_with(Student::new);

Some methods:

m.entry(k).or_insert(v)

add v to map if k isn't present already; return a mut ref to the value

m.entry(k).or_default()

as above but just setup Default::default() (needs trait Default naturally)

m.entry(k).or_insert_with(default_fn)

as above but use callable default_fn to get a new value

m.entry(k).and_modify(c)

if entry exists call closure c, passing it a mut ref to the value; returns an Entry again so can be chained

Map Iteration

for (k, v) in map

iter over k/v, consuming the map

for (k, v) in &map

iter over shared k/v refs

for (k, v) in &mut map

gets you shared k ref (can't mut keys!), mut ref for values

m.iter() and m.iter_mut()

return ref or mut ref iterators like above

m.keys()

shared ref iter over keys

m.values()

as above but for values

m.values_mut()

as above but get mut refs

m.into_iter(), m.into_keys(), m.into_values()

consume map, iter over k/v, keys or values alone

HashMaps iter in arbitrary order while BTreeMaps are in key order

HashSet<T> and BTreeSet<T>

Set data types, wrap the corresponding map types.

Iterate over a set as you'd expect:

for v in set

consume set

for r in &set

iterate over refs

There's no way to mut a set value. Similar to the way HashMaps and BTreeMaps behave, HashSets iterate in arbitrary order while BTreeSets iterate in order

Whole-Set Operations

s.intersection(s2)

iterator over vals both in s and s2; similarly &s & &s2 constructs a new set with those

s.union(s2)

iterate over both vals, similar new set: &s | &s2

s.difference(s2)

iterate over vals in s, except those in s2; similar new set: &s - &s2

s.symmetric_difference(&s)

iterate over vals in s or s2, but not those in both s and s2; similar new set: &s ^ &s2

s.is_disjoint(s2)

test: no common vals

s.is_subset(s2)

test is s subset of s2?

s.is_superset(s2)

test, s superset of s2?

s == s2, s != s2

sets also work with equality tests

Hashing

Most built-in types that have Eq also have Hash.

A design principle of the stdlib is that values should have the same hash regardless of where it's stored or how it's pointed at. That's why refs hash the same as their referents, and boxes hash to their boxed vals, and a String hashes to the &str with the same text.

For structs and enums with hashable fields Hash can be derived

If you have custom PartialEq for a type you should also give it custom Hash. For instance with a type that defines equality as equality by ID:

sruct Artifact {
    id: MuseumNumber,
    name: String,
    ...
}

impl PartialEq for Artifact {
    fn eq(&self, other: &Artifact) -> bool {
        self.id == other.id
    }
impl Eq for Artifact {}

To make hashing consistent with PartialEq should have hashing like the below, where values hash to the ID hashes:

use std::hash::{Hash, Hasher};

impl Hash for Artifact {
    fn hash<H: Hasher>(&self, hasher: &mut H) {
        // Delegate hashing to the MuseumNumber.
        self.id.hash(hasher);
    }
}

Using a Custom Hashing Algorithm

Rusts built-in hashing is a cryptographical algo (using SipHash-1-3). For a faster but not as secure algo use the fnv create to get the FNV hash algo

Coda

That's it, we have a nice, safe and diverse set of standard collection types here. Next will be another very practical topic, Strings and Text