When Ferrous Metals Corrode, pt. XIV

Intro

I'm working through chapter 15, "Iterators" in the Programming Rust book for this post.

Iterators produce sequences of values, and all kinds of things can be iterated over – strings, collection types, files, connections, database records, etc.

Rust iterators have a rich set of methods like fold(), map(), filter(), reduce(), flatten().

The Iterator and IntoIterator Traits

The Iterator trait from std::iter::Iterator:

trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
    ... // many default methods
}

Item defines the type things the iterator will return. If you implement the next() method you can get many default methods for free.

This is the IntoIterator trait:

trait IntoIterator where Self::IntoIter: Iterator<Item=Self::Item> {
    type Item;
    type IntoIter: Iterator;
    fn into_iter(self) -> Self::IntoIter;
}

This is defined for types that can create iterators for themselves – we'll call those iterables. Item again is the type of things the iterator returns, while IntoIter the type of the iterator.

Example from the std lib, a type that wraps a vector of i32 ints:

struct MyCollection(Vec<i32>);

impl MyCollection {
    fn new() -> MyCollection {
        MyCollection(Vec::new())
    }

    fn add(&mut self, elem: i32) {
        self.0.push(elem);
    }
}

impl IntoIterator for MyCollection {
    type Item = i32;
    type IntoIter = std::vec::IntoIter<Self::Item>;

    fn into_iter(self) -> Self::IntoIter {
        self.0.into_iter() // just delegate to the vector
    }
}

let mut c = MyCollection::new();
c.add(0);
c.add(1);
c.add(2);

// get an iterator and enumerate it
for (i, n) in c.into_iter().enumerate() {
    assert_eq!(i as i32, n);
}

Apropos for loops, for x in iterab { ... } under the hood calls iterab.into_iter() then calls next() until the iterable is empty. Iterators auto-implement IntoIterator (and simply return themselves on into_iter()).

Behaviour on calling next() on an empty iterator is not specified by Iterator – may just return None again.

Creating Iterators

iter and iter_mut Methods

By convention, collection types often implement methods iter() and iter_mut() which return iterators that produce shared or or mutable refs, respectively. This is no fixed spec though, each type may implement iter() the way it makes sense for it.

If there's one than more way to iterate, types should implement specific methods – e.g. &str has bytes() and chars() for returning bytes and unicode chars.

IntoIterator Implementations

Collections often will implement several versions of IntoIterator:

  • for shared refs: return an iterator that produces shared refs to items

  • for mut refs: return an iterator that produces mut refs to items

  • for pass-by-value: return an iterator over owned values (consuming the collection when iterated over)

This is what happens when running a for loop in the below variants:

for element in &collection { ... }
for element in &mut collection { ... }
for element in collection { ... }

The underlying assumption is though that an iterator should be efficient and predictable, so collections generally will implement only those variants where this can be implemented. E.g. the mut ref variant isn't implemented on HashSet, as modifying elements might need a rehashing.

Slices implement the ref variants (can't implement owning variant).

There's some overlap between IntoIterator variants for refs and the iter() and iter_mut() methods, this is for ergonomics, and IntoIterator is useful as a bound on type vars, e.g. for something like this:

use std::fmt::Debug;

fn dump<T, U>(t: T)
    where T: IntoIterator<Item=U>,
          U: Debug
{
    for u in t {
        println!("{:?}", u);
    }
}

This specifies that the dump fun must only get an arg if that is an iterable over elements that implement the Debug trait.

from_fn and successors

The std lib fun std::iter::from_fn will create iterators from a passed-in function (closure), where it'll call the fun for each element. The function or closure must return an Option<T>.

Example using a move closure:

let mut count = 0;
let counter = std::iter::from_fn(move || {
    // Increment our count. This is why we started at zero.
    count += 1;

    // Check to see if we've finished counting or not.
    if count < 6 {
        Some(count)
    } else {
        None
    }
});
assert_eq!(counter.collect::<Vec<_>>(), &[1, 2, 3, 4, 5]);

The std::iter::successors function will also create an iterator. It takes an initial value and a fun which will, given an value, return the values' successor in the sequence. Ex.:

use std::iter::successors;

let powers_of_10 = successors(Some(1_u16), |n| n.checked_mul(10));
assert_eq!(powers_of_10.collect::<Vec<_>>(), &[1, 10, 100, 1_000, 10_000]);

drain Methods

Many collections also provide drain() methods which return iterators that drain the collection, i.e. leave it empty. Compared to inter_iter() the drain methods don't take ownership entire collection but only take a mut ref. Types that can take range indices also can be drained by range:

let mut outer = "Earth".to_string();
let inner = String::from_iter(outer.drain(1..4));

assert_eq!(outer, "Eh");
assert_eq!(inner, "art");

Other Iterator Sources

Type/trait Expression Notes
std::ops::Range 1..10 Range of ints, excl end
(1..10).step_by(2) Every second number
std::ops::RangeFrom 1.. Unbounded range
Option<T>, Result<T,E> .iter() Iterable of Ok/Some vals
Vec<T>, &[T] v.chunks(3) Chunk elems
v.chunks_mut(3) Mut chunks
v.split(predf) Split by predicate fun
v.splitmut(predf) Split into mut slices
v.rsplit(predf) Right-to-left
v.splitn(predf, n) At most n slices
String, &str s.bytes() Indiv. bytes
s.chars() Unicode chars
s.split_whitespace() Split string on whitespace
s.lines() Split into lines
s.split(f) Split on chars, strings, closures
s.split_inclusive(f) Split, also incl. separator
s.split_terminator(f) When elems are sep + terminated
s.split_once(f) Split in two
s.matches(f) Slices that match a pattern
Maps m.keys() Shared refs to keys
m.values() Shared refs to values
m.values_mut() Mut refs to values
Sets s1.union(s2) Shared refs to set union
s1.intersection(s2) Shared refs to set intersection
std::io::Read stream.bytes() Bytes from i/o stream
stream.chars() Unicode chars from i/o stream
std::io::BufRead bufstream.lines() Lines from bufferd stream
std::fs::ReadDir std::fs::read_dir(path) Dir entries
std::net::TcpListener listener.incoming() Incoming conns
Funs from std::iter empty() Empty iterator
once(X) One entry
repeat(X) Endless iterator

Iterator Adapters

These are methods of Iterator that process an iterator in some way

map and filter

Similar to the Python built-ins. The map() applies some fun or closure to an iterator and returns a new one. The filter() method takes a predicate fun and returns an iterator with those elems for which the predicate returns true. Example: split a text into lines, trim whitespace from those and exclude iguanas:

let text = "  ponies  \n   giraffes\niguanas  \nsquid".to_string();
let v: Vec<&str> = text.lines()
    .map(str::trim)
    .filter(|s| *s != "iguanas")
    .collect();
assert_eq!(v, ["ponies", "giraffes", "squid"]);

Note that maps passes items to the fun by value, whereas filter predicates get a shared ref. Also note that iterators are lazy, i.e. with iterator chains like this, nothing gets processed until something calls next() somewhere – in this case the final collect() call.

filter_map and flat_map

The filter_map() works like a combination filter/map. When the fun passed to filter_map() returns None, it's dropped from the resulting iterator.

Example, split some text on whitespace, and try to parse the elems as float. Every element will result in f64::from_str() returning a Result, from which we return floats on ok and None otherwise; and Nones will be droppped by filter.

use std::str::FromStr;

let text = "1\nfrond .25  289\n3.1415 estuary\n";
for number in text
    .split_whitespace()
    .filter_map(|w| f64::from_str(w).ok())
{
    println!("{:4.2}", number.sqrt());
}

An alternative implementation with map and filter needs two maps:

text.split_whitespace()
    .map(|w| f64::from_str(w))
    .filter(|r| r.is_ok())
    .map(|r| r.unwrap())

The flat_map() method takes a fun which takes an elem and can return a sequence – with the individual sequences then getting flattened into the resulting iterator. Example, merge chars from a number of strings together:

let words = ["alpha", "beta", "gamma"];

// chars() returns an iterator
let merged: String = words.iter()
                          .flat_map(|s| s.chars())
                          .collect();
assert_eq!(merged, "alphabetagamma");

flatten

Speaking of flatten, this is what the flatten() method does.

One interesting use case:

assert_eq!(vec![None, Some("day"), None, Some("one")]
           .into_iter()
           .flatten()
           .collect::<Vec<_>>(),
           vec!["day", "one"]);

The Option values are iterable themselves. A value of None produces an empty sequence and will not produce a value, while a Some value will contribute it's value. Also works well with Option<Vec<...>> values.

Result is iterable as well, where Err vals are empty – applying flatten will effectively throw away all the errors.

take and take_while

These take ownership of an iterator and return parts of it. The take(n) method takes the first n elements. The take_while(pred) method returns elements until the predicate fun returns false.

skip and skip_while

The opposite of the above take methods: skip either a fixed number of elements, or until a predicate returns true.

peekable

Peeking means looking at the next element of an iterator without consuming it. Peekable iteratos have a peek() method for this. An iterator can be turned into a peekable iterator by calling the peekable() method on it. The peek method returns an Option<&Item> with a shared ref to the next item if there is one.

fuse

When an Iterator is done it'll return None. Calling next() on it again is unspecified – many Iterators just return None again, but the trait doesn't actually specify. The fuse() method will enforce this, i.e. return an Iterator that once it's done always will return None.

Reversible Iterators and rev

The td::iter::DoubleEndedIterator trait extends Iterator. It's for iterators that can be iterated either from the start or the end. Those implement a next_back() method that will consume elements from the end. E.g. slices or ordered collections like BTreeMap can do this. They can be reversed with the rev() method.

inspect

The inspect() method takes a fun which will get a shared ref to each elem passing through, a bit like a span port – useful for debugging.

chain

This method will chain two iterators together:

let v: Vec<i32> = (1..4).chain([20, 30, 40]).collect();
assert_eq!(v, [1, 2, 3, 20, 30, 40]);

enumerate

This adds a running index and returns pairs of index, element.

zip

The zip() method zips up two iterables to create a sequence of pairs.

let v: Vec<_> = (0..).zip("ABCD".chars()).collect();
assert_eq!(v, vec![(0, 'A'), (1, 'B'), (2, 'C'), (3, 'D')]);

by_ref

The above methods typically take ownership of the iterator. With by_ref() you can borrow an iterator, use the mut ref, and then continue with the original iterable.

let mut words = ["hello", "world", "of", "Rust"].into_iter();

// Borrow a mut ref, take the first two words. 
let hello_world: Vec<_> = words.by_ref().take(2).collect();
assert_eq!(hello_world, vec!["hello", "world"]);

// Collect the rest of the words with the orig. iterable
// We can only do this because we used `by_ref` earlier.
let of_rust: Vec<_> = words.collect();
assert_eq!(of_rust, vec!["of", "Rust"]);

cloned, copied

With cloned() you get an iterator of cloned elems; copied() uses Copy instead of Clone.

cycle

This endlessly repeats the underlying iterator – which must be Clone-able.

Consuming Iterators

Or, how to spend it

Simple Accumulation: count, sum, product

These methods pretty much do what it says on the tin:

count()

count elements

sum()

add up elements

product()

multiply elements

max, min

These work with items that are Ord, i.e. which are comparable. The return an Option (which will be None if the iterator is empty). They don't work with floats as they only implement PartialOrd (due to the NaN problem).

max_by, min_by

Like max() and min() but use a user-supplied fun for comparison. Useful for comparing PartialOrd elems like floats

max_by_key, min_by_key

These take a fun to select an element by which to compare, or perform some computation.

E.g. to select from a list of two-tuple (name, pop) elements the one with the largest pop field:

populations.iter().max_by_key(|&(_name, pop)| pop)

Comparing Item Sequences

Strings, vectors and slices can be compared with < and == operators. Iterators don't overload those operators, the have methods eq() and lt() which compare items from two iterators pairwise. They will end up early if a decision can be reached before the end of the iterators.

any and all

Apply a fun to iterator elements; if all/any elements fun evaluation true the methods will be true.

position, rposition, and ExactSizeIterator

position

return index of element for which f(element) is true for a given f

rposition

like position, but start from the right/end side

Note that rposition and similar methods need an iterator that is reversible and has a known size. – It doesn't make much sense to start from the end of an open-ended iterator like a stream or so. But also the str::chars iterator does not, it interprets unicode chars and those are var-width.

Iterators which have the std::iter::ExactSizeIterator trait know their length:

trait ExactSizeIterator: Iterator {
    fn len(&self) -> usize { ... }
    fn is_empty(&self) -> bool { ... }
}

fold and rfold

The fold method is like the one in other functional langs: update an accumulator with the result of some fun applied to each elem in turn:

let a = [5, 6, 7, 8, 9, 10];

assert_eq!(a.iter().fold(0, |n, _| n+1), 6);        // count
assert_eq!(a.iter().fold(0, |n, i| n+i), 45);       // sum
assert_eq!(a.iter().fold(1, |n, i| n*i), 151200);   // product

rfold is the same but starts from the right/end side

tryfold and tryrfold

The = tryfold()= methods are similar to fold but can stop early. The fun/closure you pass to try_fold() indicates this by returning a Result Err value, or an Option None, or a std::ops::ControlFlow

nth, nthback

Skip n elements from an iterator and return the next one. Does not take ownership. The nth_back() method works the same, but from the end of the iterator.

last

Return the last element, consuming all before

find, rfind, and findmap

The find() method takes a fun and returns the first elem from the iterator for which the fun returns true. Similarly for the rfind() method, but starting from the end.

The find_map() method expands on that: here the fun should return an Option, and the method will return the first of these that's Some.

Building Collections: collect and FromIterator

The collect() method is implemented for all stdlib collection types via the FromIterator trait:

trait FromIterator<A>: Sized {
    fn from_iter<T: IntoIterator<Item=A>>(iter: T) -> Self;
}

For some collection types this means just adding elements sequentially; others like Vector use the size_hint() method from the Iterator trait to prealloc its memory. Similarly for the hash tables used for HashSet and HashMap.

The Extend Trait

The std::iter::Extend trait defines and extend method that allows to extend an iterable with the elements from some other iterable. There's some overlap with the FromIterator trait here, and indeed some types implement one trait with the help of the other.

partition

This allows to split an iterator with the help of some fun/closure as a predicate into two collections.

foreach and tryforeach

The for_each() method applies a fun/closure to each element, similarly to a for loop. Note in contrast to a map, no new iterator is built

The try_for_each() method is for fallible funs/closures, or if you need to exit early. It will stop at the first error (and returning that error).

Implementing Your Own Iterators

Implementing the IntoIterator and Iterator traits for user defined types often can be done by just implementing the next() method; for the other methods there are default implementations.

Example, a simple range type:

// data 
struct I32Range {
    start: i32,
    end: i32
}
// iterator
impl Iterator for I32Range {
    type Item = i32;  // type of the elements we're going to produce

    fn next(&mut self) -> Option<i32> {
        // needs to be mut self b/c we incr. the index
        // we either return Some(i32) or None

        if self.start >= self.end {
            // already exhausted
            return None;
        }
        // get result and incr. index
        let result = Some(self.start);
        self.start += 1;
        result
    }
}

Coda

This was nice long chapter. There's quite a lot of abstractions available to deal with lists of things. Next will be collection types from the std lib.