Intro
I'm working through chapter 15, "Iterators" in the Programming Rust book for this post.
Iterators produce sequences of values, and all kinds of things can be iterated over – strings, collection types, files, connections, database records, etc.
Rust iterators have a rich set of methods like fold()
, map()
,
filter()
, reduce()
, flatten()
.
The Iterator and IntoIterator Traits
The Iterator trait from std::iter::Iterator
:
trait Iterator {
type Item;
fn next(&mut self) -> Option<Self::Item>;
... // many default methods
}
Item defines the type things the iterator will return. If you
implement the next()
method you can get
many default methods for free.
This is the IntoIterator trait:
trait IntoIterator where Self::IntoIter: Iterator<Item=Self::Item> {
type Item;
type IntoIter: Iterator;
fn into_iter(self) -> Self::IntoIter;
}
This is defined for types that can create iterators for themselves – we'll call those iterables. Item again is the type of things the iterator returns, while IntoIter the type of the iterator.
Example from the std lib, a type that wraps a vector of i32 ints:
struct MyCollection(Vec<i32>);
impl MyCollection {
fn new() -> MyCollection {
Vec::new())
MyCollection(}
fn add(&mut self, elem: i32) {
self.0.push(elem);
}
}
impl IntoIterator for MyCollection {
type Item = i32;
type IntoIter = std::vec::IntoIter<Self::Item>;
fn into_iter(self) -> Self::IntoIter {
self.0.into_iter() // just delegate to the vector
}
}
let mut c = MyCollection::new();
.add(0);
c.add(1);
c.add(2);
c
// get an iterator and enumerate it
for (i, n) in c.into_iter().enumerate() {
assert_eq!(i as i32, n);
}
Apropos for loops, for x in iterab { ... }
under the hood calls
iterab.into_iter()
then calls next()
until the iterable is empty. Iterators
auto-implement IntoIterator (and simply return themselves on into_iter()
).
Behaviour on calling next()
on an empty
iterator is not specified by Iterator – may just return None again.
Creating Iterators
iter
and iter_mut
Methods
By convention, collection types often implement methods iter()
and iter_mut()
which return iterators that produce
shared or or mutable refs, respectively. This is no fixed spec though,
each type may implement iter()
the way it
makes sense for it.
If there's one than more way to iterate, types should implement
specific methods – e.g. &str
has bytes()
and chars()
for returning bytes and unicode
chars.
IntoIterator Implementations
Collections often will implement several versions of IntoIterator:
for shared refs: return an iterator that produces shared refs to items
for mut refs: return an iterator that produces mut refs to items
for pass-by-value: return an iterator over owned values (consuming the collection when iterated over)
This is what happens when running a for loop in the below variants:
for element in &collection { ... }
for element in &mut collection { ... }
for element in collection { ... }
The underlying assumption is though that an iterator should be efficient and predictable, so collections generally will implement only those variants where this can be implemented. E.g. the mut ref variant isn't implemented on HashSet, as modifying elements might need a rehashing.
Slices implement the ref variants (can't implement owning variant).
There's some overlap between IntoIterator variants for refs and the
iter()
and iter_mut()
methods, this is for ergonomics, and
IntoIterator is useful as a bound on type vars, e.g. for something like
this:
use std::fmt::Debug;
fn dump<T, U>(t: T)
where T: IntoIterator<Item=U>,
: Debug
U{
for u in t {
println!("{:?}", u);
}
}
This specifies that the dump fun must only get an arg if that is an iterable over elements that implement the Debug trait.
from_fn
and successors
The std lib fun std::iter::from_fn
will
create iterators from a passed-in function (closure), where it'll call
the fun for each element. The function or closure must return an Option<T>
.
Example using a move closure:
let mut count = 0;
let counter = std::iter::from_fn(move || {
// Increment our count. This is why we started at zero.
+= 1;
count
// Check to see if we've finished counting or not.
if count < 6 {
Some(count)
} else {
None
}
});
assert_eq!(counter.collect::<Vec<_>>(), &[1, 2, 3, 4, 5]);
The std::iter::successors
function will
also create an iterator. It takes an initial value and a fun which will,
given an value, return the values' successor in the sequence. Ex.:
use std::iter::successors;
let powers_of_10 = successors(Some(1_u16), |n| n.checked_mul(10));
assert_eq!(powers_of_10.collect::<Vec<_>>(), &[1, 10, 100, 1_000, 10_000]);
drain Methods
Many collections also provide drain()
methods which return iterators that drain the collection, i.e. leave it
empty. Compared to inter_iter()
the drain
methods don't take ownership entire collection but only take a mut ref.
Types that can take range indices also can be drained by range:
let mut outer = "Earth".to_string();
let inner = String::from_iter(outer.drain(1..4));
assert_eq!(outer, "Eh");
assert_eq!(inner, "art");
Other Iterator Sources
Type/trait | Expression | Notes |
---|---|---|
std::ops::Range | 1..10 | Range of ints, excl end |
(1..10).step_by(2) |
Every second number | |
std::ops::RangeFrom | 1.. | Unbounded range |
Option<T>, Result<T,E> | .iter() | Iterable of Ok/Some vals |
Vec<T>, &[T] | v.chunks(3) | Chunk elems |
v.chunks_mut(3) |
Mut chunks | |
v.split(predf) | Split by predicate fun | |
v.splitmut(predf) | Split into mut slices | |
v.rsplit(predf) | Right-to-left | |
v.splitn(predf, n) | At most n slices | |
String, &str | s.bytes() | Indiv. bytes |
s.chars() | Unicode chars | |
s.split_whitespace() |
Split string on whitespace | |
s.lines() | Split into lines | |
s.split(f) | Split on chars, strings, closures | |
s.split_inclusive(f) |
Split, also incl. separator | |
s.split_terminator(f) |
When elems are sep + terminated | |
s.split_once(f) |
Split in two | |
s.matches(f) | Slices that match a pattern | |
Maps | m.keys() | Shared refs to keys |
m.values() | Shared refs to values | |
m.values_mut() |
Mut refs to values | |
Sets | s1.union(s2) | Shared refs to set union |
s1.intersection(s2) | Shared refs to set intersection | |
std::io::Read | stream.bytes() | Bytes from i/o stream |
stream.chars() | Unicode chars from i/o stream | |
std::io::BufRead | bufstream.lines() | Lines from bufferd stream |
std::fs::ReadDir | std::fs::read_dir(path) |
Dir entries |
std::net::TcpListener | listener.incoming() | Incoming conns |
Funs from std::iter | empty() | Empty iterator |
once(X) | One entry | |
repeat(X) | Endless iterator |
Iterator Adapters
These are methods of Iterator that process an iterator in some way
map and filter
Similar to the Python built-ins. The map()
applies some fun or closure to an iterator
and returns a new one. The filter()
method
takes a predicate fun and returns an iterator with those elems for which
the predicate returns true. Example: split a text into lines, trim
whitespace from those and exclude iguanas:
let text = " ponies \n giraffes\niguanas \nsquid".to_string();
let v: Vec<&str> = text.lines()
.map(str::trim)
.filter(|s| *s != "iguanas")
.collect();
assert_eq!(v, ["ponies", "giraffes", "squid"]);
Note that maps passes items to the fun by value, whereas filter
predicates get a shared ref. Also note that iterators are lazy, i.e.
with iterator chains like this, nothing gets processed until something
calls next()
somewhere – in this case the
final collect()
call.
filter_map
and flat_map
The filter_map()
works like a
combination filter/map. When the fun passed to filter_map()
returns None, it's dropped from the
resulting iterator.
Example, split some text on whitespace, and try to parse the elems as
float. Every element will result in f64::from_str()
returning a Result, from which
we return floats on ok and None otherwise; and Nones will be droppped by
filter.
use std::str::FromStr;
let text = "1\nfrond .25 289\n3.1415 estuary\n";
for number in text
.split_whitespace()
.filter_map(|w| f64::from_str(w).ok())
{
println!("{:4.2}", number.sqrt());
}
An alternative implementation with map and filter needs two maps:
.split_whitespace()
text.map(|w| f64::from_str(w))
.filter(|r| r.is_ok())
.map(|r| r.unwrap())
The flat_map()
method takes a fun which
takes an elem and can return a sequence – with the individual sequences
then getting flattened into the resulting iterator. Example, merge chars
from a number of strings together:
let words = ["alpha", "beta", "gamma"];
// chars() returns an iterator
let merged: String = words.iter()
.flat_map(|s| s.chars())
.collect();
assert_eq!(merged, "alphabetagamma");
flatten
Speaking of flatten, this is what the flatten()
method does.
One interesting use case:
assert_eq!(vec![None, Some("day"), None, Some("one")]
.into_iter()
.flatten()
.collect::<Vec<_>>(),
vec!["day", "one"]);
The Option values are iterable themselves. A value of None produces
an empty sequence and will not produce a value, while a Some value will
contribute it's value. Also works well with Option<Vec<...>>
values.
Result is iterable as well, where Err vals are empty – applying flatten will effectively throw away all the errors.
take and take_while
These take ownership of an iterator and return parts of it. The take(n)
method takes the first n elements. The
take_while(pred)
method returns elements
until the predicate fun returns false.
skip and skip_while
The opposite of the above take methods: skip either a fixed number of elements, or until a predicate returns true.
peekable
Peeking means looking at the next element of an iterator without
consuming it. Peekable iteratos have a peek()
method for this. An iterator can be
turned into a peekable iterator by calling the peekable()
method on it. The peek method returns
an Option<&Item>
with a shared
ref to the next item if there is one.
fuse
When an Iterator is done it'll return None. Calling next()
on it again is unspecified – many
Iterators just return None again, but the trait doesn't actually
specify. The fuse()
method will enforce
this, i.e. return an Iterator that once it's done always will return
None.
Reversible Iterators and rev
The td::iter::DoubleEndedIterator
trait
extends Iterator. It's for iterators that can be iterated either from
the start or the end. Those implement a next_back()
method that will consume elements
from the end. E.g. slices or ordered collections like BTreeMap
can do this. They can be reversed with
the rev()
method.
inspect
The inspect()
method takes a fun which
will get a shared ref to each elem passing through, a bit like a span
port – useful for debugging.
chain
This method will chain two iterators together:
let v: Vec<i32> = (1..4).chain([20, 30, 40]).collect();
assert_eq!(v, [1, 2, 3, 20, 30, 40]);
enumerate
This adds a running index and returns pairs of index, element.
zip
The zip()
method zips up two iterables
to create a sequence of pairs.
let v: Vec<_> = (0..).zip("ABCD".chars()).collect();
assert_eq!(v, vec![(0, 'A'), (1, 'B'), (2, 'C'), (3, 'D')]);
by_ref
The above methods typically take ownership of the iterator. With
by_ref()
you can borrow an iterator, use
the mut ref, and then continue with the original iterable.
let mut words = ["hello", "world", "of", "Rust"].into_iter();
// Borrow a mut ref, take the first two words.
let hello_world: Vec<_> = words.by_ref().take(2).collect();
assert_eq!(hello_world, vec!["hello", "world"]);
// Collect the rest of the words with the orig. iterable
// We can only do this because we used `by_ref` earlier.
let of_rust: Vec<_> = words.collect();
assert_eq!(of_rust, vec!["of", "Rust"]);
cloned, copied
With cloned()
you get an iterator of
cloned elems; copied()
uses Copy instead
of Clone.
cycle
This endlessly repeats the underlying iterator – which must be Clone-able.
Consuming Iterators
Or, how to spend it
Simple Accumulation: count, sum, product
These methods pretty much do what it says on the tin:
- count()
-
count elements
- sum()
-
add up elements
- product()
- multiply elements
max, min
These work with items that are Ord, i.e. which are comparable. The return an Option (which will be None if the iterator is empty). They don't work with floats as they only implement PartialOrd (due to the NaN problem).
max_by, min_by
Like max()
and min()
but use a user-supplied fun for
comparison. Useful for comparing PartialOrd elems like floats
max_by_key, min_by_key
These take a fun to select an element by which to compare, or perform some computation.
E.g. to select from a list of two-tuple (name, pop) elements the one with the largest pop field:
.iter().max_by_key(|&(_name, pop)| pop) populations
Comparing Item Sequences
Strings, vectors and slices can be compared with <
and ==
operators. Iterators don't overload those operators,
the have methods eq()
and lt()
which compare items from two iterators
pairwise. They will end up early if a decision can be reached before the
end of the iterators.
any and all
Apply a fun to iterator elements; if all/any elements fun evaluation true the methods will be true.
position, rposition, and ExactSizeIterator
- position
-
return index of element for which
f(element)
is true for a given f - rposition
- like position, but start from the right/end side
Note that rposition and similar methods need an iterator that is
reversible and has a known size. – It doesn't make much sense to start
from the end of an open-ended iterator like a stream or so. But also the
str::chars
iterator does not, it
interprets unicode chars and those are var-width.
Iterators which have the std::iter::ExactSizeIterator
trait know their
length:
trait ExactSizeIterator: Iterator {
fn len(&self) -> usize { ... }
fn is_empty(&self) -> bool { ... }
}
fold and rfold
The fold method is like the one in other functional langs: update an accumulator with the result of some fun applied to each elem in turn:
let a = [5, 6, 7, 8, 9, 10];
assert_eq!(a.iter().fold(0, |n, _| n+1), 6); // count
assert_eq!(a.iter().fold(0, |n, i| n+i), 45); // sum
assert_eq!(a.iter().fold(1, |n, i| n*i), 151200); // product
rfold is the same but starts from the right/end side
tryfold and tryrfold
The = tryfold()= methods are similar to fold but can stop
early. The fun/closure you pass to try_fold()
indicates this by returning a Result
Err value, or an Option None, or a std::ops::ControlFlow
nth, nthback
Skip n elements from an iterator and return the next one. Does not
take ownership. The nth_back()
method
works the same, but from the end of the iterator.
last
Return the last element, consuming all before
find, rfind, and findmap
The find()
method takes a fun and
returns the first elem from the iterator for which the fun returns true.
Similarly for the rfind()
method, but
starting from the end.
The find_map()
method expands on that:
here the fun should return an Option, and the method will return the
first of these that's Some.
Building Collections: collect and FromIterator
The collect()
method is implemented for
all stdlib collection types via the FromIterator trait:
trait FromIterator<A>: Sized {
fn from_iter<T: IntoIterator<Item=A>>(iter: T) -> Self;
}
For some collection types this means just adding elements
sequentially; others like Vector use the size_hint()
method from the Iterator trait to
prealloc its memory. Similarly for the hash tables used for HashSet and
HashMap.
The Extend Trait
The std::iter::Extend
trait defines and
extend method that allows to extend an iterable with the elements from
some other iterable. There's some overlap with the FromIterator trait
here, and indeed some types implement one trait with the help of the
other.
partition
This allows to split an iterator with the help of some fun/closure as a predicate into two collections.
foreach and tryforeach
The for_each()
method applies a
fun/closure to each element, similarly to a for loop. Note in contrast
to a map, no new iterator is built
The try_for_each()
method is for
fallible funs/closures, or if you need to exit early. It will stop at
the first error (and returning that error).
Implementing Your Own Iterators
Implementing the IntoIterator and Iterator traits for user defined
types often can be done by just implementing the next()
method; for the other methods there are
default implementations.
Example, a simple range type:
// data
struct I32Range {
: i32,
start: i32
end}
// iterator
impl Iterator for I32Range {
type Item = i32; // type of the elements we're going to produce
fn next(&mut self) -> Option<i32> {
// needs to be mut self b/c we incr. the index
// we either return Some(i32) or None
if self.start >= self.end {
// already exhausted
return None;
}
// get result and incr. index
let result = Some(self.start);
self.start += 1;
result}
}
Coda
This was nice long chapter. There's quite a lot of abstractions available to deal with lists of things. Next will be collection types from the std lib.