When Ferrous Metals Corrode, pt. II

Intro

Second part of my Rust learning notes, this time about Rust's data types. It corresponds to chapter 3 of Programming Rust, 2nd Edition.

Fundamental Types

Coming from a Python-heavy background, Rust's type handling is of course completely different: everything is statically typed, and there's a lot of nuance in the basic data types. Fortunately the compiler does a decent job of inferring types, which avoids much of the verbosity I was unhappy about in Java.

Type overview

  • Integer types: i8, i16, i32, i64, i128, u8, u16, u32, u64, u128

  • Integer literals: 42, -5i8, 0x400u16, 0o644i16, 20_922_789_888_000u64, b'*'

  • Address-width integers: isize, usize

  • Floats: f32, f64

  • Bool: bool

  • Unicode character (32 bits): char

  • Tuple (mixed types allowed!): (char, u8, i32)

  • Unit (empty tuple): ()

  • Named-field struct: struct S { x: f32, y: f32 }

  • Tuple-like struct: struct T(i32, char);

  • Unit-like struct (no fields): struct E;

  • Enum: enum Attend { OnTime, Late(u32) }

  • Box, an owning pointer to a value on the heap: Box<T>

  • Shared / mutable reference: &i32, &mut i32

  • UTF-8 string, dynamically sized: String

  • Reference to str, a non-owning pointer to UTF-8 text: &str

  • Array, fixed length, all elements of the same type: [f64; 4], [u8; 256]

  • Vector, variable length: Vec<u64>

  • Reference to a slice: &[u8], &mut [u8]

  • Optional value, either None or Some(v): Option<&str>

  • Result of an operation that may fail, either Ok(v) or Err(e): Result<u64, Error>

  • Trait object: &dyn Any, &mut dyn Read

  • Function pointer: fn(&str) -> bool

  • Closure: e.g. |a, b| { a*a + b*b }

Numbers

Fixed-Width Numeric Types

For efficiency, Rust provides integers of fixed width from 8 to 128 bits, signed and unsigned, and floats of 32 and 64 bits (see above). isize/usize have "native" width, i.e. the width of a machine address: 32 or 64 bits depending on the platform.

Byte values are represented as u8; this is what you get when you read from a socket or a binary file.

One constraint regarding arrays: indices must be usize (the same holds for vectors and slices).

Integer literals can be suffixed with their type, e.g. 123u32. If the suffix is left off, Rust tries to infer the correct type, falling back to i32 if nothing constrains it. The prefixes 0x, 0o, and 0b mark hexadecimal, octal, and binary literals. For readability, underscores can be inserted anywhere among the digits, e.g. 4_294_967_295.
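A minimal sketch of the literal forms and of usize indexing. The helper `nth` is hypothetical, made up only to show that an index must be a usize.

```rust
/// Hypothetical helper: array/vector/slice indices must be `usize`.
fn nth(values: &[i64], i: usize) -> i64 {
    values[i]
}

fn main() {
    // A suffix fixes the type; without one, inference decides (fallback: i32).
    let a = 123u32;
    let b = 42;
    // Hex, octal and binary prefixes; underscores are only for readability.
    assert_eq!(0x400u16, 1024);
    assert_eq!(0o644i16, 420);
    assert_eq!(0b1010_1010u8, 170);
    assert_eq!(20_922_789_888_000u64, 20922789888000);
    // usize is as wide as a pointer on the target platform.
    assert_eq!(std::mem::size_of::<usize>(), std::mem::size_of::<*const u8>());
    let i: i32 = 2;
    // nth(&[10, 20, 30], i) would not compile: i is an i32, not a usize.
    assert_eq!(nth(&[10, 20, 30], i as usize), 30);
    println!("{} {}", a, b);
}
```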

char values are not numeric. For bytes, however, there are byte literals: b'A' and 65u8 are equivalent. Byte literals use backslash escapes for special characters, e.g. b'\'' for a single quote and b'\\' for a backslash. Any byte can be written in hex as b'\xHH', e.g. the ASCII escape character is b'\x1b'.
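A small sketch showing that byte literals are plain u8 values, including the escaped forms. The helper `is_digit_byte` is hypothetical; the standard library already offers `u8::is_ascii_digit` for real code.

```rust
/// Hypothetical helper: true if `b` is an ASCII decimal digit, using byte literals.
fn is_digit_byte(b: u8) -> bool {
    (b'0'..=b'9').contains(&b)
}

fn main() {
    assert_eq!(b'A', 65u8);    // a byte literal is just a u8
    assert_eq!(b'\'', 39u8);   // escaped single quote
    assert_eq!(b'\\', 92u8);   // escaped backslash
    assert_eq!(b'\x1b', 27u8); // ASCII escape written in hex
    assert!(is_digit_byte(b'7'));
    assert!(!is_digit_byte(b'x'));
}
```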

There are quite a few integer methods in the standard library, e.g. 2u16.pow(4) or (-4i32).abs(). Note the parentheses around the negative integer: method calls bind more tightly than unary minus. Also note that while Rust usually infers types, here the type can be ambiguous (as in (-4).abs() without a suffix); Rust refuses to guess and reports a compile error.
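The precedence point is easy to check. In this sketch (the helper `abs_of_negated` is hypothetical), leaving out the parentheses silently changes the result rather than failing to compile.

```rust
/// Hypothetical helper using the parenthesised form.
fn abs_of_negated(n: i32) -> i32 {
    (-n).abs()
}

fn main() {
    assert_eq!(2u16.pow(4), 16);
    assert_eq!((-4i32).abs(), 4);
    // Without parentheses the method call binds first: -(4i32.abs()) == -4.
    assert_eq!(-4i32.abs(), -4);
    // `(-4).abs()` with no suffix is rejected: the integer type is ambiguous.
    assert_eq!(abs_of_negated(7), 7);
}
```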

Integer overflow behavior depends on the build type: in a debug build Rust panics at runtime, in a release build the value silently wraps around. When you need a specific behavior, there are method versions of the operators that check (returning None on overflow), wrap, saturate (i.e. clamp to the nearest representable value), or return the wrapped result together with an overflow flag. They have the prefixes checked_, wrapping_, saturating_, and overflowing_, e.g. checked_add().
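A sketch comparing the four families on the same u8 addition; `add_all_ways` is a hypothetical name bundling the results into one tuple.

```rust
/// Adds two u8 values with each of the four explicit overflow strategies.
fn add_all_ways(a: u8, b: u8) -> (Option<u8>, u8, u8, (u8, bool)) {
    (
        a.checked_add(b),     // None on overflow
        a.wrapping_add(b),    // modulo 2^8
        a.saturating_add(b),  // clamps to u8::MAX
        a.overflowing_add(b), // (wrapped result, overflow flag)
    )
}

fn main() {
    // A plain `a + b` with these values would panic in a debug build
    // and wrap to 4 in a release build (with default overflow settings).
    assert_eq!(add_all_ways(250, 10), (None, 4, 255, (4, true)));
    assert_eq!(add_all_ways(1, 2), (Some(3), 3, 3, (3, false)));
}
```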

Floating-Point Types

Rust's f32 and f64 correspond to the float and double types in C. If a floating-point literal could be either type, Rust defaults to f64. Inference never crosses between integers and floats, though: an integer literal like 1 is never inferred as a float.

The std::f32::consts and std::f64::consts modules provide widely used constants such as E and PI. Note that besides the primitive types f32/f64 there are also modules of the same name (std::f32 / std::f64).
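A short sketch of the f64 default, the explicit int-to-float conversion, and the per-type constant modules. `circle_area` is a hypothetical helper.

```rust
use std::f64::consts::PI;

/// Hypothetical helper: `r` is an f64, so PI must be the f64 constant too.
fn circle_area(r: f64) -> f64 {
    PI * r * r
}

fn main() {
    let x = 2.5; // float literal: f64 unless something says otherwise
    let n = 3;   // integer literal: never inferred as a float
    // let bad = x * n; // would not compile: f64 * i32
    let scaled = x * n as f64; // explicit conversion with `as`
    assert_eq!(scaled, 7.5);
    // The f32 module has its own, less precise constants.
    let pi32: f32 = std::f32::consts::PI;
    assert!((pi32 as f64 - PI).abs() < 1e-6);
    println!("{}", circle_area(1.0));
}
```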

Bool

Comparison operators produce bool values, and conditions in control structures must be bool: you have to write if x != 0 { ... }; the Pythonesque if x { ... } is not allowed for a non-bool x.

The as operator converts bool to integers (false as i32 == 0, true as i32 == 1), but not the other way around: x as bool doesn't compile, so write x != 0 instead.
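A sketch of both rules: conditions must be bool, and bool converts to integers but not back. `count_nonzero` is a hypothetical helper that relies on the bool-to-int conversion.

```rust
/// Hypothetical helper: counts non-zero entries by converting bools to integers.
fn count_nonzero(values: &[i32]) -> i32 {
    values.iter().map(|&x| (x != 0) as i32).sum()
}

fn main() {
    let x = 5;
    // `if x { ... }` would not compile: x is an i32, not a bool.
    if x != 0 {
        println!("x is non-zero");
    }
    assert_eq!(true as i32, 1);
    assert_eq!(false as u8, 0);
    // `1 as bool` is rejected; compare instead.
    let b: bool = 1 != 0;
    assert!(b);
    assert_eq!(count_nonzero(&[0, 3, 0, 7]), 2);
}
```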

Characters

A char represents a single Unicode character as a 32-bit value. Strings, on the other hand, are encoded as UTF-8, so a String is not a vector of chars.

Char literals are written in single quotes, e.g. '*'. Characters up to U+007F can also be written in hex as '\xHH', and any Unicode character as '\u{HHHHHH}' (up to six hex digits).
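A sketch of the literal notations and of the char-vs-UTF-8 distinction. `byte_and_char_len` is a hypothetical helper; "héllo" has five chars but six bytes, because é takes two bytes in UTF-8.

```rust
/// Hypothetical helper: (UTF-8 byte length, number of chars) of a string.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    assert_eq!('*', '\x2A');                     // \xHH: only up to U+007F
    assert_eq!('ಠ', '\u{CA0}');                  // \u{...}: any code point
    assert_eq!(std::mem::size_of::<char>(), 4);  // always 32 bits
    assert_eq!('A' as u32, 65);                  // char to integer via `as`
    assert_eq!(char::from_u32(0x41), Some('A')); // u32 to char is checked
    assert_eq!(byte_and_char_len("héllo"), (6, 5));
}
```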

Tuples

Tuple elements don't have to be of the same type. They can only be indexed with a constant, e.g. t.2; a variable index like t.i is not possible.

Functions often return tuples; use pattern-matching syntax to unpack them:

let (head, tail) = text.split_at(21);

The unit type () (the empty tuple) is used where context requires a type but there's no meaningful value to carry. E.g. a function that doesn't return a value has return type (), and calling it evaluates to ().

By the way, trailing commas are allowed wherever commas are used: in tuples, function parameters, arrays, etc.
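A sketch pulling the tuple points together. `halves` and `log` are hypothetical helpers; like `str::split_at`, `halves` panics if the offset is not on a char boundary. Note that a one-element tuple actually needs its trailing comma.

```rust
/// Hypothetical helper: splits `text` at byte offset `mid`, returning a tuple.
fn halves(text: &str, mid: usize) -> (&str, &str) {
    let (head, tail) = text.split_at(mid); // unpack by pattern matching
    (head, tail)
}

/// Hypothetical helper without a return value: its return type is ().
fn log(msg: &str) {
    println!("{}", msg);
}

fn main() {
    let t = ('x', 7u8, -3i32,); // mixed types; trailing comma allowed
    assert_eq!(t.2, -3);        // the index must be a constant
    let (head, tail) = halves("hello world", 5);
    assert_eq!((head, tail), ("hello", " world"));
    let single = (42,);         // one-element tuple: the comma is required
    assert_eq!(single.0, 42);
    log("done");
}
```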

Pointer Types

References

This is something I really dig about Rust – how it explicitly exposes pointers in a safe way. This results in C-like control, while still preventing the kind of catastrophic faults that plague C/C++. And separating this into shared (r/o) and exclusive (r/w) helps muchly with safety, also in the face of concurrency.

&T

a shared read-only reference

&mut T

exclusive read-write reference
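A sketch of the two reference kinds; `peek` and `bump` are hypothetical helpers. Many shared references can coexist, but a mutable one is only allowed once no shared reference is still in use.

```rust
/// Hypothetical helper: reads through a shared reference.
fn peek(r: &i32) -> i32 {
    *r
}

/// Hypothetical helper: writes through an exclusive reference.
fn bump(r: &mut i32) {
    *r += 1;
}

fn main() {
    let mut x = 10;
    let a = &x;
    let b = &x; // any number of shared references may coexist
    assert_eq!(peek(a) + peek(b), 20);
    // a and b are not used past this point, so the exclusive borrow is allowed.
    bump(&mut x);
    assert_eq!(x, 11);
    // Using `a` again after bump(&mut x) would be a compile error.
}
```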

Box

Box::new allocates a value on the heap and returns a Box that owns it; the memory is freed when the Box goes out of scope: let b = Box::new(somevalue);
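A minimal sketch of boxing a tuple; `boxed` is a hypothetical helper. Field access goes through the Box automatically, while an explicit `*` moves the value back out.

```rust
/// Hypothetical helper: moves a tuple onto the heap; the returned Box owns it.
fn boxed(t: (i32, &'static str)) -> Box<(i32, &'static str)> {
    Box::new(t)
}

fn main() {
    let b = boxed((12, "eggs"));
    assert_eq!(b.0, 12);      // field access auto-derefs through the Box
    let (n, s) = *b;          // explicit deref moves the value out
    assert_eq!((n, s), (12, "eggs"));
} // heap memory owned by a Box is released when the Box goes out of scope
```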

Raw Pointers

Just like C pointers. Creating them is safe, but they may only be dereferenced in an unsafe block. So this is where the foot-shooting area begins.

*const T

read-only

*mut T

read-write
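A sketch showing where the unsafe boundary lies; `raw_demo` is a hypothetical helper. Both pointers are derived from a live local variable, which is what makes the dereferences sound here.

```rust
/// Hypothetical helper: writes and reads through raw pointers to a local.
fn raw_demo() -> i32 {
    let mut x = 10;
    let p: *mut i32 = &mut x; // creating a raw pointer is safe
    let q: *const i32 = p;    // *mut T coerces to *const T
    assert!(!p.is_null());
    unsafe {
        // Dereferencing requires unsafe; we know p and q point to the live `x`.
        *p += 5;
        *q
    }
}

fn main() {
    assert_eq!(raw_demo(), 15);
}
```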

Arrays, Vectors, and Slices
[T; N]

Array of N values, each of type T. Arrays have a constant size, and the size is part of the type. E.g. let x = [true; 10000] is an array of 10000 bools, all set to true; let buf = [0u8; 1024] is a 1 KiB buffer, zero-initialized.

Vec<T>

A vector of Ts, dynamically allocated on the heap and resizable, like a Python list (except that all elements have the same type).

&[T], &mut [T]

A shared slice of Ts and a mutable slice of Ts. These can refer to (parts of) arrays or vectors and consist of a pointer to the first element plus the number of elements. Shared vs. mutable works as for other references.

For generality, use slice references in function signatures when either an array or a vector would work. Rust implicitly converts a reference to an array or vector into a slice reference, so many slice methods are available on arrays and vectors as well.
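A sketch of the slice-parameter advice: the hypothetical helpers `sum` and `double_all` accept arrays, vectors and sub-slices alike, because references to arrays and vectors convert to slice references.

```rust
/// Hypothetical helper: sums any slice of f64.
fn sum(values: &[f64]) -> f64 {
    values.iter().sum()
}

/// Hypothetical helper: doubles every element in place through a mutable slice.
fn double_all(values: &mut [f64]) {
    for v in values.iter_mut() {
        *v *= 2.0;
    }
}

fn main() {
    let arr = [1.0, 2.0, 3.0];       // [f64; 3]
    let mut v = vec![4.0, 5.0];      // Vec<f64>
    assert_eq!(sum(&arr), 6.0);      // &[f64; 3] converts to &[f64]
    assert_eq!(sum(&v), 9.0);        // &Vec<f64> converts to &[f64]
    assert_eq!(sum(&arr[1..]), 5.0); // a sub-slice
    double_all(&mut v);
    assert_eq!(v, [8.0, 10.0]);
    let buf = [0u8; 1024];           // 1 KiB zero-initialized buffer
    assert_eq!(buf.len(), 1024);     // slice method called on an array
}
```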

For vectors there is the vec![] macro. Some examples:

  • Creating from a literal: let mut nums = vec![1,2,3,4]

  • Adding a new elem: nums.push(123)

  • Collecting items from an iterator into a vector: let v: Vec<i32> = (0..5).collect()

Similar to arrays, vectors also can use methods defined on slices, e.g.:

  • Length: nums.len()

  • Sort: nums.sort()

  • Reverse: nums.reverse()

A new empty vector is created with Vec::new(). If you have a good guess how many elements it will hold, Vec::with_capacity(n) pre-allocates that capacity, which avoids the reallocation (and copying) of the vector's elements that would otherwise happen as it grows.
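A sketch of the creation and slice-method examples above, plus pre-allocation. `squares` is a hypothetical helper; the capacity check uses `>=` because the allocator may reserve more than requested.

```rust
/// Hypothetical helper: builds n squares, pre-allocating the known final length.
fn squares(n: usize) -> Vec<u64> {
    let mut out = Vec::with_capacity(n); // one allocation up front
    for i in 0..n as u64 {
        out.push(i * i);
    }
    out
}

fn main() {
    let mut nums = vec![3, 1, 4, 1];
    nums.push(5);
    assert_eq!(nums.len(), 5);
    nums.sort();
    assert_eq!(nums, [1, 1, 3, 4, 5]);
    nums.reverse();
    assert_eq!(nums, [5, 4, 3, 1, 1]);
    let v: Vec<i32> = (0..5).collect();
    assert_eq!(v, [0, 1, 2, 3, 4]);
    let sq = squares(4);
    assert!(sq.capacity() >= 4);
    assert_eq!(sq, [0, 1, 4, 9]);
}
```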

Some vector methods:

  • Insert: ve.insert(pos, val)

  • Remove: ve.remove(pos)

  • Remove and return last: ve.pop()

Iterating over a vector:

let nums = vec![1, 2, 3, 4];
// `for x in nums` consumes (moves) the vector; iterate over &nums to keep using it afterwards
for x in nums {
    println!("=> {}", x);
}

Slices are regions of arrays or vectors. Since a slice can be of any length, it can't be stored directly in a variable; only references to slices can.

String Literals

Standard string literals are double-quoted, can span several lines, and allow the usual backslash escapes. If escapes are not wanted, there are raw strings: r"some \raw text". To include double quotes, add hash signs: r###"Look ma -- a " quote"###. Raw strings don't recognize escapes.
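A sketch of multi-line and raw literals; `two_lines` and `continued` are hypothetical helpers. A literal spanning lines keeps the newline, while a trailing backslash drops it along with the next line's leading whitespace.

```rust
/// Hypothetical helper: a literal spanning two lines keeps the newline.
fn two_lines() -> &'static str {
    "one
two"
}

/// Hypothetical helper: a trailing backslash skips the newline and the indentation.
fn continued() -> &'static str {
    "one \
     two"
}

fn main() {
    assert_eq!(two_lines(), "one\ntwo");
    assert_eq!(continued(), "one two");
    let raw = r"C:\temp\new";   // no escapes: \t and \n stay as two chars each
    assert_eq!(raw.len(), 11);
    let quoted = r#"say "hi""#; // hashes allow " inside a raw string
    assert_eq!(quoted, "say \"hi\"");
}
```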

Byte Strings

These are sequences of u8 values rather than Unicode text, e.g. b"GET" has type &[u8; 3]. There's also a raw variant: br"POST".
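A sketch of byte strings in use; `is_get` is a hypothetical helper that checks the start of a raw request buffer.

```rust
/// Hypothetical helper: does a raw request buffer start with "GET "?
fn is_get(request: &[u8]) -> bool {
    request.starts_with(b"GET ")
}

fn main() {
    let method: &[u8; 3] = b"GET";
    assert_eq!(method, &[b'G', b'E', b'T']);
    let raw = br"C:\x"; // raw byte string: no escape processing
    assert_eq!(raw.len(), 4);
    assert!(is_get(b"GET /index.html HTTP/1.1"));
}
```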

Using Strings

A &str is similar to a &[T]: a fat pointer to some data (which can come from a string literal or a String). A String, on the other hand, resembles a Vec<u8> that is guaranteed to hold valid UTF-8: it lives on the heap, is resizable, etc.

A String can be created from a &str with the .to_string() method, which copies the text. The format!() macro builds a new String, interpolating its arguments the same way println!() does.

Slices of strings (and thus arrays and vectors of strings) have the methods .concat() and .join(), which produce a single String from several strings. Strings have the assortment of methods you'd expect for case conversion, finding and replacing, trimming, etc. Strings can also be compared with == and != as well as ordered with <, <=, etc. (lexicographically).
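A sketch of these String operations; `describe` is a hypothetical helper wrapping format!().

```rust
/// Hypothetical helper: format! returns a new String.
fn describe(name: &str, age: u32) -> String {
    format!("{} is {} years old", name, age)
}

fn main() {
    let owned: String = "hello".to_string(); // copies the &str into a heap String
    let borrowed: &str = &owned;             // &String converts to &str
    assert_eq!(borrowed.len(), 5);
    let parts = ["veni", "vidi", "vici"];
    assert_eq!(parts.concat(), "venividivici");
    assert_eq!(parts.join(", "), "veni, vidi, vici");
    assert_eq!("  Trim me ".trim().to_uppercase(), "TRIM ME");
    assert_eq!("ferrous".replace("fe", "Fe"), "Ferrous");
    assert!("apple" < "banana"); // lexicographic comparison
    assert_eq!(describe("Ferris", 7), "Ferris is 7 years old");
}
```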

Strings in Rust are always valid UTF-8. For data that isn't (or might not be):

  • use Vec<u8> for arbitrary binary data

  • use std::path::PathBuf for filesystem paths

  • use std::ffi::OsString for things like command-line arguments or environment variables

  • and finally std::ffi::CString for null-terminated strings passed to or from C
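A sketch touching each of these types; `file_name_of` is a hypothetical helper. It reads back the file name rather than the full path, since path separators differ between platforms.

```rust
use std::ffi::{CString, OsStr, OsString};
use std::path::PathBuf;

/// Hypothetical helper: builds a path and returns its final component.
fn file_name_of(dir: &str, file: &str) -> Option<String> {
    let mut p = PathBuf::from(dir);
    p.push(file);
    p.file_name().map(|n| n.to_string_lossy().into_owned())
}

fn main() {
    // Arbitrary bytes are not necessarily valid UTF-8.
    let bytes: Vec<u8> = vec![0x66, 0x6f, 0xff];
    assert!(String::from_utf8(bytes).is_err());
    assert_eq!(file_name_of("/tmp", "notes.txt").as_deref(), Some("notes.txt"));
    // OsString is, e.g., what std::env::args_os() yields.
    let arg = OsString::from("--verbose");
    assert_eq!(arg.to_str(), Some("--verbose"));
    assert_eq!(arg.as_os_str(), OsStr::new("--verbose"));
    // CString appends the terminating NUL and rejects interior NULs.
    let c = CString::new("hi").unwrap();
    assert_eq!(c.as_bytes_with_nul(), b"hi\0");
    assert!(CString::new("a\0b").is_err());
}
```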

Type Aliases

Similar to C's typedef: type Bytes = Vec<u8>;
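A sketch showing that an alias is only a new name, not a new type: a plain Vec<u8> can be passed where Bytes is expected. `checksum` is a hypothetical helper (a simple wrapping byte sum).

```rust
/// Type alias: Bytes is just another name for Vec<u8>.
type Bytes = Vec<u8>;

/// Hypothetical helper: wrapping sum of all bytes.
fn checksum(data: &Bytes) -> u8 {
    data.iter().fold(0u8, |acc, &b| acc.wrapping_add(b))
}

fn main() {
    let v: Vec<u8> = vec![200, 100, 1];
    assert_eq!(checksum(&v), 45); // (200 + 100 + 1) mod 256
}
```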

Coda

This concludes the basic Rust types. There's a rich selection of data types, in line with Rust's efficiency mission. The most important and interesting bits to me were the strict division between immutable and mutable, and the way Rust handles references, again with the distinction between shared-and-read-only and mutable-but-exclusive. This is gold for safety, especially in concurrent code.