When Ferrous Metals Corrode, pt. XVII

Intro

This post summarizes chapter 18, "Input and Output".

Rust I/O is organized around three basic traits: Read, BufRead, and Write. Read does byte-oriented input, BufRead adds buffered reads (lines of text and similar), and Write does output.

Example Read types are Stdin, File, and TcpStream; example BufRead types are Cursor and StdinLock. Examples for Write: Stdout, Stderr, File, TcpStream, and BufWriter.

Readers and Writers

An instructive example: the stdlib copy fun:

// import Read Write etc, also io
use std::io::{self, Read, Write, ErrorKind};

const DEFAULT_BUF_SIZE: usize = 8 * 1024;


pub fn copy<R: ?Sized, W: ?Sized>(reader: &mut R, writer: &mut W)
    // generic over R, W, both questionably sized, and
    // we want both to be mut ref
    -> io::Result<u64> // returning a result incl. bytes copied
    where R: Read, W: Write
    // R, W must fulfil the Read and Write contracts
{
    let mut buf = [0; DEFAULT_BUF_SIZE];
    let mut written = 0;
    loop { // loop forever

        // read into buf, remember number read
        let len = match reader.read(&mut buf) {
            // Success, nothing to read -> return Ok
            Ok(0) => return Ok(written),
            // Success, read some bytes
            Ok(len) => len,
            // match on error interrupted: just try again
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            // Some other error -> return Err
            Err(e) => return Err(e),
        };
        // Write what we read, return any errors
        writer.write_all(&buf[..len])?;
        // keep a running total of bytes written
        written += len as u64;
    }
}    

Since this is generic it can be used to copy data between all kinds of readers and writers.
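To make the genericity concrete, here is a minimal sketch that pushes in-memory data through the real std::io::copy. The helper name copy_to_vec is made up for illustration; any Read works as the source (a byte slice, a Cursor, io::empty()), and a Vec<u8> serves as the writer.

```rust
use std::io::{self, Read};

// Hypothetical helper: drain any reader into a fresh Vec<u8> via io::copy.
fn copy_to_vec<R: Read>(mut reader: R) -> io::Result<Vec<u8>> {
    let mut out = Vec::new(); // Vec<u8> implements Write
    let copied = io::copy(&mut reader, &mut out)?;
    // io::copy reports how many bytes went through
    debug_assert_eq!(copied as usize, out.len());
    Ok(out)
}
```

Calling copy_to_vec(&b"hello"[..]) and copy_to_vec(io::Cursor::new(vec![1, 2, 3])) both go through the same code path, which is the point of the generic signature.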

Readers

Readers have these methods:

reader.read(&mut buffer)

read some bytes into buffer (which must be a [u8]). This returns io::Result<usize>, which is an alias for Result<usize, io::Error>, i.e. the module-specific Result type for io. The io::Error type has a .kind() method that tells you the type of error. As seen above, the Interrupted error (i.e. EINTR) should be handled specially: the read was merely interrupted by a signal, so typically the operation should be retried. This is a pretty low-level method; there are some convenience wrappers around it in the stdlib too

reader.read_to_end(&mut byte_vec)

reads all remaining data into a byte vector

reader.read_to_string(&mut string)

as above, but append to String; input must be valid utf8

reader.read_exact(&mut buf)

read exactly enough data to fill the buf; errors out if there's not enough

reader.bytes()

makes an iterator over input bytes, returning a Result for every byte

reader.chain(reader2)

chain two readers, create a new reader

reader.take(n)

create a new reader which reads at most n bytes

Note readers don't have a close method; typically they implement Drop which auto-closes
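A small sketch of how a few of these methods combine, using byte slices as readers. The function names (read_tag_and_text, concat) and the idea of a 4-byte tag are invented for the example; only the Read methods themselves come from the stdlib.

```rust
use std::io::{self, Read};

// Hypothetical helper: read a 4-byte tag with read_exact, then at most
// `limit` bytes of UTF-8 text with take + read_to_string.
fn read_tag_and_text<R: Read>(mut reader: R, limit: u64) -> io::Result<([u8; 4], String)> {
    let mut tag = [0u8; 4];
    reader.read_exact(&mut tag)?; // UnexpectedEof if fewer than 4 bytes remain
    let mut text = String::new();
    reader.take(limit).read_to_string(&mut text)?; // stops after `limit` bytes
    Ok((tag, text))
}

// Hypothetical helper: glue two readers together and slurp the result.
fn concat<A: Read, B: Read>(a: A, b: B) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    a.chain(b).read_to_end(&mut out)?;
    Ok(out)
}
```

Note that take counts bytes, not characters, so a limit that cuts a multi-byte UTF-8 sequence in half would make read_to_string fail.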

Buffered Readers

Buffered readers manage a buffer of some kilobytes which they use to serve data out of. They implement the Read and BufRead traits:

reader.read_line(&mut line)

read a line of text and append it to the "line" String buffer. Newlines and CRs are not stripped.

reader.lines()

return an iterator over input lines, with Result elements. Newlines and CR are stripped. This is the typical way to read text.

reader.read_until(stop_byte, &mut byte_vec), reader.split(stop_byte)

similar to the above, but read bytes instead of char, and configure a separator byte
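The difference in terminator handling is easy to check on in-memory input. This is a sketch with made-up helper names (first_line_raw, split_fields); a Cursor or a plain byte slice works as the BufRead.

```rust
use std::io::{self, BufRead};

// Hypothetical helper: read_line keeps the line terminator in the String.
fn first_line_raw<R: BufRead>(mut reader: R) -> io::Result<String> {
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(line)
}

// Hypothetical helper: split yields the byte chunks between separators,
// with the separator byte itself removed.
fn split_fields<R: BufRead>(reader: R, sep: u8) -> io::Result<Vec<Vec<u8>>> {
    reader.split(sep).collect()
}
```

For the input "a\r\nb\n", first_line_raw hands back "a\r\n" with the terminator intact, whereas .lines() would yield "a" and "b".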

Reading Lines

Example, a fgrep utility:

use std::io;
use std::io::prelude::*;

// search for target string in a reader
// just return found/not found
fn grep<R>(target: &str, reader: R) -> io::Result<()>
    // we require the BufRead trait
    where R: BufRead
{
    // iterate over the input lines, printing any matches
    for line_result in reader.lines() {
        let line = line_result?;
        if line.contains(target) {
            println!("{}", line);
        }
    }
    Ok(())
}

// usage

// read from stdin. Stdin needs to be locked first
// which will return a BufRead 
let stdin = io::stdin();
grep(&target, stdin.lock())?;  // ok

// or, open some file, and construct a BufRead from 
// the File (File only implements Read, not BufRead)
let f = File::open(file)?;
grep(&target, BufReader::new(f))?;  // also ok

Collecting Lines

If you collect lines from a reader, each one arrives wrapped in a Result; the snippets below show ways to get rid of the wrapping.

// we're getting a Vector of Results 
let results: Vec<io::Result<String>> = reader.lines().collect();

// error: can't convert collection of Results to Vec<String>
let lines: Vec<String> = reader.lines().collect();

// works but not very elegant:
let mut lines = vec![];
for line_result in reader.lines() {
    lines.push(line_result?);
}

// to get a Result with a string vector ask for it explicitly
let lines = reader.lines().collect::<io::Result<Vec<String>>>()?;    

This uses the FromIterator impl for Result, which builds a collection from the Ok values but stops at the first error and returns that instead.
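The stop-on-error behaviour can be seen with in-memory input. A minimal sketch, assuming a helper called collect_lines (a name made up here) and using invalid UTF-8 to provoke an error from .lines():

```rust
use std::io::{self, BufRead};

// Hypothetical helper: all lines, or the first error encountered.
fn collect_lines<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    // The return type tells collect() to build Result<Vec<_>, _>
    reader.lines().collect()
}
```

With well-formed input this returns Ok with the line terminators stripped; if a line contains bytes that are not valid UTF-8, the whole call returns an Err of kind InvalidData and the lines collected so far are discarded.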

Writers

Write to Writers and format via the writeln! or write! macros:

writeln!(io::stderr(), "error: world not helloable")?;

Those return Results (as opposed to println!, which panics if the write fails).

Write trait methods:

writer.write(&buf)

low-level writing of some bytes, seldom used directly

writer.write_all(&buf)

write all bytes

writer.flush()

flushes buffers

There's a BufWriter type too:

let file = File::create("tmp.txt")?;
let writer = BufWriter::new(file);

Dropping a BufWriter flushes its buffer, but any error during that flush is ignored. Better to flush explicitly.
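A sketch of the explicit-flush pattern, generic over the underlying writer so that it can be tried against a Vec<u8> instead of a real file. The function name write_report and the row format are assumptions made for the example.

```rust
use std::io::{self, BufWriter, Write};

// Hypothetical helper: write "name: count" lines through a BufWriter.
fn write_report<W: Write>(out: W, rows: &[(&str, u32)]) -> io::Result<()> {
    let mut writer = BufWriter::new(out);
    for (name, count) in rows {
        writeln!(writer, "{}: {}", name, count)?;
    }
    // Flush explicitly so a failing write surfaces as an Err here
    // rather than being swallowed when the BufWriter is dropped.
    writer.flush()
}
```

Passing &mut some_vec works because &mut W implements Write whenever W does; passing a File works the same way.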

Files

Opening files, from the std::fs module:

File::open(filename)

open existing file for reading

File::create(filename)

create a new file for writing, truncate existing files

The OpenOptions type allows more control on how to open a file:

use std::fs::OpenOptions;

let log = OpenOptions::new()
    .append(true)  // if file exists, add to the end
    .open("server.log")?;

let file = OpenOptions::new()
    .write(true)
    .create_new(true)  // fail if file exists
    .open("new_file.txt")?;

Chaining these methods is a pattern called a "builder" in Rust.
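As a rough sketch of the builder in use, here is an append-only logger helper. The name append_line is invented, and the .create(true) call is an addition of this example: with .append(true) alone, opening a file that does not exist yet fails.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

// Hypothetical helper: append one line, creating the file if needed.
fn append_line(path: &Path, line: &str) -> io::Result<()> {
    let mut log = OpenOptions::new()
        .append(true) // every write goes to the end of the file
        .create(true) // assumption of this sketch: create if missing
        .open(path)?;
    writeln!(log, "{}", line)
}
```

The File is closed when log goes out of scope at the end of the function.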

Seeking

Files also implement Seek:

pub trait Seek {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64>;
}

pub enum SeekFrom {
    Start(u64),
    End(i64),
    Current(i64)
}

// usage

file.seek(SeekFrom::Start(0))?;    // rewind to the beginning
file.seek(SeekFrom::Current(-8))?; // go back a few bytes
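Seeking is not limited to files; anything that implements Seek will do. The following sketch (read_tail is a made-up name) reads the last n bytes of a seekable source and can be exercised with a Cursor over a Vec<u8>.

```rust
use std::io::{self, Read, Seek, SeekFrom};

// Hypothetical helper: return the last `n` bytes (or everything, if shorter).
fn read_tail<S: Read + Seek>(source: &mut S, n: u64) -> io::Result<Vec<u8>> {
    // Seeking to End(0) returns the new position, i.e. the total length
    let len = source.seek(SeekFrom::End(0))?;
    let start = len.saturating_sub(n); // clamp so we never seek before 0
    source.seek(SeekFrom::Start(start))?;
    let mut tail = Vec::new();
    source.read_to_end(&mut tail)?;
    Ok(tail)
}
```

Computing an absolute start position avoids SeekFrom::End with a negative offset larger than the source, which would be an error.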

Other Reader and Writer Types

io::stdin()

reader for stdin, protected by a mutex:

let stdin = io::stdin();
let lines = stdin.lock().lines();

io::stdout(), io::stderr()

stdout and stderr writers, also mutex-protected

Vec<u8>

in-memory writer: Vec<u8> implements Write, and writes append to the vector

Cursor::new(buf)

creates a buffered reader which reads from buf, which can be anything that implements AsRef<[u8]>. Cursors implement Read, BufRead, and Seek; they also implement Write if buf is a Vec<u8> or &mut [u8]

std::net::TcpStream

a tcp connection, both Read and Write. Create with TcpStream::connect(("hostname", PORT))

std::process::Command

spawn a child process, pipe data to its stdin

use std::process::{Command, Stdio};

// another builder, chaining to add args
let mut child =
    Command::new("grep")
    .arg("-e")
    .arg("a.*e.*i.*o.*u")
    .stdin(Stdio::piped())  // connect stdin to pipe
    .spawn()?;

// take value out of option
let mut to_child = child.stdin.take().unwrap();
for word in my_words {
    // feed words to child
    writeln!(to_child, "{}", word)?;
}
drop(to_child);  // close grep's stdin, so it will exit
child.wait()?;

The .stdin(Stdio::piped()) bit ensures the process has a pipe attached to its stdin. As you'd expect, you can also pipe the child's output via the methods .stdout() and .stderr(), which gives you readers for them.

Some generic readers and writers which can be created via std::io

io::sink()

null writer, just discards input

io::empty()

null reader, reading succeeds but returns eof

io::repeat(byte)

reads, returns 'byte' forever
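These look like toys but compose nicely with the adapters from earlier. A sketch with two invented helpers: padding builds n copies of a byte from io::repeat, and count_bytes measures a reader by copying it into io::sink().

```rust
use std::io::{self, Read};

// Hypothetical helper: `n` copies of `byte`, via repeat + take.
fn padding(byte: u8, n: u64) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    // Without take(n), read_to_end on repeat() would never finish
    io::repeat(byte).take(n).read_to_end(&mut out)?;
    Ok(out)
}

// Hypothetical helper: count the bytes in a reader, discarding the data.
fn count_bytes<R: Read>(mut reader: R) -> io::Result<u64> {
    io::copy(&mut reader, &mut io::sink())
}
```

count_bytes(io::empty()) is a handy degenerate case: the read succeeds and reports end of input right away.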

Binary Data, Compression, and Serialization

Some external crates that add functionality for reading and writing.

byteorder

Traits for reading/writing binary data

flate2

gzip compression

serde

serialization, e.g. json

A bit of a closer look at serde.

Example serialization, serialize a hashmap to stdout:

let mut mydata = HashMap::new();
// ...add some data
serde_json::to_writer(&mut std::io::stdout(), &mydata)?;

The serde crate attaches the serde::Serialize trait to all types it knows about (which includes hashmaps).

Adding serde support to custom structs via derive:

#[derive(Serialize, Deserialize)]
struct Player { ... }

The derive must be requested specifically in Cargo.toml:

[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"

Files and Directories

Support for filesystems can be found in the std::path and std::fs modules.

OsStr and Path

Rust's str type is strict about Unicode, unlike operating system path handling functions. That's where the std::ffi::OsStr and OsString types come in. Those can hold a superset of utf8. OsString is to OsStr as String is to str. Lastly there is std::path::Path for filenames (and correspondingly the type PathBuf that represents the actual heap-alloc owned value).

These all implement AsRef<Path> so we can make a generic fun that accepts "any filename-like thingy", e.g. like this:

fn swizzle_file<P>(path_arg: P) -> io::Result<()>
   where P: AsRef<Path>
{
    let path = path_arg.as_ref();
    ...
}
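A filled-in variant of the same pattern, as a sketch: has_extension is a made-up helper, but the AsRef<Path> bound and Path::extension() are plain stdlib.

```rust
use std::path::Path;

// Hypothetical helper: accepts &str, String, &Path, PathBuf, &OsStr, ...
fn has_extension<P: AsRef<Path>>(path_arg: P, ext: &str) -> bool {
    let path = path_arg.as_ref();
    // extension() is an Option<&OsStr>; OsStr can be compared with str
    path.extension().map_or(false, |e| e == ext)
}
```

Callers can write has_extension("todo.txt", "txt") or pass an owned PathBuf without any conversion at the call site.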

Path and PathBuf Methods

Path::new(str)

create from str or OsStr

path.parent()

return the parent directory as an Option<&Path>

path.file_name()

return last path component as an Option<&OsStr>

path.is_absolute(), path.is_relative()

test for absolute / relative paths

path1.join(path2)

join two paths, returning a new PathBuf. If path2 is absolute, it'll just return path2

path.components()

return iterator over path components, left to right.

path.ancestors()

return iterator that walks up the path

path.to_str()

convert to Option<&str>, None if path isn't valid utf8

path.to_string_lossy()

as above, but returns a Cow<str> with any invalid sequence replaced by U+FFFD, the Unicode replacement character

path.display()

returns a value that implements Display, for printing / formatting.

Methods that query the filesystem: .exists(), .is_file(), .is_dir(), .read_dir(), .canonicalize() and others.
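A short sketch of the purely lexical methods (no filesystem access involved). The helper names sibling and component_names are invented, and the example paths assume Unix-style separators.

```rust
use std::path::{Path, PathBuf};

// Hypothetical helper: swap the last component of `path` for `name`.
fn sibling(path: &Path, name: &str) -> PathBuf {
    match path.parent() {
        Some(dir) => dir.join(name),
        None => PathBuf::from(name), // e.g. "/" has no parent
    }
}

// Hypothetical helper: the components of a path as strings, left to right.
fn component_names(path: &Path) -> Vec<String> {
    path.components()
        .map(|c| c.as_os_str().to_string_lossy().into_owned())
        .collect()
}
```

On a Unix-like system, component_names(Path::new("/usr/lib/x.so")) gives "/", "usr", "lib", "x.so": the root directory counts as a component of its own.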

Filesystem Access Functions

Functions from std::fs, each listed with its rough Unix equivalent. These all return io::Result vals.

create_dir(path)

mkdir

create_dir_all(path)

mkdir -p

remove_dir(path)

rmdir

remove_dir_all(path)

rm -r

remove_file(path)

unlink

copy(src_path, dest_path) -> Result<u64>

cp -p

rename(src_path, dest_path)

rename

hard_link(src_path, dest_path)

link

canonicalize(path) -> Result<PathBuf>

realpath

metadata(path) -> Result<Metadata>

stat, also see path.metadata()

symlink_metadata(path) -> Result<Metadata>

lstat

read_dir(path) -> Result<ReadDir>

opendir, also see path.read_dir()

read_link(path) -> Result<PathBuf>

readlink

set_permissions(path, perm)

chmod

Reading Directories

Example reading dir contents of some path value:

for entry_result in path.read_dir()? {
    let entry = entry_result?;
    println!("{}", entry.file_name().to_string_lossy());
}

The std::fs::DirEntry values being printed here have these methods:

entry.file_name()

entry name as an OsString

entry.path()

full PathBuf

entry.file_type()

FileType result; file types have methods .is_file(), .is_dir(), .is_symlink()

entry.metadata()

rest of metadata

Dirs . and .. are not returned by .read_dir()
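Putting read_dir and the DirEntry methods together, a sketch of a tiny directory lister. list_names is a hypothetical helper; the trailing slash for directories and the sorting are choices made for this example.

```rust
use std::io;
use std::path::Path;

// Hypothetical helper: names of the entries directly inside `dir`, sorted.
fn list_names(dir: &Path) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry_result in dir.read_dir()? {
        let entry = entry_result?;
        let mut name = entry.file_name().to_string_lossy().into_owned();
        if entry.file_type()?.is_dir() {
            name.push('/'); // mark directories, like ls -F
        }
        names.push(name);
    }
    names.sort(); // read_dir makes no promise about order
    Ok(names)
}
```

Recursing into subdirectories would mean calling list_names again on entry.path() for every directory entry.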

Platform-Specific Features

The std::os module has some basic platform-specific features. On Unix-likes there's a symlink() fun to symlink paths. There are also some platform-specific extension traits. More platform-specifics are available via 3rd-party crates.

Networking

The std::net module has low-level network support; native_tls provides SSL/TLS.

Example: echo server with inline notes

use std::net::TcpListener;
use std::io;
use std::thread::spawn;

/// Accept connections forever, spawning a thread for each one.
fn echo_main(addr: &str) -> io::Result<()> {
    // bind a listener to the addr
    let listener = TcpListener::bind(addr)?;
    println!("listening on {}", addr);
    // serve forever
    loop {
        // Wait for a client to connect. note addr shadows the listen addr
        let (mut stream, addr) = listener.accept()?;
        println!("connection received from {}", addr);

        // Spawn a thread to handle this client.
        let mut write_stream = stream.try_clone()?;
        // move streams into closure
        spawn(move || {
            // Echo everything we receive from `stream` back to it.
            io::copy(&mut stream, &mut write_stream)
                .expect("error in client thread: ");
            println!("connection closed");
        });
    }
}

fn main() {
    echo_main("127.0.0.1:17007").expect("error: ");
}
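To poke at an echo server like this one from Rust, a client along the following lines should do. This is a sketch, not part of the chapter: echo_roundtrip is an invented name, and it assumes the server echoes until it sees end of input, as io::copy does.

```rust
use std::io::{self, Read, Write};
use std::net::{Shutdown, TcpStream, ToSocketAddrs};

// Hypothetical client: send `message`, return whatever comes back.
fn echo_roundtrip<A: ToSocketAddrs>(addr: A, message: &[u8]) -> io::Result<Vec<u8>> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(message)?;
    // Half-close: tell the server we are done sending (its read sees EOF)
    // while keeping our read side open for the reply.
    stream.shutdown(Shutdown::Write)?;
    let mut reply = Vec::new();
    stream.read_to_end(&mut reply)?;
    Ok(reply)
}
```

With the server above running, echo_roundtrip("127.0.0.1:17007", b"ping") would be expected to return the same four bytes. Without the shutdown call, the server's copy loop would keep waiting for more input and read_to_end would block.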

Higher-level networking is supported by 3rd-party crates. Example: http client via the reqwest crate, using its blocking feature.

use std::error::Error;
use std::io;

// result with an error trait obj
fn http_get_main(url: &str) -> Result<(), Box<dyn Error>> {
    // Send the HTTP request and get a response.
    let mut response = reqwest::blocking::get(url)?;
    if !response.status().is_success() {
        Err(format!("{}", response.status()))?;
    }

    // Read the response body and write it to stdout.
    let stdout = io::stdout();
    // note we need to lock stdout
    io::copy(&mut response, &mut stdout.lock())?;

    Ok(())
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    if args.len() != 2 {
        eprintln!("usage: http-get URL");
        return;
    }

    if let Err(err) = http_get_main(&args[1]) {
        eprintln!("error: {}", err);
    }
}

More crates for networking and filesystem work:

actix-web

http server framework

websocket

websocket proto

tower

components for building clients and servers

glob

file globbing

notify

monitor files

walkdir

recurse into a dir

Coda

Another very practical chapter! The stdlib i/o seems well-structured though a bit bare-bones (no tempfile or async i/o support; you need external crates for those).