Interaction: Input, Command Line and Operating System
Learning objectives
- You know how to read and parse text input in a Rust application.
- You know how to read command line arguments in a Rust application.
- You know how to read environment variables in a Rust application.
- You know how to read and write files in a Rust application.
Rust for the command line
Rust is a great language for writing command line applications, not least because of its speed, safety and multi-platform support. Many standard and time-honored tools have been rewritten in Rust. For example, fd and ripgrep are popular (faster) alternatives to the standard find and grep tools that offer basic (yet powerful) directory searching capabilities.
In this part, we will be making our own command line applications using Rust. We will make interactive programs by reading and parsing user input from the command line, reading command line arguments, reading and writing files, and recursing into directories.
Reading user input
A simple way to create an interactive program is to have the program read user input from the command line. To read user input in Rust, we need the io (input and output) module in the std library. Calling io::stdin() returns a Stdin struct which can be used to handle input from stdin. Stdin, short for standard input stream, handles the input text written on the command line.
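A minimal sketch of reading one line could look like the following. The describe_line helper is a hypothetical name introduced here so that the same code accepts any buffered reader; calling io::stdin().read_line(&mut input) directly works the same way in an interactive program.

```rust
use std::io::{self, BufRead};

// Hypothetical helper: reads one line from `reader` and describes the outcome.
fn describe_line<R: BufRead>(mut reader: R) -> String {
    // The string that read_line appends the input to.
    let mut input = String::new();
    match reader.read_line(&mut input) {
        Ok(bytes) => format!("Read {} bytes: {}", bytes, input),
        Err(error) => format!("Failed to read input: {}", error),
    }
}

fn main() {
    // io::stdin().lock() gives a buffered handle to the standard input.
    println!("{}", describe_line(io::stdin().lock()));
}
```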
In the above example, we use read_line(&mut input) to read one line from the input stream and write it to the predefined input string. The read_line method returns a Result type, where the Ok variant contains the number of bytes read from the stream and the Err variant contains an error message.
Note: there is no interactive input implemented for the embedded code editor. Instead, input can be specified in the "Inputs" text field above the code editor. The input is then passed to the program as if it was typed in the command line terminal. Try out modifying the input to see the output change.
For an authentic experience, you can run the program on your own computer in a terminal and type the input therein.
Reading Input in Rust with VS Code
To read input in Rust with VS Code, you need to open an integrated terminal (command line). Instructions for opening a terminal in VS Code can be found here. Alas, if you run a Rust program in VS Code via the default run button, you won't see an input prompt.
On your computer, you can also cause an error with read_line by passing invalid UTF-8 as input, e.g. from an image file. File contents can be redirected to the standard input of a program with the syntax command < file. For instance, the following command redirects the contents of the my-image.png file into the standard input of the program launched with cargo run (when run in a Rust project directory that contains a my-image.png file).
cargo run < my-image.png
When we don't mind panicking on errors, we may opt to simply unwrap the result.
The read_line method that we used to read a line from stdin is a blocking function. It will read from the underlying input stream until it encounters a newline \n (pressing Enter when inserting input in an interactive command line environment) or an EOF, i.e. an end of file marker. In other words, calling read_line will wait until a new line appears, which is the case when we press Enter in an interactive command line program, or until the input stream ends.
In the online embedded environment, we can see printed input even when there is no newline in the input. This is because the input in the embedded editor is not an open stream but ends with EOF.
Let's look at another example.
Here, one line of input is read to the string name and then printed out. We can also notice (after testing out the code) that the ! ends up on the next line. This happens because the read_line method pushes the newline character to the string. We can solve this minor annoyance by using the trim method of str.
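As a small sketch of the fix, the hypothetical greeting helper below trims the raw line before formatting it, so the ! stays on the same line as the name.

```rust
use std::io;

// Hypothetical helper: builds the greeting from a raw input line.
fn greeting(raw_line: &str) -> String {
    // trim removes surrounding whitespace, including the trailing newline.
    format!("Hello, {}!", raw_line.trim())
}

fn main() {
    let mut name = String::new();
    io::stdin().read_line(&mut name).unwrap();
    println!("{}", greeting(&name));
}
```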
When one line (at a time) is not enough
We can get an iterator over the lines in the standard input stream by using the lines method of Stdin.
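A sketch of reading a limited number of lines is shown here. The first_lines helper is a hypothetical name, written against any buffered reader; calling io::stdin().lines() gives the same kind of iterator.

```rust
use std::io::{self, BufRead};

// Hypothetical helper: collects at most `limit` lines from `reader`.
fn first_lines<R: BufRead>(reader: R, limit: usize) -> Vec<String> {
    reader
        .lines()
        .take(limit)
        // Each item is an io::Result<String>; unwrap panics on invalid UTF-8.
        .map(|line| line.unwrap())
        .collect()
}

fn main() {
    for line in first_lines(io::stdin().lock(), 3) {
        println!("Read line: {}", line);
    }
}
```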
We use the iterator's take method to stop the iterator after going through three lines. Otherwise the program would run forever waiting for more lines when executed in a command line. In the embedded editor, the program ends nicely even with fewer lines because there the input stream is finite. Just as read_line returns a Result in case the input is not valid UTF-8, the lines method returns an iterator of such Results.
As an alternative to read_line, we may take one line from the iterator returned by lines.
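The idea can be sketched as follows. The first_line helper is a hypothetical name, and main feeds it an in-memory input only to keep the sketch self-contained.

```rust
use std::io::BufRead;

// Hypothetical helper: takes only the first line from `reader`.
fn first_line<R: BufRead>(reader: R) -> String {
    reader
        .lines()
        .next()
        .unwrap() // Option: panics if there is no line at all
        .unwrap() // Result: panics if the line could not be read
}

fn main() {
    // Pass std::io::stdin().lock() instead to read from the terminal.
    let name = first_line("Ferris\nignored\n".as_bytes());
    println!("Hello, {}!", name);
}
```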
Here we see something that can be a bit unpleasant to the eye: two unwraps in a row. The first unwrap is on the Option returned by next (the next line might not exist), which needs to be handled for any iterator. The second unwrap is on the Result for handling invalid input.
With the read_line method, we needed only one unwrap. With this approach, we don't need to trim the string and we don't need to define a mutable string to store the input. Choose your poison.
In the previous chapter we looked at various ways to iterate over finite collections. Iterators are not only useful for processing finite sequences of values, but also for processing infinite streams of values.
The collect method on iterators is useful for converting the iterator to a vector or a hash map. What happens if we collect the iterator from io::stdin().lines() to a vector?
In the embedded editor this is a bit of an anticlimax, since it seems to work just fine. But this is just because the embedded editor receives a finite input ending in an EOF. If we run the program in a terminal, it will keep waiting for more input until the input stream is closed (for example with Ctrl+D on Unix-like systems).
Notice also that the type of the input validation result is io::Result<String>, which does not have an error type specified. This is because io::Result<T> is an alias for Result<T, io::Error>, so the error type is always io::Error.
Let's try out another infinite iterator. The repeat function in std::iter creates an iterator that repeats the same value over and over again. To collect from it, we need to specify how many values we want to collect. Otherwise we'll just be collecting till the end of time or space.
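A minimal sketch of collecting from the infinite iterator is shown here, with take limiting the number of values. The repeated helper is a hypothetical name.

```rust
use std::iter;

// Hypothetical helper: repeats `value` exactly `count` times.
fn repeated(value: &str, count: usize) -> Vec<&str> {
    // take turns the infinite iterator into a finite one before collecting.
    iter::repeat(value).take(count).collect()
}

fn main() {
    println!("{:?}", repeated("hip", 3));
}
```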
Parsing input into numbers
Next, we have a slightly more complex example than just reading input. We will read two numbers from the standard input and print out their sum.
To parse a string into a number, we can use the parse method on the str type (line 4 in the below example).
Parsing the string into i32 returns a Result, so we need to handle that too in addition to all the possible errors from reading input. The resulting code is a bit verbose, but it is necessary to keep the compiler happy.
The error type of the Result returned by the parse method is ParseIntError, which represents multiple different error kinds that are defined in the enum IntErrorKind. For example, parsing a string that contains invalid characters will result in an IntErrorKind::InvalidDigit.
We can handle the different error kinds by first getting the kind enum from the error with kind(), and then using the match expression to handle the enum variants.
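The sketch below shows one way to match on the error kinds. The parse_number helper and its error messages are assumptions made for illustration.

```rust
use std::num::IntErrorKind;

// Hypothetical helper: parses text into an i32 and explains failures.
fn parse_number(text: &str) -> Result<i32, String> {
    let trimmed = text.trim();
    match trimmed.parse::<i32>() {
        Ok(number) => Ok(number),
        Err(error) => match error.kind() {
            IntErrorKind::Empty => Err("no input given".to_string()),
            IntErrorKind::InvalidDigit => Err(format!("'{}' is not a whole number", trimmed)),
            IntErrorKind::PosOverflow | IntErrorKind::NegOverflow => {
                Err("the number does not fit into an i32".to_string())
            }
            // IntErrorKind is non-exhaustive, so a catch-all arm is required.
            _ => Err(format!("could not parse input: {}", error)),
        },
    }
}

fn main() {
    for text in ["42", "4x", "", "99999999999"] {
        println!("{:?} -> {:?}", text, parse_number(text));
    }
}
```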
Command line arguments and environment variables
When running a program from a command line, we can provide arguments to the program after giving the program name, like run for cargo in cargo run. Passing arguments to a program is not that different from passing arguments (i.e. values) to a parameterized function.
Executing the command echo Hello prints out Hello, because the echo program just prints out all of its arguments. Notice that shells, aka command line interpreters, use a space character to separate the program name from its arguments and the arguments from each other.
The following example works like the "echo" program: it prints out the arguments it is given.
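A minimal sketch of such a program could look like this. The collect_args helper is a hypothetical name.

```rust
use std::env;

// Hypothetical helper: collects the arguments of the current process.
fn collect_args() -> Vec<String> {
    env::args().collect()
}

fn main() {
    let args = collect_args();
    println!("{:?}", args);
}
```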
In the example, we first import the std::env module, which contains various functions for getting information about the environment of the process (the instance of a computer program being executed). We then get an iterator of the program arguments with the std::env::args function and collect the iterator values before printing them out.
Running the above example with cargo run -- Hello World in a terminal prints out ["target/debug/echo", "Hello", "World"] (assuming the project name is echo). The first argument (at index 0) is the path to the program. The rest are the arguments passed to the program. We need to include the -- argument to tell cargo that the arguments after the -- are not for cargo run, but for the program that is run.
Since we collect the arguments into a Vec, we can use the normal vector operations to process the arguments. We should still take care that our program is prepared for common user mistakes, such as forgetting some of the required arguments. The number of arguments is only known at runtime, so the compiler cannot check it beforehand.
Below, we have a program that reads two arguments and multiplies them together. It gets the arguments by using the indices 1 and 2, and then parses them into f64s. It doesn't handle a missing argument nicely: indexing out of bounds panics with an obscure message. With get, we can provide a better error explanation or a default value to use when the index is out of bounds.
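The following sketch shows the get based variant. The helper names number_at and multiply_args, as well as the error messages, are assumptions made for illustration.

```rust
use std::env;

// Hypothetical helper: parses the argument at `index` into an f64.
fn number_at(args: &[String], index: usize) -> Result<f64, String> {
    // get returns None instead of panicking when the index is out of bounds.
    match args.get(index) {
        Some(text) => match text.parse::<f64>() {
            Ok(number) => Ok(number),
            Err(_) => Err(format!("argument {} is not a number: {}", index, text)),
        },
        None => Err(format!("argument {} is missing", index)),
    }
}

// Hypothetical helper: multiplies the arguments at indices 1 and 2.
fn multiply_args(args: &[String]) -> Result<f64, String> {
    match (number_at(args, 1), number_at(args, 2)) {
        (Ok(a), Ok(b)) => Ok(a * b),
        (Err(message), _) | (_, Err(message)) => Err(message),
    }
}

fn main() {
    let args: Vec<String> = env::args().collect();
    match multiply_args(&args) {
        Ok(product) => println!("{}", product),
        Err(message) => eprintln!("Expected two numbers: {}", message),
    }
}
```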
With this sort of an application that simply multiplies its values, we could easily do much better than multiplying only two values. We can handle an arbitrary number of arguments by using the product method directly on the args iterator.
For our multiplication, we'll want to ignore the program path at the beginning of the iterator though. For this, the skip method of the iterator comes in handy. The product method returns 1.0 if the iterator is empty, which makes this approach safe to use also when providing no arguments.
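A sketch of this approach follows. The product_of_args helper is a hypothetical name, and it panics on arguments that are not numbers to keep the example short.

```rust
use std::env;

// Hypothetical helper: multiplies every argument after the program path.
fn product_of_args<I: Iterator<Item = String>>(args: I) -> f64 {
    args.skip(1)
        // unwrap panics if an argument is not a number.
        .map(|arg| arg.parse::<f64>().unwrap())
        .product()
}

fn main() {
    println!("{}", product_of_args(env::args()));
}
```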
Environment variables
Environment variables are variables defined in a shell's environment that programs inherit when they are run in the shell. Environment variables are often used to configure a program. For example, Rust programs use the RUST_BACKTRACE environment variable for enabling backtraces for runtime errors.
In Rust, we can access environment variables with the env::vars function. It returns an iterator of environment variable names and values as tuples, which we can collect into a hash map for further use.
Let's see what environment variables are available to us in our program.
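A minimal sketch of listing the variables is shown here. The env_map helper is a hypothetical name, and PATH is only an assumed example of a variable that is commonly set.

```rust
use std::collections::HashMap;
use std::env;

// Hypothetical helper: snapshots the environment into a hash map.
fn env_map() -> HashMap<String, String> {
    // env::vars yields (name, value) tuples, which collect into a map.
    env::vars().collect()
}

fn main() {
    let vars = env_map();
    for (name, value) in &vars {
        println!("{}={}", name, value);
    }
    // Looking up a single variable by name.
    match vars.get("PATH") {
        Some(value) => println!("PATH is set to {}", value),
        None => println!("PATH is not set"),
    }
}
```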
When running the code in the embedded editor, we get to see the RUST_VERSION used by the embedded editor (among a plethora of other variables). This is the same version the automatic exercise grader uses when grading exercises.
We can define or update existing environment variables in a shell by exporting them. To see the backtraces for runtime errors in Rust, we can set the RUST_BACKTRACE environment variable to 1 with
# sh (posix shell / unix-like systems)
export RUST_BACKTRACE=1
# Windows CMD
set RUST_BACKTRACE=1
# Windows PowerShell
$env:RUST_BACKTRACE="1"
In Unix-like systems (e.g. Linux, Mac), we can overwrite or set new environment variables for just a single command in a shell. This is done by prefixing the command with VAR=value. As an example, to enable Rust backtrace for only one cargo run, we can run
RUST_BACKTRACE=1 cargo run
Managing files and directories
An operating system (OS) manages the resources our computer can use: memory, disks, networking, filesystem and drawing to the screen. Next, we will look at how to interact with the operating system by reading and writing files within Rust code.
An operating system should not be taken for granted, as not every programmable piece of machinery has one. Rust provides low-level access for working with hardware, so it can be used in an environment which has no operating system, like on an embedded microcontroller. There we don't have access to input (stdin), output (println) or the Internet.
Writing code for an embedded device is an advanced topic however, and we will not cover it in this course.
If you are interested in the topic and feel comfortable using Rust already (or after completing this course), you can read more about it in the Rust embedded discovery book (for those new to embedded programming) or the Rust embedded book (more advanced, for those with some experience in embedded programming).
Reading files
Reading a file requires knowing the path to it. In Unix-like operating systems, like Linux, the directories and files in a path are separated by slashes /. In Windows, they are separated by backslashes \. We use Unix-like paths in this course material.
A path can start with ./ to indicate that it is relative to the directory the program is being run from. Let's say we are running the following program from the path /home/user/project/. We can use the std::fs::read function to read the contents of a file into a vector of bytes (Vec<u8>). We can then convert the bytes into a string with the String::from_utf8 function.
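A sketch of these two steps is shown here. The read_as_text helper and its error messages are assumptions made for illustration, and the example path assumes the program is run from a Cargo project directory.

```rust
use std::fs;

// Hypothetical helper: reads a file as bytes and converts them to text.
fn read_as_text(path: &str) -> Result<String, String> {
    let bytes = match fs::read(path) {
        Ok(bytes) => bytes,
        Err(error) => return Err(format!("could not read {}: {}", path, error)),
    };
    match String::from_utf8(bytes) {
        Ok(text) => Ok(text),
        Err(error) => Err(format!("{} is not valid UTF-8: {}", path, error)),
    }
}

fn main() {
    match read_as_text("./src/main.rs") {
        Ok(text) => println!("{}", text),
        Err(message) => eprintln!("{}", message),
    }
}
```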
Calling fs::read("./src/main.rs") will try to read the /home/user/project/src/main.rs file. If the file exists, and the user's permissions are sufficient, the contents of that file will be saved in the bytes variable. Try modifying the path in the above example to a file that does not exist, e.g. /src/main.rs, to see a runtime error.
In the usual case, we want to read a file and convert its contents to a string, like we did with fs::read and String::from_utf8. Since this is such a common operation, fs has a function for just that: fs::read_to_string.
Writing to a file
We can use the standard library function fs::write to write a string to a file in the specified path. The fs::write function first creates the file (if it doesn't already exist) and then writes to it, combining fs::File::create and io::Write::write_all into one convenient function.
We can also pass a byte vector to fs::write to write any binary data to a file, like the contents of an image or a video.
Note that the fs::write function will overwrite the file if it already exists. To avoid overwriting an existing file, we can check its existence before writing to it with the path::Path struct and its exists method.
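The existence check can be sketched as follows. The write_if_absent helper and the file name are assumptions made for illustration, and the file is placed in the system's temporary directory to avoid cluttering the project.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical helper: writes `contents` only when `path` does not exist yet.
// Returns Ok(true) if the file was written and Ok(false) if it was left alone.
fn write_if_absent(path: &Path, contents: &str) -> io::Result<bool> {
    if path.exists() {
        return Ok(false);
    }
    fs::write(path, contents).map(|_| true)
}

fn main() {
    let path = std::env::temp_dir().join("notes-example.txt");
    match write_if_absent(&path, "first note\n") {
        Ok(true) => println!("Wrote {:?}", path),
        Ok(false) => println!("{:?} already exists, left untouched", path),
        Err(error) => eprintln!("Could not write {:?}: {}", path, error),
    }
}
```

Note that another process could create the file between the check and the write, so this sketch is only suitable when that race does not matter.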
Appending to a file
Rust does not provide a convenience function for appending to a file, but we can use the fs::OpenOptions struct to open the file in append mode. We can then append text to the file using the writeln! macro, which is a convenience macro for writing a string and a newline to a buffer (there is also write! when we don't want a new line at the end). Using the macro requires an additional method on the opened file though, which can be added by importing the trait std::io::Write (the compiler kindly hints us to do so in case we forget).
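A sketch of appending lines could look like this. The append_line helper and the file name are assumptions made for illustration. The create(true) option is an addition here so that the file is created when it is missing.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write}; // Write provides the method that writeln! calls
use std::path::Path;

// Hypothetical helper: appends `line` and a newline to the file at `path`.
fn append_line(path: &Path, line: &str) -> io::Result<()> {
    let mut file = match OpenOptions::new().append(true).create(true).open(path) {
        Ok(file) => file,
        Err(error) => return Err(error),
    };
    writeln!(file, "{}", line)
}

fn main() {
    let path = std::env::temp_dir().join("append-example.log");
    append_line(&path, "first entry").unwrap();
    append_line(&path, "second entry").unwrap();
    println!("{}", fs::read_to_string(&path).unwrap());
}
```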
Removing a file
Removing a file in Rust code is as straightforward as creating or overwriting one with fs::write: we call the fs::remove_file function. This function will return an error if the given path doesn't exist, the path is a directory, or the user doesn't have permission to remove the file.
Try removing or commenting out the fs::write line to see the runtime error of trying to remove a non-existent file.
We can also read files at compile time with the include_str! macro. The include_str! macro will read the file at compile time and include the contents of the file as a string. The path is interpreted relative to the source file where the macro is called.
An invalid path will cause a compile time error. On the other hand, the file will not be read at runtime so the file does not need to exist when the program is run.
Listing directories
Listing directories can be a bit more complicated than reading and writing files because we have more possible errors to deal with. With the help of the ? (try) operator, however, we can streamline most of them by propagating the errors back to the caller.
Rust can often be verbose, but it doesn't always have to be. Let's have a look at a simple backup function that leverages the fs::read_to_string function along with fs::write to create a backup copy of a file.
Even though the function does not do that much, it contains quite a lot of code. We could of course use the more concise expect or unwrap functions to handle the error by causing a runtime panic, but often we want to propagate the error back to the caller instead. This way the caller can choose how to handle the error, which is also how exceptions work implicitly in many programming languages.
To make error handling simpler, Rust provides a way to propagate errors by using the ? (pronounced try) operator. With it, our backup function can look rather nice and concise.
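The sketch below contrasts an explicit version of the backup function with one that uses the ? operator. The function names and the .bak suffix are assumptions made for illustration.

```rust
use std::fs;
use std::io;

// Verbose version: every Result is matched explicitly.
fn backup_verbose(path: &str) -> Result<(), io::Error> {
    let contents = match fs::read_to_string(path) {
        Ok(contents) => contents,
        Err(error) => return Err(error),
    };
    match fs::write(format!("{}.bak", path), contents) {
        Ok(()) => Ok(()),
        Err(error) => Err(error),
    }
}

// Concise version: ? returns early with the error or unwraps the Ok value.
fn backup(path: &str) -> Result<(), io::Error> {
    let contents = fs::read_to_string(path)?;
    fs::write(format!("{}.bak", path), contents)?;
    Ok(())
}

fn main() {
    let file = std::env::temp_dir().join("backup-example.txt");
    let path = file.to_str().unwrap();
    fs::write(path, "important data").unwrap();
    for result in [backup_verbose(path), backup(path)] {
        match result {
            Ok(()) => println!("Backed up {}", path),
            Err(error) => eprintln!("Backup failed: {}", error),
        }
    }
}
```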
The ? operator works for both Options and Results: if the value in front of it is None or Err, the function returns early with that value. If the value is Some or Ok, ? unwraps the value.
Note that using ? requires the function to return either an Option or a Result, and the propagated value needs to match the return type.
The ? operator can also be used to propagate errors from the main function by giving it a return type of Result.
Using std::fs::read_dir we can get an iterator over all the files and directories at the path provided as argument.
The read_dir function returns an io::Result<ReadDir>, which we can iterate over, but iterating over a Result only gives the wrapped value if it is Ok. We want to iterate over the ReadDir instead to get individual DirEntrys, which contain information about the entry, like whether it is a directory or a file.
Here is a good place to try the ? operator to get the value inside the result and propagate the error to the caller if it is an Err. Note that we need to give the function a return type of Result or Option to be able to use ?.
The ReadDir iterator gives us io::Result<DirEntry>s, which is interesting because we have just handled the errors from read_dir. The reason is that the ReadDir iterator doesn't contain the contents of the directory in any way. When the for loop calls next() during each iteration, the program gets the next DirEntry from the operating system. As with anything that interacts with the operating system, this may also fail.
But now we finally have access to the DirEntrys, which have many useful methods, like file_name, path and metadata. We can use the metadata method on a DirEntry to get more information about the file or directory. The metadata method also interacts with the operating system, thus requiring us to handle potential errors.
We can see for example, which entries are directories and how big each file is.
Here we also use the Result type from the io module as the return type, which works with the ? operator because it is just a regular Result with the error type already set to io::Error.
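A sketch of such a listing is shown here. The list_dir helper and its tuple layout are assumptions made for illustration. It uses the path method instead of file_name to sidestep string conversion.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical helper: returns (path, is_dir, size in bytes) for each entry.
fn list_dir(dir: &Path) -> io::Result<Vec<(PathBuf, bool, u64)>> {
    let mut listing = Vec::new();
    // The first ? handles a failing read_dir call.
    for entry in fs::read_dir(dir)? {
        // Each item is an io::Result<DirEntry>.
        let entry = entry?;
        // Fetching metadata asks the operating system again and may fail.
        let metadata = entry.metadata()?;
        listing.push((entry.path(), metadata.is_dir(), metadata.len()));
    }
    Ok(listing)
}

fn main() -> io::Result<()> {
    for (path, is_dir, size) in list_dir(Path::new("."))? {
        let kind = if is_dir { "dir" } else { "file" };
        println!("{:?} {} {} bytes", path, kind, size);
    }
    Ok(())
}
```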
The metadata for a file or directory can also be accessed by using the fs::metadata function, which takes a path as argument. It too returns a Result in case the path doesn't exist or the program doesn't have permission to access it.
When we need to create a new directory, the Rust standard library provides the functions fs::create_dir and fs::create_dir_all. The create_dir function will return an error if a directory with the same name already exists or if one of its parent directories doesn't exist. The create_dir_all function will create all the parent directories if they don't exist and will return Ok even when all directories in the given path already exist.
For removing directories, the Rust standard library provides the functions fs::remove_dir and fs::remove_dir_all. The remove_dir function only works for empty directories, while remove_dir_all recursively removes all the files and directories inside the directory before removing the directory itself.
Like the file modification and removal functions, these all return an error on failure due to e.g. insufficient permissions.
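The behaviour of these four functions can be sketched as follows. The directory_demo helper and the directory names are assumptions made for illustration. Everything happens under the system's temporary directory.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical walkthrough: returns whether the two expected failures occurred.
fn directory_demo(base: &Path) -> io::Result<(bool, bool)> {
    let nested = base.join("parent").join("child");
    // Creates base, parent and child in one go; repeating the call is still Ok.
    fs::create_dir_all(&nested)?;
    fs::create_dir_all(&nested)?;
    // create_dir fails because the directory already exists.
    let create_failed = fs::create_dir(&nested).is_err();
    // remove_dir fails because parent is not empty.
    let remove_failed = fs::remove_dir(base.join("parent")).is_err();
    // remove_dir_all deletes base and everything inside it.
    fs::remove_dir_all(base)?;
    Ok((create_failed, remove_failed))
}

fn main() -> io::Result<()> {
    let base = std::env::temp_dir().join("dir-example");
    let (create_failed, remove_failed) = directory_demo(&base)?;
    println!("create_dir failed: {}, remove_dir failed: {}", create_failed, remove_failed);
    Ok(())
}
```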
OsString and pesky temporary values
The file_name method of DirEntry doesn't return a String or a &str, which are already familiar to us, but rather an std::ffi::OsString. This OsString is a compatibility feature in Rust which can store data in the different encodings different operating systems use. Unlike a String, an OsString may contain data that is not valid UTF-8.
Let's say we want to format our file metadata listing from the previous example with padding (:>20) for more pleasant reading. An OsString can't be displayed without debug format (:?) and padding doesn't work on debug format, so we need to get a String or &str from the OsString.
The simplest way to get string data out of an OsString is to use the to_string_lossy method, which gives us text where invalid Unicode sequences are replaced with �. This method technically returns a clone-on-write smart pointer Cow<str>, but we don't need to worry about that yet. For our current purposes, we can use it like a normal &str. We'll cover smart pointers later when looking closer into memory and lifetimes.
With this information, we should now know for instance how to format the prints in our previous metadata listing example for prettier output. The following code won't compile, however, because in it a temporary value is dropped while it is still in use. To fix this, we need to follow the compiler's advice and store the result of calling entry.file_name() in a separate variable.
This mistake is very common, and can be quite surprising to new Rustaceans. The problem here is that entry.file_name() returns a new OsString, which is not a reference to entry. Calling to_string_lossy on the OsString then returns a value that references the OsString. But the referenced OsString gets dropped at the end of the statement because no variable owns it in the current scope. To fix this, we can add a variable that owns the temporary OsString. Later in the course, when discussing lifetimes, this behaviour will hopefully become clearer.
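A sketch of the fixed listing is shown here. The format_row helper and the column widths are assumptions made for illustration.

```rust
use std::fs;
use std::io;

// Hypothetical helper: right-aligns the name and size into fixed-width columns.
fn format_row(name: &str, size: u64) -> String {
    format!("{:>20} {:>10} bytes", name, size)
}

fn main() -> io::Result<()> {
    for entry in fs::read_dir(".")? {
        let entry = entry?;
        // Binding the OsString to a variable keeps it alive for the borrow below.
        let file_name = entry.file_name();
        let name = file_name.to_string_lossy();
        println!("{}", format_row(&name, entry.metadata()?.len()));
    }
    Ok(())
}
```

Writing entry.file_name().to_string_lossy() in a single let statement instead would reproduce the compile error described above, since the OsString would be dropped at the end of that statement.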