Data Structures
Learning objectives
- You know how to define and use tuples in Rust.
- You know how to define and use arrays in Rust.
- You know how to define and use vectors in Rust.
- You know how to define and use hash maps in Rust.
Data structures
Data structures are used to organize data values in a collection. Data structures are efficient at different things, so choosing one over another is often based on the memory and speed requirements.
We will cover some of the most common data structures in Rust: tuples, arrays, vectors (you may be familiar with a similar structure called list from other programming languages) and hash maps (also known as dictionaries).
Tuples
A tuple in Rust is a collection of values where the values may be of different types. Tuples are also fixed-length collections, meaning that once declared, they cannot grow or shrink in size.
A tuple is defined by wrapping the values we want separated by commas inside parentheses.
The type of a tuple is written as a tuple of the types of the elements. As we can see in the above example, the type of ("one", 2, 3.0)
is (&str, u32, f32)
.
Since the values in a tuple are not restricted by type, we can even include tuples inside tuples.
Pretty debug format
We need to add the debug formatting parameter :?
for tuples in the format string to be able to print it. Try to change the :?
to the pretty print debug parameter :#?
to print a tuple over multiple lines.
Tuples are sequential collections and the values of a tuple can be accessed by its assigned index.
Indices of a sequence start from zero in Rust, so the first value of a tuple is referenced with the index 0.
Values in tuples (and other collections) are commonly referred to as elements. Trying to access a tuple element using an index too large for the tuple causes a compiler error, which we can see if we run the following code.
Tuples are useful, for instance, when we want to return multiple values at once from a function.
Notice from the main function in the above example that we can use the tuple syntax to destructure the tuple values into individual variables. We can use the destructuring pattern also in parameters and for matching multiple values of a tuple at once.
Mutating tuple elements
Like with any value in Rust, collections need to be declared mutable in order to modify them. The values within the collections are also immutable unless the variable that owns the collection is defined as mutable.
In the above example, we have to change the move_point
function to take a mutable reference to the point parameter, instead of an immutable reference. In the main
function, the point
variable needs to marked as mut
in order to take a &mut point
and pass it to move_point
. Note that this makes the whole collection mutable.
Notice also that we don't need to explicitly dereference point.0
but the dereferencing is done implicitly when accessing the element in the referred tuple. If we destructure the tuple parameter into individual variables, we need to use the *
symbol to dereference them.
Tuple ownership
In Rust, each value has exactly one owner and changing ownership is called moving. Passing an argument to a function that does not borrow the value means that the parameter of the function takes ownership of a given value and the old owner of the value can no longer be used.
When we still need the value after passing it to a function, we can use the clone
method to create a copy of the value that would be moved instead.
If the type is copiable, e.g. i32
, the value is implicitly copied instead of moved.
When we define a variable with a value from a tuple, the value is moved or copied from the tuple. Remember that we need to be careful with ownership when moving values, because in Rust every value can only have one owner.
In the example, we need to reference the value in the tuple if we still want the tuple to contain the element. Another option would be to clone
it, but that is unnecessary in this example, as we don't need ownership of the string to print it. Try to fix the example or inspect the solution.
When all of the elements of a tuple can be copied, the tuple itself also is copiable.
Otherwise, the tuple is moved.
Question not found or loading of the question is still in progress.
Arrays
Arrays are similar to tuples in that they are fixed-length and contain a contiguous sequence of values.
An array is defined using square brackets [
]
. Like with tuples, the elements in an array are separated by commas.
Unlike a tuple, an array may not contain values of different types. Trying to run the following code will result in a compiler error.
Indexing arrays
An element of an array can be accessed using its index with the indexing operator []
.
Arrays can be indexed using a dynamic value known only during runtime, for example a value in a function parameter. This is not possible with tuples, which can be indexed only by compile-time constants.
The type of an index needs to be usize
, which is an unsigned integer type where the size depends on machine architecture. Trying to index with other types is not allowed.
Destructuring the elements in an array is similar to destructuring a tuple.
The array type
The notation used for the type of an array is [T; N]
, where T
is the type of each element and N
is the number of elements in the array.
We may not modify an array in a way that would decrease or increase its length. We can still mutate the array elements.
Handling indexing errors
Accessing an out of bounds index causes a panic during runtime.
This is because the compiler can't know if the index is valid at compile time. Well, in the above case it would be possible to know with some improvements in the compiler, but generally no.
When we want to avoid crashing, we can use the get
method on arrays to return None
if the index is out of bounds rather than panicking. A successful indexing returns a reference to the value wrapped in Some
.
Due to restrictions in the borrow checker, we cannot simultaneously have a mutable reference to an element and an immutable reference to another of the same array.
Currently there is no safe and efficient workaround for this.
Moving array values?
Another difference to tuples is that we cannot move values from arrays. Indexing into an array tries to implicitly copy the value, which only works for certain types such as integers, but not strings.
The following code will not compile. By running the code (or trying to), the compiler will tell us why.
As the compiler output suggests, we can use a &
to borrow a string from the array. If we instead wanted another copy of the string, we would explicitly clone the value in the array: cardinals[1].clone()
.
Moving and modifying arrays
The copy versus move semantics are the same for arrays as for tuples. When the array contains copiable values, the whole array is also copiable (even if it may not be cheap to copy, like when the array size is large).
To modify an array in-place in a function, we need to take a mutable reference to it.
Array methods
Rust provides multiple methods that make working with arrays easier. As an example, we can check whether an array contains an element using the contains
method.
This counterexample fails to compile, because contains
expects a reference to a value of the element type char
. The type of the input for contains
should be &char
. The compiler is helpful and suggests borrowing 'y'
to us. Run the code to see the compiler hint.
Another useful method is sort
. It sorts the array in-place in increasing order according to the ordering rules defined for the element type.
Tuples are sorted like strings, the first letter is considered first.
We have been a bit misleading here with the array methods. The example methods, including get
, are not defined for arrays but for slices instead.
The Array doesn't actually have a get
method, see for your self: https://doc.rust-lang.org/std/primitive.array.html. The get
method exists on slices, which arrays coerce to — so technically we are using the slice's get method.
Arrays get coerced into slices so we can use slice methods on arrays too.
Slices
A slice in Rust is a view into a sequence of values. Slices are very similar to arrays, but unlike arrays, slices are dynamically sized. This means that the size of a slice is known only at runtime.
Slicing into a collection is similar to indexing. Only with slicing, we specify a range of values to get.
A slice is a kind of reference to a sequence of values. Slices are borrowed from the collection they are slicing into, wherefore we use the &
operator to create a slice. The slice data type is written like that of the array's but without the size: [T]
, where T
can be almost any type.
Holding a slice to an array is like having a reference to it. Thus the array cannot be modified while the slice is still alive. Even the values that are outside the slice cannot be modified.
We can leave the start or end indices of the range out in the range syntax to slice from the beginning or to the end of the collection. Oh, and slices can be sliced further as well — just like a pizza.
Although slices are dynamically sized, they can be destructured like arrays and tuples in match and if let expressions.
Coercing arrays to slices
Let's have a look at an example, where we create an array of slices from three arrays of different sizes.
In the example, we use three different ways to fully slice the arrays. The first two, &one[..]
and &two.as_slice()
are both slices of the full array. The third one, &three
is not a slice but a reference to the array. The third method works just as well, because arrays coerce implicitly to slices.
This makes slices very useful as parameters for passing around collections of values. The function can take any type that coerces into a slice or we can pass around a slice of a collection instead of the whole collection.
The string slice str
The str
type, aka string slice, is a special kind of slice. It is a slice of bytes ([u8]
) that always represents a valid Unicode (UTF-8) string — not all 8 byte sequences are valid UTF-8. We don't normally see str
's in Rust, but it's borrowed form &str
.
In order to convert a &[u8]
to a &str
, we need the check its validity using e.g. std::str::from_utf8
.
With String
, we have a broader toolset available. We can use String::from_utf8_lossy
to get a String
from a &[u8]
even if it's not valid UTF-8.
We can slice strings to get string slice references, &str
. Slicing a String
is done just like with arrays.
Let's look at this in action with a function starts_with
, which takes two string slices and checks whether the second starts with the first.
In the first call to starts_with
, we pass a &str
to it, but in the second it looks like we pass in a &String
. String
s coerce to str
slices, so &start
coerces from a &String
to a &str
.
Many programming languages allow indexing into strings, but Rust doesn't. Rust does not allow indexing into a string because it would cause confusion in indexing performance by allowing non-constant time indexing operations.
Strings in Rust are encoded as UTF-8, and UTF-8 is not a constant sized encoding. Thus, finding the nth character requires iterating over the string and that is not a constant-time operation.
To find the nth character in a String
, we can use the chars
method to get an iterator over the characters of a string and then the nth
method, e.g. "abcdefg".chars().nth(5)
. Try out yourself!
For those interested, a more comprehensive explanation can be read in the Rust book.
Vectors
Often we want an array-like structure that we can modify at will by adding or removing values. For such operations we need another data structure many modern languages call a list. In Rust the most common list-like data structure is Vec, short for vector.
There are multiple ways to create a vector. For instance, we can use the new
function of Vec
, or we can use the macro vec!
:
As we can see from the example, an empty vector requires type annotations with a similar syntax as we have seen with Option
(let maybe_i32: Option<i32> = None;
).
The vec!
macro allows us to initialize a vector as seamlessly as we have done with arrays and tuples.
If we already have an array or a slice, we can create a vector containing a clone of all the values by calling the to_vec
method.
Indexing a vector works the same as with arrays and slices.
Vector values, like String
s, are not implicitly copied. Instead, they are moved. This means that if we want to use a vector after we have passed it to a function, we need to pass it as a reference or pass an explicitly cloned value.
Adding and removing values
New values can be added to an existing vector via the push
method. We can use the append
method to move elements over from another vector.
We also used the len
method like with arrays and slices to get the current length of the collection. All the slice methods we used on arrays (get
, contains
, join
, sort
) also work for vectors.
Removing elements from a vector is rather straightforward with the remove
method. It removes the element from the vector in a given index and gets the ownership of the value. The pop
method is useful for removing the last value, if the vector is not empty.
Vec does not have a method for removing elements directly by value. We will look into the different ways this can be done in the next part. You can also look for other useful vector methods in the standard library documentation. Or you can just fiddle with a Vec
variable in an IDE or the Rust REPL to see suggestions.
Maps
The data structure map or an associative array (or dictionary in some languages) is a useful data structure when we want to associate and access values by something other than their order in a collection. Maps incorporate key-value pairs such that each key is unique and each value can be accessed by its paired key.
Rust has different map implementations in standard library but they need to be explicitly imported in order to use them. Here we will concentrate only on HashMap, which is the most commonly used map implementation.
To use a HashMap
, we import it from the std::collections
module. Creating an empty HashMap
is similar to creating a vector with a new
function. Adding key-value pairs can be done with HashMap
's insert
method.
While there is no builtin macro for initialising a HashMap (like with vectors), a slightly more concise initialization is possible with HashMap
's from
function.
Accessing and removing hash map values is similar to working with vectors. We can use the same square bracket syntax []
to get a value directly or the get
method to get it safely through an Option
.
We can use the remove
method with a key to remove a pair from the map and gain access to the value. We also have the option to return the removed key-value pair as a tuple with remove_entry
.
Modifying values is a wee trickier with hash maps since we cannot update a HashMap
value directly. Something like food_prices["cabbage"] += 10
will not compile. Instead, we have other options such as get_mut
which gets a mutable reference to the value inside an Option
.
Another way to access hash maps values is to take an entry, which represents a spot for a key-value pair in the map.
The hash map Entry
has the method or_insert
, which allows us to easily and safely retrieve the value from the entry. If the entry is empty, it will insert the provided value and return a mutable reference to it. If the entry is occupied, it will return a mutable reference to the value inside.
We can easily modify the value in the entry when getting a mutable reference to it with the or_insert
method.
The Entry
type returned from the entry
method on HashMap is another enum in Rust. Like Option
and Result
, it has two possible variants. These are Occupied
and Vacant
. If we wanted to do something different depending on whether the entry was occupied or vacant, we could use the match
expression to handle both cases.
There are some caveats in the above example code. We use the owned String
instead of the borrowed &str
for the keys in the hash map because we want to avoid lifetime collisions of references. We will look into other methods later in the course. Also, the prices
parameter needs to be mutable even when we don't modify the hash map because the entry
method returns a mutable reference from the hash map.
Summary of symbols
Symbol | Description |
---|---|
(a, b, ... ) | Defines or destructures a tuple |
[a, b, ... ] | Defines an array, or destructures an array or a slice |
use | Used to import items from other modules. |
Hi! Please help us improve the course!
Please consider the following statements and questions regarding this part of the course. We use your answers for improving the course.
I can see how the assignments and materials fit in with what I am supposed to learn.
I find most of what I learned so far interesting.
I am certain that I can learn the taught skills and knowledge.
I find that I would benefit from having explicit deadlines.
I feel overwhelmed by the amount of work.
I try out the examples outlined in the materials.
I feel that the assignments are too difficult.
I feel that I've been systematic and organized in my studying.
How many hours (estimated with a precision of half an hour) did you spend reading the material and completing the assignments for this part? (use a dot as the decimal separator, e.g 8.5)
How would you improve the material or assignments?