Simple Rust Part Four
layout: post title: “Simple Rust, Part Four” date: 2015-12-30T14:07:28-06:00 subtitle: “Iterators, References and Options” author: “Ian Whitney” —
This is it, the home stretch. In the 3 previous articles we’ve learned a lot about Rust (like Strings, Arrays and Lifetimes), but we haven’t actually written the code that started us down this path – finding anagrams of words. Today we will write that code.
Of our 11 tests we have managed to get the first one passing. Just 10 more tests to go! Here’s our next one:
Our current function won’t work, obviously. It only returns an empty vector:
Now, I have no idea how to do this in Rust. But I know how I would do it in Ruby, which is a good place to start.
I can use my working Ruby implementation like a Rosetta Stone to translate code I know how to write into a language that I’m still learning.
In the first line of my Ruby anagrams_for
, I’m using reject
to remove any input that is the same word as our source. “ant” is not an anagram of “ant” because they are the same word. Then I use select
to gather up all the input words that have the exact same letters as our source. This relies on Ruby’s Enumerable library and its support for closures.
Rust loves list-transformation functions like these, which it calls Iterator Adaptors. Another common name for them is Higher Order Functions. Ruby’s select
and reject
are examples of the filter higher order function. Rust also offers filter, which it calls filter. The filter
documentation reads:
An iterator that filters the elements of iter with predicate.
Ok. That almost makes sense. I know what a predicate is (though I just learned it recently). It’s an expression that returns true or false. But what’s iter
? Let’s go back to Ruby and see how it handles iteration.
In Ruby, you can iterate over anything that’s enumerable:
Arrays, hashes, ranges – these are enumerable in Ruby so you can iterate over their elements and use Ruby’s collection of higher order functions like select
, map
and inject
.
Rust offers arrays, hashes and ranges as well. You can iterate over them…sometimes:
Notice my qualifying ‘sometimes’ above:
The range 1..3
can be iterated over, but an array can not. Let’s follow the error’s advice and use iter()
:
Great. Simply put, iter()
takes a collection converts it into an Iterator; once you have an Iterator you can iterate over it and use all those fancy higher order functions, like filter
.
Filter will make it easy for us to replicate the first part of my Ruby solution, rejecting all inputs that are duplicates of the source word.
Our filter uses the same_word
function as its predicate. If the words are the same, we remove the input. If not, we keep the input. Running this gives us the error:
(expected struct `collections::vec::Vec`,
found struct `core::iter::Filter`) [E0308]
We’ve told Rust that anagrams_for
returns a Vec
, but it’s returning a Filter
. Weird.
This is because Rust’s iterators are lazy. Our call to filter doesn’t actually do the filtering until we try to gather up the results. Ruby’s Enumerator works similarly, as does its lazy enumerable. Rust calls its result-gathering functions Consumers. Since we want to collect the results of our filter, we use the collect
consumer. And, as the documentation points out, we have to tell collect what type we want to get back. Our function promises to return a Vec<&str>
, so let’s have collect give us one of those:
Now we’re sure to compile, right? By now you should know that every time I ask this question the answer is, “No.”
error: the trait `core::iter::FromIterator<&&str>` is not implemented for the type `collections::vec::Vec<&str>`
Hell. What this is saying is that collect can’t give me a Vec<&str>
because I’ve given it a iterator that contains &&str
.
&&str
? Or should I say &&str
??. Where did that second &
come from?
The short answer is that iter()
borrows each element in inputs. Inputs was full of &str
, which became &&str
when they were borrowed by iter()
(and &&&str
if they are borrowed a 3rd time. I don’t think there’s a limit to how many times you can re-borrow something). Herman Radtke’s blog post Effectively Using Iterators In Rust dives into this topic, if you want more details.
We can fix our code by changing it to collect the right type, or we can also tell collect
to figure out the type itself:
The _
is a ‘type placeholder’, which allows us to leave out the type of elements our vector will contain and let Rust determine it for us.
The type placeholder is nice, but it just gets us a new error. Our vector is still full of &&str
elements, and our function says it returns a Vec<&str>
. That’s a no go. We need a way to return Vec<&str>
.
I’ve been using the words ‘reference’ and ‘borrow’ pretty interchangeably without defining what’s going on. There’s a good reason for that – I don’t understand what’s going on. But let me take a stab at it:
If I write:
Then x is a &str
. I read that as “x is borrowing a str”, or “x holds a reference to a str”. Somewhere in memory is the actual str
, and our variable x
is holding a reference to that location.
If we then borrow x again:
Then y
is a &&str
. Or, “y holds a reference to a reference to a str.”
Rust offers a way to get at the thing we are referencing – the entirely un-googleable *
operator.
As far as I can tell *
is not well documented in the Rust book. It’s mentioned in the References and Borrowing section with the almost throwaway sentence:
You’ll also notice we added an asterisk (*) in front of y, making it *y, this is because y is an &mut reference. You’ll also need to use them for accessing the contents of a reference as well.
Emphasis mine. I dunno, maybe I’m off base, but this sounds pretty important. It’s certainly important to us right now. We have a vector full of references to references, and we want to get a vector of just the original references:
We create a new mutable vector and fill it with the &str
references from our filtered collection. We then return that vector, and our code compiles!
Now that we can remove inputs that match our source word, we have to find inputs that are anagrams of our source word. In my Ruby solution, I converted the word to an array of letters with chars
and then sorted the letters alphabetically with sort
:
I can then compare the sorted array of input characters to the sorted array of source characters.
We can do the same thing in Rust. The code looks familiar.
In same_letters
we use chars()
to convert our source and input to an iterable collection of characters, which we then collect
. We then sort those collections and compare them. This looks an awful lot like my Ruby example. Though in Rust, unlike Ruby, the sort()
function sorts the receiver in place, it does not return the sorted collection.
If we run this code against our test suite, nearly everything passes. Some tests cover upper-case characters, but the fix for case insensitivity isn’t that interesting so I’m going to skip it. If you want to see my solution, it’s over here.
Our tests pass, but I’m not happy with the code. I understand why I have to dereference everything, but that code seems totally irrelevant to what the anagrams_for
function is doing. It exists to make the compiler happy.
When I complained about this on Twitter, the Rusting Rubyist Mike Piccolo pointed me towards the filter_map
function. Its description is:
Creates an iterator that both filters and maps elements. If the specified function returns None, the element is skipped. Otherwise the option is unwrapped and the new value is yielded.
‘Unwrapping’ is kind of a weird word, and not one we’ve come across yet. There’s actually a lot of complexity here, which I’m going to entirely skip. The relevant thing we need to know is that I can stick a reference in the type Some
and when I ‘unwrap’ it I get my reference back. So:
Some
’s partner in crime is None
. You always see the two of them together. Taken together they are the “Option” or “Optional” type. If an expression can return something or nothing, you’ll see code like this:
When you deal with the result of this expression you write code that checks, “Did I get Some or None?” and behave appropriately.
Though it’s not about Rust, I found the book Maybe Haskell to be a great introduction to the concepts behind Some and None It can be a hard concept to wrap your head around, though once you do the value it provides is immense.
Knowing that background, the filter_map
description makes more sense. It:
- filters an iterator’s elements, just like
filter
- The filter uses a function that returns Some or None
- If None, the element is removed
- If Some, the reference is returned
Let’s see what our anagrams_for
function would look like if it used filter_map
2 lines shorter. More importantly, I find this easier to understand. The result of filter_map
is what we want to return, no need to loop and dereference a bunch of references.
And that’s it! Not only do we have a working anagram finder, but we (hopefully) know a lot more about Rust that when we started. There’s a lot (a lot) more to Rust than we’ve covered here, but this is where I’m going to stop for now. The next post on this site will return to the topic of Refactoring, and will be the first in a series of posts on Practicing Refactoring.
If you want to read more about Rust, Ruby, Refactoring and other things that start with R (and maybe other letters), maybe check out my totally free newsletter. You can read previous newsletters, or sign up for free. Comments/feedback/&c. welcome on twitter, at ian@ianwhitney.com, or leave comments on the pull request for this post.