Exercism: The Hamming Exercise
We need to find the differences between two strings. But those strings are also very array-like; as in, “On this DNA strand there’s a G at the first position.” And Ruby already has a lot of nice syntax for comparing arrays. So I started thinking about this problem as one with an array-focused solution.
I also thought about the Ruby Set library, but that won’t work because Sets want unique objects, so a “G” in the first position and a “G” in the 8th position would be squished down to just one “G”.
strand = %w(g a t t a g)
=> ["g", "a", "t", "t", "a", "g"]
strand.to_set
=> #<Set: {"g", "a", "t"}>
That won’t work.
So, following my standard approach, I did the easiest fastest thing that got the tests passing. And it looked like this
class Hamming
def self.compute(a,b)
a = a.chars
b = b.chars
max = (a.count < b.count) ? a.count : b.count
a = a[0,max]
b = b[0,max]
y = a.zip(b)
y.inject(0) {|ret, h| ret += 1 if h.first != h.last; ret}
end
end
Oh, yeah, that’s nasty. The duplicate logic; the weird, meaningless sorting and trimming of arrays; the block that takes a few seconds to puzzle out. It’s rough stuff.
But, it passed my tests, and that’s all I wanted. With tests green, I can refactor. And refactor I did! Removing the array trimming was an easy fix. Then I began to pull out Strand behaviors
class Strand
include Enumerable
attr_accessor :collection
def self.parse(strand_string)
self.new(strand_string)
end
def initialize(strand_string)
self.collection = strand_string.chars
end
def each(&block)
collection.each(&block)
end
end
Like I said, I quickly thought of a Strand as being array-ish. So Strand
has a way of parsing a string into an array and then acts as a thin wrapper around the array.
But if an array for each Strand is good, then surely one more array must be better! After noticing the behaviors that required two Strands, I decided to contain that logic in a Strands
object.
This was a mistake. And even when I tried to make it better it was still a mistake. The compute
method became nedlessly complex and tied to its knowledge of both Strand and Strands. Time to ditch this and revisit how I originally wanted to solve this problem.
I was initially drawn to a Strand as an array, largely because of Ruby’s implementation of array subtraction/diffing/etc. The Ruby implementaition of subtraction doesn’t work for Hamming, but I still liked that syntax. In my perfect world, this would work:
hamming_differences = Strand.parse('GAT') - Strand.parse('AAT')
Your own desires are a great design tool. This is the syntax you want to see? Write code that makes it happen. Don’t know how? Learn!
So I ditched Strands. As I say in the commit:
Strands didn’t really give me anything beyond a way to manipulate two Strand objects. It seemed unecessary if I could have a Strand know how to do the maniuplation. Like, if I want to add 1 + 2, I don’t have an Integers object that does the math, the Integer object knows how to do it.
And, after I googled “What the hell is Hamming?” and discovered that it is a very generic idea used all the time, I also decide to separate the Hamming logic in such a way that I can add it to any class.
class Hamming
def self.compute(a,b)
(Strand.parse(a) - Strand.parse(b)).count
end
end
module Hammable
def -(other)
sorted = [self,other].sort_by { |x| x.count }
combined = sorted.first.zip(sorted.last)
x = combined.select {|strand_set| strand_set.first != strand_set.last}
self.class.new(x)
end
end
class Strand
include Enumerable
include Hammable
def self.parse(strand_string)
self.new(strand_string.chars)
end
def initialize(strand_array)
self.collection = Array(strand_array)
end
def each(&block)
collection.each(&block)
end
private
attr_accessor :collection
end
Now I have a Hammable
module I can add to whatever. The -
method implementaiton is ugly, but it works.
I started to doubt the wisdom of overriding the array -
method. It is probably a really dumb idea. Imagine the unexepcted surprises. Want to subtract two normal arrays, it works one way. Subtract two hammable arrays and it’s totally different. Yikes! And if you had one hammable and one normal array, the behavior would change depending on the order you used. Terrible. So a better approach is to define a new method, say hamming_difference
Strand.parse(a).hamming_difference(Strand.parse(b))
It’s a better idea, but it’s not anywhere as nice to read. Oh, well. Sometimes we can’t have everything we want.
And there’s the question of what type of object it should return. Right now it will return a Strand
as it was called by a Strand
. But a better approach might be to have it return a HammingDifference
object (itself just a thin wrapper around an array).
module Hammable
def hamming_difference(other)
sorted = [self,other].sort_by { |x| x.count }
combined = sorted.first.zip(sorted.last)
x = combined.select {|strand_set| strand_set.first != strand_set.last}
HammingDifference.new(x)
end
end
class HammingDifference < SimpleDelegator; end
But what does that give us? Is there an appreciable difference between Hamming and HammingDifference? If so, I’m not seeing it. So now, maybe after all this code re-arranging, we should move the Hamming logic back under Hamming
.
class Hamming
def self.compute(a, b, type = Strand)
self.new(a, b, type).count
end
def initialize(a, b, type)
self.one = type.parse(a)
self.two = type.parse(b)
end
def count
difference.count
end
def difference
one.zip(two).select {|pair| Comparison.different?(pair)}
end
class Comparison
def self.different?(couple)
couple.first != couple.last &&
!couple.last.nil? &&
!couple.first.nil?
end
end
private
attr_accessor :one, :two
end
require 'delegate'
class Strand < SimpleDelegator
def self.parse(strand_string)
self.new(strand_string.chars)
end
end
And finally, maybe, I’m ok with the code. Well, the attributes inside of Hamming
are badly named, but that’s minor. compute
has become a declarative builder method that instantiates an instance of Hamming
and call’s the instance’s count
method.
Gone is Hammable
, since its logic now lives in Hamming
. We no longer have to mix that code into Strand
, though we do now have to tell what kind of objects Hamming
should create. I’ve defaulted it to Strand
here, but Hamming
could easily work with other strings we want to find the Hamming distance of.
Comparison
lives inside of the Hamming
namespace, which looks a little weird. Why not just have difference?
be a method in Hamming
? Well, I wanted to encapsulate the logic of what constitutes a difference between two strands: both strands have to have a letter and those letters have to be different from one another. And to encapsulate the logic inside of a Hamming
method seemed weird. Because somewhere in the code we’nd end up calling self.different?(pair)
and that wasn’t a method I wanted a Hamming
instance to have. For some reason. Even I’m a little unclear on why I didn’t like it. Further proof that code design is frequently more an art that a science.
Summary
I think the code here is more reasonable than my work on the Bob problem. There’s less cleverness for cleverness’ sake. Hamming
works easily with Strand
objects, but can work with other objects. The code is simple enough to understand. I can see this implementation being one that I’d be happy to work with.
Possible dowsides are that the array manipulation will make this slow. I’m sure there’s a much better algorithm out there for figuring out the hamming distance. But Exercism isn’t about performance tuning, so I’m not going to worry about that.