Exercism: The RNA Transcription Exercise

As with the Gigasecond exercise, it doesn’t take much to get this to pass. The example solution is:

class Complement
  def self.of_dna(strand)
    strand.tr('CGTA', 'GCAU')
  end

  def self.of_rna(strand)
    strand.tr('GCAU', 'CGTA')
  end
end

(An aside: Did you know that Exercism has example solutions in its Git repo? I did not. I was wondering why 40% of people’s solutions looked exactly the same. Copying and pasting ain’t learning, folks.)
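For anyone who hasn’t met tr: String#tr replaces each character that appears in its first argument with the character at the same position in its second argument, and leaves every other character alone. That makes it a one-line lookup table for nucleotides. A quick illustration:

```ruby
# String#tr maps each character in the first set to the character at the same
# position in the second set: C->G, G->C, T->A, A->U.
dna = "ACGTGGTCTTAA"
rna = dna.tr('CGTA', 'GCAU')
puts rna # prints UGCACCAGAAUU

# Characters outside the first set pass through unchanged, so tr will not
# complain about an invalid nucleotide such as X.
puts "ACGTX".tr('CGTA', 'GCAU') # prints UGCAX
```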

I, personally, have never seen tr in the wild, so that was not my first solution. Mine was super dumb, and much longer.

class Complement
  def self.of_dna(strand)
    ret = ""
    strand.each_char { |x| ret << find_dna_complement_of(x) }
    ret
  end

  def self.of_rna(strand)
    ret = ""
    strand.each_char { |x| ret << find_rna_complement_of(x) }
    ret
  end

  def self.find_dna_complement_of(nucleotide)
    case nucleotide
    when 'C'
      'G'
    when 'G'
      'C'
    when 'T'
      'A'
    when 'A'
      'U'
    end
  end

  def self.find_rna_complement_of(nucleotide)
    case nucleotide
    when 'C'
      'G'
    when 'G'
      'C'
    when 'U'
      'A'
    when 'A'
      'T'
    end
  end
end
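The two case statements above encode a single pairing table twice: the RNA-to-DNA table is just the DNA-to-RNA table read backwards. As a side note, and not the route this post takes, Ruby’s Hash#invert can derive one table from the other so the pairing is written down only once. The constant names DNA_TO_RNA and RNA_TO_DNA and the complement helper are made up for this sketch.

```ruby
# Hypothetical constant: the DNA -> RNA pairings, written down exactly once.
DNA_TO_RNA = { 'C' => 'G', 'G' => 'C', 'T' => 'A', 'A' => 'U' }.freeze

# Hash#invert swaps keys and values, which yields the RNA -> DNA pairings.
RNA_TO_DNA = DNA_TO_RNA.invert.freeze

# Hypothetical helper: complement a strand using whichever table is passed in.
# Hash#fetch raises KeyError for a character that is not in the table.
def complement(strand, table)
  strand.each_char.map { |nucleotide| table.fetch(nucleotide) }.join
end

puts complement("ACGTGGTCTTAA", DNA_TO_RNA) # prints UGCACCAGAAUU
puts complement("UGCA", RNA_TO_DNA)         # prints ACGT
```

The trick relies on every value in the table being unique (each base pairs with exactly one other). With duplicate values, invert keeps only the last key for each value, so it would not be safe for many-to-one mappings.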

My commit message talks a bit about the duplication in this code, and tries to highlight the knowledge duplication I wanted to remove in my next commit. Which I did, thusly:

  #...
  def self.of_dna(strand)
    build_complement_for(strand, "dna")
  end

  def self.of_rna(strand)
    build_complement_for(strand, "rna")
  end

  def self.build_complement_for(strand, type)
    ret = ""
    strand.each_char { |x| ret << public_send("find_#{type}_complement_of".to_sym, x) }
    ret
  end
  #...

Better. But a giant red flag just went up – a parameter named type. That almost always means one thing: I’m trying to implement polymorphism without using classes. Rarely a good idea. Here come the RNA and DNA classes:

class Complement
  def self.of_dna(strand)
    build_complement_for(strand, DNA)
  end

  def self.of_rna(strand)
    build_complement_for(strand, RNA)
  end

  def self.build_complement_for(strand, type)
    ret = ""
    strand.each_char { |x| ret << type.new(x).complement }
    ret
  end
end

class RNA
  attr_accessor :nucleotide

  def initialize(nucleotide)
    self.nucleotide = nucleotide
  end

  def complement
    case nucleotide
    when 'C'
      'G'
    when 'G'
      'C'
    when 'U'
      'A'
    when 'A'
      'T'
    end
  end
end

class DNA
  #...same as RNA except for the complement return values
end
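Because Ruby classes are ordinary objects, build_complement_for can receive DNA or RNA as an argument and call new on whichever one it gets; it never asks which class it has. Here is a runnable reconstruction of this step. The body of DNA is an assumption: it mirrors RNA and uses the DNA pairings from find_dna_complement_of in the first solution. The case statements are written in the one-line when ... then form.

```ruby
class Complement
  def self.of_dna(strand)
    build_complement_for(strand, DNA)
  end

  def self.of_rna(strand)
    build_complement_for(strand, RNA)
  end

  # `type` is a class (DNA or RNA). Nothing here checks which one it is;
  # each nucleotide is wrapped in that class and asked for its complement.
  def self.build_complement_for(strand, type)
    ret = +"" # unary plus gives a mutable empty String
    strand.each_char { |x| ret << type.new(x).complement }
    ret
  end
end

class RNA
  attr_accessor :nucleotide

  def initialize(nucleotide)
    self.nucleotide = nucleotide
  end

  # RNA nucleotide -> paired DNA nucleotide
  def complement
    case nucleotide
    when 'C' then 'G'
    when 'G' then 'C'
    when 'U' then 'A'
    when 'A' then 'T'
    end
  end
end

# Assumed body for the elided DNA class: same shape as RNA, with the
# pairings from the first solution's find_dna_complement_of.
class DNA
  attr_accessor :nucleotide

  def initialize(nucleotide)
    self.nucleotide = nucleotide
  end

  # DNA nucleotide -> paired RNA nucleotide
  def complement
    case nucleotide
    when 'C' then 'G'
    when 'G' then 'C'
    when 'T' then 'A'
    when 'A' then 'U'
    end
  end
end

puts Complement.of_dna("ACGT") # prints UGCA
puts Complement.of_rna("UGCA") # prints ACGT
```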

Ok, but why don’t RNA and DNA know anything about strands? That seems like something they should know.

class Complement
  def self.of_dna(strand)
    DNA.new(strand).complement
  end

  def self.of_rna(strand)
    RNA.new(strand).complement
  end
end

class RNA
  attr_accessor :strand

  def initialize(strand)
    self.strand = strand
  end

  def nucleotides
    strand.chars
  end

  def complement
    nucleotides.inject("") do |ret, nucleotide|
      ret << complement_of(nucleotide)
    end
  end
  #...
end
#...
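The inject("") block works because String#<< appends to the string in place and returns that same string, so each pass hands the growing result on to the next. If that feels too clever, map followed by join builds the same String without an explicit accumulator. The RNA_TO_DNA table below is a hypothetical stand-in for the elided complement_of method, using the pairings from the earlier RNA class.

```ruby
# Hypothetical stand-in for the post's elided complement_of:
# RNA nucleotide -> DNA nucleotide.
RNA_TO_DNA = { 'C' => 'G', 'G' => 'C', 'U' => 'A', 'A' => 'T' }.freeze

nucleotides = "UGCA".chars

# inject: start from a mutable empty String; << appends and returns the
# receiver, which becomes the accumulator for the next nucleotide.
via_inject = nucleotides.inject(+"") { |ret, n| ret << RNA_TO_DNA[n] }

# map + join: collect each complement into an Array, then concatenate.
via_map = nucleotides.map { |n| RNA_TO_DNA[n] }.join

puts via_inject # prints ACGT
puts via_map    # prints ACGT
```

One practical difference: map/join never mutates a string literal, so it keeps working if a file enables frozen string literals, where inject("") followed by << would raise FrozenError.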

And then we can make it more Ruby-like by following Ruby’s to_* conversion idiom, creating to_dna and to_rna methods.

Also, here I’m moving the responsibility for knowing about complements. Previously, DNA knew what its RNA complement was. Why? DNA should know DNA complements, and RNA should know RNA complements. If DNA wants to know about RNA, it should ask RNA.

class Complement
  def self.of_dna(strand)
    DNA.new(strand).to_rna
  end

  def self.of_rna(strand)
    RNA.new(strand).to_dna
  end
end

class RNA
  attr_accessor :strand

  def self.complement_for(nucleotide)
    {"C" => "G", "G" => "C", "T" => "A", "A" => "U"}[nucleotide]
  end

  #...

  def to_dna
    nucleotides.inject("") do |ret, nucleotide|
      ret << DNA.complement_for(nucleotide)
    end
  end
end

class DNA
  #... you can imagine what this looks like
end
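Filling in the DNA class that the post leaves to the imagination gives a runnable version of this step. The DNA body is an assumption: the mirror image of RNA, producing DNA nucleotides with the pairings from the earlier find_rna_complement_of method. Each class knows only how to build its own nucleotides, and asks the other class when it needs to convert.

```ruby
class RNA
  attr_accessor :strand

  # RNA knows how to build RNA: DNA nucleotide -> paired RNA nucleotide.
  def self.complement_for(nucleotide)
    { "C" => "G", "G" => "C", "T" => "A", "A" => "U" }[nucleotide]
  end

  def initialize(strand)
    self.strand = strand
  end

  def nucleotides
    strand.chars
  end

  # To become DNA, RNA asks DNA for each paired nucleotide.
  def to_dna
    nucleotides.inject(+"") { |ret, n| ret << DNA.complement_for(n) }
  end
end

# Assumed shape of the elided DNA class.
class DNA
  attr_accessor :strand

  # DNA knows how to build DNA: RNA nucleotide -> paired DNA nucleotide.
  def self.complement_for(nucleotide)
    { "C" => "G", "G" => "C", "U" => "A", "A" => "T" }[nucleotide]
  end

  def initialize(strand)
    self.strand = strand
  end

  def nucleotides
    strand.chars
  end

  # To become RNA, DNA asks RNA for each paired nucleotide.
  def to_rna
    nucleotides.inject(+"") { |ret, n| ret << RNA.complement_for(n) }
  end
end

puts DNA.new("ACGT").to_rna # prints UGCA
puts RNA.new("UGCA").to_dna # prints ACGT
```

Written out in full, initialize and nucleotides are identical in both classes, which is exactly the duplication a shared superclass can remove.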

Finally, I tackle the obvious question of inheritance. DNA and RNA are both nucleic acids, and their behavior is almost exactly the same. I’m comfortable using inheritance here because I don’t see any of the common inheritance problems arising. This is a shallow, narrow object family, and there aren’t going to be weird grandchild classes or partial API implementations. My final solution to the problem:

class Complement
  def self.of_dna(strand)
    DNA.new(strand).to_rna
  end

  def self.of_rna(strand)
    RNA.new(strand).to_dna
  end
end

class NucleicAcid
  attr_accessor :strand

  def initialize(strand)
    self.strand = strand
  end

  def nucleotides
    strand.chars
  end

  def transcribe_to(acid)
    nucleotides.inject("") do |ret, nucleotide|
      ret << acid.complement_for(nucleotide)
    end
  end

  def to_dna
    raise StandardError, "Call this on descendants"
  end

  def to_rna
    raise StandardError, "Call this on descendants"
  end

  def self.complement_for(nucleotide)
    raise StandardError, "Call this on descendants"
  end
end

class RNA < NucleicAcid
  def self.complement_for(nucleotide)
    {"C" => "G", "G" => "C", "T" => "A", "A" => "U"}[nucleotide]
  end

  def to_dna
    transcribe_to(DNA)
  end

  def to_rna
    self
  end
end

class DNA < NucleicAcid
  def self.complement_for(nucleotide)
    {"C" => "G", "G" => "C", "U" => "A", "A" => "T"}[nucleotide]
  end

  def to_rna
    transcribe_to(RNA)
  end

  def to_dna
    self
  end
end

My one hesitation here was having a to_dna method on DNA and a to_rna method on RNA. The Liskov Substitution Principle says they belong there, since any NucleicAcid should respond to both, but they still looked redundant to me. However, Ruby has a lot of methods like this. For example, String instances respond to to_s, and Integers respond to to_i. Realizing that made me a lot more comfortable with my approach.
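Ruby’s built-in to_* methods share one more property: by convention they return an instance of the class they name, and when the receiver already is one, String#to_s and Integer#to_i simply return the receiver. In the final version above, RNA#to_rna follows that rule but DNA#to_rna does not, because transcribe_to returns a plain String. The tests only go through Complement, so nothing fails, but code that calls to_rna on a mix of DNA and RNA objects would get two different types back. One possible variation, a sketch rather than the author’s code, wraps the result in acid.new and has Complement unwrap strand, so the public API still returns Strings. The raising stubs from NucleicAcid are left out for brevity.

```ruby
class Complement
  # The public API still returns plain Strings, so unwrap the result.
  def self.of_dna(strand)
    DNA.new(strand).to_rna.strand
  end

  def self.of_rna(strand)
    RNA.new(strand).to_dna.strand
  end
end

class NucleicAcid
  attr_accessor :strand

  def initialize(strand)
    self.strand = strand
  end

  def nucleotides
    strand.chars
  end

  # Variation: wrap the transcribed String in the target class, so to_rna
  # always returns an RNA and to_dna always returns a DNA.
  def transcribe_to(acid)
    acid.new(nucleotides.inject(+"") { |ret, n| ret << acid.complement_for(n) })
  end
end

class RNA < NucleicAcid
  def self.complement_for(nucleotide)
    { "C" => "G", "G" => "C", "T" => "A", "A" => "U" }[nucleotide]
  end

  def to_dna
    transcribe_to(DNA)
  end

  def to_rna
    self
  end
end

class DNA < NucleicAcid
  def self.complement_for(nucleotide)
    { "C" => "G", "G" => "C", "U" => "A", "A" => "T" }[nucleotide]
  end

  def to_rna
    transcribe_to(RNA)
  end

  def to_dna
    self
  end
end

dna = DNA.new("ACGTGGTCTTAA")
puts dna.to_rna.class                  # prints RNA
puts dna.to_rna.to_rna.class           # prints RNA
puts Complement.of_dna("ACGTGGTCTTAA") # prints UGCACCAGAAUU
```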

As has been true of all my Exercism solutions, this code goes far beyond what it needs to do to get the tests to pass. It’s also about 60 lines longer than Exercism’s own example solution. Is there value in this verbosity? That question doesn’t have a single answer. If I were writing this code to help me pass a Biology 101 class, then no. If I were writing it for a synthetic biology lab that creates its own nucleic acids? Then maybe.

But I’m writing it for Exercism (and for these blog posts). Exercism tells me to “Make the tests pass. Keep the code as simple, readable, and expressive as you can.” And it advises nitpickers to make suggestions that make the code:

Simple
Readable
Maintainable
Modular

Those four qualities are pretty close to the “4 Rules of Simple Design”, as stated by Corey Haines:

Tests pass
Express intent
No duplication of knowledge
Small

Or, if you prefer Sandi Metz’s acronym, code that is TRUE:

Transparent
Reasonable
Usable
Exemplary

These descriptions of “good design” (or “better design”, if you’re Corey Haines) are all different words for code that people have found easy to work with over a long period of time. That long-term part is the point: design doesn’t matter if you only want to run the code once. If that’s all you need, just slap in Exercism’s example code and move on. Plenty of people do.

But if you’re trying to design better code, then you need to look at more than just making a few tests pass. The questions I like to ask myself are:

Will I understand this code if I look at it in 6 months?
Will my co-workers understand this code when they have to fix it?
Will future maintainers be able to extend this code with very little hassle?
Will other teams be able to extend this code without problems?

The first question affects just me, the second affects 3-5 people, the third affects dozens, and the fourth affects an unknown number of people. Thinking about how many people will be able to easily understand, use, or modify your code is a useful way of thinking about design. Poorly designed code will not satisfy many people; well-designed code will.

So, with that in mind, let’s circle back to the original question. Is there value in my Exercism solution? I obviously think so, but I’m biased. I’ll leave the question for you to answer. Which code would you rather work with?