SRY: Sometimes Repeat Yourself
Any programmer worth their salt will have heard of the DRY principle: Don’t Repeat Yourself. The idea is that repetition is bad: it makes for more code to read through, and it makes code harder and more error-prone to maintain because you have to make the same change in multiple places.
These are valid points, but like with any guideline or rule of thumb, it’s oversimplified, and the real answer is always “it depends”. So today I’d like to offer some counterarguments, in the form of situations where a little repetition can be a good thing. I’m calling it SRY: Sometimes Repeat Yourself.
For divergence
Sometimes repetition is by accident, rather than by necessity. A Person
class
and a Country
class may both have a name
property, with corresponding
getters and setters, but that doesn’t mean you should extract a NamedEntity
superclass (or mixin, depending on the language). Imagine you did, and you
later wanted to split the person’s name into a firstName
and lastName
(not recommended,
by the way)?
In general, if it is conceivable and valid for the repeated pieces of code to diverge and change independently of each other, they do not have to be the same, they just happen to be the same right now, and it is probably a good idea to leave the repetition in.
For readability
Some people go to great lengths to avoid repeating even a single line of code.
This isn’t always a good idea, because it comes at the cost of more abstraction
layers. But wait, you say, I thought abstraction was a good thing? Again, it
depends. Abstractions can make it easier to reason about a piece of code, but
they can also make code harder to read. Consider the canonical for
loop in
C-like languages:
for (int i = start; i < end; i++) {
doSomething();
}
You’re likely to see this all over the place, and indeed it is a bit error-prone, so you might think: let’s create an abstraction, so I have to write it only once!
for_numbers_between(start, end, doSomething);
This might be perfectly clear to you at the time you write it, but it does add
a layer of opacity to the code. The for
loop was completely transparent to
anyone familiar with this idiom, but when stumbling upon your helper function,
they wonder: what exactly does this function do? Is end
included in the
range? Does the function pass the loop counter to doSomething
? Can I return a
boolean to break out of the loop? In JavaScript: what will the value of this
be in the callback? The only way to be sure is to look up the definition of
for_numbers_between
, which will most likely be buried far away in some
utils
package.
So when adding abstractions like these, besides considering the gain in reduced repetition, also be sure to consider the loss in transparency, and weigh them against each other.
For searchability
Somewhat related to the previous point, some people want to avoid repetition in
strings like file paths and naming prefixes. For example, suppose your style
guide says that all CSS classes should be prefixed with dirname-filename-
to
avoid clashes between packages:
.profile-description-edit-bio {
...
}
.profile-description-edit-interests {
...
}
When referring to such a CSS class in code (or even SASS or another CSS preprocessing language), it’s tempting to extract the prefix into a string constant and put it at the top of the file somewhere:
const CSS_PREFIX = "profile-description-edit-";
...
element.addClass(CSS_PREFIX + "bio");
Not only does this suffer from the same readability issue as in the previous
section, but it also lacks another useful property: searchability. If a reader
sees the CSS rule for .profile-description-edit-bio
in the style sheet, they
might wonder where it is used. Since this is the über-dynamic stringly typed
world of the web, the only viable way of finding out is to search for the name
of the CSS class throughout the project. But thanks to the string pasting, that
particular usage is not going to show up. In the best case, the reader tries to
search for some prefixes or suffixes of the class name, like bio
, and might
find it that way among a pile of false positives. In the worst case, they might
conclude that the CSS rule is unused, and remove it! Conversely, any use of
such string pasting in the codebase makes removal of actually unused CSS
rules difficult to do.
So here too, it’s better to have the string constant spelled out in its entirety every time it’s used. Just pretend it’s an identifier, like the name of a variable or a function; you don’t chop those up either. It’s a bit more verbose, but much easier to maintain with in the long term.