Ashley Sheridan.co.uk

Programming by Coincidence

(or why copy-and-paste development is a bad idea)

This past year, working with CodeSniffer and, more recently, SonarQube has made me acutely aware of the quality and consistency of the code I write. SonarQube in particular has been great for this because of the way our deployment procedures operate at TMW: it integrates extremely well with TeamCity and covers all the languages we develop in here. This moves the responsibility for running code checks away from the individual and into the deployment process, effectively enforcing the checks for everyone on the team whenever a deployment happens.

One of the concerns this has highlighted is duplicated code, which is often a symptom of copy-and-paste development.
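To give a feel for what "duplicated code" means to a tool, here is a rough Python sketch of line-based duplicate detection. It is not SonarQube's actual algorithm (SonarQube's detection is more sophisticated than this); it simply assumes that normalising each line, sliding a fixed-size window over the file and reporting any window seen more than once is a reasonable approximation. The function name `find_duplicate_blocks` and the default window of 3 lines are hypothetical choices for illustration.

```python
from __future__ import annotations

from collections import defaultdict


def find_duplicate_blocks(source: str, window: int = 3) -> dict[tuple[str, ...], list[int]]:
    """Map each run of `window` lines that appears more than once to the
    1-based line numbers where each copy starts.

    Surrounding whitespace is stripped and blank lines are skipped, so a
    re-indented copy still counts as a duplicate.
    """
    # Keep (original line number, normalised text) for every non-blank line.
    lines = [
        (number, text.strip())
        for number, text in enumerate(source.splitlines(), start=1)
        if text.strip()
    ]
    seen: defaultdict[tuple[str, ...], list[int]] = defaultdict(list)
    for i in range(len(lines) - window + 1):
        block = tuple(text for _, text in lines[i : i + window])
        seen[block].append(lines[i][0])
    # A block is only a duplicate if it starts in more than one place.
    return {block: starts for block, starts in seen.items() if len(starts) > 1}


if __name__ == "__main__":
    sample = (
        "def area(w, h):\n"
        "    total = w * h\n"
        "    total = round(total, 2)\n"
        "    return total\n"
        "\n"
        "def price(w, h):\n"
        "    total = w * h\n"
        "    total = round(total, 2)\n"
        "    return total\n"
    )
    for block, starts in find_duplicate_blocks(sample).items():
        # prints: [2, 7] ('total = w * h', 'total = round(total, 2)', 'return total')
        print(starts, block)
```

Real tools typically compare tokens rather than raw lines, which makes them resistant to renamed variables and reformatting; this sketch only catches near-verbatim copies.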

Reading online, you'll find plenty of mixed opinions on this practice, ranging from “no problem” through to “you're going to hell for even suggesting it”. So is there really a problem?

Assumptions and Answers

The answer lies with the assumptions we make. If you're relatively new to the development scene, you probably don't have the experience to understand the baggage that can come with copying code (whether it's your own or somebody else's), and maybe you don't even understand how the code works, which is why you're copying it in the first place. If you're more experienced, you'll understand some of the inherent problems with third-party code, and you'll also know that the code you wrote a year ago is probably not as good as the code you're writing now.

The Pros and Cons

So what are the pros of copy-and-paste development?

And the cons?

Shades of Grey

The fact is, though, it's not always a simple black-and-white situation. Consider a case where you need to convert a hex-formatted colour string into its component RGB values. Sure, you could write this yourself, but it's probably far quicker to search online and use something that someone else has already written.
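For a sense of how little code is involved, and how easy it is to understand once you read it, here is a minimal Python sketch of that conversion. The function name `hex_to_rgb` and the support for the three-digit shorthand (`#f80`) are my own assumptions rather than anything from a particular snippet you might find online. The key idea is that each pair of hex digits is one colour channel written in base 16, so `ff` is 255.

```python
import string


def hex_to_rgb(colour: str) -> tuple[int, int, int]:
    """Convert a hex colour such as "#ff8800" or "f80" into (r, g, b)."""
    value = colour.strip().removeprefix("#")
    # Expand the three-digit shorthand: "f80" becomes "ff8800".
    if len(value) == 3:
        value = "".join(ch * 2 for ch in value)
    # Check the characters explicitly: int(..., 16) alone would also
    # accept things like "+f" or " f", which aren't valid colour digits.
    if len(value) != 6 or any(ch not in string.hexdigits for ch in value):
        raise ValueError(f"not a hex colour: {colour!r}")
    # Each pair of hex digits is one 0-255 channel.
    return int(value[0:2], 16), int(value[2:4], 16), int(value[4:6], 16)
```

For example, `hex_to_rgb("#ff8800")` and `hex_to_rgb("f80")` both return `(255, 136, 0)`. `str.removeprefix` needs Python 3.9 or later.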

This is where some people stop, though, and they shouldn't. When you copy something that someone else has written, you should at the very least have a basic understanding of what you've just copied. I recently saw a victim of copy-and-paste programming: a developer had committed code that used curly (angled) quotation marks around a string. It had obviously been copied from a website and not checked over. The resulting syntax error exposed the problem fairly quickly, but it shows that not checking code sourced from elsewhere can leave software working by pure blind luck at best, or doing something very different from what was intended at worst.
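Curly quotes like these are easy to catch mechanically. Here is a small Python sketch that reports the position of every typographic quotation mark in a piece of pasted source; the name `find_suspect_chars` and the set of characters it checks are assumptions for illustration, not a feature of any particular linter. It reports rather than auto-replaces, because a curly quote inside a string literal might be intentional.

```python
# Typographic quotation marks that commonly sneak in when code is copied
# from a web page or word processor, with the plain ASCII character that
# was almost certainly intended.
SUSPECT_CHARS = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}


def find_suspect_chars(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, character) for each suspect character, 1-based."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPECT_CHARS:
                hits.append((line_no, col, ch))
    return hits
```

For example, `find_suspect_chars('name = \u201cAshley\u201d')` returns `[(1, 8, '“'), (1, 15, '”')]`, pointing straight at the two characters that would cause the syntax error.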

The less risky consequence of copy-and-paste development is the technical debt of maintaining duplicate copies of code wherever they've been strewn throughout your codebase. Any time you need to update that code, you end up making the same change in multiple places; this is extra work and effort, and it really ought not to be necessary.

Dangerous Territory

Taking the above case of copying third-party code, imagine the copied code was more than a single line: a whole function or block. Unless the copier gives it more than a cursory glance, that code could, in theory, be a Trojan horse waiting to happen. It could be argued that malicious code like this would be found quickly and downvoted through whatever mechanism the code-hosting site offers, but what if you were the first person to be stung by it? At some point voting mechanisms fail, and they shouldn't be relied upon for something as potentially critical as this.

The second case, duplicated code, could become disastrous if you're up against the clock and need to fix an urgent bug. Did you fix absolutely every instance of the duplication? How long did it take compared to fixing it in one place?

The Solution

In an ideal world, everyone would understand every line of code they write, and everything would be written to make maximum reuse of shared code. Unfortunately, we don't live in such a world. Developers have to learn their craft by doing, and we often find ourselves putting off refactoring when we have a tight deadline to meet.

So what can we really do if this is the reality we're facing? The simple answer is to do our best and remember that we're not perfect; we just need to aim for perfection all of the time.

Give third-party code a look-over to see if anything looks really odd. Should that PDF library really be calling eval() like that? Are those single and double quotes really meant to be angled, or was that a mistake by the original author? Does your linter flag anything as wrong with the code? (And who isn't using some kind of linter these days?)
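As a sketch of the kind of look-over a tool can help with, here is a Python function that uses the standard library's `ast` module to list the lines where code calls `eval()` or `exec()` directly. It only understands Python source (a library in another language would need a tool for that language), `find_eval_calls` is a hypothetical helper rather than any real linter's rule, and it deliberately misses indirect calls such as `builtins.eval`. A useful side effect: `ast.parse` raises `SyntaxError` on source containing curly quotes, so that kind of copy-paste damage fails loudly here too.

```python
import ast


def find_eval_calls(source: str) -> list[int]:
    """Return the sorted line numbers of direct calls to eval() or exec().

    Raises SyntaxError if the source isn't valid Python.
    """
    tree = ast.parse(source)
    lines = []
    # ast.walk visits every node in the tree (breadth-first, so not in line order).
    for node in ast.walk(tree):
        # Only plain-name calls such as eval(x) are matched; aliased or
        # attribute calls (e.g. builtins.eval) slip past this sketch.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in {"eval", "exec"}
        ):
            lines.append(node.lineno)
    return sorted(lines)
```

Running it over a pasted file gives you a short list of lines to read carefully; an empty list doesn't prove the code is safe, it just means this particular red flag isn't there.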

If you're copy-pasting your own code into a project, take a step back and ask yourself: would it really take that much longer to refactor it properly now? Is 10 minutes now an acceptable trade-off against an hour's worth of 5-minute updates over the coming months? If you won't be the next person working on this code, will that person know that changes may need to be made in more than one place? And will they make fun of you about it at the water cooler afterwards?
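To make that trade-off concrete, here is a small, entirely hypothetical Python example (the function names and the money-formatting scenario are invented for illustration). The "before" half is what pasting your own code tends to produce; the "after" half is the refactor, where one helper owns the shared logic.

```python
# Before: the same formatting pasted into two functions. Both copies
# render a refund as "£-5.00", and fixing that means editing both.
def invoice_line_before(item: str, amount: float) -> str:
    return f"{item}: £{amount:,.2f}"


def refund_line_before(item: str, amount: float) -> str:
    return f"Refund for {item}: £{amount:,.2f}"


# After: one helper owns the formatting, so the negative-amount fix
# (and any future change) is made in exactly one place.
def format_money(amount: float, symbol: str = "£") -> str:
    """Format an amount with a currency symbol, thousands separators and 2dp."""
    sign = "-" if amount < 0 else ""
    return f"{sign}{symbol}{abs(amount):,.2f}"


def invoice_line(item: str, amount: float) -> str:
    return f"{item}: {format_money(amount)}"


def refund_line(item: str, amount: float) -> str:
    return f"Refund for {item}: {format_money(amount)}"
```

With the refactored version, `refund_line("Widget", -5)` gives `"Refund for Widget: -£5.00"` and `invoice_line("Widget", 1234.5)` gives `"Widget: £1,234.50"`; the next person to touch the formatting only has one place to look.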