Wednesday, October 2, 2013

The Lesser Evil, avoiding Copy and Paste

I'm a great supporter of clean code. My own take on this can be found in an earlier post. The most common issue that I find is Copy and Paste-programming, and the most common explanation is lack of time. The problem is that it seldom saves time, even in the short run.

Copy and Paste-programming is a time thief. Every time. Even that deadline you have in 2 weeks, 2 days or 2 hours is endangered.

I'm hoping this post may inspire some hard-pressed-for-time developers to find the resolve to tackle the demon that is Copy and Paste, and come out feeling a little bit better and actually delivering more in less time.

The rationale

You probably know that Copy and Paste is a bad thing. Unfortunately most focus is on long-term maintenance aspects when explaining why it's bad. This makes for a perfect rationale when you're in a hurry. "I'll Copy and Paste this now, just to get the functionality done in time. I know I'll have to pay for this later during maintenance, but I don't see any other choice.".

I'd like to point out that the problems wíth Copy and Paste are much more severe than this, and that there really are other choices! Remember that evils are seldom equal, and if you have two evils to choose from, go for the lesser evil!

The maintenance argument

The main problem with the maintenance argument is that maintenance is not strictly separated from development. If you're lucky enough to work in a really agile environment, there's really no such thing as maintenance anyway, just a continuous number of releases. So which release/sprint/deploy should pay the cost of later, in favor of now?

It's seldom old code that you're Copying and Pasting. It's typically relatively new code, which means that it's still in rapid movement. Chances are, that within those 2 weeks/days/hours to release or end of the sprint, you'll have revisited that Copy and Pasted code more than once. In which case you'll have to propagate the changes to both copies, or forget one, and then have to bugfix it during acceptance testing - or worse hotfix it after deployment.

In either case, you're paying the cost for the Copy and Paste, not in that magical later slow phase of maintenance, but right now, when time's most at a premium. You're not saving time. You're losing time, and the reason you have so little time to lose is partially because of many small decisions like this.

Why is it likely that it's new code you're Copying and Pasting? Because otherwise it's unlikely that you know that the code is there to Copy! The other reason is that the most stressed-for-time releases are the major ones, especially the first. These are times where there is lots and lots of new code, also increasing the chances that it's new code. Finally, if the code is really old and stable, there's at least a chance that the functionality in question really has been factored out to common ground and there's no need any longer to Copy and Paste.

So, the "we'll gain time now, and pay for it later but that's ok"-argument is simply wrong. You're not gaining time now. You're losing time now.

The lesser evil

There are however options. Copy and Paste is the greater evil, but what are the lesser evils? For example:
  • Extract the code snippet to a general miscellanenous utility type. So, now you have a type with various disconnected pieces of code. A bucket full of... That's a lesser evil.
  • Move it to a common base class. So, now you're putting support code in a base class, and using inheritence to extend rather than specialize. That's a lesser evil.
  • Make the code snippet in question a public method just where it happens to be. So, now you have an unholy dependency between two types that probably should not have the dependency. That's a lesser evil.
All of these lesser evils are explicitly known to the compiler and fairly easy to safely refactor later, and it might even be good strategy since usually a pattern emerges after a while and it becomes clearer just where the common code should reside. There's only one copy of the code in either of these lesser evil alternatives, and that has to be a good thing!

Even if you never get to the refactoring stage, these are still lesser evils than the alternative - Copy and Paste.

Hopefully some readers will here find some supporting arguments and useful techniques reducing the amount of Copy and Paste.

Sunday, June 9, 2013

Aggressive Coding

Why you should code agressively, not defensively

Defensive coding is a concept that has it's origin in the absolute first ideas about programming as a craft, at least 40 years ago. Wikipedia describes it as something "intended to ensure the continuing function of a piece of software in spite of unforeseeable usage of said software". Apparently principles that can be characterized as defensive coding techniques are still being taught, or at least not actively discouraged.

Defensive coding sounds good in theory, but in practice it tends to excarberate the problems in question, and clutter up the code making it harder to understand and refactor.

A typical idea in defensive coding, is the Wikipedia example of copying strings from one buffer to another. The idea is that if the caller provides a longer source string than expected, this might in the case of C/C++ open up for the classic buffer overrun security vulnerability.

The defensive coder will, as the example shows, provide a function that checks the maximum buffer length and silently refuses to copy more than that to the destination buffer. This is bad and dangerous!

The defensive coder has now just hidden a serious bug in the calling code. If the contract states that 1000 characters is the maximum length of the input, the caller must ensure this and the callee refuse to accept anything that violates the contract.

The agressive coder will instead throw an exception or simply terminate the program if the function is called with a source larger than the allowed 1000 characters. This is safe and secure programming!

In terms of my current preferred language C#, I see this principle violated in a variety of ways. One frequent pattern is checking return results from other library functions for NULL, or empty strings etc, and then attempt to silently do something despite that this was unexpected. This typically indicates that the programmer does not know the contract of the method s(he) is calling. Do know the contract and ensure to follow it when providing input, and assume that it is followed for the produced output!

I recently rewrote a major functionality for a client and this also involved updating and refactoring the dependent code. To my horror a huge amount of code was devoted to checking the outputs of other methods, even to the extent of catch-all clauses silently ignoring any and all problems.

Instead of checking outputs from other other code because 'maybe it can return a NULL' - find out if it can, and what the appropriate action is! If you can't find out with reasonable effort and it doesn't seem a useful response, just use the return value and let your code blow up with a NullReferenceException should it in fact happen. This will alert you to the problem, and you can then find out what you really should do about it. Do check the inputs to your own code, and when the caller violates your contract report this with an exception.

Controlled crashing is good when it's because a caller violates a contract!

Agressive coding increases the chance of problems being caught and fixed early, and reduces the amount of clutter in the code immensly. This in turn lets you concentrate on what your code should do, instead of what someone elses code should not.

Thursday, April 25, 2013

The shortest book on good programming, ever!

The Coders Decalogue

This text is about making software work better and saving huge amounts of time, irritation and frustration for developers, users, customers and other stakeholders in the software business. Which likely means you.

When not developing my own software AxCrypt and Xecrets, I work as a contractor and consultant. In my work, I work with new software and old software. I work quite a bit with advanced troubleshooting and performance optimization in the .NET area.

Over the years, I've come to realize that I spend most of my time as a developer, doing things I wouldn't need to do if just a few simple rules are followed. I'd still have more than enough to do, no worries, but I'd be delivering much more real value to my customers for each hour spent. And so would millions of other developers. Come on - this is really not that hard!

I won't explain the rationale here, or give lot's of pedagogical examples. That would turn this into a real book, which would be nice. But I don't have time to write a book and you probably don't have time to read one.

So just trust me on this ;-) Really.

  • Do write code for humans. Smart one-liners, compact code, use of sneaky language constructs etc may not break your program. But it's not enough that the compiler understands the code. Don't write for the compiler, write your code in a style to make it as easy to read for humans as possible.
  • Don't copy and paste code with any kind of logic (ifs, loops, selects etc). Do always factor out common snippets. Even when you're in a hurry. Single-liners without logic are ok. That's called a statement in most languages, and you do need a few of those to make something happen and they can't all be uniqe.
  • Don't check-in commented out code. It's ok when you're trying out the new code - but when you're done, you're done. Since the code anyway resides in a version control system (right?), the old code is still available in the history. Do check-in clean code frequently always improving it slightly at the very least..
  • Do use long and descriptive class, method and member names. Letters in your source code are cheap. Use them freely. Don't abbreviate unless it's an industry or domain standard. 
  • Don't comment code to explain what it does. If you need comments to explain the code, fix the code instead so it's understandable. If you release libraries, use structured comments for public classes and methods to document intended usage patterns, assumptions and contract details. Do comment why the code does what it does, when it's not obvious.
  • Don't nest if-statements or loops. In some special cases, one-liners inside a nested if may be ok. Do use early-exit and write small methods to remove the need for nestling inside a method.
  • Don't catch exceptions unless you know why you're catching them and what to do about it. Never catch all exceptions, except at the top of a given thread's call hierarchy and then only if consequences of not catching it dictate the need. If you do, log it! Do program to avoid exceptions when you know the conditions to prevent it happening in the first place.
  • Do write short methods that does one thing and are named accordingly. If a method does not fit on a screen of a reasonable size, then it does too much. If you have trouble naming it properly, it probably does too many things. Don't write long methods that you need to scroll to see all of.
  • Don't try to be smart. When there is no known need for advanced or smart solutions, do keep it simple and use simple standard patterns until you know it needs special treatment. 
  • Don't optimize unless you know you need to. You'll know by measurements using performance profilers. This is not the same thing as writing inefficient code. Do write efficient code according to best practices that avoids known pitfalls and bad design. Performance optimizations come on top of that, for example caching or special-purpose thread-synchronization constructs, and are to be avoided until the need is proven.
  • Do always step through your code at least once to verify your assumptions about it's behavior. Don't trust just running the application and be satisified when it appears to work.
This is in no way the complete zen of good programming, nor is it revolutionary or unique. All of this has been said before. I'm sure you'll have your own pet peeves you'd like to add to the list. I have a few of my own, but the idea here is to list important things that are really super-simple to do. Now.

I am absolutely convinced that if these rules are followed, overall productivity in the software industry will rise dramatically.

If you're a developer, are there any of these rules you honestly disagree with? Do you work like this already? If not, try it out! Use peer-reviews to discuss your check-ins with this list as a guideline.

It's really this simple.

PS - There are 11 rules here. I'd like to get it down to 10 as the title indicates. Cast your vote on which one should go! Or perhaps what needs to be added, but then you'll have to drop two... ;-)