Monday, April 26, 2021

Code should work every time

The Philosophy of Black and White

I have a background in real time operating system kernels, compilers and encryption.

In all of these contexts, it is self-evident that a piece of code either works every time, or else it's broken and it must be fixed immediately.

Code that works most of the time, or even almost every time, just won't cut it.

If there are special circumstances, such as heavy load, startup or shutdown, or just bad enough "luck", under which the software does not work as expected, it needs fixing. Perhaps it controls heavy machinery, compiles your code or encrypts your data. Right?

Would you feel comfortable with a compiler that emitted faulty code every now and then, depending on how heavy the system load was? Of course not.

I have taken this view to heart, and in my mind there are really only two types of software:
  1. Software that behaves consistently every time, regardless of load or timing.
  2. Broken software.
(Software of type 1 can of course still be broken in other ways - but it should then be consistently broken!)

Here I'd like to argue that this is just as applicable to a national informational website, a booking system, or even a game.

Software should behave consistently every time, else it's broken.

Unfortunately, this is often hard to achieve, especially as so much code today runs in a web server environment, which is inherently multi-threaded with a high degree of parallelism. Most issues with software that behaves inconsistently come from parallelism and race conditions.

A race condition is when two different actors invoke code at the same time, and the outcome depends on who wins the race.

There are numerous mechanisms, and even whole books, dedicated to solving these problems.
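
To make the idea concrete, here is a minimal sketch in Python (chosen only for illustration; the names `UnsafeCounter`, `SafeCounter` and `hammer` are hypothetical and not from any particular system). The unsafe counter does a read followed by a write, so two threads can read the same value and one update is lost. The safe counter uses a lock, one of the simplest of those mechanisms, so the result is the same every time.

```python
import threading


class UnsafeCounter:
    """Read-modify-write without synchronization: updates can be lost."""

    def __init__(self):
        self.value = 0

    def increment(self):
        current = self.value       # read
        # Another thread may run here and write its own update.
        self.value = current + 1   # write, possibly overwriting that update


class SafeCounter:
    """Same operation, but the read and the write happen under one lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1


def hammer(counter, threads=8, per_thread=10_000):
    """Call counter.increment() from several threads and return the final value."""

    def work():
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value


if __name__ == "__main__":
    # The safe counter always reaches threads * per_thread.
    print("safe:  ", hammer(SafeCounter()))
    # The unsafe counter may or may not lose updates on a given run,
    # which is exactly the kind of inconsistency discussed here.
    print("unsafe:", hammer(UnsafeCounter()))
```

Whether the unsafe version actually loses updates on a given run depends on the interpreter, the machine and the timing. It may well look fine on a quiet development machine, which is precisely why this kind of bug is so easy to dismiss.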

My purpose here is not to talk about all those ways, but rather to argue that we as developers should genuinely care that our code works consistently every time.

In real life, I often have to argue about these matters, and I'm not infrequently met with the basic opinion "well, it really doesn't happen that often and it looks like hard work to fix so we'll live with it".

The problem is that these things tend to grow worse exponentially with increased load. So, if you're working on something that is expecting fewer and fewer users and lower and lower load, well, maybe you can get away with "it's not worth fixing".

If you're like most of us, working on something that expects increased load and continued development, remember that bugs do not age well. They just get worse and worse, until it's really bad.

So, while perfectionism overall is not always a winning strategy, for consistency under load always strive for perfection. Otherwise you're building software broken by design.

For these matters, it really is black or white. Either the code works every time, or it's broken.
