The (non) importance of health-checks in your code

Source: https://knowyourmeme.com/photos/896487-batman

There are probably quite a few articles on the subject already, but the practice of writing health-check code is still alive and well, even where it hardly brings any value. So here is another one; let’s spread the word.

Looking at most health-check code, you have probably had the feeling that something is not right with it. In this short article, I will try to show you one of the reasons why you might feel that way.

Let’s assume we have a database with a communication layer that controls every call we make to it. Let’s call it SneakyDBLayer, because of how sneaky and evil it is.

The pseudo-code for SneakyDBLayer could look like this:

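```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// SomeDatabase and MyObject are placeholder types, and shutdownAfter()
// is an imaginary method used only to simulate an outage.
class SneakyDBLayer {
    private final SomeDatabase db = new SomeDatabase();

    SneakyDBLayer() {
        // naive disruption mechanism: shut the database down
        // at a random moment within the next 10 seconds
        int delay = ThreadLocalRandom.current().nextInt(0, 10_000);
        db.shutdownAfter(delay, TimeUnit.MILLISECONDS);
    }

    public void save(MyObject mo) {
        db.save(mo);
    }

    public boolean isHealthy() {
        // it's delegated and looks simple, but
        // custom health-checks can sometimes take tens of lines
        // and look like hacks copied from the dark side of the Internet
        return db.isHealthy();
    }
}
```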

As we can see, SneakyDBLayer will eventually shut down the database, making it, obviously, unhealthy. Of course, this is not a real-life scenario, but it will help us create a feeling of imminent disaster and, with it, the need for a valid health-check.
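If you want to watch the health flip on your own machine, here is a runnable toy version of SneakyDBLayer. It is a sketch under assumptions: the "database" is just an AtomicBoolean, the unspecified shutdownAfter() is built on a ScheduledExecutorService, and the names SneakyDBLayerDemo and maxDelayMillis are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy version of SneakyDBLayer: the "database" is only a flag, and the
// scheduler-based shutdown is an assumption (the pseudo-code leaves it open).
public class SneakyDBLayerDemo {

    static class SneakyDBLayer {
        private final AtomicBoolean databaseUp = new AtomicBoolean(true);
        // Daemon thread, so the scheduler never keeps the JVM alive on its own.
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "sneaky-shutdown");
                    t.setDaemon(true);
                    return t;
                });

        SneakyDBLayer(int maxDelayMillis) {
            // naive disruption mechanism: kill the database at a random moment
            int delay = ThreadLocalRandom.current().nextInt(0, maxDelayMillis);
            scheduler.schedule(() -> databaseUp.set(false), delay, TimeUnit.MILLISECONDS);
        }

        boolean isHealthy() {
            return databaseUp.get();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SneakyDBLayer layer = new SneakyDBLayer(10_000);
        // Poll the health-check once a second and watch it flip to false.
        for (int second = 0; second <= 10; second++) {
            System.out.printf("t=%2ds healthy=%b%n", second, layer.isHealthy());
            Thread.sleep(1_000);
        }
    }
}
```

The health-check is truthful every time it is called; it simply has no say about what happens a moment later.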

Let’s assume we want to save a domain object. We want the operation to succeed or, at least, to give us a hint that something is not right. Our save method could look like this:

```java
void createObjectAndSaveToDatabase() {
    MyObject mo = new MyObject("very-important-object");

    if (sneakyDBLayer.isHealthy()) {
        sneakyDBLayer.save(mo);
    } else {
        log.error("Could not save {}", mo);
    }
}
```

Cool, right? This way we should ensure that saving our object will succeed: our health-check has just informed us that the database is up and running!

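And now the ambiguity begins. The operation may or may not succeed, because the database may or may not be running at the moment the save is actually executed. The health-check only tells us how the database was doing when we asked, not how it will be doing when we call save().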
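This gap between the check and the save is a classic time-of-check to time-of-use (TOCTOU) race. The minimal sketch below makes it explicit by shutting down a hypothetical in-memory FakeDatabase right between the two calls; in real life the shutdown would come from a timer, another process, or the network rather than from our own code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CheckThenAct {

    // Hypothetical in-memory stand-in for SomeDatabase; not a real library class.
    static class FakeDatabase {
        private final AtomicBoolean up = new AtomicBoolean(true);

        boolean isHealthy() { return up.get(); }

        void shutdown() { up.set(false); }

        void save(String object) {
            if (!up.get()) {
                throw new IllegalStateException("database is down, could not save " + object);
            }
        }
    }

    /** Checks health, lets the database die, then tries to save. Returns what happened. */
    static String checkThenSave(FakeDatabase db) {
        if (!db.isHealthy()) {       // time of check: the database is up
            return "skipped";
        }
        db.shutdown();               // simulates SneakyDBLayer's timer firing right now
        try {
            db.save("very-important-object"); // time of use: it is already down
            return "saved";
        } catch (IllegalStateException e) {
            return "failed after a green health-check: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(checkThenSave(new FakeDatabase()));
    }
}
```

Running main prints the failure message even though isHealthy() returned true a moment earlier.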

In a synchronous, single-connection environment, the few milliseconds between the check and the save may be short enough for the save to go through, but once we add network delay and calls from other users that we race against, our chances of success do not look good at all.
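To get a rough feel for how delay hurts, here is a toy simulation, not a benchmark. All numbers are assumptions: like SneakyDBLayer, the database dies at a uniformly random moment within 10 seconds, each request checks health at another random moment, and its save lands latencyMs later. The class and method names are hypothetical.

```java
import java.util.Random;

// Toy model (illustrative assumptions, not measurements): the database dies
// at a random moment in [0, 10 s); a request checks health at a random moment
// and performs its save latencyMs later.
public class HealthCheckRace {

    /** Counts requests whose health-check passed but whose save then failed. */
    static int passedCheckButFailedSave(int trials, int latencyMs, long seed) {
        Random random = new Random(seed);
        int failures = 0;
        for (int i = 0; i < trials; i++) {
            int shutdownAt = random.nextInt(10_000);
            int checkAt = random.nextInt(10_000);
            boolean checkPassed = checkAt < shutdownAt;
            boolean savePassed = checkAt + latencyMs < shutdownAt;
            if (checkPassed && !savePassed) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        for (int latency : new int[] {0, 5, 50, 500}) {
            int failures = passedCheckButFailedSave(100_000, latency, 42L);
            System.out.printf("latency %4d ms -> %d saves failed after a green check%n",
                    latency, failures);
        }
    }
}
```

With zero latency the count is always 0, and for a fixed seed it can only grow as latency grows; for small latencies the failing fraction is close to latency / 10,000. The model ignores contention from other users, which would only stretch the gap further.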

So what should we rely on instead?

Exceptions. In most cases, when connecting to external dependencies, developers use a communication layer provided either by the author of the dependency or by the framework they use. Such a layer usually comes with quite a range of informative exceptions that you can use to detect that the dependency is unhealthy at the very moment you actually call it.
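Applied to our example, this means dropping the isHealthy() call, simply attempting the save, and handling the failure if and when it happens. A minimal sketch, assuming a hypothetical Repository interface and DatabaseUnavailableException; a real driver has its own exception types (JDBC, for instance, reports connection problems as SQLException subclasses such as SQLTransientConnectionException). java.util.logging stands in for the SLF4J-style log of the original snippet.

```java
import java.util.logging.Logger;

public class SaveWithoutHealthCheck {

    private static final Logger log = Logger.getLogger(SaveWithoutHealthCheck.class.getName());

    // Hypothetical exception type; real communication layers ship their own.
    static class DatabaseUnavailableException extends RuntimeException {
        DatabaseUnavailableException(String message) { super(message); }
    }

    // Hypothetical stand-in for the communication layer.
    interface Repository {
        void save(String object);
    }

    /** Just try the operation and react to the failure the layer reports. */
    static boolean createObjectAndSaveToDatabase(Repository repository) {
        String mo = "very-important-object";
        try {
            repository.save(mo);
            return true;
        } catch (DatabaseUnavailableException e) {
            log.severe("Could not save " + mo + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        Repository healthy = object -> { };
        Repository broken = object -> {
            throw new DatabaseUnavailableException("connection refused");
        };
        System.out.println(createObjectAndSaveToDatabase(healthy)); // prints true
        System.out.println(createObjectAndSaveToDatabase(broken));  // prints false
    }
}
```

There is no window between a check and the operation here: the save itself is the check.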

Even if, for some reason, no such exception is provided and you want to make your code fail-safe, you can measure the time yourself and throw a TimeoutException when the call to the external dependency exceeds the time limit.
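One way to do that in plain Java, assuming the dependency call can be wrapped in a Callable, is to run it on a worker thread and wait with Future.get(timeout, unit), which throws java.util.concurrent.TimeoutException once the limit is reached. The helper name callWithTimeout is hypothetical, and the slow dependency is simulated with Thread.sleep.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeBoxedCall {

    /** Runs the call on a worker thread and gives up after timeoutMillis. */
    static <T> T callWithTimeout(Callable<T> call, long timeoutMillis)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(call);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the slow call
                throw e;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A fast "dependency" finishes in time...
        String result = callWithTimeout(() -> "saved", 500);
        System.out.println(result);
        // ...a slow one is cut off with a TimeoutException.
        try {
            callWithTimeout(() -> { Thread.sleep(5_000); return "saved"; }, 100);
        } catch (TimeoutException e) {
            System.out.println("dependency too slow, treating it as unhealthy");
        }
    }
}
```

Creating an executor per call keeps the sketch short; real code would reuse a shared executor or, on Java 9+, use CompletableFuture.orTimeout. Note that cancel(true) only stops calls that respond to interruption.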

Infrastructure. From the developer’s point of view, it’s very nice to know that, for example, a Docker container with a database service has started (not only the container but the service itself) before we run an app that establishes a connection to it on startup. Kubernetes (k8s) probes could, to some extent, be another example.
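In practice this is the infrastructure’s job (for instance, a Docker Compose healthcheck combined with depends_on and condition: service_healthy, or Kubernetes readiness and startup probes). To show the idea in the same language as the rest of the article, here is a Java sketch of a one-time startup gate: the app waits until the database port accepts TCP connections before starting. The class WaitForDependency, the method waitForPort, and the host and port values are assumptions for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class WaitForDependency {

    /**
     * Returns true once something accepts TCP connections on host:port,
     * or false if that does not happen within maxWaitMillis.
     */
    static boolean waitForPort(String host, int port, long maxWaitMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxWaitMillis;
        while (System.currentTimeMillis() < deadline) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 500);
                return true; // the port is open; the service is (probably) listening
            } catch (IOException notYet) {
                Thread.sleep(200); // back off briefly before the next attempt
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // 5432 is PostgreSQL's default port; adjust host and port for your setup.
        if (!waitForPort("localhost", 5432, 30_000)) {
            System.err.println("database did not come up in time, not starting the app");
            System.exit(1);
        }
        System.out.println("database is accepting connections, starting the app");
    }
}
```

An open port is only a weak signal: the database process may be listening but not yet ready to serve queries, so a stronger gate would run a trivial query instead. Either way, the check happens once, before startup, and does not replace handling errors on every call.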

I hope you have enjoyed this short article. Think about it, and approach health-check code with caution.

Happy coding!