Don't let it crash

2017 22 Jan

We have a specific mindset when writing Erlang programs. We focus on the normal execution of the program and don't handle most of the errors that may occur. We sometimes call this normal execution the happy path.

The general pattern has a name: write only for the happy path, let the VM catch errors (writing them to a log for future consumption), then have a supervisor restart the failed processes from a clean state. We call it let it crash, and it drives many of our design decisions.
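To make the pattern concrete, here is a minimal sketch. The module name happy_path, the registered names happy_path_sup and happy_path_worker, and the divide/2 helper are all hypothetical and exist only for illustration. The worker is written for the happy path only, and the supervisor restarts it from a clean state when it fails.

```erlang
-module(happy_path).
-behaviour(supervisor).

%% Supervisor API and callback.
-export([start_link/0, init/1]).
%% Worker entry points and a small client helper (all hypothetical).
-export([start_worker/0, worker_loop/1, divide/2]).

%% Start the supervisor under a locally registered name.
start_link() ->
    supervisor:start_link({local, happy_path_sup}, ?MODULE, []).

%% Supervisor callback: one permanent worker, restarted whenever it exits.
%% At most 5 restarts are tolerated within 10 seconds.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Worker = #{id => worker,
               start => {?MODULE, start_worker, []},
               restart => permanent},
    {ok, {SupFlags, [Worker]}}.

%% Spawn the worker linked to the supervisor. Using proc_lib means a
%% crash report is written to the log when the process fails.
start_worker() ->
    Pid = proc_lib:spawn_link(?MODULE, worker_loop, [0]),
    register(happy_path_worker, Pid),
    {ok, Pid}.

%% The worker only handles the happy path. There is no check for
%% B =:= 0: a division by zero raises badarith and the process dies.
worker_loop(Count) ->
    receive
        {divide, From, A, B} ->
            From ! {result, A div B},
            worker_loop(Count + 1)
    end.

%% Client helper: ask the worker to divide and wait briefly for the reply.
divide(A, B) ->
    happy_path_worker ! {divide, self(), A, B},
    receive
        {result, R} -> {ok, R}
    after 500 ->
        timeout
    end.
```

Calling happy_path:divide(1, 0) kills the worker. The caller gets no reply, the failure ends up in the log, and the supervisor starts a fresh worker whose counter is back at zero.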

It's a really great way to program and the results are fantastic compared to most other programming languages. And yet, let it crash has barely convinced anyone that they should use Erlang. Why would that be?

You may already know that Cowboy is capable of handling at least 2 million Websocket connections on a single server. This is in large part thanks to the capabilities of the VM. Still, 2 million is good, much better than most other servers can do.

Cowboy is not just a Websocket server; it's also an HTTP and HTTP/2 server, and it handles many related features like long polling or the parsing of most request headers.

Can you guess how large the Cowboy codebase is, without looking at the source?

Do make sure you have a clear answer in your mind before you go check.

Good, you are back. Now what were the results? If I am correct, you overestimated the size of Cowboy. Cowboy is in fact about five thousand lines of code. You probably thought it was at least ten thousand. About eighty percent of readers will have overestimated the size of Cowboy. And you did so only because I mentioned it can handle millions of Websocket connections.

Numerous studies show this effect. Just mentioning the large number already prepared your mind to think in that direction. Repeating the number made you focus even more on it. Then the question asked for a number, which ended up larger than the reality.

The same effect can be applied to negotiation, for example. You generally want to start by giving your offer (and not let the other party initiate) and you want to give a really large number first. You can also prepare your customer by mentioning an even larger number earlier in the discussion.

And it's not just numbers either. An experiment showed that customers of a pillow store who had just looked at an image of clouds bought more comfortable (and more expensive) pillows than those who didn't see that image.

This is the power of associations. It is covered in much greater detail in the books Influence and Pre-suasion. I highly recommend reading those and applying what you learn to your daily life. I'm definitely not a professional psychologist so take this post with a grain of salt.

When selling Erlang, whether we are selling it to a customer or trying to convince a developer friend to start using it, we often talk about how Erlang lets you sleep at night, that it is auto healing and always gets fantastic uptimes.

And then we talk about let it crash.

And we describe what it means.

We might as well just say that Erlang crashes a lot and then walk out the door. It would have the same effect. It doesn't even stop at programs crashing. You know what else crashes? Cars, planes, trains. Often with disastrous consequences. Is that really the message we want to convey?

They even printed it on a t-shirt! Keep calm and let it crash. It's the kind of t-shirt you probably shouldn't wear in an airport, and for good reason. A few people did, then realized what they were wearing and were not too smug about it.

And yet this is how we sell Erlang.

A better way would be to focus on the positives, of course, but also to make sure that those positives are phrased in a way that prevents bad associations from forming in people's minds.

Instead of let it crash, you can say that Erlang has auto healing mechanisms. Healing is a good thing and accurately describes what happens in the system.

Should you need to go into more detail, you will probably want to avoid recover from crashes and instead say recover from exceptions. Exception is a fairly neutral word and, should you explain what you mean by it, you can talk about exceptions that occur for reasons unrelated to Erlang, like hardware failure or network instability.

The trick is to always use positive words and phrases to describe Erlang, and to use external factors to explain how Erlang deals with failures. Never mention the failures internal to Erlang systems unless you are asked specifically, in which case you can say that the auto healing applies to all exceptions.

The let it crash philosophy is great when learning Erlang or when writing fault-tolerant systems. But it's not going to convince anyone to use Erlang unless they were already looking for it.

Do you like this post? Tell me on Twitter. I might make more.