Sunday, 7 August 2011

Node: strangely familiar and not in a good way

Programmers are always interested in new idioms and languages, generally as a route to making their lives easier. One of the latest trendy idioms is seen in Node, which promises an easy way to program systems, such as high-traffic web servers, that need a lot of concurrency and a lot of raw performance. Concurrency, performance and ease of programming have always had fraught relationships, so Node has produced a lot of interest. But is Node really the way forward?

Node is essentially a JavaScript library in which nothing ever blocks. Anything that could plausibly block, like I/O, takes a callback function, which is queued until a result is available. This approach has benchmarked well for web-server-type workloads.

So what about the non-blocking and the callbacks as a programming idiom? Easy to code? You see a lot of callback functions in Node. So many callback functions in fact that computer scientists will start to get a funny feeling that they've seen something similar before. Now where did we see lots and lots of callbacks? Erm, it's coming, yes, it was, er, yes, it was back in a class at university, probably the second year, something about interpreters, meta-interpreters perhaps, oh my good god no, surely not. It is! It's continuation passing! Ah, no, no the horror! The flashbacks! The brain ache! Make it stop, make it stop, please!

Why would anyone do this to themselves again? People have been drawn to Node because of the benchmarks but surely no one is going to put up with writing programs in a continuation-passing style? CPS is fine for writing a Scheme meta-interpreter for an assignment, but for production code? Really? Also, while Node handles concurrency, it is fundamentally single-threaded, and any true parallelism has to be bodged on.
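To make the complaint concrete, here is a rough sketch of the same three-step job written twice: once in direct style, once in continuation-passing style. It is in Go rather than Node's JavaScript, and every name in it (`parse`, `double`, `directStyle`, `cpsStyle`) is invented for illustration. The callbacks here also run immediately, whereas Node would queue them on its event loop, so treat it as a picture of the shape of the code and nothing more.

```go
package main

import (
	"fmt"
	"strconv"
)

// Direct style: each step returns its result to the caller.
func parse(s string) (int, error) { return strconv.Atoi(s) }
func double(n int) int            { return n * 2 }

func directStyle(s string) (string, error) {
	n, err := parse(s)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("result=%d", double(n)), nil
}

// Continuation-passing style: nothing returns a value. Each step
// hands its result to a callback k, "the rest of the program".
func parseCPS(s string, k func(int, error)) {
	n, err := strconv.Atoi(s)
	k(n, err)
}

func doubleCPS(n int, k func(int)) { k(n * 2) }

func cpsStyle(s string, k func(string, error)) {
	parseCPS(s, func(n int, err error) {
		if err != nil {
			k("", err)
			return
		}
		doubleCPS(n, func(d int) {
			k(fmt.Sprintf("result=%d", d), nil)
		})
	})
}

func main() {
	out, err := directStyle("21")
	fmt.Println(out, err)

	cpsStyle("21", func(out string, err error) {
		fmt.Println(out, err)
	})
}
```

Both versions compute the same thing. The CPS one gains a level of nesting for every step, and error handling has to be threaded through each callback by hand.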

Node does have a certain something though, and the benchmarks cannot be ignored. There is a lesson to be learned from Node, and the lesson is that concurrency can no longer be left to operating system threads. OS threads are too heavyweight. They use too much memory and impose an unacceptable context switching overhead when used with blocking I/O. Node shows that concurrency can be left to the programming language.

But what is the alternative to OS threads? Node provides a proof of concept, but do we have to accept the cost of programming in a continuation-passing style? Well, no, we don't. CPS was invented as a way of specifying programming languages, so obviously the burden can be shifted into a language.

This is where Go comes in. Go introduces a very lightweight concurrency construct which its designers have called the goroutine. Goroutines have the Node-like property that they don't cause their OS thread to block. Even better, Go can run goroutines in parallel by multiplexing them on to operating system threads. And goroutines are so cheap that even my five-year-old laptop can run hundreds of thousands of the things.

In a neat acronym twist, goroutines are partly inspired by Hoare's work on communicating sequential processes. So we can say that what Node is trying to do with CPS, Go does better using CSP. Nicely done, Bell Labs Google.

So where does that leave Node? The way forward? No, it isn't. But it makes an important point well, and it will be influential.