node.js scope weirdity

So I’ve been using Node.js a lot in my new job. Quick note: it’s super awesome. The job, and node.js. Anyways. I’ve put together a couple of non-trivial pieces with it and one thing that keeps tripping me up is: when is my variable in and out of scope? So I thought I’d write this up to see if I can explain it.

First example

Let’s look at this simple server code – it’s just a dumb webserver (shamelessly stolen from the node.js home page) that says ‘hello world’ and spits out a connection count:

var http = require('http');
var conncount=0;
http.createServer(function (req, res) {
  conncount++;
  num=conncount;
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.write("Here is some stuffn");
  res.write("And the connection count is: "+conncount+"n");
  setTimeout(function() {
   res.end('Hello World: conn count was: '+conncount+' and my connection # is: '+num+'n');
        conncount--;
  },5000);
}).listen(1337, "127.0.0.1");


console.log('Server running at http://127.0.0.1:1337/');

So if I just curl that (curl http://localhost:1337/), I get:

Here is some stuff
And the connection count is: 1

…5 seconds pass, and then…

Hello World: conn count was: 1 and my connection # is: 1

So that seems to make some sense. However, what happens if I run the code 12 times? This:


Here is some stuff
And the connection count is: 1
Here is some stuff
And the connection count is: 2
Here is some stuff
And the connection count is: 3
Here is some stuff
And the connection count is: 4
Here is some stuff
And the connection count is: 5
Here is some stuff
And the connection count is: 6
Here is some stuff
And the connection count is: 7
Here is some stuff
And the connection count is: 8
Here is some stuff
And the connection count is: 9
Here is some stuff
And the connection count is: 10
Here is some stuff
And the connection count is: 11
Here is some stuff
And the connection count is: 12


…then 5 seconds elapse, then…

Hello World: conn count was: 12 and my connection # is: 12
Hello World: conn count was: 11 and my connection # is: 12
Hello World: conn count was: 10 and my connection # is: 12
Hello World: conn count was: 9 and my connection # is: 12
Hello World: conn count was: 8 and my connection # is: 12
Hello World: conn count was: 7 and my connection # is: 12
Hello World: conn count was: 6 and my connection # is: 12
Hello World: conn count was: 5 and my connection # is: 12
Hello World: conn count was: 4 and my connection # is: 12
Hello World: conn count was: 3 and my connection # is: 12
Hello World: conn count was: 2 and my connection # is: 12
Hello World: conn count was: 1 and my connection # is: 12

So my question is, why does it do that? Each execution of my function _should_ have its own stack, no? And so wouldn’t each stack have its own variables?

Now, mind you – I know a (horrible) way to fix this – wrap my setTimeout call in an anonymous function and pass ‘num’ as a parameter – but what I don’t really get is ‘why’? I threw this line all the way at the end (with apologies to Haddaway) –

setTimeout(function() { sys.debug("What is num! Baby don't hurt me, don't hurt me, no more..."+num)},10000);

(And I had to require('sys') at the top too)

And, in my terminal with Node running, I got:


DEBUG: What is num! Baby don't hurt me, don't hurt me, no more...12

What?! I would’ve expected ‘num’ to fall out of scope?! Why wouldn’t that function scope up there make ‘num’ only exist for this execution? Is there no concept of ‘stack’ or anything? And even if there wasn’t any, each execution of my function is an execution and should ‘freeze’ the variable or something, right? Apparently not.

So what happened? Well, I can tell you – that variable ‘num’ that I referenced, since I *didn’t* define it using ‘var’, is GLOBAL. So that’s why it’s acting so global. Simply adding ‘var’ to the definition (var num=conncount;) made it start working properly. E.g., after the delay, my output became:

Hello World: conn count was: 12 and my connection # is: 1
Hello World: conn count was: 11 and my connection # is: 2
Hello World: conn count was: 10 and my connection # is: 3
Hello World: conn count was: 9 and my connection # is: 4
Hello World: conn count was: 8 and my connection # is: 5
Hello World: conn count was: 7 and my connection # is: 6
Hello World: conn count was: 6 and my connection # is: 7
Hello World: conn count was: 5 and my connection # is: 8
Hello World: conn count was: 4 and my connection # is: 9
Hello World: conn count was: 3 and my connection # is: 10
Hello World: conn count was: 2 and my connection # is: 11
Hello World: conn count was: 1 and my connection # is: 12

Such a terribly easy way to blow up your javascript! So apparently node.js supports “strict mode” – just make the first line of your javsascript code say:

"use strict";

(Note, that’s just a string, with the quotes. A javascript parser will just ignore it if it doesn’t understand it. You could also put a line in the middle of your code saying "poop"; and it would be ignored the same way).
Now with strict mode enabled, the previous version of my code (without the ‘var’ declaration) says:


ReferenceError: num is not defined

So I think I’ll be using this from now on. Unless ‘strict’ mode starts making me crazy – which is certainly also possible.

Next Example

"use strict";
var sys=require('sys');
for(var i=0;i<10;i++) {
 setTimeout(function() {sys.debug("I is now: "+i)},1000);
}

(Notice how I've learned my lesson? Yeah, I don't need concurrency bugs biting me in the ass, thankyouverymuch.)

The output is, unfortunately:

DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10

So this one - I definitely know how to fix. The problem is that by the time the timeout actually _fires_, the value of 'i' will be different - in this case, incremented all the way to 10. I need to somehow 'freeze' the value of i within the timeout.

So I would do:

"use strict";
var sys=require('sys');
for(var i=0;i<10;i++) {
 setTimeout((function(number) {return function() {sys.debug("I is now: "+number)}})(i),1000);
}

Which results in:

DEBUG: I is now: 0
DEBUG: I is now: 1
DEBUG: I is now: 2
DEBUG: I is now: 3
DEBUG: I is now: 4
DEBUG: I is now: 5
DEBUG: I is now: 6
DEBUG: I is now: 7
DEBUG: I is now: 8
DEBUG: I is now: 9

The problem is, that's ugly as shit. What better way to do it is there that's more readable, maintainable, debuggable, etc? And that function instantiation thing gives me the willies. Well, I don't know the best answer for that yet. How about this:

"use strict";
var sys=require('sys');
for(var i=0;i<10;i++) {
 (function(number) {
  setTimeout(function() {sys.debug("I is now: "+number)},1000);
 })(i);
}

(The output is still the same). That feels a little less awful and unreadable - and doesn't give me the anonymous-function-returning-function yucky feelings that the previous one did. (Though it still is effectively doing that, isn't it?) The crazy squiggly brace, close-paren, open-paren business is still a little awkward though.

A piece of advice I got from the node.js group seemed pretty sage, in terms of making this stuff more readable:

"use strict";
var sys=require('sys');

function make_timeout_num(number)
{
 setTimeout(function() {sys.debug("I is now: "+number)},1000);
}

for(var i=0;i<10;i++) {
 make_timeout_num(i);
}

(output is still the same again). And, wow, yeah, that's a hell of a lot more readable, at the expense of 4 or so more lines. But sometimes, logically, you don't want to split out your functions like that - if every time you need to freeze something in a scope you have to declare a function somewhere, your eyes will have to scan all over the place, and that could be ugly. So you could maybe declare the function within the for loop - though that's still in the global scope, it would just be for readability's sake.

I think I'll probably stick with the previous one, with the anonymous function declared in-line. It's not too insanely unreadable, and it's compact enough. If the contents of my anonymous function gets a few lines long, or gets a few variables deep, I might split it out into its own function for readability's sake.