How to do: Gulp + Browserify + Coffeescript + Sourcemaps + Uglify as of mid-2015

I blew so much time on this, it’s crazy.

I got it down to something simple and clean, finally. And it turns out the “recipes” section on the Gulp website pretty much had the answer already. I just needed to understand more about what I was trying to do in the first place.

Note: You need to install coffeeify to have it available as a transform!

var gulp        = require('gulp');
var browserify  = require('browserify');
var source      = require('vinyl-source-stream'); // to 'name' the resulting file
var uglify      = require('gulp-uglify');
var buffer      = require('vinyl-buffer'); // to convert the streamed browserify output into a buffer (uglify and sourcemaps need buffered contents)

var sourcemaps  = require('gulp-sourcemaps');

gulp.task("your_target",function() {
  browserify({
      entries: ["./sample.coffee"],
      debug: true,
      extensions: [".coffee"],
      transform: ["coffeeify"] // npm install --save-dev coffeeify
      })
    .bundle()
    .pipe(source('resulting_file_name.js'))
    .pipe(buffer())
    .pipe(sourcemaps.init({loadMaps: true, debug: true}))
    .pipe(uglify())
    .pipe(sourcemaps.write("./")) // optional second param goes here -- see below
    .pipe(gulp.dest('./build/'));

});

If you don’t want the sourcemap URL in your resulting JS file (which I didn’t, because I keep my sourcemap files private), pass a second parameter, {addComment: false}, to the sourcemaps.write call near the end. That last sourcemaps pipe becomes:
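    // still writes resulting_file_name.js.map, but leaves no
    // sourceMappingURL comment in the minified .js itself
    .pipe(sourcemaps.write("./", {addComment: false}))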

Edit: the options parameter I was originally passing to uglify isn’t even needed, so it’s gone from the example above.

Amazon Elastic Load Balancer (ELB) performance characteristics

So at my new job, I get to use AWS stuff a lot. We have many, many servers, usually sitting behind a load-balancer of some sort. Amazon’s documentation on these things isn’t very clear, so I’m trying to figure out what the damned things are doing.

First off – a good thing. These Load Balancers are really easy to use. Adding a new instance is a few clicks away using the Console.

Another good thing – you can use multiple availability zones as a way to avoid trouble when an entire Availability Zone goes down (as happened less than a year ago). And here’s where it gets ugly.

The ELB seems to split traffic evenly across zones – even if you don’t have a balanced number of instances in each zone. And it seems to pick which zone to hit based on some hash of the source address. So, if you have 2 instances in an Autoscale group across two availability zones, and you hit your array really hard from one IP – bad things ensue. The AZ you’re hitting will max out its CPU, and the AZ you’re not hitting will sit nearly 100% idle. That averages out to 50% utilization – not enough to trigger a scaling event (with my thresholds, anyway).

So, in short, if you have fojillions of people from all over hitting your services indiscriminately, I’m sure it’ll be fine. But if, like me, you have thousands of people (or so) from all over, some hitting really hard from one particular IP – it may not be a good idea to spin up more than one AZ. And spinning up 4 AZs seems silly – you’ll definitely have a better chance of at least one of them going bad, and until your load balancer figures that out, one out of every ‘n’ requests will fail.

Another thing I’ve noticed – the ELBs seem to have a strict timeout on accessing the back-end service. If the ELB doesn’t get a response within 30 seconds, it drops the connection and hits the service again. I had a service that was nearly getting DoS’ed by the load balancer that way. Make sure you have sane timeouts.
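For a Node back-end, for instance, you could make sure your own timeout fires before the ELB’s does. A minimal sketch – the 25-second figure is just an illustration, and server.setTimeout needs a reasonably recent Node:

var http = require('http');

var server = http.createServer(function (req, res) {
  // ...do the real work, but get a response out well before
  // the ELB's 30-second cutoff...
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('done\n');
});

// Drop sockets that idle longer than 25 seconds, so we time out on our
// own terms rather than letting the ELB give up and re-send the request.
server.setTimeout(25000, function (socket) {
  socket.destroy();
});

server.listen(80);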

So the next thing I was curious about was whether or not the ELB would do any batching-together of any of the returned data – would it act like some kind of reverse-proxy? I wrote a tiny Node server which spat out a little data, waited 10 seconds, spat out some more, waited another 10 seconds, then finished. Here it is:


#!/usr/local/bin/node
"use strict";

var http = require('http');

http.createServer(function (req, res) {
  console.warn("CONNECTION RECEIVED - waiting 10 seconds: " + req.url);
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.write("initial output...\n");
  setTimeout(function () {
    console.warn("first stuff written for " + req.url);
    res.write("Here is some stuff before an additional delay\n");
    setTimeout(function () {
      console.warn("second stuff written for " + req.url);
      res.end('Hello World\n');
    }, 10000);
  }, 10000);
}).listen(80);

So – I couldn’t tell any difference between hitting the server directly and hitting the load balancer (telnetting straight to port 80). It acted the same way. There was definitely no batching.

What about a slow client, or something that’s trying to upload something – will it get batched together on the POST side? I modified my code to do a ‘slow POST’ – and it behaved similarly. I couldn’t tell the difference between going through the load balancer and hitting the instance directly.

I also wrote code to generate a large (1MB) response dynamically on the server side, then wrote a client that would receive a little, pause the stream, resume it after a few seconds, pause it again, and so on. The one difference I noticed between hitting the server directly and going through the load balancer was that the server tended to give me data in 1024-byte chunks, whereas the load balancer was giving me blocks closer to 1500 bytes. Weird! Well, maybe not – I *do* know for sure that the LB is re-terminating the connection at the TCP level, because the source IP address changes. I was writing the data in blocks of 1K, so maybe each write turned into exactly one packet of 1024 bytes, while the LB, re-streaming my TCP data, sent larger segments. Or so it would seem.
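The pause-and-resume client looked something like this – a minimal sketch, where the hostname and path are made up, the 3-second pause is arbitrary, and it naively schedules a resume for every chunk (fine for a quick test):

var http = require('http');

http.get({host: 'my-elb-hostname', port: 80, path: '/big'}, function (res) {
  res.on('data', function (chunk) {
    console.warn('got ' + chunk.length + ' bytes');
    res.pause();                 // stop reading for a bit...
    setTimeout(function () {
      res.resume();              // ...then pick the stream back up
    }, 3000);
  });
  res.on('end', function () {
    console.warn('done');
  });
});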
So it looks like I can’t get rid of the reverse-proxy that sits on top of some of our Ruby code. Oh well.

More on (Moron?) Ted Dziuba

(Math and pedantry ahead. Feel free to skip if that’s not your thing. More stuff about Node.js and this guy Ted Dziuba, who I hate.) To start, read Mr. Dziuba’s latest blog rant about Node.js.

I will make two main points here: first, that Mr. Dziuba does not intend to be comprehended – that he deliberately phrases his points so as to confuse – and second, that his arguments are invalid. It appears to me that he has no interest in deriving any kind of truth or shedding light on anything; he is merely trying to draw attention to himself, or draw blog traffic, or just make noise. I don’t know what his reasons are. I just know what he’s doing, and it looks deliberately misleading. I would not feed the troll, as it were, except that his blog post is permanently on the internet and I’m going to get pointed at it someday if I ever propose an event-loop-based solution for anything.

First off, let’s try to decipher what he’s saying; he does not do a very good job of being clear.

Let’s look at Theorem 1.

What does it actually say? Here’s an attempt at deciphering it.

He asks: how do things work when you have something that’s heavily CPU-biased (more compute than I/O)?

Note: the value ‘k’ is the ratio I/C – in other words, for something really I/O-intensive (I > C), k is ‘big’ (greater than one), and for I < C, k is small (less than one). You can think of ‘k’ as the “I/O-ish-ness” factor: big for something very I/O-ish, little if it’s not. Why doesn’t he explicitly state the definition k = I/C? Because he has no desire to be understood; he’s attempting hand-waving. Everywhere he uses this ‘k’ construct he could just as easily use I and C.

The important definition is: W = I + C = kC + C = (k+1)C

In other words, k is I/C by definition, and (k+1)C equals W – the wall-clock time of one unit of work, I/O plus CPU.

His theorem begins with the supposition:

1000/C > 1000N/((k+1)C)

What does that mean?

Let’s change his equation to make more sense of it. Since (k+1)C=(I+C) by definition, he’s really just saying:

1000/C > 1000N/(C+I)

He’s trying to suppose that *IF* the number of times I can execute just the CPU part of my event-loop code is greater than the number of times I can do that work, threaded, with the I/O time also taken into account, *THEN* it must be the case that the number of threads I am using is one. Why would you make the argument like that? The same argument can be made, much more simply, by saying:

1000/(C+I) > 1000N/(C+I) only if N is one. But then the problem is that you can see what he’s doing. He doesn’t want that, hence the pointless variable substitution.

Notice that ‘N’ factor in there? He is saying that a system with 2 threads runs twice as fast as a system with one thread – and, apparently, a system with 100 threads runs 100 times as fast as a system with one thread. I’ve worked with a lot of software throughout my personal and professional life, and this supposition is not true. By this assumption, of course threads will always outperform event-loop software.
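To make that concrete, plug some numbers (mine, purely illustrative) into his own formulas: take C = 1 ms and I = 9 ms, so k = I/C = 9. His threaded throughput is then

1000N/((k+1)C) = 1000N/10 = 100N requests per second

– so one thread gives 100 requests per second, a hundred threads give 10,000, and a million threads give 100,000,000, all on the same box, apparently.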

He attempts the same song-and-dance in Theorem 2. He is still making the assumption that N threads deliver N times single-threaded performance.

It’s most clear in his “Practical Example.” There, you can see him making the n-threads-means-n-times-performance argument most clearly. If that’s true, why not 1000 threads? Why not a million?

Another point here. Why threads? Why not fully fork()’ed processes? His math (such as it is) holds up just the same if you assume forked processes instead of threaded ones – nothing in it actually requires threads.

Effectively, he has proven that in a system with an infinite number of infinitely fast CPU’s, and infinite RAM, and zero threading context-switch time, and zero thread-accounting time, that threads are faster than events. Congratulations.

So there are my arguments as to why he is incorrect. Now I wish to ask questions about how he seems to be deliberately misleading.

First off – some stylistic questions. Why has he written his argument so obtusely? Why has he not shown his work? Why does he not explain what he’s supposing? He just throws symbols down, in beautiful little .PNG files, and runs off manipulating them with algebra. That seems like an “appeal to authority” via jargon. And why all the milliseconds everywhere? We’re in theoretical Comp Sci world now – why pick those units? It would appear that he has done so specifically to throw 1000s into his equations everywhere, just to confuse things further.

Next – a more theoretical question. Why do things like the select() or poll() system calls exist? Or epoll or /dev/poll? Since they’re so “obviously” inferior to threading-based solutions, they shouldn’t exist at all, right? There should be no use for them. If I can always just use threaded I/O instead of event-looped, why use event-looped at all? It is, after all, very difficult to program.

And finally – why did Dziuba himself advocate for an event-based I/O solution – “eventlet” – in one of his own blog posts? He seems to have gotten quite the performance boost –

…but the one that really stands out in the group is Eventlet. Why is that? Two reasons:

1. You don’t need to get balls deep in theory to be productive with Eventlet.
2. You need to modify very little pre-existing code to adapt a program to be event-driven.

This all sounds great in theory, but I have actually made a large I/O bound program work using monkey patching and changing the driver. It is a piece of software that reads jobs from a queue and processes them, putting the result in memcached. For esoteric reasons I will not go into, the job processors could not thread the work, they had to fork. Using this setup, one production box with 8GB of RAM was consistently 7.5GB full. After a less than 5 line code change to the driver, that same production box uses only around 1GB of RAM consistently, and can handle 5 to 10x the throughput of the old system.

The answers to these questions I cannot be sure of. As much as I would like to imagine that Mr. Dziuba is simply terribly ignorant, it would seem to be far worse: he intends to say things that are untrue for the purpose of drawing attention to himself.

Node.js is not a cancer, you are just a moron

My tone is going to seem strangely even and un-ranty. This is because I am doing everything I can to keep myself from completely exploding when I read this bullshit that this moron is spewing. OK, that was a little ranty, but the rest will read evenly. Maybe.

So one of my programming friends posts an article at http://teddziuba.com/2011/10/node-js-is-cancer.html and says, “Ah, here’s what’s wrong with Node.js!”

The article is rather strongly written – “Node.js is Cancer”, “node.js nonsense”, “Node.js is a tumor on the programming community”, “completely braindead”, “Scalability disaster”, etc.

He then shows a naive recursive Fibonacci calculation and how badly it performs under Node.
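(For reference, the benchmark is presumably something like this – a deliberately naive, exponential-time, purely CPU-bound function:)

// Naive recursive Fibonacci -- all CPU, no I/O anywhere.
function fib(n) {
  if (n < 2) return 1;
  return fib(n - 2) + fib(n - 1);
}

console.log(fib(40));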

The problem he has proposed is, fundamentally, CPU-bound. I wrote a version of it in C, and it did perform faster than it did in Node – but the problem definitely took finite time either way. My command-line Node.js version calculated the answer in 8 seconds; the C version did it in 4. I was rather impressed that Javascript (Node.js’s V8 engine) came that close to C’s performance in pure CPU-bound execution.

The problem – and what the author perhaps misunderstands – is that this is not the situation in which Node is an ideal solution. I use Node.js in production for work, and I know of many other shops that do too. If the problems you are dealing with are CPU-bound, Node.js will not help you. Node.js works well when your problems are I/O-bound – e.g., reading something out of a database, running web servers, reading files, writing files, writing to queues, reading from queues, reading from other web services, aggregating several web services together, etc. The reason this solution has become so popular of late is that these are the types of problems that are most common in web development today. Thus, Node.js becomes a helpful arrow in one’s quiver with which to solve these types of issues.
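To illustrate the kind of I/O-bound work I mean, here’s a minimal sketch that fans out to two back-end services in parallel and answers when both respond. The host and paths are made up, and there’s no error handling – it’s just a sketch:

var http = require('http');

http.createServer(function (req, res) {
  var pending = 2;
  var parts = [];

  // Hit one hypothetical back-end service and stash its response.
  function fetch(path, slot) {
    http.get({host: 'backend.internal', port: 8080, path: path}, function (r) {
      var body = '';
      r.on('data', function (c) { body += c; });
      r.on('end', function () {
        parts[slot] = body;
        // Respond once both back-ends have answered.
        if (--pending === 0) {
          res.writeHead(200, {'Content-Type': 'text/plain'});
          res.end(parts.join('\n'));
        }
      });
    });
  }

  fetch('/users', 0);
  fetch('/orders', 1);
}).listen(1337);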

Considering that the article’s author seems to have some level of experience, I wonder if his choice of skewed example was perhaps deliberate. He has other articles on his blog about other event-loop libraries. His comment at the bottom – “tl;dr – Node.js is an unpleasant software library and I will not use it” – is possibly the real source of his anger. And – an irrefutable point – if you don’t like something, you don’t want to use it, and he obviously doesn’t. That’s fine.

Node is a tool; one of many – no panacea. If you’re dealing with problems of ‘slow’ services that need to wait for various bits of I/O to complete in order to return a result – it can be a very powerful and useful tool. If you’re computing the fortieth member of the fibonacci sequence recursively, it won’t be.

The sad fact is that the author’s completely valid point – that Node.js isn’t a good tool for CPU-bound problems – is completely buried in his bile, because he never states it explicitly. Node.js has other drawbacks as well – it’s very easy to end up in callback spaghetti, it’s very minimal, and it’s very, very young. The database integration libraries have some pretty serious immaturity issues to work through, and I’ve had to code around a good deal of that.

It’s a tool that’s good at particular things, and I will continue to use it for those things. Those ‘things’ tend to be the bulk of what web development and web services development actually are. So when I can write a two-hundred-line program that replaces entire arrays of servers and interconnected services with just one server, I am going to do that, and I won’t feel particularly braindead in doing so.

node.js scope weirdity

So I’ve been using Node.js a lot in my new job. Quick note: it’s super awesome. The job, and node.js. Anyways. I’ve put together a couple of non-trivial pieces with it and one thing that keeps tripping me up is: when is my variable in and out of scope? So I thought I’d write this up to see if I can explain it.

First example

Let’s look at this simple server code – it’s just a dumb webserver (shamelessly stolen from the node.js home page) that says ‘hello world’ and spits out a connection count:

var http = require('http');
var conncount = 0;
http.createServer(function (req, res) {
  conncount++;
  num = conncount;
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.write("Here is some stuff\n");
  res.write("And the connection count is: " + conncount + "\n");
  setTimeout(function () {
    res.end('Hello World: conn count was: ' + conncount + ' and my connection # is: ' + num + '\n');
    conncount--;
  }, 5000);
}).listen(1337, "127.0.0.1");


console.log('Server running at http://127.0.0.1:1337/');

So if I just curl that (curl http://localhost:1337/), I get:

Here is some stuff
And the connection count is: 1

…5 seconds pass, and then…

Hello World: conn count was: 1 and my connection # is: 1

So that seems to make some sense. However, what happens if I fire off 12 of those requests at once? This:


Here is some stuff
And the connection count is: 1
Here is some stuff
And the connection count is: 2
Here is some stuff
And the connection count is: 3
Here is some stuff
And the connection count is: 4
Here is some stuff
And the connection count is: 5
Here is some stuff
And the connection count is: 6
Here is some stuff
And the connection count is: 7
Here is some stuff
And the connection count is: 8
Here is some stuff
And the connection count is: 9
Here is some stuff
And the connection count is: 10
Here is some stuff
And the connection count is: 11
Here is some stuff
And the connection count is: 12


…then 5 seconds elapse, then…

Hello World: conn count was: 12 and my connection # is: 12
Hello World: conn count was: 11 and my connection # is: 12
Hello World: conn count was: 10 and my connection # is: 12
Hello World: conn count was: 9 and my connection # is: 12
Hello World: conn count was: 8 and my connection # is: 12
Hello World: conn count was: 7 and my connection # is: 12
Hello World: conn count was: 6 and my connection # is: 12
Hello World: conn count was: 5 and my connection # is: 12
Hello World: conn count was: 4 and my connection # is: 12
Hello World: conn count was: 3 and my connection # is: 12
Hello World: conn count was: 2 and my connection # is: 12
Hello World: conn count was: 1 and my connection # is: 12

So my question is, why does it do that? Each execution of my function _should_ have its own stack, no? And so wouldn’t each stack have its own variables?

Now, mind you – I know a (horrible) way to fix this – wrap my setTimeout call in an anonymous function and pass ‘num’ as a parameter (more on that below) – but what I don’t really get is ‘why’? I threw this line all the way at the end (with apologies to Haddaway) –

setTimeout(function() { sys.debug("What is num! Baby don't hurt me, don't hurt me, no more..."+num)},10000);

(And I had to require('sys') at the top too)

And, in my terminal with Node running, I got:


DEBUG: What is num! Baby don't hurt me, don't hurt me, no more...12

What?! I would’ve expected ‘num’ to fall out of scope! Why wouldn’t that function scope up there make ‘num’ exist only for this execution? Is there no concept of a ‘stack’ or anything? And even if there weren’t, each execution of my function is an execution and should ‘freeze’ the variable or something, right? Apparently not.

So what happened? Well, I can tell you – that variable ‘num’ that I referenced, since I *didn’t* define it using ‘var’, is GLOBAL. So that’s why it’s acting so global. Simply adding ‘var’ to the definition (var num=conncount;) made it start working properly. E.g., after the delay, my output became:

Hello World: conn count was: 12 and my connection # is: 1
Hello World: conn count was: 11 and my connection # is: 2
Hello World: conn count was: 10 and my connection # is: 3
Hello World: conn count was: 9 and my connection # is: 4
Hello World: conn count was: 8 and my connection # is: 5
Hello World: conn count was: 7 and my connection # is: 6
Hello World: conn count was: 6 and my connection # is: 7
Hello World: conn count was: 5 and my connection # is: 8
Hello World: conn count was: 4 and my connection # is: 9
Hello World: conn count was: 3 and my connection # is: 10
Hello World: conn count was: 2 and my connection # is: 11
Hello World: conn count was: 1 and my connection # is: 12
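(For the record, the (horrible) anonymous-function fix I mentioned earlier would have meant swapping the setTimeout block in the server for something like this:)

// Freeze 'num' by passing it into an immediately-invoked function
// that builds the actual timeout callback.
setTimeout((function (myNum) {
  return function () {
    res.end('Hello World: conn count was: ' + conncount + ' and my connection # is: ' + myNum + '\n');
    conncount--;
  };
})(num), 5000);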

Such a terribly easy way to blow up your javascript! So apparently node.js supports “strict mode” – just make the first line of your javascript code say:

"use strict";

(Note, that’s just a string, with the quotes. A javascript parser will just ignore it if it doesn’t understand it. You could also put a line in the middle of your code saying "poop"; and it would be ignored the same way).
Now with strict mode enabled, the previous version of my code (without the ‘var’ declaration) says:


ReferenceError: num is not defined

So I think I’ll be using this from now on. Unless ‘strict’ mode starts making me crazy – which is certainly also possible.

Next Example

"use strict";
var sys=require('sys');
for(var i=0;i<10;i++) {
 setTimeout(function() {sys.debug("I is now: "+i)},1000);
}

(Notice how I've learned my lesson? Yeah, I don't need concurrency bugs biting me in the ass, thankyouverymuch.)

The output is, unfortunately:

DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10

So this one - I definitely know how to fix. The problem is that by the time the timeout actually _fires_, the value of 'i' will be different - in this case, incremented all the way to 10. I need to somehow 'freeze' the value of i within the timeout.

So I would do:

"use strict";
var sys=require('sys');
for(var i=0;i<10;i++) {
 setTimeout((function(number) {return function() {sys.debug("I is now: "+number)}})(i),1000);
}

Which results in:

DEBUG: I is now: 0
DEBUG: I is now: 1
DEBUG: I is now: 2
DEBUG: I is now: 3
DEBUG: I is now: 4
DEBUG: I is now: 5
DEBUG: I is now: 6
DEBUG: I is now: 7
DEBUG: I is now: 8
DEBUG: I is now: 9

The problem is, that's ugly as shit. What better, more readable, maintainable, debuggable way is there to do it? And that function-returning-a-function business gives me the willies. Well, I don't know the best answer for that yet. How about this:

"use strict";
var sys=require('sys');
for(var i=0;i<10;i++) {
 (function(number) {
  setTimeout(function() {sys.debug("I is now: "+number)},1000);
 })(i);
}

(The output is still the same). That feels a little less awful and unreadable - and doesn't give me the anonymous-function-returning-function yucky feelings that the previous one did. (Though it still is effectively doing that, isn't it?) The crazy squiggly brace, close-paren, open-paren business is still a little awkward though.

A piece of advice I got from the node.js group seemed pretty sage, in terms of making this stuff more readable:

"use strict";
var sys=require('sys');

function make_timeout_num(number)
{
 setTimeout(function() {sys.debug("I is now: "+number)},1000);
}

for(var i=0;i<10;i++) {
 make_timeout_num(i);
}

(The output is still the same again.) And, wow, yeah, that's a hell of a lot more readable, at the expense of four or so more lines. But sometimes, logically, you don't want to split out your functions like that - if every time you need to freeze something in a scope you have to declare a function somewhere far away, your eyes will have to scan all over the place, and that could be ugly. So you could maybe declare the function within the for loop - though that's still effectively in the global scope, it would just be for readability's sake.
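A sketch of what that might look like (output is the same as before):

"use strict";
var sys = require('sys');

for (var i = 0; i < 10; i++) {
 // Declared here just for readability -- 'var' means make_timeout_num
 // still lives in the enclosing scope, so this is purely cosmetic.
 var make_timeout_num = function (number) {
  setTimeout(function () { sys.debug("I is now: " + number); }, 1000);
 };
 make_timeout_num(i);
}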

I think I'll probably stick with the previous one, with the anonymous function declared in-line. It's not too insanely unreadable, and it's compact enough. If the contents of my anonymous function get a few lines long, or a few variables deep, I might split it out into its own function for readability's sake.