Amazon Elastic Load Balancer (ELB) performance characteristics

So at my new job, I get to use AWS stuff a lot. We have many, many servers, usually sitting behind a load-balancer of some sort. Amazon’s documentation on these things isn’t very clear, so I’m trying to figure out what the damned things are doing.

First off – a good thing. These Load Balancers are really easy to use. Adding a new instance is a few clicks away using the Console.

Another good thing – you can use multiple availability zones as a way to avoid trouble when an entire Availability Zone goes down (as happened less than a year ago). And here’s where it gets ugly.

It seems to evenly split traffic across zones – even if you don’t have a balanced number of instances in each zone. And it seems to determine which zone to hit based on some hash of the source address. So, if you have 2 instances in an Autoscale group, in two availability zones, and you hit your array from one IP really hard – bad things ensue. The one AZ you’re hitting will max out the CPU, and the AZ you’re not hitting will be nearly 100% idle. That adds up to 50% utilization on average – not enough to cause a scaling event (with my thresholds, anyway).

So, in short, if you have fojillions of people from all over hitting your services indiscriminately, I’m sure it’ll be fine. But, if like me, you have 1000’s of people (or so) from all over, some really hard from one particular IP – it may not be a good idea to try to spin up more than one AZ. And spinning up 4 AZ’s seems silly – you’ll definitely have more of a chance of at least one of them going bad, and until your load-balancer figures that out, you’ll have one out of ‘n’ requests failing.

Another thing I’ve noticed – the ELB’s seem to have a strict timeout on accessing the back-end service. If it doesn’t get a response within 30 seconds, it’s going to drop the connection and hit the service again. I had a service that was nearly getting DOS’ed by the load balancer that way. Make sure you have sane timeouts.

So the next thing I was curious about was whether or not the ELB would do any batching-together of any of the returned data – would it act like some kind of reverse-proxy? I wrote a tiny Node server which spat out a little data, waited 10 seconds, spat out some more, waited another 10 seconds, then finished. Here it is:


#!/usr/local/bin/node
"use strict";


var http=require('http');


http.createServer(function (req,res) {
  console.warn("CONNECTION RECEVIED - waiting 10 seconds: "+req.url);
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.write("initial output...n");
  setTimeout(function () {
    console.warn("first stuff written for "+req.url);
    res.write("Here is some stuff before an additional delayn");
    setTimeout(function () {
      console.warn("second stuff written for "+req.url);
      res.end('Hello Worldn');
    },10000);
  },10000);


}).listen(80);

So – I couldn’t tell any difference between hitting the server directly, or hitting the load balancer (if I telnetted straight to port 80). It acted the same way. There was definitely no batching.
What about if I have a slow-client, or something that’s trying to upload something – will it get batched together on the POST side?
I modified my code to do a ‘slow POST’ – and it worked similarly – I couldn’t actually tell the difference between running that on the load-balancer or running it on the instance directly.
I also wrote code to generate a large (1MB) response dynamically on the server-side, then wrote a client that would receive a little, pause the stream, resume it after a few seconds, pause it again, and so on. The one thing I noticed different between accessing the server directly versus the load balancer was that the server tended to give me data in 1024 byte chunks, whereas the load-balancer was giving me blocks closer to 1500 bytes. Weird! Well, maybe not – I *do* know for sure that the LB is reterminating the connection at the TCP level – the source IP address changes. I was writing the data in blocks of 1k, so maybe each write turned into exactly one packet of 1024 bytes? But in the LB side, the LB, when re-streaming my TCP data, was sending larger segments. Or so it would seem.
So it looks like I can’t get rid of the reverse-proxy that sits on top of some of our Ruby code. Oh well.

More on (Moron?) Ted Dziuba

(Math and pedantry ahead. Feel free to skip if that’s not your thing. More stuff about Node.js and this guy Ted Dziuba, who I hate.) To start, read Mr. Dziuba’s latest blog rant about Node.js.

I will make two main points here. First, that Mr. Dziuba does not intend to be comprehended; that he is deliberately phrasing points so as to confuse. Second, I will prove that his arguments are invalid. It appears to me that he has no interest in deriving any kind of truth or shedding any light on anything; he is merely trying to draw attention to himself, or draw blog traffic, or just make noise. I don’t know what his reasons are. I just know what he’s doing, and that seems to be being deliberately misleading. I would not feed the troll, as it were, except for the fact that his blog post is permanently on the internet and I’m going to end up being pointed to it someday if I ever propose an event-loop based solution for anything.

First off, let’s try to decipher what he’s saying; he does not do a very good job of being clear.

Let’s look at Theorem 1.

What does it actually say? Here’s an attempt at deciphering it.

He asks, let’s check out how things work when you have something that’s heavily CPU-biased (less I/O).

Note = the value ‘k’ is the ratio of I/C – in other words, for something really really IO intensive (I > C) then k should be ‘big’ (greater than one), and for I < C, k should be small (less than one). You can think of ‘k’ as the “IO-ish-ness” factor. It’s big for something that’s very IO-ish, and it’s little if it’s not. Why doesn’t he explicitly state the definition that k=I/C? Because he has no desire to be understood; he’s attempting hand-waving. Everywhere he uses this ‘k’ construct he could just as easily use I & C.

The important definition is: W=I+C=kC+C=(k+1)C

Therefore K is I/C

(k+1)C is equal to W = the wall clock time of I + C.

His theorem begins with the supposition:

1000/C > 1000N/(k+1)C

What does that mean?

Let’s change his equation to make more sense of it. Since (k+1)C=(I+C) by definition, he’s really just saying:

1000/C > 1000N/(C+I)

He’s trying to suppose that *IF* the number of times I can execute just the CPU-part of my event-loop code is greater than the number of times I can do that, threaded, but also with I/O time taken into account, *THEN* it must be the case that the number of threads I am using is one. Why would you make the argument like that? The same argument can be made, much more simply, by saying:

1000/(C+I) > 1000N/(C+I) only if N is one. But the problem is then you can see what he’s doing. He doesn’t want this, hence the pointless variable substitution.

Notice that ‘N’ factor on there? He is saying that a system with 2 threads runs twice as fast as a system with one thread. And apparently a system with 100 threads runs 100 times as fast as a system with one thread. I’ve worked with much software throughout my personal and professional life, and this supposition is not true. By this assumption, of course threads will always outperform event-loop software.

He attempts the same song-and-dance in Theorem 2. He still is making the assumption that N threads equals N*single-threaded performance.

It’s most clear in his “Practical Example.” There, you can see him making the n-threads-means-n-times-performance argument most clearly. If that’s true, why not 1000 threads? Why not a million?

Another point here. Why threads? Why not fully fork()’ed processes? His math (such as it is) still holds up just the same if you assume forked contexts as threaded. And, yet, none of his math requires threads instead of forks to run.

Effectively, he has proven that in a system with an infinite number of infinitely fast CPU’s, and infinite RAM, and zero threading context-switch time, and zero thread-accounting time, that threads are faster than events. Congratulations.

So there are my arguments as to why he is incorrect. Now I wish to ask questions about how he seems to be deliberately misleading.

First off – some stylistic questions. Why has he written his argument so obtusely? Why has he not shown his work? Why does he not explain what he’s supposing? He just throws symbols down, in beautiful little .PNG files, and runs off with manipulating them with algebra. That’s seems like he’s trying an “appeal to authority” via jargon. Why all the milliseconds everywhere? We’re in theoretical Comp Sci world now, why pick those units? It would appear that he has done so specifically to throw 1000’s in his equations everywhere, just to confuse things further.

Next – a more theoretical question. Why do things like the select() or poll() system calls exist? Or epoll or /dev/poll? Since they’re so “obviously” inferior to threading-based solutions, they shouldn’t exist at all, right? There should be no use for them. If I can always just use threaded I/O instead of event-looped, why use event-looped at all? It is, after all, very difficult to program.

And finally – why did Dziuba himself advocate for an event-based I/O solution – “eventlet” – in one of his own blog posts? He seems to have gotten quite the performance boost –

…but the one that really stands out in the group is Eventlet. Why is that? Two reasons:

1. You don’t need to get balls deep in theory to be productive with Eventlet.
2. You need to modify very little pre-existing code to adapt a program to be event-driven.

This all sounds great in theory, but I have actually made a large I/O bound program work using monkey patching and changing the driver. It is a piece of software that reads jobs from a queue and processes them, putting the result in memcached. For esoteric reasons I will not go into, the job processors could not thread the work, they had to fork. Using this setup, one production box with 8GB of RAM was consistently 7.5GB full. After a less than 5 line code change to the driver, that same production box uses only around 1GB of RAM consistently, and can handle 5 to 10x the throughput of the old system.

The answers to these questions I cannot be sure of. As much as I would like to imagine that Mr. Dziuba is simply terribly ignorant; it would seem far worse – that he just intends to say things that are untrue for the purpose of drawing attention to himself.

Node.js is not a cancer, you are just a moron

My tone is going to seem strangely even and un-ranty. This is because I am doing everything I can to keep myself from completely exploding when I read this bullshit that this moron is spewing. OK, that was a little ranty, but the rest will read evenly. Maybe.

So one of my programming friends posts an article at http://teddziuba.com/2011/10/node-js-is-cancer.html and says, “Ah, here’s what’s wrong with Node.js!”

The article is rather strongly written – “Node.js is Cancer”, “node.js nonsense”, “Node.js is a tumor on the programming community”, “completely braindead”, “Scalability disaster”, etc.

He then shows a Fibonacci sequence and how it performs badly under node.

The problem he has proposed is, fundamentally, CPU-bound. I wrote a version of it in C and it did perform faster than it did in Node, but still, the problem definitely took finite-time. My command-line Node.js version calculated the answer in 8 seconds, the C version did it in 4. I was rather impressed that Javascript (Node.js’s V8 engine) was able to come as close to C’s performance in pure CPU-bound execution.

The problem, and what the author perhaps misunderstands, is that this is not the situation in which Node is an ideal solution. I use Node.js in production for work – and I know of many other shops that do too. If the problems you are dealing with are CPU-related, Node.js will not help you. Node.js works well when your problems are I/O-related -e.g., reading something out of a database, running web servers, reading files, writing files, writing to queues, reading from queues, reading from other web services, aggregating several web services together, etc. The reason that this solution has become so popular of late is because these are the types of problems that are most common in web development today. Thus, node.js becomes a helpful arrow in one’s quiver with which to solve these types of issues.

Considering that the article’s author seems to have some level of experience, I wonder if his choice of skewed example was perhaps deliberate. He has other articles on his blog about other event-loop libraries. His comment at the bottom – “tl;dr – Node.js is an unpleasant software library and I will not use it” – is possibly the real source of his anger. And – an irrefutable point – if you don’t like something, you don’t want to use it, and he obviously doesn’t. That’s fine.

Node is a tool; one of many – no panacea. If you’re dealing with problems of ‘slow’ services that need to wait for various bits of I/O to complete in order to return a result – it can be a very powerful and useful tool. If you’re computing the fortieth member of the fibonacci sequence recursively, it won’t be.

The sad fact is that the author’s completely valid point – that Node.js isn’t a good tool for CPU-bound problems – is completely buried in his bile. This is because he never states that, explicitly. Node.js has other drawbacks as well – it’s very easy to end up in callback-spaghetti, it’s very minimal, and it’s very very very young. The database integration libraries have some pretty serious immaturity issues to work through; and I’ve had to code around a good deal of that.

It’s a tool that’s good at particular things, and I will continue to use it for those things. Those ‘things’ tend to be the bulk of what web development and web services development actually are. So when I can write a two hundred line program that can replace entire arrays of servers and interconnected services with just one server; I am going to do that, and I won’t feel particularly braindead in doing so.

node.js scope weirdity

So I’ve been using Node.js a lot in my new job. Quick note: it’s super awesome. The job, and node.js. Anyways. I’ve put together a couple of non-trivial pieces with it and one thing that keeps tripping me up is: when is my variable in and out of scope? So I thought I’d write this up to see if I can explain it.

First example

Let’s look at this simple server code – it’s just a dumb webserver (shamelessly stolen from the node.js home page) that says ‘hello world’ and spits out a connection count:

var http = require('http');
var conncount=0;
http.createServer(function (req, res) {
  conncount++;
  num=conncount;
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.write("Here is some stuffn");
  res.write("And the connection count is: "+conncount+"n");
  setTimeout(function() {
   res.end('Hello World: conn count was: '+conncount+' and my connection # is: '+num+'n');
        conncount--;
  },5000);
}).listen(1337, "127.0.0.1");


console.log('Server running at http://127.0.0.1:1337/');

So if I just curl that (curl http://localhost:1337/), I get:

Here is some stuff
And the connection count is: 1

…5 seconds pass, and then…

Hello World: conn count was: 1 and my connection # is: 1

So that seems to make some sense. However, what happens if I run the code 12 times? This:


Here is some stuff
And the connection count is: 1
Here is some stuff
And the connection count is: 2
Here is some stuff
And the connection count is: 3
Here is some stuff
And the connection count is: 4
Here is some stuff
And the connection count is: 5
Here is some stuff
And the connection count is: 6
Here is some stuff
And the connection count is: 7
Here is some stuff
And the connection count is: 8
Here is some stuff
And the connection count is: 9
Here is some stuff
And the connection count is: 10
Here is some stuff
And the connection count is: 11
Here is some stuff
And the connection count is: 12


…then 5 seconds elapse, then…

Hello World: conn count was: 12 and my connection # is: 12
Hello World: conn count was: 11 and my connection # is: 12
Hello World: conn count was: 10 and my connection # is: 12
Hello World: conn count was: 9 and my connection # is: 12
Hello World: conn count was: 8 and my connection # is: 12
Hello World: conn count was: 7 and my connection # is: 12
Hello World: conn count was: 6 and my connection # is: 12
Hello World: conn count was: 5 and my connection # is: 12
Hello World: conn count was: 4 and my connection # is: 12
Hello World: conn count was: 3 and my connection # is: 12
Hello World: conn count was: 2 and my connection # is: 12
Hello World: conn count was: 1 and my connection # is: 12

So my question is, why does it do that? Each execution of my function _should_ have its own stack, no? And so wouldn’t each stack have its own variables?

Now, mind you – I know a (horrible) way to fix this – wrap my setTimeout call in an anonymous function and pass ‘num’ as a parameter – but what I don’t really get is ‘why’? I threw this line all the way at the end (with apologies to Haddaway) –

setTimeout(function() { sys.debug("What is num! Baby don't hurt me, don't hurt me, no more..."+num)},10000);

(And I had to require('sys') at the top too)

And, in my terminal with Node running, I got:


DEBUG: What is num! Baby don't hurt me, don't hurt me, no more...12

What?! I would’ve expected ‘num’ to fall out of scope?! Why wouldn’t that function scope up there make ‘num’ only exist for this execution? Is there no concept of ‘stack’ or anything? And even if there wasn’t any, each execution of my function is an execution and should ‘freeze’ the variable or something, right? Apparently not.

So what happened? Well, I can tell you – that variable ‘num’ that I referenced, since I *didn’t* define it using ‘var’, is GLOBAL. So that’s why it’s acting so global. Simply adding ‘var’ to the definition (var num=conncount;) made it start working properly. E.g., after the delay, my output became:

Hello World: conn count was: 12 and my connection # is: 1
Hello World: conn count was: 11 and my connection # is: 2
Hello World: conn count was: 10 and my connection # is: 3
Hello World: conn count was: 9 and my connection # is: 4
Hello World: conn count was: 8 and my connection # is: 5
Hello World: conn count was: 7 and my connection # is: 6
Hello World: conn count was: 6 and my connection # is: 7
Hello World: conn count was: 5 and my connection # is: 8
Hello World: conn count was: 4 and my connection # is: 9
Hello World: conn count was: 3 and my connection # is: 10
Hello World: conn count was: 2 and my connection # is: 11
Hello World: conn count was: 1 and my connection # is: 12

Such a terribly easy way to blow up your javascript! So apparently node.js supports “strict mode” – just make the first line of your javsascript code say:

"use strict";

(Note, that’s just a string, with the quotes. A javascript parser will just ignore it if it doesn’t understand it. You could also put a line in the middle of your code saying "poop"; and it would be ignored the same way).
Now with strict mode enabled, the previous version of my code (without the ‘var’ declaration) says:


ReferenceError: num is not defined

So I think I’ll be using this from now on. Unless ‘strict’ mode starts making me crazy – which is certainly also possible.

Next Example

"use strict";
var sys=require('sys');
for(var i=0;i<10;i++) {
 setTimeout(function() {sys.debug("I is now: "+i)},1000);
}

(Notice how I've learned my lesson? Yeah, I don't need concurrency bugs biting me in the ass, thankyouverymuch.)

The output is, unfortunately:

DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10
DEBUG: I is now: 10

So this one - I definitely know how to fix. The problem is that by the time the timeout actually _fires_, the value of 'i' will be different - in this case, incremented all the way to 10. I need to somehow 'freeze' the value of i within the timeout.

So I would do:

"use strict";
var sys=require('sys');
for(var i=0;i<10;i++) {
 setTimeout((function(number) {return function() {sys.debug("I is now: "+number)}})(i),1000);
}

Which results in:

DEBUG: I is now: 0
DEBUG: I is now: 1
DEBUG: I is now: 2
DEBUG: I is now: 3
DEBUG: I is now: 4
DEBUG: I is now: 5
DEBUG: I is now: 6
DEBUG: I is now: 7
DEBUG: I is now: 8
DEBUG: I is now: 9

The problem is, that's ugly as shit. What better way to do it is there that's more readable, maintainable, debuggable, etc? And that function instantiation thing gives me the willies. Well, I don't know the best answer for that yet. How about this:

"use strict";
var sys=require('sys');
for(var i=0;i<10;i++) {
 (function(number) {
  setTimeout(function() {sys.debug("I is now: "+number)},1000);
 })(i);
}

(The output is still the same). That feels a little less awful and unreadable - and doesn't give me the anonymous-function-returning-function yucky feelings that the previous one did. (Though it still is effectively doing that, isn't it?) The crazy squiggly brace, close-paren, open-paren business is still a little awkward though.

A piece of advice I got from the node.js group seemed pretty sage, in terms of making this stuff more readable:

"use strict";
var sys=require('sys');

function make_timeout_num(number)
{
 setTimeout(function() {sys.debug("I is now: "+number)},1000);
}

for(var i=0;i<10;i++) {
 make_timeout_num(i);
}

(output is still the same again). And, wow, yeah, that's a hell of a lot more readable, at the expense of 4 or so more lines. But sometimes, logically, you don't want to split out your functions like that - if every time you need to freeze something in a scope you have to declare a function somewhere, your eyes will have to scan all over the place, and that could be ugly. So you could maybe declare the function within the for loop - though that's still in the global scope, it would just be for readability's sake.

I think I'll probably stick with the previous one, with the anonymous function declared in-line. It's not too insanely unreadable, and it's compact enough. If the contents of my anonymous function gets a few lines long, or gets a few variables deep, I might split it out into its own function for readability's sake.

Firefox 5 extensions compatibility woes

Didn’t find this on the intertubes anywhere, so I thought I would write it down to help somebody else with the problem.

The new Firefox 5 disabled a lot of my plug-ins. You can disable the plug-in compatibility check by doing the following:

Launch FF5, go to “about:config”

Right click to add a new property, make it of type ‘Boolean’, name: extensions.checkCompatibility.5.0

Set it to ‘false’.

Restart.

You’re welcome.

(Oh, and I should mention, any previously disabled plug-ins are now re-enabled. Good point, Beckley.)

A Modest Proposal…

Had a great discussion via Twitter and my blog’s comments about some stuff about IPv6 and other ways of handling the IPv4 address space shortage. Over the course of discussing things and arguing stuff and going back and forth, I think I’ve sharpened my idea.

IPv4+

IPv4+ is an extension of IPv4 address-space via backwards-compatible use of the IPv4 ‘Options’ fields. Two such options shall be used, one for an ‘extended-address-space’ destination, one for ‘extended-address-space’ source. If these fields are used, they MUST be the first two options in the packet. If both extended-source and extended-destination options are used, the destination MUST be first. This enabled (eventual) hardware-assisted routing at a fixed offset within the IPv4 header. Extended addresses grant an additional 16 bits of address space. If any routing decisions are to be made based upon extended address space, those SHOULD only be done at an intra-networking layer, within one autonomous system. The extended source and extended destination options are exactly 32-bits long, each. The format is as follows:

Bits 0 1-2 3-7 8-15 16-31
Field Copied Class Number Length Data
Values 1 0 5/6 3 Address Data

Option 5 will be used for Extended Destination, and Option 6 will be used for Extended Source. Perhaps additional options could be reserved and specified for future use as “Super-Extended Destination” and “Super-extended Source”.

IPv4+ addresses will be specified in text as having two additional octets – e.g. 72.14.204.99.56.43. The extra octets on the right-hand side correspond to the extra 16 bits of addressing data. An address with both additional octets of zero is understood to mean a legacy IPv4 node at that address. E.g. 72.14.204.99.0.0 means the IPv4 node at 72.14.204.99.

Operation of the protocol will be designed to be as backwards-compatible with Unextended IPv4 as possible.

IPv4+ nodes have both an IPv4 address – which may be an RFC1918 non-routable address, or a link-local address – as well as an extended IPv4+ address, which SHOULD be a routable IPv4 address plus 16 additional bits of identifying data. The legacy IPv4 address MUST be locally unique within its network segment. A backwards-compatible IPv4+ that uses an RFC1918 address for its legacy IPv4 address SHOULD (MUST?) be connected to a Router or Gateway that is capable of Network Address Translation.

As the protocol gains acceptance, core BGP routes MAY be extended to full /32 networks.

An IPv4+ node learns it is on an IPv4+ compatible network through an extra DHCP option, or it may be statically configured as such.

IPv4+ ARP protocol is not currently defined.

An IPv4+-aware gateway OR node MUST be aware of the mapping from IPv4+ addresses to legacy IPv4 addresses. The mapping SHOULD be programmatic – e.g. 192.168.1.2 corresponds to 72.14.204.99.1.2.

MOST implementations will likely be RFC1918 addresses for legacy IPv4, and routable IPv4 addresses as the first four octets of the IPv4+ address. So-called “Public Hosts” MAY exist at some point in which they have both a routable IPv4 address AND an IPv4+ address. The only purpose of such a host would be future-proofing – no real benefit is conferred, other than ensuring that software stacks can utilize extended addressing.

An IPv4+ gateway MAY define a ‘default host’ which should receive all unidentified legacy IPv4 traffic, or it may drop any such packets, or it may use a simple heuristic such as ‘lowest address wins’.

“IPv4+ ONLY” hosts cannot exist. Non-legacy-routable IPv4+ hosts could exist by the local gateway refusing to NAT addresses by responding with ICMP Destination Unreachable or a new ICMP message.

Software implementations SHOULD embed 48-bit IPv4+ addresses in their existing IPv6 software stacks – which have already been implemented and rolled out. A special segment of IPv6 space SHOULD be allocated and reserved for this embedding to ensure no collisions occur if IPv6 were to become more widespread.

Interoperability Scenarios

two IPv4+ nodes on the same network

MUST use their legacy IPv4 addresses to communicate. An IPv4+ node can identify the IPv4-legacy address that corresponds to the other node because of the nodes required knowledge of mapping between IPv4+ and IPv4-legacy addresses.

two IPv4+ nodes on different networks with through IPv4+ transit

IPv4+ packets are sent and received through extended IPv4+ addresses.

IPv4 node and IPv4+ node on same network

An IPv4+ node will use its IPv4 address and the IPv4 protocols to contact the ‘legacy’ node. The IPv4+ node can identify the IPv4 node via its legacy 32-bit address.

IPv4 node and IPv4+ node on different networks, IPv4+ aware router on IPv4+ network

The IPv4+ node uses its legacy IPv4 address to talk to the IPv4 node. The IPv4+-aware router NAT’s the traffic to the IPv4 node.

IPv4 node and IPv4+ node on different networks, IPv4-only router on IPv4+ node’s network

Interesting – at some point the IPv4+ node MUST become aware that it does not have through connectivity to the other IPv4+ node, and must fall back to using its legacy address. The IPv4 gateway or router will not be able to communicate to the IPv4+ node because the IPv4+ node will ‘appear’ to be transmitting packets from an incorrect address. The IPv4+ node SHOULD eventually disable IPv4+ connectivity as it will be unable to communicate to any other devices. An ICMP ‘Ping’ exchange may be employed to determine that the gateway is not IPv4+ aware by examining the return packet’s IPv4 options (if any).

Migration Scheme

“Access” routers (small office, home office) need new firmware to handle iPv4+. IP stacks in major operating systems need extensions for IPv4+. ISP’s who do not firewall their customers do not need to do anything. IPv4+ compatible applications now have restored end-to-end connectivity.

Eventually, DNS extensions SHOULD be created to permit returning extended IP4+ addresses for services.

An ARP extension MAY be at some point required for nodes that do not require legacy connectivity.

BGP routing will eventually need to be broadened to support /32 extended networks (a /32 network could correspond to a full 65,000 host internetwork).

Interior routing protocols MAY need to be extended to make routing decisions based on extended IPv4+ addresses. The header order and length has been optimized to make this as painless as possible, but it may still be painful.

An additional 8-bits of address space can be reclaimed once all routers are compatible with IPv4+ – the length byte can be counted as an address space indicator, which is 3 for all initial IPv4+ networks, but could be allowed to vary once the entire Internet is switched over to IPv4+.

Concerns

This is a stopgap.

Will routers in the ‘wild’ strip options they do not understand? Will they ever reorder or muck with options? Do we really have to steal that one byte in the option to say ‘3’?

Will IPv4 live on forever? Will routers always have to handle all the crazy NAT and other stuff that will be a legacy of ‘legacy’ IPv4?

Will the only additional ‘feature’ of this IPv4 thing be better BitTorrent nodes?

BIG CONCERN will network devices get confused about seeing IP packets with “apparently” identical IPv4 addresses (really extended IPv4+ addresses) and freak out?

10.0.0.0/8 network is too large to map all of its hosts as IPv4+ addresses to use one legacy IPv4 address. It might require a full Class-C allocation of legacy IPv4 addresses to map every possible host. Yuck.

Should this protocol actually attempt to map full IPv6 addresses into appropriate extension headers?

IPv6

So, two things about IPv6 – first, a little bit about how to do it if you’re all Mac’ed up like me, and then, a little rant.

The easiest way to get IPv6 working it is to grab a copy of Miredo for OS X. This lets your mac, pretty much automagically, get a connection to the IPv6 Internet via an IPv4 tunnel anywhere that you have IPv4 connectivity. It’s nearly painless, and at that point, you can start to at least do some basic playing around with IPv6 stuff. I enabled IPv6 on my home network, but I still have Miredo installed but deactivated if for some reason I wanted to use it when I’m at a coffee shop or some other network.

The good way to do it is to go to tunnelbroker.net and sign up (it’s free!). Then configure your Airport Extreme to do tunneling by following these directions. Voila. Now you have IPv6 connectivity to the intarwebs…or the ip6ernet. Whatever.

The best way to do it – and I haven’t done it this way – is to actually get IPv6 connectivity from your ISP – no tunneling or anything, just native connectivity. I can’t do this because Time Warner doesn’t give me that, or maybe my Airport isn’t so good at doing that. I don’t really know.

So far, the one thing I can see here is that you could begin to use this IPv6 connectivity to work around the general destruction of the internet any-to-any principle – the idea that any IP address on the internet should be able to contact any other. This is basically no longer the case, as many people use RFC1918 addresses behind NAT to conserve IP addresses (and also there are some positive security implications). So my computer at 10.0.1.2 can’t necessarily talk directly to your computer at 192.168.1.2 (or, even worse, your computer at 10.0.1.2 but behind your NAT, and not mine). The way we work around this type of things is all kinds of magical firewall port-mapping and other such things. It’s a pain in the butt. Services like AIM’s ability to send files, or various screensharing utilities all now require some kind of centralized server that everyone can connect to because just about every network-connected computer tends to be behind a NAT. That centralization is unfortunate, and a drain on services that really should just be about connecting anyone to anyone.

But if you have IPv6 set up in the ‘good’ way listed above (or ‘better’), you actually have a new option. You can un-check “block incoming IPv6 connections” on your Airport, and now have access to anything in your network that speaks IPv6 from the outside world (again, so long as the outside world is IPv6). Of course, big security implications here, but that could actually be a way of making IPv6 somewhat (remotely) useful. Things that like this type of connectivity might be: BitTorrent-esque things…peer-to-peer video applications…some kinda of home-hosting things…I dunno. That type of stuff. But, in short, while at Starbucks, I could fire up my Miredo-for-OS X client, and connect to various things in my home. That could be useful for some people.

My experience with this new setup is rather underwhelming. I can go to ipv6.google.com. I guess on World IPv6 day I’ll be able to…somehow…enjoy some festivities or something. I don’t really have any home servers nowadays.

<Begin Rant>

Who the fuck came up with this stupid-ass migration plan? It has to be one of the dumbest things I have ever seen. IPv6 the protocol is OK (at best)…it really feels pretty close to IPv4, except with a bigger address space. OK, I guess. DJB (who is brilliant, but I think may be batshit insane) sums up the problem really well.

In short, there’s negligible benefit for going to IPv6. You can’t really get anywhere you couldn’t get to anyways. If IPv6 had been designed to interoperate with IPv4, we would be far closer to being in a happy IPv6 world – think about how many machines are dual-stacked right now? Those machines would instead be single-stacked, and some early adopters, or price conscious people (think: Web startup types who like to skip vowels in their domain names) might be able to start offering IPv6 only services, and would be able to start hitting users right now. But, no. The migration scheme seems to be:

  1. Migrate everyone and everything to IPv6 now

And you’re done! Isn’t that easy? The standard has been out for a bajillion years. The IPv4 shortage has been a problem for a bajillion years. And we’re still here. Not because the protocol for IPv6 is flawed, but because there’s no migration scheme at all. There’s no backwards compatibility. This whole infrastructure has to layer over the entire internet. Who the hell thought this was a good idea? I mean, sure, it’s “simpler”, protocol-wise, to do that…but a few more years of protocol engineering instead and a true backwards-compatible solution and we would’ve had people switching ages ago. Go look at how many transition mechanisms are in place for IPv4-to-IPv6. It’s stupid. That alone indicates the level of FAIL that is likely here.

The other problem I have with IPv6 has to do with routing tables. And protocol stacks. Right now, to do any non-trivial amount of TCP/IP networking (let’s imagine HTTP for this example), you need:

  • DNS
  • some kind of routing protocol has to be working right
  • ARP to figure out how to get to your local endpoint
  • DHCP to figure out what your IP address is going to be

Network troubleshooting ends up being an interesting and non-trivial problem of figuring out who can ping who (whom? Grammar fail. Sorry), what routing tables look like on various intermediate devices, what IP address you get from DNS, is your DNS server working, etc, etc. It’s a muddle, but it’s a muddle that’s been treating us well on this whacky internet of ours.

However, in the IPv6 world, we now have – the entire protocol stack for IPv4, PLUS a protocol stack for IPv6, and a crazy autotunneling doodad with a weird anycast IPv4 address (oh, that’ll be fun). And a routing table that is exploding out of control. I mean, my dinky little home network (theoretically) gets a /64 network. If every Time Warner customer gets a /64 – and there’s not some means of aggregating routes together – the routing table completely goes insane. Now I’d assume that TW would aggregate its customers into a /48 or something – god, I hope so! But still, we’re talking about a world where there are networks all over the place.

There’s a big question as to whether or not people ought to get provider-independent network addresses or not. I think I know the answer to this: No, they should not. It’s suicide. I think the real solution for this is at the DNS level – you should get addresses that correspond to your rough physical place on the internet to keep the routing tables somewhat simple, and if you want to move endpoints around, you change DNS entries. Get away from thinking of IP’s as static. If DNS were baked deeper into the protocol stack, this could work extremely well. Want to have a webserver at www.whatever.com? If you have some kind of authorization, your webserver would come up and use some kind of key exchange to somehow tell DNS that it is www.whatever.com. If you move, you just move your webserver. Your keys still work. If you set up a webserver in your house – same thing. Anyways, that’s just hand-waving. There still would have to be some way of bootstrapping that (like, what IP address do I contact the webserver at? Maybe you find that out by talking to your local gateway? Dunno)

<End Rant>

I guess that 1) wasn’t a little rant and 2) was a little rambly. So sue me.

ucspi-tcp and stupid errno.h (CentOS and ucspi-tcp)

I keep running into this and doing my standard google-up-the-answer-routine didn’t seem to be working.

In short, ucspi-tcp doesn’t compile on CentOS boxes (or RedHat boxes). Cuz DJB doesn’t “believe in” RedHat’s “you must have an errno.h” thing. Hey, I love DJB, and his software, but I also think he’s impractical and a nutjob sometimes. This would be one of those times.

Lots of folks had patch-related ways of fixing the problem, I thought those seemed rather laborious. I just stole The Internet’s method for another DJB package.

Just append -include /usr/include/errno.h at the end of the first line of conf-cc so it looks like this:

gcc -O2 -include /usr/include/errno.h

This will be used to compile .c files.

Boom, everything works now.

Even Mo’ Math…

So Beckley got a hold of the MetroCard Math site and built on top of David’s fantastic work to build even more prettiness, neat-workingness, and general niftitude into the site.

We also put in a thingee – well, by ‘we’ I mean ‘he’ – he put in a thingee that lets you see how the new price changes will affect you. For me, I definitely will be sticking with the pay-per-ride.

And another thing – I actually tested the new (divisible-by-a-nickel) magic number, and it *does* work. My MetroCard has an exactly even number of rides on it. Cool. Now I just have to do something with all these MetroCards that have 10 or 20 cents on them – perhaps a new part of the site that lets you put in how much money is on your cards, and then it tells you how much more to put on to get it ‘even’? Not a bad idea…

Gory Details: so, talk to any computer sciencey person and they will always tell you that Floating Point Math is Hard. I have only rarely run into this, but the rounding algorithms are very specific when you buy stuff, and if you’re off by a penny, then, well, you’re off by a penny, and things stop working. We found a couple of minor (off-by-one) bugs here and there, and every time it seems like I fixed one, the rest of the results would start to go haywire. The real problem is that I am trying to ‘move’ the rounding around the formula:

round_for_money($x * 1.15) = n * $2.25

Now solve for ‘x’, and let ‘n’ be any integer – well, that pesky ’round()’ is in the way, and if you just try to move it to the other side, or round at some random and/or inopportune time, then when you get back to the original equation, sometimes the numbers don’t work out anymore. It sucks.

So I racked and racked my brain trying to figure out a way to do my simple solve-for-x routine. I really just want to try different integers for ‘n’ until I find an answer that’s “acceptable.” But that doesn’t work. At all. Or at least, I don’t know what mathematical operation I can do to move that round() function off the left side so I can try to have a formula that points to ‘x’.

What did I do finally? I gave up. I left the formula as it is above, and just run ‘x’ from 0 to “a lot” (a thousand bucks or a hundred bucks I think?). The answer I get is going to be completely accurate, but it wastes computing power. Well, too bad, your browser has to do a little bit of multiplication in a loop. My condolences. But! The result is, I’m pretty convinced my answers are to-the-penny accurate now. We’ll see when the big price change kicks in.

Thanks again to David Dominguez for the initial switch to jQuery-powered MetroCard Math, and thanks to Beckley for the full re-skinning he pulled off.

More Metrocard Math…

So I’ve updated my Metrocard Math site.

First, my friend David Dominguez helped out to make it much, much prettier. He also added some jQuery magic, and changed up a significant amount of how the site is structured. I was trying a weird idea – where I would strip the markup down to its most basic elements, and style it from there using cleverly-constructed css selectors, but I don’t think it worked out. My friend Bryan tried to restyle it as well, and the rigidity of the markup basically stopped him in his tracks. So, anyways, now it looks prettier and is definitely more usable on my phone.

I also had tried to buy a metrocard for one of the Magic Number amounts the other day at a vending machine, and it was rejected due to “invalid amount.” Stupid. It had worked before. I tried the small number. I tried the big number. Nothing worked. On a hunch, I tried $11.75 instead of $11.74. Success. And of course, I will eventually have a metrocard with a penny on it. So apparently, the number has to be divisible by 5? So I’ve added that to the site, and we’ll see when I next buy a metrocard if the new system actually works. I hope they don’t make it where it has to be divisible by $0.25, that would really sting.

I still want to do something where you can toggle between the current prices and the newly announced ones. But right now you can just type in the new numbers – Here’s what they are according to the Queens Chronicle (which I used to consult for a million years ago!) $104 is the new 30-day, $2.50 is the new single ride, and $29 is the new 7-day. The one-day funpass is going to be eliminated and so will the 14-day unlimited. Oh, and I hadn’t seen this before – there’s now going to be a $1 surcharge every time you pick up a new metrocard (though that doesn’t start till some time in 2011) OUCH. That means when you leave your metrocard at home and have to buy another one it’s *really* going to sting. One more extra buck. Damn. I mean, you can still use the lastest magic number ($15.65 I believe? Though I worry my rounding might not match the MTA’s…), but you definitely will not want to be throwing out your metrocards anymore.