Follow up on Amazon Elastic Load Balancers and multi-AZ configuration

I got a really good comment on my blog a day or so ago from a guy by the name of Mark Rose (that’s the only link I have for him, sorry!) He mentioned that AWS multi-AZ load-balancing happens via DNS – which intrigued me – so I thought I’d mess with my test load balancer and see.

He explained that each AZ gets its own DNS entry when you look up the load balancer – and that meshes exactly with what I’m getting. I do the DNS lookup for the LB, and get two IP addresses right now – and I’m assuming that each one corresponds to one of the LB’s.

But Amazon does some interesting DNS stuff – for instance, if you look up one of your ‘public DNS names’ of your instances from the _outside_, you get the instance’s outside IP address. But if you look it up from the _inside_, you get the inside IP. I use this for configuration settings, when I want an instance to have a relatively-static internal IP. Instead of trying to pin down the IP, I set up an elastic IP for the instance, and use the permanent public DNS name for that IP as the _internal_ hostname for the instance. This way, if the instance reboots, I just have to make sure that the elastic IP address I’ve configured is still associated with it, and everything still works internally.

I assume that traffic to the inside IP address is faster than bouncing back outside to the public address, then going back inside. I definitely know that it is cheaper – you don’t pay for internal network traffic, only external.

So my question is – what does it look like when you try to resolve the load balancer’s DNS name from the _inside_ of Amazon AWS? Do you get the same outside IP addresses, or do you get internal ones instead? Since it seemed like AWS traffic ‘tends’ to be directed back to the same AZ it originated from, I expect to get different answers.

So here’s what I did. Set up an ELB with two AZ’s – us-east-1a and us-east-1e. I installed apache and launched it on both. As soon as the ELB registered the instances as ‘up’, I did a DNS lookup from the outside to see what it resolved to.

I got exactly two addresses – I’m assuming one points to one AZ, one to another.

Then, I tried to resolve the same ELB DNS name from the _inside_. Weirdly enough, I *still* get both (outside) IP addresses! I didn’t expect that.
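For anyone who wants to repeat the lookup without reaching for dig, here is a minimal Python sketch of the same check. The function names are mine, and any load balancer hostname you pass in is your own; nothing here is specific to AWS beyond the ordinary resolver call.

```python
import socket


def unique_addresses(addrinfo):
    """Return the distinct IP addresses found in a getaddrinfo() result, sorted."""
    return sorted({entry[4][0] for entry in addrinfo})


def resolve_all(hostname, port=80):
    """Ask the system resolver for every IPv4 address behind hostname."""
    info = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return unique_addresses(info)
```

Run `resolve_all()` with the ELB's DNS name once from a machine outside AWS and once from an instance inside, then compare the two lists.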

So now, I wonder, is there anything to ‘bias’ the traffic to one AZ or another? Or is it just the vagaries of DNS round-robin that have been affecting me?

I changed the home pages on both apaches to report which AZ they’re in. I then browsed-to, and curl’ed, the ELB name. The results were surprisingly ‘sticky’ – on the browser, I kept seeming to hit ‘1-a’. On curl, I seemed to keep hitting 1-e.

What if I specifically direct my connections to one IP or another? Let’s see.

Since the ELB IP addresses seem to correspond, one-to-one, with AZ’s, I thought I would be able to curl each one. I did, and consistently got the same AZ for each IP. One seems to be strongly associated to 1-a, and one to 1-e.
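The per-IP test can be sketched in Python roughly like this. It connects to one specific IP while still sending the load balancer's name in the Host header, then tallies which page each IP served. The function names are hypothetical, and it assumes each home page body identifies its AZ, as mine did.

```python
import http.client
from collections import Counter


def fetch_via_ip(ip, host_header, path="/", port=80, timeout=5):
    """GET path from one specific IP, sending the given Host header."""
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        conn.request("GET", path, headers={"Host": host_header})
        response = conn.getresponse()
        return response.status, response.read().decode("utf-8", "replace")
    finally:
        conn.close()


def tally_by_ip(samples):
    """Count how often each IP returned each response body.

    samples is an iterable of (ip, body) pairs.
    """
    counts = {}
    for ip, body in samples:
        counts.setdefault(ip, Counter())[body.strip()] += 1
    return counts
```

Call `fetch_via_ip()` a handful of times per address and feed the (ip, body) pairs to `tally_by_ip()`. If each IP really maps to one AZ, every IP's counter ends up with a single key.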

So it seems the coarseness of the multi-AZ ELB load-balancing can be fully explained by the coarseness of using round-robin DNS to implement it.

Something else to note – it seems like the DNS entries *only* have 60 second lifetimes. With well-behaved DNS clients (of which I will bet there are depressingly few), you should at *least* be able to end up changing the AZ you’re banging into every 60 seconds. However, in my testing – brief though it may be – it seems to stay pretty ‘sticky’.

So what does this mean? I dunno – I feel like I want to do multi-AZ setups in AWS even less now. Round-robin DNS is old-school, but at large enough scales does generally work. Though I wonder if heavily-hit web services API’s like the ones my company provides fit well enough into that framework? I’m not sure.

Session stickiness and multi-AZ setups

Another question – how does this affect ‘stickiness’? You can set an LB to provide sticky session support – but with this IP address shenaniganry, how can that possibly work?

Well, weirdly enough – it actually does.

I set an Amazon load-balancer-provided stickiness policy on my test LB. I curl’ed the name, and got the cookie. I then curl’ed the individual IP addresses for the load balancer, with that cookie set. And now, no matter which IP I hit, I keep getting the results from the same back-end server. So session-stickiness *does* break-through the load-balancer’s IP-address-to-AZ associations, to always keep hitting the same back-end server.
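Here is a rough Python sketch of that cookie test. The helper names are made up. I believe the load-balancer-generated cookie is named AWSELB, but treat that name and the sample value in the comment as assumptions rather than something taken from my actual responses.

```python
import http.client


def cookie_pair(set_cookie_value):
    """Reduce a Set-Cookie header value to the 'name=value' part a client sends back."""
    return set_cookie_value.split(";", 1)[0].strip()


def get_with_cookie(ip, host_header, cookie, path="/", port=80, timeout=5):
    """GET path from one specific IP, replaying a previously issued cookie."""
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        headers = {"Host": host_header, "Cookie": cookie}
        conn.request("GET", path, headers=headers)
        response = conn.getresponse()
        return response.status, response.read().decode("utf-8", "replace")
    finally:
        conn.close()


# Example with a made-up header value:
#   cookie_pair("AWSELB=0123ABCD;PATH=/;MAX-AGE=60") gives "AWSELB=0123ABCD"
```

Grab the Set-Cookie value from a first request to the ELB name, pass it through `cookie_pair()`, and replay it against each IP with `get_with_cookie()`. If stickiness works the way I saw it, the body is the same no matter which IP you hit.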

I wonder, what does the AWS-provided cookie actually look like? It seems like Hex, so let me see if I can decipher it.

Since I don’t know if anything scary is encoded therein, I won’t post my cookie here, but when I tried to decipher it, I just got a bunch of binary gobbledygook. It stayed consistent from request-to-request (maybe modulo time, not sure), so it probably just encodes an AZ and/or an instance ID (and maybe time).

Implications

So since AWS exposes some of the internal implementation details of your load-balancer setups, what does this mean? It certainly does imply that you can lower the bar for DoS’ing a website that’s ELB-hosted by just picking one of the ELB IP’s and slamming it. For a two-AZ example – as opposed to having to generate 2x traffic to overwhelm a site, you can just pick one IP and hit that one with 1x and have the site go half-down from it.

Considering the issues I’ve actually run into from having autoscaling groups that won’t scale because only one AZ is overwhelmed, I wonder if it makes sense to only have autoscaling groups that span a single AZ?

And it also seems to imply that you can DoS an individual server by hitting it with a session-cookie that requires it to always hit the same back-end server. So perhaps, for high-performance environments, it makes sense to stick with shared-nothing architectures and *not* do any kind of session-based stickiness?

RightScale-to-Native Amazon Web Services (AWS) Name Synchronizer

At my company, we use RightScale for a lot of our Amazon Web Services management. It’s a pretty neat service – sort of “training wheels” for the cloud. Still provides us a lot of value.

But sometimes I like to log directly into the AWS console. Especially to find out when Amazon has scheduled reboots of our servers. Before I wrote this script, I would log in to find a whole bunch of instances running with no names. Then I’d have to go look them up in RightScale. Why can’t RightScale just name your Amazon instances with the right names?!

Well, I finally took matters into my own hands and built the following script. It walks through all of your RightScale servers, and finds the associated Amazon instances and sets their name attributes to the RightScale “nicknames.”

And I got permission from my job to make it available to the public – so here it is:

https://github.com/uberbrady/RightScaleNameSynchronizer

Yes, it is not the prettiest code I have ever written, but it does the trick. If someone wants to make it prettier I am definitely open to pull requests.

One thing I have noticed is that when you ‘relaunch’ a RightScale instance, the new instance will come up without an AWS name. If you re-run the script that will fix that. Also, if you use any RightScale arrays, the same thing can happen during scale-up/scale-down events.
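To give a feel for the idea (this is not the code in the repository, just a sketch in Python), the core of it is comparing two maps and writing Name tags where they differ. The input dictionaries are hypothetical stand-ins for whatever you pull from RightScale and EC2, and the tagging step assumes boto3 is installed and has credentials configured.

```python
def plan_name_updates(rightscale_names, aws_names):
    """Return {instance_id: nickname} for instances whose AWS Name needs fixing.

    rightscale_names: {instance_id: nickname}, gathered from RightScale.
    aws_names: {instance_id: current Name tag, or None if unnamed}.
    Instances that RightScale knows about but EC2 does not are skipped.
    """
    return {
        instance_id: nickname
        for instance_id, nickname in rightscale_names.items()
        if instance_id in aws_names and aws_names[instance_id] != nickname
    }


def apply_name_updates(updates, region="us-east-1"):
    """Write the planned Name tags to EC2 using boto3 (assumed available)."""
    import boto3  # imported here so the planning step runs without boto3

    ec2 = boto3.client("ec2", region_name=region)
    for instance_id, nickname in updates.items():
        ec2.create_tags(
            Resources=[instance_id],
            Tags=[{"Key": "Name", "Value": nickname}],
        )
```

Because the plan only contains instances whose name is missing or stale, re-running it after a relaunch or an array scale-up only touches the new instances.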

ucspi-tcp and stupid errno.h (CentOS and ucspi-tcp)

I keep running into this and doing my standard google-up-the-answer-routine didn’t seem to be working.

In short, ucspi-tcp doesn’t compile on CentOS boxes (or RedHat boxes). Cuz DJB doesn’t “believe in” RedHat’s “you must have an errno.h” thing: his code declares errno by hand instead of including errno.h, and the glibc on these boxes won’t link that. Hey, I love DJB, and his software, but I also think he’s impractical and a nutjob sometimes. This would be one of those times.

Lots of folks had patch-related ways of fixing the problem, but I thought those seemed rather laborious. I just stole The Internet’s method for another DJB package.

Just append -include /usr/include/errno.h at the end of the first line of conf-cc so it looks like this:

gcc -O2 -include /usr/include/errno.h

This will be used to compile .c files.

Boom, everything works now.
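If you build this on more than one box, the edit is easy to script. Here is a small Python sketch that appends the flag to the first line of conf-cc and leaves the file alone if it is already patched. The function name is mine, and it assumes you run it from the unpacked source directory or pass the path to conf-cc.

```python
from pathlib import Path

ERRNO_FLAG = "-include /usr/include/errno.h"


def patch_conf_cc(path="conf-cc"):
    """Append the errno.h include flag to the first line of conf-cc, once.

    Returns True if the file was changed, False if it was already patched.
    """
    conf = Path(path)
    lines = conf.read_text().splitlines()
    if not lines:
        raise ValueError(f"{path} is empty")
    if ERRNO_FLAG in lines[0]:
        return False
    lines[0] = f"{lines[0].rstrip()} {ERRNO_FLAG}"
    conf.write_text("\n".join(lines) + "\n")
    return True
```

Only the first line is touched; the comment lines below it in conf-cc are written back unchanged.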

Even Mo’ Math…

So Beckley got a hold of the MetroCard Math site and built on top of David’s fantastic work to build even more prettiness, neat-workingness, and general niftitude into the site.

We also put in a thingee – well, by ‘we’ I mean ‘he’ – he put in a thingee that lets you see how the new price changes will affect you. For me, I definitely will be sticking with the pay-per-ride.

And another thing – I actually tested the new (divisible-by-a-nickel) magic number, and it *does* work. My MetroCard has an exactly even number of rides on it. Cool. Now I just have to do something with all these MetroCards that have 10 or 20 cents on them – perhaps a new part of the site that lets you put in how much money is on your cards, and then it tells you how much more to put on to get it ‘even’? Not a bad idea…

Gory Details: so, talk to any computer sciencey person and they will always tell you that Floating Point Math is Hard. I have only rarely run into this, but the rounding algorithms are very specific when you buy stuff, and if you’re off by a penny, then, well, you’re off by a penny, and things stop working. We found a couple of minor (off-by-one) bugs here and there, and every time it seems like I fixed one, the rest of the results would start to go haywire. The real problem is that I am trying to ‘move’ the rounding around the formula:

round_for_money($x * 1.15) = n * $2.25

Now solve for ‘x’, and let ‘n’ be any integer – well, that pesky ’round()’ is in the way, and if you just try to move it to the other side, or round at some random and/or inopportune time, then when you get back to the original equation, sometimes the numbers don’t work out anymore. It sucks.

So I racked and racked my brain trying to figure out a way to do my simple solve-for-x routine. I really just want to try different integers for ‘n’ until I find an answer that’s “acceptable.” But that doesn’t work. At all. Or at least, I don’t know what mathematical operation I can do to move that round() function off the left side so I can try to have a formula that points to ‘x’.

What did I do finally? I gave up. I left the formula as it is above, and just run ‘x’ from 0 to “a lot” (a thousand bucks or a hundred bucks I think?). The answer I get is going to be completely accurate, but it wastes computing power. Well, too bad, your browser has to do a little bit of multiplication in a loop. My condolences. But! The result is, I’m pretty convinced my answers are to-the-penny accurate now. We’ll see when the big price change kicks in.
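The brute-force loop looks roughly like this in Python (the site itself runs in the browser, so this is a sketch of the idea, not the site's code). It works in whole cents to stay away from floating point. The 15% bonus and $2.25 fare come from the formula above; rounding half-up to the nearest cent is my assumption about what round_for_money does, and the function and parameter names are made up.

```python
def magic_amounts(fare_cents=225, bonus_percent=15, step_cents=1, limit_cents=10000):
    """Find purchase amounts whose value after the bonus is a whole number of rides.

    Returns a list of (paid_cents, rides) pairs, smallest purchase first.
    step_cents=5 restricts the search to amounts divisible by a nickel.
    """
    results = []
    for paid in range(step_cents, limit_cents + 1, step_cents):
        # Apply the bonus and round half-up to the nearest cent (assumed rule).
        total = (paid * (100 + bonus_percent) + 50) // 100
        if total % fare_cents == 0:
            results.append((paid, total // fare_cents))
    return results
```

Under those assumptions, `magic_amounts(step_cents=5)[0]` is `(1565, 8)`, meaning $15.65 buys exactly 8 rides, and the penny-granular search includes `(1174, 6)`, meaning $11.74 for 6 rides.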

Thanks again to David Dominguez for the initial switch to jQuery-powered MetroCard Math, and thanks to Beckley for the full re-skinning he pulled off.

More Metrocard Math…

So I’ve updated my Metrocard Math site.

First, my friend David Dominguez helped out to make it much, much prettier. He also added some jQuery magic, and changed up a significant amount of how the site is structured. I was trying a weird idea – where I would strip the markup down to its most basic elements, and style it from there using cleverly-constructed css selectors, but I don’t think it worked out. My friend Bryan tried to restyle it as well, and the rigidity of the markup basically stopped him in his tracks. So, anyways, now it looks prettier and is definitely more usable on my phone.

I also had tried to buy a metrocard for one of the Magic Number amounts the other day at a vending machine, and it was rejected due to “invalid amount.” Stupid. It had worked before. I tried the small number. I tried the big number. Nothing worked. On a hunch, I tried $11.75 instead of $11.74. Success. And of course, I will eventually have a metrocard with a penny on it. So apparently, the number has to be divisible by 5 cents? So I’ve added that to the site, and we’ll see when I next buy a metrocard if the new system actually works. I hope they don’t make it where it has to be divisible by $0.25, that would really sting.

I still want to do something where you can toggle between the current prices and the newly announced ones. But right now you can just type in the new numbers – here’s what they are according to the Queens Chronicle (which I used to consult for a million years ago!): $104 is the new 30-day, $2.50 is the new single ride, and $29 is the new 7-day. The one-day funpass is going to be eliminated and so will the 14-day unlimited. Oh, and I hadn’t seen this before – there’s now going to be a $1 surcharge every time you pick up a new metrocard (though that doesn’t start till some time in 2011). OUCH. That means when you leave your metrocard at home and have to buy another one it’s *really* going to sting. One more extra buck. Damn. I mean, you can still use the latest magic number ($15.65 I believe? Though I worry my rounding might not match the MTA’s…), but you definitely will not want to be throwing out your metrocards anymore.

Lightdesktop tweaks

The console font wasn’t fixed-width so using the console was driving me crazy. Fixed. I changed the filesystem to point to the new domain (big pain in the ass). Tweaked the installer and filesystem so the /boot directory is fully under the control of crestfs. New parallel version of crestfs. Fewer pauses, much good. Added make and gcc and lots of stuff so you can now compile things (still not self-hosting though). Put in an ACPI daemon so you can close your laptop lid and the system might go to sleep (doesn’t work perfectly yet). /etc/ is the next directory to get taken over by crestfs, but will be a bit of a challenge because some things like to write in there, and there are a couple of very odd symlinks that point to /proc or /tmp, and crestfs won’t let you make symlinks like that.

Ran into some huge disappointments with davfs2 though – write performance isn’t very good, it won’t let you connect if you don’t have an internet connection, and you apparently can’t make symlinks, and all of that really really sucks. I’m going to replace it with something that I figure I will eventually merge back into crestfs. I figured I was going to have to do this eventually, but it came sooner than I had hoped. I already have code for doing HTTP GET requests, and directory listings and so on – basically the ‘read’ side of the equation – so I don’t think it will be too horrible to get the write side going with PUT and POST (what I’ll be using for symlinks) and DELETE…

Braydix…ease of use?!?!!?

My Macbook pro went in the shop again, and instead of spending $100 to rent another one, I spent $300 to buy a cheap, crappy Toshiba laptop at BestBuy. It was not a fun experience, but the laptop worked as well as one could expect. Vista is actually as bad as they say, but mostly only in the networking – of which I do a lot, so I am a bit biased.

So once I got my craptop, what would be the first thing that I would do but try to install Braydix? Alas, ’twas not so easy…

First off, the CDROM detection code wasn’t being ‘patient’ enough to let the drive spin up. I worked around this by manually walking through the boot code and typing it line-by-line…all 30 or 40 lines of it. Not fun. It would then crash in some other spectacular manner, later on. But I did notice the terminal window coming up very quickly 🙂

So I knew I had some real work to do once I got my Macbook back (upon which my development environment for Braydix lives).

First, I made the CDROM detection more patient – well, instead of making it patient I made it insistent. It keeps looping around and around until it finds what it needs to boot from. Which may be forever, but if so, too bad. Poor computer. Turns out that Craptop would end up finding the CD around the third pass or so through the loop. OK, fine.
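The ‘insistent’ loop is nothing fancy. The real boot code isn’t Python, but the shape of it is roughly this sketch; the device paths you pass in, the one-second delay, and the function name are all placeholders.

```python
import os
import time


def wait_for_boot_device(candidates, exists=os.path.exists, delay=1.0, sleep=time.sleep):
    """Loop until one of the candidate device paths shows up.

    Returns (path, passes). There is deliberately no timeout: if nothing
    ever appears, this loops forever.
    """
    passes = 0
    while True:
        passes += 1
        for path in candidates:
            if exists(path):
                return path, passes
        sleep(delay)
```

The `exists` and `sleep` parameters are only there so the loop can be exercised without real hardware. On a slow drive like the craptop’s, the call simply returns on a later pass.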

Having burnt around…oh, 2 or 3 CD’s for the purposes of booting from, I was now thinking that it might be nice to boot from a handy-dandy USB memory stick I have hanging around, so I don’t keep wasting CD’s. Especially since I have the image around 32 MB, with some crunching and caution, I bet I could get it to fit on a 32MB USB memory stick. So that required some rejiggering, and I made my development environment better in the process – so everybody wins. Now we can boot off of USB, and Boot from FAT filesystems. (edit – just found some stuff to yank – 31MB FTW…edit 2 – oh, and I had to add more stuff, so now 34 FTL 🙁)

In the course of doing my boottime diagnostic, I couldn’t remember what the various boot functions I had created were for. I had to go read the config file to figure out. That sucked, so I also made a little menu (only shows up if you hold down Shift while the system is booting. This is diagnostic Wizard stuff, not even power-user stuff for the most part).

Two important things – this boots to a terminal window, not the browser. This is a placeholder for eventually making some kind of launcher-thing.

To finally get the craptop to actually work with this stuff for WiFi, I had to dig around to figure out the WiFi story. Apparently, its WiFi chip is a USB based one…whoops! So I had to add all kinds of terrible and crazy USB device detection, etc, etc. Not at all fun.

I rewrote a huge chunk of CREST-fs, the internet filesystem upon which the bulk of the rootdisk lives. It performs much better and is more aggressive about pre-fetching things that will help it later on. I enjoyed deleting lots and lots of complicated code, best feeling as a programmer, when you’ve out-eleganted yourself at something.

I created a ‘config’ directory scheme that persists across reboots (living on your hard disk, actually) for things like WiFi passwords. Who wants to keep typing those in?! I eventually imagine that I might store video display resolutions and other such ‘little, trivial, machine-specific’ things in there. Maybe the root URI of your home directory (*NOT* your password! Too dangerous!)

I think I’ve come up with a new name which I’m not going to share yet, but I’m still simmering on.

ISO: Infinix4.iso

FAT filesystem…blob thing: fatimage.fat.

Notes:

I don’t know an easy way to make a USB bootable flash disk. This is the method that works for me, on a mac. I attach my USB-key thingamadojie. I go into Disk Utility. I tell it to unmount (Not eject!) the partition. I go into terminal and say dd if=whatevermydiskimgis of=/dev/diskBLAHsBLAH (My Mac chose /dev/disk1s1, but Disk Utility should tell you for certain). For some freaky reason, even though the partition sizes do not match, this seems to kinda work. Don’t ask me. Your USB key must have an MBR partition scheme (pretty much standard), ‘normal’ MBR boot code (also kinda standard?), and EXACTLY ONE partition marked as ‘active’ (fiddle with fdisk -e to make this so, should show up with an asterisk). Apparently, according to what I’ve read, this is how USB key flash diskey things come from the store, but YMMV. This blows out your entire partition. This can fit into 35 MB or so, but it will make the math look funny – e.g. “256MB drive, 34MB in use, 1MB available”. That’s expected. There are other ways to do this with DOS exe’s and other such crap that I really don’t want to mess with, so suck it up, too bad!

divs vs. tables, part II – the compromise (maybe?)

<div class='tablesque'>
   <div class='rowesque'>
      <div class='cellish'>A</div>
      <div class='cellish'>B</div>
   </div>
   <div class='rowesque'>
      <div class='cellish'>C</div>
      <div class='cellish'>D</div>
   </div>
</div>

stylesheet:
.tablesque { display: table; }
.rowesque { display: table-row; }
.cellish { display: table-cell; }

There – it looks like a table, because you told it to look like a table in the CSS. But the markup doesn’t say it’s a table – it just says you have a hierarchy.

I sorta fell into this idea because I’m working on making a web application work for iphone or for a regular browser, and in the plain browser context I wanted something to be a table, but on the iphone, I wanted it to act more like spans and divs.

To give you an idea of what a moron I am, you should know my first idea was to have a big table, and on the iphone, do things like: display: block, display: inline, etc. But the iPhone (and even Safari on the desktop) had problems with letting me convince it to display tables as non-tables. So finally I switched it to divs, and made the regular browser side do display: table, display: table-row, display: table-cell. And that seems to work okay for now.

So, standards people, there, I’m standardy. My ‘layout-like-a-table’ CSS is all in the CSS. I think this CSS looks a hell of a lot prettier than the crazy ‘float, clear, width, etc’ routines. And it should stretch better based on its contents à la tables.

As a bonus, within the table DOM stuff I don’t have mysterious invisible ‘tbody’ tags that chuck themselves in my table. I lost 3 or 4 hours to that a while ago.