Practical experience with Mongo, and why I do not like it, in terms of Money and Time

For my job, I inherited a Mongo architecture. I resolved to learn it – and it still runs to this day, ticking along quite nicely.

These are my feelings about the platform, having actually used it in production – not on little toy projects. Our main MongoDB server is a 67 GB RAM AWS instance, with several hundred GB of EBS storage.

First, the good parts:

It’s super-duper easy to set up and administer. Pleasant to do so, in fact.

JavaScript in the console is a remarkably useful thing to have handy. I’ve used it for quick proofs-of-concept and tests and whatnot – really good to have.

It’s really nice to develop against. Not having to deal with schema changes, and being able to save arbitrary column-value associations makes life really easy.

And now, the bad (from unimportant to important):

Doing anything beyond a really trivial or simplistic query in the console is surprisingly annoying:

db.tablename.find({"name": {$gt: "fred"}})

Not a dealbreaker or anything, just annoying. name > "fred" would be nicer.

The default ‘mode’ of Mongo is surprisingly unsafe. I found drivers (could be the driver’s fault, might not be Mongo’s) that return _immediately_ after you attempt a write – without even making sure the data has touched the disk. This makes me uneasy. But there are probably use cases for this kind of speed at the expense of safety. But I don’t like it. And this is opinion, so I’m saying it. There _are_ modes that you can use (and I do use) that slow down the writes to make sure they’ve at least been accepted. But, in the conventional mode, if we have a crash before the writes have been accepted by Mongo, the data is gone. This has happened to us. Usually we have failsafes in place to ensure that the data eventually gets written, but it costs us time. We’re mortal; we’re going to use the defaults we get until they don’t work for us, then we’re going to change them.
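
For the record, the safer pattern in the shell looks roughly like this – a sketch from memory, with a made-up ‘events’ collection; in the drivers you’d turn on the equivalent ‘safe’/write-concern option instead:

// fire-and-forget: this returns immediately, whether or not the write stuck
db.events.insert({user: "fred", action: "login"})

// then explicitly ask the server whether that write was actually accepted
// (w: 1 = acknowledged by the server; j: true = wait for the journal, where supported)
db.runCommand({getLastError: 1, w: 1, j: true})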

Because there is no schema, a mongo “collection” (fuck it, I’m going to call it a table) takes up more storage than it should. Our large DB would be far smaller if we defined a schema and stored it in an RDBMS. This space seems to be taken up in RAM as well as disk. This costs us more money.
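
To make that concrete – and this is just an illustration, with invented collection and field names – every document carries its own field names in the BSON, so verbose keys get paid for on every single row:

// these ~70 bytes of key names get stored (and cached in RAM) once per document,
// not once per table the way a column definition would be
db.readings.insert({customer_account_number: 12345,
                    temperature_fahrenheit: 72,
                    recorded_at_timestamp: new Date()})

// average object size for the collection – compare against one-letter keys
db.readings.stats().avgObjSize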

MongoDB starts to perform horribly when it runs out of memory. This is by design. But it’s annoying and it costs us more money and time because we have to either archive out old data (which we do in some cases), or use a larger instance than we ought to have to (which we also do). And even if you delete entries, or even truncate a table, the amount of space used on disk remains the same (see below). More money.

MongoDB will fragment the storage it’s given if your data gets updated and changes size. This caused us to end up storing around 20-30 GB of data in a 60-some-odd GB instance. And then we started to exhaust RAM, and performance plummeted. So we needed to defrag it. More care and feeding (time) that I didn’t want to spend.

So to ‘fix’ the fragmented storage issue, we had to ‘repair’ the DB. This knocked our instance offline for hours. Many hours. Time. The standard fix for this is to spin up another instance (money), make it part of a replication set, repair one, let it catch up, then repair the other. Time.
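
If you’re curious how bad the bloat is before you take a node offline, the shell will at least show you the gap between live data and what’s been allocated on disk – a rough sketch using db.stats():

// live data vs. what Mongo has allocated – the difference is your fragmentation/bloat
var s = db.stats()
print("dataSize:    " + (s.dataSize / 1024 / 1024 / 1024) + " GB")
print("storageSize: " + (s.storageSize / 1024 / 1024 / 1024) + " GB")
print("fileSize:    " + (s.fileSize / 1024 / 1024 / 1024) + " GB")

// and the blunt instrument that reclaims the difference – the same
// hours-long, node-offline 'repair' described above
db.repairDatabase()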

The final issue I had with Mongo was when I attempted to shard our setup. We were using a 67GB (quad-x-large memory) instance for our Mongo setup. I got advice from some savvy Mongo users to ‘just shard it.’ That made it sound so trivial. So I did. I figured we could go for 16GB shards and add a new one when we got heavy, and yank one out if we got light. I liked the idea of being able to save more money, and flexibly respond to requirements. So I set up a set of four shards – three “shardmasters” which coordinated metadata, and one ‘dumb shard’ which just stored data and synced up to the metadata servers. I blew it the first time trying to get the config right. Whoops. I did it again, and this time, I did it right. I picked a shard key – not an ideal one, but one that would, over time, roughly evenly distribute data across all of our shards, while maintaining some level of locality for the most likely operations. I ran a test – it’s really nice to do in the JS console, I must say. I ran a for-loop to insert 10 million objects of garbage data, with a modulo-10 of ‘i’ to simulate the distribution of the shard keys. I watched, in a separate console, as it threw data onto one shard, then started migrating data from one shard to the others. It worked enormously well. So I yanked my test data, then we put production data on the thing.
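
For the curious, the test loop was nothing fancy – roughly the following, reconstructed from memory with made-up collection and field names:

// ten million junk documents, with the would-be shard key cycling
// through ten values via i % 10 to mimic the real key's distribution
for (var i = 0; i < 10000000; i++) {
    db.shardtest.insert({
        group_id: i % 10,                 // stand-in for the real shard key
        payload: "garbage data " + i
    })
}

// meanwhile, in a separate console, watch the chunks get balanced around
db.printShardingStatus()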

It worked fine for a few days. The Mongo filled up a shard and blew up. It was a pretty huge, horrible catastrophe. It was hard for me to troubleshoot what, exactly, happened – but it looked like no data went onto any shard other than the first.

Now, I *was* using an ObjectId() as the shard key. Not the object’s _id, but the object_id of a related table. One that was nice and chunky – it didn’t change very much, except every few hundred thousand records or so. It’s possible that I shouldn’t have used a shard key that is an increasing ObjectId. It’s possible that switching from an integer going from 0-9 over to an ObjectId that increments somehow messed me up. I tried to figure out what happened, after the fact, and got similarly nowhere. I also checked the documentation to see if I had done something wrong. While I wasn’t able to find anything definitive, there was mention that using an ObjectId as your shard key could throw all traffic onto just one shard. For our purposes, that would’ve been fine *if* the other ‘chunks’ of data got migrated off that shard, onto somewhere else. They didn’t. This whole ordeal cost us loads and loads of time. Again, I’m perfectly willing to take the blame for having done something wrong – but if so, then there’s something missing in the documentation.

So that was a complete nightmare. But it’s still not a technology I would discount – I can imagine specific use-cases where it might fit nicely. And it sure is pleasant to work with, as an admin and as a developer. But I’d far rather use something like DynamoDB (which seems very interesting), or something like Cassandra (which I’ve been following closely, but I have not yet put into production). In the meantime, I still use a lot of MySQL. And it definitely shows its age, and isn’t always pleasant, but generally does not surprise me.

IPv6

So, two things about IPv6 – first, a little bit about how to do it if you’re all Mac’ed up like me, and then, a little rant.

The easiest way to get IPv6 working is to grab a copy of Miredo for OS X. This lets your Mac, pretty much automagically, get a connection to the IPv6 Internet via an IPv4 tunnel anywhere that you have IPv4 connectivity. It’s nearly painless, and at that point, you can start to at least do some basic playing around with IPv6 stuff. I’ve since enabled IPv6 on my home network, but I still keep Miredo installed (though deactivated) in case I want to use it at a coffee shop or on some other network.

The good way to do it is to go to tunnelbroker.net and sign up (it’s free!). Then configure your Airport Extreme to do tunneling by following these directions. Voila. Now you have IPv6 connectivity to the intarwebs…or the ip6ernet. Whatever.

The best way to do it – and I haven’t done it this way – is to actually get IPv6 connectivity from your ISP – no tunneling or anything, just native connectivity. I can’t do this because Time Warner doesn’t give me that, or maybe my Airport isn’t so good at doing that. I don’t really know.

So far, the one thing I can see here is that you could begin to use this IPv6 connectivity to work around the general destruction of the internet’s any-to-any principle – the idea that any IP address on the internet should be able to contact any other. This is basically no longer the case, as many people use RFC1918 addresses behind NAT to conserve IP addresses (and there are also some positive security implications). So my computer at 10.0.1.2 can’t necessarily talk directly to your computer at 192.168.1.2 (or, even worse, your computer at 10.0.1.2 but behind your NAT, and not mine). The way we work around this type of thing is all kinds of magical firewall port-mapping and other such tricks. It’s a pain in the butt. Services like AIM’s file sending, or various screensharing utilities, all now require some kind of centralized server that everyone can connect to, because just about every network-connected computer tends to be behind a NAT. That centralization is unfortunate, and a drain on services that really should just be about connecting anyone to anyone.

But if you have IPv6 set up in the ‘good’ way listed above (or the ‘best’ way), you actually have a new option. You can un-check “block incoming IPv6 connections” on your Airport, and now have access to anything in your network that speaks IPv6 from the outside world (again, so long as the outside world is IPv6). Of course, there are big security implications here, but that could actually be a way of making IPv6 somewhat (remotely) useful. Things that would like this type of connectivity might be: BitTorrent-esque things…peer-to-peer video applications…some kind of home-hosting things…I dunno. That type of stuff. But, in short, while at Starbucks, I could fire up my Miredo-for-OS-X client and connect to various things in my home. That could be useful for some people.

My experience with this new setup is rather underwhelming. I can go to ipv6.google.com. I guess on World IPv6 day I’ll be able to…somehow…enjoy some festivities or something. I don’t really have any home servers nowadays.

<Begin Rant>

Who the fuck came up with this stupid-ass migration plan? It has to be one of the dumbest things I have ever seen. IPv6 the protocol is OK (at best)…it really feels pretty close to IPv4, except with a bigger address space. OK, I guess. DJB (who is brilliant, but I think may be batshit insane) sums up the problem really well.

In short, there’s negligible benefit to going to IPv6. You can’t really get anywhere you couldn’t get to anyways. If IPv6 had been designed to interoperate with IPv4, we would be far closer to being in a happy IPv6 world – think about how many machines are dual-stacked right now. Those machines would instead be single-stacked, and some early adopters, or price-conscious people (think: Web startup types who like to skip vowels in their domain names), might be able to start offering IPv6-only services, and would be able to start hitting users right now. But, no. The migration scheme seems to be:

  1. Migrate everyone and everything to IPv6 now

And you’re done! Isn’t that easy? The standard has been out for a bajillion years. The IPv4 shortage has been a problem for a bajillion years. And we’re still here. Not because the IPv6 protocol is flawed, but because there’s no migration scheme at all. There’s no backwards compatibility. This whole infrastructure has to layer over the entire internet. Who the hell thought this was a good idea? I mean, sure, it’s “simpler”, protocol-wise, to do that…but with a few more years of protocol engineering and a true backwards-compatible solution, we would’ve had people switching ages ago. Go look at how many transition mechanisms are in place for IPv4-to-IPv6. It’s stupid. That alone indicates the level of FAIL that is likely here.

The other problem I have with IPv6 has to do with routing tables. And protocol stacks. Right now, to do any non-trivial amount of TCP/IP networking (let’s imagine HTTP for this example), you need:

  • DNS
  • some kind of routing protocol has to be working right
  • ARP to figure out how to get to your local endpoint
  • DHCP to figure out what your IP address is going to be

Network troubleshooting ends up being an interesting and non-trivial problem of figuring out who can ping who (whom? Grammar fail. Sorry), what routing tables look like on various intermediate devices, what IP address you get from DNS, is your DNS server working, etc, etc. It’s a muddle, but it’s a muddle that’s been treating us well on this whacky internet of ours.

However, in the IPv6 world, we now have – the entire protocol stack for IPv4, PLUS a protocol stack for IPv6, and a crazy autotunneling doodad with a weird anycast IPv4 address (oh, that’ll be fun). And a routing table that is exploding out of control. I mean, my dinky little home network (theoretically) gets a /64 network. If every Time Warner customer gets a /64 – and there’s not some means of aggregating routes together – the routing table completely goes insane. Now I’d assume that TW would aggregate its customers into a /48 or something (a /48 covers 2^16 = 65,536 /64s, so that’s one announced route per sixty-five-thousand-odd customers) – god, I hope so! But still, we’re talking about a world where there are networks all over the place.

There’s a big question as to whether or not people ought to get provider-independent network addresses. I think I know the answer to this: No, they should not. It’s suicide. I think the real solution for this is at the DNS level – you should get addresses that correspond to your rough physical place on the internet to keep the routing tables somewhat simple, and if you want to move endpoints around, you change DNS entries. Get away from thinking of IP’s as static. If DNS were baked deeper into the protocol stack, this could work extremely well. Want to have a webserver at www.whatever.com? If you have some kind of authorization, your webserver would come up and use some kind of key exchange to somehow tell DNS that it is www.whatever.com. If you move, you just move your webserver. Your keys still work. If you set up a webserver in your house – same thing. Anyways, that’s just hand-waving. There would still have to be some way of bootstrapping that (like, what IP address do I contact the webserver at? Maybe you find that out by talking to your local gateway? Dunno).

<End Rant>

I guess that 1) wasn’t a little rant and 2) was a little rambly. So sue me.

Time Warner is a bunch of poopfaces, especially “DP Loss Prevention”

So this is one of those blog posts where I rant and whine and complain about how some service provider done me wrong. If you dislike those posts, feel free to wait for the next one. This one will come off as extremely whiny. You have been warned.

Fucking Time Warner. You assholes. Especially Dawn, from DP Loss Prevention. I hate you. You suck.

So, I work for me. I need internet to do what I do. I just moved to Astoria from my old place in Jamaica. So I cancelled my old cable internet service, turned in the modem thingee. Like a good little boy. They asked for no money from me, and said I could call them and they would tell me if I owed them anything. Fine. I of course immediately disregarded that advice – if I owed them something they’d tell me and I’d pay it. I’m not chasing them down to ask them how much I owe.

So after a week and a half of no internet, Time Warner finally shows up to install it on a Friday. Guy is nice, jams it right out in no time. Even knew his way around a Mac, had me go into the Advanced tab thing to do “renew DHCP lease” – I didn’t know that was there! Up and running. Fast, low-latency, happy times.

So, I’m literally sitting on the toilet end of day Friday, and my phone rings. I consider it rude to talk whilst I am…in such a situation, so I decline to answer. Time is now 5:31pm. I finish my business. I go listen to my voicemail. “<<remnants of talk with co-worker>>Hi <<mispronunciation of my first name>>…Uhm….<<horrific mispronunciation of my last name>>, this is Dawn from Time Warner. Please give us a call back regarding your account. 718-888-4393”. Fine. I’m an upstanding citizen, maybe I owed them something from where I was living before? Better give a call back. I do. Generic-sounding voicemail for “DP Loss Prevention” in a computerized voice. Oh well, if it’s important they’ll call me back. I ain’t leavin’ no message for some weird generic voicemail box at a random 718 number.

Saturday morning rolls around and I flip on the iPad. No internet. Hrm. Maybe a remnant from the install? Power-cycle my modem. No internet. Okay, time to call tech support. I call. The automated system picks up my phone number and says “it seems your account has been disconnected. I’ll forward you to a representative.” The rep is nice, and says I need to talk to the 718 number that was left on my phone before. Crap. I guess I better. I call back and leave a message with my phone number and explaining that if I indeed owe anything, I’d be happy to pay it.

Nothing happens. I call the main tech support number – they again explain that I need to talk to that one department. I explain that I’d love to – and pay them anything they want – but no one’s there. I discover they only work regular business days, and I’m screwed till Monday. Awesome.

Monday rolls in and out. Nothing changes. I call them again, I beg to pay them. I call DP Loss prevention and leave another message. Note that I am restraining myself from freaking the fuck out on the voicemail because I want them to fix it.

Tuesday. Nothing. I call, another voicemail left to the generic voicemail box. I call tech support, a ‘message passed to the supervisor’. I ask to speak to a supervisor, I’m told I will get a call back. At this point, I am freaking out. Did I leave my phone number correctly? Is my phone broken? Do I keep missing the call? What’s happening? I go onto a forum and ask for help, they say that there will be an escalation if nothing happens. Great. In the meanwhile, at NO POINT WHATSOEVER has ANYONE told me what the hell is going on. I guessed I must have owed them money – and it must have been quite a bit.

At 4:01pm Tuesday a familiar 718 number shows up on my phone, and behind that number is a familiar voice. My old friend, Dawn. “I understand you want to make a payment?” she asks. You shithead. I answered, “well, assuming that’s why you turned me off, if I do owe you money, yes.” I’m prepared to shell out $300 on the spot. I need my internet. Must have it.

The bombshell hits. “You owe……….$32.17.” I couldn’t help myself, I laughed. Relayed my credit card info, and my internet came up after a quick power-cycle of the modem. Unbelievable. Time Warner put me through all of this hell – well, to me it was hell – for thirty bucks. You fucking dickweeds.

So of course during this whole debacle I’ve been looking at alternatives, and that search made the reason for Time Warner’s behavior completely obvious – there are none. I can pay roughly the same amount to have 1/10 the bandwidth within 2-3 weeks from Verizon? No good. I was surviving, only barely, using 3G service, and that was definitely not going to be a long-term way for me to get by. RCN didn’t service my area. There was nothing I could do.

And my feeling of powerlessness might be some of the reason I wigged out so badly – I thought long and hard about telling Time Warner that, no, I’m not going to pay you anything, you pissed me off, and I want to cancel, I’m not paying anything at all. But I couldn’t – I’m stuck, needed the connection. I had all kinds of wonderful imaginary conversations, escalating until I could talk to someone’s supervisor, getting Dawn fired and permanently making changes to Time Warner’s policies…getting them to comp me all kinds of things, making a big stink. I can’t though. There’s one place I can get a usable connection from, and they are it. So I made a decision, a very very difficult one, to accept the treatment and just get the connection working.

So, in summary, this is what happened. Time Warner called me end of day Friday, left a botched and meaningless message, turned off my service late Friday night (probably early Saturday Morning), never told me why, wasn’t even around to turn it on for the next two days, and kept me down for 4 days total….over $30.

I will be happy, happy, happy to get rid of them, as soon as anything even remotely good becomes available. I can easily guess what Time Warner’s thinking was when they created this group or department or whatever – “Hey, we lost $x in people moving – I want that to be <less than x>! Let’s empower this raging horrible monster woman, Dawn, and put her in a department of Pure Evil to torture our own clients!” And $x goes to less-than-x. I hate them so much I’d be happy to pay twice as much for half the service. If they had any competition at all, they would never dare. Is that worth $30, Time Warner? I think not.

OK. End rant. Feel much better now. Sorry about that.

Rails Documentation

Is the worst fucking thing on the planet. I’ve actually googled for stuff, clicked on it, and landed on redirecting cybersquatter pages – it’s that goddamned bad. Maybe I’m spoiled. The bulk of the professional development I’ve done has been with PHP, though I was pretty heavy into Perl, Tcl and other such languages in their time. Compared to any of them, Rails documentation is, hands down, the absolute worst.

Half the time I feel like they’re being too goddamned clever for their own good. But the ‘sensible defaults’ that they espouse aren’t documented anywhere, so how the hell am I supposed to know what they are? What seems sensible to me might not be sensible to you. I’ve found myself drilling down into source code more times than I’d like to count to try and figure out what’s going on. That is total and complete fail. It’s lucky that it’s so powerful and cool regardless, or I would’ve left it in the dust a million years ago.

Maybe I have to be more…loquacious in PHP. That’s fine. At least I know what to do and how to do it. 70-80% of the time I’m working in Rails, I have no friggin clue how to tell it how I want to do something. Then when I find out, it’s always something like – type two magic words into the right file, then Rails reads your mind. Awesome. I just hate that sickening feeling during that not-20-to-30 percent of the time. I feel helpless.

Then when you do find documentation, it’s all stories. “So here’s what active record aims to do, here’s different ways you can make it do things, blah blah blah.” I like my programming docs terse. I look it up, it tells me what that does. But the documentation, especially, just seems all jumbled together and awful. Or the other thing I’ll find is the opposite granularity – “Class Foo::Helper::Doodad::fwipple::dingus has methods ‘get’,’put’,’set’,’be’,’execute’. The source code to method ‘execute’ is: …….” That doesn’t help either. That’s why it’s called DOCUMENTATION. Not fucking SOURCE CODE. I feel like it’s some kind of ‘hipster’ framework – if they actually explained it to you, and regular unhip people “got it”, then the hip people would all switch to using Scala.

And, embarrassingly enough, I only just ‘got’ the yield command in Ruby. That’s just sad, man. I don’t really see the difference between a yield and calling an anonymous function, but I guess I’m just not that bright.
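
And honestly, as far as I can tell there isn’t much of a difference – yield just calls the block (an anonymous function, basically) that got passed to your method. Roughly, in JavaScript terms, it’s this sketch:

// Ruby's yield is, near enough, "call the function/block I was handed"
function eachTwice(callback) {
    callback(1)     // in Ruby this line would be spelled: yield 1
    callback(2)
}

eachTwice(function (n) { console.log(n * 10) })   // prints 10, then 20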

I assume it’s one of those things where as soon as you buy into it 100%, completely, and spend time just soaking in it, then you’ll fully understand. But I don’t like having to commit to that level of buy-in. I’ll continue to fiddle with it, and even choose it as a framework in whichever contexts it seems right for, but I’ll always look slightly askance at it – perhaps until I’ve been so steeped in it that I can’t look at it objectively anymore. But until then, fix your fucking docs Rails, it’s horrible.

Horrifically bad technology

A few years ago, I was kinda into XML. Sure, it’s bloated, but the idea that you could arbitrarily represent any kind of data in it seemed cool to me. And then – if you were to try and compose two types of data that no one had ever thought of before – you could support even that, with namespaces. You could even have two conflicting elements <foo> – by specifying which one is in which namespace – <a:foo> vs. <b:foo>. Neat. Now you can really have some nuance and power in your document. Mind you, I’ve never seen this feature used – and yet it bloats the XML specifications and implementations horribly – but it seems like it could be important. Right? Never mind the fact that just about any time you grab an XML document you probably already know exactly what it’s going to look like. Shh. You’re not thinking big enough. Here’s even an article I wrote in 2005, sad that Web Pundits were going to start moving away from XML. And here’s another one from early 2007, again complaining about the inevitable HTML5. I was totally and completely wrong. I mean, I’d like to say something like “while I still believe that blah, I have to admit that I may have been mistaken…” No. Totally. Dead. Fucking. Wrong. Maybe XML’s heart was in the right place (there! I did it! some sort of backpedaling statement!), but the devil’s in the details, and XML’s details have more devils than you can shake a stick at. Several sticks.

You see my friends (you can tell I watched the debate last night, right?), I just finished working like, maybe 10 hours straight on writing a SAML receiver in PHP for my former employer. That wouldn’t be so bad, except – I’d already written one. It worked fine. For SAML 1.0. Now I had to make it read SAML 1.1. Easy, right? Read the spec on SAML 1.1, implement the changes, all done. No. SAML assertions are XML documents. XML documents that need some kind of security thingee so that people can’t forge them or tweak them. So you need XML Digital Signatures. But XML is so crazy and fluid – you could have two documents that logically mean the same thing, but their bytes don’t match! How do you compare them? Easy, my friends! You canonicalize them using the XML Canonicalization spec(s), then you sign them. SAML 1.1 “improves” this process using a “better” method of canonicalization. If you read lots of sarcasm in my angry sarcasm-quotes, you read correctly. Back to canonicalization in a moment.

Now if we’re going to sign a document that’s XML, and since everything that has ever been of any merit at all is XML and must be XML, then our signature should be XML too. But if we’re injecting bits of XML into our document to sign it, doesn’t that change the document that we’re signing? We need some way to indicate which subset of the document corresponds to the signature, and which part corresponds to what-you’re-signing. I know, I know! How about a nice simple regex to do that! Or just a straight subset of the document – cut from here…..to here? Hahahahaha…just kidding! That’s not XML! No, we have to use XPath, a way to query for arbitrary “node-sets”. And it’s, of course, XML.

So this is the ridiculous technology stack I have to go through in order to implement this relatively simple request – “let us accept SAML assertions to do single sign-on stuff.” So of course PHP doesn’t support any of this crap – because this crap is crap. Only IBM and Sun and other Big Company Weenies implement this garbage. PHP’s a working-man’s language, it supports things that are useful or interesting. There’s some Sun-sponsored SAML 2.0 stuff in the works in PHP, but we need 1.1. PHP’s XML support has historically been spotty – and I don’t blame it, the XML-approved API’s are the worst API’s ever. Ever. Well, I think I had looked once at a PHP library for DNS that may have been worse. But still, very bad. So I had to cook a lot of this stuff up myself. It sucked. And the specs are, quite frankly, just wrong. Or so grossly unclear that they could never be right. And I’m no moron – I’m a big freakin’ super genius type, and I can’t implement whatever the hell they’re talking about. So there’s no chance for lesser programmers. And because people are abandoning it in droves, there are tons of half-implemented XML packages, and digital signature packages, and XML canonicalization packages sitting out there, in various states of disrepair and malfunction. All in different languages. I had to learn bits of Python, and I was on track to start trying to learn Java before I finally got myself out of some serious holes.

Here’s some fun notes: Here’s the default XPath (make sure to capitalize that P!) that should extract a signature:

<XPath xmlns:dsig="&dsig;"> count(ancestor-or-self::dsig:Signature | here()/ancestor::dsig:Signature[1]) > count(ancestor-or-self::dsig:Signature)</XPath>

Oh, whoops! Except that doesn’t work. That’s just what’s in the spec. No reason it should work. Let’s expand the dsig entity –

<XPath xmlns:dsig="http://www.w3.org/2000/09/xmldsig#"> count(ancestor-or-self::dsig:Signature | here()/ancestor::dsig:Signature[1]) > count(ancestor-or-self::dsig:Signature)</XPath>

Uhm, nope. That “here()” function doesn’t actually exist, you see. So I gotta make my own. Fast-forward two hours or so – hell, probably more – and many, many iterations, to get:

<XPath xmlns:ds="http://www.w3.org/2000/09/xmldsig#"> (//. | //@* | //namespace::*)[not(ancestor-or-self::ds:Signature)] </XPath>

Now, shit, that *was* pretty obvious – I don’t know how I missed it. Say, though – maybe it’s just me, but maybe we’re using XPath in a way that wasn’t intended? You can tell by the fact that we have to grab all attributes, namespaces and tags at the start, unioning them together, then…doing I don’t really know what to them to ensure…something about their ancestry. Horrible. Really, really horrible.

XML Canonicalization was the bane of my existence when I made the SAML 1.0 receiver, and it returned with a vengeance this time. The concern is that some XML processors may shove nodes around and do stuff to your document that doesn’t change its meaning, but changes its bytestream. So we want to be able to transform the document in such a way as to make it always look the same, no matter how mangled it gets. XML Canonicalization actually fails at this, in that you can compare two logically identical documents: <a:foo xmlns:a="http://www.foo.com"/> vs. <b:foo xmlns:b="http://www.foo.com"/> – they don’t compare identical, but should. Even after canonicalization. But! Heaven forbid we say “screw it, let’s just say don’t muck with the data, and call it a day!” No no, that’s not the XML way! Instead you have to do all kinds of stuff. Turn empty tags into tag pairs, reorder attributes in each node, expand entities, strip some stuff, etc. And with “Exclusive XML Canonicalization” – the new-and-improved XML Canonicalization method used in SAML 1.1 – it gets even more confusing when you talk about your subset of the document and the namespace nodes that go with it. And then the spec’s wrong. And it turns out your test SAML assertion is canonicalizing using the method you already built 6 months ago, but is just calling it something else.

Sometimes the comedy of errors around all of this stuff makes me think that someone or something deliberately torpedoed it all. Perhaps Microsoft was concerned about some kind of interoperability utopia coming about, and they sent their agents to agitate for namespaces and xpath and xml signatures and enveloping and so on. Who knows.

If you ever find yourself in this unenviable position, first off, get xmlstarlet. If you don’t, you’ll never have anything to compare your own work to. I only got it late on in the process, and most of my real progress was after I got it. It requires libxml2, and libxslt. They’re handy to have around, though you may already have them. Once you’ve got that, read the specs very fuzzily. They’re not quite right, and Real Life trumps specs anytime. The end result is that it was not fun, at all. Very fulfilling in the end, when I finally see the message that the assertion’s digest and signature are ok, but not at all fun. And not code you want to show your mom. I don’t imagine myself working with this awful crap for quite a while again – or so I hope.

It’s funny (and it’s 2am, and I’ve drunk some Pepsi MAX, so I’m a little wired, so please indulge me) that you can see that there are any number of New Hip Cool technologies that start getting pushed really hard by companies, and end up being useful for some things, but not the panacea that they’re supposed to be. And you know how you can tell which technologies will end up being snake-oil? Look for the ones that claim they’ll end up powering a refrigerator that can automatically order milk when you’re running low. They’ve been saying that shit since “HTTP Push technology” was the exciting hip technology that was going to change the world. Let’s see, I’m sure I’m missing some, but the ‘hip technology that isn’t actually good’ list that I can remember would be…remote procedure calls…object-oriented programming…then remote method invokes…client-server…Web 0.9 (everyone needs a single, static web page! Hosted on http://www.whateverhost.com/~companyname)…Push Technology…Web 1.0 is around there, oh I know B2C….B2B…Java…XML…Web 2.0…

Y’know what it is now? Virtualization. It’s got its uses, sure. But having one big box and virtualizing a whole bunch of little boxes in it means you still have to manage a whole bunch of little boxes – they just live in a big one. Actual consolidation is better – moving a whole bunch of related functions onto one big box. The idea that you can move around the images is definitely neat, and over time, we always reduce our attachment to the bare metal of our computers – virtual memory, virtual volumes (logical volumes in Windows and Linux), why not virtualize the machine too? I just don’t see it as a cure for all ailments, and it does increase single points of failure (unless you do it right, but most don’t). Okay, now I’m getting legitimately tired, I’m going to bed.

Dive into Mark. Ruby on Rails.

I enjoy reading Dive into Mark. It’s a good blog.

Sometimes I agree with him. Sometimes I agree with him, but disagree with how he says something. And sometimes, not only do I disagree with something he’s said, but I disagree with how he’s said it.

This is one of the latter times.

The level of ad hominem attacks in the article is off the scale.

Let me distill it down for you. Ruby focuses on ease of development. It does not focus on performance. They’re talking about a site there that has gotten itself to 11,000 requests per second. That’s a lot. That’s a fucking lot. And if you have that, you’re going to be doing some tuning and tweaking and who knows what. I know this from experience. But some dipshit started whining that it was Rails’ fault. Well, maybe it is, maybe it isn’t. I don’t think there’s any kind of web development framework that exists today that you can scale up to that level and not notice some kind of performance degradation. Hell, even if it’s in your own code. I have scaled applications up factors of a hundred times on some of the highest-performing web application servers in existence, and hell yes it strained my database, and hell yes I had to optimize stuff all over the place.

These web-dev guys wouldn’t even have a site were it not for Rails – they’d still be pounding on whatever other language they were working in. Then they certainly wouldn’t have these scaling problems, because they wouldn’t fucking exist! The developer doing the whining isn’t actually whining that hard – he’s talking about what is going on with him, and DHH, if he were smarter, would’ve shut the fuck up. But he didn’t. I mean, seriously, you’re the largest fucking Rails site on the web, and you have scaling problems? OF FUCKING COURSE!!! You’re the largest $BLAH site on the web, you do anything performey, and you will probably be dealing with performance issues. For any value of $BLAH.

It’s all so stupid. Webdude guy answered questions he was asked, DHH stupidly did some kind of counter-spin, and Mark, even more stupidly, is doing counter-counter spin because he’s a Python weenie.

You’re all morons. Die.

Thank you, and have a nice day.

Shitness of Windows

Every time I try to do anything interesting or nontrivial with Windows, I get let down. I feel like it has to be due to my personal ignorance of the environment. But my problem is, the more I learn about it, the less I feel like it’s ignorance and the more I feel like it’s actual, practical experience that tells me so.

For example, I have a busted Windows box with a dying HD. I install ’Doze on the “D” drive – the second IDE disk. I use this copy of Windows to try to copy off the data from the C drive. It takes a long time – the disk is trying over and over and clicking and warming up and all kinds of terribleness – but eventually I get some data off it. So great. I try to do some actual work on this new install of Windows and I feel like it keeps trying to look at the C drive, so I figure I’ll unmount it. Can’t. It’s a “boot” drive, even though the D drive is my ‘system’ drive. Well, fine, so I pull my dying C drive (I figure I got what I could off of it anyway). System won’t boot. Oh well, I guess I need a bootsector rewrite or something. Fine. Windows CD, recovery console, fix it…and it can’t find my Windows install. I might as well not have it installed. Never mind the fact that every single file that Windows should rightfully need is right there. But the Recovery Console can’t find anything. So I have to reinstall Windows. And if I try to reinstall it right back to where it was before, it threatens to toss my files (including my recovered server volume). So I have to put it somewhere else.

Mind you, on a Linux box or a Mac, I could do this in 2 seconds. In Linux, you re-lilo (showing my age there) or re-grub the disk, and that gets you a new bootable volume (BIOS permitting, HD sizes, etc, etc). On the Mac…I don’t even think you have to do that, because the firmware is a bit smarter about locating disks and booting off of them. You can either use the Startup Disk control panel from a CD, or even hold Option during boot and it will let you pick which volume to boot from. Easy.

So now as I’m running through the Windows install I thought I would document the reboots. My policy is I install everything on the ‘express’ list.

  • When it switches from textmode to graphics mode. (OK, this one barely counts, but it is, strictly speaking, a reboot. I saw BIOS. It rebooted.)
  • When graphics mode completes, system installed.
  • New version of Windows Update requires reboot
  • Service Pack
  • 52 updates
  • IE 6
  • 9 more updates post IE 6
  • .NET 2.0 plus video driver
  • Whoops! two security updates for .NET 2.0 – edit – this did NOT require a reboot!

That’s fucking shit. What should take minutes takes hours.

Mac OS X Server

I am a huge fan of the Mac. I have been using them since the Mac Plus running – oh, I dunno, it was before System 6 and MultiFinder and all that. I’ve dabbled in PC’s, and am pretty good with them, but I love Macs. And I like Unix machines a lot too. I learned Linux in the days before the kernel hit 1.0 – it was in the 0.9’s or something, I don’t remember. I ran Slackware in those days. Ah, the good ole days.

So I try to consider myself platform-agnostic. I can tell you some things I really like about Windows boxes – chief among them their general snappiness. I was running a Win2k box for a while (to help force me to test one of our applications for bugginess and behavior under the Dreaded Internet Explorer). It made me really envious. And, so long as I didn’t mess around with it too much, it performed well – especially so for a box with such low specs (as it was, I think I blogged about it before).

As such, it pains me terribly to say I fucking despise Mac OS X Server. I’m sorry, but Apple has completely blown it with this product. I don’t doubt that they are fine if you just do file and print, but this isn’t a Windows server, it’s a Mac server – it’s got Unix stuff in it – why can’t I make it do a whole bunch of things? And the answer is, because it is shit.

The number of individual problems I’ve had on OS X Server is too numerous to count. The stupid management applications crashing on me, or their effects not ‘kicking in’, or the fact that you can’t migrate NetInfo accounts to LDAP accounts, any number of things. The GUI ends up being obtuse and incomprehensible, and the command line is even more painful than that. I’ve always theorized that when you try and put a nice shiny GUI on top of an ugly (but efficient, and flexible) command line, the end result is always a terrible mishmash. I was hoping to be proved wrong with OS X server. And I have not been. The file system is shit. The Mail server is garbage. The web server – oh! the web server! – I have never seen Apache be so terribly crippled. I had to crawl around in config files and XML files for hours to repair our server, once. Awful. And we’ve taken explicit, careful pains to never mess with the command line or any binaries or anything – after all, it’s an OS X server, and we’re trying to do things the OS X way. What a disappointment.

Today, for example, I’m trying to set up a co-worker’s account so he can do SSH authentication to the server to run some simple SSH commands (having to do with Subversion, a version control system I’d like to switch us over to from CVS). I go into the management app, I go to my coworker’s account, I see he has no ‘home’ set. That’s fine, he is not an SSH man, himself. So I try and set him one. Crash. I try and read mine so I can compare. Crash. I try and look at it again, and it’s not one box for ‘what is your home’, it’s three boxes, and I can’t figure out what is what. And I’m not stupid. And any time I try and do anything to it, crash. What a fucking mess.

Now, don’t think this means I have any liking for Windows servers. Because they’re just as bad – though possibly a bit less so, since they don’t have to do the “Shiny GUI to shitty command line translation” that OS X has to. To enable RIP routing on a FreeBSD box? (Mind you, I don’t know FreeBSD that well.) Set it to ‘yes’ in the conf file, and then launch it. Boom. New routes in the routing table. Try to enable RIP on a Win2k3 server? You have to enable Routing and Remote Access (Telephony! What the fuck!), then add RIP for an interface, then all kinds of stuff – then all my route metrics are all freaky and inexplicably huge, until I find out that Windows is randomly mangling my route metrics based on interface speed (NB – Win2k did not do that). What kind of lame-ass bullshit IS this? IIS also enjoys baffling and frustrating me and anyone else who is cursed with it.

Unix boxes, however, are mean. They just aren’t nice or friendly at all. Totally unapproachable. Compare an airplane cockpit with the driver’s seat in a car. The car you might be able to work out yourself, by playing with it. The airplane, you will not. There are 50,000 gajillion little controls for things. The car emphasizes just a few, and will let you get around. So the end result is it takes FOREVER to figure out what you’re doing on a Unix box, until you start getting the Zen of how it works and what its design is. You can do a hell of a lot on a Windows server or a Mac server without knowing what the hell you’re doing. And that has its advantages. And its disadvantages, when you mess with something you don’t understand and unleash unholy hell upon yourself. And Unix boxes will not only let you shoot yourself in the foot, they will load the gun, point it right at your foot, take the safety off, and not say a word about it. Like I said, mean.

I do know this – any time I get to spec out or make any new server or computer for anyone that I have control over, it won’t be a Windows or Mac server. Maybe if it was something for simple file-and-print, and some email, I might. But not for anything nontrivial.

I should mention – it could be an issue with my own personal comfort level with these machines. I mean, I know Unix boxen pretty well, maybe not as well as I know the Windows and Mac boxen. But every time someone needs to refer to someone who knows more about these things, they always get referred to me. So it’s sad, but maybe I *do* know as much about these things as I do about the other. Because I assume there’s some bias in my knowledge here. But the scary thing is, there might not be.