Some actual things you could do for gun control stuff

Most people would probably agree to some formulation of the following:

Someone ought to be able to own a firearm to protect their home and family. And all of these horrible shootings are awful, and we should try to prevent them. We can't stop them all, but we can at least try to make them more difficult to carry out.

And if you do agree with that, here’s my proposal:

Background checks. Yes, even for private sales. In an era of $100 smartphones, there's a way to do it. When you sell a car, someone has to fill out and file a registration. It's not unreasonable.

One-way database. In the same way you can have a 'hash function' which can map from source data to a hash value (but *not* backwards!), you should be able to map from serial numbers to people – but not the other way around. If you really need to see if someone has a firearm, you can get a warrant to search their house. But if a gun is used in a crime, and the serial number can be read off it, we need to be able to figure out who that gun belongs to. I would want to appoint some kind of privacy advocate to protect this data as well. What the police did after Hurricane Katrina – running around confiscating the legal firearms of civilians – should never be allowed to happen again, and should be made even more explicitly illegal than it already was.

Providing a firearm to someone who then commits a crime means you are an accessory, and should be criminally charged. Improper storage or securing of one's firearm(s), which are then used in a crime, means you are negligent, or possibly an accessory.

And yes, that means if you ‘lose’ or have your firearm stolen, you need to report it. And that means if you didn’t secure it you can be charged. And that means you need to check in on your firearm every, say, 6 months or so – no saying “Oh, I forgot I had it! Haven’t looked at where I keep it in a while…”

That also means when you’re doing tearful press conferences about how no one knew that your kid could go shoot up a school, it’s likely you’ll be wearing prison-orange. Because you probably weren’t properly storing your firearms. Because if you were, maybe your kid would’ve had a harder time shooting up that school?

As for the definitions of what these things are? (How would a ‘household firearm’ work? What does ‘properly stored’ entail? Etc.) I don’t know and I think that’s probably an important place for us to get to. But let’s start a conversation here.

What about firearm types? I don't care about that. AR-15 or AK-47 or simple 9mm Glock. Honestly, that's the wrong road to go down. The right road to go down is stopping the people who shouldn't be able to buy guns from buying or otherwise acquiring them. And making gun owners responsible for secure and safe storage of their firearms. That, honestly, is not unreasonable.

Running Riak in Classic-EC2 (in a non-moronic way)

So I've been playing around with Riak lately – and it is a spectacular piece of software. It has a lot of the cool clustering goodness that you might get from something like Cassandra – but is a lot less of a pain in the ass to administer and deal with. Interacting with the database in an ad-hoc fashion is actually delightful – you can use lovely little REST URLs like: http://my-riak-lb.something.com/riak/bucketname/key to fetch one particular key. Riak doesn't care what you store there – you could certainly throw a JSON blob in there, but you can throw whatever else you might like too. Want a secondary index? Make sure to use a 2i-capable backend (LevelDB, or others), then declare what values you want indexed when you write your data. Riak doesn't load-balance over your cluster instances, but there are a ton of perfectly good load balancer solutions that already exist and work great with it. And if you need something tighter/faster than HTTP REST URLs, you can use their protocol buffers interface for something that's more socket-ey. If you're into that kind of stuff, check it out. I'm really excited about it.
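
For instance, here's roughly what that looks like from the command line. The bucket, key, and index names below are made up, and the exact header and endpoint syntax is from memory, so double-check it against the Riak docs:

# Store a JSON blob and tag it with a secondary index (2i) via a header
curl -XPUT http://my-riak-lb.something.com/riak/users/fred \
  -H 'Content-Type: application/json' \
  -H 'x-riak-index-email_bin: fred@example.com' \
  -d '{"name": "Fred", "email": "fred@example.com"}'

# Fetch it back by key
curl http://my-riak-lb.something.com/riak/users/fred

# Or look it up by the secondary index
curl http://my-riak-lb.something.com/buckets/users/index/email_bin/fred@example.com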

But be wary if you’re running it in Classic AWS (though their advice on running in VPC seems solid). The advice they give on their documentation website is terrible and wrong-headed. They actually expect you to manually go in and reconfigure the nodes on your cluster if one of your instances has to reboot. I worked out a better way. Of course, I offered it back to them, but they have yet to take me up on it.

You should basically run it the same way you'd run any other long-lived server in AWS. Allocate an Elastic IP address for each node. Look up the cool "public DNS name" for each of your Elastic IPs (it should look like "ec2-1-2-3-4.compute-1.amazonaws.com"), and use either that or a nice CNAME that points to it as your host identifier. That way, when your instances inside AWS resolve the name, they get the inside IP addresses. If your instance has to reboot, just reassociate your EIP to the instance. And that's it. Oh, and the bits in the config where you're supposed to put your internal IP address as something to listen on? Just put 0.0.0.0 – it works fine (though there are allegedly some possible problems with Riak Control and HTTPS that way; that's what Basho tells me, but I don't really know anything about that).
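
Here's a sketch of what that looks like in practice. The instance ID, addresses, and hostnames are placeholders, and the config file details vary by Riak version, so treat this as the shape of the thing rather than copy-paste material:

# Allocate an Elastic IP and attach it to the node (EC2-Classic style)
aws ec2 allocate-address
aws ec2 associate-address --instance-id i-0123abcd --public-ip 1.2.3.4

# The EIP's public DNS name resolves differently inside vs. outside AWS:
dig +short ec2-1-2-3-4.compute-1.amazonaws.com   # from your laptop: 1.2.3.4
dig +short ec2-1-2-3-4.compute-1.amazonaws.com   # from inside EC2: 10.x.x.x

# Use that name (or a CNAME to it) as the node name, e.g. in vm.args:
#   -name riak@ec2-1-2-3-4.compute-1.amazonaws.com
# ...and bind the listeners to 0.0.0.0 instead of a private IP.

# If the instance reboots and comes back with a new private IP, just re-associate and move on:
aws ec2 associate-address --instance-id i-0123abcd --public-ip 1.2.3.4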

And you should of course protect your Riak port the same way you’d protect any other port for any other internal-facing system. There. That’s it. I have shut down and restarted my entire cluster (forcing it to get brand-new IP addresses), and using this method it seemed to work just fine.

The method Basho proposes to handle a reboot is as follows: Stop the node. Mark it 'down' using another node. Rename the node to whatever your current private IP address is. Delete the ring directory. Start the node. Rejoin the cluster. Invoke the magic command "riak-admin cluster force-replace <old-name> <new-name>" (oh, I hope you remembered to keep track of what the old name was when you renamed it!) (Oh, and one more thing! Up until a few months ago that was misdocumented as "riak-admin cluster replace …", which would toss your node's data). Plan, then commit the changes. If you like to do it that way, you are some kind of weird masochist. And if you think this is remotely a good idea, then you are not good at technology.
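
For the record, here's roughly what that dance looks like in commands – paraphrased from their docs as I remember them, with placeholder node names:

riak stop
riak-admin down riak@10.0.0.1              # run this from a different, still-living node
# edit vm.args on the rebooted node: -name riak@10.0.0.2   (the new private IP)
rm -rf /var/lib/riak/ring/*                # delete the ring directory
riak start
riak-admin cluster join riak@10.0.0.5      # rejoin via some other cluster member
riak-admin cluster force-replace riak@10.0.0.1 riak@10.0.0.2
riak-admin cluster plan
riak-admin cluster commit

And that's the procedure for every single node that ever has to reboot.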

I got into a long back-and-forth with one of their engineers about their crazy recommendations, but he refused to budge. I even submitted them a pull request with a fixed version of the documentation – and they still haven't merged it. Why? I have no idea. The general impression I got is that the guy I was talking to was just being obstinate. We were literally talking in circles. "Why don't you just use VPC!?" I'm sorry dude, I can't, because all my shit is on Classic and that's where I'm stuck, for now. "But if you give it an elastic IP, now it's accessible to the world!" No more or less so than if it just picks its own IP, buddy. "But you'll be trying to talk to the outside IP!" No, those funny names resolve to inside IPs when you're inside of Amazon. "Well, you should just use VPC!" And so on. Literally, this dude told me to use VPC like 5 times in the course of our email exchange. When I was explaining how to use Riak in Classic-EC2.

So, yeah. Good database. Had a really nice leisurely chat with one of their sales guys, too – he seemed really cool and knowledgeable and low-pressure. But this 'evangelist' fellow certainly makes that group seem pretty dysfunctional to me.

Minimum Wage Thoughts

The libertarian answer to the question of “should we raise the minimum wage” is always the same: “We shouldn’t even have a minimum wage.” I really like this video as an explanation of that viewpoint.

As someone who likes to flirt with libertarian thoughts on occasion, I have pondered this one for a while. My desire for people who are less well-off to get more money is certainly there, I'll concede – but I also like the idea of getting rid of stupid regulations and letting the marketplace regulate itself (ideas about limiting CEO pay or bonuses or other stuff like that seem weird to me, for example).

I think I’ve found a way to think about the issue that actually makes sense to me:

Not raising – or even eliminating – the minimum wage would make perfect sense in a perfectly functional free-market economy for labor. I will argue that #1) we don't have one now, and #2) we aren't going to get one.

A Free-Market Economy Example

Let's talk about what a perfectly functional free-market economy looks like, in a place I am very familiar with: the PC hardware market. You've got many corporations (HP, Acer, Toshiba, Dell, Lenovo, Fujitsu, etc.) competing against each other tooth-and-nail in order to win market share and earn profits. While they're fighting with each other, kicking and screaming, the consumers win – at least in terms of price. And yet the profit margins for these companies – for these product lines – are razor thin. Doesn't that seem strange? Here we have a microcosm of an "ideal" market economy, and yet we have very tiny profits.

We can explain this strange attribute pretty simply, however. Let's imagine we have two companies – "A" and "B". They both sell similar products. Company "A", however, is taking a very healthy profit on its sales, whereas Company "B" is not. If their costs are similar, it means that Company "A" is charging more for its product – and thus Company "B" will be cheaper, and will sell more. If "A" doesn't eventually drop its prices and reduce its profit, it will be driven out of business by Company "B". There's a real-life example of this – Amazon.com. Jeff Bezos famously said something like, "Your margin is my opportunity." Meaning: if you make too much of a profit margin, Amazon will step in, make little to none, and drive you out of business.

(Note for completeness’s sake: Apple is an example of a company “A” that is not dropping prices but is doing quite well. They’re a rare exception and don’t have exactly the same product as the “B” companies, so the analogy doesn’t work for them, but I should at least mention it, especially as an Apple Guy.)

So if we can agree that that is what a competitive marketplace ought to look like – why are corporate profits the highest that they've ever been?

The Minimum Wage's Effects on Employment

Studies have shown that moderate increases in the minimum wage do not affect employment significantly (caveat: they’re from the left. The ones from the right say the opposite). Does that mean you can triple it all of a sudden? No. Moderate increases are what have been studied.

The math here is pretty simple – corporate profits are at an all-time high, and minimum-wage changes don’t seem to affect employment – it seems like we might not have a fully-functioning free-market for labor. What’s wrong with our assumptions?

The non-free, inelastic-demand, non-rational market of incomplete information for labor

In a completely fluid, well-functioning marketplace, a worker will always move to where the jobs are. And if they find themselves working in an industry that doesn't have a lot of jobs, and notice that another industry does, they will make sure to retrain themselves for the new industry so they can make more money. And furthermore, a worker will always have access to all of this information so they can make well-informed choices. Applying any level of scrutiny to these assumptions will show they are not always valid. A worker can't always pick up and move to the slightly higher-paying job across town: they've got to find an apartment, they have to drive further, the day-care center where little Bobby or Sue goes will be too far away, family lives nearby – or any of hundreds of other reasons, from the trivial to the serious. A worker can't always retrain and enter another industry that's higher-paying than the one they're in – this might entail spending time they don't have, or money they can't afford. And a worker does not have perfect information on what jobs do or do not exist, and where, or how to go about getting them. Perfect information like that doesn't even exist – and if it did, not everyone has broadband, a computer, access to the Internet, and the ability to use it all.

The current unemployment rate is 6.6%. Now, that doesn't mean the number of people who want to be employed and are not is 6.6% – that number doesn't include people who have given up and dropped out of the workforce. The best proxy we can come up with for that is the employment-population ratio (Epop). That number is 58.8% right now – 58.8% of our population is currently working. The high was 63.4% in Dec. 2006 (source: Bureau of Labor Statistics). Meaning that on top of the 6.6% who are unemployed and looking for work, there's another 4-ish percentage points of the population who've given up actively looking for a job (presumably because they couldn't find one). So if you think that you can ask your minimum-wage employer for a raise, or ask not to work Saturday so you can spend time with your kids – what do you think is going to happen? That's right, you're fired, and replaced with someone else. Marx's idea of "the reserve army of the unemployed" is perfectly real. (I also think his solutions to this problem, and others, are the worst solutions to any problem, ever – but that's not the point of this piece.)

In supply-and-demand terms, instead of imagining that 'labor' is the thing being supplied and demanded, let's reverse it and look at 'jobs' as the thing that's in supply or demand. A high 'bid' for a job means paying more hours for it – accepting a lower wage – and a low 'bid' for a job means offering only a smaller number of hours for it, thus only accepting a high wage. What does this supply-and-demand curve look like? We can certainly argue that the demand for this commodity is somewhat inelastic – people need jobs, and will even take jobs far beneath their skill level and training if they're desperate enough. And the supply of this commodity, while it's not 'fixed', isn't highly dynamic. The end result is that people are going to 'overspend', driving 'prices' up – which, in our reversed chart, means that people will accept wages much lower than they deserve.

So with all of that groundwork laid out, let's get back to that video. I disagree with it, but I still like it and think it's well done. When you get to 3:24, I think their misconception is laid bare – "The owner will either give Bob a raise, or a competitor will scoop him up to work in his restaurant." Does this really ever happen in fast-food restaurants? Managers of competitor restaurants in disguise sneaking in, taking notice of the more productive workers and surreptitiously offering them a job with better pay? No. No it does not. Because it doesn't have to. The fast-food owner can fire Bob the moment he starts making noise about wanting better pay and snatch up someone else who's unemployed. That job won't go unfilled for long. And neither will the competitor's fast-food job go unfilled for long. Plenty of people want work.

And the next thing to note – their pretend math is all wrong. I'm fine with pretend math; but any restaurant owner who is running at such a tight margin that a minimum wage increase would make some of his workers unprofitable is not going to be in the restaurant business for long. Want some more realistic numbers? Sure – I'm making these up, but they're at least closer to realistic: $8/hr labor, a fry cook can make around a burger every 5 minutes (12 burgers an hour), and they sell for maybe $5 each. Your average fry cook is earning you $60/hr in revenue and you're paying him $8. A bump to $10.00 is not going to break the bank. And your less-productive cook is bringing in only $50/hr, but is still plenty profitable.

Corporations Must Maximize Profits for Shareholders

The real problem of depressed wages is at the bottom of the worker pay-scale, but to show that the problem exists throughout various pay-grades – look at the anti-poaching agreements a whole bunch of Silicon Valley giants had. These were for engineering jobs at various tech companies, so I don't think a single salary we'd be talking about here would be less than $100,000 – probably more like $150,000 and up. These anti-poaching agreements were deemed anticompetitive by the Department of Justice, by the way. They were entered into to prevent Silicon Valley companies from poaching each other's employees. Why? To keep salaries from escalating. And these are the same companies that constantly beg to increase the number of H-1B visas. Is that because they can't fill jobs? No, it's because they can't fill jobs as cheaply as they want.

Now, there were people like Henry Ford who deliberately paid their employees more than they needed to – in order to improve retention, get higher-quality employees, and improve employee performance. But that's the rarity. I don't think we can bank on everyone being as smart as he was. (He was also an antisemitic asshole, but we're just looking at a small piece of the business side of him.) Costco is another modern example of deliberately paying higher wages in exchange for heightened productivity – but it also seems to be the exception rather than the rule.

The rule will always be: A CEO must maximize the profits of a company for his or her shareholders. There can’t really be any other way that I think makes any sense. You could even argue that a CEO who did not do these kinds of things is not acting in the best interest of shareholders, and should be fired. So they will pay the minimum they feel they can for as long as they can to maximize profit – and one might argue that’s what they should do.

Conclusion

So I would argue we can assemble these bits of information together – we don't have a free market (when it comes to labor), and we probably can't get one. Corporations are reaping the highest profits they've ever had relative to what goes to labor. And they're going to pay as little as they can for wages. So what's the simplest, bare-minimum piece of legislation we can pass to help try to fix the problem? Crank up the minimum wage a little bit.

There are also some side-effects of extremely low-paying jobs – they end up costing the government money, in terms of food stamps or other assistance. It effectively means the government is subsidizing these extremely low-paying jobs – allowing corporations to pay less than they should for labor, and making up the difference out of its own pocket. Now of course any libertarian worth his/her salt will immediately say “there should be no such government assistance programs!” but they’re there, and I don’t think they’re going away (And those programs aren’t what this is about).

So the problem – people working hard and still being in poverty – has a lot of causes and reasons behind it, but the simplest workaround for our inefficient markets remains the same – just bump up the minimum wage a bit.

Friction

If the labor market were as competitive as the PC hardware market I described above, there would be no need for a minimum wage – and if there were one, you certainly wouldn't need to raise it. This is the view of most libertarians that I have read. But the two pieces of evidence – the highest corporate profits ever, plus minimum-wage increases not affecting unemployment – really seem like a solid one-two punch against that view. Why do they (we?) get it so wrong?

Because the libertarian viewpoint on supply-and-demand for labor is like a junior high school student’s view of physics. Sure, junior high physics isn’t really wrong – well, at speeds close to the speed of light I guess it is – it’s just simplified. But every example in that physics class starts with “assuming a perfect sphere, and a vacuum, and no gravity…”

But real physics problems deal with nonuniform bodies, friction, air resistance, gravitational pull, non-perfect springs – and so on. And real economics problems need to deal with 'friction' of their own. The libertarian models don't deal with that friction, only with 'perfect markets' – which are probably more the exception than the rule.

Epilogue

And really at the end, what does a minimum wage mean? I would argue it’s the minimum price of human dignity. A true, pure, traditional libertarian would say that if someone wanted to work for a nickel an hour, he or she should be able to. But a place where that would actually be happening doesn’t sound to me like America. This is one of those rare occasions where I actually want the government to step in and say, “No, you can’t do that. Pay them a reasonable amount that they can live on, or figure out another business model. What you’re doing is bullshit.” If your company can’t operate without labor that costs less than the minimum wage, then your company shouldn’t exist. Most libertarians don’t believe that a person has a right to sell him or herself into slavery (only “most”) – so we are already setting a lower bound on wages, in a sense. I’m just arguing that bound can, and should, go up.

What I learned from the Defcon CTF

So if anyone follows me on Twitter, they might’ve caught that I tried the Defcon CTF challenge a week or so ago. I didn’t place on the finalist list; most of that stuff is waaaaaaay out of my league.

But the one category I thought was pretty interesting – and that I ought to do well in – was the Web one. So I tried to get at least one point in that category, to prove to myself that I could do this type of stuff. I'm not a security guy; I'm a web developer.

The result? I got all five questions 🙂 The last one I got with just an hour to spare.

And it really got me to thinking – as a developer, ‘engineer’ or architect or whatever I am – about some of the security things that I haven’t really thought that deeply about before.

Here’s what I came up with:

#1) Don’t ever use dictionary passwords. Not even with your cl3v3r subst1tut10ns of punctuation or numbers for letters.

Why? Because I tried to brute-force a password that was very strongly hashed (SHA-512). I was grinding through 3- and 4-character passwords with a custom-built script I put together. It had been running overnight and I got nothing from it.

But when the lovely and talented Snipeyhead pointed me over towards a password cracker tool, I decided to give that a shot.

The tool spat out the password I needed in probably about 60 seconds.

And the tool had a ruleset – built into it – that allowed it to automatically test out numeric and punctuation substitutions. So your clever password that's based on a dictionary word might get cracked – maybe not with today's ruleset, but definitely with tomorrow's.
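
To make that concrete – the tool and options here are illustrative, not the exact invocation I used – a rule-based dictionary attack is basically a one-liner:

# hashcat: dictionary attack (-a 0) with a mangling ruleset that tries the
# l33t-speak substitutions, appended digits, etc. for you.
# -m picks the hash type (1700 is raw SHA-512; there are other modes for bcrypt, sha512crypt, ...).
hashcat -a 0 -m 1700 hashes.txt wordlist.txt -r rules/best64.rule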

The password length is actually *less* of a big deal. Of course, if you try and brute-force a password (as I did), a longer one will take longer to force than a shorter one. But if your super duper long password is just a dictionary word – then, no, you’re still fucked.

If I were building something from scratch? I would definitely use a very strong hashing method (SHA-512? Bcrypt?) for password storage, but I would play around with different types of password requirements. If the user wants a super duper short password? Maybe it has to have lots of different types of characters. One that has just letters in it? Better be pretty damned long. Who knows? Maybe I’d just stick with what we’re doing now.
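
For what it's worth, even plain old OpenSSL will do salted, iterated hashing of the /etc/shadow variety – a minimal sketch (the -6 flag needs a reasonably recent OpenSSL, and the salt and password here are obviously just examples):

# sha512crypt: salted and iterated, which is the bare minimum you want
openssl passwd -6 -salt 'SomeRandomSalt' 'correct horse battery staple'
# => $6$SomeRandomSalt$<long hash>

# bcrypt/scrypt/PBKDF2 via a proper library is better still; the point is
# "salted and slow," not "one fast unsalted hash and done."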

But regardless of that – if your password can be cracked with a dictionary, then you can’t use it. End of story.

(edit) And try not to expose usernames, maybe?

If you don’t know what username to dictionary-attack (or brute-force attack) – you don’t know what you’re going after. You can guess – but if you’re at least not exposing “valid username, but invalid password!” (and, holy crap, I hope you aren’t!) then you make their job just a liiiiittle bit harder. And that’s worth doing, if you can.

#2) Hashing (to prevent message alteration or tampering) and crypto (confidentiality) are completely orthogonal concepts.

If you want to hide the contents of the message, encrypt it. But someone clever can still mess with the contents a little bit. In fact, that's just what I did in challenge number 5 🙂

If you want to ensure that a message is not altered in any way, present a hash of it (salted with a secret salty thing). This way, if you change one tiny bit anywhere in the message, the resulting hash should change substantially.

So if you need both? DO BOTH. The two functions have nothing to do with each other.
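
A quick command-line illustration of "do both" – keys and messages made up, obviously:

# Integrity: an HMAC is the grown-up version of "hash it with a secret salt" –
# flip one bit of the message and the tag changes completely.
echo -n 'to=alice&amount=100' | openssl dgst -sha256 -hmac 'integrity-key'

# Confidentiality: encrypt the message, separately.
# (the -pbkdf2 flag wants a newer OpenSSL; drop it on older versions)
echo -n 'to=alice&amount=100' | openssl enc -aes-256-cbc -pbkdf2 -k 'encryption-passphrase' -base64

# Neither step gives you the other one for free.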

#3) Error Messages in production

This one is probably the most embarrassing for me. I have, on plenty of occasions, allowed apps to go into production with full error messaging intact.

This is usually because I write shitty software that breaks. So when it does, I like to be able to quickly see what happened. So my heart’s in the right place, even if my code isn’t.

But this is a ridiculously fucking terrible fucking idea.
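
The fix is cheap, too. In PHP-land, for instance, it's a couple of php.ini lines – log the errors somewhere you can read them, just don't paint them onto the page:

; php.ini, production
display_errors = Off
log_errors = On
error_log = /var/log/php_errors.log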

More than half of the challenges I was going after started to give me hints on how to break in once I was able to fingerprint what type of app they were (this year? Lots of Sinatra. I feel bad for those guys, like they got 'picked on' – but considering some of their blasé attitudes about security (calling hashed stuff 'encrypted'?), maybe it's warranted). Maybe it's a wakeup call. Or maybe it's just that Sinatra is fun to write? Who knows.

But, seriously – trying to figure out what type of thing I was going after really took a lot of time away from trying to figure out how to break in. So the harder you can make those first couple of steps for an attacker (like me), the more likely he is to give up and go look at somebody else instead. And that's worth doing, if you can.

Windows 8 from the point of view of a Mac user

So I should mention before anything else that I use Windows 8 just for fun. I work all week on my Retina MacBook Pro in OS X, and on evenings and weekends – when I want to play World of Tanks – I reboot into Windows. That’s about it. Make money on OS X, have fun in Windows. It’s kinda like the opposite of how it was in the 90’s (working in Windows, coming home to Mac).

So today when I launched World of Tanks, I realized I was in a bit of a rut, tank-wise. I needed quite a bit of XP to get to the next set of tanks. What does that mean? Windows 8 upgrade time!

I was able to do the whole thing online. $200. I would've liked to have stepped down from my Win 7 Pro to Win 8 Home – I mean, all I do is game on the thing. Why do I need Pro? But it didn't give me a chance. The Upgrade Assistant was pretty reasonable. It complained at me for not having enough space – I uninstalled some Steam games to make room, and when I flipped back to the Assistant window, it had moved along to the next screen. Not bad.

As my friend Beckley and I have been discussing – Microsoft is throwing money away not making it easier for Mac people to get Windows. And making it way too expensive. If they put up an app in the App Store (presumably with Apple’s buy-in) they could put a $99 price tag on it and make some nice money. At higher margins than OEM licenses! But they’re dumb and short-sighted about things like that. Oh well.

Anyways, it still takes quite a few reboots for the install to complete. That was unexpected – but I guess that’s me not remembering my Windows-fu. Kept having to hold down the Option (alt) key to get it to boot into Windows. It threw me into a setup assistant and I somehow managed to inadvertently hook up my Xbox account to my Windows login. Freaky, but why not.

I ended up hooking up my Metro Home screen to my Facebook account too, as well as my Xbox account. I was surprised to see Xbox avatars for all of my friends pop up – relatively easily accessible from the home screen as I clicked around. I had to download some updates, and the process was not as buttery-smooth as it could have been, but still not bad at all. My FB friends are around too. Again, not too terrible. I hooked two of my three Gmail accounts into the Mail app. In the end, I had a brand new view of all kinds of 'live tiles' giving me a newfangled view of my PC.

The Start button in Windows has always been a disaster for me – it contained a billion things I didn’t care about, and 3 that I did. And when it had a scrolling view it was even worse. And that one view where it ‘automatically’ optimized where everything went? Even more of a disaster. So in Windows 8, Metro completely replaces the Start Menu. This new one actually makes sense to me. The three things I want to find? Right there, staring me in the face. I want to move them around? Easy and obvious. The one place it consistently throws me for a loop is when I want to find something that I would normally have to dig through the lesser-used folders in the Start menu to find (Start -> Program Files->SomeStupidCompanyName->DumbUtility…). That’s where my muscle-memory gets in the way. I can’t find something, I hit the ‘command’ key (Windows key) and the Metro screen comes up -> I can’t find what I’m looking for -> I click on Desktop-> I still can’t find what I’m looking for -> hit the Windows key again…

The ‘flat’ look seems timeless to me. Reminiscent of Star Trek:TNG’s LCARS pseudo-OS. No excessive curves and embossing and rounded edges and so on. Just really clean and flat. It starts to fall down a little bit when you look at Legacy Windows things – they look a little odd – trying to be flat like Metro but inheriting all the Windows baggage. And lots of things that are buttons don’t look Buttoney. So I can imagine I will find myself in a position where I have to scrub my mouse around to see if things are clickable all the time. That will certainly cause some level of UI-fail. It’ll be even worse on a touchscreen machine.

I'm even using IE10 (!) because it shows up better on my Retina screen. I tried Google Chrome in 'Windows 8 mode' and it was OK – but the text was showing up too small. IE actually is not bad. I'm completely shocked by this. The font rendering still looks a touch off to me – some parts of letters look too thin, maybe? I can't put my finger on it. But I can read the screen, and that's a nice start.

In the end, my feelings about the look/feel of the thing? I don’t see why everyone is all up in arms. It’s actually kinda futuristic-looking, IMHO. I actually find it very pleasant. Of course, I’m a bit of a contrarian – when I first used Windows Vista it didn’t bother me as much as it seemed to bother everyone else. And when I first used Windows 7 I didn’t think it was so super amazingly awesome like everyone else did. So take that into account. And also take into account – the only things I do in Windows are: play games, test things in IE, and poke around. I don’t usually try too hard to get much work done.

But as always, the devil is in the details. And that's where Apple tends to really knock things out of the park, and where Microsoft tends to fumble. Especially when I click through to something that's "classic" Windows from something that's Metro, things feel janky and weird and awkward. The tile layouts in the 'top free' and 'top paid' sections of the app store are awful and useless. I can't search the store either; if you don't click on one of the 'suggested' apps, you're screwed. The Skype integration seemed exciting – but then it insisted on doing some kind of account merge that I didn't want it to do. I still can't figure out how to add a third email account. I can't get the number of unread messages to show up in the email tiles. It's hard to get apps to show up in the Metro panel thingee if they aren't there already. Or if you (ahem) accidentally 'unpin' one (oops!). My HipChat (Adobe Air) app was all messed up and confused, and it took quite a bit of cajoling to get it so I could have a normal window on my screen. (That could be Adobe's fault, or HipChat's fault, or Microsoft's fault. Not sure.) As I continue to play with it, I'm sure I'll find more to complain about. I couldn't figure out how to open multiple windows in the Metro version of IE10 until I was doing final edits of this article (right-clicking somewhere plain on the page seems to bring up a contextual menu?)

But in the end I think Microsoft is trying some really clever stuff here. I think this Metro stuff really is the future. Apple broke with OS X precedent when it made iOS; and I think Microsoft is trying to do the same thing here with Win8/Metro. And in the same way people were up in arms and completely freaked out when Apple first removed the Floppy drive – and then later, the CDROM – I think that’s how the typical Microsoft person is responding to Metro.

I think Microsoft has latched on to Apple’s “Halo Effect” strategy. Apple had the iPod, and the iPod “Halo” started to cause a boost in Mac sales (and lead to the iPhone and iPad). Analogously, Microsoft has the Xbox. Perhaps, from this Xbox Halo, they can start to rebuild the strength of their Windows empire? Maybe. But remember – Microsoft is adopting Apple’s strategy here. And Microsoft was very good at taking someone else’s idea or product, imitating it, iterating a few versions of it, throwing in some dubious business practices, and then coming up with something that actually starts to crush the competition. They might be trying that again here. Though they haven’t really pulled that off in a while, I think.

My bet is that Microsoft-users will continue to whine and bitch and moan about how terrible and awful Windows 8 is. And we’ll have a service pack or two come out, and then maybe some kind of interim release – and then maybe people will get used to it and move along.

I think, most importantly, that if they didn’t obsolete themselves with Metro, someone else – probably Apple – would’ve done it for them. So they really had no choice.

In half-hearted defense of PHP

Okay now there’s a big rant about PHP. These things are getting exhausting. I don’t particularly like Python, that doesn’t mean I have to make any kind of long boring post about it. It’s just a matter of my personal preference, and I know that. Plenty of people do plenty of nice things in Python and that’s great. Twisted seems like a cool idea. But I don’t like Python. And no one cares, and no one should care.

So the rant on PHP has a similar tenor to the ones about Node.js – which already starts us off on the wrong foot. However, as I read through the whole thing, I find that I can't disagree with any one particular point. Every single one seems at least conditionally valid, and at least none of them seem deliberately designed to be maliciously false. So, points to you, sir and/or ma'am, on your post. My rebuttal follows.

I’ve been using PHP for somewhere around 10 or 15 years or so, and I haven’t run into even half of those various weird edge-cases. I’m not saying they don’t exist. They seem like they do. But I’m saying you don’t run into them so often.

However, I do feel that the long, exhaustive list of 'things that are wrong with PHP' sort of misses the point. Yes, the language has weird bugaboos about it. All languages do. No, none of them are fatal. The language is easy to pick up and be productive in. It's easy to have a mostly-static site with the odd dynamic page here and there. Deployment is a breeze: FTP up your new files and you're done. You don't even have to think about it. Sites run fast. Development is easy. The documentation is excellent – I haven't seen any other environment that comes remotely close in that regard. Just about anything you would want to be able to do, there is a function for. It exists to make dynamic web pages, so it fits that job well.

The article also rather quickly breezes over an important point. Facebook uses PHP. Why is that? The article states that it's fine for FB because they're huge and can 'engineer around' the various weaknesses of the language. That certainly raises the question of why they would bother. Is it just legacy? Are they just dumb? I have a hard time believing either of those possibilities.

Mass hosting for websites really only works with PHP. The security gets a little dodgy, but it basically kinda works. Try that with any other webdev environment and you won't get nearly the server density that you do with PHP. That is pretty hard to beat.

Server-side includes. Anyone remember these? This is really what PHP has supplanted. For that kind of stuff, it’s great.

Or for a one-page site. Or a static site that has one dynamic page that does something – you cannot beat PHP.
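
Something like this, I mean – a static page with one dynamic bit dropped into it (everything here is made-up illustration, and you'd want to sanitize the input in real life):

<?php /* contact.php - the one dynamic page on an otherwise-static site */ ?>
<html><body>
<h1>Contact me</h1>
<?php
if ($_SERVER['REQUEST_METHOD'] === 'POST' && isset($_POST['message'])) {
    mail('me@example.com', 'Contact form', $_POST['message']);
    echo '<p>Thanks!</p>';
}
?>
<form method="post"><textarea name="message"></textarea><input type="submit" value="Send"></form>
</body></html>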

Plenty of people use it with frameworks for complex sites. If I had a really super complex site to do, I honestly don’t know whether or not I’d do it in PHP. But if I did decide to do it there, I wouldn’t feel the least bit bad about it.

Edit, reload, etc. You can edit a PHP page on a server, hit reload, and it works. No magic. Or if there is magic in that, it's so magical that you don't ever need to know how it works. This is quite pleasant. No dicking around with production versus development or HUP signals or restart.txt files or any of that shit. On development or production: edit your file, reload. Boom.

Server infrastructure – since PHP is most often served up via Apache, you don’t have to throw some kind of reverse proxy in front of it or anything. If you have static assets, they will be quickly served up via Apache’s static web-page serving. All for free in the default set-up for PHP.

Performance. PHP is fast. Even without an accelerator. I've built many, many PHP apps, and I've never needed to use one – the database always ends up being the bottleneck, never the web front-end. It will likely outperform most other web environments with no tuning. I *have* had to tweak the maximum number of HTTP processes in Apache on a very high-traffic site, and I've also messed with the maximum amount of memory PHP is allowed per process, but (off the top of my head) those are the only two knobs I've had to twiddle. And I've only had to do that on sites with the highest of traffic running on the crappiest of server environments.
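
For the curious, the two knobs in question are roughly these – the numbers are made up, since the right values depend entirely on your RAM and your traffic:

# httpd.conf (prefork MPM): cap the number of simultaneous Apache processes
MaxClients 256

; php.ini: cap how much memory any single PHP process may use
memory_limit = 128M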

So if you want to live in your tiny ivory tower and yammer on endlessly on pedantic points about object hierarchies and namespace pollution and function-name consistency, feel free. However, those of us who are jamming out PHP sites will not be doing that – because we will have already finished the project we’ve worked on and moved on to the next one.

Practical experience with Mongo, and why I do not like it, in terms of Money and Time

For my job, I inherited a Mongo architecture. I resolved to learn it – and it still runs to this day, ticking along quite nicely.

These are my feelings about the platform, having actually used it in production – not on little toy projects. Our main MongoDB server is a 67 GB RAM AWS instance, with several hundred GB of EBS storage.

First, the good parts:

It’s super-duper easy to set up and administer. Pleasant to do so, in fact.

Javascript in the console is a remarkably useful thing to have handy. I’ve used it for quick proof-of-concepts and tests and whatnot – really good to have.

It’s really nice to develop against. Not having to deal with schema changes, and being able to save arbitrary column-value associations makes life really easy.

And now, the bad (from unimportant to important)

Doing anything beyond a really trivial or simplistic query in the console is surprisingly annoying:

db.tablename.find({"name": {$gt: "fred"}})

Not a dealbreaker or anything, just annoying. name > "fred" would be nicer.

The default 'mode' of Mongo is surprisingly unsafe. I found drivers (this could be the driver's fault, and might not be Mongo's) that return _immediately_ after you attempt a write – without even making sure the data has touched the disk. This makes me uneasy. There are probably use cases for this kind of speed at the expense of safety, but I don't like it – and this is opinion, so I'm saying it. There _are_ modes that you can use (and I do use) that slow down the writes to make sure they've at least been accepted. But, in the conventional mode, if we have a crash before the writes have been accepted by Mongo, the data is gone. This has happened to us. Usually we have failsafes in place to ensure that the data eventually gets written, but it costs us time. We're mortal; we're going to use the defaults we get until they don't work for us, then we're going to change them.
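
In the shell of that era, the difference looks roughly like this – waiting for an acknowledgement (or even the journal) instead of fire-and-forget:

db.tablename.insert({name: "fred"});          // fire-and-forget under the old defaults
db.runCommand({getLastError: 1, j: true});    // block until the write has hit the journal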

Because there is no schema, a mongo “collection” (fuck it, I’m going to call it a table) takes up more storage than it should. Our large DB would be far smaller if we defined a schema and stored it in an RDBMS. This space seems to be taken up in RAM as well as disk. This costs us more money.

MongoDB starts to perform horribly when it runs out of memory. This is by design. But it’s annoying and it costs us more money and time because we have to either archive out old data (which we do in some cases), or use a larger instance than we ought to have to (which we also do). And even if you delete entries, or even truncate a table, the amount of space used on disk remains the same (see below). More money.

MongoDB will fragment the storage it's given if your data gets updated and changes size. This caused us to end up storing around 20-30 GB of data in a 60-some-odd GB instance. And then we started to exhaust RAM, and performance plummeted. So we needed to defrag it. More care and feeding (time) that I didn't want to spend.

So to ‘fix’ the fragmented storage issue, we had to ‘repair’ the DB. This knocked our instance offline for hours. Many hours. Time. The standard fix for this is to spin up another instance (money), make it part of a replication set, repair one, let it catch up, then repair the other. Time.
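
The 'repair' in question looks roughly like this – and both forms lock you out for the duration, which is the whole problem:

db.runCommand({compact: "tablename"});    // per-collection defrag; blocks that database while it runs
// or, with the node taken out of service entirely:
//   mongod --repair --dbpath /data/db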

The final issue I had with Mongo was when I attempted to shard our setup. We were using a 67GB (quad-x-large memory) instance for our Mongo setup. I got advice from some savvy Mongo users to 'just shard it.' That made it sound so trivial. So I did. I figured we could go for 16GB shards and add a new one when we got heavy, and yank out one if we got light. I liked the idea of being able to save more money, and flexibly respond to requirements. So I set up a set of four shards – three "shardmasters" which coordinated metadata, and one 'dumb shard' which just stored data and synced up to the metadata servers. I blew it the first time and got the config wrong. Whoops. I did it again, and this time, I did it right. I picked a shard key – not an ideal one, but one that would, over time, roughly evenly distribute data across all of our shards, while maintaining some level of locality for the most likely operations. I ran a test – which is really nice to do in the JS console, I must say. I ran a for-loop to insert 10 million objects of garbage data, with a modulo-10 of 'i' to simulate the distribution of the shard keys. I watched, in a separate console, as it threw data on one shard, then started migrating data from that shard to the others. It worked enormously well. So I yanked my test data, and then we put production data on the thing.
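
The setup and the test looked roughly like this – database, collection, and key names changed, and reconstructed from memory:

sh.enableSharding("mydb");
sh.shardCollection("mydb.tablename", {related_id: 1});

// the garbage-data test, more or less:
for (var i = 0; i < 10000000; i++) {
    db.tablename.insert({related_id: i % 10, junk: "a blob of garbage data to pad things out"});
}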

It worked fine for a few days. Then Mongo filled up a shard and blew up. It was a pretty huge, horrible catastrophe. It was hard for me to troubleshoot what, exactly, happened – but it looked like no data went onto any shard other than the first.

Now, I *was* using an ObjectId() as the shard key. Not the object's _id, but the object ID of a related table. One that was nice and chunky – it didn't change very much except every few hundred thousand records or so. It's possible I shouldn't have used a shard key that is an ever-increasing ObjectId. It's possible that switching from an integer going from 0-9 over to an ObjectId that increments somehow messed me up. I tried to figure out what happened, after the fact, and got nowhere. I also checked the documentation to see if I had done something wrong. While I wasn't able to find anything definitive, there was mention of using an ObjectId as your shard key possibly throwing all traffic to just one shard. For our purposes, that would've been fine *if* the other 'chunks' of data got migrated off that shard, onto somewhere else. They didn't. This whole ordeal cost us loads and loads of time. Again, I'm perfectly willing to take the blame for having done something wrong – but if so, then there's something missing in the documentation.

So that was a complete nightmare. But it’s still not a technology I would discount – I can imagine specific use-cases where it might fit nicely. And it sure is pleasant to work with, as an admin and as a developer. But I’d far rather use something like DynamoDB (which seems very interesting), or something like Cassandra (which I’ve been following closely, but I have not yet put into production). In the meantime, I still use a lot of MySQL. And it definitely shows its age, and isn’t always pleasant, but generally does not surprise me.

Amazon cloud-init – customizing EBS-backed Amazon Linux AMIs

EDIT – No, not even this works. I feel like I’m losing my mind.

EDIT 2 – Oh, apparently you *have* to specify the boot kernel. Have to. Can’t use “use default” as I have been for, like, ever. Ugh. Angry.

I just blew a horrible amount of time on this. I’ve burned many an AMI – based on ephemeral store and EBS-backed volumes. But trying to do it ‘right’ – with programmable private keys and whatnot – seemed to be out of my grasp, at least when using Amazon’s own Linux distro.

If you try to customize Amazon Linux you will find that some things that are normally done by cloud-init don’t seem to work on your image. Namely, setting ssh keys. It works fine when you first boot the pristine Amazon image, but when you try to burn your own it won’t seem to set the ssh keys properly.

To set them, make sure you blow out the contents of /var/lib/cloud/ – and both /root/.ssh/authorized_keys as well as /home/ec2-user/.ssh/authorized_keys. They’ll get reset on next boot.
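
In other words, right before you burn your image:

rm -rf /var/lib/cloud/*
rm -f /root/.ssh/authorized_keys /home/ec2-user/.ssh/authorized_keys
# on the next boot, cloud-init acts like a first boot again and re-installs the keys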

This isn't documented anywhere, and I basically had to dick around with strace and flip through all of the Python code to figure out that there's a semaphore file in /var/lib/cloud/sem that gets set, after which the ssh-key-setting script will never run again at boot. It makes me angry – but maybe that's Amazon's point; they don't want you to customize their image so they can save on EBS volume space. I don't know. Pisses me off and wastes my time, for sure, though.

You would think that at least when I try to run stuff by hand it would say “Oh, hey, there’s a semaphore file right here – make sure to yank it if you really want to run your scripts again.” Not this silent no-message bullshit.

ARGH.

-B.

Follow up on Amazon Elastic Load Balancers and multi-AZ configuration

I got a really good comment on my blog a day or so ago from a guy by the name of Mark Rose (that’s the only link I have for him, sorry!) He mentioned that AWS multi-AZ load-balancing happens via DNS – which intrigued me – so I thought I’d mess with my test load balancer and see.

He explained that each AZ gets its own DNS entry when you look up the load balancer – and that meshes exactly with what I'm getting. I do the DNS lookup for the LB and get two IP addresses right now – and I'm assuming that each one corresponds to one of the AZs.

But Amazon does some interesting DNS stuff – for instance, if you look up one of your ‘public DNS names’ of your instances from the _outside_, you get the instance’s outside IP address. But if you look it up from the _inside_, you get the inside IP. I use this for configuration settings, when I want an instance to have a relatively-static internal IP. Instead of trying to pin down the IP, I set up an elastic IP for the instance, and use the permanent public DNS name for that IP as the _internal_ hostname for the instance. This way, if the instance reboots, I just have to make sure that the elastic IP address I’ve configured is still associated with it, and everything still works internally.

I assume that traffic to the inside IP address is faster than bouncing back outside to the public address, then going back inside. I definitely know that it is cheaper – you don’t pay for internal network traffic, only external.

So my question is – what does it look like when you try to resolve the load balancer’s DNS name from the _inside_ of Amazon AWS? Do you get the same outside IP addresses, or do you get internal ones instead? Since it seemed like AWS traffic ‘tends’ to be directed back to the same AZ it originated from, I expect to get different answers.

So here's what I did. I set up an ELB with two AZs – us-east-1a and us-east-1e – launched an instance in each, and installed and started Apache on both. As soon as the ELB registered the instances as 'up', I did a DNS lookup from the outside to see what it resolved to.

I got exactly two addresses – I’m assuming one points to one AZ, one to another.

Then, I tried to resolve the same ELB DNS name from the _inside_. Weirdly enough, I *still* get both (outside) IP addresses! I didn’t expect that.
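
For concreteness, the lookups were just this (the load balancer name here is a made-up placeholder):

dig +short my-test-lb-1234567890.us-east-1.elb.amazonaws.com   # from my laptop: two public IPs
dig +short my-test-lb-1234567890.us-east-1.elb.amazonaws.com   # from an EC2 instance: the same two public IPs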

So now, I wonder, is there anything to ‘bias’ the traffic to one AZ or another? Or is it just the vagaries of DNS round-robin that have been affecting me?

I changed the home pages on both Apaches to report which AZ they're in. I then browsed to, and curled, the ELB name. The results were surprisingly 'sticky' – in the browser, I kept seeming to hit 1-a. With curl, I seemed to keep hitting 1-e.

What if I specifically direct my connections to one IP or another? Let’s see.

Since the ELB IP addresses seem to correspond, one-to-one, with AZ’s, I thought I would be able to curl each one. I did, and consistently got the same AZ for each IP. One seems to be strongly associated to 1-a, and one to 1-e.

So it seems the coarseness of the multi-AZ ELB load-balancing can be fully explained by the coarseness of using round-robin DNS to implement it.

Something else to note – it seems like the DNS entries *only* have 60-second lifetimes. With well-behaved DNS clients (of which I will bet there are depressingly few), you should at *least* end up changing the AZ you're banging into every 60 seconds. However, in my testing – brief though it may be – it seems to stay pretty 'sticky'.

So what does this mean? I dunno – I feel like I want to do multi-AZ setups in AWS even less now. Round-robin DNS is old-school, but at large enough scales it does generally work. Though I wonder whether heavily-hit web-services APIs, like the ones my company provides, fit well enough into that framework? I'm not sure.

Session stickiness and multi-AZ setups

Another question – how does this affect ‘stickiness’? You can set an LB to provide sticky session support – but with this IP address shenaniganry, how can that possibly work?

Well, weirdly enough – it actually does.

I set an Amazon load-balancer-provided stickiness policy on my test LB. I curl’ed the name, and got the cookie. I then curl’ed the individual IP addresses for the load balancer, with that cookie set. And now, no matter which IP I hit, I keep getting the results from the same back-end server. So session-stickiness *does* break-through the load-balancer’s IP-address-to-AZ associations, to always keep hitting the same back-end server.
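
The test itself was roughly this – the LB name and IPs are placeholders:

curl -c cookies.txt http://my-test-lb-1234567890.us-east-1.elb.amazonaws.com/   # grab the AWSELB stickiness cookie
curl -b cookies.txt http://198.51.100.10/    # one of the ELB's IPs
curl -b cookies.txt http://198.51.100.20/    # the other one
# both return the page from the same back-end instance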

I wonder, what does the AWS-provided cookie actually look like? It seems like Hex, so let me see if I can decipher it.

Since I don't know if anything scary is encoded therein, I won't post my cookie here, but when I tried to decipher it, I just got a bunch of binary gobbledygook. It stayed consistent from request to request (maybe modulo time, not sure), so it probably just encodes an AZ and/or an instance ID (and maybe a timestamp).

Implications

So since AWS exposes some of the internal implementation details of your load-balancer setups, what does this mean? It certainly does imply that you can lower the bar for DoS'ing an ELB-hosted website by just picking one of the ELB IPs and slamming it. For a two-AZ example – as opposed to having to generate 2x traffic to overwhelm a site, you can just pick one IP, hit it with 1x, and have the site go half-down from it.

Considering the issues I’ve actually run into from having autoscaling groups that won’t scale because only one AZ is overwhelmed, I wonder if it makes sense to only have autoscaling groups that span a single AZ?

And it also seems to imply that you can DoS an individual server by hitting it with a session-cookie that requires it to always hit the same back-end server. So perhaps, for high-performance environments, it makes sense to stick with shared-nothing architectures and *not* do any kind of session-based stickiness?

RightScale-to-Native Amazon Web Services (AWS) Name Synchronizer

At my company, we use RightScale for a lot of our Amazon Web Services management. It’s a pretty neat service – sort of “training wheels” for the cloud. Still provides us a lot of value.

But sometimes I like to log directly into the AWS console. Especially to find out when Amazon has scheduled reboots of our servers. Before I wrote this script, I would log in to find a whole bunch of instances running with no names. Then I’d have to go look them up in RightScale. Why can’t RightScale just name your Amazon instances with the right names?!

Well, I finally took matters into my own hands and built the following script. It walks through all of your RightScale servers, and finds the associated Amazon instances and sets their name attributes to the RightScale “nicknames.”
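
On the AWS side, "setting the name attribute" just means writing the Name tag – the script ends up doing the equivalent of this for every instance it matches up:

aws ec2 create-tags --resources i-0123abcd --tags Key=Name,Value="my-rightscale-nickname"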

And I got permission from my job to make it available to the public – so here it is:

https://github.com/uberbrady/RightScaleNameSynchronizer

Yes, it is not the prettiest code I have ever written, but it does the trick. If someone wants to make it prettier I am definitely open to pull requests.

One thing I have noticed is that when you 'relaunch' a RightScale instance, the new instance will come up without an AWS name. If you re-run the script, that will fix it. Also, if you use any RightScale arrays, the same thing can happen during scale-up/scale-down events.