Practical experience with Mongo, and why I do not like it, in terms of Money and Time

For my job, I inherited a Mongo architecture. I resolved to learn it – and it still runs to this day, ticking along quite nicely.

These are my feelings about the platform, having actually used it in production – not on little toy projects. Our main MongoDB server is a 67 GB RAM AWS instance, with several hundred GB of EBS storage.

First, the good parts:

It’s super-duper easy to set up and administer. Pleasant to do so, in fact.

Javascript in the console is a remarkably useful thing to have handy. I’ve used it for quick proof-of-concepts and tests and whatnot – really good to have.

It’s really nice to develop against. Not having to deal with schema changes, and being able to save arbitrary column-value associations makes life really easy.
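
For instance (collection and field names made up, just to illustrate), two documents with totally different shapes can live side by side with no migration:

db.widgets.insert({name: "fred", color: "blue", sizes: [1, 2, 3]})
db.widgets.insert({name: "barney", vendor: {id: 42, preferred: true}})    // different shape, same collection, nothing to migrate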

And now, the bad (from least important to most important):

Doing anything beyond a really trivial or simplistic query in the console is surprisingly annoying:

db.tablename.find({"name": {$gt: "fred"}})

Not a dealbreaker or anything, just annoying. name > "fred" would be nicer.
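
And it compounds quickly. Something only slightly less trivial – the SQL-ish name > "fred" AND (age < 30 OR age >= 65), with made-up field names – turns into a pile of nested braces:

db.tablename.find({name: {$gt: "fred"}, $or: [{age: {$lt: 30}}, {age: {$gte: 65}}]})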

The default ‘mode’ of Mongo is surprisingly unsafe. I found drivers (could be the driver’s fault, might not be Mongo’s) that return _immediately_ after you attempt a write – without even making sure the data has touched the disk. This makes me uneasy. But there are probably use cases for this kind of speed at the expense of safety. But I don’t like it. And this is opinion, so I’m saying it. There _are_ modes that you can use (and I do use) that slow down the writes to make sure they’ve at least been accepted. But, in the conventional mode, if we have a crash before the writes have been accepted by Mongo, the data is gone. This has happened to us. Usually we have failsafes in place to ensure that the data eventually gets written, but it costs us time. We’re mortal; we’re going to use the defaults we get until they don’t work for us, then we’re going to change them.
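
For what it’s worth, “slowing down the writes” looks roughly like this in the console – the exact spelling depends on your driver and server version, so treat it as a sketch:

db.tablename.insert({name: "fred", value: 1})
db.runCommand({getLastError: 1, j: true})    // older style: explicitly ask whether the write stuck, waiting on the journal
db.tablename.insert({name: "fred", value: 1}, {writeConcern: {w: 1, j: true}})    // newer shells: attach a write concern to the insert itself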

Because there is no schema, every document stores its own field names, so a mongo “collection” (fuck it, I’m going to call it a table) takes up more storage than it should. Our large DB would be far smaller if we defined a schema and stored it in an RDBMS. This space seems to be taken up in RAM as well as disk. This costs us more money.
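
You can eyeball the overhead from the console (collection name made up):

db.tablename.stats()    // size, storageSize, and avgObjSize all include the field names repeated in every document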

MongoDB starts to perform horribly when it runs out of memory. This is by design. But it’s annoying and it costs us more money and time because we have to either archive out old data (which we do in some cases), or use a larger instance than we ought to have to (which we also do). And even if you delete entries, or even truncate a table, the amount of space used on disk remains the same (see below). More money.
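
To be concrete about that last bit (made-up collection, made-up numbers):

db.tablename.storageSize()    // say, tens of GB
db.tablename.remove({})       // delete every document
db.tablename.storageSize()    // ...still tens of GB, until you repair or compact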

MongoDB will fragment the storage it’s given if your data gets updated and changes size. This caused us to end up storing around 20-30 GB of data in a 60-some-odd GB instance. And then we started to exhaust RAM, and performance plummeted. So we needed to defrag it. More care and feeding (time) that I didn’t want to spend.

So to ‘fix’ the fragmented storage issue, we had to ‘repair’ the DB. This knocked our instance offline for hours. Many hours. Time. The standard fix for this is to spin up another instance (money), make it part of a replication set, repair one, let it catch up, then repair the other. Time.
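
For reference, the ‘repair’ itself is nothing exotic – it just hurts. Depending on your version it’s one of these, run against one replica set member at a time:

db.repairDatabase()                      // rewrites the data files; needs roughly as much free disk as the data itself
db.runCommand({compact: "tablename"})    // per-collection defrag (MongoDB 2.0+); also blocks the node while it runs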

The final issue I had with Mongo was when I attempted to shard our setup. We were using a 67GB (quad-x-large memory) instance for our Mongo setup. I got advice from some savvy Mongo users to ‘just shard it.’ That made it sound so trivial. So I did. I figured we could go for 16GB shards, add a new one when we got heavy, and yank one out if we got light. I liked the idea of being able to save more money and flexibly respond to requirements. So I set up a set of four shards – three “shardmasters” which coordinated metadata, and one ‘dumb shard’ which just stored data and synced up to the metadata servers. I blew the first attempt at getting the config right. Whoops. I did it again, and this time I did it right. I picked a shard key – not an ideal one, but one that would, over time, roughly evenly distribute data across all of our shards, while maintaining some level of locality for the most likely operations.

Then I ran a test – really nice to do in the JS console, I must say. I ran a for-loop to insert 10 million objects of garbage data, with a modulo-10 of ‘i’ to simulate the distribution of shard key values. I watched, in a separate console, as it threw data onto one shard, then started migrating data from that shard to the others. It worked enormously well. So I yanked my test data, and we put production data on the thing.
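
Give or take the real names, the test looked something like this (database, collection, and field names made up; the sh.* helpers are shell sugar and vary a bit by version):

sh.enableSharding("mydb")
sh.shardCollection("mydb.events", {bucket: 1})    // "bucket" standing in for the real shard key
use mydb
for (var i = 0; i < 10000000; i++) { db.events.insert({bucket: i % 10, payload: "garbage data"}) }
sh.status()    // run from the second console to watch chunks spread across the shards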

It worked fine for a few days. The Mongo filled up a shard and blew up. It was a pretty huge, horrible catastrophe. It was hard for me to troubleshoot what, exactly, happened – but it looked like no data went onto any shard other than the first.

Now, I *was* using an ObjectId() as the shard key. Not the object’s _id, but the object_id of a related table – one that was nice and chunky, and didn’t change much except every few hundred thousand records or so. It’s possible that I shouldn’t have used a shard key that is an ever-increasing ObjectId. It’s possible that switching from an integer cycling from 0-9 over to an ObjectId that only ever increments is what messed me up. I tried to figure out what happened, after the fact, and got nowhere. I also checked the documentation to see if I had done something wrong. While I wasn’t able to find anything definitive, there was mention of an ObjectId shard key possibly throwing all traffic at just one shard. For our purposes, that would’ve been fine *if* the other ‘chunks’ of data got migrated off that shard, on to somewhere else. They didn’t. This whole ordeal cost us loads and loads of time. Again, I’m perfectly willing to take the blame for having done something wrong – but if so, then there’s something missing in the documentation.
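
If you want to go digging after the fact, something along these lines (run against the mongos; namespace made up) at least shows whether any chunks ever left the first shard:

db.printShardingStatus()    // chunk counts per shard, per collection
use config
db.chunks.find({ns: "mydb.events"}).count()    // total chunks that exist for the collection

Later releases (2.4+) also grew hashed shard keys – sh.shardCollection("mydb.events", {related_id: "hashed"}), field name made up – precisely so a monotonically increasing key doesn’t pile everything onto one shard.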

So that was a complete nightmare. But it’s still not a technology I would discount – I can imagine specific use cases where it might fit nicely. And it sure is pleasant to work with, as an admin and as a developer. But I’d far rather use something like DynamoDB (which seems very interesting), or something like Cassandra (which I’ve been following closely but have not yet put into production). In the meantime, I still use a lot of MySQL. It definitely shows its age, and isn’t always pleasant, but it generally does not surprise me.

Amazon cloud-init – customizing EBS-backed Amazon Linux AMIs

EDIT – No, not even this works. I feel like I’m losing my mind.

EDIT 2 – Oh, apparently you *have* to specify the boot kernel. Have to. Can’t use “use default” as I have been for, like, ever. Ugh. Angry.

I just blew a horrible amount of time on this. I’ve burned many an AMI – based on ephemeral store and EBS-backed volumes. But trying to do it ‘right’ – with programmable private keys and whatnot – seemed to be out of my grasp, at least when using Amazon’s own Linux distro.

If you try to customize Amazon Linux, you will find that some things that are normally done by cloud-init don’t seem to work on your image – namely, setting ssh keys. It works fine when you first boot the pristine Amazon image, but when you burn your own and boot it, the ssh keys don’t get set properly.

To get them set again, make sure you blow out the contents of /var/lib/cloud/, as well as both /root/.ssh/authorized_keys and /home/ec2-user/.ssh/authorized_keys, before you burn the image. They’ll get reset on next boot.
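
In practice that’s something along these lines, run on the instance right before you create the image:

# as root, just before burning the AMI
rm -rf /var/lib/cloud/*
rm -f /root/.ssh/authorized_keys /home/ec2-user/.ssh/authorized_keys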

This isn’t documented anywhere, and I basically had to dick around with strace and flip through all of the Python code to figure out that there’s a semaphore file in /var/lib/cloud/sem that gets set, after which the ssh-key-setting script will never run again at boot. It makes me angry – but maybe that’s Amazon’s point; they don’t want you to customize their image so they can save on EBS volume space. I don’t know. Pisses me off and wastes my time for sure, though.

You would think that at least when I try to run stuff by hand it would say “Oh, hey, there’s a semaphore file right here – make sure to yank it if you really want to run your scripts again.” Not this silent no-message bullshit.

ARGH.

-B.