In the Labs team here at Net-A-Porter, we have been using Amazon EC2 to run small proof-of-concept web applications, and we have enjoyed working with it. In under a minute, you can fire up a new server in the cloud and SSH in. That is incredibly useful when you want to get something up-and-running quickly.
We have learnt a few lessons along the way which I’d like to share in this post. Just to reiterate: we have been using EC2 for small prototypes, so the suggestions below are targeted at small dev teams who don’t have backgrounds in system administration. The suggestions are going to be less relevant if you are running large-scale production systems in EC2.
Enable Termination Protection
These points aren’t in any particular order, but if there is one thing to take away from the post, this is it: turn on Termination Protection for your EC2 instances!
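If you would rather script this than click through the Web UI, the following is a minimal sketch of how it might look with the boto3 library. It assumes `ec2_client` is a boto3 EC2 client; the function names, the region and the instance ID in the usage comment are illustrative placeholders, not anything from our actual setup.

```python
def enable_termination_protection(ec2_client, instance_id):
    """Turn on Termination Protection for a single instance.

    `ec2_client` is assumed to be a boto3 EC2 client.
    """
    ec2_client.modify_instance_attribute(
        InstanceId=instance_id,
        DisableApiTermination={"Value": True},
    )


def is_termination_protected(ec2_client, instance_id):
    """Return True if the instance currently has Termination Protection on."""
    response = ec2_client.describe_instance_attribute(
        InstanceId=instance_id,
        Attribute="disableApiTermination",
    )
    return response["DisableApiTermination"]["Value"]


# Usage (needs boto3 and AWS credentials; region and ID are placeholders):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="eu-west-1")
#   enable_termination_protection(ec2, "i-0123456789abcdef0")
#   print(is_termination_protected(ec2, "i-0123456789abcdef0"))
```

Passing the client in as an argument keeps the helpers easy to try out against a stub before pointing them at a real account.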
An EC2 instance that is up-and-running has two states you can change it to: stopped and terminated.
Stopping an instance is the equivalent of shutting down a physical machine; you can get your instance back up-and-running later by simply starting it again. Terminating an instance is the equivalent of putting a sledgehammer through a physical machine. That EC2 instance is gone for good.
Unfortunately, it is really easy to terminate an instance in the EC2 Web UI, by right-clicking on it and selecting ‘Terminate’. That’s it. No dialogue box asking you to confirm, no undo button; that instance is dead. This happened to me when I was looking to terminate a certain instance, but didn’t realise that I also had another instance selected. When I clicked ‘Terminate’, they both got killed and, believe me, that’s not fun!
If you do accidentally terminate an instance, then don’t panic. Despite the instance getting killed, the disk volume that was attached to it remains available for a limited period, so you can still launch a new instance and attach this volume. To do this in the Web UI:
- Click the Launch Instance button and go through the wizard, choosing the same settings as your terminated instance (same keypair for SSH access, et cetera). It doesn’t matter which AMI you launch from, because we will be deleting this instance’s Hard Drive immediately after launching it.
- Immediately stop the new instance
- Click on Volumes in the left-hand menu. You should see a new Hard Drive attached to the instance we just launched, as well as the Hard Drive from the instance we accidentally terminated (it will be unattached).
- Detach the Volume from the new instance (you can also delete it as we no longer need it), and attach the other Volume. It will ask for the device name, which may vary depending on your setup. For a default Amazon Linux instance, it is /dev/sda1.
- Start the instance and all should be good! If you were using an Elastic IP, you will need to reattach it, and if you are monitoring with Cloudkick (see below), you may need to update your settings in the Cloudkick dashboard.
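The stop, detach, attach and start steps above can also be scripted. The following is a rough sketch using boto3, assuming you have already launched the replacement instance and looked up the three IDs yourself; the function name and its parameters are hypothetical, and the `/dev/sda1` default is simply the device name mentioned above.

```python
def swap_in_orphaned_volume(ec2_client, instance_id, placeholder_volume_id,
                            orphaned_volume_id, device="/dev/sda1"):
    """Swap a new instance's root volume for one left behind by a terminated instance.

    `ec2_client` is assumed to be a boto3 EC2 client. The device name may
    differ on your setup.
    """
    # Stop the freshly launched instance and wait until it has stopped.
    ec2_client.stop_instances(InstanceIds=[instance_id])
    ec2_client.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Detach the placeholder volume that came with the new instance.
    ec2_client.detach_volume(VolumeId=placeholder_volume_id)
    ec2_client.get_waiter("volume_available").wait(
        VolumeIds=[placeholder_volume_id]
    )

    # Attach the surviving volume from the terminated instance in its place.
    ec2_client.attach_volume(
        VolumeId=orphaned_volume_id,
        InstanceId=instance_id,
        Device=device,
    )
    ec2_client.get_waiter("volume_in_use").wait(VolumeIds=[orphaned_volume_id])

    # Boot the instance from the recovered volume.
    ec2_client.start_instances(InstanceIds=[instance_id])


# Usage (needs boto3 and AWS credentials; all IDs are placeholders):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="eu-west-1")
#   swap_in_orphaned_volume(ec2, "i-0123456789abcdef0",
#                           "vol-0aaaaaaaaaaaaaaaa", "vol-0bbbbbbbbbbbbbbbb")
```

Note that a volume can only be attached to an instance in the same Availability Zone, so launch the replacement instance in the zone where the orphaned volume lives. The sketch does not delete the placeholder volume or reattach an Elastic IP.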
If you enable Termination Protection as the first thing you do when launching an EC2 instance, then hopefully you will never need to follow those instructions.
EC2 instances come in a variety of sizes and the smallest, micro instances, are tempting if you are building a proof-of-concept application. They have 613MB of RAM, which in most cases is plenty, and are less than a quarter of the price of a small instance (2 cents per hour compared to 8.5 cents per hour per instance in the Ireland data-centre).
That all sounds good; however, there is something to be wary of. A demo application we had deployed on a micro instance suddenly stopped responding, so I SSHed in to see what the deal was.
The first thing I noticed was that the terminal was a bit sluggish, so I ran `top` and on the first line saw this:
load average: 44.17, 35.29, 31.24
Okay… so that was looking a little on the high side. This was a pretty lightweight web app that had been running fine on a micro instance for about a year with a fairly low load average. Why the drastic increase in CPU usage?
The third line of the `top` output showed that CPU usage was actually only around 2%. `st` was at over 97%.
Cpu(s): 1.8%us, 0.5%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 97.7%st
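I had no idea what `st` was, so I broke out some Google-fu and found a whole bunch of useful blog posts. The `st` stands for ‘steal time’: the share of time the virtual CPU was ready to run but the hypervisor gave the physical CPU to something else. On a micro instance, it indicates how heavily EC2 is throttling your CPU.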
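If you want to keep an eye on steal time without staring at `top`, a small script can pull the `st` figure out of that line. This is a minimal sketch that assumes the `Cpu(s):` line has the same format as the output shown above (newer versions of `top` lay it out differently); the function names and the 50% threshold are arbitrary choices for illustration.

```python
import re


def parse_cpu_line(line):
    """Parse a `Cpu(s):` line from top into a dict like {"us": 1.8, "st": 97.7}."""
    # Each field looks like "97.7%st": a number, a percent sign, a two-letter label.
    fields = re.findall(r"(\d+(?:\.\d+)?)%\s*([a-z]{2})", line)
    return {label: float(value) for value, label in fields}


def is_throttled(line, steal_threshold=50.0):
    """Return True if steal time is at or above the (arbitrary) threshold."""
    return parse_cpu_line(line).get("st", 0.0) >= steal_threshold


sample = ("Cpu(s): 1.8%us, 0.5%sy, 0.0%ni, 0.0%id, "
          "0.0%wa, 0.0%hi, 0.0%si, 97.7%st")
cpu = parse_cpu_line(sample)
print("steal time:", cpu["st"])
print("throttled:", is_throttled(sample))
```

With the line from our struggling instance, this reports a steal time of 97.7 and flags the instance as throttled.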
Amazon does document this.
Cloudkick is straightforward to get up-and-running, and is a powerful way to monitor the health of your EC2 instances and applications deployed to them. It is also free, so check it out!
Note: At the time of writing, Cloudkick is transitioning to Rackspace Cloud Monitoring with Cloudkick disappearing in a year or two, so look out for that.
Name everything you can: Instances, Volumes, Snapshots, AMIs. This sounds like common sense; however, assigning names is optional, and easy to skip in the setup wizards. If you don’t name things, it becomes much more difficult to pick, say, the right Volume from a drop-down list that contains vol-bc5e8dd4, vol-aca11cd and vol-a7fe7de.
To name things:
- Click the Tags tab
- Click the Add/Edit Tags button
- There should already be a key prepopulated with ‘Name’; just fill in the value.
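The same ‘Name’ tag can be set from a script, which is handy if you have a backlog of unnamed resources. Here is a minimal sketch using boto3, assuming `ec2_client` is a boto3 EC2 client; the function name and the IDs and names in the usage comment are made up for illustration.

```python
def name_resource(ec2_client, resource_id, name):
    """Set the 'Name' tag on an instance, volume, snapshot or AMI.

    `ec2_client` is assumed to be a boto3 EC2 client.
    """
    ec2_client.create_tags(
        Resources=[resource_id],
        Tags=[{"Key": "Name", "Value": name}],
    )


# Usage (needs boto3 and AWS credentials; IDs and names are placeholders):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="eu-west-1")
#   name_resource(ec2, "vol-0123456789abcdef0", "demo-app-root-disk")
#   name_resource(ec2, "i-0123456789abcdef0", "demo-app-web")
```

The value shows up in the Name column of the Web UI, exactly as if you had typed it into the Tags tab.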