This is a walkthrough for backing up files to Amazon's data centers, more specifically to an EC2 instance with an EBS root volume. While it is tailored to a UNIX-like environment (as that's what I use - Solaris 10, Debian and Mac OS X 10.6 Snow Leopard), all tools used in these scripts are also available for Windows environments. Some adaptation of the commands might be required.
I recommend that you are at least slightly familiar with the AWS services before continuing to read this walkthrough.
There are many ways to use Amazon Web Services to back up your data. Below I briefly describe the methods I know of and the pros and cons I see with each of them.
For backing up directly to S3, several tools exist.
A general problem with backing up to S3, not caused by any of the above-mentioned tools, is that S3 requires you to copy whole files. So if you have a 500 MB file in which only 1 byte has changed, the whole 500 MB has to be uploaded each time, and you will be charged current S3 incoming transfer fees for that. This reason alone made me rule out all S3-based solutions for larger or more frequent backups.
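To make the difference concrete, here is a minimal sketch that compares the bytes a whole-file upload transfers with the bytes a block-level delta would need. It is a simplification under stated assumptions: the fixed 4 KiB block size and the function names are hypothetical, and real delta tools such as rsync use rolling checksums rather than this aligned-block comparison.

```python
import hashlib


def block_digests(data, block_size):
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def bytes_to_transfer(old, new, block_size=4096):
    """Compare a whole-file upload with an aligned block-level delta.

    Returns (whole_file_bytes, delta_bytes). A block counts towards the
    delta only if its digest differs from the block at the same offset
    in the old version (or if the old version has no such block).
    """
    old_digests = block_digests(old, block_size)
    delta = 0
    for index, digest in enumerate(block_digests(new, block_size)):
        unchanged = index < len(old_digests) and old_digests[index] == digest
        if not unchanged:
            start = index * block_size
            delta += len(new[start:start + block_size])
    return len(new), delta


if __name__ == "__main__":
    old = bytes(1024 * 1024)      # 1 MiB of zero bytes as a stand-in file
    new = bytearray(old)
    new[500000] = 1               # change a single byte
    whole, delta = bytes_to_transfer(old, bytes(new))
    print("whole file:", whole, "bytes; delta:", delta, "bytes")
```

With a whole-file protocol the first number is what you pay for on every run; with a delta-capable transfer to a file system on EBS it is closer to the second.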
You could use the (currently 160 GB) local storage of EC2 instances for backups. However, if the instance is shut down for any reason, all data is lost. Even if you never shut down the instance, the host could lose power or its hard drive could crash, resulting in total data loss.
This problem can be overcome by snapshotting the local instance storage to S3 at frequent intervals. I just found this solution to be too complex and expensive for my purposes.
It is possible to use an EC2 instance and attach an EBS volume to it. Even if the EC2 instance is shut down or crashes, the EBS volume is preserved, and even if the EBS host crashes, its data is backed up automatically. While S3 data is backed up across several data centers, EBS data is only backed up within the same data center. So the safety level of data on attached EBS volumes can be considered to lie somewhere between instance-local storage and S3.
For increased safety, EBS volumes can be snapshotted to S3 as often and with as many concurrent copies as desired.
This is certainly a viable backup solution. Depending on how important your data is, you might want to take S3 snapshots in addition to your EBS backups. I tried it, but found it too complex to administer and also somewhat expensive.
Based on the determined importance of my data, the fact that this is a second site (in addition to on-site backup) and budgetary constraints, I chose to use an EBS volume as an EC2 instance root without snapshotting the volume to S3. This is the method I am going to describe in detail in the rest of this walkthrough. Before cloning this process I invite you to study the AWS documentation to determine its fitness for your purposes.
If you don't have an AWS account yet, sign up here using your Amazon login (or create one using the same link, if you aren't an Amazon customer yet). Even if you've already ordered books and stuff from Amazon, signing up for AWS is separate from that. The only AWS service you need for this guide is EC2.
Amazon AWS currently has server farms in three regions: US-West (Northern California), US-East (Virginia) and EU-West (Ireland). Usually, you want to use the one closest to the location of your servers. Depending on the type of data you want to back up, you might even have a legal obligation not to move it outside of your own country or region. In my case this means I have to use the EU-West region.
TBD
TBD
TBD
TBD
Download and unpack the Amazon EC2 API tools.
Set the following environment variables:
EC2_HOME=/path/to/ec2-api-tools
JAVA_HOME=/path/to/java
PATH=/path/to/ec2-api-tools/bin:$PATH
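Before going on, it can help to verify that these variables are actually visible to your shell. The following is a minimal sketch of such a check; the function name `check_ec2_environment` is hypothetical, and it only tests that the variables are set and that the tools' `bin` directory is on `PATH`, not that the paths are valid.

```python
import os


def check_ec2_environment(env):
    """Return a list of problems found in the given environment mapping."""
    problems = []
    for name in ("EC2_HOME", "JAVA_HOME"):
        if not env.get(name):
            problems.append(name + " is not set")
    ec2_home = env.get("EC2_HOME")
    if ec2_home:
        # The API tools' executables live in $EC2_HOME/bin.
        bin_dir = os.path.join(ec2_home, "bin")
        path_entries = env.get("PATH", "").split(os.pathsep)
        if bin_dir not in path_entries:
            problems.append(bin_dir + " is not on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_ec2_environment(os.environ):
        print(problem)
```

An empty result means all three settings from above are in place.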
Download and install the latest version of boto.
Configure your region in ~/.boto
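As a rough sketch of what that file could contain, the snippet below renders a `[Boto]` section selecting the EU-West region. The option names `ec2_region_name` and `ec2_region_endpoint` are what boto 2 reads as far as I know, so please verify them against the boto documentation for your version. The helper name `render_boto_config` is hypothetical, and credentials are deliberately left out.

```python
import configparser
import io


def render_boto_config(region_name, region_endpoint):
    """Render the text of a ~/.boto file that selects an EC2 region."""
    config = configparser.ConfigParser()
    config["Boto"] = {
        "ec2_region_name": region_name,
        "ec2_region_endpoint": region_endpoint,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Example values for the EU-West (Ireland) region.
    print(render_boto_config("eu-west-1", "ec2.eu-west-1.amazonaws.com"))
```

The script only prints the text; review it and merge it into your own ~/.boto by hand.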
TBD
I want to clarify this part further, but as a short note, in the following command you will most likely want to change the placeholder values (the private key and certificate paths, <regionName>, <availabilityZone> and <keyName>) and possibly the 100 GB root volume size:
ec2-run-instances --private-key /path/to/private-key.pem --cert /path/to/cert.pem -H --region <regionName> --availability-zone <availabilityZone> --block-device-mapping /dev/sda1=:100:false --instance-initiated-shutdown-behavior stop --key <keyName> ami-13042f67
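If you launch instances from a script, the same command can be assembled as an argument list, which avoids shell quoting mistakes. This is only a sketch: the function name and the example zone and key name are hypothetical, and the reading of the block device mapping as device, snapshot, size and delete-on-termination flag is my understanding of the EC2 API tools.

```python
import shlex


def build_run_instances_command(private_key, cert, region, zone,
                                key_name, ami_id, root_size_gb=100):
    """Return the ec2-run-instances argument list used in this walkthrough."""
    return [
        "ec2-run-instances",
        "--private-key", private_key,
        "--cert", cert,
        "-H",
        "--region", region,
        "--availability-zone", zone,
        # Assumed layout: <device>=<snapshot>:<size in GB>:<delete on termination>
        "--block-device-mapping", "/dev/sda1=:%d:false" % root_size_gb,
        "--instance-initiated-shutdown-behavior", "stop",
        "--key", key_name,
        ami_id,
    ]


if __name__ == "__main__":
    command = build_run_instances_command(
        "/path/to/private-key.pem", "/path/to/cert.pem",
        "eu-west-1", "eu-west-1a", "my-key", "ami-13042f67")
    # Print a copy-pasteable command line; pass the list to
    # subprocess.check_call(command) to actually launch the instance.
    print(" ".join(shlex.quote(part) for part in command))
```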
ssh -i /path/to/keypair-private-key.pem root@<instance public dns name>
Once connected to the instance, resize the root file system so that it fills the whole volume (see note 1):
sudo resize2fs /dev/sda1
TBD
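import time
# conn is assumed to be an existing boto EC2 connection for your region
conn.start_instances([instance_id])
instance = conn.get_all_instances([instance_id])[0].instances[0]
while instance.update() != u'running':  # update() refreshes and returns the state
    time.sleep(5)
ip = instance.public_dns_name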
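The wait above has no upper bound, so a script would hang forever if the instance never comes up. Below is a minimal sketch of the same polling with a timeout. It assumes the instance object has an `update()` method that refreshes and returns its state, as boto 2's instances do to my knowledge; `wait_for_state` and `FakeInstance` are hypothetical names, and the fake stands in for a real instance so the example runs without AWS access.

```python
import time


def wait_for_state(instance, wanted="running", timeout=300, interval=5,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll instance.update() until it returns wanted or timeout expires.

    Returns True if the state was reached, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if instance.update() == wanted:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


class FakeInstance(object):
    """Stand-in for a boto instance: reports 'pending' a few times."""

    def __init__(self, polls_until_running):
        self.remaining = polls_until_running
        self.state = "pending"

    def update(self):
        if self.remaining > 0:
            self.remaining -= 1
        else:
            self.state = "running"
        return self.state


if __name__ == "__main__":
    if wait_for_state(FakeInstance(2), interval=0):
        print("instance is running")
    else:
        print("gave up waiting")
```

With a real instance you would pass the object obtained from the reservation and read its public DNS name only after the function returns True.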
TBD
TBD
1: Thanks to Alestic