Amazon Simple Storage Service with s3cmd

s3cmd is a command line tool for uploading, retrieving and managing data in Amazon S3. With s3cmd we can perform other related tasks as well, such as creating buckets, removing buckets, listing objects and synchronizing directory trees.

Using an Amazon Web Service

To use an Amazon Web Service, like the Amazon Simple Storage Service (S3), there are two prerequisites. First, we need an Amazon Web Services account; we took care of that in the Signing up with Amazon Web Services manual. Second, we need to sign up for each individual web service we want to use. Since we signed up for S3 in a previous manual, we’re almost ready to roll.

The last piece of the puzzle is “Security Credentials”. To access Amazon Web Services, we need to provide special credentials that are associated with our account. What we’re after are access keys. They are used to sign REST or Query protocol requests to any AWS service API, so that AWS can verify who sent each request.

An access key was automatically created when we signed up, so now we only need to obtain the Access Key ID and the Secret Access Key to begin using the services.

Amazon S3 concepts

There are three key Amazon S3 concepts, namely objects, buckets, and keys. Objects are entities that are stored in Amazon S3. These objects consist of the object data – a file – and metadata – a set of name-value pairs that describe the object.

Amazon Buckets are containers for your objects. You can have up to 100 buckets at a time. The reason for the limit is that buckets must have unique names across the entire service. There is, however, no limit to the number of objects you can put into each bucket. Objects can range in size from 1 byte to 5 gigabytes.

A key is the unique identifier for an object within a bucket. A bucket name and a key uniquely identify an object. You can access every object in S3 by a combination of the service endpoint, bucket name, and key.
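To make the endpoint, bucket name, and key combination concrete, here is a minimal sketch that assembles a path-style object URL. The helper name s3_object_url, the bucket my-bucket, and the key photos/puppy.jpg are made up for illustration, and the sketch does not URL-encode the key. S3 also accepts a virtual-hosted form where the bucket name is part of the host name.

```shell
# s3_object_url is a hypothetical helper, not part of s3cmd.
# It joins endpoint, bucket name, and key into a path-style URL.
s3_object_url() {
    endpoint="$1"
    bucket="$2"
    key="$3"
    printf 'https://%s/%s/%s\n' "$endpoint" "$bucket" "$key"
}

# Placeholder bucket and key, for illustration only.
s3_object_url s3.amazonaws.com my-bucket photos/puppy.jpg
# prints https://s3.amazonaws.com/my-bucket/photos/puppy.jpg
```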

You own each bucket you create. Amazon charges you for storing objects in your buckets and for transferring objects in and out of your buckets.

Installing s3cmd

To install s3cmd, issue the following command:

demo@server:~$ sudo aptitude install s3cmd
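Scripts that depend on s3cmd may want to confirm it is installed before doing anything else. The following is a small sketch of such a check; require_cmd is a hypothetical helper name, not something s3cmd provides.

```shell
# require_cmd is a hypothetical helper, not part of s3cmd.
# It succeeds if the named command can be found on the PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: '$1' not found on PATH" >&2
    return 1
}

# Typical use at the top of a script:
# require_cmd s3cmd || exit 1
```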

Configuring s3cmd

Let’s configure s3cmd next. You’ll need your Access Key ID and the Secret Access Key for this section.

demo@server:~$ s3cmd --configure

Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3
Access Key: AWS_ACCESS_KEY_ID
Secret Key: AWS_SECRET_ACCESS_KEY

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password: <Enter> 
Path to GPG program [/usr/bin/gpg]: <Enter>

When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP and can't be used if you're behind a proxy
Use HTTPS protocol [No]: Yes

New settings:
  Access Key: AWS_ACCESS_KEY_ID
  Secret Key: AWS_SECRET_ACCESS_KEY
  Encryption password: 
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: True
  HTTP Proxy server name: 
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] y
Please wait...
Success. Your access key and secret key worked fine :-)

Now verifying that encryption works...
Not configured. Never mind.

Save settings? [y/N] y
Configuration saved to '/home/demo/.s3cfg'
demo@server:~$ 

Let’s walk through what happened during the configuration. First we provided the access key and the secret key which we need to actually use S3. Then there were two settings concerning GPG encryption that we ignored – we’ll get back to them in a minute. Next we opted for using the HTTPS protocol. This protocol is a combination of the Hypertext Transfer Protocol (HTTP) with the SSL/TLS protocol. What this does is provide encryption during transport to and from servers. It also takes measures to identify the server we’re communicating with.

Before saving our configuration to disk, s3cmd checked that all our settings work. In the next section we’ll play around with s3cmd.
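The saved configuration is a plain text file of "key = value" lines, so individual settings can be checked from a script. The sketch below reads one setting; cfg_get is a hypothetical helper, and the key names use_https and access_key are assumed to be the ones s3cmd writes to the file. It only handles simple values that do not themselves contain " = ".

```shell
# cfg_get is a hypothetical helper, not part of s3cmd.
# Usage: cfg_get KEY [FILE]   (FILE defaults to ~/.s3cfg)
cfg_get() {
    key="$1"
    file="${2:-$HOME/.s3cfg}"
    # Split each line on "=" with optional surrounding spaces and
    # print the value of the first line whose key matches exactly.
    awk -F' *= *' -v k="$key" '$1 == k { print $2; exit }' "$file"
}

# Example, assuming the file was saved by "s3cmd --configure":
# cfg_get use_https
```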

Playing with Buckets

  • create bucket (naming convention, domain name, or useful_name_random_characters)
  • upload files
  • download files
  • list objects
  • delete files
  • delete directories
  • syncing
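As a rough preview of the operations listed above, the following sketch strings the corresponding s3cmd commands together. The function name bucket_tour and the file and directory names (notes.txt, photos, old-photos) are placeholders, and the S3CMD variable is an assumption added here so the commands can be printed with S3CMD=echo instead of being run against a real bucket.

```shell
# Set S3CMD=echo to print the commands instead of running them.
S3CMD="${S3CMD:-s3cmd}"

# bucket_tour is a hypothetical helper; all file names are placeholders.
bucket_tour() {
    bucket="$1"
    "$S3CMD" put notes.txt "s3://$bucket/notes.txt"        # upload a file
    "$S3CMD" get "s3://$bucket/notes.txt" notes-copy.txt   # download it again
    "$S3CMD" ls "s3://$bucket"                             # list objects
    "$S3CMD" del "s3://$bucket/notes.txt"                  # delete a file
    "$S3CMD" del --recursive "s3://$bucket/old-photos/"    # delete a "directory"
    "$S3CMD" sync ./photos/ "s3://$bucket/photos/"         # sync a directory tree
}

# Example: bucket_tour bm_backup_scb77ph8yz4td5xxlkvr6al6
```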

Let’s create a real bucket. First, get some random characters for the bucket name:

pwgen -s -A

Then create the bucket with a name of the form useful_name_random_characters:

s3cmd mb s3://useful_name_random_characters

bmichelsen:~ $ s3cmd mb s3://bm_backup_scb77ph8yz4td5xxlkvr6al6
Bucket 'bm_backup_scb77ph8yz4td5xxlkvr6al6' created
bmichelsen:~ $
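If pwgen is not available, a name in the useful_name_random_characters form can be put together with standard tools. This is only a sketch: random_suffix and bucket_name are hypothetical helper names, and reading from /dev/urandom is an assumption about the system. Note that current S3 naming rules do not allow underscores in new bucket names, so a hyphen may be needed as the separator instead.

```shell
# random_suffix and bucket_name are hypothetical helpers, not part of s3cmd.

# Print LEN random characters (default 24) drawn from a-z and 0-9.
random_suffix() {
    len="${1:-24}"
    head -c 4096 /dev/urandom | LC_ALL=C tr -dc 'a-z0-9' | cut -c "1-$len"
}

# Print a bucket name of the form useful_name_random_characters.
bucket_name() {
    printf '%s_%s\n' "$1" "$(random_suffix "$2")"
}

# Example: s3cmd mb "s3://$(bucket_name bm_backup 24)"
```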

  • gpg encryption
  • add gpg with “s3cmd --configure”
  • revisit configuration (defaults to US etc., step-by-step walkthrough of the config file)
  • vim .s3cfg
  • amazon s3 versioning
  • backup with s3 sync and cron
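For the last item, a backup with sync and cron could look roughly like the sketch below. The function name backup_dir, the script path, the log path, and the schedule are all assumptions for illustration. The --delete-removed option makes sync delete remote objects that no longer exist locally, so leave it out if old files should be kept in the bucket.

```shell
# backup.sh: a hypothetical nightly backup script.
# Set S3CMD=echo to print the command instead of running it.
S3CMD="${S3CMD:-s3cmd}"

# Sync one local directory into a prefix of the same name in the bucket.
backup_dir() {
    src="$1"
    bucket="$2"
    # The trailing slash makes sync copy the directory's contents.
    "$S3CMD" sync --delete-removed "${src%/}/" "s3://$bucket/$(basename "$src")/"
}

# Example crontab entry (placeholder paths), run at 02:30 every night:
# 30 2 * * * /home/demo/bin/backup.sh >> /home/demo/backup.log 2>&1
```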

Q: How do I decide which Region to store my data in?

There are several factors to consider based on your specific application. You may want to store your data in a Region that…

    * ...is near to your customers, your data centers, or your other AWS resources in order to reduce data access latencies.
    * ...is remote from your other operations for geographic redundancy and disaster recovery purposes.
    * ...enables you to address specific legal and regulatory requirements.
    * ...allows you to reduce storage costs. You can choose a lower priced Region to save money. Please see the pricing section on the S3 detail page.