JavaScript Integration with AWS S3

I thought I would document some of the work I have been doing lately, and this particular solution was quite neat – it turned out to be a really good answer to the problem we were facing.

Some background

The SaaS product that we are currently redeveloping needs to store images uploaded by the user before sending them off to a publishing site. We also want to extend this functionality to include video among the media types available to the user.

At the moment the system grabs a multi-part form post and stores the entire byte array of the image in the database, along with the rest of the details of that piece of content. When the platform was originally built (2012), this may have sufficed; however, now that we have much larger image files and the need to store videos up to 1GB in size, this is far from ideal.

A plan

I decided to implement storage via AWS S3 – it’s nice and fast, has lots of locations around the world and some great capabilities as far as security and access control are concerned. I have had a little experience in the past integrating with the SDKs, but mostly from PHP – and this bad boy is built in Java (not my first language of choice, nor the one I am most versed in). I built a class that just wrapped up what I needed it to do: sending a file blob to an S3 bucket of my choosing, in a region of my choosing, attaching a nice bunch of metadata along the way. All was going well when I had the epiphany – why not move the S3 integration to the client side? Then we wouldn’t have to deal with the upload bandwidth at all.

The benefits of moving to a front-end solution were: it would be quicker for the end user (the file would go straight to S3, not to our server and then to S3); it would save bandwidth on our server (and ultimately money); and it would save on disk-space requirements on our server – happy days.

Now, it would be remiss of me not to outline the drawbacks of this change. First, it would take more time to implement than going with the S3 integration from the server (this one is certainly outweighed by the benefits, though); the other really big issue was that if we were to integrate via the front-end, our AWS SDK credentials would be exposed for the world to see (a neat little solution to this problem follows). All in all, I think the benefits greatly outweighed the drawbacks, so we were a go on the front-end integration.

The implementation

As stated previously, one of the drawbacks of integrating with AWS from the front-end is that you need to expose SDK credentials for the world to see in your JavaScript. The solution was to do the following:

  1. Create a temporary bucket that will only receive PUT requests via appropriate CORS headers (which also limit the domain origins that can operate on the bucket)
  2. Create a final storage bucket that has NO CORS headers, meaning it cannot be uploaded to from JavaScript
  3. Add a 24-hour file-expiry policy to the temporary bucket (this will probably come down considerably after launch) – so files will automatically be deleted after this period of time
  4. Create an IAM user in AWS that has very limited capabilities – ONLY S3 access and ONLY the ability to upload (no delete or move, etc.) – this will be used by the JavaScript
  5. Create an IAM user in AWS that has much more powerful capabilities (still only S3 access, but more control over all aspects of S3) – this will be used by the Java backend
  6. Upload the file to the temporary bucket from JavaScript, using the credentials for the limited IAM user
  7. Send the details of the file (along with the rest of the form details) to our server
  8. Have our server move (or rather, copy) the file from one bucket to the other using the more powerful IAM user (note: the file never touches our server)
  9. Store the file details in the database along with the rest of the form data

Obviously we would need to handle content with the old blob-storage images AND the newly created S3-URL-stored images side by side, but that comes down to a single check on the DB record, and it is only required in a couple of places in the rest of the codebase.
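As a rough illustration (the fallback endpoint name here is purely hypothetical), that check is simply a matter of preferring the S3 URL whenever the record has one:

// Hypothetical sketch: prefer the new S3 URL if the record has one,
// otherwise fall back to the legacy blob-serving endpoint.
function imageSrcFor(record) {
    return record.s3Url ? record.s3Url : '/content/' + record.id + '/image';
}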

Using the limited IAM user and the multi-bucket system, alongside Amazon’s lifecycle file management in S3, means that the worst that can happen is that someone maliciously uploads a really large file to our S3 bucket and it gets deleted automatically after a short period of time (granted, it would be 24 hours to begin with, but this will come down).

Configuration

First, this is the CORS Configuration that was used on the temporary bucket:

<?xml version="1.0" encoding="UTF-8"?>  
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">  
    <CORSRule>
        <AllowedOrigin>*</AllowedOrigin>
        <AllowedMethod>HEAD</AllowedMethod>
        <AllowedMethod>PUT</AllowedMethod>
        <AllowedMethod>POST</AllowedMethod>
        <MaxAgeSeconds>3000</MaxAgeSeconds>
        <ExposeHeader>ETag</ExposeHeader>
        <ExposeHeader>x-amz-meta-company-id</ExposeHeader>
        <ExposeHeader>x-amz-meta-company-name</ExposeHeader>
        <ExposeHeader>x-amz-meta-user-id</ExposeHeader>
        <ExposeHeader>x-amz-meta-user-name</ExposeHeader>
        <AllowedHeader>*</AllowedHeader>
    </CORSRule>
</CORSConfiguration>  

Note the extra metadata headers that are allowed to be exposed. The three methods (HEAD, PUT & POST) ended up being required to allow the SDK to do multipart uploads to the bucket from JavaScript.
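The expiry policy on the temporary bucket (step 3 above) is just a standard S3 lifecycle rule. The exact rule we used isn't reproduced here, but it looks something like this – the rule ID is purely illustrative, and expiry is specified in whole days (hence 24 hours):

<LifecycleConfiguration>
    <Rule>
        <ID>expire-temp-uploads</ID>
        <Prefix></Prefix>
        <Status>Enabled</Status>
        <Expiration>
            <Days>1</Days>
        </Expiration>
    </Rule>
</LifecycleConfiguration>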

Next, the user configurations. There were two "users": the JavaScript user and the server-side user. The access keys and secrets of these users are what the SDKs use to operate on the S3 buckets.

There was also a user group called SDKUsersGroup, which both of the users belonged to. The AWS policy attached to this group was the built-in AWSConnector policy – although realistically this could be a much more refined policy.

The "Javscript user" had a custom policy attached that is as follows:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1455063034000",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectAcl",
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::my-media-store-temp/*"
            ]
        },
        {
            "Sid": "Stmt1455063160000",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectAcl",
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::my-media-cors-test-01/*"
            ]
        }
    ]
}

I have left the test bucket in the policy above to show how to include access to more than one bucket within the same policy document, but it would not really be needed in production.

As for the "Server User", they just had the AmazonS3FullAccess built-in policy attached.

Once both users were created, their security credentials were saved for use within the application itself. The server user's credentials were NEVER exposed in any way and were stored for use on the server only. The JavaScript user's credentials WERE exposed from within the JavaScript code (hence the extra security step of creating the secondary bucket and limited user).

Code

Here is a cut-back gist of the JavaScript code – note that it is written in a completely functional style; in a future build of the platform this will all get wrapped up in a nice component within a larger framework, but for now we needed to get this going as fast as we could.
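The gist itself isn't reproduced here, but a minimal sketch of what it does – using the browser build of the AWS SDK for JavaScript, with placeholder credentials, an assumed region, and the temporary bucket and metadata headers described above – looks roughly like this:

AWS.config.update({
    accessKeyId: 'LIMITED-JS-USER-KEY',         // the "JavaScript user" credentials
    secretAccessKey: 'LIMITED-JS-USER-SECRET',  // exposed to the browser, hence the limited policy
    region: 'ap-southeast-2'                    // assumption: whichever region the buckets live in
});

var s3 = new AWS.S3({ params: { Bucket: 'my-media-store-temp' } });

function uploadToTempBucket(file, company, user, done) {
    var params = {
        Key: Date.now() + '-' + file.name,      // simple unique-ish key for the temporary object
        Body: file,
        ContentType: file.type,
        Metadata: {                             // surfaces as the x-amz-meta-* headers in the CORS config
            'company-id': String(company.id),
            'company-name': company.name,
            'user-id': String(user.id),
            'user-name': user.name
        }
    };

    // s3.upload() manages the multipart upload for large files
    // (which is why HEAD, PUT and POST all needed to be allowed in CORS)
    s3.upload(params)
        .on('httpUploadProgress', function (evt) {
            console.log('Uploaded ' + evt.loaded + ' of ' + evt.total + ' bytes');
        })
        .send(function (err, data) {
            if (err) { return console.error('Upload failed', err); }
            done(data.Key, data.Location);      // hand the S3 details on to our own server
        });
}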

I will put together a more technical write-up of the actual code at a later date, but the code has a few comments if you want to follow along at home… 😉

I haven't included the server-side code here because this is more about how to do things from the front-end. The high-level overview of the backend is that it receives the details of the object in S3 (as well as some other form data), moves the object to the final bucket and then stores that URL in the DB with the rest of the data for the document.

Improvements

Of course there can always be improvements, and the big one here will be to implement signed URLs, which would mean I would no longer need to have the AWS secret in the front-end. I would still use the temp bucket, as that would mean the main bucket could not be touched via JavaScript or CORS, adding a small amount of extra security.
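A rough sketch of how that would look from the browser (the /api/presign endpoint is purely hypothetical – the server would generate the pre-signed URL using its own credentials):

// Hypothetical: ask our own server for a pre-signed PUT URL, then send the file
// straight to S3 with it – no AWS credentials in the browser at all.
function uploadViaPresignedUrl(file, done) {
    fetch('/api/presign?filename=' + encodeURIComponent(file.name))
        .then(function (res) { return res.json(); })
        .then(function (presign) {
            return fetch(presign.url, {
                method: 'PUT',
                headers: { 'Content-Type': file.type },
                body: file
            });
        })
        .then(function (res) { done(null, res); })
        .catch(done);
}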

I am sure I will think of more improvements – in fact this entire solution will probably look positively archaic in a few months! 😉