A QUICK BACKGROUND
If there’s one service in AWS that you’ve probably heard about, it’s S3 – the Simple Storage Service. As one of AWS’s oldest services, its cheap storage marries so well with its ability to act as a static website-hosting platform that it’s become a popular choice for that function for many organisations. Even Cevo uses it – the page you’re reading now is hosted from an S3 bucket, configured as a static webserver. It’s fast, scalable, secure, and cheap, and we love it.
It’s pretty much the top of the heap when it comes to a serverless web hosting platform (yeah, I said it).
The challenge comes when you have a section of a website, or a whole website, which you want to protect with some kind of authentication. Perhaps it’s family photos, documentation for your clients, customers or partners, or just something you’re not quite ready to show the world yet.
S3’s ability to serve web pages and associated content is second to none, but it doesn’t provide any way to run code on the server-side – it can’t ask you for a username and password, check that against a list, and allow or refuse access accordingly.
You can do Very Clever Things with something called a “pre-signed URL”, but you still have to be running some code somewhere in order to generate that URL, and they’re only valid for a limited time; no, that won’t solve the problem.
You could require all your customers, partners, family members and so forth to sign up for an AWS account, and make use of the existing Identity and Access Management (IAM) capabilities in order to manage access to the bucket – but if you’ve ever had to explain how to attach a photo to an email to Aunt Beryl (for the eighth time), you’ll know that this is pretty much a non-starter.
Enter CloudFront, the AWS Content Delivery Network (CDN) service. CloudFront does a whole bunch of stuff: makes websites faster, guards against all sorts of attacks, and (crucially) allows you to use an S3 bucket as an “origin”. The killer feature for our little conundrum is that you can have CloudFront run just a little bit of code inside itself as it does so – for every request that a web browser sends through it, CloudFront can:
- see if there’s authentication already supplied as part of the request;
- request authentication from the user if there isn’t any;
- check that the supplied credentials are valid; and then
- serve the content really fast, from its global network of edge cache nodes
In addition, you get:
- HTTP to HTTPS redirection
- SSL certificate management
- access logs
- an entirely serverless solution, so you don’t pay for any compute when you aren’t actually receiving traffic (there’s a tiny cost for the S3 storage, but that’s the only ongoing cost)
Sounds wonderful, right?
Let’s set one up, and you can see how easy it is!
DOING THE DO
We’re going to use CloudFormation, AWS’s “infrastructure as code” capability, to create all the resources that we need. This way, there’s a single command to create everything, and if we ever need to recreate it again somewhere else, it’s just as easy. I’ve done all the work for you, so you could just grab it and go, but I’m going to break down how it works by stepping through the template.
If you want to just get the code and have a crack, you can find it on GitHub.
This example assumes that you have sufficient rights in the AWS account where you’re creating the stack to do things like create CloudFront distributions, ACM certificates, update Route53 DNS records, create S3 buckets and set policies.
STEP 1 – SET UP THE DNS ZONE IN ROUTE53
You’ll need to have a Route53 hosted zone. If you don’t have a DNS domain handy that you want to use for this website, you can buy one via the Route53 console. The domain must be delegated to Route53 and publicly resolvable.
STEP 2 – CHOOSING YOUR WORDS
You’ll need to choose a username and password for access to your new website. For this example, we’re just going to do a single username/password pair; if you want to extend this to use a more complex authentication scheme, that’s entirely possible – this one is quite basic.
STEP 3 – CREATE THE STACK
There’s one very specific condition in play here – you MUST create the CloudFormation stack in the us-east-1 (North Virginia) region. This is because the ACM certificate for the CloudFront distribution must be created in that region, which ties everything else there too; but don’t worry, your content is delivered via CloudFront, so it will be fast no matter where on Earth you are (CloudFront edge nodes not yet available in orbit, or on other planets).
- Clone the git repository (link above)
- Update the parameters.json file, setting your chosen:
- DNS domain name
- Create the stack, either via the console or using the command-line:
aws cloudformation create-stack --region us-east-1 --stack-name s3-singlepage --template-body file://template.yml --parameters file://parameters.json
Note that the stack could take up to 30 minutes to create, with the CloudFront distribution being by far the major contributor to this time.
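For reference, the file passed to --parameters is a JSON list of ParameterKey / ParameterValue pairs. The sketch below prints one in that shape. The parameter names and values here (DomainName, Username, Password and their placeholder values) are assumptions for illustration only; the real keys are whatever the template's Parameters section declares.

```python
import json

# Hypothetical parameter names and placeholder values, for illustration only.
# Check the Parameters section of template.yml for the real keys.
values = {
    "DomainName": "example.com",
    "Username": "demo-user",
    "Password": "demo-password",
}

# The AWS CLI expects a list of {"ParameterKey": ..., "ParameterValue": ...} objects.
parameters = [
    {"ParameterKey": key, "ParameterValue": value}
    for key, value in values.items()
]

print(json.dumps(parameters, indent=2))
```

Redirecting that output to parameters.json would give a file in the format the create-stack command above reads.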
STEP 4 – TEST IT
Once the CloudFormation stack is in CREATE_COMPLETE state, wait a minute or two for the DNS changes to propagate and then browse to the site:
The CloudFront distribution we’ve set up will automatically redirect browsers to the secure (HTTPS) website if you accidentally forget and try to go to the plaintext (HTTP) URL.
STEP 5 (OPTIONAL) – UPLOAD MORE CONTENT
The default bucket contains a dummy index.html “Hello, world!” to demonstrate that the site works. It’s now up to you to create more content and upload it, or tune and tweak the stack to better suit your needs.
If you make any cool additions or adjustments to the CloudFormation template and want to see them reflected in the source, just send us a pull request!
HOW IT WORKS
SITE VISIT FLOW
- When a browser visits the website, the hostname resolves to one of the CloudFront edge nodes
- The browser requests the site, but has no credentials. A Lambda@Edge function, running inside the CloudFront edge node, checks for authentication, finds that none was supplied, and returns a 401 Unauthorized HTTP response, which the browser understands, so it prompts for a username and password
- Second time around, the credentials are supplied with the browser request and validated by the Lambda@Edge function, and the request can proceed
- CloudFront either serves the content out of its edge cache, or requests it from the origin (the S3 bucket) and caches it against later queries.
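The check in the middle of that flow can be sketched roughly as follows. This is not the function from the stack (which is inline NodeJS); it's a minimal Python illustration of the same idea, assuming a viewer-request trigger and a single hard-coded username/password pair. The handler name, the placeholder credentials and the realm string are all assumptions.

```python
import base64
import hmac

# Hypothetical placeholder credentials, for illustration only.
USERNAME = "demo-user"
PASSWORD = "demo-password"

# HTTP Basic auth sends "Basic " + base64("username:password").
EXPECTED = "Basic " + base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()

# Response that makes the browser prompt for a username and password.
UNAUTHORIZED = {
    "status": "401",
    "statusDescription": "Unauthorized",
    "headers": {
        "www-authenticate": [
            {"key": "WWW-Authenticate", "value": 'Basic realm="Restricted"'}
        ]
    },
    "body": "Unauthorized",
}


def handler(event, context):
    """Viewer-request handler: pass the request on, or answer with a 401."""
    request = event["Records"][0]["cf"]["request"]
    # CloudFront lower-cases header names and stores each as a list of key/value pairs.
    auth = request["headers"].get("authorization", [])
    supplied = auth[0]["value"] if auth else ""
    # Constant-time comparison, so timing doesn't leak how much of the header matched.
    if hmac.compare_digest(supplied.encode(), EXPECTED.encode()):
        return request  # credentials valid: CloudFront carries on to cache/origin
    return UNAUTHORIZED
```

Returning the request object unchanged tells CloudFront to continue processing; returning a response object short-circuits the request at the edge, so S3 is never asked for content by an unauthenticated visitor.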
The stack incorporates a couple of neat features of CloudFormation:
- Inline NodeJS code for the Lambda@Edge function
- A Custom Resource to create the index.html object inside S3
- CloudWatch scheduled events to trigger a Lambda function which checks whether there’s a pending ACM certificate request and, if there is, creates the corresponding DNS entries
Other than that, it’s pretty straightforward 😉
Things to note, though:
- The stack creates 2 different S3 buckets – one for the content, and one for the logs. When you delete the stack, the buckets and their content will remain although the bucket policies associated with them will be deleted.
WHAT DOES IT COST?
This is a bit like asking “how long is a piece of string?” The answer is primarily dependent on the amount of data being served and the rate at which your site is being hit. Let’s make some assumptions to give indicative pricing, though. Let’s explore 3 scenarios: serving content purely from an EC2 instance with an Application Load Balancer (ALB); serving the same content from an EC2 instance via CloudFront; and serving the content via the method described here.
For comparison, we’re going to assume that all regional resources are deployed in us-east-1 (North Virginia), and we’ll calculate a monthly cost based on a 30-day month with 22 business days.
SETTING THE (IMAGINARY) SCENE
Imagine we’re serving up a set of technical service manuals as PDFs to a fleet of service technicians on the road. Each PDF is 2MB (there are quite a few images in them), and there are 5,000 different manuals. It’s a worldwide service, so 10,000 technicians. Each technician downloads 5 manuals a day (for some reason, they never save them locally) and they work 5 days a week.
In terms of data transfer, that’s 10,000 techs * 5 manuals * 2MB/pdf = 100,000MB/day. In a 30-day month with 22 working days, that would be 2,200,000MB in the month, or a shade over 2TB.
Let’s also imagine that each technician visits 5 pages in order to navigate a tree of documents to download, before downloading the manual. Every page visit invokes the Lambda@Edge function to validate their credentials. That’s 10,000 techs * 5 pages * 5 manuals * 22 days = 5,500,000 hits per month.
So our baseline is 5.5 million hits/month, with data transfer of 2TB/month.
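If you'd like to plug in your own numbers, the baseline arithmetic can be sketched like this. The variable names are just illustrative; the figures are the ones from the scenario above.

```python
TECHNICIANS = 10_000
MANUALS_PER_DAY = 5      # downloads per technician per working day
PDF_SIZE_MB = 2
PAGES_PER_MANUAL = 5     # page visits before each download
WORKING_DAYS = 22        # business days in the 30-day month

# Data transfer
mb_per_day = TECHNICIANS * MANUALS_PER_DAY * PDF_SIZE_MB
mb_per_month = mb_per_day * WORKING_DAYS
gb_per_month = mb_per_month / 1024

# Requests that pass through the Lambda@Edge check
hits_per_month = TECHNICIANS * PAGES_PER_MANUAL * MANUALS_PER_DAY * WORKING_DAYS

print(f"{mb_per_day:,} MB/day, {mb_per_month:,} MB/month (~{gb_per_month:,.0f} GB)")
print(f"{hits_per_month:,} hits/month")
```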
EC2 WITH ALB
Even this amount of data doesn’t require a terribly high-throughput EC2 instance, but we want it to be highly available so we’d run an autoscaling group with 2 instances in separate availability zones. Each instance has a local EBS volume with the manual PDFs, created from a snapshot.
- 2 x m5.large Linux instances at $0.096/hour = $138.24 / month
- 2 x 10GB EBS gp2 volumes (1 per instance) at $0.10/GB-month = $2.00 / month
- 1 x 10GB EBS snapshot at $0.05/GB-month = $1.00 / month
- 720 x Application Load Balancer hours at $0.0225 / hour = $16.20 / month
- 720 x Capacity Units at $0.008 / LCU-hour = $5.76 / month
- 2147 GB data transfer out at $0.09/GB = $193.23 / month
- this is 2148GB less the 1GB/month free tier
Which works out to $356.43 / month
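As a quick check, here's a small sketch that adds up the line items in the list above. The EBS volume and snapshot amounts are taken as the monthly figures listed; everything else is recomputed from the quoted rates over a 720-hour month. The dictionary keys are only labels.

```python
HOURS_PER_MONTH = 24 * 30  # 720 hours in the 30-day month

line_items = {
    "2 x m5.large instances": 2 * 0.096 * HOURS_PER_MONTH,
    "2 x 10GB gp2 volumes": 2.00,          # monthly figure as listed
    "EBS snapshot": 1.00,                  # monthly figure as listed
    "ALB hours": HOURS_PER_MONTH * 0.0225,
    "ALB capacity units": HOURS_PER_MONTH * 0.008,
    "data transfer out": 2147 * 0.09,
}

# Everything except data transfer is reused in the CloudFront scenario.
ec2_resources = sum(v for k, v in line_items.items() if k != "data transfer out")
ec2_alb_total = sum(line_items.values())

print(f"EC2-side resources: ${ec2_resources:.2f} / month")
print(f"EC2 with ALB total: ${ec2_alb_total:.2f} / month")
```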
EC2 WITH CLOUDFRONT (NO LAMBDA@EDGE)
If we keep the same back-end infrastructure, the main price difference will be on the data transfer. Of course, users of the service in regions other than North America will also get a much better experience because of the caching and more predictable network performance across the CloudFront backhaul network, but let’s just look at the pricing for now.
CloudFront data fetches from an AWS origin (like ALB) have been free since 2014, so we don’t have to include that cost. We’re in the first pricing tier (up to 10TB/month). We have to hand-wave a bit about where our technicians might be, but if we pretend they’re distributed evenly among all the CloudFront regions, we get:
- 2148GB / 8 regions = 268.5GB per region per month
- total cost of (268.5 * .085) + (268.5 * .085) + (268.5 * .110) + (268.5 * .114) + (268.5 * .114) + (268.5 * .140) + (268.5 * .170) + (268.5 * .250) = $286.75 / month
Which works out (including EC2 resources) to $449.95 / month
This is more expensive than just EC2 with ALB in us-east-1, and part of that cost comes from the improvement in performance for users in the less well-connected parts of the world (look at the CDN pricing for South-East Asia, India and South America); however, we can control these costs by choosing a different price class if most of our users are in, say, North America and Europe:
- Price class 100 (USA / Canada / Europe) would bring the cost down to $182.58 / month
- Price class 200 (all but Australia & South America) takes the cost to $252 / month
By choosing the most restrictive price class, and accepting less of a performance improvement for users outside North America and Europe, you can get the total cost (including the EC2 resources) down to $345.78 / month.
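The CloudFront figures can be reproduced with a short sketch like the one below. It assumes the same even split across the 8 per-GB rates quoted earlier, and it assumes that Price Class 100 bills all traffic at the $0.085 rate (which is what the $182.58 figure implies). The EC2-side amount is the sum of the EC2 line items excluding data transfer.

```python
MONTHLY_GB = 2148
# Per-GB rates for the 8 CloudFront regions, as quoted in the calculation above.
RATES = [0.085, 0.085, 0.110, 0.114, 0.114, 0.140, 0.170, 0.250]
EC2_RESOURCES = 163.20  # instances, EBS, snapshot and ALB, excluding data transfer

# Price Class All: traffic spread evenly over every region.
per_region_gb = MONTHLY_GB / len(RATES)
price_class_all = sum(per_region_gb * rate for rate in RATES)

# Price Class 100: assumed here to bill everything at the cheapest rate.
price_class_100 = MONTHLY_GB * 0.085

print(f"Per region: {per_region_gb} GB")
print(f"Price Class All: ~${price_class_all:.2f} CloudFront, "
      f"~${price_class_all + EC2_RESOURCES:.2f} with EC2")
print(f"Price Class 100: ${price_class_100:.2f} CloudFront, "
      f"${price_class_100 + EC2_RESOURCES:.2f} with EC2")
```

The Price Class All result lands within a cent of the figure quoted above; the difference is only rounding.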
S3 WITH CLOUDFRONT AND LAMBDA@EDGE
In this class, we get to remove the EC2 component altogether and replace it with S3 storage instead:
- 10 GB x S3 Standard storage (North Virginia) at $0.023 / GB-month = $0.23 / month (yes, twenty-three cents)
- CloudFront costing from above (Price Class All: $286.75 / month; Price Class 200: $252 / month; Price Class 100: $182.58 / month)
- Lambda@Edge 5.5 million requests at $0.60 per 1,000,000 requests = $3.30 / month
- Lambda@Edge duration, assuming one 50 millisecond block per request: 5,500,000 requests x 0.05 seconds x $0.00000625125 per 128MB-second = $1.72 / month
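Adding those four line items together can be sketched as follows, using the Price Class 100 CloudFront figure. It assumes, as the list does, that every request is billed for a single 50 millisecond block at the 128MB rate.

```python
REQUESTS_PER_MONTH = 5_500_000

storage = 10 * 0.023                                  # 10 GB of S3 Standard
cloudfront_pc100 = 182.58                             # Price Class 100, from above
request_cost = REQUESTS_PER_MONTH / 1_000_000 * 0.60  # Lambda@Edge requests
duration_cost = REQUESTS_PER_MONTH * 0.05 * 0.00000625125  # 50 ms at 128MB each

total = storage + cloudfront_pc100 + request_cost + duration_cost

print(f"storage ${storage:.2f}, requests ${request_cost:.2f}, "
      f"duration ${duration_cost:.2f}")
print(f"total ${total:.2f} / month")
```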
Our total monthly costs (if using Price Class 100) would therefore only be $187.83 ($0.23 + $182.58 + $3.30 + $1.72) – about 54% of the cheapest option using EC2 and CloudFront, and a little over half the cost of the EC2-only option.
Add to this, you get out of the box:
- IPv6 handling
- No infrastructure to patch, secure or maintain
- Performance that scales linearly with demand
This is a great demonstration of the capabilities of the platform; if you wanted, it could be extended to incorporate:
- a build pipeline (for delivery of content to the S3 bucket)
- a more complex authorization and authentication scheme (eg multiple users, with multiple roles)
- support for non-Route53 domains
… or whatever else you can think of.
Got any good ideas? Feel free to submit pull requests against the repo.
If this example solves a problem for you, great! If you’d like to talk to someone about getting a bit of assistance to implement or extend it for your needs, please get in touch.