In a recent client engagement we were tasked with setting up an open source PHP learning management system on AWS. Our client will have users from all over Australia accessing video content via Moodle, so it was decided that CloudFront would act as the content delivery network (CDN) service – AWS has edge locations in Melbourne, Sydney and Perth, and more local edge locations help to deliver content with lower latency.
After quickly establishing and building some lovely CloudFormation templates with all of our required AWS services, it became very apparent that CloudFront was not doing the CloudFront-y things it should (i.e. speeding up the distribution of content), because nothing was getting cached in the relevant edge locations…hmmm.
To overcome this, we changed the CloudFormation template to whitelist the Host header, so that it was forwarded to Moodle and included in the cache key, and – hurrah – we began to see some initial caching on objects!
While this didn’t fix everything (we soon realised that the application was emitting Cache-Control headers for dynamic content, so that content wasn’t going to be cached by default), the static content was now being cached.
Sample CloudFormation code:
```
DefaultCacheBehavior:
  AllowedMethods:
    - DELETE
    - GET
    - HEAD
    - OPTIONS
    - PATCH
    - POST
    - PUT
  DefaultTTL: 86400
  ForwardedValues:
    QueryString: true
    Headers:
      - "Host"
    Cookies:
      Forward: all
  TargetOriginId: elb
  ViewerProtocolPolicy: redirect-to-https
  Compress: true
```
Because many of the request headers that had previously reached Moodle were no longer being forwarded (we had only whitelisted the Host header), we also enabled CloudFront access logging to S3 so that request details were still captured.
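For reference, a minimal sketch of turning on standard access logging in the distribution's CloudFormation might look like the following. The bucket name and prefix are hypothetical placeholders, and the bucket is assumed to already exist and to allow CloudFront to write log files to it; the other `DistributionConfig` properties are omitted.

```yaml
DistributionConfig:
  Logging:
    # Hypothetical bucket: CloudFront expects the bucket's domain name here,
    # not its ARN or bare name.
    Bucket: example-cloudfront-logs.s3.amazonaws.com
    Prefix: moodle/          # hypothetical prefix for the log file names
    IncludeCookies: false    # set to true to also record cookies in the logs
```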
After switching all this on, our client was confused as to why not all content was being cached. This was due to a few things:
- The Cache-Control headers discussed above
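- A lot of the content loaded on the page had unique IDs (e.g. for logins) in the URL, which made it effectively uncacheable: CloudFront caches by URL (the path plus any forwarded query string), so every unique URL is a separate cache entry
- Some assets had not yet been requested through every edge location, and an edge location only caches an object after a viewer has requested it there, so not every edge location had a cached copy.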
In addition, when CloudFront receives a request it compares the request path with the path patterns of the cache behaviours, in the order in which the behaviours are listed. The first match determines which cache behaviour is applied to the request, and a request that matches none of them uses the default cache behaviour. We had set up several custom behaviours for different file types to try to improve caching.
The CloudFormation template would look like this when setting up one custom behaviour:
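```
DistributionConfig:
  CacheBehaviors:
    - PathPattern: "*.jpg"
      TargetOriginId: elb
      ViewerProtocolPolicy: redirect-to-https
      # ForwardedValues (or a cache policy) is required; values are illustrative
      ForwardedValues:
        QueryString: false
        Headers:
          - "Host"
```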
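Because the first matching pattern wins, the order of the list matters once patterns overlap. The sketch below is illustrative only: the `/pluginfile.php/*` entry (the script Moodle uses to serve uploaded files) and its forwarding settings are assumptions, not the configuration we deployed. A hypothetical request for `/pluginfile.php/123/mod_resource/content/1/photo.jpg` matches both patterns, but it is handled by the first entry and never reaches `*.jpg`.

```yaml
DistributionConfig:
  CacheBehaviors:
    # Checked top to bottom; the first PathPattern that matches is used.
    # Hypothetical: files served via Moodle's pluginfile.php can be access
    # controlled, so this assumed entry forwards all cookies.
    - PathPattern: "/pluginfile.php/*"
      TargetOriginId: elb
      ViewerProtocolPolicy: redirect-to-https
      ForwardedValues:
        QueryString: true
        Headers:
          - "Host"
        Cookies:
          Forward: all
    # Only reached by .jpg requests that did not match the entry above.
    - PathPattern: "*.jpg"
      TargetOriginId: elb
      ViewerProtocolPolicy: redirect-to-https
      ForwardedValues:
        QueryString: false
        Headers:
          - "Host"
  # Requests matching no pattern fall through to DefaultCacheBehavior.
```

Swapping the two entries would send those Moodle-served images through the `*.jpg` behaviour instead, which is usually not what you want for access-controlled files.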
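We were also seeing a large number of 504 (Gateway Timeout) errors. The only thing that had changed in CloudFormation was CloudFront, but we didn’t think caching would cause these errors. As it turns out, CloudFront caches HTTP 4xx and 5xx responses from the origin for a default period (the error caching minimum TTL, which current AWS documentation gives as 10 seconds), so a single origin error can be served again to other viewers requesting the same object. We lowered the error caching time for 504 responses to 0 seconds as a test, and to our relief the large number of 504 errors stopped being a problem.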
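In CloudFormation, error caching is set per status code through the distribution's `CustomErrorResponses` list. A minimal sketch of the kind of change we tested might look like this (other `DistributionConfig` properties omitted). Setting `ErrorCachingMinTTL` to 0 tells CloudFront not to cache 504 responses, so one timeout is not replayed from cache to later requests.

```yaml
DistributionConfig:
  CustomErrorResponses:
    # Don't cache 504 Gateway Timeout responses from the origin.
    - ErrorCode: 504
      ErrorCachingMinTTL: 0
```

Other 4xx and 5xx codes keep the default error caching TTL unless they are given their own entry.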