Host a Gatsby website on AWS with S3 and CloudFront

There are many ways to host a Gatsby website on AWS. My preference is for using S3 and CloudFront as this seems to give the best experience for both developers and users. In this article I'll show you how to set up S3 and CloudFront using CloudFormation.

It starts with a simple CloudFormation template containing an S3 bucket to host our website. In the Outputs section I'll add an output containing the name of the bucket that was created so we don't need to hunt for it.

AWSTemplateFormatVersion: "2010-09-09"
Description: Gatsby Website

Resources:
  HostingBucket:
    Type: AWS::S3::Bucket
    Properties:
      AccessControl: Private

Outputs:
  HostingBucket:
    Value: !Ref HostingBucket
    Description: Name of the bucket for hosting

A lot of articles about website hosting on S3 will now enable S3's website hosting feature and stop. This only provides an HTTP version of the website on one specific hostname. I want the website to be served from https://richdevelops.dev, with the HTTP and www-prefixed versions all redirecting to that site. To do that I need to use CloudFront.

You may have noticed that the bucket isn't public. That is by design because public S3 buckets are bad practice and AWS provides a way for CloudFront to access private S3 buckets.

For the CloudFront distribution to access the private S3 bucket it requires a CloudFront Origin Access Identity (OAI). You can think of this like a user that CloudFront will use when fetching content from the bucket.

CloudFrontOAI:
  Type: AWS::CloudFront::CloudFrontOriginAccessIdentity
  Properties:
    CloudFrontOriginAccessIdentityConfig:
      Comment: Allow access to the hosting S3 bucket from CloudFront

You give the OAI access to the S3 bucket via an S3 Bucket Policy which allows it to get objects.

HostingBucketPolicy:
  Type: AWS::S3::BucketPolicy
  Properties:
    PolicyDocument:
      Id: PolicyForCloudFrontPrivateContent
      Version: 2012-10-17
      Statement:
        - Sid: PublicReadForGetBucketObjects
          Effect: Allow
          Principal:
            CanonicalUser: !GetAtt CloudFrontOAI.S3CanonicalUserId
          Action: "s3:GetObject"
          Resource: !Join
            - ""
            - - "arn:aws:s3:::"
              - !Ref HostingBucket
              - /*
    Bucket: !Ref HostingBucket

Now it's time to create our CloudFront distribution, which will use the OAI to access files in the S3 bucket and serve them to the Internet. As it's a static website I'm only going to allow the GET, HEAD and OPTIONS methods. I'm also going to configure HTTP to redirect to HTTPS.

CloudFrontDistribution:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      Comment: CDN for Gatsby website
      Enabled: true
      DefaultCacheBehavior:
        AllowedMethods:
          - GET
          - HEAD
          - OPTIONS
        ForwardedValues:
          QueryString: true
        TargetOriginId: HostingBucketOrigin
        ViewerProtocolPolicy: redirect-to-https
      DefaultRootObject: index.html
      Origins:
        - DomainName: !GetAtt HostingBucket.DomainName
          Id: HostingBucketOrigin
          S3OriginConfig:
            OriginAccessIdentity:
              Fn::Join:
                - "/"
                - - origin-access-identity/cloudfront
                  - !Ref CloudFrontOAI

Before deploying this you will want to add another entry to the Outputs section of your CloudFormation template that will output the URL for the CloudFront distribution.

CloudFrontDistributionURL:
  Value: !Join
    - ""
    - - "https://"
      - !GetAtt
        - CloudFrontDistribution
        - DomainName
  Description: CloudFront URL for the website

With all of that done let's do our first deployment. Creating a CloudFront distribution can be slow so it will probably take a few minutes to deploy.

Once the deployment is done you can run npm run build in your Gatsby project and copy the files from the public folder (where Gatsby writes its build output) into your S3 bucket. The easiest way to do this is with the AWS CLI. Remember to replace BUCKET_NAME, PROFILE_NAME and REGION_NAME with values appropriate for you.

aws s3 sync public s3://BUCKET_NAME --profile PROFILE_NAME --region REGION_NAME

If you use your web browser to open the CloudFront distribution URL from the template outputs you should see your website.

The website needs to be hosted using my domain name so I need to create a certificate using AWS Certificate Manager. Note that a certificate used by CloudFront must be created in the us-east-1 region.

CloudFrontCertificate:
  Type: AWS::CertificateManager::Certificate
  Properties:
    DomainName: richdevelops.dev
    DomainValidationOptions:
      - DomainName: richdevelops.dev
        HostedZoneId: MY_HOSTED_ZONE_ID
      - DomainName: www.richdevelops.dev
        HostedZoneId: MY_HOSTED_ZONE_ID
    SubjectAlternativeNames:
      - richdevelops.dev
      - www.richdevelops.dev
    ValidationMethod: DNS

I then need to add the aliases and certificate to the CloudFront distribution. Only the new properties are shown here; they are merged into the existing DistributionConfig.

CloudFrontDistribution:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      Aliases:
        - richdevelops.dev
        - www.richdevelops.dev
      ViewerCertificate:
        AcmCertificateArn: !Ref CloudFrontCertificate
        SslSupportMethod: sni-only

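After deploying the change we need to set up DNS records pointing to the CloudFront distribution: a CNAME for www.richdevelops.dev and, because a CNAME can't be used at the zone apex, an alias record (in Route 53) for richdevelops.dev.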

Once all of that is done you should be able to access the website using your own domain name.

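While this is a good start we can do a lot to clean it up. You may notice that requests for missing pages return an ugly error message. To fix that I added custom error responses to my CloudFront distribution for 403 and 404 errors. The 403 is needed because S3 returns Access Denied rather than Not Found for a missing object when the caller doesn't have permission to list the bucket.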

CloudFrontDistribution:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      CustomErrorResponses:
        - ErrorCode: 403
          ResponseCode: 404
          ResponsePagePath: /404/index.html
        - ErrorCode: 404
          ResponseCode: 404
          ResponsePagePath: /404/index.html

After redeploying the change, missing pages will show the site's 404 page.

My next two problems are SEO related. Currently the site responds to both the host name richdevelops.dev and the www prefixed version www.richdevelops.dev. This is bad for SEO so to help search engines like Google understand which one is canonical I added a CloudFront Function which redirects from the www. prefix to the naked domain.

RichDevelopsRequestFunction:
  Type: AWS::CloudFront::Function
  Properties:
    AutoPublish: true
    FunctionCode: |
      function handler(event) {
        var request = event.request;
        var host = request.headers.host.value;
        var uri = request.uri;

        // Requests for the canonical host pass through unchanged.
        if (host === 'richdevelops.dev') {
          return request;
        }
        // Any other host is permanently redirected to the canonical host.
        var response = {
          statusCode: 301,
          statusDescription: 'Moved Permanently',
          headers: {
            'cloudfront-functions': { value: 'generated-by-CloudFront-Functions' },
            'location': { value: 'https://richdevelops.dev' + uri }
          }
        };
        return response;
      }
    FunctionConfig:
      Comment: Redirect non-canonical domains and add /index.html for richdevelops.dev
      Runtime: cloudfront-js-1.0
    Name: rich-develops-viewer-request

Enabling that CloudFront Function requires a small change to the configuration for our CloudFront distribution.

CloudFrontDistribution:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      DefaultCacheBehavior:
        FunctionAssociations:
          - EventType: viewer-request
            FunctionARN: !GetAtt RichDevelopsRequestFunction.FunctionARN
      Origins:
        - DomainName: !GetAtt HostingBucket.DomainName
          Id: HostingBucketOrigin
          S3OriginConfig:
            OriginAccessIdentity:
              Fn::Join:
                - "/"
                - - origin-access-identity/cloudfront
                  - !Ref CloudFrontOAI

With that change deployed any attempt to access a page on www.richdevelops.dev will automatically redirect to the same page on richdevelops.dev.
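
One way to sanity-check the redirect before deploying is to run the handler locally in Node.js. The sketch below is not part of the template. It assumes richdevelops.dev is the canonical host, as the text describes, and makeEvent is a hypothetical helper that builds only the fields of the viewer-request event that the handler reads.

```javascript
// Local copy of the viewer-request handler (assumption: richdevelops.dev is canonical).
function handler(event) {
  var request = event.request;
  var host = request.headers.host.value;
  var uri = request.uri;

  if (host === 'richdevelops.dev') {
    return request;
  }
  return {
    statusCode: 301,
    statusDescription: 'Moved Permanently',
    headers: {
      location: { value: 'https://richdevelops.dev' + uri }
    }
  };
}

// Hypothetical helper: a minimal stand-in for a CloudFront Functions event.
function makeEvent(host, uri) {
  return { request: { method: 'GET', uri: uri, headers: { host: { value: host } } } };
}

// A www request should be redirected to the same path on the naked domain.
var redirected = handler(makeEvent('www.richdevelops.dev', '/blog/'));
console.log(redirected.statusCode, redirected.headers.location.value);
// prints: 301 https://richdevelops.dev/blog/

// A request for the canonical host should come back untouched.
var passedThrough = handler(makeEvent('richdevelops.dev', '/blog/'));
console.log(passedThrough.uri);
// prints: /blog/
```

Once deployed, the same behaviour can be confirmed by requesting a www URL and checking for a 301 response with the expected Location header.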

Finally you may notice a problem with the trailing slash / in URLs. CloudFront's DefaultRootObject only applies to the root URL, so a request such as /blog/ is not mapped to /blog/index.html. To clean that up I modified the CloudFront Function so that if the URI ends with a / it appends index.html, and if it doesn't contain a . it appends /index.html.

RichDevelopsRequestFunction:
  Type: AWS::CloudFront::Function
  Properties:
    AutoPublish: true
    FunctionCode: |
      function handler(event) {
        var request = event.request;
        var host = request.headers.host.value;
        var uri = request.uri;

        // Requests for the canonical host are served, after mapping
        // directory-style URIs to their index.html object.
        if (host === 'richdevelops.dev') {
          // Check whether the URI is missing a file name.
          if (uri.endsWith('/')) {
            request.uri += 'index.html';
          }
          // Check whether the URI is missing a file extension.
          else if (!uri.includes('.')) {
            request.uri += '/index.html';
          }

          return request;
        }
        // Any other host is permanently redirected to the canonical host.
        var response = {
          statusCode: 301,
          statusDescription: 'Moved Permanently',
          headers: {
            'cloudfront-functions': { value: 'generated-by-CloudFront-Functions' },
            'location': { value: 'https://richdevelops.dev' + uri }
          }
        };
        return response;
      }
    FunctionConfig:
      Comment: Redirect non-canonical domains and add /index.html for richdevelops.dev
      Runtime: cloudfront-js-1.0
    Name: rich-develops-viewer-request
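
The rewrite rule is easy to check in isolation with Node.js. In this sketch rewriteUri is a hypothetical helper that mirrors the two conditions in the function, and the sample paths are made up for illustration.

```javascript
// Hypothetical helper mirroring the URI rewrite applied to canonical-host requests.
function rewriteUri(uri) {
  // A trailing slash means the file name is missing.
  if (uri.endsWith('/')) {
    return uri + 'index.html';
  }
  // No dot anywhere means there is no file extension, so treat it as a directory.
  if (!uri.includes('.')) {
    return uri + '/index.html';
  }
  // Anything else is assumed to already point at a file.
  return uri;
}

var samples = ['/', '/blog/', '/blog', '/styles.css', '/v1.2/docs'];
var rewritten = samples.map(rewriteUri);

samples.forEach(function (uri, i) {
  console.log(uri + ' -> ' + rewritten[i]);
});
// prints:
// / -> /index.html
// /blog/ -> /blog/index.html
// /blog -> /blog/index.html
// /styles.css -> /styles.css
// /v1.2/docs -> /v1.2/docs
```

Note the last case: a path such as /v1.2/docs contains a dot, so it is left untouched even though it has no file extension. That is fine for a typical Gatsby site but worth knowing if your page paths can contain dots.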

With all of that done we now have a Gatsby site hosted on S3.