Wednesday, November 18, 2020

Antora Deploy to S3 and CloudFront

Even though there aren't any dedicated Antora components for deploying to AWS CloudFront or S3, it's still really easy to do — most of the Antora settings you'd use for a generic web hosting site work perfectly for S3 + CloudFront. Here's how:

  1. Playbook Settings
  2. S3 Settings
  3. CloudFront Settings
  4. Upload Script
  5. Redirects Script

Playbook Settings

Here's an example playbook file that you'd use to build your documentation as part of your production deploy process:

    # antora-playbook.yml
    site:
      robots: allow
      start_page: example-user-guide::index.adoc
      title: Example Documentation
      url: https://docs.example.com
    content:
      sources:
      - url: https://git.example.com/my-account/my-docs.git
        branches: master
        start_path: content/*
    output:
      clean: true
    runtime:
      fetch: true
    ui:
      bundle:
        url: https://ci.example.com/my-account/my-docs-ui/builds/latest/ui-bundle.zip
        snapshot: true
    urls:
      html_extension_style: indexify

If you run Antora in the same directory as this playbook, with a command like the following, Antora will generate your site to the build/site sub-directory:

antora generate antora-playbook.yml

These are the key playbook settings for S3/CloudFront (including some settings omitted from the above playbook, because the default value is perfect already):

site.robots: Set this to allow (or disallow if you want to forbid search-engines from crawling your docs), so that Antora will generate a robots.txt file for you.

site.url: Make sure you set this to an absolute URL; doing so triggers Antora to generate several useful files, like 404.html and sitemap.xml. If your documentation has its own dedicated domain name, like docs.example.com, set site.url to https://docs.example.com; if instead your documentation lives in a sub-directory of your main website, like the docs directory of www.example.com, set site.url to https://www.example.com/docs. In either case, omit the trailing slash (eg don't set it to https://www.example.com/docs/; do set it to https://www.example.com/docs).

output.dir: By default, Antora will generate your site to the build/site sub-directory of whatever directory you ran the antora command from. If this is good for you, you can omit the output.dir setting; otherwise you can set output.dir to some other local filesystem path.

urls.html_extension_style: Set this to indexify, which directs Antora to a) build out each documentation page to an index.html file in a sub-directory named for the path of the page, and b) build links to each page via the path to the page with a trailing slash. For example, for a page named how-it-works.adoc in the ROOT module of the example-user-guide component, with indexify Antora will build the page out as a file named example-user-guide/how-it-works/index.html (within its build/site output directory), and build links to the page as /example-user-guide/how-it-works/. This is exactly what you want when your site is served by S3.

urls.redirect_facility: The default setting, static, is what you want for S3, so you can omit this setting from your playbook (or set it explicitly to static if you like).
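To make the indexify mapping concrete, here is a minimal sketch of how a page name turns into an output file and a public URL. It only mimics the behavior described above for a non-index page in the ROOT module of an unversioned component; it is not Antora's own code, and the function names indexify_file and indexify_url are made up for this illustration.

```shell
#!/bin/sh
# Illustrative only: mimics the indexify mapping for a non-index page in the
# ROOT module of an unversioned component. Not Antora's own code.

# indexify_file COMPONENT PAGE -> output file, relative to the output dir
indexify_file() {
    page=${2%.adoc}                      # how-it-works.adoc -> how-it-works
    printf '%s/%s/index.html\n' "$1" "$page"
}

# indexify_url SITE_URL COMPONENT PAGE -> absolute URL of the page
indexify_url() {
    site_url=${1%/}                      # tolerate a stray trailing slash
    page=${3%.adoc}
    printf '%s/%s/%s/\n' "$site_url" "$2" "$page"
}

indexify_file example-user-guide how-it-works.adoc
# prints: example-user-guide/how-it-works/index.html
indexify_url https://docs.example.com example-user-guide how-it-works.adoc
# prints: https://docs.example.com/example-user-guide/how-it-works/
```

The trailing slash on the URL is what lets S3 website hosting resolve the request to the index.html file in that directory.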

S3 Settings

When hosting Antora-generated sites on S3, you don't need to do anything different than you would for any other statically-generated website, so you can follow any of the dozens of online guides for S3 website hosting, like Amazon's own S3 static website hosting guide. The key things you need to set are:

  1. Turn on static website hosting for the S3 bucket.
  2. Set the "index document" to index.html (the default for S3 website hosting).
  3. Set the "error document" to 404.html (Antora generates this file for you).
  4. Either configure the permissions of the S3 bucket to explicitly allow public access to read all objects in the bucket; or when you upload files to the bucket, explicitly upload them with a canned ACL setting that allows public read access (as the scripts covered later in this article will).

You need to make the files in your S3 bucket publicly-accessible (point #4 above) so that CloudFront can access them. While there technically is a way to configure S3 and CloudFront so that the files are not publicly-accessible in S3 but CloudFront can still access them (via an Origin Access Identity), it's kind of a pain. Since these files are ultimately meant to be served to the public through CloudFront anyway, it's simpler just to make them publicly-accessible in S3.
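If you prefer the command line to the AWS console, points 1 to 3 of the list above can be scripted with the AWS CLI's aws s3 website command. The following is a minimal sketch, not a complete bucket setup: it does not cover the permissions in point 4, and the run helper, the DRY_RUN variable, and the function name are assumptions added here so the command can be previewed without touching AWS.

```shell
#!/bin/sh
# Sketch: enable static website hosting on a bucket (points 1-3 above).
# DRY_RUN, run, and enable_website_hosting are illustrative names, not part
# of the AWS CLI.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Turn on website hosting with the index and error documents Antora produces.
enable_website_hosting() {
    run aws s3 website "s3://$1/" \
        --index-document index.html \
        --error-document 404.html
}

# Preview the command for the example bucket without running it.
DRY_RUN=1
enable_website_hosting example-bucket
```

Set DRY_RUN=0 (or leave it unset) to run the command for real, with AWS credentials that are allowed to change the bucket's website configuration.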

CloudFront Settings

There's also nothing special you need to do for Antora-generated sites with CloudFront — any of the dozens of online guides for S3 + CloudFront hosting will work to set it up. Just make sure that when you set the origin for your CloudFront distribution, you use the "website endpoint" of your S3 bucket, and not the standard endpoint.

For example, if your S3 bucket is named "example-bucket" and it's located in the us-west-2 region, don't use example-bucket.s3.us-west-2.amazonaws.com as your CloudFront origin — instead do use example-bucket.s3-website-us-west-2.amazonaws.com. Using the website endpoint will ensure that CloudFront serves the Antora-generated 404.html page for pages that don't exist, and that it also serves a 301 redirect for pages for which you've configured S3 to redirect (as the scripts covered later in this article will).
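The two hostnames are easy to mix up, so a small sketch that builds both from a bucket name and region may help. It assumes the dash-style website endpoint (s3-website-REGION) shown above; some AWS regions use a dot (s3-website.REGION) instead, so check the endpoint that the S3 console reports for your bucket. The function names are hypothetical.

```shell
#!/bin/sh
# Build the two S3 hostnames discussed above for a bucket and region.
# Assumes the dash-style website endpoint used in this article; some regions
# use s3-website.REGION instead.

s3_rest_endpoint()    { printf '%s.s3.%s.amazonaws.com\n' "$1" "$2"; }
s3_website_endpoint() { printf '%s.s3-website-%s.amazonaws.com\n' "$1" "$2"; }

echo "do not use: $(s3_rest_endpoint example-bucket us-west-2)"
echo "use:        $(s3_website_endpoint example-bucket us-west-2)"
```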

Upload Script

Once you've set up your Antora playbook, S3 bucket, and CloudFront distribution, you're ready to deploy your site. If you've set up your antora-playbook.yml as above, you can build your documentation, upload it to S3, and clear the CloudFront caches of the old version of your docs with the following simple script:

    #!/bin/sh -e
    build_dir=build/site
    cf_distro=E1234567890ABC
    s3_bucket=example-bucket

    antora generate antora-playbook.yml
    aws s3 sync $build_dir s3://$s3_bucket --acl public-read --delete
    aws cloudfront create-invalidation --distribution-id $cf_distro --paths '/*'

The antora command generates your documentation to the build/site directory. The aws s3 sync command replaces the existing content of example-bucket with the content of the build/site directory (granting public read-access to each individual file uploaded). The aws cloudfront create-invalidation command clears the CloudFront caches for all the content of your CloudFront distribution.
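Because the sync runs with --delete, it can be worth confirming that the build produced the files S3 and search engines depend on before anything is uploaded. Here's a minimal sketch of such a check for the three files mentioned in the playbook settings (404.html, sitemap.xml, and robots.txt); check_site_files is a hypothetical helper, not part of Antora or the AWS CLI.

```shell
#!/bin/sh
# Sketch of a pre-upload check. check_site_files is a hypothetical helper.

# Return 0 if every expected file exists in the given build dir; otherwise
# list the missing files on stderr and return 1.
check_site_files() {
    dir=$1
    missing=0
    for f in 404.html sitemap.xml robots.txt; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return $missing
}
```

To use it, call the function right after antora generate, for example with check_site_files "$build_dir" || exit 1, so the script stops before the sync if site.url or site.robots was left out of the playbook.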

If your documentation is part of a larger site (eg hosted as https://www.example.com/docs/) instead of being hosted as its own site (eg https://docs.example.com/), add the sub-directory under which your documentation is hosted (eg /docs) to the last two lines of the above script; for example, like the following:

    aws s3 sync $build_dir s3://$s3_bucket/docs --acl public-read --delete
    aws cloudfront create-invalidation --distribution-id $cf_distro --paths '/docs/*'
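If you deploy the same script to both kinds of site, one option is to derive the sync destination and the invalidation path from a single optional sub-directory setting. The following is a sketch under that assumption; s3_dest and invalidation_path are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: derive the sync destination and invalidation path from an optional
# sub-directory. Both helpers are hypothetical names.

# s3_dest BUCKET [PREFIX] -> s3://BUCKET or s3://BUCKET/PREFIX
s3_dest() {
    prefix=${2#/}; prefix=${prefix%/}    # drop one leading/trailing slash
    if [ -n "$prefix" ]; then
        printf 's3://%s/%s\n' "$1" "$prefix"
    else
        printf 's3://%s\n' "$1"
    fi
}

# invalidation_path [PREFIX] -> /* or /PREFIX/*
invalidation_path() {
    prefix=${1#/}; prefix=${prefix%/}
    if [ -n "$prefix" ]; then
        printf '/%s/*\n' "$prefix"
    else
        printf '/*\n'
    fi
}

s3_dest example-bucket /docs     # prints: s3://example-bucket/docs
invalidation_path /docs          # prints: /docs/*
```

In the upload script, the results would be passed to aws s3 sync and to the --paths option of aws cloudfront create-invalidation; keep the invalidation path quoted so the shell doesn't expand the asterisk.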

Redirects Script

The redirect pages that Antora will generate when you set the Antora urls.redirect_facility setting to static will work fine for your website users as is. But search engines will like it better if you serve real HTTP redirect responses (with the redirect information embedded in HTTP header fields) instead of just HTML pages that indicate that the client browser should redirect to a different location once parsed. You can get S3 + CloudFront to serve 301 Moved Permanently redirects in place of all the redirect pages Antora generates by uploading them separately to S3 with a special x-amz-website-redirect-location header.

To do so, insert a grep/while loop into your upload script between the aws s3 sync and aws cloudfront create-invalidation commands, as in the following complete script:

    #!/bin/sh -e
    build_dir=build/site
    cf_distro=E1234567890ABC
    s3_bucket=example-bucket

    antora generate antora-playbook.yml
    aws s3 sync $build_dir s3://$s3_bucket --acl public-read --delete

    # re-upload each static redirect page as a real S3 website redirect
    grep -lR 'http-equiv="refresh"' $build_dir | while read -r file; do
        redirect_url=$(awk -F'"' '/rel="canonical"/ { print $4 }' "$file")
        aws s3 cp "$file" "s3://$s3_bucket/${file##$build_dir/}" \
            --website-redirect "$redirect_url" --acl public-read
    done

    aws cloudfront create-invalidation --distribution-id $cf_distro --paths '/*'

The above script block will search the Antora build dir for all redirect pages (with the grep command), and loop over each (with the while command, reading the local filepath to each into the file variable). It will pull out the canonical URL of the page to redirect to from the redirect page (via the awk command, into the redirect_url variable), and re-upload the file using the --website-redirect flag of the aws s3 cp command to indicate that S3 should serve a 301 redirect to the specified URL instead of the file content itself (when accessed through the S3 website endpoint).
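To try out the grep and awk steps without uploading anything, you can run them against a throwaway directory. The sketch below builds one sample redirect page in a temporary directory and prints the S3 key and redirect target the loop would use; the sample page, the guide/old-page paths, and the helper names redirect_target and s3_key are made up for illustration.

```shell
#!/bin/sh
# Sketch: the per-file steps of the loop, without the upload. The sample page
# and the helper names are made up for illustration.

# Print the canonical URL from a redirect page: the 4th double-quote-delimited
# field of the <link rel="canonical" ...> line.
redirect_target() {
    awk -F'"' '/rel="canonical"/ { print $4; exit }' "$1"
}

# s3_key BUILD_DIR FILE -> object key (FILE with the build dir prefix removed)
s3_key() {
    printf '%s\n' "${2#"$1"/}"
}

build_dir=$(mktemp -d)
mkdir -p "$build_dir/guide/old-page"
cat > "$build_dir/guide/old-page/index.html" <<'EOF'
<!DOCTYPE html>
<link rel="canonical" href="https://docs.example.com/guide/new-page/">
<meta http-equiv="refresh" content="0; url=../new-page/">
EOF

grep -lR 'http-equiv="refresh"' "$build_dir" | while read -r file; do
    echo "$(s3_key "$build_dir" "$file") -> $(redirect_target "$file")"
done
# prints: guide/old-page/index.html -> https://docs.example.com/guide/new-page/

rm -rf "$build_dir"
```

Note that the awk field split relies on the canonical link sitting on its own line with the href as the second quoted attribute, which is how the redirect pages shown in this article are laid out.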

As a concrete example of this redirect capability, say you had a page named how-it-works.adoc in the ROOT module of your example-user-guide component. If you added metadata to that how-it-works.adoc page to add a redirect to it from the nonexistent inner-workings.adoc page (eg via a page-aliases header attribute value of inner-workings.adoc), Antora would generate the following redirect page for you at build/site/example-user-guide/inner-workings/index.html:

    <!DOCTYPE html>
    <meta charset="utf-8">
    <link rel="canonical" href="https://docs.example.com/example-user-guide/how-it-works/">
    <script>location="../how-it-works/"</script>
    <meta http-equiv="refresh" content="0; url=../how-it-works/">
    <meta name="robots" content="noindex">
    <title>Redirect Notice</title>
    <h1>Redirect Notice</h1>
    <p>The page you requested has been relocated to <a href="../how-it-works/">https://docs.example.com/example-user-guide/how-it-works/</a>.</p>

The above script would re-upload this file to S3 like so (with all variables expanded, and some additional line-wrapping for legibility):

    aws s3 cp build/site/example-user-guide/inner-workings/index.html \
        s3://example-bucket/example-user-guide/inner-workings/index.html \
        --website-redirect https://docs.example.com/example-user-guide/how-it-works/ \
        --acl public-read

If a user (or search engine) then navigates to https://docs.example.com/example-user-guide/inner-workings/, S3 + CloudFront will send this response back:

    HTTP/2 301
    location: https://docs.example.com/example-user-guide/how-it-works/
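One way to confirm a redirect after deploying is to request just the headers with curl -sI and look at the status and location fields. The following sketch does the looking part: it reads response headers on stdin and prints the status code and redirect target. redirect_summary is a hypothetical helper name, and the example feeds it a canned response matching the one above rather than making a real request.

```shell
#!/bin/sh
# Sketch: summarize the status and Location header of an HTTP response read
# from stdin, e.g. the output of `curl -sI URL`. redirect_summary is a
# hypothetical helper name.
redirect_summary() {
    tr -d '\r' | awk '
        NR == 1 { status = $2 }                        # e.g. "HTTP/2 301"
        tolower($1) == "location:" { location = $2 }   # header names vary in case
        END { print status, location }
    '
}

# Canned response matching the example above; no network needed.
printf 'HTTP/2 301\r\nlocation: https://docs.example.com/example-user-guide/how-it-works/\r\n\r\n' |
    redirect_summary
# prints: 301 https://docs.example.com/example-user-guide/how-it-works/
```

Against a live site, you would pipe curl into it instead: curl -sI https://docs.example.com/example-user-guide/inner-workings/ | redirect_summary.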