NOTE: I no longer recommend the shell scripting approach described in this post. I recommend using Ansible to automate CloudFormation stacks and other deployment steps.


The SlashDeploy blog was previously deployed from my local machine. I
figured it was time to change that. This post walks through the
continuous delivery process step by step. You may be thinking: the
blog is just some static content, right? True, and that makes it a
perfect test bed for applying continuous delivery to a small (but
important) piece of software.

Achieving continuous delivery is no small task. It requires careful
engineering choices and sticking to a few key principles. First and
foremost: continuous integration. Every change must run through an
automated test suite, and the test suite should verify that a particular
change is production ready. It is impossible to have continuous delivery
without continuous integration. Production bugs must be patched with
regression tests to ensure they are not repeated. Second:
infrastructure as code. Software requires infrastructure to run
(especially web applications/sites/services). The infrastructure may
be physical or virtual. Regardless of what the infrastructure is, it
must also be managed with code. Consider a change that requires a
change to the web server: there must be code to make that change, and
that code (by the first principle) must also have continuous
integration. Finally, there must be verification criteria for each
change. This signals whether a deployment completed successfully or not.
Automation is the common ground. These principles must be applied and
baked into the system from the ground up.

10,000 Feet

This blog is a statically generated site. Any web server can serve the
generated artifacts, so it requires very little from the
infrastructure. A CDN should be used as well to ensure a snappy reading
experience. Finally, the web server needs a human readable domain name.
OK, so how do we make that happen? Use jekyll to generate the site. Use
CloudFormation to create an S3 website behind a CloudFront CDN
with an appropriate Route53 DNS entry. GitLab CI runs
everything. Right, those are the tools, but what does the whole
pipeline look like?

  1. Generate the release artifact
  2. Run tests against the release artifact
  3. Run tests against the CloudFormation code
  4. On master?
  5. Deploy the CloudFormation stack
  6. Test CloudFormation deploy went as expected
  7. Copy release artifact to S3 bucket
  8. Test release artifact available through public DNS
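
As a rough sketch, the pipeline is a handful of commands run in order.
These are the make targets and scripts described in the rest of this
post; the deploy script only runs on master:

    make check            # smoke test the build host dependencies
    make clean            # start from a clean slate
    make dist             # generate the release artifact
    make test-shellcheck  # lint the shell programs
    make test-dist        # test the release artifact
    make test-blog        # validate the CloudFormation template
    script/ci/deploy      # deploy the stack, publish to S3, and verify (master only)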

This whole process can be coordinated with a few Bash programs and
make targets. Time to dive deeper into each phase.

The Build Phase

make builds all the content and dependencies (e.g. jekyll plus
all other ruby gems) into a docker image¹. The image can
be used to start a dev docker container or to generate the release
artifacts. Next, the make dist target generates the release artifact.
docker cp is used instead of the -v "${PWD}:/data" method².
The release artifacts are kept in dist/ for testing.
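
Roughly, the dist target boils down to the following shell commands.
The image and container names here are placeholders; only the
docker cp approach itself comes from the post (see footnote 2):

    # build an image containing jekyll, the gems, and the site source
    docker build -t blog-build .

    # generate the site inside a container, then copy the artifact out
    docker run --name blog-dist blog-build jekyll build --destination /data/dist
    rm -rf dist
    docker cp blog-dist:/data/dist dist
    docker rm blog-dist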

The release artifact (a directory of files in this case) is run
through the following tests:

  1. The root index.html exists
  2. The defined error.html exists
  3. The sentinel file exists
  4. robots.txt blocks access to error.html
  5. robots.txt blocks access to the sentinel file
  6. Each HTML file has a tracking snippet

You may be wondering about the sentinel file. The sentinel file
uniquely identifies each release artifact. The file name includes the
git commit that built it, and it lives at _sentinels/GIT_COMMIT.txt. Its
sole purpose is to indicate that a release artifact is available via the
CDN. The sentinel file name should be unique to bust caches. If it
were not (say, a simple sentinel.txt with unique content), it would be
subject to any cache rules the CDN may apply (such as how long content
can live in edge nodes before rechecking the origin). This would wreak
havoc on deploy verification.
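
Generating such a file is a one-liner at build time. This is a
hypothetical sketch; the post only specifies the
_sentinels/GIT_COMMIT.txt naming convention:

    # run during the build; dist/ is the release artifact directory
    GIT_COMMIT="$(git rev-parse HEAD)"
    mkdir -p dist/_sentinels
    printf '%s\n' "$GIT_COMMIT" > "dist/_sentinels/${GIT_COMMIT}.txt"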

Each test focuses on production behavior. The first two assert that the
release artifact will function properly behind the CloudFront CDN.
The sentinel tests assert this build stage meets the next stage’s
requirements. The robots.txt tests assert the proper things are kept
out of search engines. Finally, tracking (page views, browsers,
etc.) is important, so it must be included.
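
A sketch of what those checks might look like as shell assertions.
This is not the post’s actual test code; the robots.txt rules and the
tracking pattern are assumptions:

    #!/usr/bin/env bash
    # Sketch of the release artifact tests; paths and patterns are assumptions.
    set -euo pipefail

    test -f dist/index.html                              # 1. root index.html exists
    test -f dist/error.html                              # 2. error page exists
    test -f "dist/_sentinels/$(git rev-parse HEAD).txt"  # 3. sentinel file exists
    grep -q 'Disallow: /error.html' dist/robots.txt      # 4. error page blocked from crawlers
    grep -q 'Disallow: /_sentinels/' dist/robots.txt     # 5. sentinel files blocked from crawlers

    # 6. every HTML file carries the tracking snippet
    # ('google-analytics' stands in for the real snippet)
    find dist -name '*.html' | while read -r f; do
        grep -q 'google-analytics' "$f" || exit 1
    done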

Infrastructure

I have touched on the infrastructure a bit. The infrastructure is an
S3 bucket behind a CloudFront CDN with a Route53 DNS entry.
CloudFormation manages the whole bit. The bin/blog script
coordinates the AWS calls. The deploy command is the heart. It
either creates a non-existent stack or updates an existing one. There
are also utility commands to get the stack status and outputs, which
matters for testing. The validate command validates the
CloudFormation template through an API call. This eliminates errors
such as invalid resource types, missing keys, syntax errors, and
other things a compiler might point out. Unfortunately this does not
assert a template will work. Deploying it is the only way to know
for sure. This is a key limitation with CloudFormation³. However it is
enough for this project. Finally, the publish command copies files
into the appropriate S3 bucket.
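
A rough sketch of the AWS CLI calls behind those commands. The stack
name, template path, and bucket name here are placeholders; only the
purpose of each command comes from the post:

    # validate - semantic check only; see footnote 3
    aws cloudformation validate-template --template-body "file://blog.template"

    # status - the current stack status, used later for polling
    aws cloudformation describe-stacks --stack-name blog \
        --query 'Stacks[0].StackStatus' --output text

    # deploy - create the stack if it does not exist, otherwise update it
    if aws cloudformation describe-stacks --stack-name blog > /dev/null 2>&1; then
        aws cloudformation update-stack --stack-name blog --template-body "file://blog.template"
    else
        aws cloudformation create-stack --stack-name blog --template-body "file://blog.template"
    fi

    # publish - copy the release artifact into the bucket ("delete removed files")
    aws s3 sync dist/ s3://blog-bucket --delete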

The Bash code itself passes through shellcheck to eliminate stupid
mistakes and to enforce coding style. This is desperately needed to
write sane Bash programs.
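
The lint step itself is essentially a one-liner; the exact file layout
is an assumption here:

    # lint bin/blog plus the pipeline scripts
    shellcheck bin/blog script/ci/*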

Deploying

Deploying has two discrete steps, each with verification criteria. It
shakes out like so:

  1. make dist to generate the release artifact
  2. bin/blog deploy to deploy infrastructure changes
  3. Poll the bin/blog status until the state is green
  4. bin/blog publish to copy the release artifacts into S3
  5. Poll the public DNS until the sentinel file is available.

There is a single script (script/ci/deploy) to get the job done. The
coolest bit is a simple Bash function that will execute a command up
to N times at a T second interval. This is a simple timeout-style
function, used to handle the asynchronicity of each step. The deploy
script can vary the interval depending on how long a change should
take. This is more important for CloudFormation changes since some
components update much more slowly than others; Route53 compared to
CloudFront is one example.
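
A minimal sketch of such a function, assuming the same shape as the
one in script/ci/deploy (the name and argument order here are
illustrative):

    # poll ATTEMPTS INTERVAL COMMAND...
    # Run COMMAND up to ATTEMPTS times, sleeping INTERVAL seconds between
    # tries. Succeeds as soon as COMMAND does; fails if it never does.
    poll() {
        local attempts="$1" interval="$2"
        shift 2

        local i
        for (( i = 0; i < attempts; i++ )); do
            if "$@"; then
                return 0
            fi
            sleep "$interval"
        done

        return 1
    }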

The Complete Pipeline

  1. Setup
     • make check
     • make clean
  2. Build
     • make dist
  3. Test
     • make test-shellcheck - Lint all executable shell programs in
       the code base (bin/blog + pipeline scripts)
     • make test-dist - Run the release artifact tests mentioned earlier
     • make test-blog - Validate the CloudFormation template
  4. Deploy
     • Poll for UPDATE_COMPLETE or CREATE_COMPLETE stack status. This
       ensures the stack is ready to receive a potential update.
     • bin/blog deploy - Deploy infrastructure changes
     • Poll for UPDATE_COMPLETE or CREATE_COMPLETE stack status
     • bin/blog publish - Upload the release artifact to S3
     • Poll with curl for the sentinel file on the bin/blog url
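
Put together, the Deploy stage reduces to something like the sketch
below, built on the poll function shown earlier. The helper names,
counts, and intervals are illustrative, not lifted from
script/ci/deploy:

    stack_ready() {
        bin/blog status | grep -Eq '(UPDATE|CREATE)_COMPLETE$'
    }

    sentinel_available() {
        curl -fsS "$(bin/blog url)/_sentinels/$(git rev-parse HEAD).txt" > /dev/null
    }

    poll 30 10 stack_ready          # wait until the stack can accept an update
    bin/blog deploy                 # apply infrastructure changes
    poll 60 30 stack_ready          # CloudFormation/CloudFront changes can be slow
    bin/blog publish                # copy the release artifact into S3
    poll 30 10 sentinel_available   # verify the new release is publicly served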

Closing Thoughts

The entire pipeline turned out well. This was a great exercise in
setting up continuous delivery for a simple system. The practices
applied here can be applied to larger systems. Here are some other
take-aways:

  • GitLab CI is awesome. I have been using Buildkite at work for
     some time. GitLab CI attracted me with its agent-based approach.
     This enables me to keep my runners under configuration management
     and deployed with proper AWS InstanceProfiles. GitLab with
     integrated CI support is immediately better than GitHub. All in all
     I’m very happy with GitLab and its CI offering. I recommend you
     check it out as well.
  • CloudFormation testing. It would be nice if a set of changes could
     be applied in a “dry run” mode. This would increase confidence in
     each change.
  • Splitting the deploy script. I am uncertain if I would split the
     deploy script into two parts: one for bin/blog deploy and
     verification, the other for bin/blog publish and verification. I
     did not do this because I did not want to move the shared
     poll/timeout function into a separate script. The script in its
     current form is about as long as I want it.
  • Regenerating the release artifact in the deploy phase. Generally
     this is bad practice. The test phase runs against a particular
     artifact, and that artifact is what should be deployed. This project
     is simple enough that this is not a problem. The build phase should
     upload the artifact somewhere (GitLab does seem to support
     artifacts), then the next steps should pull it down for whatever
     needs to happen. I also skipped this because I like to keep the
     scripts executable on the machine itself. This way, if the CI system
     is down or for other reasons, the process can still complete.
  • make check. This is a life saver. make check is a smoke
     test of the system running the pipeline. It does not need to be
     exacting; simply testing for the availability of dependencies (e.g.
     docker, aws, or jq) is enough, as sketched after this list. This is
     especially helpful when build steps execute on various hosts and/or
     the project relies on things outside of GNU/BSD core utils.
  • Sharing Bash functions. I know the poll function will be reused
     across many projects going forward. It would be nice to solve this
     without copying and pasting between projects. I considered whether
     such functions could be distributed between environments with
     configuration management, but that is much too heavyweight for this
     problem. Larger teams will likely encounter this problem if there
     is a lot of Bash.
  • aws s3 sync with --delete. bin/blog publish uses the sync
     command under the covers. --delete was not added until doing this
     work. This option ensures files not in the source are deleted from
     the destination, a.k.a. “delete removed files.”
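
As promised in the make check bullet above, the smoke test can be as
simple as checking that each required tool is on the PATH. A minimal
sketch, using the dependencies named in this post:

    # make check: fail fast if a required tool is missing on this host
    # (docker, aws, and jq are the tools named in this post)
    for dep in docker aws jq; do
        command -v "$dep" > /dev/null || {
            echo "make check: missing dependency: $dep" >&2
            exit 1
        }
    done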

Finally, enjoy the relevant source files. The files are linked to the
versions introduced in the continuous delivery merge request. They
may have changed since this post was published.


  1. Curious why? Docker works extremely well for development
    and build pipelines. It encapsulates project tool chains excellently.
    This approach especially excels with datastores; I have found these
    vary more widely than tool chains. I try to run as many dependencies
    as possible as docker containers, as long as doing so does not add
    too much complexity. I do not dockerize jq or aws since they are
    easy enough to install on the host and are not project specific.
  2. It is common to see -v "${PWD}:/data" when encapsulating
    tools as docker containers. This is the easiest and most obvious
    solution to get data in/out of the container. It creates a problem
    though, since docker containers run as root. This approach may litter
    the filesystem with root-owned artifacts depending on your docker
    setup (e.g. whether the docker daemon runs directly on your host or
    in a VM). This is solved by running the container as the current
    user (-u $(id -u)). However, file system mounts do not work on remote
    docker hosts. docker-machine on OSX solves this by mounting $HOME as
    a shared directory in the VM so file system mounts (inside $HOME)
    work transparently. docker cp is a sure-fire way to get data out of
    the container regardless of how docker runs. It is more verbose but
    always works. See make dist in the Makefile for an example.
  3. The validate-template API call is only semantic verification.
    CloudFormation cannot validate things that may only come up when
    actually doing the change, such as account limits, incorrectly
    configured permissions, creating things in the wrong regions,
    potential outages in AWS, or unexpected capacity changes.
    The only way to know for sure is to deploy the stack and see how it
    shakes out. Naturally you could have a stack purely for verification
    purposes. I opted out because the template is simple and should not
    change much past the initial revision.