Pelican + GitLab CI/CD + Docker + AWS = Awesome Static Site

How I leverage GitLab CI/CD and Pelican to create a static blog hosted on AWS S3

All the code referenced in this post (and even this post itself) is available on GitLab.

Choosing a static site generator

When setting out to start a blog, there are tons of options: the classic WordPress, the upstart Ghost, or the many static site generators such as Jekyll, Hugo, Octopress (abandoned), and Pelican. After looking at each option, I settled on Pelican because I wanted a static site (one less server to deal with), it's written in Python (one of my preferred languages), and it has an extensive library of themes to use as a base. I decided on the m.css theme because I liked its dark color scheme and lack of JavaScript (shout out to anyone reading this via the Tor Browser), and it has great support for code. I've only had to make a few small tweaks to m.css to make it my own.

Getting started with Pelican is simple: just follow the m.css quickstart.

Pelican has a cool feature that makes tweaking themes or writing content easy: the devserver.

$ cd /dir/of/pelican/blog
$ make devserver
<lots of output>
Pelican and HTTP server processes now running in background.
$

Now Pelican is watching your files for changes and will re-compile articles when you save a change. Keep an eye on the terminal running the devserver, though: if a change causes an error in Pelican, it will show up there and your browser will not see anything new.

Creating content is as easy as writing a reStructuredText document in the content directory. reStructuredText is awesome, and if you've used Markdown before (Pelican also supports Markdown if you prefer), it has the same general feel. The m.css writing content guide is a great primer on reST. The only issue I have with reST is that markup can't be nested, so italicizing a link is not as simple as wrapping it in asterisks. For instance, you would think that last writing content link would be written as

*`writing content <http://mcss.mosra.cz/pelican/writing-content/>`_*

But that is not allowed, so you have to define a directive at the top of the document to allow raw HTML and use it inline later, like so:

.. At the top of the document before any content
.. role:: raw-html(raw)
     :format: html
.. In-line
:raw-html:`<em><a href="http://mcss.mosra.cz/pelican/writing-content/">writing content</a></em>`

Hosting a static site

Just like static site generators, there are several static site hosts to choose from: Google Cloud Storage, GitHub Pages, GitLab Pages, and Amazon's S3.

I chose S3, mostly because I am already familiar with AWS and am using it extensively for another project (hamiltix.net) which will be detailed in a future post. For the first 12 months on AWS you get 5 GB of S3 storage free, as well as 20,000 GET requests and 2,000 PUT requests per month. Combine this with CloudFront (AWS's CDN) and even if Reddit tries to hug your site to death you should have no issues keeping it up. In fact, if you want to use SSL/TLS with your S3 static site on a custom domain (hint: you do), you have to use CloudFront.

Instead of walking through another S3 and CloudFront setup, just follow the same guide I used.
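One piece worth calling out from any such setup: an S3 bucket serving a website has to allow public reads. A minimal sketch of the bucket policy (using this blog's bucket name; swap in your own):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::blog.badsectorlabs.com/*"
    }
  ]
}
```

This grants anonymous GetObject on every key in the bucket and nothing else; listing, writing, and bucket configuration stay private.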

CI/CD - Putting it all together, automatically

This is where the magic happens. On every push to master, your static site should be built, minified, and uploaded, and the CloudFront cache invalidated. This way you can write a post in a feature branch, and when you merge it into master your blog updates without any further action! GitLab is my git host of choice because it can be self-hosted and is very powerful. Additionally, GitLab.com offers unlimited free private repos with unlimited collaborators. But my favorite feature of GitLab is its built-in CI/CD. You no longer need a separate service to test, build, and deploy your code; it's all built right into your version control. Layer Docker on top of this and you get easy, reproducible builds, and all it takes is one YAML file in the root of your repo!
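To get a feel for the format before diving into a full pipeline, here is a hypothetical minimal .gitlab-ci.yml: a single job that runs one command inside a container.

```yaml
# A minimal, hypothetical .gitlab-ci.yml: one job, no explicit stages
check:
  image: alpine:latest  # any Docker image works; the job runs inside it
  script:
    - echo "CI pipeline is alive"
```

Every job needs at least a script block; image, stages, artifacts, and the rest are layered on from there.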

Getting started with GitLab CI/CD can be a little intimidating, and I've found that using other projects' .gitlab-ci.yml files as templates is the best way to start. For instance, here is the .gitlab-ci.yml file for this blog (if you're on mobile, sorry in advance; there is no good way to show code on mobile without wrapping, which kills context):

variables:
  # Set git strategy, recursive in case there are submodules
  GIT_STRATEGY: clone
  GIT_SUBMODULE_STRATEGY: recursive
  # Keys and secrets are defined in the project CI settings and exposed as env variables
  AWS_ACCESS_KEY_ID: $AWS_ACCESS_KEY_ID
  AWS_SECRET_ACCESS_KEY: $AWS_SECRET_ACCESS_KEY
  AWS_DEFAULT_REGION: "us-east-1"

# Define two stages, if the site fails to build it will not be deployed
stages:
  - build
  - deploy

build:
  stage: build
  image: apihackers/pelican  # This image contains everything needed to build a static pelican site
  artifacts:  # artifacts are files that will be passed to the next CI stage and can be downloaded from the GitLab web
              # frontend as zips
    paths:
      - output  # This is the directory we want to save and pass to the next stage
    expire_in: 1 week  # Keep it around for a week in case we need to roll back
  script:  # The script block is the series of commands that will be run in the container defined in `image`
    - pelican content -o output -s publishconf.py  # Build the site using the publish config into the output directory
    - ls -lart output
  only:
    - master  # Only run this step on the master branch. No reason to spend resources on incomplete feature branches


deploy-prod:
  stage: deploy
  image: badsectorlabs/aws-compress-and-deploy  # This is a custom image for minifying and working with AWS
  variables:  # You can set per-stage variables like this
    DESC: "Prod build, commit: $CI_COMMIT_SHA"  # There are tons of built in env variables during the CI process
    S3_BUCKET: blog.badsectorlabs.com
    CLOUDFRONT_DISTRIBUTION_ID: $CLOUDFRONT_DISTRIBUTION  # Again, the secrets are stored in GitLab, not in the code!
  script:
    - cd output # Assumes the static site is in 'output' which is automatically created because the last step had
                # 'output' as an artifact
    - echo "[+] ls before minification"
    - ls -lart .
    - echo "$DESC" > version.html
    - echo "[+] minifying HTML"
    - find . -iname \*.html | xargs -I {} htmlminify -o {} {}
    - echo "[+] minifying CSS"
    - find . -iname \*.css | xargs -I {} uglifycss --output {} {}
    - echo "[+] minifying JS"
    - find . -iname \*.js | xargs -I {} uglifyjs -o {} {}
    - echo "[+] ls after minification"
    - ls -lart .
    - echo "[+] Syncing all files to $S3_BUCKET"
    - aws s3 sync . s3://$S3_BUCKET --region us-east-2
    - echo "[+] Invalidating Cloudfront cache"  # This step is necessary or you won't see the changes until the TTL expires
    - aws cloudfront create-invalidation --distribution-id $CLOUDFRONT_DISTRIBUTION_ID --paths '/*'
  environment:  # environments control what is deployed where; for a simple blog, straight to prod is fine
    name: master-prod
  only:
    - master
  when: manual  # This causes GitLab to wait until you click the run button before executing this stage
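The find | xargs -I {} pattern in the minify steps rewrites each matching file in place. Here is a self-contained sketch of the same pattern that runs anywhere, using tr to squeeze whitespace as a stand-in for htmlminify (the demo directory and file are hypothetical):

```shell
# Create a throwaway HTML file with extra whitespace (hypothetical demo dir)
mkdir -p demo && printf 'Hello   World\n' > demo/index.html
# For each .html file, rewrite it in place; the temp file stands in for
# htmlminify's ability to read and write the same path in one step
find demo -iname '*.html' | xargs -I {} sh -c "tr -s ' ' < {} > {}.tmp && mv {}.tmp {}"
cat demo/index.html  # Hello World
```

The -I {} flag makes xargs substitute each path wherever {} appears, which is what lets the same file be both input and output in the real pipeline above.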