Building lightweight Docker images for static site generators
Static site generators are a popular way to create simple websites and blogs. They convert markdown files, HTML templates, and static assets into a single folder that can easily be hosted on a basic webserver, hosting provider, or remote storage like Amazon S3.
backbeat.tech is built as a static site, which we’ve been hosting on a basic server so far. To simplify things, we’re going to retire that server and host it in our Nomad container clusters along with our other applications and services.
We use the excellent Hugo for our site, but the techniques in this post apply to other tools too.
Project layout before Docker
The directory structure is in a typical Hugo layout:
├── content/           # markdown files of pages
├── img/               # master images before they are optimised
├── public/            # the generated site
├── resources/         # generated assets
├── scss/              # scss files before compilation
├── static/            # static files - where npm places the built assets
├── node_modules/      # installed by npm for asset compilation
├── layouts/           # HTML templates
├── config.toml        # hugo configuration
├── gulpfile.js        # gulp configuration
├── package-lock.json  # used by npm
└── package.json       # node modules required to build the assets
To build the site without Docker, we run these commands:
npm install # install node_modules
npm run build # build assets to static/
hugo # build the site to public/
Site generation is a two-step process. First, gulp (a node package) compiles the scss into optimised css files and compresses and resizes the master images. These files are then placed into the static/ folder. After that, hugo builds the site using the markdown files in content/ and the HTML templates in layouts/. It writes the generated HTML to public/, and copies the optimised assets from static/ into public/.
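For reference, the wiring behind npm run build lives in package.json. An illustrative excerpt (the gulp task name here is an assumption, not our exact configuration):
"scripts": {
  "build": "gulp"
}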
Now let’s convert that to a Docker build.
A naive docker build
To run in a container orchestrator, our Docker image needs to respond to HTTP requests itself. We’ll use the nginx base image for this.
Let’s start by installing hugo in an nginx image and building the site.
Create a Dockerfile:
# base nginx image
FROM nginx:alpine
# an arbitrary directory to build our site in
WORKDIR /build
# copy the project into the container
COPY . .
# download hugo and make it available in PATH
ENV HUGO_VERSION 0.41
ENV HUGO_BINARY hugo_${HUGO_VERSION}_Linux-64bit.tar.gz
RUN set -x && \
    apk add --update wget ca-certificates && \
    wget https://github.com/spf13/hugo/releases/download/v${HUGO_VERSION}/${HUGO_BINARY} && \
    tar xzf ${HUGO_BINARY} && \
    mv hugo /usr/bin
# build the project and copy the result to the nginx folder
RUN /usr/bin/hugo && ls -l
RUN cp -fR /build/public/* /usr/share/nginx/html
Then build and run the container:
docker build -t static-site .
docker run -ti -p 8000:80 static-site
Visit http://localhost:8000 to see the generated site.
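You can also check from the command line; nginx should respond straight away:
curl -I http://localhost:8000
# expect "HTTP/1.1 200 OK" and a "Server: nginx" header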
We’ve built a basic container, but without any assets!
The nginx:alpine image doesn’t include nodejs.
We could install it manually like we did with hugo, or perhaps look for another image that contains both nginx and nodejs together.
There’s another way which brings far more benefits: multi-stage builds.
Multi-stage builds
Docker introduced multi-stage builds in version 17.05, allowing the use of multiple base images in a single Dockerfile.
Let’s use the node:alpine base image to build our site with assets, then copy the generated files into another image that builds from nginx:alpine as before.
# start with the nodejs image, calling it 'build'
FROM node:alpine as build
WORKDIR /build
COPY . .
# download hugo as before
ENV HUGO_VERSION 0.41
ENV HUGO_BINARY hugo_${HUGO_VERSION}_Linux-64bit.tar.gz
RUN set -x && \
    apk add --update wget ca-certificates && \
    wget https://github.com/spf13/hugo/releases/download/v${HUGO_VERSION}/${HUGO_BINARY} && \
    tar xzf ${HUGO_BINARY} && \
    mv hugo /usr/bin
# install node modules and build assets
RUN npm ci && npm run build
# build the site
RUN /usr/bin/hugo
# change base image
FROM nginx:alpine
# copy public/ from the 'build' container into the nginx container
COPY --from=build /build/public /usr/share/nginx/html
Great! The built image now includes assets.
It’s also much smaller than before, because the added files used in the build image have been discarded. All that’s left is the nginx base image and the contents of the generated public/ folder, nothing else.
npm ci vs npm install
You may notice we’re running npm ci instead of npm install.
npm install may modify package.json and package-lock.json during installation, but npm ci never will. This is what we want for a reproducible build.
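In practice the difference looks like this (npm ci is available from npm 5.7 onwards):
# installs exactly what package-lock.json specifies; exits with an error
# if the lockfile is missing or out of sync with package.json, instead
# of quietly rewriting it
npm ci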
However, npm gives us a warning with npm ci, as it doesn’t expect to see an existing node_modules folder:
npm WARN prepare removing existing node_modules/ before installation
Let’s fix that, and improve the performance of our build.
Do not ignore .dockerignore
Docker can read from a dedicated .dockerignore file, telling it about files and directories that should be excluded from COPY instructions.
We can fix the npm ci warning by ignoring node_modules/ and some other directories we might have during local development.
.git/
node_modules/
static/css/
static/img/
public/
It will also speed up the build slightly, as Docker will skip copying the (rather large) node_modules/ directory into the container on each build.
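You can see the effect in the "Sending build context to Docker daemon" line that docker prints at the start of each build:
du -sh node_modules/   # often tens of MB on a local checkout
docker build -t static-site .
# with the .dockerignore in place, the reported build context should
# shrink to roughly the size of the source files alone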
Cut build time by preserving the layers cache
Every instruction in a Dockerfile (COPY, RUN, ENV) adds a new layer.
Docker will cache these layers for future builds if the contents haven’t changed.
However, once the cache is invalidated, all subsequent layers will be built again.
In our Dockerfile we copy the entire project directory into the container with COPY . . early on. These files change frequently, invalidating the layer cache for the rest of the build.
We should reorder the build so that the layers that rarely change (e.g. installing wget and downloading hugo) stay cached.
# download hugo first, the layer will be cached
ENV HUGO_VERSION 0.41
ENV HUGO_BINARY hugo_${HUGO_VERSION}_Linux-64bit.tar.gz
RUN set -x && \
    apk add --update wget ca-certificates && \
    wget https://github.com/spf13/hugo/releases/download/v${HUGO_VERSION}/${HUGO_BINARY} && \
    tar xzf ${HUGO_BINARY} && \
    mv hugo /usr/bin
# then copy and build the site
COPY . .
RUN /usr/bin/hugo
Another good technique is copying only the package manager files first, then the rest of the project later. If the required node modules haven’t changed, the npm ci layer can be served from the cache. No more waiting for npm!
# copy package.json first
COPY package.json package-lock.json /build/
# install node_modules, will be cached unless package.json has changed
RUN npm ci
# then copy and build the site
COPY . .
RUN npm run build && /usr/bin/hugo
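To see the cache at work, rebuild after touching only a content file; docker reports the unchanged layers as "Using cache":
docker build -t static-site .   # first build populates the cache
# edit a markdown file under content/, then rebuild:
docker build -t static-site .   # the hugo download and npm ci layers
                                # are reported as "Using cache"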
Abby Fuller gave a useful talk at Dockercon 17 with more advice for saving space and cutting build time.
The complete Dockerfile
Putting it all together, the complete Dockerfile:
FROM node:alpine as build
WORKDIR /build
ENV HUGO_VERSION 0.41
ENV HUGO_BINARY hugo_${HUGO_VERSION}_Linux-64bit.tar.gz
# note: imagemagick is needed here for the image resizing step
RUN set -x && \
    apk add --update wget ca-certificates imagemagick && \
    wget https://github.com/spf13/hugo/releases/download/v${HUGO_VERSION}/${HUGO_BINARY} && \
    tar xzf ${HUGO_BINARY} && \
    mv hugo /usr/bin
COPY package.json package-lock.json /build/
RUN npm ci
COPY . .
RUN npm run build && /usr/bin/hugo
FROM nginx:alpine
COPY --from=build /build/public /usr/share/nginx/html
docker images shows a pretty decent 17MB image size:
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
static-site         latest              e537765831b9        22 seconds ago      17.6MB
Perfect! We’ve built a lightweight image, ready to run on our cluster.
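From here, deployment is a case of pushing the image to a registry the cluster can pull from (the registry host below is a placeholder):
docker tag static-site registry.example.com/static-site:latest
docker push registry.example.com/static-site:latest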