Building lightweight docker images for static site generators

Using docker multi-stage builds to keep our container sizes down.

Glynn Forrest
Thursday, February 28, 2019

Static site generators are a popular way to create simple websites and blogs. They convert markdown files, HTML templates, and static assets into a single folder that can easily be hosted on a basic webserver, hosting provider, or remote storage like Amazon S3.

backbeat.tech is built as a static site, which we’ve been hosting on a basic server so far. To simplify things, we’re going to retire that server and host it in our Nomad container clusters along with our other applications and services.

We use the excellent Hugo for our site, but the techniques we use in this post apply to other tools too.

Project layout before Docker

The directory structure is in a typical Hugo layout:

├── content/          # markdown files of pages
├── img/              # master images before they are optimised
├── public/           # the generated site
├── resources/        # generated assets
├── scss/             # scss files before compilation
├── static/           # static files - where npm places the built assets
├── node_modules/     # installed by npm for asset compilation
├── layouts/          # HTML templates
├── config.toml       # hugo configuration
├── gulpfile.js       # gulp configuration
├── package-lock.json # used by npm
└── package.json      # node modules required to build the assets

To build the site without docker, we run these commands:

npm install   # install node_modules
npm run build # build assets to static/
hugo          # build the site to public/

Site generation is a two-step process: first the node package gulp compiles the scss into optimised css files, then compresses and resizes the master images. These files are placed into the static/ folder.

After that hugo builds the site using the markdown files in content/ and the HTML templates in layouts/. It writes the generated HTML to public/, and copies the optimised assets from static/ into public/.
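For reference, the npm run build command is defined in the scripts section of package.json. A minimal sketch of what such a file might contain — the gulp task name and version here are illustrative assumptions, not our exact configuration:

```json
{
  "scripts": {
    "build": "gulp build"
  },
  "devDependencies": {
    "gulp": "^4.0.0"
  }
}
```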

Now let’s convert that to a docker build.

A naive docker build

To run in a container orchestrator, our docker image needs to respond to HTTP requests itself. We’ll use the nginx base image for this.

Let’s start by installing hugo in an nginx image and building the site.

Create a Dockerfile:

# base nginx image
FROM nginx:alpine

# an arbitrary directory to build our site in
WORKDIR /build

# copy the project into the container
COPY . .

# download hugo and make it available in PATH
ENV HUGO_VERSION 0.41
ENV HUGO_BINARY hugo_${HUGO_VERSION}_Linux-64bit.tar.gz
RUN set -x && \
  apk add --update wget ca-certificates && \
  wget https://github.com/spf13/hugo/releases/download/v${HUGO_VERSION}/${HUGO_BINARY} && \
  tar xzf ${HUGO_BINARY} && \
  mv hugo /usr/bin

# build the project and copy the result to the nginx folder
RUN /usr/bin/hugo && ls -l
RUN cp -fR /build/public/* /usr/share/nginx/html

Then build and run the container:

docker build -t static-site .
docker run -ti -p 8000:80 static-site

Visit http://localhost:8000 to see the generated site.

We’ve built a basic container, but without any assets! The nginx:alpine image doesn’t include nodejs. We could install it manually like we did with hugo, or perhaps look for another image that contains both nginx and nodejs together.

There’s another way which brings far more benefits: multi-stage builds.

Multi-stage builds

Docker introduced multi-stage builds in version 17.05, allowing the use of multiple base images in a single Dockerfile.

Let’s use the node:alpine base image to build our site with assets, then copy the generated files into another image that builds from nginx:alpine as before.

# start with the nodejs image, calling it 'build'
FROM node:alpine as build

WORKDIR /build

COPY . .

# download hugo as before
ENV HUGO_VERSION 0.41
ENV HUGO_BINARY hugo_${HUGO_VERSION}_Linux-64bit.tar.gz
RUN set -x && \
  apk add --update wget ca-certificates && \
  wget https://github.com/spf13/hugo/releases/download/v${HUGO_VERSION}/${HUGO_BINARY} && \
  tar xzf ${HUGO_BINARY} && \
  mv hugo /usr/bin

# install node modules and build assets
RUN npm ci && npm run build
# build the site
RUN /usr/bin/hugo

# change base image
FROM nginx:alpine

# copy public/ from the 'build' container into the nginx container
COPY --from=build /build/public /usr/share/nginx/html

Great! The built image now includes assets.

It’s also much smaller than before, because the files used only in the build stage have been discarded. All that’s left is the nginx base image and the contents of the generated public/ folder, nothing else.

npm ci vs npm install

You may notice we’re running npm ci instead of npm install. npm install may modify package-lock.json during installation, whereas npm ci installs exactly what the lock file specifies and never changes it. This is what we want for a reproducible build.

However, npm gives us a warning with npm ci, as it doesn’t expect to see an existing node_modules folder:

npm WARN prepare removing existing node_modules/ before installation

Let’s fix that, and improve the performance of our build.

Do not ignore .dockerignore

Docker can read from a dedicated .dockerignore file, telling it about files and directories that should be excluded from COPY instructions.

We can fix the npm ci warning by ignoring node_modules/ and some other directories we might have during local development. Create a .dockerignore file in the project root:

.git/
node_modules/
static/css/
static/img/
public/

It will also speed up the build slightly, as docker will skip copying the (rather large) node_modules/ directory into the container for each build.

Cut build time by preserving the layers cache

Every action in a Dockerfile (COPY, RUN, ENV) adds a new layer. Docker will cache these layers for future builds if the contents haven’t changed. However, once the cache is invalidated, all subsequent layers will be built again.

In our Dockerfile we copy the entire project directory into the container with COPY . . early on. These files change frequently, invalidating the layer cache for the rest of the build.

We should change the order of the build so the layers that rarely change (e.g. downloading wget and hugo) are cached.

# download hugo first, the layer will be cached
ENV HUGO_VERSION 0.41
ENV HUGO_BINARY hugo_${HUGO_VERSION}_Linux-64bit.tar.gz
RUN set -x && \
  apk add --update wget ca-certificates && \
  wget https://github.com/spf13/hugo/releases/download/v${HUGO_VERSION}/${HUGO_BINARY} && \
  tar xzf ${HUGO_BINARY} && \
  mv hugo /usr/bin

# then copy and build the site
COPY . .
RUN /usr/bin/hugo

Another good technique is only copying the package manager files, then the rest of the project later. If the required node modules haven’t changed, the layer can be cached. No more waiting for npm!

# copy package.json first
COPY package.json package-lock.json /build/
# install node_modules, will be cached unless package.json has changed
RUN npm ci

# the copy and build the site
COPY . .
RUN npm run build && /usr/bin/hugo

Abby Fuller gave a useful talk at Dockercon 17 with more advice for preserving space and cutting build time:

The complete Dockerfile

Putting it all together, the complete Dockerfile:

FROM node:alpine as build

WORKDIR /build
ENV HUGO_VERSION 0.41
ENV HUGO_BINARY hugo_${HUGO_VERSION}_Linux-64bit.tar.gz
RUN set -x && \
  apk add --update wget ca-certificates imagemagick && \
  wget https://github.com/spf13/hugo/releases/download/v${HUGO_VERSION}/${HUGO_BINARY} && \
  tar xzf ${HUGO_BINARY} && \
  mv hugo /usr/bin
COPY package.json package-lock.json /build/
RUN npm ci
COPY . .
RUN npm run build && /usr/bin/hugo

FROM nginx:alpine

COPY --from=build /build/public /usr/share/nginx/html

docker images shows a pretty decent 17MB file size:

REPOSITORY        TAG                 IMAGE ID            CREATED             SIZE
static-site       latest              e537765831b9        22 seconds ago      17.6MB

Perfect! We’ve built a lightweight image that’s perfect for running on our cluster.

More from the blog

Using SaltStack for internal SSL certificates cover image

Using SaltStack for internal SSL certificates

Glynn Forrest
Tuesday, April 30, 2019

Git: beyond the basics cover image

Git: beyond the basics

Glynn Forrest
Thursday, January 31, 2019

Subscribe to our mailing list

Receive periodic updates about our products, services, and articles.

View recent emails