Bootstrapping infrastructure with Salt Cloud and Terraform
Apps and services hosted by Backbeat run on collections of machines that follow a simple rule: machines are configured with SaltStack, and external resources are configured with Terraform.
This method allows us to stay cloud-agnostic and create new and temporary environments extremely quickly. In this post I’ll describe the process of spinning up a brand new environment.
Virtual machines
Each environment has stateless machines that run services such as APIs, web applications, and monitoring agents on a Nomad cluster.
These machines are disposable and can be destroyed at any time, with hostnames that reflect this, such as node-a1b2c3.aws-eu-west1.backbeat.tech.
We also have stateful machines that run crucial processes and store important data.
These machines run databases, load balancers, the Salt Master, and server nodes for Nomad, Consul, and Vault.
Their hostnames reflect their roles, such as lb1.aws-eu-west1.backbeat.tech for a load balancer or postgres1.aws-eu-west1.backbeat.tech for a database.
These would be considered the ‘pets’ in the ‘pets vs cattle’ metaphor that’s sometimes used in DevOps circles.
Both stateless and stateful machines are configured with SaltStack.
The stateless machines are created by a Salt Cloud process when required, given a hostname and minion ID with a random hash (a1b2c3), and added to the Nomad cluster.
The stateful machines are also created by Salt Cloud, but explicitly listed in a cloud ‘map’ file.
Salt is configured to constantly check that the machines listed in this file are present and running.
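To make the naming and map-checking ideas concrete, here is a minimal Python sketch. It is not our actual tooling: the function names, the six-character hex suffix, and the plain sets standing in for the Salt Cloud map and the provider's list of running machines are all assumptions for illustration.

```python
import secrets


def stateless_hostname(environment: str, domain: str = "backbeat.tech") -> str:
    """Build a disposable hostname such as node-a1b2c3.aws-eu-west1.backbeat.tech."""
    # Assumption: the random hash is six hex characters, as in the example hostname.
    suffix = secrets.token_hex(3)
    return f"node-{suffix}.{environment}.{domain}"


def missing_stateful_machines(mapped: set, running: set) -> list:
    """Return the machines listed in the map that are not currently running."""
    # Hypothetical helper: a plain set difference stands in for Salt's own check.
    return sorted(mapped - running)
```

For example, missing_stateful_machines({"lb1", "postgres1"}, {"lb1"}) reports that postgres1 needs to be created.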
External resources
Beyond virtual machines, each environment has external resources that need to be created such as domain name records, elastic / floating IP addresses, security groups, and managed services like Amazon RDS.
We use Terraform to create these resources and connect them to the virtual machines (e.g. assigning an IP address to a load balancer).
Overview
These are the steps to create a new environment:
- Bootstrap the Salt Master
- Build the rest of the machines with Salt Cloud
- Bootstrap remote Terraform state
- Run Terraform locally
- (optional) Use a newly provisioned Jenkins instance to run Terraform from now on.
Bootstrapping the Salt Master
We need to create the Salt Master before Salt Cloud can create machines.
For each cloud provider, we have a Python script that will use the provider’s API and Fabric to create a new machine, connect to it with SSH, and install SaltStack.
This is the only machine not created by SaltStack; the bootstrap script is run on a local machine with the required access credentials.
For example, with DigitalOcean:
./bin/bootstrap-digitalocean.py
DigitalOcean region (e.g. nyc3): lon1
Subdomain for this environment (e.g. do-nyc3.backbeat.tech): do-lon1.backbeat.tech
Create a new API key in the digitalocean control panel called "salt.do-lon1.backbeat.tech".
Enter the new API key:
Adding ~/.ssh/id_rsa.pub to digitalocean account as "bootstrap_tmp_1554029111"
Creating salt.do-lon1.backbeat.tech
New droplet created with IP address 206.189.27.193
Removing SSH key "bootstrap_tmp_1554029111" from digitalocean account
Waiting 10 seconds for SSH...
Waiting 10 seconds for SSH...
Waiting 10 seconds for SSH...
SSH connected
Uploading API key to new master for use in salt cloud
Generating a new SSH key on salt.do-lon1.backbeat.tech
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/cloud_rsa.
Your public key has been saved in /root/.ssh/cloud_rsa.pub.
The key fingerprint is:
SHA256:Q40GHXckjQ+4+mytNYyw7PoKzBIF/aFsFXYqhYKEHgI root@salt
The key's randomart image is:
+---[RSA 2048]----+
|Eo .+.o..oo+o |
|=ooooo .o+oo. |
|oo++.. +..o |
| o+.. o. . |
|.. ..S |
| + ..o + |
|. + oo..+ |
| . . . +... |
| o+o... |
+----[SHA256]-----+
Adding new SSH key to digitalocean account as "salt.do-lon1.backbeat.tech"
Installing salt on salt.do-lon1.backbeat.tech
...
...
Copying salt states and pillar data to the new master
Now access the new master at ssh root@206.189.27.193 and run a salt highstate.
This script does a few things:
- Requests the environment name (e.g. do-lon1.backbeat.tech)
- Requests a new API key for Salt Cloud to use (and for the script to create the Salt Master)
- Temporarily adds a local SSH public key to the DigitalOcean account
- Creates the Salt Master machine pre-seeded with the added SSH key
- Removes the SSH key from DigitalOcean
- Connects to the Salt Master over SSH, authenticating against the uploaded public key
- Generates a new SSH key for Salt Cloud to use and adds it to the DigitalOcean account
- Installs SaltStack
- Copies salt states and pillar data to the master
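The "Waiting 10 seconds for SSH..." lines in the output come from polling the new machine until it accepts connections. A rough sketch of such a loop is below. The function name, the retry limit, and the injectable sleep parameter are assumptions, and it only checks that the TCP port is open rather than completing an SSH handshake as the real script does through Fabric.

```python
import socket
import time


def wait_for_ssh(host, port=22, interval=10.0, attempts=30, sleep=time.sleep):
    """Poll until a TCP connection to host:port succeeds, or give up."""
    for _ in range(attempts):
        try:
            # A successful connect means something is listening on the SSH port.
            with socket.create_connection((host, port), timeout=5):
                print("SSH connected")
                return True
        except OSError:
            # Connection refused or timed out: the machine is still booting.
            print(f"Waiting {interval:g} seconds for SSH...")
            sleep(interval)
    return False
```

Passing the sleep function as a parameter keeps the loop easy to test without real delays.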
All that’s left is to run a highstate on the master, which will properly configure the environment to manage itself.
ssh root@206.189.27.193 salt-call state.apply
This will install and configure Salt Cloud, set up timers to enforce environment state, and lock down SSH (access to the environment is now exclusively through a bastion host).
The environment is now operational. The master will create and provision the stateful machines according to the Salt Cloud map file and create a number of stateless machines for Nomad jobs to run on.
Bootstrapping Terraform
The new machines are up and running, but inaccessible! We now use Terraform to create external resources and route them to the machines.
We use an Amazon S3 bucket to store Terraform state remotely. This bucket lives under a separate, locked-down AWS account made exclusively for that purpose. Terraform has no access to this account except to store state in the S3 bucket.
terraform {
  backend "s3" {
    bucket = "terraform"
    region = "us-east-1"
  }
}
Each environment has a separate Terraform config with discrete state. Sample directory layout:
terraform
├── env
│   ├── aws-eu-west1
│   │   ├── main.tf
│   │   └── ...
│   ├── bootstrap
│   │   └── main.tf
│   ├── do-lon1
│   │   ├── main.tf
│   │   └── ...
│   └── do-nyc3
│       ├── main.tf
│       └── ...
└── modules
    ├── module1
    │   ├── main.tf
    │   ├── variables.tf
    │   ├── outputs.tf
    │   └── ...
    ├── module2
Each folder in env/ matches an environment which Terraform is run inside.
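As a rough illustration of this convention, the environments could be enumerated from the directory layout with a few lines of Python. The helper name and the decision to skip the bootstrap folder are assumptions, not part of our tooling.

```python
from pathlib import Path


def list_environments(env_root: Path, skip=("bootstrap",)) -> list:
    """Return the names of environment folders that contain a main.tf."""
    return sorted(
        entry.name
        for entry in env_root.iterdir()
        # Assumption: an environment is any folder with its own main.tf,
        # excluding the single-use bootstrap configuration.
        if entry.is_dir()
        and entry.name not in skip
        and (entry / "main.tf").is_file()
    )
```

With the layout above, list_environments(Path("terraform/env")) would be expected to return the three environment names.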
For the newly made do-lon1 environment, run:
cd terraform/env/do-lon1
terraform plan -out tf.out
# check all is okay
terraform apply tf.out
This will create all external resources required by the environment.
If the Terraform remote state doesn’t exist, use the bootstrap/ folder to create it.
This single-use Terraform code could look like:
resource "aws_s3_bucket" "tf-state" {
  bucket = "terraform"
  acl    = "private"

  versioning {
    enabled = true
  }
}
Connecting Terraform resources with SaltStack resources
We use data sources in Terraform code to reference machines managed with SaltStack.
For example, to create a DigitalOcean floating IP address and assign it to a load balancer:
data "digitalocean_droplet" "lb2" {
  name = "lb2.do-lon1.backbeat.tech"
}

resource "digitalocean_floating_ip" "lb2" {
  droplet_id = "${data.digitalocean_droplet.lb2.id}"
  region     = "${data.digitalocean_droplet.lb2.region}"
}
Running Terraform in the environment
Now that external resources such as domain names are configured, webhooks from code hosting services will start to arrive and trigger jobs on a Jenkins instance. One such job validates, plans, and applies Terraform code: whenever an update is made, the job applies any infrastructure changes after confirmation from an authorised operator.
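The validate, plan, confirm, apply sequence can be sketched in Python as follows. This is only an outline of the idea, not our Jenkins job: the function name, the confirm callback, and the injectable run parameter are assumptions, and a real job would get its confirmation from a Jenkins approval step.

```python
import subprocess


def run_terraform_job(env_dir, confirm, run=subprocess.run, plan_file="tf.out"):
    """Validate and plan, then apply the saved plan only if an operator confirms."""
    executed = []

    def step(args):
        # check=True makes a failing Terraform command abort the whole job.
        run(list(args), cwd=env_dir, check=True)
        executed.append(list(args))

    step(["terraform", "validate"])
    step(["terraform", "plan", "-input=false", f"-out={plan_file}"])
    if not confirm():
        # The operator declined, so stop without touching infrastructure.
        return executed
    # Applying the saved plan file runs exactly what was reviewed.
    step(["terraform", "apply", "-input=false", plan_file])
    return executed
```

Applying the saved plan file rather than re-planning means the operator approves the same set of changes that Terraform then carries out.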
Finish
We’re done! The new environment is fully provisioned. SaltStack will ensure the machines are configured correctly, while Terraform and Jenkins keep the external state correctly managed.
This post is an overview of Backbeat’s cloud strategy. In future posts I’ll explain parts of it further, such as running Terraform in Jenkins and recycling stateful ‘pet’ machines without losing data.