Nomad clusters on the cloud
The Get Started guide describes how to deploy a Nomad environment with minimal infrastructure configuration. It also allows you to quickly develop, test, deploy, and iterate on your application.
When you are ready to move from your local machine, these tutorials guide you through deploying a Nomad cluster with access control lists (ACLs) enabled on the three major cloud platforms: AWS, GCP, and Azure. This gives you the flexibility to leverage all of the features available in Nomad such as CSI volumes, service discovery integration, and job constraints.
The code and configuration files for each cloud provider are in their own directory in the example repository. This tutorial covers the contents of the repository at a high level, focusing on how the Nomad cluster is configured. The cloud-specific tutorials then guide you through deploying and provisioning a Nomad cluster on the platform of your choice.
Cluster overview
The cluster design follows best practices outlined in the reference architecture including a three server setup for high availability, using Consul for automatic clustering and service discovery, and making sure there is low network latency between the nodes.
Nomad's ACL system is enabled to control data and API access. The default client token receives only minimal permissions and no administrative rights. This token is generated during cluster setup and provided to the user for their interactions with Nomad instead of the management token.
Finally, the security group setup allows free communication between the nodes of the cluster and limits external ingress to only the necessary UI ports as outlined in the extensibility notes.
Review repository contents
The root level of the repository contains a directory for each cloud and a shared directory that contains configuration files common to all of the clouds.
Explore the shared/config directory
The shared/config directory contains configuration files for starting the Nomad and Consul agents as well as the policy files for configuring ACLs.
Nomad files
nomad-acl-user.hcl is the Nomad ACL policy file that gives the user token the permissions to read and submit jobs.
nomad.hcl and nomad_client.hcl are the Nomad agent startup files for the server and client nodes, respectively. They are used to configure the Nomad agent started by the nomad.service file via systemd. The agent files contain capitalized placeholder strings that are replaced with actual values during the provisioning process.
shared/config/nomad.hcl
data_dir  = "/opt/nomad/data"
bind_addr = "0.0.0.0"
# Enable the server
server {
  enabled          = true
  bootstrap_expect = SERVER_COUNT
}
consul {
  address = "127.0.0.1:8500"
  token = "CONSUL_TOKEN"
}
acl {
  enabled = true
}
## ...
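The placeholder replacement described above can be pictured with a small shell sketch. This is not the repository's provisioning script: the throwaway template file, the sample values, and the use of sed here are assumptions made for illustration.

```shell
#!/bin/bash
# Illustrative only: mimic placeholder replacement on a throwaway copy
# of a nomad.hcl-style template. The values below are made up.
template="$(mktemp)"
cat > "$template" <<'EOF'
server {
  enabled          = true
  bootstrap_expect = SERVER_COUNT
}
consul {
  token = "CONSUL_TOKEN"
}
EOF

server_count=3                     # hypothetical value
consul_token="example-token-1234"  # hypothetical value

# Replace each capitalized placeholder with its value.
# "sed -i.bak" is accepted by both GNU and BSD sed.
sed -i.bak \
  -e "s/SERVER_COUNT/${server_count}/g" \
  -e "s/CONSUL_TOKEN/${consul_token}/g" \
  "$template"

rendered="$(cat "$template")"
echo "$rendered"
rm -f "$template" "$template.bak"
```

After the substitution, the rendered text contains `bootstrap_expect = 3` and the sample token in place of the two placeholders.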
Consul files
consul-acl-nomad-auto-join.hcl is the Consul ACL policy file that gives the Nomad agent token the necessary permissions to automatically join the Consul cluster during startup.
consul-template.hcl and consul-template.service are used to configure and start the Consul Template service.
consul.hcl and consul_client.hcl are the Consul agent startup files for the server and client nodes, respectively. They are used to configure the Consul agent started by the consul_aws.service, consul_gce.service, or consul_azure.service files via systemd, depending on the cloud platform. Like the Nomad agent files, these also contain capitalized placeholder strings that are replaced with actual values during the provisioning process.
shared/config/consul.hcl
data_dir = "/opt/consul/data"
bind_addr = "0.0.0.0"
client_addr = "0.0.0.0"
advertise_addr = "IP_ADDRESS"
bootstrap_expect = SERVER_COUNT
acl {
    enabled = true
    default_policy = "deny"
    down_policy = "extend-cache"
}
log_level = "INFO"
server = true
ui = true
retry_join = ["RETRY_JOIN"]
## ...
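Because these files are templates, a rendered copy should no longer contain any capitalized placeholder. The sketch below is one way to check for that, assuming the placeholder names shown in this tutorial. The helper name `has_placeholders` and the sample IP address are hypothetical and are not part of the repository.

```shell
#!/bin/bash
# Illustrative only: report whether a config file still contains any of
# the capitalized placeholders named in this tutorial.
has_placeholders() {
  # $1: path to a config file; returns 0 if a placeholder remains
  grep -Eq 'IP_ADDRESS|SERVER_COUNT|RETRY_JOIN|CONSUL_TOKEN' "$1"
}

cfg="$(mktemp)"

# An unrendered template fragment.
printf 'advertise_addr = "IP_ADDRESS"\nretry_join = ["RETRY_JOIN"]\n' > "$cfg"
if has_placeholders "$cfg"; then before="unrendered"; else before="rendered"; fi

# The same fragment after substitution (sample values).
printf 'advertise_addr = "10.0.0.5"\nretry_join = ["provider=aws tag_key=ConsulAutoJoin tag_value=auto-join"]\n' > "$cfg"
if has_placeholders "$cfg"; then after="unrendered"; else after="rendered"; fi

echo "before: $before, after: $after"
rm -f "$cfg"
```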
Explore the shared/scripts directory
The shared/scripts directory contains scripts for installing, configuring, and starting Nomad and Consul on the deployed infrastructure.
setup.sh downloads and installs Nomad, Consul, Consul Template, and their dependencies.
server.sh and client.sh replace the capitalized placeholder strings in the server and client agent startup files with actual values, copy the systemd service files to the correct location and start the services, and configure Docker networking.
Explore the shared/data-scripts directory
The shared/data-scripts directory contains user-data-server.sh, which bootstraps the Consul ACLs and the Nomad ACLs and then saves the Nomad bootstrap user token temporarily in the Consul KV store. It also contains user-data-client.sh, which runs the shared/scripts/client.sh script described above and restarts Nomad.
Tip
Terraform adds the nomad_consul_token_secret value to the configuration during the provisioning process so that the script can substitute it for the CONSUL_TOKEN placeholder at runtime.
shared/data-scripts/user-data-client.sh
#!/bin/bash
set -e
exec > >(sudo tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
sudo bash /ops/shared/scripts/client.sh "${cloud_env}" '${retry_join}' "${nomad_binary}"
NOMAD_HCL_PATH="/etc/nomad.d/nomad.hcl"
CLOUD_ENV="${cloud_env}"
sed -i "s/CONSUL_TOKEN/${nomad_consul_token_secret}/g" $NOMAD_HCL_PATH
# ...
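On the server side, the flow described at the start of this section (create a limited user token, then park it in the Consul KV store) might look roughly like the sketch below. It is not the contents of user-data-server.sh. It assumes that both ACL systems are already bootstrapped, that NOMAD_TOKEN and CONSUL_HTTP_TOKEN hold management tokens, and that `nomad acl token create` prints a human-readable `Secret ID = ...` line. The function name, policy name, token name, and KV key are all hypothetical.

```shell
#!/bin/bash
# Illustrative only: create a limited Nomad user token and store it
# temporarily in Consul KV. Names and the KV key are hypothetical.
create_and_store_user_token() {
  local policy_file="$1"                 # e.g. a policy such as nomad-acl-user.hcl
  local kv_key="${2:-nomad_user_token}"  # hypothetical KV key
  local secret

  # Register the policy under a hypothetical name.
  nomad acl policy apply -description "Read and submit jobs" \
    example-user-policy "$policy_file" || return 1

  # Create a client token bound to that policy and extract its Secret ID
  # from the human-readable output ("Secret ID = <value>").
  secret="$(nomad acl token create -name="example-user" \
    -type=client -policy=example-user-policy |
    awk '/Secret ID/ {print $4}')"
  [ -n "$secret" ] || return 1

  # Store the secret so it can be fetched once provisioning finishes.
  consul kv put "$kv_key" "$secret"
}
```

A call such as `create_and_store_user_token nomad-acl-user.hcl` would run on a server node once the agents are up. The block only defines the function, so nothing runs until it is called.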
Explore the cloud directories
The root level aws, gcp, and azure directories contain several common components that have been configured to work with a specific cloud platform.
variables.hcl.example is the variables file used for both Packer and Terraform via the -var-file flag.
Example Packer command using -var-file
$ packer build -var-file=variables.hcl image.pkr.hcl
image.pkr.hcl is the Packer build file used to create the machine image for the cluster nodes. This also runs the shared/scripts/setup.sh script.
main.tf, outputs.tf, variables.tf, and versions.tf contain the Terraform configurations to provision the cluster. 
By default, the cluster consists of 3 server and 3 client nodes and uses the Consul auto-join functionality to automatically add nodes as they start up and become available. The value for retry_join found in the consul.hcl and consul_client.hcl agent template files comes from Terraform during provisioning and differs somewhat between the three cloud platforms.
shared/config/consul_client.hcl
ui = true
log_level = "INFO"
data_dir = "/opt/consul/data"
bind_addr = "0.0.0.0"
client_addr = "0.0.0.0"
advertise_addr = "IP_ADDRESS"
retry_join = ["RETRY_JOIN"]
In each scenario, Terraform substitutes the retry_join value into either the user-data-server.sh or user-data-client.sh scripts with the templatefile() function in main.tf.
Cloud Auto-join for AWS EC2 does not require any project-specific information, so the value is set as a default in the variables file. Consul reads the values for tag_key and tag_value as the key-value pair "ConsulAutoJoin" = "auto-join".
aws/variables.tf
# ...
variable "retry_join" {
  description = "Used by Consul to automatically form a cluster."
  type        = string
  default     = "provider=aws tag_key=ConsulAutoJoin tag_value=auto-join"
}
# ...
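To make the structure of the retry_join string concrete, the sketch below splits it into its space-separated key=value pairs. Consul parses this string itself, so the helper `get_field` is hypothetical and exists only to show what each part means.

```shell
#!/bin/bash
# Illustrative only: split a cloud auto-join string into key=value pairs
# to show its structure. This is not how Consul parses it internally.
retry_join="provider=aws tag_key=ConsulAutoJoin tag_value=auto-join"

get_field() {
  # $1: auto-join string, $2: key to look up
  local pair
  for pair in $1; do              # unquoted on purpose: split on spaces
    if [ "${pair%%=*}" = "$2" ]; then
      echo "${pair#*=}"           # everything after the first "="
      return 0
    fi
  done
  return 1
}

provider="$(get_field "$retry_join" provider)"
tag_key="$(get_field "$retry_join" tag_key)"
tag_value="$(get_field "$retry_join" tag_value)"
echo "provider=$provider tag \"$tag_key\" = \"$tag_value\""
```

The tag key and value extracted here are the same pair that each instance must carry as a tag for auto-join to find it.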
A tag is set in the aws_instance resource for each server and client that matches the key-value pair in the retry_join variable.
aws/main.tf
resource "aws_instance" "server" {
  # ...
  # instance tags
  # ConsulAutoJoin is necessary for nodes to automatically join the cluster
  tags = merge(
    {
      "Name" = "${var.name}-server-${count.index}"
    },
    {
      "ConsulAutoJoin" = "auto-join"
    },
    {
      "NomadType" = "server"
    }
  )
  # ...
}
Terraform then passes the retry_join value to the startup script template during provisioning for both the server and client nodes.
aws/main.tf
resource "aws_instance" "server" {
  # ...
  user_data = templatefile("../shared/data-scripts/user-data-server.sh", {
    server_count              = var.server_count
    region                    = var.region
    cloud_env                 = "aws"
    retry_join                = var.retry_join
    nomad_binary              = var.nomad_binary
    nomad_consul_token_id     = random_uuid.nomad_id.result
    nomad_consul_token_secret = random_uuid.nomad_token.result
  })
  # ...
}
main.tf also adds the startup scripts from shared/data-scripts to the server and client nodes during provisioning and substitutes the actual values specified in variables.hcl into those startup scripts.
post-script.sh gets the temporary Nomad bootstrap user token from the Consul KV store, saves it locally, and then deletes it from the Consul KV store.
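The get, save, and delete sequence can be sketched with the Consul CLI's `kv get` and `kv delete` subcommands. This is a minimal sketch, not the contents of post-script.sh. The function name, the KV key, and the output file name are hypothetical, and CONSUL_HTTP_ADDR and CONSUL_HTTP_TOKEN are assumed to be set for the CLI.

```shell
#!/bin/bash
# Illustrative only: fetch a temporary token from Consul KV, save it
# locally, then delete it from KV. Key and file names are hypothetical.
fetch_and_clear_token() {
  local kv_key="${1:-nomad_user_token}"
  local out_file="${2:-nomad.token}"
  local token

  token="$(consul kv get "$kv_key")" || return 1
  [ -n "$token" ] || return 1

  # Write the local copy with restrictive permissions first, and only
  # then remove the copy held in the KV store.
  (umask 077 && printf '%s\n' "$token" > "$out_file") || return 1
  consul kv delete "$kv_key"
}
```

Deleting the key only after the local write succeeds means a failed write does not lose the token.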
Extensibility notes
The cluster setup in the following tutorials includes the minimum amount of configuration that is required for the cluster to operate.
Once setup is complete, the Consul UI will be accessible on port 8500, the Nomad UI on port 4646, and SSH to each node on port 22. Security groups implementing this configuration are in main.tf for each cloud in the root of their respective folders. They allow access from IP addresses specified by the CIDR range in the allowlist_ip variable of the variables.hcl file in the same directory.
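Given the ports above, you could point the Nomad and Consul CLIs at a cluster node with the `NOMAD_ADDR` and `CONSUL_HTTP_ADDR` environment variables. In this sketch the helper name `set_cluster_env` is hypothetical, the IP address is a documentation placeholder, and plain HTTP is an assumption about how the endpoints are exposed.

```shell
#!/bin/bash
# Illustrative only: point the Nomad and Consul CLIs at a cluster node.
# The IP address is a documentation placeholder (RFC 5737 range).
set_cluster_env() {
  local server_ip="$1"
  export NOMAD_ADDR="http://${server_ip}:4646"        # Nomad API and UI
  export CONSUL_HTTP_ADDR="http://${server_ip}:8500"  # Consul API and UI
}

set_cluster_env "203.0.113.10"
echo "Nomad UI:  ${NOMAD_ADDR}/ui"
echo "Consul UI: ${CONSUL_HTTP_ADDR}/ui"
```

These endpoints are reachable only from addresses inside the allowlist_ip CIDR range.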
To test out your applications running in the cluster, you will need to create additional security group rules that allow access to ports used by your application. Each scenario's main.tf file contains an example showing how to configure the rules.
The AWS scenario contains a security group named clients_ingress where you can place your application rules.
aws/main.tf
resource "aws_security_group" "clients_ingress" {
  name   = "${var.name}-clients-ingress"
  vpc_id = data.aws_vpc.default.id
  # ...
  # Add application ingress rules here
  # These rules are applied only to the client nodes
  # nginx example
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
The aws_instance resource for the clients references the clients_ingress security group, which attaches your application rules to the client instances.
aws/main.tf
resource "aws_instance" "client" {
  ami                    = var.ami
  instance_type          = var.client_instance_type
  key_name               = var.key_name
  vpc_security_group_ids = [
    aws_security_group.consul_nomad_ui_ingress.id,
    aws_security_group.ssh_ingress.id,
    aws_security_group.clients_ingress.id,
    aws_security_group.allow_all_internal.id
  ]
  count                  = var.client_count
  # ...
}
Next steps
Now that you have reviewed the cluster setup repository and learned how the cluster is configured, continue on to the cluster setup tutorials for each of the major cloud platforms to provision and configure your Nomad cluster.