Cloud deployment (AWS, Azure, GCP virtual machines)
This section provides a detailed guide for manually installing a Nomad cluster on AWS, Azure, and GCP instances. This approach is suitable for organizations that require fine-grained control over their Nomad deployment or have specific compliance requirements that necessitate manual installation. Before proceeding, ensure you have read and have a solid understanding of Nomad architecture and Deployment requirements.
Architectural summary
As referenced in the Nomad architecture section, each Nomad node in a cluster resides on a separate VM, with shared or dedicated tenancy both being acceptable. A single Nomad cluster deploys within a single region, with nodes distributed across all available Availability Zones (AZs). This design uses Nomad’s redundancy zone feature and deploys a minimum of three voting server nodes and three non-voting server nodes across three AZs.
Server naming and IP addressing
For illustrative purposes, we use the following naming convention and IP addressing scheme for our Nomad servers:
DNS NAME | IP ADDRESS | NODE TYPE | LOCATION |
---|---|---|---|
nomad-1.domain | 10.0.0.1 | Voting server | us-east-1a |
nomad-2.domain | 10.0.0.2 | Voting server | us-east-1b |
nomad-3.domain | 10.0.0.3 | Voting server | us-east-1c |
nomad-4.domain | 10.0.0.4 | Non-voting server | us-east-1a |
nomad-5.domain | 10.0.0.5 | Non-voting server | us-east-1b |
nomad-6.domain | 10.0.0.6 | Non-voting server | us-east-1c |
nomad.domain | 10.0.0.100 | NLB | (all zones) |
Note: These names and IP addresses are examples. Adjust them according to your specific environment and naming conventions.
Certificates
Create a standard X.509 certificate for installation on the Nomad servers. Follow your organization’s process for creating a new certificate that matches the DNS record you intend to use for accessing Nomad. This guide assumes the conventions from the Nomad architecture section and uses self-signed certificates as shown in the Enable TLS tutorial.
You need three files:
- CA public certificate (`nomad-agent-ca.pem`)
- Server node certificate’s private key (`global-server-nomad-key.pem`)
- Server node public certificate (`global-server-nomad.pem`)
These certificates must be distributed to the server nodes in a later section and used to secure communications between Nomad clients, servers, and the API.
A fourth certificate and key are also generated: `global-cli-nomad.pem` and `global-cli-nomad-key.pem`. These are used later in this guide to interact with the cluster. The `global-` prefix assumes the certificate and key are for a multi-region environment. You can generate certificates on a per-region basis; however, this guide uses a single certificate and key across all clusters globally.
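If you are following the self-signed approach from the Enable TLS tutorial, the files above can be generated with the Nomad CLI. A minimal sketch, run from the directory where you keep the CA files (the `-additional-dnsname` value is an example and should match the DNS record you plan to use):
# Create the CA (produces nomad-agent-ca.pem and nomad-agent-ca-key.pem)
nomad tls ca create
# Create the server certificate and key (produces global-server-nomad.pem and global-server-nomad-key.pem)
nomad tls cert create -server -region global -additional-dnsname "nomad.yourdomain.com"
# Create the CLI certificate and key (produces global-cli-nomad.pem and global-cli-nomad-key.pem)
nomad tls cert create -cli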
Note
Ensure the certificate’s Common Name (CN) or Subject Alternative Names (SANs) include the DNS names you’ll use to access Nomad, such as `nomad.yourdomain.com`. If you’re using separate DNS names for server-to-server communication, include these in the SANs as well. While Nomad's TLS configuration is production ready, key management and rotation is a complex subject not covered by this guide. Vault is the suggested solution for key generation and management.
Firewall rules
Create firewall rules that allow these ports bidirectionally:
Port | Name | Purpose | Protocol |
---|---|---|---|
4646 | HTTP API | Used by clients and servers to serve the HTTP API | TCP only |
4647 | RPC | Used for internal RPC communication between client agents and servers, and for inter-server traffic | TCP only |
4648 | Serf WAN | Used by servers to gossip both over the LAN and WAN to other servers. Not required for Nomad clients to reach this address | TCP and UDP |
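On AWS, for example, these rules map to security group ingress rules. A minimal sketch using the AWS CLI, where the security group ID and CIDR are placeholders for your environment:
# Allow the Nomad HTTP API, RPC, and Serf gossip over TCP within the VPC
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 4646-4648 --cidr 10.0.0.0/16
# Allow Serf gossip over UDP
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol udp --port 4648 --cidr 10.0.0.0/16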
Load balancing the API and UI
- Ensure your load balancer is in front of all Nomad server nodes, across all availability zones, and all subnets where the Nomad server nodes are deployed.
- Use TCP ports 4646 (HTTP API) and 4647 (RPC). The target groups associated with each of these listeners must contain all Nomad server nodes.
- Use the `/v1/agent/health` endpoint for health checking, as shown in the example below.
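You can verify the health endpoint directly against a server node or the load balancer. For example, assuming the self-signed CA file is in the current directory:
curl --cacert ./nomad-agent-ca.pem https://nomad.domain:4646/v1/agent/health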
Download and install the Nomad CLI
To interact with your Nomad cluster, you’ll need to install the Nomad binary. Follow the steps for your OS on the Install Nomad documentation page.
Obtain the Nomad Enterprise license file
Obtain the Nomad Enterprise license file from your HashiCorp account team. This file contains a license key unique to your environment and is typically named `license.hclic`. Keep this file close at hand, as it is needed later in the installation process.
Platform-specific guidance
The following sections provide further guidance and corresponding choices for your cloud service provider.
Deployment considerations for AWS
The official Nomad Enterprise HVD Module for deployment on AWS is available in the Terraform Registry.
Nomad server compute recommendations
We recommend using `m5.xlarge` or `m5.2xlarge` instance types to provide the necessary vCPU and RAM resources for Nomad. Instance types from other instance families can also be used, provided that they meet or exceed the above resource recommendations.
Nomad client compute recommendations
When selecting instance types for Nomad clients, consider the resource requirements of your workloads. AWS offers a variety of instance families optimized for different use cases:
Instance family | Use case | Workload examples |
---|---|---|
General purpose (M family) | Balanced compute, memory, and networking resources | Web servers, small-to-medium databases, dev/test environments |
Compute-optimized (C family) | High-performance computing, batch processing, scientific modeling | CPU-intensive applications, video encoding, high-performance web servers |
Memory-optimized (R family) | High-performance databases, distributed memory caches, real-time big data analytics | SAP HANA, Apache Spark, Presto |
Storage-optimized (I, D families) | High I/O applications, large databases, data warehousing | NoSQL databases, data processing applications |
GPU instances (P, G families) | AI, high-performance computing, rendering | AI training and inference, scientific simulations |
NUMA-aware instances (X1, X1e families) | High-performance databases, in-memory databases, big data processing engines | SAP HANA, Apache Spark, Presto |
Auto scaling groups
While you can use AWS Auto Scaling Groups to manually manage the number of Nomad clients, it is often more efficient to use the Nomad Autoscaler. The Nomad Autoscaler integrates directly with Nomad and provides more granular control over scaling decisions.
The Nomad Autoscaler can dynamically adjust the number of client instances based on various metrics, including job queue depth, CPU utilization, memory usage, and more. It can even scale different node pools independently, allowing you to maintain the right mix of instance types for your workloads.
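As an illustration only, a cluster scaling policy for the Nomad Autoscaler might look roughly like the following. The attribute names, query, and Auto Scaling Group name are placeholders and should be verified against the Nomad Autoscaler documentation for your version:
scaling "cluster_policy" {
  enabled = true
  min     = 3   # minimum number of client instances
  max     = 10  # maximum number of client instances
  policy {
    cooldown            = "2m"
    evaluation_interval = "1m"
    check "cpu_allocated_percentage" {
      source = "nomad-apm"
      query  = "percentage-allocated_cpu"
      strategy "target-value" {
        target = 70
      }
    }
    target "aws-asg" {
      dry-run             = "false"
      aws_asg_name        = "nomad-clients"  # placeholder ASG name
      node_class          = "default"
      node_drain_deadline = "5m"
    }
  }
}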
As referenced in the Nomad architecture section, each Nomad node in a cluster resides on a separate EC2 instance, with shared or dedicated tenancy both being acceptable.
EC2 volume storage
Nomad can be very IOPS intensive as it writes chunks of data from memory to storage. Initially, EBS volumes with Provisioned IOPS SSD (io1 or io2) are recommended as they meet the disk performance needs for most use cases. These provide a balance of performance and manageability suitable for typical Nomad deployments.
As your cluster scales or experiences large batch job scheduling events, you may encounter storage performance limitations. In such scenarios, consider starting or transitioning to instances with local NVMe storage. While this can offer significant performance benefits, be aware that it requires additional OS-level configuration during instance setup. This trade-off between enhanced performance and increased setup complexity should be carefully evaluated based on your specific workload demands and operational capabilities. Regularly monitor your IOPS metrics using your preferred monitoring solution to identify when you're approaching storage performance limits. For detailed guidance on monitoring Nomad server storage metrics, refer to the Observability section of the Nomad Operating Guide.
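If you do move to instance types with local NVMe storage, the device must be formatted and mounted before Nomad can use it for its data directory. A minimal sketch, assuming the instance exposes a local NVMe device at /dev/nvme1n1 (device names vary by instance type):
# Format and mount the local NVMe device for Nomad's data directory
sudo mkfs.xfs /dev/nvme1n1
sudo mkdir -p /opt/nomad/data
sudo mount /dev/nvme1n1 /opt/nomad/data
# Persist the mount across reboots
echo '/dev/nvme1n1 /opt/nomad/data xfs defaults,nofail 0 2' | sudo tee -a /etc/fstab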
Snapshot storage
When deploying in AWS, use an S3 bucket for storage of Automated Snapshots. The S3 bucket should be deployed in a different region from the compute instances, to allow the possibility of restoring the cluster in case of regional failure. S3 Versioning and MFA Delete can be used to ensure that the snapshots cannot be deleted by mistake or a malicious actor.
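Automated snapshots are configured through the Nomad Enterprise snapshot agent; as a simple illustration of the underlying flow, a snapshot can also be taken manually from a server node and copied to a bucket (the bucket name and region are placeholders):
# Take a Raft snapshot from a server node
nomad operator snapshot save nomad-backup.snap
# Copy the snapshot to an S3 bucket in another region
aws s3 cp nomad-backup.snap s3://example-nomad-snapshots/ --region us-west-2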
Create IAM role for cloud auto-join
We recommend creating an IAM role that is used only for auto-joining. The only required IAM permission is `ec2:DescribeInstances`, and if the region is omitted it is discovered through the local instance's EC2 metadata endpoint. Visit Auto join AWS Provider for more information.
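A minimal sketch of attaching that permission to the role with the AWS CLI (the role and policy names are placeholders):
# Attach an inline policy that allows only ec2:DescribeInstances
aws iam put-role-policy \
  --role-name nomad-cloud-auto-join \
  --policy-name nomad-describe-instances \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": "ec2:DescribeInstances",
        "Resource": "*"
      }
    ]
  }'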
Deployment process
Create and connect to your instances
Launch two instances in each of the three AZs, for a total of six instances, using the machine image of your choice. Ubuntu is the OS used in the following examples; adjust the commands to use `dnf` on RHEL if needed. Ensure the IAM Instance Profile or Managed Service Identity is using the cloud auto-join role permissions you previously created.
Ensure the instances are tagged with `tag_key=cloud-auto-join` and `tag_value=true`.
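For example, with the AWS CLI (the instance ID is a placeholder):
aws ec2 create-tags \
  --resources i-0123456789abcdef0 \
  --tags Key=cloud-auto-join,Value=true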
Update the OS
ssh -i /path/to/your-private-key ubuntu@your-vm-ip
sudo apt update && sudo apt upgrade -y
Download and install Nomad
NOMAD_VERSION="1.9.1+ent"
wget https://releases.hashicorp.com/nomad/${NOMAD_VERSION}/nomad_${NOMAD_VERSION}_linux_amd64.zip
unzip nomad_${NOMAD_VERSION}_linux_amd64.zip
sudo mv nomad /usr/local/bin/
Alternatively, follow the instructions on the documentation page to install Nomad from the official HashiCorp package repository for your OS.
Create necessary directories
sudo mkdir -p /opt/nomad/data
sudo mkdir -p /etc/nomad.d
sudo mkdir -p /etc/nomad.d/tls
Copy TLS certificates
Copy your TLS certificates to `/etc/nomad.d/tls/`:
sudo scp nomad-agent-ca.pem global-server-nomad-key.pem global-server-nomad.pem \
ubuntu@your-vm-ip:/etc/nomad.d/tls/
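If the remote user cannot write to /etc/nomad.d/tls directly, one common pattern (shown here as a sketch) is to copy the files to the home directory first and move them into place with sudo, then tighten permissions on the private key:
scp nomad-agent-ca.pem global-server-nomad-key.pem global-server-nomad.pem ubuntu@your-vm-ip:~/
ssh -i /path/to/your-private-key ubuntu@your-vm-ip
sudo mv ~/nomad-agent-ca.pem ~/global-server-nomad.pem ~/global-server-nomad-key.pem /etc/nomad.d/tls/
sudo chmod 0600 /etc/nomad.d/tls/global-server-nomad-key.pem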
Copy license file
echo "02MV4UU43BK5..." | sudo tee /etc/nomad.d/license.hclic
Create gossip encryption key
nomad operator gossip keyring generate
Save the output of this key to use in the configuration file.
Note
The `nomad operator gossip keyring generate` command returns 16 bytes; however, Nomad also supports 32-byte gossip encryption keys. Supplying your own 32-byte key enables AES-256 mode, while a 16-byte key enables AES-128 mode.
Create Nomad configuration file
Create `/etc/nomad.d/nomad.hcl` with the following content or supply your own. The example below includes:
- Cloud Auto-join, which joins servers to the cluster based on tags instead of hard-coded IPs or hostnames.
- Autopilot, which handles automated upgrades.
- Redundancy zones: Nomad uses these values to partition the servers by redundancy zone and keeps one voting server per zone. Extra servers in each zone stay as non-voters on standby and are promoted automatically if the active voter in that availability zone goes offline. We recommend matching the name of the AWS availability zone for simplicity. Change your config based on which AZ the instance resides in.
- TLS, which enables TLS between nodes and for the API. At scale, it is recommended to use HashiCorp Vault; however, for starting out it is acceptable to leverage signed certificates copied to the instance.
data_dir = "/opt/nomad/data"
bind_addr = "0.0.0.0" # CHANGE ME to listen only on the required IPs
server {
enabled = true
bootstrap_expect = 3
server_join {
retry_join = ["provider=aws tag_key=cloud-auto-join tag_value=true"]
}
redundancy_zone = "us-east-1b" # CHANGE ME
license_path = "/etc/nomad.d/license.hclic"
encrypt = "YOUR-GOSSIP-KEY" # Paste your key from the above "Create Gossip Encryption Key" step
}
acl {
enabled = true
}
tls {
http = true
rpc = true
ca_file = "/etc/nomad.d/tls/nomad-agent-ca.pem"
cert_file = "/etc/nomad.d/tls/global-server-nomad.pem"
key_file = "/etc/nomad.d/tls/global-server-nomad-key.pem"
}
autopilot {
cleanup_dead_servers = true
last_contact_threshold = "200ms"
max_trailing_logs = 250
server_stabilization_time = "10s"
enable_redundancy_zones = true
disable_upgrade_migration = false
enable_custom_upgrades = false
}
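The redundancy_zone value must match the availability zone the instance actually runs in. One way to set it without editing each file by hand is to read the AZ from the EC2 instance metadata service (IMDSv2), shown here as a sketch that replaces the placeholder value from the example above:
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
AZ=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/placement/availability-zone)
sudo sed -i "s/us-east-1b/${AZ}/" /etc/nomad.d/nomad.hcl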
Starting Nomad
Set up Nomad service
sudo nano /etc/systemd/system/nomad.service
[Unit]
Description=Nomad
Documentation=https://www.nomadproject.io/docs/
Wants=network-online.target
After=network-online.target
[Service]
# Nomad servers should be run as the nomad user.
# Nomad clients should be run as root
User=root
Group=root
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/bin/nomad agent -config /etc/nomad.d
KillMode=process
KillSignal=SIGINT
LimitNOFILE=65536
LimitNPROC=infinity
Restart=on-failure
RestartSec=2
TasksMax=infinity
OOMScoreAdjust=-1000
[Install]
WantedBy=multi-user.target
Start the Nomad service
sudo systemctl enable nomad
sudo systemctl start nomad
Validate the cluster is bootstrapped and functioning
Ensure that three of the members report `Voter=false`:
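A quick way to check is to list the Raft peers from one of the servers, with the TLS environment variables from the Initialize ACL system section exported (and a management token via NOMAD_TOKEN once ACLs are bootstrapped). Three peers should report Voter=true and three Voter=false:
nomad operator raft list-peers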
If you are running into issues with your cluster bootstrapping or the Nomad agent starting, you can view journal logs to reveal any errors:
sudo systemctl status nomad
sudo journalctl -xe
Typically, issues are due to invalid configuration entries in `nomad.hcl`, certificate problems, or networking issues where the nodes cannot communicate with each other.
Initialize ACL system
Log into one of your servers and run the following commands:
Set environment variables
export NOMAD_ADDR=https://127.0.0.1:4646
export NOMAD_CACERT=/etc/nomad.d/tls/nomad-agent-ca.pem
export NOMAD_CLIENT_CERT=/etc/nomad.d/tls/global-server-nomad.pem
export NOMAD_CLIENT_KEY=/etc/nomad.d/tls/global-server-nomad-key.pem
Bootstrap the ACL system. This only needs to be done on one server. The ACLs sync across server nodes.
nomad acl bootstrap
Save the accessor ID and secret ID, then follow the Identity and Access Management section of the Nomad Operating Guide for more information on ACLs and post-installation tasks.
Interact with your Nomad cluster
export NOMAD_ADDR=https://nomad-load-balancer
export NOMAD_TOKEN=YourSecretID
export NOMAD_CACERT=./nomad-agent-ca.pem
export NOMAD_CLIENT_CERT=./global-cli-nomad.pem
export NOMAD_CLIENT_KEY=./global-cli-nomad-key.pem
nomad -autocomplete-install && complete -C /usr/local/bin/nomad nomad
Nomad clients
This section provides guidance on deploying and managing Nomad clients. Nomad clients are responsible for running tasks and jobs scheduled by the Nomad servers. They register with the server cluster, receive work assignments, and execute tasks.
Preparation
Many of the preparation steps for Nomad clients are similar to those for servers. Refer to the Preparation section for details on the following:
- Creating security groups
- Setting up IAM roles
- Preparing for launching instances
Nomad client instance types
These instance families can be organized within Nomad using node pools, allowing for workload segregation and enabling job authors to target their workload to the most appropriate hardware resources.
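As a sketch of how this works (the pool name is a hypothetical example), a node pool is created once against the cluster, clients are assigned to it in their configuration, and jobs target it:
# Create a node pool spec and apply it (Nomad 1.6 or later)
cat <<'EOF' > gpu-pool.nomad.hcl
node_pool "gpu-workloads" {
  description = "Clients running GPU instance types"
}
EOF
nomad node pool apply gpu-pool.nomad.hcl
# Clients join the pool with node_pool = "gpu-workloads" in their client block,
# and job authors target it with node_pool = "gpu-workloads" at the job level.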
Deployment process
Follow the steps in the preparation and installation process sections of the server deployment, with a few key differences in configuration. Ensure your clients are spread across availability zones.
Client TLS configuration
Nomad clients should use a different certificate than the server nodes. From the same directory as the CA certificate you generated during the server installation steps:
nomad tls cert create -client
This generates the following files:
- `global-client-nomad.pem` (client certificate)
- `global-client-nomad-key.pem` (client private key)
Copy certificates to the client instances:
scp nomad-agent-ca.pem global-client-nomad.pem global-client-nomad-key.pem \
ubuntu@your-vm-ip:/etc/nomad.d/tls/
Remember to keep your private keys secure and implement proper certificate management practices, including regular rotation and secure distribution of certificates to your Nomad clients.
In your `nomad.hcl` file:
data_dir = "/opt/nomad/data"
bind_addr = "0.0.0.0" #CHANGE ME to listen only on the required IPs
client {
enabled = true
server_join {
retry_join = ["provider=aws tag_key=cloud-auto-join tag_value=true"]
}
}
tls {
http = true
rpc = true
ca_file = "/etc/nomad.d/tls/nomad-agent-ca.pem"
cert_file = "/etc/nomad.d/tls/global-client-nomad.pem"
key_file = "/etc/nomad.d/tls/global-client-nomad-key.pem"
}
#...other client configurations...
Post-installation tasks
After installation, perform these tasks to ensure everything is working as expected:
Verify cluster health:
nomad server members #lists all servers
nomad node status #lists all clients
For day two activities such as observability and backups, consult the Nomad Operating Guide.