Nomad version-specific upgrade guides

The upgrading page covers the details of doing a standard upgrade. However, specific versions of Nomad may have more details provided for their upgrades as a result of new features or changed behavior. This page is used to document those details separately from the standard upgrade flow.

Nomad 1.11.2

QEMU driver

The QEMU driver now uses host file paths for filesystem environment variables instead of relative container paths such as /alloc and /local. You may need to update job specs utilizing these variables to reflect the new values.

Nomad 1.11.1

Storage fingerprinting calculation changed

Nomad now calculates the storage available for scheduling using only `totalBytes - client.reserved.disk`. The previous strategy using free disk space could lead to incorrect values when clients with running allocations restarted. The `unique.storage.bytesfree` attribute has also been removed. We recommend that you reserve at least the amount of disk that is used by the host OS.
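
As a sketch of the recommendation above, a client agent configuration might reserve disk for the host OS as follows. The 10240 MB figure is an assumed example value, not guidance from this page; size it to your own hosts.

```hcl
client {
  enabled = true

  reserved {
    # Disk reserved for the host OS, in MB. Assumed example value.
    disk = 10240
  }
}
```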

QEMU driver

In Nomad 1.11.1, emulator and machine_type were added to the task config. These default to the previously used values of qemu-system-x86_64 and pc. Previously, when using the kvm accelerator, the machine type host was forced. This is no longer the case: the value of machine_type is always used. Additionally, when resources.cores was set with the kvm accelerator, the -smp flag was hardcoded to that number of cores. This is now only done if the user has not specified a custom -smp flag.
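
The following task sketch sets both new fields explicitly to the defaults named above and passes a custom -smp flag through args. The task name, image path, and core counts are hypothetical and only illustrate where the fields go.

```hcl
task "vm" {
  driver = "qemu"

  config {
    image_path   = "local/linux.img"    # hypothetical image
    accelerator  = "kvm"
    emulator     = "qemu-system-x86_64" # the default value
    machine_type = "pc"                 # the default value; now honored with kvm
    args         = ["-smp", "4"]        # custom -smp, so Nomad does not add its own
  }

  resources {
    cores  = 2
    memory = 1024
  }
}
```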

Nomad 1.11.0

Sysbatch jobs will no longer accept `reschedule` blocks

In Nomad 1.11.0, submitting a sysbatch job with a reschedule block returns an error instead of being silently ignored, as it was in previous versions. The same behavior applies to system jobs.
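
A minimal sketch of a sysbatch job prepared for this change is shown below. The job, group, and task names and the script path are hypothetical; the commented-out block marks what to remove before upgrading.

```hcl
job "node-cleanup" {
  type = "sysbatch"

  group "cleanup" {
    # A reschedule block here was silently ignored before 1.11.0 and now
    # causes the submission to fail. Remove it:
    #
    # reschedule {
    #   attempts = 1
    # }

    task "cleanup" {
      driver = "raw_exec"

      config {
        command = "/usr/local/bin/cleanup.sh" # hypothetical script
      }
    }
  }
}
```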

Eval broker metrics for dispatch and periodic jobs

The leader records metrics for the eval broker. In Nomad 1.11.0 the job label on the nomad.nomad.broker.wait_time, nomad.nomad.broker.process_time, nomad.nomad.broker.response_time, and nomad.nomad.broker.eval_waiting metrics refers to the parent job ID for dispatch and periodic jobs. The nomad.nomad.broker.eval_waiting metric no longer has an eval_id label. For clusters running high volume dispatch workloads, this change significantly reduces metrics cardinality and memory usage on the leader.

ACL policies no longer silently ignore duplicate or invalid keys

Nomad 1.11.0 introduces stricter validation for ACL policies. Policy writes that include duplicate or invalid keys will be rejected with an error instead of being silently ignored. Any existing policies with duplicate or invalid keys will continue to work, but the source policy document will need to be updated to be valid before it can be written to Nomad.
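
As an illustration, the sketch below assumes that a "duplicate key" means the same attribute set twice in one block. The commented-out form is the kind of document that would now be rejected on write, and the block below it is the cleaned-up version.

```hcl
# Assumed example of a policy with a duplicate key, rejected on write:
#
# namespace "default" {
#   policy = "read"
#   policy = "write"
# }

# Cleaned-up policy with a single value for each key:
namespace "default" {
  policy = "write"
}
```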

Maximum number of allocations per job is limited by default

Nomad 1.11.0 limits the maximum number of allocations for a job to the value of the new job_max_count server configuration option, which defaults to 50000. The number of allocations is determined from the sum of the job's task group count fields. This limit is enforced at the time the job is submitted or scaled, and updating the value will not impact existing jobs.
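
If your largest jobs exceed the default, a server configuration along these lines raises the limit. The value 100000 is an assumed example, not a recommendation.

```hcl
server {
  enabled = true

  # Maximum summed task group count per job. Assumed example value;
  # the default is 50000.
  job_max_count = 100000
}
```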

Deprecated resource fields on Node API

The Resources and Reserved fields on the Go API's Node struct, as well as the equivalent fields on the Read Node API, are deprecated. These fields are never populated. Use the NodeResources and ReservedResources fields instead.

Enterprise product usage reporting
Enterprise

Nomad Enterprise 1.11.0 adds detailed product usage information to automated license utilization reporting.

Nomad 1.10.6

ACL policies no longer silently ignore duplicate or invalid keys

Nomad 1.10.6 introduces stricter validation for ACL policies. Policy writes that include duplicate or invalid keys will be rejected with an error instead of being silently ignored. Any existing policies with duplicate or invalid keys will continue to work, but the source policy document will need to be updated to be valid before it can be written to Nomad.

Enterprise product usage reporting
Enterprise

Nomad Enterprise 1.10.6 adds detailed product usage information to automated license utilization reporting.

Nomad 1.10.2

Clients respect `telemetry.publish_allocation_metrics`

Nomad 1.10.2 fixed a bug where allocation metrics were collected and published even if the telemetry.publish_allocation_metrics configuration field was unset or set to false. If you are monitoring allocation metrics, you will need to ensure your Nomad clients set this field to true.
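
A minimal client telemetry block that keeps allocation metrics flowing after the upgrade might look like this. Any other telemetry settings you already use stay as they are.

```hcl
telemetry {
  # Must now be set explicitly for allocation metrics to be published.
  publish_allocation_metrics = true
}
```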

Nomad 1.10.1

Remove Raft peer by address removed

Nomad 1.4.0 removed support for Raft Protocol v2, and this removed the ability to remove Raft peers by address instead of peer ID. Nomad 1.10.1 removes the non-functional -peer-address option for the operator raft peer-remove command, and the address parameter for the DELETE /v1/operator/raft/peer API.

Agent exit on reloading configuration errors

Errors encountered when reloading agent configuration now cause agents to exit. In prior versions, Nomad only logged configuration errors during reloads. This could lead to agents running but unable to communicate. Any other errors when parsing the new configuration are logged and the reload is aborted, consistent with the current behavior.

Added Server `start_timeout` Configuration Option

Nomad 1.10.1 introduces a new server configuration option named start_timeout with a default value of 30s. This duration is used to monitor the server setup and startup processes which must complete before it is considered healthy, such as keyring decryption. If these processes do not complete before the timeout is reached, the server process will exit and any errors are logged to the console.
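
For servers whose startup steps routinely take longer than the default, the option can be raised in the server block. The 60s value below is an assumed example.

```hcl
server {
  enabled = true

  # Time allowed for setup and startup processes such as keyring
  # decryption. Assumed example value; the default is 30s.
  start_timeout = "60s"
}
```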

Corrected `/v1/acl/token/self` response codes

Nomad 1.10.1 responds with different HTTP response codes to API calls sent to /v1/acl/token/self. For users that do not have ACLs enabled, the endpoint responds with a 200 code and a response body that indicates that ACLs are disabled. Previously, the response code in such a scenario was 404.

For users that do have ACLs enabled and do not have a valid ACL token present, the endpoint responds with a 403 code. Previously, the response code in such a scenario was 404.

Nomad 1.10.0

Quota specification variables_limit deprecated
Enterprise

The quota specification's variables_limit field is deprecated. We replaced it with a new storage block with a variables field, under the region_limit block. Existing quotas will be automatically migrated during server upgrade. We will remove the variables_limit field from the quota specification in Nomad 1.12.0.
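
A quota specification using the new block could be sketched as follows. The quota name, region, and all numeric limits are hypothetical values chosen for illustration.

```hcl
name        = "team-quota" # hypothetical quota name
description = "Limits for the team namespaces"

limit {
  region = "global"

  region_limit {
    cpu    = 2500
    memory = 1000

    # Takes the place of the deprecated variables limit field.
    storage {
      variables = 1000 # assumed example value
    }
  }
}
```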

Nomad 1.8 deprecated `disconnect` fields removed

In Nomad 1.8, we introduced the disconnect block to replace the max_client_disconnect, stop_after_client_disconnect, and prevent_reschedule_on_lost fields. In Nomad 1.10, we removed these fields, and Nomad will ignore them if specified. Jobs should migrate to using the disconnect block prior to upgrading.
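
The migration can be sketched as below. The mapping in the comments reflects our reading of the disconnect block (lost_after for max_client_disconnect, and replace as the inverse of the old reschedule-prevention field); confirm it against the disconnect block reference before relying on it. The group name and durations are hypothetical.

```hcl
group "web" {
  # Before (removed in Nomad 1.10 and ignored if present):
  #   max_client_disconnect = "6h"
  #   plus the reschedule-prevention field set to true

  disconnect {
    lost_after = "6h"  # assumed replacement for max_client_disconnect
    replace    = false # assumed inverse of the old reschedule-prevention field
  }

  # ... tasks
}
```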

Go SDK API change for quota limits

In Nomad 1.10.0, the Go API for quotas has a breaking change. The QuotaSpec.RegionLimit field is now of type QuotaResources instead of Resources. The QuotaSpec.VariablesLimit field is deprecated in favor of QuotaSpec.RegionLimit.Storage.Variables and will be removed in Nomad 1.12.0.

Remote task driver support removed

In Nomad 1.10.0, we removed all support for remote task driver capabilities. Nomad no longer detaches drivers with the RemoteTasks capability when an allocation is lost. Also, Nomad does not detach remote tasks when a node is drained. Workloads running as remote tasks should be migrated prior to upgrading.

Loading binaries from `plugin_dir` without configuration

Plugins stored within the plugin_dir will now only be loaded when they have a corresponding plugin block in the agent configuration file. Nomad now skips any plugin found without a corresponding configuration block.
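
For example, an agent configuration that keeps loading an external driver might look like the sketch below. The directory and the plugin name are hypothetical; the plugin label is expected to match the binary's file name in plugin_dir.

```hcl
plugin_dir = "/opt/nomad/plugins" # hypothetical path

# Without this block, the binary in plugin_dir is skipped.
plugin "nomad-driver-podman" {
  config {
    # driver-specific options, if any
  }
}
```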

Sentinel apply command requires scope
Enterprise

To prevent accidentally adding policies for volumes to the job scope, the nomad sentinel apply command now requires the -scope option. Refer to the GitHub pull request for details.

Affinity and spread updates are non-destructive

We fixed a scheduler bug so that updates to affinity and spread blocks are no longer destructive. After a job update that changes only these blocks, existing allocations remain running with their job version incremented. If you were relying on the previous behavior to redistribute workloads, you can force a destructive update by changing fields that require one, such as the meta block.
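
One way to apply the workaround above is to keep a throwaway key in a meta block and change its value whenever you want allocations replaced. The job name and key name here are hypothetical.

```hcl
job "web" {
  meta {
    # Hypothetical key. Change the value to force a destructive update.
    redeploy_marker = "2"
  }

  # ... rest of jobspec
}
```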

Vault and Consul integration changes

Nomad 1.10.0 removes the previously deprecated token-based authentication workflow for Vault and Consul. Nomad clients must now use a task's workload identity to authenticate to Vault and Consul and obtain a token specific to the task.

This table lists removed Vault fields and the new workflow.

Field	Configuration	New Workflow
`vault.allow_unauthenticated`	Agent	Tasks should use a workload identity. Do not use a Vault token.
`vault.task_token_ttl`	Agent	With workload identity, tasks receive their TTL configuration from the Vault role.
`vault.token`	Agent	Nomad agents use the workload identity when making requests to authenticated endpoints.
`vault.policies`	Job specification	Configure and use a Vault role.

Before upgrading to Nomad 1.10, perform the following tasks:

Configure Vault and Consul to work with workload identity.
Migrate all workloads to use workload identity.
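
On the job specification side, the vault.policies row of the table above translates into something like the following sketch. The task name and role name are hypothetical, and the role must already exist in Vault.

```hcl
task "app" {
  vault {
    # Before: policies = ["app-read"]
    role = "nomad-workloads" # hypothetical Vault role
  }

  # ... rest of task
}
```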

Refer to the following guides for more information:

Nomad 1.9.9

Added Server `start_timeout` Configuration Option

Nomad 1.9.9 introduces a new server configuration option named start_timeout with a default value of 30s. This duration is used to monitor the server setup and startup processes which must complete before it is considered healthy, such as keyring decryption. If these processes do not complete before the timeout is reached, the server process will exit and any errors are logged to the console.

Nomad 1.9.5

CNI plugins

Nomad 1.9.5 includes a bug fix for restoring allocation networking after a client host reboot. This fix requires recent versions of the CNI reference plugins (minimum 1.2.0) and will fall back to the existing behavior if the CNI reference plugins cannot support the fix.

We recommend installing the CNI reference plugins from the CNI project release page rather than your Linux distribution's package manager.

Nomad 1.9.4

Security updates to default deny lists

In Nomad 1.9.4, the default function_denylist includes executeTemplate, as a measure to prevent accidental or malicious infinitely recursive execution. Users that require executeTemplate should update their configuration.
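
A client configuration that re-enables executeTemplate could be sketched as below. It assumes the rest of the default list consists of plugin and writeToFile; check the client template reference for your version before copying it.

```hcl
client {
  template {
    # Leaves executeTemplate out of the deny list so templates may use it.
    # The remaining entries are assumed to match the prior default.
    function_denylist = ["plugin", "writeToFile"]
  }
}
```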

Additionally, the default client env deny list includes more environment variables. Users who need some of these secure environment variables passed to their tasks should consult the list and overwrite it in the configuration.

Nomad 1.9.3

In Nomad 1.9.3, the mechanism used for calculating when objects are eligible for garbage collection changes to a clock-based one. This has two consequences. First, it allows you to set arbitrarily long GC intervals. Second, it requires that Nomad servers are kept roughly in sync time-wise, because GC can originate in a follower.

Nomad 1.9.2 contained a bug that could drop all cluster state on upgrade and has been removed from downloads.

Nomad 1.9.0

Dropped support for older clients

Nomad 1.9.0 removes support for Nomad client agents older than 1.6.0. Older nodes fail heartbeats. Nomad servers mark the workloads on those nodes as lost and reschedule them normally according to the job's reschedule block.

Keyring In Raft

Nomad 1.9.0 stores keys used for signing Workload Identity and encrypting Variables in Raft, instead of storing key material in the external keystore. When using external KMS or Vault transit encryption for the keyring provider, the key encryption key (KEK) is stored outside of Nomad and no cleartext key material exists on disk. When using the default AEAD provider, the key encryption key (KEK) is stored in Raft alongside the encrypted data encryption keys (DEK).

Nomad automatically migrates the key storage for all key material on the first root_key_gc_interval after all servers are upgraded to 1.9.0. The existing on-disk keystore is required to restore servers from older snapshots, so you should continue to back up the on-disk keystore until you no longer need those older snapshots.

Support for HCLv1 removed

Nomad 1.9.0 no longer supports the HCLv1 format for job specifications. Using the -hcl1 option for the job run, job plan, and job validate commands will no longer work.

One common use of -hcl1 was when specifying Docker labels with dots in their keys such as for DataDog autodiscovery:

labels {
  "com.datadoghq.ad.check_names"  = "[\"openmetrics\"]"
  "com.datadoghq.ad.init_configs" = "[{}]"
  # ...
}

Quoted keys are invalid in HCLv2 blocks and must be specified with a list-of-maps syntax:

labels = [
  {
    "com.datadoghq.ad.check_names"  = "[\"openmetrics\"]"
    "com.datadoghq.ad.init_configs" = "[{}]"
    # ...
  }
]

Nomad 1.8.18

ACL policies no longer silently ignore duplicate or invalid keys

Nomad 1.8.18 introduces stricter validation for ACL policies. Policy writes that include duplicate or invalid keys will be rejected with an error instead of being silently ignored. Any existing policies with duplicate or invalid keys will continue to work, but the source policy document will need to be updated to be valid before it can be written to Nomad.

Enterprise product usage reporting
Enterprise

Nomad Enterprise 1.8.18 adds detailed product usage information to automated license utilization reporting.

Nomad 1.8.4

Default Docker `infra_image` changed

Due to the deprecation of the third-party gcr.io registry, the default Docker infra_image is now registry.k8s.io/pause-<arch>:3.3. If you do not override the default, clients using the docker driver will make outbound requests to the new registry.
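
If outbound requests to the new registry are not acceptable, the default can be overridden in the Docker plugin configuration. The mirror host below is hypothetical.

```hcl
plugin "docker" {
  config {
    # Hypothetical internal mirror of the pause image.
    infra_image = "registry.example.internal/pause-amd64:3.3"
  }
}
```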

Nomad 1.8.3

Nomad keyring rotation

In Nomad 1.8.3, the Nomad root keyring will prepublish keys at half the root_key_rotation_threshold and promote them to active once the root_key_rotation_threshold has passed. The nomad operator root keyring rotate command now requires one of two arguments: -prepublish <duration> to prepublish a key or -now to rotate immediately. We recommend using -prepublish to avoid outages from workload identities used to log into external services such as Vault or Consul.

Nomad 1.8.2

New `windows_allow_insecure_container_admin` configuration option for Docker driver

In 1.8.2, Nomad will refuse to run jobs that use the Docker driver on Windows with Process Isolation that run as ContainerAdmin. This is in order to provide a more secure environment for these jobs, and this behavior can be overridden by setting the new windows_allow_insecure_container_admin Docker plugin configuration option to true or by setting privileged=true. We made this change as a result of regressions introduced by mitigations for HCSEC-2024-03.
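
The override described above is set in the Docker plugin block of the client configuration, roughly as follows. Enable it only if you accept the reduced isolation.

```hcl
plugin "docker" {
  config {
    # Allows ContainerAdmin tasks under process isolation again.
    windows_allow_insecure_container_admin = true
  }
}
```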

New default isolation mode for Docker on Windows

Nomad 1.8.2 changes the default isolation mode for Docker tasks on Windows from process to hyperv, since hyperv provides a much more secure execution environment. We made this change as a result of regressions introduced by mitigations for HCSEC-2024-03.
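
Tasks that must keep the previous mode can request it explicitly in the task configuration, as in this sketch. The task name and image are hypothetical.

```hcl
task "app" {
  driver = "docker"

  config {
    image     = "mcr.microsoft.com/windows/nanoserver:ltsc2022" # hypothetical image
    isolation = "process" # the pre-1.8.2 default
  }
}
```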

Nomad 1.8.1

Enterprise

Nomad Enterprise 1.8.1 includes an updated version of the Sentinel library. Users that have built custom Sentinel plugins must recompile them using an SDK supporting Sentinel Plugin Protocol Version 3. Consult the Sentinel SDK Compatibility Matrix for appropriate Sentinel SDK versions.

Nomad 1.8.0

Deprecated Disconnect Fields

Nomad 1.8.0 introduces a disconnect block that groups all configuration options related to client and server behavior when a client disconnects. As a result, the stop_after_client_disconnect, max_client_disconnect, and prevent_reschedule_on_lost fields are deprecated. The block also introduces new options for reconciling allocations if the client regains connectivity.

CNI Constraints

In Nomad 1.8.0, jobs with bridge networking will have constraints added during job submit that require CNI plugins to be present on the node. Nodes have fingerprinted the available CNI plugins starting in Nomad 1.5.0.

If you are upgrading from Nomad 1.5.0 or later to 1.8.0 or later, there's nothing additional for you to do. We do not recommend skipping more than two versions of Nomad. If you do upgrade from a version earlier than 1.5.0 to 1.8.0 or later, ensure that clients have been upgraded before submitting any jobs that use bridge networking.

Removal of `raw_exec` option `no_cgroups`

In Nomad 1.7.0 the raw_exec plugin option no_cgroups became ineffective. Starting in Nomad 1.8.0, attempting to set no_cgroups in the raw_exec plugin configuration will result in an error when starting the agent.

Nomad 1.7.11

Enterprise

Nomad keyring rotation

In Nomad 1.7.11, the Nomad root keyring will prepublish keys at half the root_key_rotation_threshold and promote them to active once the root_key_rotation_threshold has passed. The nomad operator root keyring rotate command now requires one of two arguments: -prepublish <duration> to prepublish a key or -now to rotate immediately. We recommend using -prepublish to avoid outages from workload identities used to log into external services such as Vault or Consul.

Nomad 1.7.10

Enterprise

New `windows_allow_insecure_container_admin` configuration option for Docker driver

In 1.7.10, Nomad will refuse to run jobs that use the Docker driver on Windows with Process Isolation that run as ContainerAdmin. This is in order to provide a more secure environment for these jobs, and this behavior can be overridden by setting the new windows_allow_insecure_container_admin Docker plugin configuration option to true or by setting privileged=true.

New default isolation mode for Docker on Windows

Nomad 1.7.10 changes the default isolation mode for Docker tasks on Windows from process to hyperv, since hyperv provides a much more secure execution environment.

Nomad 1.7.2

Nomad 1.7.2 fixes a critical bug in CPU fingerprinting in Nomad 1.7.0 and 1.7.1. You should not install Nomad 1.7.0 or 1.7.1 and instead install the latest Nomad 1.7.x version.

Nomad 1.7.0

Warning

Nomad 1.7.0 contains a critical bug in keyring replication. You should not install Nomad 1.7.0 and instead install the latest Nomad 1.7.x version.

Keyring Replication Failure After Leader Election

Nomad 1.7.0 introduced new RSA keys to the keyring for use in signing workload identities. These keys were not correctly replicated from leader to followers. This results in all workload identity verification failing after a leader election.

This bug was fixed in Nomad 1.7.1.

Vault Integration Changes

Starting in Nomad 1.7, Nomad clients will use a task's Workload Identity to authenticate to Vault and obtain a Vault token specific to the task.

The existing workflow using a Vault token provided in either the agent configuration or at the time of job submission is deprecated and will be removed in Nomad 1.10. The vault.policies field is also deprecated and will work only with the existing workflow. Instead, you should configure a suitable Vault role and use that.

The following agent configuration fields are deprecated:

vault.allow_unauthenticated will be removed in Nomad 1.10. Tasks will use the workload identity without the user supplying a Vault token.
vault.task_token_ttl will be removed in Nomad 1.10. With workload identity, tasks will receive their TTL configuration from the Vault role.
vault.token will be removed in Nomad 1.10. Nomad agents will no longer make requests to authenticated endpoints except with a task's workload identity.

Before upgrading to Nomad 1.10 you will need to have configured authentication with Vault to work with workload identity. Refer to Migrating to Using Workload Identity with Vault for more details.

Consul Integration Changes

Starting in Nomad 1.7, Nomad clients will use a service's or task's Workload Identity to authenticate to Consul and obtain a Consul token specific to the workload.

The existing workflow using a Consul token provided in either the agent configuration or at the time of job submission is deprecated and will be removed in Nomad 1.10. The consul.allow_unauthenticated agent configuration field will be removed in Nomad 1.10. Tasks will use the workload identity without the user supplying a Consul token.

Before upgrading to Nomad 1.10 you will need to have configured authentication with Consul to work with workload identity. See Migrating to Using Workload Identity with Consul for more details.

RS256 JWT Signing Algorithm Support

Prior to Nomad 1.7, workload identity JWTs were signed with the EdDSA algorithm. While EdDSA has numerous advantages as a signing algorithm, most third parties that accept JWTs expect the RS256 signing algorithm to be used.

Therefore starting in Nomad 1.7 new signing keys will generate an RSA key and sign workload identities with the RS256 signing algorithm.

Before setting up third party authentication methods to use workload identities, it is recommended to run nomad operator root keyring rotate to ensure you generate a new RSA key.

To verify an RSA key is present you may check the /.well-known/jwks.json endpoint on any Nomad agent. If you see "kty": "RSA", then an RSA key exists and you do not need to rotate keys.

New Nomad clusters will use RSA by default and are not affected.

CPU Fingerprinting Changes

Starting in Nomad 1.7, Nomad clients improve the accuracy of detected CPU performance metrics. The fingerprinter now takes into account heterogeneous core types on applicable processors. In addition, Nomad will attempt to detect and use the base frequency of the processor rather than the turbo frequency when calculating the total available CPU bandwidth. The net result of these behaviors is that the calculated total CPU bandwidth available on a node may change when upgrading to Nomad 1.7. Operators are encouraged to ensure planned capacity meets expectations before upgrading. The CPU concepts documentation contains guidance in understanding how Nomad detects CPU metrics.

CPU EC2 Detection Changes

Prior to Nomad 1.7, Nomad clients embedded a large lookup table of CPU performance data for every EC2 instance type. In 1.7 and later Nomad instead gathers this data by executing the dmidecode command. The dmidecode package must be installed manually on some Linux distributions before the Nomad agent is started.

CPU Core Isolation

Starting in Nomad 1.7, Nomad tasks that specify CPU resources using the cores attribute will be restricted to using only the CPU cores assigned to them. In previous versions of Nomad these tasks could also make use of other non-reserved CPU cores. However this feature would cause severe performance problems for the Linux kernel as the number of tasks increased. Operators are encouraged to ensure tasks making use of the cores attribute are given sufficient CPU resources before upgrading.

The `distinct_hosts` Constraint Now Honors Namespaces

Nomad 1.7.0 changes the behavior of the distinct_hosts constraint such that namespaces are taken into account when choosing feasible clients for allocation placement. Previously, any job with the same name running on a client caused that node to be considered infeasible, regardless of the job's namespace.

This change allows workloads that formerly did not colocate to be scheduled onto the same client when they are in different namespaces. To prevent this, consider using node pools and constrain the jobs with a distinct_property constraint over ${node.pool}.

Loading Binaries from `plugin_dir` Without Configuration

Starting with Nomad 1.7.0, loading plugins that are not referenced in the agent configuration file is deprecated. Future versions of Nomad will only load plugins that have a corresponding plugin block in the agent configuration file.

Changes to `raw_exec`

The raw_exec task driver now enforces memory limits via cgroups on Linux platforms similar to the exec and docker task drivers. The driver does support memory oversubscription, which can be configured in such a way to nearly replicate the previously unlimited behavior.
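
A task that approximates the old unlimited behavior might be sketched as follows. This assumes memory oversubscription is enabled in the cluster's scheduler configuration; the task name, binary path, and memory figures are hypothetical.

```hcl
task "legacy" {
  driver = "raw_exec"

  config {
    command = "/usr/local/bin/legacy-app" # hypothetical binary
  }

  resources {
    memory     = 256   # MB reserved for scheduling
    memory_max = 16384 # MB the task may grow to; assumed example value
  }
}
```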

The no_cgroups configuration option no longer has any effect. Previously, setting no_cgroups would disable the mechanism where Nomad used the freezer cgroup to halt the process group of a Task before issuing a kill signal to each process. Starting in Nomad 1.7.0 this behavior is always enabled (and a similar mechanism has always been enabled on cgroups v2 systems).

Nomad 1.6.14

Enterprise

Nomad keyring rotation

In Nomad 1.6.14, the Nomad root keyring will prepublish keys at half the root_key_rotation_threshold and promote them to active once the root_key_rotation_threshold has passed. The nomad operator root keyring rotate command now requires one of two arguments: -prepublish <duration> to prepublish a key or -now to rotate immediately. We recommend using -prepublish to avoid outages from workload identities used to log into external services such as Vault or Consul.

Nomad 1.6.13

Enterprise

New `windows_allow_insecure_container_admin` configuration option for Docker driver

In 1.6.13, Nomad will refuse to run jobs that use the Docker driver on Windows with Process Isolation that run as ContainerAdmin. This is in order to provide a more secure environment for these jobs, and this behavior can be overridden by setting the new windows_allow_insecure_container_admin Docker plugin configuration option to true or by setting privileged=true.

New default isolation mode for Docker on Windows

Nomad 1.6.13 changes the default isolation mode for Docker tasks on Windows from process to hyperv, since hyperv provides a much more secure execution environment.

Nomad 1.6.0

Enterprise License Validation with BuildDate

Nomad Enterprise 1.6.0 now compares license ExpirationTime with the Nomad binary's BuildDate, rather than comparing the sometimes more lenient license TerminationTime with time.Now(). See the licensing FAQ for more information. Most importantly, run the new nomad license inspect command before upgrading your Enterprise servers to v1.6.0 or higher.

Job Evaluate API Endpoint Requires `submit-job` Instead of `read-job`

Nomad 1.6.0 updated the ACL capability requirement for the job evaluate endpoint from read-job to submit-job to better reflect that this operation writes state to Nomad. This endpoint is used by the nomad job eval CLI command and so the ACL requirements changed for the command as well. Users that called this endpoint or used this command using tokens with just the read-job capability or the read policy must update their tokens to use the submit-job capability or the write policy.

Exec Driver Requires New Capability for mlock

Nomad 1.6.0 updated the exec task driver to maintain the max memory locked limit set by the host system. In earlier versions of Nomad this limit was unset unintentionally.

In practice this means that exec tasks such as Vault which use the mlock system call will now need to explicitly add the ipc_lock capability.

First allow the ipc_lock capability in the Client configuration:

plugin "exec" {
  config {
    allow_caps = ["audit_write", "chown", "dac_override", "fowner", "fsetid",
      "kill", "mknod", "net_bind_service", "setfcap", "setgid", "setpcap",
      "setuid", "sys_chroot", "ipc_lock"]
  }
}

Then add the ipc_lock capability to the exec task that uses mlock:

task "vault" {
  driver = "exec"

  config {
    cap_add = ["ipc_lock"]

    # ... other task configuration
  }

}

# ... rest of jobspec

These additions are backward compatible with Nomad v1.5, so Clients and Jobs should be updated prior to upgrading to Nomad v1.6.

See #17780 for details.

Namespace ACL policies require a label

Nomad 1.6.0 does not allow ACL policies for namespaces without a label. Prior to this version, ACL policies for namespaces were allowed to be defined without a label, and the documented behavior in this case was that the policy would be applied to the default namespace.

A bug in this logic caused the policy to be incorrectly applied to a different namespace. For example, the policy below would be applied to a namespace called policy instead of default.

namespace {
  policy = "read"
}

To avoid further confusion and potential security incidents, this functionality was removed and now all namespace policies are required to have a label.

Tokens currently attached to an invalid policy will stop working after the upgrade, so you should fix invalid policies to have an explicit namespace label before upgrading Nomad.

After the policies are fixed, the existing tokens with those policies will continue to work and do not need to be regenerated.
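
For the example policy shown earlier in this section, the fix is to add the intended namespace as the block label. This sketch assumes the policy was meant for the default namespace.

```hcl
# The explicit label states which namespace the rule applies to.
namespace "default" {
  policy = "read"
}
```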

Command `nomad tls cert create` flag `-cluster-region` deprecated

Nomad 1.6.0 deprecates the -cluster-region flag of the nomad tls cert create command in favor of the standard -region flag. The -cluster-region flag will be removed in Nomad 1.7.0.

32-bit Intel Builds Deprecated

Starting with Nomad 1.6.0, HashiCorp will no longer release 32-bit Intel builds of Nomad and Nomad Enterprise (the builds named windows_386 and linux_386). Bug fixes will continue to be backported to the 1.5.x and 1.4.x versions so long as those major versions are still supported.

The 32-bit ARM build (linux_arm for the armhf architecture) is deprecated and may be removed in a future major version of Nomad. The 32-bit ARM build is not tested and may include bugs around platform-specific integer sizes. Using 64-bit builds for small form-factor hosts such as the RaspberryPi is strongly recommended.

Nomad 1.5.7, 1.4.11

Namespace ACL policies require a label

Nomad 1.5.7 and 1.4.11 do not allow ACL policies for namespaces without a label. Prior to these versions, ACL policies for namespaces were allowed to be defined without a label, and the documented behavior in this case was that the policy would be applied to the default namespace.

A bug in this logic caused the policy to be incorrectly applied to a different namespace. For example, the policy below would be applied to a namespace called policy instead of default.

namespace {
  policy = "read"
}

To avoid further confusion and potential security incidents, this functionality was removed and now all namespace policies are required to have a label.

Tokens currently attached to an invalid policy will stop working after the upgrade, so you should fix invalid policies to have an explicit namespace label before upgrading Nomad.

After the policies are fixed, the existing tokens with those policies will continue to work and do not need to be regenerated.

Nomad 1.5.5

Nomad 1.5.5 fixed a bug where allocations that are rescheduled for jobs registered before the upgrade would no longer collect allocation logs. The logs.enabled field introduced in 1.5.4 is now deprecated and has been replaced by a logs.disabled field that defaults to false. The logs.enabled field value will be ignored in 1.5.5 and will be removed in Nomad 1.6.0.

Nomad 1.5.4

Nomad 1.5.4 included a bug where allocations that are rescheduled for jobs registered before the upgrade would no longer collect allocation logs. The client will emit debug-level logs like the following:

client.alloc_runner.task_runner.task_hook: log collection is disabled by task

You should avoid this version of Nomad and instead install the latest version of Nomad 1.5. If you have already upgraded to Nomad 1.5.4, upgrading to Nomad 1.5.5 will restore logging collection when clients are restarted as part of the upgrade process.

Nomad 1.5.1

Artifact Download Regression Fix

Nomad 1.5.1 reverts a behavior of 1.5.0 where artifact downloads were executed as the nobody user on compatible Linux systems. This was done optimistically as defense against compromised artifact endpoints attempting to exploit the Nomad Client or tools it uses to perform downloads such as git or mercurial. Unfortunately, running the child process as any user other than root is incompatible with Nomad's security hardening guide, which calls for a specific directory tree structure that makes such operation impossible.

Other changes to artifact downloading remain: downloads are executed as a child process of the Nomad agent, and on modern Linux systems they make use of the kernel's landlock feature to restrict filesystem access from that process.

Nomad 1.5.0

Pause Container Reconciliation Regression

Nomad 1.5.0 introduced a regression in the way the Docker driver reconciles dangling containers. As a result, pause containers are erroneously removed even though the allocation is still running. This does not affect the running allocation, but causes it to fail if it needs to restart. An immediate workaround is to disable dangling container reconciliation.
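
A sketch of that workaround in the client agent configuration, assuming the Docker driver is configured through a plugin block, could look like this:

```hcl
plugin "docker" {
  config {
    gc {
      dangling_containers {
        # Turn off reconciliation of dangling containers so that
        # pause containers are not removed while the allocation runs.
        enabled = false
      }
    }
  }
}
```

Consider re-enabling reconciliation after upgrading to a release that contains the fix, since leaving it off means dangling containers are no longer cleaned up.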

Artifact Download Sandboxing

Nomad 1.5.0 changes the way artifacts are downloaded when specifying an artifact in a task configuration. Previously the Nomad Client would download artifacts in-process. External commands used to facilitate the download (e.g. git, hg) would be run as root, and the resulting payload would be owned by root in the allocation's task directory.

In an effort to improve the resilience and security model of the Nomad Client, in 1.5.0 artifact downloads occur in a sub-process. Where possible, that sub-process is run as the nobody user, and on modern Linux systems will be isolated from the filesystem via the kernel's landlock capability.

Operators are encouraged to ensure jobs making use of artifacts continue to work as expected. In particular, git-ssh users will need to make sure the system-wide /etc/ssh/ssh_known_hosts file is populated with any necessary remote hosts. Previously, Nomad's documentation suggested configuring /root/.ssh/known_hosts which would apply only to the root user.

The artifact downloader no longer inherits all environment variables available to the Nomad Client. The downloader sub-process environment is set as follows on Linux / macOS:

PATH=/usr/local/bin:/usr/bin:/bin
TMPDIR=<path to task dir>/tmp

and as follows on Windows:

TMP=<path to task dir>\tmp
TEMP=<path to task dir>\tmp
PATH=<inherit $PATH>
HOMEPATH=<inherit $HOMEPATH>
HOMEDRIVE=<inherit $HOMEDRIVE>
USERPROFILE=<inherit $USERPROFILE>

Configuration of the artifact downloader should happen through the options and headers fields of the artifact block. For backwards compatibility, the sandbox can be configured to inherit specified environment variables from the Nomad client by setting set_environment_variables.
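
As an illustration of per-artifact configuration, the fragment below sets one option and one header on an artifact block. The source URL, destination, and header name and value are hypothetical placeholders chosen for the example.

```hcl
artifact {
  source      = "https://example.com/app.tar.gz" # placeholder URL
  destination = "local/"

  options {
    # Keep the downloaded file as-is instead of unpacking it.
    archive = false
  }

  headers {
    # Hypothetical header sent with the HTTP request.
    X-Example-Token = "placeholder-value"
  }
}
```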

The use of filesystem isolation can be disabled in Client configuration by setting disable_filesystem_isolation.
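
The client-side settings mentioned above can be sketched as follows. The variable names in the list are assumptions picked for illustration; list only the variables your downloads actually need.

```hcl
client {
  artifact {
    # Comma-separated list of variables the sandbox inherits from
    # the Nomad client's environment (names here are examples).
    set_environment_variables = "HTTP_PROXY,HTTPS_PROXY"

    # Set to true only if filesystem isolation must be turned off.
    disable_filesystem_isolation = false
  }
}
```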

Artifact Decompression Limits

Nomad 1.5.0 now sets default limits around artifact decompression. A single artifact payload is now limited to 100GB and 4096 files when decompressed. An artifact that exceeds these limits during decompression will cause the artifact downloader to fail. These limits can be adjusted or disabled in the client artifact configuration by setting decompression_size_limit and decompression_file_count_limit.
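
For example, a client that needs to unpack larger archives might raise both limits. The values below are arbitrary and only illustrate the shape of the setting.

```hcl
client {
  artifact {
    # Defaults in 1.5.0 are "100GB" and 4096; these values are examples.
    decompression_size_limit       = "200GB"
    decompression_file_count_limit = 8192
  }
}
```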

Datacenter Wildcards

In Nomad 1.5.0, the datacenters field for a job accepts wildcards for multi-character matching. For example, datacenters = ["dc*"] will match all datacenters that start with "dc". The default value for datacenters is now ["*"], so the field can be omitted.
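
A minimal job sketch using the wildcard form follows. The job, group, and task names and the image are hypothetical.

```hcl
job "example" {
  # Matches every datacenter whose name starts with "dc".
  # Omitting the field is equivalent to datacenters = ["*"].
  datacenters = ["dc*"]

  group "app" {
    task "server" {
      driver = "docker"

      config {
        image = "example/image:1.0" # placeholder image name
      }
    }
  }
}
```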

The * character is no longer a legal character in the datacenter field for an agent configuration. Before upgrading to Nomad 1.5.0, you should first ensure that you've updated any jobs that currently have a * in their datacenter name and then ensure that no agents have this character in their datacenter field name.

Server `rejoin_after_leave` (default: `false`) now enforced

All Nomad versions prior to v1.5.0 have incorrectly ignored the Server rejoin_after_leave configuration option. This bug has been fixed in Nomad version v1.5.0.

Prior to v1.5.0, Nomad always behaved as if rejoin_after_leave were true, regardless of the server configuration, while the documentation incorrectly indicated a default of false.

Cluster operators should be aware that explicit leave events (such as nomad server force-leave) will now result in behavior which matches this configuration, and should review whether they were inadvertently relying on the buggy behavior.
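
If a cluster does depend on the old behavior, one way to keep it is to set the option explicitly in the server configuration, as in this sketch:

```hcl
server {
  enabled = true

  # Explicitly opt in to rejoining after a leave event.
  # The default (false) is enforced starting in v1.5.0.
  rejoin_after_leave = true
}
```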

Changes to eval broker metrics

The metric nomad.nomad.broker.total_blocked has been renamed to nomad.nomad.broker.total_pending. The metric refers to an internal state of the leader's broker, and the old name was easily confused with the unrelated evaluation status "blocked" in the Nomad API.

Deprecated gossip keyring commands removed

The commands nomad operator keyring, nomad keyring, nomad operator keygen, and nomad keygen used to manage the gossip keyring were marked as deprecated in Nomad 1.4.0. In Nomad 1.5.0, these commands have been removed. Use the nomad operator gossip keyring commands to manage the gossip keyring.

Garbage collection of evaluations and allocations for batch jobs

Versions prior to 1.5.0 only delete evaluations and allocations of batch jobs that are explicitly stopped, which can lead to unbounded memory growth of Nomad when a batch job is executed multiple times.

Nomad 1.5.0 introduces a new server configuration batch_eval_gc_threshold to control how allocations and evaluations for batch jobs are collected.

The default threshold is 24h. If you need to access completed allocations for batch jobs that are older than 24h you must increase this value when upgrading Nomad.
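
A sketch of raising the threshold in the server configuration is shown below. The 72h value is an arbitrary example; choose a duration that matches how long you need completed batch allocations to stay available.

```hcl
server {
  enabled = true

  # Keep evaluations and allocations of batch jobs for three days
  # before they become eligible for garbage collection (default: 24h).
  batch_eval_gc_threshold = "72h"
}
```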

