Nomad version-specific upgrade guides
The upgrading page covers the details of doing a standard upgrade. However, some versions of Nomad have additional upgrade considerations because of new features or changed behavior. This page documents those details separately from the standard upgrade flow.
Nomad 1.11.2
QEMU driver
The QEMU driver now uses host file paths for filesystem environment variables
instead of relative container paths such as /alloc and /local. You may need
to update job specifications that use these variables to reflect the new values.
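The following fragment sketches the kind of QEMU task that this change can affect. The task name, image, and ISO path are hypothetical. The sketch assumes that the task's args interpolate ${NOMAD_TASK_DIR}. Starting in 1.11.2, that variable expands to a host path rather than /local, so any argument that expects the old relative value may need adjusting.

```hcl
task "vm" {
  driver = "qemu"

  config {
    image_path  = "local/image.qcow2"
    accelerator = "kvm"

    # Hypothetical argument: in Nomad 1.11.2 and later, ${NOMAD_TASK_DIR}
    # expands to the task directory's host path instead of "/local".
    args = ["-drive", "file=${NOMAD_TASK_DIR}/seed.iso,media=cdrom"]
  }
}
```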
Nomad 1.11.1
Storage fingerprinting calculation changed
Nomad now calculates the storage available for scheduling using only totalBytes - client.reserved.disk. The previous strategy using free disk space could lead to incorrect values when clients with running allocations restarted. The unique.storage.bytesfree attribute has also been removed. We recommend that you reserve at least the amount of disk that is used by the host OS.
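The following client configuration fragment reserves disk for the host OS. The value is only an example, so size it from your hosts' actual usage. The disk value is in MB.

```hcl
client {
  enabled = true

  reserved {
    # Disk withheld from scheduling for the host OS and other non-Nomad use,
    # in MB. Example value only.
    disk = 20480
  }
}
```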
QEMU driver
In Nomad 1.11.1, the emulator and machine_type options were added to the task config.
These default to the previously used values, qemu-system-x86_64 and pc, respectively.
Previously, when using the kvm accelerator, Nomad forced the machine type host.
This is no longer the case; Nomad now uses the value of machine_type. Additionally,
when a task used resources.cores with the kvm accelerator, Nomad hardcoded the -smp
flag to that number of cores. Nomad now does this only if you have not specified a
custom -smp flag.
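The following task fragment shows the new options set explicitly to their defaults. The task name, image path, and resource values are illustrative. If you previously relied on the forced host machine type under kvm, you may need to set machine_type yourself.

```hcl
task "vm" {
  driver = "qemu"

  config {
    image_path   = "local/image.qcow2"
    accelerator  = "kvm"

    # Defaults shown explicitly. With kvm, machine_type is no longer
    # overridden by Nomad.
    emulator     = "qemu-system-x86_64"
    machine_type = "pc"
  }

  resources {
    # With no custom -smp flag in args, Nomad passes -smp with this count.
    cores  = 2
    memory = 2048
  }
}
```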
Nomad 1.11.0
Sysbatch jobs will no longer accept reschedule blocks
In Nomad 1.11.0, submitting a sysbatch job with a reschedule block returns
an error. Previous versions silently ignored the block. The same behavior
applies to system jobs.
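The following hypothetical sysbatch job is one that Nomad 1.11.0 rejects at submission. The job name, group name, and command are placeholders. Deleting the reschedule block is enough to make it valid again.

```hcl
job "node-cleanup" {
  type = "sysbatch"

  group "cleanup" {
    # Rejected in Nomad 1.11.0 and later for sysbatch and system jobs.
    # Remove this block before submitting the job.
    reschedule {
      attempts  = 0
      unlimited = false
    }

    task "cleanup" {
      driver = "raw_exec"

      config {
        command = "/usr/local/bin/cleanup.sh"
      }
    }
  }
}
```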
Eval broker metrics for dispatch and periodic jobs
The leader records metrics for the eval broker. In Nomad 1.11.0 the job label
on the nomad.nomad.broker.wait_time, nomad.nomad.broker.process_time,
nomad.nomad.broker.response_time, and nomad.nomad.broker.eval_waiting
metrics refers to the parent job ID for dispatch and periodic jobs. The
nomad.nomad.broker.eval_waiting metric no longer has an eval_id label. For clusters
running high volume dispatch workloads, this change significantly reduces
metrics cardinality and memory usage on the leader.
ACL policies no longer silently ignore duplicate or invalid keys
Nomad 1.11.0 introduces stricter validation for ACL policies. Policy writes that include duplicate or invalid keys will be rejected with an error instead of being silently ignored. Any existing policies with duplicate or invalid keys will continue to work, but the source policy document will need to be updated to be valid before it can be written to Nomad.
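The following hypothetical policy document contains both kinds of problems that the stricter validation targets. A repeated key is one problem, and a misspelled key is the other. Nomad 1.11.0 and later reject a write of this document. The valid version keeps a single policy line and spells capabilities correctly.

```hcl
namespace "default" {
  policy = "read"

  # Duplicate key: rejected on write.
  policy = "write"

  # Invalid key (misspelling of "capabilities"): rejected on write.
  capabilites = ["submit-job"]
}
```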
Maximum number of allocations per job is limited by default
Nomad 1.11.0 limits the maximum number of allocations for a job to the value of
the new job_max_count server
configuration option, which defaults to 50000. The number of allocations is
determined from the sum of the job's task group count fields. This limit is
enforced at the time the job is submitted or scaled, and updating the value will
not impact existing jobs.
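The following server configuration fragment raises the limit. The value is only an example. As a worked example of how the limit is counted, a job with one group at count = 3 and another at count = 5 counts as 8 allocations against job_max_count.

```hcl
server {
  enabled = true

  # Maximum sum of task group count values for one job (default 50000).
  # Example value only.
  job_max_count = 100000
}
```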
Deprecated resource fields on Node API
The Resources and Reserved fields on the Go API's Node struct, as well as
the equivalent fields on the Read Node API,
are deprecated. These fields are never populated. Use the NodeResources and
ReservedResources fields instead.
Enterprise product usage reporting
Enterprise
Nomad Enterprise 1.11.0 adds detailed product usage information to automated license utilization reporting.
Nomad 1.10.6
ACL policies no longer silently ignore duplicate or invalid keys
Nomad 1.10.6 introduces stricter validation for ACL policies. Policy writes that include duplicate or invalid keys will be rejected with an error instead of being silently ignored. Any existing policies with duplicate or invalid keys will continue to work, but the source policy document will need to be updated to be valid before it can be written to Nomad.
Enterprise product usage reporting
Enterprise
Nomad Enterprise 1.10.6 adds detailed product usage information to automated license utilization reporting.
Nomad 1.10.2
Clients respect telemetry.publish_allocation_metrics
Nomad 1.10.2 fixed a bug where allocation metrics were collected and published
even if the
telemetry.publish_allocation_metrics
configuration field was unset or set to false. If you are monitoring
allocation metrics, you will need to ensure your Nomad clients set this field to
true.
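The following agent telemetry fragment keeps allocation metrics flowing after the fix. The two extra settings are optional, and it is an assumption that you also want node metrics and the Prometheus endpoint.

```hcl
telemetry {
  # Required on clients in Nomad 1.10.2 and later to publish allocation metrics.
  publish_allocation_metrics = true

  # Optional settings shown for context.
  publish_node_metrics       = true
  prometheus_metrics         = true
}
```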
Nomad 1.10.1
Raft peer removal by address removed
Nomad 1.4.0 removed support for Raft Protocol v2, which also removed the ability
to remove Raft peers by address instead of by peer ID. Nomad 1.10.1 removes the
non-functional -peer-address option for the operator raft
remove-peer command, and the
address parameter for the DELETE /v1/operator/raft/peer API.
Agent exit on reloading configuration errors
Errors encountered when reloading agent configuration now cause agents to exit. In prior versions, Nomad only logged configuration errors during reloads. This could lead to agents running but unable to communicate. Any other errors when parsing the new configuration are logged and the reload is aborted, consistent with the current behavior.
Added Server start_timeout Configuration Option
Nomad 1.10.1 introduces a new server configuration option named start_timeout
with a default value of 30s. This duration bounds the server setup and startup
processes, such as keyring decryption, that must complete before the server is
considered healthy. If these processes do not complete before the timeout is
reached, the server process exits and logs any errors to the console.
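The following fragment sketches how you might lengthen the timeout for servers whose keyring decryption is slow, for example when it depends on an external KMS. The sketch assumes that the option sits in the server block, and the value is illustrative.

```hcl
server {
  enabled = true

  # Setup steps such as keyring decryption must finish within this window,
  # or the server process exits. Defaults to 30s. Example value only.
  start_timeout = "2m"
}
```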
Corrected /v1/acl/token/self response codes
Nomad 1.10.1 responds with different HTTP response codes to API calls sent to
/v1/acl/token/self. For users that do not have ACLs enabled, the endpoint
responds with a 200 status code and a response body that indicates that ACLs are
disabled. Previously, the response code in such a scenario was 404.
For users that do have ACLs enabled and do not have a valid ACL token present, the endpoint responds with a 403 status code. Previously, the response code in such a scenario was 404.
Nomad 1.10.0
Quota specification variables_limit deprecated
Enterprise
The quota specification's variables_limit field is deprecated. We replaced it
with a new storage block, nested under the region_limit block, that has a
variables field. Existing quotas will
be automatically migrated during server upgrade. We will remove the
variables_limit field from the quota specification in Nomad 1.12.0.
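The following quota specification sketches the new layout. The quota name, region, and limit values are illustrative only. The sketch assumes the storage block has the shape described above, with variables nested inside region_limit.

```hcl
name        = "default-quota"
description = "Limit the shared default namespace"

limit {
  region = "global"

  region_limit {
    cpu    = 2500
    memory = 1000

    # Replaces the deprecated top-level variables_limit field.
    storage {
      variables = 1000
    }
  }
}
```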
Nomad 1.8 deprecated disconnect fields removed
In Nomad 1.8, we introduced the disconnect block to replace the
max_client_disconnect, stop_after_client_disconnect, and
prevent_reschedule_on_lost fields. In Nomad 1.10, we removed these fields, and
Nomad will ignore them if specified. Jobs should migrate to using the
disconnect block prior to upgrading.
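The following group fragment sketches a migration. The assumed mapping is that max_client_disconnect becomes lost_after, and that prevent_reschedule_on_lost = true becomes replace = false. The durations and the group name are illustrative, and tasks are omitted.

```hcl
group "web" {
  # Deprecated in 1.8 and ignored since 1.10:
  #   max_client_disconnect      = "1h"
  #   prevent_reschedule_on_lost = true

  disconnect {
    # Replaces max_client_disconnect.
    lost_after = "1h"

    # Equivalent of prevent_reschedule_on_lost = true.
    replace = false
  }

  # ... tasks
}
```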
Go SDK API change for quota limits
In Nomad 1.10.0, the Go API for quotas has a breaking change. The
QuotaSpec.RegionLimit field is now of type QuotaResources instead of
Resources. The QuotaSpec.VariablesLimit field is deprecated in favor of
QuotaSpec.RegionLimit.Storage.Variables and will be removed in Nomad 1.12.0.
Remote task driver support removed
In Nomad 1.10.0, we removed all support for remote task driver
capabilities. Nomad no longer detaches drivers with the RemoteTasks capability
when an allocation is lost. Also, Nomad does not detach remote tasks
when a node is drained. Workloads running as remote tasks should be migrated
prior to upgrading.
Loading binaries from plugin_dir without configuration
Plugins stored within the plugin_dir
will now only be loaded when they have a corresponding
plugin block in the agent configuration
file. Nomad now skips any plugin found without a corresponding configuration block.
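The following agent configuration fragment keeps a binary in plugin_dir loadable. The plugin name example-driver is hypothetical. The sketch assumes that the block label must match the binary's file name in plugin_dir.

```hcl
plugin_dir = "/opt/nomad/plugins"

# Without this block, Nomad 1.10.0 and later skip
# /opt/nomad/plugins/example-driver.
plugin "example-driver" {
  config {
    # Driver-specific settings go here (hypothetical placeholder).
  }
}
```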
Sentinel apply command requires scope
Enterprise
To prevent accidentally adding policies for volumes to the job scope, the nomad sentinel apply command now
requires the -scope option. Refer to the GitHub pull
request for details.
Affinity and spread updates are non-destructive
We fixed a scheduler bug so that updates to affinity and
spread blocks are no longer destructive. After a job update that changes only
these blocks, existing allocations remain running with their job version
incremented. If you were relying on the previous behavior to redistribute
workloads, you can force a destructive update by changing fields that require
one, such as the meta block.
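The following group fragment sketches the workaround. The meta key redeploy is hypothetical, and the affinity target is illustrative. Editing only the affinity block now leaves allocations in place. Changing the meta value as well forces a destructive update, so the scheduler places new allocations using the updated affinity.

```hcl
group "web" {
  # Changing only this block is no longer a destructive update.
  affinity {
    attribute = "${node.datacenter}"
    value     = "dc1"
    weight    = 50
  }

  # Hypothetical key: bump its value to force a destructive update.
  meta {
    redeploy = "2"
  }

  # ... tasks
}
```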
Vault and Consul integration changes
Nomad 1.10.0 removes the previously deprecated token-based authentication workflow for Vault and Consul. Nomad clients must now use a task's workload identity to authenticate to Vault and Consul and obtain a token specific to the task.
This table lists removed Vault fields and the new workflow.
| Field | Configuration | New Workflow |
|---|---|---|
| vault.allow_unauthenticated | Agent | Tasks should use a workload identity. Do not use a Vault token. |
| vault.task_token_ttl | Agent | With workload identity, tasks receive their TTL configuration from the Vault role. |
| vault.token | Agent | Nomad agents use the workload identity when making requests to authenticated endpoints. |
| vault.policies | Job specification | Configure and use a Vault role. |
Before upgrading to Nomad 1.10, perform the following tasks:
- Configure Vault and Consul to work with workload identity.
- Migrate all workloads to use workload identity.
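The following agent configuration fragment sketches workload identity settings for both integrations. The addresses, audiences, and TTLs are assumptions to adapt to your Vault and Consul auth methods. Tasks then select a Vault role with the role field of the job's vault block instead of listing vault.policies.

```hcl
vault {
  enabled = true
  address = "https://vault.example.com:8200"

  # Identity that tasks present to Vault's JWT auth method.
  default_identity {
    aud = ["vault.io"]
    ttl = "1h"
  }
}

consul {
  address = "127.0.0.1:8500"

  # Identities used for Consul service registration and task tokens.
  service_identity {
    aud = ["consul.io"]
    ttl = "1h"
  }

  task_identity {
    aud = ["consul.io"]
    ttl = "1h"
  }
}
```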
Refer to the following guides for more information:
Nomad 1.9.9
Added Server start_timeout Configuration Option
Nomad 1.9.9 introduces a new server configuration option named start_timeout
with a default value of 30s. This duration bounds the server setup and startup
processes, such as keyring decryption, that must complete before the server is
considered healthy. If these processes do not complete before the timeout is
reached, the server process exits and logs any errors to the console.
Nomad 1.9.5
CNI plugins
Nomad 1.9.5 includes a bug fix for restoring allocation networking after a client host reboot. This fix requires recent versions of the CNI reference plugins (minimum 1.2.0) and will fall back to the existing behavior if the CNI reference plugins cannot support the fix.
We recommend installing the CNI reference plugins from the CNI project release page rather than your Linux distribution's package manager.
Nomad 1.9.4
Security updates to default deny lists
In Nomad 1.9.4, the default function_denylist includes executeTemplate, as
a measure to prevent accidental or malicious infinitely recursive execution.
Users that require executeTemplate should override function_denylist in their
client template configuration.
Additionally, the default client env deny list includes more environment variables. Users who need some of these secure environment variables passed to their tasks should consult the list and override it in the configuration.
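The following client fragment sketches how to re-allow executeTemplate. The sketch assumes that the pre-1.9.4 default list was plugin and writeToFile. Only apply this if your templates genuinely need executeTemplate, because it removes the new protection.

```hcl
client {
  template {
    # Assumed previous default, without executeTemplate.
    function_denylist = ["plugin", "writeToFile"]
  }
}
```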
Nomad 1.9.3
In Nomad 1.9.3, the mechanism used for calculating when objects are eligible for garbage collection changes to a clock-based one. This has two consequences. First, it allows you to set arbitrarily long GC intervals. Second, it requires that the clocks on Nomad servers be kept roughly in sync, because garbage collection can originate on a follower.
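The following server fragment shows the garbage collection settings this change affects. All values are examples only. With the clock-based mechanism, long thresholds like these are possible, provided server clocks stay synchronized.

```hcl
server {
  enabled = true

  # How often the job garbage collector runs.
  job_gc_interval = "5m"

  # How old objects must be before they are eligible for collection.
  job_gc_threshold        = "720h"
  eval_gc_threshold       = "24h"
  deployment_gc_threshold = "720h"
}
```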
Nomad 1.9.2 contained a bug that could drop all cluster state on upgrade and has been removed from downloads.
Nomad 1.9.0
Dropped support for older clients
Nomad 1.9.0 removes support for Nomad client agents older than 1.6.0. Older nodes fail heartbeats. Nomad servers mark the workloads on those nodes as lost and reschedule them normally according to the job's reschedule block.
Keyring In Raft
Nomad 1.9.0 stores keys used for signing Workload Identity and encrypting
Variables in Raft, instead of storing key material in the external
keystore. When using external KMS or Vault transit encryption for the
keyring provider, the key encryption key
(KEK) is stored outside of Nomad and no cleartext key material exists on disk.
When using the default AEAD provider, the key encryption key (KEK) is stored in
Raft alongside the encrypted data encryption keys (DEK).
Nomad automatically migrates the key storage for all key material on the
first root_key_gc_interval after all servers are upgraded to 1.9.0. The
existing on-disk keystore is required to restore servers from older snapshots,
so you should continue to back up the on-disk keystore until you no longer need
those older snapshots.
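The following agent configuration fragment sketches how an external KMS key encryption key might be configured. It assumes a keyring block with an awskms provider, and the region and key alias are hypothetical. Confirm the provider fields against the keyring configuration reference for your version.

```hcl
# Assumed keyring block: the KEK stays in AWS KMS, and only encrypted
# DEKs are stored in Raft.
keyring "awskms" {
  active     = true
  region     = "us-east-1"
  kms_key_id = "alias/nomad-keyring"
}
```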
Support for HCLv1 removed
Nomad 1.9.0 no longer supports the HCLv1 format for job specifications. Using
the -hcl1 option for the job run, job plan, and job validate commands
will no longer work.
One common use of -hcl1 was when specifying Docker
labels with dots in their keys such as for
Datadog Autodiscovery:
```hcl
labels {
  "com.datadoghq.ad.check_names"  = "[\"openmetrics\"]"
  "com.datadoghq.ad.init_configs" = "[{}]"
  # ...
}
```
Quoted keys are invalid in HCLv2 blocks and must be specified with a list-of-maps syntax:
```hcl
labels = [
  {
    "com.datadoghq.ad.check_names"  = "[\"openmetrics\"]"
    "com.datadoghq.ad.init_configs" = "[{}]"
    # ...
  }
]
```
Nomad 1.8.18
ACL policies no longer silently ignore duplicate or invalid keys
Nomad 1.8.18 introduces stricter validation for ACL policies. Policy writes that include duplicate or invalid keys will be rejected with an error instead of being silently ignored. Any existing policies with duplicate or invalid keys will continue to work, but the source policy document will need to be updated to be valid before it can be written to Nomad.
Enterprise product usage reporting
Enterprise
Nomad Enterprise 1.8.18 adds detailed product usage information to automated license utilization reporting.
Nomad 1.8.4
Default Docker infra_image changed
Due to the deprecation of the third-party gcr.io registry, the default Docker
infra_image is now registry.k8s.io/pause-<arch>:3.3. If you do not
override the default, clients using the docker driver will make outbound
requests to the new registry.
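The following client plugin fragment pins infra_image if your clients cannot reach registry.k8s.io. The mirror host is hypothetical, and the sketch assumes an amd64 client.

```hcl
plugin "docker" {
  config {
    # Hypothetical internal mirror of the pause image.
    infra_image = "registry.example.internal/pause-amd64:3.3"
  }
}
```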
Nomad 1.8.3
Nomad keyring rotation
In Nomad 1.8.3, the Nomad root keyring will prepublish keys at half the
root_key_rotation_threshold and promote them to active once the
root_key_rotation_threshold has passed. The nomad operator root keyring
rotate command now requires one of two arguments: -prepublish <duration> to
prepublish a key or -now to rotate immediately. We recommend using
-prepublish to avoid outages from workload identities used to log into
external services such as Vault or Consul.
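The following server fragment sets the threshold that drives prepublishing. The value is an example only. With this setting, a new key would be prepublished after about 360h and promoted once 720h have passed.

```hcl
server {
  enabled = true

  # Keys are prepublished at half this threshold and promoted to active
  # once it passes. Example value only.
  root_key_rotation_threshold = "720h"
}
```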
Nomad 1.8.2
New windows_allow_insecure_container_admin configuration option for Docker driver
Starting in 1.8.2, Nomad refuses to run jobs that use the Docker driver on Windows
with Process Isolation that run as ContainerAdmin. This provides a more secure
environment for these jobs. You can override this behavior by setting the new
windows_allow_insecure_container_admin Docker plugin configuration option to
true or by setting privileged=true. We made this change as a result of
regressions introduced by mitigations for
HCSEC-2024-03.
New default isolation mode for Docker on Windows
Nomad 1.8.2 changes the default isolation mode for Docker tasks on Windows from
process to hyperv, since hyperv provides a much more secure execution
environment. We made this change as a result of regressions introduced by
mitigations for
HCSEC-2024-03.
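The following two fragments sketch how to opt back in to the previous behavior on Windows clients. The first belongs in client agent configuration, and the second is part of a job's Docker task. The image is illustrative. Opting back in gives up the security improvement, so treat it as a temporary measure.

```hcl
# Client agent configuration
plugin "docker" {
  config {
    # Allow process-isolated containers to run as ContainerAdmin again.
    windows_allow_insecure_container_admin = true
  }
}

# Job specification (task fragment)
task "app" {
  driver = "docker"

  config {
    image     = "mcr.microsoft.com/windows/servercore:ltsc2022"
    # Defaults to "hyperv" starting in 1.8.2; set explicitly to keep process isolation.
    isolation = "process"
  }
}
```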
Nomad 1.8.1
Enterprise
Nomad Enterprise 1.8.1 includes an updated version of the Sentinel library. Users that have built custom Sentinel plugins must recompile them using an SDK supporting Sentinel Plugin Protocol Version 3. Consult the Sentinel SDK Compatibility Matrix for appropriate Sentinel SDK versions.
Nomad 1.8.0
Deprecated Disconnect Fields
Nomad 1.8.0 introduces a disconnect block that groups all the configuration
options related to how clients and servers behave when a client disconnects,
and deprecates the fields stop_after_client_disconnect, max_client_disconnect,
and prevent_reschedule_on_lost. This block also introduces new options for
allocation reconciliation when the client regains connectivity.
CNI Constraints
In Nomad 1.8.0, jobs with bridge networking will have constraints added during
job submit that require CNI plugins to be present on the node. Nodes have
fingerprinted the available CNI plugins starting in Nomad 1.5.0.
If you are upgrading from Nomad 1.5.0 or later to 1.8.0 or later, there's
nothing additional for you to do. We do not recommend skipping more than two
versions of Nomad, but if you upgrade from earlier than 1.5.0 to 1.8.0 or later,
you will need to ensure that clients have been upgraded before submitting any
jobs that use bridge networking.
Removal of raw_exec option no_cgroups
In Nomad 1.7.0, the raw_exec plugin option no_cgroups became ineffective.
Starting in Nomad 1.8.0, setting no_cgroups in the raw_exec plugin
configuration results in an error when starting the agent.
Nomad 1.7.11
Enterprise
Nomad keyring rotation
In Nomad 1.7.11, the Nomad root keyring will prepublish keys at half the
root_key_rotation_threshold and promote them to active once the
root_key_rotation_threshold has passed. The nomad operator root keyring
rotate command now requires one of two arguments: -prepublish <duration> to
prepublish a key or -now to rotate immediately. We recommend using
-prepublish to avoid outages from workload identities used to log into
external services such as Vault or Consul.
Nomad 1.7.10
Enterprise
New windows_allow_insecure_container_admin configuration option for Docker driver
In 1.7.10, Nomad will refuse to run jobs that use the Docker driver on Windows
with Process
Isolation
that run as ContainerAdmin. This is in order to
provide a more secure environment for these jobs, and this behavior can be
overridden by setting the new windows_allow_insecure_container_admin Docker
plugin configuration option to true or by setting privileged=true.
New default isolation mode for Docker on Windows
Nomad 1.7.10 changes the default isolation mode for Docker tasks on Windows from
process to hyperv, since hyperv provides a much more secure execution
environment.
Nomad 1.7.2
Nomad 1.7.2 fixes a critical bug in CPU fingerprinting in Nomad 1.7.0 and 1.7.1. You should not install Nomad 1.7.0 or 1.7.1 and instead install the latest Nomad 1.7.x version.
Nomad 1.7.0
Keyring Replication Failure After Leader Election
Nomad 1.7.0 introduced new RSA keys to the keyring for use in signing workload identities. These keys were not correctly replicated from leader to followers. This results in all workload identity verification failing after a leader election.
This bug was fixed in Nomad 1.7.1.
Vault Integration Changes
Starting in Nomad 1.7, Nomad clients will use a task's Workload Identity to authenticate to Vault and obtain a Vault token specific to the task.
The existing workflow using a Vault token provided in either the agent
configuration or at the time of job submission is deprecated and will be removed
in Nomad 1.10. The vault.policies field is also deprecated and will work
only with the existing workflow. Instead, you should configure a suitable Vault
role and use that.
The following agent configuration fields are deprecated:
- vault.allow_unauthenticated will be removed in Nomad 1.10. Tasks will use the workload identity without the user supplying a Vault token.
- vault.task_token_ttl will be removed in Nomad 1.10. With workload identity, tasks will receive their TTL configuration from the Vault role.
- vault.token will be removed in Nomad 1.10. Nomad agents will no longer make requests to authenticated endpoints except with a task's workload identity.
Before upgrading to Nomad 1.10 you will need to have configured authentication with Vault to work with workload identity. Refer to Migrating to Using Workload Identity with Vault for more details.
Consul Integration Changes
Starting in Nomad 1.7, Nomad clients will use a service's or task's Workload Identity to authenticate to Consul and obtain a Consul token specific to the workload.
The existing workflow using a Consul token provided in either the agent
configuration or at the time of job submission is deprecated and will be removed
in Nomad 1.10. The consul.allow_unauthenticated agent configuration field
will be removed in Nomad 1.10. Tasks will use the workload identity without the
user supplying a Consul token.
Before upgrading to Nomad 1.10 you will need to have configured authentication with Consul to work with workload identity. See Migrating to Using Workload Identity with Consul for more details.
RS256 JWT Signing Algorithm Support
Prior to Nomad 1.7, workload identity JWTs were signed with the EdDSA
algorithm. While EdDSA has numerous advantages as a signing algorithm, most
third parties that accept JWTs expect the RS256 signing algorithm to be used.
Therefore starting in Nomad 1.7 new signing keys will generate an RSA key and
sign workload identities with the RS256 signing algorithm.
Before setting up third-party authentication methods to use workload
identities, we recommend running nomad operator root keyring
rotate to ensure you
generate a new RSA key.
To verify an RSA key is present you may check the /.well-known/jwks.json
endpoint on any
Nomad agent. If you see "kty": "RSA", then an RSA key exists and you do not
need to rotate keys.
New Nomad clusters will use RSA by default and are not affected.
CPU Fingerprinting Changes
Starting in Nomad 1.7, Nomad clients improve the accuracy of detected CPU performance metrics. The fingerprinter now takes into account heterogeneous core types on applicable processors. In addition, Nomad will attempt to detect and use the base frequency of the processor rather than the turbo frequency when calculating the total available CPU bandwidth. The net result of these behaviors is that the calculated total CPU bandwidth available on a node may change when upgrading to Nomad 1.7. Operators are encouraged to ensure planned capacity meets expectations before upgrading. The CPU concepts documentation contains guidance in understanding how Nomad detects CPU metrics.
CPU EC2 Detection Changes
Prior to Nomad 1.7, Nomad clients embedded a large lookup table of CPU
performance data for every EC2 instance type. In 1.7 and later Nomad instead
gathers this data by executing the dmidecode command. The dmidecode package
must be installed manually on some Linux distributions before the Nomad agent
is started.
CPU Core Isolation
Starting in Nomad 1.7, Nomad tasks that specify CPU resources using the cores
attribute will be restricted to using only the CPU cores assigned to them. In
previous versions of Nomad these tasks could also make use of other non-reserved
CPU cores. However this feature would cause severe performance problems for
the Linux kernel as the number of tasks increased. Operators are encouraged
to ensure tasks making use of the cores attribute are given sufficient CPU
resources before upgrading.
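The following task fragment illustrates the cores attribute this section refers to. The task name and values are illustrative. Starting in Nomad 1.7, the task is confined to the two cores it reserves, so size cores for the task's peak parallelism.

```hcl
task "worker" {
  resources {
    # The task may run only on these reserved cores (no borrowing of
    # other non-reserved cores).
    cores  = 2
    memory = 1024
  }
}
```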
The distinct_hosts Constraint Now Honors Namespaces
Nomad 1.7.0 changes the behavior of the distinct_hosts constraint such that
namespaces are taken into account when choosing feasible clients for allocation
placement. The previous, less-expected behavior would cause any job with the
same name running on a client to cause that node to be considered infeasible.
This change allows workloads that formerly did not colocate to be scheduled
onto the same client when they are in different namespaces. To prevent this,
consider using node pools and constraining the jobs with a distinct_property
constraint over ${node.pool}.
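The following job fragment shows the constraint whose behavior changed. The job name and namespace are illustrative. In Nomad 1.7.0 and later, only allocations in the job's own namespace make a client infeasible. A same-named job in another namespace no longer does.

```hcl
job "web" {
  namespace = "prod"

  # Evaluated per namespace starting in Nomad 1.7.0.
  constraint {
    operator = "distinct_hosts"
    value    = "true"
  }

  # ... groups
}
```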
Loading Binaries from plugin_dir Without Configuration
Starting with Nomad 1.7.0, loading plugins that are not referenced in the agent
configuration file is deprecated. Future versions of Nomad will only load
plugins that have a corresponding plugin
block in the agent configuration file.
Changes to raw_exec
The raw_exec task driver now enforces memory limits via cgroups on Linux
platforms, similar to the exec and docker task drivers. The driver does
support memory oversubscription, which you can configure to nearly
replicate the previously unlimited behavior.
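The following raw_exec task fragment sketches that approach. The command path and memory values are hypothetical. The sketch assumes memory oversubscription is already enabled in the cluster's scheduler configuration, because memory_max has no effect otherwise.

```hcl
task "legacy" {
  driver = "raw_exec"

  config {
    command = "/usr/local/bin/legacy-app"
  }

  resources {
    # Reserved memory, enforced through cgroups on Linux since 1.7.0.
    memory     = 256

    # Burst ceiling; requires memory oversubscription to be enabled.
    memory_max = 8192
  }
}
```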
The no_cgroups configuration option no longer has any effect. Previously,
setting no_cgroups would disable the mechanism where Nomad used the freezer
cgroup to halt the process group of a Task before issuing a kill signal to each
process. Starting in Nomad 1.7.0 this behavior is always enabled (and a similar
mechanism has always been enabled on cgroups v2 systems).
Nomad 1.6.14
Enterprise
Nomad keyring rotation
In Nomad 1.6.14, the Nomad root keyring will prepublish keys at half the
root_key_rotation_threshold and promote them to active once the
root_key_rotation_threshold has passed. The nomad operator root keyring
rotate command now requires one of two arguments: -prepublish <duration> to
prepublish a key or -now to rotate immediately. We recommend using
-prepublish to avoid outages from workload identities used to log into
external services such as Vault or Consul.
Nomad 1.6.13
Enterprise
New windows_allow_insecure_container_admin configuration option for Docker driver
In 1.6.13, Nomad will refuse to run jobs that use the Docker driver on Windows
with Process
Isolation
that run as ContainerAdmin. This is in order to
provide a more secure environment for these jobs, and this behavior can be
overridden by setting the new windows_allow_insecure_container_admin Docker
plugin configuration option to true or by setting privileged=true.
New default isolation mode for Docker on Windows
Nomad 1.6.13 changes the default isolation mode for Docker tasks on Windows from
process to hyperv, since hyperv provides a much more secure execution
environment.
Nomad 1.6.0
Enterprise License Validation with BuildDate
Nomad Enterprise 1.6.0 now compares the license ExpirationTime with the Nomad binary's BuildDate,
rather than comparing the sometimes more lenient license TerminationTime with time.Now().
Refer to the licensing FAQ for more information. Most importantly, run the new
nomad license inspect command
before upgrading your Enterprise servers to v1.6.0 or higher.
Job Evaluate API Endpoint Requires submit-job Instead of read-job
Nomad 1.6.0 updated the ACL capability requirement for the job evaluate
endpoint from read-job to submit-job to better reflect that this operation
writes state to Nomad. This endpoint is used by the nomad job eval CLI
command and so the ACL requirements changed for the command as well. Users that
called this endpoint or used this command using tokens with just the read-job
capability or the read policy must update their tokens to use the
submit-job capability or the write policy.
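The following ACL policy fragment keeps nomad job eval working for a token that previously had only read-job. The namespace is illustrative.

```hcl
namespace "default" {
  # read-job alone no longer permits job evaluate in Nomad 1.6.0 and later.
  capabilities = ["read-job", "submit-job"]
}
```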
Exec Driver Requires New Capability for mlock
Nomad 1.6.0 updated the exec task driver to maintain the max memory locked
limit set by the host system. In earlier versions of Nomad this limit was
unset unintentionally.
In practice this means that exec tasks such as Vault which use the mlock
system call will now need to explicitly add the ipc_lock capability.
First allow the ipc_lock capability in the Client
configuration:
```hcl
plugin "exec" {
  config {
    allow_caps = ["audit_write", "chown", "dac_override", "fowner", "fsetid",
      "kill", "mknod", "net_bind_service", "setfcap", "setgid", "setpcap",
      "setuid", "sys_chroot", "ipc_lock"]
  }
}
```
Then add the ipc_lock capability to the exec task that uses
mlock:
```hcl
task "vault" {
  driver = "exec"

  config {
    cap_add = ["ipc_lock"]
    # ... other task configuration
  }

  # ... rest of jobspec
}
```
These additions are backward compatible with Nomad v1.5, so Clients and Jobs should be updated prior to upgrading to Nomad v1.6.
See #17780 for details.
Namespace ACL policies require a label
Nomad 1.6.0 does not allow ACL policies for namespaces without a label. Prior
to this version, ACL policies for namespaces were allowed to be defined
without a label, and the documented behavior in this case was that the policy
would be applied to the default namespace.
A bug in this logic caused the policy to be incorrectly applied to a different
namespace. For example, the policy below would be applied to a namespace called
policy instead of default.
```hcl
namespace {
  policy = "read"
}
```
To avoid further confusion and potential security incidents, this functionality was removed and now all namespace policies are required to have a label.
Tokens currently attached to an invalid policy will stop working after the upgrade, so you should fix invalid policies to have an explicit namespace label before upgrading Nomad.
After the policies are fixed, the existing tokens with those policies will continue to work and do not need to be regenerated.
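The following fragment is the corrected form of the unlabeled policy shown earlier. It names the namespace explicitly, which matches the documented intent of applying the policy to the default namespace.

```hcl
namespace "default" {
  policy = "read"
}
```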
Command nomad tls cert create flag -cluster-region deprecated
Nomad 1.6.0 deprecates the -cluster-region flag of the nomad tls cert create
command in favor of the standard -region flag. The -cluster-region flag
will be removed in Nomad 1.7.0.
32-bit Intel Builds Deprecated
Starting with Nomad 1.6.0, HashiCorp will no longer release 32-bit Intel builds
of Nomad and Nomad Enterprise (the builds named windows_386 and
linux_386). Bug fixes will continue to be backported to the 1.5.x and 1.4.x
versions as long as those versions are still supported.
The 32-bit ARM build (linux_arm for the armhf architecture) is deprecated and
may be removed in a future major version of Nomad. The 32-bit ARM build is not
tested and may include bugs around platform-specific integer sizes. Using 64-bit
builds for small form-factor hosts such as the Raspberry Pi is strongly
recommended.
Nomad 1.5.7, 1.4.11
Namespace ACL policies require a label
Nomad 1.5.7 and 1.4.11 do not allow ACL policies for namespaces without a
label. Prior to these versions, ACL policies for namespaces were allowed to be
defined without a label, and the documented behavior in this case was that the
policy would be applied to the default namespace.
A bug in this logic caused the policy to be incorrectly applied to a different
namespace. For example, the policy below would be applied to a namespace called
policy instead of default.
```hcl
namespace {
  policy = "read"
}
```
To avoid further confusion and potential security incidents, this functionality was removed and now all namespace policies are required to have a label.
Tokens currently attached to an invalid policy will stop working after the upgrade, so you should fix invalid policies to have an explicit namespace label before upgrading Nomad.
After the policies are fixed, the existing tokens with those policies will continue to work and do not need to be regenerated.
Nomad 1.5.5
Nomad 1.5.5 fixed a bug where allocations that are rescheduled for jobs
registered before the upgrade would no longer collect allocation logs. The
logs.enabled field introduced in 1.5.4 is now deprecated and has been replaced
by a logs.disabled field that defaults to false. The logs.enabled field value
will be ignored in 1.5.5 and will be removed in Nomad 1.6.0.
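The following task fragment uses the replacement field. The task name and rotation values are illustrative. Because disabled defaults to false, omitting it entirely also keeps log collection on.

```hcl
task "web" {
  logs {
    # Replaces logs.enabled; collection stays on unless this is true.
    disabled      = false
    max_files     = 10
    max_file_size = 10
  }
}
```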
Nomad 1.5.4
Nomad 1.5.4 included a bug where allocations that are rescheduled for jobs registered before the upgrade would no longer collect allocation logs. The client will emit debug-level logs like the following:
```
client.alloc_runner.task_runner.task_hook: log collection is disabled by task
```
You should avoid this version of Nomad and instead install the latest version of Nomad 1.5. If you have already upgraded to Nomad 1.5.4, upgrading to Nomad 1.5.5 will restore log collection when clients are restarted as part of the upgrade process.
Nomad 1.5.1
Artifact Download Regression Fix
Nomad 1.5.1 reverts a behavior of 1.5.0 where artifact downloads were executed
as the nobody user on compatible Linux systems. This was done optimistically
as a defense against compromised artifact endpoints attempting to exploit the
Nomad Client or the tools it uses to perform downloads, such as git or mercurial.
Unfortunately, running the child process as any user other than root is not
compatible with the advice given in Nomad's security hardening guide,
which calls for a specific directory tree structure that makes such operation impossible.
Other changes to artifact downloading remain: downloads are executed as a child process of the Nomad agent, and on modern Linux systems they use the kernel's landlock feature to restrict filesystem access from that process.
Nomad 1.5.0
Pause Container Reconciliation Regression
Nomad 1.5.0 introduced a regression in the way the Docker driver reconciles dangling containers. As a result, pause containers could be erroneously removed even though the allocation was still running. This does not affect the running allocation, but it causes the allocation to fail if it needs to restart. An immediate workaround is to disable dangling container reconciliation.
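As a sketch of that workaround, assuming the Docker driver is configured through a plugin block in the client agent configuration, dangling container reconciliation can be turned off in the driver's gc settings:

```hcl
plugin "docker" {
  config {
    gc {
      # Disable dangling container reconciliation so pause containers that
      # belong to running allocations are not removed.
      dangling_containers {
        enabled = false
      }
    }
  }
}
```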
Artifact Download Sandboxing
Nomad 1.5.0 changes the way artifacts are downloaded when specifying an artifact
in a task configuration. Previously the Nomad Client would download artifacts
in-process. External commands used to facilitate the download (e.g. git, hg)
would be run as root, and the resulting payload would be owned by root in the
allocation's task directory.
In an effort to improve the resilience and security model of the Nomad Client,
in 1.5.0 artifact downloads occur in a sub-process. Where possible, that
sub-process is run as the nobody user, and on modern Linux systems will
be isolated from the filesystem via the kernel's landlock capability.
Operators are encouraged to ensure jobs making use of artifacts continue to work
as expected. In particular, git-ssh users will need to make sure the system-wide
/etc/ssh/ssh_known_hosts file is populated with any necessary remote hosts.
Previously, Nomad's documentation suggested configuring
/root/.ssh/known_hosts which would apply only to the root user.
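For example, a job that fetches a repository over SSH might use an artifact block like the sketch below. The repository URL is hypothetical. With this source, `github.com` would need an entry in `/etc/ssh/ssh_known_hosts` on every client that may run the task.

```hcl
artifact {
  # Hypothetical repository fetched with git over SSH.
  source      = "git::git@github.com:example/repo.git"
  destination = "local/repo"
}
```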
The artifact downloader no longer inherits all environment variables available to the Nomad Client. The downloader sub-process environment is set as follows on Linux / macOS:
PATH=/usr/local/bin:/usr/bin:/bin
TMPDIR=<path to task dir>/tmp
and as follows on Windows:
TMP=<path to task dir>\tmp
TEMP=<path to task dir>\tmp
PATH=<inherit $PATH>
HOMEPATH=<inherit $HOMEPATH>
HOMEDRIVE=<inherit $HOMEDRIVE>
USERPROFILE=<inherit $USERPROFILE>
Configuration of the artifact downloader should happen through the options
and headers fields of the artifact block. For backwards
compatibility, the sandbox can be configured to inherit specified environment variables
from the Nomad client by setting set_environment_variables.
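A minimal sketch of passing download settings through the artifact block itself rather than through the environment. The URL, the checksum placeholder, and the header are illustrative assumptions.

```hcl
artifact {
  # Hypothetical source URL.
  source      = "https://example.com/app.tar.gz"
  destination = "local/app"

  # Downloader options; <sha256-hash> is a placeholder, not a real value.
  options {
    checksum = "sha256:<sha256-hash>"
  }

  # Extra HTTP headers sent with the download request.
  headers {
    X-Nomad-Alloc = "${NOMAD_ALLOC_ID}"
  }
}
```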
The use of filesystem isolation can be disabled in Client configuration by
setting disable_filesystem_isolation.
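Both client-side settings live in the client's artifact block. This sketch assumes the proxy variables below are the ones your downloads depend on. They are illustrative only and are not required.

```hcl
client {
  artifact {
    # Comma-separated list of client environment variables that the sandbox
    # inherits. The names here are examples only.
    set_environment_variables = "HTTP_PROXY,HTTPS_PROXY"

    # Turns off landlock filesystem isolation for the artifact downloader.
    disable_filesystem_isolation = true
  }
}
```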
Artifact Decompression Limits
Nomad 1.5.0 now sets default limits around artifact decompression. A single artifact
payload is now limited to 100GB and 4096 files when decompressed. An artifact that
exceeds these limits during decompression will cause the artifact downloader to
fail. These limits can be adjusted or disabled in the client artifact configuration
by setting decompression_size_limit and
decompression_file_count_limit.
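A sketch of raising both limits in the client configuration, using arbitrary example values. This sketch assumes that setting a limit to zero disables it, so confirm that behavior against the client configuration reference for your version.

```hcl
client {
  artifact {
    # Defaults are "100GB" and 4096; these values are illustrative.
    decompression_size_limit       = "200GB"
    decompression_file_count_limit = 8192
  }
}
```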
Datacenter Wildcards
In Nomad 1.5.0, the
datacenters field for a job
accepts wildcards for multi-character matching. For example, datacenters =
["dc*"] will match all datacenters that start with "dc". The default value
for datacenters is now ["*"], so the field can be omitted.
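For instance, a job could target every datacenter whose name starts with "dc", as sketched below. The job name is hypothetical, and the group blocks are omitted.

```hcl
job "example" {
  # Matches datacenters such as "dc1" and "dc-east".
  datacenters = ["dc*"]

  # group blocks omitted for brevity
}
```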
The * character is no longer a legal character in the
datacenter field for an agent
configuration. Before upgrading to Nomad 1.5.0, you should first ensure that
you've updated any jobs that currently have a * in their datacenter name and
then ensure that no agents have this character in their datacenter field name.
Server rejoin_after_leave (default: false) now enforced
All Nomad versions prior to v1.5.0 have incorrectly ignored the Server
rejoin_after_leave configuration option. This bug has been fixed in Nomad
version v1.5.0.
Prior to v1.5.0, Nomad behaved as if rejoin_after_leave were always true,
regardless of server configuration, while the documentation incorrectly
indicated a default of false.
Cluster operators should be aware that explicit leave events (such as nomad
server force-leave) will now result in behavior which matches this
configuration, and should review whether they were inadvertently relying on the
buggy behavior.
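Operators who relied on the previous behavior can opt back in explicitly. A minimal sketch of a server configuration that does so, assuming the rest of the server block is already in place:

```hcl
server {
  enabled = true

  # Matches the effective pre-1.5.0 behavior, when this option was ignored
  # and servers always rejoined after leaving.
  rejoin_after_leave = true
}
```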
Changes to eval broker metrics
The metric nomad.nomad.broker.total_blocked has been renamed to
nomad.nomad.broker.total_pending. This metric refers to internal state of the
leader's eval broker, and the old name was easily confused with the unrelated
evaluation status "blocked" in the Nomad API.
Deprecated gossip keyring commands removed
The commands nomad operator keyring, nomad keyring, nomad operator keygen,
and nomad keygen used to manage the gossip keyring were marked as deprecated
in Nomad 1.4.0. In Nomad 1.5.0, these commands have been removed. Use the nomad
operator gossip keyring commands to manage the gossip keyring.
Garbage collection of evaluations and allocations for batch jobs
Versions prior to 1.5.0 only delete evaluations and allocations of batch jobs that are explicitly stopped, which can lead to unbounded memory growth in Nomad when a batch job is executed multiple times.
Nomad 1.5.0 introduces a new server configuration
batch_eval_gc_threshold
to control how allocations and evaluations for batch jobs are collected.
The default threshold is 24h. If you need to access completed allocations for
batch jobs that are older than 24h, you must increase this value when upgrading
Nomad.
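A sketch of keeping completed batch job evaluations and allocations for longer. The 72h value is an arbitrary example.

```hcl
server {
  enabled = true

  # Default is 24h. Batch job evaluations and allocations older than this
  # become eligible for garbage collection.
  batch_eval_gc_threshold = "72h"
}
```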