Nomad
spread block in the job specification
| Placement | job -> spread, job -> group -> spread |
The spread block allows operators to increase the failure tolerance of their
applications by specifying a node attribute that allocations should be spread
over. This allows operators to spread allocations over attributes such as
datacenter, availability zone, or even rack in a physical datacenter.
By default, when spread is omitted, the scheduler attempts to place
allocations from the same job on different nodes, while bin packing
allocations from different jobs. When you use spread, the scheduler attempts
to place allocations equally among the available values of the given target.
job "docs" {
  # Spread allocations over all datacenters
  spread {
    attribute = "${node.datacenter}"
  }
  group "example" {
    # Spread allocations over each rack based on desired percentage
    spread {
      attribute = "${meta.rack}"
      target "r1" {
        percent = 60
      }
      target "r2" {
        percent = 40
      }
    }
  }
}
Nodes are scored according to how closely they match the desired target percentage defined in the spread block. Spread scores are combined with other scoring factors such as bin packing.
A job or task group can have more than one spread criterion, with weights to express relative preference.
Spread criteria are treated as a soft preference by the Nomad scheduler. If no nodes match a given spread criterion, placement is still successful. Because the scheduler avoids scoring every node for every placement, allocations may not be perfectly spread. Spread works best on attributes whose values have a similar number of nodes, such as identically configured racks or similarly configured datacenters.
Spread may be expressed on attributes or client metadata. Additionally, spread may be specified at both the job and group levels. Job-level spread criteria are inherited by all task groups in the job.
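As a sketch of where a value such as ${meta.rack} comes from, the client agent configuration below sets a rack key in the client's meta block. The key name rack and the value r1 are illustrative assumptions chosen to match the example above, not required names.

```hcl
# Client agent configuration (illustrative sketch).
# The "rack" key and "r1" value are assumptions for this example.
client {
  enabled = true

  meta {
    rack = "r1"
  }
}
```

A job can then reference this value as ${meta.rack} in the attribute parameter of a spread block.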
Updating the spread block is non-destructive. Updating a job specification
with only non-destructive updates will not migrate or replace existing
allocations.
Parameters
- attribute (string: "") - Specifies the name or reference of the attribute to use. This can be any of the Nomad interpolated values.
- target (target: <required>) - Specifies one or more target percentages for each value of the attribute in the spread block. If this is omitted, Nomad will spread allocations evenly across all values of the attribute.
- weight (integer: 0) - Specifies a weight for the spread block. The weight is used during scoring and must be an integer between 0 and 100. Weights can be used when there is more than one spread or affinity block to express relative preference across them.
Target parameters
- value (string: "") - Specifies a target value of the attribute from a spread block.
- percent (integer: 0) - Specifies the percentage associated with the target value.
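To illustrate how these parameters fit together, the following sketch gives the datacenter spread a higher weight than the rack spread, so the scheduler should favor the datacenter preference when the two conflict. The weights 70 and 30, the count, and the rack names are arbitrary values chosen for illustration.

```hcl
group "example" {
  count = 12

  # Higher weight: prefer an even spread across datacenters.
  spread {
    attribute = "${node.datacenter}"
    weight    = 70
  }

  # Lower weight: spread across racks as a secondary preference.
  # The target label ("r1", "r2") supplies the target value.
  spread {
    attribute = "${meta.rack}"
    weight    = 30

    target "r1" {
      percent = 50
    }

    target "r2" {
      percent = 50
    }
  }

  # Task definitions omitted for brevity.
}
```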
Comparison to spread scheduling algorithm
The spread block is not the same concept as setting the scheduler
algorithm to "spread" instead of "binpack". Setting the scheduler
algorithm impacts all jobs on a cluster (or node pool), and adjusts the tendency
of the scheduler to place workloads from different jobs on the same set of nodes
or not. The spread block impacts how the scheduler places allocations for a
given job.
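For contrast, the scheduler algorithm is set in the scheduler configuration rather than in a job specification. The fragment below is a sketch of a server agent configuration that sets the initial algorithm with default_scheduler_config. It assumes a cluster that has not yet been bootstrapped, since this block only seeds the initial scheduler configuration.

```hcl
# Server agent configuration (illustrative sketch).
server {
  enabled = true

  # Applies to all jobs scheduled by the cluster, unlike a spread block,
  # which applies only to the job or group that declares it.
  default_scheduler_config {
    scheduler_algorithm = "spread"
  }
}
```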
Scheduling performance
For each allocation in a service or batch job, the Nomad scheduler iterates
over nodes until it finds a small number of feasible nodes. The scheduler then
scores those feasible nodes to find the best placement. The exact number of
nodes scored depends on the job specification. Using the affinity or spread
block can have a significant impact on scheduling performance.
No affinity or spread
When you omit the affinity or spread block, the batch job node limit is
two. For service jobs, the node limit is the log2 of the total number of
nodes in the datacenter and node pool, with a minimum of two.
You can reduce scheduling times by avoiding affinity and spread. Instead, rely on the default distribution of a job across multiple nodes. If this is not possible, you may consider reducing the size of the node pool or datacenter to reduce the number of nodes available for the scheduler to consider.
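As a rough sketch of this advice, the job below omits spread and affinity and narrows the candidate nodes with the datacenters and node_pool parameters instead. The node pool name web is a hypothetical placeholder, and us-east1 is borrowed from the examples later on this page.

```hcl
job "docs" {
  # Restrict the set of nodes the scheduler considers.
  # "us-east1" and the node pool name "web" are placeholder values.
  datacenters = ["us-east1"]
  node_pool   = "web"

  group "example" {
    count = 10

    # No spread or affinity block: the scheduler uses its default
    # behavior of placing allocations from the same job on different nodes.

    # Task definitions omitted for brevity.
  }
}
```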
With affinity or spread
When you include the affinity or spread block, the scheduler scores
a number of nodes in the datacenter and node pool equal to the task group count,
with a maximum of 100 per allocation. This can result in order-of-magnitude
increases in scheduling times.
To increase placement randomization and reduce scheduler contention when using
affinity or spread, set the node-limit-for-feasibility-checks scheduler
configuration option.
You may specify an upper limit on the number of feasible nodes Nomad should
consider when scheduling a job.
- Lower numbers result in better scheduler performance and more randomization of jobs across nodes.
- Higher numbers result in more deterministic application of spread or affinity.
Reducing the default upper node limit of 100 may reduce the increase in scheduling time, the tightness of bin packing, and how strongly the Nomad scheduler scores affinity or spread.
For a mathematical and graphical explanation of how node limit affects spread and affinity scheduling performance, refer to the GitHub pull request comments.
Monitoring
To monitor scheduling times potentially impacted by affinity or spread
blocks, examine the nomad.nomad.worker.invoke_scheduler.* metrics listed in
the Key Metrics table.
Examples
The following examples show different ways to use the spread block.
Even spread across datacenters
This example shows a spread block across the node's datacenter attribute. If we have
two datacenters us-east1 and us-west1, and a task group of count = 10,
Nomad will attempt to place 5 allocations in each datacenter.
spread {
  attribute = "${node.datacenter}"
  weight    = 100
}
Spread with target percentages
This example shows a spread block that specifies one target percentage. If we
have three datacenters us-east1, us-east2, and us-west1, and a task group
of count = 10, Nomad will attempt to place 5 of the allocations in us-east1,
and will spread the remaining 5 among the other two datacenters.
spread {
  attribute = "${node.datacenter}"
  weight    = 100
  target "us-east1" {
    percent = 50
  }
}
This example shows a spread block that specifies target percentages for two
different datacenters. If we have two datacenters us-east1 and us-west1,
and a task group of count = 10, Nomad will attempt to place 6 allocations
in us-east1 and 4 in us-west1.
spread {
  attribute = "${node.datacenter}"
  weight    = 100
  target "us-east1" {
    percent = 60
  }
  target "us-west1" {
    percent = 40
  }
}
Spread across multiple attributes
This example shows spread blocks with multiple attributes. Consider a Nomad cluster
with two datacenters, us-east1 and us-west1, where each datacenter has nodes
with ${meta.rack} set to r1 or r2. With the following spread blocks used on a task group with count = 12, Nomad
will attempt to place 6 allocations in each datacenter. Within a datacenter, Nomad will
attempt to place 3 allocations on nodes in rack r1 and 3 allocations on nodes in rack r2.
spread {
  attribute = "${node.datacenter}"
  weight    = 50
}
spread {
  attribute = "${meta.rack}"
  weight    = 50
}