spread block in the job specification

Placement: `job -> spread`, `job -> group -> spread`

The spread block allows operators to increase the failure tolerance of their applications by specifying a node attribute that allocations should be spread over. This allows operators to spread allocations over attributes such as datacenter, availability zone, or even rack in a physical datacenter.

By default, when spread is omitted, the scheduler attempts to place allocations from the same job on different nodes (and binpacks between jobs). When you use spread, the scheduler attempts to place allocations equally among the available values of the given target.

job "docs" {
  # Spread allocations over all datacenters
  spread {
    attribute = "${node.datacenter}"
  }

  group "example" {
    # Spread allocations over each rack based on desired percentage
    spread {
      attribute = "${meta.rack}"

      target "r1" {
        percent = 60
      }

      target "r2" {
        percent = 40
      }
    }
  }
}

Nodes are scored according to how closely they match the desired target percentage defined in the spread block. Spread scores are combined with other scoring factors such as bin packing.

A job or task group can have more than one spread criterion, with weights to express relative preference.
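
As a minimal sketch of weighted criteria, the following group-level blocks prefer datacenter spread over rack spread. The 70/30 split and the `rack` metadata key are illustrative assumptions, not recommended values.

```hcl
group "example" {
  # Higher weight: spreading over datacenters is the stronger preference.
  spread {
    attribute = "${node.datacenter}"
    weight    = 70
  }

  # Lower weight: rack spread is a secondary preference.
  # Assumes clients define a "rack" metadata key.
  spread {
    attribute = "${meta.rack}"
    weight    = 30
  }
}
```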

Spread criteria are treated as a soft preference by the Nomad scheduler. If no nodes match a given spread criterion, placement still succeeds. Because the scheduler does not score every node for every placement, allocations may not be perfectly spread. Spread works best on attributes whose values each cover a similar number of nodes, such as identically configured racks or similarly configured datacenters.

Spread may be expressed on attributes or client metadata. Additionally, spread may be specified at the job and group levels for ultimate flexibility. Job level spread criteria are inherited by all task groups in the job.
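
To spread on client metadata, the key has to exist on the client nodes. A minimal sketch of a client agent configuration, assuming a hypothetical `rack` key with the value `r1`:

```hcl
# Client agent configuration (not part of the job specification).
client {
  enabled = true

  meta {
    # Hypothetical key; a job references it as "${meta.rack}".
    rack = "r1"
  }
}
```

Each client would set the value that matches its own rack, so that the spread block has distinct values to distribute allocations over.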

Updating the spread block is non-destructive. Updating a job specification with only non-destructive updates will not migrate or replace existing allocations.

Parameters

attribute (string: "") - Specifies the name or reference of the attribute to use. This can be any of the Nomad interpolated values.
target (target: <optional>) - Specifies one or more target percentages for each value of the attribute in the spread block. If this is omitted, Nomad will spread allocations evenly across all values of the attribute.
weight (integer: 0) - Specifies a weight for the spread block. The weight is used during scoring and must be an integer between 0 and 100. Weights can be used when there is more than one spread or affinity block to express relative preference across them.

Target parameters

value (string: "") - Specifies a target value of the attribute from a spread block.
percent (integer: 0) - Specifies the percentage associated with the target value.

Comparison to `spread` scheduling algorithm

The spread block is not the same concept as setting the scheduler algorithm to "spread" instead of "binpack". Setting the scheduler algorithm impacts all jobs on a cluster (or node pool), and adjusts the tendency of the scheduler to place workloads from different jobs on the same set of nodes or not. The spread block impacts how the scheduler places allocations for a given job.
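
For contrast, the scheduler algorithm is set in the server agent configuration rather than in a job. A minimal sketch, assuming the cluster-wide default is being set through the server's `default_scheduler_config` block:

```hcl
# Server agent configuration: affects every job on the cluster,
# unlike a spread block, which affects only the job that contains it.
server {
  enabled = true

  default_scheduler_config {
    scheduler_algorithm = "spread"
  }
}
```

As far as we are aware, this default is applied when the cluster is first bootstrapped; an already running cluster is usually changed through the operator scheduler configuration instead.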

Scheduling performance

For each allocation in a service or batch job, the Nomad scheduler iterates over nodes until it finds a small number of feasible nodes. The scheduler then scores those feasible nodes to find the best placement. The exact number of nodes scored depends on the job specification. Using the affinity or spread block can have a significant impact on scheduling performance.

No affinity or spread

When you omit the affinity or spread block, the batch job node limit is two. For service jobs, the node limit is the log₂ of the total number of nodes in the datacenter and node pool, with a minimum of two.

You can reduce scheduling times by avoiding affinity and spread. Instead, rely on the default distribution of a job across multiple nodes. If this is not possible, you may consider reducing the size of the node pool or datacenter to reduce the number of nodes available for the scheduler to consider.

With affinity or spread

When you include the affinity or spread block, the scheduler scores a number of nodes in the datacenter and node pool equal to the task group count, with a maximum of 100 per allocation. This can result in order-of-magnitude increases in scheduling times.
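
The two limits can be compared with a rough worked example. The figures are illustrative assumptions: a pool of N = 1024 nodes and a task group with count = 50. The service limit is read here as the larger of two and log₂ N, and rounding for pools that are not a power of two is not covered.

```latex
\begin{align*}
\text{limit}_{\text{service, no spread}} &= \max(2, \log_2 N) = \max(2, \log_2 1024) = 10 \\
\text{limit}_{\text{with spread}} &= \min(\text{count}, 100) = \min(50, 100) = 50
\end{align*}
```

In this sketch, each allocation scores five times as many nodes once spread is added. A count of 100 or more would reach the cap of 100, ten times the baseline.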

To increase placement randomization and reduce scheduler contention when using affinity or spread, set the node-limit-for-feasibility-checks scheduler configuration option. You may specify an upper limit on the number of feasible nodes Nomad should consider when scheduling a job.

Lower numbers result in better scheduler performance and more randomization of jobs across nodes.
Higher numbers result in more deterministic application of spread or affinity.

Reducing the default upper node limit of 100 may lessen the increase in scheduling time, but it may also reduce the tightness of binpacking and how strongly the Nomad scheduler scores affinity or spread.

For a mathematical and graphical explanation of how node limit affects spread and affinity scheduling performance, refer to the GitHub pull request comments.

Monitoring

To monitor scheduling times potentially impacted by affinity or spread blocks, examine the nomad.nomad.worker.invoke_scheduler.* metrics listed in the Key Metrics table.
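
If the metrics are scraped with Prometheus, the agent's telemetry block is one place to enable that format. A minimal sketch, assuming Prometheus is the monitoring backend; other metric sinks use different options:

```hcl
# Agent configuration for metrics collection.
telemetry {
  # How often the agent collects runtime metrics.
  collection_interval = "1s"

  # Make metrics available in Prometheus format on the agent's metrics endpoint.
  prometheus_metrics = true
}
```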

Examples

The following examples show different ways to use the spread block.

Even spread across datacenters

This example shows a spread block across the node's datacenter attribute. If we have two datacenters us-east1 and us-west1, and a task group of count = 10, Nomad will attempt to place 5 allocations in each datacenter.

spread {
  attribute = "${node.datacenter}"
  weight    = 100
}

Spread with target percentages

This example shows a spread block that specifies one target percentage. If we have three datacenters us-east1, us-east2, and us-west1, and a task group of count = 10, Nomad will attempt to place 5 of the allocations in us-east1, and will spread the remaining allocations among the other two datacenters.

spread {
  attribute = "${node.datacenter}"
  weight    = 100

  target "us-east1" {
    percent = 50
  }
}

This example shows a spread block that specifies target percentages for two different datacenters. If we have two datacenters us-east1 and us-west1, and a task group of count = 10, Nomad will attempt to place 6 allocations in us-east1 and 4 in us-west1.

spread {
  attribute = "${node.datacenter}"
  weight    = 100

  target "us-east1" {
    percent = 60
  }

  target "us-west1" {
    percent = 40
  }
}

Spread across multiple attributes

This example shows spread blocks with multiple attributes. Consider a Nomad cluster with two datacenters, us-east1 and us-west1, where each datacenter has nodes with ${meta.rack} set to r1 or r2. With the following spread blocks used on a task group with count = 12, Nomad will attempt to place 6 allocations in each datacenter. Within each datacenter, Nomad will attempt to place 3 allocations on nodes in rack r1 and 3 allocations on nodes in rack r2.

spread {
  attribute = "${node.datacenter}"
  weight    = 50
}
spread {
  attribute = "${meta.rack}"
  weight    = 50
}