Application scaling
Application scaling is a critical aspect of operating Nomad Enterprise as a shared service. It enables organizations to efficiently manage resources, maintain application performance, and optimize costs in dynamic environments. As workloads fluctuate, the ability to automatically adjust resources becomes paramount for ensuring operational efficiency.
Nomad Enterprise offers multiple scaling mechanisms to address the diverse needs of modern applications. These include horizontal scaling, which adjusts the number of task instances, and dynamic application sizing, which modifies resource allocations for individual tasks.
Resource allocation with resource blocks
Resource blocks in Nomad job specifications are crucial for ensuring optimal resource allocation and utilization. Defining resource requirements for tasks ensures Nomad places allocations on nodes that have sufficient capacity to run your applications.
Focus on these primary and associated attributes when configuring required resources for your workload.
- cpu and memory: resources {}
- disk: ephemeral_disk {}
- device type: device {}
- Always specify CPU and memory requirements for each task. Allocate slightly more resources than the default minimum (CPU=100 MHz and memory=300 MB) to account for peak loads and prevent resource contention.
- Nomad automatically finds a suitable node with the available resources you specify. Ensure your resource allocation is correct before considering horizontal scaling.
- Conduct thorough performance testing to determine precise resource requirements for your applications. Dynamic application sizing, combined with proper stress testing, can give you a good picture of the resources your application requires.
- ephemeral_disk {} is only used for data within the allocation itself. It does not include any mounted volumes or bind paths outside of the allocation directory. Ephemeral disks are ideal for data that you can rebuild if needed, such as an in-progress cache or a local copy of data.
- Use NUMA settings for your application if needed. A minimal example covering these attributes appears below.
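The following is a minimal sketch of a job that sets these attributes explicitly. The job, group, and task names, the Docker image, and the specific values are illustrative; adjust them to your workload.
job "web" {
  datacenters = ["dc1"]

  group "web" {
    # ephemeral_disk is declared at the group level; use it only for
    # rebuildable data that lives inside the allocation directory
    ephemeral_disk {
      size = 500 # MB
    }

    task "server" {
      driver = "docker"

      config {
        image = "nginx:1.27"
      }

      resources {
        cpu    = 500 # MHz, above the 100 MHz default to absorb peak load
        memory = 512 # MB, above the 300 MB default
      }
    }
  }
}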
Horizontal application scaling with Nomad Autoscaler
Horizontal scaling involves adding or removing instances of an application to handle varying loads. You can manually adjust the count parameter, which determines the number of instances within a group, using the UI, CLI, or API. Manual adjustments may be sufficient early in your Nomad adoption journey, but as your footprint grows and your processes mature, we recommend you use Nomad's Autoscaler. The Autoscaler can dynamically adjust the number of instances based on predefined policies.
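For example, a manual scale-out with the CLI looks like the following; the job name is illustrative.
# Set the count of job "web" to 5 instances
nomad job scale web 5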
At a high level, the Autoscaler operates with three key components.
- APM (Application Performance Monitoring): Collects metrics from the Nomad API, Prometheus, or Datadog. Nomad Enterprise operators configure this in the client node configuration file and in the Autoscaler job specification. Prometheus is the recommended APM. See the Observability section.
- Strategy: Determines how to interpret metrics and make scaling decisions.
- Target: Executes the scaling actions on the Nomad cluster. For horizontal application scaling, you can omit this block because Nomad populates it on job submission.
For more details, go to the Nomad Autoscaler concepts page.
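As a point of reference, a minimal Autoscaler agent configuration that wires these components together might look like the following sketch. The addresses are illustrative; the plugin names match the defaults shipped with the Autoscaler.
# Where the Autoscaler finds the Nomad API
nomad {
  address = "http://localhost:4646"
}

# Register Prometheus as the APM plugin
apm "prometheus" {
  driver = "prometheus"
  config = {
    address = "http://prometheus.example.com:9090"
  }
}

# Make the target-value strategy available to policies
strategy "target-value" {
  driver = "target-value"
}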
Horizontal scaling strategies
The Nomad Autoscaler provides several strategies for horizontal application scaling, each designed to address different scaling scenarios and requirements. This section details four strategies along with their use cases and recommendations to assist you in choosing the right strategy.
Threshold
The threshold strategy scales based on upper and lower bounds of a metric, increasing or decreasing when outside the respective bound. For example, adding instances when CPU utilization exceeds 80% and removing instances when it drops below 30%.
Use cases
- Maintaining metrics within an acceptable range (for example memory usage 40-80%).
- Applications that must handle sudden spikes in demand.
- Useful for managing resources efficiently by scaling down during low demand periods.
Recommendations
- Set appropriate upper and lower bounds based on application behavior and requirements.
- Use wider thresholds for more stable scaling, narrower for more responsive scaling.
- Combine with appropriate cool-down periods to prevent rapid scaling oscillations.
- Tune thresholds based on historical and forecasted data (increased demand) to ensure optimal scaling.
Target value
The Target Value strategy aims to maintain a specific metric at a desired target value. Unlike the Threshold strategy, which is reactive, the Target Value strategy is proactive. For example, to keep CPU utilization at 70%, Nomad adjusts the number of instances to achieve this target.
Use cases
- Applications with dynamic workloads where maintaining a specific performance metric is crucial.
- Helps in optimizing costs by scaling resources based on actual demand, avoiding over-provisioning.
Recommendations
- Select the metric that best represents the application's performance and resource needs.
- Use with metrics that have a clear correlation to instance count (for example CPU usage).
- Implement appropriate cool-down periods to prevent oscillation.
Pass-through
The Pass-through strategy allows external systems or custom logic to dictate the number of instances. Nomad Enterprise acts as an executor, scaling the application based on the input it receives and offloading the scaling computation from the Nomad Enterprise cluster nodes.
Use cases
- Use custom scaling logic such as integrating with external monitoring tools.
- Direct mapping of metrics to instance count (for example one instance per active user).
- Scaling based on external systems or custom metrics.
Recommendations
- Ensure the metric directly correlates to the desired instance count.
- Implement safeguards (max_scale_up/down) to prevent rapid over or under-scaling.
- Ensure robust integration and failure testing between Nomad and the external system to avoid discrepancies in scaling decisions.
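For reference, a pass-through check might look like the following sketch, where the query result is used directly as the desired instance count. The metric name is hypothetical.
check "active_users" {
  source = "prometheus"
  query  = "sum(app_active_users)"

  # The pass-through strategy takes no configuration; the query result
  # becomes the desired count
  strategy "pass-through" {}
}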
Fixed value
Used for client node scaling and not relevant to application scaling.
When choosing a strategy, understand your workload thoroughly and select one that aligns with your application's behavior and scaling needs. Start with conservative scaling strategies, and as you fine-tune and observe the scaling behavior, consider combining multiple strategies for more nuanced scaling decisions. Regular monitoring and adjustment of your scaling configurations are essential as your application and workload evolve.
Be mindful of the resource implications to ensure your cluster can handle the potential maximum scale. Implement safeguards by setting appropriate minimum and maximum values to prevent under- or over-scaling, and use suitable cool-down periods to avoid rapid scaling oscillations. Implement Sentinel, resource quotas, and node pool governance as guardrails.
Thorough testing, including simulations of various load scenarios, is vital to verify the effectiveness of your scaling strategies.
Evaluation interval and cool-down period
Evaluation interval and cool-down period are parameters that impact the responsiveness and stability of your scaling operations.
Evaluation interval
The evaluation interval is how often the Nomad Autoscaler looks at your metrics to decide whether it needs to scale up or down. It affects how quickly your system can respond to changes, how often it might scale, and how much work the Autoscaler itself has to do. Set it too short, and you might scale unnecessarily often. Set it too long, and you might miss important spikes and cause performance degradation.
When choosing an interval, consider the following.
- In most cases, the default settings are sufficient; however, critical and volatile workloads may require shorter intervals for quicker responses.
- Shorter intervals (for example 5 to 30 seconds) increase the load on the Autoscaler and metric sources, but can lead to more accurate scaling decisions, optimizing resource utilization for the applications.
- Use longer intervals (for example 5 to 10 minutes) if there are resource constraints on the Autoscaler or if your application can tolerate slower response times to workload changes. This can help reduce the load on the Autoscaler while still meeting the scaling needs of less time-sensitive applications.
Cool-down period
The cool-down period is a wait time enforced after each scaling action, during which Nomad Enterprise prevents new scaling operations. By preventing rapid successive scaling actions, it gives the system time to stabilize after a scaling event and reduces unnecessary scaling actions caused by temporary spikes or dips in metrics.
When choosing a cool-down period, consider the following.
- The cool-down period should be longer than the time it takes for new instances to become fully operational.
- For workloads with predictable patterns, a longer cool-down period (for example 10 to 15 minutes) can be effective. For more volatile workloads, a shorter period (for example 2 to 5 minutes) might be necessary.
- Consider how long it takes for metrics to stabilize and reflect the impact of a scaling action.
Scaling policy examples
The following are basic, common scaling policies intended as a starting point as you refine your scaling logic over time.
Memory and CPU scaling
job "app" {
  datacenters = ["dc1"]
  group "app" {
    count = 1
    scaling {
      enabled = true
      min     = 1
      max     = 10
      policy {
        cooldown = "10m"
        evaluation_interval = "5m"
        check "memory_usage" {
            source = "prometheus"
            query = "nomad_client_allocs_memory_allocated{alloc_id=\"${NOMAD_ALLOC_ID}\",task_group=\"${NOMAD_GROUP_NAME}\"}"
            strategy "threshold" {
                upper_bound = 512
                lower_bound = 400
            }
        }
        check "cpu_usage" {
            source = "prometheus"
            query = "nomad_client_allocs_cpu_allocated{alloc_id=\"${NOMAD_ALLOC_ID}\",task_group=\"${NOMAD_GROUP_NAME}\"}"
            strategy "threshold" {
                upper_bound = 1000
                lower_bound = 700
            }
        }
      }
    }
    # ...
  }
}
In the preceding definition, Nomad Enterprise adds new instances when resource usage meets the upper_bound thresholds of 512 MB of memory and 1000 MHz of CPU, until reaching the defined max count.
Nomad Enterprise removes instances if resource usage meets the lower_bound of 400 MB of memory and 700 MHz of CPU. The evaluation interval is set to a conservative 5 minutes to respond to changes in traffic, and the cooldown period is set to 10 minutes to allow the system to stabilize after each scaling action.
Custom metrics
job "app" {
  datacenters = ["dc1"]
  group "app" {
    count = 1
    scaling {
      enabled = true
      min     = 3
      max     = 10
      policy {
        cooldown = "90s"
        evaluation_interval = "15s"
        check "request_rate" {
          source = "prometheus"
          query  = "sum(rate(http_requests_total{}[1m]))"
          strategy "target-value" {
            target = 100
          }
        }
      }
    }
    # ...
  }
}
This example shows an API service that requires an aggressive scaling policy using a custom Prometheus metric specific to that application. A scaling event (either up or down) triggers as required to keep the target value of 100 requests per second, with a minimum of 3 instances to ensure high availability. The short evaluation interval (15 seconds) allows quick reaction to changes in traffic. If we assume a well-tested and stable API service, we can confidently set the cooldown to 90 seconds so the system stabilizes after each scaling action over a short period of time.
Spread considerations
The spread block allows application owners to increase the failure tolerance of their applications. Specify a node attribute to spread allocations over. This allows operators to spread allocations over attributes such as datacenter, availability zone, or even rack in a physical datacenter using metadata.
In a self-service environment, end users deploying applications into Nomad may not always be aware of cluster-level spread configurations. As a best practice, it is advisable to include a spread block in your job file when you have specific distribution requirements. This approach ensures you meet your application's distribution needs, regardless of any cluster-level settings.
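For example, a job with specific distribution requirements might include spread blocks like the following sketch at the job or group level. The rack metadata key is hypothetical and assumes operators have set it on client nodes.
spread {
  attribute = "${node.datacenter}"
  weight    = 70
}

spread {
  # Assumes clients expose a "rack" value in their meta block
  attribute = "${meta.rack}"
  weight    = 30
}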
Note
When scaling down (reducing the number of allocations), the Autoscaler does not consider the spread configuration or metadata when choosing which allocations to remove. Instead, it uses allocation IDs to determine the order of removal.
For example, if you have:
- Allocations 1, 2, 3 on Node 1
- Allocations 4, 5, 6 on Node 2
- Allocations 7, 8, 9 on Node 3
When scaling down, Nomad Enterprise removes allocations in reverse order (9, 8, 7, and so on) rather than maintaining an even distribution across nodes. This behavior can potentially impact the availability and balance of your application across the cluster during scale-down operations.
- When you have specific distribution requirements, always include a spread block in your job file. This ensures you meet your requirements regardless of cluster-level configurations.
- Be aware of the current limitation in scale-down operations. You may need to implement additional logic or monitoring to maintain desired distribution during scale-down events.
- You can use multiple spread blocks to create more complex distribution strategies. For example, spreading across both datacenters and rack IDs.
- Monitor the actual distribution of your allocations to ensure they align with your intended spread configuration.
Dynamic application sizing
Dynamic application sizing (DAS) allows Nomad to provide CPU and memory recommendations to tasks based on their actual usage. This feature complements horizontal scaling by optimizing resource utilization at the individual task level.
The scaling block is the same as the one used for horizontal scaling. For vertical scaling with DAS, however, it applies at the task level and uses three additional strategies.
Refer to the Dynamic App Sizing Concept and Dynamic App Sizing tutorial for an overview of the concepts, configuration guidance, and recommendations.
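As a reference point, a task-level DAS policy might look like the following sketch, which requests memory recommendations based on the 95th percentile of observed usage. The task name, bounds, and the choice of the app-sizing-percentile strategy are illustrative.
task "server" {
  # ...

  scaling "mem" {
    enabled = true
    min     = 256  # MB
    max     = 1024 # MB

    policy {
      check "95pct" {
        # Recommend memory based on the 95th percentile of usage
        strategy "app-sizing-percentile" {
          percentile = "95"
        }
      }
    }
  }
}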
While DAS does not automatically apply recommended resource changes to tasks, you can implement automation to streamline the process and reduce manual intervention. Consider the following approaches if you are confident in your metrics and DAS configuration.
- If the native Nomad Enterprise UI is not sufficient, then you can utilize the Nomad API to retrieve recommendations and display them in a custom dashboard for efficient monitoring and decision-making.
- Implement alerts based on DAS recommendations or significant resource changes. Direct these alerts to Nomad Enterprise operators, application owners, or both, ensuring swift awareness of potential optimizations.
- Incorporate DAS recommendations into your deployment process for new versions to ensure they receive the latest recommended resource allocations.
- Note that DAS is configured individually for each group or task; there is no single, overarching setting that enables DAS across all jobs in your Nomad environment simultaneously. This means you must modify each job specification in your deployment pipeline to incorporate the appropriate DAS configuration blocks.
Stateful workload considerations
When implementing autoscaling for stateful applications in Nomad Enterprise, consider the following factors to ensure data integrity and performance. Be aware that certain stateful applications have inherent limitations on how they scale, and some may not be a good fit for autoscaling at all.
Storage
The underlying storage is crucial for stateful applications when autoscaling. For container workloads, leverage Nomad Enterprise's Container Storage Interface (CSI) support to dynamically attach storage to instances as your workload scales. This gives you the flexibility to manage storage resources on-the-fly, adapting to your application's changing needs.
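For example, a group might request a dedicated CSI volume per allocation so that storage follows the instance count as it scales. The group and volume names are illustrative, and the volumes themselves must already be registered with the CSI plugin.
group "db" {
  volume "data" {
    type            = "csi"
    source          = "db-data"
    access_mode     = "single-node-writer"
    attachment_mode = "file-system"
    per_alloc       = true # each allocation mounts "db-data[0]", "db-data[1]", ...
  }
  # ...
}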
Networking
While networking is generally less of a concern when autoscaling within the same Nomad Enterprise cluster, some stateful applications require stable network identities (for example IP addresses) for client connections or inter-node communication. Be aware of these requirements when designing your scaling strategy and collaborate with your networking team.
State consistency
New instances must synchronize with the current state of the application, which can be time-consuming and resource-intensive. Consider the performance impact on your stateful application during scale events and plan accordingly.
Also implement conservative scaling policies that allow sufficient time for data synchronization and state management between scaling events.
Clustered applications
For clustered applications, utilize variable locks to prevent split-brain scenarios and data corruption during scale-up and scale-down events.
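Nomad's variable locks can back this kind of coordination. For example, an entrypoint script might acquire a lock before performing a primary-only action; the variable path and script name below are purely illustrative.
# Acquire a lock on a Nomad variable, run the command while holding it,
# and release the lock when the command exits
nomad var lock nomad/jobs/db/leader ./promote-to-primary.sh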
Kill timeout
Ensure your application has proper shutdown procedures to persist state safely before scaling down. Use kill_timeout to allow sufficient time for graceful shutdowns, and consider setting max_kill_timeout in the client configuration to cap the value job authors can request.
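For instance, a task that needs time to flush state might set kill_timeout as follows, while operators cap it cluster-wide in the client configuration. The values are illustrative.
task "db" {
  # Allow up to 90 seconds for a graceful shutdown before Nomad
  # force-kills the task
  kill_timeout = "90s"
  # ...
}

# Client configuration: upper bound on the kill_timeout a job author
# may request
client {
  max_kill_timeout = "120s"
}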
Service discovery
Implement Service Discovery or Service Mesh to dynamically manage network identities and route traffic appropriately.
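For example, registering the service with a health check lets the service catalog route traffic only to healthy instances as the count changes. The service name, port label, and health-check path are illustrative.
service {
  name     = "app"
  port     = "http"
  provider = "consul" # or "nomad" for native service discovery

  check {
    type     = "http"
    path     = "/health"
    interval = "10s"
    timeout  = "2s"
  }
}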
As always, use a well-tested autoscaling development environment to identify and resolve any issues before implementing in production. This is critical for stateful applications due to their data persistence requirements. For additional details not related to autoscaling, view the Considerations for Stateful Workloads page.
Standardization
To promote a self-service model, consider implementing job templates or utilizing Nomad Pack to provide a consistent deployment workflow that includes standard metrics across applications while still allowing for customization.
An alternative method involves creating a global scaling policy within the Autoscaler agent configuration. However, it is important to recognize that a one-size-fits-all approach may not be suitable for all applications.
In many cases, it is more effective to allow job authors to determine the specific scaling requirements for their applications.
Operators can then use features such as Sentinel, resource quotas, and node pool governance to establish appropriate guardrails. This balanced approach ensures flexibility for individual application needs while maintaining overall system integrity and resource management.