Well-Architected Framework
Configure monitoring agent on container orchestrators
Monitoring container orchestrators, like Kubernetes and Nomad, helps you keep your clusters and services healthy and sustain high performance and reliability. The built-in telemetry data from these tools provides little value on its own; you need a monitoring tool to collect, parse, and alert on the raw telemetry data. By setting up monitoring agents, you gain insight into how your clusters and services are functioning.
Track metrics about Kubernetes cluster nodes, like CPU and memory usage, to understand whether the nodes are healthy and have enough resources. Monitor application-level metrics, like request latency and error rates, to confirm that services are running smoothly. Tools like Prometheus and Grafana let you collect and visualize these metrics.
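As a rough illustration of the node-level and application-level checks described above, the following Python sketch parses a small sample of the Prometheus text exposition format and derives a memory usage ratio and an error ratio. The sample data, the `http_requests_total` metric, and the helper names are assumptions made for this example; the `node_memory_*` names follow the style of the Prometheus node exporter, but verify them against what your agents actually export.

```python
from typing import Dict

# Hypothetical sample in the Prometheus text exposition format.
# All metric names and values here are illustrative assumptions.
SAMPLE_METRICS = """\
# HELP node_memory_MemTotal_bytes Memory information field MemTotal_bytes.
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 8000000000
node_memory_MemAvailable_bytes 2000000000
http_requests_total{code="200"} 950
http_requests_total{code="500"} 50
"""


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each 'name{labels}' series to its value, skipping comment lines.

    Assumes every sample line ends with the value (no trailing timestamp).
    """
    series: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        series[name] = float(value)
    return series


def memory_used_ratio(series: Dict[str, float]) -> float:
    """Return the fraction of node memory that is not available."""
    total = series["node_memory_MemTotal_bytes"]
    available = series["node_memory_MemAvailable_bytes"]
    return (total - available) / total


def error_ratio(series: Dict[str, float], metric: str = "http_requests_total") -> float:
    """Return the share of requests with a 5xx status code for one counter."""
    total = 0.0
    errors = 0.0
    for name, value in series.items():
        if not name.startswith(metric + "{"):
            continue
        total += value
        if 'code="5' in name:
            errors += value
    return errors / total if total else 0.0


if __name__ == "__main__":
    parsed = parse_metrics(SAMPLE_METRICS)
    print(f"memory used: {memory_used_ratio(parsed):.0%}")
    print(f"error ratio: {error_ratio(parsed):.0%}")
```

This sketch computes ratios from cumulative counter values for simplicity. In practice, you would typically let Prometheus compute rates over a time window and visualize or alert on them in Grafana rather than parsing the output by hand.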
Track Nomad cluster node metrics, like resource usage, to optimize resource allocation, keep the cluster stable, and identify performance bottlenecks. Nomad can expose its metrics in a Prometheus-compatible format, so Prometheus can collect cluster metrics and give you insight into cluster health and performance. Monitor Nomad job metrics so you know whether jobs execute smoothly. Using monitoring tools like Prometheus and Grafana with Nomad lets you monitor the entire system: both the cluster itself and all running jobs.
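To make the node-level tracking above more concrete, the following Python sketch summarizes memory allocation per node from a metrics payload and flags nodes above a threshold. The payload shape (a `Gauges` list with `Name`, `Value`, and `Labels`), the metric names, the `node_id` label, and the sample values are assumptions modeled loosely on Nomad's JSON metrics output; check the Nomad metrics reference for the names your version emits.

```python
from typing import Any, Dict, List

# Hypothetical payload shaped like a JSON metrics response.
# Metric names, labels, and values are illustrative assumptions.
SAMPLE_PAYLOAD: Dict[str, Any] = {
    "Gauges": [
        {"Name": "nomad.client.allocated.memory", "Value": 3072,
         "Labels": {"node_id": "node-a"}},
        {"Name": "nomad.client.unallocated.memory", "Value": 1024,
         "Labels": {"node_id": "node-a"}},
        {"Name": "nomad.client.allocated.memory", "Value": 3584,
         "Labels": {"node_id": "node-b"}},
        {"Name": "nomad.client.unallocated.memory", "Value": 512,
         "Labels": {"node_id": "node-b"}},
    ]
}


def memory_allocation_by_node(payload: Dict[str, Any]) -> Dict[str, float]:
    """Return allocated / (allocated + unallocated) memory for each node."""
    allocated: Dict[str, float] = {}
    unallocated: Dict[str, float] = {}
    for gauge in payload.get("Gauges", []):
        node = gauge.get("Labels", {}).get("node_id")
        if node is None:
            continue
        if gauge["Name"] == "nomad.client.allocated.memory":
            allocated[node] = gauge["Value"]
        elif gauge["Name"] == "nomad.client.unallocated.memory":
            unallocated[node] = gauge["Value"]

    ratios: Dict[str, float] = {}
    for node, used in allocated.items():
        capacity = used + unallocated.get(node, 0)
        if capacity > 0:
            ratios[node] = used / capacity
    return ratios


def nodes_over_threshold(ratios: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Return the sorted IDs of nodes whose allocation ratio exceeds the threshold."""
    return sorted(node for node, ratio in ratios.items() if ratio > threshold)


if __name__ == "__main__":
    ratios = memory_allocation_by_node(SAMPLE_PAYLOAD)
    print(ratios)
    print(nodes_over_threshold(ratios))
```

A check like this is only a starting point. With Prometheus and Grafana in place, the same comparison is usually expressed as a dashboard panel or alert rule instead of custom code.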
HashiCorp resources:
- The Terraform Datadog provider tutorial shows you how to use Terraform to deploy an application in EKS and install the Datadog Agent across the Kubernetes cluster.
- For node-level Nomad metrics, refer to the following resources:
  - The Nomad Prometheus tutorial guides you through configuring Prometheus to integrate with a Nomad cluster. This tutorial covers how to gather node-level metrics.
  - The Monitoring Nomad, Metrics reference, Nomad autoscaler documentation, and Nomad telemetry block documentation provide a deep dive into the telemetry and metrics that Nomad has to offer.
- The Collect resource utilization metrics guide shows you how to view basic Nomad job resource usage for simple service-level metrics.
External resources:
- Kubernetes provides resources to learn more about tools that help you monitor Kubernetes resources and node health.
- The Nomad integration for Grafana includes two pre-built dashboards to help monitor and visualize Nomad metrics.
Next steps
In this section of Setup monitoring agents, you learned how to configure and deploy monitoring agents for containers. Configure monitoring agent on container orchestrators is part of the Define and automate processes pillar.