Nomad
Monitor job service metrics with Prometheus, Grafana, and Consul
This tutorial explains how to configure Prometheus and Grafana to integrate with a Consul service mesh deployed with Nomad. While this tutorial introduces the basics of enabling mesh telemetry, you can also use this data to customize dashboards and to set up alerting and autoscaling rules.
When deploying a service mesh using Nomad and Consul, one of the benefits is the ability to collect service-to-service traffic telemetry emitted by Envoy sidecar proxies. This includes data such as request count, traffic rate, connections, response codes, and more.
In this tutorial you will deploy Grafana and Prometheus within the mesh, set up intentions, and
configure an ingress to enable access. You will configure Consul service discovery for targets
in Prometheus so that services are automatically scraped as they are deployed. A
Consul ingress gateway will load-balance the Prometheus deployment and provide access to the
web interfaces of Prometheus and Grafana on ports 8081 and 3000, respectively.
Prometheus telemetry on Envoy sidecars
You can enable Envoy's Prometheus metrics either directly in Consul using proxy-defaults or per service within the Nomad job specification. This tutorial covers configuration within the Nomad jobspec.
For a point of comparison and reference, you can enable proxy metrics globally in a Consul
datacenter with a proxy-defaults configuration entry and the Consul CLI command
consul config write ./<path_to_configuration_file>.
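A minimal sketch of such a configuration entry, assuming a hypothetical file named proxy-defaults.hcl, looks like this:
proxy-defaults.hcl
# Hypothetical example: enable the Envoy Prometheus endpoint for every proxy in the datacenter.
Kind = "proxy-defaults"
Name = "global"
Config {
  envoy_prometheus_bind_addr = "0.0.0.0:9102"
}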
Prerequisites
For this tutorial, you will need:
- A Nomad environment with Consul installed. The Nomad project provides Terraform configuration to deploy a cluster on AWS.
Ensure that the NOMAD_ADDR and CONSUL_HTTP_ADDR environment variables are set appropriately.
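For example, with <server_ip> standing in as a placeholder for the address of one of your cluster servers:
$ export NOMAD_ADDR=http://<server_ip>:4646
$ export CONSUL_HTTP_ADDR=http://<server_ip>:8500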
Create the Nomad jobs
Use the jobspec files below to create jobs for:
- two web applications to simulate traffic flows between Envoy proxies
- an ingress controller to monitor traffic coming into the mesh
- Prometheus to collect the Envoy metrics
- Grafana to act as a visualization frontend for Prometheus
Create the foo web application job
The first web application job configures a "foo" service. Take note of these three specific configurations.
- A dynamic port named envoy_metrics that maps to port 9102, where Envoy exposes its Prometheus metrics.
- A meta attribute set in the service block that records the dynamic port. This port will be present in the Consul service registration that Prometheus will use to discover the proxy.
- A sidecar_service proxy configuration that binds the Envoy Prometheus metrics endpoint to port 9102, which the dynamic port maps to.
Create a file with the name foo.nomad.hcl, add the following contents to it, and save the file.
foo.nomad.hcl
job "foo" {
  datacenters = ["dc1"]
  type = "service"
  group "foo" {
    count = 1
    network {
      mode = "bridge"
      port "expose" {}
      ## 1. This opens a dynamic host port that maps to the Envoy metrics endpoint
      port "envoy_metrics" {
        to = 9102
      }       
    }        
    service {
      name = "foo"
      port = 9090
      ## 2. Prometheus uses this metadata to interpolate the dynamic port
      meta {
        envoy_metrics_port = "${NOMAD_HOST_PORT_envoy_metrics}"
      }
      
      check {
        expose   = true
        type     = "http"
        path     = "/health"
        interval = "30s"
        timeout  = "5s"
      }
       
      connect {
        sidecar_service {
          proxy {     
            config {
              ## 3. Instruct Envoy to expose Prometheus metrics on /metrics
              envoy_prometheus_bind_addr = "0.0.0.0:9102"
            }                 
            upstreams {
              destination_name = "bar"
              local_bind_port  = 9091
            }
          }          
        }
      }
    }        
    
    task "foo" {
      driver = "docker"
      config {
        image   = "nicholasjackson/fake-service:v0.26.0"
      }
      env {
        UPSTREAM_URIS = "http://127.0.0.1:9091"
        NAME = "foo"
        MESSAGE = "foo service"
        ERROR_RATE = "0.2"
        ERROR_DELAY = "0.3s"
        TIMING_VARIANCE = "10"
      }    
    }
  }
}
Submit the job to Nomad.
$ nomad job run foo.nomad.hcl
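Optionally, check that the allocation and its Envoy sidecar task are running before moving on:
$ nomad job status foo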
Create the bar web application job
The bar service jobspec is similar to the foo service jobspec.
Create a file with the name bar.nomad.hcl, add the following contents to it, and save the file.
bar.nomad.hcl
job "bar" {
  datacenters = ["dc1"]
  type = "service"
  group "bar" {
    count = 1
    network {
      mode = "bridge"
      port "expose" {}
      port "envoy_metrics" {
        to = 9102
      }       
    }        
    service {
      name = "bar"
      port = 9090
      meta {
        envoy_metrics_port = "${NOMAD_HOST_PORT_envoy_metrics}"
      }
      
      check {
        expose   = true
        type     = "http"
        path     = "/health"
        interval = "30s"
        timeout  = "5s"
      }
      connect {
        sidecar_service {
          proxy {
            config {
              envoy_prometheus_bind_addr = "0.0.0.0:9102"
            }
          }
        }
      }
    }        
    
    task "bar" {
      driver = "docker"
      config {
        image   = "nicholasjackson/fake-service:v0.26.0"
      }
      env {
        NAME = "bar"
        MESSAGE = "bar service"
        ERROR_RATE = "0.2"
        ERROR_DELAY = "0.3s"
        RATE_LIMIT = "10"     
        RATE_LIMIT_CODE = "429"
        TIMING_VARIANCE = "20" 
      }      
    }
  }
}
Submit the job to Nomad.
$ nomad job run bar.nomad.hcl
Create the ingress controller job
The ingress controller is a system job, so it deploys on all client nodes.
Create a file with the name ingress-controller.nomad.hcl, add the following contents to it,
and save the file.
ingress-controller.nomad.hcl
job "ingress-controller" {
    
  type = "system"
  group "consul-ingress-controller" {
    network {
      mode = "bridge"
      port "app" {
        static = 8080
        to     = 8080
      }
      port "prometheus" {
        static = 8081
        to     = 8081
      }
      port "grafana" {
        static = 3000
        to     = 3000
      }      
      port "envoy_metrics" {
        to = 9102
      }          
    }
    service {
      name = "consul-ingress-controller"
      port = "8080"
      
      meta {
        envoy_metrics_port = "${NOMAD_HOST_PORT_envoy_metrics}"
      }
      connect {
        gateway {
          proxy {
            config {
              envoy_prometheus_bind_addr = "0.0.0.0:9102"
            }            
          }
          ingress {
            listener {
              port     = 8080
              protocol = "http"
              service {
                hosts = ["*"]                
                name = "foo"
              }
            }
            listener {
              port     = 8081
              protocol = "http"       
              service {
                hosts = ["*"]                                
                name = "prometheus"
              }
            } 
            listener {
              port     = 3000
              protocol = "http"              
              service {
                hosts = ["*"]
                name = "grafana"
              }
            }                        
          }
        }
      }
    }
  }
}
Submit the job to Nomad.
$ nomad job run ingress-controller.nomad.hcl
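Optionally, confirm that the gateway registered with Consul alongside the foo and bar services, assuming the consul CLI can reach your cluster:
$ consul catalog services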
Create the Prometheus job
The Prometheus job uses the template stanza to create the Prometheus configuration file.
The attr.unique.network.ip-address node attribute in the consul_sd_configs section points
Prometheus at the local Consul agent so that it detects and scrapes targets automatically as
they are deployed. This works in this example because the Consul client is running on the
same virtual machine as the Nomad client.
The relabel_configs section replaces the default application port with the dynamic Envoy
metrics port so that Prometheus scrapes the proxy's metrics endpoint.
The volumes attribute in the Docker driver's config block takes the configuration file
that the template stanza dynamically creates and mounts it into the Prometheus container.
Create a file with the name prometheus.nomad.hcl, add the following contents to it,
and save the file.
prometheus.nomad.hcl
job "prometheus" {
  type = "service"
  group "prometheus" {
    count = 1
    network {
      mode = "bridge"
      port "expose" {}
      port "envoy_metrics" {
        to = 9102
      }       
    }
    restart {
      attempts = 2
      interval = "30m"
      delay    = "15s"
      mode     = "fail"
    }
    ephemeral_disk {
      size = 300
      migrate = true
      sticky  = true
    }
    task "prometheus" {
      template {
        change_mode = "noop"
        destination = "local/prometheus.yml"
        data = <<EOH
---
global:
  scrape_interval:     5s
  evaluation_interval: 5s
scrape_configs:
  - job_name: 'Consul Connect Metrics'
    metrics_path: "/metrics"
    consul_sd_configs:
    - server: "{{ env "attr.unique.network.ip-address" }}:8500"
    relabel_configs:
      - source_labels: [__meta_consul_service]
        action: drop
        regex: (.+)-sidecar-proxy
      - source_labels: [__meta_consul_service_metadata_envoy_metrics_port]
        action: keep
        regex: (.+)
      - source_labels: [__address__, __meta_consul_service_metadata_envoy_metrics_port]
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
EOH
      }
      driver = "docker"
      config {
        image = "prom/prometheus:latest"
        args = [
          "--config.file=/local/prometheus.yml",
          "--storage.tsdb.path=/alloc/data",
          "--web.listen-address=0.0.0.0:9090",
          "--web.external-url=/",
          "--web.console.libraries=/usr/share/prometheus/console_libraries",
          "--web.console.templates=/usr/share/prometheus/consoles"
        ]        
        volumes = [
          "local/prometheus.yml:/etc/prometheus/prometheus.yml",
        ]
      }
    }
    service {
      name = "prometheus"
      port = "9090"
      check {
        name     = "prometheus_ui port alive"
        expose   = true
        type     = "http"
        path     = "/-/healthy"
        interval = "10s"
        timeout  = "2s"
      }
      connect {
        sidecar_service {}
      }
    }
  }
}
Submit the job to Nomad.
$ nomad job run prometheus.nomad.hcl
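Once the allocation is healthy, you can optionally confirm that Prometheus discovered the Envoy sidecars by querying its targets API through the ingress gateway on port 8081. This assumes <client_ip> is a placeholder for the address of one of your client nodes and that intentions allow the connection:
$ curl http://<client_ip>:8081/api/v1/targets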
Create the Grafana job
Create a file with the name grafana.nomad.hcl, add the following contents to it,
and save the file.
grafana.nomad.hcl
job "grafana" {
  group "grafana" {
    count = 1
    network {
      mode = "bridge"
      port "expose" {}
    }
    service {
      name = "grafana"
      port = "3000"
      meta {
        metrics_port = "${NOMAD_HOST_PORT_expose}"
      }
      check {
        expose   = true
        type     = "http"
        name     = "grafana"
        path     = "/api/health"
        interval = "30s"
        timeout  = "10s"
      }
      connect {
        sidecar_service {
          proxy {
            expose {
              path {
                path            = "/metrics"
                protocol        = "http"
                local_path_port = 9102
                listener_port   = "expose"
              }
            }             
            upstreams {
                destination_name = "prometheus"
                local_bind_port  = 9090
            } 
          }
        }
      }
    }
    task "grafana" {
      driver = "docker"
      config {
        image = "grafana/grafana:latest"
        volumes = [
          "local/provisioning/prom.yml:/etc/grafana/provisioning/datasources/prometheus.yml"
        ]
      }
      env {
        GF_PATHS_CONFIG = "/local/config.ini"
        GF_PATHS_PROVISIONING = "/local/provisioning"
      }
      template {
        destination = "local/config.ini"
        data        = <<EOF
[database]
type = sqlite3
[server]
EOF
      }
      template {
        destination = "local/provisioning/datasources/prom.yml"
        data        = <<EOF
apiVersion: 1
datasources:
- name: Prometheus
  type: prometheus
  access: proxy
  url: http://localhost:9090
  isDefault: true
  editable: false
EOF
        perms = "777"
      }
    }
  }
}
Submit the job to Nomad.
$ nomad job run grafana.nomad.hcl
Access and configure Grafana
Grafana is available via the ingress gateway on port 3000. Use the
nomad service info command to get the IP address of the
client running Grafana.
$ nomad service info grafana     
Job ID   Address              Tags  Node ID   Alloc ID
grafana  192.168.50.210:3000  []    94dabfe7  e797357e
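Before logging in, you can optionally confirm that Grafana is healthy by querying the same /api/health endpoint used by the service check, substituting the address from your own output:
$ curl http://192.168.50.210:3000/api/health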
The default username and password for Grafana are both admin. Grafana requires a password change
on initial login. Choose and set a new password for the admin user and make a note of it.
Deploy an Envoy dashboard
An Envoy clusters dashboard is available from the Grafana dashboard marketplace.
Navigate to the dashboards page, click on the New button, then click on Import.
Enter 11021 in the field with the placeholder text Grafana.com dashboard URL or ID, click
Load, then click Import to finish the process.
The dashboard displays aggregated Envoy health information and traffic flows.
Simulate traffic
Simulate traffic to the cluster by making requests to any of the client nodes on port 8080, where the ingress gateway listens and routes to the foo service.
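For example, assuming the client address from the earlier output, a simple loop generates a steady stream of requests through the ingress gateway:
$ while true; do curl -s http://192.168.50.210:8080 > /dev/null; sleep 1; done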
Open the dashboard in Grafana to see requests, connections, and traffic volume on the time series panels.
Next steps
In this tutorial, you deployed Grafana and Prometheus within the Consul service mesh, set up intentions, configured an ingress to enable access, and configured Consul service discovery to allow automatic scraping of targets in Prometheus.
For more information, check out the additional resources below.