Nomad
Monitor job service metrics with Prometheus, Grafana, and Consul
This tutorial explains how to configure Prometheus and Grafana to integrate with a Consul service mesh deployed with Nomad. While this tutorial introduces the basics of enabling mesh telemetry, you can also use this data to customize dashboards and to set up alerting and autoscaling rules.
When deploying a service mesh using Nomad and Consul, one of the benefits is the ability to collect service-to-service traffic telemetry emitted by Envoy sidecar proxies. This includes data such as request count, traffic rate, connections, response codes, and more.
In this tutorial you will deploy Grafana and Prometheus within the mesh, set up intentions, and
configure an ingress to enable access. You will configure Consul service discovery for targets
in Prometheus so that services are automatically scraped as they are deployed. A
Consul ingress gateway will load-balance the Prometheus deployment and provide access to the
web interfaces of Prometheus and Grafana on ports 8081 and 3000, respectively.
Prometheus telemetry on Envoy sidecars
You can enable Envoy's Prometheus metrics either directly in Consul using proxy-defaults or per service within the Nomad job specification. This tutorial covers configuration within the Nomad jobspec.
For a point of comparison and reference, you can enable proxy metrics globally in a Consul
datacenter with a proxy-defaults configuration entry and the Consul CLI command
consul config write ./<path_to_configuration_file>.
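A minimal sketch of such a configuration entry, assuming a hypothetical file named proxy-defaults.hcl, looks like this:
proxy-defaults.hcl
# Hypothetical example: enable the Envoy Prometheus endpoint for every proxy in the datacenter.
Kind = "proxy-defaults"
Name = "global"
Config {
  envoy_prometheus_bind_addr = "0.0.0.0:9102"
}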
Prerequisites
For this tutorial, you will need:
- A Nomad environment with Consul installed. The Nomad project provides Terraform configuration to deploy a cluster on AWS.
Ensure that the NOMAD_ADDR and CONSUL_HTTP_ADDR environment variables are set appropriately.
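For example, with <server_ip> standing in as a placeholder for the address of one of your cluster servers:
$ export NOMAD_ADDR=http://<server_ip>:4646
$ export CONSUL_HTTP_ADDR=http://<server_ip>:8500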
Create the Nomad jobs
Use the jobspec files below to create jobs for:
- two web applications to simulate traffic flows between Envoy proxies
- an ingress controller to monitor traffic coming into the mesh
- Prometheus to collect the Envoy metrics
- Grafana to act as a visualization frontend for Prometheus
Create the foo web application job
The first web application job configures a "foo" service. Take note of these three specific configurations.
- A dynamic port named envoy_metrics that maps to port 9102, where Envoy exposes its Prometheus metrics.
- A meta attribute set in the service block that records the dynamic port. This port will be present in the Consul service registration that Prometheus will use to discover the proxy.
- A sidecar_service proxy configuration that binds the Envoy Prometheus metrics endpoint to port 9102, which the dynamic port maps to.
Create a file with the name foo.nomad.hcl, add the following contents to it, and save the file.
foo.nomad.hcl
job "foo" {
  datacenters = ["dc1"]
  type = "service"
  group "foo" {
    count = 1
    network {
      mode = "bridge"
      port "expose" {}
      ## 1. This opens a dynamic host port that maps to the Envoy metrics endpoint
      port "envoy_metrics" {
        to = 9102
      }       
    }        
    service {
      name = "foo"
      port = 9090
      ## 2. Prometheus uses this metadata to interpolate the dynamic port
      meta {
        envoy_metrics_port = "${NOMAD_HOST_PORT_envoy_metrics}"
      }
      
      check {
        expose   = true
        type     = "http"
        path     = "/health"
        interval = "30s"
        timeout  = "5s"
      }
       
      connect {
        sidecar_service {
          proxy {     
            config {
              ## 3. Instruct Envoy to expose Prometheus metrics on /metrics
              envoy_prometheus_bind_addr = "0.0.0.0:9102"
            }                 
            upstreams {
              destination_name = "bar"
              local_bind_port  = 9091
            }
          }          
        }
      }
    }        
    
    task "foo" {
      driver = "docker"
      config {
        image   = "nicholasjackson/fake-service:v0.26.0"
      }
      env {
        UPSTREAM_URIS = "http://127.0.0.1:9091"
        NAME = "foo"
        MESSAGE = "foo service"
        ERROR_RATE = "0.2"
        ERROR_DELAY = "0.3s"
        TIMING_VARIANCE = "10"
      }    
    }
  }
}
Submit the job to Nomad.
$ nomad job run foo.nomad.hcl
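Optionally, check that the allocation and its Envoy sidecar task are running before moving on:
$ nomad job status foo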
Create the bar web application job
The bar service jobspec is similar to the foo service jobspec.
Create a file with the name bar.nomad.hcl, add the following contents to it, and save the file.
bar.nomad.hcl
job "bar" {
  datacenters = ["dc1"]
  type = "service"
  group "bar" {
    count = 1
    network {
      mode = "bridge"
      port "expose" {}
      port "envoy_metrics" {
        to = 9102
      }       
    }        
    service {
      name = "bar"
      port = 9090
      meta {
        envoy_metrics_port = "${NOMAD_HOST_PORT_envoy_metrics}"
      }
      
      check {
        expose   = true
        type     = "http"
        path     = "/health"
        interval = "30s"
        timeout  = "5s"
      }
      connect {
        sidecar_service {
          proxy {
            config {
              envoy_prometheus_bind_addr = "0.0.0.0:9102"
            }
          }
        }
      }
    }        
    
    task "bar" {
      driver = "docker"
      config {
        image   = "nicholasjackson/fake-service:v0.26.0"
      }
      env {
        NAME = "bar"
        MESSAGE = "bar service"
        ERROR_RATE = "0.2"
        ERROR_DELAY = "0.3s"
        RATE_LIMIT = "10"     
        RATE_LIMIT_CODE = "429"
        TIMING_VARIANCE = "20" 
      }      
    }
  }
}
Submit the job to Nomad.
$ nomad job run bar.nomad.hcl
Create the ingress controller job
The ingress controller is a system job, so it deploys on all client nodes.
Create a file with the name ingress-controller.nomad.hcl, add the following contents to it,
and save the file.
ingress-controller.nomad.hcl
job "ingress-controller" {
    
  type = "system"
  group "consul-ingress-controller" {
    network {
      mode = "bridge"
      port "app" {
        static = 8080
        to     = 8080
      }
      port "prometheus" {
        static = 8081
        to     = 8081
      }
      port "grafana" {
        static = 3000
        to     = 3000
      }      
      port "envoy_metrics" {
        to = 9102
      }          
    }
    service {
      name = "consul-ingress-controller"
      port = "8080"
      
      meta {
        envoy_metrics_port = "${NOMAD_HOST_PORT_envoy_metrics}"
      }
      connect {
        gateway {
          proxy {
            config {
              envoy_prometheus_bind_addr = "0.0.0.0:9102"
            }            
          }
          ingress {
            listener {
              port     = 8080
              protocol = "http"
              service {
                hosts = ["*"]                
                name = "foo"
              }
            }
            listener {
              port     = 8081
              protocol = "http"       
              service {
                hosts = ["*"]                                
                name = "prometheus"
              }
            } 
            listener {
              port     = 3000
              protocol = "http"              
              service {
                hosts = ["*"]
                name = "grafana"
              }
            }                        
          }
        }
      }
    }
  }
}
Submit the job to Nomad.
$ nomad job run ingress-controller.nomad.hcl
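Optionally, confirm that the gateway registered with Consul alongside the foo and bar services, assuming the consul CLI can reach your cluster:
$ consul catalog services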
Create the Prometheus job
The Prometheus job uses the template stanza to create the Prometheus configuration file.
The attr.unique.network.ip-address node attribute in the consul_sd_configs section points
Prometheus at the local Consul agent so that it detects and scrapes targets automatically as
they are deployed. This works in this example because the Consul client is running on the
same virtual machine as the Nomad client.
The relabel_configs section replaces the default application port with the dynamic Envoy
metrics port so that Prometheus scrapes the proxy's metrics endpoint.
The volumes attribute in the Docker driver's config block takes the configuration file
that the template stanza dynamically creates and mounts it into the Prometheus container.
Create a file with the name prometheus.nomad.hcl, add the following contents to it,
and save the file.
prometheus.nomad.hcl
job "prometheus" {
  type = "service"
  group "prometheus" {
    count = 1
    network {
      mode = "bridge"
      port "expose" {}
      port "envoy_metrics" {
        to = 9102
      }       
    }
    restart {
      attempts = 2
      interval = "30m"
      delay    = "15s"
      mode     = "fail"
    }
    ephemeral_disk {
      size = 300
      migrate = true
      sticky  = true
    }
    task "prometheus" {
      template {
        change_mode = "noop"
        destination = "local/prometheus.yml"
        data = <<EOH
---
global:
  scrape_interval:     5s
  evaluation_interval: 5s
scrape_configs:
  - job_name: 'Consul Connect Metrics'
    metrics_path: "/metrics"
    consul_sd_configs:
    - server: "{{ env "attr.unique.network.ip-address" }}:8500"
    relabel_configs:
      - source_labels: [__meta_consul_service]
        action: drop
        regex: (.+)-sidecar-proxy
      - source_labels: [__meta_consul_service_metadata_envoy_metrics_port]
        action: keep
        regex: (.+)
      - source_labels: [__address__, __meta_consul_service_metadata_envoy_metrics_port]
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
EOH
      }
      driver = "docker"
      config {
        image = "prom/prometheus:latest"
        args = [
          "--config.file=/local/prometheus.yml",
          "--storage.tsdb.path=/alloc/data",
          "--web.listen-address=0.0.0.0:9090",
          "--web.external-url=/",
          "--web.console.libraries=/usr/share/prometheus/console_libraries",
          "--web.console.templates=/usr/share/prometheus/consoles"
        ]        
        volumes = [
          "local/prometheus.yml:/etc/prometheus/prometheus.yml",
        ]
      }
    }
    service {
      name = "prometheus"
      port = "9090"
      check {
        name     = "prometheus_ui port alive"
        expose   = true
        type     = "http"
        path     = "/-/healthy"
        interval = "10s"
        timeout  = "2s"
      }
      connect {
        sidecar_service {}
      }
    }
  }
}
Submit the job to Nomad.
$ nomad job run prometheus.nomad.hcl
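Once the allocation is healthy, you can optionally confirm that Prometheus discovered the Envoy sidecars by querying its targets API through the ingress gateway on port 8081. This assumes <client_ip> is a placeholder for the address of one of your client nodes and that intentions allow the connection:
$ curl http://<client_ip>:8081/api/v1/targets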
Create the Grafana job
Create a file with the name grafana.nomad.hcl, add the following contents to it,
and save the file.
grafana.nomad.hcl
job "grafana" {
  group "grafana" {
    count = 1
    network {
      mode = "bridge"
      port "expose" {}
    }
    service {
      name = "grafana"
      port = "3000"
      meta {
        metrics_port = "${NOMAD_HOST_PORT_expose}"
      }
      check {
        expose   = true
        type     = "http"
        name     = "grafana"
        path     = "/api/health"
        interval = "30s"
        timeout  = "10s"
      }
      connect {
        sidecar_service {
          proxy {
            expose {
              path {
                path            = "/metrics"
                protocol        = "http"
                local_path_port = 9102
                listener_port   = "expose"
              }
            }             
            upstreams {
                destination_name = "prometheus"
                local_bind_port  = 9090
            } 
          }
        }
      }
    }
    task "grafana" {
      driver = "docker"
      config {
        image = "grafana/grafana:latest"
        volumes = [
          "local/provisioning/prom.yml:/etc/grafana/provisioning/datasources/prometheus.yml"
        ]
      }
      env {
        GF_PATHS_CONFIG = "/local/config.ini"
        GF_PATHS_PROVISIONING = "/local/provisioning"
      }
      template {
        destination = "local/config.ini"
        data        = <<EOF
[database]
type = sqlite3
[server]
EOF
      }
      template {
        destination = "local/provisioning/datasources/prom.yml"
        data        = <<EOF
apiVersion: 1
datasources:
- name: Prometheus
  type: prometheus
  access: proxy
  url: http://localhost:9090
  isDefault: true
  editable: false
EOF
        perms = "777"
      }
    }
  }
}
Submit the job to Nomad.
$ nomad job run grafana.nomad.hcl
Access and configure Grafana
Grafana is available via the ingress gateway on port 3000. Use the
nomad service info command to get the IP address of the
client running Grafana.
$ nomad service info grafana     
Job ID   Address              Tags  Node ID   Alloc ID
grafana  192.168.50.210:3000  []    94dabfe7  e797357e
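Before logging in, you can optionally confirm that Grafana is healthy by querying the same /api/health endpoint used by the service check, substituting the address from your own output:
$ curl http://192.168.50.210:3000/api/health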
The default username and password for Grafana are both admin. Grafana requires a password change
on initial login. Choose and set a new password for the admin user and make a note of it.
Deploy an Envoy dashboard
An Envoy clusters dashboard is available from the Grafana dashboard marketplace.
Navigate to the dashboards page, click on the New button, then click on Import.
Enter 11021 in the field with the placeholder text Grafana.com dashboard URL or ID, click
Load, then click Import to finish the process.
The dashboard displays aggregated Envoy health information and traffic flows.
Simulate traffic
Simulate traffic to the cluster by making requests to any of the client nodes on port 8080, where the ingress gateway listens and routes to the foo service.
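For example, assuming the client address from the earlier output, a simple loop generates a steady stream of requests through the ingress gateway:
$ while true; do curl -s http://192.168.50.210:8080 > /dev/null; sleep 1; done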
Open the dashboard in Grafana to see requests, connections, and traffic volume on the time series panels.
Next steps
In this tutorial, you deployed Grafana and Prometheus within the Consul service mesh, set up intentions, configured an ingress to enable access, and configured Consul service discovery to allow automatic scraping of targets in Prometheus.
For more information, check out the additional resources below.