Observe service mesh on Docker containers
This page describes the configuration process for service mesh observability features when Consul runs on Docker containers.
Introduction
Service mesh observability consists of three core elements: metrics, logs, and traces. To observe these elements in Consul's service mesh, use Prometheus to collect metrics, Loki to collect logs, and Tempo to collect traces. Then you can visualize this data with Grafana.
The examples on this page are not properly secured for a production environment. If you are implementing this functionality in a production system, we encourage you to review the Consul Reference Architecture for Consul best practices and the Docker Documentation for Docker best practices.
Prerequisites
For Docker containers to emit logs to Loki, you need to install the Loki logging driver for Docker. Refer to the Loki Docker driver plugin page for more information.
```shell-session
$ docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions
```
Collect metrics with Prometheus
Configure the Consul target endpoint in Prometheus. The `prometheus.yaml` file contains the configuration for Prometheus to scrape metrics from the Consul server. The `targets` field should point to the Consul server service running in your Docker environment, and the `metrics_path` should be set to Consul's agent metrics endpoint, `/v1/agent/metrics`.
<CodeBlockConfig filename="prometheus.yaml">

```yaml
global:
  scrape_interval: 30s
  scrape_timeout: 10s

scrape_configs:
  - job_name: 'consul-server'
    metrics_path: '/v1/agent/metrics'
    params:
      format: ['prometheus']
    static_configs:
      - targets: ['consul-server:8500']
```

</CodeBlockConfig>
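Prometheus can only scrape `/v1/agent/metrics` in Prometheus format when the Consul agent's telemetry configuration sets `prometheus_retention_time`. The following is a minimal sketch of how a `consul-server` Compose service might supply that setting through the official image's `CONSUL_LOCAL_CONFIG` environment variable; the image tag, command, retention value, and network address are assumptions rather than part of the tutorial files.

```yaml
consul-server:
  image: hashicorp/consul:latest        # assumed tag; pin a specific version in practice
  container_name: consul-server
  command: "agent -server -ui -bootstrap-expect=1 -client=0.0.0.0"
  environment:
    # The telemetry stanza enables the Prometheus format on /v1/agent/metrics;
    # the retention value shown here is an assumption.
    CONSUL_LOCAL_CONFIG: |
      {
        "telemetry": {
          "prometheus_retention_time": "60s",
          "disable_hostname": true
        }
      }
  networks:
    vpcbr:
      ipv4_address: 10.5.0.2            # assumed static address on the shared vpcbr network
  ports:
    - "8500:8500"                       # Consul HTTP API and UI
  logging:
    driver: loki
    options:
      loki-url: 'http://localhost:3100/api/prom/push'
```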
Collect logs with Loki
Loki is a log aggregation system developed by Grafana Labs. The `docker-compose-loki.yaml` file contains the configuration for the Loki service to run as a Docker container.
<CodeBlockConfig filename="docker-compose-loki.yaml">

```yaml
loki:
  image: grafana/loki:2.1.0
  container_name: loki
  command: -config.file=/etc/loki/local-config.yaml
  networks:
    vpcbr:
      ipv4_address: 10.5.0.11
  ports:
    - "3100:3100"   # loki needs to be exposed so it receives logs
  environment:
    - JAEGER_AGENT_HOST=tempo
    - JAEGER_ENDPOINT=http://tempo:14268/api/traces   # send traces to Tempo
    - JAEGER_SAMPLER_TYPE=const
    - JAEGER_SAMPLER_PARAM=1
  logging:
    driver: loki
    options:
      loki-url: 'http://localhost:3100/api/prom/push'
```

</CodeBlockConfig>
Each service on this page includes this `logging` stanza, which sends container logs to Loki through the Loki logging driver. The `JAEGER_*` environment variables also send Loki's own traces to Tempo for distributed tracing. To collect logs from your own application containers, add the same stanza to those services, as shown in the sketch below.
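For example, a hypothetical `web` service could be added to the Compose file like this; the service name matches the container queried later on this page, while the image and address are placeholders, not part of the tutorial files.

```yaml
web:
  image: your-org/web:latest            # placeholder application image
  container_name: web
  networks:
    vpcbr:
      ipv4_address: 10.5.0.3            # assumed static address on the shared vpcbr network
  logging:
    driver: loki                        # requires the Loki Docker driver installed in the prerequisites
    options:
      loki-url: 'http://localhost:3100/api/prom/push'
```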
Collect traces with Tempo
Tempo is a distributed tracing system developed by Grafana Labs. The `docker-compose-tempo.yaml` file contains the configuration for the Tempo service to run as a Docker container.
<CodeBlockConfig filename="docker-compose-tempo.yaml">

```yaml
tempo:
  image: grafana/tempo:1f1c40b3
  container_name: tempo
  command: ["-config.file=/etc/tempo.yaml"]
  volumes:
    - ./tempo/tempo.yaml:/etc/tempo.yaml
  networks:
    vpcbr:
      ipv4_address: 10.5.0.9
  ports:
    - "14268:14268"   # jaeger ingest
    - "3100"          # tempo
    - "9411:9411"     # zipkin
  logging:
    driver: loki
    options:
      loki-url: 'http://localhost:3100/api/prom/push'
```
</CodeBlockConfig>
This Docker Compose configuration sets up Tempo to receive traces from the Docker environment on its Jaeger and Zipkin ingest ports, and ships the Tempo container's own logs to Loki through the Loki logging driver for seamless integration with Grafana. The configuration also mounts a `tempo/tempo.yaml` file that contains the Tempo configuration.
<CodeBlockConfig filename="tempo/tempo.yaml">
```yaml
auth_enabled: false

server:
  http_listen_port: 3100

distributor:
  receivers:           # this configuration will listen on all ports and protocols that tempo is capable of.
    jaeger:            # the receivers all come from the OpenTelemetry collector. more configuration information can
      protocols:       # be found there: https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver
        thrift_http:   #
        grpc:          # for a production deployment you should only enable the receivers you need!
        thrift_binary:
        thrift_compact:
    zipkin:
    otlp:
      protocols:
        http:
        grpc:
    opencensus:

ingester:
  trace_idle_period: 10s      # the length of time after a trace has not received spans to consider it complete and flush it
  max_block_bytes: 1_000_000  # cut the head block when it hits this size or ...
  max_block_duration: 5m      # this much time passes

compactor:
  compaction:
    compaction_window: 1h             # blocks in this time window will be compacted together
    max_block_bytes: 100_000_000      # maximum size of compacted blocks
    block_retention: 1h
    compacted_block_retention: 10m

storage:
  trace:
    backend: local                        # backend configuration to use
    block:
      bloom_filter_false_positive: .05    # bloom filter false positive rate. lower values create larger filters but fewer false positives
      index_downsample_bytes: 1000        # number of bytes per index record
      encoding: zstd                      # block encoding/compression. options: none, gzip, lz4-64k, lz4-256k, lz4-1M, lz4, snappy, zstd
    wal:
      path: /tmp/tempo/wal                # where to store the wal locally
      encoding: none                      # wal encoding/compression. options: none, gzip, lz4-64k, lz4-256k, lz4-1M, lz4, snappy, zstd
    local:
      path: /tmp/tempo/blocks
    pool:
      max_workers: 100                    # the worker pool mainly drives querying, but is also used for polling the blocklist
      queue_depth: 10000
```

</CodeBlockConfig>
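The Compose fragments on this page all attach their services to a user-defined network named `vpcbr` with static IP addresses, but the network definition itself is not shown. The following is a minimal sketch of the top-level block these fragments assume; the exact subnet is an assumption chosen to contain the `10.5.0.x` addresses used above.

```yaml
networks:
  vpcbr:
    driver: bridge
    ipam:
      config:
        - subnet: 10.5.0.0/16   # assumed; must contain the static 10.5.0.x addresses used above
```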
Configure Grafana
You can configure Grafana to collect and visualize metrics, logs, and traces by using Prometheus, Loki, and Tempo as data sources.
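The data source and dashboard files in the following sections are Grafana provisioning files. A minimal sketch of how a Grafana Compose service might mount them is shown below; the local `./grafana/` paths, image tag, network address, and anonymous-access setting are assumptions for a local demo, while the container-side paths are Grafana's standard provisioning locations.

```yaml
grafana:
  image: grafana/grafana:latest           # assumed tag; pin a specific version in practice
  container_name: grafana
  volumes:
    # Assumed local layout: provisioning files stored under ./grafana/
    - ./grafana/datasource-prometheus.yaml:/etc/grafana/provisioning/datasources/datasource-prometheus.yaml
    - ./grafana/datasource-loki.yaml:/etc/grafana/provisioning/datasources/datasource-loki.yaml
    - ./grafana/datasource-tempo.yaml:/etc/grafana/provisioning/datasources/datasource-tempo.yaml
    - ./grafana/grafana-dashboards.yaml:/etc/grafana/provisioning/dashboards/grafana-dashboards.yaml
    - ./grafana/dashboards:/var/lib/grafana/dashboards   # matches the path in grafana-dashboards.yaml below
  environment:
    - GF_AUTH_ANONYMOUS_ENABLED=true      # demo only; do not enable anonymous access in production
  networks:
    vpcbr:
      ipv4_address: 10.5.0.13             # assumed static address on the shared vpcbr network
  ports:
    - "3000:3000"                         # Grafana UI
  logging:
    driver: loki
    options:
      loki-url: 'http://localhost:3100/api/prom/push'
```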
Grafana data source: Prometheus
The `datasource-prometheus.yaml` file contains the configuration for Grafana to connect to Prometheus. The `url` field should point to the Prometheus service running in your Docker environment.
<CodeBlockConfig filename="datasource-prometheus.yaml">

```yaml
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    orgId: 1
    url: http://prometheus:9090
    basicAuth: false
    isDefault: false
    version: 1
    editable: false
```

</CodeBlockConfig>
Grafana data source: Loki
The `datasource-loki.yaml` file contains the configuration for Grafana to connect to Loki. The `url` field must point to the Loki service running in your Docker environment. The `derivedFields` block under `jsonData` extracts trace IDs from log lines and links them to the Tempo data source.
<CodeBlockConfig filename="datasource-loki.yaml">

```yaml
datasources:
  - name: Loki
    type: loki
    access: proxy
    orgId: 1
    url: http://loki:3100
    basicAuth: false
    isDefault: true
    version: 1
    editable: false
    apiVersion: 1
    jsonData:
      derivedFields:
        - datasourceUid: tempo
          matcherRegex: (?:traceID|trace_id)=(\w+)
          name: TraceID
          url: $${__value.raw}
```

</CodeBlockConfig>
Grafana data source: Tempo
The `datasource-tempo.yaml` file contains the configuration for Grafana to connect to Tempo. The `url` field should point to the Tempo service running in your Docker environment, and the `uid` value must match the `datasourceUid` that the Loki data source's derived fields reference.
<CodeBlockConfig filename="datasource-tempo.yaml">

```yaml
datasources:
  - name: Tempo
    type: tempo
    access: proxy
    orgId: 1
    url: http://tempo:3100
    basicAuth: false
    isDefault: false
    version: 1
    editable: false
    apiVersion: 1
    uid: tempo
```

</CodeBlockConfig>
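Optionally, Grafana can also link in the opposite direction, from a span in Tempo back to the logs that produced it, by adding a trace-to-logs block to the Tempo data source. The following is a hedged sketch, not part of the tutorial files: it assumes you also give the Loki data source a matching `uid` (for example, `loki`), and that `container_name` is a label present on your Loki log streams.

```yaml
# Hypothetical extension of the Tempo data source shown above.
datasources:
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3100
    uid: tempo
    jsonData:
      tracesToLogs:
        datasourceUid: loki          # assumed uid assigned to the Loki data source
        tags: ['container_name']     # Loki label used to find the logs for a span
```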
Grafana dashboards
To visualize the data collected by Prometheus, Loki, and Tempo, you can import pre-built dashboards into Grafana. The following provisioning configuration instructs Grafana to load dashboards from a directory called `dashboards`.
<CodeBlockConfig filename="grafana-dashboards.yaml">

```yaml
apiVersion: 1

providers:
  - name: 'dashboards'
    orgId: 1
    folder: 'dashboards'
    folderUid: ''
    type: file
    disableDeletion: true
    editable: true
    updateIntervalSeconds: 3600
    allowUiUpdates: false
    options:
      path: /var/lib/grafana/dashboards
```

</CodeBlockConfig>
For the contents of the `dashboards` directory, refer to Dashboards for service mesh observability, which documents the dashboard configurations in the `hashicorp/consul` GitHub repository. You can also refer to the Docker-specific content in the Grafana Dashboards repository.
Use Grafana to explore your service mesh
Now that you have configured the Grafana data sources, you can visualize the data collected.
Explore traces in Grafana
You can explore the traces collected by Tempo in Grafana. Navigate to Grafana's Explore section and execute a search for traces. For example, to view traces from a container named `web`, execute a Grafana search for `{container_name="web"} |= "trace_id"`. Then open one of the log lines, locate the `TraceID` field, and click the nearby Tempo link to jump directly from logs to traces.
Traces provide insight into service mesh performance. To learn more about how application communication traverses through the service mesh, refer to our blog on distributed tracing, as well as the distributed tracing documentation.
Explore metrics in Grafana
You can explore the metrics collected by Prometheus in Grafana. Navigate to the Dashboards section and select the Consul Server Monitoring dashboard. This dashboard provides a comprehensive view of the Consul server metrics, including Raft commit time, catalog operation time, and autopilot health.
This dashboard presents key application-level metrics for Consul and provides insight into the health and performance of your service mesh. To learn more about the various runtime metrics reported by Consul, refer to the Consul telemetry docs.