Observe service mesh on Docker containers
This page describes the configuration process for service mesh observability features when Consul runs on Docker containers.
Introduction
Service mesh observability consists of three core elements: metrics, logs, and traces. To observe these elements in Consul's service mesh, use Prometheus to collect metrics, Loki to collect logs, and Tempo to collect traces. Then you can visualize this data with Grafana.
The examples on this page are not properly secured for a production environment. If you are implementing this functionality in a production system, we encourage you to review the Consul Reference Architecture for Consul best practices and the Docker Documentation for Docker best practices.
Prerequisites
For Docker containers to emit logs to Loki, you need to install the Loki logging driver for Docker. Refer to the Loki Docker driver plugin page for more information.
```shell-session
$ docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions
```
Collect metrics with Prometheus
Configure the Consul target endpoint in Prometheus. The `prometheus.yaml` file contains the configuration for Prometheus to scrape metrics from the Consul server. The `targets` field should point to the Consul server service running in your Docker environment, and the `metrics_path` should be set to Consul's agent metrics endpoint, `/v1/agent/metrics`.
<CodeBlockConfig filename="prometheus.yaml">

```yaml
global:
  scrape_interval: 30s
  scrape_timeout: 10s

scrape_configs:
  - job_name: 'consul-server'
    metrics_path: '/v1/agent/metrics'
    params:
      format: ['prometheus']
    static_configs:
      - targets: ['consul-server:8500']
```

</CodeBlockConfig>
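Prometheus can only scrape `/v1/agent/metrics` in Prometheus format when the Consul agent's telemetry configuration sets `prometheus_retention_time`. The following is a minimal sketch of how a `consul-server` Compose service might supply that setting through the official image's `CONSUL_LOCAL_CONFIG` environment variable; the image tag, command, retention value, and network address are assumptions rather than part of the tutorial files.

```yaml
consul-server:
  image: hashicorp/consul:latest        # assumed tag; pin a specific version in practice
  container_name: consul-server
  command: "agent -server -ui -bootstrap-expect=1 -client=0.0.0.0"
  environment:
    # The telemetry stanza enables the Prometheus format on /v1/agent/metrics;
    # the retention value shown here is an assumption.
    CONSUL_LOCAL_CONFIG: |
      {
        "telemetry": {
          "prometheus_retention_time": "60s",
          "disable_hostname": true
        }
      }
  networks:
    vpcbr:
      ipv4_address: 10.5.0.2            # assumed static address on the shared vpcbr network
  ports:
    - "8500:8500"                       # Consul HTTP API and UI
  logging:
    driver: loki
    options:
      loki-url: 'http://localhost:3100/api/prom/push'
```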
Collect logs with Loki
Loki is a log aggregation system developed by Grafana Labs. The `docker-compose-loki.yaml` file contains the configuration for the Loki service to run as a Docker container.
<CodeBlockConfig filename="docker-compose-loki.yaml">

```yaml
loki:
  image: grafana/loki:2.1.0
  container_name: loki
  command: -config.file=/etc/loki/local-config.yaml
  networks:
    vpcbr:
      ipv4_address: 10.5.0.11
  ports:
    - "3100:3100"   # loki needs to be exposed so it receives logs
  environment:
    - JAEGER_AGENT_HOST=tempo
    - JAEGER_ENDPOINT=http://tempo:14268/api/traces   # send traces to Tempo
    - JAEGER_SAMPLER_TYPE=const
    - JAEGER_SAMPLER_PARAM=1
  logging:
    driver: loki
    options:
      loki-url: 'http://localhost:3100/api/prom/push'
```

</CodeBlockConfig>
Each service on this page includes this `logging` stanza, which sends container logs to Loki through the Loki logging driver. The `JAEGER_*` environment variables also send Loki's own traces to Tempo for distributed tracing. To collect logs from your own application containers, add the same stanza to those services, as shown in the sketch below.
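For example, a hypothetical `web` service could be added to the Compose file like this; the service name matches the container queried later on this page, while the image and address are placeholders, not part of the tutorial files.

```yaml
web:
  image: your-org/web:latest            # placeholder application image
  container_name: web
  networks:
    vpcbr:
      ipv4_address: 10.5.0.3            # assumed static address on the shared vpcbr network
  logging:
    driver: loki                        # requires the Loki Docker driver installed in the prerequisites
    options:
      loki-url: 'http://localhost:3100/api/prom/push'
```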
Collect traces with Tempo
Tempo is a distributed tracing system developed by Grafana Labs. The `docker-compose-tempo.yaml` file contains the configuration for the Tempo service to run as a Docker container.
<CodeBlockConfig filename="docker-compose-tempo.yaml">

```yaml
tempo:
  image: grafana/tempo:1f1c40b3
  container_name: tempo
  command: ["-config.file=/etc/tempo.yaml"]
  volumes:
    - ./tempo/tempo.yaml:/etc/tempo.yaml
  networks:
    vpcbr:
      ipv4_address: 10.5.0.9
  ports:
    - "14268:14268"   # jaeger ingest
    - "3100"          # tempo
    - "9411:9411"     # zipkin
  logging:
    driver: loki
    options:
      loki-url: 'http://localhost:3100/api/prom/push'
```
</CodeBlockConfig>
This Docker Compose configuration sets up Tempo to receive traces from the Docker environment on its Jaeger and Zipkin ingest ports, and ships the Tempo container's own logs to Loki through the Loki logging driver for seamless integration with Grafana. The configuration also mounts a `tempo/tempo.yaml` file that contains the Tempo configuration.
<CodeBlockConfig filename="tempo/tempo.yaml">
```yaml
auth_enabled: false

server:
  http_listen_port: 3100

distributor:
  receivers:           # this configuration will listen on all ports and protocols that tempo is capable of.
    jaeger:            # the receivers all come from the OpenTelemetry collector. more configuration information can
      protocols:       # be found there: https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver
        thrift_http:   #
        grpc:          # for a production deployment you should only enable the receivers you need!
        thrift_binary:
        thrift_compact:
    zipkin:
    otlp:
      protocols:
        http:
        grpc:
    opencensus:

ingester:
  trace_idle_period: 10s      # the length of time after a trace has not received spans to consider it complete and flush it
  max_block_bytes: 1_000_000  # cut the head block when it hits this size or ...
  max_block_duration: 5m      # this much time passes

compactor:
  compaction:
    compaction_window: 1h             # blocks in this time window will be compacted together
    max_block_bytes: 100_000_000      # maximum size of compacted blocks
    block_retention: 1h
    compacted_block_retention: 10m

storage:
  trace:
    backend: local                        # backend configuration to use
    block:
      bloom_filter_false_positive: .05    # bloom filter false positive rate. lower values create larger filters but fewer false positives
      index_downsample_bytes: 1000        # number of bytes per index record
      encoding: zstd                      # block encoding/compression. options: none, gzip, lz4-64k, lz4-256k, lz4-1M, lz4, snappy, zstd
    wal:
      path: /tmp/tempo/wal                # where to store the wal locally
      encoding: none                      # wal encoding/compression. options: none, gzip, lz4-64k, lz4-256k, lz4-1M, lz4, snappy, zstd
    local:
      path: /tmp/tempo/blocks
    pool:
      max_workers: 100                    # the worker pool mainly drives querying, but is also used for polling the blocklist
      queue_depth: 10000
```

</CodeBlockConfig>
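The Compose fragments on this page all attach their services to a user-defined network named `vpcbr` with static IP addresses, but the network definition itself is not shown. The following is a minimal sketch of the top-level block these fragments assume; the exact subnet is an assumption chosen to contain the `10.5.0.x` addresses used above.

```yaml
networks:
  vpcbr:
    driver: bridge
    ipam:
      config:
        - subnet: 10.5.0.0/16   # assumed; must contain the static 10.5.0.x addresses used above
```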
Configure Grafana
You can configure Grafana to collect and visualize metrics, logs, and traces by using Prometheus, Loki, and Tempo as data sources.
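The data source and dashboard files in the following sections are Grafana provisioning files. A minimal sketch of how a Grafana Compose service might mount them is shown below; the local `./grafana/` paths, image tag, network address, and anonymous-access setting are assumptions for a local demo, while the container-side paths are Grafana's standard provisioning locations.

```yaml
grafana:
  image: grafana/grafana:latest           # assumed tag; pin a specific version in practice
  container_name: grafana
  volumes:
    # Assumed local layout: provisioning files stored under ./grafana/
    - ./grafana/datasource-prometheus.yaml:/etc/grafana/provisioning/datasources/datasource-prometheus.yaml
    - ./grafana/datasource-loki.yaml:/etc/grafana/provisioning/datasources/datasource-loki.yaml
    - ./grafana/datasource-tempo.yaml:/etc/grafana/provisioning/datasources/datasource-tempo.yaml
    - ./grafana/grafana-dashboards.yaml:/etc/grafana/provisioning/dashboards/grafana-dashboards.yaml
    - ./grafana/dashboards:/var/lib/grafana/dashboards   # matches the path in grafana-dashboards.yaml below
  environment:
    - GF_AUTH_ANONYMOUS_ENABLED=true      # demo only; do not enable anonymous access in production
  networks:
    vpcbr:
      ipv4_address: 10.5.0.13             # assumed static address on the shared vpcbr network
  ports:
    - "3000:3000"                         # Grafana UI
  logging:
    driver: loki
    options:
      loki-url: 'http://localhost:3100/api/prom/push'
```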
Grafana data source: Prometheus
The `datasource-prometheus.yaml` file contains the configuration for Grafana to connect to Prometheus. The `url` field should point to the Prometheus service running in your Docker environment.
<CodeBlockConfig filename="datasource-prometheus.yaml">

```yaml
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    orgId: 1
    url: http://prometheus:9090
    basicAuth: false
    isDefault: false
    version: 1
    editable: false
```

</CodeBlockConfig>
Grafana data source: Loki
The `datasource-loki.yaml` file contains the configuration for Grafana to connect to Loki. The `url` field must point to the Loki service running in your Docker environment. The `derivedFields` block under `jsonData` extracts trace IDs from log lines and links them to the Tempo data source.
<CodeBlockConfig filename="datasource-loki.yaml">

```yaml
datasources:
  - name: Loki
    type: loki
    access: proxy
    orgId: 1
    url: http://loki:3100
    basicAuth: false
    isDefault: true
    version: 1
    editable: false
    apiVersion: 1
    jsonData:
      derivedFields:
        - datasourceUid: tempo
          matcherRegex: (?:traceID|trace_id)=(\w+)
          name: TraceID
          url: $${__value.raw}
```

</CodeBlockConfig>
Grafana data source: Tempo
The `datasource-tempo.yaml` file contains the configuration for Grafana to connect to Tempo. The `url` field should point to the Tempo service running in your Docker environment, and the `uid` value must match the `datasourceUid` that the Loki data source's derived fields reference.
<CodeBlockConfig filename="datasource-tempo.yaml">

```yaml
datasources:
  - name: Tempo
    type: tempo
    access: proxy
    orgId: 1
    url: http://tempo:3100
    basicAuth: false
    isDefault: false
    version: 1
    editable: false
    apiVersion: 1
    uid: tempo
```

</CodeBlockConfig>
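Optionally, Grafana can also link in the opposite direction, from a span in Tempo back to the logs that produced it, by adding a trace-to-logs block to the Tempo data source. The following is a hedged sketch, not part of the tutorial files: it assumes you also give the Loki data source a matching `uid` (for example, `loki`), and that `container_name` is a label present on your Loki log streams.

```yaml
# Hypothetical extension of the Tempo data source shown above.
datasources:
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3100
    uid: tempo
    jsonData:
      tracesToLogs:
        datasourceUid: loki          # assumed uid assigned to the Loki data source
        tags: ['container_name']     # Loki label used to find the logs for a span
```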
Grafana dashboards
To visualize the data collected by Prometheus, Loki, and Tempo, you can import pre-built dashboards into Grafana. The following provisioning configuration instructs Grafana to load dashboards from a directory called `dashboards`.
<CodeBlockConfig filename="grafana-dashboards.yaml">

```yaml
apiVersion: 1

providers:
  - name: 'dashboards'
    orgId: 1
    folder: 'dashboards'
    folderUid: ''
    type: file
    disableDeletion: true
    editable: true
    updateIntervalSeconds: 3600
    allowUiUpdates: false
    options:
      path: /var/lib/grafana/dashboards
```

</CodeBlockConfig>
For the contents of the `dashboards` directory, refer to Dashboards for service mesh observability, which documents the dashboard configurations in the `hashicorp/consul` GitHub repository. You can also refer to the Docker-specific content in the Grafana Dashboards repository.
Use Grafana to explore your service mesh
Now that you have configured the Grafana data sources, you can visualize the data collected.
Explore traces in Grafana
You can explore the traces collected by Tempo in Grafana. Navigate to Grafana's Explore section and execute a search for traces. For example, to view traces from a container named `web`, execute a Grafana search for `{container_name="web"} |= "trace_id"`. Then open one of the log lines, locate the `TraceID` field, and click the nearby Tempo link to jump directly from logs to traces.
Traces provide insight into service mesh performance. To learn more about how application communication traverses through the service mesh, refer to our blog on distributed tracing, as well as the distributed tracing documentation.
Explore metrics in Grafana
You can explore the metrics collected by Prometheus in Grafana. Navigate to the Dashboards section and select the Consul Server Monitoring dashboard. This dashboard provides a comprehensive view of the Consul server metrics, including Raft commit time, catalog operation time, and autopilot health.
This dashboard presents key application-level metrics for Consul and provides insight into the health and performance of your service mesh. To learn more about the various runtime metrics reported by Consul, refer to the Consul telemetry docs.