Consul
Consul agent telemetry reference
This page provides reference information for Consul agent events and the metrics they produce.
For information about service mesh traffic metrics, refer to Observe service mesh telemetry.
ACL metrics
Metrics related to the ACL subsystem that measure times needed for ACL operations as well as blocked operations.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.acl.blocked.check.deregistration | Increments whenever a deregistration fails for a check (blocked by an ACL). | requests | counter |
consul.acl.blocked.check.registration | Increments whenever a registration fails for a check (blocked by an ACL). | requests | counter |
consul.acl.blocked.node.registration | Increments whenever a registration fails for a node (blocked by an ACL). | requests | counter |
consul.acl.blocked.service.deregistration | Increments whenever a deregistration fails for a service (blocked by an ACL). | requests | counter |
consul.acl.blocked.service.registration | Increments whenever a registration fails for a service (blocked by an ACL). | requests | counter |
consul.acl.ResolveToken | The time it takes to resolve an ACL token. | ms | summary |
consul.acl.authmethod.delete | The time it takes to delete an authentication method. | ms | summary |
consul.acl.authmethod.upsert | The time it takes to create or update an authentication method. | ms | summary |
consul.acl.bindingrule.delete | The time it takes to delete a binding rule. | ms | summary |
consul.acl.bindingrule.upsert | The time it takes to create or update a binding rule. | ms | summary |
consul.acl.login | Measures the time it takes to complete an ACL login operation using an authentication method. | ms | summary |
consul.acl.logout | Measures the time it takes to complete an ACL logout operation, invalidating the session. | ms | summary |
consul.acl.policy.delete | The time it takes to delete an ACL policy. | ms | summary |
consul.acl.policy.upsert | The time it takes to create or update an ACL policy. | ms | summary |
consul.acl.role.delete | The time it takes to delete an ACL role. | ms | summary |
consul.acl.role.upsert | The time it takes to create or update an ACL role. | ms | summary |
consul.acl.token.cache_hit | Increments if Consul is able to resolve a token's identity, or a legacy token, from the cache. | cache hits | counter |
consul.acl.token.cache_miss | Increments if Consul cannot resolve a token's identity, or a legacy token, from the cache. | cache misses | counter |
consul.acl.token.clone | The time it takes to clone an ACL token. | ms | summary |
consul.acl.token.delete | The time it takes to delete an ACL token. | ms | summary |
consul.acl.token.upsert | The time it takes to create or update an ACL token. | ms | summary |
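The two token cache counters above can be combined into a hit ratio. The following is a minimal sketch, assuming you have already read the two counter values from your metrics backend; the function name and the sample numbers are hypothetical.

```python
from typing import Optional


def token_cache_hit_ratio(cache_hits: int, cache_misses: int) -> Optional[float]:
    """Return hits / (hits + misses), or None when no lookups were recorded."""
    total = cache_hits + cache_misses
    if total == 0:
        # No lookups in the observed window, so the ratio is undefined.
        return None
    return cache_hits / total


# Hypothetical values for consul.acl.token.cache_hit and consul.acl.token.cache_miss.
ratio = token_cache_hit_ratio(cache_hits=950, cache_misses=50)
print(f"ACL token cache hit ratio: {ratio:.2%}")
```

Because both metrics are counters, compare values taken over the same interval rather than mixing totals from different windows.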
API metrics
consul.api.http is an aggregated metric that contains information about all the API endpoints queried in the datacenter.
To distinguish between endpoints, it includes labels for path and method.
The path label does not include details such as service or key names. An underscore takes their place as a placeholder (for example, path=v1.kv._).
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.api.http | This samples how long it takes to service the given HTTP request for the given verb and path. | ms | timer |
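Because consul.api.http is a single metric, per-endpoint figures come from grouping on its path and method labels. The sketch below illustrates that grouping on hypothetical, already-collected samples; the dictionary layout and the `ms` field are assumptions made for the example and do not describe a Consul API response.

```python
from collections import defaultdict

# Hypothetical samples: the "method" and "path" labels plus one latency in ms.
samples = [
    {"method": "GET", "path": "v1.kv._", "ms": 2.0},
    {"method": "GET", "path": "v1.kv._", "ms": 4.0},
    {"method": "PUT", "path": "v1.kv._", "ms": 9.0},
]


def mean_latency_by_endpoint(samples):
    """Group samples by (method, path) and return the mean latency per group."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of ms, sample count]
    for sample in samples:
        key = (sample["method"], sample["path"])
        totals[key][0] += sample["ms"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


for (method, path), mean_ms in mean_latency_by_endpoint(samples).items():
    print(f"{method} {path}: {mean_ms:.1f} ms")
```

Since key names are replaced by the underscore placeholder, all KV requests with the same method fall into one group regardless of the key requested.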
Autopilot metrics
Metrics related to the autopilot subsystem that you can use to assess the general health of the Consul datacenter.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.autopilot.failure_tolerance | Tracks the number of voting servers that the cluster can lose while continuing to function. | servers | gauge |
consul.autopilot.healthy | Tracks the overall health of the local server cluster. 1 if all servers are healthy, 0 if one or more are unhealthy. | health | gauge |
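The two autopilot gauges can be folded into a single status for alerting. This is a minimal sketch; the status names and the choice to warn when the failure tolerance reaches 0 are assumptions for illustration, not Consul recommendations.

```python
def autopilot_status(failure_tolerance: int, healthy: int) -> str:
    """Map the two autopilot gauges to a coarse status string.

    healthy is 1 when all servers are healthy and 0 otherwise.
    failure_tolerance is the number of voting servers the cluster can lose.
    """
    if healthy == 0:
        # One or more servers are reported unhealthy.
        return "critical"
    if failure_tolerance == 0:
        # Assumed policy: warn when the cluster cannot lose any voting server.
        return "warning"
    return "ok"


print(autopilot_status(failure_tolerance=1, healthy=1))
```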
Cache metrics
Metrics related to the Consul cache that can be used to monitor cache efficacy.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.cache.bypass | Counts how many times a request bypassed the cache because no cache-key was provided. | requests | counter |
consul.cache.entries_count | Represents the number of entries in this cache. | entries | gauge |
consul.cache.evict_expired | Counts the number of expired entries that are evicted. | entries | counter |
consul.cache.fetch_error | Counts the number of failed fetches by the cache. | failed fetches | counter |
consul.cache.fetch_success | Counts the number of successful fetches by the cache. | successful fetches | counter |
consul.consul.cache.bypass | Deprecated. Use cache.bypass instead. | requests | counter |
consul.consul.cache.entries_count | Deprecated. Use cache.entries_count instead. | entries | gauge |
consul.consul.cache.evict_expired | Deprecated. Use cache.evict_expired instead. | entries | counter |
consul.consul.cache.fetch_error | Deprecated. Use cache.fetch_error instead. | failed fetches | counter |
consul.consul.cache.fetch_success | Deprecated. Use cache.fetch_success instead. | successful fetches | counter |
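Each deprecated cache metric above differs from its replacement only by a doubled consul. prefix. A dashboard or alert migration could normalize names along these lines; this is a sketch with a hypothetical helper name, and it only covers deprecated names that follow this doubled-prefix pattern.

```python
DEPRECATED_PREFIX = "consul.consul."
CURRENT_PREFIX = "consul."


def current_metric_name(name: str) -> str:
    """Rewrite a doubled-prefix deprecated name to its current form."""
    if name.startswith(DEPRECATED_PREFIX):
        return CURRENT_PREFIX + name[len(DEPRECATED_PREFIX):]
    # Names without the doubled prefix are returned unchanged.
    return name


print(current_metric_name("consul.consul.cache.bypass"))
```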
Catalog and KV store metrics
Metrics related to the Consul catalog and KV store, covering the number of requests and the performance of queries.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.catalog.connect.not_found | Increments for each connect-based catalog query where the given service could not be found. | queries | counter |
consul.catalog.connect.query | Increments for each connect-based catalog query for the given service. | queries | counter |
consul.catalog.connect.query_tag | Increments for each connect-based catalog query for the given service with the given tag. | queries | counter |
consul.catalog.connect.query_tags | Increments for each connect-based catalog query for the given service with the given tags. | queries | counter |
consul.catalog.deregister | Measures the time it takes to complete a catalog deregister operation. | ms | summary |
consul.catalog.register | Measures the time it takes to complete a catalog register operation. | ms | summary |
consul.catalog.service.not_found | Increments for each catalog query where the given service could not be found. | queries | counter |
consul.catalog.service.query | Increments for each catalog query for the given service. | queries | counter |
consul.catalog.service.query_tag | Increments for each catalog query for the given service with the given tag. | queries | counter |
consul.catalog.service.query_tags | Increments for each catalog query for the given service with the given tags. | queries | counter |
consul.discovery_chain.get | Measures the time it takes to retrieve a service discovery chain configuration. | ms | summary |
consul.kvs.apply | Measures the time it takes to complete an update to the KV store. | ms | summary |
Client agent metrics
Metrics related to the API and RPC requests for Consul clients.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.client.api.catalog_datacenters | Increments whenever a Consul agent receives a request to list datacenters in the catalog. | requests | counter |
consul.client.api.catalog_deregister | Increments whenever a Consul agent receives a catalog deregister request. | requests | counter |
consul.client.api.catalog_gateway_services | Increments whenever a Consul agent receives a request to list services associated with a gateway. | requests | counter |
consul.client.api.catalog_node_service_list | Increments whenever a Consul agent receives a request to list a node's registered services. | requests | counter |
consul.client.api.catalog_node_services | Increments whenever a Consul agent receives a request to list services in a node. | requests | counter |
consul.client.api.catalog_nodes | Increments whenever a Consul agent receives a request to list nodes from the catalog. | requests | counter |
consul.client.api.catalog_register | Increments whenever a Consul agent receives a catalog register request. | requests | counter |
consul.client.api.catalog_service_nodes | Increments whenever a Consul agent receives a request to list nodes offering a service. | requests | counter |
consul.client.api.catalog_services | Increments whenever a Consul agent receives a request to list services from the catalog. | requests | counter |
consul.client.api.error.catalog_service_nodes | Increments whenever a Consul agent receives an RPC error for a request to list nodes offering a service. | requests | counter |
consul.client.api.success.catalog_datacenters | Increments whenever a Consul agent successfully responds to a request to list datacenters. | requests | counter |
consul.client.api.success.catalog_deregister | Increments whenever a Consul agent successfully responds to a catalog deregister request. | requests | counter |
consul.client.api.success.catalog_gateway_services | Increments whenever a Consul agent successfully responds to a request to list services associated with a gateway. | requests | counter |
consul.client.api.success.catalog_node_service_list | Increments whenever a Consul agent successfully responds to a request to list a node's registered services. | requests | counter |
consul.client.api.success.catalog_node_services | Increments whenever a Consul agent successfully responds to a request to list services in a node. | requests | counter |
consul.client.api.success.catalog_nodes | Increments whenever a Consul agent successfully responds to a request to list nodes. | requests | counter |
consul.client.api.success.catalog_register | Increments whenever a Consul agent successfully responds to a catalog register request. | requests | counter |
consul.client.api.success.catalog_service_nodes | Increments whenever a Consul agent successfully responds to a request to list nodes offering a service. | requests | counter |
consul.client.api.success.catalog_services | Increments whenever a Consul agent successfully responds to a request to list services. | requests | counter |
consul.client.rpc | Increments whenever a Consul agent makes an RPC request to a Consul server. | requests | counter |
consul.client.rpc.error.catalog_datacenters | Increments whenever a Consul agent receives an RPC error for a request to list datacenters. | requests | counter |
consul.client.rpc.error.catalog_deregister | Increments whenever a Consul agent receives an RPC error for a catalog deregister request. | requests | counter |
consul.client.rpc.error.catalog_gateway_services | Increments whenever a Consul agent receives an RPC error for a request to list services associated with a gateway. | requests | counter |
consul.client.rpc.error.catalog_node_service_list | Increments whenever a Consul agent receives an RPC error for a request to list a node's registered services. | requests | counter |
consul.client.rpc.error.catalog_node_services | Increments whenever a Consul agent receives an RPC error for a request to list services in a node. | requests | counter |
consul.client.rpc.error.catalog_nodes | Increments whenever a Consul agent receives an RPC error for a request to list nodes. | requests | counter |
consul.client.rpc.error.catalog_register | Increments whenever a Consul agent receives an RPC error for a catalog register request. | requests | counter |
consul.client.rpc.error.catalog_service_nodes | Increments whenever a Consul agent receives an RPC error for a request to list nodes offering a service. | requests | counter |
consul.client.rpc.error.catalog_services | Increments whenever a Consul agent receives an RPC error for a request to list services. | requests | counter |
consul.client.rpc.exceeded | Increments whenever an RPC request that a Consul agent makes to a Consul server gets rate limited by that agent's limits configuration. | requests | counter |
consul.client.rpc.failed | Increments whenever a Consul agent makes an RPC request to a Consul server and fails. | requests | counter |
Federation metrics
Metrics related to the WAN federation status of the Consul datacenter.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.federation.state.apply | Measures the time it takes to apply federation state changes across WAN-federated datacenters. | ms | summary |
consul.federation.state.get | Measures the time it takes to retrieve federation state information for a specific datacenter. | ms | summary |
consul.federation.state.list | Measures the time it takes to list all federation states across WAN-federated datacenters. | ms | summary |
consul.federation.state.list_mesh_gateways | Measures the time it takes to list mesh gateways associated with federation states. | ms | summary |
FSM metrics
The Finite State Machine (FSM) captures the behavior of the Raft index. These metrics measure the amount of time it takes to apply a given event to the FSM.
Other metrics related to FSM performance can be found in Raft metrics.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.fsm.acl | Measures the time it takes to apply the given ACL operation to the FSM. | ms | summary |
consul.fsm.acl.authmethod | Measures the time it takes to apply an ACL authmethod operation to the FSM. | ms | summary |
consul.fsm.acl.bindingrule | Measures the time it takes to apply an ACL binding rule operation to the FSM. | ms | summary |
consul.fsm.acl.policy | Measures the time it takes to apply an ACL policy operation to the FSM. | ms | summary |
consul.fsm.acl.token | Measures the time it takes to apply an ACL token operation to the FSM. | ms | summary |
consul.fsm.autopilot | Measures the time it takes to apply the given autopilot update to the FSM. | ms | summary |
consul.fsm.ca | Measures the time it takes to apply CA configuration operations to the FSM. | ms | summary |
consul.fsm.ca.leaf | Measures the time it takes to apply an operation while signing a leaf certificate. | ms | summary |
consul.fsm.coordinate.batch_update | Measures the time it takes to apply the given batch coordinate update to the FSM. | ms | summary |
consul.fsm.deregister | Measures the time it takes to apply a catalog deregister operation to the FSM. | ms | summary |
consul.fsm.intention | Measures the time it takes to apply an intention operation to the FSM. | ms | summary |
consul.fsm.kvs | Measures the time it takes to apply the given KV operation to the FSM. | ms | summary |
consul.fsm.peering | Measures the time it takes to apply a peering operation to the FSM. | ms | summary |
consul.fsm.persist | Measures the time it takes to persist the FSM to a raft snapshot. | ms | summary |
consul.fsm.prepared_query | Measures the time it takes to apply the given prepared query update operation to the FSM. | ms | summary |
consul.fsm.register | Measures the time it takes to apply a catalog register operation to the FSM. | ms | summary |
consul.fsm.session | Measures the time it takes to apply the given session operation to the FSM. | ms | summary |
consul.fsm.system_metadata | Measures the time it takes to apply a system metadata operation to the FSM. | ms | summary |
consul.fsm.tombstone | Measures the time it takes to apply the given tombstone operation to the FSM. | ms | summary |
consul.fsm.txn | Measures the time it takes to apply the given transaction update to the FSM. | ms | summary |
consul.consul.fsm.ca | Deprecated. Use fsm.ca instead. | ms | summary |
consul.consul.fsm.intention | Deprecated. Use fsm.intention instead. | ms | summary |
gRPC metrics
Metrics related to the number of gRPC connections handled by the Consul datacenter.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.grpc.client.connection.count | Counts the number of new gRPC connections opened by the client agent to a Consul server. | connections | counter |
consul.grpc.client.connections | Measures the number of active gRPC connections open from the client agent to any Consul servers. | connections | gauge |
consul.grpc.client.request.count | Counts the number of gRPC requests made by the client agent to a Consul server. | requests | counter |
consul.grpc.server.connection.count | Counts the number of new gRPC connections received by the server. | connections | counter |
consul.grpc.server.connections | Measures the number of active gRPC connections open on the server. | connections | gauge |
consul.grpc.server.request.count | Counts the number of gRPC requests received by the server. | requests | counter |
consul.grpc.server.stream.count | Counts the number of new gRPC streams received by the server. | streams | counter |
consul.grpc.server.streams | Measures the number of active gRPC streams handled by the server. | streams | gauge |
Host metrics
Consul servers can report the following metrics about the host's system resources. This feature must be enabled in the agent telemetry configuration. Note that if the Consul server is operating inside a container these metrics still report host resource usage and do not report any resource limits placed on the container.
Metrics related to the host where the Consul agent is running.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.host.cpu.idle | Idle CPU utilization. One entry per CPU. | percentage | gauge |
consul.host.cpu.iowait | I/O wait CPU utilization. One entry per CPU. | percentage | gauge |
consul.host.cpu.system | System CPU utilization. One entry per CPU. | percentage | gauge |
consul.host.cpu.total | Total CPU utilization. One entry per CPU. | percentage | gauge |
consul.host.cpu.user | User CPU utilization. One entry per CPU. | percentage | gauge |
consul.host.disk.available | Available bytes on disk. | bytes | gauge |
consul.host.disk.inodes_percent | Percentage of disk inodes usage. | percentage | gauge |
consul.host.disk.size | Size of disk in bytes. | bytes | gauge |
consul.host.disk.used | Disk usage in bytes. | bytes | gauge |
consul.host.disk.used_percent | Percentage of disk space usage. | percentage | gauge |
consul.host.memory.available | Available physical memory in bytes. | bytes | gauge |
consul.host.memory.free | Free physical memory in bytes. | bytes | gauge |
consul.host.memory.total | Total physical memory in bytes. | bytes | gauge |
consul.host.memory.used | Used physical memory in bytes. | bytes | gauge |
consul.host.memory.used_percent | Percentage of physical memory in use. | percentage | gauge |
consul.host.uptime | System uptime. | seconds | gauge |
Golang metrics
Metrics related to the Golang runtime that is used to run the Consul process.
| Metric | Description | Unit | Type |
|---|---|---|---|
go_gc_duration_seconds | A summary of the pause duration of garbage collection cycles. | seconds | summary |
go_goroutines | Number of goroutines that currently exist. | goroutines | gauge |
go_info | Information about the Go environment. | string | gauge |
go_memstats_alloc_bytes | Number of bytes allocated and still in use. | bytes | gauge |
go_memstats_alloc_bytes_total | Total number of bytes allocated, even if freed. | bytes | counter |
go_memstats_buck_hash_sys_bytes | Number of bytes used by the profiling bucket hash table. | bytes | gauge |
go_memstats_frees_total | Total number of frees. | frees | counter |
go_memstats_gc_sys_bytes | Number of bytes used for garbage collection system metadata. | bytes | gauge |
go_memstats_heap_alloc_bytes | Number of heap bytes allocated and still in use. | bytes | gauge |
go_memstats_heap_idle_bytes | Number of heap bytes waiting to be used. | bytes | gauge |
go_memstats_heap_inuse_bytes | Number of heap bytes that are in use. | bytes | gauge |
go_memstats_heap_objects | Number of allocated objects. | objects | gauge |
go_memstats_heap_released_bytes | Number of heap bytes released to OS. | bytes | gauge |
go_memstats_heap_sys_bytes | Number of heap bytes obtained from system. | bytes | gauge |
go_memstats_last_gc_time_seconds | Number of seconds since 1970 of last garbage collection. | seconds | gauge |
go_memstats_lookups_total | Total number of pointer lookups. | lookups | counter |
go_memstats_mallocs_total | Total number of mallocs. | mallocs | counter |
go_memstats_mcache_inuse_bytes | Number of bytes in use by mcache structures. | bytes | gauge |
go_memstats_mcache_sys_bytes | Number of bytes used for mcache structures obtained from system. | bytes | gauge |
go_memstats_mspan_inuse_bytes | Number of bytes in use by mspan structures. | bytes | gauge |
go_memstats_mspan_sys_bytes | Number of bytes used for mspan structures obtained from system. | bytes | gauge |
go_memstats_next_gc_bytes | Number of heap bytes when next garbage collection will take place. | bytes | gauge |
go_memstats_other_sys_bytes | Number of bytes used for other system allocations. | bytes | gauge |
go_memstats_stack_inuse_bytes | Number of bytes in use by the stack allocator. | bytes | gauge |
go_memstats_stack_sys_bytes | Number of bytes obtained from system for stack allocator. | bytes | gauge |
go_memstats_sys_bytes | Number of bytes obtained from system. | bytes | gauge |
go_threads | Number of OS threads created. | threads | gauge |
Runtime metrics
Most of these metrics are duplicated from the Golang metrics.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.runtime.alloc_bytes | Measures the number of bytes allocated by the Consul process. This may burst from time to time but should return to a steady state value. | bytes | gauge |
consul.runtime.free_count | Total number of memory deallocations (frees) performed by the Consul process. | frees | gauge |
consul.runtime.gc_pause_ns | Measures the duration of garbage collection pause events in nanoseconds. | nanoseconds | summary |
consul.runtime.heap_objects | Measures the number of objects allocated on the heap and is a general memory pressure indicator. This may burst from time to time but should return to a steady state value. | heap objects | gauge |
consul.runtime.malloc_count | Total number of memory allocations (mallocs) performed by the Consul process. | mallocs | gauge |
consul.runtime.num_goroutines | Tracks the number of running goroutines and is a general load pressure indicator. This may burst from time to time but should return to a steady state value. | number of goroutines | gauge |
consul.runtime.sys_bytes | Measures the total bytes of memory obtained from the operating system by the Consul process. | bytes | gauge |
consul.runtime.total_gc_pause_ns | Cumulative nanoseconds spent in garbage collection pauses since the process started. | nanoseconds | gauge |
consul.runtime.total_gc_runs | Total number of garbage collection cycles completed since the process started. | garbage collection cycles | gauge |
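The two cumulative gauges consul.runtime.total_gc_pause_ns and consul.runtime.total_gc_runs can be divided to estimate the mean pause per garbage collection cycle since the process started. The sketch below shows the arithmetic; the function name and the input values are hypothetical.

```python
from typing import Optional

NS_PER_MS = 1_000_000


def mean_gc_pause_ms(total_gc_pause_ns: int, total_gc_runs: int) -> Optional[float]:
    """Return the mean GC pause in milliseconds, or None before the first cycle."""
    if total_gc_runs == 0:
        # No cycle has completed yet, so there is nothing to average.
        return None
    return total_gc_pause_ns / total_gc_runs / NS_PER_MS


# Hypothetical readings: 50 ms of cumulative pause time across 100 cycles.
print(mean_gc_pause_ms(total_gc_pause_ns=50_000_000, total_gc_runs=100))
```

To look at a recent window instead of the whole process lifetime, apply the same division to the difference between two readings of each gauge.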
Process metrics
Metrics related to the Consul process.
| Metric | Description | Unit | Type |
|---|---|---|---|
process.cpu_seconds_total | Total user and system CPU time spent in seconds. | seconds | counter |
process.max_fds | Maximum number of open file descriptors. | file descriptors | gauge |
process.open_fds | Number of open file descriptors. | file descriptors | gauge |
process.resident_memory_bytes | Resident memory size in bytes. | bytes | gauge |
process.start_time_seconds | Start time of the process since unix epoch in seconds. | seconds | gauge |
process.virtual_memory_bytes | Virtual memory size in bytes. | bytes | gauge |
process.virtual_memory_max_bytes | Maximum amount of virtual memory available in bytes. | bytes | gauge |
Intentions metrics
Metrics that measure the time needed to propagate intentions into the Consul service mesh.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.intention.apply | Measures the time it takes to apply service mesh intention (authorization policy) changes. | ms | summary |
consul.consul.intention.apply | Deprecated. Use intention.apply instead. | ms | summary |
Leader metrics
Metrics that track server leadership and measure the performance of data replication across federated datacenters.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.leader.barrier | Measures the time spent waiting for the raft barrier upon gaining leadership. | ms | summary |
consul.leader.reapTombstones | Measures the time spent clearing tombstones. | ms | summary |
consul.leader.reconcile | Measures the time spent updating the raft store from the serf member information. | ms | summary |
consul.leader.reconcileMember | Measures the time spent updating the raft store for a single serf member's information. | ms | summary |
consul.leader.replication.acl_policies.index | Emitted only by the leader in a secondary datacenter. Tracks the index of ACL policies in the primary that the secondary has successfully replicated. | index | gauge |
consul.leader.replication.acl_policies.status | Emitted only by the leader in a secondary datacenter. Tracks the current health of ACL policy replication on the leader. Contains 1 if the last round of replication was successful or 0 if there was an error. | health | gauge |
consul.leader.replication.acl_roles.index | Emitted only by the leader in a secondary datacenter. Tracks the index of ACL roles in the primary that the secondary has successfully replicated. | index | gauge |
consul.leader.replication.acl_roles.status | Emitted only by the leader in a secondary datacenter. Tracks the current health of ACL role replication on the leader. Contains 1 if the last round of replication was successful or 0 if there was an error. | health | gauge |
consul.leader.replication.acl_tokens.index | Emitted only by the leader in a secondary datacenter. Tracks the index of ACL tokens in the primary that the secondary has successfully replicated. | index | gauge |
consul.leader.replication.acl_tokens.status | Emitted only by the leader in a secondary datacenter. Tracks the current health of ACL token replication on the leader. Contains 1 if the last round of replication was successful or 0 if there was an error. | health | gauge |
consul.leader.replication.config_entries.index | Emitted only by the leader in a secondary datacenter. Tracks the index of config entries in the primary that the secondary has successfully replicated. | index | gauge |
consul.leader.replication.config_entries.status | Emitted only by the leader in a secondary datacenter. Tracks the current health of config entry replication on the leader. Contains 1 if the last round of replication was successful or 0 if there was an error. | health | gauge |
consul.leader.replication.federation_state.index | Emitted only by the leader in a secondary datacenter. Tracks the index of federation states in the primary that the secondary has successfully replicated. | index | gauge |
consul.leader.replication.federation_state.status | Emitted only by the leader in a secondary datacenter. Tracks the current health of federation state replication on the leader. Contains 1 if the last round of replication was successful or 0 if there was an error. | health | gauge |
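consul.leader.replication.namespaces.index | Enterprise. Emitted only by the leader in a secondary datacenter. Tracks the index of namespaces in the primary that the secondary has successfully replicated. | index | gauge |
consul.leader.replication.namespaces.status | Enterprise. Emitted only by the leader in a secondary datacenter. Tracks the current health of namespace replication on the leader. Contains 1 if the last round of replication was successful or 0 if there was an error. | health | gauge |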
consul.server.isLeader | Tracks if the server is a leader. | leadership | gauge |
Membership metrics
Metrics related to the performance of the Serf gossip protocol, which manages membership and broadcasts messages to the cluster.
Refer also to Serf metrics for more metrics related to the gossip protocol.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.memberlist.gossip | Measures the time taken for gossip messages to be broadcast to a set of randomly selected nodes. | ms | summary |
consul.memberlist.node.instances | Tracks the number of instances in each of the node states: alive, dead, suspect, and left. | nodes | gauge |
consul.memberlist.probeNode | Measures the time taken to perform a single round of failure detection on a select agent. | nodes / interval | summary |
consul.memberlist.pushPullNode | Measures the number of agents that have exchanged state with this agent. | nodes / interval | summary |
consul.memberlist.queue.broadcasts | Measures the number of messages waiting to be broadcast to other gossip participants. | messages | summary |
consul.memberlist.size.local | Measures the size in bytes of the memberlist before it is sent to another gossip recipient. | bytes | gauge |
consul.memberlist.size.remote | Measures the size in bytes of incoming memberlists from other gossip participants. | bytes | summary |
consul.memberlist.tcp.accept | Counts the number of times an agent has accepted an incoming TCP stream connection. | connections accepted / interval | counter |
consul.memberlist.tcp.connect | Counts the number of times an agent has initiated a push/pull sync with another agent. | push/pull initiated / interval | counter |
consul.memberlist.tcp.sent | Measures the total number of bytes sent by an agent through the TCP protocol. | bytes sent / interval | counter |
consul.memberlist.udp.received | Measures the total number of bytes received by an agent through the UDP protocol. | bytes received / interval | counter |
consul.memberlist.udp.sent | Measures the total number of bytes sent by an agent through the UDP protocol. | bytes sent / interval | counter |
consul.members.clients | Measures the current number of client agents registered with Consul. It is only emitted by Consul servers. Added in v1.9.6. | number of clients | gauge |
consul.members.servers | Measures the current number of server agents registered with Consul. It is only emitted by Consul servers. Added in v1.9.6. | number of servers | gauge |
Metrics can be appended with certain labels to further distinguish data between different gossip pools. The supported label for CE is network, while segment, partition, and area are allowed for Enterprise.
Mesh certificate metrics
Metrics that expose the expiration times of the service mesh certificates.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.mesh.active_root_ca.expiry | Seconds until the service mesh root certificate expires. Updated every hour. | seconds | gauge |
consul.mesh.active_signing_ca.expiry | Seconds until the service mesh signing certificate expires. Updated every hour. | seconds | gauge |
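Both expiry gauges report seconds remaining, so an alert rule usually converts them to days and compares against a threshold. The following sketch shows that conversion; the function names and the 30-day default threshold are assumptions chosen for the example.

```python
SECONDS_PER_DAY = 86_400


def days_until_expiry(expiry_seconds: float) -> float:
    """Convert an expiry gauge value from seconds to days."""
    return expiry_seconds / SECONDS_PER_DAY


def needs_rotation_warning(expiry_seconds: float, warn_days: float = 30) -> bool:
    """Return True when the certificate expires in fewer than warn_days days."""
    return days_until_expiry(expiry_seconds) < warn_days


# Hypothetical reading of consul.mesh.active_root_ca.expiry: two days remain.
print(needs_rotation_warning(172_800))
```

Because the gauges are updated every hour, a threshold measured in days leaves ample margin for the update interval.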
Peering metrics
Cluster peering refers to Consul clusters that communicate through a peer connection, as opposed to a federated connection. Consul collects metrics that describe the number of services exported to a peered cluster. Peering metrics are only emitted by the leader server. These metrics are emitted every 9 seconds.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.peering.exported_services | A gauge that tracks how many services are exported for the peering. The labels are peer_name, peer_id and, for enterprise, partition. We emit this metric every 9 seconds. | services | gauge |
consul.peering.healthy | A gauge that tracks whether a peering is healthy (1) or not (0). The labels are peer_name, peer_id and, for enterprise, partition. We emit this metric every 9 seconds. | health | gauge |
consul.consul.peering.exported_services | Deprecated. Use peering.exported_services instead. | services | gauge |
consul.consul.peering.healthy | Deprecated. Use peering.healthy instead. | health | gauge |
Labels
Consul attaches the following labels to metric values.
| Label Name | Description | Possible values |
|---|---|---|
peer_name | The name of the peering on the reporting cluster or leader. | Any defined peer name in the cluster |
peer_id | The ID of a peer connected to the reporting cluster or leader. | Any UUID |
partition | Enterprise | Any defined partition name in the cluster |
Prepared queries metrics
Metrics related to the performance of prepared queries.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.prepared_query.apply | Measures the time it takes to apply a prepared query update. | ms | summary |
consul.prepared_query.execute | Measures the time it takes to process a prepared query execute request. | ms | summary |
consul.prepared_query.execute_remote | Measures the time it takes to process a prepared query execute request that was forwarded to another datacenter. | ms | summary |
consul.prepared_query.explain | Measures the time it takes to process a prepared query explain request. | ms | summary |
Raft metrics
Metrics related to the Raft protocol, which maintains consensus in the Consul cluster. They include metrics for the Raft LogStore, which can be either BoltDB or WAL. You will only see metrics for the LogStore backend that is enabled on your Consul installation.
Other metrics related to Raft performance can be found in FSM metrics.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.raft.applied_index | Represents the raft applied index. | index | gauge |
consul.raft.apply | Counts the number of Raft transactions occurring over the interval, which is a general indicator of the write load on the Consul servers. | transactions/interval | counter |
consul.raft.barrier | Counts the number of times the agent has started a raft barrier to ensure all pending operations are applied. | blocks / interval | counter |
consul.raft.boltdb.freelistBytes | Represents the number of bytes necessary to encode the freelist metadata. When raft_logstore.boltdb.no_freelist_sync is set to false these metadata bytes must also be written to disk for each committed log. | bytes | gauge |
consul.raft.boltdb.freePageBytes | Represents the number of bytes of free space within the raft.db file. | bytes | gauge |
consul.raft.boltdb.getLog | Measures the amount of time spent reading logs from the db. | ms | timer |
consul.raft.boltdb.logBatchSize | Measures the total size in bytes of logs being written to the db in a single batch. | bytes | sample |
consul.raft.boltdb.logsPerBatch | Measures the number of logs being written per batch to the db. | logs | sample |
consul.raft.boltdb.logSize | Measures the size of logs being written to the db. | bytes | sample |
consul.raft.boltdb.numFreePages | Represents the number of free pages within the raft.db file. | pages | gauge |
consul.raft.boltdb.numPendingPages | Represents the number of pending pages within the raft.db that will soon become free. | pages | gauge |
consul.raft.boltdb.openReadTxn | Represents the number of open read transactions against the db. | transactions | gauge |
consul.raft.boltdb.totalReadTxn | Represents the total number of started read transactions against the db. | transactions | gauge |
consul.raft.boltdb.storeLogs | Measures the amount of time spent writing logs to the db. | ms | timer |
consul.raft.boltdb.txstats.cursorCount | Counts the number of cursors created since Consul was started. | cursors | counter |
consul.raft.boltdb.txstats.nodeCount | Counts the number of node allocations within the db since Consul was started. | allocations | counter |
consul.raft.boltdb.txstats.nodeDeref | Counts the number of node dereferences in the db since Consul was started. | dereferences | counter |
consul.raft.boltdb.txstats.pageAlloc | Represents the number of bytes allocated within the db since Consul was started. Note that this does not take into account space having been freed and reused. In that case, the value of this metric will still increase. | bytes | gauge |
consul.raft.boltdb.txstats.pageCount | Represents the number of pages allocated since Consul was started. Note that this does not take into account space having been freed and reused. In that case, the value of this metric will still increase. | pages | gauge |
consul.raft.boltdb.txstats.rebalance | Counts the number of node rebalances performed in the db since Consul was started. | rebalances | counter |
consul.raft.boltdb.txstats.rebalanceTime | Measures the time spent rebalancing nodes in the db. | ms | timer |
consul.raft.boltdb.txstats.spill | Counts the number of nodes spilled in the db since Consul was started. | spills | counter |
consul.raft.boltdb.txstats.spillTime | Measures the time spent spilling nodes in the db. | ms | timer |
consul.raft.boltdb.txstats.split | Counts the number of nodes split in the db since Consul was started. | splits | counter |
consul.raft.boltdb.txstats.write | Counts the number of writes to the db since Consul was started. | writes | counter |
consul.raft.boltdb.txstats.writeTime | Measures the amount of time spent performing writes to the db. | ms | timer |
consul.raft.boltdb.writeCapacity | Theoretical write capacity in terms of the number of logs that can be written per second. Each sample outputs what the capacity would be if future batched log write operations were similar to this one. This similarity encompasses four things: batch size, byte size, disk performance, and BoltDB performance. None of these are static, and it's highly likely that individual samples of this metric will vary, but aggregating this metric over a larger time window should provide a decent picture of how this BoltDB store can perform. | logs/second | sample |
consul.raft.commitNumLogs | Measures the count of logs processed for application to the FSM in a single batch. | logs | gauge |
consul.raft.commitTime | This measures the time it takes to commit a new entry to the Raft log on the leader. | ms | summary |
consul.raft.fsm.apply | Measures the time taken to apply a log entry to the Finite State Machine (FSM). | ms | summary |
consul.raft.fsm.enqueue | Measures the amount of time to enqueue a batch of logs for the FSM to apply. | ms | summary |
consul.raft.fsm.lastRestoreDuration | Measures the time taken to restore the FSM from a snapshot on an agent restart or from the leader calling installSnapshot. This is a gauge that holds its value, since most servers only restore during restarts, which are typically infrequent. | ms | gauge |
consul.raft.last_index | Represents the raft last index. | index | gauge |
consul.raft.leader.dispatchLog | Measures the time it takes for the leader to write log entries to disk. | ms | summary |
consul.raft.leader.dispatchNumLogs | Measures the number of logs committed to disk in a batch. | logs | gauge |
consul.raft.leader.lastContact | Measures the time since the leader was last able to contact the follower nodes when checking its leader lease. | ms | summary |
consul.raft.leader.oldestLogAge | This measures how old the oldest log in the leader's log store is. | ms | gauge |
consul.raft.logstore.verifier.checkpoints_written | Counts the number of checkpoint entries written to the LogStore for verification purposes. | checkpoints | counter |
consul.raft.logstore.verifier.ranges_verified | Counts the number of log ranges for which a verification report has been completed. | log ranges verifications | counter |
consul.raft.rpc.installSnapshot | Measures the time it takes the raft leader to install a snapshot on a follower that is catching up after being down or has just joined the cluster. | ms | summary |
consul.raft.snapshot.persist | Measures the time it takes raft to write a new snapshot to disk. | ms | summary |
consul.raft.state.candidate | This increments whenever a Consul server starts an election. | election attempts / interval | counter |
consul.raft.state.leader | This increments whenever a Consul server becomes a leader. | leadership transitions / interval | counter |
consul.raft.thread.fsm.saturation | Measures the saturation level of the raft FSM thread, indicating processing load. | percentage | summary |
consul.raft.thread.main.saturation | Measures the saturation level of the main raft thread, indicating processing load. | percentage | summary |
consul.raft.wal.head_truncations | Counts how many log entries have been truncated from the head, i.e. the oldest entries. By graphing the rate of change over time you can see individual truncate calls as spikes. | log entries truncated | counter |
consul.raft.wal.last_segment_age_seconds | Is set each time we rotate a segment and describes the number of seconds between when that segment file was first created and when it was sealed. This gives a rough estimate of how quickly writes are filling the disk. | seconds | gauge |
consul.raft.wal.log_appends | Counts the number of calls to StoreLog(s) i.e. number of batches of entries appended. | calls | counter |
consul.raft.wal.log_entries_read | Counts the number of calls to get_log. | log entries reads | counter |
consul.raft.wal.log_entries_written | Counts the number of entries written. | log entries written | counter |
consul.raft.wal.log_entry_bytes_read | Counts the bytes of log entry read from segments before decoding. Actual bytes read from disk might be higher as it includes headers and index entries and possible secondary reads for large entries that don't fit in buffers. | bytes | counter |
consul.raft.wal.log_entry_bytes_written | Counts the bytes of log entry after encoding with Codec. Actual bytes written to disk might be slightly higher as it includes headers and index entries. | bytes | counter |
consul.raft.wal.segment_rotations | Counts how many times we move to a new segment file. | rotations | counter |
consul.raft.wal.stable_gets | Counts the number of calls to StableStore.Get or GetUint64. | calls | counter |
consul.raft.wal.stable_sets | Counts the number of calls to StableStore.Set or SetUint64. | calls | counter |
consul.raft.wal.tail_truncations | Counts how many log entries have been truncated from the tail, i.e. the newest entries. By graphing the rate of change over time you can see individual truncate calls as spikes. | log entries truncated | counter |
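The WAL truncation descriptions above suggest graphing the rate of change of a counter, and the writeCapacity description suggests aggregating samples over a larger time window. The following is a minimal Python sketch of both calculations. It assumes your monitoring system stores cumulative counter totals as (timestamp in seconds, value) pairs; the helper names and all sample numbers are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def rate_of_change(points: Sequence[Tuple[float, float]]) -> List[float]:
    """Return the per-second change between consecutive (timestamp, value) pairs."""
    rates = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # ignore duplicate or out-of-order scrapes
        rates.append((v1 - v0) / elapsed)
    return rates


def window_mean(samples: Sequence[float]) -> float:
    """Average a window of samples, for example writeCapacity in logs/second."""
    return mean(samples)


# Assumption: cumulative totals of a counter such as
# consul.raft.wal.head_truncations, scraped every 10 seconds.
truncation_points = [(0, 100), (10, 100), (20, 600), (30, 600)]
print(rate_of_change(truncation_points))  # [0.0, 50.0, 0.0]

# Assumption: individual consul.raft.boltdb.writeCapacity samples.
print(window_mean([1200.0, 900.0, 1500.0]))  # 1200.0
```

In the first result, the single non-zero rate is the spike that marks one truncate call between the second and third scrape.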
RPC metrics
Metrics related to the performance of RPC requests to the Consul servers.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.rpc.accept_conn | Increments when a server accepts an RPC connection. | connections | counter |
consul.rpc.consistentRead | Measures the time spent confirming that a consistent read can be performed. | ms | summary |
consul.rpc.cross_dc | Increments when a server sends a (potentially blocking) cross datacenter RPC query. | query | counter |
consul.rpc.queries_blocking | Shows the current number of in-flight blocking queries the server is handling. | query | gauge |
consul.rpc.query | Increments when a server receives a read request, indicating the rate of new read queries. | query | counter |
consul.rpc.raft_handoff | Increments when a server accepts a Raft-related RPC connection. | connections | counter |
consul.rpc.rate_limit.exceeded | Increments whenever an RPC is over a configured rate limit. Note: in permissive mode, the RPC will have still been allowed to proceed. | RPCs | counter |
consul.rpc.rate_limit.log_dropped | Increments whenever a log that is emitted because an RPC exceeded a rate limit gets dropped because the output buffer is full. | log | counter |
consul.rpc.request | Increments when a server receives a Consul-related RPC request. | requests | counter |
consul.rpc.request_error | Increments when a server returns an error from an RPC request. | errors | counter |
Serf metrics
Metrics related to the performance of the Serf gossip protocol, which is used to manage membership and broadcast messages to the cluster.
Refer also to Membership metrics for more metrics related to the gossip protocol.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.serf.coordinate_adjustment_ms | Measures the magnitude of network coordinate adjustments in milliseconds for failure detection tuning. | ms | summary |
consul.serf.queue_Event | Measures the number of serf user events currently queued for processing. | queued events | summary |
consul.serf.queue_Intent | Measures the number of serf intent messages queued for broadcast. | queued messages | summary |
consul.serf.queue_Query | Measures the number of serf queries queued for processing. | queued queries | summary |
You can append certain labels to these metrics to further distinguish data between different gossip pools. The supported label for CE is network, while segment, partition, and area are allowed for Enterprise.
Session metrics
Metrics related to the performance of Consul sessions.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.session.apply | Measures the time spent applying a session update. | ms | summary |
consul.session.renew | Measures the time spent renewing a session. | ms | summary |
consul.session_ttl.active | Tracks the number of active sessions. | sessions | gauge |
consul.session_ttl.invalidate | Measures the time spent invalidating an expired session. | ms | summary |
Consul state metrics
Metrics that measure the number of entities (KV entries, nodes, services, config entries, etc.) present in the Consul datacenter.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.state.billable_service_instances | Total number of billable service instances in the local datacenter. | number of objects | gauge |
consul.state.config_entries | Measures the current number of unique configuration entries registered with Consul, labeled by Kind. It is only emitted by Consul servers. Added in v1.10.4. | number of objects | gauge |
consul.state.connect_instances | Measures the current number of unique connect service instances registered with Consul, labeled by Kind. It is only emitted by Consul servers. Added in v1.10.4. | number of objects | gauge |
consul.state.kv_entries | Measures the current number of entries in the Consul KV store. It is only emitted by Consul servers. Added in v1.10.3. | number of objects | gauge |
consul.state.nodes | Measures the current number of nodes registered with Consul. It is only emitted by Consul servers. Added in v1.9.0. | number of objects | gauge |
consul.state.peerings | Measures the current number of peerings registered with Consul. It is only emitted by Consul servers. Added in v1.13.0. | number of objects | gauge |
consul.state.service_instances | Measures the current number of unique service instances registered with Consul. It is only emitted by Consul servers. Added in v1.9.0. | number of objects | gauge |
consul.state.services | Measures the current number of unique services registered with Consul, based on service name. It is only emitted by Consul servers. Added in v1.9.0. | number of objects | gauge |
Other Consul info metrics
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.agent.tls.cert.expiry | Seconds until the agent TLS certificate expires. Updated every hour. | seconds | gauge |
consul.version | Represents the Consul version. | version | gauge |
consul.system.licenseExpiration | Represents the number of hours until the current license is going to expire. | hours | gauge |
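The expiry gauges above use different units: consul.agent.tls.cert.expiry reports seconds, while consul.system.licenseExpiration reports hours. The following is a minimal Python sketch that normalizes both to days before comparing them against a warning threshold. The helper names and the 30-day threshold are assumptions chosen for illustration, not Consul defaults.

```python
SECONDS_PER_DAY = 24 * 60 * 60
HOURS_PER_DAY = 24


def days_remaining(value: float, unit: str) -> float:
    """Convert an expiry gauge value to days, given the unit from the table."""
    if unit == "seconds":
        return value / SECONDS_PER_DAY
    if unit == "hours":
        return value / HOURS_PER_DAY
    raise ValueError(f"unsupported unit: {unit}")


def needs_renewal(value: float, unit: str, warn_days: float = 30.0) -> bool:
    """Return True when fewer than warn_days remain (hypothetical threshold)."""
    return days_remaining(value, unit) < warn_days


# Hypothetical gauge readings.
print(days_remaining(2_592_000, "seconds"))  # 30.0
print(needs_renewal(2_592_000, "seconds"))   # False: exactly 30 days remain
print(needs_renewal(240, "hours"))           # True: only 10 days remain
```

Since the certificate gauge is updated every hour, polling it more often than that should not yield fresher values.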
Transaction metrics
Metrics that measure the time needed to perform transactions inside the Consul cluster.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.txn.apply | Measures the time spent applying a transaction operation. | ms | summary |
consul.txn.read | Measures the time spent returning a read transaction. | ms | summary |
xDS metrics
Metrics that measure the number and performance of the Consul servers' xDS streams.
| Metric | Description | Unit | Type |
|---|---|---|---|
consul.xds.server.idealStreamsMax | The maximum number of xDS streams per server, chosen to achieve a roughly even spread of load across servers. | streams | gauge |
consul.xds.server.streamDrained | Counts the number of xDS streams that are drained when rebalancing the load between servers. | streams | counter |
consul.xds.server.streamStart | Measures the time in milliseconds after an xDS stream is opened until xDS resources are first generated for the stream. | ms | summary |
consul.xds.server.streams | Measures the number of active xDS streams handled by the server split by protocol version. | streams | gauge |
consul.xds.server.streamsUnauthenticated | Counts the number of active xDS streams handled by the server that are unauthenticated because ACLs are not enabled or ACL tokens were missing. | streams | gauge |