Consul
Custom Proxy Configuration for Service Mesh
This topic describes the process and API endpoints you can use to extend proxies for integration with Consul.
Overview
You can extend any proxy to support Connect. Consul ships with a built-in proxy suitable for an out-of-the-box development experience, but you may require a more robust proxy solution for production environments.
The proxy you integrate must be able to accept inbound connections and/or establish outbound connections identified as a particular service. In some cases, either ability may be acceptable, but both are generally required to support for full sidecar functionality.
Sidecar proxies may support L4 or L7 network functionality. L4 integration is simpler and adequate for securing all traffic. L4 treats all traffic as TCP, however, so advanced routing or metrics features are not supported.
Full L7 support is built on top of L4 support. An L7 proxy integration supports most or all of the L7 traffic routing features in Connect by dynamically configuring routing, retries, and other L7 features. The built-in proxy only supports L4, while Envoy supports the full L7 feature set.
Areas where the integration approach differs between L4 and L7 are identified in this topic.
Accepting Inbound Connections
The proxy must accept TLS connections on some port to accept inbound connections.
Obtaining and validating client certificates
Call the /v1/agent/connect/ca/leaf/
API endpoint to obtain the client certificate, e.g.:
$ curl http://<host-ip>:8500/v1/agent/connect/ca/leaf/<service-name>
The client certificate from the inbound connection must be validated against the Connect CA root certificates. Call the /v1/agent/connect/ca/roots
endpoint to obtain the root certificates from the Connect CA, e.g.:
$ curl http://<host-ip>:8500/v1/agent/connect/ca/roots
Authorizing the connection
After validating the client certificate from the caller, the proxy can authorize the entire connection (L4) or each request (L7). Depending upon the protocol of the proxied service, authorization is performed either on a per-connection (L4) or per-request (L7) basis. Authentication is based on "service identity" (TLS), and is implemented at the transport layer.
Note: Some features, such as (local) rate limiting or max connections, are expected to be proxy-level configurations enforced separately when authorization calls are made. Proxies can enforce the configurations based on information about request rates and other states that should already be available.
The proxy can authorize the connection by either calling the /v1/agent/connect/authorize
API endpoint or by querying the intention match API endpoint.
The /v1/agent/connect/authorize
endpoint should be called in the connection path for each received connection.
If the local Consul agent is down or unresponsive, the success rate of new connections will be compromised.
The agent uses locally-cached data to authorize the connection and typically responds in microseconds. As a result, the TLS handshake typically spans microseconds.
Note: This endpoint is only suitable for L4 (e.g., TCP) integration. The endpoint always treats intentions with Permissions
defined (i.e., L7 criteria) as deny
intentions during evaluation.
The proxy can query the intention match API endpoint on startup to retrieve a list of intentions that match the proxy destination. The matches should be stored in the native filter configuration of the proxy, such as RBAC for Envoy.
For performance and reliability reasons, querying the intention match API endpoint is the recommended method for implementing intention enforcement. The cached intentions should be consulted for each incoming connection (L4) or request (L7) to determine if the connection or request should be accepted or rejected.
Persistent TCP connections and intentions
For a proxied service configured with the TCP protocol, potentially long-lived TCP connections will only be authorized when the connections are initially established. But because many services, such as databases, typically use persistent connection pools, changing intentions to deny access does not terminate existing connections. This behavior violates the updated intention. In these cases, it may appear as if the intention is not being enforced.
Implement one of the following strategies to close connections:
Configure connections to terminate after a maximum lifetime, e.g., several hours. This balances the overhead of establishing new connections with determining how long existing connections remain open after an intention changes.
Periodically re-authorize every open connection. The authorization call is inexpensive and should be a local, in-memory operation on the Consul agent. Periodically authorizing thousands of open connections (e.g., once every minute) is likely to be negligible overhead, but doing so enforces a tighter upper boundary on how long it takes to enforce intention changes without affecting the protocol efficiency of persistent connections.
Certificate serial in authorization
Intentions currently use TLS URI Subject Alternative Name (SAN) for enforcement. The AuthZ
API in the Go SDK contains a field for passing the serial number (consul/connect/tls.go
). Proxies may provide this value during authorization.
Updating data
The API endpoints described in this section operate on agent-local data that is updated in the background. The leaf, roots, and intentions should be updated in the background by the proxy.
The leaf cert, root cert, and intentions endpoints support blocking queries, which should be used to get near-immediate updates for root key rotations, new leaf certs before expiry, and intention changes.
SPIFFE certificates
Although Consul follows the SPIFFE spec for certificates, some CA providers do not allow strict adherence. For example, CA certificates may not have the correct trust-domain SPIFFE URI SAN for the cluster. If SPIFFE validation is performed in the proxy, be aware that it should be possible to opt out, otherwise certain CA providers supported by Consul will not be compatible with the use of that proxy. Neither Envoy nor the built-in proxy currently validate the SPIFFE URI of the chain beyond the leaf certificate.
Establishing Outbound Connections
For outbound connections, the proxy should communicate with a Connect-capable
endpoint for a service and provide a client certificate from the
/v1/agent/connect/ca/leaf/
API endpoint. The proxy must use the root certificate obtained from the /v1/agent/connect/ca/roots
endpoint to verify the certificate served from the destination endpoint.
Configuration Discovery
The /v1/agent/service/:service_id
API endpoint enables any proxy to discover proxy configurations registered with a local service. This endpoint supports hash-based blocking, which enables long-polling for changes
to the registration/configuration. Any changes to the registration/config will
result in the new config being returned immediately.
Refer to the built-in proxy for an example implementation. Using the Go SDK, the proxy calls the HTTP "pull" API via the watch
package: consul/connect/proxy/config.go
.
The discovery chain for each upstream service should be fetched from the
/v1/discovery-chain/:service_id
API endpoint. This will return a compiled graph of configurations a sidecar needs for a particular upstream service.
If you are only implementing L4 support in your proxy, set the
OverrideProtocol
value to tcp
when
fetching the discovery chain so that L7 features, such as HTTP routing rules, are
not returned.
For each target in the resulting
discovery chain, a list of healthy, Connect-capable endpoints may be fetched
from the [/v1/health/connect/:service_id
] API endpoint as described in the Service
Discovery section.
The remaining nodes in the chain include configurations that should be translated into the nearest equivalent for features, such as HTTP routing, connection timeouts, connection pool settings, rate limits, etc. See the full discovery chain documentation and relevant config entry documentation for details about supported configuration parameters.
Service Discovery
Proxies can use Consul's service discovery API to return all available, Connect-capable endpoints for a given service. This endpoint supports a cached
query parameter, which uses agent caching to improve
performance. The API package provides a UseCache
query option to leverage caching.
In addition to performance improvements, using the cache makes the mesh more resilient to Consul server outages. This is because the mesh "fails static" with the last known set of service instances still used, rather than errors on new connections.
Proxies can decide whether to perform just-in-time queries to the API when a new connection needs to be routed, or to use blocking queries to load the current set of endpoints for a service and keep that list updated. The SDK and built-in proxy currently use just-in-time resolution however many existing proxies are likely to find it easier to integrate by pulling the set of endpoints and maintaining it in local memory using blocking queries.
Upstreams may be defined with the Prepared Query target type. These upstreams
should use Consul's prepared query API to determine a list of upstream endpoints for the service. Note that the PreparedQuery
API does not support blocking, so proxies choosing to populate endpoints in memory will need to poll the endpoint at a suitable and, ideally, configurable frequency.
Long-term support for service-resolver
configuration
entries. The service-resolver
configuration will completely replace prepared queries in future versions of Consul. In some instances, however, prepared queries are still used.
Sidecar Instantiation
Consul does not start or manage sidecar proxy processes. Proxies running on a physical host or VM are designed to be started and run by process supervisor systems, such as init, systemd, supervisord, etc. If deployed within a cluster scheduler (Kubernetes, Nomad), proxies should run as a sidecar container in the same namespace.
Proxies will use the CONSUL_HTTP_TOKEN
and
CONSUL_HTTP_ADDR
environment variables to
contact Consul and fetch certificates. This occurs if the CONSUL_HTTP_TOKEN
environment variable contains a Consul ACL token that has the necessary permissions
to read configurations for that service. If you use the Go api
package, then
the environment variables will be read and the client configured for you
automatically.
Alternatively, you may also use the flags -token
or -token-file
to provide the Consul ACL token.
Providing a Consul ACL Token
$ consul connect envoy -sidecar-for "web" -token-file=/etc/consul.d/consul.token
If TLS is enabled on Consul, you will also need to add the following environment variables prior to starting the proxy:
The CONSUL_CACERT
, CONSUL_CLIENT_CERT
and CONSUL_CLIENT_KEY
can also be provided as CLI flags. Refer to the consul connect proxy
documentation for details.
The proxy service ID comes from the user. See consul connect envoy
for an example. You can use the -proxy-id
flag to specify the ID of the proxy service you have already registered with the local agent.
Alternatively, you can start the service using the -sidecar-for=<service>
option. This option queries Consul for a proxy that is registered as a sidecar for the specified <service>
. If exactly one service associated with the proxy is returned, the ID will be used to start the proxy. Your controller only needs to accept -proxy-id
as an argument; the Consul CLI will resolve the
ID for the name specified in -sidecar-for
flag.