Appendix 1: Admin partitions
Consul admin partitions give organizations the option to define tenancy boundaries for services running in Consul. This helps organizations that manage services across multiple teams and business units. Each team can manage and customize its own Consul environment without impacting other teams or other Consul environments.
Some organizations want to allow business units to deploy their own installations of Consul Enterprise on their own Kubernetes clusters, but centrally managing multiple Consul installations can be an operational challenge. Instead of giving each team its own server cluster, the organization can consolidate these installations onto a shared multi-tenant server cluster. This cluster serves as the control plane for Consul clients in the tenant clusters and ensures separation between tenants of the system. This deployment model gives teams the autonomy to configure Consul and application networking as they require, increases their flexibility to manage application deployments, and eliminates the operational overhead associated with managing individual server clusters.
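In the Helm chart, this model is reflected in the global.adminPartitions values: the shared server cluster enables admin partitions in the default partition, and each tenant cluster registers under its own partition name. The fragment below is a minimal sketch (the team-a partition name is a placeholder); complete server and tenant values files appear in Appendix 4.
# Shared multi-tenant server cluster (control plane)
global:
  adminPartitions:
    enabled: true
    name: "default"
---
# Tenant Kubernetes cluster (partition name is a placeholder)
global:
  adminPartitions:
    enabled: true
    name: "team-a"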
Appendix 2: Kubernetes catalog sync
To use catalog sync, you must enable it in the Helm chart. Catalog sync allows you to sync services between Consul and Kubernetes. The sync can be unidirectional in either direction or bidirectional. Refer to the operating guide to learn more about the configuration options.
Syncing services into the Consul service registry makes them discoverable like any other service within the Consul datacenter. The sync allows Kubernetes services to use Kubernetes' native service discovery to discover and connect to external services registered in Consul, and allows external services to use Consul service discovery to discover and connect to Kubernetes services. Refer to the network connectivity section for the related Kubernetes configuration. Services synced from Consul to Kubernetes are discoverable with the built-in Kubernetes DNS once a Consul stub domain is deployed. When bidirectional catalog sync is enabled, it behaves like both unidirectional setups combined.
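For example, the direction of the sync is controlled by the syncCatalog.toConsul and syncCatalog.toK8S Helm values. The fragment below is a minimal sketch of a one-way Kubernetes-to-Consul sync; setting both values to true yields the bidirectional behavior described above.
syncCatalog:
  enabled: true
  toConsul: true  # register Kubernetes services in the Consul catalog
  toK8S: false    # skip creating Kubernetes services for Consul services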
Appendix 3: Gossip protocol
Consul uses a gossip protocol to manage membership and broadcast messages to the cluster. The protocol, membership management, and message broadcasting are provided through the Serf library. The gossip protocol used by Serf is based on a modified version of the SWIM (Scalable Weakly-consistent Infection-style Process Group Membership) protocol.
Consul uses a LAN gossip pool and a WAN gossip pool to perform different functions. The pools are able to perform their functions by leveraging an embedded Serf library. The library is abstracted and masked by Consul to simplify the user experience, but developers may find it useful to understand how the library is leveraged.
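For example, you can inspect both pools from a Consul server with the CLI: consul members lists the LAN gossip pool of the local datacenter, and consul members -wan lists the server-only WAN pool.
# LAN gossip pool: all agents in the local datacenter
consul members

# WAN gossip pool: servers across federated datacenters
consul members -wan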
Appendix 4: Sample Terraform and Helm configuration
Consul server
Terraform configuration
# Nodes tainted for running Consul servers
consul_server = {
  name = "consul_server"

  instance_types = var.consul_server_node_type

  min_size     = 1
  max_size     = 5
  desired_size = var.consul_server_node_count

  # Nodes only for Consul server agents; excluding other agents
  taints = {
    dedicated = {
      key    = "consul_agent_type"
      value  = "server"
      effect = "NO_SCHEDULE"
    }
  }
  labels = {
    consul_agent_type = "server"
  }
}
Resource requirements (Helm)
# Configure your Consul servers in this section.
server:
  # Specify three servers that wait until all are healthy to bootstrap the Consul cluster.
  replicas: 3
  # Specify the resources that servers request for placement. These values will serve a large environment.
  resources:
    requests:
      memory: '32Gi'
      cpu: '4'
      disk: '50Gi'
    limits:
      memory: '32Gi'
      cpu: '4'
      disk: '50Gi'
Consul client
Terraform configuration
# Nodes to deploy services and the Consul client
consul_workload = {
  name = "consul_workload"

  instance_types = var.consul_workload_node_type

  min_size     = 0
  max_size     = 200
  desired_size = var.consul_workload_node_count

  labels = {
    app = "client-data-plane" # Kubernetes label values cannot contain spaces
  }
}
Resource requirements (Helm)
# Configure Consul clients in this section
client:
  # Specify the resources that clients request for deployment.
  resources:
    requests:
      memory: '8Gi'
      cpu: '2'
      disk: '15Gi'
    limits:
      memory: '8Gi'
      cpu: '2'
      disk: '15Gi'
Helm configuration - load balancer
# ELB or Classic Load Balancer
---
global:
  name: consul
  datacenter: dc1
ui:
  enabled: true
  service:
    type: LoadBalancer
# ALB: Application Load Balancer (UI exposed through an ingress)
---
global:
  name: consul
  datacenter: dc1
ui:
  enabled: true
  service:
    type: LoadBalancer
  ingress:
    enabled: true
    ingressClassName: alb
    hosts:
      - host: consul-ui.test.consul.domain
    annotations: |
      'alb.ingress.kubernetes.io/certificate-arn': 'arn:aws:acm:us-east-2:01234xxxxxxx:certificate/f36b75c3-xxxx-40ca-xxx-3a2fad7f419d'
      'alb.ingress.kubernetes.io/listen-ports': '[{"HTTPS": 443}]'
      'alb.ingress.kubernetes.io/backend-protocol': 'HTTPS'
      'alb.ingress.kubernetes.io/healthcheck-path': '/v1/status/leader'
      'alb.ingress.kubernetes.io/group.name': 'envname-consul-server'

# NLB: Network Load Balancer
---
global:
  name: consul
  datacenter: dc1
ui:
  enabled: true
  service:
    type: LoadBalancer
    annotations: |
      'service.beta.kubernetes.io/aws-load-balancer-type': "external"
      'service.beta.kubernetes.io/aws-load-balancer-nlb-target-type': "instance"
ELB_IRSA - Terraform configuration
resource "kubernetes_service_account" "lb_controller" {
  metadata {
    name   = "aws-load-balancer-controller"
    namespace = "kube-system"
    labels = {
      "app.kubernetes.io/component" = "controller"
      "app.kubernetes.io/name"   = "aws-load-balancer-controller"
    }
    annotations = {
      "eks.amazonaws.com/role-arn" = module.lb_irsa.iam_role_arn
    }
  }
  depends_on = [
    module.eks
  ]
}
AWS Load Balancer Controller - Terraform configuration
resource "helm_release" "lb_controller" {
  name    = "aws-load-balancer-controller"
  repository = "https://aws.github.io/eks-charts"
  chart   = "aws-load-balancer-controller"
  namespace = "kube-system"
 
  set {
    name = "clusterName"
    value = module.eks.cluster_name
  }
 
  set {
    name = "serviceAccount.create"
    value = false
  }
 
  set {
    name = "serviceAccount.name"
    value = kubernetes_service_account.lb_controller.metadata[0].name
  }
}
EBS CSI driver
Terraform configuration for installing an EBS CSI driver.
data "aws_iam_policy" "ebs_csi_policy" {
  arn = "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
}
 
module "irsa-ebs-csi" {
  source =
   "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
  version = "5.27.0"
  create_role          = true
  role_name           =
                      "AmazonEKSTFEBSCSIRole-${module.eks.cluster_name}"
  provider_url         = module.eks.oidc_provider
  role_policy_arns       = [data.aws_iam_policy.ebs_csi_policy.arn]
  oidc_fully_qualified_subjects =
             ["system:serviceaccount:kube-system:ebs-csi-controller-sa"]
}
 
resource "aws_eks_addon" "ebs-csi" {
  cluster_name       = module.eks.cluster_name
  addon_name        = "aws-ebs-csi-driver"
  addon_version      = "v1.24.0-eksbuild.1"
  service_account_role_arn = module.irsa-ebs-csi.iam_role_arn
  tags = {
    "eks_addon" = "ebs-csi"
    "terraform" = "true"
  }
  preserve = false
}
Consul server - Helm values
global:
 
  name: consul
  image: "hashicorp/consul-enterprise:1.16.X-ent"
  datacenter: default
  adminPartitions:
    enabled: true
    name: "default"
  acls:
    manageSystemACLs: true
  enableConsulNamespaces: true
  enterpriseLicense:
    secretName: consul-enterprise-license
    secretKey: license
    enableLicenseAutoload: true
  peering:
    enabled: true
  tls:
    enabled: true
 
server:
  replicas: 3
  bootstrapExpect: 3
  exposeService:
    enabled: true
    type: LoadBalancer
  extraConfig: |
    {
      "log_level": "TRACE"
    }

syncCatalog:
  enabled: true
  k8sAllowNamespaces: ["*"]
  consulNamespaces:
    mirroringK8S: true
 
connectInject:
  enabled: true
  transparentProxy:
    defaultEnabled: false
  consulNamespaces:
    mirroringK8S: true
  k8sAllowNamespaces: ['*']
  k8sDenyNamespaces: []
  apiGateway:
    managedGatewayClass:
      serviceType: LoadBalancer
 
 
meshGateway:
  enabled: true
  replicas: 3
  service:
    enabled: true
    type: LoadBalancer
 
ui:
  enabled: true
  service:
    enabled: true
    type: LoadBalancer
Non-default partition K8s cluster - Helm values
global:
  name: consul
  datacenter: default
  enabled: false
  image: "hashicorp/consul-enterprise:1.16.X-ent"
  imageK8S: hashicorp/consul-k8s-control-plane:1.0.2
  imageConsulDataplane: "hashicorp/consul-dataplane:1.0.0"
 
  enableConsulNamespaces: true
  adminPartitions:
    enabled: true
    name: "prod-partition-1"
  peering:
    enabled: true
 
  tls:
    enabled: true
    caCert:
      secretName: consul-ca-cert
      secretKey: tls.crt
    caKey:
      secretName: consul-ca-key
      secretKey: tls.key
 
  acls:
    manageSystemACLs: true
    bootstrapToken:
      secretName: consul-bootstrap-acl-token
      secretKey: token
 
externalServers:
  enabled: true
  hosts: [ "a7d8d3f12cdfb4783af0357050e95416-secondryTest.us-east-1.elb.amazonaws.com" ] # External-IP (or DNS name) of the Expose Servers
  tlsServerName: server.default.consul   # <server>.<datacenter>.<dns>
  k8sAuthMethodHost: "CE8490A70630FBF25B9DsecondryTest.gr7.us-east-1.eks.amazonaws.com" # DNS name of EKS API of client1 - prod-parition-1
  httpsPort: 8501
  grpcPort: 8502
  useSystemRoots: false
 
server:
  enabled: false
 
connectInject:
  transparentProxy:
    defaultEnabled: true
  enabled: true
  default: true
  apiGateway:
    managedGatewayClass:
      serviceType: LoadBalancer
 
meshGateway:
  enabled: true
  replicas: 3
  service:
    enabled: true
    type: LoadBalancer
Agent telemetry
The Consul agent collects various runtime metrics about the performance of different libraries and subsystems. These metrics are aggregated on a ten second (10s) interval and are retained for one minute. An interval is the period of time between instances of data being collected and aggregated.
When telemetry is being streamed to an external metrics store, the interval is defined to be that store's flush interval.
| External Store | Flush Interval |
|---|---|
| dogstatsd | 10s |
| Prometheus | 60s |
| statsd | 10s |
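As an illustration, the agent's telemetry stanza controls how metrics are exposed to these stores. The fragment below is a sketch that could be added through server.extraConfig in the Helm chart; the 60s retention mirrors the Prometheus flush interval above, and the DogStatsD address is a placeholder.
server:
  # Expose a Prometheus endpoint with a 60s retention window and send metrics
  # to a local DogStatsD agent (the address is a placeholder).
  extraConfig: |
    {
      "telemetry": {
        "prometheus_retention_time": "60s",
        "dogstatsd_addr": "127.0.0.1:8125"
      }
    }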
Consul emits metrics in two major categories: Consul health and server health.
Consul health:
- Transaction timing
- Leadership changes
- Autopilot
- Garbage collection
Server health:
- File descriptors
- CPU usage
- Network activity
- Disk activity
- Memory usage
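These metrics are available from a running agent through the /v1/agent/metrics HTTP API endpoint, for example (assuming the default local HTTP listener on port 8500):
# JSON-formatted metrics from the local agent
curl http://127.0.0.1:8500/v1/agent/metrics

# Prometheus exposition format
curl http://127.0.0.1:8500/v1/agent/metrics?format=prometheus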
Consul telemetry collector intention
Create a service-intentions configuration entry that allows all traffic to consul-telemetry-collector:
# consul-telemetry-collector.yaml
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: consul-telemetry-collector
spec:
  destination:
    name: consul-telemetry-collector
  sources:
  - action: allow
    name: '*'
Create configuration entry:
kubectl apply --namespace consul --filename consul-telemetry-collector.yaml
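To confirm the entry was created, list the ServiceIntentions resources in the same namespace (assuming Consul is installed in the consul namespace, as above):
kubectl get serviceintentions --namespace consul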
Consul server sizing - EC2 instance types by Consul cluster size
| Provider | Size | Instance Type | CPU (cores) | Memory (GB) | Disk Capacity | Disk IO |
|---|---|---|---|---|---|---|
| AWS | Small | m5.large | 2 | 8 | min: 100 GB (gp3) | min: 3000 IOPS |
| AWS | Medium | m5.xlarge | 4 | 16 | min: 100 GB (gp3) | min: 3000 IOPS |
| AWS | Large | m5.2xlarge | 8 | 32 | min: 200 GB (gp3) | min: 7500 IOPS |
| AWS | Extra Large | m5.4xlarge | 16 | 64 | min: 200 GB (gp3) | min: 7500 IOPS |
Agents
You can run the Consul binary to start Consul agents, which are daemons that implement Consul control plane functionality. You can start agents as servers or clients.
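For example, the following commands show the difference between starting a server agent and a client agent; the data directory and join address are placeholders.
# Start a server agent that waits for three servers before bootstrapping
consul agent -server -bootstrap-expect=3 -data-dir=/opt/consul -retry-join="10.0.0.10"

# Start a client agent and join it to an existing cluster
consul agent -data-dir=/opt/consul -retry-join="10.0.0.10"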
terraform.auto.tfvars - Terraform deployment
friendly_name_prefix = "consul"
 
common_tags = {
  deployment = "consul"
  site       = "westeros"
}
 
route53_failover_record = {
  record_name = "consul"
}
 
secretsmanager_secrets = {
  license = {
    name = "consul-license"
    data = ""
  }
  ca_certificate_bundle = {
    name = "consul-ca-bundle"
    path = "./consul-agent-ca.pem"
  }
  cert_pem_secret = {
    name = "consul-public"
    path = "./consul-server-public.pem"
  }
  cert_pem_private_key_secret = {
    name = "consul-private"
    path = "./consul-server-private.pem"
  }
  consul_initial_management_token = {
    generate = true
  }
  consul_agent_token = {
    generate = true
  }
  consul_gossip_key = {
    generate = true
  }
  consul_snapshot_token = {
    generate = true
  }
  consul_ingress_gw_token = {
    generate = true
  }
  consul_terminating_gw_token = {
    generate = true
  }
  consul_mesh_gw_token = {
    generate = true
  }
}
 
snapshot_interval = "5 min"
 
s3_buckets = {
  snapshot = {
    bucket_name   = "consul-westeros-snapshots"
    force_destroy = true
  }
}
 
route53_zone_name = "test.aws.sbx.hashicorpdemo.com"
ssh_public_key    = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCX/57xBO3ZBhWFnHXcO0+DOKyrajTWyvOlxFHUQ/PlH9iNqog4XIWkYlG/3f0ctl61IR0InrH2PRYYctlR4HIeMWO2cbJcC8PovWlaB9nHU3rb16JWsx47C48R6iurTxyvYHkeYPbjYlicqztwMbvdh55jw+vOTZCM85Ni+burz1dxSTYh164rsB2WzRL+G/c74D5L6ufOnY6k9VTlf9VGpZ6Zh72xmm9IKyHwO6t518Ht5QZdQtBPKEjbGMByLSHPBsw1ceq1P+r315YfH7rYR11DnDNDpkrf87RB5nC9TiukMlz53MtW6vdPzThB/XlupqWDjwdlQGmU9BnGMu+jz0eWtUIQkaTANQXxtQAgv/YvuAq2QuRsd/lRLwR49fRbUXy3VRThYVu25oZsvPgknsY4ZarTYh1d65C2qrVVvoEYdnx4w+rBQWWludOhvcwfz5edpvxIoUh9ksdWog1kMlr8fFUCQepCPUF8ObM69sXjJv9sdM3GpGiGtUinda8="
 
iam_resources = {
  ssm_enable              = true
  cloud_auto_join_enabled = true
  log_forwarding_enabled  = true
  role_name               = "consul-role"
  policy_name             = "consul-policy"
}
 
rules = {
  consul = {
    server = {
      rpc = {
        enabled   = true
        self      = true
        target_sg = "agent"
      }
      serf_lan_tcp = {
        enabled   = true
        self      = true
        target_sg = "agent"
      }
      serf_lan_udp = {
        enabled   = true
        self      = true
        target_sg = "agent"
      }
      dns_tcp = {
        enabled       = true
        self          = true
        bidirectional = true
      }
      dns_udp = {
        enabled       = true
        self          = true
        bidirectional = true
      }
      https_api = {
        enabled       = true
        self          = true
        bidirectional = false
      }
      grpc = {
        enabled       = true
        self          = true
        bidirectional = false
      }
      grpc_tls = {
        enabled       = true
        self          = true
        bidirectional = false
      }
    }
    agent = {
      rpc = {
        enabled = true
        self    = true
      }
      serf_lan_tcp = {
        enabled = true
        self    = true
      }
      serf_lan_udp = {
        enabled = true
        self    = true
      }
      dns_tcp = {
        enabled       = true
        self          = true
        bidirectional = true
      }
      dns_udp = {
        enabled       = true
        self          = true
        bidirectional = true
      }
      mesh_gateway = {
        enabled = true
        self    = true
      }
      ingress_gateway = {
        enabled = true
        self    = true
      }
    }
  }
}