High availability (HA) and disaster recovery (DR)

HCP Vault Dedicated's highly available, single-tenant data plane architecture enables HCP-managed Vault Enterprise clusters to remain operational independent of the HCP Control Plane. This architecture is designed to maximize the availability of clusters managed by HCP.

The sections below provide additional information about the high availability and disaster recovery posture of HCP Vault Dedicated:

3-node HA clusters

All production-tier HCP Vault Dedicated clusters (Essentials or Standard) consist of three highly available Vault nodes that use Vault Enterprise's performance standby capability. HCP has robust monitoring in place that regularly checks cluster health, along with recovery mechanisms that restore cluster availability if a disruption occurs.

Cross-region disaster recovery

HCP organizations with flex or entitlement billing can configure Essentials and Standard tier clusters with a backup network (HVN), either when creating a new cluster or by editing an existing cluster's configuration in the HCP Portal. If HCP detects a cloud provider region outage that affects the availability of your cluster(s), HCP uses this backup network to fail over your cluster. You will receive an email notification when your cluster is failed over into its backup region and again when it is failed back to its original primary region. You can also subscribe to HashiCorp's status page for incident updates. See the important considerations below, and refer to this guide for configuration details.

If you need to delete a cross-region DR cluster, contact support.

Important considerations

- The HVN used for the cluster's backup network should be in a different region than the primary HVN and should use a non-overlapping CIDR block.
- You need network connectivity to this HVN to communicate with your cluster while it is failed over into the backup network region.
  - The cluster's private and public URLs (DNS) remain the same during a failover and can be used to access the cluster.
  - Clusters with the HCP Proxy address enabled cannot accept requests at the cluster's proxy address while the cluster is active in its backup region.
- When you configure a backup network on an existing cluster, the cluster is unavailable to service requests for approximately 10 minutes.
- If you have audit log and metrics streaming enabled, logs and metrics are emitted with DR- in the cluster_id label while your cluster is active in the backup network region.
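As a rough pre-flight and monitoring aid for the considerations above, the Python sketch below checks that a proposed backup HVN is in a different region and uses a CIDR block that does not overlap the primary HVN's, and flags telemetry emitted while the cluster is active in its backup region. This is not an HCP API: the function names, region strings, and CIDR values are hypothetical, and treating any cluster_id that contains DR- as a backup-region emission is an assumption based only on the label behavior described above.

```python
import ipaddress


def cidrs_overlap(primary_cidr: str, backup_cidr: str) -> bool:
    """Return True if the two IPv4 CIDR blocks share any addresses.

    ip_network() raises ValueError if a CIDR has host bits set.
    """
    primary = ipaddress.ip_network(primary_cidr)
    backup = ipaddress.ip_network(backup_cidr)
    return primary.overlaps(backup)


def validate_backup_hvn(primary_region: str, primary_cidr: str,
                        backup_region: str, backup_cidr: str) -> list:
    """Hypothetical check mirroring the stated backup HVN rules.

    Returns a list of problems; an empty list means the pair passes.
    """
    problems = []
    if primary_region == backup_region:
        problems.append("backup HVN must be in a different region than the primary")
    if cidrs_overlap(primary_cidr, backup_cidr):
        problems.append("backup HVN CIDR overlaps the primary HVN CIDR")
    return problems


def is_backup_region_emission(cluster_id: str) -> bool:
    """Assumption: logs/metrics carry 'DR-' in cluster_id while failed over."""
    return "DR-" in cluster_id


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(validate_backup_hvn("us-west-2", "172.25.16.0/20",
                              "us-east-1", "172.26.16.0/20"))
```

The region check and the CIDR check are kept separate so that each failed rule produces its own message, which makes it easier to see which part of a proposed backup network needs to change.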

Platform outages

Clusters can remain operational during a platform outage, and HCP can still fail over clusters into their configured backup regions if a disaster event occurs. The HCP API and UI may be affected, but clusters that are already running remain operational and continue to serve client requests. During a platform outage, cluster management operations such as admin token generation are unavailable.

Recommended practice for Vault administrators

The admin token generated in the HCP Portal should be used for initial configuration or emergency access only.

To mitigate the risk of being unable to generate admin tokens during a platform outage, set up an appropriate auth method within Vault that issues tokens with the necessary administrative access.

Refer to the authentication method documentation for more information on configuring Vault auth methods.
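As one illustration of Vault-native administrative access, the sketch below logs in through the AppRole auth method over Vault's HTTP API (POST /v1/auth/approle/login) using only the Python standard library. Because the token is issued by the Vault cluster itself, this path does not rely on the HCP Portal. It assumes AppRole is enabled at the default approle/ path in the admin namespace and that you have already created a role bound to a policy granting the administrative capabilities you need. The cluster address and credentials are placeholders, and AppRole is only one possible choice of auth method.

```python
import json
import urllib.request


def build_approle_login_request(vault_addr: str, role_id: str, secret_id: str,
                                namespace: str = "admin") -> urllib.request.Request:
    """Build a POST request for Vault's AppRole login endpoint.

    Assumes AppRole is mounted at the default path 'approle/' in the
    given namespace (sent via the X-Vault-Namespace header).
    """
    url = vault_addr.rstrip("/") + "/v1/auth/approle/login"
    payload = json.dumps({"role_id": role_id, "secret_id": secret_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json", "X-Vault-Namespace": namespace},
        method="POST",
    )


def extract_client_token(response_body: bytes) -> str:
    """Pull the client token out of a Vault login response body."""
    return json.loads(response_body)["auth"]["client_token"]


def login(vault_addr: str, role_id: str, secret_id: str,
          namespace: str = "admin", timeout: float = 10.0) -> str:
    """Log in over the network and return the Vault client token."""
    req = build_approle_login_request(vault_addr, role_id, secret_id, namespace)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_client_token(resp.read())
```

For example, login("https://vault.example.com:8200", role_id, secret_id) would return the client token string for that role. Treat the role ID and secret ID with the same care as any break-glass credential.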

Cluster deletion and snapshot restore

When a cluster is deleted, all associated resources (including audit logs) are deleted with it, except snapshots, which HCP retains for 30 days after the cluster is deleted. Post-deletion snapshot retention applies only to Standard and Plus tier clusters. Snapshots can be used to recover a deleted cluster, including restoring it to a different region. To request a restore from a snapshot, file a support ticket here.
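The retention window described above can be turned into a simple deadline calculation. The helper below is hypothetical: it assumes the 30-day window starts on the deletion date and that only Standard and Plus tier snapshots are retained. The exact cutoff time is not specified here, so file a restore request well before the computed date.

```python
from datetime import date, timedelta
from typing import Optional

# Values taken from the retention policy described above.
SNAPSHOT_RETENTION_DAYS = 30
RETAINED_TIERS = {"standard", "plus"}


def snapshot_restore_deadline(tier: str, deleted_on: date) -> Optional[date]:
    """Return the approximate last date snapshots are retained.

    Returns None for tiers whose snapshots are not retained after deletion.
    """
    if tier.lower() not in RETAINED_TIERS:
        return None
    return deleted_on + timedelta(days=SNAPSHOT_RETENTION_DAYS)
```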