Reliability pillar introduction

Introduction

The HashiCorp Well-Architected Framework provides best-practice guidance for organizations. Specifically, it aims to help practitioners optimize their production HashiCorp deployments while meeting their organization's specific architectural needs. The framework starts from the cloud operating model, which sets its overarching goal: cloud migration enablement.

The reliability pillar recommends strategies that help prevent disruptions from a single point of failure and ensure high availability and business continuity for your mission-critical applications and infrastructure.

Consistent availability of infrastructure and applications

As the infrastructure and applications that support critical workflows become more ephemeral and distributed, you should ensure that they remain consistently available. You must also enable policy and governance tooling so that compliance keeps pace with the speed of delivery and risk stays managed.

You can configure HashiCorp products to operate reliably with high availability features like failover, automatic cleanup of failed servers, and recovery from catastrophic failure with data snapshots. You can also monitor the health and performance of HashiCorp tools through telemetry metrics and observability data.
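
As a rough illustration of the failover idea mentioned above, the following Python sketch keeps the current active server while it is healthy and otherwise promotes the first healthy standby. It is not how any HashiCorp product implements failover; the `Server` type, the `choose_active` function, and the node names are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str      # hypothetical server identifier
    healthy: bool  # result of the most recent health check


def choose_active(servers: List[Server], current: str) -> Optional[str]:
    """Keep the current active server if it is healthy; otherwise fail over
    to the first healthy server in the list. Return None if none is healthy."""
    by_name = {s.name: s for s in servers}
    active = by_name.get(current)
    if active is not None and active.healthy:
        return current
    for server in servers:
        if server.healthy:
            return server.name
    return None


# Example: the active node has failed, so the first healthy standby takes over.
servers = [Server("node-a", False), Server("node-b", True), Server("node-c", True)]
print(choose_active(servers, "node-a"))  # prints "node-b"
```

Returning `None` when no server is healthy makes the total-outage case explicit, which is the point at which recovery from a snapshot would come into play.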

Best practices

HashiCorp's reliability pillar provides best practices to:

Design fault tolerant systems
Deploy infrastructure and applications with zero-downtime
Back up and restore Terraform Enterprise data
Monitor your infrastructure and applications

Design fault tolerant systems

Review the resources in Design fault tolerant systems for guidance on designing, implementing, and operating your business systems to meet your reliability goals.

Deploy infrastructure and applications with zero-downtime

Deploying your infrastructure and applications with zero downtime is essential for maintaining high availability. Review the Terraform Enterprise zero-downtime deployment recommended pattern to learn best practices for zero-downtime deployments with Terraform.

Back up and recover

A reliable backup of your Terraform Enterprise deployment is crucial to ensuring business continuity. Learn the best practices, options, and considerations for backing up Terraform Enterprise and increasing its resiliency in the Terraform Enterprise backup recommended pattern. Backups are critical, but the ability to restore and recover from them is just as important. Learn how to restore your Terraform Enterprise deployment from a backup in the Terraform Enterprise restore recommended pattern.

Monitor your infrastructure and applications

When you monitor your infrastructure and applications, you can detect issues before they impact users. By monitoring a combination of observability data, such as metrics, operational and audit logs, and alerts, you can identify bottlenecks, track resource use, and pinpoint potential failures. Follow the guidance in React to metrics and monitoring to help define your organization's monitoring strategy.
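
To make the idea of detecting issues from metrics more concrete, here is a minimal Python sketch that compares one sample of metrics against alert thresholds. The metric names, the threshold values, and the `find_alerts` function are assumptions chosen for illustration; they do not correspond to any specific HashiCorp telemetry output, and a real deployment would normally rely on a dedicated monitoring system.

```python
from typing import Dict, List

# Hypothetical alert thresholds; tune these to your own reliability goals.
THRESHOLDS: Dict[str, float] = {
    "cpu_percent": 85.0,
    "disk_used_percent": 90.0,
    "error_rate": 0.05,
}


def find_alerts(sample: Dict[str, float],
                thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Return one message for each metric in the sample that exceeds its threshold.
    Metrics missing from the sample are skipped rather than treated as failures."""
    alerts = []
    for metric, limit in thresholds.items():
        value = sample.get(metric)
        if value is not None and value > limit:
            alerts.append(f"{metric}={value} exceeds {limit}")
    return alerts


# Example: only the CPU metric is above its threshold.
sample = {"cpu_percent": 92.5, "disk_used_percent": 40.0, "error_rate": 0.01}
print(find_alerts(sample))  # prints ['cpu_percent=92.5 exceeds 85.0']
```

Keeping the thresholds in one dictionary makes it easy to review them alongside the rest of your monitoring strategy.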
