Infrastructure as Code
Project: {{PROJECT_NAME}}
Version: {{VERSION}}
Date: {{DATE}}
Author: {{AUTHOR}}
Status: Draft | In Review | Approved
Reviewers: {{REVIEWERS}}
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | {{DATE}} | {{AUTHOR}} | Initial draft |
1. Overview
IaC Tool: {{IAC_TOOL}}
Tool Version: {{IAC_VERSION}}
Provider: {{CLOUD_PROVIDER}}
Provider Version: {{PROVIDER_VERSION}}
Rationale for tool choice:
{{IAC_RATIONALE}}
Core Principles:
- All infrastructure changes go through code (no manual console changes in staging/prod)
- IaC reviewed like application code (PR, review, merge)
- State is the single source of truth
- Modules are versioned and reusable
2. Repository Structure
{{IaC_REPO}}/
├── modules/                  # Reusable modules
│   ├── networking/           # VPC, subnets, security groups
│   ├── compute/              # EC2, ECS, Lambda
│   ├── database/             # RDS, ElastiCache
│   ├── storage/              # S3, EFS
│   └── monitoring/           # CloudWatch, alerts
├── environments/             # Environment-specific configs
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   └── production/
│       ├── main.tf
│       ├── variables.tf
│       └── terraform.tfvars
├── shared/                   # Shared resources (DNS, accounts)
├── scripts/                  # Helper scripts
│   ├── bootstrap.sh          # Initialize state backend
│   └── validate.sh           # Pre-apply validation
├── .terraform-version        # Pin tool version (tfenv)
├── .tflint.hcl               # Linting config
└── README.md
2.1 Module Organization
| Module | Purpose | Inputs | Outputs |
|---|---|---|---|
| modules/networking | VPC, subnets, routing | region, cidr_block, az_count | vpc_id, subnet_ids, sg_ids |
| modules/compute | ECS cluster, task definitions | cluster_name, instance_type | cluster_arn, task_role_arn |
| modules/database | RDS instance, parameter groups | engine, instance_class | db_endpoint, db_secret_arn |
| modules/storage | S3 buckets with policies | bucket_name, purpose | bucket_arn, bucket_name |
| modules/monitoring | CloudWatch dashboards, alarms | service_name, thresholds | alarm_arns, dashboard_url |
2.2 Environment Separation
- Each environment directory is independently deployable
- Environments call the same modules with different variable values
- No cross-environment dependencies (except shared DNS zone)
- Production has stricter apply controls (see Section 6)
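As a rough sketch of how an environment might call a shared module, the fragment below wires the networking module from Section 2.1 into the dev environment. The relative source path follows the repository layout above; the variable declarations and the `number` type for `az_count` are assumptions for illustration, not part of this template.

```hcl
# environments/dev/main.tf (illustrative sketch)

variable "region" {
  description = "Cloud region for this environment"
  type        = string
}

variable "cidr_block" {
  description = "CIDR range for the environment VPC"
  type        = string
}

variable "az_count" {
  description = "Number of availability zones to span (type is an assumption)"
  type        = number
}

# Every environment calls the same module; only the variable values differ.
module "networking" {
  source = "../../modules/networking"

  region     = var.region
  cidr_block = var.cidr_block
  az_count   = var.az_count
}
```

Each environment's terraform.tfvars would then supply its own values for these three variables.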
2.3 Shared Modules
| Module | Source | Version | Used By |
|---|---|---|---|
| networking | {{REGISTRY}}/networking | ~> 2.0 | All environments |
| database | {{REGISTRY}}/database | ~> 1.5 | Staging, Production |
| monitoring | {{REGISTRY}}/monitoring | ~> 1.2 | All environments |
3. State Management
3.1 Remote State Backend
Backend: {{STATE_BACKEND}}
| Environment | State Location | Access |
|---|---|---|
| Dev | {{STATE_BUCKET}}/dev/terraform.tfstate | DevOps team |
| Staging | {{STATE_BUCKET}}/staging/terraform.tfstate | DevOps team |
| Production | {{STATE_BUCKET}}/production/terraform.tfstate | Senior DevOps + CI only |
Bootstrap (first-time setup):
bash scripts/bootstrap.sh {{ENVIRONMENT}}
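If the backend is S3 (an assumption, since the backend is a template placeholder), the dev environment's backend block might look roughly like this. The bucket name, region, and lock table name are hypothetical stand-ins for the placeholders in this document.

```hcl
terraform {
  backend "s3" {
    bucket         = "example-tfstate-bucket"  # hypothetical; stands in for {{STATE_BUCKET}}
    key            = "dev/terraform.tfstate"   # one key per environment
    region         = "us-east-1"               # assumed region
    encrypt        = true                      # encrypt the state object at rest
    dynamodb_table = "example-terraform-locks" # hypothetical lock table
  }
}
```

Backend blocks cannot reference input variables, so per-environment values are either written literally as above or passed with `terraform init -backend-config=...`. Recent Terraform releases also offer S3-native lock files through the `use_lockfile` argument; check the documentation for the pinned tool version before choosing a locking mechanism.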
3.2 State Locking
Locking Mechanism: {{LOCK_MECHANISM}}
Lock timeout: {{LOCK_TIMEOUT}}s
Force unlock: Only by senior DevOps after verifying no active apply
Lock table (if DynamoDB):
- Table: {{LOCK_TABLE}}
- Key: LockID
- Billing: On-demand
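For the DynamoDB case, the lock table could be declared along these lines. The table name is hypothetical; the `LockID` string hash key is what the S3 backend expects, and `PAY_PER_REQUEST` is the provider's name for on-demand billing.

```hcl
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "example-terraform-locks" # hypothetical; stands in for {{LOCK_TABLE}}
  billing_mode = "PAY_PER_REQUEST"         # on-demand billing
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S" # string
  }
}
```

This resource normally lives in the bootstrap configuration rather than in any environment, because the table must exist before the first `terraform init` against the remote backend.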
3.3 State File Organization
Splitting strategy: {{SPLIT_STRATEGY}}
| State File | Contains | Reason for split |
|---|---|---|
| base/terraform.tfstate | Networking, IAM | Infrequently changed |
| app/terraform.tfstate | Compute, app services | Frequently changed |
| data/terraform.tfstate | Databases, caches | High risk, separate lifecycle |
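Once state is split, the app layer needs a way to read values owned by the base layer. One common option is the `terraform_remote_state` data source, sketched below under the assumption of an S3 backend and a base configuration that exports `vpc_id` and `subnet_ids` as root-level outputs. The bucket name and region are hypothetical.

```hcl
# In the app configuration: read outputs published by the base state.
data "terraform_remote_state" "base" {
  backend = "s3"

  config = {
    bucket = "example-tfstate-bucket" # hypothetical
    key    = "base/terraform.tfstate"
    region = "us-east-1"              # assumed region
  }
}

locals {
  # Only root-level outputs of the base configuration are visible here.
  vpc_id     = data.terraform_remote_state.base.outputs.vpc_id
  subnet_ids = data.terraform_remote_state.base.outputs.subnet_ids
}
```

This requires read access to the whole base state file, so teams with stricter access requirements sometimes publish the needed values through a parameter store instead.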
4. Module Design
4.1 Naming Conventions
Resource naming pattern: {{PROJECT}}-{{ENVIRONMENT}}-{{COMPONENT}}-{{SUFFIX}}
| Resource | Example |
|---|---|
| VPC | myapp-prod-vpc |
| ECS Cluster | myapp-prod-cluster |
| RDS Instance | myapp-prod-db-primary |
| S3 Bucket | myapp-prod-assets-{{ACCOUNT_ID}} |
| Security Group | myapp-prod-app-sg |
| IAM Role | myapp-prod-app-task-role |
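One way to keep names consistent is to build them from a single prefix in `locals`. The sketch below is an assumption about how this could be done; the `env_short` map is hypothetical and exists only because the examples above use `prod` while the environment variable in Section 4.2 uses `production`.

```hcl
variable "project" {
  description = "Project name used as the first name segment"
  type        = string
}

variable "environment" {
  description = "Deployment environment (dev/staging/production)"
  type        = string
}

locals {
  # Hypothetical mapping from environment name to the short form used in names.
  env_short = {
    dev        = "dev"
    staging    = "staging"
    production = "prod"
  }

  name_prefix = "${var.project}-${local.env_short[var.environment]}"

  # With project = "myapp" and environment = "production":
  vpc_name         = "${local.name_prefix}-vpc"        # myapp-prod-vpc
  ecs_cluster_name = "${local.name_prefix}-cluster"    # myapp-prod-cluster
  rds_primary_name = "${local.name_prefix}-db-primary" # myapp-prod-db-primary
  app_sg_name      = "${local.name_prefix}-app-sg"     # myapp-prod-app-sg
}
```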
4.2 Input / Output Variables
Required variable fields:
variable "environment" {
  description = "Deployment environment (dev/staging/production)"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}
Required output fields:
output "database_endpoint" {
  description = "The hostname of the database endpoint"
  # `address` is the hostname only; `endpoint` would return host:port.
  value     = aws_db_instance.main.address
  sensitive = false
}
4.3 Versioning Strategy
Module versioning: Semantic versioning (MAJOR.MINOR.PATCH)
Pin strategy: ~> MAJOR.MINOR (allows minor and patch updates within the same major version; use ~> MAJOR.MINOR.PATCH to allow patch updates only)
Upgrade policy: Review and test before upgrading minor/major versions
Changelog: Every module version bump requires a CHANGELOG entry
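The pessimistic constraint operator behaves differently depending on how many version segments are given, which is easy to get wrong. The sketch below shows both forms on a single module call; the registry address and input values are hypothetical.

```hcl
module "database" {
  source = "app.terraform.io/example-org/database/aws" # hypothetical registry address

  # "~> 1.5" accepts any 1.x release at or above 1.5.0 (minor and patch updates).
  # "~> 1.5.0" would accept only 1.5.x releases (patch updates only).
  version = "~> 1.5"

  engine         = "postgres"    # illustrative input value
  instance_class = "db.t3.micro" # illustrative input value
}
```

Version constraints apply only to modules installed from a registry. Modules referenced by local path, such as `../../modules/database`, always track whatever is checked out in the repository.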
5. Workflow
5.1 Standard Change Process
flowchart LR
BRANCH[Create branch] --> CODE[Write/modify IaC]
CODE --> VALIDATE[terraform validate + tflint]
VALIDATE --> PLAN[terraform plan]
PLAN --> PR[Open PR with plan output]
PR --> REVIEW[Peer review]
REVIEW --> APPROVE[Approval]
APPROVE --> APPLY[terraform apply in CI]
APPLY --> VERIFY[Verify resources]
Steps:
- Create feature branch: infra/{{TICKET}}-description
- Make changes, run terraform validate && terraform fmt
- Run terraform plan and attach the output to the PR
- Open PR for review (at least 1 reviewer required for dev/staging, 2 for production)
- CI runs terraform plan automatically on PR open
- Merge triggers terraform apply in CI (dev/staging)
- Production apply requires manual trigger after PR merge
5.2 PR-Based Infrastructure Changes
PR Requirements:
- Title: [IaC] {{ENVIRONMENT}}: description of change
- Must include terraform plan output in PR description or CI artifact
- Must include justification for the change
- Must reference the related application ticket (if applicable)
- Must have passing CI validation (fmt, validate, tflint, plan)
5.3 Automated Drift Detection
Schedule: {{DRIFT_SCHEDULE}}
Tool: {{DRIFT_TOOL}}
Alert Channel: {{DRIFT_ALERT_CHANNEL}}
Action on drift:
- Investigate cause (manual change, provider issue, external system)
- Either fix drift (apply IaC) or update IaC to reflect intentional change
- Never leave drift unresolved for > {{DRIFT_SLA}}
6. Security
6.1 Least Privilege for IaC Service Account
| Environment | Service Account | Permissions |
|---|---|---|
| Dev | ci-iac-dev@{{PROJECT}} | Full write within dev resources |
| Staging | ci-iac-staging@{{PROJECT}} | Full write within staging resources |
| Production | ci-iac-prod@{{PROJECT}} | Restricted write, requires MFA session |
6.2 Secret Injection (Not in State)
Rule: Never pass passwords, API keys, or secrets as Terraform variables
Pattern: Reference secrets manager in resource configuration:
# WRONG: secret in state
resource "aws_db_instance" "main" {
  password = var.db_password # This will be in state in plaintext!
}
# RIGHT: secret from Secrets Manager
resource "aws_db_instance" "main" {
  manage_master_user_password = true # AWS manages the password in Secrets Manager
}
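To show how consumers can reach the managed password without it passing through Terraform variables, the sketch below fills out the resource and exposes only the secret's ARN, matching the `db_secret_arn` output listed for the database module. The identifier, engine, sizing, and username are illustrative assumptions.

```hcl
resource "aws_db_instance" "main" {
  identifier        = "myapp-prod-db-primary" # follows the naming convention in Section 4.1
  engine            = "postgres"              # assumed engine
  instance_class    = "db.t3.micro"           # assumed size
  allocated_storage = 20                      # GiB, assumed
  username          = "app_admin"             # hypothetical master username

  # RDS generates the password and stores it in Secrets Manager.
  manage_master_user_password = true
}

output "db_secret_arn" {
  description = "ARN of the Secrets Manager secret holding the master credentials"
  value       = aws_db_instance.main.master_user_secret[0].secret_arn
}
```

Applications then read the credentials from Secrets Manager at runtime using this ARN, so the password itself is never written to the Terraform configuration or passed in as a variable.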
6.3 Policy as Code
Tool: {{POLICY_TOOL}}
| Policy | Enforcement |
|---|---|
| No public S3 buckets | Block |
| All resources must have environment tag | Warn |
| RDS must be in private subnet | Block |
| Security groups must not allow 0.0.0.0/0 on sensitive ports | Block |
| Encryption at rest required for data resources | Block |
7. Tagging Strategy
| Tag | Value | Purpose |
|---|---|---|
| Project | {{PROJECT_NAME}} | Cost attribution |
| Environment | dev / staging / production | Environment filter |
| ManagedBy | terraform | Identifies IaC-managed resources |
| Team | {{TEAM}} | Ownership |
| CostCenter | {{COST_CENTER}} | Finance attribution |

| Tag | Value | Purpose |
|---|---|---|
| Service | {{SERVICE_NAME}} | Service-level grouping |
| Ticket | {{TICKET_ID}} | Change tracking |
| ExpiresAt | {{DATE}} | Ephemeral resource cleanup |
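With the AWS provider, the project-wide tags can be applied once through `default_tags` instead of being repeated on every resource. This is a sketch assuming the AWS provider; the variable names are hypothetical.

```hcl
variable "region" {
  type = string
}

variable "project_name" {
  type = string
}

variable "environment" {
  type = string
}

variable "team" {
  type = string
}

variable "cost_center" {
  type = string
}

provider "aws" {
  region = var.region

  # Applied to every taggable resource this provider creates.
  default_tags {
    tags = {
      Project     = var.project_name
      Environment = var.environment
      ManagedBy   = "terraform"
      Team        = var.team
      CostCenter  = var.cost_center
    }
  }
}
```

Per-resource tags such as Service, Ticket, or ExpiresAt can still be set in each resource's own `tags` argument, where they are merged with the defaults.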
8. Cost Management
Budget alerts:
- Dev: Alert at ${{DEV_BUDGET}} / month
- Staging: Alert at ${{STG_BUDGET}} / month
- Production: Alert at ${{PROD_BUDGET}} / month
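Budget alerts can themselves be managed as code. The sketch below assumes AWS Budgets; the budget name, the 500 USD limit, and the subscriber address are placeholders rather than recommended values.

```hcl
resource "aws_budgets_budget" "dev_monthly" {
  name         = "myapp-dev-monthly" # hypothetical name
  budget_type  = "COST"
  limit_amount = "500"               # stands in for {{DEV_BUDGET}}
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  # Notify when actual spend exceeds 100% of the monthly limit.
  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 100
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = ["devops@example.com"] # hypothetical recipient
  }
}
```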
Cost optimization built into IaC:
- Dev/staging auto-shutdown: {{AUTO_SHUTDOWN_SCHEDULE}}
- Right-sizing: Instance types reviewed quarterly
- Reserved instances / savings plans: Applied to production
9. Disaster Recovery for IaC State
State backup: {{STATE_BACKUP}}
Recovery procedure:
- Restore from most recent backup
- Run terraform plan and verify no unexpected changes
- If state is unrecoverable: terraform import for each managed resource (refer to resource inventory)
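If the pinned tool is Terraform 1.5 or later (an assumption, since the version is a template placeholder), re-importing can be expressed declaratively with `import` blocks, which makes the recovery reviewable in a PR like any other change. The bucket name below is hypothetical.

```hcl
# Adopt an existing bucket into a fresh state file (Terraform 1.5+).
import {
  to = aws_s3_bucket.assets
  id = "myapp-prod-assets-123456789012" # for S3 buckets, the import ID is the bucket name
}

resource "aws_s3_bucket" "assets" {
  bucket = "myapp-prod-assets-123456789012"
}
```

Running `terraform plan` then reports each resource as "will be imported" along with any attribute differences, so configuration mismatches surface before state is written.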
Prevention:
- S3 versioning enabled on state bucket
- MFA delete required for state bucket
- State bucket access logged to CloudTrail
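As a minimal sketch of the versioning measure, assuming the AWS provider (v4 or later, where versioning is a separate resource) and a hypothetical bucket name:

```hcl
resource "aws_s3_bucket" "state" {
  bucket = "example-tfstate-bucket" # hypothetical; stands in for {{STATE_BUCKET}}
}

# Keep every prior version of the state object so a bad write can be rolled back.
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}
```

MFA delete is not shown here; it generally has to be enabled with the bucket owner's root credentials, so it is usually handled as a one-time manual step during bootstrap.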
Related Documents
Approval
| Role | Name | Date | Signature |
|---|---|---|---|
| Author | | | |
| Reviewer | | | |
| Approver | | | |