Infrastructure as Code
Project: Drop
Version: 0.1.0
Date: 2026-02-23
Author: Platform Architect (AI)
Status: Draft | In Review | Approved
Reviewers: Alem Bašić (CEO)
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | | | Initial draft |
1. Overview
Drop's current production infrastructure (AWS App Runner + RDS) was provisioned manually via AWS Console. IaC tooling exists in the repository (infrastructure/ directory) as a cloud audit and WAF rules reference, but Terraform is not yet wired into the CI/CD pipeline. This document describes the target IaC state and the existing configuration.
IaC Tool: Terraform (target), currently partially implemented
Tool Version: TBD (requires team decision on version pinning)
Provider: AWS (hashicorp/aws)
Provider Version: ~> 5.0
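The provider constraint above could be expressed in a `required_providers` block. This is a minimal sketch: the Terraform CLI version is still TBD in this document, so `required_version` is deliberately left out, and the `eu-west-1` region is taken from the production resources described later.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # any 5.x release, never 6.0
    }
  }
  # required_version intentionally omitted until the team agrees on a pin.
}

provider "aws" {
  region = "eu-west-1"
}
```

Running `terraform init` with this block also writes `.terraform.lock.hcl`, which records the exact provider build and is normally committed alongside the code.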
Rationale for tool choice:
{{IAC_RATIONALE}}
Core Principles:
- All infrastructure changes should go through code (no manual console changes in staging/prod once IaC is wired)
- IaC reviewed like application code (PR, review, merge)
- State is the single source of truth
- Secrets are never stored in Terraform state (reference AWS Secrets Manager secrets by ARN; do not read their values)
2. Repository Structure
infrastructure/
├── cloud-audit.md                 # Existing AWS resource inventory
├── waf-rules.md                   # WAF configuration reference
├── terraform/                     # Target IaC (to be implemented)
│   ├── modules/
│   │   ├── app-runner/            # App Runner service + ECR
│   │   ├── rds/                   # RDS PostgreSQL instance
│   │   └── secrets/               # AWS Secrets Manager resources
│   ├── environments/
│   │   ├── production/
│   │   │   ├── main.tf
│   │   │   ├── variables.tf
│   │   │   └── terraform.tfvars
│   │   └── staging/               # Fly.io managed separately (not Terraform)
│   └── shared/
│       └── ecr.tf                 # ECR repository (shared across envs)
└── README.md
2.1 Module Organization
| Module | Purpose | Inputs | Outputs |
|---|---|---|---|
| modules/ | | az_count | subnet_ids, sg_ids |
| modules/ | | instance_type | task_role_arn |
| modules/ | | instance_class | db_endpoint, db_secret_arn |
| modules/ | | bucket_name, purpose | bucket_arn, bucket_name |
| | CloudWatch dashboards, alarms | service_name, thresholds | alarm_arns, dashboard_url |
2.2 Environment Separation
- Production: AWS (eu-west-1), managed by Terraform
- Staging: Fly.io, managed by the Fly CLI (fly deploy), NOT Terraform
- Each environment directory is independently deployable
- Environments call the same modules with different variable values
- No cross-environment Terraform dependencies
- Production has stricter apply controls (see Section 6)
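As an illustration of an environment directory calling shared modules, a production `main.tf` might look like the sketch below. The relative module paths follow the repository layout in Section 2; the module input names (`environment`, `ecr_image_uri`, `instance_class`) are assumptions about interfaces that are not yet implemented.

```hcl
# infrastructure/terraform/environments/production/main.tf (hypothetical)

variable "ecr_image_uri" {
  description = "Full ECR image URI including tag"
  type        = string
}

module "app_runner" {
  source = "../../modules/app-runner"

  environment   = "production"
  ecr_image_uri = var.ecr_image_uri
}

module "rds" {
  source = "../../modules/rds"

  environment    = "production"
  instance_class = "db.t4g.micro" # assumed input name; value from Section 8
}
```

Because every value that differs between environments enters through module inputs, a second Terraform-managed environment would only need its own directory with different arguments.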
2.3 Shared Modules
| Module | Source | Version | Used By |
|---|---|---|---|
| | | ~> 2.0 | All environments |
| database | {{REGISTRY}}/database | ~> 1.5 | Staging, Production |
3. State Management
3.1 Remote State Backend
Backend: S3 + DynamoDB (planned, not yet configured)
| Environment | State Location | Access |
|---|---|---|
| Dev | {{STATE_BUCKET}}/dev/terraform.tfstate | DevOps team |
| Staging | {{STATE_BUCKET}}/staging/terraform.tfstate | DevOps team |
| Production | drop-terraform-state-324480209768/production/terraform.tfstate | |
Bootstrap (first-time setup):
# Create S3 bucket for state storage
aws s3api create-bucket \
--bucket drop-terraform-state-324480209768 \
--region eu-west-1 \
--create-bucket-configuration LocationConstraint=eu-west-1
# Enable versioning
aws s3api put-bucket-versioning \
--bucket drop-terraform-state-324480209768 \
--versioning-configuration Status=Enabled
# Create DynamoDB lock table
aws dynamodb create-table \
--table-name drop-terraform-locks \
--attribute-definitions AttributeName=LockID,AttributeType=S \
--key-schema AttributeName=LockID,KeyType=HASH \
--billing-mode PAY_PER_REQUEST \
--region eu-west-1
S3 backend configuration:
terraform {
  backend "s3" {
    bucket         = "drop-terraform-state-324480209768"
    key            = "production/terraform.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "drop-terraform-locks"
    encrypt        = true
  }
}
3.2 State Locking
Locking Mechanism: DynamoDB table drop-terraform-locks
Lock timeout: 15 minutes (pass -lock-timeout=15m; Terraform's own default is 0s, which fails immediately if the lock is held)
Force unlock: Only Alem Bašić, after verifying no active apply
Lock table (if DynamoDB):
- Table: drop-terraform-locks
- Key: LockID
- Billing: On-demand
3.3 State File Organization
Splitting strategy: Single
| State File | Contains | Reason for split |
|---|---|---|
| | | Infrequently changed |
| app/terraform.tfstate | Compute, app services | Frequently changed |
| data/terraform.tfstate | Databases, caches | High risk, separate lifecycle |
4. Module Design
4.1 Naming Conventions
Resource naming pattern: drop-{environment}-{component}
| Resource | Example |
|---|---|
| RDS Instance | |
| Security Group | myapp-prod-app-sg |
| IAM Role | |
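One way to keep new resources on the `drop-{environment}-{component}` pattern is to build the prefix once in a local value. This is a sketch only; the resource shown and its component name `web` are illustrative, and the existing manually provisioned resources (`drop-web`, `drop-db`) predate the pattern and keep their current names.

```hcl
variable "environment" {
  description = "Deployment environment"
  type        = string
}

locals {
  # Shared prefix, e.g. "drop-production"
  name_prefix = "drop-${var.environment}"
}

# Hypothetical new resource following the pattern: "drop-production-web"
resource "aws_ecr_repository" "web" {
  name = "${local.name_prefix}-web"
}
```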
Current production resources (manually provisioned):
- App Runner Service: drop-web (ARN: arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec)
- RDS Instance: drop-db (endpoint: drop-db.czu2qe4quy4v.eu-west-1.rds.amazonaws.com)
- ECR Repository: drop-web (324480209768.dkr.ecr.eu-west-1.amazonaws.com/drop-web)
4.2 Input / Output Variables
App Runner module example:
variable "environment" {
  description = "Deployment environment (dev/staging/production)"
  type        = string
  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}
variable "ecr_image_uri" {
  description = "Full ECR image URI including tag"
  type        = string
}
output "service_url" {
  description = "App Runner service URL"
  value       = aws_apprunner_service.main.service_url
}
output "service_arn" {
  description = "App Runner service ARN"
  value       = aws_apprunner_service.main.arn
}
Secrets (never in state):
# Look up secret metadata only. The aws_secretsmanager_secret data source exposes
# the ARN without reading the value; aws_secretsmanager_secret_version would copy
# secret_string into Terraform state, so it is not used here.
data "aws_secretsmanager_secret" "jwt" {
  name = "drop/production/jwt-secret"
}
# Pass ARN (not value) to App Runner
resource "aws_apprunner_service" "main" {
  service_name = "drop-web"
  source_configuration {
    image_repository {
      image_identifier      = var.ecr_image_uri
      image_repository_type = "ECR"
      image_configuration {
        runtime_environment_secrets = {
          JWT_SECRET   = data.aws_secretsmanager_secret.jwt.arn
          # db_url is assumed to be an aws_secretsmanager_secret defined in the secrets module
          DATABASE_URL = aws_secretsmanager_secret.db_url.arn
        }
      }
    }
  }
}
4.3 Versioning Strategy
Module versioning: Git tags on IaC repository (format: infra/v1.0.0)
Pin strategy: Reference by git tag in module source
Upgrade policy: Review terraform plan output before applying any module version change
Changelog: Every infra change requires an entry in infrastructure/CHANGELOG.md
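Pinning a module to a git tag could look like the sketch below. The repository URL is inferred from the `ALAI-org/drop` reference in the OIDC policy in Section 6.1, and the module path and input names are assumptions based on the target layout; adjust all three to the real repository.

```hcl
variable "ecr_image_uri" {
  description = "Full ECR image URI including tag"
  type        = string
}

module "app_runner" {
  # "//" separates the repository from the sub-directory; "ref" selects the git tag.
  source = "git::https://github.com/ALAI-org/drop.git//infrastructure/terraform/modules/app-runner?ref=infra/v1.0.0"

  environment   = "production"
  ecr_image_uri = var.ecr_image_uri
}
```

Moving to a new module version is then a one-line change to `ref`, followed by `terraform init -upgrade` and a reviewed `terraform plan`.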
5. Workflow
5.1 Standard Change Process
flowchart LR
    BRANCH[Create branch\ninfra/description] --> CODE[Write/modify Terraform]
    CODE --> VALIDATE[terraform validate\n+ terraform fmt]
    VALIDATE --> PLAN[terraform plan\nattach output to PR]
    PLAN --> PR[Open PR]
    PR --> REVIEW[Review by Alem]
    REVIEW --> APPROVE[Approval]
    APPROVE --> APPLY[terraform apply\nmanual trigger]
    APPLY --> VERIFY[Verify via AWS console\n+ health check]
Steps:
- Create feature branch: infra/{{TICKET}}-description
- Make changes, run terraform validate && terraform fmt
- Run terraform plan and paste the output into the PR description
- Open PR; Alem reviews
- CI runs terraform plan automatically on PR open
- Merge, then run terraform apply manually (automated CD for IaC pending)
- Verify via App Runner console + curl https://.../api/health
5.2 PR-Based Infrastructure Changes
PR Requirements:
- Title: [IaC] {{ENVIRONMENT}}: description of change
- Must include terraform plan output in PR description or CI artifact
- Must reference the related application ticket or justification
- Must pass terraform validate and terraform fmt -check
5.3 Automated Drift Detection
Schedule: Manual, before each production deployment (automated drift detection pending)
Action on drift:
- Investigate cause (manual console change, provider drift)
- Run terraform import to bring the resource under management, or apply IaC to reconcile
- Document the decision in infrastructure/cloud-audit.md
6. Security
6.1 Least Privilege for IaC Service Account
| Environment | Service Account | Permissions |
|---|---|---|
| Dev | ci-iac-dev@{{PROJECT}} | Full write within dev resources |
| Staging | ci-iac-staging@{{PROJECT}} | Full write within staging resources |
| Production | ci-iac-prod@{{PROJECT}} | requires |
# OIDC trust policy for GitHub Actions
data "aws_iam_policy_document" "github_oidc_trust" {
statement {
actions = ["sts:AssumeRoleWithWebIdentity"]
principals {
type = "Federated"
identifiers = ["arn:aws:iam::324480209768:oidc-provider/token.actions.githubusercontent.com"]
}
condition {
test = "StringLike"
variable = "token.actions.githubusercontent.com:sub"
values = ["repo:ALAI-org/drop:*"]
}
}
}
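The trust policy above only takes effect once it is attached to a role. A minimal sketch follows, assuming a hypothetical role name and a deliberately narrow, illustrative permission set; the real production permissions are not specified in this document.

```hcl
# Role that GitHub Actions assumes via OIDC (name is hypothetical)
resource "aws_iam_role" "ci_iac_prod" {
  name               = "drop-production-ci-iac"
  assume_role_policy = data.aws_iam_policy_document.github_oidc_trust.json
}

# Illustrative permissions only: read/write the production state object
data "aws_iam_policy_document" "state_access" {
  statement {
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = ["arn:aws:s3:::drop-terraform-state-324480209768/production/*"]
  }
}

resource "aws_iam_role_policy" "state_access" {
  name   = "state-access"
  role   = aws_iam_role.ci_iac_prod.id
  policy = data.aws_iam_policy_document.state_access.json
}
```

The `repo:ALAI-org/drop:*` subject pattern lets any branch or workflow in the repository assume the role; narrowing it to a specific branch or GitHub environment would bring it closer to least privilege for production.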
6.2 Secret Injection (Not in State)
Rule: Never pass passwords, API keys, or secrets as Terraform variable values.
Pattern: Reference Secrets Manager in resource configuration:
# CORRECT: use AWS Secrets Manager, pass ARN to App Runner
resource "aws_apprunner_service" "drop_web" {
  source_configuration {
    image_repository {
      image_configuration {
        runtime_environment_secrets = {
          JWT_SECRET   = aws_secretsmanager_secret.jwt.arn
          DATABASE_URL = aws_secretsmanager_secret.db_url.arn
        }
      }
    }
  }
}
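The same rule applies to the database password. A hedged sketch for RDS follows, in which AWS generates and stores the master password in Secrets Manager so that it never passes through a Terraform variable. The `username` and `allocated_storage` values are placeholders, not the settings of the existing `drop-db` instance.

```hcl
resource "aws_db_instance" "drop_db" {
  identifier        = "drop-db"
  engine            = "postgres"
  instance_class    = "db.t4g.micro"
  allocated_storage = 20           # placeholder
  username          = "drop_admin" # placeholder

  # RDS creates and rotates the password in Secrets Manager;
  # no "password" argument, so no plaintext value in state.
  manage_master_user_password = true
}

# ARN of the managed secret, suitable for runtime_environment_secrets
output "db_secret_arn" {
  value = aws_db_instance.drop_db.master_user_secret[0].secret_arn
}
```

Setting `password = var.db_password` instead would write the value into the state file in plaintext, which is exactly what the rule prohibits.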
6.3 Policy as Code
Tool: tflint + Checkov (planned for CI integration)
| Policy | Enforcement |
|---|---|
| | Block |
| All resources must have | Warn |
| | Block |
| Security groups must not 0.0.0.0/0 | Block |
| Encryption at rest required for data resources | Block |
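For the tflint side, a `.tflint.hcl` along these lines could enforce a tagging rule. This is a sketch: the plugin version shown is an assumption to be replaced with a release the team has verified, and the tag names are taken from the tagging strategy in Section 7.

```hcl
# .tflint.hcl
plugin "aws" {
  enabled = true
  version = "0.30.0" # assumed; pin to a verified release
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

# Flag taggable AWS resources that are missing any of these tags
rule "aws_resource_missing_tags" {
  enabled = true
  tags    = ["Project", "Environment", "ManagedBy"]
}
```

`tflint --init` downloads the plugin and a plain `tflint` run then reports violations; whether a finding blocks or only warns is decided in the CI job, not in this file.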
7. Tagging Strategy
| Tag | Value | Purpose |
|---|---|---|
| Project | | Cost attribution |
| Environment | dev / staging / production | Environment filter |
| ManagedBy | terraform | Identifies |
| Team | | Ownership |
| CostCenter | {{COST_CENTER}} | Finance attribution |

| Tag | Value | Purpose |
|---|---|---|
| Service | | Service-level grouping |
| Ticket | | Change tracking |
| ExpiresAt | {{DATE}} | Ephemeral resource cleanup |
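Tags that apply to every resource can be set once with the provider's `default_tags` block instead of being repeated per resource. In this sketch the `Project` value `drop` is an assumption, since that cell of the table is empty, and the `Team` and `CostCenter` tags are left out for the same reason.

```hcl
variable "environment" {
  description = "Deployment environment"
  type        = string
}

provider "aws" {
  region = "eu-west-1"

  default_tags {
    tags = {
      Project     = "drop" # assumed value
      Environment = var.environment
      ManagedBy   = "terraform"
    }
  }
}
```

Tags declared directly on a resource are merged with these defaults, and a resource-level tag wins when the same key appears in both places.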
8. Cost Management
Budget alerts:
- Dev: Alert at ${{DEV_BUDGET}} / month
- Staging: Alert at ${{STG_BUDGET}} / month
- Production: Alert at $150/month (AWS Budgets, setup TBD)
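Once AWS Budgets is set up, the production alert could be declared in Terraform roughly as follows. The budget name, the 100% threshold, and the subscriber address are placeholders chosen for illustration.

```hcl
resource "aws_budgets_budget" "production" {
  name         = "drop-production-monthly" # hypothetical name
  budget_type  = "COST"
  limit_amount = "150"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  # Notify when actual spend exceeds 100% of the limit
  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 100
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = ["ops@example.com"] # placeholder
  }
}
```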
Cost optimization built into IaC:
- App Runner: idle provisioned instances are billed for memory only; vCPU is billed only while requests are being processed (pay-per-request model)
- RDS db.t4g.micro: ARM Graviton (20% cheaper than x86 equivalent)
- ECR lifecycle policy: Delete untagged images after 7 days, keep last 10 tagged images
resource "aws_ecr_lifecycle_policy" "drop_web" {
  repository = aws_ecr_repository.drop_web.name
  policy = jsonencode({
    rules = [
      {
        rulePriority = 1
        description  = "Keep last 10 tagged images"
        selection = {
          tagStatus      = "tagged"
          tagPatternList = ["*"] # ECR requires a tag prefix or pattern list when tagStatus is "tagged"
          countType      = "imageCountMoreThan"
          countNumber    = 10
        }
        action = { type = "expire" }
      },
      {
        rulePriority = 2
        description  = "Remove untagged images after 7 days"
        selection = {
          tagStatus   = "untagged"
          countType   = "sinceImagePushed"
          countUnit   = "days"
          countNumber = 7
        }
        action = { type = "expire" }
      }
    ]
  })
}
9. Disaster Recovery for IaC State
State backup: S3 versioning enabled on the drop-terraform-state-324480209768 bucket; all state versions are preserved.
Recovery procedure:
- Restore from S3 version history: aws s3api list-object-versions --bucket drop-terraform-state-324480209768
- Download specific version: aws s3api get-object --version-id <version-id> ...
- Run terraform plan and verify there are no unexpected changes before apply
Existing manually-provisioned resources:
terraform import aws_apprunner_service.drop_web \
  arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec
terraform import aws_db_instance.drop_db drop-db
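On Terraform 1.5 or later, the same imports can be written declaratively as `import` blocks, which makes them reviewable in a PR like any other change. This is a sketch that assumes matching `resource` blocks named `drop_web` and `drop_db` exist in the same configuration.

```hcl
# Bring the manually provisioned App Runner service under management
import {
  to = aws_apprunner_service.drop_web
  id = "arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec"
}

# Bring the manually provisioned RDS instance under management
import {
  to = aws_db_instance.drop_db
  id = "drop-db"
}
```

If the resource blocks have not been written yet, `terraform plan -generate-config-out=generated.tf` can draft them from the live resources. The generated file should be reviewed and trimmed before it is committed.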
Prevention:
- S3 versioning enabled on state bucket
- MFA delete required on state bucket (planned)
- State bucket access logged to CloudTrail
Related Documents
Approval
| Role | Name | Date | Signature |
|---|---|---|---|
| Author | | | |
| Reviewer | | | |
| Approver | | | |