Infrastructure as Code

Project: Drop
Version: 0.1.0
Date: 2026-02-23
Author: Platform Architect (AI)
Status: In Review
Reviewers: Alem Bašić (CEO)

Document History

| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | 2026-02-23 | Platform Architect (AI) | Initial draft from infrastructure audit |

1. Overview

Drop's current production infrastructure (AWS App Runner + RDS) was provisioned manually through the AWS Console. The repository's infrastructure/ directory holds a cloud resource audit and a WAF rules reference; Terraform is only partially implemented and is not yet wired into the CI/CD pipeline. This document describes the existing configuration and the target IaC state.

IaC Tool: Terraform (target), currently partially implemented
Tool Version: TBD (requires a team decision on version pinning)
Provider: AWS (hashicorp/aws)
Provider Version: ~> 5.0
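A minimal sketch of how the provider constraint could be pinned in the production root module, assuming a versions.tf file under environments/production/ (the file name is an assumption, not part of the current tree). No required_version is set because the Terraform version is still TBD.

```hcl
# environments/production/versions.tf (hypothetical file name)
terraform {
  # required_version intentionally unset until the team agrees on a Terraform version

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # allows any 5.x release, blocks 6.0
    }
  }
}
```

Committing the .terraform.lock.hcl file that terraform init generates records the exact provider release and checksums, so CI and local runs resolve the same provider build.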

Rationale for tool choice: Terraform was chosen for its mature AWS provider, declarative HCL syntax, and provider-agnostic workflow (flexibility for a possible future multi-cloud setup). The team is already familiar with the AWS CLI.

Core Principles:

  • All infrastructure changes should go through code (once IaC is wired, no manual console changes in production; staging changes go through the Fly CLI, see 2.2)
  • IaC reviewed like application code (PR, review, merge)
  • State is the single source of truth
  • Secrets never stored in Terraform state (reference secrets by ARN, e.g. via the aws_secretsmanager_secret data source; never read secret values into Terraform)

2. Repository Structure

infrastructure/
├── cloud-audit.md              # Existing AWS resource inventory
├── waf-rules.md                # WAF configuration reference
├── terraform/                  # Target IaC (to be implemented)
│   ├── modules/
│   │   ├── app-runner/         # App Runner service + ECR
│   │   ├── rds/                # RDS PostgreSQL instance
│   │   └── secrets/            # AWS Secrets Manager resources
│   ├── environments/
│   │   ├── production/
│   │   │   ├── main.tf
│   │   │   ├── variables.tf
│   │   │   └── terraform.tfvars
│   │   └── staging/            # Fly.io managed separately (not Terraform)
│   └── shared/
│       └── ecr.tf              # ECR repository (shared across envs)
└── scripts/
    └── bootstrap.sh            # Initialize S3 state backend

2.1 Module Organization

| Module | Purpose | Key Inputs | Key Outputs |
|---|---|---|---|
| modules/app-runner | App Runner service, IAM roles, ECR image config | service_name, ecr_image_uri, env_vars | service_arn, service_url |
| modules/rds | RDS PostgreSQL instance, parameter group, subnet group | instance_class, db_name, vpc_id | db_endpoint, db_secret_arn |
| modules/secrets | AWS Secrets Manager secrets (values set outside Terraform) | secret_name | secret_arn |
| modules/ecr | ECR repository with lifecycle policy | repository_name | repository_url |
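A sketch of how environments/production/main.tf might call these modules with the inputs and outputs from the table. The relative source paths assume the modules/ directory shown in section 2; var.db_name, var.vpc_id, var.image_tag and the APP_ENV entry are hypothetical, and environment is passed because the App Runner module example in section 4.2 declares it as a required variable.

```hcl
# environments/production/main.tf (sketch; variable names are hypothetical)
module "ecr" {
  source          = "../../modules/ecr"
  repository_name = "drop-web"
}

module "rds" {
  source         = "../../modules/rds"
  instance_class = "db.t4g.micro"
  db_name        = var.db_name # hypothetical variable
  vpc_id         = var.vpc_id  # hypothetical variable
}

module "app_runner" {
  source        = "../../modules/app-runner"
  environment   = "production"
  service_name  = "drop-production-web"
  ecr_image_uri = "${module.ecr.repository_url}:${var.image_tag}" # hypothetical tag variable
  env_vars = {
    APP_ENV = "production" # hypothetical; non-secret values only
  }
}

# Surface the module output at the root so it shows after apply
output "service_url" {
  value = module.app_runner.service_url
}
```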

2.2 Environment Separation

  • Production: AWS (eu-west-1) — managed by Terraform
  • Staging: Fly.io — managed by Fly CLI (fly deploy) — NOT Terraform
  • Each environment independently deployable
  • No cross-environment Terraform dependencies

2.3 Shared Modules

Shared module registry: Local modules (no Terraform Registry for private modules yet)

| Module | Source (relative to environments/production/) | Used By |
|---|---|---|
| modules/app-runner | ../../modules/app-runner | Production |
| modules/rds | ../../modules/rds | Production |
| modules/ecr | ../../modules/ecr | Production (shared ECR) |

3. State Management

3.1 Remote State Backend

Backend: S3 + DynamoDB (planned — not yet configured)

| Environment | State Location | Access |
|---|---|---|
| Production | s3://drop-terraform-state-324480209768/production/terraform.tfstate | Alem Bašić + CI deploy role |

Bootstrap (first-time setup):

# Create S3 bucket for state storage
aws s3api create-bucket \
  --bucket drop-terraform-state-324480209768 \
  --region eu-west-1 \
  --create-bucket-configuration LocationConstraint=eu-west-1

# Enable versioning
aws s3api put-bucket-versioning \
  --bucket drop-terraform-state-324480209768 \
  --versioning-configuration Status=Enabled

# Create DynamoDB lock table
aws dynamodb create-table \
  --table-name drop-terraform-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region eu-west-1

S3 backend configuration:

terraform {
  backend "s3" {
    bucket         = "drop-terraform-state-324480209768"
    key            = "production/terraform.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "drop-terraform-locks"
    encrypt        = true
  }
}

3.2 State Locking

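Locking mechanism: DynamoDB table drop-terraform-locks
Lock timeout: 15 minutes, passed explicitly with -lock-timeout=15m (Terraform's default is 0s, i.e. fail immediately if the lock is held)
Force unlock: only Alem Bašić, after verifying that no apply is in progress (terraform force-unlock <lock-id>)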

3.3 State File Organization

Splitting strategy: Single state file per environment (simple — low resource count)

| State File | Contains |
|---|---|
| production/terraform.tfstate | App Runner service, ECR, RDS, IAM roles, Secrets Manager |

4. Module Design

4.1 Naming Conventions

Resource naming pattern: drop-{environment}-{component}

| Resource | Example |
|---|---|
| App Runner Service | drop-production-web |
| ECR Repository | drop-web |
| RDS Instance | drop-db (existing) |
| Secrets Manager Secret | drop/production/jwt-secret |
| IAM Role (App Runner) | drop-production-app-runner-role |
| IAM Role (Deploy) | drop-production-github-deploy-role |

Current production resources (manually provisioned):

  • App Runner Service: drop-web (ARN: arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec)
  • RDS Instance: drop-db (endpoint: drop-db.czu2qe4quy4v.eu-west-1.rds.amazonaws.com)
  • ECR Repository: drop-web (324480209768.dkr.ecr.eu-west-1.amazonaws.com/drop-web)
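The drop-{environment}-{component} pattern could be centralized in a locals block so every name is derived the same way. A sketch, assuming var.environment is declared as in section 4.2; the local names are hypothetical, and the existing drop-web and drop-db resources keep their current names.

```hcl
locals {
  # drop-{environment}-{component}
  name_prefix = "drop-${var.environment}"

  # Values in comments assume environment = "production"
  names = {
    app_runner_service = "${local.name_prefix}-web"                # drop-production-web
    app_runner_role    = "${local.name_prefix}-app-runner-role"    # drop-production-app-runner-role
    deploy_role        = "${local.name_prefix}-github-deploy-role" # drop-production-github-deploy-role
    jwt_secret         = "drop/${var.environment}/jwt-secret"      # drop/production/jwt-secret
  }
}
```

A resource then sets, for example, service_name = local.names.app_runner_service instead of repeating the string.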

4.2 Input / Output Variables

App Runner module example:

variable "environment" {
  description = "Deployment environment (staging/production)"
  type        = string
  validation {
    condition     = contains(["staging", "production"], var.environment)
    error_message = "Environment must be staging or production."
  }
}

variable "ecr_image_uri" {
  description = "Full ECR image URI including tag"
  type        = string
}

output "service_url" {
  description = "App Runner service URL"
  value       = aws_apprunner_service.main.service_url
}

output "service_arn" {
  description = "App Runner service ARN"
  value       = aws_apprunner_service.main.arn
}
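Section 2.1 also lists service_name and env_vars as inputs of the App Runner module. One possible declaration, assuming env_vars is passed straight to App Runner's runtime_environment_variables; because plain variables appear in plan output and state, the map is meant for non-secret settings only.

```hcl
variable "service_name" {
  description = "App Runner service name, e.g. drop-production-web"
  type        = string
}

variable "env_vars" {
  description = "Non-secret container environment variables (secrets use runtime_environment_secrets)"
  type        = map(string)
  default     = {}
}

# Inside the module's aws_apprunner_service.main:
#   image_configuration {
#     runtime_environment_variables = var.env_vars
#   }
```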

Secrets — never in state:

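# Look up secret metadata only. The aws_secretsmanager_secret data source
# exposes the ARN, not the value, so nothing sensitive lands in state.
# (The aws_secretsmanager_secret_version data source reads secret_string
# into state, so it must not be used here.)
data "aws_secretsmanager_secret" "jwt" {
  name = "drop/production/jwt-secret"
}

# Pass ARNs (not values) to App Runner; the service resolves them at runtime
resource "aws_apprunner_service" "main" {
  # service_name, image_identifier, IAM roles, etc. omitted for brevity
  source_configuration {
    image_repository {
      image_configuration {
        runtime_environment_secrets = {
          JWT_SECRET   = data.aws_secretsmanager_secret.jwt.arn
          DATABASE_URL = aws_secretsmanager_secret.db_url.arn
        }
      }
    }
  }
}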

4.3 Versioning Strategy

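Module versioning: Git tags on the IaC repository (format: infra/v1.0.0)
Pin strategy: reference modules by Git tag in the module source. This only takes effect once modules are consumed through a Git source; local path sources (section 2.3) always use the checked-out copy.
Upgrade policy: review the terraform plan output before applying any module version change
Changelog: every infrastructure change requires an entry in infrastructure/CHANGELOG.md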
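If the modules are later consumed through a Git source rather than a local path, pinning to a tag might look like the sketch below. The URL assumes the modules live in the ALAI-org/drop repository referenced in the OIDC trust policy (section 6.1) under infrastructure/terraform/modules; var.ecr_image_uri is a hypothetical root-module variable.

```hcl
module "app_runner" {
  # "//" separates the repository URL from the module subdirectory;
  # ?ref= pins the module to the infra/v1.0.0 tag
  source = "git::https://github.com/ALAI-org/drop.git//infrastructure/terraform/modules/app-runner?ref=infra/v1.0.0"

  environment   = "production"
  service_name  = "drop-production-web"
  ecr_image_uri = var.ecr_image_uri # hypothetical root-module variable
}
```

After the ref changes, terraform init has to be re-run to fetch the new module version; the resulting plan is what the upgrade policy asks reviewers to check.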


5. Workflow

5.1 Standard Change Process

flowchart LR
    BRANCH[Create branch\ninfra/description] --> CODE[Write/modify Terraform]
    CODE --> VALIDATE[terraform validate\n+ terraform fmt]
    VALIDATE --> PLAN[terraform plan\nattach output to PR]
    PLAN --> PR[Open PR]
    PR --> REVIEW[Review by Alem]
    REVIEW --> APPROVE[Approval]
    APPROVE --> APPLY[terraform apply\nmanual trigger]
    APPLY --> VERIFY[Verify via AWS console\n+ health check]

Steps:

  1. Create feature branch: infra/description
  2. Make changes, then run terraform fmt and terraform validate (validate requires terraform init)
  3. Run terraform plan — paste output into PR description
  4. Open PR — Alem reviews
  5. Merge → manual terraform apply (automated CD for IaC pending)
  6. Verify via App Runner console + curl https://.../api/health

5.2 PR-Based Infrastructure Changes

PR Requirements:

  • Title: [IaC] description of change
  • Must include terraform plan output
  • Must reference the related ticket or justification
  • Must pass terraform validate and terraform fmt -check

5.3 Automated Drift Detection

Schedule: manual, before each production deployment (automated drift detection pending)

Action on drift:

  1. Investigate cause (manual console change or provider drift)
  2. Either run terraform import to bring the resource under management, or run terraform apply to reconcile it with the code
  3. Document decision in infrastructure/cloud-audit.md

6. Security

6.1 Least Privilege for IaC Service Account

| Environment | Service Account | Permissions |
|---|---|---|
| Production | GitHub Actions OIDC role | apprunner:*, ecr:*, secretsmanager:GetSecretValue scoped to Drop resources |

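# OIDC trust policy for GitHub Actions
data "aws_iam_policy_document" "github_oidc_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    principals {
      type        = "Federated"
      identifiers = ["arn:aws:iam::324480209768:oidc-provider/token.actions.githubusercontent.com"]
    }
    # Only accept tokens issued for AWS STS
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }
    # Only workflows from the ALAI-org/drop repository (any branch or tag)
    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:ALAI-org/drop:*"]
    }
  }
}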
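The trust policy document only takes effect once it is attached to a role. A minimal sketch of the deploy role, using the drop-production-github-deploy-role name from section 4.1; the resource label github_deploy is hypothetical, the permissions from the table above would be attached separately, and the output assumes the workflow uses aws-actions/configure-aws-credentials.

```hcl
resource "aws_iam_role" "github_deploy" {
  name               = "drop-production-github-deploy-role"
  assume_role_policy = data.aws_iam_policy_document.github_oidc_trust.json
}

output "github_deploy_role_arn" {
  description = "Passed as role-to-assume to aws-actions/configure-aws-credentials"
  value       = aws_iam_role.github_deploy.arn
}
```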

6.2 Secret Injection (Not in State)

Rule: Never pass passwords, API keys, or secrets as Terraform variable values.

# CORRECT — use AWS Secrets Manager, pass ARN to App Runner
resource "aws_apprunner_service" "drop_web" {
  source_configuration {
    image_repository {
      image_configuration {
        runtime_environment_secrets = {
          JWT_SECRET   = aws_secretsmanager_secret.jwt.arn
          DATABASE_URL = aws_secretsmanager_secret.db_url.arn
        }
      }
    }
  }
}
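The two secrets referenced above can be declared without any aws_secretsmanager_secret_version resource, so their values are written out-of-band (for example with aws secretsmanager put-secret-value) and never pass through Terraform. A sketch; the database-url secret name and the aws_iam_role.app_runner label are assumptions. The App Runner service also needs that role set as instance_configuration.instance_role_arn so it can resolve the ARNs at runtime.

```hcl
# Secret containers only; values are written outside Terraform
resource "aws_secretsmanager_secret" "jwt" {
  name = "drop/production/jwt-secret"
}

resource "aws_secretsmanager_secret" "db_url" {
  name = "drop/production/database-url" # hypothetical name following drop/{env}/{name}
}

# Allow the App Runner instance role to read exactly these two secrets
data "aws_iam_policy_document" "read_app_secrets" {
  statement {
    actions = ["secretsmanager:GetSecretValue"]
    resources = [
      aws_secretsmanager_secret.jwt.arn,
      aws_secretsmanager_secret.db_url.arn,
    ]
  }
}

resource "aws_iam_role_policy" "app_runner_read_secrets" {
  name   = "read-app-secrets"
  role   = aws_iam_role.app_runner.id # hypothetical: drop-production-app-runner-role
  policy = data.aws_iam_policy_document.read_app_secrets.json
}
```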

6.3 Policy as Code

Tool: tflint + Checkov (planned for CI integration)

| Policy | Enforcement |
|---|---|
| RDS encryption at rest | Block |
| RDS not publicly accessible | Block |
| App Runner minimum 1 instance in production | Warn |
| All resources must have Project, Environment, ManagedBy tags | Warn |
| Secrets Manager secrets must not be in Terraform variable values | Block |
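The two blocking RDS rules map directly onto aws_db_instance arguments. A partial sketch for drop-db (other required arguments such as allocated_storage and username are omitted); engine and instance class follow this document, and manage_master_user_password is a suggested way to keep the master password out of variables and state. storage_encrypted cannot be changed in place: if the existing instance is unencrypted, this setting would force a replacement, so check it before importing.

```hcl
resource "aws_db_instance" "drop_db" {
  identifier     = "drop-db"
  engine         = "postgres"
  instance_class = "db.t4g.micro"
  # allocated_storage, username, etc. omitted; they must match the existing instance

  storage_encrypted   = true  # Policy: RDS encryption at rest (Block)
  publicly_accessible = false # Policy: RDS not publicly accessible (Block)

  # RDS generates the master password and stores it in Secrets Manager,
  # so it never appears in Terraform variables or state
  manage_master_user_password = true
}
```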

7. Tagging Strategy

Required tags on all AWS resources:

| Tag | Value | Purpose |
|---|---|---|
| Project | drop | Cost attribution |
| Environment | production / staging | Environment filter |
| ManagedBy | terraform / manual | Identifies IaC vs console-managed |
| Team | alai | Ownership |

Optional tags:

| Tag | Value | Purpose |
|---|---|---|
| Service | web / db / ecr | Service-level grouping |
| Ticket | MC-XXXX | Change tracking |
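The AWS provider's default_tags block is one way to apply the required tags to every taggable resource the provider creates, leaving individual resources to add only optional tags such as Service. A sketch, assuming var.environment is declared as in section 4.2:

```hcl
provider "aws" {
  region = "eu-west-1"

  # Applied to every resource created through this provider that supports tags
  default_tags {
    tags = {
      Project     = "drop"
      Environment = var.environment
      ManagedBy   = "terraform"
      Team        = "alai"
    }
  }
}

# Resource-level tags are merged with the provider defaults
resource "aws_ecr_repository" "drop_web" {
  name = "drop-web"
  tags = {
    Service = "ecr"
  }
}
```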

8. Cost Management

Budget alerts:

  • Production: alert when monthly spend reaches $150 (AWS Budgets; setup TBD)
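Once the alert is set up, it could live in Terraform as well. A sketch of a monthly $150 cost budget that emails when actual spend exceeds 100% of the limit; the budget name and the budget_alert_email variable are hypothetical.

```hcl
variable "budget_alert_email" {
  description = "Recipient of budget alerts (hypothetical variable)"
  type        = string
}

resource "aws_budgets_budget" "production_monthly" {
  name         = "drop-production-monthly" # hypothetical name
  budget_type  = "COST"
  limit_amount = "150"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  # Fires when actual spend goes above 100% of the $150 limit
  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 100
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = [var.budget_alert_email]
  }
}
```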

Cost optimization built into IaC:

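  • App Runner: idle provisioned instances are billed for memory only; vCPU is billed only while instances are actively processing requests
  • RDS db.t4g.micro: ARM-based Graviton2 instance with a lower hourly price than the comparable x86 db.t3.micro
  • ECR lifecycle policy: delete untagged images after 7 days and keep the last 10 tagged images
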
resource "aws_ecr_lifecycle_policy" "drop_web" {
  repository = aws_ecr_repository.drop_web.name
  policy = jsonencode({
    rules = [
      {
        rulePriority = 1
        description  = "Keep last 10 tagged images"
        # tagStatus "tagged" requires tagPrefixList or tagPatternList; "*" matches every tag
        selection = { tagStatus = "tagged", tagPatternList = ["*"], countType = "imageCountMoreThan", countNumber = 10 }
        action    = { type = "expire" }
      },
      {
        rulePriority = 2
        description  = "Remove untagged images after 7 days"
        selection = { tagStatus = "untagged", countType = "sinceImagePushed", countUnit = "days", countNumber = 7 }
        action    = { type = "expire" }
      }
    ]
  })
}

9. Disaster Recovery for IaC State

State backup: S3 versioning enabled on drop-terraform-state-324480209768 bucket — all state versions preserved.

Recovery procedure:

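  1. List the stored state versions: aws s3api list-object-versions --bucket drop-terraform-state-324480209768 --prefix production/terraform.tfstate
  2. Download the chosen version: aws s3api get-object --version-id <version-id> ...
  3. Restore it as the current state with terraform state push (an older copy has a lower serial, so -force is required; pushing through the backend also keeps the lock table's checksum in sync)
  4. Run terraform plan and verify there are no unexpected changes before any apply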

Existing manually provisioned resources: if state is lost, re-import them with terraform import:

terraform import aws_apprunner_service.drop_web \
  arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec

terraform import aws_db_instance.drop_db drop-db
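With Terraform 1.5 or later (the Terraform version is still TBD), the same imports can be written declaratively as import blocks, so they go through the normal plan and PR review instead of being run ad hoc. A sketch using the identifiers above; the file name is hypothetical.

```hcl
# imports.tf (hypothetical file name); requires Terraform >= 1.5
import {
  to = aws_apprunner_service.drop_web
  id = "arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec"
}

import {
  to = aws_db_instance.drop_db
  id = "drop-db"
}
```

Running terraform plan -generate-config-out=generated.tf then drafts HCL for any imported resource that has no resource block yet; the generated file should be reviewed before it is committed.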

Prevention:

  • S3 versioning enabled on state bucket
  • MFA delete required on state bucket (planned)
  • State bucket access logged to CloudTrail


Approval

| Role | Name | Date | Signature |
|---|---|---|---|
| Author | Platform Architect (AI) | 2026-02-23 | |
| Reviewer | | | |
| Approver | Alem Bašić | | |