Infrastructure as Code

Project: Drop
Version: 0.1.0
Date: 2026-02-23
Author: Platform Architect (AI)
Status: Draft | In Review | Approved
Reviewers: Alem Bašić (CEO)

Document History

| Version | Date | Author | Changes |
| --- | --- | --- | --- |
| 0.1 | 2026-02-23 | Platform Architect (AI) | Initial draft from infrastructure audit |

1. Overview

Drop's current production infrastructure (AWS App Runner + RDS) was provisioned manually via AWS Console. IaC tooling exists in the repository (infrastructure/ directory) as a cloud audit and WAF rules reference, but Terraform is not yet wired into the CI/CD pipeline. This document describes the target IaC state and the existing configuration.

IaC Tool: Terraform (target), currently partially implemented
Tool Version: TBD (requires team decision on version pinning)
Provider: AWS (hashicorp/aws)
Provider Version: ~> 5.0
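The provider constraint above could be pinned in a versions file along these lines. This is a minimal sketch: the file name and the `required_version` value are assumptions (the tool version is still TBD); only the `hashicorp/aws` source and the `~> 5.0` constraint come from this document.

```hcl
# versions.tf (hypothetical file name)
terraform {
  # Assumption: placeholder until the team decides on a pinned Terraform version.
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # allows any 5.x release, blocks 6.0
    }
  }
}
```

Committing the generated `.terraform.lock.hcl` alongside this file would additionally fix the exact provider build used in CI.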

Rationale for tool choice:

Terraform was chosen for its mature AWS provider, declarative HCL syntax, and cloud-agnostic design (future multi-cloud flexibility). The team already has AWS CLI familiarity.

Core Principles:

  • All infrastructure changes should go through code (no manual console changes in prod once IaC is wired)
  • IaC is reviewed like application code (PR, review, merge)
  • State is the single source of truth
  • Secrets are never stored in Terraform state (reference AWS Secrets Manager secrets by ARN instead of reading their values)

2. Repository Structure

infrastructure/
├── cloud-audit.md              # Existing AWS resource inventory
├── waf-rules.md                # WAF configuration reference
├── terraform/                  # Target IaC (to be implemented)
│   ├── modules/
│   │   ├── app-runner/         # App Runner service + ECR
│   │   ├── rds/                # RDS PostgreSQL instance
│   │   └── secrets/            # AWS Secrets Manager resources
│   ├── environments/
│   │   ├── production/
│   │   │   ├── main.tf
│   │   │   ├── variables.tf
│   │   │   └── terraform.tfvars
│   │   └── staging/            # Fly.io managed separately (not Terraform)
│   └── shared/
│       └── ecr.tf              # ECR repository (shared across envs)
├── scripts/
│   ├── bootstrap.sh            # Initialize S3 state backend
│   └── validate.sh             # Pre-apply validation
├── .terraform-version          # Pin tool version (tfenv)
├── .tflint.hcl                 # Linting config
└── README.md

2.1 Module Organization

| Module | Purpose | Key Inputs | Key Outputs |
| --- | --- | --- | --- |
| modules/app-runner | App Runner service, IAM roles, ECR image config | service_name, ecr_image_uri, env_vars | service_arn, service_url |
| modules/rds | RDS PostgreSQL instance, parameter group, subnet group | instance_class, db_name, vpc_id | db_endpoint, db_secret_arn |
| modules/secrets | AWS Secrets Manager secrets | secret_name, secret_value | secret_arn |
| modules/ecr | ECR repository with lifecycle policy | repository_name | repository_url |

2.2 Environment Separation

    • Production: AWS (eu-west-1) — managed by Terraform
    • Staging: Fly.io — managed by Fly CLI (fly deploy) — NOT Terraform
    • Each environment directory is independently deployable
    • Environments call the same modules with different variable values
    • No cross-environment Terraform dependencies (except shared DNS zone)
    • Production has stricter apply controls (see Section 6)
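As an illustration of how an environment directory might call a shared module, here is a hedged sketch of a production root configuration. The module does not exist yet; the relative path follows the repository layout in Section 2, and the input and output names are taken from the module table in 2.1. The empty `env_vars` map is a placeholder.

```hcl
# environments/production/main.tf (illustrative sketch, not the final layout)
provider "aws" {
  region = "eu-west-1"
}

variable "ecr_image_uri" {
  description = "Full ECR image URI including tag"
  type        = string
}

module "app_runner" {
  # Local module path, relative to environments/production/
  source = "../../modules/app-runner"

  service_name  = "drop-web"
  ecr_image_uri = var.ecr_image_uri
  env_vars      = {} # placeholder: non-secret environment variables only
}

output "service_url" {
  description = "App Runner service URL exposed by the module"
  value       = module.app_runner.service_url
}
```

Environment-specific values would then live in `terraform.tfvars` next to this file, so the module code itself stays identical across environments.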

    2.3 Shared Modules

    Shared module registry: Local modules (no Terraform Registry for private modules yet)

    | Module | Source | Used By |
    | --- | --- | --- |
    | modules/app-runner | ./modules/app-runner | Production |
    | modules/rds | ./modules/rds | Production |
    | modules/ecr | ./modules/ecr | Production (shared ECR) |

    3. State Management

    3.1 Remote State Backend

    Backend: S3 + DynamoDB (planned, not yet configured)

    | Environment | State Location | Access |
    | --- | --- | --- |
    | Production | s3://drop-terraform-state-324480209768/production/terraform.tfstate | Alem Bašić + CI deploy role |

    Bootstrap (first-time setup):

    # Create S3 bucket for state storage
    aws s3api create-bucket \
      --bucket drop-terraform-state-324480209768 \
      --region eu-west-1 \
      --create-bucket-configuration LocationConstraint=eu-west-1
    
    # Enable versioning
    aws s3api put-bucket-versioning \
      --bucket drop-terraform-state-324480209768 \
      --versioning-configuration Status=Enabled
    
    # Create DynamoDB lock table
    aws dynamodb create-table \
      --table-name drop-terraform-locks \
      --attribute-definitions AttributeName=LockID,AttributeType=S \
      --key-schema AttributeName=LockID,KeyType=HASH \
      --billing-mode PAY_PER_REQUEST \
      --region eu-west-1
    

    S3 backend configuration:

    terraform {
      backend "s3" {
        bucket         = "drop-terraform-state-324480209768"
        key            = "production/terraform.tfstate"
        region         = "eu-west-1"
        dynamodb_table = "drop-terraform-locks"
        encrypt        = true
      }
    }
    

    3.2 State Locking

    Locking Mechanism: DynamoDB table drop-terraform-locks
    Lock timeout: 15 minutes, set with -lock-timeout=15m (Terraform's default is 0s, which fails immediately if the lock is held)
    Force unlock: Only by Alem Bašić after verifying no active apply

    Lock table (if DynamoDB):

    • Table: drop-terraform-locks
    • Key: LockID
    • Billing: On-demand

    3.3 State File Organization

    Splitting strategy: Single state file per environment (simple, low resource count)

    | State File | Contains |
    | --- | --- |
    | production/terraform.tfstate | App Runner service, ECR, RDS, IAM roles, Secrets Manager |

    4. Module Design

    4.1 Naming Conventions

    Resource naming pattern: drop-{environment}-{component}

    | Resource | Example |
    | --- | --- |
    | App Runner Service | drop-production-web |
    | ECR Repository | drop-web |
    | RDS Instance | drop-db (existing) |
    | Secrets Manager Secret | drop/production/jwt-secret |
    | IAM Role (App Runner) | drop-production-app-runner-role |
    | IAM Role (Deploy) | drop-production-github-deploy-role |
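One way to keep names consistent with this pattern is to derive them from a single prefix in a `locals` block. The sketch below is an assumption about how that could look; the local names are hypothetical, and the `environment` variable is redeclared here only so the fragment stands on its own.

```hcl
variable "environment" {
  description = "Deployment environment"
  type        = string
  default     = "production"
}

locals {
  # Pattern from this section: drop-{environment}-{component}
  name_prefix = "drop-${var.environment}"

  app_runner_service_name = "${local.name_prefix}-web"                # drop-production-web
  app_runner_role_name    = "${local.name_prefix}-app-runner-role"    # drop-production-app-runner-role
  deploy_role_name        = "${local.name_prefix}-github-deploy-role" # drop-production-github-deploy-role
}
```

Resources that already exist under other names (drop-web, drop-db) would keep their current names on import, since renaming them would force replacement.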

    Current production resources (manually provisioned):

    • App Runner Service: drop-web (ARN: arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec)
    • RDS Instance: drop-db (endpoint: drop-db.czu2qe4quy4v.eu-west-1.rds.amazonaws.com)
    • ECR Repository: drop-web (324480209768.dkr.ecr.eu-west-1.amazonaws.com/drop-web)

    4.2 Input / Output Variables

    App Runner module example:

    variable "environment" {
      description = "Deployment environment (dev/staging/production)"
      type        = string
      validation {
        condition     = contains(["dev", "staging", "production"], var.environment)
        error_message = "Environment must be dev, staging, or production."
      }
    }

    variable "ecr_image_uri" {
      description = "Full ECR image URI including tag"
      type        = string
    }

    Required output fields:

    output "service_url" {
      description = "App Runner service URL"
      value       = aws_apprunner_service.main.service_url
    }

    output "service_arn" {
      description = "App Runner service ARN"
      value       = aws_apprunner_service.main.arn
    }


    Secrets — never in state:

    # Look up secret metadata only. The aws_secretsmanager_secret data source
    # exposes the ARN but never reads the value, so the value stays out of state.
    # (The aws_secretsmanager_secret_version data source would store it in state.)
    data "aws_secretsmanager_secret" "jwt" {
      name = "drop/production/jwt-secret"
    }

    # Pass ARN (not value) to App Runner
    resource "aws_apprunner_service" "main" {
      source_configuration {
        image_repository {
          image_configuration {
            runtime_environment_secrets = {
              JWT_SECRET   = data.aws_secretsmanager_secret.jwt.arn
              DATABASE_URL = aws_secretsmanager_secret.db_url.arn
            }
          }
        }
      }
    }
    

    4.3 Versioning Strategy

    Module versioning: Git tags on IaC repository (format: infra/v1.0.0)
    Pin strategy: Reference by git tag in module source
    Upgrade policy: Review terraform plan output before applying any module version change
    Changelog: Every infra change requires an entry in infrastructure/CHANGELOG.md
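Pinning a module to a git tag could look roughly like the sketch below. The repository URL and the module sub-path are assumptions inferred from the OIDC subject and the repository layout elsewhere in this document, not confirmed values; the `?ref=` query and the `//` sub-directory separator are standard Terraform module source syntax.

```hcl
variable "ecr_image_uri" {
  description = "Full ECR image URI including tag"
  type        = string
}

module "app_runner" {
  # Assumption: repository URL and module path are inferred, not confirmed.
  # "//" separates the repository from the sub-directory; "ref" selects the git tag.
  source = "git::https://github.com/ALAI-org/drop.git//infrastructure/terraform/modules/app-runner?ref=infra/v1.0.0"

  service_name  = "drop-web"
  ecr_image_uri = var.ecr_image_uri
  env_vars      = {} # placeholder
}
```

Changing the `ref` value is then the only edit needed for a module upgrade, which keeps the resulting `terraform plan` diff easy to review.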


    5. Workflow

    5.1 Standard Change Process

    flowchart LR
        BRANCH[Create branch\ninfra/description] --> CODE[Write/modify Terraform]
        CODE --> VALIDATE[terraform validate\n+ terraform fmt]
        VALIDATE --> PLAN[terraform plan\nattach output to PR]
        PLAN --> PR[Open PR]
        PR --> REVIEW[Review by Alem]
        REVIEW --> APPROVE[Approval]
        APPROVE --> APPLY[terraform apply\nmanual trigger]
        APPLY --> VERIFY[Verify via AWS console\n+ health check]
    

    Steps:

    1. Create feature branch: infra/{{TICKET}}-description
    2. Make changes, run terraform validate && terraform fmt
    3. Run terraform plan and paste the output into the PR description
    4. Open PR
    5. Alem reviews
    6. Merge → manual terraform apply (automated CD for IaC pending)
    7. Verify via App Runner console + curl https://.../api/health

    5.2 PR-Based Infrastructure Changes

    PR Requirements:

    • Title: [IaC] {{ENVIRONMENT}}: description of change
    • Must include terraform plan output in PR description or CI artifact
    • Must include justification for the change
    • Must reference the related application ticket or justification
    • Must pass terraform validate and terraform fmt -check

    5.3 Automated Drift Detection

    Schedule: Manual before each production deployment (automated drift detection pending)

    Action on drift:

    1. Investigate cause (manual console change or provider drift)
    2. Run terraform import to bring the resource under management, or apply IaC to reconcile
    3. Document the decision in infrastructure/cloud-audit.md

    6. Security

    6.1 Least Privilege for IaC Service Account

    | Environment | Service Account | Permissions |
    | --- | --- | --- |
    | Production | GitHub Actions OIDC role | apprunner:*, ecr:*, secretsmanager:GetSecretValue scoped to Drop resources |

    # OIDC trust policy for GitHub Actions
    data "aws_iam_policy_document" "github_oidc_trust" {
      statement {
        actions = ["sts:AssumeRoleWithWebIdentity"]
        principals {
          type        = "Federated"
          identifiers = ["arn:aws:iam::324480209768:oidc-provider/token.actions.githubusercontent.com"]
        }
        condition {
          test     = "StringLike"
          variable = "token.actions.githubusercontent.com:sub"
          values   = ["repo:ALAI-org/drop:*"]
        }
      }
    }
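The trust policy document above only takes effect once it is attached to a role. A minimal sketch of that wiring follows; the role name comes from the naming table in 4.1, while the policy name and the `drop/*` secret ARN pattern are assumptions. Only the Secrets Manager permission is shown; the App Runner and ECR permissions would be added as further statements.

```hcl
# Role assumed by GitHub Actions via OIDC, using the trust policy defined above.
resource "aws_iam_role" "github_deploy" {
  name               = "drop-production-github-deploy-role"
  assume_role_policy = data.aws_iam_policy_document.github_oidc_trust.json
}

# Assumption: all Drop secrets live under the "drop/" name prefix.
data "aws_iam_policy_document" "deploy_secrets" {
  statement {
    actions   = ["secretsmanager:GetSecretValue"]
    resources = ["arn:aws:secretsmanager:eu-west-1:324480209768:secret:drop/*"]
  }
}

resource "aws_iam_role_policy" "deploy_secrets" {
  name   = "drop-secrets-read" # hypothetical policy name
  role   = aws_iam_role.github_deploy.id
  policy = data.aws_iam_policy_document.deploy_secrets.json
}
```

Narrowing the `sub` condition in the trust policy to a specific branch or environment would tighten this further, at the cost of blocking plans from feature branches.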
    

    6.2 Secret Injection (Not in State)

    Rule: Never pass passwords, API keys, or secrets as Terraform variable values.

    # CORRECT: use AWS Secrets Manager, pass ARN to App Runner
    resource "aws_apprunner_service" "drop_web" {
      source_configuration {
        image_repository {
          image_configuration {
            runtime_environment_secrets = {
              JWT_SECRET   = aws_secretsmanager_secret.jwt.arn
              DATABASE_URL = aws_secretsmanager_secret.db_url.arn
            }
          }
        }
      }
    }
    

    6.3 Policy as Code

    Tool: tflint + Checkov (planned for CI integration)

    | Policy | Enforcement |
    | --- | --- |
    | RDS encryption at rest | Block |
    | RDS not publicly accessible | Block |
    | App Runner minimum 1 instance in production | Warn |
    | All resources must have Project, Environment, ManagedBy tags | Warn |
    | Secrets Manager secrets must not be in Terraform variable values | Block |
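To make the RDS and tagging policies concrete, the sketch below shows a database resource that would satisfy them. The storage size and master username are assumptions (neither is stated in this document), and the arguments should be checked with `terraform plan` against the imported drop-db instance first, since changing encryption on an existing instance forces a replacement.

```hcl
resource "aws_db_instance" "drop_db" {
  identifier     = "drop-db"
  engine         = "postgres"
  instance_class = "db.t4g.micro"

  allocated_storage = 20           # assumption: actual size not stated in this document
  username          = "drop_admin" # hypothetical master username

  storage_encrypted           = true  # policy: RDS encryption at rest
  publicly_accessible         = false # policy: RDS not publicly accessible
  manage_master_user_password = true  # RDS stores the master password in Secrets Manager

  tags = {
    Project     = "drop"
    Environment = "production"
    ManagedBy   = "terraform"
  }
}
```

With `manage_master_user_password` set, no password variable exists in the configuration at all, which also satisfies the last policy row.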

    7. Tagging Strategy

    Required tags on all AWS resources:

    | Tag | Value | Purpose |
    | --- | --- | --- |
    | Project | drop | Cost attribution |
    | Environment | production / staging | Environment filter |
    | ManagedBy | terraform / manual | Identifies IaC vs console-managed resources |
    | Team | alai | Ownership |

    Optional tags:

    | Tag | Value | Purpose |
    | --- | --- | --- |
    | Service | web / db / ecr | Service-level grouping |
    | Ticket | MC-XXXX | Change tracking |
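The required tags can be applied once at the provider level instead of on every resource. The following is a minimal sketch using the AWS provider's `default_tags` block with the values from the table above; per-resource `tags` (for example Service or Ticket) are merged on top of these defaults.

```hcl
provider "aws" {
  region = "eu-west-1"

  # Applied to every taggable resource created through this provider.
  default_tags {
    tags = {
      Project     = "drop"
      Environment = "production"
      ManagedBy   = "terraform"
      Team        = "alai"
    }
  }
}
```

Resources that remain console-managed would still need `ManagedBy = manual` set by hand, since this block only affects what Terraform creates.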

    8. Cost Management

    Budget alerts:

    • Production: Alert at $150/month (AWS Budgets, setup TBD)
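Once the budget is set up, it could itself be managed in Terraform. The sketch below assumes a monthly cost budget with a single alert when actual spend exceeds the limit; the budget name and the e-mail address are placeholders, not values from this document.

```hcl
resource "aws_budgets_budget" "production_monthly" {
  name         = "drop-production-monthly" # hypothetical name
  budget_type  = "COST"
  limit_amount = "150"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  # Notify when actual spend passes 100% of the limit.
  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 100
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = ["alerts@example.com"] # placeholder address
  }
}
```

A second `notification` block with a lower threshold, or with `notification_type = "FORECASTED"`, would give earlier warning if that turns out to be useful.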

    Cost optimization built into IaC:

    • App Runner: Idle provisioned instances are billed for memory only; vCPU is billed only while requests are being processed
    • RDS db.t4g.micro: ARM Graviton (20% cheaper than x86 equivalent)
    • ECR lifecycle policy: Delete untagged images after 7 days, keep last 10 tagged images

    resource "aws_ecr_lifecycle_policy" "drop_web" {
      repository = aws_ecr_repository.drop_web.name
      policy = jsonencode({
        rules = [
          {
            rulePriority = 1
            description  = "Keep last 10 tagged images"
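            # ECR requires a tagPrefixList or tagPatternList when tagStatus is "tagged"; "*" matches every tag
            selection = { tagStatus = "tagged", tagPatternList = ["*"], countType = "imageCountMoreThan", countNumber = 10 }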
            action = { type = "expire" }
          },
          {
            rulePriority = 2
            description  = "Remove untagged images after 7 days"
            selection = { tagStatus = "untagged", countType = "sinceImagePushed", countUnit = "days", countNumber = 7 }
            action = { type = "expire" }
          }
        ]
      })
    }
    

    9. Disaster Recovery for IaC State

    State backup: S3 versioning is enabled on the drop-terraform-state-324480209768 bucket, so all state versions are preserved.

    Recovery procedure:

    1. Restore from S3 version history: aws s3api list-object-versions --bucket drop-terraform-state-324480209768
    2. Download specific version: aws s3api get-object --version-id <version-id> ...
    3. Run terraform plan — verify no unexpected changes before apply

    Existing manually-provisioned resources: If state is lost, import manually:

    terraform import aws_apprunner_service.drop_web \
      arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec

    terraform import aws_db_instance.drop_db drop-db

    # Repeat for each managed resource (refer to resource inventory)
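The same imports can also be declared in configuration, so they go through PR review and show up in `terraform plan`. This sketch assumes Terraform 1.5 or later (the tool version is still TBD) and reuses the resource addresses and IDs from the commands above.

```hcl
# Requires Terraform 1.5+. Each import block needs a matching resource block
# in the configuration; `terraform plan -generate-config-out=generated.tf`
# can draft those blocks for review.
import {
  to = aws_apprunner_service.drop_web
  id = "arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec"
}

import {
  to = aws_db_instance.drop_db
  id = "drop-db"
}
```

After a successful apply the import blocks can be removed; leaving them in place is harmless, since an already-imported resource is not imported twice.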
    

    Prevention:

    • S3 versioning enabled on state bucket
    • MFA delete on state bucket (planned)
    • State bucket access logged to CloudTrail


    Approval

    | Role | Name | Date | Signature |
    | --- | --- | --- | --- |
    | Author | Platform Architect (AI) | 2026-02-23 | |
    | Reviewer | | | |
    | Approver | Alem Bašić | | |