Infrastructure as Code
Project: Drop
Version: 0.1.0
Date: 2026-02-23
Author: Platform Architect (AI)
Status: Draft | In Review | Approved
Reviewers: Alem Bašić (CEO)
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | | | Initial draft from infrastructure audit |
1. Overview
Drop's current production infrastructure (AWS App Runner + RDS) was provisioned manually through the AWS Console. The repository's infrastructure/ directory already contains a cloud audit and a WAF rules reference, but Terraform is not yet wired into the CI/CD pipeline. This document describes the target IaC state and the existing configuration.
IaC Tool: Terraform (target), currently partially implemented
Tool Version: TBD (requires team decision on version pinning)
Provider: AWS (hashicorp/aws)
Provider Version: ~> 5.0
Rationale for tool choice:
Terraform was chosen for its mature AWS provider, declarative HCL syntax, and cloud-agnostic design (future multi-cloud flexibility). The team already has AWS CLI familiarity.
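The provider settings listed above could be pinned in a `terraform` block along the following lines. This is a minimal sketch: the `required_version` constraint is a placeholder, since the tool version is still TBD in this document.

```hcl
terraform {
  # Placeholder constraint: the Terraform version has not been decided yet.
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # provider constraint stated in this document
    }
  }
}
```

The `~> 5.0` constraint accepts any 5.x provider release and rejects 6.0 and later.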
Core Principles:
- All infrastructure changes should go through code (no manual console changes in staging/prod once IaC is wired)
- IaC is reviewed like application code (PR, review, merge)
- State is the single source of truth
- Secrets are never stored in Terraform state (reference AWS Secrets Manager secrets by ARN rather than reading their values)
2. Repository Structure
infrastructure/
├── cloud-audit.md                # Existing AWS resource inventory
├── waf-rules.md                  # WAF configuration reference
├── terraform/                    # Target IaC (to be implemented)
│   ├── modules/
│   │   ├── app-runner/           # App Runner service + ECR
│   │   ├── rds/                  # RDS PostgreSQL instance
│   │   └── secrets/              # AWS Secrets Manager resources
│   ├── environments/
│   │   ├── production/
│   │   │   ├── main.tf
│   │   │   ├── variables.tf
│   │   │   └── terraform.tfvars
│   │   └── staging/              # Fly.io managed separately (not Terraform)
│   └── shared/
│       └── ecr.tf                # ECR repository (shared across envs)
├── scripts/
│   ├── bootstrap.sh              # Initialize S3 state backend
│   └── validate.sh               # Pre-apply validation
├── .terraform-version            # Pin tool version (tfenv)
├── .tflint.hcl                   # Linting config
└── README.md
2.1 Module Organization
| Module | Purpose | Key Inputs | Key Outputs |
|---|---|---|---|
| modules/ | | service_name, ecr_image_uri, env_vars | service_arn, service_url |
| modules/ | | instance_class, db_name, vpc_id | db_endpoint, db_secret_arn |
| modules/ | | secret_name, secret_value | secret_arn |
| modules/ | | | repository_url |
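As an illustration of how an environment might consume these inputs and outputs, the sketch below wires two modules together in a production `main.tf`. The module directory names (`app-runner`, `rds`) are taken from the repository tree in Section 2; the relative paths, the database name, and the `NODE_ENV` entry are assumptions made for the example.

```hcl
# environments/production/main.tf (sketch; paths and values are assumptions)

variable "ecr_image_uri" {
  description = "Full ECR image URI including tag"
  type        = string
}

variable "vpc_id" {
  description = "VPC that hosts the database"
  type        = string
}

module "database" {
  source = "../../modules/rds"

  instance_class = "db.t4g.micro"
  db_name        = "drop" # hypothetical database name
  vpc_id         = var.vpc_id
}

module "web" {
  source = "../../modules/app-runner"

  service_name  = "drop-web"
  ecr_image_uri = var.ecr_image_uri
  env_vars = {
    NODE_ENV = "production" # hypothetical environment variable
  }
}

output "service_url" {
  value = module.web.service_url
}
```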
2.2 Environment Separation
- Production: AWS (eu-west-1), managed by Terraform
- Staging: Fly.io, managed by the Fly CLI (fly deploy), NOT Terraform
- Each environment is independently deployable
- No cross-environment Terraform dependencies
2.3 Shared Modules
| Module | Source | Used By |
|---|---|---|
| | | Production |
| modules/ecr | | |
3. State Management
3.1 Remote State Backend
Backend: S3 + DynamoDB (planned, not yet configured)
| Environment | State Location | Access |
|---|---|---|
| Production | | |
Bootstrap (first-time setup):
# scripts/bootstrap.sh
# Create S3 bucket for state storage
aws s3api create-bucket \
  --bucket drop-terraform-state-324480209768 \
  --region eu-west-1 \
  --create-bucket-configuration LocationConstraint=eu-west-1
# Enable versioning
aws s3api put-bucket-versioning \
  --bucket drop-terraform-state-324480209768 \
  --versioning-configuration Status=Enabled
# Create DynamoDB lock table
aws dynamodb create-table \
  --table-name drop-terraform-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region eu-west-1
S3 backend configuration:
terraform {
  backend "s3" {
    bucket         = "drop-terraform-state-324480209768"
    key            = "production/terraform.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "drop-terraform-locks"
    encrypt        = true
  }
}
3.2 State Locking
Locking Mechanism: DynamoDB table drop-terraform-locks
Lock timeout: 15 minutes (pass -lock-timeout=15m; Terraform's own default is 0s, which fails immediately when the lock is held)
Force unlock: Only by Alem Bašić, after verifying that no apply is in progress
Lock table:
Table: drop-terraform-locks, Key: LockID, Billing: On-demand
3.3 State File Organization
Splitting strategy: Single state file per environment (simple, low resource count)
| State File | Contains |
|---|---|
| | |
4. Module Design
4.1 Naming Conventions
Resource naming pattern: drop-{environment}-{component}
| Resource | Example |
|---|---|
| RDS Instance | (existing) |
| IAM Role (App Runner) | |
| IAM Role (Deploy) | drop-production-github-deploy-role |
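One way to keep names consistent with the drop-{environment}-{component} pattern is to build them from a shared prefix in a `locals` block. The sketch below is illustrative only: the `apprunner-role` component name is hypothetical, while `github-deploy-role` matches the example in the table above.

```hcl
variable "environment" {
  description = "Deployment environment"
  type        = string
}

locals {
  # Shared prefix for the drop-{environment}-{component} pattern
  name_prefix = "drop-${var.environment}"

  names = {
    deploy_role = "${local.name_prefix}-github-deploy-role"
    app_runner  = "${local.name_prefix}-apprunner-role" # hypothetical component name
  }
}

output "deploy_role_name" {
  value = local.names.deploy_role
}
```

With `environment = "production"`, `deploy_role_name` evaluates to `drop-production-github-deploy-role`.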
Current production resources (manually provisioned):
- App Runner Service: drop-web (ARN: arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec)
- RDS Instance: drop-db (endpoint: drop-db.czu2qe4quy4v.eu-west-1.rds.amazonaws.com)
- ECR Repository: drop-web (324480209768.dkr.ecr.eu-west-1.amazonaws.com/drop-web)
4.2 Input / Output Variables
App Runner module example:
variable "environment" {
  description = "Deployment environment (dev/staging/production)"
  type        = string
  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging or production."
  }
}
variable "ecr_image_uri" {
  description = "Full ECR image URI including tag"
  type        = string
}
output "service_url" {
  description = "App Runner service URL"
  value       = aws_apprunner_service.main.service_url
}
output "service_arn" {
  description = "App Runner service ARN"
  value       = aws_apprunner_service.main.arn
}
Secrets (never in state):
# Look up secret metadata only. Unlike aws_secretsmanager_secret_version, the
# aws_secretsmanager_secret data source does not read the secret value, so the
# value never reaches Terraform state.
data "aws_secretsmanager_secret" "jwt" {
  name = "drop/production/jwt-secret"
}
# Pass ARNs (not values) to App Runner
resource "aws_apprunner_service" "main" {
  # service_name, image_identifier and other required arguments omitted
  source_configuration {
    image_repository {
      image_configuration {
        runtime_environment_secrets = {
          JWT_SECRET   = data.aws_secretsmanager_secret.jwt.arn
          DATABASE_URL = aws_secretsmanager_secret.db_url.arn # secret resource declared elsewhere
        }
      }
    }
  }
}
4.3 Versioning Strategy
Module versioning: Git tags on IaC repository (format: infra/v1.0.0)
Pin strategy: Reference git tag in module source
Upgrade policy: Review terraform plan output before applying any module version change
Changelog: Every infra change requires an entry in infrastructure/CHANGELOG.md
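Pinning a module to a git tag might look like the sketch below. The repository URL is inferred from the `ALAI-org/drop` reference in the OIDC trust policy in Section 6.1, and the module path follows the tree in Section 2; both are assumptions rather than confirmed locations.

```hcl
variable "ecr_image_uri" {
  description = "Full ECR image URI including tag"
  type        = string
}

module "app_runner" {
  # Assumed repository URL and module path. The ref pins the module to a git tag
  # in the infra/vX.Y.Z format described above.
  source = "git::https://github.com/ALAI-org/drop.git//infrastructure/terraform/modules/app-runner?ref=infra/v1.0.0"

  service_name  = "drop-web"
  ecr_image_uri = var.ecr_image_uri
}
```

Changing the `ref` value is then the only edit needed to move to a new module version, which keeps the resulting `terraform plan` diff easy to review.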
5. Workflow
5.1 Standard Change Process
flowchart LR
    BRANCH["Create branch\ninfra/description"] --> CODE["Write/modify Terraform"]
    CODE --> VALIDATE["terraform validate\n+ terraform fmt"]
    VALIDATE --> PLAN["terraform plan\nattach output to PR"]
    PLAN --> PR["Open PR"]
    PR --> REVIEW["Review by Alem"]
    REVIEW --> APPROVE["Approval"]
    APPROVE --> APPLY["terraform apply\nmanual trigger"]
    APPLY --> VERIFY["Verify via AWS console\n+ health check"]
Steps:
- Create feature branch: infra/{ticket}-description
- Make changes, run terraform validate && terraform fmt
- Run terraform plan and paste the output into the PR description
- Open PR; Alem reviews
- Merge, then run terraform apply manually (automated CD for IaC pending)
- Verify via App Runner console + curl https://.../api/health
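The manual health check in the last step could also be expressed in Terraform itself with a `check` block, so that every plan and apply reports on the endpoint. This is a sketch under two assumptions: Terraform 1.5 or later is in use (the tool version is still TBD), and the service is declared as `aws_apprunner_service.main` as in Section 4.2. The `/api/health` path comes from the step above.

```hcl
# Requires Terraform >= 1.5 and the hashicorp/http provider.
check "app_health" {
  data "http" "health" {
    # service_url is the App Runner domain without a scheme
    url = "https://${aws_apprunner_service.main.service_url}/api/health"
  }

  assert {
    condition     = data.http.health.status_code == 200
    error_message = "Health endpoint returned ${data.http.health.status_code}, expected 200."
  }
}
```

A failing check is reported as a warning and does not block the apply, so it complements the manual verification rather than replacing it.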
5.2 PR-Based Infrastructure Changes
PR Requirements:
- Title: [IaC] {environment}: description of change
- Must include terraform plan output in PR description
- Must reference the related ticket or justification
- Must pass terraform validate and terraform fmt -check
5.3 Automated Drift Detection
Schedule: Manual before each production deployment (automated drift detection pending)
Action on drift:
- Investigate cause (manual console change or provider drift)
- Run terraform import to bring the resource under management, or apply IaC to reconcile
- Document the decision in infrastructure/cloud-audit.md
6. Security
6.1 Least Privilege for IaC Service Account
| Environment | Service Account | Permissions |
|---|---|---|
| | GitHub Actions OIDC role | …, ecr:*, secretsmanager:GetSecretValue |
# OIDC trust policy for GitHub Actions
data "aws_iam_policy_document" "github_oidc_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    principals {
      type        = "Federated"
      identifiers = ["arn:aws:iam::324480209768:oidc-provider/token.actions.githubusercontent.com"]
    }
    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:ALAI-org/drop:*"]
    }
  }
}
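The trust policy above only takes effect once it is attached to a role. A minimal sketch of that role follows; the role name is the deploy-role example from Section 4.1, and the resource label `github_deploy` is an arbitrary choice for illustration.

```hcl
resource "aws_iam_role" "github_deploy" {
  name               = "drop-production-github-deploy-role"
  assume_role_policy = data.aws_iam_policy_document.github_oidc_trust.json
}

output "github_deploy_role_arn" {
  description = "ARN for GitHub Actions to assume via OIDC"
  value       = aws_iam_role.github_deploy.arn
}
```

Permission policies still need to be attached separately. Narrowing the `sub` condition from `repo:ALAI-org/drop:*` to a specific branch or environment would further limit which workflows can assume the role.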
6.2 Secret Injection (Not in State)
Rule: Never pass passwords, API keys, or secrets as Terraform variable values.
# CORRECT: use AWS Secrets Manager, pass ARN to App Runner
resource "aws_apprunner_service" "drop_web" {
  source_configuration {
    image_repository {
      image_configuration {
        runtime_environment_secrets = {
          JWT_SECRET   = aws_secretsmanager_secret.jwt.arn
          DATABASE_URL = aws_secretsmanager_secret.db_url.arn
        }
      }
    }
  }
}
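The same rule can be applied to the database master password. In the sketch below RDS generates the password and stores it in Secrets Manager, so no password variable exists at all. The storage size, database name, and username are assumptions; the identifier and instance class come from elsewhere in this document.

```hcl
resource "aws_db_instance" "main" {
  identifier        = "drop-db"
  engine            = "postgres"
  instance_class    = "db.t4g.micro"
  allocated_storage = 20           # assumed size in GiB
  db_name           = "drop"       # hypothetical database name
  username          = "drop_admin" # hypothetical master username

  # RDS creates and rotates the master password in Secrets Manager;
  # the password itself is never supplied to Terraform.
  manage_master_user_password = true
  publicly_accessible         = false
}

output "db_secret_arn" {
  description = "ARN of the RDS-managed master user secret"
  value       = aws_db_instance.main.master_user_secret[0].secret_arn
}
```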
6.3 Policy as Code
Tool: tflint + Checkov (planned for CI integration)
| Policy | Enforcement |
|---|---|
| | Block |
| RDS not publicly accessible | Block |
| App Runner minimum 1 instance in production | Warn |
| All resources must have Project, Environment, ManagedBy tags | Warn |
| | Block |
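The tagging policy in the table could be enforced with the tflint AWS ruleset. The following `.tflint.hcl` is a sketch: the plugin version is a placeholder to be replaced by whichever release the team pins, and how findings map to the Warn level depends on how CI treats tflint's exit codes.

```hcl
# .tflint.hcl (sketch)
plugin "aws" {
  enabled = true
  version = "0.30.0" # placeholder; pin to the release the team settles on
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

# Flags taggable AWS resources that lack any of the listed tags
rule "aws_resource_missing_tags" {
  enabled = true
  tags    = ["Project", "Environment", "ManagedBy"]
}
```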
7. Tagging Strategy
| Tag | Value | Purpose |
|---|---|---|
| Project | | Cost attribution |
| Environment | … / staging | Environment filter |
| ManagedBy | terraform / manual | Identifies |
| Team | | Ownership |

| Tag | Value | Purpose |
|---|---|---|
| Service | … / db / ecr | Service-level grouping |
| Ticket | | Change tracking |
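Tags that apply to every resource can be set once on the provider instead of on each resource. The sketch below assumes `drop` as the Project value, which the table above does not state, and leaves out Team for the same reason.

```hcl
provider "aws" {
  region = "eu-west-1"

  # Applied to every taggable resource created through this provider
  default_tags {
    tags = {
      Project     = "drop" # assumed value
      Environment = "production"
      ManagedBy   = "terraform"
    }
  }
}
```

Resource-specific tags such as Service can still be set in each resource's own `tags` argument and are merged with the defaults.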
8. Cost Management
Budget alerts:
- Production: Alert at $150/month (AWS Budgets, setup TBD)
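Since the AWS Budgets setup is still TBD, the alert could be defined in Terraform along these lines. The budget name, the 100% threshold, and the subscriber address are placeholders chosen for the example; only the $150 monthly limit comes from this document.

```hcl
resource "aws_budgets_budget" "production_monthly" {
  name         = "drop-production-monthly" # hypothetical name
  budget_type  = "COST"
  limit_amount = "150"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 100 # percent of the limit
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = ["alerts@example.com"] # placeholder address
  }
}
```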
Cost optimization built into IaC:
- App Runner: idle instances are billed for provisioned memory only; vCPU is charged only while requests are being processed
- RDS db.t4g.micro: ARM Graviton (20% cheaper than x86 equivalent)
- ECR lifecycle policy: Delete untagged images after 7 days, keep last 10 tagged images
resource "aws_ecr_lifecycle_policy" "drop_web" {
  repository = aws_ecr_repository.drop_web.name
  policy = jsonencode({
    rules = [
      {
        rulePriority = 1
        description  = "Remove untagged images after 7 days"
        selection    = { tagStatus = "untagged", countType = "sinceImagePushed", countUnit = "days", countNumber = 7 }
        action       = { type = "expire" }
      },
      {
        # A "tagged" rule needs a tagPrefixList or tagPatternList, so this rule
        # uses "any", which ECR requires to carry the highest rulePriority.
        # Untagged images younger than 7 days count towards the 10.
        rulePriority = 2
        description  = "Keep last 10 images"
        selection    = { tagStatus = "any", countType = "imageCountMoreThan", countNumber = 10 }
        action       = { type = "expire" }
      }
    ]
  })
}
9. Disaster Recovery for IaC State
State backup: S3 versioning is enabled on the drop-terraform-state-324480209768 bucket, so all state versions are preserved.
Recovery procedure:
- Restore from S3 version history: aws s3api list-object-versions --bucket drop-terraform-state-324480209768
- Download specific version: aws s3api get-object --version-id <version-id> ...
- Run terraform plan and verify there are no unexpected changes before apply
Existing manually-provisioned resources: If state is lost, import manually:
terraform import aws_apprunner_service.drop_web \
  arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec
terraform import aws_db_instance.drop_db drop-db
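The same imports can be declared in configuration so that they go through PR review like any other change. This sketch assumes Terraform 1.5 or later, which introduced `import` blocks; the resource addresses and IDs are the ones used in the commands above.

```hcl
import {
  to = aws_apprunner_service.drop_web
  id = "arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec"
}

import {
  to = aws_db_instance.drop_db
  id = "drop-db"
}
```

Running `terraform plan` then shows the pending imports next to any attribute differences, and `terraform plan -generate-config-out=generated.tf` can draft matching resource blocks where none exist yet.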
Prevention:
- S3 versioning enabled on state bucket
- MFA delete required on state bucket (planned)
- State bucket access logged to CloudTrail
Related Documents
Approval
| Role | Name | Date | Signature |
|---|---|---|---|
| Author | Platform Architect (AI) | 2026-02-23 | |
| Reviewer | | | |
| Approver | Alem Bašić | | |