Operations Guide

Last Verified: 2026-02-17 | Owner: John 
 
 Operations Manual 
 Version: 1.0
 Last Updated: 2026-01-28
 Owner: Alem Basic
 Prepared by: John (Director) + Nermin Šabić (DevOps) + Amina Hadžić (Head of Projects) 
 
 Executive Summary 
 This is the operational playbook. It covers daily, weekly, and monthly routines, KPIs, reporting structure, tools, systems, and disaster recovery: everything needed to run the organization day-to-day.
 Purpose: Ensure operations run smoothly whether Alem is available or not. John + team can operate independently using this manual. 
 
 1. Daily Operations 
 1.1 Daily Routine — John (Director) 
 Every Morning (Oslo Time Zone): 
 08:00 — Wake up (boot)
 ├─ Read MEMORY.md (context refresh)
 ├─ Check john.db for overnight tasks
 ├─ Check Telegram for Alem messages
 └─ Check email (john@basicconsulting.no) for urgent matters

08:30 — Prepare morning brief
 ├─ Tasks completed yesterday
 ├─ Planned for today
 ├─ Blockers
 ├─ Metrics snapshot (revenue, uptime, trading)
 └─ Send to Alem via Telegram (if significant updates)

09:00 — Monitor task queue
 ├─ ~/clawd/tasks/pending/ → assign to agents
 ├─ ~/clawd/tasks/in-progress/ → check progress
 └─ ~/clawd/tasks/completed/ → archive

09:15 — Daily standup (all team, 15 min)

09:30-18:00 — Execution mode
 ├─ Delegate tasks to agents
 ├─ Monitor progress
 ├─ Escalate blockers within 4h
 ├─ Respond to Telegram within 5 min
 ├─ Check trading positions (every 3 hours: 12:00, 15:00, 18:00)
 └─ Log all decisions to john.db

18:00 — Evening wrap-up
 ├─ Archive completed tasks
 ├─ Sync DB to GitHub (automatic, verify)
 ├─ Prepare tomorrow's priorities
 └─ Send summary to Alem (if needed)
 
 24/7 Monitoring: 
 
 Trading: Every 3 hours (cron job) 
 Infrastructure: Continuous (Datadog + PagerDuty) 
 Task queue: Every 30 seconds (john-daemon.sh) 
 Telegram: Within 5 minutes 
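
The 09:00 task-queue check can be sketched as a simple directory scan. This is a minimal sketch, assuming each task is stored as one file in the three queue directories; the function name and the one-file-per-task layout are illustrative assumptions and are not taken from john-daemon.sh.

```python
from pathlib import Path

# Queue directories named in the 09:00 routine.
QUEUE_DIRS = ("pending", "in-progress", "completed")


def queue_snapshot(root: Path) -> dict:
    """Count task files in each queue directory under root.

    Assumes one file per task (hypothetical layout). A missing
    directory is reported as an empty queue.
    """
    counts = {}
    for name in QUEUE_DIRS:
        directory = root / name
        if directory.is_dir():
            counts[name] = sum(1 for p in directory.iterdir() if p.is_file())
        else:
            counts[name] = 0
    return counts
```

Calling `queue_snapshot(Path.home() / "clawd" / "tasks")` would return a mapping such as `{"pending": 4, "in-progress": 2, "completed": 11}` that can feed the morning brief.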
 
 1.2 Daily Routine — Team 
 9:15 AM CET — Daily Standup (Mon-Fri, 15 min max) 
 Attendees: All team (Emir leads) 
 Format: 
 
 Each person: 3 questions, 1 min each
 
 What did I do yesterday? 
 What will I do today? 
 Any blockers? 
 
 
 
 Emir's checklist: 
 
 Update sprint board before standup 
 Note blockers → escalate after standup 
 Update burn-down chart 
 
 Throughout Day: 
 
 Agents execute assigned tasks 
 Update Jira/Linear as tasks progress 
 Escalate blockers within 1 hour 
 Code review within 24 hours 
 Respond to Slack/messages within 4 hours (business hours) 
 
 End of Day: 
 
 Update task status 
 Log any decisions or learnings 
 Prepare for tomorrow's standup 
 
 
 2. Weekly Operations 
 2.1 Weekly Cadence 
 
 
 
| Day | Time | Event | Owner | Duration |
|---|---|---|---|---|
| Monday | 9:15 AM | Daily standup | Emir | 15 min |
| | 10:00 AM | Sprint planning (every 2 weeks) | Emir + Amina | 2-3 hours |
| Tuesday | 9:15 AM | Daily standup | Emir | 15 min |
| Wednesday | 9:15 AM | Daily standup | Emir | 15 min |
| | 14:00 CET | Backlog refinement | Emir + Lejla + Selma | 1 hour |
| Thursday | 9:15 AM | Daily standup | Emir | 15 min |
| | 14:00 CET | Architecture review (bi-weekly) | Lejla | 1-2 hours |
| Friday | 9:15 AM | Daily standup | Emir | 15 min |
| | 15:00 CET | Sprint review + retro (every 2 weeks) | Emir + team | 2 hours |
| | 16:00 CET | Weekly wrap-up (John → Alem) | John | 30 min |
 
 
 
 2.2 Weekly Reporting (John → Alem) 
 Every Friday, 16:00 CET: 
 Weekly Summary (Telegram or Email): 
 # Week of [Date]

## Shipped This Week
- [Feature/bug 1]
- [Feature/bug 2]

## Metrics
- Revenue: €X (+/- Y% from last week)
- Customers: X (+/- Y new/churn)
- Uptime: 99.X%
- Trading ROI: +/-X%

## Blockers / Risks
- [Blocker 1 — mitigation plan]
- [Risk 1 — action taken]

## Next Week Priorities
- [Priority 1]
- [Priority 2]

## Decisions Made (>€500)
- [Decision 1 — €X — rationale]
 
 Alem reviews and provides feedback. 
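
The weekly summary can be generated from structured data rather than typed by hand. The following is a minimal sketch, assuming the inputs are collected elsewhere; the `WeeklySummary` class, its field names, and the `render` function are illustrative and not part of any existing tooling.

```python
from dataclasses import dataclass, field


@dataclass
class WeeklySummary:
    """Inputs for the Friday summary; field names are illustrative."""
    week_of: str
    shipped: list = field(default_factory=list)
    metrics: dict = field(default_factory=dict)
    blockers: list = field(default_factory=list)
    priorities: list = field(default_factory=list)
    decisions: list = field(default_factory=list)


def _section(title, items):
    """Render one '## title' section; an empty section shows '- None'."""
    bullets = [f"- {item}" for item in items] or ["- None"]
    return [f"## {title}", *bullets, ""]


def render(summary: WeeklySummary) -> str:
    """Build the summary text in the same section order as the template."""
    lines = [f"# Week of {summary.week_of}", ""]
    lines += _section("Shipped This Week", summary.shipped)
    lines += _section("Metrics", [f"{k}: {v}" for k, v in summary.metrics.items()])
    lines += _section("Blockers / Risks", summary.blockers)
    lines += _section("Next Week Priorities", summary.priorities)
    lines += _section("Decisions Made (>€500)", summary.decisions)
    return "\n".join(lines).rstrip() + "\n"
```

Empty sections are rendered as "- None" so that a missing entry is visible to Alem instead of silently dropped.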
 
 3. Monthly Operations 
 3.1 Monthly Business Review (MBR) 
 Last Friday of Every Month, 14:00-16:00 CET 
 Attendees: Alem, John, Amina 
 Agenda: 
 
 
1. Revenue & Growth (10 min)
   - MRR, new customers, churn, LTV/CAC
   - Forecast: next 3 months

2. Product & Development (10 min)
   - Features shipped this month
   - Sprint velocity trend
   - Tech debt status

3. Operations (10 min)
   - Uptime, incidents, MTTR
   - Support metrics (tickets, CSAT)

4. Trading (5 min)
   - Monthly P&L
   - ROI, Sharpe ratio
   - Portfolio allocation

5. Risks & Compliance (10 min)
   - Top 5 risks, status
   - Compliance updates (HIPAA, audits)

6. Team & People (5 min)
   - Agent utilization
   - Process improvements
   - Hiring needs (if any)

7. Next Month Priorities (10 min)
   - What are we focusing on?
   - Resource allocation
 
 
 
 Output: 
 
 Updated priorities for next month 
 Budget adjustments (if needed) 
 Action items 
 
 3.2 Monthly Reporting — Financial 
 By 5th of Each Month: 
 John prepares financial report: 
 
 
 
| Metric | Current Month | Last Month | Change |
|---|---|---|---|
| Revenue (Fast Constructions) | €X | €Y | +/- Z% |
| Expenses (total) | €X | €Y | +/- Z% |
| ├─ Infrastructure | €X | €Y | +/- Z% |
| ├─ SaaS tools | €X | €Y | +/- Z% |
| ├─ Marketing | €X | €Y | +/- Z% |
| ├─ Development (SnowIT payment) | €X | €Y | +/- Z% |
| ├─ Professional services | €X | €Y | +/- Z% |
| ├─ Insurance | €X | €Y | +/- Z% |
| Net Profit | €X | €Y | +/- Z% |
| Charity (50%) | €X (accrued) | - | - |
| Burn Rate | €X/month | - | - |
| Runway | X months | - | - |
 
 
 
 Sent to: Alem + accountant (if hired) 
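
The derived figures in the financial report follow from a few lines of arithmetic. The sketch below is one possible reading, with two labeled assumptions: charity accrues at 50% of positive net profit only, and burn rate means net cash burn (expenses minus revenue). The function names are illustrative.

```python
from typing import Optional


def pct_change(current: float, previous: float) -> Optional[float]:
    """Percentage change versus last month; None when last month was zero."""
    if previous == 0:
        return None
    return (current - previous) * 100 / previous


def monthly_figures(revenue: float, expenses: float, cash_on_hand: float) -> dict:
    """Derive net profit, charity accrual, burn rate, and runway."""
    net_profit = revenue - expenses
    # Assumption: nothing is accrued for charity in a loss-making month.
    charity = 0.5 * max(net_profit, 0.0)
    # Assumption: burn rate is net cash burn, zero when profitable.
    burn_rate = max(expenses - revenue, 0.0)
    runway_months = cash_on_hand / burn_rate if burn_rate > 0 else None
    return {
        "net_profit": net_profit,
        "charity": charity,
        "burn_rate": burn_rate,
        "runway_months": runway_months,
    }
```

For example, revenue of €3,000 against expenses of €5,000 with €20,000 in cash gives a burn rate of €2,000/month and a runway of 10 months. A profitable month reports no runway limit (`None`).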
 3.3 Monthly Tasks Checklist 
 John's Monthly Checklist: 
 
 Financial report prepared (by 5th) 
 Monthly Business Review held (last Friday) 
 Risk register reviewed (Dženan) 
 Compliance checklist reviewed (Dženan) 
 Infrastructure cost optimization (Nermin) 
 Trading performance report (Nick) 
 Customer feedback analysis (Selma) 
 Sprint velocity analysis (Emir) 
 Tech debt review (Lejla + Tarik) 
 Backup verification (Nermin — test restore) 
 Security scan (Tarik — OWASP ZAP) 
 Vendor BAA review (Dženan) 
 Database cleanup (old logs, expired data) 
 Update MEMORY.md (add learnings from month) 
 
 
 4. Quarterly Operations 
 4.1 Quarterly Planning 
 Last Week of Quarter (March, June, September, December) 
 Attendees: Alem, John, Amina, Lejla, Selma 
 Agenda: 
 
 Review last quarter (goals, achievements, misses) 
 Market & competitive analysis (Selma) 
 Product roadmap update (Amina + Lejla) 
 Strategic priorities for next quarter (Alem) 
 Budget allocation 
 Hiring plan (if needed) 
 Risk review (Dženan) 
 
 Output: 
 
 OKRs (Objectives & Key Results) for next quarter 
 Budget approved 
 Roadmap locked for next 3 months 
 
 4.2 Quarterly Tasks 
 
 Tech Debt Sprint — one full sprint dedicated to refactoring, testing, documentation (Lejla) 
 Compliance audit — internal HIPAA audit (Dženan + Tarik) 
 Infrastructure review — cost, performance, scaling plan (Nermin) 
 Customer satisfaction survey — NPS survey to all customers (Selma) 
 Competitive analysis — review competitors, market trends (Selma) 
 Patent progress review — if filing in progress (Dženan) 
 Insurance review — renew or update policies (Dženan) 
 Document review — update all org docs (MEMORY.md, ORGANIZATION.md, this doc) 
 
 
 5. Annual Operations 
 5.1 Annual Planning 
 December (for next year) 
 Agenda: 
 
 Review full year (revenue, customers, product, team) 
 Set annual vision and goals (Alem) 
 Product roadmap (12 months) 
 Budget (annual) 
 Hiring plan 
 Strategic initiatives (new products, markets, partnerships) 
 Charity commitment (allocate 50% of profit) 
 
 Output: 
 
 Annual OKRs 
 Annual budget 
 12-month roadmap 
 
 5.2 Annual Tasks 
 
 Annual financial statements — P&L, balance sheet, cash flow (accountant) 
 Tax filings — US (Fast Constructions), BiH (SnowIT), Norway (Alem personal) 
 SOC 2 Type II audit — external audit (Dženan + auditor) 
 HIPAA risk assessment — annual requirement (Dženan) 
 Insurance renewal — cyber liability, E&O, general liability (Dženan) 
 Charitable giving — donate 50% of net profit (Alem selects charities) 
 Transparency report — publish charity donations on lumiscare.com/impact 
 Team performance reviews — if real humans hired (Amina) 
 Document archive — backup all docs, contracts, decisions to secure storage 
 
 
 6. Local Infrastructure (Mac Studio) 
 6.1 Services Overview 
 All services run locally on Mac Studio M3 Ultra (96GB RAM). Zero cloud dependency for operations. 
 
 
 
| Service | Type | Port | Purpose |
|---|---|---|---|
| Mattermost | Docker | 8065 | Team chat (4 teams: basic, wizard, rendrom, riad) |
| Planka | Docker | 3100 | Kanban boards (boards.basicconsulting.no) |
| Documenso | Docker | 3003 | Document signing (sign.basicconsulting.no) |
| BookStack | Docker | 6875 | Wiki/documentation |
| MC Dashboard | Node.js | 3030 | Mission Control web UI (task management) |
| Ollama | Native | 11434 | Local AI (8b classify, 32b respond/code) |
 
 
 
 External access: Cloudflare tunnels (mm/boards/sign.basicconsulting.no) 
 6.2 Daemons (LaunchAgents) 
 
 
 
| Daemon | Interval | Purpose |
|---|---|---|
| com.john.ops-agent | 5 min | Autonomous ops: MM monitoring, health checks, auto-fix, task creation, intelligent responses |
| com.edita.autowork | 30 min | Background task worker (Claude haiku) |
| com.john.mc-dashboard | always | Mission Control web dashboard |
| com.john.mc-session-worker | on events | Session state extraction |
 
 
 
 6.3 Ops Agent (Autonomous Operations) 
 Replaces manual MM monitoring. Runs 24/7, $0 cost (local Ollama AI). 
 What it does: 
 
 Reads all MM messages from 4 teams 
 Classifies via Ollama 8b: ROUTINE / TASK / INCIDENT 
 ROUTINE — logs, no action 
 TASK — creates MC task (BILLABLE if client team) + Planka card + MM reply 
 INCIDENT — HIGH priority task + escalation to John 
 Runs health checks on all 20 services every cycle 
 Auto-fixes known issues (restart, cleanup) with safety limits (max 3/hour) 
 
 Runbook: ~/system/context/docs/runbooks/ops-agent.md 
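
The routing rules above can be sketched as a small dispatch function. This is an illustration, not the ops-agent.js implementation: the classification is assumed to have been done already by the local model, the set of client teams is passed in because this manual does not say which of the four teams are clients, and the action labels are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    ROUTINE = "ROUTINE"
    TASK = "TASK"
    INCIDENT = "INCIDENT"


@dataclass
class Message:
    team: str  # Mattermost team the message came from
    text: str


def plan_actions(category: Category, message: Message, client_teams: set) -> list:
    """Return the ordered action labels for one classified message."""
    if category is Category.ROUTINE:
        return ["log"]
    if category is Category.TASK:
        # Tasks from client teams are marked billable.
        billable = message.team in client_teams
        return [
            f"create_mc_task(billable={billable})",
            "create_planka_card",
            "reply_in_mattermost",
        ]
    # INCIDENT: high-priority task plus escalation.
    return ["create_mc_task(priority=HIGH)", "escalate_to_john"]
```

Keeping the decision separate from the side effects (task creation, replies) makes the rules easy to test without a running Mattermost or Planka instance.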
 6.4 Monitoring Stack 
 
 
 
| Tool | What It Monitors | Action |
|---|---|---|
| health-check.js | Docker (8), HTTP (6), system (2), daemons (4) | Status report |
| auto-fix.js | Service failures | Automated restart/cleanup (max 3/hour) |
| ops-agent.js | MM messages + health | Classify, respond, create tasks, fix |
| smoke-test.js | Integration tests | Pre/post deployment verification |
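
The safety limit of 3 automated fixes per hour can be enforced with a sliding-window counter. The sketch below is one way to do it; whether auto-fix.js uses a sliding window or a fixed hourly bucket is not stated here, so treat the approach and the `FixLimiter` name as assumptions.

```python
import time
from collections import deque


class FixLimiter:
    """Allow at most max_fixes automated fixes per sliding window."""

    def __init__(self, max_fixes: int = 3, window_seconds: float = 3600.0):
        self.max_fixes = max_fixes
        self.window_seconds = window_seconds
        self._times = deque()  # timestamps of fixes inside the window

    def allow(self, now: float = None) -> bool:
        """Record and permit a fix if the limit has not been reached."""
        if now is None:
            now = time.monotonic()
        # Drop fixes that have aged out of the window.
        while self._times and now - self._times[0] >= self.window_seconds:
            self._times.popleft()
        if len(self._times) >= self.max_fixes:
            return False
        self._times.append(now)
        return True
```

When `allow()` returns `False`, the natural fallback is to stop auto-fixing and escalate, so a service stuck in a restart loop gets human attention.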
 
 
 
 Dashboard: http://localhost:3030 (MC — tasks, stats, mobile-friendly) 
 
 7. Key Performance Indicators (KPIs) {#kpis} 
 7.1 KPI Dashboard (Live, Updated Daily) 
 Owner: John (maintains dashboard) 
 Tool: Notion, Grafana, or Google Sheets 
 Metrics: 
 Business Metrics 
 
 
 
| Metric | Current | Target | Trend | Owner |
|---|---|---|---|---|
| MRR (Monthly Recurring Revenue) | €X | 10%+ MoM growth | ↗️ | Selma |
| Customer Count | X | 10 (Month 6), 50 (Month 12) | ↗️ | Selma |
| Churn Rate | X% | < 5% monthly | ↘️ | Selma |
| Customer Acquisition Cost (CAC) | €X | < €500 | ↘️ | Selma |
| Lifetime Value (LTV) | €X | > €2,000 | ↗️ | Selma |
| LTV/CAC Ratio | X:1 | > 3:1 | ↗️ | Selma |
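
The churn, LTV, and LTV/CAC figures can be derived from a handful of inputs. This manual does not define how LTV is calculated, so the sketch assumes the common simplification LTV = average monthly revenue per customer divided by monthly churn rate. Function names are illustrative.

```python
def churn_rate_pct(customers_at_start: int, customers_lost: int) -> float:
    """Monthly churn as a percentage of customers at the start of the month."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost * 100 / customers_at_start


def lifetime_value(avg_monthly_revenue: float, monthly_churn_pct: float) -> float:
    """Simplified LTV (assumption): monthly revenue per customer / churn rate."""
    if monthly_churn_pct <= 0:
        raise ValueError("churn must be positive for this formula")
    return avg_monthly_revenue * 100 / monthly_churn_pct


def ltv_cac_ratio(ltv: float, cac: float) -> float:
    """Ratio compared against the 3:1 target."""
    if cac <= 0:
        raise ValueError("cac must be positive")
    return ltv / cac
```

For example, losing 2 of 50 customers is 4% churn; at €100 per customer per month that gives an LTV of €2,500, and with a CAC of €500 the ratio is 5:1, which would meet all three targets in the table.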
 
 
 
 Technical Metrics 
 
 
 
| Metric | Current | Target | Trend | Owner |
|---|---|---|---|---|
| Uptime | 99.X% | 99.9% (LumisCare) | ↗️ | Nermin |
| API Latency (p95) | Xms | < 500ms | ↘️ | Nermin |
| Page Load Time | Xs | < 2s | ↘️ | Frontend |
| Deployment Frequency | X/week | Daily (staging), weekly (prod) | ↗️ | Nermin |
| Mean Time to Recovery (MTTR) | Xh | < 4h | ↘️ | Nermin + Lejla |
| Bug Escape Rate | X% | < 5% | ↘️ | Tarik |
| Test Coverage | X% | ≥ 80% | ↗️ | Tarik |
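
Two of the technical targets are easier to reason about as concrete numbers: a 99.9% uptime target over a 30-day month leaves a downtime budget of roughly 43 minutes, and p95 latency is the value below which 95% of requests fall. The sketch below assumes the nearest-rank percentile method; monitoring tools may use a different estimator, so small differences from dashboard values are expected.

```python
def downtime_budget_minutes(target_pct: float, days: int = 30) -> float:
    """Minutes of downtime allowed in a period while still meeting the target."""
    return days * 24 * 60 * (100.0 - target_pct) / 100.0


def percentile_nearest_rank(samples, pct: int) -> float:
    """Nearest-rank percentile (assumed method) for an integer pct in 1..100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Integer form of ceil(pct * n / 100), avoiding float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

For instance, `percentile_nearest_rank(latencies_ms, 95)` over a window of request latencies can be compared directly against the 500 ms target.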
 
 
 
 Operational Metrics 
 
 
 
| Metric | Current | Target | Trend | Owner |
|---|---|---|---|---|
| Sprint Velocity | X pts | Consistent ±10% | → | Emir |
| Task Completion Rate | X% | ≥ 95% | ↗️ | John |
| Support Response Time | Xmin | < 30min (Tier 1) | ↘️ | Selma |
| Customer Satisfaction (CSAT) | X/5 | ≥ 4.5/5 | ↗️ | Selma |
| Agent Utilization | X% | ≥ 70% billable | ↗️ | Amina |
 
 
 
 Trading Metrics 
 
 
 
| Metric | Current | Target | Trend | Owner |
|---|---|---|---|---|
| Monthly ROI | X% | ≥ 5% | ↗️ | Nick |
| Sharpe Ratio | X | > 1.5 | ↗️ | Nick |
| Portfolio Value | €X | €10,000 (current) | ↗️ | Nick |
| Stop-Loss Adherence | X% | 100% | → | Nick |
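
Monthly ROI and the Sharpe ratio can be computed from portfolio values and a series of periodic returns. This manual does not say whether the Sharpe target is annualized or which risk-free rate applies, so the sketch below assumes monthly returns, annualization by the square root of 12, and a risk-free rate of zero by default.

```python
import math
import statistics


def monthly_roi_pct(start_value: float, end_value: float) -> float:
    """Return on the portfolio over one month, as a percentage."""
    if start_value <= 0:
        raise ValueError("start_value must be positive")
    return (end_value - start_value) * 100 / start_value


def sharpe_ratio(returns, risk_free: float = 0.0, periods_per_year: int = 12) -> float:
    """Annualized Sharpe ratio from periodic returns (as fractions, e.g. 0.05)."""
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    excess = [r - risk_free for r in returns]
    spread = statistics.stdev(excess)  # sample standard deviation
    if spread == 0:
        raise ValueError("returns have zero variance")
    return statistics.mean(excess) / spread * math.sqrt(periods_per_year)
```

A short return history gives a noisy estimate, so the ratio is more meaningful once several months of data are available.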
 
 
 
 7.2 Monitoring & Alerting 
 Tools: 
 
 Datadog: Infrastructure, APM, logs 
 PagerDuty: On-call rotation, incident alerts 
 Stripe Dashboard: Revenue, subscriptions 
 Binance API: Trading positions, P&L 
 john.db: All decisions, tasks, logs 
 
 Alert Configuration: 
 
 
 
| Alert | Threshold | Action | Owner |
|---|---|---|---|
| Uptime < 99.9% | Downtime detected | PagerDuty → Nermin | Nermin |
| API latency > 1s (p95) | Sustained 5 min | Slack alert → Lejla | Lejla |
| Error rate > 1% | Sustained 5 min | PagerDuty → Nermin | Nermin |
| Database CPU > 80% | Sustained 10 min | Slack alert → Nermin | Nermin |
| Trading loss > 5% | Single position | Telegram → John → Nick | Nick |
| Customer churn > 10% | Monthly | Email → Selma, Amina | Selma |
| Support ticket SLA breach | > 4h unresolved | Slack alert → Selma | Selma |
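
Several alerts fire only when a threshold is breached for a sustained period (5 or 10 minutes). In practice this is configured in the monitoring tool; the sketch below only illustrates the logic, and the `SustainedRule` class is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SustainedRule:
    """Fire when a value stays above threshold for sustain_seconds."""
    threshold: float
    sustain_seconds: float
    breach_started: Optional[float] = None  # timestamp of first breaching sample

    def observe(self, value: float, timestamp: float) -> bool:
        """Feed one sample; return True when the alert should fire."""
        if value <= self.threshold:
            # Back under the threshold: reset the breach timer.
            self.breach_started = None
            return False
        if self.breach_started is None:
            self.breach_started = timestamp
        return timestamp - self.breach_started >= self.sustain_seconds
```

For the API latency alert this would be `SustainedRule(threshold=1.0, sustain_seconds=300)` fed with p95 latency in seconds. A single recovery sample resets the timer, which keeps short spikes from paging anyone.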
 
 
 
 On-Call Rotation: 
 
 Primary: Nermin (DevOps) 
 Secondary: Lejla (Tech Lead) 
 Escalation: John → Alem 
 
 On-Call SLA: 
 
 Acknowledge alert: 15 min 
 Begin investigation: 30 min 
 Escalate if not resolved: 2 hours (P1), 4 hours (P2) 
 
 
 7. Tools & Systems 
 7.1 Core Systems 
 
 
 
| System | Purpose | Access | Owner |
|---|---|---|---|
| john.db (SQLite) | Source of truth, all decisions logged | John, Alem (query) | John |
| GitHub | Code repository, version control | All devs | Lejla |
| AWS | Infrastructure (ECS, RDS, S3, CloudFront) | Nermin, Lejla | Nermin |
| Stripe | Payment processing, subscriptions | Selma, Amina, John | Selma |
| Binance | Trading | Nick, John | Nick |
| Jira / Linear | Task tracking, sprint management | All team | Emir |
| Datadog | Monitoring, APM, logs | Nermin, Lejla, John | Nermin |
| PagerDuty | On-call, incident alerts | Nermin, Lejla | Nermin |
| Intercom / Crisp | Customer support chat | Selma, Tarik | Selma |
| Telegram (@johnbasicas_bot) | Quick communication (Alem ↔ John) | Alem, John | John |
 
 
 
 7.2 Tool Stack (Detailed) 
 Development 
 
 
 
| Tool | Purpose | Cost | Owner |
|---|---|---|---|
| GitHub | Code repo, PRs, CI/CD | Free (public repos) | Lejla |
| GitHub Actions | CI/CD pipelines | Included | Nermin |
| VSCode / Cursor | IDE | Free | All devs |
| Playwright | E2E testing | Free | Tarik |
| Jest | Unit testing | Free | Tarik |
| ESLint / Prettier | Linting, formatting | Free | Lejla |
 
 
 
 Infrastructure 
 
 
 
| Tool | Purpose | Cost | Owner |
|---|---|---|---|
| AWS ECS/EKS | Container orchestration | ~€500-2,000/mo | Nermin |
| AWS RDS (PostgreSQL) | Database | ~€100-500/mo | Nermin |
| AWS S3 | File storage | ~€50-200/mo | Nermin |
| CloudFront | CDN | ~€50-200/mo | Nermin |
| Route53 | DNS | ~€10/mo | Nermin |
| Datadog | Monitoring, APM | ~€100-500/mo | Nermin |
| PagerDuty | On-call | ~€50-100/mo | Nermin |
 
 
 
 Business & Operations 
 
 
 
| Tool | Purpose | Cost | Owner |
|---|---|---|---|
| Stripe | Payments | 2.9% + €0.30/transaction | Selma |
| Intercom / Crisp | Support chat | ~€50-200/mo | Selma |
| Jira / Linear | Project management | ~€50-100/mo | Emir |
| Notion | Documentation, wiki | Free tier or ~€10/mo | John |
| Apollo.io | Sales outreach | ~€100/mo | Selma |
| LinkedIn Sales Navigator | Sales prospecting | ~€80/mo | Selma |
 
 
 
 Communication 
 
 
 
| Tool | Purpose | Cost | Owner |
|---|---|---|---|
| Telegram | Alem ↔ John quick chat | Free | John |
| Email (one.com) | External communication | ~€10/mo | John |
| Slack (future) | Team collaboration | Free tier or ~€50/mo | John |
 
 
 
 Total Estimated Tool Cost: €1,000-3,000/month (scales with usage) 
 
 8. Disaster Recovery & Business Continuity 
 8.1 Backup Strategy 
 What We Back Up: 
 
 
 
| Data | Backup Method | Frequency | Retention | Location |
|---|---|---|---|---|
| john.db (SQLite) | GitHub sync | Hourly | Indefinitely | github.com/johnatbasicas/clawd |
| Source code | GitHub | Every commit | Indefinitely | github.com/johnatbasicas |
| Database (PostgreSQL) | AWS RDS automated backups | Daily | 30 days | AWS S3 |
| File storage (S3) | Cross-region replication | Real-time | 90 days | AWS S3 (us-west-2) |
| Audit logs | S3 + Glacier | Daily | 6 years (HIPAA) | AWS Glacier |
| Configuration files | GitHub | Every change | Indefinitely | GitHub (encrypted secrets) |
 
 
 
 Backup Verification: Nermin tests restore monthly. 
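
For the SQLite backup of john.db, part of the monthly verification can be automated by opening a restored copy read-only and running SQLite's built-in integrity check. This is a minimal sketch; the function name is illustrative. It covers only SQLite files and does not replace the PostgreSQL restore test.

```python
import sqlite3
from pathlib import Path


def verify_sqlite_backup(path: str) -> bool:
    """Return True if the file opens as SQLite and passes integrity_check."""
    # Open read-only so verification can never modify the backup copy.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
    except sqlite3.Error:
        return False  # missing or unreadable file
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.Error:
        return False  # not a database, or too damaged to check
    finally:
        conn.close()
    # A healthy database reports a single row containing 'ok'.
    return rows == [("ok",)]
```

An integrity check confirms the file is structurally sound, not that it is recent, so it is worth pairing with a check on the timestamp of the latest logged decision.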
 8.2 Disaster Scenarios & Response 
 Scenario 1: AWS Region Failure 
 Impact: LumisCare production down 
 Response: 
 
 Nermin deploys to secondary region (us-west-2) — automated failover 
 DNS updated to point to secondary (Route53 health checks) 
 Restore database from latest backup 
 Verify functionality 
 Notify customers (status page) 
 
 RTO (Recovery Time Objective): 2 hours
 RPO (Recovery Point Objective): 24 hours (last daily backup) 
 Scenario 2: Data Breach (HIPAA) 
 Impact: PHI exposed 
 Response: 
 
 Dženan activates breach response plan (see GOVERNANCE.md, section 8.3) 
 Contain breach (Nermin) 
 Assess impact (Lejla + Dženan) 
 Notify customers (without unreasonable delay, and no later than 60 days after discovery, per HIPAA) 
 Notify HHS (if 500 or more individuals are affected) 
 Remediate + post-mortem 
 
 Timeline: Immediate containment, notification within 60 days 
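
The 60-day notification window is simple date arithmetic, but it is worth computing explicitly during an incident. The sketch below assumes the clock starts on the date the breach is discovered; the function names are illustrative.

```python
from datetime import date, timedelta

NOTIFICATION_WINDOW_DAYS = 60


def notification_deadline(discovered: date) -> date:
    """Latest date for notification, counted from the discovery date."""
    return discovered + timedelta(days=NOTIFICATION_WINDOW_DAYS)


def days_remaining(discovered: date, today: date) -> int:
    """Days left before the deadline; negative if it has passed."""
    return (notification_deadline(discovered) - today).days
```

The deadline is an outer limit, so the breach response plan should still aim to notify well before it.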
 Scenario 3: Key Person Unavailable (Alem, John, Lejla, Nermin) 
 Alem unavailable: 
 
 John continues operations (delegated authority) 
 Strategic decisions deferred or escalated to Asmir (SnowIT partner) 
 If prolonged: Appoint interim CEO or sell 
 
 John unavailable: 
 
 Task queue still processed (daemon runs 24/7) 
 Amina coordinates team 
 Alem steps in for strategic decisions 
 John can be "rebooted" from MEMORY.md + ORGANIZATION.md 
 
 Lejla unavailable: 
 
 API Developer + Frontend Specialist continue development 
 Architecture decisions deferred or escalated to Amina → Alem 
 Code reviews by 2 other senior devs (if available) 
 
 Nermin unavailable: 
 
 Infrastructure on auto-pilot (monitoring, auto-scaling) 
 Lejla handles incidents (secondary on-call) 
 Engage external DevOps contractor if prolonged 
 
 Scenario 4: GitHub Outage 
 Impact: Can't access code, can't deploy 
 Response: 
 
 Local copies of code on all dev machines 
 Deploy from last known good state (Docker images cached) 
 Wait for GitHub to restore (historically < 4 hours) 
 
 RTO: 4 hours
 RPO: 0 (local copies) 
 Scenario 5: Stripe Outage 
 Impact: Can't process payments 
 Response: 
 
 Monitor Stripe status page 
 Notify customers (if prolonged) 
 Wait for Stripe to restore 
 No immediate action (payments retry automatically) 
 
 RTO: Dependent on Stripe
 RPO: 0 (Stripe handles retries) 
 8.3 Runbooks 
 Location: ~/clawd/org/runbooks/ 
 Runbooks Maintained by Nermin: 
 
 runbook-db-restore.md — How to restore PostgreSQL from backup 
 runbook-deploy-rollback.md — How to rollback production deployment 
 runbook-region-failover.md — How to failover to secondary AWS region 
 runbook-security-incident.md — How to respond to security breach 
 runbook-scaling.md — How to manually scale infrastructure 
 runbook-certificate-renewal.md — How to renew TLS certificates 
 
 Each runbook includes: 
 
 When to use it 
 Step-by-step instructions 
 Commands to run 
 Expected output 
 Rollback procedure 
 Contact information (who to escalate to) 
 
 Review Cadence: Quarterly (test at least one runbook per quarter) 
 
 9. Communication Channels & Etiquette 
 See PROCESSES.md, Section 4 for full communication protocols. 
 Quick Reference: 
 
 
 
| Channel | Use Case | Response SLA |
|---|---|---|
| Telegram | Urgent, quick decisions | 5-15 min |
| CLI (Claude Code) | Deep work, architecture | Real-time |
| Email | External, formal | 4-24 hours |
| Slack (future) | Team collaboration | 1-4 hours |
| Jira/Linear | Task tracking | Daily check |
| GitHub | Code, PRs | 24 hours |
| Standup | Daily status | 9:15 AM CET |
 
 
 
 
 10. Document Control 
 
 
 
| Version | Date | Changes | Author |
|---|---|---|---|
| 1.0 | 2026-01-28 | Initial document | John + Nermin + Amina |
 
 
 
 Next Review: 2026-04-01 (quarterly) 
 Owner: Alem Basic
 Maintained By: John (Director) + Nermin Šabić (DevOps) + Amina Hadžić (Head of Projects) 
 
 End of Operations Manual 
 Run the organization like a machine. Predictable, reliable, scalable. Daily, weekly, monthly, quarterly routines. KPIs tracked. Backups verified. Disasters planned for. Just execute.