Operations Guide
Last Verified: 2026-02-17 | Owner: John
Operations Manual
Version: 1.0 | Last Updated: 2026-01-28 | Owner: Alem Basic | Prepared by: John (Director) + Nermin Šabić (DevOps) + Amina Hadžić (Head of Projects)
Executive Summary
This is the operational playbook: daily, weekly, monthly, quarterly, and annual routines, plus KPIs, reporting structure, tools and systems, and disaster recovery. It covers everything needed to run the organization day-to-day.
Purpose: Ensure operations run smoothly whether Alem is available or not. John + team can operate independently using this manual.
1. Daily Operations
1.1 Daily Routine — John (Director)
Every Morning (Oslo Time Zone):
08:00 — Wake up (boot)
├─ Read MEMORY.md (context refresh)
├─ Check john.db for overnight tasks
├─ Check Telegram for Alem messages
└─ Check email ([email protected]) for urgent matters
08:30 — Prepare morning brief
├─ Tasks completed yesterday
├─ Planned for today
├─ Blockers
├─ Metrics snapshot (revenue, uptime, trading)
└─ Send to Alem via Telegram (if significant updates)
09:00 — Monitor task queue
├─ ~/clawd/tasks/pending/ → assign to agents
├─ ~/clawd/tasks/in-progress/ → check progress
└─ ~/clawd/tasks/completed/ → archive
09:15 — Daily standup (all team, 15 min)
09:30-18:00 — Execution mode
├─ Delegate tasks to agents
├─ Monitor progress
├─ Escalate blockers within 4h
├─ Respond to Telegram within 5 min
├─ Check trading positions (every 3 hours: 12:00, 15:00, 18:00)
└─ Log all decisions to john.db
18:00 — Evening wrap-up
├─ Archive completed tasks
├─ Sync DB to GitHub (automatic, verify)
├─ Prepare tomorrow's priorities
└─ Send summary to Alem (if needed)
24/7 Monitoring:
- Trading: Every 3 hours (cron job)
- Infrastructure: Continuous (Datadog + PagerDuty)
- Task queue: Every 30 seconds (john-daemon.sh)
- Telegram: Within 5 minutes
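One polling pass over the task queue could look roughly like the sketch below. It is written in Python for illustration only: john-daemon.sh itself is not shown in this manual, and the one-file-per-task layout, the `.json` suffix, and the name `poll_once` are assumptions.

```python
from pathlib import Path

# Queue states, matching the directory names under ~/clawd/tasks/.
STATES = ("pending", "in-progress", "completed")

def poll_once(root: Path) -> dict[str, list[str]]:
    """Return the task file names found in each queue directory."""
    snapshot: dict[str, list[str]] = {}
    for state in STATES:
        folder = root / state
        # A missing directory is treated as an empty queue (assumption).
        if folder.is_dir():
            snapshot[state] = sorted(p.name for p in folder.glob("*.json"))
        else:
            snapshot[state] = []
    return snapshot
```

A daemon loop would call `poll_once(Path.home() / "clawd" / "tasks")` and sleep 30 seconds between passes.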
1.2 Daily Routine — Team
9:15 AM CET — Daily Standup (Mon-Fri, 15 min max)
Attendees: All team (Emir leads)
Format:
- Each person: 3 questions, 1 min each
- What did I do yesterday?
- What will I do today?
- Any blockers?
Emir's checklist:
- Update sprint board before standup
- Note blockers → escalate after standup
- Update burn-down chart
Throughout Day:
- Agents execute assigned tasks
- Update Jira/Linear as tasks progress
- Escalate blockers within 1 hour
- Code review within 24 hours
- Respond to Slack/messages within 4 hours (business hours)
End of Day:
- Update task status
- Log any decisions or learnings
- Prepare for tomorrow's standup
2. Weekly Operations
2.1 Weekly Cadence
| Day | Time | Event | Owner | Duration |
|---|---|---|---|---|
| Monday | 9:15 AM | Daily standup | Emir | 15 min |
| | 10:00 AM | Sprint planning (every 2 weeks) | Emir + Amina | 2-3 hours |
| Tuesday | 9:15 AM | Daily standup | Emir | 15 min |
| Wednesday | 9:15 AM | Daily standup | Emir | 15 min |
| | 14:00 CET | Backlog refinement | Emir + Lejla + Selma | 1 hour |
| Thursday | 9:15 AM | Daily standup | Emir | 15 min |
| | 14:00 CET | Architecture review (bi-weekly) | Lejla | 1-2 hours |
| Friday | 9:15 AM | Daily standup | Emir | 15 min |
| | 15:00 CET | Sprint review + retro (every 2 weeks) | Emir + team | 2 hours |
| | 16:00 CET | Weekly wrap-up (John → Alem) | John | 30 min |
2.2 Weekly Reporting (John → Alem)
Every Friday, 16:00 CET:
Weekly Summary (Telegram or Email):
# Week of [Date]
## Shipped This Week
- [Feature/bug 1]
- [Feature/bug 2]
## Metrics
- Revenue: €X (+/- Y% from last week)
- Customers: X (+/- Y new/churn)
- Uptime: 99.X%
- Trading ROI: +/-X%
## Blockers / Risks
- [Blocker 1 — mitigation plan]
- [Risk 1 — action taken]
## Next Week Priorities
- [Priority 1]
- [Priority 2]
## Decisions Made (>€500)
- [Decision 1 — €X — rationale]
Alem reviews and provides feedback.
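The "+/- Y% from last week" figures in the template are plain week-over-week percentage changes. A minimal sketch of how the Revenue line could be filled in, assuming last week's value is available; the function names are hypothetical and not part of any existing tooling:

```python
def pct_change(current: float, previous: float) -> str:
    """Format the week-over-week change as a signed percentage."""
    if previous == 0:
        return "n/a"  # no baseline to compare against
    change = (current - previous) / previous * 100
    return f"{change:+.1f}%"

def revenue_line(current: float, previous: float) -> str:
    """Render the Revenue bullet of the weekly summary."""
    return f"- Revenue: €{current:,.0f} ({pct_change(current, previous)} from last week)"
```

For example, `revenue_line(1100, 1000)` renders `- Revenue: €1,100 (+10.0% from last week)`.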
3. Monthly Operations
3.1 Monthly Business Review (MBR)
Last Friday of Every Month, 14:00-16:00 CET
Attendees: Alem, John, Amina
Agenda:
- Revenue & Growth (10 min)
  - MRR, new customers, churn, LTV/CAC
  - Forecast: next 3 months
- Product & Development (10 min)
  - Features shipped this month
  - Sprint velocity trend
  - Tech debt status
- Operations (10 min)
  - Uptime, incidents, MTTR
  - Support metrics (tickets, CSAT)
- Trading (5 min)
  - Monthly P&L
  - ROI, Sharpe ratio
  - Portfolio allocation
- Risks & Compliance (10 min)
  - Top 5 risks, status
  - Compliance updates (HIPAA, audits)
- Team & People (5 min)
  - Agent utilization
  - Process improvements
  - Hiring needs (if any)
- Next Month Priorities (10 min)
  - What are we focusing on?
  - Resource allocation
Output:
- Updated priorities for next month
- Budget adjustments (if needed)
- Action items
3.2 Monthly Reporting — Financial
By 5th of Each Month:
John prepares financial report:
| Metric | Current Month | Last Month | Change |
|---|---|---|---|
| Revenue (Fast Constructions) | €X | €Y | +/- Z% |
| Expenses (total) | €X | €Y | +/- Z% |
| ├─ Infrastructure | €X | €Y | +/- Z% |
| ├─ SaaS tools | €X | €Y | +/- Z% |
| ├─ Marketing | €X | €Y | +/- Z% |
| ├─ Development (SnowIT payment) | €X | €Y | +/- Z% |
| ├─ Professional services | €X | €Y | +/- Z% |
| ├─ Insurance | €X | €Y | +/- Z% |
| Net Profit | €X | €Y | +/- Z% |
| Charity (50%) | €X (accrued) | — | — |
| Burn Rate | €X/month | — | — |
| Runway | X months | — | — |
Sent to: Alem + accountant (if hired)
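The derived rows of the report (Net Profit, Charity, Burn Rate, Runway) follow from revenue, total expenses, and the cash balance. The sketch below shows one way to compute them. It assumes burn rate means net monthly cash outflow, that no charity accrues in a loss month, and that a cash-on-hand figure is available; none of these definitions is stated in this manual.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MonthSummary:
    net_profit: float
    charity_accrued: float
    burn_rate: float
    runway_months: Optional[float]  # None when the month is not cash-negative

def summarize(revenue: float, expenses: float, cash_on_hand: float) -> MonthSummary:
    net = revenue - expenses
    # 50% of net profit is accrued for charity; nothing accrues on a loss (assumption).
    charity = 0.5 * max(net, 0.0)
    # Burn rate taken as net outflow per month (assumption).
    burn = max(expenses - revenue, 0.0)
    runway = cash_on_hand / burn if burn > 0 else None
    return MonthSummary(net, charity, burn, runway)
```

With revenue of 3,000, expenses of 5,000, and 12,000 in cash, this gives a burn rate of 2,000 per month and a runway of 6 months.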
3.3 Monthly Tasks Checklist
John's Monthly Checklist:
- Financial report prepared (by 5th)
- Monthly Business Review held (last Friday)
- Risk register reviewed (Dženan)
- Compliance checklist reviewed (Dženan)
- Infrastructure cost optimization (Nermin)
- Trading performance report (Nick)
- Customer feedback analysis (Selma)
- Sprint velocity analysis (Emir)
- Tech debt review (Lejla + Tarik)
- Backup verification (Nermin — test restore)
- Security scan (Tarik — OWASP ZAP)
- Vendor BAA review (Dženan)
- Database cleanup (old logs, expired data)
- Update MEMORY.md (add learnings from month)
4. Quarterly Operations
4.1 Quarterly Planning
Last Week of Quarter (March, June, September, December)
Attendees: Alem, John, Amina, Lejla, Selma
Agenda:
- Review last quarter (goals, achievements, misses)
- Market & competitive analysis (Selma)
- Product roadmap update (Amina + Lejla)
- Strategic priorities for next quarter (Alem)
- Budget allocation
- Hiring plan (if needed)
- Risk review (Dženan)
Output:
- OKRs (Objectives & Key Results) for next quarter
- Budget approved
- Roadmap locked for next 3 months
4.2 Quarterly Tasks
- Tech Debt Sprint — one full sprint dedicated to refactoring, testing, documentation (Lejla)
- Compliance audit — internal HIPAA audit (Dženan + Tarik)
- Infrastructure review — cost, performance, scaling plan (Nermin)
- Customer satisfaction survey — NPS survey to all customers (Selma)
- Competitive analysis — review competitors, market trends (Selma)
- Patent progress review — if filing in progress (Dženan)
- Insurance review — renew or update policies (Dženan)
- Document review — update all org docs (MEMORY.md, ORGANIZATION.md, this doc)
5. Annual Operations
5.1 Annual Planning
December (for next year)
Agenda:
- Review full year (revenue, customers, product, team)
- Set annual vision and goals (Alem)
- Product roadmap (12 months)
- Budget (annual)
- Hiring plan
- Strategic initiatives (new products, markets, partnerships)
- Charity commitment (allocate 50% of profit)
Output:
- Annual OKRs
- Annual budget
- 12-month roadmap
5.2 Annual Tasks
- Annual financial statements — P&L, balance sheet, cash flow (accountant)
- Tax filings — US (Fast Constructions), BiH (SnowIT), Norway (Alem personal)
- SOC 2 Type II audit — external audit (Dženan + auditor)
- HIPAA risk assessment — annual requirement (Dženan)
- Insurance renewal — cyber liability, E&O, general liability (Dženan)
- Charitable giving — donate 50% of net profit (Alem selects charities)
- Transparency report — publish charity donations on lumiscare.com/impact
- Team performance reviews — if real humans hired (Amina)
- Document archive — backup all docs, contracts, decisions to secure storage
6. Local Infrastructure (Mac Studio)
6.1 Services Overview
All services run locally on Mac Studio M3 Ultra (96GB RAM). Zero cloud dependency for operations.
| Service | Type | Port | Purpose |
|---|---|---|---|
| Mattermost | Docker | 8065 | Team chat (4 teams: basic, wizard, rendrom, riad) |
| Planka | Docker | 3100 | Kanban boards (boards.basicconsulting.no) |
| Documenso | Docker | 3003 | Document signing (sign.basicconsulting.no) |
| BookStack | Docker | 6875 | Wiki/documentation |
| MC Dashboard | Node.js | 3030 | Mission Control web UI (task management) |
| Ollama | Native | 11434 | Local AI (8b classify, 32b respond/code) |
External access: Cloudflare tunnels (mm/boards/sign.basicconsulting.no)
6.2 Daemons (LaunchAgents)
| Daemon | Interval | Purpose |
|---|---|---|
| com.john.ops-agent | 5 min | Autonomous ops — MM monitoring, health checks, auto-fix, task creation, intelligent responses |
| com.edita.autowork | 30 min | Background task worker (Claude haiku) |
| com.john.mc-dashboard | always | Mission Control web dashboard |
| com.john.mc-session-worker | on events | Session state extraction |
6.3 Ops Agent (Autonomous Operations)
Replaces manual MM monitoring. Runs 24/7, $0 cost (local Ollama AI).
What it does:
- Reads all MM messages from 4 teams
- Classifies via Ollama 8b: ROUTINE / TASK / INCIDENT
- ROUTINE — logs, no action
- TASK — creates MC task (BILLABLE if client team) + Planka card + MM reply
- INCIDENT — HIGH priority task + escalation to John
- Runs health checks on all 20 services every cycle
- Auto-fixes known issues (restart, cleanup) with safety limits (max 3/hour)
Runbook: ~/system/context/docs/runbooks/ops-agent.md
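The classify-then-route behavior described above can be sketched as a small dispatch function. This is an illustration in Python, not the code of ops-agent.js; the action names are hypothetical labels, and the handling of an unrecognized label is an assumption.

```python
def plan_actions(label: str, is_client_team: bool = False) -> list[str]:
    """Map a classifier label to the follow-up actions listed above."""
    if label == "ROUTINE":
        return ["log"]
    if label == "TASK":
        # Tasks from client teams are marked billable.
        task = "mc_task:billable" if is_client_team else "mc_task"
        return [task, "planka_card", "mm_reply"]
    if label == "INCIDENT":
        return ["mc_task:high_priority", "escalate_to_john"]
    # Unknown labels are escalated rather than dropped (assumption, not source behavior).
    return ["escalate_to_john"]
```

Keeping the routing in one pure function makes it easy to test separately from the Ollama call that produces the label.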
6.4 Monitoring Stack
| Tool | What It Monitors | Action |
|---|---|---|
| health-check.js | Docker (8), HTTP (6), system (2), daemons (4) | Status report |
| auto-fix.js | Service failures | Automated restart/cleanup (max 3/hour) |
| ops-agent.js | MM messages + health | Classify, respond, create tasks, fix |
| smoke-test.js | Integration tests | Pre/post deployment verification |
Dashboard: http://localhost:3030 (MC — tasks, stats, mobile-friendly)
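The "max 3/hour" safety limit on automated fixes can be enforced with a sliding-window counter. The sketch below is one possible approach in Python; whether auto-fix.js uses a sliding or fixed window, and whether the limit applies per service or globally, is not stated here, so treat both choices as assumptions.

```python
from collections import deque

class FixLimiter:
    """Allow at most max_fixes automated fixes per rolling window."""

    def __init__(self, max_fixes: int = 3, window_s: float = 3600.0):
        self.max_fixes = max_fixes
        self.window_s = window_s
        self._stamps: deque[float] = deque()

    def allow(self, now: float) -> bool:
        # Drop attempts that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_fixes:
            return False  # limit reached: leave the failure for a human
        self._stamps.append(now)
        return True
```

A caller would pass `time.time()` as `now` and skip the restart (escalating instead) whenever `allow` returns `False`.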
7. Key Performance Indicators (KPIs) {#kpis}
7.1 KPI Dashboard (Live, Updated Daily)
Owner: John (maintains dashboard)
Tool: Notion, Grafana, or Google Sheets
Metrics:
Business Metrics
| Metric | Current | Target | Trend | Owner |
|---|---|---|---|---|
| MRR (Monthly Recurring Revenue) | €X | 10%+ MoM growth | ↗️ | Selma |
| Customer Count | X | 10 (Month 6), 50 (Month 12) | ↗️ | Selma |
| Churn Rate | X% | < 5% monthly | ↘️ | Selma |
| Customer Acquisition Cost (CAC) | €X | < €500 | ↘️ | Selma |
| Lifetime Value (LTV) | €X | > €2,000 | ↗️ | Selma |
| LTV/CAC Ratio | X:1 | > 3:1 | ↗️ | Selma |
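The three unit-economics targets in this table are checked independently, and the ratio can pass while CAC alone misses its target. A minimal sketch with hypothetical function names, using the thresholds from the table:

```python
def ltv_cac_ratio(ltv: float, cac: float) -> float:
    """Lifetime value divided by customer acquisition cost."""
    if cac <= 0:
        raise ValueError("CAC must be positive")
    return ltv / cac

def meets_targets(ltv: float, cac: float) -> dict[str, bool]:
    """Compare LTV, CAC, and their ratio against the targets in the table."""
    return {
        "cac": cac < 500,                       # target: < €500
        "ltv": ltv > 2000,                      # target: > €2,000
        "ratio": ltv_cac_ratio(ltv, cac) > 3,   # target: > 3:1
    }
```

For instance, an LTV of €2,400 with a CAC of €600 yields a 4:1 ratio, which meets the ratio target even though CAC is above its own limit.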
Technical Metrics
| Metric | Current | Target | Trend | Owner |
|---|---|---|---|---|
| Uptime | 99.X% | 99.9% (LumisCare) | ↗️ | Nermin |
| API Latency (p95) | Xms | < 500ms | ↘️ | Nermin |
| Page Load Time | Xs | < 2s | ↘️ | Frontend |
| Deployment Frequency | X/week | Daily (staging), weekly (prod) | ↗️ | Nermin |
| Mean Time to Recovery (MTTR) | Xh | < 4h | ↘️ | Nermin + Lejla |
| Bug Escape Rate | X% | < 5% | ↘️ | Tarik |
| Test Coverage | X% | ≥ 80% | ↗️ | Tarik |
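An uptime target translates directly into a downtime budget: at 99.9%, a 30-day month leaves about 43 minutes of allowed downtime. The helper names below are hypothetical and the 30-day month is an assumption for illustration.

```python
def downtime_budget_minutes(target_pct: float, days: int = 30) -> float:
    """Minutes of downtime allowed in the period at the given uptime target."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - target_pct / 100)

def uptime_pct(downtime_minutes: float, days: int = 30) -> float:
    """Achieved uptime percentage for the period, given total downtime."""
    total_minutes = days * 24 * 60
    return 100 * (1 - downtime_minutes / total_minutes)
```

Comparing the month's accumulated incident minutes against `downtime_budget_minutes(99.9)` shows how much of the budget remains.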
Operational Metrics
| Metric | Current | Target | Trend | Owner |
|---|---|---|---|---|
| Sprint Velocity | X pts | Consistent ±10% | → | Emir |
| Task Completion Rate | X% | ≥ 95% | ↗️ | John |
| Support Response Time | Xmin | < 30min (Tier 1) | ↘️ | Selma |
| Customer Satisfaction (CSAT) | X/5 | ≥ 4.5/5 | ↗️ | Selma |
| Agent Utilization | X% | ≥ 70% billable | ↗️ | Amina |
Trading Metrics
| Metric | Current | Target | Trend | Owner |
|---|---|---|---|---|
| Monthly ROI | X% | ≥ 5% | ↗️ | Nick |
| Sharpe Ratio | X | > 1.5 | ↗️ | Nick |
| Portfolio Value | €X | €10,000 (current) | ↗️ | Nick |
| Stop-Loss Adherence | X% | 100% | → | Nick |
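The Sharpe ratio is the mean excess return divided by the standard deviation of excess returns. The sketch below computes it per period without annualizing. Whether the > 1.5 target refers to an annualized figure, and which risk-free rate applies, is not stated in this manual; the zero default and the use of the sample standard deviation are assumptions.

```python
from statistics import mean, stdev

def sharpe_ratio(returns: list[float], risk_free: float = 0.0) -> float:
    """Per-period Sharpe ratio from a series of periodic returns."""
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    excess = [r - risk_free for r in returns]
    spread = stdev(excess)  # sample standard deviation
    if spread == 0:
        raise ValueError("returns have zero variance")
    return mean(excess) / spread
```

Returns are expressed as fractions, so a 2% period is `0.02`.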
7.2 Monitoring & Alerting
Tools:
- Datadog: Infrastructure, APM, logs
- PagerDuty: On-call rotation, incident alerts
- Stripe Dashboard: Revenue, subscriptions
- Binance API: Trading positions, P&L
- john.db: All decisions, tasks, logs
Alert Configuration:
| Alert | Threshold | Action | Owner |
|---|---|---|---|
| Uptime < 99.9% | Downtime detected | PagerDuty → Nermin | Nermin |
| API latency > 1s (p95) | Sustained 5 min | Slack alert → Lejla | Lejla |
| Error rate > 1% | Sustained 5 min | PagerDuty → Nermin | Nermin |
| Database CPU > 80% | Sustained 10 min | Slack alert → Nermin | Nermin |
| Trading loss > 5% | Single position | Telegram → John → Nick | Nick |
| Customer churn > 10% | Monthly | Email → Selma, Amina | Selma |
| Support ticket SLA breach | > 4h unresolved | Slack alert → Selma | Selma |
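Several alerts above fire only when a threshold is exceeded for a sustained period (for example, error rate > 1% for 5 minutes). In practice this logic lives in the monitoring tool's own alert rules; the Python sketch below only illustrates the idea, and the function name and sample format are hypothetical.

```python
def sustained_breach(samples: list[tuple[float, float]],
                     threshold: float, window_s: float) -> bool:
    """True if the most recent samples stayed above threshold for at least window_s.

    samples: (timestamp_seconds, value) pairs sorted by time, oldest first.
    """
    if not samples:
        return False
    latest_ts = samples[-1][0]
    breach_start = None
    # Walk backwards from the newest sample until the value drops to or below the threshold.
    for ts, value in reversed(samples):
        if value <= threshold:
            break
        breach_start = ts
    return breach_start is not None and latest_ts - breach_start >= window_s
```

A single dip below the threshold resets the window, which keeps brief spikes from paging anyone.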
On-Call Rotation:
- Primary: Nermin (DevOps)
- Secondary: Lejla (Tech Lead)
- Escalation: John → Alem
On-Call SLA:
- Acknowledge alert: 15 min
- Begin investigation: 30 min
- Escalate if not resolved: 2 hours (P1), 4 hours (P2)
8. Tools & Systems
8.1 Core Systems
| System | Purpose | Access | Owner |
|---|---|---|---|
| john.db (SQLite) | Source of truth, all decisions logged | John, Alem (query) | John |
| GitHub | Code repository, version control | All devs | Lejla |
| AWS | Infrastructure (ECS, RDS, S3, CloudFront) | Nermin, Lejla | Nermin |
| Stripe | Payment processing, subscriptions | Selma, Amina, John | Selma |
| Binance | Trading | Nick, John | Nick |
| Jira / Linear | Task tracking, sprint management | All team | Emir |
| Datadog | Monitoring, APM, logs | Nermin, Lejla, John | Nermin |
| PagerDuty | On-call, incident alerts | Nermin, Lejla | Nermin |
| Intercom / Crisp | Customer support chat | Selma, Tarik | Selma |
| Telegram (@johnbasicas_bot) | Quick communication (Alem ↔ John) | Alem, John | John |
8.2 Tool Stack (Detailed)
Development
| Tool | Purpose | Cost | Owner |
|---|---|---|---|
| GitHub | Code repo, PRs, CI/CD | Free (public repos) | Lejla |
| GitHub Actions | CI/CD pipelines | Included | Nermin |
| VSCode / Cursor | IDE | Free | All devs |
| Playwright | E2E testing | Free | Tarik |
| Jest | Unit testing | Free | Tarik |
| ESLint / Prettier | Linting, formatting | Free | Lejla |
Infrastructure
| Tool | Purpose | Cost | Owner |
|---|---|---|---|
| AWS ECS/EKS | Container orchestration | ~€500-2,000/mo | Nermin |
| AWS RDS (PostgreSQL) | Database | ~€100-500/mo | Nermin |
| AWS S3 | File storage | ~€50-200/mo | Nermin |
| CloudFront | CDN | ~€50-200/mo | Nermin |
| Route53 | DNS | ~€10/mo | Nermin |
| Datadog | Monitoring, APM | ~€100-500/mo | Nermin |
| PagerDuty | On-call | ~€50-100/mo | Nermin |
Business & Operations
| Tool | Purpose | Cost | Owner |
|---|---|---|---|
| Stripe | Payments | 2.9% + €0.30/transaction | Selma |
| Intercom / Crisp | Support chat | ~€50-200/mo | Selma |
| Jira / Linear | Project management | ~€50-100/mo | Emir |
| Notion | Documentation, wiki | Free tier or ~€10/mo | John |
| Apollo.io | Sales outreach | ~€100/mo | Selma |
| LinkedIn Sales Navigator | Sales prospecting | ~€80/mo | Selma |
Communication
| Tool | Purpose | Cost | Owner |
|---|---|---|---|
| Telegram | Alem ↔ John quick chat | Free | John |
| Email (one.com) | External communication | ~€10/mo | John |
| Slack (future) | Team collaboration | Free tier or ~€50/mo | John |
Total Estimated Tool Cost: €1,000-3,000/month (scales with usage)
9. Disaster Recovery & Business Continuity
9.1 Backup Strategy
What We Back Up:
| Data | Backup Method | Frequency | Retention | Location |
|---|---|---|---|---|
| john.db (SQLite) | GitHub sync | Hourly | Indefinitely | github.com/johnatbasicas/clawd |
| Source code | GitHub | Every commit | Indefinitely | github.com/johnatbasicas |
| Database (PostgreSQL) | AWS RDS automated backups | Daily | 30 days | AWS S3 |
| File storage (S3) | Cross-region replication | Real-time | 90 days | AWS S3 (us-west-2) |
| Audit logs | S3 + Glacier | Daily | 6 years (HIPAA) | AWS Glacier |
| Configuration files | GitHub | Every change | Indefinitely | GitHub (encrypted secrets) |
Backup Verification: Nermin tests restore monthly.
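Between monthly restore tests, a lightweight freshness check can confirm that each backup is at least being taken on schedule. The sketch below covers three rows of the table; the dictionary keys, the 15-minute grace period, and the way last-backup timestamps are obtained are all assumptions for illustration.

```python
from datetime import datetime, timedelta

# Maximum expected age per backup, taken from the Frequency column above.
MAX_AGE = {
    "john.db": timedelta(hours=1),
    "postgresql": timedelta(days=1),
    "audit-logs": timedelta(days=1),
}

def stale_backups(last_backup: dict[str, datetime], now: datetime,
                  grace: timedelta = timedelta(minutes=15)) -> list[str]:
    """Return the names of backups that are missing or older than allowed."""
    stale = []
    for name, max_age in MAX_AGE.items():
        taken = last_backup.get(name)
        if taken is None or now - taken > max_age + grace:
            stale.append(name)
    return stale
```

A freshness check does not replace the restore test: it shows that a backup exists and is recent, not that it can be restored.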
9.2 Disaster Scenarios & Response
Scenario 1: AWS Region Failure
Impact: LumisCare production down
Response:
- Nermin deploys to secondary region (us-west-2) — automated failover
- DNS updated to point to secondary (Route53 health checks)
- Restore database from latest backup
- Verify functionality
- Notify customers (status page)
RTO (Recovery Time Objective): 2 hours
RPO (Recovery Point Objective): 24 hours (last daily backup)
Scenario 2: Data Breach (HIPAA)
Impact: PHI exposed
Response:
- Dženan activates breach response plan (see GOVERNANCE.md, section 8.3)
- Contain breach (Nermin)
- Assess impact (Lejla + Dženan)
- Notify customers (within 60 days per HIPAA)
- Notify HHS (if 500 or more individuals are affected)
- Remediate + post-mortem
Timeline: Immediate containment, notification within 60 days
Scenario 3: Key Person Unavailable (Alem, John, Lejla, Nermin)
If Alem is unavailable:
- John continues operations (delegated authority)
- Strategic decisions deferred or escalated to Asmir (SnowIT partner)
- If prolonged: Appoint interim CEO or sell
If John is unavailable:
- Task queue still processed (daemon runs 24/7)
- Amina coordinates team
- Alem steps in for strategic decisions
- John can be "rebooted" from MEMORY.md + ORGANIZATION.md
If Lejla is unavailable:
- API Developer + Frontend Specialist continue development
- Architecture decisions deferred or escalated to Amina → Alem
- Code reviews by 2 other senior devs (if available)
If Nermin is unavailable:
- Infrastructure on auto-pilot (monitoring, auto-scaling)
- Lejla handles incidents (secondary on-call)
- Engage external DevOps contractor if prolonged
Scenario 4: GitHub Outage
Impact: Can't access code, can't deploy
Response:
- Local copies of code on all dev machines
- Deploy from last known good state (Docker images cached)
- Wait for GitHub to restore (historically < 4 hours)
RTO: 4 hours
RPO: 0 (local copies)
Scenario 5: Stripe Outage
Impact: Can't process payments
Response:
- Monitor Stripe status page
- Notify customers (if prolonged)
- Wait for Stripe to restore
- No immediate action (payments retry automatically)
RTO: Dependent on Stripe
RPO: 0 (Stripe handles retries)
9.3 Runbooks
Location: ~/clawd/org/runbooks/
Runbooks Maintained by Nermin:
- runbook-db-restore.md — How to restore PostgreSQL from backup
- runbook-deploy-rollback.md — How to rollback production deployment
- runbook-region-failover.md — How to failover to secondary AWS region
- runbook-security-incident.md — How to respond to security breach
- runbook-scaling.md — How to manually scale infrastructure
- runbook-certificate-renewal.md — How to renew TLS certificates
Each runbook includes:
- When to use it
- Step-by-step instructions
- Commands to run
- Expected output
- Rollback procedure
- Contact information (who to escalate to)
Review Cadence: Quarterly (test at least one runbook per quarter)
10. Communication Channels & Etiquette
See PROCESSES.md, Section 4 for full communication protocols.
Quick Reference:
| Channel | Use Case | Response SLA |
|---|---|---|
| Telegram | Urgent, quick decisions | 5-15 min |
| CLI (Claude Code) | Deep work, architecture | Real-time |
| Email | External, formal | 4-24 hours |
| Slack (future) | Team collaboration | 1-4 hours |
| Jira/Linear | Task tracking | Daily check |
| GitHub | Code, PRs | 24 hours |
| Standup | Daily status | 9:15 AM CET |
11. Document Control
| Version | Date | Changes | Author |
|---|---|---|---|
| 1.0 | 2026-01-28 | Initial document | John + Nermin + Amina |
Next Review: 2026-04-01 (quarterly)
Owner: Alem Basic Maintained By: John (Director) + Nermin Šabić (DevOps) + Amina Hadžić (Head of Projects)
End of Operations Manual
Run the organization like a machine. Predictable, reliable, scalable. Daily, weekly, monthly, quarterly routines. KPIs tracked. Backups verified. Disasters planned for. Just execute.