# BetterStack Uptime Monitoring Setup Guide

**Last updated:** 2026-02-20
**Related:** [MONITORING.md](MONITORING.md), [health-check.sh](../../infrastructure/health-check.sh)
**Purpose:** External uptime monitoring for Drop production environment

---

## Why BetterStack?

BetterStack provides external uptime monitoring independent of Drop's infrastructure:
- Detects infrastructure failures (AWS App Runner crashes, network issues)
- Alerts when the entire application is unreachable
- Provides uptime SLA tracking and historical reports
- Multiple notification channels (Slack, Email, SMS)
- Status page for client transparency

**Key difference from internal health checks:** Internal checks (Docker, AWS App Runner) run inside Drop's own infrastructure, so they stop reporting when the container is not running. BetterStack probes from outside and still detects a total outage.

---

## Free Tier Limits

**Plan:** Free tier (no credit card required)
**Limits:**
- **10 monitors** (enough for Drop production)
- **3-minute check interval** (paid plan: 30s minimum)
- **1 status page**
- **Unlimited team members**
- **Unlimited integrations** (Slack, email, webhooks)

**Upgrade required for:**
- Faster check intervals (<3 minutes)
- More than 10 monitors (e.g., multi-region checks)
- Advanced features (maintenance windows, custom headers)

---

## Account Setup

### Step 1: Create Account

1. Go to https://betterstack.com/uptime
2. Click **"Start free trial"** (becomes free tier after trial)
3. Sign up with Alem's email: `alem@alai.no`
4. Verify email address
5. Set the workspace name to **"ALAI Products"** (shared across Drop, BasicFakta)

### Step 2: Configure Team

1. Navigate to **Settings** > **Team**
2. Add team members:
   - `alem@alai.no` (Owner)
   - `john@basicconsulting.no` (Admin)
3. Set **Default timezone:** `Europe/Oslo` (UTC+1 in winter, UTC+2 during daylight saving time)

---

## Monitor Configuration

### Monitor 1: Health Endpoint (Primary)

**Purpose:** Verify API health and database connectivity

1. Go to **Monitors** > **Create Monitor**
2. Configure:
   - **Monitor name:** `Drop Health Check`
   - **Monitor type:** `HTTP`
   - **URL:** `https://drop.alai.no/api/health`
   - **Check interval:** `3 minutes` (free tier)
   - **Request timeout:** `5 seconds`
   - **Method:** `GET`
   - **Confirmation period:** `30 seconds` (1 retry before alerting)

3. **Expected Response:**
   - **Status code:** `200`
   - **Keyword check:** Enable
     - Response body contains: `"status":"ok"`
     - **Why:** Confirms the response body reports a healthy status, not just that the server returned HTTP 200

4. **Advanced settings:**
   - **Follow redirects:** `Enabled` (default)
   - **Verify SSL certificate:** `Enabled`
   - **SSL expiry warning:** `14 days before expiration`

5. Click **Create Monitor**
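
To make the keyword check in step 3 concrete, here is a minimal sketch of how a status-code-plus-keyword check could evaluate a response. It assumes the keyword is matched as a plain substring; `evaluate_check`, `CheckResult`, and the `db` field in the sample bodies are hypothetical names for illustration, not BetterStack's implementation or Drop's actual response.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    up: bool
    reason: str


def evaluate_check(status_code: int, body: str, keyword: str,
                   expected_status: int = 200) -> CheckResult:
    """Mimic a status-code-plus-keyword check on a single HTTP response."""
    if status_code != expected_status:
        return CheckResult(False, f"unexpected status {status_code}")
    # Assumption: plain substring match, so spacing and quoting must match exactly.
    if keyword not in body:
        return CheckResult(False, "keyword not found")
    return CheckResult(True, "ok")


KEYWORD = '"status":"ok"'

# Sample bodies are illustrative; the real health endpoint may return other fields.
healthy = evaluate_check(200, '{"status":"ok","db":"connected"}', KEYWORD)
degraded = evaluate_check(200, '{"status":"error","db":"down"}', KEYWORD)
spaced = evaluate_check(200, '{"status": "ok"}', KEYWORD)
```

Under that assumption, `degraded` fails even though the server answered 200, which is the case the keyword check exists for. `spaced` also fails, so it may be worth confirming that the health endpoint serializes its JSON without a space after the colon.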

---

### Monitor 2: Landing Page

**Purpose:** Verify public website availability

1. Go to **Monitors** > **Create Monitor**
2. Configure:
   - **Monitor name:** `Drop Landing Page`
   - **Monitor type:** `HTTP`
   - **URL:** `https://drop.alai.no`
   - **Check interval:** `3 minutes`
   - **Request timeout:** `10 seconds` (landing page has more assets)
   - **Method:** `GET`
   - **Confirmation period:** `30 seconds`

3. **Expected Response:**
   - **Status code:** `200`
   - **Keyword check:** Enable
     - Response body contains: `Send penger` (tagline verification)

4. Click **Create Monitor**

---

### Monitor 3: Multi-Region Health Check

**Purpose:** Detect regional networking issues

1. Go to **Monitors** > **Create Monitor**
2. Configure:
   - **Monitor name:** `Drop Health (US East)`
   - **Monitor type:** `HTTP`
   - **URL:** `https://drop.alai.no/api/health`
   - **Check interval:** `3 minutes`
   - **Request timeout:** `5 seconds`
   - **Method:** `GET`
   - **Confirmation period:** `30 seconds`

3. **Expected Response:**
   - **Status code:** `200`
   - **Keyword check:** Response body contains `"status":"ok"`

4. **Advanced settings:**
   - **Region:** `US East` (different from default EU region)
   - **Why:** Detects if Drop is unreachable from specific geographies

5. Click **Create Monitor**
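
One way to keep the three monitor definitions reviewable is to mirror them as plain data and check them against the free-tier limits listed earlier. The sketch below is illustrative only: the `Monitor` class, the `free_tier_violations` helper, and the `"EU"` region label are hypothetical names, not part of BetterStack's API.

```python
from dataclasses import dataclass

FREE_TIER_MAX_MONITORS = 10
FREE_TIER_MIN_INTERVAL_S = 180  # 3-minute check interval


@dataclass(frozen=True)
class Monitor:
    name: str
    url: str
    interval_s: int
    timeout_s: int
    confirmation_s: int
    keyword: str
    region: str = "EU"  # hypothetical label for the default region


HEALTH_URL = "https://drop.alai.no/api/health"

MONITORS = [
    Monitor("Drop Health Check", HEALTH_URL, 180, 5, 30, '"status":"ok"'),
    Monitor("Drop Landing Page", "https://drop.alai.no", 180, 10, 30, "Send penger"),
    Monitor("Drop Health (US East)", HEALTH_URL, 180, 5, 30, '"status":"ok"',
            region="US East"),
]


def free_tier_violations(monitors: list) -> list:
    """Return human-readable problems; an empty list means the set fits the free tier."""
    problems = []
    if len(monitors) > FREE_TIER_MAX_MONITORS:
        problems.append(f"{len(monitors)} monitors exceed the limit of {FREE_TIER_MAX_MONITORS}")
    for monitor in monitors:
        if monitor.interval_s < FREE_TIER_MIN_INTERVAL_S:
            problems.append(f"{monitor.name}: interval {monitor.interval_s}s is below 180s")
    return problems
```

Calling `free_tier_violations(MONITORS)` on the three monitors above returns an empty list, since all of them use the 3-minute interval.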

---

## Slack Integration

### Step 1: Create Slack Incoming Webhook

1. Go to your Slack workspace: **alai-talk.slack.com**
2. Navigate to **Slack App Directory** > **Incoming Webhooks**
3. Click **Add to Slack**
4. Select channel: **#drop-ops** (create it if it doesn't exist)
5. Click **Add Incoming Webhooks Integration**
6. Copy webhook URL (format: `https://hooks.slack.com/services/T.../B.../XXX`)
7. Save this URL securely (needed for BetterStack)
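
Before pasting the webhook into BetterStack, it can help to confirm that it posts to the right channel. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the webhook URL is passed in by the caller rather than hard-coded, since it should be treated as a secret.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build the JSON POST that a Slack incoming webhook expects."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_message(webhook_url: str, text: str, timeout_s: float = 10.0) -> int:
    """Send the message and return the HTTP status code (200 on success)."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status
```

For example, `send_slack_message(os.environ["SLACK_WEBHOOK_URL"], "Webhook test for #drop-ops")` would post a test message, where `SLACK_WEBHOOK_URL` is an assumed environment variable name holding the URL copied in step 6.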

### Step 2: Add Slack Integration in BetterStack

1. In BetterStack, go to **Integrations**
2. Click **Add Integration** > **Slack**
3. Paste webhook URL from Step 1
4. Configure:
   - **Integration name:** `Drop Ops Slack`
   - **Notification channel:** `#drop-ops`
5. **Test integration:** Click **Send test message**
   - Verify message appears in `#drop-ops` channel
6. Click **Save Integration**

---

## On-Call Team Setup

### Step 1: Create On-Call Schedule

1. Go to **On-Call** > **Create Schedule**
2. Configure:
   - **Schedule name:** `Drop Primary On-Call`
   - **Timezone:** `Europe/Oslo`
3. Add rotation:
   - **Team member:** `alem@alai.no`
   - **Schedule type:** `24/7` (always on-call for now)
4. Click **Create Schedule**

### Step 2: Configure Escalation Policy

1. Go to **Escalation Policies** > **Create Policy**
2. Configure:
   - **Policy name:** `Drop Production Incidents`
3. Add escalation steps:

   **Step 1 (Immediate):**
   - **Who:** `Drop Ops Slack` integration
   - **Delay:** `0 minutes`

   **Step 2 (If still down after 5 minutes):**
   - **Who:** `alem@alai.no` (Email)
   - **Delay:** `5 minutes`

   **Step 3 (If still down after 15 minutes):**
   - **Who:** `alem@alai.no` (SMS)
   - **Delay:** `15 minutes`
   - **Note:** SMS requires a paid plan or a verified phone number; add Alem's phone number before relying on this step

4. Click **Create Policy**
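
The escalation timeline can be sketched as a small lookup. This assumes each delay is measured from the start of the incident, as the step headings suggest; if BetterStack measures a delay from the previous step instead, the later steps would fire later than shown. `ESCALATION_STEPS` and `notifications_sent` are hypothetical names.

```python
# (delay in minutes from incident start, notification target)
ESCALATION_STEPS = [
    (0, "Slack: Drop Ops Slack"),
    (5, "Email: alem@alai.no"),
    (15, "SMS: alem@alai.no"),
]


def notifications_sent(minutes_down: float) -> list[str]:
    """List every notification that has fired after the given downtime."""
    return [target for delay, target in ESCALATION_STEPS if minutes_down >= delay]


after_two_minutes = notifications_sent(2)    # Slack only
after_six_minutes = notifications_sent(6)    # Slack and email
after_an_hour = notifications_sent(60)       # all three steps
```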

### Step 3: Assign Policy to Monitors

1. Go to **Monitors**
2. For each monitor (`Drop Health Check`, `Drop Landing Page`, `Drop Health (US East)`):
   - Click monitor name
   - Go to **Settings** > **Escalation Policy**
   - Select: `Drop Production Incidents`
   - Click **Save**

---

## Status Page Setup

### Purpose
A public status page lets clients and stakeholders check Drop availability without contacting support.

### Step 1: Create Status Page

1. Go to **Status Pages** > **Create Status Page**
2. Configure:
   - **Page name:** `Drop Status`
   - **Subdomain:** `drop-status` (URL: `https://drop-status.betteruptime.com`)
   - **Custom domain (optional):** `status.drop.alai.no` (requires DNS setup)

3. **Design settings:**
   - **Logo:** Upload Drop logo (green rounded rectangle)
   - **Brand color:** `#0B6E35` (Drop primary green)
   - **Header text:** `Drop Status`
   - **Tagline:** `Real-time service status and incident updates`

4. **Visibility:**
   - **Public:** Yes (anyone can view)
   - **Search engine indexing:** No (prevent Google indexing)

5. Click **Create Status Page**

### Step 2: Add Components

1. In the status page settings, go to **Components**
2. Click **Add Component**
3. Add three components:

   **Component 1:**
   - **Name:** `API & Health Endpoint`
   - **Linked monitor:** `Drop Health Check`
   - **Description:** `Core API functionality and database connectivity`

   **Component 2:**
   - **Name:** `Landing Page`
   - **Linked monitor:** `Drop Landing Page`
   - **Description:** `Public website and marketing content`

   **Component 3:**
   - **Name:** `Global Network`
   - **Linked monitor:** `Drop Health (US East)`
   - **Description:** `International access and routing`

4. Click **Save Components**

### Step 3: Configure Incident Communication

1. Go to **Status Pages** > **Settings** > **Incident Updates**
2. Enable:
   - **Auto-create incidents:** Yes (when monitor goes down)
   - **Auto-resolve incidents:** Yes (when monitor recovers)
3. **Notification subscribers:**
   - **Email subscriptions:** Enabled (users can subscribe to updates)
   - **Webhook notifications:** Disabled (optional for future)

### Step 4: Share Status Page

Once created, share the status page URL:
- **Internal:** Add to `#drop-ops` Slack channel description
- **External:** Link from Drop landing page footer (optional)
- **Clients:** Include in onboarding emails

**Status Page URL:** `https://drop-status.betteruptime.com`

---

## Verification Checklist

After completing setup, verify:

- [ ] **Monitors running:** All 3 monitors show green status
- [ ] **Slack alerts working:** Run Test 1 under **Testing the Setup** and confirm the down alert reaches `#drop-ops`
- [ ] **Email notifications working:** Verify Alem receives email on test alert
- [ ] **Status page public:** Open status page URL in incognito mode
- [ ] **Escalation policy assigned:** All monitors use `Drop Production Incidents` policy
- [ ] **SSL expiry alerts:** Monitors configured to warn 14 days before cert expiration

---

## Testing the Setup

### Test 1: Manual Down Alert

1. Go to **Monitors** > `Drop Health Check`
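2. Temporarily change the keyword check to a string the response never contains (for example `"status":"test-alert"`) and save. Pausing the monitor is not a valid test: a paused monitor stops checking, so it raises no alert.
3. **Expected behavior:**
   - Slack alert in `#drop-ops` once the next check fails and the 30-second confirmation period has passed
   - Email to `alem@alai.no` after 5 minutes (if the keyword is still wrong)
4. Restore the keyword to `"status":"ok"` and verify the recovery alert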

### Test 2: Actual Downtime

1. SSH into production server (or use AWS App Runner console)
2. Stop the Drop application container temporarily
3. Wait for BetterStack to detect downtime (max 3 minutes + 30s confirmation)
4. **Expected behavior:**
   - Monitor shows red status
   - Slack alert in `#drop-ops`
   - Status page component shows "Down"
5. Restart application and verify recovery alert
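
The detection bound in step 3 is simple arithmetic, sketched below. It assumes the outage begins immediately after a successful check and that the confirmation period starts when the first failed check is recorded. The optional timeout term is an extra assumption: a hung connection may take the full request timeout before it counts as a failure.

```python
def worst_case_detection_s(check_interval_s: int, confirmation_s: int,
                           request_timeout_s: int = 0) -> int:
    """Upper bound on seconds between the start of an outage and the first alert."""
    return check_interval_s + request_timeout_s + confirmation_s


baseline = worst_case_detection_s(180, 30)          # 210 seconds (3 min 30 s)
with_timeout = worst_case_detection_s(180, 30, 5)   # 215 seconds
```

If no alert has arrived roughly four minutes after stopping the container, the alerting path is more likely at fault than the check timing.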

### Test 3: SSL Expiry Warning

1. Go to **Monitors** > `Drop Health Check`
2. Verify **SSL expiry warning** is enabled (14 days)
3. **Expected behavior:**
   - Alert sent 14 days before SSL certificate expiration
   - Action required: Renew certificate before expiry
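
Because the real warning only fires 14 days before expiry, the certificate date can also be checked by hand. This is a minimal sketch using Python's standard `ssl` module; the function names are hypothetical, and `fetch_not_after` opens a live TLS connection, so it needs network access to the host.

```python
import socket
import ssl


def fetch_not_after(host: str, port: int = 443, timeout_s: float = 10.0) -> str:
    """Return the certificate's notAfter field, e.g. 'Mar  6 23:59:00 2026 GMT'."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now_ts: float) -> float:
    """Days between now_ts (Unix seconds) and the certificate's expiry."""
    expiry_ts = ssl.cert_time_to_seconds(not_after)
    return (expiry_ts - now_ts) / 86400


def needs_renewal(not_after: str, now_ts: float, warning_days: int = 14) -> bool:
    """True once the certificate is inside the warning window."""
    return days_until_expiry(not_after, now_ts) <= warning_days
```

For example, `needs_renewal(fetch_not_after("drop.alai.no"), time.time())` would report whether the certificate is already inside the 14-day window.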

---

## Alert Examples

### Downtime Alert (Slack)

```
🚨 Drop Health Check is DOWN

Monitor: Drop Health Check
Status: DOWN
Response: Connection timeout
Region: EU West
Time: 2026-02-20 10:30 UTC

View incident: https://betterstack.com/incidents/...
```

### Recovery Alert (Slack)

```
✅ Drop Health Check is UP

Monitor: Drop Health Check
Status: UP
Response: 200 OK (2ms)
Downtime duration: 3 minutes
Time: 2026-02-20 10:33 UTC

Incident closed: https://betterstack.com/incidents/...
```

### SSL Expiry Warning (Email)

```
Subject: [BetterStack] SSL certificate expiring in 14 days

Monitor: Drop Health Check
Domain: drop.alai.no
Certificate expiry: 2026-03-06 23:59 UTC

Action required: Renew SSL certificate before expiration.
```

---

## Maintenance Mode

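When performing planned maintenance (deployments, infrastructure upgrades), schedule a maintenance window. Maintenance windows are listed under **Upgrade required for** in the Free Tier Limits section, so these steps apply only after upgrading: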

1. Go to **Maintenance Windows** > **Create Window**
2. Configure:
   - **Name:** `Drop Deployment`
   - **Start time:** `2026-02-20 22:00 UTC`
   - **Duration:** `1 hour`
   - **Affected monitors:** Select all Drop monitors
3. **Notification:**
   - **Status page update:** Yes (shows maintenance banner)
   - **Alert suppression:** Yes (no downtime alerts during window)
4. Click **Create Maintenance Window**

**Effect:** During maintenance, downtime alerts are suppressed and status page shows "Scheduled Maintenance" instead of "Down".
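
The suppression rule can be pictured as a half-open time interval. The sketch below uses the example window from step 2; `in_maintenance` is a hypothetical helper, and treating the end of the window as exclusive is an assumption rather than documented BetterStack behavior.

```python
from datetime import datetime, timedelta, timezone

WINDOW_START = datetime(2026, 2, 20, 22, 0, tzinfo=timezone.utc)
WINDOW_DURATION = timedelta(hours=1)


def in_maintenance(at: datetime, start: datetime = WINDOW_START,
                   duration: timedelta = WINDOW_DURATION) -> bool:
    """True if a failure at `at` falls inside the window [start, start + duration)."""
    return start <= at < start + duration


during_deploy = in_maintenance(datetime(2026, 2, 20, 22, 30, tzinfo=timezone.utc))
after_deploy = in_maintenance(datetime(2026, 2, 20, 23, 0, tzinfo=timezone.utc))
```

A failure at 22:30 UTC falls inside the window and would be suppressed, while one at 23:00 UTC falls outside it and would alert as usual, so a deployment that overruns needs the window extended.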

---

## Best Practices

### Do's
- ✅ **Test alerts monthly:** Run Test 1 under **Testing the Setup** to verify escalation works
- ✅ **Update on-call schedule:** Rotate on-call duty if the team grows
- ✅ **Monitor SSL expiry:** Enable 14-day warnings to prevent outages
- ✅ **Use maintenance windows:** Prevent false alerts during deployments
- ✅ **Review incident history:** Review downtime patterns monthly

### Don'ts
- ❌ **Don't ignore degraded status:** Investigate even if the service is not fully down
- ❌ **Don't disable monitors:** Use pause for temporary suppression only
- ❌ **Don't skip keyword checks:** HTTP 200 alone doesn't guarantee a working API
- ❌ **Don't forget to update URLs:** When the domain changes, update all monitors
- ❌ **Don't rely solely on external monitoring:** Combine it with internal health checks

---

## Troubleshooting

### Monitor shows false positives (frequent up/down)

**Cause:** Network instability or slow response times
**Fix:**
1. Increase **Request timeout** from 5s to 10s
2. Increase **Confirmation period** from 30s to 60s
3. Check Drop API latency in logs

### Slack alerts not received

**Cause:** Webhook URL incorrect or channel archived
**Fix:**
1. Go to **Integrations** > `Drop Ops Slack`
2. Click **Send test message**
3. If the test fails, regenerate the webhook in Slack and update the URL in BetterStack

### Email alerts delayed

**Cause:** Email provider spam filtering
**Fix:**
1. Whitelist `notifications@betterstack.com` in email settings
2. Check spam/junk folder
3. Verify email address in BetterStack team settings

### Status page not updating

**Cause:** Monitor not linked to status page component
**Fix:**
1. Go to **Status Pages** > `Drop Status` > **Components**
2. Ensure each component has a **Linked monitor** assigned
3. Save changes and trigger test alert

---

## Related Documentation

- [MONITORING.md](MONITORING.md) — Full monitoring stack overview
- [health-check.sh](../../infrastructure/health-check.sh) — Internal health check script
- [alerts.ts](../../src/drop-app/src/lib/alerts.ts) — Slack alerting implementation
- [/api/health route](../../src/drop-app/src/app/api/health/route.ts) — Health endpoint source code

---

## Support

**BetterStack Support:**
- Documentation: https://betterstack.com/docs
- Email: support@betterstack.com
- Status: https://status.betterstack.com

**Internal Contact:**
- Slack: `#drop-ops`
- Email: `alem@alai.no`