Bilko Sentinel — Tier-0 Self-Healing Agent 2026-06-10

Status

LIVE and Proveo-verified as of 2026-06-10. MC #103337 covers both the AgentForge implementation and Proveo's independent verification. Parent MC #103328.

What It Is

Bilko Sentinel is a read-only ops agent that runs on ANVIL every 3 minutes. It follows a four-stage pipeline:

Detect — queries the 10 conditions of the 8 GCP Cloud Monitoring alert policies via the Monitoring REST API (GET only), then evaluates the last 6 minutes of time-series data locally against each condition's threshold.
Enrich — on a breach, fetches recent Cloud Run logs and the current revision/traffic split for the affected service.
Diagnose — calls FORGE Ollama (qwen2.5:7b-instruct-q8_0 at 10.0.0.2:11434) with a structured JSON prompt (temperature 0.1) to produce a root-cause hypothesis and recommended action. Falls back to a deterministic template per cause category if Ollama is unreachable.
Propose — posts exactly one structured proposal per unique incident to Slack #ceo and email [email protected]. Deduplicates by incident key; does not re-notify the same breach for 24 hours.
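The Detect stage's local evaluation can be sketched as follows. This is a minimal sketch, not the code in bilko-sentinel.js. It assumes the condition and time-series objects follow the Cloud Monitoring v3 REST shapes (`conditionThreshold.comparison`, `conditionThreshold.thresholdValue`, `points[].interval.endTime`, `points[].value`). The helper names are hypothetical. The sketch also ignores the condition's `duration` and aggregation settings, which a faithful evaluator would need to honor.

```javascript
// Minimal sketch of the Detect stage's local threshold check.
// Hypothetical helper names; input shapes follow the Cloud Monitoring v3 REST API.
// Ignores the condition's `duration` and aggregation settings.

const WINDOW_MS = 6 * 60 * 1000; // only points from the last 6 minutes count

// TypedValue: doubleValue is a number; int64Value is serialized as a string in JSON.
function pointValue(point) {
  const v = point.value || {};
  if (v.doubleValue !== undefined) return Number(v.doubleValue);
  if (v.int64Value !== undefined) return Number(v.int64Value);
  return NaN; // other value types (bool, distribution) are not handled here
}

function crosses(value, comparison, threshold) {
  switch (comparison) {
    case "COMPARISON_GT": return value > threshold;
    case "COMPARISON_GE": return value >= threshold;
    case "COMPARISON_LT": return value < threshold;
    case "COMPARISON_LE": return value <= threshold;
    default: return false; // unknown comparison: never report a breach
  }
}

// Returns whether any recent point crosses the threshold, plus the offending values.
function evaluateCondition(condition, timeSeries, now = Date.now()) {
  const { comparison, thresholdValue = 0 } = condition.conditionThreshold;
  const offending = [];
  for (const series of timeSeries) {
    for (const point of series.points || []) {
      const t = Date.parse(point.interval.endTime);
      const v = pointValue(point);
      const recent = t <= now && now - t <= WINDOW_MS;
      if (recent && !Number.isNaN(v) && crosses(v, comparison, thresholdValue)) {
        offending.push(v);
      }
    }
  }
  return { breached: offending.length > 0, offending, threshold: thresholdValue };
}
```

Evaluating locally keeps the Detect stage to GET calls only. The API returns raw points, and the breach decision happens in the script.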

It never changes anything. Proveo independently verified that the script contains zero mutating verbs and performs no GCP mutations of any kind: no run deploy, no set-iam-policy, no SQL writes, no secret writes. The only HTTP POST in the script goes to the self-hosted Ollama inference endpoint on FORGE, not to googleapis.com.
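The Diagnose call, which is the script's one HTTP POST, might look roughly like the sketch below. The endpoint, model tag, and temperature 0.1 come from this page. The request fields `stream`, `format: "json"`, and `options.temperature` belong to Ollama's `/api/generate` API. Several details are hypothetical:

- the prompt wording
- the `hypothesis`/`action` output keys
- the 30-second timeout
- the `FALLBACKS` templates

`fetchImpl` is injectable so the fallback path can be exercised without a network. The sketch assumes Node.js 18 or later for the global `fetch`.

```javascript
// Sketch of the Diagnose stage: Ollama first, deterministic fallback second.
// Endpoint, model, and temperature are from the runbook; everything else is hypothetical.
// Requires Node.js 18+ (global fetch, AbortSignal.timeout).

const OLLAMA_URL = "http://10.0.0.2:11434/api/generate";
const MODEL = "qwen2.5:7b-instruct-q8_0";

// Hypothetical per-category templates used when Ollama is unreachable or returns junk.
const FALLBACKS = {
  uptime: { hypothesis: "Service not answering uptime checks.", action: "Check the latest Cloud Run revision and its startup logs." },
  http5xx: { hypothesis: "Application errors after a change or dependency failure.", action: "Compare the current revision with the previous one; inspect 5xx logs." },
  default: { hypothesis: "Metric crossed its threshold; cause unknown.", action: "Review the linked incident and recent logs." },
};

async function diagnose(incident, fetchImpl = globalThis.fetch) {
  const prompt =
    'Reply with JSON {"hypothesis": string, "action": string} for this incident:\n' +
    JSON.stringify(incident);
  try {
    const res = await fetchImpl(OLLAMA_URL, {
      method: "POST", // the only POST; it targets FORGE, never googleapis.com
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: MODEL,
        prompt,
        stream: false, // one JSON response instead of a token stream
        format: "json", // ask Ollama to constrain output to valid JSON
        options: { temperature: 0.1 },
      }),
      signal: AbortSignal.timeout(30_000),
    });
    if (!res.ok) throw new Error(`Ollama HTTP ${res.status}`);
    const out = JSON.parse((await res.json()).response);
    if (typeof out.hypothesis !== "string" || typeof out.action !== "string") {
      throw new Error("unexpected Ollama output shape");
    }
    return { source: "ollama", hypothesis: out.hypothesis, action: out.action };
  } catch {
    // Unreachable host, timeout, HTTP error, or malformed JSON all land here.
    const t = FALLBACKS[incident.category] || FALLBACKS.default;
    return { source: "fallback", ...t };
  }
}
```

Tagging each result with `source` lets the proposal say whether the hypothesis came from Ollama or from the deterministic fallback.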

Infrastructure

Component	Location
Script	`/Users/makinja/system/tools/bilko-sentinel.js`
LaunchAgent plist	`/Users/makinja/Library/LaunchAgents/com.alai.bilko-sentinel.plist`
State file (dedup)	`/Users/makinja/system/state/bilko-sentinel-state.json`
Audit log	`/Users/makinja/system/logs/bilko-sentinel-audit.jsonl`
Run log	`/Users/makinja/system/logs/bilko-sentinel.log`
Host	ANVIL (makinja local Mac)
Schedule	180-second interval, RunAtLoad=true
Node.js path	`/opt/homebrew/bin/node`

Policies Monitored (8 policies, 10 conditions)

Cloud SQL CPU utilization high (prod + stage)
Container restart/crash on prod services
HTTP 5xx rate high on bilko-api-demo
HTTP 5xx rate high on bilko-web-demo
Request latency P95 high on prod services (API + Web — 2 conditions)
CIAM — High 429 rate on bilko-api-demo (legacy from MC #103245)
Cloud SQL connections near max on bilko-demo-db
Uptime check failed (app.bilko.cloud + app-api.bilko.cloud — 2 conditions)

Severity Scale

Label	Meaning
P1-DOWN	Service is down or uptime check failing
P2-DEGRADED	Elevated error rate or restart loop
P3-WARN	Latency spike, DB pressure, CIAM abuse rate
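The mapping from the monitored policies to these labels can be expressed as a lookup table, sketched below. The category keys are illustrative, not the script's own identifiers. Adding a rank also lets a cycle that sees several breaches handle the most severe one first.

```javascript
// Hypothetical mapping from policy category to severity label, per the severity table.
const SEVERITY_BY_CATEGORY = Object.freeze({
  uptime: "P1-DOWN", // uptime check failing
  http5xx: "P2-DEGRADED", // elevated error rate
  restart: "P2-DEGRADED", // container restart/crash loop
  latency: "P3-WARN", // P95 latency spike
  sqlCpu: "P3-WARN", // DB pressure
  sqlConnections: "P3-WARN", // DB pressure
  ciam429: "P3-WARN", // CIAM abuse rate
});

const RANK = { "P1-DOWN": 1, "P2-DEGRADED": 2, "P3-WARN": 3 };

function severityFor(category) {
  const s = SEVERITY_BY_CATEGORY[category];
  if (!s) throw new Error(`unknown policy category: ${category}`);
  return s;
}

// Comparator: most severe breach first.
function bySeverity(a, b) {
  return RANK[severityFor(a.category)] - RANK[severityFor(b.category)];
}
```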

Notification Format

Every proposal contains:

Header: BILKO SENTINEL — PROPOSAL (Tier-0, no action taken)
Incident ID, severity, env, resource, condition name
Metric value vs threshold (exact numbers)
Root-cause hypothesis (Ollama-generated or deterministic fallback)
Proposed remediation steps (for human to execute)
GCP Console link for the alert incident
Detected timestamp
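The sketch below renders those fields as a plain-text message. Only the header text and the list of fields come from this page. The incident object's field names and the line layout are hypothetical.

```javascript
// Render a Tier-0 proposal as plain text. Field names and layout are hypothetical.
const HEADER = "BILKO SENTINEL \u2014 PROPOSAL (Tier-0, no action taken)";

function formatProposal(inc) {
  return [
    HEADER,
    `Incident: ${inc.id}  Severity: ${inc.severity}  Env: ${inc.env}`,
    `Resource: ${inc.resource}`,
    `Condition: ${inc.condition}`,
    `Metric: ${inc.value} (threshold ${inc.threshold})`, // raw numbers, no rounding applied
    `Hypothesis [${inc.diagnosisSource}]: ${inc.hypothesis}`,
    "Proposed remediation (for a human to execute):",
    ...inc.steps.map((step, i) => `  ${i + 1}. ${step}`),
    `Console: ${inc.consoleUrl}`,
    `Detected: ${inc.detectedAt}`,
  ].join("\n");
}
```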

Dedup key format: bilko-{policyId[-8:]}-{condId[-8:]}, where [-8:] is the last 8 characters of the alert policy ID and the condition ID. Once a key has been notified, that condition stays silent for 24 hours.
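The key construction and 24-hour suppression can be sketched as below. `last8` is the JavaScript equivalent of the `[-8:]` slice. Two parts are assumptions, not a description of bilko-sentinel-state.json: the state-file shape (a map from key to last-notified epoch milliseconds) and the write-then-rename persistence.

```javascript
const fs = require("node:fs");

const SUPPRESS_MS = 24 * 60 * 60 * 1000; // 24-hour silence per condition

const last8 = (id) => String(id).slice(-8); // same as Python's id[-8:]

function dedupKey(policyId, condId) {
  return `bilko-${last8(policyId)}-${last8(condId)}`;
}

// Assumed state shape: { [dedupKey]: lastNotifiedEpochMs }
function loadState(file) {
  try {
    return JSON.parse(fs.readFileSync(file, "utf8"));
  } catch (err) {
    if (err.code === "ENOENT") return {}; // first run: nothing notified yet
    throw err; // corrupt JSON or permission error: fail loudly
  }
}

function saveState(file, state) {
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(state, null, 2));
  fs.renameSync(tmp, file); // rename is atomic on the same filesystem
}

// True if this breach should be notified now; records the notification time in state.
function shouldNotify(state, key, now = Date.now()) {
  const last = state[key];
  if (typeof last === "number" && now - last < SUPPRESS_MS) return false;
  state[key] = now;
  return true;
}
```

Because launchd starts a fresh Node process every 180 seconds, the state must round-trip through disk on each cycle. That is the property the 2-cycle persistence test exercised.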

Proveo Verification Summary

Proveo (MC #103337) independently verified all critical properties:

Property	Method	Result
Read-only guarantee	Exhaustive grep of all spawnSync calls and HTTP methods	CONFIRMED — zero mutating verbs
LaunchAgent loaded + healthy	`launchctl list | grep bilko-sentinel` (LastExitStatus=0)	PASS
Detect → Propose → Slack delivery	Independent verifier script with synthetic threshold (2ms vs real 9.5ms P95)	PASS — Slack message confirmed in #ceo at 04:24 UTC
Detect → Propose → Email delivery	Same synthetic test	PASS — Message-ID confirmed in audit DB
Dedup across cycles	Real 2-cycle disk-persistence test (not code inspection only)	PASS — Cycle 2 silent, no second Slack message
Healthy = silent	Normal threshold against real metric value	PASS — zero messages sent
No GCP mutation	Cloud Run revision before/after comparison	PASS — bilko-api-demo-00167-h9v unchanged

Gaps flagged by AgentForge, and how each was closed:

- The email exit-code quirk was fixed in the script via a stdout check.
- The dedup 2-cycle behavior is now independently proven by Proveo.
- Ollama was not re-exercised in the Proveo test, but AgentForge's own synthetic test confirmed it live.

Runbook

Pause sentinel

launchctl unload ~/Library/LaunchAgents/com.alai.bilko-sentinel.plist

Resume sentinel

launchctl load ~/Library/LaunchAgents/com.alai.bilko-sentinel.plist

Check last run status

launchctl list | grep bilko-sentinel
# Columns: PID, last exit status, label. PID "-" = not currently running (between intervals); exit status 0 = last run succeeded (healthy).

tail -20 /Users/makinja/system/logs/bilko-sentinel.log
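If this check needs to be scripted, for example from another monitor, the `launchctl list` row can be parsed as below. `launchctl list` prints three columns: PID ("-" when not running), last exit status, and label. The function names are hypothetical, and `checkSentinel` only works on macOS, where `launchctl` exists.

```javascript
const { execFileSync } = require("node:child_process");

const LABEL = "com.alai.bilko-sentinel";

// Parse one `launchctl list` row: PID, last exit status, label.
// Per launchctl(1), a negative status is the negated signal that stopped the job.
function parseLaunchctlRow(row) {
  const [pid, status, label] = row.trim().split(/\s+/);
  return {
    label,
    running: pid !== "-", // "-" means between intervals
    lastExitStatus: Number(status),
    healthy: Number(status) === 0,
  };
}

// macOS only: find the sentinel's row; null means the agent is not loaded.
function checkSentinel() {
  const out = execFileSync("launchctl", ["list"], { encoding: "utf8" });
  const row = out.split("\n").find((line) => line.trim().endsWith(LABEL));
  return row ? parseLaunchctlRow(row) : null;
}
```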

View audit trail

tail -f /Users/makinja/system/logs/bilko-sentinel-audit.jsonl
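For a quick summary instead of a live tail, the JSONL file can be streamed line by line and counted by record type, as sketched below. The `event` field name is an assumption about the audit schema, not taken from the script. Malformed lines are counted rather than fatal, since a tail read may catch a partially written last line.

```javascript
const fs = require("node:fs");
const readline = require("node:readline");

// Count audit records by type. The `event` field is a hypothetical schema field.
async function summarizeAudit(file) {
  const counts = {};
  let malformed = 0;
  const rl = readline.createInterface({ input: fs.createReadStream(file), crlfDelay: Infinity });
  for await (const line of rl) {
    if (!line.trim()) continue; // skip blank lines
    try {
      const kind = JSON.parse(line).event ?? "unknown";
      counts[kind] = (counts[kind] || 0) + 1;
    } catch {
      malformed += 1; // e.g. a partially written last line
    }
  }
  return { counts, malformed };
}
```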

Tune alert thresholds

Unload the agent, edit the ALERT_POLICIES array in /Users/makinja/system/tools/bilko-sentinel.js, then load it again:

launchctl unload ~/Library/LaunchAgents/com.alai.bilko-sentinel.plist
# edit the script
launchctl load ~/Library/LaunchAgents/com.alai.bilko-sentinel.plist

Tier Model and Safety Rationale

The tier model was defined after the 2026-06 IAM incident, in which an automated set-iam-policy call wiped project IAM. The lesson: any agent that can mutate production infra must earn trust via a demonstrated read-only track record first.

Tier	Capability	Status	Safety gates
Tier 0 — current	Detect + Diagnose + Propose. Read-only. Posts structured proposal to #ceo and [email protected]. Zero blast radius.	LIVE	No code path to write to GCP. Proveo-verified.
Tier 1 — future MC	Bounded auto-remediation: Cloud Run revision rollback, instance scale adjustment, hung service restart. Circuit breaker (max N actions/hour). Full audit trail. Never touches DB schema, IAM, secrets, or financial data. Always announces before acting.	NOT BUILT — separate MC required	Explicit CEO approval token (`/tmp/bilko-sentinel-tier1-approved`) required before any mutation. Separate script (`bilko-sentinel-tier1.js`). Only after Tier-0 proves signal quality over weeks.
Tier 2	Broader autonomy.	Probably never for a production financial SaaS	N/A

The IAM incident reference is intentional: Tier-1 will be built with a hard whitelist of reversible Cloud Run and scaling operations only. No set-iam-policy, no SQL DDL, no secret rotation — ever.
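Tier 1 is not built, so the following is only a sketch of how its gates could compose before any mutation. It combines three checks:

- the CEO approval-token file, using the path from the tier table
- a hard allowlist that denies anything not listed
- a sliding one-hour circuit breaker

The action names and the `Tier1Gate` class are hypothetical. The `maxPerHour` value is also left open, because the table does not specify N.

```javascript
const fs = require("node:fs");

const APPROVAL_TOKEN = "/tmp/bilko-sentinel-tier1-approved";
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical names for the reversible Cloud Run operations Tier 1 may perform.
// Anything else (IAM, SQL DDL, secrets, ...) is denied simply by not being listed.
const ALLOWED_ACTIONS = new Set([
  "cloudrun.rollback-revision",
  "cloudrun.adjust-instances",
  "cloudrun.restart-service",
]);

class Tier1Gate {
  constructor({ maxPerHour, tokenPath = APPROVAL_TOKEN }) {
    this.maxPerHour = maxPerHour; // "N" in the tier table; value is a policy decision
    this.tokenPath = tokenPath;
    this.recent = []; // timestamps of actions allowed within the last hour
  }

  // Returns { allowed, reason }. A denied action never consumes breaker budget.
  check(action, now = Date.now()) {
    if (!fs.existsSync(this.tokenPath)) return { allowed: false, reason: "no CEO approval token" };
    if (!ALLOWED_ACTIONS.has(action)) return { allowed: false, reason: `not allowlisted: ${action}` };
    this.recent = this.recent.filter((t) => now - t < HOUR_MS);
    if (this.recent.length >= this.maxPerHour) return { allowed: false, reason: "circuit breaker open" };
    this.recent.push(now);
    return { allowed: true, reason: "ok" };
  }
}
```

Because launchd starts a fresh process on every interval, a real breaker would have to persist its timestamps to disk, as the dedup state does. This sketch keeps them in memory for brevity.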