multi-agent-mux/skills/delegate-job/mqtt-broker-setup.md
Godopu 97f649a3e1 feat(skills): integrate delegate-job skill (squashed from delegate-job-skill)
- Copy delegate-job-skill/skills/delegate-job/ → skills/delegate-job/
- Move requirements.txt (paho-mqtt>=2.0.0) into the new location
- Refactor outdated hardcoded paths (~/PuKi/lab/, ~/.hermes/skills/) to dynamic resolution
- Add MQTT connection timeout / retry hardening
- Remove legacy delegate-job-skill/ directory
- Update .gitignore

Note: delegate-job-skill git history is squashed — preserved content, dropped commit lineage.
2026-06-19 14:00:29 +00:00


# MQTT Broker Setup — PoC → Production
The delegate-job scripts read **all** broker settings from environment
variables (or a job record's `broker.*` block) through a single helper,
`broker_config_from_env()` in
[`./scripts/mqtt_common.py`](./scripts/mqtt_common.py). The design goal:
**switch from the public PoC broker to your own broker with config only — no
code change.**

| Env var | Meaning | PoC default | Production |
|---------|---------|-------------|-----------|
| `MQTT_BROKER` | host | `broker.hivemq.com` | internal hostname/IP |
| `MQTT_PORT` | port | `1883` | `8883` (TLS) |
| `MQTT_TLS` | TLS on/off (`1`/`0`) | `0` | `1` |
| `MQTT_USERNAME` / `MQTT_PASSWORD` | auth | (none) | broker-issued |
| `MQTT_CA_CERTS` | CA bundle path | (none) | private CA path |
| `MQTT_CERTFILE` / `MQTT_KEYFILE` | client cert (optional mTLS) | (none) | per-client |
| `MQTT_CLIENT_ID_PREFIX` | client id prefix | `hermes` | per-environment |
---
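To make the table concrete, here is a minimal sketch of how a helper in the spirit of `broker_config_from_env()` could map these variables to a settings object. It is not the code in `mqtt_common.py`: the `BrokerConfig` dataclass, the `load_broker_config` name, and the choice to treat empty strings as unset are assumptions made for illustration. Only the variable names and PoC defaults come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BrokerConfig:
    host: str
    port: int
    tls: bool
    username: Optional[str]
    password: Optional[str]
    ca_certs: Optional[str]
    certfile: Optional[str]
    keyfile: Optional[str]
    client_id_prefix: str


def load_broker_config(env: Mapping[str, str] = os.environ) -> BrokerConfig:
    """Hypothetical stand-in for broker_config_from_env(); the real helper may differ."""

    def optional(name: str) -> Optional[str]:
        # Assumption: an empty value means "not set".
        return env.get(name) or None

    return BrokerConfig(
        host=env.get("MQTT_BROKER", "broker.hivemq.com"),
        port=int(env.get("MQTT_PORT", "1883")),
        tls=env.get("MQTT_TLS", "0") == "1",
        username=optional("MQTT_USERNAME"),
        password=optional("MQTT_PASSWORD"),
        ca_certs=optional("MQTT_CA_CERTS"),
        certfile=optional("MQTT_CERTFILE"),
        keyfile=optional("MQTT_KEYFILE"),
        client_id_prefix=env.get("MQTT_CLIENT_ID_PREFIX", "hermes"),
    )
```

Passing the environment as a mapping keeps the sketch easy to exercise with both profiles: `load_broker_config({})` yields the PoC defaults, while a mapping with `MQTT_BROKER=mqtt.internal`, `MQTT_PORT=8883`, and `MQTT_TLS=1` yields the production shape.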
## 1. PoC: public broker (`broker.hivemq.com`)
**Pros** — zero setup, reachable from anywhere, perfect for wiring up the
publish/subscribe loop and the timeout/state-machine logic.
**Cons / accepted assumptions** — no auth, no integrity, shared with the world:
- no secrets in payloads;
- `started`/`completed`/`error` are advisory signals only;
- non-retained messages are **not queued** for absent subscribers, so the
subscriber must start before the agent;
- a re-subscribing client cannot recover past (non-retained) events.

Use it only to validate the protocol, never for real decisions.

---
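The ordering constraint in section 1 (subscriber first, then agent) can be illustrated with a toy in-memory broker. This is a deliberately simplified model, not how HiveMQ or Mosquitto are implemented: `ToyBroker` is a hypothetical class that matches topics by exact string only and ignores QoS, sessions, and wildcards.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class ToyBroker:
    """Hypothetical model: live fan-out plus one retained message per topic."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)
        self._retained: Dict[str, str] = {}

    def publish(self, topic: str, payload: str, retain: bool = False) -> None:
        if retain:
            # Only the most recent retained payload per topic is kept.
            self._retained[topic] = payload
        # Non-retained payloads reach current subscribers only; nothing is queued.
        for callback in self._subscribers[topic]:
            callback(payload)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers[topic].append(callback)
        # A new subscriber immediately receives the retained payload, if any.
        if topic in self._retained:
            callback(self._retained[topic])


broker = ToyBroker()
topic = "python/mqtt/jobs/demo/events"

early: List[str] = []
broker.subscribe(topic, early.append)      # subscribed before the agent publishes
broker.publish(topic, "started")           # not retained
broker.publish(topic, "completed", retain=True)

late: List[str] = []
broker.subscribe(topic, late.append)       # joins after the job finished

print(early)  # ['started', 'completed']
print(late)   # ['completed']
```

The late subscriber never sees `started`, which is why the subscriber has to be running before the agent when events are not retained.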
## 2. Production: self-hosted Mosquitto (or EMQX)
Both support MQTT 5 + ACL + TLS. Mosquitto shown below; EMQX is a drop-in for
the same env vars.
### 2.1 Install
```bash
# macOS
brew install mosquitto
# Debian/Ubuntu
sudo apt-get update && sudo apt-get install -y mosquitto mosquitto-clients
# Docker
docker run -d --name mosquitto -p 8883:8883 \
-v "$PWD/mosquitto.conf:/mosquitto/config/mosquitto.conf" \
-v "$PWD/certs:/mosquitto/certs" \
-v "$PWD/auth:/mosquitto/auth" \
eclipse-mosquitto:2
```
### 2.2 `mosquitto.conf` (key lines)
```conf
persistence true
persistence_location /mosquitto/data/
password_file /mosquitto/auth/passwd
acl_file /mosquitto/auth/acl
allow_anonymous false
listener 8883
cafile /mosquitto/certs/ca.crt
certfile /mosquitto/certs/server.crt
keyfile /mosquitto/certs/server.key
```
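Retained terminal events let a subscriber that joins after a job has finished
still see the final `completed`/`error`. `persistence true` keeps those
retained messages across broker restarts, and QoS 1 gives at-least-once
delivery to subscribers that are connected.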
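As a sketch of what "retained terminal events" could look like on the publishing side, the function below picks the topic, QoS, and retain flag for an event. The `publish_params` name and the JSON payload shape are assumptions, as is treating `started` as non-retained; the actual `publish_event.py` may differ. The topic layout follows the `python/mqtt/jobs/+/events` pattern used elsewhere in this document.

```python
import json
from typing import Any, Dict

# Events that end a job; only these are retained in this sketch.
TERMINAL_EVENTS = {"completed", "error"}


def event_topic(job_id: str) -> str:
    return f"python/mqtt/jobs/{job_id}/events"


def publish_params(job_id: str, event: str) -> Dict[str, Any]:
    """Hypothetical helper: keyword arguments for an MQTT publish call."""
    return {
        "topic": event_topic(job_id),
        # Assumed payload shape, for illustration only.
        "payload": json.dumps({"job": job_id, "event": event}),
        "qos": 1,
        "retain": event in TERMINAL_EVENTS,
    }
```

The keys match the parameters of paho-mqtt's `Client.publish(topic, payload, qos, retain)`, so a dictionary like this could be passed as `client.publish(**publish_params(jid, "completed"))`.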
### 2.3 Users (username/password)
```bash
# create the file with the first user (-c), then add more users without -c
mosquitto_passwd -c /mosquitto/auth/passwd hermes # subscriber/delegator
mosquitto_passwd /mosquitto/auth/passwd claude-worker # publisher/agent
# (omit -c after the first; -c truncates the file)
```
### 2.4 ACL — least privilege
The worker only **publishes** events; Hermes only **subscribes**:
```conf
# /mosquitto/auth/acl
# claude-worker: may publish job events, may not read others' streams
user claude-worker
topic write python/mqtt/jobs/+/events
# hermes: observes every job's events
user hermes
topic read python/mqtt/jobs/+/events
# keep the legacy demo topic usable for both, if desired
pattern readwrite python/mqtt/sample
```
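To reason about what the ACL above permits, the following sketch reimplements MQTT topic-filter matching (`+` for one level, `#` for the remainder) and applies it to the two `user` blocks. It is a model for checking expectations, not Mosquitto's ACL engine: `topic_matches`, `allowed`, and the `ACL` dictionary are hypothetical, the `pattern` rule for the demo topic is left out, and the special handling of topics beginning with `$` is ignored.

```python
from typing import Dict, List, Tuple


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic name."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard is only valid as the last filter level.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


# Mirrors the two user blocks of the ACL file shown above.
ACL: Dict[str, List[Tuple[str, str]]] = {
    "claude-worker": [("write", "python/mqtt/jobs/+/events")],
    "hermes": [("read", "python/mqtt/jobs/+/events")],
}


def allowed(user: str, access: str, topic: str) -> bool:
    """access is 'read' (subscribe/receive) or 'write' (publish)."""
    return any(
        rule_access == access and topic_matches(topic_filter, topic)
        for rule_access, topic_filter in ACL.get(user, [])
    )


print(allowed("claude-worker", "write", "python/mqtt/jobs/abc/events"))  # True
print(allowed("claude-worker", "read", "python/mqtt/jobs/abc/events"))   # False
print(allowed("hermes", "write", "python/mqtt/jobs/abc/events"))         # False
```

Because `+` matches exactly one level, a topic such as `python/mqtt/jobs/a/b/events` falls outside both rules.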
### 2.5 TLS certificates
**Quick self-signed (single host, internal only):**
```bash
mkdir -p certs && cd certs
openssl req -x509 -newkey rsa:2048 -nodes -days 825 \
-keyout server.key -out server.crt \
-subj "/CN=mqtt.internal"
cp server.crt ca.crt # clients trust this as the CA bundle
```
**Private CA (recommended — separate CA from server cert):**
```bash
# 1) CA
openssl genrsa -out ca.key 4096
openssl req -x509 -new -nodes -key ca.key -days 3650 -out ca.crt -subj "/CN=Hermes-CA"
# 2) server cert signed by the CA
openssl genrsa -out server.key 2048
openssl req -new -key server.key -out server.csr -subj "/CN=mqtt.internal"
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
-out server.crt -days 825
```
Clients trust `ca.crt` via `MQTT_CA_CERTS=/path/to/ca.crt`.

---
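On the client side, trusting the CA from section 2.5 amounts to building a TLS context that verifies the broker against that bundle. The sketch below uses only the Python standard library; `build_tls_context` is a hypothetical name, and how the delegate-job scripts actually wire `MQTT_CA_CERTS`, `MQTT_CERTFILE`, and `MQTT_KEYFILE` into the client is not shown in this document.

```python
import ssl
from typing import Optional


def build_tls_context(ca_certs: Optional[str] = None,
                      certfile: Optional[str] = None,
                      keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Client-side context: verify the broker, optionally present a client cert."""
    # With cafile=None the system trust store is used; a private CA needs
    # its ca.crt path passed explicitly.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_certs)
    if certfile:
        # Optional mTLS: present the per-client certificate and key.
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx
```

With paho-mqtt, a context like this can be handed to `Client.tls_set_context()`. Hostname checking stays enabled by default, so the name in `MQTT_BROKER` has to match the server certificate (`mqtt.internal` in the commands above).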
## 3. Cut-over verification (config-only, no code change)
Goal: prove the **same scripts** talk to your broker by changing only env/registry.
```bash
# 1) point the env at the new broker
export MQTT_BROKER=mqtt.internal
export MQTT_PORT=8883
export MQTT_TLS=1
export MQTT_CA_CERTS=$PWD/certs/ca.crt
export MQTT_USERNAME=hermes
export MQTT_PASSWORD='…'   # subscriber side
# (publisher side uses claude-worker creds via the job record's broker block)
# 2) sanity-check with the mosquitto CLI first
mosquitto_sub -h "$MQTT_BROKER" -p 8883 --cafile "$MQTT_CA_CERTS" \
-u hermes -P "$MQTT_PASSWORD" -t 'python/mqtt/jobs/+/events' -v &
# 3) run the unchanged delegate-job loop
PY=.venv/bin/python
JID=$($PY scripts/registry.py register --prompt "broker cutover smoke")
$PY scripts/job_subscriber.py --job "$JID" --timeout 30 &
sleep 3
$PY scripts/publish_event.py --job "$JID" --event started
$PY scripts/publish_event.py --job "$JID" --event completed # auto-retained
```
Expected:
- subscriber prints the `started` and `completed` lines and exits 0;
- `mosquitto_sub` shows the same events (ACL allows `hermes` to read);
- publishing as a credential **without** write ACL is rejected by the broker;
- a subscriber started *after* `completed` still receives it (retained).

If all four hold, the migration is config-only. Persist the broker block into
each job record so `publish_event.py` connects from the registry alone:
```json
"broker": { "host": "mqtt.internal", "port": 8883, "tls": true,
"username": "claude-worker", "password": "…" }
```