# MQTT Broker Setup: PoC → Production

The tmux-agent-orchestrate-delegate-job scripts read **all** broker settings from environment
variables (or a job record's `broker.*` block) through a single helper,
`broker_config_from_env()` in
[`./scripts/mqtt_common.py`](./scripts/mqtt_common.py). The design goal:
**switch from the public PoC broker to your own broker with config only, with no
code change.**

| Env var | Meaning | PoC default | Production |
|---------|---------|-------------|-----------|
| `MQTT_BROKER` | host | `broker.hivemq.com` | internal hostname/IP |
| `MQTT_PORT` | port | `1883` | `8883` (TLS) |
| `MQTT_TLS` | TLS on/off (`1`/`0`) | `0` | `1` |
| `MQTT_USERNAME` / `MQTT_PASSWORD` | auth | (none) | broker-issued |
| `MQTT_CA_CERTS` | CA bundle path | (none) | private CA path |
| `MQTT_CERTFILE` / `MQTT_KEYFILE` | client cert (optional mTLS) | (none) | per-client |
| `MQTT_CLIENT_ID_PREFIX` | client id prefix | `hermes` | per-environment |

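To make the table concrete, here is a minimal sketch of how a helper in the spirit of `broker_config_from_env()` could map these variables onto a config object. The real helper lives in `scripts/mqtt_common.py` and may differ; the names `BrokerConfig` and `sketch_broker_config` are hypothetical, and only the env var names and PoC defaults are taken from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BrokerConfig:
    """Hypothetical container for the settings listed in the table."""
    host: str
    port: int
    tls: bool
    username: Optional[str]
    password: Optional[str]
    ca_certs: Optional[str]
    certfile: Optional[str]
    keyfile: Optional[str]
    client_id_prefix: str


def sketch_broker_config(env: Mapping[str, str] = os.environ) -> BrokerConfig:
    """Illustrative stand-in for broker_config_from_env(); not the real helper."""
    return BrokerConfig(
        host=env.get("MQTT_BROKER", "broker.hivemq.com"),
        port=int(env.get("MQTT_PORT", "1883")),
        tls=env.get("MQTT_TLS", "0") == "1",
        # Empty strings are treated like unset variables.
        username=env.get("MQTT_USERNAME") or None,
        password=env.get("MQTT_PASSWORD") or None,
        ca_certs=env.get("MQTT_CA_CERTS") or None,
        certfile=env.get("MQTT_CERTFILE") or None,
        keyfile=env.get("MQTT_KEYFILE") or None,
        client_id_prefix=env.get("MQTT_CLIENT_ID_PREFIX", "hermes"),
    )


# With no variables set, the PoC defaults from the table apply.
poc = sketch_broker_config({})
print(poc.host, poc.port, poc.tls)  # broker.hivemq.com 1883 False
```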
---

## 1. PoC: public broker (`broker.hivemq.com`)

**Pros**: zero setup, reachable from anywhere, perfect for wiring up the
publish/subscribe loop and the timeout/state-machine logic.

**Cons / accepted assumptions**: there is no auth and no integrity protection, and
the broker is shared with the world. Accordingly:

- no secrets in payloads;
- `started`/`completed`/`error` are advisory signals only;
- non-retained messages are **not queued** for absent subscribers, so the
  subscriber must start before the agent;
- a re-subscribing client cannot recover past (non-retained) events.

Use it only to validate the protocol, never for real decisions.

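The last two bullets can be illustrated with a toy in-memory model. This is only a sketch of the delivery semantics, assuming exact-topic matching and no persistent sessions; `ToyBroker` and the job id `demo` are hypothetical and unrelated to any real broker or client library.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[str, str], None]


class ToyBroker:
    """In-memory model of retained vs. non-retained delivery (exact topics only)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self._retained: Dict[str, str] = {}

    def publish(self, topic: str, payload: str, retain: bool = False) -> None:
        if retain:
            # Only the most recent retained payload per topic is kept.
            self._retained[topic] = payload
        for handler in list(self._subscribers[topic]):
            handler(topic, payload)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)
        # A new subscriber immediately receives the retained payload, if any.
        if topic in self._retained:
            handler(topic, self._retained[topic])


broker = ToyBroker()
topic = "python/mqtt/jobs/demo/events"
broker.publish(topic, "started")                 # nobody is subscribed: lost
broker.publish(topic, "completed", retain=True)  # kept as the retained message

late: List[str] = []
broker.subscribe(topic, lambda _topic, payload: late.append(payload))
print(late)  # ['completed']
```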
---

## 2. Production: self-hosted Mosquitto (or EMQX)

Both support MQTT 5 + ACL + TLS. Mosquitto is shown below; EMQX works as a
drop-in replacement with the same env vars.

### 2.1 Install

```bash
# macOS
brew install mosquitto

# Debian/Ubuntu
sudo apt-get update && sudo apt-get install -y mosquitto mosquitto-clients

# Docker (the data volume keeps the persistence store when the container is re-created)
docker run -d --name mosquitto -p 8883:8883 \
  -v "$PWD/mosquitto.conf:/mosquitto/config/mosquitto.conf" \
  -v "$PWD/certs:/mosquitto/certs" \
  -v "$PWD/auth:/mosquitto/auth" \
  -v "$PWD/data:/mosquitto/data" \
  eclipse-mosquitto:2
```

### 2.2 `mosquitto.conf` (key lines)

```conf
persistence true
persistence_location /mosquitto/data/

password_file /mosquitto/auth/passwd
acl_file /mosquitto/auth/acl
allow_anonymous false

listener 8883
cafile /mosquitto/certs/ca.crt
certfile /mosquitto/certs/server.crt
keyfile /mosquitto/certs/server.key
```

Retained terminal events mean a subscriber that joins after a job finished
still sees the final `completed`/`error`. `persistence true` keeps those
retained messages (and QoS 1 messages queued for persistent sessions) across
broker restarts.

### 2.3 Users (username/password)

```bash
# -c creates the file (overwriting any existing one), so use it for the first user only
mosquitto_passwd -c /mosquitto/auth/passwd hermes         # subscriber/delegator
mosquitto_passwd /mosquitto/auth/passwd claude-worker     # publisher/agent
# both commands prompt for the password; -b takes it on the command line instead
```

### 2.4 ACL: least privilege

The worker only **publishes** events; Hermes only **subscribes**:

```conf
# /mosquitto/auth/acl

# claude-worker: may publish job events, may not read others' streams
user claude-worker
topic write python/mqtt/jobs/+/events

# hermes: observes every job's events
user hermes
topic read python/mqtt/jobs/+/events

# keep the legacy demo topic usable for both, if desired
pattern readwrite python/mqtt/sample
```

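The `+` in these ACL entries is the MQTT single-level wildcard. As a rough illustration of which topics an entry covers, the sketch below implements the standard MQTT filter rules (`+` matches exactly one level, `#` matches the remainder). The helper `topic_matches` is hypothetical, it ignores the special handling of `$`-prefixed topics, and it is not Mosquitto's own ACL code.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` matches the MQTT filter (`+` = one level, `#` = the rest)."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # `#` is only valid as the final level and matches everything below.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    # Without `#`, filter and topic must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


acl_filter = "python/mqtt/jobs/+/events"
print(topic_matches(acl_filter, "python/mqtt/jobs/abc123/events"))  # True
print(topic_matches(acl_filter, "python/mqtt/jobs/a/b/events"))     # False
print(topic_matches(acl_filter, "python/mqtt/sample"))              # False
```

Because `+` spans exactly one level, a job id that contains `/` would fall outside this ACL entry.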
### 2.5 TLS certificates

**Quick self-signed (single host, internal only):**

```bash
mkdir -p certs && cd certs
# -addext needs OpenSSL 1.1.1+; clients such as Python's ssl module check the
# hostname against the subjectAltName, not the CN
openssl req -x509 -newkey rsa:2048 -nodes -days 825 \
  -keyout server.key -out server.crt \
  -subj "/CN=mqtt.internal" \
  -addext "subjectAltName=DNS:mqtt.internal"
cp server.crt ca.crt   # clients trust this as the CA bundle
```

**Private CA (recommended: separate CA from server cert):**

```bash
# 1) CA
openssl genrsa -out ca.key 4096
openssl req -x509 -new -nodes -key ca.key -days 3650 -out ca.crt -subj "/CN=Hermes-CA"
# 2) server cert signed by the CA (the subjectAltName is added at signing time)
openssl genrsa -out server.key 2048
openssl req -new -key server.key -out server.csr -subj "/CN=mqtt.internal"
printf 'subjectAltName=DNS:mqtt.internal\n' > server.ext
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out server.crt -days 825 -extfile server.ext
```

Clients trust `ca.crt` via `MQTT_CA_CERTS=/path/to/ca.crt`.

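On the client side, the CA bundle and the optional client certificate typically end up in a TLS context. The following is a sketch using only Python's standard `ssl` module; `build_tls_context` is a hypothetical helper, and whether the scripts build their context this way is an assumption. Note that Python verifies the broker hostname against the certificate's subjectAltName entries.

```python
import ssl
from typing import Optional


def build_tls_context(
    ca_certs: Optional[str] = None,
    certfile: Optional[str] = None,
    keyfile: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a verifying client context; arguments mirror MQTT_CA_CERTS/CERTFILE/KEYFILE."""
    # With ca_certs=None the system trust store is used instead of a private CA.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_certs)
    if certfile:
        # Optional mTLS: present a client certificate to the broker.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context


context = build_tls_context()
# Certificate and hostname verification are both on by default.
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)  # True True
```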
---

## 3. Cut-over verification (config-only, no code change)

Goal: prove the **same scripts** talk to your broker by changing only env/registry.

```bash
# 1) point the env at the new broker
export MQTT_BROKER=mqtt.internal
export MQTT_PORT=8883
export MQTT_TLS=1
export MQTT_CA_CERTS="$PWD/certs/ca.crt"
export MQTT_USERNAME=hermes
export MQTT_PASSWORD=…          # subscriber side
# (publisher side uses claude-worker creds via the job record's broker block)

# 2) sanity-check with the mosquitto CLI first
mosquitto_sub -h "$MQTT_BROKER" -p "$MQTT_PORT" --cafile "$MQTT_CA_CERTS" \
  -u "$MQTT_USERNAME" -P "$MQTT_PASSWORD" -t 'python/mqtt/jobs/+/events' -v &

# 3) run the unchanged tmux-agent-orchestrate-delegate-job loop
PY=.venv/bin/python
JID=$($PY scripts/registry.py register --prompt "broker cutover smoke")
$PY scripts/job_subscriber.py --job "$JID" --timeout 30 &
sleep 3
$PY scripts/publish_event.py --job "$JID" --event started
$PY scripts/publish_event.py --job "$JID" --event completed   # auto-retained
```

Expected:
- subscriber prints the `started` and `completed` lines and exits 0;
- `mosquitto_sub` shows the same events (ACL allows `hermes` to read);
- publishing with a credential that has **no** write ACL is denied by the broker, so the event never reaches subscribers;
- a subscriber started *after* `completed` still receives it (retained).

If all four hold, the migration is config-only. Persist the broker block into
each job record so `publish_event.py` connects from the registry alone:

```json
"broker": { "host": "mqtt.internal", "port": 8883, "tls": true,
            "username": "claude-worker", "password": "…" }
```
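How a job record's `broker` block combines with the environment is not spelled out here. One plausible reading, sketched below, is that the record's values override env-derived settings key by key. That precedence is an assumption, and `resolve_broker_settings` and the sample values are hypothetical.

```python
from typing import Any, Dict, Mapping


def resolve_broker_settings(
    env_settings: Mapping[str, Any],
    job_record: Mapping[str, Any],
) -> Dict[str, Any]:
    """Overlay a job record's `broker` block on env-derived settings (assumed precedence)."""
    merged = dict(env_settings)
    overrides = job_record.get("broker") or {}
    # Keys that are missing or null in the record fall back to the env value.
    merged.update({key: value for key, value in overrides.items() if value is not None})
    return merged


env_settings = {"host": "broker.hivemq.com", "port": 1883, "tls": False, "username": None}
job_record = {
    "prompt": "broker cutover smoke",
    "broker": {"host": "mqtt.internal", "port": 8883, "tls": True, "username": "claude-worker"},
}
settings = resolve_broker_settings(env_settings, job_record)
print(settings["host"], settings["port"], settings["tls"])  # mqtt.internal 8883 True
```

A record without a `broker` block leaves the env-derived settings untouched, which matches the PoC setup where nothing is persisted.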