feat(skills): integrate delegate-job skill (squashed from delegate-job-skill)

- Copy delegate-job-skill/skills/delegate-job/ → skills/delegate-job/
- Move requirements.txt (paho-mqtt>=2.0.0) into the new location
- Refactor outdated hardcoded paths (~/PuKi/lab/, ~/.hermes/skills/) to dynamic resolution
- Add MQTT connection timeout / retry hardening
- Remove legacy delegate-job-skill/ directory
- Update .gitignore

Note: delegate-job-skill git history is squashed — preserved content, dropped commit lineage.
@@ -0,0 +1,176 @@
# MQTT Broker Setup — PoC → Production

The delegate-job scripts read **all** broker settings from environment
variables (or a job record's `broker.*` block) through a single helper,
`broker_config_from_env()` in
[`./scripts/mqtt_common.py`](./scripts/mqtt_common.py). The design goal:
**switch from the public PoC broker to your own broker with config only — no
code change.**

| Env var | Meaning | PoC default | Production |
|---------|---------|-------------|-----------|
| `MQTT_BROKER` | host | `broker.hivemq.com` | internal hostname/IP |
| `MQTT_PORT` | port | `1883` | `8883` (TLS) |
| `MQTT_TLS` | TLS on/off (`1`/`0`) | `0` | `1` |
| `MQTT_USERNAME` / `MQTT_PASSWORD` | auth | (none) | broker-issued |
| `MQTT_CA_CERTS` | CA bundle path | (none) | private CA path |
| `MQTT_CERTFILE` / `MQTT_KEYFILE` | client cert (optional mTLS) | (none) | per-client |
| `MQTT_CLIENT_ID_PREFIX` | client id prefix | `hermes` | per-environment |
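
To make the table concrete, here is a minimal sketch of how such a helper could map the variables to a config object. It is not the actual `broker_config_from_env()` from `mqtt_common.py`: the names `BrokerConfig` and `load_broker_config` are hypothetical, and treating an empty variable as unset is an assumption. Only the variable names and PoC defaults come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BrokerConfig:
    """Hypothetical container for the settings listed in the table."""
    host: str
    port: int
    tls: bool
    username: Optional[str]
    password: Optional[str]
    ca_certs: Optional[str]
    certfile: Optional[str]
    keyfile: Optional[str]
    client_id_prefix: str


def load_broker_config(env: Mapping[str, str] = os.environ) -> BrokerConfig:
    """Illustrative stand-in for broker_config_from_env(); defaults follow the table."""

    def optional(name: str) -> Optional[str]:
        # Assumption: an empty value is treated the same as an unset one.
        return env.get(name) or None

    return BrokerConfig(
        host=env.get("MQTT_BROKER", "broker.hivemq.com"),
        port=int(env.get("MQTT_PORT", "1883")),
        tls=env.get("MQTT_TLS", "0") == "1",
        username=optional("MQTT_USERNAME"),
        password=optional("MQTT_PASSWORD"),
        ca_certs=optional("MQTT_CA_CERTS"),
        certfile=optional("MQTT_CERTFILE"),
        keyfile=optional("MQTT_KEYFILE"),
        client_id_prefix=env.get("MQTT_CLIENT_ID_PREFIX", "hermes"),
    )
```

Passing the environment as a mapping keeps the sketch testable: `load_broker_config({})` yields the PoC defaults, while a mapping with the production values yields the TLS configuration without touching any code path.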

---

## 1. PoC: public broker (`broker.hivemq.com`)

**Pros** — zero setup, reachable from anywhere, perfect for wiring up the
publish/subscribe loop and the timeout/state-machine logic.

**Cons / accepted assumptions** — no auth, no integrity, shared with the world:

- no secrets in payloads;
- `started`/`completed`/`error` are advisory signals only;
- non-retained messages are **not queued** for absent subscribers, so the
  subscriber must start before the agent;
- a re-subscribing client cannot recover past (non-retained) events.

Use it only to validate the protocol, never for real decisions.

---

## 2. Production: self-hosted Mosquitto (or EMQX)

Both support MQTT 5 + ACL + TLS. Mosquitto shown below; EMQX is a drop-in for
the same env vars.

### 2.1 Install

```bash
# macOS
brew install mosquitto

# Debian/Ubuntu
sudo apt-get update && sudo apt-get install -y mosquitto mosquitto-clients

# Docker
docker run -d --name mosquitto -p 8883:8883 \
  -v "$PWD/mosquitto.conf:/mosquitto/config/mosquitto.conf" \
  -v "$PWD/certs:/mosquitto/certs" \
  -v "$PWD/auth:/mosquitto/auth" \
  eclipse-mosquitto:2
```

### 2.2 `mosquitto.conf` (key lines)

```conf
persistence true
persistence_location /mosquitto/data/

password_file /mosquitto/auth/passwd
acl_file /mosquitto/auth/acl
allow_anonymous false

listener 8883
cafile /mosquitto/certs/ca.crt
certfile /mosquitto/certs/server.crt
keyfile /mosquitto/certs/server.key
```

Because terminal events are published retained at QoS 1, a subscriber that
joins after a job has finished still receives the final `completed`/`error`;
`persistence true` keeps those retained messages across broker restarts.
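
The retained-message behaviour can be illustrated without a broker. The following is a toy in-memory model, not Mosquitto and not part of the delegate-job scripts: `ToyBroker` is a hypothetical name, it matches topics exactly (no wildcards), and `job-1` is a made-up job id.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[str, str], None]


class ToyBroker:
    """In-memory model of retained delivery; exact-match topics only."""

    def __init__(self) -> None:
        self._retained: Dict[str, str] = {}
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def publish(self, topic: str, payload: str, retain: bool = False) -> None:
        if retain:
            # The broker keeps only the most recent retained payload per topic.
            self._retained[topic] = payload
        for handler in self._subscribers[topic]:
            handler(topic, payload)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)
        if topic in self._retained:
            # A new subscriber immediately receives the retained message.
            handler(topic, self._retained[topic])


broker = ToyBroker()
topic = "python/mqtt/jobs/job-1/events"  # job-1 is a made-up job id

broker.publish(topic, "started")                 # not retained, nobody listening
broker.publish(topic, "completed", retain=True)  # terminal event, retained

late: List[str] = []
broker.subscribe(topic, lambda _topic, payload: late.append(payload))
print(late)  # ['completed']
```

The late subscriber misses `started` but still sees `completed`, which is the property the configuration above is meant to provide for terminal events.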

### 2.3 Users (username/password)

```bash
# create the file with the first user (-c), then add more users without -c
mosquitto_passwd -c /mosquitto/auth/passwd hermes # subscriber/delegator
mosquitto_passwd /mosquitto/auth/passwd claude-worker # publisher/agent
# (omit -c after the first; -c truncates the file)
```

### 2.4 ACL — least privilege

The worker only **publishes** events; Hermes only **subscribes**:

```conf
# /mosquitto/auth/acl

# claude-worker: may publish job events, may not read others' streams
user claude-worker
topic write python/mqtt/jobs/+/events

# hermes: observes every job's events
user hermes
topic read python/mqtt/jobs/+/events

# keep the legacy demo topic usable for both, if desired
pattern readwrite python/mqtt/sample
```
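
To see which topics the `+` wildcard in these rules covers, here is a small sketch of MQTT topic-filter matching plus a simplified model of the two `user` blocks. It is an illustration only: `topic_matches`, `ACL`, and `allowed` are hypothetical names, the `pattern` line and the special handling of `$`-prefixed topics are ignored, and the broker's real ACL evaluation is more involved.

```python
from typing import Dict, List


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic name."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: valid only as the last level of the filter.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            # "+" matches exactly one level; anything else must be equal.
            return False
    return len(filter_levels) == len(topic_levels)


# Simplified model of the two user blocks in the ACL file above.
ACL: Dict[str, Dict[str, List[str]]] = {
    "claude-worker": {"write": ["python/mqtt/jobs/+/events"]},
    "hermes": {"read": ["python/mqtt/jobs/+/events"]},
}


def allowed(user: str, access: str, topic: str) -> bool:
    """Check whether `user` has `access` ("read" or "write") on `topic`."""
    filters = ACL.get(user, {}).get(access, [])
    return any(topic_matches(f, topic) for f in filters)


print(allowed("claude-worker", "write", "python/mqtt/jobs/abc/events"))  # True
print(allowed("claude-worker", "read", "python/mqtt/jobs/abc/events"))   # False
print(allowed("hermes", "write", "python/mqtt/jobs/abc/events"))         # False
```

Because `+` matches exactly one level, a topic such as `python/mqtt/jobs/abc/extra/events` falls outside both rules.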

### 2.5 TLS certificates

**Quick self-signed (single host, internal only):**

```bash
mkdir -p certs && cd certs
openssl req -x509 -newkey rsa:2048 -nodes -days 825 \
  -keyout server.key -out server.crt \
  -subj "/CN=mqtt.internal"
cp server.crt ca.crt # clients trust this as the CA bundle
```

**Private CA (recommended — separate CA from server cert):**

```bash
# 1) CA
openssl genrsa -out ca.key 4096
openssl req -x509 -new -nodes -key ca.key -days 3650 -out ca.crt -subj "/CN=Hermes-CA"
# 2) server cert signed by the CA
openssl genrsa -out server.key 2048
openssl req -new -key server.key -out server.csr -subj "/CN=mqtt.internal"
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out server.crt -days 825
```

Clients trust `ca.crt` via `MQTT_CA_CERTS=/path/to/ca.crt`.
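
On the client side, the CA bundle and the optional client certificate translate into a TLS context. The sketch below uses only Python's standard `ssl` module; `build_tls_context` is a hypothetical helper, and how `mqtt_common.py` actually configures TLS is not shown in this document. A context like this could be handed to paho-mqtt through `Client.tls_set_context()`.

```python
import ssl
from typing import Optional


def build_tls_context(ca_certs: Optional[str] = None,
                      certfile: Optional[str] = None,
                      keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Build a verifying client-side TLS context (illustrative helper).

    ca_certs: path taken from MQTT_CA_CERTS; None falls back to system CAs.
    certfile/keyfile: paths from MQTT_CERTFILE / MQTT_KEYFILE for optional mTLS.
    """
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context(cafile=ca_certs)
    if certfile:
        # Present a client certificate only when one is configured.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context


context = build_tls_context()  # no private CA given: system trust store
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

With hostname checking on, the name in `MQTT_BROKER` has to match the name in the server certificate (`mqtt.internal` in the commands above).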

---

## 3. Cut-over verification (config-only, no code change)

Goal: prove the **same scripts** talk to your broker by changing only env/registry.

```bash
# 1) point the env at the new broker
export MQTT_BROKER=mqtt.internal
export MQTT_PORT=8883
export MQTT_TLS=1
export MQTT_CA_CERTS="$PWD/certs/ca.crt"
export MQTT_USERNAME=hermes
export MQTT_PASSWORD=… # subscriber side
# (publisher side uses claude-worker creds via the job record's broker block)

# 2) sanity-check with the mosquitto CLI first
mosquitto_sub -h "$MQTT_BROKER" -p "$MQTT_PORT" --cafile "$MQTT_CA_CERTS" \
  -u "$MQTT_USERNAME" -P "$MQTT_PASSWORD" -t 'python/mqtt/jobs/+/events' -v &

# 3) run the unchanged delegate-job loop
PY=.venv/bin/python
JID=$($PY scripts/registry.py register --prompt "broker cutover smoke")
$PY scripts/job_subscriber.py --job "$JID" --timeout 30 &
sleep 3
$PY scripts/publish_event.py --job "$JID" --event started
$PY scripts/publish_event.py --job "$JID" --event completed # auto-retained
```

Expected:
- subscriber prints the `started` and `completed` lines and exits 0;
- `mosquitto_sub` shows the same events (ACL allows `hermes` to read);
- publishing as a credential **without** write ACL is rejected by the broker;
- a subscriber started *after* `completed` still receives it (retained).

If all four hold, the migration is config-only. Persist the broker block into
each job record so `publish_event.py` connects from the registry alone:

```json
"broker": { "host": "mqtt.internal", "port": 8883, "tls": true,
            "username": "claude-worker", "password": "…" }
```