
Godopu 5258b5013c feat(lib): implement FW-N1~FW-N4 items and pane snapshot guidelines

2026-06-21 09:19:46 +00:00

20 KiB


name: tmux-agent-orchestrate-delegate-job
description: Delegate a unit of work to any autonomous agent (claude-code, codex, opencode, or a human) and observe it asynchronously over an MQTT event channel. Each job gets a unique id, a registry record (prompt, broker, status, timeouts), and a single per-job topic that carries started/permission_required/progress/completed/error events as schema-versioned JSON. The delegator starts a subscriber first, runs the agent, and treats a completed/error event or a timeout as the job's terminal state. Ships a working reference implementation (publish_event.py, job_subscriber.py, registry.py, mqtt_common.py, tmux-agent-orchestrate-delegate-job wrapper) plus a PoC-to-production path: validate on a public broker, then move to an authenticated TLS broker by changing config only, with no code change. Use when you need fire-and-observe delegation, multi-job fan-out across tmux sessions, or a uniform completion-signal protocol shared by several agent types.
version: 1.0.0
author: Hermes Agent
license: MIT
platforms: linux, macos, windows
metadata: hermes

tmux-agent-orchestrate-delegate-job — Async Job Delegation over MQTT

Delegate a unit of work to an autonomous agent, then observe it instead of blocking on it. Every job gets a unique id and a registry record; the agent publishes lifecycle events (started, permission_required, progress, completed, error) to a per-job MQTT topic; the delegator subscribes and treats completed/error — or a timeout — as the terminal state.

This skill is a reference implementation: copy the files in this directory into your project and customise. The communication_over_mqtt project is the canonical concrete instance.

Overview

The model is deliberately small. A job is one delegated task. An agent is a worker (a claude-code tmux session, a codex run, a human). The registry (.hermes/jobs/<id>.json) holds everything about a job, so nothing important lives in environment variables. As a result, one tmux session can process many jobs sequentially and many sessions can fan out in parallel, with no env collisions. The event channel is one MQTT topic per job carrying JSON payloads, and the event field discriminates the event type.
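A minimal sketch of a registry record follows. It assumes the field names listed under Registry Format. The names new_job_record and save_record, the "pending" initial status, and the 8-hex-character id (modelled on the abc12345 example) are hypothetical and are not registry.py's actual API.

```python
import json
import secrets
from pathlib import Path


# Hypothetical helper: field names follow the "Registry Format" list,
# but the real registry.py may structure the record differently.
def new_job_record(prompt: str, agent: str, agent_session: str,
                   timeout_sec: int = 3600, idle_timeout_sec: int = 120) -> dict:
    job_id = secrets.token_hex(4)  # 8 hex chars, e.g. "abc12345"
    return {
        "job_id": job_id,
        "status": "pending",  # assumed initial status
        "prompt": prompt,
        "agent": agent,
        "agent_session": agent_session,
        # PoC defaults; production overrides this block per job
        "broker": {"host": "broker.hivemq.com", "port": 1883, "tls": False},
        "topic_prefix": "python/mqtt/jobs",
        "timeout_sec": timeout_sec,
        "idle_timeout_sec": idle_timeout_sec,
        "expected_artifacts": [],
        "last_seq": 0,  # a publisher would bump this for each event
    }


def save_record(registry_dir: Path, record: dict) -> Path:
    # One JSON file per job: .hermes/jobs/<id>.json
    # (plain write, not atomic; registry.md covers the rename trick)
    registry_dir.mkdir(parents=True, exist_ok=True)
    path = registry_dir / f"{record['job_id']}.json"
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

Because the worker only needs the job id to find everything else, the same tmux session can pick up a different record for its next job without changing its environment.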

Each responsibility has exactly one entry point. publish_event.py emits events (registry lookup, monotonic seq, retry with backoff), and job_subscriber.py observes them (timeouts, terminal state machine, defensive parsing). Shared logic lives in mqtt_common.py and registry I/O in registry.py. The demo publisher.py/subscriber.py in the host project remain unchanged.
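The retry-with-backoff behaviour of publish_event.py could look roughly like the sketch below. publish_with_retry and exit_code_for are hypothetical names. The send argument stands in for the real MQTT publish call and is assumed to raise OSError on failure.

```python
import random
import sys
import time
from typing import Callable


# Hypothetical sketch of the publisher's retry loop; `send` stands in for
# the real MQTT publish and is assumed to raise OSError when it fails.
def publish_with_retry(send: Callable[[], None], attempts: int = 5,
                       base_delay: float = 0.5, max_delay: float = 8.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    for attempt in range(attempts):
        try:
            send()
            return True
        except OSError as exc:
            if attempt == attempts - 1:
                break  # out of attempts: give up
            # Exponential backoff (0.5s, 1s, 2s, ... capped) plus up to 10% jitter
            delay = min(max_delay, base_delay * 2 ** attempt)
            delay += random.uniform(0, delay / 10)
            print(f"publish failed ({exc}); retry in {delay:.2f}s", file=sys.stderr)
            sleep(delay)
    return False


def exit_code_for(published: bool) -> int:
    # 0 = event delivered; 2 = publish failure, so the caller never waits silently
    return 0 if published else 2
```

Injecting sleep keeps the loop testable without real delays. The jitter spreads retries from several agents that fail at the same moment.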

Two stages, same code. The PoC runs on the public broker.hivemq.com to wire up the protocol. Production moves to your own authenticated TLS broker, and the switch is config only (env vars plus the registry broker.* block), never a code change. See mqtt-broker-setup.md.

When to Use / When NOT to Use

Use when:

you want fire-and-observe delegation — kick off work and get a completion signal rather than blocking a terminal;
several agent types (claude-code, codex, opencode, human) must follow one completion protocol;
you need multi-job fan-out across tmux sessions with safe job claiming;
you want a clean PoC → authenticated-broker upgrade path.

Do NOT use when:

a one-shot claude -p '…' that returns inline is enough (no async signal needed) — just use the claude-code skill directly;
you need request/response RPC or large artifact transfer (this is a one-direction event stream, not a data bus);
the payload would carry secrets and you're still on the public broker — move to the own-broker stage first.

Quick Start

The one-line wrapper handles register + subscriber-first + agent launch. If you're new, start here and only fall back to the manual 5-step flow when you need finer control.

# 1) one line: register → start subscriber → launch agent in tmux
#    (uses public broker by default; last stdout line is the audit-log dir)
tmux-agent-orchestrate-delegate-job submit \
  --agent claude-code \
  --prompt "Create 10 sorting problems and save them to sort_problems.md" \
  --workdir /path/to/project \
  --agent-session tmux:demo \
  --timeout 3600 --idle-timeout 120
# → stdout: registered job: <JID>
#          subscriber pid: …
#          agent launched in tmux session: demo
#          subscriber output: <one line per event>
#          /path/to/project/.hermes/delegate_job_logs/<JID>     ← audit log dir

# 2) at any time, query the job or its audit log
tmux-agent-orchestrate-delegate-job status --job <JID>
tmux-agent-orchestrate-delegate-job logs   <JID>            # pretty timeline
tmux-agent-orchestrate-delegate-job logs   --list           # every job, live status

# 3) run a user-supplied validator against the job's artifacts
tmux-agent-orchestrate-delegate-job verify --job <JID> --validate ./validate.sh

The wrapper enforces the subscribe-before-publish ordering and forwards the freshly-minted JOB_ID into the agent's prompt (so the agent calls publish_event.py --job <JID> with the right id — see Pitfall §"Wrong job_id propagated to the agent"). When you need finer control, the manual flow is:

# Manual 5-step (same outcome, more knobs)
PY=.venv/bin/python
SKILL=./skills/tmux-agent-orchestrate-delegate-job/scripts

# 1) register
JID=$($PY "$SKILL/registry.py" register \
        --prompt "…" --agent claude-code --agent-session tmux:demo \
        --timeout 3600 --idle-timeout 120)

# 2) START THE SUBSCRIBER FIRST (MQTT does not queue non-retained msgs)
$PY "$SKILL/job_subscriber.py" --job "$JID" --timeout 3600 --idle-timeout 120 &
SUB_PID=$!

# 3) pass JID to the agent and instruct it to publish events with --job "$JID"
#    (don't hard-code a job id you saw earlier; see Pitfall §"Wrong job_id")

# 4) on completion the subscriber prints events and exits 0/1/2
wait "$SUB_PID"; echo "subscriber exit code: $?"

# 5) inspect any time
$PY "$SKILL/registry.py" get       --job "$JID"
$PY "$SKILL/registry.py" logs      "$JID"        # positional job id
$PY "$SKILL/registry.py" logs --list

Job Protocol

One topic per job: python/mqtt/jobs/<job_id>/events. Payload (JSON, UTF-8, schema_version=1):

{ "schema_version": 1, "seq": 7, "job_id": "abc12345",
  "event": "started|permission_required|progress|completed|error",
  "timestamp": "2026-06-19T09:32:00Z", "detail": "generalised text",
  "data": { "optional": "metadata" } }

seq is monotonic per job (first = 1); the subscriber uses it to spot reorder/duplication.
timestamp is advisory — timeouts are measured from receive time.
detail/data carry no secrets or absolute paths.
A payload with a mismatched schema_version or job_id is dropped (defensive parsing).
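The sketch below shows one way to build and defensively parse the payload above. It assumes the field set shown in the example. make_event, parse_event, and topic_for are illustrative names, and the caller is expected to log a warning whenever parse_event returns None.

```python
import json
from datetime import datetime, timezone
from typing import Optional

SCHEMA_VERSION = 1
EVENTS = {"started", "permission_required", "progress", "completed", "error"}


def topic_for(job_id: str, prefix: str = "python/mqtt/jobs") -> str:
    return f"{prefix}/{job_id}/events"


def make_event(job_id: str, seq: int, event: str, detail: str = "",
               data: Optional[dict] = None) -> bytes:
    """Serialise one event as UTF-8 JSON (illustrative, not publish_event.py itself)."""
    if event not in EVENTS:
        raise ValueError(f"unknown event: {event}")
    payload = {
        "schema_version": SCHEMA_VERSION,
        "seq": seq,
        "job_id": job_id,
        "event": event,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "detail": detail,  # generalised text only: no secrets or absolute paths
        "data": data or {},
    }
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def parse_event(raw: bytes, expected_job_id: str) -> Optional[dict]:
    """Return the payload, or None if it must be dropped (caller logs a warning)."""
    try:
        msg = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None  # bad encoding or bad JSON
    if not isinstance(msg, dict):
        return None
    if msg.get("schema_version") != SCHEMA_VERSION or msg.get("job_id") != expected_job_id:
        return None  # schema or job_id mismatch
    event, seq = msg.get("event"), msg.get("seq")
    if not isinstance(event, str) or event not in EVENTS or not isinstance(seq, int):
        return None
    return msg
```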

started and completed/error are the mandatory bookends: completed → exit 0, error → exit 1, and no terminal event before the timeout → exit 2. The full catalogue and production auth_token handling are in job-protocol.md.
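The terminal state machine and the two timeouts might fit together as in the hypothetical JobWatcher class below. It assumes both the wall-clock and idle clocks start at subscribe time, and it measures receive time (time.monotonic) rather than the payload timestamp. The real job_subscriber.py may differ.

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger("job_subscriber_sketch")

EXIT_CODES = {"completed": 0, "error": 1}
TIMEOUT_EXIT = 2


class JobWatcher:
    """Hypothetical terminal state machine: the first terminal outcome wins."""

    def __init__(self, timeout_sec: float, idle_timeout_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.subscribed_at = clock()  # both clocks assumed to start at subscribe time
        self.last_event_at = self.subscribed_at
        self.timeout_sec = timeout_sec
        self.idle_timeout_sec = idle_timeout_sec
        self.last_seq = 0
        self.exit_code: Optional[int] = None

    def on_event(self, msg: dict) -> None:
        """Feed an already-validated payload (see defensive parsing)."""
        if self.exit_code is not None:
            return  # already final: ignore QoS-1 duplicates and late events
        self.last_event_at = self.clock()  # receive time, not msg["timestamp"]
        if msg["seq"] <= self.last_seq:
            log.warning("duplicate or reordered seq %s", msg["seq"])
        self.last_seq = max(self.last_seq, msg["seq"])
        if msg["event"] in EXIT_CODES:
            self.exit_code = EXIT_CODES[msg["event"]]

    def poll(self) -> Optional[int]:
        """Return the exit code once the job is terminal, else None."""
        if self.exit_code is None:
            now = self.clock()
            if (now - self.subscribed_at > self.timeout_sec
                    or now - self.last_event_at > self.idle_timeout_sec):
                self.exit_code = TIMEOUT_EXIT
        return self.exit_code
```

Because the exit code is fixed by the first terminal event, a duplicated completed or an error that trails it cannot change the outcome.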

Registry Format

.hermes/jobs/<id>.json        # metadata record (single source of truth)
.hermes/jobs/<id>.events.log  # append-only JSON-lines log (debug, optional)
.hermes/jobs/.lock            # fcntl advisory lock for the registry

The record holds status, prompt, agent, agent_session, a broker block, topic_prefix, timeout_sec/idle_timeout_sec, expected_artifacts, last_seq, and (production) auth_token. Because the broker block lives in the record, publish_event.py connects from the registry alone. Concurrency, the atomic rename trick, and multi-session job claiming are in registry.md.
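The lock-plus-atomic-rename pattern and a pick_pending-style claim might look like the sketch below. registry_lock, write_record_atomic, claim_next, and the "running" status are hypothetical; registry.md describes the real mechanism. Because it relies on fcntl, this sketch is Linux/macOS only.

```python
import contextlib
import fcntl
import json
import os
import tempfile
from pathlib import Path
from typing import Iterator, Optional


@contextlib.contextmanager
def registry_lock(registry_dir: Path) -> Iterator[None]:
    """Exclusive advisory lock on .hermes/jobs/.lock (hypothetical helper)."""
    registry_dir.mkdir(parents=True, exist_ok=True)
    with open(registry_dir / ".lock", "a") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until other writers release it
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def write_record_atomic(path: Path, record: dict) -> None:
    # Write a temp file in the same directory, then rename it over the target:
    # readers see the old record or the new one, never a half-written file.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(record, fh, ensure_ascii=False, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)  # atomic on POSIX within one filesystem
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise


def claim_next(registry_dir: Path, agent_session: str) -> Optional[dict]:
    """Claim the first pending job (by file name) for this agent_session."""
    with registry_lock(registry_dir):
        for path in sorted(registry_dir.glob("*.json")):
            if path.name.startswith("."):
                continue  # skip temp files from an in-flight write
            record = json.loads(path.read_text(encoding="utf-8"))
            if record.get("status") == "pending" and record.get("agent_session") == agent_session:
                record["status"] = "running"  # hypothetical status name
                write_record_atomic(path, record)
                return record
    return None
```

Doing the read-check-write inside one lock is what stops two sessions from claiming the same job. Filtering on agent_session keeps each session to its own jobs.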

Audit Logs

Every job's lifecycle is mirrored to a persistent, append-only audit log under .hermes/delegate_job_logs/ (override with DELEGATE_JOB_LOGS_DIR; default <cwd>/.hermes/delegate_job_logs). Unlike the registry — live state mutated in place and liable to be cleaned up — the audit log is durable history you can replay after the fact. It is git-ignored.

.hermes/delegate_job_logs/<job_id>/
  meta.json      # registration snapshot: prompt, agent, broker, timeouts, …
  events.ndjson  # append-only, one JSON event per line, in time order
  status.json    # current status only (fast point-query)

What is logged, automatically:

When	`events.ndjson` line	Written by
job registered	`registered` (also seeds meta.json + status.json)	`registry.register_job`
any status change	`status_changed` (`from`/`to`; also rewrites status.json)	`update_job_status`, `pick_pending`
event published	`published` (carries the exact payload — reproducible)	`publish_event.py`
event received	`received` (subscriber's external view)	`job_subscriber.py`

Both the emitter side (published) and the observer side (received) are recorded, so a dropped publish or a missed receive is still visible from the other. Every write is best-effort and isolated — an fcntl-locked append guarded by try/except that only ever emits a logger.warning, so a logging failure can never break a publish, a subscribe, or a registry write. stdout is never touched.
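The sketch below shows a best-effort audit append along those lines: fcntl-locked, wrapped in try/except, and emitting only logger.warning. audit_append and the ts/kind line fields are assumptions, not the actual events.ndjson schema.

```python
import fcntl
import json
import logging
import os
from datetime import datetime, timezone
from pathlib import Path

logger = logging.getLogger("delegate_job.audit")


def logs_dir() -> Path:
    # DELEGATE_JOB_LOGS_DIR overrides the default <cwd>/.hermes/delegate_job_logs
    override = os.environ.get("DELEGATE_JOB_LOGS_DIR")
    return Path(override) if override else Path.cwd() / ".hermes" / "delegate_job_logs"


def audit_append(job_id: str, kind: str, **fields) -> None:
    """Append one line to <job_id>/events.ndjson; never raises (hypothetical helper)."""
    try:
        job_dir = logs_dir() / job_id
        job_dir.mkdir(parents=True, exist_ok=True)
        line = json.dumps({"ts": datetime.now(timezone.utc).isoformat(),
                           "kind": kind, **fields}, ensure_ascii=False)
        with open(job_dir / "events.ndjson", "a", encoding="utf-8") as fh:
            fcntl.flock(fh, fcntl.LOCK_EX)  # serialise concurrent appenders
            try:
                fh.write(line + "\n")
                fh.flush()
            finally:
                fcntl.flock(fh, fcntl.LOCK_UN)
    except Exception as exc:
        # A logging failure must never break a publish, subscribe, or registry write
        logger.warning("audit log write failed for job %s: %s", job_id, exc)
```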

Reading them:

tmux-agent-orchestrate-delegate-job logs <job_id>     # pretty-print one job's timeline
tmux-agent-orchestrate-delegate-job logs --list       # summarise every logged job (with live status)
# or directly via the registry CLI:
$PY "$SKILL/registry.py" logs <job_id> [--tail N] [--json]
$PY "$SKILL/registry.py" logs --list [--json]

submit prints the job's audit-log directory as its last stdout line, so a caller can tail -n1 to locate it.

Broker Setup

Stage	Broker	Auth	Transport
PoC	`broker.hivemq.com`	none	1883 plaintext
Production	self-hosted Mosquitto/EMQX	user/pass + ACL	8883 TLS

All connection settings come from env (MQTT_BROKER, MQTT_PORT, MQTT_TLS, MQTT_USERNAME/MQTT_PASSWORD, MQTT_CA_CERTS, …) and are resolved by broker_config_from_env(); the registry broker.* block overrides them per job. Moving to your own broker is config only. Install Mosquitto; set persistence true, acl_file, password_file, and a TLS listener on 8883; grant the worker write access to python/mqtt/jobs/+/events and Hermes read access; then flip MQTT_TLS=1 and fill in the registry broker.* block. Step-by-step instructions (conf, ACL, mosquitto_passwd, self-signed/private-CA certs, cut-over verification) are in mqtt-broker-setup.md.
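The sketch below shows env-based broker resolution with a per-job override. broker_config_from_env is the name used above, but this body is a guess. The defaults, the MQTT_TLS parsing, the 8883 default when TLS is on, and the registry key names (host, port, tls, …) are all assumptions.

```python
import os
from dataclasses import dataclass, fields, replace
from typing import Mapping, Optional


@dataclass(frozen=True)
class BrokerConfig:
    host: str = "broker.hivemq.com"  # PoC default
    port: int = 1883
    tls: bool = False
    username: Optional[str] = None
    password: Optional[str] = None
    ca_certs: Optional[str] = None


def broker_config_from_env(env: Mapping[str, str] = os.environ) -> BrokerConfig:
    """Illustrative body; the real mqtt_common.broker_config_from_env may differ."""
    tls = env.get("MQTT_TLS", "0").strip().lower() in {"1", "true", "yes"}
    return BrokerConfig(
        host=env.get("MQTT_BROKER", "broker.hivemq.com"),
        port=int(env.get("MQTT_PORT", "8883" if tls else "1883")),
        tls=tls,
        username=env.get("MQTT_USERNAME"),
        password=env.get("MQTT_PASSWORD"),
        ca_certs=env.get("MQTT_CA_CERTS"),
    )


def resolve_for_job(env_cfg: BrokerConfig, record_broker: Mapping) -> BrokerConfig:
    # Keys from the registry broker.* block override env values for this job;
    # unknown keys are ignored.
    known = {f.name for f in fields(BrokerConfig)}
    return replace(env_cfg, **{k: v for k, v in record_broker.items() if k in known})
```

The resolved fields would typically feed paho-mqtt's Client.username_pw_set() and Client.tls_set(ca_certs=...) before connecting. As a result, the PoC-to-production switch touches only env and registry values.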

Agent Adapters

Each agent opts in to the same contract: receive a JOB_ID (or registry path), call publish_event.py at lifecycle points, and exit 0/1/2. In one line: every event call uses --job "$JOB_ID", where $JOB_ID is the freshly issued id from the registry record for this delegation, never a job_id you saw in an earlier session (Pitfall §"Wrong job_id propagated to the agent").

claude-code — Claude Code calls publish_event.py via its Bash tool at lifecycle points. submit --mode tmux injects a prompt that already names $JOB_ID; if you drive claude manually, hand it the id explicitly. Reference instruction block (the wrapper injects something equivalent):

Your job_id is "$JOB_ID" (read it from the registry record for this delegation —
do not reuse any job_id you saw before).

On start:        $PY tmux-agent-orchestrate-delegate-job/scripts/publish_event.py --job "$JOB_ID" --event started
On permission:   $PY … --job "$JOB_ID" --event permission_required --detail "<tool>:<what>"
On progress:     $PY … --job "$JOB_ID" --event progress --detail "<short status>"
On success:      $PY … --job "$JOB_ID" --event completed --detail "<one-line summary>"
On failure:      $PY … --job "$JOB_ID" --event error     --detail "<one-line reason>"

Task: <the user's prompt>

The subscriber for "$JOB_ID" is already running; your completed/error event
ends the job. Exit codes: 0 completed, 1 error, 2 publish failure.

See the claude-code skill for tmux orchestration patterns.

codex — same contract. Invoke codex exec "<instruction-block-above>" or wire publish_event.py as an MCP tool so the agent can call it directly.
opencode — wire publish_event.py as a tool/command the agent can call; identical event points.
human — a person does the work, reads the registry record, then runs publish_event.py --job <id> --event completed (or error) by hand.
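One way to make sure the instruction block always names the freshly registered id is to render it from a template at submit time, whichever agent receives it. INSTRUCTIONS and render_instructions are hypothetical, modelled on the reference block above. string.Template does not re-expand $ signs inside the substituted task text.

```python
from string import Template

# Hypothetical template modelled on the reference instruction block;
# the wrapper's real template may differ.
INSTRUCTIONS = Template("""\
Your job_id is "$job_id" (read it from the registry record for this delegation;
do not reuse any job_id you saw before).

On start:      $py $publisher --job "$job_id" --event started
On permission: $py $publisher --job "$job_id" --event permission_required --detail "<tool>:<what>"
On progress:   $py $publisher --job "$job_id" --event progress --detail "<short status>"
On success:    $py $publisher --job "$job_id" --event completed --detail "<one-line summary>"
On failure:    $py $publisher --job "$job_id" --event error --detail "<one-line reason>"

Task: $task

The subscriber for "$job_id" is already running; your completed/error event
ends the job. Exit codes: 0 completed, 1 error, 2 publish failure.
""")


def render_instructions(job_id: str, task: str, py: str = ".venv/bin/python",
                        publisher: str = "tmux-agent-orchestrate-delegate-job/scripts/publish_event.py") -> str:
    """Interpolate the freshly registered id; never pass in an id from an earlier run."""
    if not job_id:
        raise ValueError("job_id must be the id just returned by the registry")
    # substitute() raises KeyError if a placeholder is left unfilled
    return INSTRUCTIONS.substitute(job_id=job_id, task=task, py=py, publisher=publisher)
```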

User Interface

The tmux-agent-orchestrate-delegate-job bash wrapper bundles register + subscribe-first + run-agent + validate:

tmux-agent-orchestrate-delegate-job submit  --agent claude-code \
  --prompt "Create 10 sorting problems and save them to sort_problems.md" \
  --workdir /path/to/project --timeout 3600 [--validate ./validate.sh]
tmux-agent-orchestrate-delegate-job status  --job <id>          # one record, pretty-printed
tmux-agent-orchestrate-delegate-job list                        # all jobs, one line each
tmux-agent-orchestrate-delegate-job verify  --job <id> --validate ./validate.sh   # runs it, reports exit code
tmux-agent-orchestrate-delegate-job wait    [--job <id>]        # block until terminal (else --wait-any)

submit always starts the subscriber before the agent (the ordering dependency), runs the agent in --mode print (one-shot) or --mode tmux, and runs the --validate script afterward if one is given. The skill automates job-id generation, registry creation, broker resolution, subscriber-first ordering, agent launch, and completion detection. It does not automate the agent's internals or your business-logic validation; those are hooks you fill in (validate.sh reads $JOB_ID/$REGISTRY_DIR).
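Below is a hypothetical validator of the kind --validate expects, written in Python for illustration. It assumes that the record's expected_artifacts are paths relative to the working directory and that a completed status is a precondition. The text confirms neither assumption.

```python
import json
import sys
from pathlib import Path


def validate(job_id: str, registry_dir: Path, workdir: Path) -> int:
    """Hypothetical validator: 0 if the job completed and every artifact exists and is non-empty."""
    record = json.loads((registry_dir / f"{job_id}.json").read_text(encoding="utf-8"))
    if record.get("status") != "completed":
        print(f"job {job_id} is {record.get('status')!r}, not completed", file=sys.stderr)
        return 1
    missing = []
    for name in record.get("expected_artifacts", []):
        path = workdir / name  # assumption: artifacts are relative to the workdir
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
            print(f"missing or empty artifact: {name}", file=sys.stderr)
    return 1 if missing else 0
```

A validate.sh could call validate(os.environ["JOB_ID"], Path(os.environ["REGISTRY_DIR"]), Path.cwd()) and exit with its result. Any business-specific checks (for example, counting the problems in sort_problems.md) would go after the artifact check.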

Common Pitfalls

Publishing before subscribing — MQTT does not queue non-retained messages for absent subscribers. Start job_subscriber.py before the agent, or rely on retained terminal events (production). submit enforces this.
Wrong job_id propagated to the agent — the wrapper prints a fresh JOB_ID on every submit. If your agent instruction (or the wrapper's prompt template) hard-codes an old job_id, the agent calls publish_event.py --job <wrong>, the subscriber's defensive parser drops it as a job_id mismatch, and the delegator waits until idle timeout (exit 2). Fix: instruct the agent to read the job_id from the registry record for this delegation (or pass it in via env / --prompt interpolation), never from prior runs. submit's default prompt template interpolates $JOB_ID for you — if you build a custom prompt, do the same.
tmux session name collision — submit --mode tmux derives the session name from --agent-session tmux:<name> (default tmux:claude). If a session with that name already exists (e.g. you ran the demo and the previous session is still open), tmux new-session -d -s <name> fails and the agent never launches. Pick a unique --agent-session per concurrent delegation (e.g. tmux:demo, tmux:claude-a, tmux:claude-b), or kill the stale session before re-running (tmux kill-session -t claude for the default name).
Timeout before started — a cold-starting agent may not emit started for a while. The wall-clock timeout starts at subscribe time, so the subscriber still exits even if the agent is stuck and never emits anything. Don't set --timeout so low that a slow start triggers a false positive.
No retry on publish — without retries, a single dropped completed would leave the delegator waiting until its timeout. publish_event.py therefore retries with exponential backoff and exits 2 if it still fails, so the delegator is never left waiting silently.
QoS-1 duplicates / reorders — a terminal event can arrive twice, or error can trail completed; the subscriber's terminal state machine finalises each job once and ignores the rest.
Trusting the public broker — anyone can publish there; never make a real decision on a PoC signal. Add auth_token + an authenticated broker first.
Secrets in detail/data — keep payloads generalised; no paths, keys, or tokens (except the production auth_token in data).
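To avoid the session-name collision, a launcher could check for an existing session before calling tmux new-session. The helper names and the -2/-3 suffix scheme are hypothetical. tmux has-session and the = prefix for exact target matching are standard tmux.

```python
import shutil
import subprocess
from typing import Callable


def tmux_session_name(agent_session: str = "tmux:claude") -> str:
    """Map --agent-session tmux:<name> to <name> (hypothetical helper)."""
    kind, sep, name = agent_session.partition(":")
    if kind != "tmux" or not sep or not name:
        raise ValueError(f"expected tmux:<name>, got {agent_session!r}")
    return name


def tmux_session_exists(name: str) -> bool:
    # "=" forces an exact match; without it tmux also accepts prefix matches
    if shutil.which("tmux") is None:
        return False
    result = subprocess.run(["tmux", "has-session", "-t", f"={name}"],
                            capture_output=True)
    return result.returncode == 0


def pick_free_session(base: str,
                      exists: Callable[[str], bool] = tmux_session_exists) -> str:
    """Return base, or base-2, base-3, ...: the first name not already taken."""
    candidate, n = base, 1
    while exists(candidate):
        n += 1
        candidate = f"{base}-{n}"
    return candidate
```

Auto-suffixing is one design choice. Failing fast and asking the user to kill the stale session is the other, and it avoids leaving orphaned sessions behind.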

Subagent Orchestration Pattern

When using this skill from a Hermes delegate_task subagent to dispatch work to a coding-agent CLI (agy/claude) running in a tmux session, the following pattern has been verified (2026-06-21, 6-batch refactoring sprint):

Roles

Main worker (implementation): one agent session (e.g. agy-new) receives brief files and executes code changes.
Reviewers (spec compliance + code quality): two other agent sessions (e.g. agy-existing, claude-existing) review the diff in parallel.
Hermes (orchestrator): dispatches subagents, verifies diffs, commits, and falls back to direct fixes when reviewers find issues.

Key lessons learned

Brief delivery via file path — don't paste long briefs inline via tmux send-keys; the TUI may swallow them. Instead, send a short instruction like "follow /tmp/batch1-brief.md" and let the agent read the file.
Polling vs MQTT subscriber — for short tasks (<5 min), pane polling (capture-pane + grep for completion markers) is simpler and more reliable than registering a job via registry.py + job_subscriber.py. Use the MQTT subscriber only for long-running jobs (>5 min) where push notification matters.
Reviewers catch different bugs — in practice, agy (Flash) caught semantic issues (slash matching, export scope), while claude (Opus) caught API signature mismatches (paho v2 5-arg vs 4-arg on_disconnect). Two reviewers with different models provide complementary coverage.
Hermes fallback fix — when reviewers find a small, well-defined issue (wrong argument count, missing slash), Hermes should fix it directly rather than re-dispatching the implementer. This saves a full round-trip.
Batch grouping — group 2-3 FW items per batch when they touch different files (no file overlap). This amortises the dispatch overhead. Items touching the same file must be in separate batches to avoid conflicts.
Pane Snapshots & Truncation Prevention — long agent responses can scroll out of the TUI viewport and be truncated. To prevent this, use the following snapshotting pattern:
- Immediately after dispatching a brief, capture the pane buffer, including the lines that preceded the brief, with capture-pane -p -S -200.
- During long execution, run a background loop that appends an incremental snapshot to a file (e.g. every 30 seconds, >> /tmp/pane-snap.txt).
- Immediately after the job terminates, capture the entire final pane state, including scrollback (capture-pane -p -S -), so no terminal logs are lost.
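The polling and snapshot steps above could be scripted as follows. The sketch shells out to tmux capture-pane: -p prints to stdout, and -S sets the first captured line, with - meaning the start of history. The helper names and the completion-marker regex are hypothetical, since the markers depend on what your brief asks the agent to print.

```python
import re
import subprocess
import threading
from datetime import datetime, timezone
from pathlib import Path
from typing import List


def capture_cmd(target: str, start: str = "-200") -> List[str]:
    # -p prints to stdout; -S is the first line to capture:
    # "-200" = 200 lines of scrollback, "-" = the start of the history
    return ["tmux", "capture-pane", "-p", "-t", target, "-S", start]


def append_snapshot(target: str, out: Path, start: str = "-200") -> None:
    text = subprocess.run(capture_cmd(target, start), capture_output=True,
                          text=True, check=True).stdout
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(out, "a", encoding="utf-8") as fh:
        fh.write(f"===== {stamp} {target} =====\n{text}")


def snapshot_loop(target: str, out: Path, stop: threading.Event,
                  interval: float = 30.0) -> None:
    # Meant for a background thread; stop.set() ends the loop.
    while not stop.wait(interval):
        try:
            append_snapshot(target, out)
        except (OSError, subprocess.CalledProcessError):
            pass  # one missed snapshot should not end the loop


def has_marker(pane_text: str, pattern: str = r"JOB (COMPLETED|FAILED)") -> bool:
    # Pane polling: look for a completion marker the brief asked the agent to print
    return re.search(pattern, pane_text) is not None
```

A typical sequence is: call append_snapshot right after dispatch, run snapshot_loop in a daemon threading.Thread while the job runs, then call stop.set() and take a final append_snapshot(target, out, start="-").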

Verification Checklist

started → completed over the public broker: subscriber prints the lines and exits 0.
error path: subscriber exits 1.
timeout path: no terminal event within --timeout/--idle-timeout → exit 2.
polluted payload (bad JSON, wrong schema_version, wrong job_id) is dropped with a warning, not crashed on.
one tmux session processes two registry jobs in sequence; a second session with a different agent_session claims only its own.
broker cut-over: same scripts reach an authenticated TLS broker with env changes only; a credential without write ACL is rejected; a late subscriber still receives the retained terminal event.
publisher.py/subscriber.py/README.md demo on python/mqtt/sample still works unchanged (regression).
audit log integrity — for a completed job, .hermes/delegate_job_logs/<JID>/events.ndjson contains registered → received started → published completed (in that order), and status.json.status == "completed" matches the registry record. A logging failure (e.g. read-only log dir) does not break the publish or subscribe path — only a logger.warning is emitted.
end-to-end demo smoke — run tmux-agent-orchestrate-delegate-job submit --agent claude-code --agent-session tmux:demo-smoke --prompt "echo hello and call publish_event.py --job <JID> --event completed" --timeout 120 and confirm (a) registered job id echoed, (b) subscriber pid echoed, (c) tmux session name printed, (d) events.ndjson grows as the agent runs, (e) final stdout line is the audit-log dir.
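The audit-log item of the checklist can be automated with a subsequence check. check_audit_log is hypothetical. It assumes each events.ndjson line carries a kind field (registered/published/received) and, for published/received lines, an event field; adjust it to the real line format.

```python
import json
from pathlib import Path
from typing import List, Optional, Tuple

# (kind, event) pairs that must appear in this order; None = any event.
# Assumed line shape: {"kind": "received", "event": "started", ...}
EXPECTED: List[Tuple[str, Optional[str]]] = [
    ("registered", None), ("received", "started"), ("published", "completed"),
]


def check_audit_log(job_dir: Path, registry_status: str) -> bool:
    """Hypothetical checker for the audit-log item of the checklist."""
    entries = [json.loads(line) for line in
               (job_dir / "events.ndjson").read_text(encoding="utf-8").splitlines()
               if line.strip()]
    i = 0
    for entry in entries:
        kind, event = EXPECTED[i]
        if entry.get("kind") == kind and (event is None or entry.get("event") == event):
            i += 1
            if i == len(EXPECTED):
                break
    if i < len(EXPECTED):
        return False  # required entries missing or out of order
    status = json.loads((job_dir / "status.json").read_text(encoding="utf-8")).get("status")
    return status == "completed" and status == registry_status
```

A subsequence check tolerates the extra status_changed and received/published lines that sit between the required ones.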
