feat: add support for hermes agent in tmux orchestration scripts

2026-06-21 14:21:30 +00:00
parent aacea05f6a
commit e1d998e1ef
9 changed files with 99 additions and 1756 deletions
@@ -1,163 +0,0 @@
-# Understand-Anything: Project & Architecture Analysis Report
-
-This report presents an architectural analysis and security verification of the `tmux_agent_orchestration` workspace. Using static analysis principles inspired by the `Understand-Anything` pipeline, we map the codebase structure, evaluate the integrity of the design, identify critical defects and inconsistencies between implementation and documentation, and provide concrete technical recommendations.
-
---
-
-## 1. Architectural Visualization
-
-The following diagram illustrates the interaction between the orchestrator (Hermes/PM), the worker agents running inside TMUX sessions, and the decentralized event backplane (MQTT).
-
-```mermaid
-sequenceDiagram
-    autonumber
-    actor User as User / PM
-    participant Registry as Job Registry (.hermes/jobs/)
-    participant DB as Session Registry (SQLite WAL & YAML)
-    participant TMUX as Tmux Workspace (Worker Session)
-    participant MQTT as MQTT Broker (HiveMQ / Private)
-    participant Sub as Job Subscriber (job_subscriber.py)
-    participant Mon as Reconcile Monitor (reconcile.sh)
-
-    User->>Registry: Register Job (registry.py register)
-    Registry-->>User: Return Job ID (JID)
-    
-    User->>Sub: Spawn background subscriber (job_subscriber.py --job JID)
-    Sub->>MQTT: Subscribe to topic (python/mqtt/jobs/JID/events)
-
-    User->>TMUX: Create session & execute agent (create_session.sh)
-    TMUX->>DB: Add running session (atomic_dump_yaml)
-    
-    Note over TMUX: Agent Starts execution
-    TMUX->>MQTT: Publish 'started' event (publish_event.py)
-    MQTT->>Sub: Deliver event (QoS 1)
-    Sub->>Sub: Verify HMAC Signature
-    Sub->>Sub: Log to events.ndjson & print stdout
-    
-    Note over TMUX: Agent does work & publishes checkpoints
-    TMUX->>MQTT: Publish 'progress' / 'permission_required'
-    MQTT->>Sub: Deliver event (QoS 1)
-    
-    Note over TMUX: Agent finishes execution
-    TMUX->>MQTT: Publish 'completed' or 'error' (retained)
-    MQTT->>Sub: Deliver terminal event (QoS 1)
-    Sub->>Sub: Transition to Terminal State & Exit
-    
-    Note over Mon: Reconcile loop runs periodically
-    Mon->>MQTT: Listen for terminal events
-    MQTT->>Mon: Deliver terminal events
-    Mon->>DB: Mark session terminated, kill tmux (reconcile.sh)
-```
-
---
-
-## 2. Core Mechanism Deep Dive & Verification
-
-### 2.1 MQTT Backplane & Event Protocol
-* **Wire Format**: Encoded in UTF-8 JSON matching `schema_version = 1`. It features monotonic `seq` indexing, `job_id`, `event` type, `timestamp`, `detail` description, and a `data` block for metadata.
-* **QoS and Retention**: Event publishing and subscribing enforce **QoS 1 (At Least Once)** delivery. Terminal events (`completed`/`error`) utilize `retain=True` on the broker. This ensures that late-joining subscribers immediately receive the terminal state without missing the final outcome.
-* **Network Handshake Isolation**: `publish_event.py` uses a short-lived connection pattern (connect, publish QoS 1, wait for PUBACK, disconnect) with exponential backoff retries. Because no socket is held open between events, this mitigates socket exhaustion under high session concurrency.
-
-### 2.2 SQLite WAL Session Database
-* **Database & WAL Mode**: Session metadata has been migrated from a single-point-of-contention YAML file to a SQLite database (`.hermes/agent-sessions.db`) operating in **WAL (Write-Ahead Logging)** mode.
-* **Concurrency Control**: Concurrency is managed via `BEGIN IMMEDIATE` transactions in `atomic_dump_yaml()`, which blocks concurrent write attempts at the database level rather than relying on brittle file system locks.
-* **YAML Synchronization**: To maintain compatibility, `agent-sessions.yaml` is updated atomically (using `tempfile.mkstemp` and `os.replace`) only when a session transitions to a terminal state (`stopped`, `terminated`, `archived`), leaving active write traffic isolated within the SQLite WAL database.
-* **NFS Fallback**: If a network mount (NFS/CIFS/SSHFS) is detected, `lib.sh` automatically falls back to `PRAGMA journal_mode=DELETE` to prevent WAL serialization crashes, as NFS does not support shared-memory mapped files (`-shm`) required by WAL.
-
-### 2.3 HMAC-SHA256 Signature Verification
-* **Signature Generation**: The publisher serializes the payload (excluding `data.hmac_sig`) into a canonical JSON string (with sorted keys and no whitespace separators) and signs it using HMAC-SHA256 with the job's secret `auth_token`.
-* **Signature Verification**: `job_subscriber.py` intercepts payloads and calls `verify_hmac()`, which calculates the expected signature and compares it with the received signature using the constant-time `hmac.compare_digest` to prevent timing attacks.
-
---
-
-## 3. Discovered Flaws & Documentation Inconsistencies
-
-We have identified several critical gaps between the architecture specifications and the actual codebase implementation:
-
-### ⚠️ Flaw 1: Documentation Mismatch in `job-protocol.md` (Security Risk if Followed)
-* **Description**: Section 4 of `job-protocol.md` states:
-  > *`auth_token` (the bonus field) — each job record carries a per-job `auth_token` (`secrets.token_urlsafe(32)`). The publisher copies it into `data.auth_token`; the subscriber compares it against the registry's expected token and drops mismatches.*
-* **Reality in Code**: If the publisher copied the plaintext token into `data.auth_token`, it would be transmitted in plaintext across the MQTT network, exposing the secret token to any eavesdropper (especially on the public PoC broker). 
-* **Correction**: The code correctly implements **HMAC-SHA256 signatures** via `data.hmac_sig` and **never transmits the raw `auth_token`**. The documentation in `job-protocol.md` is obsolete and contradicts the secure implementation.
-
-### ⚠️ Flaw 2: Missing Automated `auth_token` Generation & CLI Support
-* **Description**: Both `MESSAGING.md` and `registry.md` state that when a job is registered, a cryptographic token is automatically generated using `secrets.token_urlsafe(32)`.
-* **Reality in Code**: In `registry.py`, `register_job()` accepts `auth_token: Optional[str] = None` and defaults it to `None`. No automatic token generation is implemented. Furthermore, the CLI registration parser (`registry.py register`) does not expose any `--auth-token` flag, nor does it generate one internally. As a result, **every job registered via the CLI is created with `auth_token = null`**, defaulting the system to the unauthenticated/unsecured PoC mode.
-
-### ⚠️ Flaw 3: Replay Attack Vulnerability for Non-Terminal Events
-* **Description**: `job_subscriber.py` enforces a terminal state machine to ignore duplicate `completed`/`error` events, but it does **not validate sequence numbers (`seq`) or timestamp freshness** for non-terminal events (`progress`, `permission_required`).
-* **Exploitation Vector**: An attacker sniffing network traffic (easy on HiveMQ's plaintext broker) can capture a signed `permission_required` or `progress` event and replay it repeatedly. Since the HMAC signature remains valid, `job_subscriber.py` will accept the replayed message, write it to the audit log (`events.ndjson`), and output it to stdout, potentially triggering downstream actions or corrupting the audit trail.
-
-### ⚠️ Flaw 4: NFS Locking Vulnerability in Job Registry
-* **Description**: While the session registry was successfully migrated to SQLite to circumvent NFS locking issues, the Job Registry in `.hermes/jobs/` still relies on `fcntl.flock` over a shared `.lock` file to coordinate job claims (`pick_pending`).
-* **Impact**: If the project registry is located on a network-mounted file system, concurrent calls to `pick_pending` from multiple hosts could result in lock failures, leading to duplicate claims (split-brain) or corruption of the `<job_id>.json` files during write operations.
-
---
-
-## 4. Technical Recommendations
-
-To address these vulnerabilities and align the codebase with the target production security standards, we recommend the following changes:
-
-### 1. Correct the Protocol Documentation
-Update `job-protocol.md` to match the actual HMAC-SHA256 signature scheme, removing all references to transmitting the plaintext token in `data.auth_token`.
-
-### 2. Implement Automated Token Generation in `registry.py`
-Modify `register_job` to automatically generate a cryptographically secure token when running in production mode, and add the `--auth-token` argument to the CLI.
-
-*Proposed change in `registry.py`*:
-```python
-# In registry.py:register_job
-import secrets
-
-# Generate token if not provided (production mode default)
-if auth_token is None:
-    # If broker is secure/private, generate a token by default
-    if broker.get("tls") or broker.get("username"):
-        auth_token = secrets.token_urlsafe(32)
-```
-
-### 3. Harden `job_subscriber.py` Against Replay Attacks
-Implement monotonic sequence number tracking and timestamp freshness checks in `_Watcher.on_message`.
-
-*Proposed change in `job_subscriber.py`*:
-```python
-# In _Watcher inside job_subscriber.py
-def __init__(self, expected_job_ids: Set[str], expected_tokens: Dict[str, Optional[str]]):
-    self.events = queue.Queue()
-    self.expected = set(expected_job_ids)
-    self.tokens = expected_tokens
-    self.last_seq: Dict[str, int] = {}  # Track sequence numbers per job
-
-def on_message(self, _client, _userdata, msg) -> None:
-    # ... (after json parse and schema check) ...
-    jid = payload.get("job_id")
-    seq = payload.get("seq", 0)
-    
-    # 1. Monotonic Sequence Check
-    if jid in self.last_seq and seq <= self.last_seq[jid]:
-        logger.warning("drop replayed/duplicate event seq=%r for job %s", seq, jid)
-        return
-        
-    # 2. Timestamp freshness check (e.g., 60s window)
-    # (Optional but recommended for strict production environments)
-    
-    # ... (after HMAC verification succeeds) ...
-    self.last_seq[jid] = seq
-    # ...
-```
-
-### 4. Migrate the Job Registry to the SQLite DB
-To eliminate NFS locking issues completely, merge the Job Registry data into the SQLite database. Define a `jobs` table with a schema similar to:
-```sql
-CREATE TABLE IF NOT EXISTS jobs (
-    job_id TEXT PRIMARY KEY,
-    status TEXT,
-    agent_session TEXT,
-    created_at TEXT,
-    data JSON
-);
-```
-Replace the file-based `fcntl.flock` in `registry.py` with SQL transactions (`BEGIN IMMEDIATE`), ensuring absolute atomicity and locking security regardless of the underlying filesystem type.
-
---
-*Report compiled on 2026-06-21 by Antigravity Reviewer Agent.*
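The signing scheme described in section 2.3 of the deleted report (canonical JSON with sorted keys and no whitespace, `data.hmac_sig` excluded, HMAC-SHA256 keyed by the job's `auth_token`, constant-time comparison) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `canonical_bytes` and `sign` are hypothetical helper names, and `verify_hmac` here only mirrors the name mentioned in the report.

```python
import hashlib
import hmac
import json


def canonical_bytes(payload: dict) -> bytes:
    """Serialize the payload without data.hmac_sig, using sorted keys and no whitespace."""
    data = {k: v for k, v in (payload.get("data") or {}).items() if k != "hmac_sig"}
    body = {**payload, "data": data}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(payload: dict, auth_token: str) -> str:
    """Hypothetical helper: hex HMAC-SHA256 of the canonical payload."""
    key = auth_token.encode("utf-8")
    return hmac.new(key, canonical_bytes(payload), hashlib.sha256).hexdigest()


def verify_hmac(payload: dict, auth_token: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    received = (payload.get("data") or {}).get("hmac_sig")
    if not isinstance(received, str):
        return False
    expected = sign(payload, auth_token)
    # Compare as bytes so non-ASCII input cannot raise TypeError.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

Because the signature covers every field except `data.hmac_sig`, changing `seq` or `event` after signing invalidates it. A captured message replayed unchanged still verifies, which is the gap Flaw 3 describes.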
@@ -1,38 +0,0 @@
-# Review Brief: FW-L3 & FW-L2 Improvements (v2)
-
-We have implemented two long-term tasks from `FUTURE_WORKS.md`: `FW-L3` (SQLite Database Normalization) and `FW-L2` (Stop Semantics Simplification), including the migration safety improvements identified in the first review round.
-
-## 1. FW-L3: SQLite Database Normalization
- **Goal**: Transition from storing the entire JSON state as a single blob in the `state` (id=1) table to a normalized table structure (`sessions` table) to support O(1) status queries, while maintaining compatibility with the existing YAML synchronization workflow.
- **Implementation**:
-  - In `skills/lib.sh`:
-    - Updated `atomic_dump_yaml` to create and maintain:
-      - `state (id=1, data TEXT)` table (holds global metadata such as `agent_identities`, with the `tmux_sessions` key removed).
-      - `sessions (name TEXT PRIMARY KEY, status TEXT, pane_cwd TEXT, data JSON)` table (each row holds a single session entry).
-      - Added index `idx_sessions_pane_cwd` on `sessions(pane_cwd)` for faster lookups.
-    - Inside `atomic_dump_yaml`, before executing caller mutations, the complete dictionary `d` is seamlessly reconstructed from both `state` and `sessions` tables to guarantee that existing mutations still run perfectly without any modification.
-    - Updated `resolve_tmux_server`, `find_workspace_uuid`, and `is_already_stopped` to run optimized O(1) SELECT queries directly on the normalized database table when it exists.
-    - **Migration Fallback**: Added safety fallbacks: if the `sessions` table does not exist yet (`OperationalError`) or returns no results, the reader functions fall back to querying the old `state` table's JSON blob. This avoids degradation during the migration window, when readers may execute before the first write.
-  - In `status.sh` and `reconcile.sh`:
-    - Adjusted the read-only DB loading logic to pull and reconstruct the `d['tmux_sessions']` list from the `sessions` table.
-
-## 2. FW-L2: Stop Semantics Simplification
- **Goal**: Deprecate confusing `--mode soft|hard`, `--capture-id`, and `--graceful` flags. Make graceful shutdown and metadata capture the standard default behavior. Clarify the destructive `--purge-conversation` option.
- **Implementation**:
-  - In `skills/tmux-agent-orchestrate-stop/scripts/stop_session.sh`:
-    - Deprecated `--mode`, `--capture-id`, and `--graceful` arguments. Passing these flags now raises an error informing the user that they are deprecated.
-    - Default behavior is now equivalent to the previous stop mode: it gracefully exits the agent TUI, shuts down tmux, captures conversation IDs, and updates status to `stopped` (instead of `terminated`).
-    - Added custom reasons via `--reason` (still defaults to `manual_stop`).
-    - `--purge-conversation` is retained as a destructive option to purge conversation databases and JSONLs from disk. When purged, status transitions to `terminated` and `resumable` is set to `False`.
-  - In `skills/tmux-agent-orchestrate-stop/SKILL.md`:
-    - Rewrote the stop documentation, removed deprecated options, and aligned it with the new semantics.
-  - **Stale Documentation Cleanup**:
-    - Cleaned up outdated references to `--capture-id`/`--graceful` in `resume/SKILL.md` and `monitor/SKILL.md`.
-
-## Verification Checklist for Reviewers
-1. Does the SQLite schema creation/modification in `lib.sh` preserve concurrency safety (e.g. WAL mode, BEGIN IMMEDIATE, commit/rollback)?
-2. Do the O(1) optimizations in `lib.sh` (`resolve_tmux_server`, `find_workspace_uuid`, `is_already_stopped`) fall back safely to the YAML/state blob if the SQLite DB is missing or in the old schema format?
-3. Are the stop options properly simplified in `stop_session.sh`, and does the default behavior work cleanly with the database/YAML update flow?
-4. Are there any edge cases where `reconcile.sh` or `status.sh` might fail when the DB is newly initialized?
-
-Please perform a code review on these changes and reply with either detailed feedback/corrections or a `PASS`.
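The migration fallback described in the deleted brief (query the normalized `sessions` table first, then fall back to the JSON blob in `state` when the table is missing or has no matching row) might look roughly like the sketch below. The function name `session_status` is hypothetical, and the blob layout (a `tmux_sessions` list whose entries carry `name` and `status`) is an assumption inferred from the brief rather than taken from `lib.sh`.

```python
import json
import sqlite3


def session_status(conn: sqlite3.Connection, name: str):
    """Return a session's status, preferring the normalized table."""
    try:
        row = conn.execute(
            "SELECT status FROM sessions WHERE name=?", (name,)
        ).fetchone()
        if row is not None:
            return row[0]
    except sqlite3.OperationalError:
        pass  # sessions table not created yet (pre-migration database)

    # Fallback: scan the legacy JSON blob stored in state (id=1).
    try:
        row = conn.execute("SELECT data FROM state WHERE id=1").fetchone()
    except sqlite3.OperationalError:
        return None  # neither table exists
    if row is None:
        return None
    blob = json.loads(row[0])
    for entry in blob.get("tmux_sessions") or []:
        if entry.get("name") == name:
            return entry.get("status")
    return None
```

The fallback path is a linear scan, so it only preserves correctness during the migration window; the O(1) lookup applies once the `sessions` table has been written.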
@@ -438,6 +438,19 @@ def db_exists(uuid):
    return os.path.exists(f"{home}/.gemini/antigravity-cli/conversations/{uuid}.db")


+def hermes_exists(uuid):
+    hdb = f"{home}/.hermes/state.db"
+    if not os.path.exists(hdb):
+        return False
+    try:
+        conn = sqlite3.connect(hdb)
+        r = conn.execute("SELECT 1 FROM sessions WHERE id=?", (uuid,)).fetchone()
+        conn.close()
+        return r is not None
+    except Exception:
+        return False
+
+
 def emit(u):
    print(u)
    raise SystemExit(0)
@@ -483,6 +496,10 @@ for s in sessions:
        cand = s.get('agy_conversation_id_own')
        if cand and db_exists(cand):
            emit(cand)
+    if agent == 'hermes' and name.endswith('-creator-hermes'):
+        cand = s.get('hermes_conversation_id_own')
+        if cand and hermes_exists(cand):
+            emit(cand)

 # 2) disk scan scoped to THIS workspace
 if agent == 'claude':
@@ -511,6 +528,20 @@ elif agent == 'agy':
            cand = None
        if cand and db_exists(cand):
            emit(cand)
+elif agent == 'hermes':
+    hdb = f"{home}/.hermes/state.db"
+    if os.path.exists(hdb):
+        cand = None
+        try:
+            conn = sqlite3.connect(hdb)
+            r = conn.execute("SELECT id FROM sessions WHERE cwd=? ORDER BY started_at DESC LIMIT 1", (ws,)).fetchone()
+            conn.close()
+            if r:
+                cand = r[0]
+        except Exception:
+            cand = None
+        if cand:
+            emit(cand)

 # 3) agent_identities cache, ONLY when its project_cwd == this workspace
 ai = {}
@@ -538,6 +569,10 @@ if ai_agent.get('project_cwd') == ws:
        cand = ai.get('conversation_id')
        if cand and db_exists(cand):
            emit(cand)
+    elif agent == 'hermes':
+        cand = ai_agent.get('session_id') or ai.get('conversation_id')
+        if cand and hermes_exists(cand):
+            emit(cand)

 print('')
 PYEOF
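The hermes lookups added in this hunk close the connection only on the success path. One possible variant, sketched below, wraps the connection in `contextlib.closing` so it is released even when the query raises. The function name `latest_hermes_session` is hypothetical; the `sessions` table and its `id`, `cwd`, and `started_at` columns are taken from the queries in the diff, and nothing else about the Hermes schema is assumed.

```python
import os
import sqlite3
from contextlib import closing
from typing import Optional


def latest_hermes_session(db_path: str, workspace: str) -> Optional[str]:
    """Return the newest session id recorded for this workspace, or None."""
    if not os.path.exists(db_path):
        return None  # avoid creating an empty database as a side effect
    try:
        # closing() guarantees conn.close() even if execute() raises.
        with closing(sqlite3.connect(db_path)) as conn:
            row = conn.execute(
                "SELECT id FROM sessions WHERE cwd=? "
                "ORDER BY started_at DESC LIMIT 1",
                (workspace,),
            ).fetchone()
    except sqlite3.Error:
        return None
    return row[0] if row else None
```

Catching `sqlite3.Error` rather than a bare `Exception` keeps unrelated programming errors visible while still tolerating a locked or malformed database.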
@@ -23,11 +23,11 @@ source "$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)/lib.sh"

 usage() {
  cat <<EOF
-Usage: $0 --workspace <path> --agent <claude|agy> [options]
+Usage: $0 --workspace <path> --agent <claude|agy|hermes> [options]

 Options:
  --workspace PATH    project directory (required)
-  --agent AGENT       claude | agy (required)
+  --agent AGENT       claude | agy | hermes (required)
  --session NAME      tmux session name (default: derived from workspace)
  --wrapper           force use of ~/.local/bin/<session> wrapper even if not present
  --dry-run           print commands without executing
@@ -70,7 +70,7 @@ fi
 command -v tmux >/dev/null || { echo "ERROR: tmux not installed" >&2; exit 1; }
 command -v "$AGENT" >/dev/null || { echo "ERROR: $AGENT CLI not in PATH" >&2; exit 1; }

-# Auth Check (OAuth check for agy, loggedIn check for claude)
+# Auth Check (OAuth check for agy, loggedIn check for claude, status for hermes)
 if [ "$AGENT" = "claude" ]; then
  if ! claude auth status 2>/dev/null | grep -q '"loggedIn":\s*true'; then
    echo "ERROR: claude not logged in. Run 'claude auth login' first." >&2
@@ -81,6 +81,11 @@ elif [ "$AGENT" = "agy" ]; then
    echo "ERROR: agy is not authenticated. Please log in first." >&2
    exit 1
  fi
+elif [ "$AGENT" = "hermes" ]; then
+  if ! hermes status >/dev/null 2>&1; then
+    echo "ERROR: hermes is not functional. Run 'hermes setup' first." >&2
+    exit 1
+  fi
 fi

 # Session name: lib.sh::derive_session_name is the single source of truth (P0-A)
@@ -111,7 +116,10 @@ spawn() {
    agy)
      _tmux new-session -d -s "$SESSION_NAME" -x 140 -y 40 -c "$WORKSPACE" "agy --dangerously-skip-permissions"
      ;;
-    *) echo "ERROR: --agent must be claude or agy, got: $AGENT" >&2; exit 2 ;;
+    hermes)
+      _tmux new-session -d -s "$SESSION_NAME" -x 140 -y 40 -c "$WORKSPACE" "hermes"
+      ;;
+    *) echo "ERROR: --agent must be claude, agy or hermes, got: $AGENT" >&2; exit 2 ;;
  esac
 }

@@ -136,6 +144,7 @@ NOW_ISO=$(date -u +'%Y-%m-%dT%H:%M:%SZ')
 case "$AGENT" in
  claude) CMD_FULL='claude' ;;
  agy)    CMD_FULL='agy --dangerously-skip-permissions' ;;
+  hermes) CMD_FULL='hermes' ;;
 esac

 # Start command
@@ -152,7 +161,7 @@ case "$AGENT" in
      START_CMD="$local_tmux new-session -d -s \"$SESSION_NAME\" -x 140 -y 40 -c \"$WORKSPACE\" \"claude\""
    fi
    ;;
-  agy)
+  agy|hermes)
    START_CMD="$local_tmux new-session -d -s \"$SESSION_NAME\" -x 140 -y 40 -c \"$WORKSPACE\" \"$CMD_FULL\""
    ;;
 esac
@@ -163,6 +172,8 @@ if [ -n "$SUBMIT_JOB_PROMPT" ]; then
  delegate_agent=""
  if [ "$AGENT" = "claude" ]; then
    delegate_agent="claude-code"
+  elif [ "$AGENT" = "hermes" ]; then
+    delegate_agent="hermes-agent"
  else
    delegate_agent="antigravity-cli"
  fi
@@ -180,8 +191,8 @@ fi
 # All values are passed via environment variables; no heredoc interpolation (P1-B).
 # The child pid is resolved in bash beforehand with pgrep (P2: filter by tool name).
 CHILD_PID=0
-if [ "$AGENT" = "agy" ] && [ -n "$PANE_PID" ]; then
-  CHILD_PID=$(pgrep -P "$PANE_PID" -x agy 2>/dev/null | head -1 || true)
+if { [ "$AGENT" = "agy" ] || [ "$AGENT" = "hermes" ]; } && [ -n "$PANE_PID" ]; then
+  CHILD_PID=$(pgrep -P "$PANE_PID" -x "$AGENT" 2>/dev/null | head -1 || true)
  CHILD_PID="${CHILD_PID:-0}"
 fi

@@ -249,6 +260,11 @@ elif agent == 'agy':
        }
    ]
    entry['last_visible_status'] = "TUI started; awaiting first user message"
+elif agent == 'hermes':
+    cp = os.environ.get('CHILD_PID', '0')
+    entry['child_pid'] = int(cp) if cp.isdigit() else 0
+    entry['hermes_conversation_id_own'] = None
+    entry['last_visible_status'] = "TUI started; awaiting first user message"

 sessions.append(entry)

@@ -61,7 +61,7 @@ If all three are empty → the workspace has no conversation yet. Fall back to `

 ```bash
 WORKSPACE=/path/to/project
-AGENT=claude  # or agy
+AGENT=claude  # or agy or hermes
 SESSION_NAME=<workspace>-creator-<agent>  # same convention as tmux-agent-orchestrate-create

 # 1. Resolve the session id
@@ -100,6 +100,10 @@ case "$AGENT" in
    tmux new-session -d -s "$SESSION_NAME" -x 140 -y 40 -c "$WORKSPACE" \
      "agy --dangerously-skip-permissions --conversation $UUID"
    ;;
+  hermes)
+    tmux new-session -d -s "$SESSION_NAME" -x 140 -y 40 -c "$WORKSPACE" \
+      "hermes --resume $UUID"
+    ;;
 esac

 # 4. Update agent-sessions.yaml: status running, last_visible_status
@@ -33,8 +33,8 @@ done
 [ -n "$WORKSPACE" ] || { echo "ERROR: --workspace required" >&2; exit 2; }
 [ -n "$AGENT" ]    || { echo "ERROR: --agent required" >&2; exit 2; }
 case "$AGENT" in
-  claude|agy) ;;
-  *) echo "ERROR: --agent must be claude or agy" >&2; exit 2 ;;
+  claude|agy|hermes) ;;
+  *) echo "ERROR: --agent must be claude or agy or hermes" >&2; exit 2 ;;
 esac

 find_workspace_uuid "$WORKSPACE" "$AGENT"
@@ -40,6 +40,7 @@ if [ -z "$AGENT" ]; then
  case "$SESSION_NAME" in
    *-creator-claude) AGENT=claude ;;
    *-creator-agy)    AGENT=agy ;;
+    *-creator-hermes) AGENT=hermes ;;
    *) echo "ERROR: cannot infer agent from '$SESSION_NAME'; pass --agent" >&2; exit 2 ;;
  esac
 fi
@@ -50,8 +51,8 @@ NOW_ISO=$(date -u +'%Y-%m-%dT%H:%M:%SZ')
 PANE_PID=$(tmux list-panes -t "$SESSION_NAME" -F '#{pane_pid}' 2>/dev/null | head -1 || true)
 PANE_PID="${PANE_PID:-}"
 CHILD_PID=0
-if [ "$AGENT" = "agy" ] && [ -n "$PANE_PID" ]; then
-  CHILD_PID=$(pgrep -P "$PANE_PID" -x agy 2>/dev/null | head -1 || true)
+if { [ "$AGENT" = "agy" ] || [ "$AGENT" = "hermes" ]; } && [ -n "$PANE_PID" ]; then
+  CHILD_PID=$(pgrep -P "$PANE_PID" -x "$AGENT" 2>/dev/null | head -1 || true)
  CHILD_PID="${CHILD_PID:-0}"
 fi

@@ -136,6 +137,13 @@ elif agent == 'agy':
    cp = os.environ.get('CHILD_PID', '0')
    if cp.isdigit() and int(cp) > 0:
        target['child_pid'] = int(cp)
+elif agent == 'hermes':
+    target['pane']['cmd'] = 'hermes'
+    target['pane']['cmd_full'] = f'hermes --resume {uuid}'
+    target['hermes_conversation_id_own'] = uuid
+    cp = os.environ.get('CHILD_PID', '0')
+    if cp.isdigit() and int(cp) > 0:
+        target['child_pid'] = int(cp)

 snap = d.setdefault('snapshot', {})
 snap['taken_at'] = now
@@ -75,6 +75,7 @@ if [ -z "$AGENT" ]; then
  case "$SESSION_NAME" in
    *-creator-claude) AGENT=claude ;;
    *-creator-agy)    AGENT=agy ;;
+    *-creator-hermes) AGENT=hermes ;;
    *) echo "ERROR: cannot infer agent from '$SESSION_NAME'; pass --agent" >&2; exit 2 ;;
  esac
 fi
@@ -182,6 +183,7 @@ graceful_stop() {
  case "$AGENT" in
    claude) exitkey="/exit" ;;
    agy)    exitkey="Exit" ;;
+    hermes) exitkey="/exit" ;;
    *)      exitkey="/exit" ;;
  esac
  echo "graceful: send-keys '$exitkey' to $SESSION_NAME"
@@ -259,6 +261,8 @@ if captured and not purge:
        target['claude_session_id_own'] = captured
    elif agent == 'agy':
        target['agy_conversation_id_own'] = captured
+    elif agent == 'hermes':
+        target['hermes_conversation_id_own'] = captured
    target['resumable'] = True

 # --purge-conversation: delete only the on-disk artifacts of the workspace-scoped UUID (P0-C)
@@ -281,6 +285,24 @@ if purge and purge_uuid:
            shutil.rmtree(brain)
            print(f"purged: {brain}", flush=True)
        target['agy_conversation_id_own'] = None
+    elif agent == 'hermes':
+        json_file = f"{home}/.hermes/sessions/session_{purge_uuid}.json"
+        if os.path.exists(json_file):
+            os.remove(json_file)
+            print(f"purged: {json_file}", flush=True)
+        hdb = f"{home}/.hermes/state.db"
+        if os.path.exists(hdb):
+            try:
+                import sqlite3
+                conn = sqlite3.connect(hdb)
+                conn.execute("DELETE FROM sessions WHERE id=?", (purge_uuid,))
+                conn.execute("DELETE FROM messages WHERE session_id=?", (purge_uuid,))
+                conn.commit()
+                conn.close()
+                print(f"purged db records for session: {purge_uuid}", flush=True)
+            except Exception as e:
+                print(f"WARN: purge hermes db records failed: {e}", flush=True)
+        target['hermes_conversation_id_own'] = None
    # agent_identities is a cache; clear it only when it belongs to this workspace
    ai = (d.get('agent_identities') or {}).get(agent) or {}
    if ai.get('project_cwd') == ws:
@@ -293,6 +315,8 @@ if purge and purge_uuid:
            ai['conversation_id'] = None
            ai['conversation_db'] = None
            ai['conversation_brain_dir'] = None
+        elif agent == 'hermes' and ai.get('session_id') == purge_uuid:
+            ai['session_id'] = None
 elif purge and not purge_uuid:
    print("WARN: --purge-conversation requested but no workspace-scoped UUID resolved; nothing purged", flush=True)
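The hermes purge path above issues two `DELETE` statements followed by a manual commit. As a hedged alternative, the sketch below runs both deletes inside one `with conn:` block, which commits on success and rolls back if either statement fails, so a partial purge is not left behind. `purge_hermes_session` is a hypothetical name; the `sessions.id` and `messages.session_id` columns come from the diff.

```python
import sqlite3
from contextlib import closing


def purge_hermes_session(db_path: str, session_id: str) -> int:
    """Delete a session and its messages atomically; return sessions rows removed."""
    with closing(sqlite3.connect(db_path)) as conn:
        # `with conn:` commits on success and rolls back on any exception.
        with conn:
            conn.execute("DELETE FROM messages WHERE session_id=?", (session_id,))
            cur = conn.execute("DELETE FROM sessions WHERE id=?", (session_id,))
            return cur.rowcount
```

Returning the row count lets the caller distinguish "purged" from "nothing matched" before printing a confirmation.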