commit 8a3abff2d64ff3be456aa5d27fa8635e1da7ad3d
Author: Godopu
Date: Fri Jun 19 13:32:36 2026 +0000

    initial: canary multi-agent skills with tmux isolation support

    - lib.sh: TMUX_SERVER_NAME env var, _tmux helper, shim externalized to TMPDIR with recursive guard, resolve_tmux_server helper for YAML-driven server routing
    - multi-agent-create: --tmux-server opt-in flag, YAML tmux_server field for orphan prevention
    - multi-agent-delete/resume/status/agent-sessions-monitor: use resolve_tmux_server to auto-route to correct isolated server
    - SKILL.md × 4: documented isolation server workflow
    - Verified by claude review (R1+re-run) + agy R2 patches (orphan prevention + shim location fix)

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..c18d09c
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,11 @@
+# One-off work materials (prompts sent to agy/claude workers)
+_agy_prompt_*.md
+_claude_prompt_*.md
+
+# Temporary verification artifacts
+test-sessions.yaml
+test-sessions.yaml.bak
+test-sessions.yaml.lock
+
+# Embedded standalone git repo (managed separately)
+delegate-job-skill/
\ No newline at end of file
diff --git a/skills/agent-sessions-monitor/SKILL.md b/skills/agent-sessions-monitor/SKILL.md
new file mode 100644
index 0000000..1dfa4c0
--- /dev/null
+++ b/skills/agent-sessions-monitor/SKILL.md
@@ -0,0 +1,205 @@
+---
+name: agent-sessions-monitor
+description: "Run a long-lived Kanban worker that polls ~/PuKi/lab/agent_sessions/agent-sessions.yaml against the actual tmux/agent runtime state and reconciles them. Use when you want live visibility into which agent sessions are running, which are dead, which have stale YAML entries, and which have new session ids that haven't been recorded yet. Designed to be dispatched as a Kanban goal_mode task (--goal) so it keeps running until the user stops it."
+version: 1.0.0
+author: godopu
+license: MIT
+platforms: [linux, macos]
+environments: [kanban, terminal, tmux]
+metadata:
+  hermes:
+    tags: [agent, tmux, claude, antigravity, agy, monitor, kanban, observation, reconciliation]
+    related_skills: [multi-agent-create, multi-agent-resume, multi-agent-delete, kanban-orchestrator]
+    prereq_skills: [kanban-worker, multi-agent-create]
+---
+
+# Agent Sessions Monitor: Live Reconciliation via Kanban Worker
+
+> **Companion skills**: `multi-agent-create` / `multi-agent-resume` / `multi-agent-delete` (mutators); this skill is the **observer**.
+> **Single source of truth**: `~/PuKi/lab/agent_sessions/agent-sessions.yaml`.
+
+## What this skill does
+
+Dispatch a **Kanban worker** (in `goal_mode`) that:
+
+1. Every ~30s polls the actual state of:
+   - `tmux ls` (which sessions are alive)
+   - `tmux list-panes -t ...` (pane cmd, cwd, pid)
+   - `~/.claude/projects//*.jsonl` mtime + first-line sessionId
+   - `~/.gemini/antigravity-cli/cache/last_conversations.json` (agy workspace → conversation mapping)
+   - `~/.gemini/antigravity-cli/conversations/.db` mtime (agy)
+2. Compares the live state to `agent-sessions.yaml`
+3. Sorts each session into one of 4 cases (the first is consistent and needs no action; the other three are drift):
+   - **yaml-only terminated**: tmux dead, YAML says `terminated` → OK
+   - **yaml-only running, tmux dead**: YAML says `running`, tmux is gone → mark `terminated` with timestamp
+   - **tmux-only running, not in YAML**: tmux session exists with `-creator-*` naming but YAML doesn't know about it → register as a new entry
+   - **stale UUID**: YAML has a UUID, but the on-disk artifact is gone → flag in comment
+4. Writes a `kanban_comment` with diff details on every new drift event
+5. Sends a heartbeat every 5 minutes
+6. **Goal loop**: after each turn, the judge (an auxiliary model) re-checks the card against the body to decide whether monitoring is still wanted. When the user says "stop monitoring" in a comment, the worker blocks with `reason=stop-requested`.
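The four cases in step 3 amount to a small decision table. The following is a minimal sketch of that table, not the logic of the real `reconcile.sh`: the function name `classify_drift`, its three arguments, and the strings it prints are all hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: map three observations about one session to an action.
# Argument names and output strings are illustrative assumptions; they are
# not the interface of the real reconcile.sh.
#   $1 yaml_status: running | terminated | absent (no YAML entry)
#   $2 tmux_alive:  yes | no
#   $3 artifact:    present | missing | none (YAML records no UUID)
classify_drift() {
  local yaml_status=$1 tmux_alive=$2 artifact=$3

  if [[ $yaml_status == running && $tmux_alive == no ]]; then
    echo "mark-terminated"   # YAML says running, tmux is gone
  elif [[ $yaml_status == absent && $tmux_alive == yes ]]; then
    echo "register"          # tmux session exists but the YAML has no entry
  elif [[ $artifact == missing ]]; then
    echo "flag-stale-uuid"   # UUID recorded, on-disk artifact gone
  else
    echo "ok"                # e.g. YAML says terminated and tmux is dead
  fi
}
```

For example, `classify_drift running no present` selects the auto-terminate path, while `classify_drift terminated no none` reports the consistent baseline. The branches are checked in order, so a session that is both dead in tmux and missing its artifact is reported only as `mark-terminated` in this sketch.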
+
+## When to use
+
+- You have multiple workspaces with tmux agent sessions and want a single source of truth
+- You suspect YAML drift after a host reboot / crash
+- You want a notification as soon as a new session id is created (so you can record it before the next restart)
+- You're running multi-day work and want to know "what's actually running right now"
+
+## When NOT to use
+
+- One-off interactive session: just check `tmux ls` and read the YAML
+- A single, short session: overhead > benefit
+- You don't have a Kanban dispatcher running
+
+## Dispatching the monitor
+
+```bash
+# Goal-mode task: keeps running until the user signals stop
+hermes kanban create \
+  --title "agent-sessions monitor (live reconcile)" \
+  --assignee default \
+  --workspace worktree \
+  --branch wt/agent-sessions-monitor \
+  --goal \
+  --goal-max-turns 100 \
+  --max-runtime 8h \
+  --max-retries 1 \
+  --skill agent-sessions-monitor \
+  --body "$(cat <<'EOF'
+You are the agent-sessions monitor. Every 30 seconds, do:
+
+1. Read ~/PuKi/lab/agent_sessions/agent-sessions.yaml
+2. Run `tmux ls` and `tmux list-panes -a -F 'session=#{session_name} pid=#{pane_pid} cmd=#{pane_current_command} cwd=#{pane_current_path}'` (`-a` lists panes across all sessions, not just the current one)
+3. For each session in the YAML, check the corresponding tmux state
+4. For each tmux session matching `*-creator-claude` or `*-creator-agy` that's not in the YAML, register it
+5. For any drift, call `kanban_comment` with the diff
+6. Sleep 30 seconds, then repeat
+
+If the user comments `stop` or `stop monitoring` on this card, call `kanban_block(reason="stop-requested by user")`.
+
+If you find that a Claude session's `claude_session_id_own` is null but there's a new *.jsonl in the project dir, read the sessionId from the first line and update the YAML.
+
+Use the helper script at ~/PuKi/lab/agent_sessions/skills/agent-sessions-monitor/scripts/reconcile.sh for the YAML updates. It handles all the merge logic and emits a JSON diff on stdout; turn each entry in `drifts[]` into a comment on this card.
+EOF
+)"
+```
+
+## Helper script: `reconcile.sh`
+
+The worker calls this script every 30s. It:
+
+1. Diffs YAML ↔ tmux ↔ disk artifacts
+2. Updates the YAML only when something actually changed, not on every poll, to avoid spamming
+3. Emits a JSON diff to stdout that the worker turns into a `kanban_comment`
+
+```bash
+# Reconcile + auto-update YAML (atomic, flock-guarded). Emits JSON drift to stdout.
+bash ~/PuKi/lab/agent_sessions/skills/agent-sessions-monitor/scripts/reconcile.sh --once --emit-diff
+
+# Read-only: compute drift WITHOUT writing the YAML (use for "what's running?" checks).
+bash ~/PuKi/lab/agent_sessions/skills/agent-sessions-monitor/scripts/reconcile.sh --once --emit-diff --dry-run
+```
+
+Flags: `--once` (single pass), `--emit-diff` (print JSON), `--dry-run` (P1-E: no
+mutation). There are **no** `--workspace` / `--agent` / `--comment-card` flags; the
+worker turns the emitted JSON `drifts[]` into `kanban_comment` calls itself.
+
+## Drift classes (what the script handles)
+
+### A. tmux dead, YAML says running → auto-terminate
+
+```
+YAML: status=running, pane.pid=201132, cmd=claude
+tmux: no session
+  → set status=terminated, terminated_at=, termination_mode=auto-detected
+  → comment: "lab-landing-page-creator-claude: tmux gone (was pane 201132, cmd claude). Marked terminated."
+```
+
+### B. tmux alive, not in YAML → auto-register
+
+```
+tmux: session=lab-paper-pdf2md-creator-agy, pid=...,
+      cmd=agy, cwd=/home/godopu16/PuKi/lab/paper-pdf2md
+YAML: no such session
+  → register as new entry: status=running, last_visible_status=auto-registered
+  → comment: "lab-paper-pdf2md-creator-agy: tmux found but not in YAML. Auto-registered."
+```
+
+### C. New session id materializes (first Claude message sent)
+
+```
+YAML: claude_session_id_own=null (placeholder)
+disk: ~/.claude/projects/.../b3a7...c2f.jsonl exists, mtime=now,
+      first line sessionId=b3a7...c2f
+  → update claude_session_id_own=b3a7...c2f
+  → comment: "lab-landing-page-creator-claude: session id materialized b3a7...c2f"
+```
+
+### D. Stale UUID (artifact gone)
+
+```
+YAML: agent_identities.claude.session_id=87dc548e-...
+disk: ~/.claude/projects/.../87dc548e-...jsonl: missing
+  → flag in comment, but DO NOT delete from YAML
+    (the user may have moved the file or the disk may be temporarily unavailable;
+     only `--purge-conversation` should remove the id)
+```
+
+## Pitfalls
+
+- **Don't run the monitor without `--goal`**: without goal mode, the worker spawns for a single turn, runs one reconcile, and completes. Goal mode keeps the worker alive across many turns.
+- **The 30s poll is a default**: workers may override it if they detect heavy churn. A workspace with 5+ agent sessions should bump to 60s to avoid noise.
+- **`kanban_comment` rate limits**: Kanban may throttle if you comment too fast. Coalesce: only comment when the diff is *new* (not the same drift on every poll). The script tracks a state file at `~/.cache/agent-sessions-monitor/.state` for this.
+- **Don't fight the user's explicit action**: if `multi-agent-delete` is mid-flight and the monitor sees the same session in two states within 5s, prefer the user's most recent action. The monitor should not auto-revert a fresh `terminated` to `running` because of a stale `tmux has-session` check.
+- **The monitor should never modify the conversation artifacts** (jsonl, db), only the YAML. If you see a stale UUID, comment about it but don't delete the file.
+- **TUI capture-pane is expensive**: only capture when you need to update `last_visible_status`, not on every poll.
+
+## Worker body template (for `hermes kanban create --body`)
+
+The `--body` of the dispatched task IS the worker's behavior spec.
+Here's a tested template:
+
+```markdown
+# agent-sessions monitor
+
+## Loop (every 30s)
+
+1. Read agent-sessions.yaml
+2. Bash: `bash ~/PuKi/lab/agent_sessions/skills/agent-sessions-monitor/scripts/reconcile.sh --once --emit-diff`
+3. Parse the JSON diff from stdout
+4. If `drifts` is non-empty:
+   - For each drift, call `kanban_comment` with the diff message
+5. Bash: `sleep 30`
+6. Heartbeat every 5 min: `kanban_heartbeat(progress="alive, N drifts detected, last at