
63 Commits

Author SHA1 Message Date
Godopu 0de0f236b2 feat(tmux-agent-orchestrate-delete): add --capture-id, --reason, --graceful options
Implements user choice Option A: extend delete instead of adding a 6th 'stop' skill.

Changes:
- skills/lib.sh:
  - capture_conversation_id() — thin wrapper over find_workspace_uuid (race-free)
  - is_already_stopped() — idempotency check
  - _validate(): add 'stopped' to the valid status set (required for the new
    transition; without it atomic_dump_yaml silently rejected the write)
- skills/tmux-agent-orchestrate-delete/scripts/delete_session.sh:
  - --capture-id: records claude_session_id_own / agy_conversation_id_own +
    resumable:true to the row before kill (guarantees tier-1 resume)
  - --reason <reason>: records stop_reason (default manual_stop)
  - --graceful: send-keys exit -> 3s -> kill-session(SIGTERM) -> 5s -> SIGKILL
  - STOP mode (any of the three) transitions running -> stopped (vs terminated)
  - Idempotency: already-stopped session prints message + exit 0
  - No options -> identical legacy behaviour (hard->terminated, soft->archived)
- skills/tmux-agent-orchestrate-delete/SKILL.md: documented options + state machine
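
The --graceful chain above (send-keys exit, wait, kill-session, wait, SIGKILL) might look roughly like the sketch below. This is not the code in delete_session.sh: the function name graceful_stop, its arguments, and the idea of recording pane PIDs up front are assumptions made for illustration. Only standard tmux subcommands (list-panes, send-keys, has-session, kill-session) are used.

```shell
#!/usr/bin/env bash
# Sketch only: graceful_stop and its arguments are hypothetical, not delete_session.sh itself.

# graceful_stop <session> [exit_wait_secs] [kill_wait_secs]
graceful_stop() {
  local session="$1" exit_wait="${2:-3}" kill_wait="${3:-5}" pids pid

  # Record pane PIDs first; they cannot be listed once the session is gone.
  pids="$(tmux list-panes -s -t "$session" -F '#{pane_pid}' 2>/dev/null)"

  # Step 1: ask the program in the pane to exit on its own.
  tmux send-keys -t "$session" "exit" Enter
  sleep "$exit_wait"
  tmux has-session -t "$session" 2>/dev/null || return 0

  # Step 2: kill the session (the commit message labels this step SIGTERM).
  tmux kill-session -t "$session"
  sleep "$kill_wait"

  # Step 3: SIGKILL any recorded pane process that is still alive.
  for pid in $pids; do
    kill -0 "$pid" 2>/dev/null && kill -KILL "$pid" 2>/dev/null
  done
  return 0
}
```

Usage would be something like `graceful_stop my-session`, with the 3s and 5s waits from the commit message as defaults. Returning early after step 1 keeps the common case (the agent exits cleanly) free of any kill.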

5-route surface preserved (no new directory). Other 5 routes unchanged.

Known follow-up (out of scope, monitor edits forbidden this round): monitor
reconcile drift-A treats a tmux-dead 'stopped' row as drift and would re-mark it
'terminated' (skip-set is only terminated/archived). status.sh shows DRIFT=A for
stopped rows. Needs a Phase-2 wiring change to add 'stopped' to the skip-set.
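
To make the follow-up concrete, here is a minimal sketch of a drift-A check with a skip-set. It is not the monitor's reconcile code: should_mark_drift, its arguments, and the space-separated skip-set string are all assumptions. It only illustrates why a tmux-dead 'stopped' row is flagged today and how adding 'stopped' to the skip-set would change that.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the drift check; names and argument shapes are assumptions.

# should_mark_drift <status> <tmux_alive:0|1> [skip_set]
# Returns 0 when the row would be treated as drift-A, 1 otherwise.
should_mark_drift() {
  local status="$1" tmux_alive="$2" skip_set="${3:-terminated archived}" s
  # A live tmux session is never drift-A.
  [ "$tmux_alive" = "1" ] && return 1
  for s in $skip_set; do
    # Statuses in the skip-set are expected to have no tmux session.
    [ "$s" = "$status" ] && return 1
  done
  return 0
}

# Current skip-set: a dead 'stopped' row counts as drift.
should_mark_drift stopped 0 && echo "stopped row: drift (current skip-set)"
# Proposed skip-set: the same row is left alone.
should_mark_drift stopped 0 "terminated archived stopped" || echo "stopped row: skipped (proposed skip-set)"
```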

Verified on isolated server -L claude-stop-impl-test (kill-server after):
- syntax PASS; E2E: capture-id, idempotency (exit 0), graceful fallback chain,
  backward-compat (terminated), status renders stopped. Real YAML + main canary untouched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 15:19:09 +00:00
Godopu a876b70428 docs: write deep messaging system report in REPORT.md 2026-06-20 14:50:26 +00:00
Godopu 0cb8d058cb feat(env): add .env.example template + scripts/generate-env.sh
.env.example: committable template (all 13 skill env vars commented with
defaults; secrets use replace_me, no plaintext). .gitignore already carves
it out via !.env.example.

scripts/generate-env.sh: creates .env from .env.example if absent, no-ops
if present, --force overwrites with a .env.bak backup. Placed under a new
top-level scripts/ dir so it is committable without touching skills/*.
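
The create / no-op / --force behaviour described above could be sketched as follows. This is an illustration under assumptions, not the contents of scripts/generate-env.sh: the function name generate_env, the directory argument, the messages, the exit codes, and the chmod 600 step are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch under assumptions: function name, messages and exit codes are hypothetical.

# generate_env <dir> [--force]
generate_env() {
  local dir="$1" flag="${2:-}"
  local template="$dir/.env.example" target="$dir/.env"

  # Reject anything other than no flag or --force.
  case "$flag" in
    ""|--force) ;;
    *) echo "unknown argument: $flag" >&2; return 2 ;;
  esac

  [ -f "$template" ] || { echo "missing $template" >&2; return 1; }

  if [ -f "$target" ]; then
    if [ "$flag" != "--force" ]; then
      echo ".env already exists; nothing to do"
      return 0
    fi
    # --force: keep the previous file as .env.bak before overwriting.
    cp -p "$target" "$target.bak"
  fi

  cp "$template" "$target"
  # Assumption: restrict permissions, since .env may hold secrets.
  chmod 600 "$target"
  echo "wrote $target"
}
```

Calling `generate_env .` twice in a row would create .env the first time and do nothing the second; `generate_env . --force` would back up the existing file to .env.bak and rewrite it from the template.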

Verified on -L claude-env2-test (create/no-op/force/bad-arg paths).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 14:46:38 +00:00
Godopu 61ba8aae1d chore(.gitignore): exclude .env + .env.* with .env.example carve-out
Secures the workspace against accidental commits of:
- .env (skill env overrides, may contain secrets)
- .env.* (any env variant)
- except .env.example (committable template, can be added later)

Includes Korean comment noting secrets policy.
2026-06-20 14:43:22 +00:00
Godopu cd9eec112d refactor(skills): make skills portable across users/locations via workspace-relative paths + env var overrides
Changes:
- skills/lib.sh:
  - HOME_DIR default changed from $HOME to <workspace_root> (workspace self-sufficient)
  - Added CLAUDE_PROJECT_DIR / LOCAL_BIN env var pattern (default $HOME, overridable)
- skills/tmux-agent-orchestrate-monitor/scripts/reconcile.sh:
  - STATE_DIR moved from $HOME/.cache/... to <workspace>/.cache/tmux-agent-orchestrate-monitor
- skills/tmux-agent-orchestrate-create/scripts/create_session.sh:
  - WRAPPER uses $LOCAL_BIN env var (default $HOME/.local/bin)
- 6 SKILL.md: examples and explanations updated to mention env var override capability

User/portability contract:
- Workspace-internal data: .hermes/ + .cache/ (moves with workspace)
- User/system data: $HOME/* (overridable via CLAUDE_PROJECT_DIR, LOCAL_BIN)
- All env vars follow: ${VAR:-default} pattern with documented defaults
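
As a rough illustration of the ${VAR:-default} contract, the sketch below resolves the three paths mentioned in this commit. The function resolve_defaults and its workspace_root argument are hypothetical, and whether HOME_DIR and STATE_DIR are themselves overridable in the real scripts is an assumption; only the default values come from the commit message.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_defaults is a hypothetical helper, not part of skills/lib.sh.

# resolve_defaults <workspace_root>: print the effective paths, one per line.
resolve_defaults() {
  local workspace_root="$1"
  # Workspace-internal data defaults to the workspace itself.
  local home_dir="${HOME_DIR:-$workspace_root}"
  local state_dir="${STATE_DIR:-$workspace_root/.cache/tmux-agent-orchestrate-monitor}"
  # User/system data defaults to $HOME but can be overridden.
  local local_bin="${LOCAL_BIN:-$HOME/.local/bin}"
  printf 'HOME_DIR=%s\nLOCAL_BIN=%s\nSTATE_DIR=%s\n' "$home_dir" "$local_bin" "$state_dir"
}
```

For example, `LOCAL_BIN=/opt/bin resolve_defaults /path/to/workspace` would report /opt/bin for LOCAL_BIN while the other two values stay workspace-relative.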

Verified on isolated server -L agy-homeport-test (kill-server after):
- syntax check PASS
- E2E: defaults resolve to workspace-relative paths
- E2E: env var override correctly changes paths
- 0 leftover direct $HOME references in code
- Global skill non-interference verified
- Main isolated server -L multi-agent-canary untouched
2026-06-20 05:39:27 +00:00
Godopu ad7be264e7 refactor(skills): convert absolute paths to workspace-relative + relocate agent-sessions.yaml to .hermes/
Changes:
- skills/lib.sh: AGENT_SESSIONS_YAML default moved from ~/PuKi/lab/.../agent-sessions.yaml
  to <workspace_root>/.hermes/agent-sessions.yaml (relative via BASH_SOURCE)
- 6 SKILL.md: descriptions + 'Single source of truth' lines updated to .hermes/agent-sessions.yaml
- 6 SKILL.md: bash examples (~/PuKi/lab/agent_sessions/skills/...) → relative paths
- SKILL.md file:// links converted from absolute to relative (resolves workspace tool warnings)
- tmux-agent-orchestrate-create/SKILL.md: removed outdated wrapper template reference
- lib.sh internal comments: removed /home/godopu16/PuKi/lab example
- All scripts: internal source/path references use relative resolution
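
A minimal sketch of BASH_SOURCE-relative resolution, assuming lib.sh sits one level below the workspace root (skills/lib.sh). The helper names workspace_root_from and sessions_yaml_from are hypothetical, and treating AGENT_SESSIONS_YAML as an overridable env var is an assumption; the .hermes/agent-sessions.yaml location is from the commit message.

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical illustrations, not the code in lib.sh.

# workspace_root_from <path-to-lib.sh>
# Assumes the file lives in <workspace_root>/skills/, so the root is its parent's parent.
workspace_root_from() {
  local lib_dir
  lib_dir="$(cd "$(dirname "$1")" && pwd)" || return 1
  dirname "$lib_dir"
}

# sessions_yaml_from <path-to-lib.sh>
# Inside lib.sh itself the argument would be "${BASH_SOURCE[0]}".
sessions_yaml_from() {
  echo "${AGENT_SESSIONS_YAML:-$(workspace_root_from "$1")/.hermes/agent-sessions.yaml}"
}
```

Because the root is derived from the script's own location, the workspace can be moved or cloned elsewhere without editing any path.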

Verified on isolated server -L agy-relative-path-test (kill-server after):
- syntax check PASS
- E2E: create_session.sh auto-creates .hermes/agent-sessions.yaml at new location
- status.sh reads new location correctly
- 0 leftover absolute path references
- Global skill non-interference verified
- Main isolated server -L multi-agent-canary untouched
2026-06-19 23:41:05 +00:00
Godopu e8eebe5eb1 feat(tmux-agent-orchestrate-monitor): integrate watchdog pattern as skill
Moved /tmp/subscriber-watchdog.sh → skills/tmux-agent-orchestrate-monitor/scripts/watchdog.sh
(skill-managed lifecycle, no longer lives outside workspace).

Added lib.sh::start_watchdog() helper:
- Spawns watchdog as background nohup process
- Writes watchdog log to .hermes/jobs/<JID>.watchdog.log
- Returns watchdog PID via stdout
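
The three bullet points above can be sketched as a small helper. This is not lib.sh::start_watchdog itself: the name start_watchdog_sketch and its argument order are assumptions, and the real helper presumably derives the log path from the job ID rather than taking it as a parameter.

```shell
#!/usr/bin/env bash
# Sketch only: a hypothetical stand-in for a nohup-based background launcher.

# start_watchdog_sketch <log_file> <command> [args...]
# Starts the command in the background, appends its output to the log,
# and prints the background PID on stdout.
start_watchdog_sketch() {
  local log_file="$1"
  shift
  mkdir -p "$(dirname "$log_file")"
  # Redirect all three streams so a caller using $(...) is not blocked
  # waiting for the background process to close stdout.
  nohup "$@" >>"$log_file" 2>&1 </dev/null &
  echo "$!"
}
```

A caller could capture the PID with something like `pid="$(start_watchdog_sketch .hermes/jobs/JID.watchdog.log ./watchdog.sh)"`, where JID and the script path are placeholders.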

Wired create_session.sh --submit-job to auto-start watchdog after JOB registration.

Fixes:
- Bug: registry.py get first-line parse was fragile (empty status → infinite loop)
  → Now uses python3 json.load for robust parsing
- Bug: old path skills/delegate-job/scripts/job_subscriber.py hardcoded
  → Now uses skills/tmux-agent-orchestrate-delegate-job/scripts/job_subscriber.py

Verified on isolated server -L agy-watchdog-skill-test (kill-server after):
- Syntax check PASS
- E2E: register job → start watchdog → publish completed → watchdog exits
- Global skill non-interference verified
- Main isolated server -L multi-agent-canary untouched
2026-06-19 23:33:46 +00:00
Godopu e9fc763d31 refactor(skills): rename multi-agent-* + agent-sessions-monitor + delegate-job to tmux-agent-orchestrate-*
Renamed 6 skill directories to tmux-agent-orchestrate-* prefix:
- multi-agent-create → tmux-agent-orchestrate-create
- multi-agent-resume → tmux-agent-orchestrate-resume
- multi-agent-delete → tmux-agent-orchestrate-delete
- multi-agent-status → tmux-agent-orchestrate-status
- agent-sessions-monitor → tmux-agent-orchestrate-monitor
- delegate-job → tmux-agent-orchestrate-delegate-job

Updated:
- skills/lib.sh internal paths (delegate_submit_job etc.)
- skills/tmux-agent-orchestrate-status/scripts/status.sh (monitor path)
- skills/tmux-agent-orchestrate-monitor/scripts/reconcile.sh
- .gitignore (HTML ignore patterns)
- 6 SKILL.md frontmatter (name, related_skills, prereq_skills) and body
- All script headers and Korean comments

Notes:
- tmux session naming convention unchanged (<slug>-creator-<agent>) — workspace identifier based, kept for backward compatibility
- Existing 2 sessions in -L multi-agent-canary untouched
- YAML delegate_job_id / agent-session (tmux:canary-...) preserved for log history compatibility

Verified on isolated server -L agy-rename-test (kill-server after).
2026-06-19 23:27:27 +00:00
Godopu 4fa276f3c5 chore(.gitignore): generalize test-sessions patterns for variants (phase4 suffix) 2026-06-19 15:21:16 +00:00
Godopu 06f076e9cc fix(skills): claude review items 4-7 (subscribe timeout, atomic_dump_yaml, hardcoded paths, lifecycle helper)
Item 4: --subscribe gains --timeout/--idle-timeout (idle default raised
        120s->600s, 0=disable); connect-error AND non-zero CONNACK now fall
        back to a polling loop. SKILL.md matches actual behaviour.
Item 5: --subscribe terminal-event YAML writes routed through
        lib.sh::atomic_dump_yaml (flock + schema-validate + .bak).
Item 6: removed hardcoded /home/godopu16/PuKi fallbacks in lib.sh,
        status.sh (x2) and reconcile.sh; paths now BASH_SOURCE-relative.
Item 7: lib.sh::delegate_publish_event helper consolidates the 4 duplicated
        lifecycle publish blocks; delete cwd|jid parser replaced with JSON.
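
Item 5's "flock + schema-validate + .bak" write path might be sketched like this. It is not lib.sh::atomic_dump_yaml: the name atomic_write_sketch, the validator-as-argument design, the lock file name, and the exit codes are assumptions, and it relies on flock(1) from util-linux being installed.

```shell
#!/usr/bin/env bash
# Sketch only: a hypothetical locked, validated, backed-up file write.

# atomic_write_sketch <target> <validator-cmd>
# New content is read from stdin; the validator is called with the temp file path.
atomic_write_sketch() {
  local target="$1" validator="$2" tmp
  # Temp file in the same directory, so the final mv is a rename on one filesystem.
  tmp="$(mktemp "$target.tmp.XXXXXX")" || return 1
  cat >"$tmp"
  (
    # Serialize writers on a sidecar lock file.
    flock -x 9 || exit 1
    # Reject invalid content before touching the real file.
    if ! "$validator" "$tmp"; then
      rm -f "$tmp"
      exit 2
    fi
    # Keep the previous version as .bak, then swap the new file in.
    if [ -f "$target" ]; then
      cp -p "$target" "$target.bak"
    fi
    mv -f "$tmp" "$target"
  ) 9>"$target.lock"
}
```

Returning a distinct non-zero code on validation failure lets the caller report the rejection instead of losing the write silently.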

Also: subscribe loop runs under the project venv python (paho) and delegates
all YAML work to atomic_dump_yaml on system python3 (PyYAML), since neither
interpreter has both modules — the original env_python path could never import
paho. Items 3 + 8 out of scope (per user). Verified on -L claude-phase4-test
(kill-server after).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 15:11:09 +00:00
Godopu 0eb1d94a9c refactor(skills): cleanup dead code + full workflow A→B→C→D integration
Cleanup:
- Remove unused validate_yaml() helper from lib.sh
- Remove USER_MANUAL.html + mqtt-broker-setup.html (no refs found)

Workflow A (create_session ↔ delegate-job):
- Add --submit-job <prompt> option to create_session.sh
- Auto-register session in delegate-job registry, store delegate_job_id in YAML

Workflow B (push-based monitor):
- Migrate reconcile.sh to MQTT subscriber mode (polling fallback preserved)

Workflow C (unified status):
- status.sh now shows session + delegate-job state in single column

Workflow D (audit log + perms):
- JSON job files chmod 600
- create/delete/resume now publish lifecycle events to delegate-job
2026-06-19 14:27:29 +00:00
Godopu 97f649a3e1 feat(skills): integrate delegate-job skill (squashed from delegate-job-skill)
- Copy delegate-job-skill/skills/delegate-job/ → skills/delegate-job/
- Move requirements.txt (paho-mqtt>=2.0.0) into the new location
- Refactor outdated hardcoded paths (~/PuKi/lab/, ~/.hermes/skills/) to dynamic resolution
- Add MQTT connection timeout / retry hardening
- Remove legacy delegate-job-skill/ directory
- Update .gitignore

Note: delegate-job-skill git history is squashed — preserved content, dropped commit lineage.
2026-06-19 14:00:29 +00:00
Godopu 8a3abff2d6 initial: canary multi-agent skills with tmux isolation support
- lib.sh: TMUX_SERVER_NAME env var, _tmux helper, shim externalized
  to TMPDIR with recursive guard, resolve_tmux_server helper for
  YAML-driven server routing
- multi-agent-create: --tmux-server opt-in flag, YAML tmux_server
  field for orphan prevention
- multi-agent-delete/resume/status/agent-sessions-monitor: use
  resolve_tmux_server to auto-route to correct isolated server
- SKILL.md × 4: documented isolation server workflow
- Verified by claude review (R1+re-run) + agy R2 patches
  (orphan prevention + shim location fix)
2026-06-19 13:32:36 +00:00