Commit Graph

14 Commits

Author SHA1 Message Date
Godopu e1d998e1ef feat: add support for hermes agent in tmux orchestration scripts 2026-06-21 14:21:30 +00:00
Godopu 8097df0cbe feat(lib): SQLite DB normalization (FW-L3) & stop semantics simplification (FW-L2) 2026-06-21 09:05:52 +00:00
Godopu 478be56679 fix(lib): hardening and edge-case bugfixes (FW-12, FW-16 round)
- Restored .bak generation to maintain P0-B backup invariants
- Fixed the stale NFS warning message to reflect the SQLite DELETE journal-mode fallback
- Replaced vulnerable yaml.replace with os.path.splitext globally
- Ensured YAML dump occurs after conn.commit() to prevent partial syncs
- Re-applied chmod 0600 on SQLite -wal and -shm files
2026-06-21 08:43:06 +00:00
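The ordering and file-handling rules in this commit can be sketched in Python as below. This is a minimal illustration, not the lib.sh code: the helper names (sibling_db_path, secure_sqlite_files, export_snapshot, update_status) and the single-table schema are hypothetical, and json stands in for PyYAML so the sketch needs only the standard library.

```python
import json
import os
import shutil
import sqlite3
import tempfile


def sibling_db_path(yaml_path: str) -> str:
    # splitext strips only the final extension; yaml_path.replace(".yaml", ".db")
    # would also rewrite ".yaml" inside a directory name such as "x.yaml-data/".
    root, _ext = os.path.splitext(yaml_path)
    return root + ".db"


def secure_sqlite_files(db_path: str) -> list:
    # SQLite may (re)create the -wal/-shm sidecars, so re-apply 0600 to every
    # file of the database that currently exists.
    fixed = []
    for path in (db_path, db_path + "-wal", db_path + "-shm"):
        if os.path.exists(path):
            os.chmod(path, 0o600)
            fixed.append(path)
    return fixed


def export_snapshot(conn: sqlite3.Connection, yaml_path: str) -> None:
    # Called only after conn.commit(), so the export never shows rows from a
    # transaction that could still roll back.
    rows = conn.execute("SELECT name, status FROM sessions ORDER BY name").fetchall()
    data = {"sessions": [{"name": n, "status": s} for n, s in rows]}
    if os.path.exists(yaml_path):
        shutil.copy2(yaml_path, yaml_path + ".bak")  # keep the previous export
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(yaml_path)))
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, yaml_path)  # atomic rename within one POSIX filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def update_status(yaml_path: str, name: str, status: str) -> None:
    db = sibling_db_path(yaml_path)
    conn = sqlite3.connect(db)
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("CREATE TABLE IF NOT EXISTS sessions "
                     "(name TEXT PRIMARY KEY, status TEXT NOT NULL)")
        conn.execute("INSERT OR REPLACE INTO sessions VALUES (?, ?)", (name, status))
        conn.commit()                     # 1. make the change durable in SQLite
        export_snapshot(conn, yaml_path)  # 2. only then refresh the YAML view
        secure_sqlite_files(db)           # 3. tighten permissions on all files
    finally:
        conn.close()
```

The point of the ordering is that a crash between steps 1 and 2 leaves a stale but consistent export, never one that shows uncommitted rows.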
Godopu 9b797a5c8c feat(lib): migrate to SQLite WAL backend for robust concurrency (FW-L1)
- Replaces Python fcntl.flock locking with SQLite BEGIN IMMEDIATE transactions.
- Status/reconcile now read from the SQLite SSOT (single source of truth), with YAML as a fallback.
- Explicitly documents the tradeoff: YAML is no longer a real-time view.
- Runs PRAGMA wal_checkpoint(TRUNCATE) only outside transactions, where it is safe.
2026-06-21 08:35:07 +00:00
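A minimal sketch of this locking model using Python's sqlite3 module, assuming a hypothetical sessions table: BEGIN IMMEDIATE acquires the write lock at the start of the transaction (the role flock used to play), and the checkpoint helper refuses to run while a transaction is open. All function names are illustrative.

```python
import sqlite3


def open_store(path: str) -> sqlite3.Connection:
    # isolation_level=None disables the module's implicit BEGIN, so the only
    # transaction boundaries are the statements issued below.
    conn = sqlite3.connect(path, timeout=5.0, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE IF NOT EXISTS sessions "
                 "(name TEXT PRIMARY KEY, status TEXT NOT NULL)")
    return conn


def set_status(conn: sqlite3.Connection, name: str, status: str) -> None:
    # BEGIN IMMEDIATE takes the write lock up front, so concurrent writers
    # queue here (waiting up to `timeout`) instead of racing each other.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute("INSERT OR REPLACE INTO sessions (name, status) VALUES (?, ?)",
                     (name, status))
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


def read_status(conn: sqlite3.Connection, name: str):
    row = conn.execute("SELECT status FROM sessions WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None


def checkpoint(conn: sqlite3.Connection) -> bool:
    # Only checkpoint when this connection has no open transaction.
    if conn.in_transaction:
        raise RuntimeError("wal_checkpoint must run outside a transaction")
    busy, _wal_frames, _checkpointed = conn.execute(
        "PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
    return busy == 0  # 0 means the checkpoint ran to completion
```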
Godopu f1a98be8de fix(lib.sh): add NFS flock warning (FW-02) + unify venv deps with pyyaml (FW-11)
FW-02: atomic_dump_yaml now calls _atomic_dump_yaml_check_nfs(), which
  detects NFS/CIFS/SSHFS mounts and warns that flock is unreliable on them.
  The long-term fix (SQLite WAL) is documented in FUTURE_WORKS.md.

FW-11: pyyaml added to requirements.txt and installed in .venv, so
  both paho-mqtt and yaml are available in a single interpreter.
  Eliminates the system-python3-vs-venv split for monitor --subscribe.
2026-06-21 06:39:12 +00:00
Godopu 4cea11438a refactor(lib.sh): extract hardcoded tmux shim paths to constants (FW-07) + cache _delegate_py_bin result (FW-08)
FW-07: _resolve_real_tmux_path and _init_tmux_isolation now use
  _TMUX_SHIM_DIR_PATTERN and _TMUX_SKILLS_BIN_PATTERN env-overridable
  constants instead of hardcoded path strings. All 4 reference sites
  updated (lines 32, 37, 57, 76). Default values preserve original
  slash semantics (/multi-agent-tmux-shim/, /skills/.bin).

FW-08: _delegate_py_bin caches its result in the AGENT_PYTHON_BIN shell
  variable (not exported, to avoid cross-workspace pollution).
  The fallback uses command -v python3 so that an absolute path is cached.

Reviewed by agy-existing (FAIL->fixed) and claude-existing (FAIL->fixed).
Both reviewers flagged the slash omission, the incomplete extraction at :57/:76,
and the export side effects. All issues are resolved.
2026-06-21 06:24:31 +00:00
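A hedged Python sketch of what the FW-07 constants are for: when resolving the real tmux binary, PATH entries that match the shim patterns are skipped so the shim never resolves to itself. The in-process cache echoes the FW-08 idea of an unexported cache variable, applied here to the tmux lookup for brevity. resolve_real_tmux is a hypothetical name; only the two environment variable names and their defaults come from the commit.

```python
import os
from typing import Optional

# Env-overridable patterns; an unset or empty variable falls back to the
# default, and the defaults keep their slashes so whole components match.
SHIM_DIR_PATTERN = os.environ.get("_TMUX_SHIM_DIR_PATTERN") or "/multi-agent-tmux-shim/"
SKILLS_BIN_PATTERN = os.environ.get("_TMUX_SKILLS_BIN_PATTERN") or "/skills/.bin"

_cache = {}  # process-local only, like an unexported shell variable


def resolve_real_tmux(path_env: Optional[str] = None,
                      patterns=(SHIM_DIR_PATTERN, SKILLS_BIN_PATTERN)) -> Optional[str]:
    path_env = os.environ.get("PATH", "") if path_env is None else path_env
    key = (path_env, tuple(patterns))
    if key in _cache:
        return _cache[key]
    result = None
    for entry in path_env.split(os.pathsep):
        if not entry:
            continue
        # Append a slash so ".../multi-agent-tmux-shim" matches the
        # slash-delimited default pattern.
        probe = entry.rstrip("/") + "/"
        if any(p in probe for p in patterns):
            continue  # skip shim directories (recursion guard)
        candidate = os.path.join(entry, "tmux")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            result = candidate
            break
    _cache[key] = result
    return result
```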
Godopu 0de0f236b2 feat(tmux-agent-orchestrate-delete): add --capture-id, --reason, --graceful options
Implements user choice Option A: extend delete instead of adding a 6th 'stop' skill.

Changes:
- skills/lib.sh:
  - capture_conversation_id() — thin wrapper over find_workspace_uuid (race-free)
  - is_already_stopped() — idempotency check
  - _validate(): add 'stopped' to the valid status set (required for the new
    transition; without it atomic_dump_yaml would silently reject the write)
- skills/tmux-agent-orchestrate-delete/scripts/delete_session.sh:
  - --capture-id: records claude_session_id_own / agy_conversation_id_own +
    resumable:true to the row before kill (guarantees tier-1 resume)
  - --reason <reason>: records stop_reason (default manual_stop)
  - --graceful: send-keys exit -> 3s -> kill-session(SIGTERM) -> 5s -> SIGKILL
  - STOP mode (any of the three) transitions running -> stopped (vs terminated)
  - Idempotency: already-stopped session prints message + exit 0
  - No options -> identical legacy behaviour (hard->terminated, soft->archived)
- skills/tmux-agent-orchestrate-delete/SKILL.md: documented options + state machine

5-route surface preserved (no new directory). Other 5 routes unchanged.

Known follow-up (out of scope; monitor edits were forbidden this round): the monitor's
reconcile drift-A check treats a tmux-dead 'stopped' row as drift and would re-mark it
'terminated' (the skip-set contains only terminated/archived). status.sh shows DRIFT=A for
stopped rows. A Phase-2 wiring change is needed to add 'stopped' to the skip-set.

Verified on isolated server -L claude-stop-impl-test (kill-server after):
- syntax PASS; E2E: capture-id, idempotency (exit 0), graceful fallback chain,
  backward compat (terminated), status renders stopped. Real YAML + main canary untouched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 15:19:09 +00:00
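The delete/stop state machine described in this commit can be summarized as a pure function. This is a sketch under stated assumptions: plan_delete is hypothetical, the status set lists only the statuses named here, and rejecting stop mode for rows that are neither running nor stopped is an assumption the commit does not specify.

```python
from typing import Optional

# Statuses named in this commit; 'stopped' must be in the valid set or the
# write is rejected (the _validate() change).
VALID_STATUSES = {"running", "stopped", "terminated", "archived"}


def plan_delete(current: str, *, hard: bool, capture_id: bool = False,
                reason: Optional[str] = None, graceful: bool = False) -> dict:
    """Return the row update a delete call would make (hypothetical helper)."""
    if current not in VALID_STATUSES:
        raise ValueError(f"unknown status: {current!r}")
    stop_mode = capture_id or reason is not None or graceful
    if not stop_mode:
        # Legacy behaviour: hard delete -> terminated, soft delete -> archived.
        return {"status": "terminated" if hard else "archived"}
    if current == "stopped":
        # Idempotent: report and exit 0 without touching the row.
        return {"noop": True, "status": "stopped"}
    if current != "running":
        # Assumption: only running -> stopped is described by the commit.
        raise ValueError(f"cannot stop a session in status {current!r}")
    update = {"status": "stopped", "stop_reason": reason or "manual_stop"}
    if capture_id:
        update["resumable"] = True  # conversation id recorded before the kill
    return update
```

Keeping the decision separate from the tmux side effects (send-keys, kill-session) makes the backward-compatibility cases easy to check in isolation.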
Godopu cd9eec112d refactor(skills): make skills portable across users/locations via workspace-relative paths + env var overrides
Changes:
- skills/lib.sh:
  - HOME_DIR default changed from $HOME to <workspace_root> (makes the workspace self-sufficient)
  - Added the CLAUDE_PROJECT_DIR / LOCAL_BIN env var pattern (defaults under $HOME, overridable)
- skills/tmux-agent-orchestrate-monitor/scripts/reconcile.sh:
  - STATE_DIR moved from $HOME/.cache/... to <workspace>/.cache/tmux-agent-orchestrate-monitor
- skills/tmux-agent-orchestrate-create/scripts/create_session.sh:
  - WRAPPER uses $LOCAL_BIN env var (default $HOME/.local/bin)
- 6 SKILL.md: examples and explanations updated to mention env var override capability

User/portability contract:
- Workspace-internal data: .hermes/ + .cache/ (moves with workspace)
- User/system data: $HOME/* (overridable via CLAUDE_PROJECT_DIR, LOCAL_BIN)
- All env vars follow: ${VAR:-default} pattern with documented defaults

Verified on isolated server -L agy-homeport-test (kill-server after):
- syntax check PASS
- E2E: defaults resolve to workspace-relative paths
- E2E: env var override correctly changes paths
- 0 leftover direct $HOME references in code
- Global skill non-interference verified
- Main isolated server -L multi-agent-canary untouched
2026-06-20 05:39:27 +00:00
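The portability contract relies on ${VAR:-default}, which substitutes the default when the variable is unset or empty. A Python equivalent is sketched below; env_or_default and resolve_paths are hypothetical names, and workspace_root is passed in explicitly where lib.sh derives it from BASH_SOURCE (a Python script could use os.path.dirname(os.path.abspath(__file__)) instead).

```python
import os
from typing import Mapping


def env_or_default(env: Mapping[str, str], name: str, default: str) -> str:
    # Same semantics as the shell's ${VAR:-default}: unset OR empty -> default.
    value = env.get(name, "")
    return value if value else default


def resolve_paths(workspace_root: str, env: Mapping[str, str]) -> dict:
    home = env_or_default(env, "HOME", os.path.expanduser("~"))
    return {
        # Workspace-internal data moves with the workspace.
        "HOME_DIR": workspace_root,
        "STATE_DIR": os.path.join(workspace_root, ".cache",
                                  "tmux-agent-orchestrate-monitor"),
        # User/system locations stay under $HOME but can be overridden.
        "CLAUDE_PROJECT_DIR": env_or_default(env, "CLAUDE_PROJECT_DIR", home),
        "LOCAL_BIN": env_or_default(env, "LOCAL_BIN",
                                    os.path.join(home, ".local", "bin")),
    }
```

Taking env as a parameter rather than reading os.environ directly keeps the resolution testable with a plain dict.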
Godopu ad7be264e7 refactor(skills): convert absolute paths to workspace-relative + relocate agent-sessions.yaml to .hermes/
Changes:
- skills/lib.sh: AGENT_SESSIONS_YAML default moved from ~/PuKi/lab/.../agent-sessions.yaml
  to <workspace_root>/.hermes/agent-sessions.yaml (relative via BASH_SOURCE)
- 6 SKILL.md: descriptions + 'Single source of truth' lines updated to .hermes/agent-sessions.yaml
- 6 SKILL.md: bash examples (~/PuKi/lab/agent_sessions/skills/...) → relative paths
- SKILL.md file:// links converted from absolute to relative (resolves workspace tool warnings)
- tmux-agent-orchestrate-create/SKILL.md: removed outdated wrapper template reference
- lib.sh internal comments: removed /home/godopu16/PuKi/lab example
- All scripts: internal source/path references use relative resolution

Verified on isolated server -L agy-relative-path-test (kill-server after):
- syntax check PASS
- E2E: create_session.sh auto-creates .hermes/agent-sessions.yaml at new location
- status.sh reads new location correctly
- 0 leftover absolute path references
- Global skill non-interference verified
- Main isolated server -L multi-agent-canary untouched
2026-06-19 23:41:05 +00:00
Godopu e8eebe5eb1 feat(tmux-agent-orchestrate-monitor): integrate watchdog pattern as skill
Moved /tmp/subscriber-watchdog.sh → skills/tmux-agent-orchestrate-monitor/scripts/watchdog.sh
(skill-managed lifecycle, no longer lives outside workspace).

Added lib.sh::start_watchdog() helper:
- Spawns watchdog as background nohup process
- Writes watchdog log to .hermes/jobs/<JID>.watchdog.log
- Returns watchdog PID via stdout

Wired create_session.sh --submit-job to auto-start watchdog after JOB registration.

Fixes:
- Bug: parsing only the first line of registry.py get output was fragile (an empty status caused an infinite loop)
  → Now uses python3 json.load for robust parsing
- Bug: old path skills/delegate-job/scripts/job_subscriber.py hardcoded
  → Now uses skills/tmux-agent-orchestrate-delegate-job/scripts/job_subscriber.py

Verified on isolated server -L agy-watchdog-skill-test (kill-server after):
- Syntax check PASS
- E2E: register job → start watchdog → publish completed → watchdog exits
- Global skill non-interference verified
- Main isolated server -L multi-agent-canary untouched
2026-06-19 23:33:46 +00:00
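A minimal sketch of the json.load fix and of a watch loop that cannot spin forever, assuming the job record is a JSON object with a status field. The real watchdog reacts to published events; this sketch polls a file instead. The terminal statuses other than completed, the deadline, and the function names are assumptions.

```python
import json
import time
from typing import Optional

TERMINAL = {"completed", "failed", "cancelled"}  # only "completed" is from the commit


def read_job_status(path: str) -> Optional[str]:
    # Parse the whole document instead of slicing the first line; missing,
    # empty, or half-written files yield None rather than "".
    try:
        with open(path) as f:
            data = json.load(f)
    except (OSError, json.JSONDecodeError):
        return None
    status = data.get("status") if isinstance(data, dict) else None
    return status or None


def watch(path: str, poll_s: float = 1.0, timeout_s: float = 3600.0) -> Optional[str]:
    # Return on a terminal status; an unknown or empty status keeps polling,
    # but the deadline guarantees the loop eventually exits.
    deadline = time.monotonic() + timeout_s
    while True:
        status = read_job_status(path)
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_s)
```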
Godopu e9fc763d31 refactor(skills): rename multi-agent-* + agent-sessions-monitor + delegate-job to tmux-agent-orchestrate-*
Renamed 6 skills directories to tmux-agent-orchestrate-* prefix:
- multi-agent-create → tmux-agent-orchestrate-create
- multi-agent-resume → tmux-agent-orchestrate-resume
- multi-agent-delete → tmux-agent-orchestrate-delete
- multi-agent-status → tmux-agent-orchestrate-status
- agent-sessions-monitor → tmux-agent-orchestrate-monitor
- delegate-job → tmux-agent-orchestrate-delegate-job

Updated:
- skills/lib.sh internal paths (delegate_submit_job etc.)
- skills/tmux-agent-orchestrate-status/scripts/status.sh (monitor path)
- skills/tmux-agent-orchestrate-monitor/scripts/reconcile.sh
- .gitignore (HTML ignore patterns)
- 6 SKILL.md frontmatter (name, related_skills, prereq_skills) and body
- All script headers and Korean comments

Notes:
- tmux session naming convention unchanged (<slug>-creator-<agent>): it is based on the workspace identifier and kept for backward compatibility
- Existing 2 sessions in -L multi-agent-canary untouched
- YAML delegate_job_id / agent-session (tmux:canary-...) preserved for log history compatibility

Verified on isolated server -L agy-rename-test (kill-server after).
2026-06-19 23:27:27 +00:00
Godopu 06f076e9cc fix(skills): claude review items 4-7 (subscribe timeout, atomic_dump_yaml, hardcoded paths, lifecycle helper)
Item 4: --subscribe gains --timeout/--idle-timeout (idle default raised
        120s->600s, 0=disable); a connect error and a non-zero CONNACK now
        both fall back to a polling loop. SKILL.md now matches the actual behaviour.
Item 5: --subscribe terminal-event YAML writes routed through
        lib.sh::atomic_dump_yaml (flock + schema-validate + .bak).
Item 6: removed hardcoded /home/godopu16/PuKi fallbacks in lib.sh,
        status.sh (x2) and reconcile.sh; paths now BASH_SOURCE-relative.
Item 7: lib.sh::delegate_publish_event helper consolidates the 4 duplicated
        lifecycle publish blocks; delete cwd|jid parser replaced with JSON.

Also: subscribe loop runs under the project venv python (paho) and delegates
all YAML work to atomic_dump_yaml on system python3 (PyYAML), since neither
interpreter has both modules — the original env_python path could never import
paho. Items 3 + 8 out of scope (per user). Verified on -L claude-phase4-test
(kill-server after).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 15:11:09 +00:00
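The Item 4 timeout and fallback rules reduce to two small decisions, sketched below without any MQTT library. The function names and the disabled-by-default overall timeout are assumptions; the 600 s idle default and the 0=disable convention come from the commit.

```python
from typing import Optional


def choose_mode(connect_error: Optional[BaseException],
                connack_rc: Optional[int]) -> str:
    # Either failure (the connect call raised, or the CONNACK return code is
    # non-zero or never arrived) means there is no push channel.
    if connect_error is not None or connack_rc != 0:
        return "poll"
    return "subscribe"


def stop_reason(now: float, started: float, last_event: float,
                timeout_s: float = 0.0,
                idle_timeout_s: float = 600.0) -> Optional[str]:
    # A limit of 0 disables that check. Returns why the loop should stop,
    # or None to keep waiting.
    if timeout_s > 0 and now - started >= timeout_s:
        return "timeout"
    if idle_timeout_s > 0 and now - last_event >= idle_timeout_s:
        return "idle"
    return None
```

In a subscribe loop, last_event would be refreshed on every received message and stop_reason checked on each wakeup, using time.monotonic() as the clock.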
Godopu 0eb1d94a9c refactor(skills): cleanup dead code + full workflow A→B→C→D integration
Cleanup:
- Remove unused validate_yaml() helper from lib.sh
- Remove USER_MANUAL.html + mqtt-broker-setup.html (no refs found)

Workflow A (create_session ↔ delegate-job):
- Add --submit-job <prompt> option to create_session.sh
- Auto-register session in delegate-job registry, store delegate_job_id in YAML

Workflow B (push-based monitor):
- Migrate reconcile.sh to MQTT subscriber mode (polling fallback preserved)

Workflow C (unified status):
- status.sh now shows session + delegate-job state in a single column

Workflow D (audit log + perms):
- JSON job files chmod 600
- create/delete/resume now publish lifecycle events to delegate-job
2026-06-19 14:27:29 +00:00
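One way to meet the chmod 600 rule for JSON job files without a window where a new file is briefly readable by others is to pass the mode at creation time. The sketch below shows that alternative technique; it is not necessarily what the scripts do, and write_job_file is a hypothetical name.

```python
import json
import os


def write_job_file(path: str, job: dict) -> None:
    # The mode argument applies only when the file is created; fchmod also
    # fixes a pre-existing file that had looser permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.fchmod(fd, 0o600)
    except BaseException:
        os.close(fd)
        raise
    with os.fdopen(fd, "w") as f:  # the file object now owns fd
        json.dump(job, f)
```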
Godopu 8a3abff2d6 initial: canary multi-agent skills with tmux isolation support
- lib.sh: TMUX_SERVER_NAME env var, _tmux helper, shim externalized
  to TMPDIR with recursive guard, resolve_tmux_server helper for
  YAML-driven server routing
- multi-agent-create: --tmux-server opt-in flag, YAML tmux_server
  field for orphan prevention
- multi-agent-delete/resume/status/agent-sessions-monitor: use
  resolve_tmux_server to auto-route to correct isolated server
- SKILL.md × 4: documented isolation server workflow
- Verified by claude review (R1+re-run) + agy R2 patches
  (orphan prevention + shim location fix)
2026-06-19 13:32:36 +00:00
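A sketch of the server-routing idea behind resolve_tmux_server: build each tmux command with -L <server> taken from the session's YAML tmux_server field, falling back to TMUX_SERVER_NAME, else tmux's default server. tmux_argv is hypothetical, and the precedence (YAML row before the environment variable) is an assumption made for illustration.

```python
from typing import Mapping, Optional, Sequence


def tmux_argv(args: Sequence[str],
              row: Optional[Mapping[str, str]] = None,
              env: Optional[Mapping[str, str]] = None) -> list:
    # Prefer the server recorded when the session was created, so later
    # commands reach the same isolated server and no session is orphaned.
    server = (row or {}).get("tmux_server") or (env or {}).get("TMUX_SERVER_NAME")
    prefix = ["tmux", "-L", server] if server else ["tmux"]
    return prefix + list(args)
```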