- Replaces Python's fcntl.flock with SQLite BEGIN IMMEDIATE.
- Status/Reconcile read from the SQLite SSOT, falling back to YAML.
- Documents the tradeoff explicitly: YAML is no longer a real-time view.
- Handles PRAGMA wal_checkpoint(TRUNCATE) safely outside transactions.
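The locking and checkpoint behaviour above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: the sessions table, its columns, and the helper names open_db, set_status and checkpoint are assumptions made for the example.

```python
import sqlite3


def open_db(path):
    # isolation_level=None puts the sqlite3 module in autocommit mode, so the
    # only transactions are the explicit BEGIN IMMEDIATE ones below.
    conn = sqlite3.connect(path, timeout=5.0, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    # Hypothetical schema, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sessions ("
        "name TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    return conn


def set_status(conn, name, status):
    # BEGIN IMMEDIATE takes the write lock up front, which is the role
    # fcntl.flock played for the YAML file.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            "INSERT OR REPLACE INTO sessions(name, status) VALUES (?, ?)",
            (name, status),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def checkpoint(conn):
    # Refuse to checkpoint while a transaction is open on this connection.
    if conn.in_transaction:
        raise RuntimeError("wal_checkpoint must run outside a transaction")
    # Returns a (busy, log_frames, checkpointed_frames) row.
    return conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
```

Calling checkpoint() only after COMMIT or ROLLBACK keeps the TRUNCATE from contending with the connection's own write lock.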
FW-02: atomic_dump_yaml now calls _atomic_dump_yaml_check_nfs() which
detects NFS/CIFS/SSHFS mounts and warns that flock is unreliable.
Long-term fix (SQLite WAL) documented in FUTURE_WORKS.md.
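One way such a mount check could work is sketched below. It is an assumption about the approach, not the body of _atomic_dump_yaml_check_nfs: the names NETWORK_FS, fs_type_for and is_network_fs are hypothetical, and the mounts text is passed in (for example the contents of /proc/mounts on Linux) so the logic can be exercised without a real network mount.

```python
import os

# Filesystem types on which flock is treated as unreliable (assumed list).
NETWORK_FS = {"nfs", "nfs4", "cifs", "smb3", "fuse.sshfs"}


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path."""
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # /proc/mounts escapes spaces in mount points as \040.
        mount_point = fields[1].replace("\\040", " ")
        fs_type = fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same mount point win.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def is_network_fs(path, mounts_text):
    return fs_type_for(path, mounts_text) in NETWORK_FS
```

A caller would read /proc/mounts once and emit the warning when is_network_fs() is true for the YAML path.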
FW-11: pyyaml added to requirements.txt and installed in .venv, so
both paho-mqtt and yaml are available in a single interpreter.
Eliminates the system-python3-vs-venv split for monitor --subscribe.
FW-09: SKILL.md defines valid last_visible_status values (running/stopped/
terminated/archived). reconcile.sh now sets last_visible_status to
'running' and uses last_visible_note for free-form comments.
FW-15: SKILL.md adds Security section for --subscribe on public brokers.
Documents wildcard subscription risks, auto-kill spoofing, HMAC
verification mitigation, and recommends --once/polling for PoC.
FW-04: mqtt_common.py now loads .env at module import via _load_dotenv().
Walks up from script dir to find workspace .env, sets vars not already
in os.environ (OS env takes precedence). Uses stdlib only — no
python-dotenv dependency.
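A stdlib-only loader along these lines might look like the sketch below. It illustrates the described behaviour (walk up to find .env, never override variables already set) under assumptions: the function names find_dotenv, parse_dotenv and load_dotenv and the quoting rules are hypothetical and may differ from the real _load_dotenv().

```python
import os
from pathlib import Path


def find_dotenv(start_dir):
    """Walk up from start_dir and return the first .env file found, if any."""
    d = Path(start_dir).resolve()
    for candidate in (d, *d.parents):
        env_file = candidate / ".env"
        if env_file.is_file():
            return env_file
    return None


def parse_dotenv(text):
    """Parse KEY=VALUE lines; skip blanks, comments and malformed lines."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Strip one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        if key:
            result[key] = value
    return result


def load_dotenv(start_dir, environ=os.environ):
    """Set only variables not already present, so the OS env takes precedence."""
    env_file = find_dotenv(start_dir)
    if env_file is None:
        return {}
    applied = {}
    for key, value in parse_dotenv(env_file.read_text(encoding="utf-8")).items():
        if key not in environ:
            environ[key] = value
            applied[key] = value
    return applied
```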
FW-06: bash wrapper sets trap EXIT before tmux new-session to publish
error event if agent bootstrap fails (non-zero exit). Trap is cleared
after successful session creation. Only active when job_id is set.
FW-03: replace 'delete' with 'stop' in skill reference (line 299).
'terminated' retained as valid YAML status value (hard kill mode).
FW-10/FW-16: add Glossary section distinguishing session states
(running/stopped/terminated/archived in agent-sessions.yaml) from
job states (pending/running/completed/error/cancelled in registry).
Documents which skill/function sets each state.
FW-07: _resolve_real_tmux_path and _init_tmux_isolation now use
_TMUX_SHIM_DIR_PATTERN and _TMUX_SKILLS_BIN_PATTERN env-overridable
constants instead of hardcoded path strings. All 4 reference sites
updated (lines 32, 37, 57, 76). Default values preserve original
slash semantics (/multi-agent-tmux-shim/, /skills/.bin).
FW-08: _delegate_py_bin caches result in AGENT_PYTHON_BIN shell
variable (not exported — avoids cross-workspace pollution).
Fallback uses command -v python3 for absolute path caching.
Reviewed by agy-existing (FAIL->fixed) and claude-existing (FAIL->fixed).
Both reviewers identified: slash omission, incomplete extraction at :57/:76,
export side effects. All issues resolved.
Implements user choice Option B: the two follow-ups to 0de0f23, in one patch.
Changes:
- skills/tmux-agent-orchestrate-monitor/scripts/reconcile.sh:
  - drift-A skip-set extended: ('terminated', 'archived', 'stopped')
  - prevents the monitor from overwriting a tmux-dead 'stopped' row with
    'terminated (auto-detected)', which would lose resumable + captured id
- skills/tmux-agent-orchestrate-resume/scripts/update_yaml_resumed.sh:
  - pop stopped_at, stopped_at_epoch, stop_reason, resumable on resume
    (alongside the existing terminated_at*/termination_mode/archived_at) so a
    resumed row has no stale end-of-session metadata
- skills/tmux-agent-orchestrate-monitor/SKILL.md: documented 'stopped' in the
  drift class list + a skip-set note on drift class A
- skills/tmux-agent-orchestrate-resume/SKILL.md: documented stopped -> running
  transition + tier-1 race-free resume path
5-route surface preserved (no new directory). delete_session.sh untouched.
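The drift-A skip-set rule can be sketched as below. This is an illustration of the rule only, not the logic in reconcile.sh: the row fields name, status and last_visible_note and the function reconcile_drift_a are assumptions for the example.

```python
# Rows in these states are left alone even when their tmux session is gone.
SKIP_STATUSES = {"terminated", "archived", "stopped"}


def reconcile_drift_a(rows, live_sessions):
    """Mark rows whose tmux session has vanished; return the changed names."""
    changed = []
    for row in rows:
        if row.get("status") in SKIP_STATUSES:
            # A 'stopped' row keeps resumable and its captured id intact.
            continue
        if row["name"] not in live_sessions:
            row["status"] = "terminated"
            row["last_visible_note"] = "auto-detected"
            changed.append(row["name"])
    return changed
```

With 'stopped' in the set, a stopped row whose tmux session is dead is skipped instead of being re-marked as terminated.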
Verified on isolated server -L claude-followup-test (kill-server after):
- syntax PASS
- E2E A: stop -> tmux dead -> reconcile --once -> status stays 'stopped'
- E2E B: resume -> stopped_at/stopped_at_epoch/stop_reason/resumable all gone
- E2E C: plain delete -> terminated, reconcile leaves it (no regression)
- Real YAML + main canary untouched
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Implements user choice Option A: extend delete instead of adding a 6th 'stop' skill.
Changes:
- skills/lib.sh:
  - capture_conversation_id(): thin wrapper over find_workspace_uuid (race-free)
  - is_already_stopped(): idempotency check
  - _validate(): add 'stopped' to the valid status set (required for the new
    transition; without it atomic_dump_yaml silently rejected the write)
- skills/tmux-agent-orchestrate-delete/scripts/delete_session.sh:
  - --capture-id: records claude_session_id_own / agy_conversation_id_own +
    resumable:true to the row before kill (guarantees tier-1 resume)
  - --reason <reason>: records stop_reason (default manual_stop)
  - --graceful: send-keys exit -> 3s -> kill-session(SIGTERM) -> 5s -> SIGKILL
  - STOP mode (any of the three) transitions running -> stopped (vs terminated)
  - Idempotency: already-stopped session prints message + exit 0
  - No options -> identical legacy behaviour (hard->terminated, soft->archived)
- skills/tmux-agent-orchestrate-delete/SKILL.md: documented options + state machine
5-route surface preserved (no new directory). Other 5 routes unchanged.
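The option-to-state mapping described above can be sketched as a small pure function. This is a reading of the commit text, not the logic in delete_session.sh: the function name resolve_transition and the soft parameter are hypothetical, and the handling of rows that are neither running nor stopped is an assumption.

```python
def resolve_transition(current_status, *, capture_id=False, reason=None,
                       graceful=False, soft=False):
    """Return the target status, or None when the call is an idempotent no-op."""
    # Any of the three new options switches delete into STOP mode.
    stop_mode = capture_id or reason is not None or graceful
    if stop_mode:
        if current_status == "stopped":
            return None  # already stopped: print a message and exit 0
        return "stopped"
    # No new options: legacy behaviour is unchanged.
    return "archived" if soft else "terminated"
```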
Known follow-up (out of scope, monitor edits forbidden this round): monitor
reconcile drift-A treats a tmux-dead 'stopped' row as drift and would re-mark it
'terminated' (skip-set is only terminated/archived). status.sh shows DRIFT=A for
stopped rows. Needs a Phase-2 wiring change to add 'stopped' to the skip-set.
Verified on isolated server -L claude-stop-impl-test (kill-server after):
- syntax PASS; E2E: capture-id, idempotency(exit 0), graceful fallback chain,
backward-compat(terminated), status renders stopped. Real YAML + main canary untouched.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
.env.example: committable template (all 13 skill env vars commented with
defaults; secrets use replace_me, no plaintext). .gitignore already carves
it out via !.env.example.
scripts/generate-env.sh: creates .env from .env.example if absent, no-ops
if present, --force overwrites with a .env.bak backup. Placed under a new
top-level scripts/ dir so it is committable without touching skills/*.
Verified on -L claude-env2-test (create/no-op/force/bad-arg paths).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Secures the workspace against accidental commits of:
- .env (skill env overrides, may contain secrets)
- .env.* (any env variant)
- except .env.example (committable template, can be added later)
Includes Korean comment noting secrets policy.
Item 4: --subscribe gains --timeout/--idle-timeout (idle default raised
120s->600s, 0=disable); connect-error AND non-zero CONNACK now fall
back to a polling loop. SKILL.md matches actual behaviour.
Item 5: --subscribe terminal-event YAML writes routed through
lib.sh::atomic_dump_yaml (flock + schema-validate + .bak).
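The flock + schema-validate + .bak sequence can be sketched as below. This is not the lib.sh helper: atomic_dump, validate and the serialize callback are hypothetical names, the lock-file and temp-file naming are assumptions, and the serializer is passed in so the sketch stays stdlib-only (the real helper writes YAML).

```python
import fcntl
import os
import shutil
import tempfile

VALID_STATUSES = {"running", "stopped", "terminated", "archived"}


def validate(rows):
    # Reject the whole write if any row carries an unknown status.
    for row in rows:
        if row.get("status") not in VALID_STATUSES:
            raise ValueError(f"invalid status: {row.get('status')!r}")


def atomic_dump(path, rows, serialize):
    """Validate, lock, back up the old file, then replace it atomically."""
    validate(rows)
    with open(path + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # serialize concurrent writers
        try:
            if os.path.exists(path):
                shutil.copy2(path, path + ".bak")
            # Write to a temp file in the same directory so os.replace
            # stays on one filesystem and is atomic.
            fd, tmp = tempfile.mkstemp(
                dir=os.path.dirname(path) or ".", prefix=".tmp-"
            )
            try:
                with os.fdopen(fd, "w", encoding="utf-8") as f:
                    f.write(serialize(rows))
                    f.flush()
                    os.fsync(f.fileno())
                os.replace(tmp, path)
            except BaseException:
                if os.path.exists(tmp):
                    os.unlink(tmp)
                raise
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

Validating before taking the lock means a rejected write never touches the existing file or its backup.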
Item 6: removed hardcoded /home/godopu16/PuKi fallbacks in lib.sh,
status.sh (x2) and reconcile.sh; paths now BASH_SOURCE-relative.
Item 7: lib.sh::delegate_publish_event helper consolidates the 4 duplicated
lifecycle publish blocks; delete cwd|jid parser replaced with JSON.
Also: subscribe loop runs under the project venv python (paho) and delegates
all YAML work to atomic_dump_yaml on system python3 (PyYAML), since neither
interpreter has both modules — the original env_python path could never import
paho. Items 3 + 8 out of scope (per user). Verified on -L claude-phase4-test
(kill-server after).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- lib.sh: TMUX_SERVER_NAME env var, _tmux helper, shim externalized
  to TMPDIR with recursive guard, resolve_tmux_server helper for
  YAML-driven server routing
- multi-agent-create: --tmux-server opt-in flag, YAML tmux_server
  field for orphan prevention
- multi-agent-delete/resume/status/agent-sessions-monitor: use
  resolve_tmux_server to auto-route to correct isolated server
- SKILL.md × 4: documented isolation server workflow
- Verified by claude review (R1+re-run) + agy R2 patches
  (orphan prevention + shim location fix)