deploy/install.sh extracted the repo archive in place with
`tar --strip-components=1`. Inside an existing project this could silently
overwrite the target's own README.md, FUTURE_WORKS.md, etc., and litter it
with this repo's dev docs.
Rebuild the fetch path:
- stage the clone/extract into a `mktemp -d` dir, never in-place
- verify `.agents/skills/lib.sh` is present before copying anything
- copy only runtime assets (.agents/, AGENT.md, .env.example) into the target
  with per-file no-clobber guards (`[ ! -e ]`), so existing files always win
- post-fetch sanity check now tests a file, not just the directory
- fail fast when neither git nor curl is available
Use explicit `[ ! -e ]` guards + a POSIX find merge rather than `cp -n`
(non-portable; emits a deprecation warning on GNU coreutils 9.x). The earlier
`tar --exclude` denylist idea was rejected in review: it is non-portable, and
the unanchored `--exclude="scripts"` pattern also stripped the skills' own
nested scripts/ dirs, yielding a silently broken install.
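The staged fetch + no-clobber merge can be sketched in Python as below. This is not the install.sh code (which is shell using `[ ! -e ]` and find); it is a minimal illustration assuming the same asset list and sentinel file. The names `merge_no_clobber`, `install_from_stage`, and the `populate_stage` callback (standing in for the git clone or curl + tar step) are hypothetical.

```python
import os
import shutil
import tempfile

# Runtime assets copied into the target (list taken from the commit message).
RUNTIME_ASSETS = [".agents", "AGENT.md", ".env.example"]
# File that must exist in the staged tree before anything is copied.
SENTINEL = os.path.join(".agents", "skills", "lib.sh")


def merge_no_clobber(src: str, dst: str) -> list:
    """Copy src into dst file by file, skipping any path that already exists.

    Returns the destination files that were actually written.
    """
    written = []
    if os.path.isfile(src):
        if not os.path.exists(dst):  # same role as `[ ! -e "$dst" ]`
            os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
            shutil.copy2(src, dst)
            written.append(dst)
        return written
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            target = os.path.join(target_dir, name)
            if not os.path.exists(target):  # existing files always win
                shutil.copy2(os.path.join(root, name), target)
                written.append(target)
    return written


def install_from_stage(populate_stage, target: str) -> list:
    """Fetch into a temp dir, verify the sentinel, then merge runtime assets."""
    with tempfile.TemporaryDirectory() as stage:
        populate_stage(stage)  # never extracts into `target` directly
        if not os.path.isfile(os.path.join(stage, SENTINEL)):
            raise RuntimeError(f"staged tree is missing {SENTINEL}; aborting")
        written = []
        for asset in RUNTIME_ASSETS:
            src = os.path.join(stage, asset)
            if os.path.exists(src):
                written += merge_no_clobber(src, os.path.join(target, asset))
        return written
```

Like the shell guard, the exists-then-copy check is not atomic; it protects against clobbering pre-existing files, not against a concurrent writer.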
Mark FW-D1 resolved and FW-D2 partially addressed in FUTURE_WORKS.md/.ko.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Replaces Python fcntl.flock locking with SQLite BEGIN IMMEDIATE transactions.
- Status/Reconcile read from the SQLite SSOT (single source of truth), with
  YAML as a fallback.
- Explicitly documents the tradeoff: YAML is no longer a real-time view.
- Runs PRAGMA wal_checkpoint(TRUNCATE) only outside open transactions, where
  it is safe.
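A minimal sketch of the BEGIN IMMEDIATE + out-of-transaction checkpoint pattern with Python's sqlite3 module. The `sessions` table and the helper names are hypothetical; the project's real schema is not described here.

```python
import sqlite3


def open_db(path: str) -> sqlite3.Connection:
    # isolation_level=None puts the sqlite3 module in autocommit mode, so it
    # never opens transactions implicitly; BEGIN/COMMIT below are explicit.
    conn = sqlite3.connect(path, isolation_level=None, timeout=10.0)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sessions ("
        "name TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    return conn


def set_status(conn: sqlite3.Connection, name: str, status: str) -> None:
    # BEGIN IMMEDIATE takes the write lock at the start of the transaction,
    # so concurrent writers wait here (up to `timeout`) instead of failing
    # later on a read-to-write lock upgrade. This is what replaces flock.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            "INSERT OR REPLACE INTO sessions (name, status) VALUES (?, ?)",
            (name, status),
        )
        conn.execute("COMMIT")
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


def checkpoint(conn: sqlite3.Connection) -> tuple:
    # Refuse to checkpoint while this connection has a transaction open,
    # matching the "outside transactions" rule above.
    if conn.in_transaction:
        raise RuntimeError("refusing to checkpoint inside an open transaction")
    # Row columns: (busy, wal_frames, checkpointed_frames).
    return conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
```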
FW-02: atomic_dump_yaml now calls _atomic_dump_yaml_check_nfs(), which
detects NFS/CIFS/SSHFS mounts and warns that flock is unreliable on them.
Long-term fix (SQLite WAL) documented in FUTURE_WORKS.md.
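One way such a mount check can work is to find the longest mount point containing the target path and inspect its filesystem type. The sketch below parses text in Linux /proc/mounts format; the function names and the exact set of fstypes are assumptions, not taken from _atomic_dump_yaml_check_nfs().

```python
import os
from typing import Optional

# Network filesystem types treated as "flock is unreliable". The exact set
# the real check matches is an assumption here.
NETWORK_FS_TYPES = frozenset({"nfs", "nfs4", "cifs", "smb3", "fuse.sshfs"})


def fs_type_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the fstype of the longest mount point that contains `path`.

    `mounts_text` is in /proc/mounts format:
    "device mountpoint fstype options dump pass" per line.
    """
    path = os.path.abspath(path)
    best_mnt, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # /proc/mounts encodes spaces in mount points as the octal escape \040.
        mnt = fields[1].replace("\\040", " ")
        fstype = fields[2]
        prefix = mnt.rstrip("/") + "/"
        if (path == mnt or path.startswith(prefix)) and len(mnt) > len(best_mnt):
            best_mnt, best_type = mnt, fstype
    return best_type


def is_network_fs(path: str, mounts_text: str) -> bool:
    return fs_type_for(path, mounts_text) in NETWORK_FS_TYPES
```

On Linux the input would come from reading /proc/mounts; other platforms (e.g. macOS) have no /proc/mounts and would need a different source.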
FW-11: pyyaml added to requirements.txt and installed in .venv, so
both paho-mqtt and yaml are available in a single interpreter.
Eliminates the system-python3-vs-venv split for monitor --subscribe.
FW-09: SKILL.md defines valid last_visible_status values (running/stopped/
terminated/archived). reconcile.sh now sets last_visible_status to
'running' and uses last_visible_note for free-form comments.
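The status/note split can be illustrated with a small validator: the enumerated field only accepts the four documented values, and any prose goes into last_visible_note. reconcile.sh itself is shell; the helper below is a hypothetical Python sketch of the rule, not project code.

```python
from typing import Optional

# Allowed values for last_visible_status, as listed in SKILL.md per this commit.
VALID_LAST_VISIBLE_STATUS = frozenset({"running", "stopped", "terminated", "archived"})


def set_last_visible(row: dict, status: str, note: Optional[str] = None) -> dict:
    """Set last_visible_status to an enumerated value; prose goes in the note."""
    if status not in VALID_LAST_VISIBLE_STATUS:
        raise ValueError(
            f"invalid last_visible_status {status!r}; "
            f"expected one of {sorted(VALID_LAST_VISIBLE_STATUS)}"
        )
    row["last_visible_status"] = status
    if note is not None:
        # Free-form commentary is kept out of the enumerated field.
        row["last_visible_note"] = note
    return row
```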
FW-15: SKILL.md adds a Security section for --subscribe on public brokers.
Documents wildcard subscription risks, auto-kill spoofing, HMAC
verification as a mitigation, and recommends --once/polling for PoC use.
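The commit names HMAC verification as a mitigation without fixing a scheme. A minimal sketch, assuming a pre-shared key and a hypothetical JSON envelope with `body` and `hmac` fields, using only the standard library:

```python
import hashlib
import hmac
import json
from typing import Optional


def sign(payload: dict, key: bytes) -> dict:
    # Canonical JSON so that signer and verifier hash identical bytes.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "hmac": tag}


def verify(message: dict, key: bytes) -> Optional[dict]:
    """Return the decoded payload if the tag matches, else None."""
    body = message.get("body")
    tag = message.get("hmac")
    if not isinstance(body, str) or not isinstance(tag, str):
        return None
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(expected.encode(), tag.encode()):
        return None
    return json.loads(body)
```

This only rejects forged or altered messages; a captured valid message can still be replayed unless the signed body also carries something like a timestamp or nonce that the subscriber checks.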
FW-04: mqtt_common.py now loads .env at module import via _load_dotenv().
Walks up from the script dir to find the workspace .env and sets only vars
not already in os.environ (OS env takes precedence). Uses the stdlib only,
with no python-dotenv dependency.
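A stdlib-only sketch of the walk-up-and-setdefault behavior. The names `find_workspace_env` and `load_workspace_env` are hypothetical, and the handling of comments, `export ` prefixes, and surrounding quotes is an assumption; the real _load_dotenv() may parse differently.

```python
import os
from pathlib import Path
from typing import Optional


def find_workspace_env(start: Path) -> Optional[Path]:
    """Return the nearest .env in `start` or any of its parent directories."""
    for d in [start, *start.parents]:
        candidate = d / ".env"
        if candidate.is_file():
            return candidate
    return None


def load_workspace_env(start: Path, environ=os.environ) -> dict:
    """Set vars from the nearest .env without overriding existing ones."""
    loaded = {}
    path = find_workspace_env(start.resolve())
    if path is None:
        return loaded
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Strip one layer of matching single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        if key and key not in environ:  # OS env takes precedence
            environ[key] = value
            loaded[key] = value
    return loaded
```

In a module this would be called once at import time, e.g. with `Path(__file__).parent` as the starting directory.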
FW-06: bash wrapper sets trap EXIT before tmux new-session to publish an
error event if agent bootstrap fails (non-zero exit). The trap is cleared
after successful session creation and is only active when job_id is set.
FW-03: replace 'delete' with 'stop' in skill reference (line 299).
'terminated' retained as valid YAML status value (hard kill mode).
FW-10/FW-16: add Glossary section distinguishing session states
(running/stopped/terminated/archived in agent-sessions.yaml) from
job states (pending/running/completed/error/cancelled in registry).
Documents which skill/function sets each state.
FW-07: _resolve_real_tmux_path and _init_tmux_isolation now use
_TMUX_SHIM_DIR_PATTERN and _TMUX_SKILLS_BIN_PATTERN env-overridable
constants instead of hardcoded path strings. All 4 reference sites
updated (lines 32, 37, 57, 76). Default values preserve original
slash semantics (/multi-agent-tmux-shim/, /skills/.bin).
FW-08: _delegate_py_bin caches result in AGENT_PYTHON_BIN shell
variable (not exported — avoids cross-workspace pollution).
Fallback uses command -v python3 for absolute path caching.
Reviewed by agy-existing (FAIL->fixed) and claude-existing (FAIL->fixed).
Both reviewers identified: slash omission, incomplete extraction at :57/:76,
export side effects. All issues resolved.
Implements user choice Option B: the two follow-ups to 0de0f23, in one patch.
Changes:
- skills/tmux-agent-orchestrate-monitor/scripts/reconcile.sh:
  - drift-A skip-set extended: ('terminated', 'archived', 'stopped')
  - prevents the monitor from overwriting a tmux-dead 'stopped' row with
    'terminated (auto-detected)', which would lose resumable + captured id
- skills/tmux-agent-orchestrate-resume/scripts/update_yaml_resumed.sh:
  - pop stopped_at, stopped_at_epoch, stop_reason, resumable on resume
    (alongside the existing terminated_at*/termination_mode/archived_at) so a
    resumed row has no stale end-of-session metadata
- skills/tmux-agent-orchestrate-monitor/SKILL.md: documented 'stopped' in the
  drift class list + a skip-set note on drift class A
- skills/tmux-agent-orchestrate-resume/SKILL.md: documented stopped -> running
  transition + tier-1 race-free resume path
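The two rules above (drift-A skip-set, end-of-session keys popped on resume) can be modeled in a few lines of Python. The scripts themselves are shell; the helper names are hypothetical, and setting `status` to 'running' on resume is assumed from the documented stopped -> running transition.

```python
# Statuses that drift class A must leave alone when tmux reports the session dead.
DRIFT_A_SKIP = ("terminated", "archived", "stopped")

# End-of-session keys dropped on resume; any key starting with "terminated_at"
# is also dropped (the terminated_at* glob in the message).
END_OF_SESSION_KEYS = (
    "stopped_at", "stopped_at_epoch", "stop_reason", "resumable",
    "termination_mode", "archived_at",
)


def drift_a_should_mark_terminated(row: dict, tmux_alive: bool) -> bool:
    """True if reconcile would rewrite this row as auto-detected terminated."""
    return not tmux_alive and row.get("status") not in DRIFT_A_SKIP


def clear_end_of_session(row: dict) -> dict:
    for key in list(row):  # copy the keys: the dict is mutated in the loop
        if key in END_OF_SESSION_KEYS or key.startswith("terminated_at"):
            row.pop(key)
    row["status"] = "running"  # assumed resume target state
    return row
```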
5-route surface preserved (no new directory). delete_session.sh untouched.
Verified on an isolated tmux server (-L claude-followup-test, torn down with
kill-server afterward):
- syntax PASS
- E2E A: stop -> tmux dead -> reconcile --once -> status stays 'stopped'
- E2E B: resume -> stopped_at/stopped_at_epoch/stop_reason/resumable all gone
- E2E C: plain delete -> terminated, reconcile leaves it (no regression)
- Real YAML + main canary untouched
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>