fix(deploy): stage installer download and copy runtime assets no-clobber (FW-D1)

deploy/install.sh extracted the repo archive in-place with
`tar --strip-components=1`. Inside an existing project this could silently
overwrite the target's own README.md, FUTURE_WORKS.md, etc. and litter the
workspace with this repo's dev docs.

Rebuild the fetch path:
- stage the clone/extract into a `mktemp -d` dir, never in-place
- verify `.agents/skills/lib.sh` is present before copying anything
- copy only runtime assets (.agents/, AGENT.md, .env.example) into the target
  with per-file no-clobber guards (`[ ! -e ]`), so existing files always win
- post-fetch sanity check now tests a file, not just the directory
- fail fast when neither git nor curl is available
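
The steps above can be sketched roughly as follows. This is a minimal
illustration, not the actual contents of deploy/install.sh: every function
name and the temp-dir template are assumptions, and the download itself is
left out.

```shell
# Hypothetical helper names; the real deploy/install.sh may differ.

# Fail fast when neither download tool is installed.
require_fetch_tool() {
    if command -v git >/dev/null 2>&1; then
        echo git
    elif command -v curl >/dev/null 2>&1; then
        echo curl
    else
        echo "error: neither git nor curl is available" >&2
        return 1
    fi
}

# Create a private staging dir so nothing is extracted in-place.
make_stage() {
    mktemp -d "${TMPDIR:-/tmp}/mam-install.XXXXXX"
}

# Sanity-check a file inside the staged tree, not just the directory.
verify_stage() {
    [ -f "$1/.agents/skills/lib.sh" ]
}

# Copy a single file only when the target does not already have it.
copy_if_absent() {
    if [ ! -e "$2" ]; then
        cp "$1" "$2"
    fi
}

# Install the top-level runtime files from a staged tree into a target dir.
# Nothing is copied unless the staged tree passes verification first.
install_top_level() {
    stage=$1
    target=$2
    verify_stage "$stage" || {
        echo "error: staged tree is missing .agents/skills/lib.sh" >&2
        return 1
    }
    for name in AGENT.md .env.example; do
        if [ -f "$stage/$name" ]; then
            copy_if_absent "$stage/$name" "$target/$name"
        fi
    done
}
```

A caller would create the stage with `make_stage`, clone or extract into it,
run `install_top_level "$stage" "$PWD"`, and remove the stage afterwards
(for example from an EXIT trap).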

Use explicit `[ ! -e ]` guards + a POSIX find merge rather than `cp -n`
(non-portable; emits a deprecation warning on GNU coreutils 9.x). The earlier
`tar --exclude` denylist idea was rejected in review: it is non-portable, and
the unanchored `--exclude="scripts"` pattern also stripped the skills' own
nested scripts/ dirs, yielding a silently broken install.
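
A guard-plus-find merge of this kind might look like the sketch below. The
function name `merge_no_clobber` is hypothetical and the installer's real
loop may differ. The sketch only handles regular files and directories, and
assumes file names contain no newlines.

```shell
# merge_no_clobber SRC DST
# Recreate SRC's directory tree under DST and copy only the files DST lacks,
# so an existing file in DST always wins. Hypothetical helper name.
merge_no_clobber() {
    src=$1
    dst=$2

    # Create every directory first (nested scripts/ dirs included).
    ( cd "$src" && find . -type d ) | while IFS= read -r dir; do
        mkdir -p "$dst/$dir"
    done

    # Copy each regular file behind an explicit existence guard.
    ( cd "$src" && find . -type f ) | while IFS= read -r file; do
        if [ ! -e "$dst/$file" ]; then
            cp -p "$src/$file" "$dst/$file"
        fi
    done
}
```

For example, `merge_no_clobber "$stage/.agents" "$target/.agents"` would fill
in missing skill files without touching ones the target already has.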

Mark FW-D1 resolved and FW-D2 partially addressed in FUTURE_WORKS.md/.ko.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 10:33:05 +09:00
parent 7eaaaf8944
commit 387b43d8e3
3 changed files with 84 additions and 17 deletions
+12 -2
@@ -2,7 +2,7 @@
> **Purpose**: Track future work candidates for the `multi-agent-mux` project.
> For completed items, see `DONE.md`.
> **Last Updated**: 2026-06-22
> **Last Updated**: 2026-06-24
---
@@ -20,12 +20,17 @@ Below is the list of pending future work items. These items were proposed based
| **FW-P5** | Resolve BASH_SOURCE path resolution under zsh | P2 (Medium) | Small | **Portability**: Fix `lib.sh` interactive sourcing issues under zsh shell where `${BASH_SOURCE[0]}` resolves to empty. | None |
| **FW-P6** | Anchor project root dynamically via marker-file lookup | P1 (High) | Medium | **Portability**: Resolve structural fragility caused by hardcoded `../..` relative directory traversal in `lib.sh`, `status.sh`, and `reconcile.sh`. Use an upward search for root markers (`.git`, `.mam`, `.env`) to export a single source of truth for `WORKSPACE_ROOT`. | None |
| **FW-P7** | Enforce HMAC verification and liveness checks on monitor termination | P1 (High) | Medium | **Portability / Security**: Prevent remote session killing by unauthorized or spoofed events. Integrate `verify_hmac` inside the monitor (`reconcile.sh`'s `on_message` handler) and confirm expected artifacts exist before executing `tmux kill-session`. | None |
| **FW-P8** | Unify `.env` loading in `lib.sh` to prevent split-brain path resolution | P1 (High) | Small | **Portability / Consistency**: Sourcing the `.env` file inside `lib.sh` is critical to prevent split-brain path resolution where shell scripts query the default session database path while Python scripts query a custom path defined in `.env`. Sourcing `.env` at the top of `lib.sh` ensures all shell utilities automatically inherit user overrides for `TMUX_SERVER_NAME`, `AGENT_SESSIONS_YAML`, etc. | None |
| **FW-W1** | Replace global registry lock with fine-grained locks | P2 (Medium) | Medium | **Concurrency / Scaling**: Eliminate throughput bottlenecks where all progress/sequence updates channel through a single fcntl lock on `.mam/jobs/`. Implement per-job lock files. | None |
| **FW-W2** | Implement readiness probes for blind TUI key inputs | P2 (Medium) | Large | **Workflow**: Replace fixed timing sleeps in create, resume, and stop scripts with dynamic terminal readiness probes (e.g. scrapers or CLI checking hooks) to dismiss trust dialogs robustly. | None |
| **FW-W4** | Persist subscriber sequence numbers alongside job records | P1 (High) | Medium | **Workflow / Security**: Persist `subscriber.last_seq` to disk or SQLite to prevent sequence counter reset on subscriber restart, locking down the replay defense window for the full job lifetime. | None |
| **FW-W5** | Define structured message schema for reviewer verdicts | P2 (Medium) | Medium | **Workflow**: Create a dedicated reviewer topic (e.g., `reviews/<job_id>/verdicts`) emitting structured JSON verdicts (`PASS` / `NOT_PASS` + details) to eliminate raw text grepping by the PM. | None |
| **FW-W6** | Expand monitor reconciliation support to Hermes agent | P2 (Medium) | Medium | **Workflow / Consistency**: Fully integrate `hermes` sessions into auto-registration (drift-B) and ID materialization (drift-C) under `reconcile.sh` to match Claude/Agy monitoring coverage. | None |
| **FW-W7** | Resolve path slug collisions in derive_session_name | P2 (Medium) | Small | **Workflow / Collision Avoidance**: Update `derive_session_name` to handle same-name nested directories (e.g. `/projectA/src` and `/projectB/src` both slugify to identical session names) by incorporating workspace-scoped identifiers or hash digests. | None |
| ~~**FW-D1**~~ | ✅ **RESOLVED (2026-06-24)** — installer no longer extracts in-place | — | — | **Deploy / Safety**: `deploy/install.sh` now stages the download into a `mktemp -d` dir, verifies `.agents/skills/lib.sh` is present, then copies only the runtime assets (`.agents/`, `AGENT.md`, `.env.example`) into the target with per-file no-clobber guards (`[ ! -e ]`), so existing target files always win and repo dev docs never land in the workspace. The post-fetch sanity check now tests a file, not just the directory. | Done |
| **FW-D2** | Pin and verify the source the installer downloads before sourcing it | P2 (Medium) | Small | **Deploy / Supply-chain**: The installer clones/extracts the moving `main` branch over the network, and the workspace later `source`s those shell scripts (`lib.sh` et al.). *Partially addressed (2026-06-24): the staged tree is now verified to contain `.agents/skills/lib.sh` before any file is copied.* **Remaining:** pin to a release tag or commit SHA and/or verify a published checksum so the fetched content is integrity-checked, not merely structurally present. | None |
| **FW-D3** | De-duplicate NFS detection between `install.sh` and `lib.sh` | P2 (Medium) | Small | **Deploy / Portability**: `deploy/install.sh` re-implements the GNU-specific `df --output=target` + `mount` NFS check already present in `lib.sh::_check_is_nfs`. The FW-P1 portability fix must cover this second copy — extract a single shared helper so both call sites stay correct on macOS/BSD. | FW-P1 |
| **FW-D4** | Close CI shellcheck coverage gaps | P3 (Low) | Small | **Deploy / Quality**: `deploy/gitea-ci.yml` shellchecks only 5 scripts; `status.sh`, `resolve_session_id.sh`, `update_yaml_resumed.sh`, and `scripts/generate-env.sh` are never linted. Glob all tracked `*.sh` so new scripts are covered automatically. | None |
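
As an illustration of the upward marker-file search proposed in FW-P6, a minimal sketch could look like this. The function name `find_workspace_root` is hypothetical; only the markers (`.git`, `.mam`, `.env`) and the `WORKSPACE_ROOT` variable come from the item above.

```shell
# find_workspace_root [START_DIR]
# Walk upward from START_DIR (default: current directory) and print the
# first directory that contains one of the root markers.
# Hypothetical helper; not taken from lib.sh.
find_workspace_root() {
    dir=$(cd "${1:-.}" && pwd -P) || return 1
    while :; do
        for marker in .git .mam .env; do
            if [ -e "$dir/$marker" ]; then
                printf '%s\n' "$dir"
                return 0
            fi
        done
        # Stop at the filesystem root without finding a marker.
        if [ "$dir" = "/" ]; then
            return 1
        fi
        dir=$(dirname "$dir")
    done
}
```

A script would then run `WORKSPACE_ROOT=$(find_workspace_root) && export WORKSPACE_ROOT` once, instead of hardcoding `../..` in each script.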
---
@@ -44,4 +49,9 @@ Below is the list of pending future work items. These items were proposed based
* Auto-termination must not trust unauthenticated events. Since `reconcile.sh` listens to a wildcard topic, any client on a public broker could spoof a terminal message and trigger `tmux kill-session`. Requiring HMAC signature verification on the terminal event path, combined with artifact validation, mitigates spoofing and accidental session cleanup.
5. **Consolidation of per-job watchdogs (FW-W3)**:
* Instead of spawning an independent `watchdog.sh` process for each job, each of which reconnected every 2 minutes, we consolidated event handling, HMAC verification, and sequence tracking into a single persistent wildcard subscriber running under `reconcile.sh --subscribe`. This drastically reduces MQTT broker connections, simplifies cleanup logic, and keeps the monotonic sequence numbers used for replay-attack prevention in the Python subscriber's memory, shared across concurrent jobs.
6. **Consistent `.env` Sourcing across Shell and Python (FW-P8)**:
* Sourcing the `.env` configuration file inside `lib.sh` ensures that shell utilities and Python scripts are fully aligned. Without this, customized database locations or isolated tmux server names declared in `.env` are only honored by the Python-based MQTT subsystems, while the shell orchestrators silently fall back to default socket files and paths.
7. **Deployment Installer Hardening (FW-D1 ~ FW-D4)**:
* `deploy/install.sh` and the Gitea templates are the newest, least-reviewed surface (added after the DONE.md verification round) and the one path that runs *before* any of the reviewed orchestration code. **FW-D1 (the release blocker) is now resolved (2026-06-24):** rather than the originally proposed `tar --exclude` denylist — which review showed was non-portable and, worse, stripped the skills' own nested `scripts/` directories via the unanchored `--exclude="scripts"` pattern, yielding a silently broken install — the installer was rebuilt around temp-dir staging + an allowlist copy of runtime assets with per-file no-clobber guards. This closes the destructive-overwrite hole and the dev-doc clutter in one move. FW-D2 is partially addressed (the staged tree is structurally verified before copy); the remaining supply-chain hardening is pinning the fetch to a tag/SHA + checksum. FW-D3 (NFS detection drift, folded into FW-P1) and FW-D4 (CI lint coverage) remain open consistency/quality debt.
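
The checksum half of the remaining FW-D2 work could be sketched as below. This is an assumption-laden illustration, not a committed design: the function name `verify_sha256` is hypothetical, and where the expected digest is published (release notes, a pinned value in the installer, etc.) is still undecided.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Succeed only when FILE's SHA-256 digest equals EXPECTED_HEX.
# Hypothetical helper; uses sha256sum (GNU) or shasum (macOS/BSD).
verify_sha256() {
    file=$1
    expected=$2
    if command -v sha256sum >/dev/null 2>&1; then
        actual=$(sha256sum "$file" | awk '{print $1}')
    elif command -v shasum >/dev/null 2>&1; then
        actual=$(shasum -a 256 "$file" | awk '{print $1}')
    else
        echo "error: no SHA-256 tool found" >&2
        return 1
    fi
    [ "$actual" = "$expected" ]
}
```

The installer would call this on the downloaded archive inside the staging directory and abort before any copy step if it fails, so the content is integrity-checked and not merely structurally present.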