387b43d8e3
deploy/install.sh extracted the repo archive in-place with `tar --strip-components=1`, which inside an existing project could silently overwrite the target's own README.md/FUTURE_WORKS.md/etc and litter it with this repo's dev docs.

Rebuild the fetch path:
- stage the clone/extract into a `mktemp -d` dir, never in-place
- verify `.agents/skills/lib.sh` is present before copying anything
- copy only runtime assets (.agents/, AGENT.md, .env.example) into the target with per-file no-clobber guards (`[ ! -e ]`), so existing files always win
- post-fetch sanity check now tests a file, not just the directory
- fail fast when neither git nor curl is available

Use explicit `[ ! -e ]` guards + a POSIX find merge rather than `cp -n` (non-portable; emits a deprecation warning on GNU coreutils 9.x).

The earlier `tar --exclude` denylist idea was rejected in review: non-portable and the unanchored `--exclude="scripts"` pattern stripped the skills' own nested scripts/ dirs, yielding a silently broken install.

Mark FW-D1 resolved and FW-D2 partially addressed in FUTURE_WORKS.md/.ko.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
FUTURE_WORKS.md
Purpose: Track future work candidates for the `multi-agent-mux` project. For completed items, see `DONE.md`.
Last Updated: 2026-06-24
Future Improvements Roadmap
Below is the list of pending future work items. These items were proposed based on the security, concurrency, portability, and workflow analysis of the system.
| ID | Task | Priority | Effort | Domain / Description | Dependencies |
|---|---|---|---|---|---|
| FW-L4 | Migrate Job Registry to SQLite to overcome NFS flock limitations | P3 (Low) | Large | Concurrency/Infrastructure Scalability: Similar to the Session Registry, migrate the registry from individual JSON files locked with `fcntl.flock` to SQLite database transactions, so that it stays reliable on distributed/network file systems like NFS. | Conditional (commence only when multi-host/NFS deployment is required) |
| FW-P1 | Eliminate GNU/Linux userland assumptions in `lib.sh` | P2 (Medium) | Small | Portability: Replace GNU coreutils-specific commands (like `df --output=target` and Linux-specific `mount` formats) in `lib.sh` with portable equivalents, resolving silent failures of NFS detection on macOS/BSD. | None |
| FW-P2 | Add explicit Windows concurrency strategy in `mqtt_common.py` | P1 (High) | Medium | Portability / Concurrency: Detect non-POSIX systems at module initialization and either fail fast with a descriptive warning or substitute alternative lock strategies (e.g. `msvcrt.locking`), while preserving the best-effort nature of the `_file_lock` log appender. | None |
| FW-P3 | Align virtualenv loading and dependency verifications | P2 (Medium) | Medium | Portability: Prevent local interpreter mismatches in Poetry/UV environments and ensure the launch scripts fail early with clear diagnostic warnings if required Python dependencies are missing at startup. | None |
| FW-P4 | Secure default MQTT broker and namespaces | P1 (High) | Medium | Portability / Security: Prevent remote session hijacking and eavesdropping by providing a private TLS-enabled broker template rather than defaulting to `broker.hivemq.com` in public namespaces. | None |
| FW-P5 | Resolve `BASH_SOURCE` path resolution under zsh | P2 (Medium) | Small | Portability: Fix `lib.sh` interactive sourcing issues under zsh, where `${BASH_SOURCE[0]}` resolves to empty. | None |
| FW-P6 | Anchor project root dynamically via marker-file lookup | P1 (High) | Medium | Portability: Resolve structural fragility caused by hardcoded `../..` relative directory traversal in `lib.sh`, `status.sh`, and `reconcile.sh`. Use an upward search for root markers (`.git`, `.mam`, `.env`) to export a single source of truth for `WORKSPACE_ROOT`. | None |
| FW-P7 | Enforce HMAC verification and liveness checks on monitor termination | P1 (High) | Medium | Portability / Security: Prevent remote session killing by unauthorized or spoofed events. Integrate `verify_hmac` inside the monitor (`reconcile.sh`'s `on_message` handler) and confirm expected artifacts exist before executing `tmux kill-session`. | None |
| FW-P8 | Unify `.env` loading in `lib.sh` to prevent split-brain path resolution | P1 (High) | Small | Portability / Consistency: Without sourcing `.env` inside `lib.sh`, shell scripts query the default session database path while Python scripts query a custom path defined in `.env`. Sourcing `.env` at the top of `lib.sh` ensures all shell utilities automatically inherit user overrides for `TMUX_SERVER_NAME`, `AGENT_SESSIONS_YAML`, etc. | None |
| FW-W1 | Replace global registry lock with fine-grained locks | P2 (Medium) | Medium | Concurrency / Scaling: Eliminate throughput bottlenecks where all progress/sequence updates channel through a single `fcntl` lock on `.mam/jobs/`. Implement per-job lock files. | None |
| FW-W2 | Implement readiness probes for blind TUI key inputs | P2 (Medium) | Large | Workflow: Replace fixed timing sleeps in create, resume, and stop scripts with dynamic terminal readiness probes (e.g. scrapers or CLI checking hooks) to dismiss trust dialogs robustly. | None |
| FW-W4 | Persist subscriber sequence numbers alongside job records | P1 (High) | Medium | Workflow / Security: Persist `subscriber.last_seq` to disk or SQLite to prevent sequence counter reset on subscriber restart, locking down the replay defense window for the full job lifetime. | None |
| FW-W5 | Define structured message schema for reviewer verdicts | P2 (Medium) | Medium | Workflow: Create a dedicated reviewer topic (e.g., `reviews/<job_id>/verdicts`) emitting structured JSON verdicts (`PASS` / `NOT_PASS` + details) to eliminate raw text grepping by the PM. | None |
| FW-W6 | Expand monitor reconciliation support to Hermes agent | P2 (Medium) | Medium | Workflow / Consistency: Fully integrate Hermes sessions into auto-registration (drift-B) and ID materialization (drift-C) under `reconcile.sh` to match Claude/Agy monitoring coverage. | None |
| FW-W7 | Resolve path slug collisions in `derive_session_name` | P2 (Medium) | Small | Workflow / Collision Avoidance: Update `derive_session_name` to handle same-name nested directories (e.g. `/projectA/src` and `/projectB/src` both slugify to identical session names) by incorporating workspace-scoped identifiers or hash digests. | None |
| FW-D1 | ✅ RESOLVED (2026-06-24): installer no longer extracts in-place | - | - | Deploy / Safety: `deploy/install.sh` now stages the download into a `mktemp -d` dir, verifies `.agents/skills/lib.sh` is present, then copies only the runtime assets (`.agents/`, `AGENT.md`, `.env.example`) into the target with per-file no-clobber guards (`[ ! -e ]`), so existing target files always win and repo dev docs never land in the workspace. The post-fetch sanity check now tests a file, not just the directory. | Done |
| FW-D2 | Pin and verify the source the installer downloads before sourcing it | P2 (Medium) | Small | Deploy / Supply-chain: The installer clones/extracts the moving `main` branch over the network, and the workspace later sources those shell scripts (`lib.sh` et al.). Partially addressed (2026-06-24): the staged tree is now verified to contain `.agents/skills/lib.sh` before any file is copied. Remaining: pin to a release tag or commit SHA and/or verify a published checksum so the fetched content is integrity-checked, not merely structurally present. | None |
| FW-D3 | De-duplicate NFS detection between `install.sh` and `lib.sh` | P2 (Medium) | Small | Deploy / Portability: `deploy/install.sh` re-implements the GNU-specific `df --output=target` + `mount` NFS check already present in `lib.sh::_check_is_nfs`. The FW-P1 portability fix must cover this second copy: extract a single shared helper so both call sites stay correct on macOS/BSD. | FW-P1 |
| FW-D4 | Close CI shellcheck coverage gaps | P3 (Low) | Small | Deploy / Quality: `deploy/gitea-ci.yml` shellchecks only 5 scripts; `status.sh`, `resolve_session_id.sh`, `update_yaml_resumed.sh`, and `scripts/generate-env.sh` are never linted. Glob all tracked `*.sh` so new scripts are covered automatically. | None |
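
As an illustration of the FW-W7 row, the following minimal sketch appends a short digest of the full path to the basename slug, so that same-named directories in different projects yield distinct session names. It is written in Python for brevity; the signature, the slug rule, and the 8-character digest length are assumptions, and the project's actual `derive_session_name` may be implemented differently.

```python
import hashlib
import re
from pathlib import PurePosixPath


def derive_session_name(workspace: str, digest_len: int = 8) -> str:
    """Hypothetical collision-resistant variant: <basename-slug>-<path digest>."""
    path = PurePosixPath(workspace)
    # Slugify only the final path component, as a basename-only scheme would.
    slug = re.sub(r"[^a-z0-9]+", "-", path.name.lower()).strip("-") or "root"
    # Hash the full path so /projectA/src and /projectB/src diverge.
    digest = hashlib.sha256(str(path).encode("utf-8")).hexdigest()[:digest_len]
    return f"{slug}-{digest}"
```

Both `/projectA/src` and `/projectB/src` keep the readable `src-` prefix but end in different digests, and the same path always maps to the same name.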
Detailed Discussion Results & Directions (Reviewer Consensus)
- Conditional Deferral of SQLite Integration (FW-L4):
  - Unlike the session registry, maintaining individual job data in JSON files is highly intuitive for management and debugging. Since the current deployment is constrained to a single-host local file system, `fcntl.flock` locks are sufficient. Thus, this is assigned a low priority (P3) and will be tackled conditionally.
- Explicit Concurrency Strategy on Windows (FW-P1, FW-P2):
  - Silent fallbacks are among the worst design patterns for concurrency. Instead of letting Windows environments run without a lock (which occurs when `fcntl` fails silently), we detect POSIX availability at startup. We either fail fast and prompt the user to use a POSIX-compliant shell/wrapper, or dynamically load `msvcrt.locking` to provide a matching file locking mechanism. This keeps synchronization behavior consistent across Windows and Unix platforms.
- Dynamic Root Anchor (FW-P6):
  - Hardcoding a relative depth (like `../..` relative to a skill's location) makes scripts fragile when directories are moved or refactored. By walking up the directory tree to search for known anchors (like `.git` or `.mam`), we establish a single canonical root path and prevent scripts from breaking when their execution wrappers are relocated.
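
The upward marker search can be sketched as follows. This is an illustration in Python rather than the shell code in `lib.sh`; the function name `find_workspace_root` is hypothetical, and the marker tuple simply reuses the markers named in the FW-P6 row.

```python
from pathlib import Path

# Markers taken from the FW-P6 description; the tuple name is illustrative.
ROOT_MARKERS = (".git", ".mam", ".env")


def find_workspace_root(start, markers=ROOT_MARKERS):
    """Return the nearest ancestor of `start` (inclusive) that holds a marker."""
    current = Path(start).resolve()
    if current.is_file():
        current = current.parent
    # Check the starting directory first, then each parent up to the FS root.
    for candidate in (current, *current.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    raise FileNotFoundError(f"no root marker {markers} found above {start}")
```

Because the search starts from the script's own location and stops at the first match, a script keeps resolving the same root no matter how deep it is moved inside the workspace.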
- Monitor Termination Authorization (FW-P7):
  - Auto-termination must not trust unauthenticated events. Since `reconcile.sh` listens to a wildcard topic, any client on a public broker could spoof a terminal message and trigger `tmux kill-session`. Requiring HMAC signature verification on the terminal event path, combined with artifact validation, mitigates spoofing and accidental session cleanup.
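
A minimal sketch of that gate is shown below. The message layout (`payload`, `hmac`, `artifacts`), the canonical-JSON signing rule, and the helper names `sign` and `may_terminate` are all assumptions made for illustration; the project's own `verify_hmac` may use a different format.

```python
import hashlib
import hmac
import json
from pathlib import Path


def sign(payload: dict, key: bytes) -> str:
    """Hex HMAC-SHA256 over a canonical JSON encoding (assumed format)."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def may_terminate(message: dict, key: bytes, artifact_dir) -> bool:
    """Allow session teardown only for authentic events with real artifacts."""
    payload = message.get("payload")
    signature = message.get("hmac")
    if not isinstance(payload, dict) or not isinstance(signature, str):
        return False
    expected = sign(payload, key)
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
        return False
    # Liveness check: every artifact the event claims must exist on disk.
    artifacts = payload.get("artifacts", [])
    base = Path(artifact_dir)
    return bool(artifacts) and all((base / name).is_file() for name in artifacts)
```

Only when `may_terminate` returns `True` would the monitor go on to run `tmux kill-session`; an unsigned, badly signed, or artifact-less event is simply ignored.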
- Consolidation of per-job watchdogs (FW-W3):
  - Instead of spawning an independent `watchdog.sh` process for each job, each reconnecting every 2 minutes, we consolidated the event handling, HMAC security verification, and sequence tracking into a single, persistent wildcard subscriber running under `reconcile.sh --subscribe`. This drastically reduces MQTT broker connections, simplifies cleanup logic, and keeps the monotonic sequence numbers used for replay attack prevention in the Python process's memory, covering all concurrent jobs.
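
The in-memory replay defense mentioned above amounts to a per-job high-water mark. The sketch below shows the idea only; the `ReplayGuard` class and its `accept` method are hypothetical names, not the subscriber's actual code.

```python
class ReplayGuard:
    """Tracks the highest sequence number seen per job, in memory only."""

    def __init__(self):
        self._last_seq = {}  # job_id -> highest accepted sequence number

    def accept(self, job_id: str, seq: int) -> bool:
        """Accept a message only if its seq is strictly above the last one."""
        if seq <= self._last_seq.get(job_id, -1):
            return False  # replayed or out-of-order message
        self._last_seq[job_id] = seq
        return True
```

One guard instance serves every job handled by the single subscriber. Because the dictionary lives only in process memory, a restarted subscriber starts with an empty table and would accept an old sequence number again, which is the gap FW-W4 proposes to close by persisting the counters.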
- Consistent `.env` Sourcing across Shell and Python (FW-P8):
  - Sourcing the `.env` configuration file inside `lib.sh` ensures that shell utilities and Python scripts are fully aligned. Without this, customized database locations or isolated tmux server names declared in `.env` are only honored by the Python-based MQTT subsystems, while the shell orchestrators silently fall back to default socket files and paths.
- Deployment Installer Hardening (FW-D1 to FW-D4):
  - `deploy/install.sh` and the Gitea templates are the newest, least-reviewed surface (added after the `DONE.md` verification round) and the one path that runs before any of the reviewed orchestration code. FW-D1 (the release blocker) is now resolved (2026-06-24). The originally proposed `tar --exclude` denylist was dropped after review showed it was non-portable and, worse, stripped the skills' own nested `scripts/` directories via the unanchored `--exclude="scripts"` pattern, yielding a silently broken install. Instead, the installer was rebuilt around temp-dir staging plus an allowlist copy of runtime assets with per-file no-clobber guards. This closes the destructive-overwrite hole and the dev-doc clutter in one move. FW-D2 is partially addressed (the staged tree is structurally verified before copy); the remaining supply-chain hardening is pinning the fetch to a tag/SHA plus a checksum. FW-D3 (NFS detection drift, folded into FW-P1) and FW-D4 (CI lint coverage) remain open consistency/quality debt.
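
The staging-plus-allowlist flow adopted for FW-D1 can be modelled in a few lines. The real installer is a shell script using `[ ! -e ]` guards and a `find` merge; the Python below is only an illustrative equivalent, and `install_from_stage`, `RUNTIME_ASSETS`, and `SENTINEL` are hypothetical names.

```python
import shutil
from pathlib import Path

# Allowlist and sentinel file as described for FW-D1; names are illustrative.
RUNTIME_ASSETS = (".agents", "AGENT.md", ".env.example")
SENTINEL = Path(".agents") / "skills" / "lib.sh"


def install_from_stage(stage, target):
    """Copy allowlisted assets from `stage` into `target`; existing files win."""
    stage, target = Path(stage), Path(target)
    # Structural check first: abort before touching the target at all.
    if not (stage / SENTINEL).is_file():
        raise RuntimeError(f"staged tree is missing {SENTINEL}; nothing copied")
    copied = []
    for asset in RUNTIME_ASSETS:
        src = stage / asset
        if not src.exists():
            continue
        files = [src] if src.is_file() else sorted(p for p in src.rglob("*") if p.is_file())
        for f in files:
            dest = target / f.relative_to(stage)
            # No-clobber guard, same intent as `[ ! -e ]`; dangling symlinks
            # are skipped too so nothing is written through them.
            if dest.exists() or dest.is_symlink():
                continue
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, dest)
            copied.append(dest.relative_to(target).as_posix())
    return copied
```

Since only allowlisted paths are ever read from the staged tree, a `README.md` in the download is never considered, and nested directories such as a skill's own `scripts/` are copied intact because no exclude pattern is involved.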