FUTURE_WORKS.md

Purpose: Track future work candidates for the multi-agent-mux project. For completed items, see DONE.md.

Last Updated: 2026-06-24

Future Improvements Roadmap

The table below lists the pending future work items. They were proposed from the security, concurrency, portability, and workflow analysis of the system.

| ID | Task | Priority | Effort | Domain / Description | Dependencies |
|---|---|---|---|---|---|
| FW-L4 | Migrate Job Registry to SQLite to overcome NFS flock limitations | P3 (Low) | Large | Concurrency/Infrastructure Scalability: Similar to the Session Registry, migrate the individual JSON file lock (`fcntl.flock`) registry structure into an integrated SQLite database transaction structure, guaranteeing full reliability in distributed/network file systems like NFS. | Conditional (commence only when multi-host/NFS deployment is required) |
| FW-P1 | Eliminate GNU/Linux userland assumptions in lib.sh | P2 (Medium) | Small | Portability: Replace GNU coreutils-specific commands (like `df --output=target` and Linux-specific mount formats) in `lib.sh` with portable equivalents, resolving silent failures of NFS detection on macOS/BSD. | None |
| FW-P2 | Add explicit Windows concurrency strategy in mqtt_common.py | P1 (High) | Medium | Portability / Concurrency: Detect non-POSIX systems at module initialization and either fail fast with a descriptive warning or substitute alternative lock strategies (e.g. `msvcrt.locking`), while preserving the best-effort nature of the `_file_lock` log appender. | None |
| FW-P3 | Align virtualenv loading and dependency verifications | P2 (Medium) | Medium | Portability: Prevent local interpreter mismatches in Poetry/UV environments and ensure the launch scripts fail early with clear diagnostic warnings if required Python dependencies are missing at startup. | None |
| FW-P4 | Secure default MQTT broker and namespaces | P1 (High) | Medium | Portability / Security: Prevent remote session hijack and eavesdropping by providing a private TLS-enabled broker template rather than defaulting to `broker.hivemq.com` in public namespaces. | None |
| FW-P5 | Resolve BASH_SOURCE path resolution under zsh | P2 (Medium) | Small | Portability: Fix `lib.sh` interactive sourcing issues under zsh shell where `${BASH_SOURCE[0]}` resolves to empty. | None |
| FW-P6 | Anchor project root dynamically via marker-file lookup | P1 (High) | Medium | Portability: Resolve structural fragility caused by hardcoded `../..` relative directory traversal in `lib.sh`, `status.sh`, and `reconcile.sh`. Use an upward search for root markers (`.git`, `.mam`, `.env`) to export a single source of truth for `WORKSPACE_ROOT`. | None |
| FW-P7 | Enforce HMAC verification and liveness checks on monitor termination | P1 (High) | Medium | Portability / Security: Prevent remote session killing by unauthorized or spoofed events. Integrate `verify_hmac` inside the monitor (`reconcile.sh`'s `on_message` handler) and confirm expected artifacts exist before executing `tmux kill-session`. | None |
| FW-P8 | Unify `.env` loading in `lib.sh` to prevent split-brain path resolution | P1 (High) | Small | Portability / Consistency: Sourcing the `.env` file inside `lib.sh` is critical to prevent split-brain path resolution where shell scripts query the default session database path while Python scripts query a custom path defined in `.env`. Sourcing `.env` at the top of `lib.sh` ensures all shell utilities automatically inherit user overrides for `TMUX_SERVER_NAME`, `AGENT_SESSIONS_YAML`, etc. | None |
| FW-W1 | Replace global registry lock with fine-grained locks | P2 (Medium) | Medium | Concurrency / Scaling: Eliminate throughput bottlenecks where all progress/sequence updates channel through a single fcntl lock on `.mam/jobs/`. Implement per-job lock files. | None |
| FW-W2 | Implement readiness probes for blind TUI key inputs | P2 (Medium) | Large | Workflow: Replace fixed timing sleeps in create, resume, and stop scripts with dynamic terminal readiness probes (e.g. scrapers or CLI checking hooks) to dismiss trust dialogs robustly. | None |
| FW-W4 | Persist subscriber sequence numbers alongside job records | P1 (High) | Medium | Workflow / Security: Persist `subscriber.last_seq` to disk or SQLite to prevent sequence counter reset on subscriber restart, locking down the replay defense window for the full job lifetime. | None |
| FW-W5 | Define structured message schema for reviewer verdicts | P2 (Medium) | Medium | Workflow: Create a dedicated reviewer topic (e.g., `reviews/<job_id>/verdicts`) emitting structured JSON verdicts (`PASS` / `NOT_PASS` + details) to eliminate raw text grepping by the PM. | None |
| FW-W6 | Expand monitor reconciliation support to Hermes agent | P2 (Medium) | Medium | Workflow / Consistency: Fully integrate `hermes` sessions into auto-registration (drift-B) and ID materialization (drift-C) under `reconcile.sh` to match Claude/Agy monitoring coverage. | None |
| FW-W7 | Resolve path slug collisions in derive_session_name | P2 (Medium) | Small | Workflow / Collision Avoidance: Update `derive_session_name` to handle same-name nested directories (e.g. `/projectA/src` and `/projectB/src` both slugify to identical session names) by incorporating workspace-scoped identifiers or hash digests. | None |
| ~~FW-D1~~ | ✅ RESOLVED (2026-06-24): installer no longer extracts in-place | - | - | Deploy / Safety: `deploy/install.sh` now stages the download into a `mktemp -d` dir, verifies `.agents/skills/lib.sh` is present, then copies only the runtime assets (`.agents/`, `.env.example`) into the target with per-file no-clobber guards (`[ ! -e ]`), so existing target files always win and repo dev docs never land in the workspace. The post-fetch sanity check now tests a file, not just the directory. | Done |
| FW-D2 | Pin and verify the source the installer downloads before sourcing it | P2 (Medium) | Small | Deploy / Supply-chain: The installer clones/extracts the moving `main` branch over the network, and the workspace later `source`s those shell scripts (`lib.sh` et al.). Partially addressed (2026-06-24): the staged tree is now verified to contain `.agents/skills/lib.sh` before any file is copied. Remaining: pin to a release tag or commit SHA and/or verify a published checksum so the fetched content is integrity-checked, not merely structurally present. | None |
| FW-D3 | De-duplicate NFS detection between `install.sh` and `lib.sh` | P2 (Medium) | Small | Deploy / Portability: `deploy/install.sh` re-implements the GNU-specific `df --output=target` + `mount` NFS check already present in `lib.sh::_check_is_nfs`. The FW-P1 portability fix must cover this second copy; extract a single shared helper so both call sites stay correct on macOS/BSD. | FW-P1 |
| FW-D4 | Close CI shellcheck coverage gaps | P3 (Low) | Small | Deploy / Quality: `deploy/gitea-ci.yml` shellchecks only 5 scripts; `status.sh`, `resolve_session_id.sh`, `update_yaml_resumed.sh`, and `scripts/generate-env.sh` are never linted. Glob all tracked `*.sh` so new scripts are covered automatically. | None |
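
As an illustration of the FW-W7 direction (hash digests to separate same-named directories), here is a minimal Python sketch. The real `derive_session_name` is not shown in this document, so the function name `derive_session_name_sketch`, the slug rules, and the 8-character digest length are all assumptions made for the example.

```python
import hashlib
import re
from pathlib import PurePosixPath


def derive_session_name_sketch(path: str, digest_len: int = 8) -> str:
    """Hypothetical helper: slug of the last path component plus a short
    hash of the full path, so /projectA/src and /projectB/src differ."""
    p = PurePosixPath(path)
    # Keep only characters that are safe in a session name (assumed rule).
    slug = re.sub(r"[^A-Za-z0-9]+", "-", p.name).strip("-").lower() or "root"
    # The digest is computed over the whole path, not just the last component.
    digest = hashlib.sha256(str(p).encode("utf-8")).hexdigest()[:digest_len]
    return f"{slug}-{digest}"
```

The name stays deterministic for a given path, so an existing session can still be found again after a restart. A shorter digest gives friendlier names but a higher chance of collision.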

Detailed Discussion Results & Directions (Reviewer Consensus)

Conditional Deferral of SQLite Integration (FW-L4):
- Unlike the session registry, individual job data kept in JSON files is easy to inspect and debug by hand. Since the current deployment is constrained to a single-host local file system, fcntl.flock locks are sufficient. This item is therefore assigned a low priority (P3) and will be tackled only if multi-host/NFS deployment becomes a requirement.
Explicit Concurrency Strategy on Windows (FW-P1, FW-P2):
- Silent fallbacks are among the worst failure modes for concurrency code. Instead of letting Windows environments run without a lock (which occurs when fcntl fails silently), we detect POSIX availability at startup. We then either fail fast, prompting the user to use a POSIX-compliant shell/wrapper, or dynamically load msvcrt.locking to provide a matching file locking mechanism. This keeps synchronization behavior consistent across Windows and Unix platforms.
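
The startup detection described above could look roughly like the following Python sketch. It is not the actual `mqtt_common.py` code: the helper names `_lock`, `_unlock`, and `exclusive_lock` are hypothetical, and a separate lock file is assumed rather than locking the log file itself.

```python
import os
from contextlib import contextmanager

# Choose the locking backend once, at import time, and fail loudly otherwise.
if os.name == "posix":
    import fcntl

    def _lock(fh):
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX)

    def _unlock(fh):
        fcntl.flock(fh.fileno(), fcntl.LOCK_UN)

elif os.name == "nt":
    import msvcrt

    def _lock(fh):
        # msvcrt.locking locks nbytes starting at the current file position.
        fh.seek(0)
        msvcrt.locking(fh.fileno(), msvcrt.LK_LOCK, 1)

    def _unlock(fh):
        fh.seek(0)
        msvcrt.locking(fh.fileno(), msvcrt.LK_UNLCK, 1)

else:
    raise ImportError(f"no file-locking strategy for os.name={os.name!r}")


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive lock on lock_path for the duration of the block."""
    with open(lock_path, "a+b") as fh:
        _lock(fh)
        try:
            yield fh
        finally:
            _unlock(fh)
```

Note that the two backends are not identical: `fcntl.flock` is advisory, while `msvcrt.locking` locks a byte range and raises `OSError` if the lock cannot be acquired after its built-in retries, so callers would need to handle that case.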
Dynamic Root Anchor (FW-P6):
- Hardcoding a relative depth (like ../.. relative to a skill's location) breaks as soon as directories are moved or refactored. By walking up the directory tree to search for known anchors (like .git or .mam), we establish a single canonical root path and prevent scripts from breaking when their execution wrappers are relocated.
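
The upward search can be sketched as follows. The affected scripts are shell, so this Python version only illustrates the logic; `find_workspace_root` is a hypothetical name, and the marker list is taken from the FW-P6 table entry.

```python
from pathlib import Path
from typing import Optional, Sequence

# Markers named in FW-P6; the first directory containing any of them wins.
ROOT_MARKERS = (".git", ".mam", ".env")


def find_workspace_root(start, markers: Sequence[str] = ROOT_MARKERS) -> Optional[Path]:
    """Walk upward from start and return the nearest directory that
    contains one of the markers, or None if the filesystem root is reached."""
    current = Path(start).resolve()
    if current.is_file():
        current = current.parent
    for candidate in (current, *current.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    return None
```

Returning `None` instead of guessing lets the caller fail with a clear message when no marker is found, rather than silently falling back to a wrong root.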
Monitor Termination Authorization (FW-P7):
- Auto-termination must not trust unauthenticated events. Since reconcile.sh listens to a wildcard topic, any client on a public broker could spoof a terminal message and trigger tmux kill-session. Requiring HMAC signature verification on the terminal event path, combined with artifact validation, mitigates spoofing and accidental session cleanup.
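
A minimal sketch of this two-step gate is shown below. The signature of the project's `verify_hmac` is not given in this document, so `sign_event`, `verify_event_hmac`, and `may_terminate` are hypothetical names, and HMAC-SHA256 with a hex-encoded signature is an assumption.

```python
import hashlib
import hmac
from pathlib import Path


def sign_event(payload: bytes, key: bytes) -> str:
    """Hex HMAC-SHA256 of the raw payload (assumed signing scheme)."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_event_hmac(payload: bytes, signature_hex: str, key: bytes) -> bool:
    expected = sign_event(payload, key)
    # Constant-time comparison; compare bytes so odd input cannot raise.
    return hmac.compare_digest(expected.encode("ascii"),
                               signature_hex.encode("utf-8"))


def may_terminate(payload: bytes, signature_hex: str, key: bytes,
                  expected_artifact: str) -> bool:
    """Allow session teardown only for an authenticated terminal event
    whose expected artifact actually exists on disk."""
    if not verify_event_hmac(payload, signature_hex, key):
        return False
    return Path(expected_artifact).exists()
```

Only when `may_terminate` returns true would the monitor go on to run `tmux kill-session`. Checking the signature first means a spoofed event never triggers a filesystem lookup.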
Consolidation of per-job watchdogs (FW-W3):
- Instead of spawning an independent watchdog.sh process for each job, each of which reconnects every 2 minutes, we consolidated event handling, HMAC verification, and sequence tracking into a single persistent wildcard subscriber running under reconcile.sh --subscribe. This drastically reduces MQTT broker connections, simplifies cleanup logic, and uses the Python subscriber's in-memory state to enforce replay-attack prevention (monotonic sequence numbers) across concurrent jobs.
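
The monotonic-sequence check can be sketched like this. The class name `SequenceTracker` and its `accept` method are hypothetical and do not reflect the actual subscriber code.

```python
from typing import Dict


class SequenceTracker:
    """Tracks the highest sequence number seen per job and rejects
    anything that does not strictly increase (replayed or stale events)."""

    def __init__(self) -> None:
        self._last_seq: Dict[str, int] = {}

    def accept(self, job_id: str, seq: int) -> bool:
        last = self._last_seq.get(job_id)
        if last is not None and seq <= last:
            return False  # replay or out-of-order event
        self._last_seq[job_id] = seq
        return True
```

Because the dictionary lives only in memory, a subscriber restart resets every counter. That gap is what FW-W4 in the table proposes to close by persisting `subscriber.last_seq`.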
Consistent .env Sourcing across Shell and Python (FW-P8):
- Sourcing the .env configuration file inside lib.sh ensures that shell utilities and Python scripts are fully aligned. Without this, customized database locations or isolated tmux server names declared in .env are only honored by the Python-based MQTT subsystems, while the shell orchestrators silently fall back to default socket files and paths.
Deployment Installer Hardening (FW-D1 to FW-D4):
- deploy/install.sh and the Gitea templates are the newest, least-reviewed surface (added after the DONE.md verification round) and the one path that runs before any of the reviewed orchestration code. FW-D1 (the release blocker) is now resolved (2026-06-24). The originally proposed tar --exclude denylist was dropped because review showed it was non-portable and, worse, that the unanchored --exclude="scripts" pattern stripped the skills' own nested scripts/ directories, yielding a silently broken install. The installer was instead rebuilt around temp-dir staging plus an allowlist copy of runtime assets with per-file no-clobber guards, which closes the destructive-overwrite hole and the dev-doc clutter in one move. FW-D2 is partially addressed (the staged tree is structurally verified before copy); the remaining supply-chain hardening is pinning the fetch to a tag/SHA plus a checksum. FW-D3 (NFS detection drift, folded into FW-P1) and FW-D4 (CI lint coverage) remain open consistency/quality debt.
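
For the remaining FW-D2 work, the checksum half of the check could look like the sketch below. The installer itself is a shell script, so this Python version only illustrates the logic; `sha256_of_file` and `verify_checksum` are hypothetical names, and SHA-256 is an assumed choice of digest.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Stream the file so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: str, expected_hex: str) -> bool:
    """True only if the file's SHA-256 matches the published value."""
    actual = sha256_of_file(path)
    expected = expected_hex.strip().lower()
    return hmac.compare_digest(actual.encode("ascii"), expected.encode("utf-8"))
```

A checksum only helps if the expected value comes from a source the attacker cannot also modify, which is why the table entry pairs it with pinning to a release tag or commit SHA.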
