# FUTURE_WORKS.md

> **Purpose**: Track future work candidates for the `multi-agent-mux` project.
> For completed items, see `DONE.md`.
> **Last Updated**: 2026-06-24

---

## Future Improvements Roadmap

Below is the list of pending future work items. These items were proposed based on the security, concurrency, portability, and workflow analysis of the system.

| ID | Task | Priority | Effort | Domain / Description | Dependencies |
|---|---|---|---|---|---|
| **FW-L4** | Migrate Job Registry to SQLite to overcome NFS flock limitations | P3 (Low) | Large | **Concurrency / Infrastructure Scalability**: Similar to the Session Registry, migrate the registry of individual JSON files guarded by `fcntl.flock` into a single SQLite database with transactional updates, so that it remains reliable on distributed/network file systems such as NFS. | **Conditional** (start only when multi-host/NFS deployment is required) |
| **FW-P1** | Eliminate GNU/Linux userland assumptions in lib.sh | P2 (Medium) | Small | **Portability**: Replace GNU coreutils-specific commands (like `df --output=target` and Linux-specific mount formats) in `lib.sh` with portable equivalents, resolving silent failures of NFS detection on macOS/BSD. | None |
| **FW-P2** | Add explicit Windows concurrency strategy in mqtt_common.py | P1 (High) | Medium | **Portability / Concurrency**: Detect non-POSIX systems at module initialization and either fail fast with a descriptive warning or substitute alternative lock strategies (e.g. `msvcrt.locking`), while preserving the best-effort nature of the `_file_lock` log appender. | None |
| **FW-P3** | Align virtualenv loading and dependency verifications | P2 (Medium) | Medium | **Portability**: Prevent local interpreter mismatches in Poetry/UV environments and ensure the launch scripts fail early with clear diagnostic warnings if required Python dependencies are missing at startup. | None |
| **FW-P4** | Secure default MQTT broker and namespaces | P1 (High) | Medium | **Portability / Security**: Prevent remote session hijack and eavesdropping by providing a private TLS-enabled broker template rather than defaulting to `broker.hivemq.com` in public namespaces. | None |
| **FW-P5** | Resolve BASH_SOURCE path resolution under zsh | P2 (Medium) | Small | **Portability**: Fix `lib.sh` interactive sourcing issues under zsh shell where `${BASH_SOURCE[0]}` resolves to empty. | None |
| **FW-P6** | Anchor project root dynamically via marker-file lookup | P1 (High) | Medium | **Portability**: Resolve structural fragility caused by hardcoded `../..` relative directory traversal in `lib.sh`, `status.sh`, and `reconcile.sh`. Use an upward search for root markers (`.git`, `.mam`, `.env`) to export a single source of truth for `WORKSPACE_ROOT`. | None |
| **FW-P7** | Enforce HMAC verification and liveness checks on monitor termination | P1 (High) | Medium | **Portability / Security**: Prevent remote session killing by unauthorized or spoofed events. Integrate `verify_hmac` inside the monitor (`reconcile.sh`'s `on_message` handler) and confirm expected artifacts exist before executing `tmux kill-session`. | None |
| **FW-P8** | Unify `.env` loading in `lib.sh` to prevent split-brain path resolution | P1 (High) | Small | **Portability / Consistency**: Source the `.env` file at the top of `lib.sh` to prevent split-brain path resolution, where shell scripts query the default session database path while Python scripts query a custom path defined in `.env`. All shell utilities then automatically inherit user overrides for `TMUX_SERVER_NAME`, `AGENT_SESSIONS_YAML`, etc. | None |
| **FW-W1** | Replace global registry lock with fine-grained locks | P2 (Medium) | Medium | **Concurrency / Scaling**: Eliminate throughput bottlenecks where all progress/sequence updates channel through a single fcntl lock on `.mam/jobs/`. Implement per-job lock files. | None |
| **FW-W2** | Implement readiness probes for blind TUI key inputs | P2 (Medium) | Large | **Workflow**: Replace fixed timing sleeps in create, resume, and stop scripts with dynamic terminal readiness probes (e.g. scrapers or CLI checking hooks) to dismiss trust dialogs robustly. | None |
| **FW-W4** | Persist subscriber sequence numbers alongside job records | P1 (High) | Medium | **Workflow / Security**: Persist `subscriber.last_seq` to disk or SQLite to prevent sequence counter reset on subscriber restart, locking down the replay defense window for the full job lifetime. | None |
| **FW-W5** | Define structured message schema for reviewer verdicts | P2 (Medium) | Medium | **Workflow**: Create a dedicated reviewer topic (e.g., `reviews/<job_id>/verdicts`) emitting structured JSON verdicts (`PASS` / `NOT_PASS` + details) to eliminate raw text grepping by the PM. | None |
| **FW-W6** | Expand monitor reconciliation support to Hermes agent | P2 (Medium) | Medium | **Workflow / Consistency**: Fully integrate `hermes` sessions into auto-registration (drift-B) and ID materialization (drift-C) under `reconcile.sh` to match Claude/Agy monitoring coverage. | None |
| **FW-W7** | Resolve path slug collisions in derive_session_name | P2 (Medium) | Small | **Workflow / Collision Avoidance**: Update `derive_session_name` to handle same-name nested directories (e.g. `/projectA/src` and `/projectB/src` both slugify to identical session names) by incorporating workspace-scoped identifiers or hash digests. | None |
| ~~**FW-D1**~~ | ✅ **RESOLVED (2026-06-24)** — installer no longer extracts in-place | — | — | **Deploy / Safety**: `deploy/install.sh` now stages the download into a `mktemp -d` dir, verifies `.agents/skills/lib.sh` is present, then copies only the runtime assets (`.agents/`, `.env.example`) into the target with per-file no-clobber guards (`[ ! -e ]`), so existing target files always win and repo dev docs never land in the workspace. The post-fetch sanity check now tests a file, not just the directory. | Done |
| **FW-D2** | Pin and verify the source the installer downloads before sourcing it | P2 (Medium) | Small | **Deploy / Supply-chain**: The installer clones/extracts the moving `main` branch over the network, and the workspace later `source`s those shell scripts (`lib.sh` et al.). *Partially addressed (2026-06-24): the staged tree is now verified to contain `.agents/skills/lib.sh` before any file is copied.* **Remaining:** pin to a release tag or commit SHA and/or verify a published checksum so the fetched content is integrity-checked, not merely structurally present. | None |
| **FW-D3** | De-duplicate NFS detection between `install.sh` and `lib.sh` | P2 (Medium) | Small | **Deploy / Portability**: `deploy/install.sh` re-implements the GNU-specific `df --output=target` + `mount` NFS check already present in `lib.sh::_check_is_nfs`. The FW-P1 portability fix must cover this second copy — extract a single shared helper so both call sites stay correct on macOS/BSD. | FW-P1 |
| **FW-D4** | Close CI shellcheck coverage gaps | P3 (Low) | Small | **Deploy / Quality**: `deploy/gitea-ci.yml` shellchecks only 5 scripts; `status.sh`, `resolve_session_id.sh`, `update_yaml_resumed.sh`, and `scripts/generate-env.sh` are never linted. Glob all tracked `*.sh` so new scripts are covered automatically. | None |

---

### Detailed Discussion Results & Directions (Reviewer Consensus)

1. **Conditional Deferral of SQLite Integration (FW-L4)**:
* Unlike the session registry, individual job records kept as JSON files are easy to inspect, manage, and debug. Because the current deployment is limited to a single-host local file system, `fcntl.flock` locks are sufficient. This item is therefore assigned low priority (P3) and will be taken up only if multi-host/NFS deployment becomes a requirement.

2. **Explicit Concurrency Strategy on Windows (FW-P1, FW-P2)**:
* Silently falling back to no locking is among the worst failure modes for concurrent code. Instead of letting Windows environments run without a lock (which happens when `fcntl` fails silently), we detect POSIX availability at startup. We then either fail fast, prompting the user to run under a POSIX-compliant shell/wrapper, or dynamically load `msvcrt.locking` to provide an equivalent file-locking mechanism. This keeps synchronization behavior consistent across Windows and Unix platforms.

3. **Dynamic Root Anchor (FW-P6)**:
* Hardcoding a relative depth (such as `../..` from a skill's location) makes scripts fragile whenever directories are moved or refactored. By walking up the directory tree to find known anchors (such as `.git` or `.mam`), we establish a single canonical root path and keep scripts working when their execution wrappers are relocated.

4. **Monitor Termination Authorization (FW-P7)**:
* Auto-termination must not trust unauthenticated events. Because `reconcile.sh` subscribes to a wildcard topic, any client on a public broker could spoof a terminal message and trigger `tmux kill-session`. Requiring HMAC signature verification on the terminal event path, combined with artifact validation, mitigates both spoofing and accidental session cleanup.

5. **Consolidation of per-job watchdogs (FW-W3)**:
* Instead of spawning an independent `watchdog.sh` process per job, each reconnecting every 2 minutes, we consolidated event handling, HMAC verification, and sequence tracking into a single persistent wildcard subscriber running under `reconcile.sh --subscribe`. This greatly reduces the number of MQTT broker connections, simplifies cleanup logic, and uses the Python process's in-memory state to enforce replay protection (monotonic sequence numbers) across concurrent jobs.

6. **Consistent `.env` Sourcing across Shell and Python (FW-P8)**:
* Sourcing the `.env` configuration file inside `lib.sh` keeps shell utilities and Python scripts aligned. Without it, customized database locations or isolated tmux server names declared in `.env` are honored only by the Python-based MQTT subsystems, while the shell orchestrators silently fall back to default socket files and paths.

7. **Deployment Installer Hardening (FW-D1 to FW-D4)**:
* `deploy/install.sh` and the Gitea templates are the newest, least-reviewed surface (added after the DONE.md verification round) and the one path that runs *before* any of the reviewed orchestration code. **FW-D1 (the release blocker) is now resolved (2026-06-24).** The originally proposed `tar --exclude` denylist was rejected: review showed it was non-portable and, worse, that the unanchored `--exclude="scripts"` pattern stripped the skills' own nested `scripts/` directories, yielding a silently broken install. The installer was instead rebuilt around temp-dir staging plus an allowlist copy of runtime assets with per-file no-clobber guards, which closes both the destructive-overwrite hole and the dev-doc clutter. FW-D2 is partially addressed (the staged tree is structurally verified before copy); the remaining supply-chain hardening is pinning the fetch to a tag/SHA and verifying a checksum. FW-D3 (NFS detection drift, folded into FW-P1) and FW-D4 (CI lint coverage) remain open consistency/quality debt.
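The startup detection described in item 2 (FW-P2) could look roughly like the sketch below. It is not the code in `mqtt_common.py`: the names `pick_lock_backend`, `LOCK_BACKEND`, and `exclusive_lock`, and the use of a dedicated lock file, are assumptions made for illustration. The only library calls used are `fcntl.flock` and `msvcrt.locking` from the Python standard library.

```python
from contextlib import contextmanager

try:
    import fcntl  # available on POSIX platforms only
except ImportError:
    fcntl = None

try:
    import msvcrt  # available on Windows only
except ImportError:
    msvcrt = None


def pick_lock_backend():
    """Choose a lock backend once, or fail fast instead of running unlocked."""
    if fcntl is not None:
        return "fcntl"
    if msvcrt is not None:
        return "msvcrt"
    raise RuntimeError(
        "No file-locking backend available (neither fcntl nor msvcrt); "
        "refusing to run without locks."
    )


# Evaluated at import time so a missing backend is reported at startup.
LOCK_BACKEND = pick_lock_backend()


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive lock on a dedicated lock file (hypothetical helper)."""
    with open(lock_path, "a+b") as handle:
        if LOCK_BACKEND == "fcntl":
            fcntl.flock(handle.fileno(), fcntl.LOCK_EX)
        else:
            # msvcrt locks a byte range starting at the current position.
            # LK_LOCK retries for a limited time and then raises OSError.
            handle.seek(0)
            msvcrt.locking(handle.fileno(), msvcrt.LK_LOCK, 1)
        try:
            yield handle
        finally:
            if LOCK_BACKEND == "fcntl":
                fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
            else:
                handle.seek(0)
                msvcrt.locking(handle.fileno(), msvcrt.LK_UNLCK, 1)
```

A caller would wrap the critical section in `with exclusive_lock(path + ".lock"):`. Locking a separate file rather than the data file is a deliberate choice in this sketch, because `msvcrt.locking` byte-range locks can interfere with other readers of the locked file.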
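The upward marker search from item 3 (FW-P6) is sketched below in Python for brevity, although the real fix targets the shell scripts. The function name `find_workspace_root` and the `ROOT_MARKERS` constant are hypothetical; the marker names themselves come from the roadmap table.

```python
from pathlib import Path

# Marker names taken from the FW-P6 table entry.
ROOT_MARKERS = (".git", ".mam", ".env")


def find_workspace_root(start, markers=ROOT_MARKERS):
    """Walk upward from `start` and return the nearest directory holding a marker."""
    current = Path(start).resolve()
    if current.is_file():
        current = current.parent
    # Check the starting directory first, then each ancestor up to the root.
    for candidate in (current, *current.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    raise FileNotFoundError(
        f"No workspace marker {markers} found above {current}"
    )
```

Because the nearest marker wins, a stray `.env` in a subdirectory would anchor the root too low. A real implementation may want to rank markers or require `.mam` specifically.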
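For item 4 (FW-P7), the authorization gate in front of `tmux kill-session` might be shaped like the following sketch. The message envelope (`payload` plus hex `sig`), the helper `sign_payload`, the function `should_kill_session`, and the signature given to `verify_hmac` here are all assumptions; the project's actual `verify_hmac` may differ.

```python
import hashlib
import hmac
import json


def sign_payload(secret, payload):
    """Return a hex HMAC-SHA256 over a canonical JSON encoding of `payload`."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_hmac(secret, message):
    """Check an assumed envelope of the form {"payload": {...}, "sig": "<hex>"}."""
    sig = message.get("sig")
    payload = message.get("payload")
    if not isinstance(sig, str) or not isinstance(payload, dict):
        return False
    expected = sign_payload(secret, payload)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), sig.encode("utf-8"))


def should_kill_session(secret, raw_message, artifact_exists):
    """Allow termination only for authenticated events with expected artifacts."""
    try:
        message = json.loads(raw_message)
    except ValueError:
        return False  # malformed input is never trusted
    if not isinstance(message, dict) or not verify_hmac(secret, message):
        return False
    # `artifact_exists` is a caller-supplied check (hypothetical callback).
    return bool(artifact_exists(message["payload"]))
```

The caller would run `tmux kill-session` only when `should_kill_session` returns `True`, so a spoofed or unsigned terminal message is dropped before any destructive action.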
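Item 5 describes in-memory monotonic sequence tracking, and FW-W4 proposes persisting it so a subscriber restart does not reopen the replay window. A minimal sketch combining the two follows. The class name `SequenceTracker`, the JSON state file, and the `accept` method are illustrative assumptions, not the subscriber's actual interface.

```python
import json
import os
import tempfile


class SequenceTracker:
    """Track the highest sequence number seen per job and persist it to disk."""

    def __init__(self, state_path):
        self.state_path = state_path
        self.last_seq = self._load()

    def _load(self):
        # Only a missing file means "fresh start". A corrupt file raises,
        # because silently resetting counters would re-admit old messages.
        try:
            with open(self.state_path, "r", encoding="utf-8") as fh:
                data = json.load(fh)
        except FileNotFoundError:
            return {}
        return {str(job): int(seq) for job, seq in data.items()}

    def accept(self, job_id, seq):
        """Return True for a strictly newer sequence number, False for a replay."""
        if seq <= self.last_seq.get(job_id, -1):
            return False
        self.last_seq[job_id] = seq
        self._save()
        return True

    def _save(self):
        # Write to a temp file, then atomically replace the state file.
        directory = os.path.dirname(os.path.abspath(self.state_path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(self.last_seq, fh)
        os.replace(tmp_path, self.state_path)
```

Writing on every accepted message is the simplest option but may be too slow for chatty jobs; batching writes or using SQLite, as FW-W4 suggests, are alternatives.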