FUTURE_WORKS.md

Purpose: Track future work candidates for the multi-agent-mux project. For completed items, see DONE.md.
Last Updated: 2026-06-24


Future Improvements Roadmap

Below is the list of pending future work items. They were proposed from a security, concurrency, portability, workflow, and deployment analysis of the system.

| ID | Task | Priority | Effort | Domain / Description | Dependencies |
|---|---|---|---|---|---|
| FW-L4 | Migrate Job Registry to SQLite to overcome NFS flock limitations | P3 (Low) | Large | Concurrency / Infrastructure Scalability: Similar to the Session Registry, migrate the per-file JSON registry guarded by fcntl.flock into a single SQLite database with transactional updates, to improve reliability on distributed/network file systems such as NFS. Caveat: SQLite itself relies on file locking that its documentation warns can be unreliable on NFS, so the storage choice must be validated on the target file system. | Conditional (start only when a multi-host/NFS deployment is required) |
| FW-P1 | Eliminate GNU/Linux userland assumptions in lib.sh | P2 (Medium) | Small | Portability: Replace GNU coreutils-specific commands (such as df --output=target) and Linux-specific mount output parsing in lib.sh with portable equivalents, fixing NFS detection that currently fails silently on macOS/BSD. | None |
| FW-P2 | Add explicit Windows concurrency strategy in mqtt_common.py | P1 (High) | Medium | Portability / Concurrency: Detect non-POSIX systems at module initialization and either fail fast with a descriptive error or substitute an alternative lock strategy (e.g. msvcrt.locking), while keeping the _file_lock log appender best-effort. | None |
| FW-P3 | Align virtualenv loading and dependency verification | P2 (Medium) | Medium | Portability: Prevent interpreter mismatches in Poetry/uv environments, and make the launch scripts fail early with a clear diagnostic when required Python dependencies are missing at startup. | None |
| FW-P4 | Secure default MQTT broker and namespaces | P1 (High) | Medium | Portability / Security: Prevent remote session hijacking and eavesdropping by shipping a private, TLS-enabled broker template instead of defaulting to broker.hivemq.com with public topic namespaces. | None |
| FW-P5 | Resolve BASH_SOURCE path resolution under zsh | P2 (Medium) | Small | Portability: Fix interactive sourcing of lib.sh under zsh, where ${BASH_SOURCE[0]} expands to an empty string. | None |
| FW-P6 | Anchor project root dynamically via marker-file lookup | P1 (High) | Medium | Portability: Remove the fragility caused by hardcoded ../.. relative traversal in lib.sh, status.sh, and reconcile.sh. Search upward for root markers (.git, .mam, .env) and export the result as WORKSPACE_ROOT, the single source of truth for the project root. | None |
| FW-P7 | Enforce HMAC verification and liveness checks on monitor termination | P1 (High) | Medium | Portability / Security: Prevent unauthorized or spoofed events from killing sessions remotely. Call verify_hmac inside the monitor (reconcile.sh's on_message handler) and confirm that the expected artifacts exist before running tmux kill-session. | None |
| FW-P8 | Unify .env loading in lib.sh to prevent split-brain path resolution | P1 (High) | Small | Portability / Consistency: If lib.sh does not source .env, shell scripts fall back to the default session database path while Python scripts use the custom path defined in .env (split-brain path resolution). Sourcing .env at the top of lib.sh lets every shell utility inherit user overrides for TMUX_SERVER_NAME, AGENT_SESSIONS_YAML, etc. | None |
| FW-W1 | Replace global registry lock with fine-grained locks | P2 (Medium) | Medium | Concurrency / Scaling: Remove the throughput bottleneck in which all progress/sequence updates serialize on a single fcntl lock on .mam/jobs/. Implement per-job lock files. | None |
| FW-W2 | Implement readiness probes for blind TUI key inputs | P2 (Medium) | Large | Workflow: Replace fixed sleeps in the create, resume, and stop scripts with dynamic terminal readiness probes (e.g. screen scrapers or CLI check hooks) so trust dialogs are dismissed reliably. | None |
| FW-W4 | Persist subscriber sequence numbers alongside job records | P1 (High) | Medium | Workflow / Security: Persist subscriber.last_seq to disk or SQLite so the sequence counter does not reset when the subscriber restarts, keeping the replay defense in force for the full job lifetime. | None |
| FW-W5 | Define structured message schema for reviewer verdicts | P2 (Medium) | Medium | Workflow: Create a dedicated reviewer topic (e.g. reviews/<job_id>/verdicts) carrying structured JSON verdicts (PASS / NOT_PASS plus details), so the PM no longer greps raw text. | None |
| FW-W6 | Expand monitor reconciliation support to Hermes agent | P2 (Medium) | Medium | Workflow / Consistency: Fully integrate Hermes sessions into auto-registration (drift-B) and ID materialization (drift-C) in reconcile.sh, matching the Claude/Agy monitoring coverage. | None |
| FW-W7 | Resolve path slug collisions in derive_session_name | P2 (Medium) | Small | Workflow / Collision Avoidance: Make derive_session_name distinguish same-name directories at different paths (e.g. /projectA/src and /projectB/src currently slugify to identical session names) by incorporating workspace-scoped identifiers or hash digests. | None |
| FW-D1 | RESOLVED (2026-06-24): installer no longer extracts in-place | | | Deploy / Safety: deploy/install.sh now stages the download into a mktemp -d directory, verifies that .agents/skills/lib.sh is present, then copies only the runtime assets (.agents/, .env.example) into the target with per-file no-clobber guards ([ ! -e ]). Existing target files always win, and repository dev docs never land in the workspace. The post-fetch sanity check now tests for a file, not just the directory. | Done |
| FW-D2 | Pin and verify the source the installer downloads before sourcing it | P2 (Medium) | Small | Deploy / Supply-chain: The installer clones/extracts the moving main branch over the network, and the workspace later sources those shell scripts (lib.sh et al.). Partially addressed (2026-06-24): the staged tree is now verified to contain .agents/skills/lib.sh before any file is copied. Remaining: pin the fetch to a release tag or commit SHA and/or verify a published checksum, so the fetched content is integrity-checked rather than merely structurally present. | None |
| FW-D3 | De-duplicate NFS detection between install.sh and lib.sh | P2 (Medium) | Small | Deploy / Portability: deploy/install.sh re-implements the GNU-specific df --output=target + mount NFS check already present in lib.sh::_check_is_nfs. The FW-P1 portability fix must cover this second copy; extract a single shared helper so both call sites stay correct on macOS/BSD. | FW-P1 |
| FW-D4 | Close CI shellcheck coverage gaps | P3 (Low) | Small | Deploy / Quality: deploy/gitea-ci.yml runs shellcheck on only 5 scripts; status.sh, resolve_session_id.sh, update_yaml_resumed.sh, and scripts/generate-env.sh are never linted. Glob all tracked *.sh files so new scripts are covered automatically. | None |
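For FW-P3, a startup preflight could look roughly like the Python sketch below. It assumes the launch scripts run a small Python check before starting any agent; the helper names (in_virtualenv, missing_modules, preflight) are hypothetical, and the module list passed in must come from the project's real dependencies. It relies only on importlib.util.find_spec, which locates a module without importing it (a dotted name does import its parent package, hence the ModuleNotFoundError guard), and on the standard sys.prefix versus sys.base_prefix venv check.

```python
import importlib.util
import sys
from typing import List


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (Poetry and uv both create standard venvs)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def _importable(name: str) -> bool:
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:  # a parent package of a dotted name is missing
        return False


def missing_modules(modules: List[str]) -> List[str]:
    """Return the module names this interpreter cannot import."""
    return [m for m in modules if not _importable(m)]


def preflight(modules: List[str]) -> None:
    """Exit with a diagnostic before any real work starts."""
    missing = missing_modules(modules)
    if missing:
        sys.exit(
            f"missing Python modules: {', '.join(missing)}\n"
            f"interpreter: {sys.executable} (virtualenv: {in_virtualenv()})\n"
            "activate the project environment or install its dependencies first"
        )
```

Calling preflight(...) as the first step of each entry point turns a late ImportError deep inside the MQTT code into one clear message that also names the interpreter in use, which is usually the quickest way to spot a Poetry/uv environment mismatch.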
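The per-job locking proposed in FW-W1 can be sketched in Python as follows, assuming a hypothetical layout of one lock file per job under .mam/jobs/locks/. The names job_lock_path and job_lock are illustrative, not existing project functions. The ID check is there because the job ID becomes part of a file path.

```python
import fcntl
import os
import re
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator

# Conservative job-ID alphabet so an ID can never escape the locks directory.
_SAFE_JOB_ID = re.compile(r"[A-Za-z0-9_.-]+")


def job_lock_path(jobs_dir: Path, job_id: str) -> Path:
    """Map a job ID to its own lock file (hypothetical layout: <jobs_dir>/locks/<id>.lock)."""
    if not _SAFE_JOB_ID.fullmatch(job_id) or job_id in {".", ".."}:
        raise ValueError(f"unsafe job id: {job_id!r}")
    return jobs_dir / "locks" / f"{job_id}.lock"


@contextmanager
def job_lock(jobs_dir: Path, job_id: str) -> Iterator[Path]:
    """Hold an exclusive flock on one job's lock file only.

    Writers for different jobs no longer queue behind a single registry-wide
    lock; writers for the same job still serialize.
    """
    path = job_lock_path(jobs_dir, job_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield path
    finally:
        os.close(fd)  # closing the only descriptor also releases the flock
```

Usage would be with job_lock(Path(".mam/jobs"), job_id): ... around each read-modify-write of that job's JSON record. Any operation that spans all jobs (for example a full registry scan that must be consistent) would still need its own coordination.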
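One possible shape for the FW-W2 readiness probes is a polling loop over tmux capture-pane output, sketched below in Python. This is a sketch under assumptions: wait_until and pane_shows are hypothetical helpers, the text to look for (a trust-dialog prompt, an input prompt) differs per agent CLI and is not specified here, and the socket_name parameter assumes the isolated server from TMUX_SERVER_NAME is addressed with tmux -L.

```python
import subprocess
import time
from typing import Callable, Optional


def wait_until(predicate: Callable[[], bool], timeout: float = 30.0,
               interval: float = 0.25) -> bool:
    """Poll predicate until it returns True or timeout seconds pass.

    Unlike a fixed sleep, this returns as soon as the condition holds and
    reports failure explicitly instead of sending keys blindly.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def pane_shows(target: str, needle: str, socket_name: Optional[str] = None) -> bool:
    """True if the visible text of a tmux pane contains needle.

    socket_name maps to `tmux -L` (assumed to correspond to TMUX_SERVER_NAME).
    """
    cmd = ["tmux"]
    if socket_name:
        cmd += ["-L", socket_name]
    cmd += ["capture-pane", "-p", "-t", target]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0 and needle in result.stdout
```

A create script could then call wait_until(lambda: pane_shows(session, prompt_text), timeout=20) and only send the dismissal keys when it returns True, logging a clear error otherwise. Screen scraping is still brittle against UI text changes, which is why the table also mentions CLI check hooks as an alternative probe.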
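For FW-W4, a minimal sketch of persisting the per-job sequence counter with Python's built-in sqlite3 module follows. The SeqStore class, its table name, and its accept method are hypothetical; the point is the rule the replay defense needs, namely that an event is accepted only if its sequence number is strictly greater than the stored one, and that this state survives a subscriber restart.

```python
import sqlite3


class SeqStore:
    """Hypothetical on-disk store for the subscriber's per-job last_seq."""

    def __init__(self, path: str) -> None:
        # isolation_level=None: transactions are managed explicitly below.
        self.conn = sqlite3.connect(path, isolation_level=None)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS last_seq ("
            " job_id TEXT PRIMARY KEY, seq INTEGER NOT NULL)"
        )

    def accept(self, job_id: str, seq: int) -> bool:
        """Record seq if it is strictly greater than the stored value.

        Returns False for replays or stale events. BEGIN IMMEDIATE takes the
        write lock before reading, so the check-then-update is atomic.
        """
        self.conn.execute("BEGIN IMMEDIATE")
        try:
            row = self.conn.execute(
                "SELECT seq FROM last_seq WHERE job_id = ?", (job_id,)
            ).fetchone()
            if row is not None and seq <= row[0]:
                self.conn.execute("ROLLBACK")
                return False
            self.conn.execute(
                "INSERT OR REPLACE INTO last_seq (job_id, seq) VALUES (?, ?)",
                (job_id, seq),
            )
            self.conn.execute("COMMIT")
            return True
        except BaseException:
            if self.conn.in_transaction:
                self.conn.execute("ROLLBACK")
            raise

    def close(self) -> None:
        self.conn.close()
```

Storing the counter next to the job records (as the task title suggests) rather than in a separate database would work the same way as long as the compare-and-update happens under one lock.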
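A structured verdict message for FW-W5 might be modeled as below. This Python sketch assumes a JSON object with job_id, verdict, and details fields; those field names and the ReviewVerdict and verdict_topic helpers are hypothetical. The topic helper rejects IDs containing "/", "+", or "#", since "/" would add topic levels and "+"/"#" are MQTT wildcards.

```python
import json
from dataclasses import dataclass

VALID_VERDICTS = {"PASS", "NOT_PASS"}


def verdict_topic(job_id: str) -> str:
    """Topic layout from FW-W5: reviews/<job_id>/verdicts."""
    if not job_id or any(c in job_id for c in "/+#"):
        raise ValueError(f"job id not usable as a topic level: {job_id!r}")
    return f"reviews/{job_id}/verdicts"


@dataclass(frozen=True)
class ReviewVerdict:
    job_id: str
    verdict: str  # "PASS" or "NOT_PASS"
    details: str = ""

    def to_json(self) -> str:
        return json.dumps(
            {"job_id": self.job_id, "verdict": self.verdict, "details": self.details}
        )

    @classmethod
    def from_json(cls, payload: str) -> "ReviewVerdict":
        """Parse and validate a payload; raise ValueError instead of guessing."""
        data = json.loads(payload)  # JSONDecodeError is a ValueError subclass
        if not isinstance(data, dict):
            raise ValueError("verdict payload must be a JSON object")
        verdict = data.get("verdict")
        if not isinstance(verdict, str) or verdict not in VALID_VERDICTS:
            raise ValueError(f"unknown verdict: {verdict!r}")
        job_id = data.get("job_id")
        if not isinstance(job_id, str) or not job_id:
            raise ValueError("missing job_id")
        details = data.get("details", "")
        if not isinstance(details, str):
            raise ValueError("details must be a string")
        return cls(job_id=job_id, verdict=verdict, details=details)
```

The PM side then branches on verdict == "PASS" instead of grepping free text, and a malformed message fails loudly rather than being misread as a pass.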
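The hash-digest option mentioned in FW-W7 could work as in this Python sketch. derive_session_name_hashed and the "mam" prefix are hypothetical stand-ins; the existing derive_session_name is not shown in this document. The idea is to keep the readable basename slug and append a short digest of the fully resolved path, so two directories that share a basename still get different names.

```python
import hashlib
import re
from pathlib import Path


def derive_session_name_hashed(workspace: str, prefix: str = "mam") -> str:
    """Hypothetical collision-resistant variant of derive_session_name.

    Readable basename slug plus the first 8 hex chars of a SHA-256 digest
    of the resolved absolute path.
    """
    resolved = Path(workspace).expanduser().resolve()
    # Slug is limited to [a-z0-9-], avoiding '.' and ':' which tmux target syntax uses.
    slug = re.sub(r"[^a-z0-9]+", "-", resolved.name.lower()).strip("-") or "root"
    digest = hashlib.sha256(str(resolved).encode("utf-8")).hexdigest()[:8]
    return f"{prefix}-{slug}-{digest}"
```

With this scheme /projectA/src and /projectB/src both start with mam-src- but end in different digests. Eight hex characters make accidental collisions very unlikely for a single host's workspaces; the trade-off is that moving a workspace changes its session name, so any registry keyed on the name would need to tolerate that.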
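The checksum half of FW-D2 amounts to hashing the fetched archive and comparing it with a value published for the pinned release. The Python sketch below shows only that comparison; where the expected hash is published and how the installer obtains it are assumptions left open, except that it must not come from the same unauthenticated download as the archive itself. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large archives are not read into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_archive(path: Path, expected_sha256: str) -> None:
    """Refuse to continue unless the archive matches the pinned checksum."""
    actual = sha256_file(path)
    if actual != expected_sha256.strip().lower():
        raise RuntimeError(f"checksum mismatch for {path}: got {actual}")
```

In the installer this check would run on the staged download before the existing lib.sh presence check, so a tampered or truncated archive is rejected before any file reaches the workspace.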

Detailed Discussion Results & Directions (Reviewer Consensus)

  1. Conditional Deferral of SQLite Integration (FW-L4):

    • Unlike session data, per-job data kept in individual JSON files is easy to inspect and debug by hand. Since the current deployment is limited to a single host with a local file system, fcntl.flock locks are sufficient. This item is therefore assigned low priority (P3) and will be tackled only if a multi-host/NFS deployment becomes necessary.
  2. Explicit Concurrency Strategy on Windows (FW-P1, FW-P2):

    • Silent fallbacks are the worst design pattern for concurrency. Instead of letting Windows environments run without a lock (which is what happens today when the fcntl failure is silently ignored), we detect POSIX availability at startup. We then either fail fast and direct the user to a POSIX-compliant shell/wrapper, or dynamically load msvcrt.locking to provide an equivalent exclusive file lock. This keeps synchronization behavior consistent across Windows and Unix platforms, with one caveat: msvcrt.locking locks byte ranges, and its blocking mode (LK_LOCK) gives up with OSError after 10 one-second retries, whereas fcntl.flock with LOCK_EX waits indefinitely.
  3. Dynamic Root Anchor (FW-P6):

    • Hardcoding a relative traversal depth (like ../.. relative to a skill's location) makes scripts fragile whenever directories are moved or refactored. Walking up the directory tree to find known anchors (such as .git or .mam) establishes a single canonical root path, so scripts keep working when their execution wrappers are relocated.
  4. Monitor Termination Authorization (FW-P7):

    • Auto-termination must not trust unauthenticated events. Since reconcile.sh subscribes to a wildcard topic, any client on a public broker could spoof a terminal message and trigger tmux kill-session. Requiring HMAC signature verification on the terminal-event path, combined with artifact validation, guards against both spoofed events and accidental session cleanup.
  5. Consolidation of per-job watchdogs (FW-W3):

    • Instead of spawning an independent watchdog.sh process per job that reconnects every 2 minutes, we consolidated event handling, HMAC verification, and sequence tracking into a single persistent wildcard subscriber running under reconcile.sh --subscribe. This sharply reduces the number of MQTT broker connections, simplifies cleanup, and keeps the per-job monotonic sequence numbers used for replay-attack prevention in the subscriber's Python process memory, shared across concurrent jobs. Because that state lives only in memory, it resets when the subscriber restarts; FW-W4 tracks persisting it.
  6. Consistent .env Sourcing across Shell and Python (FW-P8):

    • Sourcing the .env configuration file inside lib.sh keeps shell utilities and Python scripts aligned. Without it, custom database locations or isolated tmux server names declared in .env are honored only by the Python-based MQTT subsystems, while the shell orchestrators silently fall back to default socket files and paths.
  7. Deployment Installer Hardening (FW-D1 to FW-D4):

    • deploy/install.sh and the Gitea templates are the newest and least-reviewed surface (added after the DONE.md verification round), and the installer is the only path that runs before any of the reviewed orchestration code. FW-D1, the release blocker, is now resolved (2026-06-24). The originally proposed tar --exclude denylist was dropped: review showed it was non-portable and, worse, that the unanchored --exclude="scripts" pattern also stripped the skills' own nested scripts/ directories, yielding a silently broken install. The installer was instead rebuilt around temp-dir staging plus an allowlist copy of runtime assets with per-file no-clobber guards, which closes both the destructive-overwrite hole and the dev-doc clutter in one move. FW-D2 is partially addressed (the staged tree is structurally verified before copy); the remaining supply-chain hardening is pinning the fetch to a tag/SHA and verifying a checksum. FW-D3 (NFS detection drift, folded into FW-P1) and FW-D4 (CI lint coverage) remain open consistency/quality debt.
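The Windows strategy in item 2 (FW-P2) can be sketched in Python as a lock helper that picks its backend at import time and fails fast when neither is available. This is an illustration under assumptions, not the current mqtt_common.py code: exclusive_lock is a hypothetical name, the Windows branch locks a single byte at offset 0 as a whole-file stand-in, and that branch has not been exercised here. Per the table, the best-effort _file_lock log appender would stay as it is; this kind of helper is meant for locks that must not be skipped.

```python
import os
from contextlib import contextmanager
from typing import Iterator

if os.name == "posix":
    import fcntl
    _BACKEND = "fcntl"
elif os.name == "nt":
    import msvcrt
    _BACKEND = "msvcrt"
else:
    # Fail fast at import time instead of silently running without a lock.
    raise RuntimeError(f"no supported file-locking backend for os.name={os.name!r}")


@contextmanager
def exclusive_lock(path: str) -> Iterator[None]:
    """Hold an exclusive lock on path for the duration of the with-block."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if _BACKEND == "fcntl":
            fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is free
        else:
            # msvcrt locks nbytes from the current position; lock byte 0.
            # LK_LOCK retries 10 times, 1 s apart, then raises OSError.
            os.lseek(fd, 0, os.SEEK_SET)
            msvcrt.locking(fd, msvcrt.LK_LOCK, 1)
        try:
            yield
        finally:
            if _BACKEND == "fcntl":
                fcntl.flock(fd, fcntl.LOCK_UN)
            else:
                os.lseek(fd, 0, os.SEEK_SET)
                msvcrt.locking(fd, msvcrt.LK_UNLCK, 1)
    finally:
        os.close(fd)
```

Because msvcrt's blocking mode gives up after roughly 10 seconds while flock waits indefinitely, callers on Windows should be prepared for an OSError under heavy contention rather than assuming identical timing.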
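The upward marker search from item 3 (FW-P6) is small enough to sketch directly. The real fix would live in lib.sh; this Python version only illustrates the algorithm, and two choices in it are assumptions: an already-exported WORKSPACE_ROOT wins, and markers are tried in priority order (.mam, then .git, then .env) rather than taking whichever marker is nearest, so that a nested .git checkout cannot shadow the project's own .mam directory.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence

# Priority order is an assumption: .mam beats a nearer .git, which beats .env.
DEFAULT_MARKERS = (".mam", ".git", ".env")


def find_workspace_root(start: Optional[str] = None,
                        markers: Sequence[str] = DEFAULT_MARKERS,
                        environ: Mapping[str, str] = os.environ) -> Path:
    """Return WORKSPACE_ROOT, honoring an already-exported value first."""
    exported = environ.get("WORKSPACE_ROOT")
    if exported:
        return Path(exported).resolve()
    here = Path(start or os.getcwd()).resolve()
    if here.is_file():
        here = here.parent
    chain = (here, *here.parents)
    for marker in markers:       # try markers in priority order...
        for candidate in chain:  # ...each from the start directory upward
            if (candidate / marker).exists():
                return candidate
    raise FileNotFoundError(f"no marker {list(markers)} found above {here}")
```

Whatever the lookup returns is then exported once, so every later script reads the same value instead of recomputing a depth-dependent path.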
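The two gates in item 4 (FW-P7) can be illustrated with the Python sketch below. The project's own verify_hmac is not shown in this document, so the envelope format ({"payload": {...}, "sig": "<hex>"}), HMAC-SHA256 over canonical JSON, and the artifact path <jobs_dir>/<job_id>.json are all hypothetical. Replay protection through sequence numbers is a separate check and is not repeated here.

```python
import hashlib
import hmac
import json
from pathlib import Path
from typing import Optional


def canonical(payload: dict) -> bytes:
    """Stable byte encoding so sender and receiver sign identical bytes."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(payload: dict, key: bytes) -> str:
    return hmac.new(key, canonical(payload), hashlib.sha256).hexdigest()


def verified_terminal_event(raw: bytes, key: bytes) -> Optional[dict]:
    """Gate 1: return the payload only if its signature checks out, else None."""
    try:
        msg = json.loads(raw)
        payload, sig = msg["payload"], msg["sig"]
    except (ValueError, KeyError, TypeError):
        return None
    if not isinstance(payload, dict) or not isinstance(sig, str):
        return None
    # Compare as bytes: compare_digest rejects non-ASCII str arguments.
    if not hmac.compare_digest(sign(payload, key).encode(), sig.encode("utf-8")):
        return None
    return payload


def should_kill(payload: dict, jobs_dir: Path) -> bool:
    """Gate 2: the job's artifact must exist before any tmux kill-session."""
    job_id = payload.get("job_id")
    if not isinstance(job_id, str) or not job_id or "/" in job_id or job_id in {".", ".."}:
        return False
    return (jobs_dir / f"{job_id}.json").is_file()
```

The on_message handler would run tmux kill-session only when verified_terminal_event returns a payload and should_kill agrees; hmac.compare_digest is used so the signature check does not leak timing information.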
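The staging-plus-allowlist approach described in item 7 (FW-D1) is implemented in deploy/install.sh as shell; the Python sketch below only mirrors its logic for illustration. install_from_staging is a hypothetical name, and the allowlist and required file are taken from the table entry. The nested scripts/ directory in the test shows why an allowlist of top-level assets avoids the over-broad exclusion problem.

```python
import shutil
from pathlib import Path
from typing import List

RUNTIME_ASSETS = (".agents", ".env.example")  # allowlist from the FW-D1 description
REQUIRED_FILE = Path(".agents/skills/lib.sh")  # post-fetch sanity check target


def install_from_staging(staging: Path, target: Path) -> List[Path]:
    """Copy only allowlisted assets; never overwrite anything already in target."""
    if not (staging / REQUIRED_FILE).is_file():
        raise RuntimeError(f"staged tree is missing {REQUIRED_FILE}")
    copied = []
    for name in RUNTIME_ASSETS:
        src_root = staging / name
        if not src_root.exists():
            continue
        if src_root.is_file():
            sources = [src_root]
        else:
            sources = sorted(p for p in src_root.rglob("*") if p.is_file())
        for src in sources:
            dest = target / src.relative_to(staging)
            # Per-file no-clobber, like [ ! -e ]; also skip dangling symlinks.
            if dest.exists() or dest.is_symlink():
                continue
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            copied.append(dest)
    return copied
```

Files outside the allowlist (such as repository dev docs) are never considered, and an existing target file always wins over the staged copy.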