# tmux-agent-orchestration

An advanced, high-reliability **Multi-Agent Orchestration & Messaging Backplane** framework built on Tmux and MQTT. It is designed to coordinate, isolate, and audit long-running agent tasks (such as code generation, refactoring, and security reviews) across multiple LLM backend clients (e.g., Claude, Hermes).

---

## 🚀 Overview

Modern agentic workflows often suffer from session timeout, lack of process isolation, terminal viewport truncation (scrollback limits), and complex concurrency issues. **tmux-agent-orchestration** addresses these problems by providing:

1. **Tmux-based Process Isolation:** Spawning LLM client sessions inside dedicated, isolated tmux environments to support persistent background runs.
2. **Asynchronous Event-Driven Architecture:** Leveraging an MQTT broker as a message backplane to coordinate state transitions (`started`, `progress`, `completed`, `error`) between collaborating agents.
3. **Multi-Agent Mux (MAM):** Combining local file-based locks (fcntl) and an ACID-compliant SQLite WAL database (`.mam/agent-sessions.db`) to manage concurrent job claims and track running agent sessions without drift.
4. **Automated Review & Quality Loop:** Implementing parallel reviewer loops where worker agents must receive a `PASS` rating from various specialized verification agents (e.g., Claude for high-level logic, Hermes for shell syntax/safety) before merging code.
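The lock-guarded job claim mentioned in item 3 can be illustrated with a minimal sketch. It assumes one JSON record and one lock file per job; the function name `claim_job`, the `claimed_by` field, and the `.lock` file naming are hypothetical and are not taken from `registry.py`.

```python
import fcntl
import json
import os
import tempfile
from pathlib import Path


def claim_job(jobs_dir: Path, job_id: str, owner: str) -> bool:
    """Try to claim a job; return True only if this caller won the claim.

    Hypothetical layout: <jobs_dir>/<job_id>.json holds the record and
    <jobs_dir>/<job_id>.lock is used purely as an advisory lock target.
    """
    jobs_dir.mkdir(parents=True, exist_ok=True)
    lock_path = jobs_dir / f"{job_id}.lock"
    record_path = jobs_dir / f"{job_id}.json"

    with open(lock_path, "w") as lock_file:
        # Exclusive advisory lock: blocks until no other process holds it.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            record = {}
            if record_path.exists():
                record = json.loads(record_path.read_text())
            if record.get("claimed_by"):
                return False  # somebody else already owns this job

            record["claimed_by"] = owner
            # Write to a temp file, fsync, then rename so readers never
            # observe a half-written record.
            fd, tmp_name = tempfile.mkstemp(dir=jobs_dir, suffix=".tmp")
            with os.fdopen(fd, "w") as tmp_file:
                json.dump(record, tmp_file)
                tmp_file.flush()
                os.fsync(tmp_file.fileno())
            os.replace(tmp_name, record_path)
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Holding the lock across the read, the ownership check, and the rename is what makes the claim atomic with respect to other local processes; advisory locks only protect callers that also take the lock.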
---

## 📦 Installation & Setup

You can bootstrap the Multi-Agent Mux (MAM) framework in any workspace directory with a single command:

```bash
curl -fsSL https://git.godopu.com/tmpl/multi-agent-mux/raw/branch/main/deploy/install.sh | bash
```

Alternatively, if you have already cloned the repository locally, run the installer directly:

```bash
bash deploy/install.sh
```

The idempotent installer automatically validates system dependencies (tmux, python3, and PyYAML), creates the python virtual environment (`.venv`), installs dependencies, copies `.env.example` as `.env`, and initializes the `.agents/` scaffolding.

---

## 🛠️ Core Skills & Scaffolding

All orchestration functionalities are structured under the `.agents/skills/` directory:

* **`multi-agent-mux-create`**: Spawns isolated tmux sessions running specified agent CLI wrappers. It captures system processes, updates metadata registries, and enforces authentication checks.
* **`multi-agent-mux-stop`**: Gracefully terminates agent CLI sessions (using key macros like `/exit` or `Exit`) and handles disk purge operations (removing conversation JSON files and SQLite logs for deleted workspaces).
* **`multi-agent-mux-resume`**: Restores stopped sessions by resolving workspace UUIDs from disk or cache, and invokes the underlying agent using session-resume parameters (e.g., `claude -r ` or `hermes --resume `).
* **`multi-agent-mux-status`**: Queries the running states of all active sessions, detecting PID mismatches, command signatures, and drifts between actual tmux instances and the registry database.
* **`multi-agent-mux-monitor`**: A long-running Kanban reconcile worker that dynamically monitors tmux sessions and synchronizes states to `.mam/agent-sessions.yaml`.
* **`multi-agent-mux-delegate-job`**: The core asynchronous task distribution module containing:
  * `registry.py`: Atomically registers and claims jobs using file advisory locks (`fcntl`).
  * `job_subscriber.py`: Connects to the MQTT backplane, captures live events, and appends them to audit trails.
  * `publish_event.py`: Emits execution status transitions and error details from workspace scripts.
  * `mqtt_common.py`: Manages connection policies, authentication, and HMAC signing.

---

## 📐 Big-Picture Architecture

The system coordinates LLM agents across multiple workspaces through two core layers:

1. **Layer A: Tmux Orchestration (lib.sh + status/resume/stop/create)**: Runs the agents (one tmux session per agent-workspace combination) and maintains an authoritative registry in `.mam/agent-sessions.yaml` (+ `.mam/agent-sessions.db`).
2. **Layer B: Async Job Delegation (delegate-job)**: Dispatches a task to an agent and observes progress and completion via an event channel.

These two layers share one lock-guarded chokepoint for file I/O: `lib.sh::atomic_dump_yaml`. Every write is protected by an exclusive SQLite database transaction lock and schema validation.

### Data Flow Overview

```text
+-----------+   register_job   +-------------------+
| delegator | ---------------> | .mam/jobs/.json    |  <-- live record
+-----------+                  +---------+---------+
                                         |
                                         | atomic rename + fsync
                                         v
                                +-----------------+
                                |    audit log    |  <-- append-only
                                | .mam/delegate_  |      events.ndjson
                                | job_logs//      |
                                +--------+--------+
                                         ^
                                         | (best-effort mirrors)
                                         |
+-----------+     publish_event   +------+------+                      +---------+
|   agent   |    ---------------> | MQTT broker | <------------------- | monitor |
| (claude)  |                     +-------------+                      +----+----+
+-----------+                            |    ^                             v
                                         | subscriber               atomic_dump_yaml
                                         | (job_subscriber.py)  (.mam/agent-sessions.yaml)
                                         |                                  ^
+-------- delegator waits here ----------+                                  |
                                                                     +------+-------+
                                                                     | reconcile.sh |
                                                                     +--------------+
```

### 🔒 Tmux Server Isolation

To prevent workspace tmux processes from interfering with each other or with system tmux servers, the framework enforces isolated tmux environments:

* **Per-Workspace Shim:** `_init_tmux_isolation` and `_resolve_real_tmux_path` instantiate a per-workspace shim directory under
  `/tmp/multi-agent-tmux-shim//tmux` that intercepts tmux commands and wraps them in `tmux -L `.
* **PATH Rewriting:** The `PATH` environment variable is dynamically prepended with the shim path in all child processes. This ensures any `tmux` invocation within the agent's process tree is restricted to its isolated socket server.
* **Environment Restoration:** If `TMUX_SERVER_NAME` is set to `default`, the PATH override is removed, reverting to the default global tmux server.

### 🛡️ Concurrency Design & Write Serialization

The framework implements lock-guarded execution pathways to prevent race conditions during parallel agent operations:

* **SQLite Database Locks (`BEGIN IMMEDIATE`):** Every mutation of `agent-sessions.yaml` and the SQLite registry runs through `atomic_dump_yaml` inside `lib.sh`, which serializes writes via an exclusive `BEGIN IMMEDIATE` transaction lock on the SQLite database `.mam/agent-sessions.db`.
* **Dual-Interpreter Strategy:** To minimize dependency bloat and guarantee stability, the backplane splits execution environments: the virtual environment `.venv` handles MQTT communication and async jobs (requiring `paho-mqtt`), while the system `python3` executes `atomic_dump_yaml` (relying on system-wide `PyYAML`).
* **NFS and Network FS Safeguards:** Since file locking (`flock`) and SQLite WAL behave unreliably over network protocols (NFS, CIFS, SSHFS), `lib.sh` performs filesystem detection. If a network mount is identified, it outputs a safety warning and SQLite automatically switches its journaling mode from `WAL` to `DELETE`.
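The write serialization described above can be sketched with Python's standard `sqlite3` module. This is an illustration only, not the code inside `atomic_dump_yaml`: the `sessions` table, its columns, and the function names `write_session_state` and `read_states` are assumptions made for the example.

```python
import sqlite3


def write_session_state(db_path: str, session_id: str, state: str) -> None:
    """Upsert one session row inside a BEGIN IMMEDIATE transaction."""
    # isolation_level=None disables implicit transactions so that the
    # explicit BEGIN IMMEDIATE / COMMIT below are the only ones issued.
    conn = sqlite3.connect(db_path, timeout=5.0, isolation_level=None)
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        # Hypothetical schema, used only for this sketch.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(id TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )
        # BEGIN IMMEDIATE takes the write lock up front, so a competing
        # writer waits (up to `timeout`) instead of failing mid-transaction.
        conn.execute("BEGIN IMMEDIATE")
        try:
            conn.execute(
                "INSERT OR REPLACE INTO sessions (id, state) VALUES (?, ?)",
                (session_id, state),
            )
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
    finally:
        conn.close()


def read_states(db_path: str) -> dict:
    """Return {session_id: state} for every row in the sketch table."""
    conn = sqlite3.connect(db_path)
    try:
        return dict(conn.execute("SELECT id, state FROM sessions"))
    finally:
        conn.close()
```

Acquiring the write lock at `BEGIN` rather than at the first write statement is the point of the `IMMEDIATE` mode: two writers cannot both read stale state and then race to commit.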
---

## 📐 Architecture & Coordination Loop

The interaction between roles (Project Manager, Worker, and Reviewer) is structured as a strict iterative loop:

```mermaid
sequenceDiagram
    autonumber
    actor User as User
    participant PM as Project Manager
    participant W as Worker
    participant R as Reviewers
    participant M as MQTT Backplane

    User->>PM: Hand over requirements
    Note over PM: Plan tasks & register jobs
    PM->>M: Register Job & start Subscriber
    PM->>W: Delegate task (Provide Job ID & Brief)
    W->>M: Publish 'started' event
    Note over W: Implement & verify code
    W->>M: Publish 'completed' (or 'error')
    PM->>R: Request parallel reviews (Provide Diff)
    Note over R: Parallel analysis (Claude, Hermes)
    alt Review Feedback (NOT PASS)
        R->>PM: NOT PASS (Feedback with code blocks)
        Note over PM: Apply fixes or re-delegate
        PM->>W: Re-delegate with comments
    else Verification PASS
        R->>PM: PASS
    end
    PM->>User: Commit changes & Report completion
```

---

## 🔒 Security & Replay Attack Defense

To ensure communication integrity across public MQTT brokers, the backplane integrates an **HMAC-SHA256 signature protocol**:

* **PoC Mode (Unauthenticated):** Default mode where `auth_token` is `null`, skipping cryptographic validations for quick setups.
* **Production Mode (Authenticated):** A unique cryptographic token is issued per job. Event payloads must include an `hmac_sig` computed with the token.
* **Replay Attack Mitigation:** Each event carries a monotonically increasing integer sequence counter (`seq`). The subscriber (`job_subscriber.py`) drops any payload whose sequence number is not strictly greater than the highest sequence number it has already accepted for that job. Combined with the HMAC signature on the payload body, this rejects both re-injected and out-of-order packets without relying on clock synchronization. The wire-format timestamp field is advisory metadata only; the backplane does not enforce a clock-skew window.

---

## 📁 Repository Layout

```text
.
├── .agents/
│   ├── AGENT.md                    # Agent roles, snapshotting rules, and execution charter
│   ├── AGENT.ko.md                 # Agent roles, snapshotting rules, and execution charter (Korean)
│   └── skills/                     # Core orchestration shell wrappers & libraries
│       ├── lib.sh                  # Shared orchestration library
│       ├── multi-agent-mux-create/
│       ├── multi-agent-mux-stop/
│       ├── multi-agent-mux-resume/
│       ├── multi-agent-mux-status/
│       ├── multi-agent-mux-monitor/
│       └── multi-agent-mux-delegate-job/
│           ├── requirements.txt    # Python dependency declaration
│           └── scripts/            # Core backplane implementation (Python)
├── .mam/                           # Multi-Agent Mux metadata (git-ignored)
│   ├── agent-sessions.db           # SQLite WAL session database
│   ├── agent-sessions.yaml         # Human-readable session registry
│   └── jobs/                       # Asynchronous job metadata files
├── scripts/
│   └── generate-env.sh             # Environment bootstrap helper
├── BOOTSTRAP.md                    # Detailed installation and verification guide
├── MESSAGING.md                    # MQTT wire protocol specification
└── README.md                       # Project introduction and overview (this file)
```

---

## 🚦 Quick Start

For detailed setup instructions, please consult the **[BOOTSTRAP.md](./BOOTSTRAP.md)** file. Below is a quick summary:

1. **Initialize Environment Config:**

   ```bash
   ./scripts/generate-env.sh
   ```

2. **Create Virtual Environment and Install Dependencies:**

   ```bash
   python3 -m venv .venv
   source .venv/bin/activate
   pip install -r .agents/skills/multi-agent-mux-delegate-job/requirements.txt
   ```

3. **Run Registry Diagnostics:**

   ```bash
   .venv/bin/python3 .agents/skills/multi-agent-mux-delegate-job/scripts/registry.py list
   ```

---

## 📝 Guidelines for Collaborating Agents

If you are an AI agent newly onboarded to this project:

1. Read **[AGENT.md](.agents/AGENT.md)** to align on development constraints and roles (PM, Worker, Reviewer).
2. Adhere to the **Pane Snapshotting Rules** in `AGENT.md` (Section 4) to prevent scrollback data loss during long execution steps.
3. Never modify core logic without submitting a diff to the reviewer sessions for evaluation.
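
For agents working on the messaging layer, the replay defense described under "Security & Replay Attack Defense" can be sketched as follows. The `seq` and `hmac_sig` fields come from this README; the `job_id` field name, the canonical JSON serialization, and the names `sign_event` and `ReplayGuard` are assumptions for illustration. `MESSAGING.md` remains the authority on the actual wire format.

```python
import hashlib
import hmac
import json


def sign_event(token: bytes, event: dict) -> dict:
    """Return a copy of `event` with an HMAC-SHA256 `hmac_sig` attached.

    Assumption: the signature covers the event serialized as compact JSON
    with sorted keys; the real protocol may canonicalize differently.
    """
    body = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    sig = hmac.new(token, body, hashlib.sha256).hexdigest()
    return {**event, "hmac_sig": sig}


class ReplayGuard:
    """Accept an event only if its signature is valid and its `seq` is
    strictly greater than the highest `seq` already accepted for the job."""

    def __init__(self, token: bytes) -> None:
        self._token = token
        self._highest_seq = {}  # job_id -> highest accepted seq

    def accept(self, payload: dict) -> bool:
        sig = payload.get("hmac_sig")
        if not isinstance(sig, str):
            return False
        event = {k: v for k, v in payload.items() if k != "hmac_sig"}
        expected = sign_event(self._token, event)["hmac_sig"]
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(expected.encode(), sig.encode()):
            return False

        job_id = event.get("job_id")
        seq = event.get("seq")
        if not isinstance(seq, int) or seq <= self._highest_seq.get(job_id, -1):
            return False  # replayed or out-of-order packet

        self._highest_seq[job_id] = seq
        return True
```

Because the counter is checked only after the signature verifies, an attacker without the token cannot advance the accepted `seq` and lock out legitimate events.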