refactor(security,concurrency): resolve structural issues, enforce Claude permission skip, update docs
@@ -33,6 +33,63 @@ All orchestration functionalities are structured under the `.agents/skills/` dir

---

## 📐 Big-Picture Architecture

The system coordinates LLM agents across multiple workspaces through two core layers:
1. **Layer A — Tmux Orchestration (lib.sh + status/resume/stop/create)**: Runs the agents (one tmux session per agent-workspace combination) and maintains the authoritative session registry in `.mam/agent-sessions.yaml`, alongside its SQLite counterpart `.mam/agent-sessions.db`.
2. **Layer B — Async Job Delegation (delegate-job)**: Dispatches a task to an agent and observes its progress and completion through an event channel.

Both layers funnel file writes through a single lock-guarded chokepoint, `lib.sh::atomic_dump_yaml`. Every write holds an exclusive `flock` and is checked by schema validation.
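
The write path described above (exclusive lock, validation, then an atomic replace) can be sketched in Python. This is a minimal, hypothetical stand-in for `lib.sh::atomic_dump_yaml`, not its actual code: the names `atomic_dump` and `validate_sessions`, the toy schema, and the JSON default serializer are all assumptions (the real helper emits YAML through `PyYAML`; JSON is used here only to stay within the standard library). The `<file>.lock` naming follows the `.mam/agent-sessions.yaml.lock` convention described under Concurrency Design.

```python
import fcntl
import json
import os
import tempfile
from typing import Any, Callable


def _to_json(data: Any) -> str:
    # Stand-in serializer (assumption); the real helper would use yaml.safe_dump from PyYAML.
    return json.dumps(data, indent=2, sort_keys=True)


def validate_sessions(data: dict[str, Any]) -> None:
    # Hypothetical schema: the document must contain a top-level "sessions" list.
    if not isinstance(data.get("sessions"), list):
        raise ValueError("'sessions' must be a list")


def atomic_dump(path: str, data: dict[str, Any],
                validate: Callable[[dict[str, Any]], None],
                dumps: Callable[[Any], str] = _to_json) -> None:
    """Validate `data`, then replace `path` atomically while holding an exclusive flock."""
    validate(data)  # reject invalid data before touching the file
    directory = os.path.dirname(os.path.abspath(path))
    with open(path + ".lock", "a") as lock_file:  # e.g. agent-sessions.yaml.lock
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until other writers release it
        try:
            fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
            try:
                with os.fdopen(fd, "w") as tmp:
                    tmp.write(dumps(data))
                    tmp.flush()
                    os.fsync(tmp.fileno())  # contents reach disk before the rename
                os.replace(tmp_path, path)  # atomic within one filesystem
            except BaseException:
                if os.path.exists(tmp_path):
                    os.unlink(tmp_path)  # never leave a half-written temp file behind
                raise
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Because the new content is written to a temporary file in the same directory and then swapped in with `os.replace`, readers see either the old file or the new one, never a partial write. The `flock` only serializes cooperating writers: it is advisory, so a process that skips the lock is not blocked.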
### Data Flow Overview

```text
+-----------+   register_job   +---------------------+
| delegator | ---------------> | .mam/jobs/<id>.json | <-- live record
+-----------+                  +----------+----------+
                                          |
                                          | atomic rename + fsync
                                          v
                                 +-----------------+
                                 |    audit log    | <-- append-only
                                 | .mam/delegate_  |     events.ndjson
                                 | job_logs/<id>/  |
                                 +--------+--------+
                                          ^
                                          | (best-effort mirrors)
                                          |
+-----------+    publish_event     +------+------+      +---------+
|   agent   | -------------------> | MQTT broker | <--- | monitor |
| (claude)  |                      +-------------+      +----+----+
+-----------+                                                |
      ^                                                      v
      | subscriber                                   atomic_dump_yaml
      | (job_subscriber.py)                     (.mam/agent-sessions.yaml)
      |                                                      ^
      +-------- delegator waits here ----------+             |
                                                     +-------+------+
                                                     | reconcile.sh |
                                                     +--------------+
```
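
One reading of the upper path in the diagram (a live job record replaced atomically, plus an append-only event log) might look roughly like this in Python. Only `register_job` and the `.mam/jobs/<id>.json` and `.mam/delegate_job_logs/<id>/events.ndjson` paths come from the diagram. `append_event`, `read_events`, the `ts` field, and the record contents are hypothetical, and the MQTT mirroring is left out entirely.

```python
import json
import os
import tempfile
import time
from typing import Any


def _fsync_dir(directory: str) -> None:
    # Make the rename itself durable by fsyncing the containing directory (POSIX).
    fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def register_job(root: str, job_id: str, record: dict[str, Any]) -> str:
    """Write .mam/jobs/<id>.json via temp file + fsync + atomic rename (sketch)."""
    jobs_dir = os.path.join(root, ".mam", "jobs")
    os.makedirs(jobs_dir, exist_ok=True)
    final_path = os.path.join(jobs_dir, f"{job_id}.json")
    fd, tmp_path = tempfile.mkstemp(dir=jobs_dir, prefix=f".{job_id}.")
    with os.fdopen(fd, "w") as tmp:
        json.dump(record, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())
    os.replace(tmp_path, final_path)  # readers never see a partial record
    _fsync_dir(jobs_dir)
    return final_path


def append_event(root: str, job_id: str, event: dict[str, Any]) -> None:
    """Append one JSON line to events.ndjson; existing lines are never rewritten (hypothetical helper)."""
    log_dir = os.path.join(root, ".mam", "delegate_job_logs", job_id)
    os.makedirs(log_dir, exist_ok=True)
    line = json.dumps({"ts": time.time(), **event}) + "\n"
    # O_APPEND positions every write at end-of-file, so earlier events stay untouched.
    fd = os.open(os.path.join(log_dir, "events.ndjson"),
                 os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, line.encode())
        os.fsync(fd)
    finally:
        os.close(fd)


def read_events(root: str, job_id: str) -> list[dict[str, Any]]:
    """Load every event for a job in the order it was appended (hypothetical helper)."""
    path = os.path.join(root, ".mam", "delegate_job_logs", job_id, "events.ndjson")
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```

Fsyncing the directory after `os.replace` is what makes the rename survive a crash on POSIX filesystems. The event log, by contrast, is only ever appended to, which is what lets it serve as an audit trail.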
### 🔒 Tmux Server Isolation

To keep each workspace's tmux processes from interfering with one another or with the system tmux server, the framework gives every workspace an isolated tmux environment:

* **Per-Workspace Shim:** `_init_tmux_isolation` and `_resolve_real_tmux_path` create a per-workspace shim at `/tmp/multi-agent-tmux-shim/<TMUX_SERVER_NAME>/tmux` that intercepts `tmux` commands and forwards them to the real binary as `tmux -L <server>`.
* **PATH Rewriting:** The shim directory is prepended to `PATH` in every child process, so any `tmux` invocation in the agent's process tree talks only to its isolated socket server.
* **Environment Restoration:** If `TMUX_SERVER_NAME` is set to `default`, the `PATH` override is removed and `tmux` falls back to the default global server.
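
The shim mechanism can be sketched in Python under a few assumptions. The helper names `resolve_real_tmux`, `write_shim`, and `child_env` are hypothetical (the real logic lives in `_init_tmux_isolation` and `_resolve_real_tmux_path` in `lib.sh`), the shim is assumed to be a POSIX `sh` script, and `SHIM_ROOT` reuses the path from the text. The real binary is resolved while skipping shim directories because the shim must `exec` the genuine `tmux`, not itself.

```python
import os
import shlex
import shutil
import stat
from typing import Mapping, Optional

SHIM_ROOT = "/tmp/multi-agent-tmux-shim"  # path taken from the text above


def resolve_real_tmux(path_env: str, shim_root: str = SHIM_ROOT) -> Optional[str]:
    """Find the real tmux on PATH, ignoring any shim directories (hypothetical helper)."""
    shim_prefix = os.path.abspath(shim_root) + os.sep
    dirs = [d for d in path_env.split(os.pathsep)
            if d and not os.path.abspath(d).startswith(shim_prefix)]
    return shutil.which("tmux", path=os.pathsep.join(dirs))


def write_shim(server: str, real_tmux: str, shim_root: str = SHIM_ROOT) -> str:
    """Create <shim_root>/<server>/tmux that forwards every call to `tmux -L <server>`."""
    shim_dir = os.path.join(shim_root, server)
    os.makedirs(shim_dir, exist_ok=True)
    shim_path = os.path.join(shim_dir, "tmux")
    with open(shim_path, "w") as f:
        # exec replaces the shim process; "$@" passes the caller's arguments through unchanged.
        f.write(f'#!/bin/sh\nexec {shlex.quote(real_tmux)} -L {shlex.quote(server)} "$@"\n')
    mode = os.stat(shim_path).st_mode
    os.chmod(shim_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return shim_dir


def child_env(base: Mapping[str, str], server: str, shim_dir: str) -> dict[str, str]:
    """Environment for agent child processes: shim first on PATH unless server is 'default'."""
    env = dict(base)
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p and p != shim_dir]
    if server != "default":
        parts.insert(0, shim_dir)  # any bare `tmux` now resolves to the shim
    env["PATH"] = os.pathsep.join(parts)
    env["TMUX_SERVER_NAME"] = server
    return env
```

A shell or agent started with the environment returned by `child_env` should pick up the shim whenever it runs a bare `tmux`, while passing `"default"` strips the shim directory back out of `PATH`.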
### 🛡️ Concurrency Design & Write Serialization

The framework serializes writes through lock-guarded paths to prevent race conditions when agents run in parallel:

* **Advisory File Locks (`flock`):** Every mutation of `agent-sessions.yaml` and the SQLite registry runs through `atomic_dump_yaml` in `lib.sh`, which serializes writes with an exclusive `flock` on `.mam/agent-sessions.yaml.lock`.
* **Dual-Interpreter Strategy:** To limit dependency bloat and improve stability, the backplane splits execution environments: the `.venv` virtual environment handles MQTT communication and async jobs (it requires `paho-mqtt`), while the system `python3` runs `atomic_dump_yaml` (relying on a system-wide `PyYAML`).
* **NFS and Network FS Safeguards:** Because `flock` is unreliable on network filesystems (NFS, CIFS, SSHFS), `lib.sh` detects the filesystem type. If it finds a network mount, it prints a safety warning and the SQLite journal mode is switched from `WAL` to `DELETE`, since WAL relies on shared memory that processes on different hosts cannot share.
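
Filesystem detection and the journal-mode switch could be sketched like this in Python. Assumptions: the mount table is in Linux `/proc/mounts` format (on a real host, `mounts_text` would come from reading that file; other systems would need a different source), the `NETWORK_FS` set is illustrative rather than exhaustive, and `fs_type_for`, `choose_journal_mode`, and `open_registry` are hypothetical names, not functions from `lib.sh`.

```python
import os
import sqlite3
import sys

NETWORK_FS = {"nfs", "nfs4", "cifs", "smb3", "smbfs", "fuse.sshfs"}  # illustrative list


def fs_type_for(path: str, mounts_text: str) -> str:
    """Return the fs type of the longest mount point containing `path` (/proc/mounts format)."""
    target = os.path.abspath(path)
    best, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point = fields[1].replace("\\040", " ")  # /proc/mounts escapes spaces as \040
        prefix = mount_point.rstrip("/") + "/"
        if (target == mount_point or target.startswith(prefix)) and len(mount_point) > len(best):
            best, best_type = mount_point, fields[2]
    return best_type


def choose_journal_mode(fstype: str) -> str:
    # WAL needs shared memory between processes on one host; network filesystems cannot provide it.
    return "DELETE" if fstype in NETWORK_FS else "WAL"


def open_registry(db_path: str, fstype: str) -> sqlite3.Connection:
    """Open the SQLite registry with a journal mode suited to the underlying filesystem."""
    if fstype in NETWORK_FS:
        print(f"warning: {db_path} is on {fstype}; flock may be unreliable", file=sys.stderr)
    conn = sqlite3.connect(db_path)
    mode = choose_journal_mode(fstype)
    actual = conn.execute(f"PRAGMA journal_mode={mode}").fetchone()[0]
    if actual.upper() != mode:  # SQLite reports the mode it actually applied
        conn.close()
        raise RuntimeError(f"could not set journal_mode={mode} (got {actual})")
    return conn
```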
---

## 📐 Architecture & Coordination Loop

The interaction between roles (Project Manager, Worker, and Reviewer) is structured as a strict iterative loop: