refactor(security,concurrency): resolve structural issues, enforce Claude permission skip, update docs

2026-06-23 08:03:43 +09:00
parent 12dceb14b2
commit 99ac8b3ce4
7 changed files with 209 additions and 45 deletions
+57
@@ -33,6 +33,63 @@ All orchestration functionalities are structured under the `.agents/skills/` dir
---
## 📐 Big-Picture Architecture
The system coordinates LLM agents across multiple workspaces through two core layers:
1. **Layer A — Tmux Orchestration (lib.sh + status/resume/stop/create)**: Runs the agents (one tmux session per agent-workspace combination) and maintains an authoritative registry in `.mam/agent-sessions.yaml` (+ `.mam/agent-sessions.db`).
2. **Layer B — Async Job Delegation (delegate-job)**: Dispatches a task to an agent and observes progress and completion via an event channel.
These two layers share one lock-guarded chokepoint for file I/O: `lib.sh::atomic_dump_yaml`. Every write is protected by an exclusive `flock` and schema validation.
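The chokepoint pattern described above can be sketched as follows. This is a minimal illustration, not the body of `atomic_dump_yaml`: the function names `validate_registry` and `locked_atomic_dump`, the `sessions` key, and the toy schema are all hypothetical, and JSON stands in for YAML so the sketch needs only the standard library.

```python
import fcntl
import json
import os
import tempfile


def validate_registry(data):
    """Hypothetical schema check: a mapping with a list under 'sessions'."""
    if not isinstance(data, dict) or not isinstance(data.get("sessions"), list):
        raise ValueError("registry must be a mapping with a 'sessions' list")


def locked_atomic_dump(path, data):
    """Validate `data`, then write it to `path` under an exclusive flock."""
    validate_registry(data)  # reject bad data before touching any file
    directory = os.path.dirname(os.path.abspath(path))
    with open(path + ".lock", "w") as lock_file:
        # Blocks until every other writer holding the lock has finished.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as tmp:
                json.dump(data, tmp, indent=2)
                tmp.flush()
                os.fsync(tmp.fileno())
            # Rename is atomic when source and target share a filesystem.
            os.replace(tmp_path, path)
        except BaseException:
            os.unlink(tmp_path)
            raise
    # The lock is released when lock_file is closed by the with-block.
```

Validating before taking the lock keeps the critical section short, and writing to a temporary file means readers never observe a half-written registry.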
### Data Flow Overview
```text
+-----------+   register_job   +---------------------+
| delegator | ---------------> | .mam/jobs/<id>.json | <-- live record
+-----------+                  +----------+----------+
                                          |
                                          | atomic rename + fsync
                                          v
                                 +-----------------+
                                 | audit log       | <-- append-only
                                 | .mam/delegate_  |     events.ndjson
                                 | job_logs/<id>/  |
                                 +--------+--------+
                                          ^
                                          | (best-effort mirrors)
                                          |
+-----------+    publish_event     +------+------+      +---------+
| agent     | -------------------> | MQTT broker | <--- | monitor |
| (claude)  |                      +-------------+      +----+----+
+-----------+                                                |
      ^                                                      v
      | subscriber                                   atomic_dump_yaml
      | (job_subscriber.py)                     (.mam/agent-sessions.yaml)
      |                                                      ^
      +-------- delegator waits here ----------+             |
                                                     +-------+------+
                                                     | reconcile.sh |
                                                     +--------------+
```
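One reading of the upper half of the diagram is a live job record that is replaced atomically, alongside an event log that only ever grows. The sketch below illustrates that reading under stated assumptions: `write_job_record` and `append_event` are hypothetical names, and the record and event fields are invented for illustration.

```python
import json
import os
import tempfile


def write_job_record(jobs_dir, job_id, record):
    """Replace <jobs_dir>/<job_id>.json via temp file, fsync, then rename."""
    os.makedirs(jobs_dir, exist_ok=True)
    final_path = os.path.join(jobs_dir, f"{job_id}.json")
    fd, tmp_path = tempfile.mkstemp(dir=jobs_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(record, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())  # data is on disk before the rename
    os.replace(tmp_path, final_path)  # readers see old or new, never partial
    return final_path


def append_event(log_dir, event):
    """Append one JSON object per line to events.ndjson; old lines are kept."""
    os.makedirs(log_dir, exist_ok=True)
    log_path = os.path.join(log_dir, "events.ndjson")
    with open(log_path, "a") as log:
        log.write(json.dumps(event) + "\n")
        log.flush()
        os.fsync(log.fileno())
    return log_path
```

Keeping the live record and the audit log separate means the current state can be read from one small file, while the history remains available line by line for replay or debugging.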
### 🔒 Tmux Server Isolation
To prevent workspace tmux processes from interfering with each other or with system tmux servers, the framework enforces isolated tmux environments:
* **Per-Workspace Shim:** `_init_tmux_isolation` and `_resolve_real_tmux_path` create a per-workspace shim at `/tmp/multi-agent-tmux-shim/<TMUX_SERVER_NAME>/tmux` that intercepts tmux commands and re-invokes the real binary as `tmux -L <server>`.
* **PATH Rewriting:** The shim directory is prepended to `PATH` in all child processes, so any `tmux` invocation within the agent's process tree resolves to the shim and stays on its isolated server socket.
* **Environment Restoration:** If `TMUX_SERVER_NAME` is set to `default`, the PATH override is removed, reverting to the default global tmux server.
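The three behaviours above can be illustrated with a small sketch. It is not the code behind `_init_tmux_isolation`: the helper names `install_tmux_shim` and `isolated_env` are hypothetical, and the shim body is an assumed shape for a wrapper that forwards to `tmux -L <server>`.

```python
import os


def install_tmux_shim(shim_root, server_name, real_tmux):
    """Write an executable <shim_root>/<server_name>/tmux wrapper script."""
    shim_dir = os.path.join(shim_root, server_name)
    os.makedirs(shim_dir, exist_ok=True)
    shim_path = os.path.join(shim_dir, "tmux")
    with open(shim_path, "w") as shim:
        # Forward every argument to the real binary on a named server socket.
        shim.write(f'#!/bin/sh\nexec "{real_tmux}" -L "{server_name}" "$@"\n')
    os.chmod(shim_path, 0o755)
    return shim_dir


def isolated_env(env, shim_root, server_name, real_tmux):
    """Return a copy of `env` whose PATH routes `tmux` through the shim."""
    child_env = dict(env)
    # Drop shim entries left by an earlier call so they never stack up.
    entries = [
        p for p in child_env.get("PATH", "").split(os.pathsep)
        if p and not p.startswith(shim_root + os.sep)
    ]
    if server_name != "default":
        entries.insert(0, install_tmux_shim(shim_root, server_name, real_tmux))
    child_env["PATH"] = os.pathsep.join(entries)
    return child_env
```

Passing the returned mapping as `env=` to `subprocess.Popen` would make the override apply to the whole child process tree, since `PATH` is inherited by default.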
### 🛡️ Concurrency Design & Write Serialization
The framework implements lock-guarded execution pathways to prevent race conditions during parallel agent operations:
* **Advisory File Locks (`flock`):** Every mutation of `agent-sessions.yaml` and the SQLite registry runs through `atomic_dump_yaml` inside `lib.sh`, which serializes writes via an exclusive `flock` on `.mam/agent-sessions.yaml.lock`.
* **Dual-Interpreter Strategy:** To minimize dependency bloat and guarantee stability, the backplane splits execution environments: the virtual environment `.venv` handles MQTT communication and async jobs (requiring `paho-mqtt`), while the system `python3` executes `atomic_dump_yaml` (relying on system-wide `PyYAML`).
* **NFS and Network FS Safeguards:** Because `flock` is unreliable on network filesystems (NFS, CIFS, SSHFS), `lib.sh` detects the filesystem type. If it finds a network mount, it prints a safety warning and the SQLite journal mode is switched from `WAL` to `DELETE`, since WAL relies on shared memory that does not work across network mounts.
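The network-filesystem safeguard could look roughly like the sketch below. It assumes a Linux `/proc/mounts`-style table as the detection source, which the text does not confirm; the names `parse_mounts`, `fs_type_for`, `choose_journal_mode`, and `open_registry`, and the set of filesystem type strings, are hypothetical.

```python
import os
import sqlite3

# Assumed type strings for NFS, CIFS/SMB, and SSHFS mounts on Linux.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3", "fuse.sshfs"}


def parse_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    pairs = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            pairs.append((fields[1], fields[2]))
    return pairs


def fs_type_for(path, mounts):
    """Pick the fs type of the longest mount point that contains `path`."""
    path = os.path.abspath(path) + "/"
    best = ("", "unknown")
    for mount_point, fs_type in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if path.startswith(prefix) and len(mount_point) >= len(best[0]):
            best = (mount_point, fs_type)
    return best[1]


def choose_journal_mode(fs_type):
    """DELETE on network filesystems, WAL everywhere else."""
    return "DELETE" if fs_type in NETWORK_FS_TYPES else "WAL"


def open_registry(db_path, fs_type):
    """Open the SQLite registry with a journal mode suited to the mount."""
    conn = sqlite3.connect(db_path)
    conn.execute(f"PRAGMA journal_mode={choose_journal_mode(fs_type)}")
    return conn
```

A caller would typically read `/proc/mounts`, pass the text to `parse_mounts`, and feed the result of `fs_type_for` into `open_registry`, printing a warning whenever the type falls in the network set.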
---
## 📐 Architecture & Coordination Loop
The interaction between roles (Project Manager, Worker, and Reviewer) is structured as a strict iterative loop: