2026-06-25 12:19:20 +09:00

AGENT.md

This document defines the common guidelines and protocol for introducing the MQTT messaging backplane and Tmux-based multi-agent orchestration workflow to a new project. It sets out the rules and architecture that let collaborating agents perform tasks safely, robustly, and consistently.

All agents working on a new project must read this document thoroughly and comply with the defined protocols before starting any tasks.


1. Agent Roles

We clearly separate responsibilities and permissions between roles to reduce bottlenecks and enhance the quality of execution.

👤 Project Manager (PM / Orchestrator)

  • Core Responsibility: Receive user requirements, establish detailed task plans, assign and instruct workers, control the overall workflow, and report final results.
  • Ambiguity Resolution: If a user's requirements contain ambiguous details, do not guess. Immediately ask the user for clarification (we recommend using the /grill-me slash command).
  • Feedback Loop Adjustment: Analyze verification feedback from Reviewers to decide on improvement paths. For complex technical challenges, direct Workers and Reviewers to research options, add the PM's own assessment, and present a final report to the user to decide the project's direction.
  • Self-Healing (Hermes Fallback Fix): If a defect pointed out by a Reviewer is extremely minor or is a simple typo/configuration omission, the PM should directly fix the source code instead of reassigning it to the Worker, thereby minimizing the round-trip cost.

🛠️ Worker (Implementation Agent)

  • Core Responsibility: Design business logic and implement source code as delegated by the PM.
  • Collaboration & Communication: If the implementation path is ambiguous, or the assigned scope requires interface design changes, reach agreement with the PM first and then apply minimal, targeted changes.
  • Contract Adherence: Comply with the single-task instructions (the Brief) and the unique Job ID convention provided by the PM. Workers must publish a started event to the backplane when work begins, and a completed or error event when it ends.

🔍 Reviewer (Verification Agent)

  • Core Responsibility: Verify source code changes (Diff) and implementation specifications submitted by Workers. Reviewers act as facilitators by detecting security vulnerabilities, proposing performance improvements, and examining design consistency.
  • Provide Concrete Alternatives: Rejecting changes (NOT PASS) without offering an alternative is forbidden. When raising an issue, Reviewers must propose a concrete, stable, and verified alternative code block or solution.
  • Complementary Cross-Verification: Leverage the different strengths of different agent models (e.g., Flash-class models are skilled at capturing semantic shell bugs, while Opus/Sonnet-class models excel at API signatures and logical regression analysis) to perform parallel, mutually supportive reviews.

2. Messaging Backplane & Registry Protocol

Asynchronous communication and state management between agents are controlled via distributed event channels and file/DB registries.

📡 MQTT Backplane

  • Event Lifecycle:
    • started (Job execution starts) ➡️ progress/permission_required (Share intermediate progress) ➡️ completed (Successful termination) or error (Failed termination)
    • completed and error are terminal events: each job publishes exactly one of them, exactly once.
  • Publish/Subscribe Rules:
    • Since an MQTT broker does not queue messages for clients that have not yet subscribed, the subscriber (job_subscriber.py) must be running in the background before the agent starts (the Subscribe-before-Publish principle).
    • Publish terminal events with retain=True so that the broker stores the message and subscribers joining late can still read the final state.
    • Sanitize all transmitted data so that secrets such as passwords and private keys, and host-specific details such as absolute system paths, are never included.
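
The lifecycle and publish rules above can be sketched as a small helper that decides what to hand to the MQTT client. This is a minimal sketch, not the project's publish_event.py: the names JobLifecycle, build_publish_args, and sanitize, the topic layout mam/jobs/<job_id>/events, the QoS level, and the redaction rules are all assumptions made for illustration. The returned keys match the keyword arguments of paho-mqtt's Client.publish, so the dictionary could be passed as client.publish(**args).

```python
import json
import re
from typing import Optional

TERMINAL_EVENTS = {"completed", "error"}
KNOWN_EVENTS = {"started", "progress", "permission_required"} | TERMINAL_EVENTS

# Hypothetical redaction rules: drop secret-looking fields, mask absolute POSIX paths.
_SECRET_KEYS = {"password", "private_key", "auth_token"}
_ABS_PATH = re.compile(r"(?<![\w.])/(?:[\w.\-]+/)+[\w.\-]+")


def sanitize(data: dict) -> dict:
    """Return a copy with secret fields removed and absolute paths masked."""
    clean = {}
    for key, value in data.items():
        if key.lower() in _SECRET_KEYS:
            continue
        if isinstance(value, str):
            value = _ABS_PATH.sub("<path>", value)
        clean[key] = value
    return clean


class JobLifecycle:
    """Tracks one job and rejects events that violate the lifecycle order."""

    def __init__(self, job_id: str):
        self.job_id = job_id
        self.started = False
        self.finished = False

    def build_publish_args(self, event: str, data: Optional[dict] = None) -> dict:
        if event not in KNOWN_EVENTS:
            raise ValueError(f"unknown event: {event}")
        if self.finished:
            raise RuntimeError("a terminal event was already published")
        if event == "started":
            if self.started:
                raise RuntimeError("'started' was already published")
            self.started = True
        elif not self.started:
            raise RuntimeError("'started' must be published first")

        terminal = event in TERMINAL_EVENTS
        self.finished = terminal
        payload = {**sanitize(data or {}), "job_id": self.job_id, "event": event}
        return {
            "topic": f"mam/jobs/{self.job_id}/events",  # assumed topic layout
            "payload": json.dumps(payload),
            "qos": 1,            # assumed; the source does not specify a QoS level
            "retain": terminal,  # only terminal events are retained
        }
```

Keeping the retain decision next to the lifecycle check means a second terminal event is rejected before it can overwrite the retained final state on the broker.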

🗃️ Registry & State Management

  • This architecture maintains two distinct registries based on their purpose:
    • Job Registry: The metadata and lifecycle of each asynchronous job are recorded in individual JSON files (.mam/jobs/<id>.json). Concurrency conflicts (claiming races) across multiple sessions are prevented via file-based fcntl advisory locks (registry_lock in registry.py).
    • Session Registry: TMUX monitoring states and running agent metadata are kept in a SQLite WAL database (.mam/agent-sessions.db) to support reliable concurrent transactions on a single host. SQLite WAL mode depends on shared memory and file locking that network file systems such as NFS do not reliably provide, so keep the database on a local file system.
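
To make the claiming race concrete, here is a minimal sketch of how an fcntl advisory lock can guard a job file. It is not the project's registry.py: registry_lock is named in the text, but its signature here is assumed, and claim_job, the sidecar .lock file, and the claimed_by field are hypothetical. The sketch is POSIX-only because it uses fcntl.

```python
import fcntl
import json
import os
from contextlib import contextmanager


@contextmanager
def registry_lock(lock_path: str):
    """Hold an exclusive advisory lock for the duration of the with-block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other holders release
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def claim_job(jobs_dir: str, job_id: str, owner: str) -> bool:
    """Mark a job as claimed; return False if another session already won."""
    job_path = os.path.join(jobs_dir, f"{job_id}.json")
    with registry_lock(job_path + ".lock"):
        with open(job_path, "r", encoding="utf-8") as fh:
            job = json.load(fh)
        if job.get("claimed_by") is not None:  # hypothetical field name
            return False
        job["claimed_by"] = owner
        # Write to a temp file and rename so readers never see a partial file.
        tmp_path = job_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as fh:
            json.dump(job, fh)
        os.replace(tmp_path, job_path)
        return True
```

The read, check, and write all happen while the lock is held, which is what prevents two sessions from both seeing an unclaimed job. Advisory locks only protect against processes that also take the lock, so every writer has to go through the same helper.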

🛡️ Security Protocol (HMAC-SHA256)

  • Unauthenticated PoC Mode: If the auth_token in the job registry is set to null (the default PoC mode), signature verification is skipped and all events are accepted (verify_hmac always returns True).
  • Authenticated Production Mode: In production environments or integrations requiring authentication, a unique cryptographic token (auth_token) is issued for each job. The publisher must include in the payload an hmac_sig signature computed with this token as the key. To prevent downgrade attacks, the receiving end (verify_hmac) immediately drops any message whose signature is missing or does not match.
  • Rollout Strategy: To avoid event drops caused by inconsistencies between publishing and receiving nodes when updating security schemes, hybrid transition formats (which risk leaking plaintext tokens) must not be used. Instead, adopt a "Simultaneous Rollout" where all nodes are updated at once.
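
The two modes above can be sketched with Python's standard hmac module. This is an illustration under stated assumptions, not the project's implementation: verify_hmac, auth_token, and hmac_sig come from the text, while sign_payload, the canonical JSON serialization, and the hex encoding of the signature are assumptions. Publisher and subscriber must agree on the serialization for the signatures to match.

```python
import hashlib
import hmac
import json
from typing import Optional


def _canonical(payload: dict) -> bytes:
    """Serialize every field except the signature in a stable key order."""
    body = {k: v for k, v in payload.items() if k != "hmac_sig"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def _digest(payload: dict, auth_token: str) -> str:
    return hmac.new(auth_token.encode("utf-8"), _canonical(payload), hashlib.sha256).hexdigest()


def sign_payload(payload: dict, auth_token: str) -> dict:
    """Return a copy of the payload with an hmac_sig field added."""
    return {**payload, "hmac_sig": _digest(payload, auth_token)}


def verify_hmac(payload: dict, auth_token: Optional[str]) -> bool:
    """Accept everything in PoC mode; otherwise require a valid signature."""
    if auth_token is None:
        return True  # unauthenticated PoC mode
    sig = payload.get("hmac_sig")
    if not isinstance(sig, str):
        return False  # a missing signature is dropped, never downgraded
    expected = _digest(payload, auth_token)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8"))
```

For example, verify_hmac(sign_payload({"job_id": "j1", "event": "completed"}, token), token) accepts the message, while the same payload without hmac_sig, or with any field altered after signing, is rejected.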

3. Collaborative Workflow Execution Loop

sequenceDiagram
    autonumber
    actor User as User
    participant PM as Project Manager
    participant W as Worker
    participant R as Reviewers
    participant M as MQTT Backplane

    User->>PM: Hand over requirements
    Note over PM: Run grill-me & plan tasks
    PM->>M: Register Job & start Subscriber
    PM->>W: Delegate task (Provide Job ID & Brief)
    W->>M: Publish 'started' event
    Note over W: Modify code & implement
    W->>M: Publish 'completed' (or 'error')
    PM->>R: Request parallel review (Provide Diff)
    Note over R: Cross-analysis & verification
    alt Defect Found
        R->>PM: NOT PASS (Feedback with alternatives)
        Note over PM: PM directly fixes minor defects
        PM->>W: Apply feedback & re-delegate
    else Verification Pass
        R->>PM: PASS
    end
    PM->>User: Report final pass & commit changes

  1. Planning and Allocation: The PM defines requirements and outlines independent jobs to avoid conflicting dependencies.
  2. Execution and Notification: The PM launches a subscriber, then assigns the job to a Worker session. The Worker carries out the task and publishes a terminal event, after which the session closes automatically.
  3. Cross-Verification Iteration (Review Loop): Once the task is complete, the PM circulates the changes to the Reviewer agents in parallel. The revise-and-review cycle repeats until every Reviewer returns PASS.
  4. Release and Cleanup: Code that passes verification is committed to Git, and temporary session resources are reclaimed.
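
The review loop in step 3 can be sketched as follows. This is a hedged illustration only: reviewers and the revise step are modeled as plain callables, review_loop is a hypothetical name, and the max_rounds cap is an assumption added so the sketch cannot loop forever (the text itself only says the cycle repeats until all Reviewers pass).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# A verdict is (passed, feedback); feedback carries the concrete alternative.
Verdict = Tuple[bool, str]
Reviewer = Callable[[str], Verdict]
Reviser = Callable[[str, List[str]], str]


def review_loop(diff: str, reviewers: List[Reviewer], revise: Reviser,
                max_rounds: int = 5) -> Tuple[str, int]:
    """Run all reviewers in parallel until every one returns PASS.

    Returns the accepted diff and the number of rounds it took.
    """
    for round_no in range(1, max_rounds + 1):
        with ThreadPoolExecutor(max_workers=max(1, len(reviewers))) as pool:
            verdicts = list(pool.map(lambda review: review(diff), reviewers))
        feedback = [note for passed, note in verdicts if not passed]
        if not feedback:
            return diff, round_no
        # The PM either fixes minor defects directly or re-delegates to the Worker;
        # both paths are hidden behind the revise callable here.
        diff = revise(diff, feedback)
    raise RuntimeError(f"no unanimous PASS after {max_rounds} rounds")
```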

4. Analysis Infrastructure Patterns & Practical Guide

These are critical instructions for preventing data loss and infrastructure-level failures during long-running agent analyses.

📸 Preventing TUI Viewport Truncation (The Three Pane-Snapshot Rules)

To ensure that agents running in TMUX environments do not lose debug logs or previous outputs due to screen scrollback limits, the following snapshotting pattern must be enforced:

  1. Pre-brief Capture: Capture the pane (capture-pane -S -200) immediately after sending the task instruction (Brief) to back up the starting point of the input history.
  2. Loop Snapshot: For long-running agent sessions (5 minutes or more), periodically (e.g., every 30 seconds) scan the viewport and append the incremental data to /tmp/pane-snap.txt.
  3. Post-job Capture: Capture the complete pane state one final time immediately after a job completes or returns an error to preserve the entire execution trajectory.
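
The three rules above can be sketched with one capture helper that is called before the brief, periodically during the job, and once after it. This is a minimal sketch under assumptions: capture_cmd, capture_pane, new_lines, and snapshot_once are hypothetical names, and the overlap detection is a simple heuristic that can duplicate lines when a TUI redraws the screen. The tmux flags used are -p (print to stdout), -t (target pane), and -S (start line, negative values reach into the scrollback).

```python
import subprocess
from typing import Callable, List


def capture_cmd(target: str, history_lines: int = 200) -> List[str]:
    """Build the tmux command that prints the pane plus recent scrollback."""
    return ["tmux", "capture-pane", "-p", "-t", target, "-S", f"-{history_lines}"]


def capture_pane(target: str, history_lines: int = 200) -> str:
    result = subprocess.run(capture_cmd(target, history_lines),
                            check=True, capture_output=True, text=True)
    return result.stdout


def new_lines(prev: List[str], cur: List[str]) -> List[str]:
    """Return the part of cur that follows its longest overlap with the end of prev."""
    for k in range(min(len(prev), len(cur)), 0, -1):
        if prev[-k:] == cur[:k]:
            return cur[k:]
    return cur


def snapshot_once(target: str, prev: List[str], out_path: str,
                  capture: Callable[[str], str] = capture_pane) -> List[str]:
    """Capture the pane, append only unseen lines to out_path, return the capture."""
    cur = capture(target).splitlines()
    while cur and not cur[-1].strip():
        cur.pop()  # drop the blank rows tmux pads the pane with
    fresh = new_lines(prev, cur)
    if fresh:
        with open(out_path, "a", encoding="utf-8") as fh:
            fh.write("\n".join(fresh) + "\n")
    return cur
```

A monitor would call snapshot_once right after sending the Brief, then in a loop with time.sleep(30) while the job runs, and one last time when the terminal event arrives, passing the previous return value as prev each time.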

📄 Handling Long Briefing Instructions

  • Sending long instructions or prompts (hundreds of lines) sequentially via TMUX send-keys or input buffers can overwhelm the agent's TUI, leading to lost characters or truncated paragraphs.
  • Resolution: If instructions are long, write them separately to a file path (e.g., /tmp/brief-<job_id>.md) and send a simplified execution command to the agent: "Read /tmp/brief-... and execute".
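
A minimal sketch of this file-based handoff, assuming hypothetical helper names (write_brief, send_keys_cmds, deliver_brief) and a job ID restricted to filename-safe characters. The short instruction is sent with tmux send-keys -l so the text is typed literally, followed by a separate Enter keypress.

```python
import os
import re
import subprocess
from typing import Callable, List


def write_brief(job_id: str, text: str, brief_dir: str = "/tmp") -> str:
    """Write the full brief to brief-<job_id>.md and return its path."""
    if not re.fullmatch(r"[\w.\-]+", job_id):
        raise ValueError("job_id must be filename-safe")
    path = os.path.join(brief_dir, f"brief-{job_id}.md")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    return path


def send_keys_cmds(target: str, brief_path: str) -> List[List[str]]:
    """Build the two tmux commands: type the short instruction, then press Enter."""
    message = f"Read {brief_path} and execute"
    return [
        ["tmux", "send-keys", "-t", target, "-l", message],  # -l: literal text
        ["tmux", "send-keys", "-t", target, "Enter"],
    ]


def deliver_brief(target: str, job_id: str, text: str, brief_dir: str = "/tmp",
                  run: Callable = subprocess.run) -> str:
    """Write the brief to disk and send only a one-line pointer to the agent pane."""
    path = write_brief(job_id, text, brief_dir)
    for cmd in send_keys_cmds(target, path):
        run(cmd, check=True)
    return path
```

Only one short line ever passes through the TUI input buffer, so the length of the brief no longer affects delivery.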

⏱️ Timeout Configuration & Alignment Rules

  • Job Execution Limits (timeout_sec & idle_timeout_sec): Each job independently manages its overall execution timeout (timeout_sec, default 3600s) and its idle timeout, the maximum time allowed without receiving any message (idle_timeout_sec, default 120s).
  • Monitor Idle Waiting (SUB_IDLE_TIMEOUT): The idle timeout of the monitor script (reconcile.sh), SUB_IDLE_TIMEOUT, must be set to at least 3600s (1 hour) so that it is never shorter than the maximum job budget. Otherwise the monitor can terminate early on idle detection and lose control of background tasks before they finish.
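
The two timeout rules can be illustrated with a small check. This is a sketch under assumptions: JobTimeouts, job_status, and misaligned_jobs are hypothetical names, and only the defaults (3600s and 120s) and the alignment rule come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JobTimeouts:
    timeout_sec: int = 3600      # overall execution budget
    idle_timeout_sec: int = 120  # maximum silence between messages


def job_status(started_at: float, last_event_at: float, now: float,
               limits: JobTimeouts) -> str:
    """Classify a job as running, idle-timed-out, or over its total budget."""
    if now - started_at >= limits.timeout_sec:
        return "timeout"
    if now - last_event_at >= limits.idle_timeout_sec:
        return "idle_timeout"
    return "running"


def misaligned_jobs(jobs: Dict[str, JobTimeouts], sub_idle_timeout: int) -> List[str]:
    """Return IDs of jobs whose budget exceeds the monitor's SUB_IDLE_TIMEOUT."""
    return sorted(job_id for job_id, limits in jobs.items()
                  if limits.timeout_sec > sub_idle_timeout)
```

Running misaligned_jobs before starting the monitor makes the alignment rule checkable: any job listed could outlive the monitor that is supposed to be watching it.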

5. Setup Checklist for New Projects

Use this checklist when deploying this agent orchestration model to a new project:

  • Virtualenv Dependencies: Are required Python packages like pyyaml and paho-mqtt installed in .venv or listed in requirements.txt?
  • Configuration File: Are the MQTT broker address and security credentials safely loaded and shared via the .env file?
  • Directory Convention: Are the registry path (.mam/jobs/) and logging path (.mam/delegate_job_logs/) added to .gitignore?
  • Core Scripts: Are the core scripts (mqtt_common.py, publish_event.py, job_subscriber.py, and registry.py) in place?
  • HMAC Enablement: When a new registry job is created, is a random auth_token correctly injected, and is signature-based mutual authentication active?
  • Charter Placement: Is this protocol file (AGENT.md) placed in the root directory of the new project? (Placing it at the root is essential so that onboarding agents can recognize the rules immediately.)
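
The file-presence items in this checklist can be verified mechanically. The sketch below is a hypothetical helper (check_project is not part of the project), and it assumes the core scripts live in a directory passed as scripts_dir, which the checklist does not specify. It covers only the items that can be checked by looking at files; dependency installation and HMAC enablement still need a manual check.

```python
import os
from typing import List, Set

CORE_SCRIPTS = ("mqtt_common.py", "publish_event.py", "job_subscriber.py", "registry.py")
IGNORED_PATHS = (".mam/jobs/", ".mam/delegate_job_logs/")


def _gitignore_entries(root: str) -> Set[str]:
    """Read .gitignore entries, normalized without leading or trailing slashes."""
    path = os.path.join(root, ".gitignore")
    if not os.path.isfile(path):
        return set()
    with open(path, encoding="utf-8") as fh:
        return {line.strip().strip("/") for line in fh
                if line.strip() and not line.startswith("#")}


def check_project(root: str, scripts_dir: str = ".") -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    if not os.path.isfile(os.path.join(root, "AGENT.md")):
        problems.append("AGENT.md is missing from the project root")
    if not os.path.isfile(os.path.join(root, ".env")):
        problems.append(".env is missing")
    for name in CORE_SCRIPTS:
        if not os.path.isfile(os.path.join(root, scripts_dir, name)):
            problems.append(f"core script missing: {name}")
    ignored = _gitignore_entries(root)
    for entry in IGNORED_PATHS:
        if entry.strip("/") not in ignored:
            problems.append(f"not in .gitignore: {entry}")
    return problems
```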

This guide balances collaboration efficiency with strict code security. Any required changes must be discussed and agreed upon by the PM and all Reviewers before updating this document.