From ec92b6c3fa8c344c1392da8c3adbc3656d6ba5ba Mon Sep 17 00:00:00 2001 From: Godopu Date: Sun, 21 Jun 2026 10:07:38 +0000 Subject: [PATCH] docs: commit analysis report and instruction documents --- INSTRUCTION.md | 67 +++++++++++++ Understand_Anything_Analysis.md | 163 ++++++++++++++++++++++++++++++++ 2 files changed, 230 insertions(+) create mode 100644 INSTRUCTION.md create mode 100644 Understand_Anything_Analysis.md diff --git a/INSTRUCTION.md b/INSTRUCTION.md new file mode 100644 index 0000000..a54fc07 --- /dev/null +++ b/INSTRUCTION.md @@ -0,0 +1,67 @@ +# CLAUDE.md + +Behavioral guidelines to reduce common LLM coding mistakes. Merge with project-specific instructions as needed. + +**Tradeoff:** These guidelines bias toward caution over speed. For trivial tasks, use judgment. + +## 1. Think Before Coding + +**Don't assume. Don't hide confusion. Surface tradeoffs.** + +Before implementing: +- State your assumptions explicitly. If uncertain, ask. +- If multiple interpretations exist, present them - don't pick silently. +- If a simpler approach exists, say so. Push back when warranted. +- If something is unclear, stop. Name what's confusing. Ask. + +## 2. Simplicity First + +**Minimum code that solves the problem. Nothing speculative.** + +- No features beyond what was asked. +- No abstractions for single-use code. +- No "flexibility" or "configurability" that wasn't requested. +- No error handling for impossible scenarios. +- If you write 200 lines and it could be 50, rewrite it. + +Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify. + +## 3. Surgical Changes + +**Touch only what you must. Clean up only your own mess.** + +When editing existing code: +- Don't "improve" adjacent code, comments, or formatting. +- Don't refactor things that aren't broken. +- Match existing style, even if you'd do it differently. +- If you notice unrelated dead code, mention it - don't delete it. + +When your changes create orphans: +- Remove imports/variables/functions that YOUR changes made unused. +- Don't remove pre-existing dead code unless asked. + +The test: Every changed line should trace directly to the user's request. + +## 4. Goal-Driven Execution + +**Define success criteria. Loop until verified.** + +Transform tasks into verifiable goals: +- "Add validation" → "Write tests for invalid inputs, then make them pass" +- "Fix the bug" → "Write a test that reproduces it, then make it pass" +- "Refactor X" → "Ensure tests pass before and after" + +For multi-step tasks, state a brief plan: +``` +1. [Step] → verify: [check] +2. [Step] → verify: [check] +3. [Step] → verify: [check] +``` + +Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification. + +--- + +**These guidelines are working if:** fewer unnecessary changes in diffs, fewer rewrites due to overcomplication, and clarifying questions come before implementation rather than after mistakes. + +Read AGENT.md first before working and follow the instructions for orchestration. \ No newline at end of file diff --git a/Understand_Anything_Analysis.md b/Understand_Anything_Analysis.md new file mode 100644 index 0000000..a2a9f5b --- /dev/null +++ b/Understand_Anything_Analysis.md @@ -0,0 +1,163 @@ +# Understand-Anything: Project & Architecture Analysis Report + +This report presents a comprehensive architectural analysis and security verification of the `advanced_multi_agent` orchestration workspace. Using the static analysis principles inspired by the `Understand-Anything` pipeline, we map out the codebase structure, evaluate the integrity of the design, identify critical defects/inconsistencies between implementation and documentation, and provide concrete technical recommendations. + +--- + +## 1. Architectural Visualization + +The following diagram illustrates the interaction between the orchestrator (Hermes/PM), the worker agents running inside TMUX sessions, and the decentralized event backplane (MQTT). + +```mermaid +sequenceDiagram + autonumber + actor User as User / PM + participant Registry as Job Registry (.hermes/jobs/) + participant DB as Session Registry (SQLite WAL & YAML) + participant TMUX as Tmux Workspace (Worker Session) + participant MQTT as MQTT Broker (HiveMQ / Private) + participant Sub as Job Subscriber (job_subscriber.py) + participant Mon as Reconcile Monitor (reconcile.sh) + + User->>Registry: Register Job (registry.py register) + Registry-->>User: Return Job ID (JID) + + User->>Sub: Spawn background subscriber (job_subscriber.py --job JID) + Sub->>MQTT: Subscribe to topic (python/mqtt/jobs/JID/events) + + User->>TMUX: Create session & execute agent (create_session.sh) + TMUX->>DB: Add running session (atomic_dump_yaml) + + Note over TMUX: Agent Starts execution + TMUX->>MQTT: Publish 'started' event (publish_event.py) + MQTT->>Sub: Deliver event (QoS 1) + Sub->>Sub: Verify HMAC Signature + Sub->>Sub: Log to events.ndjson & print stdout + + Note over TMUX: Agent does work & publishes checkpoints + TMUX->>MQTT: Publish 'progress' / 'permission_required' + MQTT->>Sub: Deliver event (QoS 1) + + Note over TMUX: Agent finishes execution + TMUX->>MQTT: Publish 'completed' or 'error' (retained) + MQTT->>Sub: Deliver terminal event (QoS 1) + Sub->>Sub: Transition to Terminal State & Exit + + Note over Mon: Reconcile loop runs periodically + Mon->>MQTT: Listen for terminal events + MQTT->>Mon: Deliver terminal events + Mon->>DB: Mark session terminated, kill tmux (reconcile.sh) +``` + +--- + +## 2. Core Mechanism Deep Dive & Verification + +### 2.1 MQTT Backplane & Event Protocol +* **Wire Format**: Encoded in UTF-8 JSON matching `schema_version = 1`. It features monotonic `seq` indexing, `job_id`, `event` type, `timestamp`, `detail` description, and a `data` block for metadata. +* **QoS and Retention**: Event publishing and subscribing enforce **QoS 1 (At Least Once)** delivery. Terminal events (`completed`/`error`) utilize `retain=True` on the broker. This ensures that late-joining subscribers immediately receive the terminal state without missing the final outcome. +* **Network Handshake Isolation**: `publish_event.py` uses a short-lived connection pattern (connect, publish QoS 1, wait for PUBACK, disconnect) with exponential backoff retries. This limits long-lived socket starvation and mitigates socket exhaustion under high session concurrency. + +### 2.2 SQLite WAL Session Database +* **Database & WAL Mode**: Session metadata has been migrated from a single-point-of-contention YAML file to a SQLite database (`.hermes/agent-sessions.db`) operating in **WAL (Write-Ahead Logging)** mode. +* **Concurrency Control**: Concurrency is managed via `BEGIN IMMEDIATE` transactions in `atomic_dump_yaml()`, which blocks concurrent write attempts at the database level rather than relying on brittle file system locks. +* **YAML Synchronization**: To maintain compatibility, `agent-sessions.yaml` is updated atomically (using `tempfile.mkstemp` and `os.replace`) only when a session transitions to a terminal state (`stopped`, `terminated`, `archived`), leaving active write traffic isolated within the SQLite WAL database. +* **NFS Fallback**: If a network mount (NFS/CIFS/SSHFS) is detected, `lib.sh` automatically falls back to `PRAGMA journal_mode=DELETE` to prevent WAL serialization crashes, as NFS does not support shared-memory mapped files (`-shm`) required by WAL. + +### 2.3 HMAC-SHA256 Signature Verification +* **Signature Generation**: The publisher serializes the payload (excluding `data.hmac_sig`) into a canonical JSON string (with sorted keys and no whitespace separators) and signs it using HMAC-SHA256 with the job's secret `auth_token`. +* **Signature Verification**: `job_subscriber.py` intercepts payloads and calls `verify_hmac()`, which calculates the expected signature and compares it with the received signature using the constant-time `hmac.compare_digest` to prevent timing attacks. + +--- + +## 3. Discovered Flaws & Documentation Inconsistencies + +We have identified several critical gaps between the architecture specifications and the actual codebase implementation: + +### ⚠️ Flaw 1: Documentation Mismatch in `job-protocol.md` (Security Risk if Followed) +* **Description**: Section 4 of `job-protocol.md` states: + > *`auth_token` (the bonus field) — each job record carries a per-job `auth_token` (`secrets.token_urlsafe(32)`). The publisher copies it into `data.auth_token`; the subscriber compares it against the registry's expected token and drops mismatches.* +* **Reality in Code**: If the publisher copied the plaintext token into `data.auth_token`, it would be transmitted in plaintext across the MQTT network, exposing the secret token to any eavesdropper (especially on the public PoC broker). +* **Correction**: The code correctly implements **HMAC-SHA256 signatures** via `data.hmac_sig` and **never transmits the raw `auth_token`**. The documentation in `job-protocol.md` is obsolete and contradicts the secure implementation. + +### ⚠️ Flaw 2: Missing Automated `auth_token` Generation & CLI Support +* **Description**: Both `MESSAGING.md` and `registry.md` state that when a job is registered, a cryptographic token is automatically generated using `secrets.token_urlsafe(32)`. +* **Reality in Code**: In `registry.py`, `register_job()` accepts `auth_token: Optional[str] = None` and defaults it to `None`. No automatic token generation is implemented. Furthermore, the CLI registration parser (`registry.py register`) does not expose any `--auth-token` flag, nor does it generate one internally. As a result, **every job registered via the CLI is created with `auth_token = null`**, defaulting the system to the unauthenticated/unsecured PoC mode. + +### ⚠️ Flaw 3: Replay Attack Vulnerability for Non-Terminal Events +* **Description**: `job_subscriber.py` enforces a terminal state machine to ignore duplicate `completed`/`error` events, but it does **not validate sequence numbers (`seq`) or timestamp freshness** for non-terminal events (`progress`, `permission_required`). +* **Exploitation Vector**: An attacker sniffing network traffic (easy on HiveMQ's plaintext broker) can capture a signed `permission_required` or `progress` event and replay it repeatedly. Since the HMAC signature remains valid, `job_subscriber.py` will accept the replayed message, write it to the audit log (`events.ndjson`), and output it to stdout, potentially triggering downstream actions or corrupting the audit trail. + +### ⚠️ Flaw 4: NFS locking Vulnerability in Job Registry +* **Description**: While the session registry was successfully migrated to SQLite to circumvent NFS locking issues, the Job Registry in `.hermes/jobs/` still relies on `fcntl.flock` over a shared `.lock` file to coordinate job claims (`pick_pending`). +* **Impact**: If the project registry is located on a network-mounted file system, concurrent calls to `pick_pending` from multiple hosts could result in lock failures, leading to duplicate claims (split-brain) or corruption of the `.json` files during write operations. + +--- + +## 4. Technical Recommendations + +To address these vulnerabilities and align the codebase with the target production security standards, we recommend the following changes: + +### 1. Correct the Protocol Documentation +Update `job-protocol.md` to match the actual HMAC-SHA256 signature scheme, removing all references to transmitting the plaintext token in `data.auth_token`. + +### 2. Implement Automated Token Generation in `registry.py` +Modify `register_job` to automatically generate a cryptographically secure token when running in production mode, and add the `--auth-token` argument to the CLI. + +*Proposed change in `registry.py`*: +```python +# In registry.py:register_job +import secrets + +# Generate token if not provided (production mode default) +if auth_token is None: + # If broker is secure/private, generate a token by default + if broker.get("tls") or broker.get("username"): + auth_token = secrets.token_urlsafe(32) +``` + +### 3. Harden `job_subscriber.py` Against Replay Attacks +Implement monotonic sequence number tracking and timestamp freshness checks in `_Watcher.on_message`. + +*Proposed change in `job_subscriber.py`*: +```python +# In _Watcher inside job_subscriber.py +def __init__(self, expected_job_ids: Set[str], expected_tokens: Dict[str, Optional[str]]): + self.events = queue.Queue() + self.expected = set(expected_job_ids) + self.tokens = expected_tokens + self.last_seq: Dict[str, int] = {} # Track sequence numbers per job + +def on_message(self, _client, _userdata, msg) -> None: + # ... (after json parse and schema check) ... + jid = payload.get("job_id") + seq = payload.get("seq", 0) + + # 1. Monotonic Sequence Check + if jid in self.last_seq and seq <= self.last_seq[jid]: + logger.warning("drop replayed/duplicate event seq=%r for job %s", seq, jid) + return + + # 2. Timestamp freshness check (e.g., 60s window) + # (Optional but recommended for strict production environments) + + # ... (after HMAC verification succeeds) ... + self.last_seq[jid] = seq + # ... +``` + +### 4. Migrate the Job Registry to the SQLite DB +To eliminate NFS locking issues completely, merge the Job Registry data into the SQLite database. Define a `jobs` table with a schema similar to: +```sql +CREATE TABLE IF NOT EXISTS jobs ( + job_id TEXT PRIMARY KEY, + status TEXT, + agent_session TEXT, + created_at TEXT, + data JSON +); +``` +Replace the file-based `fcntl.flock` in `registry.py` with SQL transactions (`BEGIN IMMEDIATE`), ensuring absolute atomicity and locking security regardless of the underlying filesystem type. + +--- +*Report compiled on 2026-06-21 by Antigravity Reviewer Agent.*