
Security Shield

Real-time protection for your AI agent infrastructure

Overview

The Policy Enforcement Engine monitors every tool call and LLM interaction in real time and blocks malicious commands before they execute. It is built as a native OpenClaw plugin, with four hooks integrated into the agent runtime.

Threat Landscape

Why It Matters

AI agents face a fundamentally different attack surface than traditional software. Here's what we protect against.

Threats by likelihood, impact, and overall rating:

Indirect Prompt Injection (Likelihood: Critical, Impact: Critical, Rating: CRITICAL)

Malicious instructions hidden in external content (web pages, documents, emails) hijack the agent's behavior without any direct user interaction. An agent browsing the web or reading files can be silently redirected to exfiltrate data or execute unauthorized commands.

Tool Misuse via Prompt (Likelihood: High, Impact: Critical, Rating: CRITICAL)

Attackers craft inputs that manipulate the LLM into calling tools in unintended ways — deleting files, sending messages, or escalating privileges — while appearing to follow legitimate instructions. Unlike traditional exploits, no code vulnerability is required.

Cross-Agent Cascade Attack (Likelihood: Medium, Impact: Critical, Rating: CRITICAL)

In multi-agent systems, a compromised agent can poison messages or shared memory to corrupt other agents downstream. A single infected data source can propagate malicious instructions across an entire agent network.

Data Exfiltration via Agent (Likelihood: High, Impact: High, Rating: HIGH)

Agents with network access and file permissions are a perfect exfiltration vector. A prompt injection can instruct the agent to quietly bundle and transmit sensitive files to an external endpoint, leaving little trace.

Agent Hijacking (Likelihood: Medium, Impact: Critical, Rating: HIGH)

An attacker gains persistent control over an agent's decision-making by injecting long-term instructions into its context or memory. The agent continues operating normally from the user's perspective while executing a hidden agenda.

Memory Poisoning (Likelihood: Medium, Impact: High, Rating: HIGH)

Persistent memory stores are a high-value target. By injecting crafted content into an agent's long-term memory, attackers can influence future behavior across sessions — creating a persistent backdoor that survives restarts.

Resource Exhaustion (Likelihood: High, Impact: Medium, Rating: MEDIUM)

Agents can be tricked into spawning excessive subprocesses, making recursive tool calls, or entering infinite loops — consuming CPU, memory, and API quota until the system becomes unavailable.

Supply Chain (Plugin) Attack (Likelihood: Low, Impact: Critical, Rating: MEDIUM)

Malicious or compromised plugins can introduce backdoors, data leaks, or privilege escalation at the infrastructure level. A single rogue plugin has access to all hooks, all tool calls, and all LLM I/O.

Capabilities

Key Features

Command Blacklist

Blocks dangerous shell commands (wget, curl to external IPs, nc, rm -rf, etc.) before execution

Prompt Injection Detection

Scans every incoming message for jailbreak attempts and social engineering patterns

Audit Logging

Every tool call is logged to security-audit.jsonl with timestamp, agent, and verdict

Zero Latency

Runs synchronously in-process via four lifecycle hooks, so checks happen before execution rather than after, with no network round trip

Architecture

How It Works

The plugin hooks into four lifecycle events and intercepts every tool call and LLM message at the runtime level. It makes no external API calls, runs fully locally, and has zero dependencies.

before_tool_call: Intercepts and validates before execution

after_tool_call: Audits results and flags anomalies

llm_input: Scans incoming messages for injection patterns

llm_output: Reviews outgoing responses for data leakage
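
As a rough illustration of how the four hooks could fit together, the sketch below models a plugin in Python. Only the hook names come from this page. The class name `SecurityShieldPlugin`, the `Verdict` type, the `run_tool` helper, the method signatures, and the specific checks are all hypothetical, and the actual OpenClaw plugin API may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


@dataclass
class SecurityShieldPlugin:
    """Hypothetical plugin shape: hook names are from the page, signatures are assumed."""
    events: list = field(default_factory=list)

    def before_tool_call(self, tool: str, args: dict) -> Verdict:
        # Validate before execution: a blocked verdict means the tool never runs.
        if tool == "shell" and "rm -rf" in args.get("command", ""):
            return Verdict(False, "blacklisted command")
        return Verdict(True)

    def after_tool_call(self, tool: str, result: str) -> None:
        # Record the call once the tool has run.
        self.events.append(("after_tool_call", tool, len(result)))

    def llm_input(self, message: str) -> Verdict:
        # Scan incoming text for an illustrative injection phrase.
        if "ignore previous instructions" in message.lower():
            return Verdict(False, "possible prompt injection")
        return Verdict(True)

    def llm_output(self, response: str) -> Verdict:
        # Review outgoing text for an illustrative secret marker.
        if "BEGIN PRIVATE KEY" in response:
            return Verdict(False, "possible secret leak")
        return Verdict(True)


def run_tool(plugin, tool, args, execute):
    """Run `execute` only if before_tool_call allows it."""
    verdict = plugin.before_tool_call(tool, args)
    if not verdict.allowed:
        return f"blocked: {verdict.reason}"
    result = execute(args)
    plugin.after_tool_call(tool, result)
    return result
```

In this sketch, `run_tool(plugin, "shell", {"command": "rm -rf /"}, execute)` returns a blocked message without ever calling `execute`, which is the "before execution, not after" property described above.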

Lifecycle Hooks: 4

External Latency: 0ms

Local Execution: 100%

Audit Trail: ∞

Coverage

What Gets Protected

Six purpose-built modules, each targeting a distinct attack class — all running locally with negligible overhead.

Spotlighting

Tags all external data before it enters the LLM context, making injected instructions visible and distinguishable from legitimate system prompts.

Defends Against: Indirect Prompt Injection, Cross-Agent Cascade Attack

Overhead: Zero (pure string wrapping, no inference calls)
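
One way to picture the string wrapping is the minimal sketch below. The `spotlight` function name and the delimiter format are assumptions made for illustration; the page does not specify how the module actually tags external data.

```python
import secrets


def spotlight(content: str, source: str) -> str:
    """Wrap untrusted content in delimiters that carry a random per-call marker.

    The delimiter format is hypothetical. The random marker is there so that
    text inside `content` cannot easily forge a matching closing delimiter.
    """
    marker = secrets.token_hex(8)  # 16 hex characters
    return (
        f"<<external-{marker} source={source!r}>>\n"
        f"{content}\n"
        f"<<end-external-{marker}>>"
    )
```

A system prompt would then tell the model to treat anything between these delimiters as data rather than instructions. The wrapping itself is plain string formatting, consistent with the zero-inference overhead noted above.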

Audit Logger

Logs every tool call and alert to an append-only JSONL file with timestamps, agent IDs, and verdicts. This is your append-only audit trail.

Defends Against: Tool Misuse via Prompt, Data Exfiltration via Agent

Overhead: ~0ms (async append-only write, no blocking)
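
To make the JSONL format concrete, here is a small sketch of an append-only writer. The `log_tool_call` function and the `tool` and `args` fields are assumptions; the page only states that entries carry a timestamp, agent, and verdict. For simplicity this version writes synchronously, whereas the module is described as writing asynchronously.

```python
import json
import time


def log_tool_call(path, agent, tool, verdict, args=None):
    """Append one JSON object per line; existing lines are never rewritten."""
    record = {
        "timestamp": time.time(),  # seconds since the epoch
        "agent": agent,
        "tool": tool,              # assumed field, not confirmed by the page
        "verdict": verdict,
        "args": args or {},        # assumed field, not confirmed by the page
    }
    # Mode "a" always writes at the end of the file.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Because each line is an independent JSON object, the log can be read back line by line with `json.loads` or filtered with standard text tools.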

Permission Checker

Enforces a command blacklist at the shell level, blocking dangerous commands (wget, curl, nc, rm -rf, etc.) before they execute.

Defends Against: Tool Misuse via Prompt, Data Exfiltration via Agent, Resource Exhaustion

Overhead: <1ms (regex match only, no subprocess spawned)
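
A regex-only check of this kind might look like the sketch below. The `BLACKLIST` patterns and the `check_command` function are illustrative guesses based on the commands named above; the plugin's real rule set is not listed on this page.

```python
import re

# Illustrative patterns only; the actual blacklist is not published here.
BLACKLIST = [
    re.compile(r"\bwget\b"),
    re.compile(r"\bcurl\b"),
    re.compile(r"\bnc\b"),
    # rm with both -r and -f in the same flag group, in either order
    re.compile(r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f|\brm\s+-[a-zA-Z]*f[a-zA-Z]*r"),
]


def check_command(command: str):
    """Return (allowed, matched_pattern) without spawning any subprocess."""
    for pattern in BLACKLIST:
        if pattern.search(command):
            return False, pattern.pattern
    return True, None
```

The word boundaries keep a command such as `echo sync` from matching the `nc` rule. A pattern list like this can be evaded, for example through aliases or encoded payloads, so it is best treated as one layer among several.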

LLM Guard

Scans incoming prompts for injection patterns and outgoing responses for secrets and malicious URLs using regex-based heuristics.

Defends Against: Indirect Prompt Injection, Agent Hijacking, Data Exfiltration via Agent

Overhead: <1ms (compiled regex, no external API calls)
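
The following sketch shows the general shape of regex-based scanning in both directions. The pattern lists and the `scan_input` and `scan_output` names are hypothetical examples, not the module's actual heuristics, and malicious-URL detection is left out for brevity.

```python
import re

# Hypothetical injection phrases, compiled once at import time.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"you are now (in )?developer mode", re.I),
]

# Hypothetical secret patterns.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID format
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]


def scan_input(text: str) -> list:
    """Return the injection patterns that match an incoming prompt."""
    return [p.pattern for p in INJECTION_PATTERNS if p.search(text)]


def scan_output(text: str) -> list:
    """Return the secret patterns that match an outgoing response."""
    return [p.pattern for p in SECRET_PATTERNS if p.search(text)]
```

An empty list means nothing was flagged. Heuristics like these catch known phrasings and formats only, so a clean scan is not proof that a message is safe.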

Secure Agent Comms

Signs all cross-agent messages with HMAC-SHA256 and verifies signatures on receipt, preventing message tampering in multi-agent pipelines.

Defends Against: Cross-Agent Cascade Attack, Agent Hijacking

Overhead: <1ms (pure cryptographic hash, no I/O)
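
Signing and verification with HMAC-SHA256 can be sketched with Python's standard `hmac` module. The envelope layout, the canonical JSON serialization, and the function names are assumptions for illustration; how the module distributes and stores the shared key is not covered on this page.

```python
import hashlib
import hmac
import json


def _canonical(payload: dict) -> bytes:
    # Sorted keys and fixed separators so both sides hash identical bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_message(key: bytes, payload: dict) -> dict:
    """Wrap a payload in a hypothetical envelope carrying its HMAC-SHA256 tag."""
    signature = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_message(key: bytes, envelope: dict) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, _canonical(envelope["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["signature"])
```

Any change to the payload after signing, or a signature produced with a different key, makes `verify_message` return `False`. Using `hmac.compare_digest` rather than `==` avoids leaking timing information during the comparison.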

Memory ACL

Applies a whitelist and injection detection to every memory write, preventing persistent backdoors from surviving across sessions.

Defends Against: Memory Poisoning, Cross-Agent Cascade Attack

Overhead: <1ms (in-process check before write)
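
A write-time check along these lines could be sketched as follows. The page does not say what the whitelist covers, so this example assumes it is a set of permitted memory keys; `ALLOWED_KEYS`, the `INJECTION` pattern, and the `MemoryStore` class are all hypothetical.

```python
import re

ALLOWED_KEYS = {"user_preferences", "task_notes"}  # hypothetical whitelist
INJECTION = re.compile(
    r"ignore (all |any )?(previous|prior) instructions|system prompt", re.I
)


class MemoryStore:
    """In-memory stand-in for a persistent store, with checks on every write."""

    def __init__(self):
        self._data = {}

    def write(self, key: str, value: str) -> None:
        # Both checks run in-process before anything is stored.
        if key not in ALLOWED_KEYS:
            raise PermissionError(f"key {key!r} is not whitelisted")
        if INJECTION.search(value):
            raise ValueError("value looks like an injected instruction")
        self._data[key] = value

    def read(self, key: str):
        return self._data.get(key)
```

Rejecting the write outright, rather than storing and flagging it, means a poisoned entry never reaches the store and so cannot resurface in a later session.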
