コンテンツにスキップ

CoderClaw Threat Model v1.0

このコンテンツはまだ日本語訳がありません。

Version: 1.0-draft Last Updated: 2026-02-04 Methodology: MITRE ATLAS + Data Flow Diagrams Framework: MITRE ATLAS (Adversarial Threat Landscape for AI Systems)

This threat model is built on MITRE ATLAS, the industry-standard framework for documenting adversarial threats to AI/ML systems. ATLAS is maintained by MITRE in collaboration with the AI security community.

Key ATLAS Resources:

This is a living document maintained by the CoderClaw community. See CONTRIBUTING-THREAT-MODEL.md for guidelines on contributing:

  • Reporting new threats
  • Updating existing threats
  • Proposing attack chains
  • Suggesting mitigations

This threat model documents adversarial threats to the CoderClaw AI agent platform and ClawHub skill marketplace, using the MITRE ATLAS framework designed specifically for AI/ML systems.

ComponentIncludedNotes
CoderClaw Agent RuntimeYesCore agent execution, tool calls, sessions
GatewayYesAuthentication, routing, channel integration
Channel IntegrationsYesWhatsApp, Telegram, Discord, Signal, Slack, etc.
ClawHub MarketplaceYesSkill publishing, moderation, distribution
MCP ServersYesExternal tool providers
User DevicesPartialMobile apps, desktop clients

Nothing is explicitly out of scope for this threat model.


┌─────────────────────────────────────────────────────────────────┐
│ UNTRUSTED ZONE │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ WhatsApp │ │ Telegram │ │ Discord │ ... │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
└─────────┼────────────────┼────────────────┼──────────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ TRUST BOUNDARY 1: Channel Access │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ GATEWAY │ │
│ │ • Device Pairing (30s grace period) │ │
│ │ • AllowFrom / AllowList validation │ │
│ │ • Token/Password/Tailscale auth │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ TRUST BOUNDARY 2: Session Isolation │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ AGENT SESSIONS │ │
│ │ • Session key = agent:channel:peer │ │
│ │ • Tool policies per agent │ │
│ │ • Transcript logging │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ TRUST BOUNDARY 3: Tool Execution │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ EXECUTION SANDBOX │ │
│ │ • Docker sandbox OR Host (exec-approvals) │ │
│ │ • Node remote execution │ │
│ │ • SSRF protection (DNS pinning + IP blocking) │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ TRUST BOUNDARY 4: External Content │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ FETCHED URLs / EMAILS / WEBHOOKS │ │
│ │ • External content wrapping (XML tags) │ │
│ │ • Security notice injection │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ TRUST BOUNDARY 5: Supply Chain │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ CLAWHUB │ │
│ │ • Skill publishing (semver, SKILL.md required) │ │
│ │ • Pattern-based moderation flags │ │
│ │ • VirusTotal scanning (coming soon) │ │
│ │ • GitHub account age verification │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
FlowSourceDestinationDataProtection
F1ChannelGatewayUser messagesTLS, AllowFrom
F2GatewayAgentRouted messagesSession isolation
F3AgentToolsTool invocationsPolicy enforcement
F4AgentExternalweb_fetch requestsSSRF blocking
F5ClawHubAgentSkill codeModeration, scanning
F6AgentChannelResponsesOutput filtering

AttributeValue
ATLAS IDAML.T0006 - Active Scanning
DescriptionAttacker scans for exposed CoderClaw gateway endpoints
Attack VectorNetwork scanning, shodan queries, DNS enumeration
Affected ComponentsGateway, exposed API endpoints
Current MitigationsTailscale auth option, bind to loopback by default
Residual RiskMedium - Public gateways discoverable
RecommendationsDocument secure deployment, add rate limiting on discovery endpoints
AttributeValue
ATLAS IDAML.T0006 - Active Scanning
DescriptionAttacker probes messaging channels to identify AI-managed accounts
Attack VectorSending test messages, observing response patterns
Affected ComponentsAll channel integrations
Current MitigationsNone specific
Residual RiskLow - Limited value from discovery alone
RecommendationsConsider response timing randomization

AttributeValue
ATLAS IDAML.T0040 - AI Model Inference API Access
DescriptionAttacker intercepts pairing code during 30s grace period
Attack VectorShoulder surfing, network sniffing, social engineering
Affected ComponentsDevice pairing system
Current Mitigations30s expiry, codes sent via existing channel
Residual RiskMedium - Grace period exploitable
RecommendationsReduce grace period, add confirmation step
AttributeValue
ATLAS IDAML.T0040 - AI Model Inference API Access
DescriptionAttacker spoofs allowed sender identity in channel
Attack VectorDepends on channel - phone number spoofing, username impersonation
Affected ComponentsAllowFrom validation per channel
Current MitigationsChannel-specific identity verification
Residual RiskMedium - Some channels vulnerable to spoofing
RecommendationsDocument channel-specific risks, add cryptographic verification where possible
AttributeValue
ATLAS IDAML.T0040 - AI Model Inference API Access
DescriptionAttacker steals authentication tokens from config files
Attack VectorMalware, unauthorized device access, config backup exposure
Affected Components~/.coderclaw/credentials/, config storage
Current MitigationsFile permissions
Residual RiskHigh - Tokens stored in plaintext
RecommendationsImplement token encryption at rest, add token rotation

AttributeValue
ATLAS IDAML.T0051.000 - LLM Prompt Injection: Direct
DescriptionAttacker sends crafted prompts to manipulate agent behavior
Attack VectorChannel messages containing adversarial instructions
Affected ComponentsAgent LLM, all input surfaces
Current MitigationsPattern detection, external content wrapping
Residual RiskCritical - Detection only, no blocking; sophisticated attacks bypass
RecommendationsImplement multi-layer defense, output validation, user confirmation for sensitive actions
AttributeValue
ATLAS IDAML.T0051.001 - LLM Prompt Injection: Indirect
DescriptionAttacker embeds malicious instructions in fetched content
Attack VectorMalicious URLs, poisoned emails, compromised webhooks
Affected Componentsweb_fetch, email ingestion, external data sources
Current MitigationsContent wrapping with XML tags and security notice
Residual RiskHigh - LLM may ignore wrapper instructions
RecommendationsImplement content sanitization, separate execution contexts
AttributeValue
ATLAS IDAML.T0051.000 - LLM Prompt Injection: Direct
DescriptionAttacker manipulates tool arguments through prompt injection
Attack VectorCrafted prompts that influence tool parameter values
Affected ComponentsAll tool invocations
Current MitigationsExec approvals for dangerous commands
Residual RiskHigh - Relies on user judgment
RecommendationsImplement argument validation, parameterized tool calls
AttributeValue
ATLAS IDAML.T0043 - Craft Adversarial Data
DescriptionAttacker crafts commands that bypass approval allowlist
Attack VectorCommand obfuscation, alias exploitation, path manipulation
Affected Componentsexec-approvals.ts, command allowlist
Current MitigationsAllowlist + ask mode
Residual RiskHigh - No command sanitization
RecommendationsImplement command normalization, expand blocklist

T-PERSIST-001: Malicious Skill Installation

Section titled “T-PERSIST-001: Malicious Skill Installation”
AttributeValue
ATLAS IDAML.T0010.001 - Supply Chain Compromise: AI Software
DescriptionAttacker publishes malicious skill to ClawHub
Attack VectorCreate account, publish skill with hidden malicious code
Affected ComponentsClawHub, skill loading, agent execution
Current MitigationsGitHub account age verification, pattern-based moderation flags
Residual RiskCritical - No sandboxing, limited review
RecommendationsVirusTotal integration (in progress), skill sandboxing, community review
AttributeValue
ATLAS IDAML.T0010.001 - Supply Chain Compromise: AI Software
DescriptionAttacker compromises popular skill and pushes malicious update
Attack VectorAccount compromise, social engineering of skill owner
Affected ComponentsClawHub versioning, auto-update flows
Current MitigationsVersion fingerprinting
Residual RiskHigh - Auto-updates may pull malicious versions
RecommendationsImplement update signing, rollback capability, version pinning

T-PERSIST-003: Agent Configuration Tampering

Section titled “T-PERSIST-003: Agent Configuration Tampering”
AttributeValue
ATLAS IDAML.T0010.002 - Supply Chain Compromise: Data
DescriptionAttacker modifies agent configuration to persist access
Attack VectorConfig file modification, settings injection
Affected ComponentsAgent config, tool policies
Current MitigationsFile permissions
Residual RiskMedium - Requires local access
RecommendationsConfig integrity verification, audit logging for config changes

AttributeValue
ATLAS IDAML.T0043 - Craft Adversarial Data
DescriptionAttacker crafts skill content to evade moderation patterns
Attack VectorUnicode homoglyphs, encoding tricks, dynamic loading
Affected ComponentsClawHub moderation.ts
Current MitigationsPattern-based FLAG_RULES
Residual RiskHigh - Simple regex easily bypassed
RecommendationsAdd behavioral analysis (VirusTotal Code Insight), AST-based detection
AttributeValue
ATLAS IDAML.T0043 - Craft Adversarial Data
DescriptionAttacker crafts content that escapes XML wrapper context
Attack VectorTag manipulation, context confusion, instruction override
Affected ComponentsExternal content wrapping
Current MitigationsXML tags + security notice
Residual RiskMedium - Novel escapes discovered regularly
RecommendationsMultiple wrapper layers, output-side validation

AttributeValue
ATLAS IDAML.T0040 - AI Model Inference API Access
DescriptionAttacker enumerates available tools through prompting
Attack Vector”What tools do you have?” style queries
Affected ComponentsAgent tool registry
Current MitigationsNone specific
Residual RiskLow - Tools generally documented
RecommendationsConsider tool visibility controls
AttributeValue
ATLAS IDAML.T0040 - AI Model Inference API Access
DescriptionAttacker extracts sensitive data from session context
Attack Vector”What did we discuss?” queries, context probing
Affected ComponentsSession transcripts, context window
Current MitigationsSession isolation per sender
Residual RiskMedium - Within-session data accessible
RecommendationsImplement sensitive data redaction in context

3.7 Collection & Exfiltration (AML.TA0009, AML.TA0010)

Section titled “3.7 Collection & Exfiltration (AML.TA0009, AML.TA0010)”
AttributeValue
ATLAS IDAML.T0009 - Collection
DescriptionAttacker exfiltrates data by instructing agent to send to external URL
Attack VectorPrompt injection causing agent to POST data to attacker server
Affected Componentsweb_fetch tool
Current MitigationsSSRF blocking for internal networks
Residual RiskHigh - External URLs permitted
RecommendationsImplement URL allowlisting, data classification awareness
AttributeValue
ATLAS IDAML.T0009 - Collection
DescriptionAttacker causes agent to send messages containing sensitive data
Attack VectorPrompt injection causing agent to message attacker
Affected ComponentsMessage tool, channel integrations
Current MitigationsOutbound messaging gating
Residual RiskMedium - Gating may be bypassed
RecommendationsRequire explicit confirmation for new recipients
AttributeValue
ATLAS IDAML.T0009 - Collection
DescriptionMalicious skill harvests credentials from agent context
Attack VectorSkill code reads environment variables, config files
Affected ComponentsSkill execution environment
Current MitigationsNone specific to skills
Residual RiskCritical - Skills run with agent privileges
RecommendationsSkill sandboxing, credential isolation

T-IMPACT-001: Unauthorized Command Execution

Section titled “T-IMPACT-001: Unauthorized Command Execution”
AttributeValue
ATLAS IDAML.T0031 - Erode AI Model Integrity
DescriptionAttacker executes arbitrary commands on user system
Attack VectorPrompt injection combined with exec approval bypass
Affected ComponentsBash tool, command execution
Current MitigationsExec approvals, Docker sandbox option
Residual RiskCritical - Host execution without sandbox
RecommendationsDefault to sandbox, improve approval UX
AttributeValue
ATLAS IDAML.T0031 - Erode AI Model Integrity
DescriptionAttacker exhausts API credits or compute resources
Attack VectorAutomated message flooding, expensive tool calls
Affected ComponentsGateway, agent sessions, API provider
Current MitigationsNone
Residual RiskHigh - No rate limiting
RecommendationsImplement per-sender rate limits, cost budgets
AttributeValue
ATLAS IDAML.T0031 - Erode AI Model Integrity
DescriptionAttacker causes agent to send harmful/offensive content
Attack VectorPrompt injection causing inappropriate responses
Affected ComponentsOutput generation, channel messaging
Current MitigationsLLM provider content policies
Residual RiskMedium - Provider filters imperfect
RecommendationsOutput filtering layer, user controls

ControlImplementationEffectiveness
GitHub Account AgerequireGitHubAccountAge()Medium - Raises bar for new attackers
Path SanitizationsanitizePath()High - Prevents path traversal
File Type ValidationisTextFile()Medium - Only text files, but can still be malicious
Size Limits50MB total bundleHigh - Prevents resource exhaustion
Required SKILL.mdMandatory readmeLow security value - Informational only
Pattern ModerationFLAG_RULES in moderation.tsLow - Easily bypassed
Moderation StatusmoderationStatus fieldMedium - Manual review possible

Current patterns in moderation.ts:

// Known-bad identifiers
/(keepcold131\/ClawdAuthenticatorTool|ClawdAuthenticatorTool)/i
// Suspicious keywords
/(malware|stealer|phish|phishing|keylogger)/i
/(api[-_ ]?key|token|password|private key|secret)/i
/(wallet|seed phrase|mnemonic|crypto)/i
/(discord\.gg|webhook|hooks\.slack)/i
/(curl[^\n]+\|\s*(sh|bash))/i
/(bit\.ly|tinyurl\.com|t\.co|goo\.gl|is\.gd)/i

Limitations:

  • Only checks slug, displayName, summary, frontmatter, metadata, file paths
  • Does not analyze actual skill code content
  • Simple regex easily bypassed with obfuscation
  • No behavioral analysis
ImprovementStatusImpact
VirusTotal IntegrationIn ProgressHigh - Code Insight behavioral analysis
Community ReportingPartial (skillReports table exists)Medium
Audit LoggingPartial (auditLogs table exists)Medium
Badge SystemImplementedMedium - highlighted, official, deprecated, redactionApproved

Threat IDLikelihoodImpactRisk LevelPriority
T-EXEC-001HighCriticalCriticalP0
T-PERSIST-001HighCriticalCriticalP0
T-EXFIL-003MediumCriticalCriticalP0
T-IMPACT-001MediumCriticalHighP1
T-EXEC-002HighHighHighP1
T-EXEC-004MediumHighHighP1
T-ACCESS-003MediumHighHighP1
T-EXFIL-001MediumHighHighP1
T-IMPACT-002HighMediumHighP1
T-EVADE-001HighMediumMediumP2
T-ACCESS-001LowHighMediumP2
T-ACCESS-002LowHighMediumP2
T-PERSIST-002LowHighMediumP2

Attack Chain 1: Skill-Based Data Theft

T-PERSIST-001 → T-EVADE-001 → T-EXFIL-003
(Publish malicious skill) → (Evade moderation) → (Harvest credentials)

Attack Chain 2: Prompt Injection to RCE

T-EXEC-001 → T-EXEC-004 → T-IMPACT-001
(Inject prompt) → (Bypass exec approval) → (Execute commands)

Attack Chain 3: Indirect Injection via Fetched Content

T-EXEC-002 → T-EXFIL-001 → External exfiltration
(Poison URL content) → (Agent fetches & follows instructions) → (Data sent to attacker)

IDRecommendationAddresses
R-001Complete VirusTotal integrationT-PERSIST-001, T-EVADE-001
R-002Implement skill sandboxingT-PERSIST-001, T-EXFIL-003
R-003Add output validation for sensitive actionsT-EXEC-001, T-EXEC-002
IDRecommendationAddresses
R-004Implement rate limitingT-IMPACT-002
R-005Add token encryption at restT-ACCESS-003
R-006Improve exec approval UX and validationT-EXEC-004
R-007Implement URL allowlisting for web_fetchT-EXFIL-001
IDRecommendationAddresses
R-008Add cryptographic channel verification where possibleT-ACCESS-002
R-009Implement config integrity verificationT-PERSIST-003
R-010Add update signing and version pinningT-PERSIST-002

ATLAS IDTechnique NameCoderClaw Threats
AML.T0006Active ScanningT-RECON-001, T-RECON-002
AML.T0009CollectionT-EXFIL-001, T-EXFIL-002, T-EXFIL-003
AML.T0010.001Supply Chain: AI SoftwareT-PERSIST-001, T-PERSIST-002
AML.T0010.002Supply Chain: DataT-PERSIST-003
AML.T0031Erode AI Model IntegrityT-IMPACT-001, T-IMPACT-002, T-IMPACT-003
AML.T0040AI Model Inference API AccessT-ACCESS-001, T-ACCESS-002, T-ACCESS-003, T-DISC-001, T-DISC-002
AML.T0043Craft Adversarial DataT-EXEC-004, T-EVADE-001, T-EVADE-002
AML.T0051.000LLM Prompt Injection: DirectT-EXEC-001, T-EXEC-003
AML.T0051.001LLM Prompt Injection: IndirectT-EXEC-002
PathPurposeRisk Level
src/infra/exec-approvals.tsCommand approval logicCritical
src/gateway/auth.tsGateway authenticationCritical
src/web/inbound/access-control.tsChannel access controlCritical
src/infra/net/ssrf.tsSSRF protectionCritical
src/security/external-content.tsPrompt injection mitigationCritical
src/agents/sandbox/tool-policy.tsTool policy enforcementCritical
convex/lib/moderation.tsClawHub moderationHigh
convex/lib/skillPublish.tsSkill publishing flowHigh
src/routing/resolve-route.tsSession isolationMedium
TermDefinition
ATLASMITRE’s Adversarial Threat Landscape for AI Systems
ClawHubCoderClaw’s skill marketplace
GatewayCoderClaw’s message routing and authentication layer
MCPModel Context Protocol - tool provider interface
Prompt InjectionAttack where malicious instructions are embedded in input
SkillDownloadable extension for CoderClaw agents
SSRFServer-Side Request Forgery

This threat model is a living document. Report security issues to [email protected]