CoderClaw

Memory

CoderClaw memory is plain Markdown in the agent workspace. The files are the source of truth; the model only “remembers” what gets written to disk.

Memory search tools are provided by the active memory plugin (default: memory-core). Disable memory plugins with plugins.slots.memory = "none".
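
Expressed as a config block (the same shape as the other examples on this page), disabling the memory slot looks like:

plugins: {
  slots: {
    memory: "none"
  }
}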

Memory files (Markdown)

The default workspace layout uses two memory layers:

MEMORY.md: long-lived, curated notes
memory/YYYY-MM-DD.md: dated daily notes

These files live under the workspace (agents.defaults.workspace, default ~/.coderclaw/workspace). See Agent workspace for the full layout.

When to write memory

Automatic memory flush (pre-compaction ping)

When a session is close to auto-compaction, CoderClaw triggers a silent agentic turn that reminds the model to write durable memory before the context is compacted. The default prompts explicitly allow the model to reply, but NO_REPLY is usually the correct response, so the user never sees this turn.

This is controlled by agents.defaults.compaction.memoryFlush:

{
  agents: {
    defaults: {
      compaction: {
        reserveTokensFloor: 20000,
        memoryFlush: {
          enabled: true,
          softThresholdTokens: 4000,
          systemPrompt: "Session nearing compaction. Store durable memories now.",
          prompt: "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store.",
        },
      },
    },
  },
}

Details:

For the full compaction lifecycle, see Session management + compaction.

CoderClaw can build a small vector index over MEMORY.md and memory/*.md so semantic queries can find related notes even when wording differs.

Defaults:

Remote embeddings require an API key for the embedding provider. CoderClaw resolves keys from auth profiles, models.providers.*.apiKey, or environment variables. Codex OAuth only covers chat/completions and does not satisfy embeddings for memory search. For Gemini, use GEMINI_API_KEY or models.providers.google.apiKey. For Voyage, use VOYAGE_API_KEY or models.providers.voyage.apiKey. When using a custom OpenAI-compatible endpoint, set memorySearch.remote.apiKey (and optional memorySearch.remote.headers).

QMD backend (experimental)

Set memory.backend = "qmd" to swap the built-in SQLite indexer for QMD: a local-first search sidecar that combines BM25 + vectors + reranking. Markdown stays the source of truth; CoderClaw shells out to QMD for retrieval. Key points:

Prereqs

How the sidecar runs

Config surface (memory.qmd.*)

Example

memory: {
  backend: "qmd",
  citations: "auto",
  qmd: {
    includeDefaultMemory: true,
    update: { interval: "5m", debounceMs: 15000 },
    limits: { maxResults: 6, timeoutMs: 4000 },
    scope: {
      default: "deny",
      rules: [
        { action: "allow", match: { chatType: "direct" } },
        // Normalized session-key prefix (strips `agent:<id>:`).
        { action: "deny", match: { keyPrefix: "discord:channel:" } },
        // Raw session-key prefix (includes `agent:<id>:`).
        { action: "deny", match: { rawKeyPrefix: "agent:main:discord:" } },
      ]
    },
    paths: [
      { name: "docs", path: "~/notes", pattern: "**/*.md" }
    ]
  }
}

Citations & fallback

Additional memory paths

If you want to index Markdown files outside the default workspace layout, add explicit paths:

agents: {
  defaults: {
    memorySearch: {
      extraPaths: ["../team-docs", "/srv/shared-notes/overview.md"]
    }
  }
}

Notes:

Gemini embeddings (native)

Set the provider to gemini to use the Gemini embeddings API directly:

agents: {
  defaults: {
    memorySearch: {
      provider: "gemini",
      model: "gemini-embedding-001",
      remote: {
        apiKey: "YOUR_GEMINI_API_KEY"
      }
    }
  }
}

Notes:

If you want to use a custom OpenAI-compatible endpoint (OpenRouter, vLLM, or a proxy), you can use the remote configuration with the OpenAI provider:

agents: {
  defaults: {
    memorySearch: {
      provider: "openai",
      model: "text-embedding-3-small",
      remote: {
        baseUrl: "https://api.example.com/v1/",
        apiKey: "YOUR_OPENAI_COMPAT_API_KEY",
        headers: { "X-Custom-Header": "value" }
      }
    }
  }
}

If you don’t want to set an API key, use memorySearch.provider = "local" or set memorySearch.fallback = "none".

Fallbacks:

Batch indexing (OpenAI + Gemini + Voyage):

Why OpenAI batch is fast + cheap:

Config example:

agents: {
  defaults: {
    memorySearch: {
      provider: "openai",
      model: "text-embedding-3-small",
      fallback: "openai",
      remote: {
        batch: { enabled: true, concurrency: 2 }
      },
      sync: { watch: true }
    }
  }
}

Tools:

Local mode:

How the memory tools work

What gets indexed (and when)

Hybrid search (BM25 + vector)

When enabled, CoderClaw combines:

If full-text search is unavailable on your platform, CoderClaw falls back to vector-only search.

Why hybrid?

Vector search is great at “this means the same thing”:

But it can be weak at exact, high-signal tokens:

BM25 (full-text) is the opposite: strong at exact tokens, weaker at paraphrases. Hybrid search is the pragmatic middle ground: use both retrieval signals so you get good results for both “natural language” queries and “needle in a haystack” queries.

How we merge results (the current design)

Implementation sketch:

  1. Retrieve a candidate pool from both sides:
  2. Convert BM25 rank into a 0..1-ish score:
  3. Union candidates by chunk id and compute a weighted score:

Notes:

This isn’t “IR-theory perfect”, but it’s simple, fast, and tends to improve recall/precision on real notes. If we want to get fancier later, common next steps are Reciprocal Rank Fusion (RRF) or score normalization (min/max or z-score) before mixing.
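
As a rough illustration of that merge, here is a minimal TypeScript sketch. The type and function names are invented for this example (they are not CoderClaw's internal API), and the 0.7/0.3 defaults mirror the vectorWeight/textWeight values shown in the hybrid config later on this page.

interface Candidate {
  chunkId: string;
  vectorScore?: number;  // cosine similarity, already roughly 0..1
  bm25Rank?: number;     // 1-based rank from the full-text side
}

function mergeHybrid(
  vectorHits: Candidate[],
  keywordHits: Candidate[],
  vectorWeight = 0.7,
  textWeight = 0.3,
): { chunkId: string; score: number }[] {
  const pool = new Map<string, { vec: number; text: number }>();

  for (const hit of vectorHits) {
    pool.set(hit.chunkId, { vec: hit.vectorScore ?? 0, text: 0 });
  }
  for (const hit of keywordHits) {
    // Convert BM25 rank into a 0..1-ish score (rank 1 -> 1.0, rank 2 -> 0.5, ...).
    const textScore = 1 / (hit.bm25Rank ?? 1);
    const entry = pool.get(hit.chunkId) ?? { vec: 0, text: 0 };
    entry.text = Math.max(entry.text, textScore);
    pool.set(hit.chunkId, entry);
  }

  // Union by chunk id, weighted mix, best score first.
  return [...pool.entries()]
    .map(([chunkId, s]) => ({ chunkId, score: vectorWeight * s.vec + textWeight * s.text }))
    .sort((a, b) => b.score - a.score);
}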

Post-processing pipeline

After merging vector and keyword scores, two optional post-processing stages refine the result list before it reaches the agent:

Vector + Keyword → Weighted Merge → Temporal Decay → Sort → MMR → Top-K Results

Both stages are off by default and can be enabled independently.

MMR re-ranking (diversity)

When hybrid search returns results, multiple chunks may contain similar or overlapping content. For example, searching for “home network setup” might return five nearly identical snippets from different daily notes that all mention the same router configuration.

MMR (Maximal Marginal Relevance) re-ranks the results to balance relevance with diversity, ensuring the top results cover different aspects of the query instead of repeating the same information.

How it works:

  1. Results are scored by their original relevance (vector + BM25 weighted score).
  2. MMR iteratively selects results that maximize: λ × relevance − (1−λ) × max_similarity_to_selected.
  3. Similarity between results is measured using Jaccard text similarity on tokenized content.
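
As a concrete sketch of those three steps, the following TypeScript is illustrative only (not CoderClaw's implementation); it uses Jaccard similarity over token sets, as described above.

function jaccard(a: Set<string>, b: Set<string>): number {
  let overlap = 0;
  for (const token of a) if (b.has(token)) overlap++;
  const union = a.size + b.size - overlap;
  return union === 0 ? 0 : overlap / union;
}

interface RankedChunk {
  chunkId: string;
  score: number;        // relevance from the weighted vector + BM25 merge
  tokens: Set<string>;  // tokenized chunk content
}

function mmrRerank(candidates: RankedChunk[], topK: number, lambda = 0.7): RankedChunk[] {
  const selected: RankedChunk[] = [];
  const remaining = [...candidates];
  while (selected.length < topK && remaining.length > 0) {
    let bestIndex = 0;
    let bestValue = -Infinity;
    for (let i = 0; i < remaining.length; i++) {
      // Penalize candidates that look like something we already selected.
      const maxSimilarity = selected.length
        ? Math.max(...selected.map((s) => jaccard(remaining[i].tokens, s.tokens)))
        : 0;
      const value = lambda * remaining[i].score - (1 - lambda) * maxSimilarity;
      if (value > bestValue) {
        bestValue = value;
        bestIndex = i;
      }
    }
    selected.push(remaining.splice(bestIndex, 1)[0]);
  }
  return selected;
}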

The lambda parameter controls the trade-off:

Example — query: “home network setup”

Given these memory files:

memory/2026-02-10.md  → "Configured Omada router, set VLAN 10 for IoT devices"
memory/2026-02-08.md  → "Configured Omada router, moved IoT to VLAN 10"
memory/2026-02-05.md  → "Set up AdGuard DNS on 192.168.10.2"
memory/network.md     → "Router: Omada ER605, AdGuard: 192.168.10.2, VLAN 10: IoT"

Without MMR — top 3 results:

1. memory/2026-02-10.md  (score: 0.92)  ← router + VLAN
2. memory/2026-02-08.md  (score: 0.89)  ← router + VLAN (near-duplicate!)
3. memory/network.md     (score: 0.85)  ← reference doc

With MMR (λ=0.7) — top 3 results:

1. memory/2026-02-10.md  (score: 0.92)  ← router + VLAN
2. memory/network.md     (score: 0.85)  ← reference doc (diverse!)
3. memory/2026-02-05.md  (score: 0.78)  ← AdGuard DNS (diverse!)

The near-duplicate from Feb 8 drops out, and the agent gets three distinct pieces of information.

When to enable: If you notice memory_search returning redundant or near-duplicate snippets, especially with daily notes that often repeat similar information across days.

Temporal decay (recency boost)

Agents with daily notes accumulate hundreds of dated files over time. Without decay, a well-worded note from six months ago can outrank yesterday’s update on the same topic.

Temporal decay applies an exponential multiplier to scores based on the age of each result, so recent memories naturally rank higher while old ones fade:

decayedScore = score × e^(-λ × ageInDays)

where λ = ln(2) / halfLifeDays.
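
As a sketch (illustrative TypeScript, not CoderClaw's code), the decay multiplier is just:

function temporalDecay(score: number, ageInDays: number, halfLifeDays = 30): number {
  const lambda = Math.log(2) / halfLifeDays;    // λ = ln(2) / halfLifeDays
  return score * Math.exp(-lambda * ageInDays); // halves every halfLifeDays
}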

With the default half-life of 30 days, a 7-day-old note keeps about 85% of its score, a 30-day-old note keeps 50%, a 60-day-old note keeps 25%, and a 148-day-old note keeps only about 3%.

Evergreen files are never decayed:

Dated daily files (memory/YYYY-MM-DD.md) use the date extracted from the filename. Other sources (e.g., session transcripts) fall back to file modification time (mtime).
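
For illustration, resolving a result's age under those rules could look like the following hypothetical helper (the function name and regex are assumptions, not CoderClaw's actual code):

import { statSync } from "node:fs";
import { basename } from "node:path";

function noteAgeInDays(filePath: string, now = Date.now()): number {
  // Dated daily notes (memory/YYYY-MM-DD.md) use the date in the filename...
  const match = basename(filePath).match(/^(\d{4}-\d{2}-\d{2})\.md$/);
  const timestampMs = match
    ? Date.parse(`${match[1]}T00:00:00Z`)
    : statSync(filePath).mtimeMs;  // ...everything else falls back to mtime.
  return Math.max(0, (now - timestampMs) / 86_400_000);
}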

Example — query: “what’s Rod’s work schedule?”

Given these memory files (today is Feb 10):

memory/2025-09-15.md  → "Rod works Mon-Fri, standup at 10am, pairing at 2pm"  (148 days old)
memory/2026-02-10.md  → "Rod has standup at 14:15, 1:1 with Zeb at 14:45"    (today)
memory/2026-02-03.md  → "Rod started new team, standup moved to 14:15"        (7 days old)

Without decay:

1. memory/2025-09-15.md  (score: 0.91)  ← best semantic match, but stale!
2. memory/2026-02-10.md  (score: 0.82)
3. memory/2026-02-03.md  (score: 0.80)

With decay (halfLife=30):

1. memory/2026-02-10.md  (score: 0.82 × 1.00 = 0.82)  ← today, no decay
2. memory/2026-02-03.md  (score: 0.80 × 0.85 = 0.68)  ← 7 days, mild decay
3. memory/2025-09-15.md  (score: 0.91 × 0.03 = 0.03)  ← 148 days, nearly gone

The stale September note drops to the bottom despite having the best raw semantic match.

When to enable: If your agent has months of daily notes and you find that old, stale information outranks recent context. A half-life of 30 days works well for daily-note-heavy workflows; increase it (e.g., 90 days) if you reference older notes frequently.

Configuration

Both features are configured under memorySearch.query.hybrid:

agents: {
  defaults: {
    memorySearch: {
      query: {
        hybrid: {
          enabled: true,
          vectorWeight: 0.7,
          textWeight: 0.3,
          candidateMultiplier: 4,
          // Diversity: reduce redundant results
          mmr: {
            enabled: true,    // default: false
            lambda: 0.7       // 0 = max diversity, 1 = max relevance
          },
          // Recency: boost newer memories
          temporalDecay: {
            enabled: true,    // default: false
            halfLifeDays: 30  // score halves every 30 days
          }
        }
      }
    }
  }
}

You can enable either feature independently:

Embedding cache

CoderClaw can cache chunk embeddings in SQLite so reindexing and frequent updates (especially session transcripts) don’t re-embed unchanged text.

Config:

agents: {
  defaults: {
    memorySearch: {
      cache: {
        enabled: true,
        maxEntries: 50000
      }
    }
  }
}

Session memory search (experimental)

You can optionally index session transcripts and surface them via memory_search. This is gated behind an experimental flag.

agents: {
  defaults: {
    memorySearch: {
      experimental: { sessionMemory: true },
      sources: ["memory", "sessions"]
    }
  }
}

Notes:

Delta thresholds (defaults shown):

agents: {
  defaults: {
    memorySearch: {
      sync: {
        sessions: {
          deltaBytes: 100000,   // ~100 KB
          deltaMessages: 50     // JSONL lines
        }
      }
    }
  }
}

SQLite vector acceleration (sqlite-vec)

When the sqlite-vec extension is available, CoderClaw stores embeddings in a SQLite virtual table (vec0) and performs vector distance queries in the database. This keeps search fast without loading every embedding into JS.

Configuration (optional):

agents: {
  defaults: {
    memorySearch: {
      store: {
        vector: {
          enabled: true,
          extensionPath: "/path/to/sqlite-vec"
        }
      }
    }
  }
}

Notes:

Local embedding auto-download

Custom OpenAI-compatible endpoint example

agents: {
  defaults: {
    memorySearch: {
      provider: "openai",
      model: "text-embedding-3-small",
      remote: {
        baseUrl: "https://api.example.com/v1/",
        apiKey: "YOUR_REMOTE_API_KEY",
        headers: {
          "X-Organization": "org-id",
          "X-Project": "project-id"
        }
      }
    }
  }
}

Notes: