CoderClaw

Transcript Hygiene (Provider Fixups)

This document describes provider-specific fixes applied to transcripts before a run, when the model context is built. These are in-memory adjustments that satisfy strict provider requirements. Hygiene steps never rewrite the stored JSONL transcript on disk. A separate session-file repair pass, however, may rewrite malformed JSONL files by dropping invalid lines before the session is loaded; when a repair occurs, the original file is backed up alongside the session file.

Scope includes:

If you need transcript storage details, see:


Where this runs

All transcript hygiene is centralized in the embedded runner:

The policy uses provider, modelApi, and modelId to decide what to apply.
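As a rough sketch of that decision, the policy can be pictured as a function from `(provider, modelApi, modelId)` to a list of fixups. The type and function names below are illustrative, not CoderClaw's actual API:

```typescript
// Illustrative sketch of a fixup policy keyed on provider metadata.
// Names (FixupContext, selectFixups, the Fixup union) are assumptions.
type FixupContext = { provider: string; modelApi: string; modelId: string };

type Fixup = "sanitizeImages" | "dropMalformedToolCalls" | "markInterSession";

function selectFixups(ctx: FixupContext): Fixup[] {
  // The global rules below apply regardless of provider.
  const fixups: Fixup[] = [
    "sanitizeImages",
    "dropMalformedToolCalls",
    "markInterSession",
  ];
  // Provider-specific rules would branch on ctx.provider / ctx.modelApi /
  // ctx.modelId here (see the provider matrix below).
  return fixups;
}
```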

Separate from transcript hygiene, session files are repaired (if needed) before load:


Global rule: image sanitization

Image payloads are always sanitized (oversized base64 images are downscaled or recompressed) to prevent provider-side rejections due to size limits.

This also helps control image-driven token pressure for vision-capable models. Lower max dimensions generally reduce token usage; higher dimensions preserve detail.
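The dimension-capping part of that trade-off can be sketched as follows. This is a minimal illustration assuming a single `maxDim` limit on the longest side; the real sanitization also recompresses oversized base64 payloads:

```typescript
// Hypothetical sketch: scale an image's dimensions so the longest side
// fits within maxDim, preserving aspect ratio. Images already within the
// limit pass through untouched.
function capDimensions(
  width: number,
  height: number,
  maxDim: number,
): { width: number; height: number } {
  const longest = Math.max(width, height);
  if (longest <= maxDim) return { width, height };
  const scale = maxDim / longest;
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}
```

A smaller `maxDim` shrinks the base64 payload and the token cost of vision input; a larger one keeps more detail.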

Implementation:


Global rule: malformed tool calls

Assistant tool-call blocks that are missing both input and arguments are dropped before model context is built. This prevents provider rejections from partially persisted tool calls (for example, after a rate limit failure).
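The filter amounts to dropping tool-call blocks that carry neither field. A minimal sketch, assuming a simplified content-block shape (the real transcript types differ):

```typescript
// Sketch of the malformed-tool-call filter. The ContentBlock shape and the
// "tool_call" type tag are assumptions for illustration.
type ContentBlock = {
  type: string;
  input?: unknown;
  arguments?: unknown;
  text?: string;
};

function dropMalformedToolCalls(blocks: ContentBlock[]): ContentBlock[] {
  return blocks.filter((block) => {
    if (block.type !== "tool_call") return true; // non-tool blocks pass through
    // Keep the call only if at least one of input/arguments survived; a call
    // missing both (e.g. cut off by a rate-limit failure) is dropped.
    return block.input !== undefined || block.arguments !== undefined;
  });
}
```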

Implementation:


Global rule: inter-session input provenance

When an agent sends a prompt into another session via sessions_send (including agent-to-agent reply/announce steps), CoderClaw persists the created user turn with:

This metadata is written at transcript append time and does not change role (role: "user" remains for provider compatibility). Transcript readers can use this to avoid treating routed internal prompts as end-user-authored instructions.

During context rebuild, CoderClaw also prepends a short [Inter-session message] marker to those user turns in-memory so the model can distinguish them from external end-user instructions.
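The in-memory marker step can be sketched like this. The turn shape and the `interSession` flag name are assumptions standing in for the persisted provenance metadata:

```typescript
// Illustrative sketch: prepend the [Inter-session message] marker to routed
// user turns during context rebuild. The stored transcript is not modified;
// this returns new turn objects for the model context only.
type Turn = { role: string; content: string; interSession?: boolean };

function markInterSessionTurns(turns: Turn[]): Turn[] {
  return turns.map((turn) =>
    turn.role === "user" && turn.interSession
      ? { ...turn, content: `[Inter-session message] ${turn.content}` }
      : turn,
  );
}
```

Note that `role: "user"` is preserved throughout; only the visible content gains the marker.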


Provider matrix (current behavior)

OpenAI / OpenAI Codex

Google (Generative AI / Gemini CLI / Antigravity)

Anthropic / Minimax (Anthropic-compatible)

Mistral (including model-id based detection)

OpenRouter Gemini

Everything else


Historical behavior (pre-2026.1.22)

Before the 2026.1.22 release, CoderClaw applied multiple layers of transcript hygiene:

This layering caused cross-provider regressions (notably in openai-responses call_id|fc_id pairing). The 2026.1.22 cleanup removed the extension, centralized the logic in the runner, and made OpenAI no-touch beyond image sanitization.