CoderClaw can convert outbound replies into audio using ElevenLabs, OpenAI, or Edge TTS. It works anywhere CoderClaw can send audio; Telegram gets a round voice-note bubble.
Edge TTS (via `node-edge-tts`; the default when no API keys are set) uses Microsoft Edge's online neural TTS service through the `node-edge-tts` library. It is a hosted service (not local), uses Microsoft's endpoints, and does not require an API key. `node-edge-tts` exposes speech configuration options and output formats, but not all options are supported by the Edge service.

Because Edge TTS is a public web service without a published SLA or quota, treat it as best-effort. If you need guaranteed limits and support, use OpenAI or ElevenLabs. Microsoft's Speech REST API documents a 10-minute audio limit per request; Edge TTS does not publish limits, so assume similar or lower limits.
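One way to handle a best-effort service is to race it against a timeout and fall back to a paid provider. This is a sketch of the idea only, assuming a 30-second default timeout; `Synthesize` and `bestEffortTts` are illustrative names, not CoderClaw APIs:

```typescript
// A synthesizer takes text and resolves to an audio file path (illustrative).
type Synthesize = (text: string) => Promise<string>;

// Reject the given promise if it does not settle within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
}

// Try Edge TTS first; on timeout or error, fall back to another provider.
async function bestEffortTts(
  text: string,
  edge: Synthesize,
  fallback: Synthesize, // e.g. OpenAI or ElevenLabs
  timeoutMs = 30_000,
): Promise<string> {
  try {
    return await withTimeout(edge(text), timeoutMs);
  } catch {
    return fallback(text);
  }
}
```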
If you want OpenAI or ElevenLabs, set the matching key:

- `ELEVENLABS_API_KEY` (or `XI_API_KEY`)
- `OPENAI_API_KEY`

Edge TTS does not require an API key. If no API keys are found, CoderClaw defaults to Edge TTS (unless disabled via `messages.tts.edge.enabled=false`).
If multiple providers are configured, the selected provider is used first and the others are fallback options.
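The preference order documented below (OpenAI if a key exists, then ElevenLabs, otherwise Edge) can be sketched as a small resolver; `resolveProvider` is an illustrative name, not a CoderClaw API:

```typescript
type Provider = "openai" | "elevenlabs" | "edge";

// An explicit messages.tts.provider wins; otherwise fall back by key presence.
function resolveProvider(
  configured: Provider | undefined,
  env: Record<string, string | undefined>,
): Provider {
  if (configured) return configured;
  if (env.OPENAI_API_KEY) return "openai";
  if (env.ELEVENLABS_API_KEY || env.XI_API_KEY) return "elevenlabs";
  return "edge"; // no key required
}
```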
Auto-summary uses the configured summaryModel (or agents.defaults.model.primary),
so that provider must also be authenticated if you enable summaries.
No. Auto‑TTS is off by default. Enable it in config with
messages.tts.auto or per session with /tts always (alias: /tts on).
Edge TTS is enabled by default once TTS is on, and is used automatically when no OpenAI or ElevenLabs API keys are available.
TTS config lives under messages.tts in coderclaw.json.
Full schema is in Gateway configuration.
```json5
{
  messages: {
    tts: {
      auto: "always",
      provider: "elevenlabs",
    },
  },
}
```
```json5
{
  messages: {
    tts: {
      auto: "always",
      provider: "openai",
      summaryModel: "openai/gpt-4.1-mini",
      modelOverrides: {
        enabled: true,
      },
      openai: {
        apiKey: "openai_api_key",
        model: "gpt-4o-mini-tts",
        voice: "alloy",
      },
      elevenlabs: {
        apiKey: "elevenlabs_api_key",
        baseUrl: "https://api.elevenlabs.io",
        voiceId: "voice_id",
        modelId: "eleven_multilingual_v2",
        seed: 42,
        applyTextNormalization: "auto",
        languageCode: "en",
        voiceSettings: {
          stability: 0.5,
          similarityBoost: 0.75,
          style: 0.0,
          useSpeakerBoost: true,
          speed: 1.0,
        },
      },
    },
  },
}
```
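The `voiceSettings` values above are bounded (0..1 for `stability`, `similarityBoost`, and `style`; 0.5..2.0 for `speed`). A minimal sketch of clamping input into those ranges; the clamping behavior is an assumption for illustration, not CoderClaw's documented behavior:

```typescript
interface VoiceSettings {
  stability: number;       // 0..1
  similarityBoost: number; // 0..1
  style: number;           // 0..1
  useSpeakerBoost: boolean;
  speed: number;           // 0.5..2.0 (1.0 = normal)
}

const clamp = (v: number, lo: number, hi: number) =>
  Math.min(hi, Math.max(lo, v));

// Force every numeric field into its documented range.
function normalizeVoiceSettings(s: VoiceSettings): VoiceSettings {
  return {
    ...s,
    stability: clamp(s.stability, 0, 1),
    similarityBoost: clamp(s.similarityBoost, 0, 1),
    style: clamp(s.style, 0, 1),
    speed: clamp(s.speed, 0.5, 2.0),
  };
}
```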
```json5
{
  messages: {
    tts: {
      auto: "always",
      provider: "edge",
      edge: {
        enabled: true,
        voice: "en-US-MichelleNeural",
        lang: "en-US",
        outputFormat: "audio-24khz-48kbitrate-mono-mp3",
        rate: "+10%",
        pitch: "-5%",
      },
    },
  },
}
```
```json5
{
  messages: {
    tts: {
      edge: {
        enabled: false,
      },
    },
  },
}
```
```json5
{
  messages: {
    tts: {
      auto: "always",
      maxTextLength: 4000,
      timeoutMs: 30000,
      prefsPath: "~/.coderclaw/settings/tts.json",
    },
  },
}
```
```json5
{
  messages: {
    tts: {
      auto: "inbound",
    },
  },
}
```
```json5
{
  messages: {
    tts: {
      auto: "always",
    },
  },
}
```
Then run:

```
/tts summary off
```
- `auto`: auto-TTS mode (`off`, `always`, `inbound`, `tagged`). `inbound` only sends audio after an inbound voice note; `tagged` only sends audio when the reply includes `[[tts]]` tags.
- `enabled`: legacy toggle (doctor migrates this to `auto`).
- `mode`: `"final"` (default) or `"all"` (includes tool/block replies).
- `provider`: `"elevenlabs"`, `"openai"`, or `"edge"` (fallback is automatic). If `provider` is unset, CoderClaw prefers `openai` (if key), then `elevenlabs` (if key), otherwise `edge`.
- `summaryModel`: optional cheap model for auto-summary; defaults to `agents.defaults.model.primary`. Accepts `provider/model` or a configured model alias.
- `modelOverrides`: allow the model to emit TTS directives (on by default).
- `maxTextLength`: hard cap for TTS input (chars). `/tts audio` fails if exceeded.
- `timeoutMs`: request timeout (ms).
- `prefsPath`: override the local prefs JSON path (provider/limit/summary).
- `apiKey` values fall back to env vars (`ELEVENLABS_API_KEY`/`XI_API_KEY`, `OPENAI_API_KEY`).
- `elevenlabs.baseUrl`: override the ElevenLabs API base URL.
- `elevenlabs.voiceSettings`: `stability`, `similarityBoost`, `style` (0..1); `useSpeakerBoost` (true|false); `speed` (0.5..2.0, 1.0 = normal).
- `elevenlabs.applyTextNormalization`: `auto|on|off`.
- `elevenlabs.languageCode`: 2-letter ISO 639-1 code (e.g. `en`, `de`).
- `elevenlabs.seed`: integer 0..4294967295 (best-effort determinism).
- `edge.enabled`: allow Edge TTS usage (default true; no API key).
- `edge.voice`: Edge neural voice name (e.g. `en-US-MichelleNeural`).
- `edge.lang`: language code (e.g. `en-US`).
- `edge.outputFormat`: Edge output format (e.g. `audio-24khz-48kbitrate-mono-mp3`).
- `edge.rate` / `edge.pitch` / `edge.volume`: percent strings (e.g. `+10%`, `-5%`).
- `edge.saveSubtitles`: write JSON subtitles alongside the audio file.
- `edge.proxy`: proxy URL for Edge TTS requests.
- `edge.timeoutMs`: request timeout override (ms).

By default, the model can emit TTS directives for a single reply.
When messages.tts.auto is tagged, these directives are required to trigger audio.
When enabled, the model can emit [[tts:...]] directives to override the voice
for a single reply, plus an optional [[tts:text]]...[[/tts:text]] block to
provide expressive tags (laughter, singing cues, etc) that should only appear in
the audio.
Example reply payload:

```
Here you go.
[[tts:provider=elevenlabs voiceId=pMsXgVXv3BLzUgSXRplE model=eleven_v3 speed=1.1]]
[[tts:text]](laughs) Read the song once more.[[/tts:text]]
```
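A rough sketch of pulling such directives out of a reply. The real CoderClaw parser may differ; the `TtsDirective` shape and function name are illustrative:

```typescript
interface TtsDirective {
  params: Record<string, string>; // e.g. provider, voiceId, model, speed
  speechText?: string;            // expressive text meant only for the audio
  visibleText: string;            // reply with directives stripped
}

function parseTtsDirectives(reply: string): TtsDirective {
  const params: Record<string, string> = {};
  let speechText: string | undefined;

  // Extract the optional [[tts:text]]...[[/tts:text]] block first.
  let visible = reply.replace(
    /\[\[tts:text\]\]([\s\S]*?)\[\[\/tts:text\]\]/,
    (_, body: string) => {
      speechText = body.trim();
      return "";
    },
  );
  // Then pull key=value pairs from the [[tts:...]] directive.
  visible = visible.replace(/\[\[tts:([^\]]+)\]\]/, (_, body: string) => {
    for (const pair of body.trim().split(/\s+/)) {
      const [key, value] = pair.split("=");
      if (key && value) params[key] = value;
    }
    return "";
  });
  return { params, speechText, visibleText: visible.trim() };
}
```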
Available directive keys (when enabled):

- `provider` (`openai` | `elevenlabs` | `edge`)
- `voice` (OpenAI voice) or `voiceId` (ElevenLabs)
- `model` (OpenAI TTS model or ElevenLabs model id)
- `stability`, `similarityBoost`, `style`, `speed`, `useSpeakerBoost`
- `applyTextNormalization` (`auto|on|off`)
- `languageCode` (ISO 639-1)
- `seed`

Disable all model overrides:
```json5
{
  messages: {
    tts: {
      modelOverrides: {
        enabled: false,
      },
    },
  },
}
```
Optional allowlist (disable specific overrides while keeping tags enabled):
```json5
{
  messages: {
    tts: {
      modelOverrides: {
        enabled: true,
        allowProvider: false,
        allowSeed: false,
      },
    },
  },
}
```
Slash commands write local overrides to prefsPath (default:
~/.coderclaw/settings/tts.json, override with CODERCLAW_TTS_PREFS or
messages.tts.prefsPath).
Stored fields:

- `enabled`
- `provider`
- `maxLength` (summary threshold; default 1500 chars)
- `summarize` (default true)

These override `messages.tts.*` for that host.
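For illustration, a prefs file with the fields above might look like this (values are examples, not defaults written by CoderClaw):

```json
{
  "enabled": true,
  "provider": "openai",
  "maxLength": 1500,
  "summarize": true
}
```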
Audio formats depend on the provider:

- Telegram voice notes prefer Opus (`opus_48000_64` from ElevenLabs, `opus` from OpenAI).
- Other channels use MP3 (`mp3_44100_128` from ElevenLabs, `mp3` from OpenAI).
- Edge TTS uses `edge.outputFormat` (default `audio-24khz-48kbitrate-mono-mp3`).

Caveats:

- `node-edge-tts` accepts an `outputFormat`, but not all formats are available from the Edge service.
- Telegram's `sendVoice` accepts OGG/MP3/M4A; use OpenAI/ElevenLabs if you need guaranteed Opus voice notes.
- OpenAI/ElevenLabs output formats are fixed; Telegram expects Opus for the voice-note UX.
When enabled, CoderClaw:

- skips audio when the reply already contains media or a `MEDIA:` directive.
- summarizes long replies with `agents.defaults.model.primary` (or `summaryModel`).

If the reply exceeds `maxLength` and summary is off (or there is no API key for the summary model), audio is skipped and the normal text reply is sent.
```
Reply -> TTS enabled?
  no  -> send text
  yes -> has media / MEDIA: / short?
    yes -> send text
    no  -> length > limit?
      no  -> TTS -> attach audio
      yes -> summary enabled?
        no  -> send text
        yes -> summarize (summaryModel or agents.defaults.model.primary)
               -> TTS -> attach audio
```
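The same flow as a pure function, a sketch only: the `minLength` "short" threshold and the `ReplyInfo` shape are assumptions for illustration, not CoderClaw's actual internals:

```typescript
interface ReplyInfo {
  text: string;
  hasMedia: boolean; // reply already contains media or a MEDIA: directive
  ttsEnabled: boolean;
  limit: number; // maxLength from local prefs
  summaryEnabled: boolean;
}

type Decision = { kind: "text" } | { kind: "audio"; speech: string };

function decide(
  r: ReplyInfo,
  summarize: (t: string) => string,
  minLength = 1, // assumed "short" cutoff
): Decision {
  if (!r.ttsEnabled) return { kind: "text" };
  if (r.hasMedia || r.text.trim().length < minLength) return { kind: "text" };
  if (r.text.length <= r.limit) return { kind: "audio", speech: r.text };
  if (!r.summaryEnabled) return { kind: "text" };
  return { kind: "audio", speech: summarize(r.text) };
}
```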
There is a single command: /tts.
See Slash commands for enablement details.
Discord note: /tts is a built-in Discord command, so CoderClaw registers
/voice as the native command there. Text /tts ... still works.
```
/tts off
/tts always
/tts inbound
/tts tagged
/tts status
/tts provider openai
/tts limit 2000
/tts summary off
/tts audio Hello from CoderClaw
```
Notes:

- `commands.text` or native command registration must be enabled.
- `off|always|inbound|tagged` are per-session toggles (`/tts on` is an alias for `/tts always`).
- `limit` and `summary` are stored in local prefs, not the main config.
- `/tts audio` generates a one-off audio reply (it does not toggle TTS on).

The `tts` tool converts text to speech and returns a `MEDIA:` path. When the result is Telegram-compatible, the tool includes `[[audio_as_voice]]` so Telegram sends a voice bubble.
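A sketch of shaping such a tool result; the extension-based compatibility check is an assumption for illustration, not CoderClaw's actual logic:

```typescript
// Container formats assumed to work as a Telegram voice note.
const VOICE_FRIENDLY = new Set([".ogg", ".opus", ".mp3", ".m4a"]);

// Build the MEDIA: line and tag Telegram-compatible audio as a voice bubble.
function formatTtsResult(audioPath: string): string {
  const dot = audioPath.lastIndexOf(".");
  const ext = dot >= 0 ? audioPath.slice(dot).toLowerCase() : "";
  const lines = [`MEDIA: ${audioPath}`];
  if (VOICE_FRIENDLY.has(ext)) lines.push("[[audio_as_voice]]");
  return lines.join("\n");
}
```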
Gateway methods:

- `tts.status`
- `tts.enable`
- `tts.disable`
- `tts.convert`
- `tts.setProvider`
- `tts.providers`