Text-to-speech (TTS)

CoderClaw can convert outbound replies into audio using ElevenLabs, OpenAI, or Edge TTS. It works anywhere CoderClaw can send audio; Telegram gets a round voice-note bubble.

Supported services

- OpenAI (API key required)
- ElevenLabs (API key required)
- Edge TTS (no API key; Microsoft-hosted; used by default when no API keys are found)

Edge TTS notes

Edge TTS uses Microsoft Edge’s online neural TTS service via the node-edge-tts library. It’s a hosted service (not local), uses Microsoft’s endpoints, and does not require an API key. node-edge-tts exposes speech configuration options and output formats, but not all options are supported by the Edge service.

Because Edge TTS is a public web service without a published SLA or quota, treat it as best-effort. If you need guaranteed limits and support, use OpenAI or ElevenLabs. Microsoft’s Speech REST API documents a 10-minute audio limit per request; Edge TTS does not publish limits, so assume similar or lower limits.
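
Given that, it may be worth setting an explicit request timeout when Edge is the primary provider. A minimal sketch using the provider and timeoutMs keys from the config section below (the value is arbitrary):

{
  messages: {
    tts: {
      provider: "edge",
      timeoutMs: 30000,
    },
  },
}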

Optional keys

If you want OpenAI or ElevenLabs, provide the corresponding API key (for example via messages.tts.openai.apiKey or messages.tts.elevenlabs.apiKey, as shown in the config examples below).

Edge TTS does not require an API key. If no API keys are found, CoderClaw defaults to Edge TTS (unless disabled via messages.tts.edge.enabled=false).

If multiple providers are configured, the selected provider is used first and the others are fallback options. Auto-summary uses the configured summaryModel (or agents.defaults.model.primary), so that provider must also be authenticated if you enable summaries.

Is it enabled by default?

No. Auto‑TTS is off by default. Enable it in config with messages.tts.auto or per session with /tts always (alias: /tts on).

Edge TTS is enabled by default once TTS is on, and is used automatically when no OpenAI or ElevenLabs API keys are available.

Config

TTS config lives under messages.tts in coderclaw.json. Full schema is in Gateway configuration.

Minimal config (enable + provider)

{
  messages: {
    tts: {
      auto: "always",
      provider: "elevenlabs",
    },
  },
}

OpenAI primary with ElevenLabs fallback

{
  messages: {
    tts: {
      auto: "always",
      provider: "openai",
      summaryModel: "openai/gpt-4.1-mini",
      modelOverrides: {
        enabled: true,
      },
      openai: {
        apiKey: "openai_api_key",
        model: "gpt-4o-mini-tts",
        voice: "alloy",
      },
      elevenlabs: {
        apiKey: "elevenlabs_api_key",
        baseUrl: "https://api.elevenlabs.io",
        voiceId: "voice_id",
        modelId: "eleven_multilingual_v2",
        seed: 42,
        applyTextNormalization: "auto",
        languageCode: "en",
        voiceSettings: {
          stability: 0.5,
          similarityBoost: 0.75,
          style: 0.0,
          useSpeakerBoost: true,
          speed: 1.0,
        },
      },
    },
  },
}

Edge TTS primary (no API key)

{
  messages: {
    tts: {
      auto: "always",
      provider: "edge",
      edge: {
        enabled: true,
        voice: "en-US-MichelleNeural",
        lang: "en-US",
        outputFormat: "audio-24khz-48kbitrate-mono-mp3",
        rate: "+10%",
        pitch: "-5%",
      },
    },
  },
}

Disable Edge TTS

{
  messages: {
    tts: {
      edge: {
        enabled: false,
      },
    },
  },
}

Custom limits + prefs path

{
  messages: {
    tts: {
      auto: "always",
      maxTextLength: 4000,
      timeoutMs: 30000,
      prefsPath: "~/.coderclaw/settings/tts.json",
    },
  },
}

Only reply with audio after an inbound voice note

{
  messages: {
    tts: {
      auto: "inbound",
    },
  },
}

Disable auto-summary for long replies

{
  messages: {
    tts: {
      auto: "always",
    },
  },
}

Then run:

/tts summary off

Notes on fields

- auto: when replies get audio: always, inbound (only after an inbound voice note), or tagged (only when the model emits TTS directives).
- provider: openai, elevenlabs, or edge; other configured providers act as fallbacks.
- summaryModel: model used to auto-summarize long replies; defaults to agents.defaults.model.primary.
- maxTextLength / timeoutMs: the text length limit and the synthesis request timeout.
- prefsPath: where per-user slash-command overrides are stored (default ~/.coderclaw/settings/tts.json).

Model-driven overrides (default on)

By default, the model can emit TTS directives for a single reply. When messages.tts.auto is set to tagged, audio is only produced for replies that include these directives.
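
For example, a minimal sketch of tagged mode (audio only when the model emits directives), using the same keys as the examples above:

{
  messages: {
    tts: {
      auto: "tagged",
      modelOverrides: {
        enabled: true,
      },
    },
  },
}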

When enabled, the model can emit [[tts:...]] directives to override the voice for a single reply, plus an optional [[tts:text]]...[[/tts:text]] block providing expressive tags (laughter, singing cues, etc.) that should only appear in the audio.

Example reply payload:

Here you go.

[[tts:provider=elevenlabs voiceId=pMsXgVXv3BLzUgSXRplE model=eleven_v3 speed=1.1]]
[[tts:text]](laughs) Read the song once more.[[/tts:text]]

Available directive keys (when enabled) include provider, voiceId, model, speed, and seed; each can be disabled individually via the allowlist below.

Disable all model overrides:

{
  messages: {
    tts: {
      modelOverrides: {
        enabled: false,
      },
    },
  },
}

Optional allowlist (disable specific overrides while keeping tags enabled):

{
  messages: {
    tts: {
      modelOverrides: {
        enabled: true,
        allowProvider: false,
        allowSeed: false,
      },
    },
  },
}

Per-user preferences

Slash commands write local overrides to prefsPath (default: ~/.coderclaw/settings/tts.json, override with CODERCLAW_TTS_PREFS or messages.tts.prefsPath).

Stored fields correspond to the overrides the /tts command can set: the auto mode, provider, text length limit, and summary toggle.

These override messages.tts.* for that host.
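
A hypothetical sketch of what the prefs file might contain (the field names here are assumptions, not the documented schema):

{
  // hypothetical field names; the actual schema may differ
  auto: "inbound",
  provider: "openai",
  maxTextLength: 2000,
  summary: false,
}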

Output formats (fixed)

OpenAI/ElevenLabs formats are fixed; Telegram expects Opus for voice-note UX.

Auto-TTS behavior

When enabled, CoderClaw:

- skips audio when the reply already contains media (or a MEDIA: marker) or is very short
- converts replies within maxTextLength directly to speech
- summarizes longer replies first (using summaryModel or agents.defaults.model.primary) when auto-summary is on
- attaches the resulting audio to the outgoing reply

If the reply exceeds maxTextLength and summary is off (or the summary model has no API key), audio is skipped and the normal text reply is sent.
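
As a sketch, a config that voices short replies directly and summarizes longer ones before synthesis (the limit value is arbitrary; keys as in the examples above):

{
  messages: {
    tts: {
      auto: "always",
      maxTextLength: 1500,
      summaryModel: "openai/gpt-4.1-mini",
    },
  },
}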

Flow diagram

Reply -> TTS enabled?
  no  -> send text
  yes -> has media / MEDIA: / short?
          yes -> send text
          no  -> length > limit?
                   no  -> TTS -> attach audio
                   yes -> summary enabled?
                            no  -> send text
                            yes -> summarize (summaryModel or agents.defaults.model.primary)
                                      -> TTS -> attach audio

Slash command usage

There is a single command: /tts. See Slash commands for enablement details.

Discord note: /tts is a built-in Discord command, so CoderClaw registers /voice as the native command there. Text /tts ... still works.

/tts off
/tts always
/tts inbound
/tts tagged
/tts status
/tts provider openai
/tts limit 2000
/tts summary off
/tts audio Hello from CoderClaw

Notes:

Agent tool

The tts tool converts text to speech and returns a MEDIA: path. When the result is Telegram-compatible, the tool includes [[audio_as_voice]] so Telegram sends a voice bubble.
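
As an illustration only (the path and exact formatting are hypothetical), a Telegram-compatible tool result would carry a media path plus the voice marker:

MEDIA:/tmp/coderclaw/tts/reply.ogg
[[audio_as_voice]]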

Gateway RPC

Gateway methods: