Hugging Face Inference Providers offer OpenAI-compatible chat completions through a single router API. You get access to many models (DeepSeek, Llama, and more) with one token. CoderClaw uses the OpenAI-compatible endpoint (chat completions only); for text-to-image, embeddings, or speech use the HF inference clients directly.
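For a quick sanity check outside CoderClaw, you can call the router's chat-completions endpoint directly. A minimal sketch, assuming HF_TOKEN holds your token and using DeepSeek R1 as a placeholder model:

# Minimal OpenAI-style chat completion against the Hugging Face router.
# HF_TOKEN and the model id are placeholders; any chat model from the catalog works.
curl -s https://router.huggingface.co/v1/chat/completions \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-ai/DeepSeek-R1",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'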
| Setting | Value |
|---|---|
| Provider | huggingface |
| API key | HUGGINGFACE_HUB_TOKEN or HF_TOKEN (fine-grained token with the "Make calls to Inference Providers" permission) |
| Base URL | https://router.huggingface.co/v1 |
| Onboarding | coderclaw onboard --auth-choice huggingface-api-key |
{
  agents: {
    defaults: {
      model: { primary: "huggingface/deepseek-ai/DeepSeek-R1" },
    },
  },
}
coderclaw onboard --non-interactive \
--mode local \
--auth-choice huggingface-api-key \
--huggingface-api-key "$HF_TOKEN"
This will set huggingface/deepseek-ai/DeepSeek-R1 as the default model.
If the Gateway runs as a daemon (launchd/systemd), make sure HUGGINGFACE_HUB_TOKEN or HF_TOKEN
is available to that process (for example, in ~/.coderclaw/.env or via
env.shellEnv).
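For example, a single line in ~/.coderclaw/.env is enough (the token value is a placeholder):

# ~/.coderclaw/.env (read by the Gateway process at startup)
HUGGINGFACE_HUB_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx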
CoderClaw discovers models by calling the Inference Providers endpoint directly:
GET https://router.huggingface.co/v1/models
(Optional: send Authorization: Bearer $HUGGINGFACE_HUB_TOKEN or $HF_TOKEN for the full list; some endpoints return a subset without auth.) The response is OpenAI-style { "object": "list", "data": [ { "id": "Qwen/Qwen3-8B", "owned_by": "Qwen", ... }, ... ] }.
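To see which model ids the router returns for your token (jq is optional; drop the Authorization header to see the unauthenticated subset):

# List the model ids advertised by the router.
curl -s https://router.huggingface.co/v1/models \
  -H "Authorization: Bearer $HF_TOKEN" | jq -r '.data[].id'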
When you configure a Hugging Face API key (via onboarding, HUGGINGFACE_HUB_TOKEN, or HF_TOKEN), CoderClaw uses this GET to discover available chat-completion models. During interactive onboarding, after you enter your token you see a Default Hugging Face model dropdown populated from that list (or the built-in catalog if the request fails). At runtime (e.g. Gateway startup), when a key is present, CoderClaw again calls GET https://router.huggingface.co/v1/models to refresh the catalog. The list is merged with a built-in catalog (for metadata like context window and cost). If the request fails or no key is set, only the built-in catalog is used.
The display name comes from the endpoint's name, title, or display_name field when present; otherwise it is derived from the model id (e.g. deepseek-ai/DeepSeek-R1 → “DeepSeek R1”). You can override it with an alias:

{
  agents: {
    defaults: {
      models: {
        "huggingface/deepseek-ai/DeepSeek-R1": { alias: "DeepSeek R1 (fast)" },
        "huggingface/deepseek-ai/DeepSeek-R1:cheapest": { alias: "DeepSeek R1 (cheap)" },
      },
    },
  },
}
You can append a policy or provider suffix to any model id:

- :fastest: highest throughput; the router picks the backend, and the provider choice is locked (no interactive backend picker).
- :cheapest: lowest cost per output token; the router picks the backend, and the provider choice is locked.
- :provider: force a specific backend (e.g. :sambanova, :together).

When you select :cheapest or :fastest (e.g. in the onboarding model dropdown), the provider is locked: the router decides by cost or speed, and no optional "prefer specific backend" step is shown. You can add these as separate entries in models.providers.huggingface.models or set model.primary with the suffix. You can also set your default provider order in Inference Provider settings (no suffix = use that order).
Models you add under models.providers.huggingface.models (e.g. in models.json) are kept when config is merged, so any custom name, alias, or model options you set there are preserved.

Model refs use the form huggingface/<org>/<model> (Hub-style IDs). The list below comes from GET https://router.huggingface.co/v1/models; your catalog may include more.
Example IDs (from the inference endpoint):
| Model | Ref (prefix with huggingface/) |
|---|---|
| DeepSeek R1 | deepseek-ai/DeepSeek-R1 |
| DeepSeek V3.2 | deepseek-ai/DeepSeek-V3.2 |
| Qwen3 8B | Qwen/Qwen3-8B |
| Qwen2.5 7B Instruct | Qwen/Qwen2.5-7B-Instruct |
| Qwen3 32B | Qwen/Qwen3-32B |
| Llama 3.3 70B Instruct | meta-llama/Llama-3.3-70B-Instruct |
| Llama 3.1 8B Instruct | meta-llama/Llama-3.1-8B-Instruct |
| GPT-OSS 120B | openai/gpt-oss-120b |
| GLM 4.7 | zai-org/GLM-4.7 |
| Kimi K2.5 | moonshotai/Kimi-K2.5 |
You can append :fastest, :cheapest, or :provider (e.g. :together, :sambanova) to the model id. Set your default order in Inference Provider settings; see Inference Providers and GET https://router.huggingface.co/v1/models for the full list.
Primary DeepSeek R1 with Qwen fallback:
{
  agents: {
    defaults: {
      model: {
        primary: "huggingface/deepseek-ai/DeepSeek-R1",
        fallbacks: ["huggingface/Qwen/Qwen3-8B"],
      },
      models: {
        "huggingface/deepseek-ai/DeepSeek-R1": { alias: "DeepSeek R1" },
        "huggingface/Qwen/Qwen3-8B": { alias: "Qwen3 8B" },
      },
    },
  },
}
Qwen as default, with :cheapest and :fastest variants:
{
  agents: {
    defaults: {
      model: { primary: "huggingface/Qwen/Qwen3-8B" },
      models: {
        "huggingface/Qwen/Qwen3-8B": { alias: "Qwen3 8B" },
        "huggingface/Qwen/Qwen3-8B:cheapest": { alias: "Qwen3 8B (cheapest)" },
        "huggingface/Qwen/Qwen3-8B:fastest": { alias: "Qwen3 8B (fastest)" },
      },
    },
  },
}
DeepSeek + Llama + GPT-OSS with aliases:
{
  agents: {
    defaults: {
      model: {
        primary: "huggingface/deepseek-ai/DeepSeek-V3.2",
        fallbacks: [
          "huggingface/meta-llama/Llama-3.3-70B-Instruct",
          "huggingface/openai/gpt-oss-120b",
        ],
      },
      models: {
        "huggingface/deepseek-ai/DeepSeek-V3.2": { alias: "DeepSeek V3.2" },
        "huggingface/meta-llama/Llama-3.3-70B-Instruct": { alias: "Llama 3.3 70B" },
        "huggingface/openai/gpt-oss-120b": { alias: "GPT-OSS 120B" },
      },
    },
  },
}
Force a specific backend with :provider:
{
  agents: {
    defaults: {
      model: { primary: "huggingface/deepseek-ai/DeepSeek-R1:together" },
      models: {
        "huggingface/deepseek-ai/DeepSeek-R1:together": { alias: "DeepSeek R1 (Together)" },
      },
    },
  },
}
Multiple Qwen and DeepSeek models with policy suffixes:
{
  agents: {
    defaults: {
      model: { primary: "huggingface/Qwen/Qwen2.5-7B-Instruct:cheapest" },
      models: {
        "huggingface/Qwen/Qwen2.5-7B-Instruct": { alias: "Qwen2.5 7B" },
        "huggingface/Qwen/Qwen2.5-7B-Instruct:cheapest": { alias: "Qwen2.5 7B (cheap)" },
        "huggingface/deepseek-ai/DeepSeek-R1:fastest": { alias: "DeepSeek R1 (fast)" },
        "huggingface/meta-llama/Llama-3.1-8B-Instruct": { alias: "Llama 3.1 8B" },
      },
    },
  },
}