Documentation Index
Fetch the complete documentation index at: https://tinytalk.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
Tiny Talk gives you access to models from multiple model providers (OpenAI, Anthropic, Google, Meta, xAI, DeepSeek, Mistral, Cohere, and Qwen) served through different inference providers (OpenAI, OpenRouter, Groq, and Azure). Each model has different strengths, speeds, and credit costs.
How credits work
On plans that include AI credits (Basic AI, Standard AI, Pro AI), each agent response consumes credits based on the model used. Credits reset monthly.
| Plan | Monthly Credits |
|---|---|
| Basic AI | 2,000 |
| Standard AI | 10,000 |
| Pro AI | 40,000 |
Most models cost 1 credit per response. Mid-tier models (GPT-4o, Claude Sonnet, Gemini Pro) cost 2 credits. Higher-tier models (GPT-4, o1) cost 5 credits, and top-tier models (Claude Opus 4) cost 10 credits.
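To see how the per-response costs add up against a plan's monthly allowance, here is a small illustrative calculation (the model-to-cost mapping below is a hand-picked subset of the tables in this doc, not an exhaustive list):

```python
# Illustrative only: per-response credit costs, taken from the model tables
# in this doc (subset for the example).
CREDIT_COST = {
    "GPT-4o Mini": 1,
    "GPT-4o": 2,
    "o1": 5,
    "Claude Opus 4": 10,
}

def monthly_credits(responses_per_model):
    """Sum the credits a month of agent responses would consume."""
    return sum(CREDIT_COST[model] * count
               for model, count in responses_per_model.items())

# e.g. 1,500 responses on a 1-credit model plus 200 on GPT-4o:
usage = {"GPT-4o Mini": 1500, "GPT-4o": 200}
print(monthly_credits(usage))  # 1900 -- fits within Basic AI's 2,000 credits
```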
Bring Your Own Key (BYOK)
On legacy plans (Basic, Team, Enterprise) and some other plans, you can use your own API key instead of credits. Go to Integrations → Hub and enter your inference provider API key. The agent will use your key for all API calls, with no credit consumption on Tiny Talk’s side.
After entering a key, click Verify to test it. Tiny Talk will check the key and display the models available for it. Make sure the model you want to use for your agent appears in this list.
An OpenAI key is always required on BYOK plans — Tiny Talk uses OpenAI’s embedding model for knowledge base indexing and search, regardless of which model you choose for agent responses.
Available models
OpenAI
| Model | Credit Cost | Reasoning | Features |
|---|---|---|---|
| GPT-3.5 Turbo | 1 | — | Fast, affordable general-purpose model |
| GPT-4 | 5 | — | High-intelligence model for complex tasks |
| GPT-4 Turbo | 5 | — | Faster, cheaper GPT-4 with vision support |
| GPT-4o | 2 | — | Versatile flagship model with vision |
| GPT-4o Mini | 1 | — | Fast, affordable with vision support |
| GPT-4.1 | 2 | — | Flagship for complex tasks |
| GPT-4.1 Mini | 1 | — | Balanced intelligence, speed, and cost |
| GPT-4.1 Nano | 1 | — | Fastest, most cost-effective GPT-4.1 |
| GPT-5 | 1 | Yes | Flagship for coding, reasoning, and agentic tasks |
| GPT-5 Mini | 1 | Yes | Faster, cost-efficient version of GPT-5 |
| GPT-5 Nano | 1 | Yes | Fastest, cheapest GPT-5 variant |
| GPT-5.1 | 2 | Yes | Advanced reasoning with vision |
| GPT-5.2 | 2 | Yes | Advanced reasoning with vision |
| GPT-5.2 Chat | 1 | Yes (medium only) | Cost-efficient reasoning with vision |
| o1 | 5 | Yes | Reasoning model with chain-of-thought |
| o3-mini | 1 | Yes | Fast reasoning model |
Anthropic (via OpenRouter)
| Model | Credit Cost | Reasoning | Features |
|---|---|---|---|
| Claude Opus 4 | 10 | — | Top coding model with sustained performance |
| Claude Sonnet 4.6 | 2 | — | Latest Sonnet with enhanced coding and reasoning |
| Claude Sonnet 4 | 2 | — | Enhanced coding and reasoning with precision |
| Claude 3.7 Sonnet | 2 | — | Improved reasoning and problem-solving |
| Claude 3 Haiku | 1 | — | Fastest Anthropic model, near-instant responses |
Google (via OpenRouter)
| Model | Credit Cost | Reasoning | Features |
|---|---|---|---|
| Gemini 3.1 Pro Preview | 2 | Yes | Latest flagship with reasoning, 1M token context |
| Gemini 3 Flash Preview | 1 | Yes | Optimized for speed, 1M token context |
| Gemini 2.5 Pro | 2 | — | Advanced reasoning, coding, and math |
| Gemini 2.5 Flash | 1 | — | Fast workhorse for reasoning tasks |
| Gemma 3 27B | 1 | — | Multilingual, 140+ languages |
Meta (via OpenRouter)
| Model | Credit Cost | Reasoning | Features |
|---|---|---|---|
| Llama 4 Maverick | 1 | — | Multimodal, 12 languages |
| Llama 4 Scout | 1 | — | MoE model for multilingual tasks |
| Llama 3.3 70B Instruct | 1 | — | Multilingual dialogue, 8 languages |
| Llama 3.1 8B Instruct | 1 | — | Lightweight, low-latency |
xAI (via OpenRouter)
| Model | Credit Cost | Reasoning | Features |
|---|---|---|---|
| Grok 4.1 Fast | 1 | Yes | Fast reasoning model, 256k context |
| Grok 4 | 2 | Yes | Latest reasoning model, 256k context |
| Grok 3 Beta | 2 | — | Enterprise use cases, deep domain knowledge |
| Grok 3 Mini Beta | 1 | — | Lightweight thinking model |
Other model providers (via OpenRouter)
| Model | Credit Cost | Reasoning | Model Provider |
|---|---|---|---|
| DeepSeek-V3.2 | 1 | Yes | DeepSeek |
| DeepSeek-R1 | 1 | Yes | DeepSeek |
| DeepSeek V3 0324 | 1 | — | DeepSeek |
| Mistral Nemo | 1 | — | MistralAI |
| Mistral Small 3 | 1 | — | MistralAI |
| Command R | 1 | — | Cohere |
| Qwen3 235B A22B | 1 | — | Qwen |
| Qwen 2.5 | 1 | — | Qwen |
Groq (inference provider)
| Model | Credit Cost | Reasoning | Features |
|---|---|---|---|
| Llama 3.3 70B | 1 | — | Ultra-fast inference on Groq hardware |
| Llama 3.1 8B Instant | 1 | — | Instant responses, low latency |
| GPT-OSS 120B | 1 | Yes (low/medium/high) | Large reasoning model |
| GPT-OSS 20B | 1 | Yes (low/medium/high) | Smaller reasoning model |
Azure OpenAI (inference provider, EU data residency)
Available on Pro plan only. These models run on Azure’s European (Germany) infrastructure for data residency compliance.
| Model | Credit Cost | Reasoning | Features |
|---|---|---|---|
| GPT-4o (Azure EU/DE) | 2 | — | European data residency |
| GPT-4o Mini (Azure EU/DE) | 1 | — | European data residency |
Need a different model on Azure EU? Pro plan users can contact support to request provisioning of additional models.
Reasoning-capable models
Reasoning-capable models spend additional compute “thinking” before they answer, which improves quality on complex, multi-step questions in exchange for higher latency and credit cost. When you select one of these models, the Reasoning Effort and Reasoning Summary controls appear in your agent’s model configuration (and the temperature slider is hidden). See Reasoning configuration for details.
Models marked Yes in the Reasoning column of the tables above support these controls. They span several providers:
- OpenAI — o1, o3-mini, GPT-5, GPT-5 Mini, GPT-5 Nano, GPT-5.1, GPT-5.2, GPT-5.2 Chat
- Google — Gemini 3.1 Pro Preview, Gemini 3 Flash Preview
- xAI — Grok 4.1 Fast, Grok 4
- DeepSeek — DeepSeek-V3.2, DeepSeek-R1
- Groq — GPT-OSS 120B, GPT-OSS 20B
Available effort levels vary by model — for example, GPT-5.2 Chat only supports Medium, and GPT-OSS 120B/20B only support Low, Medium, and High. The dashboard only shows values the selected model accepts.
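The per-model effort restrictions can be pictured as a lookup table, which is roughly how the dashboard decides what to show. This is a hypothetical sketch, not Tiny Talk's actual implementation; the mapping below only encodes the examples stated above, plus an assumed full range for other reasoning models:

```python
# Hypothetical sketch of per-model effort availability. Only the
# GPT-5.2 Chat and GPT-OSS restrictions are documented; the GPT-5
# entry's full range is an assumption for illustration.
EFFORT_LEVELS = {
    "GPT-5.2 Chat": ["medium"],
    "GPT-OSS 120B": ["low", "medium", "high"],
    "GPT-OSS 20B": ["low", "medium", "high"],
    "GPT-5": ["low", "medium", "high"],  # assumed, not documented above
}

def allowed_efforts(model):
    """Return the effort values the dashboard would offer for a model."""
    return EFFORT_LEVELS.get(model, [])
```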
Selecting a model
Go to your agent’s Settings and choose a model from the dropdown. Consider:
- Cost — 1-credit models are most efficient for high-volume agents
- Quality — Premium models (GPT-5, Claude Sonnet/Opus) produce better responses for complex queries
- Speed — Groq models offer the fastest inference; Mini/Nano variants are faster than full models
- Reasoning — Reasoning-capable models deliver better answers on complex questions at the cost of latency and extra credits
- Data residency — Use Azure EU models if you need European data processing (Pro plan)
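The criteria above can be condensed into a simple decision sketch. This helper is purely illustrative shorthand for the guidance in this doc, not a Tiny Talk feature, and the suggested models are examples from the tables above:

```python
# Hypothetical helper: map a single top priority to an example model
# from this doc's tables. Illustrative only.
def suggest_model(priority):
    suggestions = {
        "cost": "GPT-4o Mini",             # 1 credit, good for high volume
        "quality": "Claude Opus 4",        # premium, complex queries
        "speed": "Llama 3.1 8B Instant",   # Groq, lowest latency
        "reasoning": "GPT-5",              # reasoning-capable flagship
        "data_residency": "GPT-4o (Azure EU/DE)",  # EU processing, Pro plan
    }
    return suggestions.get(priority, "GPT-4o")  # versatile general default
```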