Tiny Talk gives you access to models from multiple model providers (OpenAI, Anthropic, Google, Meta, xAI, DeepSeek, Mistral, Cohere, and Qwen) served through different inference providers (OpenAI, OpenRouter, Groq, and Azure). Each model has different strengths, speeds, and credit costs.
How credits work
On plans that include AI credits (Basic AI, Standard AI, Pro AI), each agent response consumes credits based on the model used. Credits reset monthly.
| Plan | Monthly Credits |
|---|---|
| Basic AI | 2,000 |
| Standard AI | 10,000 |
| Pro AI | 40,000 |
Most models cost 1 credit per response. Mid-tier models (GPT-4o, Claude Sonnet, Gemini 2.5 Pro) cost 2 credits, higher-tier models (GPT-4, o1) cost 5 credits, and top-tier models (Claude Opus 4, Claude 3 Opus) cost 10 credits. The tables below list the exact cost for each model.
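To see how plan size and model choice interact, here is an illustrative sketch (not part of Tiny Talk's product; the dictionary and function names are made up) that estimates how many responses a plan's monthly credits cover:

```python
# Illustrative only: credit costs are taken from the tables in this
# article; the helper itself is not a Tiny Talk API.

CREDIT_COST = {
    "GPT-4o Mini": 1,     # most models: 1 credit per response
    "GPT-4o": 2,          # mid-tier: 2 credits
    "GPT-4": 5,           # higher-tier: 5 credits
    "Claude Opus 4": 10,  # top-tier: 10 credits
}

def responses_per_month(plan_credits: int, model: str) -> int:
    """Number of agent responses the plan's credits cover for one model."""
    return plan_credits // CREDIT_COST[model]

# Standard AI (10,000 credits/month) with GPT-4o at 2 credits/response:
print(responses_per_month(10_000, "GPT-4o"))  # 5000
```

The same budget stretches very differently across tiers: 10,000 credits means 10,000 responses on a 1-credit model but only 1,000 on a 10-credit model like Claude Opus 4.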
Bring Your Own Key (BYOK)
On legacy plans (Basic, Team, Enterprise) and some other plans, you can use your own API key instead of consuming credits. Go to Integrations → Hub and enter your inference provider's API key. The agent then uses your key for all API calls, with no credit consumption on Tiny Talk's side.
After entering a key, click Verify to test it. Tiny Talk will check the key and display the models available for it. Make sure the model you want to use for your agent appears in this list.
An OpenAI key is always required on BYOK plans — Tiny Talk uses OpenAI’s embedding model for knowledge base indexing and search, regardless of which model you choose for agent responses.
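If you want to check a key yourself before entering it, a standalone check against OpenAI's `GET /v1/models` endpoint looks roughly like the sketch below. This approximates what a verify step does; it is not Tiny Talk's implementation, and the function name is made up:

```python
# Sketch: list the model IDs an OpenAI API key can access.
# An invalid key makes urlopen raise an HTTPError (401).
import json
import os
import urllib.request

def list_models(api_key: str) -> list[str]:
    """Return the model IDs visible to this key, sorted."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return sorted(m["id"] for m in data["data"])

# Usage (requires a real key in the environment):
# models = list_models(os.environ["OPENAI_API_KEY"])
# print("gpt-4o" in models)
```

If the model you plan to use for agent responses is missing from the returned list, your key's account may lack access to it, which is exactly what the in-product Verify check surfaces.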
Available models
OpenAI
| Model | Credit Cost | Features |
|---|---|---|
| GPT-3.5 Turbo | 1 | Fast, affordable general-purpose model |
| GPT-4 | 5 | High-intelligence model for complex tasks |
| GPT-4 Turbo | 5 | Faster, cheaper GPT-4 with vision support |
| GPT-4o | 2 | Versatile flagship model with vision |
| GPT-4o Mini | 1 | Fast, affordable with vision support |
| GPT-4.1 | 2 | Flagship for complex tasks |
| GPT-4.1 Mini | 1 | Balanced intelligence, speed, and cost |
| GPT-4.1 Nano | 1 | Fastest, most cost-effective GPT-4.1 |
| GPT-5 | 1 | Flagship for coding, reasoning, and agentic tasks |
| GPT-5 Mini | 1 | Faster, cost-efficient version of GPT-5 |
| GPT-5 Nano | 1 | Fastest, cheapest GPT-5 variant |
| GPT-5.1 | 2 | Advanced reasoning with vision |
| GPT-5.2 | 2 | Advanced reasoning with vision |
| GPT-5.2 Chat | 1 | Cost-efficient reasoning with vision |
| o1 | 5 | Reasoning model with chain-of-thought |
| o1 Mini | 1 | Fast reasoning model |
| o3-mini | 1 | Fast reasoning model |
Anthropic (via OpenRouter)
| Model | Credit Cost | Features |
|---|---|---|
| Claude Opus 4 | 10 | Top coding model with sustained performance |
| Claude Sonnet 4.6 | 2 | Latest Sonnet with enhanced coding and reasoning |
| Claude Sonnet 4 | 2 | Enhanced coding and reasoning with precision |
| Claude 3.7 Sonnet | 2 | Improved reasoning and problem-solving |
| Claude 3.5 Sonnet | 2 | Great at coding, data science, visual processing |
| Claude 3 Opus | 10 | Powerful model for highly complex tasks |
| Claude 3 Haiku | 1 | Fastest Anthropic model, near-instant responses |
Google (via OpenRouter)
| Model | Credit Cost | Features |
|---|---|---|
| Gemini 3.1 Pro Preview | 2 | Latest flagship with reasoning, 1M token context |
| Gemini 3 Pro Preview | 2 | Flagship frontier model, 1M token context |
| Gemini 3 Flash Preview | 1 | Optimized for speed, 1M token context |
| Gemini 2.5 Pro | 2 | Advanced reasoning, coding, and math |
| Gemini 2.5 Flash | 1 | Fast workhorse for reasoning tasks |
| Gemini 2.0 Flash | 1 | Fast multimodal understanding |
| Gemini 1.5 Pro | 1 | Mid-size model for wide-range reasoning |
| Gemini Flash 1.5 | 1 | Good at classification and summarization |
| Gemma 3 27B | 1 | Multilingual, 140+ languages |
Meta (via OpenRouter)
| Model | Credit Cost | Features |
|---|---|---|
| Llama 4 Maverick | 1 | Multimodal, 12 languages |
| Llama 4 Scout | 1 | MoE model for multilingual tasks |
| Llama 3.3 70B Instruct | 1 | Multilingual dialogue, 8 languages |
| Llama 3.1 8B Instruct | 1 | Lightweight, low-latency |
xAI (via OpenRouter)
| Model | Credit Cost | Features |
|---|---|---|
| Grok 4.1 Fast | 1 | Fast reasoning model, 256k context |
| Grok 4 | 2 | Latest reasoning model, 256k context |
| Grok 3 Beta | 2 | Enterprise use cases, deep domain knowledge |
| Grok 3 Mini Beta | 1 | Lightweight thinking model |
Other model providers (via OpenRouter)
| Model | Credit Cost | Model Provider |
|---|---|---|
| DeepSeek-V3.2 | 1 | DeepSeek |
| DeepSeek-R1 | 1 | DeepSeek |
| DeepSeek V3 0324 | 1 | DeepSeek |
| Mistral Nemo | 1 | MistralAI |
| Mistral Small 3 | 1 | MistralAI |
| Command R | 1 | Cohere |
| Qwen3 235B A22B | 1 | Qwen |
| Qwen 2.5 | 1 | Qwen |
Groq (inference provider)
| Model | Credit Cost | Features |
|---|---|---|
| Llama 3.3 70B | 1 | Ultra-fast inference on Groq hardware |
| Llama 3.1 8B Instant | 1 | Instant responses, low latency |
| Llama 4 Maverick | 1 | ~600 tokens/second on Groq |
| GPT-OSS 120B | 1 | Large reasoning model |
| GPT-OSS 20B | 1 | Smaller reasoning model |
Azure OpenAI (inference provider, EU data residency)
Available on the Pro plan only. These models run on Azure's European (Germany) infrastructure for data residency compliance.
| Model | Credit Cost | Features |
|---|---|---|
| GPT-4o (Azure EU/DE) | 2 | European data residency |
| GPT-4o Mini (Azure EU/DE) | 1 | European data residency |
Need a different model on Azure EU? Pro plan users can contact support to request provisioning of additional models.
Selecting a model
Go to your agent’s Settings and choose a model from the dropdown. Consider:
- Cost — 1-credit models are most efficient for high-volume agents
- Quality — Premium models (GPT-5, Claude Sonnet/Opus) produce better responses for complex queries
- Speed — Groq models offer the fastest inference; Mini/Nano variants are faster than full models
- Data residency — Use Azure EU models if you need European data processing (Pro plan)
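The checklist above can be expressed as a simple priority-to-model mapping. This is a hypothetical helper for illustration only; the priority names and model picks are examples drawn from this article's tables, not a Tiny Talk API:

```python
# Illustrative mapping from a selection priority to a candidate model.
RECOMMENDED = {
    "cost": "GPT-4o Mini",                   # 1 credit, good high-volume default
    "quality": "Claude Opus 4",              # 10 credits, complex queries
    "speed": "Llama 3.1 8B Instant",         # Groq-hosted, low latency
    "eu-residency": "GPT-4o (Azure EU/DE)",  # Pro plan only
}

def pick_model(priority: str) -> str:
    """Map one priority from the checklist to a candidate model."""
    return RECOMMENDED[priority]

print(pick_model("cost"))  # GPT-4o Mini
```

In practice you will weigh several of these at once; a 2-credit model like GPT-4o or Claude Sonnet is often the middle ground between cost and quality.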