Documentation Index

Fetch the complete documentation index at: https://tinytalk.ai/docs/llms.txt

Use this file to discover all available pages before exploring further.

Tiny Talk gives you access to models from multiple model providers (OpenAI, Anthropic, Google, Meta, xAI, DeepSeek, Mistral, Cohere, and Qwen) served through different inference providers (OpenAI, OpenRouter, Groq, and Azure). Each model has different strengths, speeds, and credit costs.

How credits work

On plans that include AI credits (Basic AI, Standard AI, Pro AI), each agent response consumes credits based on the model used. Credits reset monthly.
| Plan | Monthly Credits |
| --- | --- |
| Basic AI | 2,000 |
| Standard AI | 10,000 |
| Pro AI | 40,000 |
Most models cost 1 credit per response. Mid-tier models (GPT-4o, Claude Sonnet, Gemini Pro) cost 2 credits. Higher-tier models (GPT-4, o1) cost 5 credits, and top-tier models (Claude Opus 4) cost 10 credits.
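As a worked example of the arithmetic, here is a small sketch. The credit costs and plan allowances come from this page; the helper itself is illustrative, not part of Tiny Talk's API:

```python
# Sketch: how many agent responses a plan's monthly credits can cover.
# Per-response costs and plan allowances are taken from this page.
CREDIT_COST = {
    "GPT-4o Mini": 1,     # most models: 1 credit
    "GPT-4o": 2,          # mid-tier: 2 credits
    "GPT-4": 5,           # higher-tier: 5 credits
    "Claude Opus 4": 10,  # top-tier: 10 credits
}

PLAN_CREDITS = {"Basic AI": 2000, "Standard AI": 10000, "Pro AI": 40000}

def responses_per_month(plan: str, model: str) -> int:
    """Upper bound on agent responses before the monthly credits run out."""
    return PLAN_CREDITS[plan] // CREDIT_COST[model]
```

For example, a Standard AI plan running GPT-4o covers at most 10,000 / 2 = 5,000 responses per month, while Claude Opus 4 on the same plan covers only 1,000.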

Bring Your Own Key (BYOK)

On legacy plans (Basic, Team, Enterprise) and some other plans, you can use your own API key instead of credits. Go to Integrations → Hub and enter your inference provider API key. The agent will use your key for all API calls, with no credit consumption on Tiny Talk’s side. After entering a key, click Verify to test it. Tiny Talk will check the key and display the models available for it. Make sure the model you want to use for your agent appears in this list.
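The Verify step amounts to a models-list call against the provider. For the OpenAI provider, a minimal sketch of the same check, using OpenAI's public `GET /v1/models` endpoint (the helper names here are illustrative, not Tiny Talk's implementation):

```python
import json
import urllib.error
import urllib.request

def available_model_ids(payload: dict) -> set[str]:
    """Extract model ids from a /v1/models response body."""
    return {m["id"] for m in payload["data"]}

def verify_key(api_key: str, wanted_model: str) -> bool:
    """True if the key is accepted and wanted_model is visible to it."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 401:  # invalid or revoked key
            return False
        raise
    return wanted_model in available_model_ids(payload)
```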
An OpenAI key is always required on BYOK plans — Tiny Talk uses OpenAI’s embedding model for knowledge base indexing and search, regardless of which model you choose for agent responses.
Using your own OpenAI key? See OpenAI’s pricing page for per-token costs of each model.
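To illustrate the embedding dependency above: knowledge base indexing embeds each document through OpenAI's `POST /v1/embeddings` endpoint, and search ranks indexed chunks by vector similarity. A sketch under assumptions — the model name `text-embedding-3-small` is a guess, since this page does not say which embedding model Tiny Talk uses:

```python
import json
import math
import urllib.request

def embed(api_key: str, text: str) -> list[float]:
    """Fetch an embedding vector for one text via OpenAI's embeddings API."""
    # "text-embedding-3-small" is assumed; the page doesn't name the model.
    body = json.dumps({"model": "text-embedding-3-small", "input": text}).encode()
    req = urllib.request.Request(
        "https://api.openai.com/v1/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["data"][0]["embedding"]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Score used to rank indexed chunks against a query vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```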

Available models

OpenAI

| Model | Credit Cost | Reasoning | Features |
| --- | --- | --- | --- |
| GPT-3.5 Turbo | 1 | No | Fast, affordable general-purpose model |
| GPT-4 | 5 | No | High-intelligence model for complex tasks |
| GPT-4 Turbo | 5 | No | Faster, cheaper GPT-4 with vision support |
| GPT-4o | 2 | No | Versatile flagship model with vision |
| GPT-4o Mini | 1 | No | Fast, affordable with vision support |
| GPT-4.1 | 2 | No | Flagship for complex tasks |
| GPT-4.1 Mini | 1 | No | Balanced intelligence, speed, and cost |
| GPT-4.1 Nano | 1 | No | Fastest, most cost-effective GPT-4.1 |
| GPT-5 | 1 | Yes | Flagship for coding, reasoning, and agentic tasks |
| GPT-5 Mini | 1 | Yes | Faster, cost-efficient version of GPT-5 |
| GPT-5 Nano | 1 | Yes | Fastest, cheapest GPT-5 variant |
| GPT-5.1 | 2 | Yes | Advanced reasoning with vision |
| GPT-5.2 | 2 | Yes | Advanced reasoning with vision |
| GPT-5.2 Chat | 1 | Yes (medium only) | Cost-efficient reasoning with vision |
| o1 | 5 | Yes | Reasoning model with chain-of-thought |
| o3-mini | 1 | Yes | Fast reasoning model |

Anthropic (via OpenRouter)

| Model | Credit Cost | Reasoning | Features |
| --- | --- | --- | --- |
| Claude Opus 4 | 10 | No | Top coding model with sustained performance |
| Claude Sonnet 4.6 | 2 | No | Latest Sonnet with enhanced coding and reasoning |
| Claude Sonnet 4 | 2 | No | Enhanced coding and reasoning with precision |
| Claude 3.7 Sonnet | 2 | No | Improved reasoning and problem-solving |
| Claude 3 Haiku | 1 | No | Fastest Anthropic model, near-instant responses |

Google (via OpenRouter)

| Model | Credit Cost | Reasoning | Features |
| --- | --- | --- | --- |
| Gemini 3.1 Pro Preview | 2 | Yes | Latest flagship with reasoning, 1M token context |
| Gemini 3 Flash Preview | 1 | Yes | Optimized for speed, 1M token context |
| Gemini 2.5 Pro | 2 | No | Advanced reasoning, coding, and math |
| Gemini 2.5 Flash | 1 | No | Fast workhorse for reasoning tasks |
| Gemma 3 27B | 1 | No | Multilingual, 140+ languages |

Meta (via OpenRouter)

| Model | Credit Cost | Reasoning | Features |
| --- | --- | --- | --- |
| Llama 4 Maverick | 1 | No | Multimodal, 12 languages |
| Llama 4 Scout | 1 | No | MoE model for multilingual tasks |
| Llama 3.3 70B Instruct | 1 | No | Multilingual dialogue, 8 languages |
| Llama 3.1 8B Instruct | 1 | No | Lightweight, low-latency |

xAI (via OpenRouter)

| Model | Credit Cost | Reasoning | Features |
| --- | --- | --- | --- |
| Grok 4.1 Fast | 1 | Yes | Fast reasoning model, 256k context |
| Grok 4 | 2 | Yes | Latest reasoning model, 256k context |
| Grok 3 Beta | 2 | No | Enterprise use cases, deep domain knowledge |
| Grok 3 Mini Beta | 1 | No | Lightweight thinking model |

Other model providers (via OpenRouter)

| Model | Credit Cost | Reasoning | Model Provider |
| --- | --- | --- | --- |
| DeepSeek-V3.2 | 1 | Yes | DeepSeek |
| DeepSeek-R1 | 1 | Yes | DeepSeek |
| DeepSeek V3 0324 | 1 | No | DeepSeek |
| Mistral Nemo | 1 | No | MistralAI |
| Mistral Small 3 | 1 | No | MistralAI |
| Command R | 1 | No | Cohere |
| Qwen3 235B A22B | 1 | No | Qwen |
| Qwen 2.5 | 1 | No | Qwen |

Groq (inference provider)

| Model | Credit Cost | Reasoning | Features |
| --- | --- | --- | --- |
| Llama 3.3 70B | 1 | No | Ultra-fast inference on Groq hardware |
| Llama 3.1 8B Instant | 1 | No | Instant responses, low latency |
| GPT-OSS 120B | 1 | Yes (low/medium/high) | Large reasoning model |
| GPT-OSS 20B | 1 | Yes (low/medium/high) | Smaller reasoning model |

Azure OpenAI (inference provider, EU data residency)

Available on Pro plan only. These models run on Azure’s European (Germany) infrastructure for data residency compliance.
| Model | Credit Cost | Reasoning | Features |
| --- | --- | --- | --- |
| GPT-4o (Azure EU/DE) | 2 | No | European data residency |
| GPT-4o Mini (Azure EU/DE) | 1 | No | European data residency |
Need a different model on Azure EU? Pro plan users can contact support to request provisioning of additional models.

Reasoning-capable models

Reasoning-capable models spend additional compute “thinking” before they answer, which improves quality on complex, multi-step questions in exchange for higher latency and credit cost. When you select one of these models, the Reasoning Effort and Reasoning Summary controls appear in your agent’s model configuration (and the temperature slider is hidden). See Reasoning configuration for details. Models marked Yes in the Reasoning column of the tables above support these controls. They span several providers:
  • OpenAI — o1, o3-mini, GPT-5, GPT-5 Mini, GPT-5 Nano, GPT-5.1, GPT-5.2, GPT-5.2 Chat
  • Google — Gemini 3.1 Pro Preview, Gemini 3 Flash Preview
  • xAI — Grok 4.1 Fast, Grok 4
  • DeepSeek — DeepSeek-V3.2, DeepSeek-R1
  • Groq — GPT-OSS 120B, GPT-OSS 20B
Available effort levels vary by model — for example, GPT-5.2 Chat only supports Medium, and GPT-OSS 120B/20B only support Low, Medium, and High. The dashboard only shows values the selected model accepts.
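One way to picture that per-model restriction is a lookup from model name to allowed effort levels, with a fallback for the rest. This is an illustrative sketch, not Tiny Talk's implementation; only the overrides named on this page are grounded, and the default set is an assumption:

```python
# Effort levels quoted on this page; everything not listed here is assumed.
EFFORT_OVERRIDES = {
    "GPT-5.2 Chat": ["medium"],                  # medium only
    "GPT-OSS 120B": ["low", "medium", "high"],
    "GPT-OSS 20B": ["low", "medium", "high"],
}

DEFAULT_EFFORTS = ["low", "medium", "high"]      # assumed default set

def effort_options(model: str) -> list[str]:
    """Effort values the dashboard would offer for the selected model."""
    return EFFORT_OVERRIDES.get(model, DEFAULT_EFFORTS)
```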

Selecting a model

Go to your agent’s Settings and choose a model from the dropdown. Consider:
  • Cost — 1-credit models are most efficient for high-volume agents
  • Quality — Premium models (GPT-5, Claude Sonnet/Opus) produce better responses for complex queries
  • Speed — Groq models offer the fastest inference; Mini/Nano variants are faster than full models
  • Reasoning — Reasoning-capable models deliver better answers on complex questions at the cost of latency and extra credits
  • Data residency — Use Azure EU models if you need European data processing (Pro plan)
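The trade-offs above can be condensed into a small decision helper. Purely illustrative — the mapping encodes this page's examples, not an official recommendation:

```python
def pick_model(priority: str, eu_residency: bool = False) -> str:
    """Suggest a model for one dominant requirement (illustrative mapping)."""
    if eu_residency:                       # Azure EU models, Pro plan only
        return "GPT-4o (Azure EU/DE)"
    suggestions = {
        "cost": "GPT-4.1 Mini",            # 1 credit, balanced
        "quality": "GPT-5",                # flagship premium model
        "speed": "Llama 3.1 8B Instant",   # Groq, lowest latency
        "reasoning": "o1",                 # chain-of-thought model
    }
    return suggestions[priority]
```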