Reference notes.
Foundation models are large AI models trained on broad data that can be adapted to many downstream tasks. The term encompasses LLMs, vision models, and multimodal systems.
Commercial Models
Anthropic Claude
| Model | Context | Strengths |
|---|---|---|
| Claude Opus 4.6 | 1M | Premium intelligence, deep reasoning |
| Claude Sonnet 4.6 | 1M | Best balance of speed and capability |
| Claude Haiku 4.5 | 200K | Fastest, cost-effective |
Extended thinking for complex reasoning
Up to 64K max output tokens
Strong instruction following and tool use
Constitutional AI training
API Documentation
OpenAI GPT
| Model | Context | Strengths |
|---|---|---|
| GPT-5.2 | 400K | Flagship, strong reasoning |
| GPT-5.2 Pro | 400K | Premium tier |
| GPT-5 mini | 400K | Balanced speed and intelligence |
| GPT-5 nano | 400K | Most cost-efficient |
| o3 / o3 Pro | 200K | Deep reasoning, STEM |
100% on AIME 2025 maths benchmark
Strong function calling and tool use
API Documentation
Google Gemini
| Model | Context | Strengths |
|---|---|---|
| Gemini 3 Pro | 1M | Top-ranked, Deep Think reasoning |
| Gemini 3 Flash | 1M | Fast, 78% on SWE-bench Verified |
| Gemini 3.1 Pro | 1M | Latest preview, top of leaderboards |
Native multimodal (text, images, audio, video)
Deep Think capabilities (2.5x reasoning improvement)
Strong agentic and coding performance
API Documentation
Others
Cohere Command — Enterprise focus, RAG-optimised
Amazon Nova — AWS Bedrock integration
xAI Grok — Strong reasoning, real-time data
Open-Source Models
Llama (Meta)
| Model | Parameters | Context | Notes |
|---|---|---|---|
| Llama 4 Scout | 17B active (16 experts) | 10M | Largest context window available |
| Llama 4 Maverick | 17B active (128 experts) | 1M | Larger expert pool |
| Llama 3.3 70B | 70B | 128K | Text-only instruct |
Mixture-of-experts architecture
Native multimodal (text + images)
Llama 4 Scout has a massive 10M token context window
Permissive licence (with restrictions)
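The mixture-of-experts design above (many experts, only a few active per token) can be sketched as a toy top-k router. This is a minimal illustration, not Llama 4's actual layer: real MoE implementations add load balancing, shared experts, and batched routing.

```python
import numpy as np

def moe_forward(x, experts, gate_w, k=2):
    """Toy mixture-of-experts layer: route one input to its top-k experts.

    x: (d,) input vector; experts: list of (d, d) weight matrices;
    gate_w: (n_experts, d) gating weights.
    """
    logits = gate_w @ x                # score every expert for this input
    top = np.argsort(logits)[-k:]     # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()          # softmax over the selected experts only
    # Only k experts run, so per-token compute scales with k, not n_experts --
    # this is why a 17B-active model can sit inside a much larger total.
    return sum(w * (experts[i] @ x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate_w = rng.standard_normal((n_experts, d))
y = moe_forward(rng.standard_normal(d), experts, gate_w, k=2)
```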
Llama Downloads
Mistral
| Model | Parameters | Context | Notes |
|---|---|---|---|
| Mistral Large | ~100B | 128K | Commercial flagship |
| Mixtral 8x22B | MoE | 64K | Mixture of experts |
| Mistral OCR 3 | — | — | Document processing |
European AI company
Strong efficiency/performance ratio
Le Chat consumer product
Mistral AI
Qwen (Alibaba)
| Model | Parameters | Context | Notes |
|---|---|---|---|
| Qwen3 | 0.6B–235B | 128K | Latest generation, strong all-round |
| Qwen2.5-VL | Various | 128K | Vision-language |
| QwQ-32B | 32B | 128K | Reasoning model |
Strong multilingual (100+ languages)
Extensive model family (code, maths, embedding, reranking variants)
Qwen3-Embedding models (0.6B–8B) top MTEB multilingual leaderboard
Fully open weights with permissive licence
Qwen
DeepSeek
| Model | Parameters | Notes |
|---|---|---|
| DeepSeek-V3.2 | 671B MoE (37B active) | Latest flagship, rivals Gemini 3 Pro on reasoning |
| DeepSeek R1 | 671B MoE (37B active) | Strong reasoning, distilled variants available |
Competitive with frontier models at a fraction of training cost
Cost-efficient MoE architecture with multi-head latent attention
DeepSeek-V3.2-Speciale variant matches frontier closed models on AIME/HMMT benchmarks
Open weights under MIT licence — free for commercial use
Extremely cost-effective API ($0.28 / $0.42 per MTok, input/output)
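Per-MTok pricing like the figures above translates to per-request cost with simple arithmetic. A minimal sketch; the 4K-in / 1K-out request size is made up for illustration:

```python
def request_cost(input_tokens, output_tokens, in_per_mtok, out_per_mtok):
    """Dollar cost of one API call given per-million-token prices."""
    return (input_tokens * in_per_mtok + output_tokens * out_per_mtok) / 1_000_000

# Using the $0.28 / $0.42 per-MTok figures quoted above:
cost = request_cost(4_000, 1_000, 0.28, 0.42)  # → $0.00154 per request
```

At these rates even a million such requests costs on the order of $1.5K, which is the point of the "cost-effective" bullet.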
DeepSeek
Others
Phi-4 (Microsoft) — Small but capable, strong reasoning
Gemma 3 (Google) — Open weights, vision support, research-friendly
OLMo 2 (AI2) — Fully open including training data
Grok (xAI) — Available via API
Model Comparison Factors
Capability Benchmarks
See Evaluation & Benchmarking for details.
MMLU — Broad knowledge
HumanEval — Coding
GSM8K — Maths reasoning
GPQA — Graduate-level science
Practical Considerations
| Factor | Considerations |
|---|---|
| Latency | Time to first token, tokens/second |
| Cost | Per-token pricing, volume discounts |
| Context | How much text can be processed |
| Reliability | Uptime, consistency |
| Privacy | Data handling, compliance |
| Ecosystem | SDKs, documentation, support |
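The two latency considerations in the table combine into a rough end-to-end estimate for a streamed completion: time to first token plus generation time at the decode rate. The numbers below are illustrative, not measurements of any particular model:

```python
def response_time(ttft_s, output_tokens, tokens_per_second):
    """Rough end-to-end latency of a streamed completion:
    time to first token + output_tokens / decode rate."""
    return ttft_s + output_tokens / tokens_per_second

# e.g. 0.5 s TTFT, 500 output tokens at 100 tok/s:
response_time(0.5, 500, 100)  # → 5.5 seconds
```

This is why TTFT dominates for short answers while tokens/second dominates for long generations.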
Licence Types
Proprietary API — No access to weights (GPT-5.2, Claude)
Gated open — Weights available with restrictions (Llama 4)
Permissive open — Few restrictions (Mistral, Qwen3, DeepSeek)
Fully open — Weights, code, and training data (OLMo)
API Providers
Model Providers
Direct from the source:
Aggregators / Routers
Access multiple models through one API:
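Most aggregators expose an OpenAI-compatible `/chat/completions` interface, so switching models is just changing a string. A sketch of the request body only (no network call); the `"vendor/model-name"` ID is a placeholder, since model IDs are router-specific:

```python
import json

def chat_request(model, prompt, max_tokens=256):
    """Build the JSON body for an OpenAI-compatible chat completion call,
    the de facto interface most routers accept."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Same body, any model behind the router -- only the model string changes.
body = json.dumps(chat_request("vendor/model-name", "Summarise this document."))
```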
See Model Serving for self-hosted inference and deployment options.
Choosing a Model
Decision Framework
Task requirements — What capability is most important?
Latency needs — Real-time vs batch processing
Cost constraints — Budget per million tokens
Privacy requirements — Can data leave your environment?
Context needs — How much text per request?
Compliance — Regulatory requirements
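The framework above can be applied mechanically: treat each question as a filter over a model catalogue. A minimal sketch; the catalogue entries and numbers below are made up for illustration, not live specs:

```python
def shortlist(models, *, needs_private=False, min_context=0, max_in_price=None):
    """Filter a model catalogue by the decision-framework constraints."""
    out = []
    for m in models:
        if needs_private and not m["open_weights"]:
            continue                      # privacy: data must stay in-house
        if m["context"] < min_context:
            continue                      # context window too small
        if max_in_price is not None and m["in_per_mtok"] > max_in_price:
            continue                      # over budget per million input tokens
        out.append(m["name"])
    return out

catalogue = [  # illustrative entries only
    {"name": "hosted-flagship", "open_weights": False, "context": 1_000_000, "in_per_mtok": 3.0},
    {"name": "open-moe",        "open_weights": True,  "context": 128_000,   "in_per_mtok": 0.3},
]
shortlist(catalogue, needs_private=True)  # → ["open-moe"]
```

Ranking the survivors by capability benchmarks is then a separate, second step.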
Rules of Thumb
Start with a capable model (Claude Sonnet 4.6, GPT-5.2, Gemini 3 Pro)
Optimise for cost/speed once it works (Haiku/mini/nano/Flash variants)
Open models for privacy-sensitive use cases (Llama 4, Qwen3, DeepSeek-V3.2)
LLM API prices dropped ~80% from 2025 to 2026 — re-evaluate cost assumptions
Smaller models for high-volume, simple tasks
Staying Current
The landscape changes rapidly. Track developments:
Resources