Reference notes.
Foundation models are large AI models trained on broad data that can be adapted to many downstream tasks. The term encompasses large language models (LLMs), vision models, and multimodal systems.
Commercial Models
Anthropic Claude
| Model | Context | Strengths |
| --- | --- | --- |
| Claude Opus 4.7 | 1M | Latest flagship, strongest coding and long-running agent tasks |
| Claude Opus 4.6 | 1M | Previous flagship, deep reasoning |
| Claude Sonnet 4.6 | 1M | Best balance of speed and capability |
| Claude Haiku 4.5 | 200K | Fastest, cost-effective |
Extended thinking for complex reasoning
Up to 64K max output tokens
Strong instruction following and tool use
Constitutional AI training
API Documentation
OpenAI GPT
| Model | Context | Strengths |
| --- | --- | --- |
| GPT-5.5 | 1M (API) / 400K (Codex) | Latest flagship, default in ChatGPT (May 2026) |
| GPT-5.2 | 400K | Strong reasoning, enterprise coding (“Garlic”) |
| GPT-5.2 Pro | 400K | Premium tier |
| GPT-5 mini | 400K | Balanced speed and intelligence |
| GPT-5 nano | 400K | Most cost-efficient |
| o3 / o3 Pro | 200K | Deep reasoning, STEM |
100% on AIME 2025 maths benchmark
Strong function calling and tool use
API Documentation
Google Gemini
| Model | Context | Strengths |
| --- | --- | --- |
| Gemini 3.1 Pro | 1M | Current flagship (Feb 2026), top of leaderboards |
| Gemini 3 Flash | 1M | Fast, 78% on SWE-bench Verified |
| Gemini 3 Pro | 1M | Deprecated March 2026, replaced by 3.1 Pro |
Native multimodal (text, images, audio, video)
Deep Think capabilities (2.5x reasoning improvement)
Strong agentic and coding performance
API Documentation
Others
Cohere Command — Enterprise focus, RAG-optimised
Amazon Nova — AWS Bedrock integration
xAI Grok — Strong reasoning, real-time data
Open-Source Models
| Model | Parameters | Context | Notes |
| --- | --- | --- | --- |
| Llama 4 Scout | 109B total / 17B active (16 experts) | 10M | Largest context window available, iRoPE |
| Llama 4 Maverick | 400B total / 17B active (128 experts) | 1M | Larger expert pool |
| Llama 3.3 70B | 70B dense | 128K | Text-only instruct |
Mixture-of-experts architecture (Llama 4)
Native multimodal (text + images)
Llama 4 Scout uses Interleaved RoPE (iRoPE) — trained at 256K, extrapolates to 10M
Permissive licence (with restrictions)
Llama Downloads
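The total and active parameter counts quoted for the Llama 4 models above can be compared directly. The following is a minimal sketch, not an official calculation: the function name `active_fraction` and the dictionary layout are illustrative assumptions, and the figures are simply copied from these notes.

```python
# Parameter counts in billions (total, active), copied from the Llama 4 figures above.
MOE_MODELS = {
    "Llama 4 Scout": (109, 17),
    "Llama 4 Maverick": (400, 17),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the weights used per token in a sparse mixture-of-experts model."""
    if total_b <= 0 or not 0 < active_b <= total_b:
        raise ValueError("expected 0 < active <= total")
    return active_b / total_b


if __name__ == "__main__":
    for name, (total_b, active_b) in MOE_MODELS.items():
        share = active_fraction(total_b, active_b)
        print(f"{name}: {share:.1%} of weights active per token")
```

As a rough reading, the active count drives per-token compute, while the total count still determines how much memory the full set of weights needs.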
Mistral
| Model | Parameters | Context | Notes |
| --- | --- | --- | --- |
| Mistral Large 3 | 675B total / 41B active (MoE) | 128K | Flagship sparse MoE, trained from scratch (Dec 2025) |
| Ministral 3 (3B/8B/14B) | Dense | 128K | Small models, vision baked in, base/instruct/reasoning variants |
| Mistral Small 4 | 119B total / 6B active | 128K | Compact MoE (Mar 2026) |
| Mistral OCR 3 | — | — | Document processing, improved on handwriting (Jan 2026) |
European AI company
Strong efficiency/performance ratio
40+ languages, vision native across Mistral 3 family
Le Chat consumer product
Mistral AI
Qwen (Alibaba)
| Model | Parameters | Context | Notes |
| --- | --- | --- | --- |
| Qwen 3.6-27B | 27B dense | 262K | Latest dense, strong coding (April 2026) |
| Qwen 3.6-35B-A3B | 35B total / 3B active (MoE) | 262K | Latest MoE, runs on consumer GPUs |
| Qwen3-235B-A22B | 235B total / 22B active (MoE) | 128K | Previous flagship MoE |
| Qwen2.5-VL | Various | 128K | Vision-language |
| QwQ-32B | 32B | 128K | Reasoning model |
Strong multilingual (100+ languages)
Extensive model family (code, maths, embedding, reranking variants)
Qwen3-Embedding-8B tops MTEB multilingual leaderboard
Apache 2.0 open weights
Qwen
DeepSeek
| Model | Parameters | Notes |
| --- | --- | --- |
| DeepSeek-V3.2 | 671B MoE (37B active) | Latest flagship (Dec 2025), rivals Gemini 3.x on reasoning |
| DeepSeek R1 | 671B MoE (37B active) | Strong reasoning, distilled variants available |
Competitive with frontier models at a fraction of training cost
Cost-efficient MoE architecture with multi-head latent attention
DeepSeek-V3.2-Speciale variant matches frontier closed models on AIME/HMMT benchmarks
Open weights under MIT licence — free for commercial use
Extremely cost-effective API ($0.28 input / $0.42 output per MTok)
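Per-MTok pricing is easier to reason about once it is converted into a per-request figure. The sketch below assumes the two numbers quoted above are the input and output prices per million tokens, in that order; the function name `request_cost` and the token counts in the example are hypothetical.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Cost of one request, given prices per million tokens (MTok)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * input_price_per_mtok
        + output_tokens * output_price_per_mtok
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Assumption: 0.28 is the input price and 0.42 the output price per MTok.
    # The 8,000 / 1,000 token split is an illustrative workload, not a measurement.
    cost = request_cost(8_000, 1_000, 0.28, 0.42)
    print(f"cost per request: {cost:.6f}")
    print(f"cost per 100,000 requests: {cost * 100_000:.2f}")
```

Under those assumptions a request with 8,000 input tokens and 1,000 output tokens costs about 0.0027, so input volume dominates the bill for prompt-heavy workloads.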
DeepSeek
Others
Phi-4 (Microsoft) — Small but capable, strong reasoning
Gemma 3 (Google) — Open weights, vision support, research-friendly
OLMo 2 (AI2) — Fully open including training data
Grok (xAI) — Available via API
GPT-OSS 20B / 120B (OpenAI, Aug 2025) — OpenAI’s first open-weight release since GPT-2, Apache 2.0. Reasoning-tuned via the o-series recipe; gpt-oss-120b matches or beats o4-mini on competition coding, maths, and tool use while being deployable on-device.
Model Comparison Factors
Capability Benchmarks
See Evaluation & Benchmarking for details.
MMLU — Broad knowledge
HumanEval — Coding
GSM8K — Maths reasoning
GPQA — Graduate-level science
Practical Considerations
| Factor | Considerations |
| --- | --- |
| Latency | Time to first token, tokens/second |
| Cost | Per-token pricing, volume discounts |
| Context | How much text can be processed |
| Reliability | Uptime, consistency |
| Privacy | Data handling, compliance |
| Ecosystem | SDKs, documentation, support |
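The two latency measures above, time to first token and tokens per second, combine into a simple estimate of how long a full response takes. This is a back-of-the-envelope sketch that assumes a constant generation rate; the function name and the example numbers are hypothetical, not measurements of any model.

```python
def estimated_latency_s(
    time_to_first_token_s: float,
    output_tokens: int,
    tokens_per_second: float,
) -> float:
    """Approximate end-to-end latency: wait for the first token, then stream the rest."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if time_to_first_token_s < 0 or output_tokens < 0:
        raise ValueError("inputs must be non-negative")
    return time_to_first_token_s + output_tokens / tokens_per_second


if __name__ == "__main__":
    # Hypothetical figures: 0.5 s to first token, 500 output tokens at 50 tokens/s.
    print(estimated_latency_s(0.5, 500, 50.0))
```

For short replies the time to first token dominates, while for long generations throughput matters more, which is why both numbers are worth tracking.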
Licence Types
Proprietary API — No access to weights (GPT-5.5, Claude)
Gated open — Weights available with restrictions (Llama 4)
Permissive open — Few restrictions (Mistral 3, Qwen 3.6, DeepSeek)
Fully open — Weights, code, and training data (OLMo)
API Providers
Model Providers
Direct from the source:
Aggregators / Routers
Access multiple models through one API:
See Model Serving for self-hosted inference and deployment options.
Choosing a Model
Decision Framework
Task requirements — What capability is most important?
Latency needs — Real-time vs batch processing
Cost constraints — Budget per million tokens
Privacy requirements — Can data leave your environment?
Context needs — How much text per request?
Compliance — Regulatory requirements
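Several of the criteria above are hard constraints and can be applied as a filter before comparing quality. The sketch below is one possible encoding, not a prescribed method: the `Candidate` and `Requirements` classes, their fields, and the placeholder candidates are all hypothetical, and real values would come from provider documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    context_tokens: int
    price_per_mtok: float  # blended price per million tokens (hypothetical)
    self_hostable: bool    # True if weights can run inside your environment


@dataclass(frozen=True)
class Requirements:
    min_context_tokens: int
    max_price_per_mtok: float
    data_must_stay_local: bool


def shortlist(candidates: list[Candidate], req: Requirements) -> list[Candidate]:
    """Keep only candidates that satisfy every hard constraint."""
    return [
        c
        for c in candidates
        if c.context_tokens >= req.min_context_tokens
        and c.price_per_mtok <= req.max_price_per_mtok
        and (c.self_hostable or not req.data_must_stay_local)
    ]


if __name__ == "__main__":
    # Placeholder candidates with made-up numbers, for illustration only.
    candidates = [
        Candidate("hosted-large", 1_000_000, 10.0, False),
        Candidate("open-medium", 128_000, 0.5, True),
        Candidate("open-small", 32_000, 0.1, True),
    ]
    req = Requirements(
        min_context_tokens=100_000,
        max_price_per_mtok=2.0,
        data_must_stay_local=True,
    )
    for c in shortlist(candidates, req):
        print(c.name)
```

Task capability and latency are left out on purpose: they usually need measurement on your own workload rather than a lookup.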
Rules of Thumb
Start with a capable model (Claude Sonnet 4.6, GPT-5.5, Gemini 3.1 Pro)
Optimise for cost/speed once it works (Haiku/mini/nano/Flash variants)
Open models for privacy-sensitive use cases (Llama 4, Qwen 3.6, DeepSeek-V3.2)
LLM API prices dropped ~80% from 2025 to 2026 — re-evaluate cost assumptions
Smaller models for high-volume, simple tasks
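One way to act on these rules is to route each request to a model tier. The sketch below is a deliberately crude heuristic under stated assumptions: the tier names, the `needs_reasoning` flag, and the 500-character threshold are hypothetical placeholders that would need tuning against real traffic.

```python
# Placeholder tier-to-model mapping; substitute the models you have validated.
TIERS = {
    "small": "small-model-placeholder",
    "large": "large-model-placeholder",
}


def pick_tier(prompt: str, needs_reasoning: bool, max_simple_chars: int = 500) -> str:
    """Send short, simple requests to the small tier and everything else to the large one."""
    if needs_reasoning or len(prompt) > max_simple_chars:
        return "large"
    return "small"


if __name__ == "__main__":
    requests = [
        ("Classify this ticket as billing or technical.", False),
        ("Plan a multi-step migration of our database schema.", True),
    ]
    for prompt, needs_reasoning in requests:
        tier = pick_tier(prompt, needs_reasoning)
        print(f"{TIERS[tier]} <- {prompt}")
```

Starting with everything on the large tier and moving request types down only after checking output quality keeps this consistent with the first two rules.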
Staying Current
The landscape changes rapidly. Track developments:
Resources