Reference notes.

Foundation models are large AI models trained on broad data that can be adapted to many downstream tasks. The term encompasses LLMs, vision models, and multimodal systems.

Commercial Models

Anthropic Claude

Model | Context | Strengths
Claude Opus 4.7 | 1M | Latest flagship, strongest coding and long-running agent tasks
Claude Opus 4.6 | 1M | Previous flagship, deep reasoning
Claude Sonnet 4.6 | 1M | Best balance of speed and capability
Claude Haiku 4.5 | 200K | Fastest, cost-effective
  • Extended thinking for complex reasoning
  • Up to 64K max output tokens
  • Strong instruction following and tool use
  • Constitutional AI training
  • API Documentation

OpenAI GPT

Model | Context | Strengths
GPT-5.5 | 1M (API) / 400K (Codex) | Latest flagship, default in ChatGPT (May 2026)
GPT-5.2 | 400K | Strong reasoning, enterprise coding (“Garlic”)
GPT-5.2 Pro | 400K | Premium tier
GPT-5 mini | 400K | Balanced speed and intelligence
GPT-5 nano | 400K | Most cost-efficient
o3 / o3 Pro | 200K | Deep reasoning, STEM
  • 100% on AIME 2025 maths benchmark
  • Strong function calling and tool use
  • API Documentation

Google Gemini

Model | Context | Strengths
Gemini 3.1 Pro | 1M | Current flagship (Feb 2026), top of leaderboards
Gemini 3 Flash | 1M | Fast, 78% on SWE-bench Verified
Gemini 3 Pro | 1M | Deprecated March 2026, replaced by 3.1 Pro
  • Native multimodal (text, images, audio, video)
  • Deep Think capabilities (2.5x reasoning improvement)
  • Strong agentic and coding performance
  • API Documentation

Others

  • Cohere Command — Enterprise focus, RAG-optimised
  • Amazon Nova — AWS Bedrock integration
  • xAI Grok — Strong reasoning, real-time data

Open-Source Models

Meta Llama

Model | Parameters | Context | Notes
Llama 4 Scout | 109B total / 17B active (16 experts) | 10M | Largest context window available, iRoPE
Llama 4 Maverick | 400B total / 17B active (128 experts) | 1M | Larger expert pool
Llama 3.3 70B | 70B dense | 128K | Text-only instruct
  • Mixture-of-experts architecture (Llama 4)
  • Native multimodal (text + images)
  • Llama 4 Scout uses Interleaved RoPE (iRoPE) — trained at 256K, extrapolates to 10M
  • Permissive licence (with restrictions)
  • Llama Downloads
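
The "total / active" figures for the Llama 4 models reflect the mixture-of-experts design: only a subset of the weights is used for any given token. As a rough illustration, the sketch below computes that share from the parameter counts listed above. The helper name `active_fraction` is made up for this example and is not part of any Llama tooling.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Return the share of parameters used per token (both counts in billions)."""
    if total_params_b <= 0 or not 0 < active_params_b <= total_params_b:
        raise ValueError("expected 0 < active <= total")
    return active_params_b / total_params_b


# (total, active) parameter counts in billions, as listed in the table above.
LLAMA_4 = {
    "Llama 4 Scout": (109, 17),
    "Llama 4 Maverick": (400, 17),
}

if __name__ == "__main__":
    for name, (total, active) in LLAMA_4.items():
        share = active_fraction(total, active)
        print(f"{name}: {share:.1%} of weights active per token")
```

Per-token compute scales with the active count, but all of the weights still have to be held in memory, so the total figure is the one that drives hardware sizing.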

Mistral

Model | Parameters | Context | Notes
Mistral Large 3 | 675B total / 41B active (MoE) | 128K | Flagship sparse MoE, trained from scratch (Dec 2025)
Ministral 3 (3B/8B/14B) | Dense | 128K | Small models, vision baked in, base/instruct/reasoning variants
Mistral Small 4 | 119B total / 6B active | 128K | Compact MoE (Mar 2026)
Mistral OCR 3 | - | - | Document processing, improved on handwriting (Jan 2026)
  • European AI company
  • Strong efficiency/performance ratio
  • 40+ languages, vision native across Mistral 3 family
  • Le Chat consumer product
  • Mistral AI

Qwen (Alibaba)

Model | Parameters | Context | Notes
Qwen 3.6-27B | 27B dense | 262K | Latest dense, strong coding (April 2026)
Qwen 3.6-35B-A3B | 35B total / 3B active (MoE) | 262K | Latest MoE, runs on consumer GPUs
Qwen3-235B-A22B | 235B total / 22B active (MoE) | 128K | Previous flagship MoE
Qwen2.5-VL | Various | 128K | Vision-language
QwQ-32B | 32B | 128K | Reasoning model
  • Strong multilingual (100+ languages)
  • Extensive model family (code, maths, embedding, reranking variants)
  • Qwen3-Embedding-8B tops MTEB multilingual leaderboard
  • Apache 2.0 open weights
  • Qwen

DeepSeek

Model | Parameters | Notes
DeepSeek-V3.2 | 671B MoE (37B active) | Latest flagship (Dec 2025), rivals Gemini 3.x on reasoning
DeepSeek R1 | 671B MoE (37B active) | Strong reasoning, distilled variants available
  • Competitive with frontier models at a fraction of training cost
  • Cost-efficient MoE architecture with multi-head latent attention
  • DeepSeek-V3.2-Speciale variant matches frontier closed models on AIME/HMMT benchmarks
  • Open weights under MIT licence — free for commercial use
  • Extremely cost-effective API ($0.42 per MTok)
  • DeepSeek

Others

  • Phi-4 (Microsoft) — Small but capable, strong reasoning
  • Gemma 3 (Google) — Open weights, vision support, research-friendly
  • OLMo 2 (AI2) — Fully open including training data
  • Grok (xAI) — Available via API
  • GPT-OSS 20B / 120B (OpenAI, Aug 2025) — OpenAI’s first open-weight release since GPT-2, Apache 2.0. Reasoning-tuned via the o-series recipe; gpt-oss-120b matches or beats o4-mini on competition coding, maths, and tool use while being deployable on-device.

Model Comparison Factors

Capability Benchmarks

See Evaluation & Benchmarking for details.

  • MMLU — Broad knowledge
  • HumanEval — Coding
  • GSM8K — Maths reasoning
  • GPQA — Graduate-level science

Practical Considerations

Factor | Considerations
Latency | Time to first token, tokens/second
Cost | Per-token pricing, volume discounts
Context | How much text can be processed
Reliability | Uptime, consistency
Privacy | Data handling, compliance
Ecosystem | SDKs, documentation, support
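
The latency and cost factors reduce to simple arithmetic once you have a provider's numbers. The sketch below is a back-of-the-envelope estimate only: the function names are illustrative, and the prices and speeds in the usage lines are placeholders rather than quotes for any real model.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Cost of one request, given prices in USD per million tokens."""
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


def estimate_latency_s(
    output_tokens: int,
    time_to_first_token_s: float,
    tokens_per_second: float,
) -> float:
    """Approximate wall-clock time: wait for the first token, then stream the rest."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return time_to_first_token_s + output_tokens / tokens_per_second


if __name__ == "__main__":
    # Placeholder figures: 2,000 input and 500 output tokens,
    # $1.00 / $4.00 per MTok, 0.5 s to first token, 100 tokens/second.
    print(estimate_cost_usd(2_000, 500, 1.00, 4.00))
    print(estimate_latency_s(500, 0.5, 100.0))
```

Multiplying the per-request cost by expected daily volume gives a quick sanity check on whether a cheaper tier is worth evaluating.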

Licence Types

  • Proprietary API — No access to weights (GPT-5.5, Claude)
  • Gated open — Weights available with restrictions (Llama 4)
  • Permissive open — Few restrictions (Mistral 3, Qwen 3.6, DeepSeek)
  • Fully open — Weights, code, and training data (OLMo)

API Providers

Model Providers

Direct from the source:

Aggregators / Routers

Access multiple models through one API:

Cloud Platforms

See Model Serving for self-hosted inference and deployment options.

Choosing a Model

Decision Framework

  1. Task requirements — What capability is most important?
  2. Latency needs — Real-time vs batch processing
  3. Cost constraints — Budget per million tokens
  4. Privacy requirements — Can data leave your environment?
  5. Context needs — How much text per request?
  6. Compliance — Regulatory requirements
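
The six questions above can be treated as hard filters over a candidate list, with cost as the tie-breaker. The sketch below is one possible encoding, not a prescribed method: the field names, the `shortlist` helper, and the idea of a single 0-100 quality score from your own evaluation are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    name: str
    quality: int               # score from your own task evaluation, 0-100
    latency_s: float           # typical end-to-end latency you measured
    cost_per_mtok_usd: float   # blended price per million tokens
    self_hostable: bool        # True if weights can run in your environment
    context_tokens: int        # maximum context window
    compliant: bool            # meets your regulatory requirements


@dataclass(frozen=True)
class Requirements:
    min_quality: int
    max_latency_s: float
    max_cost_per_mtok_usd: float
    data_must_stay_local: bool
    min_context_tokens: int
    needs_compliance: bool


def shortlist(candidates: Iterable[Candidate], req: Requirements) -> List[Candidate]:
    """Keep candidates that satisfy every requirement, cheapest first."""
    kept = [
        c for c in candidates
        if c.quality >= req.min_quality
        and c.latency_s <= req.max_latency_s
        and c.cost_per_mtok_usd <= req.max_cost_per_mtok_usd
        and (c.self_hostable or not req.data_must_stay_local)
        and c.context_tokens >= req.min_context_tokens
        and (c.compliant or not req.needs_compliance)
    ]
    return sorted(kept, key=lambda c: c.cost_per_mtok_usd)


if __name__ == "__main__":
    # Placeholder candidates; the numbers are invented for the example.
    pool = [
        Candidate("model-a", 90, 3.0, 10.0, False, 1_000_000, True),
        Candidate("model-b", 80, 1.0, 1.0, False, 200_000, True),
        Candidate("model-c", 75, 2.0, 0.5, True, 128_000, True),
    ]
    req = Requirements(70, 2.5, 5.0, True, 100_000, True)
    print([c.name for c in shortlist(pool, req)])
```

Treating privacy and compliance as boolean gates rather than weighted scores mirrors how those requirements usually behave: a model either can be used or it cannot.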

Rules of Thumb

  • Start with a capable model (Claude Sonnet 4.6, GPT-5.5, Gemini 3.1 Pro)
  • Optimise for cost/speed once it works (Haiku/mini/nano/Flash variants)
  • Open models for privacy-sensitive use cases (Llama 4, Qwen 3.6, DeepSeek-V3.2)
  • LLM API prices dropped ~80% from 2025 to 2026 — re-evaluate cost assumptions
  • Smaller models for high-volume, simple tasks

Staying Current

The landscape changes rapidly. Track developments:

Resources