Reference notes.

Foundation models are large AI models trained on broad data that can be adapted to many downstream tasks. The term encompasses LLMs, vision models, and multimodal systems.

Commercial Models

Anthropic Claude

| Model | Context | Strengths |
|---|---|---|
| Claude Opus 4.6 | 1M | Premium intelligence, deep reasoning |
| Claude Sonnet 4.6 | 1M | Best balance of speed and capability |
| Claude Haiku 4.5 | 200K | Fastest, cost-effective |
  • Extended thinking for complex reasoning
  • Up to 64K max output tokens
  • Strong instruction following and tool use
  • Constitutional AI training
  • API Documentation

OpenAI GPT

| Model | Context | Strengths |
|---|---|---|
| GPT-5.2 | 400K | Flagship, strong reasoning |
| GPT-5.2 Pro | 400K | Premium tier |
| GPT-5 mini | 400K | Balanced speed and intelligence |
| GPT-5 nano | 400K | Most cost-efficient |
| o3 / o3 Pro | 200K | Deep reasoning, STEM |
  • 100% on AIME 2025 maths benchmark
  • Strong function calling and tool use
  • API Documentation

Google Gemini

| Model | Context | Strengths |
|---|---|---|
| Gemini 3 Pro | 1M | Top-ranked, Deep Think reasoning |
| Gemini 3 Flash | 1M | Fast, 78% on SWE-bench Verified |
| Gemini 3.1 Pro | 1M | Latest preview, top of leaderboards |
  • Native multimodal (text, images, audio, video)
  • Deep Think capabilities (2.5x reasoning improvement)
  • Strong agentic and coding performance
  • API Documentation

Others

  • Cohere Command — Enterprise focus, RAG-optimised
  • Amazon Nova — AWS Bedrock integration
  • xAI Grok — Strong reasoning, real-time data

Open-Source Models

Meta Llama

| Model | Parameters | Context | Notes |
|---|---|---|---|
| Llama 4 Scout | 17B (16 experts) | 10M | Largest context window available |
| Llama 4 Maverick | 17B (128 experts) | 1M | Larger expert pool |
| Llama 3.3 70B | 70B | 128K | Text-only instruct |
  • Mixture-of-experts architecture
  • Native multimodal (text + images)
  • Llama 4 Scout has a massive 10M token context window
  • Permissive licence (with restrictions)
  • Llama Downloads
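The headline context figures (10M tokens for Llama 4 Scout, 128K for Llama 3.3) are easiest to reason about as a budget check before sending a request. A minimal sketch, assuming ~4 characters per token — a rough heuristic for English text; accurate counts require the model's own tokenizer:

```python
# Rough check of whether a document fits a model's context window.
# Assumes ~4 characters per token (heuristic only); use the model's
# actual tokenizer for real budgeting.
CHARS_PER_TOKEN = 4

CONTEXT_WINDOWS = {  # token limits from the table above
    "llama-4-scout": 10_000_000,
    "llama-4-maverick": 1_000_000,
    "llama-3.3-70b": 128_000,
}

def estimate_tokens(text: str) -> int:
    """Crude token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN

def fits(model: str, text: str, reserved_output: int = 4_096) -> bool:
    """True if the prompt plus reserved output tokens fit the window."""
    return estimate_tokens(text) + reserved_output <= CONTEXT_WINDOWS[model]
```

By this estimate a 1M-character document (~250K tokens) fits Scout and Maverick comfortably but not Llama 3.3's 128K window.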

Mistral

| Model | Parameters | Context | Notes |
|---|---|---|---|
| Mistral Large | ~100B | 128K | Commercial flagship |
| Mixtral 8x22B | MoE | 64K | Mixture of experts |
| Mistral OCR 3 | — | — | Document processing |
  • European AI company
  • Strong efficiency/performance ratio
  • Le Chat consumer product
  • Mistral AI

Qwen (Alibaba)

| Model | Parameters | Context | Notes |
|---|---|---|---|
| Qwen3 | 0.6B–235B | 128K | Latest generation, strong all-round |
| Qwen2.5-VL | Various | 128K | Vision-language |
| QwQ-32B | 32B | 128K | Reasoning model |
  • Strong multilingual (100+ languages)
  • Extensive model family (code, maths, embedding, reranking variants)
  • Qwen3-Embedding models (0.6B–8B) top MTEB multilingual leaderboard
  • Fully open weights with permissive licence
  • Qwen

DeepSeek

| Model | Parameters | Notes |
|---|---|---|
| DeepSeek-V3.2 | 671B MoE (37B active) | Latest flagship, rivals Gemini 3 Pro on reasoning |
| DeepSeek R1 | 671B MoE (37B active) | Strong reasoning, distilled variants available |
  • Competitive with frontier models at a fraction of training cost
  • Cost-efficient MoE architecture with multi-head latent attention
  • DeepSeek-V3.2-Speciale variant matches frontier closed models on AIME/HMMT benchmarks
  • Open weights under MIT licence — free for commercial use
  • Extremely cost-effective API ($0.42 per MTok)
  • DeepSeek
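The cost efficiency claimed above follows from the MoE design: per-token compute scales roughly with *active* parameters, not total capacity. A back-of-envelope sketch using the figures from the table:

```python
# Back-of-envelope: fraction of DeepSeek's parameters active per token.
# Figures from the table above; per-token compute scales roughly with
# active parameters, which is where the cost efficiency comes from.
total_params = 671e9   # total MoE parameters
active_params = 37e9   # parameters activated per token

active_fraction = active_params / total_params
print(f"{active_fraction:.1%} of parameters active per token")
# Per-token compute comparable to a ~37B dense model,
# while total model capacity stays at 671B.
```

Roughly 5.5% of the parameters are exercised per token, so inference cost tracks a ~37B dense model despite the 671B capacity.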

Others

  • Phi-4 (Microsoft) — Small but capable, strong reasoning
  • Gemma 3 (Google) — Open weights, vision support, research-friendly
  • OLMo 2 (AI2) — Fully open including training data
  • Grok (xAI) — Available via API

Model Comparison Factors

Capability Benchmarks

See Evaluation & Benchmarking for details.

  • MMLU — Broad knowledge
  • HumanEval — Coding
  • GSM8K — Maths reasoning
  • GPQA — Graduate-level science

Practical Considerations

| Factor | Considerations |
|---|---|
| Latency | Time to first token, tokens/second |
| Cost | Per-token pricing, volume discounts |
| Context | How much text can be processed |
| Reliability | Uptime, consistency |
| Privacy | Data handling, compliance |
| Ecosystem | SDKs, documentation, support |
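Per-token pricing is easiest to compare by converting it to cost per request at your expected token volumes. A minimal sketch — the dollar figures below are hypothetical placeholders, not quotes from any provider:

```python
# Estimate cost per request from per-million-token (MTok) pricing.
# All prices below are hypothetical placeholders for illustration.
def cost_per_request(input_tokens: int, output_tokens: int,
                     input_per_mtok: float, output_per_mtok: float) -> float:
    """Cost in dollars for one request at the given MTok prices."""
    return (input_tokens * input_per_mtok
            + output_tokens * output_per_mtok) / 1_000_000

# Example: 2,000 input + 500 output tokens at $3 / $15 per MTok (hypothetical)
print(cost_per_request(2_000, 500, 3.0, 15.0))  # 0.0135
```

Note that output tokens typically cost several times more than input tokens, so output-heavy workloads (long generations, reasoning traces) dominate the bill even at modest volumes.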

Licence Types

  • Proprietary API — No access to weights (GPT-5.2, Claude)
  • Gated open — Weights available with restrictions (Llama 4)
  • Permissive open — Few restrictions (Mistral, Qwen3, DeepSeek)
  • Fully open — Weights, code, and training data (OLMo)

API Providers

Model Providers

Direct from the source:

Aggregators / Routers

Access multiple models through one API:

Cloud Platforms

See Model Serving for self-hosted inference and deployment options.

Choosing a Model

Decision Framework

  1. Task requirements — What capability is most important?
  2. Latency needs — Real-time vs batch processing
  3. Cost constraints — Budget per million tokens
  4. Privacy requirements — Can data leave your environment?
  5. Context needs — How much text per request?
  6. Compliance — Regulatory requirements
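The framework above can be mechanised as a simple filter over a model catalogue. A sketch under hypothetical data — the catalogue entries, names, and prices are illustrative placeholders, not live listings:

```python
# Filter a model catalogue against task requirements.
# Catalogue entries below are hypothetical placeholders.
CATALOGUE = [
    {"name": "frontier-large", "open_weights": False,
     "context": 1_000_000, "price_per_mtok": 15.0},
    {"name": "frontier-small", "open_weights": False,
     "context": 200_000, "price_per_mtok": 1.0},
    {"name": "open-moe", "open_weights": True,
     "context": 128_000, "price_per_mtok": 0.5},
]

def shortlist(min_context: int, max_price: float,
              require_open_weights: bool = False) -> list[str]:
    """Names of models meeting context, price, and privacy constraints."""
    return [m["name"] for m in CATALOGUE
            if m["context"] >= min_context
            and m["price_per_mtok"] <= max_price
            and (m["open_weights"] or not require_open_weights)]

# Privacy-sensitive workload: data stays in-house, so open weights only.
print(shortlist(min_context=100_000, max_price=1.0,
                require_open_weights=True))  # ['open-moe']
```

Hard constraints (privacy, compliance, context) filter first; soft preferences (cost, latency) then rank whatever survives.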

Rules of Thumb

  • Start with a capable model (Claude Sonnet 4.6, GPT-5.2, Gemini 3 Pro)
  • Optimise for cost/speed once it works (Haiku/mini/nano/Flash variants)
  • Open models for privacy-sensitive use cases (Llama 4, Qwen3, DeepSeek-V3.2)
  • LLM API prices dropped ~80% from 2025 to 2026 — re-evaluate cost assumptions
  • Smaller models for high-volume, simple tasks

Staying Current

The landscape changes rapidly. Track developments:

Resources