Foundation models are large AI models trained on broad data that can be adapted to many downstream tasks. The term encompasses LLMs, vision models, and multimodal systems.

Commercial Models

Anthropic Claude

Model              Context         Strengths
Claude 4.5 Sonnet  200K (1M beta)  Best for coding, agentic tasks
Claude 4.5 Opus    200K            Premium intelligence, deep reasoning
Claude 4.5 Haiku   200K            Fastest, near-frontier intelligence
  • Extended thinking for complex reasoning
  • 64K max output tokens
  • Strong instruction following and tool use
  • Constitutional AI training
  • API Documentation
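
The extended thinking and tool use noted above are exposed directly through the Anthropic Python SDK. A minimal sketch; the model id and token budgets are assumptions to verify against the API docs:

```python
# Minimal Claude call with extended thinking (pip install anthropic).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-5",  # assumed model id; check the docs
    max_tokens=16000,           # well under the 64K output ceiling
    thinking={"type": "enabled", "budget_tokens": 8000},  # extended thinking
    messages=[{"role": "user", "content": "Plan a migration from REST to gRPC."}],
)

# With thinking enabled, the response interleaves thinking and text blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)
```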

OpenAI GPT

Model         Context  Strengths
GPT-5.2       128K+    Best for coding, agentic tasks
GPT-5 mini    128K     Faster, cost-efficient
GPT-5 nano    128K     Fastest, most cost-efficient
GPT-4.1       128K     Smartest non-reasoning model
o3 / o4-mini  128K     Deep reasoning, STEM
  • Open-weight models available (gpt-oss-120b, gpt-oss-20b)
  • Sora 2 for video generation
  • Strong function calling and tool use
  • API Documentation
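
Function calling uses the standard Chat Completions tool interface. A minimal sketch with the OpenAI Python SDK; the model id and the get_weather tool are illustrative assumptions:

```python
# Minimal function-calling example (pip install openai).
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-5.2",  # assumed model id; check the docs
    messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=tools,
)

# If the model chose to call the tool, the arguments arrive as a JSON string.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```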

Google Gemini

Model                  Context  Strengths
Gemini 3 Pro           1M+      Most intelligent, complex tasks
Gemini 3 Flash         1M       Frontier intelligence at speed
Gemini 2.5 Flash-Lite  1M       High volume, cost-efficient
  • State-of-the-art reasoning and multimodal
  • Extended thinking capabilities
  • Strong agentic and coding performance
  • API Documentation
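
A minimal sketch of calling Gemini through the google-genai Python SDK; the model id is an assumption to verify against the docs:

```python
# Minimal Gemini call (pip install google-genai).
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-3-pro",  # assumed model id; check the docs
    contents="Summarise the trade-offs between RAG and fine-tuning.",
)
print(response.text)
```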

Others

  • Cohere Command — Enterprise focus, RAG-optimised
  • Amazon Nova — AWS Bedrock integration
  • xAI Grok — Strong reasoning, real-time data

Open-Source Models

Meta Llama

Model             Parameters         Context  Notes
Llama 4 Scout     17B (16 experts)   128K     Multimodal MoE
Llama 4 Maverick  17B (128 experts)  128K     Larger expert pool
Llama 3.3 70B     70B                128K     Text-only instruct
  • Mixture-of-experts architecture
  • Native multimodal (text + images)
  • Community license: broadly usable, with some restrictions
  • Llama Downloads
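
Because the weights are downloadable, Llama can be run locally. A minimal sketch with Hugging Face transformers, assuming you have accepted the gated license on the Hub and have hardware suited to a 70B model:

```python
# Run an open-weights Llama model locally (pip install transformers torch).
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.3-70B-Instruct",  # gated repo; license required
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spread layers across available GPUs
)

out = pipe(
    [{"role": "user", "content": "Explain mixture-of-experts in two sentences."}],
    max_new_tokens=128,
)
# The pipeline returns the chat with the assistant's reply appended last.
print(out[0]["generated_text"][-1]["content"])
```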

Mistral

Model          Parameters  Context  Notes
Mistral Large  ~100B       128K     Commercial flagship
Mixtral 8x22B  MoE         64K      Mixture of experts
Mistral OCR 3  n/a         n/a      Document processing
  • European AI company
  • Strong efficiency/performance ratio
  • Le Chat consumer product
  • Mistral AI

Qwen (Alibaba)

Model      Parameters  Context  Notes
Qwen3      Various     128K     Strong all-round
Qwen3-VL   Various     128K     Vision-language
Qwen3-TTS  Various     n/a      Text-to-speech
  • Strong multilingual (especially Chinese)
  • Extensive model family (400+ variants)
  • Embedding, reranking, and omni models
  • Qwen

DeepSeek

Model           Parameters  Notes
DeepSeek-V3.2   685B MoE    Latest flagship
DeepSeek R1     Various     Strong reasoning
DeepSeek Coder  Various     Code-specialised
  • Competitive with frontier models
  • Cost-efficient training and inference
  • Open weights available
  • DeepSeek
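
DeepSeek's hosted API is OpenAI-compatible, so the OpenAI SDK can be pointed at it. A minimal sketch; the base URL and model id are assumptions to verify against the DeepSeek docs:

```python
# Call DeepSeek through its OpenAI-compatible endpoint (pip install openai).
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",  # assumed endpoint; check the docs
    api_key="YOUR_DEEPSEEK_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-chat",  # assumed model id
    messages=[{"role": "user", "content": "Write a haiku about open weights."}],
)
print(response.choices[0].message.content)
```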

Others

  • Yi (01.AI) — Strong multilingual
  • Phi (Microsoft) — Small but capable
  • Gemma 3 (Google) — Open weights, research-friendly
  • OLMo (AI2) — Fully open including training data
  • Grok (xAI) — Available via API

Model Comparison Factors

Capability Benchmarks

See Evaluation & Benchmarking for details.

  • MMLU — Broad knowledge
  • HumanEval — Coding
  • GSM8K — Math reasoning
  • GPQA — Graduate-level science

Practical Considerations

Factor       Considerations
Latency      Time to first token, tokens/second
Cost         Per-token pricing, volume discounts
Context      How much text can be processed
Reliability  Uptime, consistency
Privacy      Data handling, compliance
Ecosystem    SDKs, documentation, support
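
Cost and context interact: a rough monthly spend falls out of per-million-token prices and the expected tokens per request. A quick sketch; the prices below are hypothetical placeholders, not real quotes:

```python
# Back-of-the-envelope cost estimate from per-million-token prices.
def monthly_cost(requests_per_day: int,
                 input_tokens: int,
                 output_tokens: int,
                 price_in_per_m: float,
                 price_out_per_m: float) -> float:
    """Return estimated USD/month for a given traffic profile."""
    per_request = (input_tokens * price_in_per_m
                   + output_tokens * price_out_per_m) / 1e6
    return requests_per_day * per_request * 30

# e.g. 10K requests/day, 2K tokens in / 500 out, $3 in / $15 out per 1M tokens
print(f"${monthly_cost(10_000, 2_000, 500, 3.0, 15.0):,.0f}/month")  # $4,050/month
```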

License Types

  • Proprietary API — No access to weights (GPT-5, Claude)
  • Gated open — Weights available with restrictions (Llama 4)
  • Permissive open — Few restrictions (Mistral, Qwen, DeepSeek)
  • Fully open — Weights, code, and training data (OLMo)

API Providers

Model Providers

Direct from the source:

Aggregators / Routers

Access multiple models through one API:
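
Most aggregators expose an OpenAI-compatible endpoint, so switching models is a one-line change with the OpenAI SDK. A sketch with a hypothetical router URL and model ids:

```python
# Route requests to different models through one aggregator API
# (pip install openai). URL and model ids below are hypothetical.
from openai import OpenAI

client = OpenAI(
    base_url="https://router.example.com/v1",  # hypothetical aggregator endpoint
    api_key="YOUR_ROUTER_API_KEY",
)

for model in ["vendor-a/frontier-model", "vendor-b/small-model"]:  # hypothetical ids
    r = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in one word."}],
    )
    print(model, "->", r.choices[0].message.content)
```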

Cloud Platforms

Choosing a Model

Decision Framework

  1. Task requirements — What capability is most important?
  2. Latency needs — Real-time vs batch processing
  3. Cost constraints — Budget per million tokens
  4. Privacy requirements — Can data leave your environment?
  5. Context needs — How much text per request?
  6. Compliance — Regulatory requirements
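
The framework above can be made concrete as a first-pass triage function. A toy sketch; the thresholds and tier names are illustrative assumptions, not vendor guidance:

```python
# Map decision-framework answers to a model tier (illustrative only).
from dataclasses import dataclass

@dataclass
class Requirements:
    needs_deep_reasoning: bool
    realtime: bool               # strict latency budget?
    budget_per_m_tokens: float   # willingness to pay, USD per 1M tokens
    data_must_stay_onprem: bool  # privacy/compliance constraint

def pick_tier(req: Requirements) -> str:
    if req.data_must_stay_onprem:
        return "open weights (self-hosted)"
    if req.needs_deep_reasoning and req.budget_per_m_tokens >= 5:
        return "frontier tier"
    if req.realtime or req.budget_per_m_tokens < 1:
        return "mini/nano/Flash tier"
    return "mid tier"

print(pick_tier(Requirements(True, False, 10.0, False)))  # frontier tier
print(pick_tier(Requirements(False, True, 0.5, False)))   # mini/nano/Flash tier
print(pick_tier(Requirements(False, False, 2.0, True)))   # open weights (self-hosted)
```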

Rules of Thumb

  • Start with a capable model (Claude 4.5 Sonnet, GPT-5.2, Gemini 3 Pro)
  • Optimise for cost/speed once it works (mini/nano/Flash variants)
  • Open models for privacy-sensitive use cases (Llama 4, Qwen3, DeepSeek)
  • Smaller models for high-volume, simple tasks

Staying Current

The landscape changes rapidly. Track developments:

Resources