Foundation Models

Foundation models are large AI models trained on broad data and adaptable to many downstream tasks. The term covers large language models (LLMs), vision models, and multimodal systems.

Commercial Models

Anthropic Claude

Model	Context	Strengths
Claude Sonnet 4.5	200K (1M beta)	Best for coding, agentic tasks
Claude Opus 4.5	200K	Premium intelligence, deep reasoning
Claude Haiku 4.5	200K	Fastest, near-frontier intelligence

Extended thinking for complex reasoning
64K max output tokens
Strong instruction following and tool use
Constitutional AI training
API Documentation

OpenAI GPT

Model	Context	Strengths
GPT-5.2	400K	Best for coding, agentic tasks
GPT-5 mini	400K	Faster, cost-efficient
GPT-5 nano	400K	Fastest, most cost-efficient
GPT-4.1	1M	Smartest non-reasoning model
o3 / o4-mini	200K	Deep reasoning, STEM

Open-weight models available (gpt-oss-120b, gpt-oss-20b)
Sora 2 for video generation
Strong function calling and tool use
API Documentation

Google Gemini

Model	Context	Strengths
Gemini 3 Pro	1M+	Most intelligent, complex tasks
Gemini 3 Flash	1M	Frontier intelligence at speed
Gemini 2.5 Flash-Lite	1M	High volume, cost efficient

State-of-the-art reasoning and multimodal
Extended thinking capabilities
Strong agentic and coding performance
API Documentation

Others

Cohere Command — Enterprise focus, RAG-optimised
Amazon Nova — AWS Bedrock integration
xAI Grok — Strong reasoning, real-time data

Open-Source Models

Meta Llama

Model	Parameters	Context	Notes
Llama 4 Scout	17B active, 109B total (16 experts)	10M	Multimodal MoE
Llama 4 Maverick	17B active, 400B total (128 experts)	1M	Multimodal MoE, larger expert pool
Llama 3.3 70B	70B	128K	Text-only instruct

Mixture-of-experts architecture
Native multimodal (text + images)
Llama Community License (commercial use allowed, with restrictions)
Llama Downloads

Mistral

Model	Parameters	Context	Notes
Mistral Large	~100B	128K	Commercial flagship
Mixtral 8x22B	141B total, 39B active	64K	Mixture of experts
Mistral OCR 3	—	—	Document processing

European AI company
Strong efficiency/performance ratio
Le Chat consumer product
Mistral AI

Qwen (Alibaba)

Model	Parameters	Context	Notes
Qwen3	Various	128K	Strong all-round
Qwen3-VL	Various	128K	Vision-language
Qwen3-TTS	Various	—	Text-to-speech

Strong multilingual (especially Chinese)
Extensive model family (400+ variants)
Embedding, reranking, and omni models
Qwen

DeepSeek

Model	Parameters	Notes
DeepSeek-V3.2	685B MoE	Latest flagship
DeepSeek R1	Various	Strong reasoning
DeepSeek Coder	Various	Code-specialised

Competitive with frontier models
Cost-efficient training and inference
Open weights available
DeepSeek

Others

Yi (01.AI) — Strong multilingual
Phi (Microsoft) — Small but capable
Gemma 3 (Google) — Open weights, research-friendly
OLMo (AI2) — Fully open including training data
Grok (xAI) — Grok-1 weights openly released; newer models available via API

Model Comparison Factors

Capability Benchmarks

See Evaluation & Benchmarking for details.

MMLU — Broad knowledge
HumanEval — Coding
GSM8K — Math reasoning
GPQA — Graduate-level science

Practical Considerations

Factor	Considerations
Latency	Time to first token, tokens/second
Cost	Per-token pricing, volume discounts
Context	Maximum tokens per request (input plus output)
Reliability	Uptime, consistency
Privacy	Data handling, compliance
Ecosystem	SDKs, documentation, support
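
The latency, cost, and context rows above reduce to simple arithmetic. A minimal sketch, assuming a hypothetical ModelProfile record; every figure in the example is a placeholder, not a real price or measurement:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-model figures; replace with measured or published values."""
    input_price_per_mtok: float   # USD per million input tokens (placeholder)
    output_price_per_mtok: float  # USD per million output tokens (placeholder)
    time_to_first_token_s: float  # seconds before the first token arrives
    tokens_per_second: float      # steady-state output speed
    context_window: int           # maximum tokens per request (input + output)


def request_cost(p: ModelProfile, input_tokens: int, output_tokens: int) -> float:
    """Per-token pricing: input and output are billed separately, per million tokens."""
    return (input_tokens * p.input_price_per_mtok
            + output_tokens * p.output_price_per_mtok) / 1_000_000


def request_latency(p: ModelProfile, output_tokens: int) -> float:
    """Approximate total latency: time to first token plus generation time."""
    return p.time_to_first_token_s + output_tokens / p.tokens_per_second


def fits_context(p: ModelProfile, input_tokens: int, output_tokens: int) -> bool:
    """True if the request fits inside the model's context window."""
    return input_tokens + output_tokens <= p.context_window


# Placeholder profile: $2 / $10 per million tokens, 0.5 s TTFT, 100 tokens/s.
example = ModelProfile(2.0, 10.0, 0.5, 100.0, 200_000)
print(request_cost(example, 10_000, 1_000))
print(request_latency(example, 1_000))
print(fits_context(example, 150_000, 50_000))
```

Volume discounts, caching, and batch pricing are not modelled here; they would change the cost function but not the overall shape of the comparison.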

License Types

Proprietary API — No access to weights (GPT-5, Claude)
Gated open — Weights available with restrictions (Llama 4)
Permissive open — Few restrictions (Mistral, Qwen, DeepSeek)
Fully open — Weights, code, and training data (OLMo)

API Providers

Model Providers

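Direct from the source: each vendor covered above (for example Anthropic, OpenAI, Google) serves its own models through its own API.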

Aggregators / Routers

Access multiple models through one API:

OpenRouter — Multiple providers, unified API
Together AI — Open models, fine-tuning
Fireworks AI — Fast inference
Groq — Ultra-fast inference
Anyscale
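
One practical benefit of a single entry point is falling back to another backend when one fails. The sketch below illustrates the idea only; it does not show how any listed service works internally. The Provider type and the flaky and echo functions are hypothetical stand-ins for real client calls:

```python
from typing import Callable, List, Sequence, Tuple

# A provider is any callable that takes a prompt and returns text (hypothetical interface).
Provider = Callable[[str], str]


def complete_with_fallback(
    prompt: str, providers: Sequence[Tuple[str, Provider]]
) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, text) from the first that succeeds."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky(prompt: str) -> str:
    """Stand-in for a backend that is currently down."""
    raise TimeoutError("simulated outage")


def echo(prompt: str) -> str:
    """Stand-in for a healthy backend."""
    return f"echo: {prompt}"


print(complete_with_fallback("hello", [("primary", flaky), ("backup", echo)]))
```

Ordering the list by preference (quality first, or cost first) is the main design choice; the loop itself stays the same.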

Cloud Platforms

AWS Bedrock — Multiple providers in AWS
Azure OpenAI — OpenAI in Azure
Google Vertex AI — Gemini and others

Choosing a Model

Decision Framework

Task requirements — What capability is most important?
Latency needs — Real-time vs batch processing
Cost constraints — Budget per million tokens
Privacy requirements — Can data leave your environment?
Context needs — How much text per request?
Compliance — Regulatory requirements
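
The measurable parts of this framework can be expressed as hard filters over a candidate list. A minimal sketch, assuming hypothetical Candidate and Requirements records; the model names and numbers are invented placeholders, and task fit and compliance still need human judgement:

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    price_per_mtok: float    # blended USD per million tokens (placeholder)
    context_window: int      # tokens per request
    self_hostable: bool      # True if weights can run in your own environment
    realtime_capable: bool   # True if latency suits interactive use


@dataclass(frozen=True)
class Requirements:
    max_price_per_mtok: float
    min_context: int
    data_must_stay_local: bool
    needs_realtime: bool


def shortlist(cands: Sequence[Candidate], req: Requirements) -> List[Candidate]:
    """Drop candidates that violate any hard constraint, then sort cheapest first."""
    keep = [
        c for c in cands
        if c.price_per_mtok <= req.max_price_per_mtok
        and c.context_window >= req.min_context
        and (c.self_hostable or not req.data_must_stay_local)
        and (c.realtime_capable or not req.needs_realtime)
    ]
    return sorted(keep, key=lambda c: c.price_per_mtok)


# Invented placeholder catalogue, not real models or prices.
CANDIDATES = [
    Candidate("hosted-large", 10.0, 200_000, False, True),
    Candidate("hosted-small", 1.0, 128_000, False, True),
    Candidate("open-weights", 0.8, 128_000, True, False),
]

req = Requirements(max_price_per_mtok=5.0, min_context=100_000,
                   data_must_stay_local=False, needs_realtime=True)
print([c.name for c in shortlist(CANDIDATES, req)])
```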

Rules of Thumb

Start with a capable model (Claude Sonnet 4.5, GPT-5.2, Gemini 3 Pro)
Optimise for cost/speed once it works (mini/nano/Flash variants)
Open models for privacy-sensitive use cases (Llama 4, Qwen3, DeepSeek)
Smaller models for high-volume, simple tasks
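
The "start capable, then optimise" rule can be run as a simple search: once a quality bar is set on your own test set, pick the cheapest model that clears it. A minimal sketch; the evaluate callback, model names, costs, and scores are all hypothetical:

```python
from typing import Callable, Optional, Sequence, Tuple


def cheapest_passing(
    models: Sequence[Tuple[str, float]],   # (model name, cost per million tokens)
    evaluate: Callable[[str], float],      # returns a quality score on your own test set
    threshold: float,
) -> Optional[str]:
    """Walk models from cheapest to most expensive; return the first that meets the bar."""
    for name, _cost in sorted(models, key=lambda m: m[1]):
        if evaluate(name) >= threshold:
            return name
    return None  # nothing met the bar; keep the capable model or lower the threshold


# Placeholder scores standing in for a real evaluation run.
scores = {"small": 0.72, "medium": 0.91, "large": 0.95}
models = [("large", 10.0), ("small", 0.5), ("medium", 2.0)]

print(cheapest_passing(models, scores.__getitem__, threshold=0.9))
```

The threshold is best taken from what the capable model achieves on the same test set, so that the cheaper choice is judged against a known baseline.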

Staying Current

The landscape changes rapidly. Track developments:

Artificial Analysis — Performance benchmarks
LMSYS Chatbot Arena — Human preference rankings
Hugging Face Open LLM Leaderboard
Model provider blogs and announcements

Resources

Foundation Models Paper — Stanford’s influential survey
State of GPT — Karpathy’s overview
Open LLM Leaderboard
Chatbot Arena Leaderboard
